# Hyperparameter search for the MobileNetV2 model on the augmented CIFAR100 dataset

This notebook searches for optimal hyperparameters for the MobileNetV2 model on the augmented CIFAR100 dataset. Hyperparameters are searched for all variants of the model: randomly initialized, pretrained with only the classification head fine-tuned, and pretrained with the whole model fine-tuned. For each variant, hyperparameters are searched for both standard training and training with distillation.

The search uses the Optuna library with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each model variant.

## Library imports and basic setup

In [1]:
from transformers import Trainer
from torch.utils.data import ConcatDataset
import optuna
import torch
import math
import base
import os



In [None]:
dataset_part = base.get_dataset_part()

Reset the random seed so that the results are reproducible.

In [3]:
base.reset_seed()

Check whether a GPU is available.

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA H100 PCIe


Load the dataset and apply the base transforms.

In [5]:
DATASET = "cifar100"

In [6]:
transform = base.base_transforms()

train = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TRAIN, transform=transform, device="cpu")
eval = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.EVAL, transform=transform, device="cpu")
test = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TEST, transform=transform, device="cpu")

Filter the augmented dataset according to the described mechanism.

In [7]:
augment_transform = base.aug_transforms()

train_aug = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TRAIN, transform=augment_transform)
train_aug = base.remove_diff_pred_class(train, train_aug, pytorch_dataset=True)
train_combo = ConcatDataset([train, train_aug])

Removing entries from augmented dataset that are different from the base one - based on saved logits:   0%|   …
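The implementation of `base.remove_diff_pred_class` is not shown in this notebook. Judging only from its name and the progress message above, it appears to drop augmented samples whose class predicted from the saved logits differs from that of the corresponding base sample. The following is a minimal sketch of that idea on plain Python lists. The helper names and the toy logits are hypothetical and are not taken from the `base` module.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def matching_prediction_indices(
    base_logits: Sequence[Sequence[float]],
    aug_logits: Sequence[Sequence[float]],
) -> List[int]:
    """Indices of samples whose predicted class is the same in both logit sets."""
    if len(base_logits) != len(aug_logits):
        raise ValueError("both logit sets must describe the same samples")
    return [
        i
        for i, (b, a) in enumerate(zip(base_logits, aug_logits))
        if argmax(b) == argmax(a)
    ]


# Toy logits for three samples and three classes (made-up numbers).
toy_base_logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 3.0]]
toy_aug_logits = [[1.8, 0.4, 0.2], [1.1, 0.9, 0.3], [0.0, 0.5, 2.2]]

kept = matching_prediction_indices(toy_base_logits, toy_aug_logits)
print(kept)  # sample 1 is dropped because its predicted class changed
```

In the notebook the surviving augmented samples are then concatenated with the base training set into `train_combo`.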

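Base training configuration used during the search. Optuna works with steps rather than epochs, so the epoch limits are converted to steps below.

The minimum training length is three epochs and the maximum is ten epochs. The maximum number of warm-up steps is set to 10% of the first epoch. The step counts are derived from the length of the base training set `train`, while the trainer configured later runs on the combined `train_combo`.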

In [9]:
num_epochs = 10
batch_size = 128

In [10]:
data_length = len(train)
min_r = math.ceil(data_length/batch_size)*3
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
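The conversion from epochs to steps can be checked on concrete numbers. The sketch below repeats the arithmetic of the cell above inside a function. The dataset size of 40,000 samples is a hypothetical value chosen for illustration, and `step_budget` is not part of the notebook.

```python
import math


def step_budget(data_length: int, batch_size: int, num_epochs: int, min_epochs: int = 3):
    """Convert epoch limits into optimizer-step limits, mirroring the cell above."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    min_steps = steps_per_epoch * min_epochs   # shortest training considered
    max_steps = steps_per_epoch * num_epochs   # full training length
    warmup_cap = math.ceil(data_length / batch_size / 10)  # 10% of one epoch
    return steps_per_epoch, min_steps, max_steps, warmup_cap


# 40_000 is a hypothetical dataset size used only for illustration.
budget = step_budget(40_000, batch_size=128, num_epochs=10)
print(budget)  # (313, 939, 3130, 32)
```

With these assumed numbers one epoch is 313 steps, so the search would treat 939 steps as the minimum resource and 3130 steps as the maximum.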

## Search with standard training of the randomly initialized model
Definition of the searched hyperparameters and their ranges.

In [11]:
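def hp_space(trial):
    """Sample one hyperparameter configuration for the given Optuna trial."""
    params = {
        # Learning rate is sampled on a logarithmic scale.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay takes values from 0 to 1e-2 in steps of 1e-3.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params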
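To make the two kinds of ranges more concrete, the sketch below draws learning rates uniformly in log space and enumerates the stepped weight-decay grid using only the standard library. It is plain random sampling for illustration and does not reproduce Optuna's TPE sampler. The helper names are hypothetical.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def stepped_grid(low: float, high: float, step: float) -> list:
    """All values low + k * step that do not exceed high."""
    count = int(round((high - low) / step)) + 1
    return [low + k * step for k in range(count)]


rng = random.Random(42)
learning_rates = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(5)]
weight_decays = stepped_grid(0.0, 1e-2, 1e-3)

print(learning_rates)
print(weight_decays)
# Multiplying the step by an integer can leave a tiny floating-point residue,
# which is a plausible origin of values such as 0.009000000000000001 in trial logs.
print(9 * 1e-3 == 0.009)
```

The log scale spreads samples evenly across orders of magnitude, so values near 5e-5 are as likely as values near 5e-3, while the stepped range restricts weight decay to eleven grid points.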

Optuna configuration.

In [12]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
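The effect of `min_resource`, `max_resource`, and `reduction_factor` can be estimated by hand. According to the Optuna documentation, the number of Hyperband brackets is `floor(log_eta(max_resource / min_resource)) + 1`, and each bracket behaves like successive halving with pruning points at `min_resource * eta ** (bracket + rung)`. The sketch below applies these formulas. It is a simplified reading of the documented behaviour, not Optuna's code, and the resource values are hypothetical (313 steps per epoch).

```python
import math


def n_brackets(min_resource: int, max_resource: int, reduction_factor: int) -> int:
    """Number of Hyperband brackets per the formula in Optuna's documentation."""
    return math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1


def rung_steps(min_resource: int, max_resource: int, reduction_factor: int, bracket: int) -> list:
    """Pruning points of one bracket that fit within max_resource (simplified)."""
    steps = []
    rung = 0
    while True:
        step = min_resource * reduction_factor ** (bracket + rung)
        if step > max_resource:
            return steps
        steps.append(step)
        rung += 1


# Hypothetical budget: 313 steps per epoch, between 3 and 10 epochs.
demo_min, demo_max, eta = 939, 3130, 2

brackets = n_brackets(demo_min, demo_max, eta)
print(brackets)
for bracket in range(brackets):
    print(bracket, rung_steps(demo_min, demo_max, eta, bracket))
```

Under these assumptions there are two brackets: one that can prune a trial early and again later, and one that gives each trial a longer run before its first pruning decision.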



In [13]:
base.reset_seed()

Configuration of the individual training runs.

In [35]:
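# Expand "~" explicitly, since Python does not expand it in plain path strings.
training_args = base.get_training_args(
    output_dir=os.path.expanduser(f"~/results/{DATASET}/random_hp-search"),
    logging_dir=os.path.expanduser(f"~/logs/{DATASET}/random_hp-search"),
    epochs=num_epochs,
    batch_size=batch_size,
)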

Trainer configuration for the individual training runs.

In [36]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(100),
)
  

Search setup.

In [None]:
best_base_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-random",
    n_trials=150
)
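The combination of `compute_objective` and `direction="maximize"` means that each finished trial is scored by its final `eval_f1` and the trial with the highest score wins. A minimal sketch of that selection follows. The scores are copied from the first five finished trials in the log of this search, and the helper code is illustrative only and does not call Optuna or `Trainer`.

```python
# Final eval_f1 of five trials that ran to completion, copied from the search log.
finished_trials = {
    30: 0.5345639641299113,
    32: 0.5297431032818857,
    33: 0.5382578746242731,
    34: 0.5343551523050474,
    40: 0.5267291401184344,
}


def compute_objective(metrics: dict) -> float:
    """Same objective as in the search call: the F1 score on the evaluation set."""
    return metrics["eval_f1"]


# direction="maximize": the best trial is the one with the largest objective.
best_trial = max(
    finished_trials,
    key=lambda number: compute_objective({"eval_f1": finished_trials[number]}),
)
print(best_trial, finished_trials[best_trial])
```

Pruned trials do not take part in this comparison, because they never report a final value.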

[I 2025-04-03 22:34:22,000] A new study created in memory with name: Base-random


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0504,3.384866,0.1656,0.152353,0.1656,0.12685
2,3.2109,2.734753,0.2897,0.284892,0.2897,0.269228


[I 2025-04-03 22:37:18,915] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0511,3.503729,0.1453,0.141749,0.1453,0.112573
2,3.3627,2.903738,0.2529,0.247573,0.2529,0.227808
3,2.8639,2.540016,0.3346,0.340189,0.3346,0.317748
4,2.5253,2.310606,0.3836,0.384279,0.3836,0.367251


[I 2025-04-03 22:42:56,228] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.406,3.823216,0.1079,0.076711,0.1079,0.069922
2,3.832,3.423342,0.1745,0.157743,0.1745,0.139485


[I 2025-04-03 22:45:48,161] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1704,3.755341,0.1017,0.068947,0.1017,0.059526
2,3.6097,3.237379,0.1857,0.180903,0.1857,0.157006
3,3.1534,2.810696,0.2709,0.267439,0.2709,0.245765
4,2.8051,2.547436,0.328,0.330122,0.328,0.30933


[I 2025-04-03 22:51:35,964] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2297,3.935549,0.0728,0.040916,0.0728,0.038934
2,3.7627,3.397838,0.1557,0.13999,0.1557,0.126694


[I 2025-04-03 22:54:32,902] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2341,3.59284,0.1435,0.123268,0.1435,0.108619
2,3.5403,3.077325,0.2325,0.225397,0.2325,0.200346
3,3.078,2.706465,0.3066,0.296903,0.3066,0.278813
4,2.7276,2.483984,0.3585,0.348706,0.3585,0.337403


[I 2025-04-03 23:00:19,060] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0404,3.398918,0.1577,0.144241,0.1577,0.12287
2,3.2199,2.766438,0.2801,0.282527,0.2801,0.257924
3,2.6954,2.340284,0.3778,0.383198,0.3778,0.362091
4,2.3214,2.097469,0.4305,0.434891,0.4305,0.417819


[I 2025-04-03 23:06:01,260] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2653,3.638933,0.1357,0.117204,0.1357,0.095535
2,3.6032,3.148652,0.2245,0.230813,0.2245,0.192047
3,3.1572,2.789283,0.2906,0.282397,0.2906,0.263642
4,2.8343,2.578395,0.3384,0.333652,0.3384,0.319125


[I 2025-04-03 23:11:53,151] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9807,3.308048,0.1794,0.155728,0.1794,0.141825
2,3.165,2.675873,0.3015,0.301919,0.3015,0.281677
3,2.6279,2.281471,0.3875,0.392239,0.3875,0.370133
4,2.2528,2.052632,0.4375,0.439695,0.4375,0.424779
5,1.9517,1.935927,0.4694,0.467094,0.4694,0.453799
6,1.7,1.860889,0.489,0.496131,0.489,0.479101
7,1.4678,1.737223,0.5221,0.520931,0.5221,0.513882
8,1.2486,1.698948,0.5328,0.54202,0.5328,0.531229


[I 2025-04-03 23:23:35,947] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0022,3.338965,0.1721,0.144328,0.1721,0.136399
2,3.1816,2.756485,0.2813,0.289234,0.2813,0.262508
3,2.674,2.353746,0.37,0.379521,0.37,0.355387
4,2.324,2.109109,0.4294,0.427306,0.4294,0.412919


[I 2025-04-03 23:29:16,113] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3625,3.788287,0.1141,0.087675,0.1141,0.072196
2,3.7959,3.390602,0.1823,0.179086,0.1823,0.151389


[I 2025-04-03 23:32:03,009] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020781267255701565, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2322,3.893159,0.077,0.053836,0.077,0.040671
2,3.7877,3.460648,0.1525,0.141597,0.1525,0.121127
3,3.3765,3.00567,0.2373,0.224987,0.2373,0.209958
4,2.997,2.739567,0.2943,0.302536,0.2943,0.272666


[I 2025-04-03 23:37:51,453] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9981,3.348337,0.1714,0.148069,0.1714,0.139655
2,3.1758,2.687888,0.2965,0.305635,0.2965,0.276644
3,2.6548,2.320779,0.3772,0.386891,0.3772,0.36156
4,2.2864,2.070857,0.4359,0.440543,0.4359,0.422706
5,1.9952,1.939235,0.465,0.461166,0.465,0.448493
6,1.732,1.865884,0.4877,0.491748,0.4877,0.475889
7,1.5047,1.758692,0.513,0.509019,0.513,0.503702
8,1.2878,1.732418,0.5253,0.534593,0.5253,0.52367


[I 2025-04-03 23:49:19,109] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0002893591596161301, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0142,3.333153,0.1732,0.16651,0.1732,0.138532
2,3.1755,2.683775,0.3027,0.29381,0.3027,0.282699
3,2.6353,2.305937,0.3812,0.377903,0.3812,0.365303
4,2.2521,2.075957,0.4343,0.437461,0.4343,0.420023
5,1.9472,1.929614,0.4741,0.467787,0.4741,0.457391
6,1.6763,1.885442,0.4856,0.488969,0.4856,0.473095
7,1.4366,1.781632,0.5075,0.506396,0.5075,0.498893
8,1.2139,1.754452,0.5244,0.530547,0.5244,0.522134


[I 2025-04-04 00:00:52,789] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00036841844828218917, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.974,3.34503,0.1738,0.163182,0.1738,0.140121
2,3.1306,2.664895,0.3048,0.308393,0.3048,0.289544
3,2.5994,2.263056,0.3921,0.397891,0.3921,0.378157
4,2.2207,2.038794,0.4461,0.45405,0.4461,0.433625
5,1.9213,1.8953,0.4773,0.473006,0.4773,0.462087
6,1.6598,1.835376,0.4951,0.504702,0.4951,0.486623
7,1.4314,1.731481,0.523,0.523258,0.523,0.51515
8,1.2086,1.699542,0.5371,0.546259,0.5371,0.53538


[I 2025-04-04 00:12:20,708] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0010578942221340086, 'weight_decay': 0.007, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.116,3.722416,0.1023,0.074356,0.1023,0.065949
2,3.5381,3.134626,0.2172,0.209646,0.2172,0.18881
3,3.0827,2.734677,0.2878,0.283026,0.2878,0.263066
4,2.7264,2.473349,0.3453,0.338074,0.3453,0.324677


[I 2025-04-04 00:18:11,227] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0003881338358083761, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0047,3.303432,0.1735,0.159105,0.1735,0.138854
2,3.14,2.683778,0.2995,0.297916,0.2995,0.281214
3,2.6083,2.242202,0.394,0.395812,0.394,0.378057
4,2.241,2.053361,0.4394,0.443105,0.4394,0.425306
5,1.9388,1.90997,0.4718,0.466838,0.4718,0.456634
6,1.6847,1.847659,0.4948,0.496977,0.4948,0.483824
7,1.4502,1.758266,0.5191,0.51755,0.5191,0.510925
8,1.2354,1.725846,0.5276,0.537844,0.5276,0.526865


[I 2025-04-04 00:29:46,421] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0039520620859354325, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2589,3.91477,0.0744,0.041878,0.0744,0.0384
2,3.866,3.67619,0.1171,0.114328,0.1171,0.091508
3,3.5572,3.241906,0.1913,0.179208,0.1913,0.162937
4,3.2423,2.992097,0.2422,0.251971,0.2422,0.222907


[I 2025-04-04 00:35:30,869] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2696,3.621501,0.1405,0.111844,0.1405,0.102834
2,3.5795,3.135546,0.2239,0.209676,0.2239,0.192163
3,3.1318,2.756445,0.3004,0.294454,0.3004,0.273299
4,2.7896,2.518728,0.3523,0.341778,0.3523,0.330605
5,2.5134,2.355754,0.3821,0.367555,0.3821,0.361283
6,2.281,2.287928,0.3943,0.382672,0.3943,0.376667
7,2.0924,2.171769,0.4233,0.416621,0.4233,0.40945
8,1.9342,2.115612,0.4363,0.432471,0.4363,0.425268


[I 2025-04-04 00:46:47,532] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00018298494591362534, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1134,3.417509,0.1651,0.173666,0.1651,0.130659
2,3.2682,2.763305,0.2981,0.293791,0.2981,0.273258
3,2.7157,2.356666,0.3796,0.377824,0.3796,0.362081
4,2.3296,2.129044,0.4284,0.426364,0.4284,0.413463


[I 2025-04-04 00:52:37,815] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0013326498867251948, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1489,3.775451,0.1034,0.087121,0.1034,0.06918
2,3.6317,3.243518,0.1933,0.197586,0.1933,0.167503
3,3.174,2.851647,0.2634,0.261817,0.2634,0.234307
4,2.8008,2.526158,0.3352,0.334103,0.3352,0.315134


[I 2025-04-04 00:58:21,320] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00020591268049360804, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0872,3.425876,0.1627,0.156555,0.1627,0.129516
2,3.3011,2.817838,0.2844,0.279932,0.2844,0.258588


[I 2025-04-04 01:01:14,492] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0007918093079969369, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0875,3.576623,0.1284,0.115538,0.1284,0.090832
2,3.3867,2.942143,0.2477,0.247453,0.2477,0.219919
3,2.8834,2.524106,0.3314,0.335865,0.3314,0.312723
4,2.5212,2.292197,0.3862,0.384696,0.3862,0.366962
5,2.2428,2.100213,0.4297,0.428103,0.4297,0.409505
6,1.9973,2.022,0.4483,0.449613,0.4483,0.434035
7,1.7856,1.895646,0.4803,0.476239,0.4803,0.469236
8,1.5792,1.860871,0.4943,0.501279,0.4943,0.490259


[I 2025-04-04 01:12:59,701] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0005417179837593901, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9735,3.307565,0.1747,0.173551,0.1747,0.141703
2,3.1566,2.706718,0.2918,0.302184,0.2918,0.271166
3,2.6712,2.320581,0.3753,0.379423,0.3753,0.359699
4,2.333,2.132115,0.422,0.416832,0.422,0.40551


[I 2025-04-04 01:18:46,679] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0007766117718318264, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0899,3.597991,0.1172,0.111603,0.1172,0.083154
2,3.3979,2.923582,0.2503,0.244241,0.2503,0.225089
3,2.8901,2.513103,0.3342,0.335625,0.3342,0.316364
4,2.5298,2.277473,0.3876,0.391763,0.3876,0.371619
5,2.2393,2.093308,0.4249,0.423844,0.4249,0.409347
6,1.9868,2.022877,0.4504,0.444928,0.4504,0.435661
7,1.7741,1.892874,0.4842,0.481791,0.4842,0.474175
8,1.5733,1.857488,0.4986,0.509469,0.4986,0.496304


[I 2025-04-04 01:30:25,665] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0027693395374376512, 'weight_decay': 0.0, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2431,3.932607,0.0813,0.044346,0.0813,0.047685
2,3.8353,3.538507,0.1378,0.117744,0.1378,0.10752


[I 2025-04-04 01:33:18,926] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.000140707263625762, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.166,3.52085,0.1524,0.134287,0.1524,0.118113
2,3.434,2.954512,0.2584,0.251761,0.2584,0.23019
3,2.9343,2.546413,0.3374,0.338621,0.3374,0.31541
4,2.548,2.320972,0.3921,0.389724,0.3921,0.37548
5,2.2506,2.170393,0.4253,0.416417,0.4253,0.40564
6,2.0041,2.097179,0.4342,0.426085,0.4342,0.41825
7,1.7963,2.008955,0.4599,0.455502,0.4599,0.447917
8,1.6123,1.97622,0.466,0.468804,0.466,0.459986


[I 2025-04-04 01:44:46,036] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00021059103361382344, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0956,3.433292,0.1635,0.153093,0.1635,0.128774
2,3.2498,2.750121,0.2951,0.290723,0.2951,0.2705
3,2.7037,2.331492,0.3823,0.385327,0.3823,0.366136
4,2.3086,2.12335,0.4287,0.4263,0.4287,0.413255


[I 2025-04-04 01:50:34,414] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.000281049646423106, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0266,3.343548,0.1754,0.154512,0.1754,0.144044
2,3.1563,2.664926,0.3093,0.305894,0.3093,0.289255
3,2.6141,2.288336,0.3875,0.391178,0.3875,0.370958
4,2.2317,2.065791,0.4398,0.435362,0.4398,0.42438
5,1.9302,1.940308,0.4668,0.461834,0.4668,0.450675
6,1.6668,1.853488,0.4903,0.489544,0.4903,0.480123
7,1.4327,1.768653,0.5133,0.511306,0.5133,0.503538
8,1.2162,1.771078,0.5178,0.526137,0.5178,0.515907


[I 2025-04-04 02:02:06,352] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0009396466769414022, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.086,3.5805,0.1304,0.114281,0.1304,0.093158
2,3.4462,3.024645,0.2332,0.227355,0.2332,0.205518
3,2.9492,2.612463,0.3129,0.318686,0.3129,0.294607
4,2.6073,2.344911,0.3715,0.372488,0.3715,0.354308
5,2.3248,2.203698,0.4026,0.401756,0.4026,0.379697
6,2.0967,2.114463,0.4263,0.42868,0.4263,0.410187
7,1.8847,1.967835,0.4653,0.463875,0.4653,0.453743
8,1.6937,1.885524,0.4916,0.493782,0.4916,0.486053


[I 2025-04-04 02:13:50,789] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0003281096770779073, 'weight_decay': 0.007, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0235,3.319159,0.1832,0.16635,0.1832,0.148144
2,3.171,2.693461,0.3037,0.303715,0.3037,0.285096
3,2.6214,2.296022,0.3832,0.38703,0.3832,0.367979
4,2.2432,2.049135,0.4436,0.444285,0.4436,0.4299
5,1.9408,1.918711,0.4698,0.463103,0.4698,0.453391
6,1.6783,1.847368,0.4927,0.496585,0.4927,0.480437
7,1.4504,1.748939,0.5216,0.518949,0.5216,0.512997
8,1.2311,1.707075,0.5387,0.549729,0.5387,0.537581
9,1.0476,1.684475,0.5342,0.53834,0.5342,0.530947
10,0.8951,1.702986,0.5371,0.540814,0.5371,0.534564


[I 2025-04-04 02:28:20,697] Trial 30 finished with value: 0.5345639641299113 and parameters: {'learning_rate': 0.0003281096770779073, 'weight_decay': 0.007, 'warmup_steps': 22}. Best is trial 30 with value: 0.5345639641299113.


Trial 31 with params: {'learning_rate': 0.0003049847353808361, 'weight_decay': 0.008, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0632,3.410763,0.1667,0.161019,0.1667,0.132597
2,3.2197,2.721037,0.2951,0.301895,0.2951,0.274672
3,2.6688,2.305769,0.3837,0.387239,0.3837,0.366686
4,2.2745,2.094093,0.4343,0.443606,0.4343,0.42216
5,1.973,1.928692,0.475,0.469935,0.475,0.460372
6,1.7078,1.894972,0.4835,0.481731,0.4835,0.472614
7,1.4781,1.788039,0.5074,0.504863,0.5074,0.498248
8,1.2588,1.771623,0.5221,0.528725,0.5221,0.518335


[I 2025-04-04 02:39:56,172] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00044016339994501963, 'weight_decay': 0.007, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.987,3.291961,0.1812,0.175405,0.1812,0.15084
2,3.1521,2.713796,0.2901,0.295347,0.2901,0.268894
3,2.632,2.275568,0.3804,0.3869,0.3804,0.364899
4,2.269,2.059194,0.4449,0.450232,0.4449,0.433023
5,1.9692,1.925335,0.4721,0.473103,0.4721,0.456246
6,1.7103,1.859702,0.4924,0.495237,0.4924,0.480961
7,1.488,1.749889,0.5209,0.518702,0.5209,0.511804
8,1.2684,1.716306,0.5293,0.538717,0.5293,0.526183
9,1.0793,1.686171,0.5372,0.541961,0.5372,0.534215
10,0.9236,1.709307,0.5312,0.54138,0.5312,0.529743


[I 2025-04-04 02:54:16,002] Trial 32 finished with value: 0.5297431032818857 and parameters: {'learning_rate': 0.00044016339994501963, 'weight_decay': 0.007, 'warmup_steps': 22}. Best is trial 30 with value: 0.5345639641299113.


Trial 33 with params: {'learning_rate': 0.0005169451920249135, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.009,3.377619,0.1719,0.165172,0.1719,0.133624
2,3.1644,2.6719,0.3028,0.309967,0.3028,0.282275
3,2.6435,2.293117,0.3826,0.382962,0.3826,0.36677
4,2.2883,2.062771,0.4397,0.445915,0.4397,0.426422
5,2.0044,1.956997,0.4636,0.461919,0.4636,0.447226
6,1.7496,1.865038,0.4904,0.492818,0.4904,0.477964
7,1.526,1.745076,0.521,0.519995,0.521,0.512965
8,1.3139,1.708137,0.535,0.541279,0.535,0.531506
9,1.1295,1.67606,0.542,0.542003,0.542,0.536825
10,0.9696,1.683453,0.5413,0.54536,0.5413,0.538258


[I 2025-04-04 03:08:41,662] Trial 33 finished with value: 0.5382578746242731 and parameters: {'learning_rate': 0.0005169451920249135, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}. Best is trial 33 with value: 0.5382578746242731.


Trial 34 with params: {'learning_rate': 0.0004523559529385543, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0078,3.380189,0.1582,0.141148,0.1582,0.124908
2,3.1916,2.718793,0.2943,0.295345,0.2943,0.274616
3,2.666,2.316252,0.3743,0.384337,0.3743,0.35983
4,2.2956,2.096554,0.4295,0.43103,0.4295,0.415095
5,2.0031,1.93554,0.4713,0.467908,0.4713,0.45561
6,1.7427,1.871621,0.487,0.492601,0.487,0.476984
7,1.5153,1.747648,0.513,0.514149,0.513,0.50454
8,1.2957,1.713032,0.5284,0.537304,0.5284,0.52653
9,1.1091,1.677652,0.5366,0.536112,0.5366,0.532069
10,0.9513,1.68996,0.5368,0.538309,0.5368,0.534355


[I 2025-04-04 03:22:48,911] Trial 34 finished with value: 0.5343551523050474 and parameters: {'learning_rate': 0.0004523559529385543, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23}. Best is trial 33 with value: 0.5382578746242731.


Trial 35 with params: {'learning_rate': 0.0003371340229150094, 'weight_decay': 0.01, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0087,3.325574,0.1784,0.159765,0.1784,0.144262
2,3.1433,2.668566,0.3077,0.310772,0.3077,0.288301
3,2.6106,2.277571,0.3947,0.397544,0.3947,0.378101
4,2.2351,2.067954,0.4425,0.451629,0.4425,0.43024
5,1.9418,1.936556,0.4735,0.476048,0.4735,0.458412
6,1.6815,1.888724,0.4856,0.497105,0.4856,0.477367
7,1.448,1.755511,0.5142,0.512604,0.5142,0.505502
8,1.2282,1.751598,0.5179,0.526607,0.5179,0.515879


[I 2025-04-04 03:34:07,777] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0009005037557009593, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1009,3.593002,0.1259,0.135816,0.1259,0.092008
2,3.4229,2.986712,0.2321,0.241913,0.2321,0.206775
3,2.9542,2.588765,0.3124,0.31151,0.3124,0.290838
4,2.5961,2.334493,0.3741,0.377432,0.3741,0.356328


[I 2025-04-04 03:40:02,980] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0004956441011792705, 'weight_decay': 0.007, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0245,3.342936,0.1725,0.166595,0.1725,0.140406
2,3.1796,2.701841,0.2924,0.296424,0.2924,0.273722
3,2.6667,2.333184,0.3736,0.380914,0.3736,0.354902
4,2.2997,2.109985,0.4269,0.432183,0.4269,0.412417


[I 2025-04-04 03:45:45,341] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0001878302037836598, 'weight_decay': 0.007, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1107,3.410375,0.1727,0.152699,0.1727,0.136056
2,3.295,2.796293,0.2872,0.282111,0.2872,0.261349


[I 2025-04-04 03:48:39,171] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0005133869609118287, 'weight_decay': 0.01, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9972,3.372694,0.1642,0.157285,0.1642,0.132009
2,3.2088,2.740143,0.2908,0.291897,0.2908,0.268642
3,2.7172,2.376714,0.3644,0.373272,0.3644,0.348421
4,2.356,2.145539,0.4196,0.422475,0.4196,0.406443
5,2.069,2.010479,0.4479,0.448574,0.4479,0.432587
6,1.8043,1.924835,0.4731,0.471807,0.4731,0.459346
7,1.5809,1.784341,0.5058,0.502332,0.5058,0.496317
8,1.3645,1.745463,0.52,0.531395,0.52,0.517684


[I 2025-04-04 04:00:14,589] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.000293909076464439, 'weight_decay': 0.008, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.035,3.370224,0.1755,0.161859,0.1755,0.141808
2,3.1605,2.660793,0.306,0.304102,0.306,0.287627
3,2.6283,2.297847,0.386,0.385661,0.386,0.370261
4,2.2421,2.063211,0.4457,0.44809,0.4457,0.433272
5,1.9422,1.92099,0.478,0.475218,0.478,0.463023
6,1.6771,1.871366,0.4888,0.492133,0.4888,0.4786
7,1.4421,1.777115,0.5106,0.510545,0.5106,0.501935
8,1.2233,1.731472,0.5267,0.534845,0.5267,0.524067
9,1.0397,1.722198,0.5301,0.529797,0.5301,0.524638
10,0.8919,1.736712,0.5299,0.535687,0.5299,0.526729


[I 2025-04-04 04:14:56,119] Trial 40 finished with value: 0.5267291401184344 and parameters: {'learning_rate': 0.000293909076464439, 'weight_decay': 0.008, 'warmup_steps': 23}. Best is trial 33 with value: 0.5382578746242731.


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.4032,3.86751,0.1014,0.06789,0.1014,0.06191
2,3.8495,3.448321,0.1708,0.160219,0.1708,0.134756
3,3.4793,3.10663,0.2373,0.218825,0.2373,0.201664
4,3.2091,2.930471,0.2764,0.266993,0.2764,0.250969


[I 2025-04-04 04:20:40,278] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0005370141139358678, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0133,3.378395,0.1639,0.172032,0.1639,0.130567
2,3.2154,2.75857,0.283,0.280635,0.283,0.262281


[I 2025-04-04 04:23:33,709] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0002567101802635103, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0637,3.348941,0.1754,0.151664,0.1754,0.142501
2,3.2204,2.72009,0.3005,0.30038,0.3005,0.278361
3,2.6952,2.319156,0.3798,0.375019,0.3798,0.361801
4,2.2988,2.082379,0.4358,0.439654,0.4358,0.422955
5,1.9858,1.958839,0.4631,0.457676,0.4631,0.446969
6,1.7231,1.899767,0.4846,0.487007,0.4846,0.47275
7,1.4898,1.77281,0.5089,0.5087,0.5089,0.500447
8,1.2783,1.754205,0.5212,0.525925,0.5212,0.517786
9,1.0923,1.727353,0.527,0.52724,0.527,0.522992
10,0.9497,1.751665,0.5246,0.528914,0.5246,0.521213


[I 2025-04-04 04:38:02,444] Trial 43 finished with value: 0.5212130025698173 and parameters: {'learning_rate': 0.0002567101802635103, 'weight_decay': 0.007, 'warmup_steps': 23}. Best is trial 33 with value: 0.5382578746242731.


Trial 44 with params: {'learning_rate': 0.00039043649303556124, 'weight_decay': 0.008, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0176,3.387897,0.1682,0.155927,0.1682,0.134151
2,3.1624,2.681143,0.2994,0.296355,0.2994,0.280299
3,2.6398,2.324214,0.3765,0.381139,0.3765,0.35983
4,2.2709,2.064607,0.4415,0.441851,0.4415,0.427389
5,1.9731,1.927654,0.4718,0.467173,0.4718,0.454939
6,1.7091,1.877386,0.4891,0.495571,0.4891,0.479949
7,1.473,1.758125,0.5135,0.513375,0.5135,0.505835
8,1.2505,1.729342,0.526,0.534207,0.526,0.523758


[I 2025-04-04 04:49:29,906] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.00017909540134884036, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1149,3.486187,0.1597,0.148959,0.1597,0.12521
2,3.2996,2.803726,0.2892,0.278953,0.2892,0.261673
3,2.7672,2.398751,0.3656,0.363042,0.3656,0.348676
4,2.3813,2.196003,0.4123,0.413689,0.4123,0.396178
5,2.0851,2.021308,0.4497,0.434812,0.4497,0.430755
6,1.8247,1.984484,0.4652,0.462936,0.4652,0.452172
7,1.603,1.871417,0.4897,0.486986,0.4897,0.481438
8,1.3987,1.880812,0.4919,0.499138,0.4919,0.488194


[I 2025-04-04 05:00:50,808] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00036047754992388323, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0177,3.348714,0.1705,0.161407,0.1705,0.136958
2,3.1604,2.69306,0.299,0.304676,0.299,0.279974
3,2.6152,2.274303,0.3904,0.394837,0.3904,0.373611
4,2.247,2.056409,0.4424,0.451766,0.4424,0.42781
5,1.9521,1.929902,0.4746,0.479273,0.4746,0.459859
6,1.6895,1.857055,0.49,0.490217,0.49,0.479554
7,1.4613,1.761066,0.5219,0.522959,0.5219,0.513369
8,1.242,1.720171,0.5359,0.545612,0.5359,0.535697
9,1.0505,1.6947,0.5415,0.54362,0.5415,0.538288
10,0.8982,1.733478,0.5334,0.539248,0.5334,0.530454


[I 2025-04-04 05:15:12,757] Trial 46 finished with value: 0.5304539033996721 and parameters: {'learning_rate': 0.00036047754992388323, 'weight_decay': 0.005, 'warmup_steps': 30}. Best is trial 33 with value: 0.5382578746242731.


Trial 47 with params: {'learning_rate': 0.00040948554031791873, 'weight_decay': 0.004, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0359,3.371776,0.1741,0.173863,0.1741,0.140819
2,3.1983,2.709494,0.2967,0.297703,0.2967,0.275798
3,2.6571,2.304059,0.3761,0.3833,0.3761,0.359898
4,2.2981,2.109107,0.428,0.427592,0.428,0.413231
5,2.0036,1.948426,0.4622,0.457113,0.4622,0.445379
6,1.7385,1.880631,0.4824,0.484509,0.4824,0.47167
7,1.5116,1.752165,0.5157,0.51679,0.5157,0.508321
8,1.2903,1.717431,0.5274,0.535248,0.5274,0.525445
9,1.1004,1.688556,0.5346,0.533062,0.5346,0.530347
10,0.9453,1.696272,0.5326,0.534491,0.5326,0.529395


[I 2025-04-04 05:29:34,056] Trial 47 finished with value: 0.5293949696368664 and parameters: {'learning_rate': 0.00040948554031791873, 'weight_decay': 0.004, 'warmup_steps': 28}. Best is trial 33 with value: 0.5382578746242731.


Trial 48 with params: {'learning_rate': 0.00018259032952353866, 'weight_decay': 0.004, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1398,3.51585,0.1487,0.130478,0.1487,0.114696
2,3.3346,2.804631,0.2846,0.281874,0.2846,0.257856
3,2.7792,2.415549,0.3621,0.361773,0.3621,0.34329
4,2.3922,2.202123,0.4123,0.407907,0.4123,0.395478
5,2.0923,2.048663,0.4487,0.442341,0.4487,0.432694
6,1.8356,1.993629,0.4629,0.457765,0.4629,0.448849
7,1.6175,1.894258,0.4878,0.48564,0.4878,0.478872
8,1.4174,1.88694,0.4919,0.499588,0.4919,0.489034


[I 2025-04-04 05:40:56,463] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0019833740182084707, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1757,3.828696,0.0944,0.06794,0.0944,0.059362
2,3.7269,3.428806,0.1591,0.161202,0.1591,0.130569


[I 2025-04-04 05:43:48,066] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0007211421872138092, 'weight_decay': 0.008, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9972,3.38785,0.1547,0.131946,0.1547,0.113472
2,3.2551,2.839245,0.2644,0.256278,0.2644,0.239813


[I 2025-04-04 05:46:39,847] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00039102569085393334, 'weight_decay': 0.004, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.028,3.425843,0.159,0.132196,0.159,0.1263
2,3.2013,2.749826,0.2845,0.284514,0.2845,0.263443
3,2.6643,2.298694,0.3817,0.382272,0.3817,0.364892
4,2.2978,2.084096,0.4332,0.435656,0.4332,0.420177
5,1.997,1.957156,0.4593,0.457705,0.4593,0.44254
6,1.7385,1.890615,0.4808,0.486062,0.4808,0.469697
7,1.5077,1.765357,0.5114,0.509619,0.5114,0.502361
8,1.291,1.752228,0.5218,0.535725,0.5218,0.52181
9,1.1058,1.701463,0.533,0.531515,0.533,0.52751
10,0.9525,1.730079,0.5261,0.530716,0.5261,0.523759


[I 2025-04-04 06:01:05,359] Trial 51 finished with value: 0.5237592997551667 and parameters: {'learning_rate': 0.00039102569085393334, 'weight_decay': 0.004, 'warmup_steps': 27}. Best is trial 33 with value: 0.5382578746242731.


Trial 52 with params: {'learning_rate': 0.0003610132134443412, 'weight_decay': 0.005, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0136,3.373919,0.1608,0.159062,0.1608,0.126723
2,3.189,2.727154,0.2969,0.296547,0.2969,0.277454
3,2.6485,2.30193,0.3825,0.385676,0.3825,0.367057
4,2.2629,2.069276,0.4349,0.440254,0.4349,0.422661


[I 2025-04-04 06:06:48,773] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.000758207971407956, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0924,3.606327,0.1262,0.097791,0.1262,0.091677
2,3.4094,3.012765,0.2268,0.22501,0.2268,0.201591
3,2.9389,2.549779,0.3204,0.316946,0.3204,0.299534
4,2.5787,2.368121,0.3647,0.367373,0.3647,0.347748


[I 2025-04-04 06:12:36,292] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0236,3.382737,0.1667,0.152678,0.1667,0.131715
2,3.1381,2.642907,0.3121,0.317747,0.3121,0.29557
3,2.5866,2.233239,0.3956,0.403545,0.3956,0.381703
4,2.2086,2.023836,0.4461,0.451308,0.4461,0.435755
5,1.9139,1.902153,0.4808,0.478655,0.4808,0.465101
6,1.6634,1.840072,0.4933,0.499335,0.4933,0.482905
7,1.4373,1.723594,0.5203,0.517793,0.5203,0.512805
8,1.2175,1.699761,0.5337,0.543596,0.5337,0.532115
9,1.027,1.671752,0.5437,0.543311,0.5437,0.53985
10,0.8715,1.697416,0.539,0.544239,0.539,0.537335


[I 2025-04-04 06:26:55,162] Trial 54 finished with value: 0.53733545555154 and parameters: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 33 with value: 0.5382578746242731.


Trial 55 with params: {'learning_rate': 0.0003644598615777491, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0323,3.359752,0.1669,0.162547,0.1669,0.132387
2,3.1935,2.69637,0.297,0.297213,0.297,0.277713
3,2.6437,2.30577,0.3829,0.393499,0.3829,0.366408
4,2.2556,2.044224,0.4453,0.445286,0.4453,0.43242
5,1.9523,1.921371,0.4767,0.473913,0.4767,0.461625
6,1.6984,1.842948,0.4951,0.496679,0.4951,0.485969
7,1.4706,1.73855,0.5206,0.51734,0.5206,0.511713
8,1.2516,1.720962,0.5294,0.536086,0.5294,0.526408
9,1.0663,1.675933,0.5403,0.537384,0.5403,0.534368
10,0.9151,1.699202,0.5391,0.545361,0.5391,0.537109


[I 2025-04-04 06:41:21,782] Trial 55 finished with value: 0.5371087642918608 and parameters: {'learning_rate': 0.0003644598615777491, 'weight_decay': 0.001, 'warmup_steps': 23}. Best is trial 33 with value: 0.5382578746242731.


Trial 56 with params: {'learning_rate': 0.0006720275707572625, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0672,3.471627,0.1455,0.142352,0.1455,0.108911
2,3.3229,2.836525,0.2686,0.272083,0.2686,0.246477


[I 2025-04-04 06:44:18,388] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00019595270715851178, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1267,3.503455,0.1495,0.137147,0.1495,0.118424
2,3.3069,2.798786,0.2898,0.294271,0.2898,0.266419
3,2.7687,2.403084,0.3677,0.365719,0.3677,0.3487
4,2.3754,2.163947,0.4191,0.419368,0.4191,0.404411
5,2.0714,2.027099,0.4487,0.44154,0.4487,0.431424
6,1.8051,1.955083,0.4733,0.471197,0.4733,0.461541
7,1.5737,1.875153,0.4889,0.489234,0.4889,0.480568
8,1.3689,1.84322,0.5019,0.510039,0.5019,0.499374


[I 2025-04-04 06:55:45,491] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0003871300548029472, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0168,3.356032,0.171,0.171,0.171,0.137334
2,3.1738,2.709019,0.2943,0.29632,0.2943,0.274139
3,2.635,2.263179,0.388,0.391655,0.388,0.370121
4,2.249,2.06083,0.4395,0.438492,0.4395,0.423785
5,1.9507,1.916282,0.4739,0.474605,0.4739,0.458431
6,1.6907,1.848049,0.4969,0.504382,0.4969,0.487228
7,1.4637,1.732108,0.5218,0.519481,0.5218,0.513012
8,1.2438,1.708569,0.531,0.539098,0.531,0.529444
9,1.057,1.683825,0.5359,0.534709,0.5359,0.530667
10,0.8996,1.706031,0.5363,0.540049,0.5363,0.533194


[I 2025-04-04 07:10:06,271] Trial 58 finished with value: 0.5331937407362749 and parameters: {'learning_rate': 0.0003871300548029472, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 33 with value: 0.5382578746242731.


Trial 59 with params: {'learning_rate': 0.0002471824952041614, 'weight_decay': 0.001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0633,3.354403,0.1754,0.16312,0.1754,0.13948
2,3.1766,2.68434,0.3075,0.300252,0.3075,0.287589
3,2.6299,2.314373,0.3804,0.389692,0.3804,0.362264
4,2.2419,2.070317,0.4362,0.437313,0.4362,0.423058


[I 2025-04-04 07:15:53,543] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0002727548931746249, 'weight_decay': 0.001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0259,3.346786,0.1731,0.177587,0.1731,0.138633
2,3.1644,2.684291,0.3093,0.312243,0.3093,0.288519
3,2.6183,2.277107,0.3935,0.391296,0.3935,0.376314
4,2.2361,2.049547,0.4484,0.451505,0.4484,0.435863
5,1.9299,1.928113,0.4741,0.468423,0.4741,0.458053
6,1.6692,1.868804,0.4915,0.493007,0.4915,0.480794
7,1.4341,1.786233,0.5104,0.511674,0.5104,0.502868
8,1.2178,1.780813,0.5177,0.527063,0.5177,0.515091


[I 2025-04-04 07:27:26,039] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0003671243659843697, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0254,3.34604,0.1658,0.158709,0.1658,0.130392
2,3.1634,2.65952,0.3021,0.302322,0.3021,0.281205
3,2.6393,2.301445,0.3795,0.382546,0.3795,0.362616
4,2.2655,2.08621,0.4319,0.438677,0.4319,0.41895


[I 2025-04-04 07:33:13,964] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0005891875245600678, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0321,3.437713,0.1538,0.139413,0.1538,0.116874
2,3.3037,2.842641,0.2648,0.26135,0.2648,0.243086
3,2.7907,2.434716,0.349,0.352563,0.349,0.332633
4,2.4206,2.198372,0.4015,0.398223,0.4015,0.38392
5,2.1229,2.026779,0.4421,0.437744,0.4421,0.423254
6,1.8631,1.914477,0.4732,0.474632,0.4732,0.460354
7,1.6452,1.815347,0.5034,0.501616,0.5034,0.492992
8,1.4353,1.750995,0.5185,0.52352,0.5185,0.51471


[I 2025-04-04 07:44:48,145] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0004915154105661953, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9981,3.358736,0.172,0.155702,0.172,0.140132
2,3.1865,2.734582,0.2857,0.282727,0.2857,0.263104


[I 2025-04-04 07:47:39,203] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0008404644665706116, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1203,3.619814,0.1139,0.100043,0.1139,0.081206
2,3.4567,3.004842,0.2274,0.218662,0.2274,0.200289
3,2.9607,2.596965,0.3095,0.309939,0.3095,0.288808
4,2.616,2.344864,0.3803,0.390004,0.3803,0.364043


[I 2025-04-04 07:53:27,107] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0008406294139913865, 'weight_decay': 0.003, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0754,3.511653,0.1336,0.118314,0.1336,0.099369
2,3.3375,2.917971,0.2494,0.255421,0.2494,0.224548


[I 2025-04-04 07:56:21,643] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00029841098830451894, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0191,3.288209,0.1837,0.156974,0.1837,0.146594
2,3.193,2.721153,0.2931,0.294239,0.2931,0.272376
3,2.6681,2.338634,0.3786,0.376467,0.3786,0.359616
4,2.2903,2.091597,0.4344,0.431166,0.4344,0.420827
5,1.9854,1.952904,0.469,0.464673,0.469,0.454584
6,1.7196,1.926421,0.4738,0.481067,0.4738,0.463609
7,1.4878,1.789031,0.5073,0.50657,0.5073,0.498917
8,1.2651,1.769378,0.5192,0.530174,0.5192,0.518492
9,1.082,1.744968,0.5249,0.524959,0.5249,0.520395
10,0.9309,1.769325,0.5207,0.527213,0.5207,0.518537


[I 2025-04-04 08:10:48,549] Trial 66 finished with value: 0.518537059234903 and parameters: {'learning_rate': 0.00029841098830451894, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 33 with value: 0.5382578746242731.


Trial 67 with params: {'learning_rate': 0.0008547824349901803, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0935,3.579277,0.1307,0.110683,0.1307,0.094609
2,3.4458,3.014219,0.2255,0.229611,0.2255,0.195464


[I 2025-04-04 08:13:44,472] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.00026885239623501174, 'weight_decay': 0.003, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0428,3.365268,0.1774,0.163437,0.1774,0.143141
2,3.2028,2.691079,0.307,0.308505,0.307,0.28569
3,2.6401,2.282687,0.3869,0.392525,0.3869,0.370135
4,2.2476,2.074661,0.4378,0.436918,0.4378,0.423549
5,1.9465,1.945043,0.4701,0.465403,0.4701,0.453442
6,1.6816,1.861655,0.4902,0.489809,0.4902,0.480742
7,1.4466,1.776543,0.512,0.511359,0.512,0.50341
8,1.2279,1.754795,0.5187,0.520707,0.5187,0.51447


[I 2025-04-04 08:25:20,057] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0002337869674467017, 'weight_decay': 0.006, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0759,3.414841,0.1618,0.160277,0.1618,0.127828
2,3.2249,2.740292,0.2941,0.293991,0.2941,0.271292
3,2.6879,2.372019,0.3691,0.37504,0.3691,0.352983
4,2.2986,2.115814,0.4291,0.427036,0.4291,0.413716
5,1.9982,1.962392,0.468,0.462076,0.468,0.451952
6,1.7395,1.912559,0.4774,0.479585,0.4774,0.466811
7,1.5129,1.827913,0.4994,0.49787,0.4994,0.49045
8,1.2952,1.791998,0.5106,0.514254,0.5106,0.506728


[I 2025-04-04 08:36:47,925] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0032109758631513803, 'weight_decay': 0.004, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3016,4.090867,0.0621,0.03557,0.0621,0.028831
2,3.9492,3.690312,0.1132,0.111114,0.1132,0.082375
3,3.6399,3.351461,0.1644,0.151701,0.1644,0.136661
4,3.298,3.042822,0.2274,0.214346,0.2274,0.197586


[I 2025-04-04 08:42:38,844] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.00038252724650583025, 'weight_decay': 0.001, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0208,3.274461,0.1821,0.188328,0.1821,0.150644
2,3.1182,2.651101,0.3106,0.311396,0.3106,0.291887
3,2.5973,2.242654,0.3939,0.404701,0.3939,0.381322
4,2.2167,2.029828,0.4488,0.456953,0.4488,0.436358
5,1.9173,1.864082,0.4876,0.483858,0.4876,0.472673
6,1.6588,1.831017,0.5013,0.507689,0.5013,0.491675
7,1.4328,1.732812,0.5205,0.521096,0.5205,0.513351
8,1.2162,1.692528,0.5386,0.5463,0.5386,0.536372
9,1.0265,1.667748,0.5447,0.547395,0.5447,0.540561
10,0.8747,1.679097,0.5432,0.546654,0.5432,0.540541


[I 2025-04-04 08:57:02,627] Trial 71 finished with value: 0.5405412839264137 and parameters: {'learning_rate': 0.00038252724650583025, 'weight_decay': 0.001, 'warmup_steps': 25}. Best is trial 71 with value: 0.5405412839264137.


Trial 72 with params: {'learning_rate': 0.00038510056326945466, 'weight_decay': 0.002, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0413,3.401419,0.1641,0.142155,0.1641,0.131013
2,3.1813,2.692895,0.3029,0.304787,0.3029,0.285239
3,2.6358,2.288967,0.3872,0.397076,0.3872,0.371899
4,2.2639,2.057875,0.4376,0.447135,0.4376,0.424348
5,1.9667,1.935147,0.4677,0.465583,0.4677,0.453342
6,1.707,1.847164,0.4892,0.489804,0.4892,0.478102
7,1.474,1.732309,0.5196,0.519226,0.5196,0.511845
8,1.258,1.723202,0.5261,0.533889,0.5261,0.523357
9,1.0668,1.67961,0.5399,0.54123,0.5399,0.536195
10,0.9141,1.688663,0.5375,0.543665,0.5375,0.536335


[I 2025-04-04 09:11:28,854] Trial 72 finished with value: 0.5363353009529392 and parameters: {'learning_rate': 0.00038510056326945466, 'weight_decay': 0.002, 'warmup_steps': 24}. Best is trial 71 with value: 0.5405412839264137.


Trial 73 with params: {'learning_rate': 0.0003114452353320097, 'weight_decay': 0.002, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0176,3.320877,0.1807,0.170879,0.1807,0.146113
2,3.1897,2.687501,0.2997,0.294163,0.2997,0.279845
3,2.6476,2.321222,0.3778,0.391893,0.3778,0.365342
4,2.2668,2.095282,0.4383,0.442927,0.4383,0.425614
5,1.9613,1.97161,0.4638,0.466499,0.4638,0.449496
6,1.6974,1.869927,0.4901,0.491623,0.4901,0.479764
7,1.4625,1.775869,0.5135,0.512949,0.5135,0.506272
8,1.2389,1.766205,0.5214,0.530711,0.5214,0.519563


[I 2025-04-04 09:22:53,600] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0005756691363929843, 'weight_decay': 0.002, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9771,3.336965,0.1689,0.144189,0.1689,0.13215
2,3.2002,2.759229,0.2841,0.287129,0.2841,0.261911
3,2.7159,2.365518,0.3605,0.364283,0.3605,0.343399
4,2.3569,2.137701,0.4187,0.426174,0.4187,0.405514
5,2.0764,1.997076,0.4566,0.452095,0.4566,0.439828
6,1.8184,1.919618,0.4729,0.474663,0.4729,0.462091
7,1.6004,1.779958,0.5093,0.507836,0.5093,0.499803
8,1.3873,1.754239,0.5203,0.531508,0.5203,0.518676
9,1.1984,1.705724,0.5332,0.531755,0.5332,0.526889
10,1.0396,1.698924,0.5335,0.535132,0.5335,0.530519


[I 2025-04-04 09:37:14,025] Trial 74 finished with value: 0.5305188040574309 and parameters: {'learning_rate': 0.0005756691363929843, 'weight_decay': 0.002, 'warmup_steps': 26}. Best is trial 71 with value: 0.5405412839264137.


Trial 75 with params: {'learning_rate': 0.0002030948773864398, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0882,3.489107,0.1508,0.136231,0.1508,0.118471
2,3.3096,2.814008,0.2862,0.286059,0.2862,0.262162


[I 2025-04-04 09:40:10,138] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.004262413606266819, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3243,4.056489,0.0576,0.020697,0.0576,0.023975
2,3.9788,3.731885,0.1114,0.082521,0.1114,0.076642
3,3.6763,3.359297,0.1798,0.176652,0.1798,0.144453
4,3.2914,2.990678,0.2468,0.244136,0.2468,0.218654


[I 2025-04-04 09:45:56,101] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00040886405360140944, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0142,3.348605,0.1663,0.145697,0.1663,0.129303
2,3.1683,2.716443,0.2903,0.287985,0.2903,0.270748


[I 2025-04-04 09:48:46,139] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 6.94436628847983e-05, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3654,3.764683,0.1182,0.08287,0.1182,0.076776
2,3.7703,3.344794,0.1937,0.175328,0.1937,0.157655


[I 2025-04-04 09:51:43,097] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0005743704713627893, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.018,3.442319,0.151,0.144825,0.151,0.117153
2,3.2695,2.836875,0.2618,0.262056,0.2618,0.240036
3,2.7754,2.436272,0.347,0.353908,0.347,0.330721
4,2.4273,2.196283,0.4048,0.407813,0.4048,0.389084
5,2.1527,2.046499,0.4405,0.435941,0.4405,0.424794
6,1.9042,1.96192,0.4621,0.460882,0.4621,0.449444
7,1.6846,1.849581,0.4917,0.49104,0.4917,0.482942
8,1.4766,1.799128,0.5084,0.516312,0.5084,0.505282


[I 2025-04-04 10:03:16,861] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.004093359216700726, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2996,4.047016,0.0648,0.030888,0.0648,0.031482
2,3.9431,3.709224,0.1152,0.094459,0.1152,0.08472


[I 2025-04-04 10:06:08,076] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0006506255606680077, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0031,3.412939,0.1561,0.1451,0.1561,0.124978
2,3.2532,2.814126,0.2679,0.282642,0.2679,0.246474
3,2.7617,2.403911,0.3498,0.357295,0.3498,0.332148
4,2.4006,2.15401,0.4149,0.41864,0.4149,0.398988
5,2.1112,2.006095,0.4492,0.445119,0.4492,0.431043
6,1.8545,1.919339,0.4752,0.47814,0.4752,0.463127
7,1.6382,1.807883,0.5016,0.499451,0.5016,0.491046
8,1.4261,1.742369,0.5228,0.529581,0.5228,0.518572
9,1.243,1.701522,0.5331,0.533352,0.5331,0.527913
10,1.0845,1.704616,0.5347,0.53877,0.5347,0.531561


[I 2025-04-04 10:20:36,550] Trial 81 finished with value: 0.5315612027431542 and parameters: {'learning_rate': 0.0006506255606680077, 'weight_decay': 0.01, 'warmup_steps': 20}. Best is trial 71 with value: 0.5405412839264137.


Trial 82 with params: {'learning_rate': 0.0026245310374742674, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2549,3.978134,0.0667,0.048169,0.0667,0.037741
2,3.8626,3.631358,0.1231,0.100514,0.1231,0.097454


[I 2025-04-04 10:23:23,272] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0005243691938836359, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9953,3.3051,0.1798,0.164937,0.1798,0.147304
2,3.1451,2.694155,0.2944,0.292724,0.2944,0.272566
3,2.638,2.311352,0.3783,0.385476,0.3783,0.362576
4,2.2775,2.061078,0.4415,0.445322,0.4415,0.427277
5,1.9931,1.940941,0.4705,0.468777,0.4705,0.455155
6,1.7371,1.89273,0.4833,0.495007,0.4833,0.473025
7,1.5151,1.74056,0.5161,0.517459,0.5161,0.508526
8,1.2992,1.705676,0.5316,0.540203,0.5316,0.529815
9,1.1069,1.677964,0.5386,0.538076,0.5386,0.533189
10,0.9499,1.691091,0.5386,0.544408,0.5386,0.536286


[I 2025-04-04 10:37:46,938] Trial 83 finished with value: 0.5362858142810055 and parameters: {'learning_rate': 0.0005243691938836359, 'weight_decay': 0.01, 'warmup_steps': 20}. Best is trial 71 with value: 0.5405412839264137.


Trial 84 with params: {'learning_rate': 0.0005400929196081989, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0345,3.378384,0.1636,0.173258,0.1636,0.130261
2,3.2394,2.791461,0.2789,0.287193,0.2789,0.257304


[I 2025-04-04 10:40:42,217] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.000684500551460477, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0395,3.437392,0.155,0.142187,0.155,0.116553
2,3.3001,2.83526,0.2705,0.275604,0.2705,0.248864
3,2.773,2.416352,0.3578,0.369587,0.3578,0.342589
4,2.4155,2.180837,0.4086,0.409076,0.4086,0.39234


[I 2025-04-04 10:46:21,405] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.00032088974680676946, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0411,3.349575,0.173,0.167031,0.173,0.136772
2,3.1583,2.63698,0.3151,0.311721,0.3151,0.29649
3,2.6205,2.286795,0.3892,0.393568,0.3892,0.372988
4,2.2473,2.045912,0.441,0.445463,0.441,0.428341
5,1.9382,1.931366,0.473,0.472207,0.473,0.457566
6,1.6734,1.853151,0.4969,0.497762,0.4969,0.48635
7,1.4461,1.752422,0.5162,0.51511,0.5162,0.507777
8,1.2238,1.732196,0.5299,0.537479,0.5299,0.527584
9,1.0354,1.709603,0.5348,0.534121,0.5348,0.529109
10,0.886,1.709715,0.5378,0.537377,0.5378,0.53409


[I 2025-04-04 11:00:37,246] Trial 86 finished with value: 0.534090037413412 and parameters: {'learning_rate': 0.00032088974680676946, 'weight_decay': 0.001, 'warmup_steps': 23}. Best is trial 71 with value: 0.5405412839264137.


Trial 87 with params: {'learning_rate': 0.0001950686068582451, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1074,3.447351,0.1631,0.145583,0.1631,0.129618
2,3.2994,2.824234,0.2818,0.280055,0.2818,0.255053
3,2.7554,2.393996,0.3685,0.366175,0.3685,0.348822
4,2.3649,2.150172,0.423,0.420451,0.423,0.407018
5,2.0567,2.014029,0.4577,0.449793,0.4577,0.44083
6,1.799,1.952819,0.4721,0.470535,0.4721,0.459885
7,1.5742,1.878401,0.4881,0.48615,0.4881,0.479027
8,1.367,1.838089,0.5014,0.504358,0.5014,0.496611


[I 2025-04-04 11:12:02,494] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0002981832597572342, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0214,3.338796,0.1807,0.170133,0.1807,0.143859
2,3.178,2.692646,0.3023,0.303698,0.3023,0.284237
3,2.6106,2.268546,0.3921,0.39692,0.3921,0.3758
4,2.2148,2.044009,0.4461,0.444872,0.4461,0.433429
5,1.9141,1.899584,0.4774,0.474084,0.4774,0.462118
6,1.6495,1.846746,0.4929,0.489257,0.4929,0.480471
7,1.4222,1.750318,0.524,0.523021,0.524,0.516638
8,1.2033,1.728824,0.5306,0.539592,0.5306,0.529955
9,1.0189,1.69771,0.5356,0.53951,0.5356,0.533291
10,0.8698,1.715534,0.5361,0.53981,0.5361,0.534245


[I 2025-04-04 11:26:17,529] Trial 88 finished with value: 0.5342445413747026 and parameters: {'learning_rate': 0.0002981832597572342, 'weight_decay': 0.0, 'warmup_steps': 29}. Best is trial 71 with value: 0.5405412839264137.


Trial 89 with params: {'learning_rate': 0.0004596038310422563, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0596,3.419873,0.1534,0.144827,0.1534,0.116191
2,3.2539,2.834457,0.2748,0.274118,0.2748,0.252012


[I 2025-04-04 11:29:06,752] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 6.376083832519333e-05, 'weight_decay': 0.001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.3899,3.80853,0.1128,0.077916,0.1128,0.072726
2,3.8277,3.433274,0.174,0.173474,0.174,0.138679


[I 2025-04-04 11:31:58,828] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0001498170836224393, 'weight_decay': 0.0, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1677,3.509914,0.1494,0.139693,0.1494,0.115413
2,3.3955,2.919218,0.2681,0.252453,0.2681,0.239538
3,2.8958,2.527003,0.3379,0.337313,0.3379,0.319388
4,2.5228,2.31408,0.3876,0.379651,0.3876,0.370886


[I 2025-04-04 11:37:51,101] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.00029436715136577405, 'weight_decay': 0.0, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.06,3.382299,0.1655,0.162192,0.1655,0.133001
2,3.1737,2.673704,0.3018,0.292018,0.3018,0.275728
3,2.6323,2.298515,0.3817,0.381059,0.3817,0.363583
4,2.2616,2.075522,0.4341,0.439077,0.4341,0.421558


[I 2025-04-04 11:43:38,680] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00040522849536195804, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9927,3.295926,0.1791,0.161791,0.1791,0.143159
2,3.1264,2.692923,0.297,0.300458,0.297,0.279166
3,2.6064,2.258422,0.3892,0.390014,0.3892,0.372669
4,2.2358,2.059443,0.4402,0.442059,0.4402,0.425298
5,1.9457,1.941829,0.4674,0.466677,0.4674,0.450881
6,1.6887,1.847095,0.4936,0.496527,0.4936,0.481962
7,1.464,1.736323,0.5174,0.514988,0.5174,0.507972
8,1.2446,1.709097,0.533,0.53856,0.533,0.529417
9,1.0567,1.675565,0.5395,0.538524,0.5395,0.534807
10,0.8954,1.701515,0.5357,0.5396,0.5357,0.532499


[I 2025-04-04 11:58:05,921] Trial 93 finished with value: 0.5324992094578337 and parameters: {'learning_rate': 0.00040522849536195804, 'weight_decay': 0.002, 'warmup_steps': 20}. Best is trial 71 with value: 0.5405412839264137.


Trial 94 with params: {'learning_rate': 0.0002582478936204756, 'weight_decay': 0.01, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0457,3.377079,0.173,0.169544,0.173,0.137466
2,3.179,2.667471,0.3104,0.309,0.3104,0.291971
3,2.6166,2.277431,0.3913,0.396468,0.3913,0.375624
4,2.2348,2.054333,0.4463,0.444717,0.4463,0.43284
5,1.9393,1.93065,0.4761,0.474457,0.4761,0.460981
6,1.6729,1.878385,0.4867,0.492729,0.4867,0.477285
7,1.439,1.772561,0.5185,0.519027,0.5185,0.511115
8,1.2264,1.773673,0.5207,0.531817,0.5207,0.519729


[I 2025-04-04 12:10:04,118] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.000420180969579519, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9684,3.266371,0.1854,0.173752,0.1854,0.150979
2,3.122,2.642025,0.3106,0.314525,0.3106,0.293242
3,2.6028,2.248164,0.3939,0.400357,0.3939,0.378221
4,2.2218,2.047532,0.4398,0.446561,0.4398,0.426856
5,1.9222,1.903944,0.4778,0.477083,0.4778,0.462293
6,1.6741,1.827054,0.4997,0.50576,0.4997,0.490264
7,1.4381,1.725484,0.5241,0.525248,0.5241,0.51677
8,1.2145,1.7208,0.5283,0.536902,0.5283,0.525932
9,1.0205,1.689928,0.5397,0.541785,0.5397,0.536184
10,0.8598,1.688877,0.5392,0.543967,0.5392,0.537569


[I 2025-04-04 12:24:36,045] Trial 95 finished with value: 0.5375686622384629 and parameters: {'learning_rate': 0.000420180969579519, 'weight_decay': 0.01, 'warmup_steps': 15}. Best is trial 71 with value: 0.5405412839264137.


Trial 96 with params: {'learning_rate': 0.0003415379603895951, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9717,3.246229,0.1896,0.173269,0.1896,0.154961
2,3.1061,2.623438,0.3144,0.31916,0.3144,0.294949
3,2.5846,2.256544,0.3949,0.395524,0.3949,0.377633
4,2.2219,2.051358,0.4385,0.438624,0.4385,0.424082


[I 2025-04-04 12:30:16,523] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0005096700138491908, 'weight_decay': 0.01, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0713,3.432254,0.1579,0.135654,0.1579,0.121877
2,3.2814,2.768844,0.2819,0.276014,0.2819,0.255878
3,2.7426,2.362844,0.3657,0.371896,0.3657,0.35048
4,2.3631,2.131323,0.4189,0.418291,0.4189,0.404526
5,2.0711,1.971702,0.4561,0.452951,0.4561,0.440942
6,1.8186,1.908899,0.4782,0.479788,0.4782,0.465653
7,1.597,1.773218,0.5134,0.509399,0.5134,0.504591
8,1.3858,1.739341,0.519,0.526081,0.519,0.51567


[I 2025-04-04 12:41:37,496] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0006795372675171427, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.042,3.441452,0.1531,0.142552,0.1531,0.117695
2,3.3234,2.874326,0.2641,0.263545,0.2641,0.239481
3,2.8303,2.465951,0.3416,0.347534,0.3416,0.323779
4,2.4743,2.23045,0.4043,0.405475,0.4043,0.386789


[I 2025-04-04 12:47:23,317] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00018885836123781226, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1039,3.444294,0.1662,0.149238,0.1662,0.129248
2,3.2867,2.810511,0.2903,0.288318,0.2903,0.263482


[I 2025-04-04 12:50:12,071] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00024964437525612824, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0744,3.424875,0.1638,0.146847,0.1638,0.124831
2,3.2023,2.6934,0.3023,0.309675,0.3023,0.280852
3,2.6337,2.281998,0.3853,0.386059,0.3853,0.370273
4,2.2446,2.059958,0.4413,0.440613,0.4413,0.427994
5,1.9488,1.949072,0.4637,0.458141,0.4637,0.448189
6,1.6871,1.88253,0.4831,0.485651,0.4831,0.472672
7,1.4543,1.779749,0.5094,0.508461,0.5094,0.501558
8,1.2384,1.753759,0.5209,0.525959,0.5209,0.51826


[I 2025-04-04 13:01:52,016] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00020682669930605755, 'weight_decay': 0.002, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0753,3.391057,0.1705,0.15565,0.1705,0.134373
2,3.2884,2.788453,0.2867,0.28674,0.2867,0.263544
3,2.7466,2.389808,0.3668,0.366267,0.3668,0.347483
4,2.3563,2.153994,0.4213,0.419705,0.4213,0.405728
5,2.0533,2.018591,0.4541,0.448274,0.4541,0.43886
6,1.788,1.957043,0.4673,0.466047,0.4673,0.455407
7,1.5562,1.868852,0.4883,0.486712,0.4883,0.479513
8,1.3449,1.826591,0.502,0.506731,0.502,0.498475


[I 2025-04-04 13:13:15,940] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.000389011051242561, 'weight_decay': 0.002, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0023,3.343554,0.1782,0.173713,0.1782,0.1451
2,3.1438,2.673036,0.3028,0.308026,0.3028,0.284481
3,2.6234,2.270673,0.3842,0.388012,0.3842,0.369872
4,2.2615,2.052147,0.4376,0.443547,0.4376,0.425197
5,1.9648,1.931627,0.4708,0.470305,0.4708,0.454305
6,1.708,1.885762,0.4855,0.495114,0.4855,0.475435
7,1.477,1.757958,0.5162,0.513914,0.5162,0.508049
8,1.2601,1.732128,0.5318,0.540446,0.5318,0.52938
9,1.0675,1.698899,0.5354,0.534484,0.5354,0.530735
10,0.9071,1.713201,0.5338,0.535694,0.5338,0.530866


[I 2025-04-04 13:27:42,174] Trial 102 finished with value: 0.5308659268109924 and parameters: {'learning_rate': 0.000389011051242561, 'weight_decay': 0.002, 'warmup_steps': 28}. Best is trial 71 with value: 0.5405412839264137.


Trial 103 with params: {'learning_rate': 0.00033636341493947244, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0389,3.365161,0.1741,0.155491,0.1741,0.138936
2,3.1611,2.710664,0.2954,0.291612,0.2954,0.275284
3,2.6146,2.286819,0.3856,0.390968,0.3856,0.370686
4,2.2339,2.069625,0.4405,0.448457,0.4405,0.427668
5,1.9383,1.904285,0.4779,0.47589,0.4779,0.462848
6,1.6809,1.845822,0.4886,0.489431,0.4886,0.476924
7,1.4473,1.773996,0.5128,0.511162,0.5128,0.503307
8,1.2267,1.749071,0.5209,0.529446,0.5209,0.518537
9,1.0356,1.717796,0.5339,0.534549,0.5339,0.528542
10,0.8869,1.732669,0.5313,0.534371,0.5313,0.528021


[I 2025-04-04 13:42:17,650] Trial 103 finished with value: 0.5280205338597738 and parameters: {'learning_rate': 0.00033636341493947244, 'weight_decay': 0.001, 'warmup_steps': 21}. Best is trial 71 with value: 0.5405412839264137.


Trial 104 with params: {'learning_rate': 0.00023129434266159745, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0541,3.348276,0.1817,0.165817,0.1817,0.147377
2,3.1595,2.65408,0.3205,0.319559,0.3205,0.301958
3,2.6107,2.275938,0.3911,0.391184,0.3911,0.375455
4,2.2305,2.06128,0.4475,0.447118,0.4475,0.433792
5,1.9316,1.933403,0.4765,0.470713,0.4765,0.460086
6,1.6657,1.892266,0.4889,0.492422,0.4889,0.477622
7,1.4381,1.776538,0.513,0.511554,0.513,0.504409
8,1.2284,1.768508,0.5206,0.528042,0.5206,0.518323


[I 2025-04-04 13:53:48,972] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0008239371320343102, 'weight_decay': 0.008, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0878,3.557871,0.1384,0.108227,0.1384,0.102611
2,3.4236,2.976883,0.2405,0.238933,0.2405,0.212483
3,2.9327,2.564097,0.3239,0.326028,0.3239,0.303623
4,2.5835,2.339588,0.3725,0.369499,0.3725,0.355212


[I 2025-04-04 13:59:38,582] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0005262741049323635, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0413,3.354293,0.1664,0.139,0.1664,0.12903
2,3.2303,2.746947,0.2872,0.28453,0.2872,0.2655


[I 2025-04-04 14:02:32,081] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0004848409980507442, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.046,3.429307,0.1506,0.126353,0.1506,0.115845
2,3.2924,2.823727,0.2748,0.273264,0.2748,0.252527
3,2.7712,2.426687,0.3544,0.360234,0.3544,0.336383
4,2.3994,2.174826,0.4073,0.403456,0.4073,0.388876


[I 2025-04-04 14:08:23,390] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0003156108278266764, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0186,3.370134,0.1723,0.159114,0.1723,0.138252
2,3.1485,2.638975,0.3141,0.307711,0.3141,0.292382
3,2.6,2.260447,0.3914,0.393988,0.3914,0.375769
4,2.2321,2.071219,0.4379,0.444469,0.4379,0.423627
5,1.9369,1.927924,0.4769,0.472748,0.4769,0.461484
6,1.6754,1.86172,0.4975,0.499641,0.4975,0.487511
7,1.4434,1.762439,0.5207,0.519717,0.5207,0.511721
8,1.2234,1.728918,0.5328,0.540996,0.5328,0.531192
9,1.0372,1.706081,0.5341,0.535659,0.5341,0.531126
10,0.8871,1.725202,0.5333,0.537326,0.5333,0.531751


[I 2025-04-04 14:22:55,192] Trial 108 finished with value: 0.5317514886162895 and parameters: {'learning_rate': 0.0003156108278266764, 'weight_decay': 0.0, 'warmup_steps': 29}. Best is trial 71 with value: 0.5405412839264137.


Trial 109 with params: {'learning_rate': 0.0010027060203875104, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0911,3.597989,0.1266,0.102226,0.1266,0.091816
2,3.4613,3.032442,0.2296,0.241435,0.2296,0.20424


[I 2025-04-04 14:25:46,683] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.000502320215936422, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0178,3.359232,0.1678,0.150789,0.1678,0.130118
2,3.2134,2.741807,0.2842,0.279719,0.2842,0.260991


[I 2025-04-04 14:28:33,809] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0005815277385696435, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0157,3.406901,0.158,0.155259,0.158,0.125245
2,3.247,2.767903,0.2839,0.286999,0.2839,0.263153
3,2.7396,2.396855,0.3578,0.363767,0.3578,0.340095
4,2.3788,2.145283,0.4156,0.416971,0.4156,0.398992
5,2.0935,2.014194,0.4509,0.450055,0.4509,0.432506
6,1.847,1.898276,0.4829,0.488226,0.4829,0.472724
7,1.6212,1.792705,0.5104,0.509653,0.5104,0.501078
8,1.4105,1.755248,0.5255,0.534917,0.5255,0.523127
9,1.2216,1.709916,0.5296,0.526906,0.5296,0.523391
10,1.0583,1.706914,0.5331,0.53283,0.5331,0.529204


[I 2025-04-04 14:42:58,218] Trial 111 finished with value: 0.529204321267364 and parameters: {'learning_rate': 0.0005815277385696435, 'weight_decay': 0.002, 'warmup_steps': 21}. Best is trial 71 with value: 0.5405412839264137.


Trial 112 with params: {'learning_rate': 0.0006222896094416296, 'weight_decay': 0.0, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9885,3.377872,0.1707,0.149194,0.1707,0.137303
2,3.2559,2.810486,0.2722,0.271901,0.2722,0.248659


[I 2025-04-04 14:45:47,072] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00039614690414722725, 'weight_decay': 0.01, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0201,3.394181,0.16,0.155387,0.16,0.126132
2,3.1679,2.692176,0.2963,0.289413,0.2963,0.272973
3,2.6186,2.239952,0.396,0.405918,0.396,0.382076
4,2.2468,2.043372,0.442,0.450795,0.442,0.4303
5,1.9527,1.915999,0.4769,0.476206,0.4769,0.460551
6,1.6933,1.857895,0.4904,0.492314,0.4904,0.478436
7,1.4671,1.738695,0.5218,0.518753,0.5218,0.512892
8,1.2498,1.717687,0.5302,0.538483,0.5302,0.528005
9,1.0607,1.688232,0.5372,0.535954,0.5372,0.531324
10,0.9088,1.689039,0.5399,0.541353,0.5399,0.536013


[I 2025-04-04 15:00:10,041] Trial 113 finished with value: 0.5360131827036281 and parameters: {'learning_rate': 0.00039614690414722725, 'weight_decay': 0.01, 'warmup_steps': 19}. Best is trial 71 with value: 0.5405412839264137.


Trial 114 with params: {'learning_rate': 0.0002618303271937896, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0587,3.411399,0.1592,0.14071,0.1592,0.120569
2,3.2344,2.746412,0.2886,0.289335,0.2886,0.26662


[I 2025-04-04 15:03:07,670] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00047672424115830533, 'weight_decay': 0.008, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.027,3.366443,0.1572,0.13584,0.1572,0.125095
2,3.1984,2.800098,0.2782,0.279806,0.2782,0.260178
3,2.6811,2.347487,0.3701,0.37812,0.3701,0.356242
4,2.3109,2.105212,0.4303,0.433514,0.4303,0.416235
5,2.0155,1.957254,0.4607,0.460644,0.4607,0.444718
6,1.7673,1.903517,0.4826,0.484789,0.4826,0.470358
7,1.5419,1.78035,0.5093,0.509162,0.5093,0.500529
8,1.3322,1.747521,0.527,0.535898,0.527,0.524722
9,1.143,1.691363,0.5346,0.533219,0.5346,0.529385
10,0.9864,1.694317,0.5328,0.535563,0.5328,0.529793


[I 2025-04-04 15:17:38,703] Trial 115 finished with value: 0.5297930343049733 and parameters: {'learning_rate': 0.00047672424115830533, 'weight_decay': 0.008, 'warmup_steps': 19}. Best is trial 71 with value: 0.5405412839264137.


Trial 116 with params: {'learning_rate': 0.0005796222593112984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0465,3.46898,0.1509,0.128699,0.1509,0.113654
2,3.2744,2.811626,0.2687,0.276793,0.2687,0.244774


[I 2025-04-04 15:20:38,503] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0012195111898505371, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1202,3.711149,0.1035,0.078472,0.1035,0.068606
2,3.5581,3.153143,0.1986,0.194167,0.1986,0.170431
3,3.1033,2.759329,0.2787,0.270715,0.2787,0.250775
4,2.7667,2.525735,0.3295,0.336873,0.3295,0.310612


[I 2025-04-04 15:26:26,547] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0006682039672886342, 'weight_decay': 0.01, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0731,3.523161,0.1395,0.119956,0.1395,0.10415
2,3.3696,2.923602,0.2526,0.244744,0.2526,0.223297
3,2.8716,2.527211,0.3294,0.332527,0.3294,0.305114
4,2.5268,2.28602,0.3878,0.393493,0.3878,0.37128


[I 2025-04-04 15:32:04,262] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0003894797860017099, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9874,3.318167,0.1805,0.157396,0.1805,0.142773
2,3.1577,2.674846,0.3065,0.305076,0.3065,0.285518
3,2.6439,2.331649,0.3799,0.390358,0.3799,0.366224
4,2.2784,2.060051,0.4362,0.440689,0.4362,0.423515


[I 2025-04-04 15:37:52,217] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00013501072872136455, 'weight_decay': 0.006, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1863,3.507218,0.1554,0.135969,0.1554,0.122032
2,3.4083,2.940786,0.262,0.259958,0.262,0.23341
3,2.9147,2.540136,0.3376,0.3351,0.3376,0.315855
4,2.5284,2.299699,0.3951,0.390362,0.3951,0.380057


[I 2025-04-04 15:43:30,592] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.000156556888586311, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1617,3.510569,0.1531,0.149186,0.1531,0.120918
2,3.3502,2.863557,0.2772,0.274425,0.2772,0.252441


[I 2025-04-04 15:46:24,337] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0002999466760594559, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0083,3.311482,0.177,0.174479,0.177,0.143075
2,3.154,2.678454,0.3051,0.299585,0.3051,0.281286
3,2.6025,2.244344,0.3973,0.406711,0.3973,0.380247
4,2.2206,2.059262,0.4429,0.448944,0.4429,0.43064
5,1.9204,1.945881,0.4669,0.464939,0.4669,0.451772
6,1.6623,1.861733,0.492,0.491535,0.492,0.481912
7,1.4361,1.780112,0.5093,0.511128,0.5093,0.503073
8,1.213,1.742687,0.5227,0.528081,0.5227,0.520392
9,1.03,1.733482,0.5231,0.526504,0.5231,0.519991
10,0.8792,1.74096,0.5264,0.529936,0.5264,0.524248


[I 2025-04-04 16:01:03,126] Trial 122 finished with value: 0.5242484912794722 and parameters: {'learning_rate': 0.0002999466760594559, 'weight_decay': 0.001, 'warmup_steps': 22}. Best is trial 71 with value: 0.5405412839264137.


Trial 123 with params: {'learning_rate': 0.0001869661636537782, 'weight_decay': 0.008, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1148,3.405218,0.172,0.154961,0.172,0.138258
2,3.2864,2.781842,0.2865,0.284516,0.2865,0.26236


[I 2025-04-04 16:03:56,099] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0003326002638633381, 'weight_decay': 0.005, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0357,3.383776,0.1628,0.155172,0.1628,0.130623
2,3.1954,2.686165,0.3036,0.30057,0.3036,0.282967
3,2.6472,2.277354,0.3866,0.384429,0.3866,0.368218
4,2.2703,2.083713,0.432,0.434768,0.432,0.419191
5,1.9724,1.924,0.4721,0.464297,0.4721,0.454568
6,1.7094,1.878266,0.4827,0.479808,0.4827,0.47002
7,1.4769,1.768385,0.5104,0.511312,0.5104,0.502685
8,1.2581,1.740824,0.5253,0.53419,0.5253,0.523357
9,1.0699,1.713741,0.5319,0.532831,0.5319,0.52713
10,0.9168,1.727444,0.5281,0.531911,0.5281,0.525389


[I 2025-04-04 16:18:25,107] Trial 124 finished with value: 0.5253890817546495 and parameters: {'learning_rate': 0.0003326002638633381, 'weight_decay': 0.005, 'warmup_steps': 20}. Best is trial 71 with value: 0.5405412839264137.


Trial 125 with params: {'learning_rate': 0.00039312687389094387, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0065,3.353963,0.1773,0.172362,0.1773,0.144559
2,3.141,2.676123,0.2995,0.303492,0.2995,0.280694
3,2.6013,2.243469,0.3926,0.392136,0.3926,0.37789
4,2.2391,2.048819,0.4423,0.439431,0.4423,0.42745
5,1.9462,1.933268,0.4707,0.46518,0.4707,0.452766
6,1.6885,1.853294,0.4972,0.501338,0.4972,0.485273
7,1.4632,1.738186,0.52,0.519148,0.52,0.511155
8,1.2496,1.69292,0.5351,0.543491,0.5351,0.533627
9,1.0626,1.658623,0.5442,0.543066,0.5442,0.539405
10,0.9037,1.678079,0.5456,0.548692,0.5456,0.542626


[I 2025-04-04 16:32:52,913] Trial 125 finished with value: 0.5426262522510978 and parameters: {'learning_rate': 0.00039312687389094387, 'weight_decay': 0.003, 'warmup_steps': 6}. Best is trial 125 with value: 0.5426262522510978.


Trial 126 with params: {'learning_rate': 0.00034193890020308787, 'weight_decay': 0.001, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0061,3.259122,0.1922,0.179986,0.1922,0.154178
2,3.1114,2.657456,0.307,0.308469,0.307,0.287696
3,2.6031,2.251885,0.3979,0.406114,0.3979,0.384649
4,2.2306,2.064456,0.4403,0.441399,0.4403,0.42664
5,1.9397,1.933715,0.4738,0.47079,0.4738,0.457336
6,1.6768,1.856744,0.4884,0.488541,0.4884,0.477151
7,1.4421,1.759731,0.5116,0.511396,0.5116,0.503761
8,1.2277,1.734502,0.5218,0.53166,0.5218,0.520607
9,1.0401,1.711005,0.529,0.531683,0.529,0.525513
10,0.8875,1.725632,0.529,0.531175,0.529,0.525795


[I 2025-04-04 16:47:24,450] Trial 126 finished with value: 0.5257950515835987 and parameters: {'learning_rate': 0.00034193890020308787, 'weight_decay': 0.001, 'warmup_steps': 25}. Best is trial 125 with value: 0.5426262522510978.


Trial 127 with params: {'learning_rate': 0.00046557605429819175, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9887,3.317068,0.1767,0.166402,0.1767,0.140897
2,3.1767,2.708332,0.2913,0.296259,0.2913,0.27194
3,2.6587,2.306553,0.3794,0.387966,0.3794,0.363912
4,2.292,2.07357,0.4312,0.431957,0.4312,0.416597


[I 2025-04-04 16:53:08,224] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.00018031646953777362, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1256,3.463351,0.1625,0.140942,0.1625,0.126923
2,3.3343,2.851854,0.2755,0.274228,0.2755,0.25207


[I 2025-04-04 16:56:10,035] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.00028418289953144506, 'weight_decay': 0.008, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0597,3.355624,0.167,0.151971,0.167,0.130761
2,3.1816,2.690197,0.3037,0.306223,0.3037,0.282976
3,2.6357,2.311144,0.3859,0.38603,0.3859,0.367238
4,2.2532,2.060366,0.4439,0.442507,0.4439,0.430062
5,1.9487,1.945842,0.4703,0.464154,0.4703,0.452854
6,1.6886,1.881343,0.4859,0.484973,0.4859,0.473454
7,1.4552,1.77538,0.5117,0.506424,0.5117,0.501536
8,1.239,1.755104,0.5211,0.527522,0.5211,0.517982


[I 2025-04-04 17:07:45,897] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0007858167355289101, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0211,3.467456,0.1466,0.120769,0.1466,0.110672
2,3.3191,2.926657,0.2482,0.263926,0.2482,0.226272
3,2.8168,2.453613,0.3491,0.356331,0.3491,0.330456
4,2.4688,2.247885,0.3923,0.394541,0.3923,0.375691


[I 2025-04-04 17:13:27,510] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00027756210295507277, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9982,3.274058,0.1879,0.178789,0.1879,0.153373
2,3.1192,2.636012,0.322,0.316159,0.322,0.300973
3,2.5717,2.246805,0.3963,0.397372,0.3963,0.382143
4,2.1894,2.020426,0.4539,0.456829,0.4539,0.441986
5,1.8937,1.905567,0.4845,0.482942,0.4845,0.469528
6,1.632,1.868653,0.495,0.500145,0.495,0.48628
7,1.3981,1.752687,0.5208,0.520553,0.5208,0.514062
8,1.1823,1.747815,0.5294,0.536532,0.5294,0.527547
9,0.9987,1.71868,0.5358,0.537606,0.5358,0.533239
10,0.8465,1.731515,0.5346,0.540103,0.5346,0.533106


[I 2025-04-04 17:28:08,933] Trial 131 finished with value: 0.5331062474719291 and parameters: {'learning_rate': 0.00027756210295507277, 'weight_decay': 0.004, 'warmup_steps': 11}. Best is trial 125 with value: 0.5426262522510978.


Trial 132 with params: {'learning_rate': 0.0008914626881432214, 'weight_decay': 0.002, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0898,3.577125,0.1323,0.124455,0.1323,0.096536
2,3.4803,3.066625,0.2191,0.207423,0.2191,0.189604
3,3.0282,2.664985,0.2964,0.300187,0.2964,0.272721
4,2.6704,2.407895,0.3621,0.357916,0.3621,0.343049


[I 2025-04-04 17:33:56,961] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0004820012098865387, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.984,3.288422,0.1824,0.161348,0.1824,0.147812
2,3.1407,2.660745,0.3058,0.313342,0.3058,0.286583
3,2.604,2.247606,0.3944,0.398862,0.3944,0.376505
4,2.2464,2.053028,0.4434,0.446353,0.4434,0.428989
5,1.9565,1.933964,0.4725,0.472257,0.4725,0.456004
6,1.7058,1.841851,0.4941,0.497731,0.4941,0.482374
7,1.4844,1.73527,0.5217,0.519844,0.5217,0.513515
8,1.2647,1.69604,0.5372,0.54607,0.5372,0.534629
9,1.0761,1.669808,0.5442,0.543903,0.5442,0.539622
10,0.9206,1.680619,0.5434,0.550491,0.5434,0.540891


[I 2025-04-04 17:49:23,029] Trial 133 finished with value: 0.5408913351359455 and parameters: {'learning_rate': 0.0004820012098865387, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 125 with value: 0.5426262522510978.


Trial 134 with params: {'learning_rate': 0.00046656475517759444, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.985,3.371542,0.1597,0.14403,0.1597,0.126248
2,3.2012,2.752133,0.2828,0.285205,0.2828,0.264058
3,2.6889,2.34979,0.3664,0.364229,0.3664,0.34871
4,2.3267,2.107373,0.4248,0.435612,0.4248,0.412891
5,2.0373,1.9969,0.4569,0.454088,0.4569,0.438941
6,1.7798,1.902442,0.4748,0.475648,0.4748,0.462315
7,1.5518,1.763591,0.5123,0.5076,0.5123,0.502747
8,1.3355,1.731983,0.5277,0.533043,0.5277,0.523871
9,1.1486,1.685417,0.5339,0.53127,0.5339,0.527599
10,0.9917,1.693445,0.5315,0.532772,0.5315,0.527633


[I 2025-04-04 18:03:43,134] Trial 134 finished with value: 0.5276329528148469 and parameters: {'learning_rate': 0.00046656475517759444, 'weight_decay': 0.002, 'warmup_steps': 6}. Best is trial 125 with value: 0.5426262522510978.


Trial 135 with params: {'learning_rate': 0.00031243511242558844, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0283,3.355783,0.1724,0.189321,0.1724,0.138434
2,3.1531,2.667112,0.3099,0.307772,0.3099,0.290846
3,2.6267,2.285077,0.3881,0.389204,0.3881,0.372991
4,2.2432,2.03956,0.4484,0.445773,0.4484,0.433844
5,1.9345,1.895847,0.4796,0.473222,0.4796,0.462652
6,1.6751,1.838565,0.4964,0.497174,0.4964,0.486107
7,1.444,1.749205,0.5196,0.518066,0.5196,0.5116
8,1.2311,1.72578,0.5295,0.537135,0.5295,0.526607


[I 2025-04-04 18:15:11,395] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0005729623924926906, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9969,3.358507,0.1688,0.157154,0.1688,0.133843
2,3.1911,2.731275,0.2937,0.299544,0.2937,0.270758
3,2.6841,2.345074,0.3714,0.374224,0.3714,0.354737
4,2.3534,2.139125,0.4178,0.421205,0.4178,0.403501
5,2.0783,2.020693,0.447,0.442321,0.447,0.428881
6,1.8299,1.95436,0.4708,0.478359,0.4708,0.459636
7,1.6135,1.810185,0.5003,0.498294,0.5003,0.491769
8,1.3997,1.771938,0.5134,0.521097,0.5134,0.511099


[I 2025-04-04 18:26:44,678] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.00015888335640566127, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1268,3.441684,0.1654,0.148735,0.1654,0.130397
2,3.3573,2.873633,0.27,0.264938,0.27,0.242623


[I 2025-04-04 18:29:43,188] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.001133129708970219, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1327,3.802608,0.0997,0.066309,0.0997,0.063751
2,3.597,3.213852,0.1961,0.198728,0.1961,0.169443


[I 2025-04-04 18:32:42,180] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0006417856778425359, 'weight_decay': 0.002, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0627,3.5265,0.1403,0.126512,0.1403,0.104556
2,3.3195,2.858081,0.2625,0.259568,0.2625,0.237337
3,2.8186,2.484358,0.3368,0.342833,0.3368,0.321581


[I 2025-04-04 18:44:37,573] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.00044641064591075577, 'weight_decay': 0.001, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.008,3.325954,0.1771,0.155053,0.1771,0.143904
2,3.1614,2.663838,0.3061,0.303856,0.3061,0.286864
3,2.6337,2.29412,0.3865,0.39433,0.3865,0.370529
4,2.2592,2.05153,0.4398,0.441026,0.4398,0.42468
5,1.9621,1.92105,0.4731,0.46641,0.4731,0.455204
6,1.7027,1.88389,0.482,0.491668,0.482,0.47271
7,1.4773,1.724711,0.5221,0.521648,0.5221,0.514768
8,1.2525,1.708569,0.5311,0.538103,0.5311,0.528423
9,1.0641,1.679127,0.538,0.541192,0.538,0.534325
10,0.9068,1.690499,0.5374,0.541017,0.5374,0.534062


[I 2025-04-04 18:59:05,612] Trial 141 finished with value: 0.5340620071631943 and parameters: {'learning_rate': 0.00044641064591075577, 'weight_decay': 0.001, 'warmup_steps': 17}. Best is trial 125 with value: 0.5426262522510978.


Trial 142 with params: {'learning_rate': 0.0007260539234319809, 'weight_decay': 0.001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0752,3.572835,0.1231,0.117244,0.1231,0.088243
2,3.3848,2.903783,0.2508,0.254007,0.2508,0.226142


[I 2025-04-04 19:02:03,247] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0002739954644945178, 'weight_decay': 0.0, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0324,3.363268,0.1719,0.171499,0.1719,0.137244
2,3.1995,2.721216,0.2966,0.297038,0.2966,0.277178
3,2.6562,2.314771,0.3827,0.382415,0.3827,0.366792
4,2.2814,2.117964,0.4286,0.43376,0.4286,0.415549
5,1.9833,1.975387,0.4633,0.463466,0.4633,0.447151
6,1.7168,1.91139,0.484,0.482618,0.484,0.471458
7,1.4799,1.806246,0.5093,0.508444,0.5093,0.500874
8,1.2568,1.787447,0.5153,0.52229,0.5153,0.512812


[I 2025-04-04 19:13:29,678] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0003612235396496027, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.029,3.393,0.1599,0.153837,0.1599,0.124344
2,3.2041,2.722409,0.2947,0.294734,0.2947,0.274855
3,2.6698,2.300364,0.3834,0.394763,0.3834,0.369269
4,2.2938,2.083744,0.4354,0.442049,0.4354,0.422239


[I 2025-04-04 19:19:18,804] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.00040788796347888587, 'weight_decay': 0.01, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0324,3.351868,0.1678,0.145844,0.1678,0.134047
2,3.1911,2.732506,0.2844,0.290117,0.2844,0.264913
3,2.6705,2.342363,0.3701,0.377923,0.3701,0.354937
4,2.3029,2.108405,0.4294,0.427022,0.4294,0.415173
5,2.01,1.940041,0.4671,0.467633,0.4671,0.453245


In [None]:
print(best_base_random)
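
The cell above has no recorded output, so the best configuration of this search has to be read from the Optuna log. A minimal stdlib-only sketch of such a summary is given here. The `summarize` helper and both regular expressions are illustrative assumptions based on the log format visible in this notebook; they are not part of `base` or Optuna. The sample lines are abbreviated copies of log lines from the search above.

```python
import re

# Patterns for the two Optuna log line types that end a trial in this notebook.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize(log_text):
    """Return (finished {trial: F1}, pruned trial numbers, best finished trial)."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


# Abbreviated sample: the parameter dictionaries are shortened to {...}.
sample_log = """
[I 2025-04-04 16:32:52,913] Trial 125 finished with value: 0.5426262522510978 and parameters: {...}. Best is trial 125 with value: 0.5426262522510978.
[I 2025-04-04 16:47:24,450] Trial 126 finished with value: 0.5257950515835987 and parameters: {...}. Best is trial 125 with value: 0.5426262522510978.
[I 2025-04-04 16:53:08,224] Trial 127 pruned.
"""

finished, pruned, best = summarize(sample_log)
print("finished:", finished)
print("pruned:", pruned)
print("best finished trial:", best)
```

Applied to the full captured output, the same helper would also give the share of pruned trials, which is useful when judging how aggressive the pruner is.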

In [14]:
base.reset_seed()

## Hyperparameter search with distillation for the randomly initialized model

In [15]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-KD_hp-search", logging_dir=f"~/logs/{DATASET}/random-KD_hp-search",  remove_unused_columns=False, epochs=num_epochs, batch_size=batch_size)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [16]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
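
How `lambda_param` and `temperature` typically enter a distillation loss can be sketched as follows. The implementation of `base.DistilTrainer` is not shown in this notebook, so this is an assumption: it uses the common formulation in which the KL divergence between temperature-softened teacher and student distributions is scaled by the squared temperature and mixed with the hard-label cross-entropy using the weight `lambda_param`. All function names are hypothetical, and the code handles a single example in pure Python for readability.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Hard-label loss: negative log-probability of the true class."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed mixing rule: (1 - lambda) * hard loss + lambda * T^2 * soft loss."""
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    ) * temperature ** 2
    return (1 - lambda_param) * hard + lambda_param * soft


student = [2.0, 0.5, -1.0]
teacher = [0.0, 3.0, 1.0]
for lam in (0.0, 0.5, 1.0):
    print(lam, distillation_loss(student, teacher, label=0, lambda_param=lam, temperature=4.0))
```

Under this assumption, `lambda_param = 0` reduces to ordinary supervised training and `lambda_param = 1` trains on the teacher signal alone, which is why both ends of the range are included in the search space.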

Optuna configuration.

In [17]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
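
The pruning schedule implied by these settings can be sketched without Optuna. The snippet below is a simplified illustration, not Optuna's code: it assumes that bracket `b` checks trials at resources `min_resource * reduction_factor ** (b + rung)`, and that the number of brackets is the number of such powers that fit under `max_resource`. The values of `min_r` and `max_r` are defined earlier in the notebook and are not visible here, so `2` and `10` are placeholders.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Map each bracket id to the resource values at which it may prune."""
    # Count how many powers of the reduction factor fit below max_resource.
    n_brackets = 0
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    brackets = {}
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            step = min_resource * reduction_factor ** (bracket_id + rung)
            if step > max_resource:
                break
            rungs.append(step)
            rung += 1
        brackets[bracket_id] = rungs
    return brackets


# Placeholder values; the notebook's actual min_r and max_r may differ.
for bracket_id, rungs in hyperband_rungs(2, 10, 2).items():
    print(f"bracket {bracket_id}: prune checks at {rungs}")
```

With these placeholder values the checks fall at 2, 4 and 8, which matches the pattern in the logs of this notebook, where pruned trials stop after epoch 2, 4 or 8.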



Configuration of the distillation trainer for the individual training runs.

In [18]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(100)
)


Search setup.

In [19]:
best_distill_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-04 19:44:27,010] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6986,3.187188,0.1893,0.175951,0.1893,0.145527
2,3.0097,2.654963,0.3171,0.326876,0.3171,0.294212


[I 2025-04-04 19:47:09,967] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9042,3.49083,0.1306,0.10615,0.1306,0.088985
2,3.3639,3.080266,0.2242,0.237716,0.2242,0.188581
3,3.0226,2.774375,0.2913,0.302354,0.2913,0.260432
4,2.7536,2.564315,0.3526,0.355461,0.3526,0.327798


[I 2025-04-04 19:52:49,115] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0221,3.648459,0.1021,0.062961,0.1021,0.059186
2,3.603,3.363702,0.1643,0.148962,0.1643,0.124334


[I 2025-04-04 19:55:38,942] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8689,3.456455,0.1371,0.128426,0.1371,0.095492
2,3.3251,3.036491,0.2342,0.24307,0.2342,0.196035
3,2.9641,2.695158,0.3109,0.320821,0.3109,0.280239
4,2.6608,2.482171,0.3622,0.364388,0.3622,0.338493


[I 2025-04-04 20:01:14,658] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6818,3.289879,0.1565,0.136687,0.1565,0.110593
2,3.1192,2.854309,0.2627,0.267306,0.2627,0.232434


[I 2025-04-04 20:03:59,023] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.815,3.644058,0.0887,0.055385,0.0887,0.049172
2,3.4401,3.226924,0.1788,0.186665,0.1788,0.14376
3,3.0725,2.846962,0.2738,0.274745,0.2738,0.245417
4,2.7787,2.590491,0.3439,0.346328,0.3439,0.319104


[I 2025-04-04 20:09:32,642] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6689,3.269087,0.1573,0.142588,0.1573,0.110742
2,3.1011,2.823452,0.2731,0.30171,0.2731,0.243892
3,2.7273,2.479412,0.351,0.364507,0.351,0.327795
4,2.4383,2.255994,0.4162,0.419635,0.4162,0.396823


[I 2025-04-04 20:15:09,433] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8,3.628999,0.0891,0.058259,0.0891,0.049262
2,3.4305,3.213036,0.1846,0.193284,0.1846,0.15163
3,3.097,2.905611,0.2638,0.265659,0.2638,0.236118
4,2.8316,2.659012,0.3364,0.331294,0.3364,0.30577


[I 2025-04-04 20:20:46,211] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.926,3.555205,0.1179,0.088072,0.1179,0.072223
2,3.4456,3.174527,0.215,0.228303,0.215,0.176285
3,3.1284,2.876288,0.2717,0.268692,0.2717,0.233599
4,2.8757,2.677531,0.3231,0.320364,0.3231,0.295138


[I 2025-04-04 20:26:20,764] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7105,3.340035,0.1486,0.124588,0.1486,0.100555
2,3.1692,2.909147,0.2404,0.243182,0.2404,0.210895


[I 2025-04-04 20:29:20,641] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6392,3.176793,0.1858,0.167983,0.1858,0.142995
2,2.9801,2.6588,0.3107,0.328286,0.3107,0.286153
3,2.5387,2.256469,0.4092,0.414802,0.4092,0.391054
4,2.2264,2.056176,0.463,0.46807,0.463,0.446671
5,1.9778,1.915041,0.4988,0.492731,0.4988,0.479933
6,1.7687,1.851098,0.5112,0.518484,0.5112,0.500327
7,1.5844,1.713759,0.5418,0.544831,0.5418,0.531785
8,1.4146,1.666553,0.5532,0.560905,0.5532,0.549999


[I 2025-04-04 20:40:30,721] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0014321301966915287, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.766,3.568148,0.1005,0.06958,0.1005,0.056788
2,3.3335,3.087102,0.2065,0.216053,0.2065,0.171464


[I 2025-04-04 20:43:15,718] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 9.686152689152715e-05, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8923,3.51066,0.126,0.104043,0.126,0.083244
2,3.3985,3.125738,0.2129,0.219731,0.2129,0.174349
3,3.0729,2.808689,0.2842,0.295839,0.2842,0.252837
4,2.7999,2.610906,0.3365,0.327606,0.3365,0.309775


[I 2025-04-04 20:48:51,386] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0004052254440503788, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6704,3.20796,0.1795,0.181242,0.1795,0.137844
2,3.0074,2.718403,0.2993,0.311024,0.2993,0.275088
3,2.5834,2.320542,0.3902,0.398314,0.3902,0.369696
4,2.2677,2.086185,0.4589,0.465707,0.4589,0.442418


[I 2025-04-04 20:54:32,073] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6743,3.181607,0.189,0.20454,0.189,0.146793
2,2.9877,2.665128,0.3175,0.324541,0.3175,0.29219
3,2.5573,2.287386,0.4124,0.419297,0.4124,0.393642
4,2.2342,2.063921,0.4633,0.471167,0.4633,0.449831
5,1.9772,1.937808,0.4841,0.483491,0.4841,0.465573
6,1.7635,1.876861,0.5029,0.516821,0.5029,0.492532
7,1.581,1.751179,0.5367,0.536606,0.5367,0.526174
8,1.4134,1.707005,0.5454,0.552699,0.5454,0.542003


[I 2025-04-04 21:05:51,247] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7478,3.441395,0.1289,0.111891,0.1289,0.082576
2,3.2313,2.9943,0.2281,0.243957,0.2281,0.19492


[I 2025-04-04 21:08:38,312] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7086,3.236249,0.1807,0.173082,0.1807,0.13959
2,3.0471,2.719057,0.302,0.315924,0.302,0.277468
3,2.6073,2.306367,0.401,0.399675,0.401,0.379278
4,2.2853,2.132134,0.4551,0.462777,0.4551,0.441726
5,2.0317,1.974902,0.4832,0.484817,0.4832,0.465848
6,1.8126,1.935207,0.4926,0.500967,0.4926,0.482245
7,1.631,1.836168,0.5134,0.514037,0.5134,0.502555
8,1.4734,1.794652,0.5261,0.532541,0.5261,0.520655


[I 2025-04-04 21:19:53,190] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0006412609358779237, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6594,3.243819,0.17,0.156465,0.17,0.12799
2,3.0338,2.749288,0.2864,0.306475,0.2864,0.260667
3,2.6312,2.36346,0.3736,0.388261,0.3736,0.351625
4,2.3402,2.157254,0.4354,0.442008,0.4354,0.418796


[I 2025-04-04 21:25:23,496] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 5.957853392927128e-05, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0101,3.652111,0.1015,0.086553,0.1015,0.061459
2,3.5784,3.349592,0.1696,0.144123,0.1696,0.126684


[I 2025-04-04 21:28:11,561] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6379,3.195593,0.1812,0.189955,0.1812,0.134764
2,3.0017,2.671977,0.3062,0.326413,0.3062,0.283438
3,2.5767,2.304209,0.395,0.407782,0.395,0.375813
4,2.2629,2.072969,0.4659,0.466231,0.4659,0.448886
5,2.0132,1.95851,0.4861,0.482538,0.4861,0.467935
6,1.8031,1.862819,0.5075,0.516597,0.5075,0.496819
7,1.6243,1.730853,0.5353,0.534742,0.5353,0.524157
8,1.4574,1.682539,0.555,0.564831,0.555,0.551799
9,1.3173,1.648432,0.5643,0.570081,0.5643,0.560701
10,1.2049,1.640634,0.5609,0.569028,0.5609,0.558437


[I 2025-04-04 21:42:28,675] Trial 19 finished with value: 0.5584374117068501 and parameters: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 19 with value: 0.5584374117068501.


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6598,3.188233,0.1865,0.182314,0.1865,0.140555
2,2.9777,2.690216,0.3069,0.326079,0.3069,0.282583
3,2.5505,2.273616,0.4068,0.419434,0.4068,0.387723
4,2.2265,2.056705,0.4636,0.472068,0.4636,0.450298
5,1.9786,1.909118,0.4953,0.492474,0.4953,0.477091
6,1.7722,1.861437,0.511,0.524505,0.511,0.500566
7,1.5894,1.719888,0.5402,0.543477,0.5402,0.5307
8,1.4204,1.680841,0.5542,0.563661,0.5542,0.551156


[I 2025-04-04 21:53:35,382] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00017048302356543796, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.791,3.352905,0.1614,0.137169,0.1614,0.115465
2,3.1841,2.872651,0.2635,0.26965,0.2635,0.228609
3,2.7677,2.483264,0.3592,0.356856,0.3592,0.334915
4,2.4509,2.269128,0.413,0.409355,0.413,0.393212
5,2.1984,2.120855,0.4453,0.437589,0.4453,0.42317
6,1.9833,2.050534,0.4616,0.46082,0.4616,0.44706
7,1.8058,1.942837,0.4845,0.480229,0.4845,0.469758
8,1.6537,1.908313,0.4953,0.503813,0.4953,0.489883


[I 2025-04-04 22:04:41,496] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.000588906057047636, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6441,3.215425,0.1788,0.175847,0.1788,0.137029
2,3.0085,2.704832,0.3,0.316306,0.3,0.276437
3,2.5982,2.339905,0.3873,0.389792,0.3873,0.364792
4,2.3019,2.13449,0.4565,0.457823,0.4565,0.438893


[I 2025-04-04 22:10:15,543] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0006562519709440268, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7164,3.339191,0.1491,0.133947,0.1491,0.100458
2,3.1278,2.883832,0.26,0.266627,0.26,0.232213
3,2.7248,2.473876,0.3502,0.35362,0.3502,0.322327
4,2.428,2.221541,0.421,0.427148,0.421,0.401774
5,2.1842,2.068166,0.4572,0.454153,0.4572,0.436996
6,1.9823,1.988249,0.4781,0.487813,0.4781,0.463246
7,1.8048,1.857473,0.4984,0.499038,0.4984,0.485442
8,1.6418,1.791559,0.5304,0.538712,0.5304,0.526026


[I 2025-04-04 22:21:32,985] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00039476246666771337, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.673,3.219836,0.1775,0.174427,0.1775,0.132923
2,3.0102,2.699563,0.3039,0.321713,0.3039,0.279769
3,2.5552,2.277818,0.4028,0.416183,0.4028,0.384791
4,2.2341,2.053904,0.4688,0.47506,0.4688,0.452638
5,1.9746,1.914471,0.4912,0.493408,0.4912,0.474136
6,1.7669,1.860007,0.5092,0.525188,0.5092,0.500266
7,1.5829,1.730558,0.5383,0.541242,0.5383,0.529129
8,1.4179,1.68977,0.5525,0.562383,0.5525,0.549419


[I 2025-04-04 22:32:48,258] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0001351015408651554, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8185,3.435999,0.1396,0.123257,0.1396,0.096991
2,3.2761,2.971796,0.2503,0.257908,0.2503,0.214561
3,2.899,2.627295,0.3242,0.330496,0.3242,0.29553
4,2.5926,2.423244,0.3783,0.385722,0.3783,0.359642


[I 2025-04-04 22:38:17,476] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0004631033932104086, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6627,3.218066,0.1794,0.167035,0.1794,0.137399
2,3.0101,2.701276,0.3042,0.32227,0.3042,0.279771
3,2.5825,2.318686,0.3878,0.398001,0.3878,0.366046
4,2.266,2.106779,0.4513,0.463029,0.4513,0.434575


[I 2025-04-04 22:43:53,567] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0015986832109297702, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7825,3.589179,0.0995,0.065217,0.0995,0.058231
2,3.3831,3.16191,0.1902,0.205626,0.1902,0.155568
3,3.0379,2.820801,0.2733,0.271868,0.2733,0.242774
4,2.7536,2.586178,0.3352,0.328388,0.3352,0.311052


[I 2025-04-04 22:49:31,407] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0003736700317013303, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6392,3.187068,0.1927,0.204229,0.1927,0.154436
2,2.9777,2.669151,0.3117,0.338948,0.3117,0.286783
3,2.5567,2.303779,0.3942,0.415161,0.3942,0.375947
4,2.2321,2.087918,0.452,0.455667,0.452,0.435044
5,1.9797,1.929567,0.4902,0.487073,0.4902,0.472253
6,1.7656,1.865147,0.505,0.51881,0.505,0.495694
7,1.58,1.73719,0.5346,0.534568,0.5346,0.524309
8,1.4126,1.711287,0.5468,0.557916,0.5468,0.543359
9,1.2717,1.663058,0.5601,0.564255,0.5601,0.556043
10,1.1574,1.665271,0.5562,0.562177,0.5562,0.55349


[I 2025-04-04 23:03:43,426] Trial 28 finished with value: 0.5534897104031348 and parameters: {'learning_rate': 0.0003736700317013303, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 19 with value: 0.5584374117068501.


Trial 29 with params: {'learning_rate': 0.00020214850400684017, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.743,3.313068,0.1626,0.153938,0.1626,0.122975
2,3.125,2.779055,0.2928,0.296407,0.2928,0.263801
3,2.6689,2.377866,0.3847,0.393741,0.3847,0.363072
4,2.3345,2.157128,0.4392,0.447744,0.4392,0.423445


[I 2025-04-04 23:09:19,803] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00017994107990460778, 'weight_decay': 0.0, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7617,3.316937,0.1628,0.150571,0.1628,0.120684
2,3.1392,2.799154,0.2815,0.286613,0.2815,0.25174


[I 2025-04-04 23:12:00,981] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0003578317317871822, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6653,3.210083,0.1806,0.161285,0.1806,0.136857
2,2.9965,2.685894,0.3128,0.325272,0.3128,0.284809
3,2.5456,2.266813,0.4075,0.421636,0.4075,0.389779
4,2.2246,2.083566,0.4546,0.468002,0.4546,0.441292
5,1.9751,1.920077,0.4946,0.496588,0.4946,0.476144
6,1.7615,1.856918,0.5077,0.522404,0.5077,0.498594
7,1.5812,1.740122,0.5407,0.542539,0.5407,0.531034
8,1.4146,1.694846,0.5535,0.562398,0.5535,0.550025
9,1.2743,1.662302,0.5594,0.564826,0.5594,0.554919
10,1.1675,1.65765,0.5579,0.565998,0.5579,0.555296


[I 2025-04-04 23:26:01,462] Trial 31 finished with value: 0.5552955212833178 and parameters: {'learning_rate': 0.0003578317317871822, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 19 with value: 0.5584374117068501.


Trial 32 with params: {'learning_rate': 0.0006430043333997293, 'weight_decay': 0.007, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7046,3.298688,0.1633,0.153173,0.1633,0.117171
2,3.0955,2.809112,0.275,0.300089,0.275,0.248488


[I 2025-04-04 23:28:41,846] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00021984229719808194, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7216,3.270133,0.1716,0.162807,0.1716,0.129283
2,3.0883,2.7452,0.2946,0.300296,0.2946,0.262409
3,2.6489,2.363139,0.3865,0.39037,0.3865,0.364926
4,2.3232,2.164493,0.4352,0.433679,0.4352,0.4173
5,2.0651,2.000757,0.4725,0.463498,0.4725,0.453096
6,1.8511,1.950928,0.4815,0.487731,0.4815,0.469147
7,1.67,1.83035,0.5117,0.510086,0.5117,0.498909
8,1.5093,1.798154,0.5239,0.525347,0.5239,0.516759


[I 2025-04-04 23:39:39,837] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0002580125183372428, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6942,3.263328,0.1701,0.159706,0.1701,0.127934
2,3.0535,2.729617,0.303,0.319065,0.303,0.276286
3,2.6176,2.328306,0.3947,0.40176,0.3947,0.371918
4,2.2819,2.096764,0.4587,0.463485,0.4587,0.444639
5,2.0247,1.971477,0.4848,0.476017,0.4848,0.464525
6,1.8038,1.922289,0.4972,0.509849,0.4972,0.487496
7,1.6222,1.81416,0.5185,0.522269,0.5185,0.508777
8,1.4594,1.76809,0.5379,0.546123,0.5379,0.534168


[I 2025-04-04 23:50:57,607] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.00011040218858696793, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8828,3.495071,0.1293,0.098948,0.1293,0.088232
2,3.3563,3.072219,0.2309,0.239358,0.2309,0.195065
3,3.007,2.757003,0.2941,0.302441,0.2941,0.261503
4,2.7353,2.545244,0.3503,0.349979,0.3503,0.325823


[I 2025-04-04 23:56:37,353] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0018560066966056843, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.832,3.680223,0.0824,0.047613,0.0824,0.037301
2,3.4703,3.311198,0.1617,0.188881,0.1617,0.12854


[I 2025-04-04 23:59:24,230] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 5.431299921217806e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0281,3.677797,0.0975,0.079639,0.0975,0.058116
2,3.618,3.394786,0.1509,0.15115,0.1509,0.111251
3,3.3958,3.190105,0.2019,0.171394,0.2019,0.160815
4,3.2149,3.035871,0.2456,0.236851,0.2456,0.209885


[I 2025-04-05 00:04:57,921] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0006138466232345877, 'weight_decay': 0.002, 'warmup_steps': 5, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6791,3.258291,0.1774,0.179712,0.1774,0.130317
2,3.0699,2.827152,0.2655,0.287387,0.2655,0.241151
3,2.6772,2.418768,0.3669,0.373699,0.3669,0.343888
4,2.3848,2.221248,0.4215,0.41693,0.4215,0.402061
5,2.1477,2.057066,0.4561,0.451045,0.4561,0.434488
6,1.9454,1.974506,0.484,0.489578,0.484,0.472202
7,1.7769,1.832499,0.5096,0.515305,0.5096,0.500087
8,1.6149,1.776223,0.5309,0.537552,0.5309,0.525452


[I 2025-04-05 00:16:29,416] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0006210110853490856, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6488,3.214099,0.1708,0.170957,0.1708,0.127983
2,3.009,2.734446,0.29,0.302367,0.29,0.268656
3,2.5961,2.326642,0.3896,0.405586,0.3896,0.371301
4,2.2974,2.103357,0.4531,0.458487,0.4531,0.436442


[I 2025-04-05 00:22:05,548] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0004616825750244641, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6466,3.177098,0.1878,0.219227,0.1878,0.144843
2,2.9903,2.674198,0.3096,0.321682,0.3096,0.287075
3,2.5651,2.287979,0.3987,0.411469,0.3987,0.377686
4,2.2548,2.061004,0.4665,0.471024,0.4665,0.449649
5,2.0049,1.925352,0.497,0.501015,0.497,0.478103
6,1.7943,1.837389,0.5161,0.527632,0.5161,0.507148
7,1.6118,1.71967,0.5442,0.547613,0.5442,0.535643
8,1.4423,1.669271,0.5564,0.567467,0.5564,0.554286
9,1.303,1.639581,0.5632,0.568893,0.5632,0.559047
10,1.1896,1.628868,0.5628,0.568884,0.5628,0.560261


[I 2025-04-05 00:36:09,366] Trial 40 finished with value: 0.5602608806027691 and parameters: {'learning_rate': 0.0004616825750244641, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 40 with value: 0.5602608806027691.


Trial 41 with params: {'learning_rate': 0.00301755257183799, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8224,3.659571,0.0844,0.051981,0.0844,0.046177
2,3.4839,3.315686,0.1532,0.166024,0.1532,0.117835


[I 2025-04-05 00:38:55,962] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0008834916214220498, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7077,3.362584,0.1431,0.122142,0.1431,0.09887
2,3.1453,2.885969,0.2528,0.269699,0.2528,0.223742
3,2.7585,2.496333,0.3522,0.350977,0.3522,0.325382
4,2.4749,2.301935,0.4008,0.40655,0.4008,0.380923


[I 2025-04-05 00:44:30,124] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.001667724683323363, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.779,3.553493,0.1028,0.057098,0.1028,0.056995
2,3.3771,3.155145,0.19,0.204819,0.19,0.151198


[I 2025-04-05 00:47:17,887] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0006373600341339449, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7042,3.327674,0.1523,0.156092,0.1523,0.10754
2,3.1114,2.805512,0.2838,0.307747,0.2838,0.254955
3,2.7044,2.449663,0.365,0.379702,0.365,0.341492
4,2.4096,2.228787,0.4197,0.429443,0.4197,0.40063
5,2.1627,2.049783,0.4642,0.462126,0.4642,0.443781
6,1.9552,1.94895,0.486,0.492742,0.486,0.472061
7,1.7777,1.826394,0.5144,0.517296,0.5144,0.504842
8,1.6108,1.76427,0.5378,0.546633,0.5378,0.532418


[I 2025-04-05 00:58:30,134] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.004229168606699789, 'weight_decay': 0.009000000000000001, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8782,3.803516,0.063,0.031204,0.063,0.02787
2,3.6169,3.487764,0.1236,0.120592,0.1236,0.083793


[I 2025-04-05 01:01:18,697] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00036140546866994847, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6664,3.190101,0.1878,0.188408,0.1878,0.142291
2,2.983,2.651956,0.3085,0.310358,0.3085,0.283011
3,2.5533,2.28713,0.4004,0.407711,0.4004,0.38046
4,2.2375,2.072973,0.4587,0.46348,0.4587,0.444223
5,1.9941,1.92,0.4916,0.48907,0.4916,0.473298
6,1.7792,1.870483,0.5053,0.516072,0.5053,0.494777
7,1.6006,1.754384,0.5287,0.536069,0.5287,0.519792
8,1.4301,1.715979,0.5428,0.548647,0.5428,0.53844


[I 2025-04-05 01:12:34,326] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0025789104733638904, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8644,3.763281,0.076,0.043896,0.076,0.039613
2,3.5803,3.454004,0.1315,0.125448,0.1315,0.09412


[I 2025-04-05 01:15:22,674] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0006787130506768617, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6439,3.234617,0.1795,0.177258,0.1795,0.140225
2,3.0515,2.771627,0.281,0.299427,0.281,0.256242
3,2.647,2.368762,0.3871,0.401708,0.3871,0.363761
4,2.343,2.17666,0.4358,0.439093,0.4358,0.416075


[I 2025-04-05 01:21:00,034] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0012208662990767988, 'weight_decay': 0.006, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7571,3.469914,0.1248,0.093927,0.1248,0.081417
2,3.2818,3.06221,0.2067,0.215465,0.2067,0.172807
3,2.9316,2.696672,0.3007,0.309894,0.3007,0.268642
4,2.6493,2.471749,0.3646,0.365709,0.3646,0.339207


[I 2025-04-05 01:26:37,958] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.00017389944002201715, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7963,3.400616,0.1467,0.132223,0.1467,0.102408
2,3.2035,2.873269,0.2715,0.274041,0.2715,0.240976
3,2.7803,2.489291,0.3545,0.359761,0.3545,0.330056
4,2.4631,2.290108,0.4093,0.418607,0.4093,0.392381


[I 2025-04-05 01:32:14,211] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0004487470733258785, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6597,3.193568,0.1799,0.180243,0.1799,0.138004
2,2.999,2.691013,0.302,0.330787,0.302,0.278891
3,2.5631,2.285218,0.3946,0.400848,0.3946,0.37361
4,2.256,2.079354,0.4573,0.466469,0.4573,0.440512
5,2.0113,1.944021,0.488,0.487036,0.488,0.470007
6,1.7975,1.877904,0.5041,0.520595,0.5041,0.495397
7,1.6166,1.731245,0.5397,0.540504,0.5397,0.52982
8,1.4466,1.678341,0.5558,0.56283,0.5558,0.552411
9,1.304,1.652063,0.5656,0.56956,0.5656,0.560472
10,1.1885,1.646613,0.5651,0.572455,0.5651,0.562718


[I 2025-04-05 01:46:32,124] Trial 51 finished with value: 0.5627182943716651 and parameters: {'learning_rate': 0.0004487470733258785, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 51 with value: 0.5627182943716651.


Trial 52 with params: {'learning_rate': 0.00018046648638865416, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7691,3.354076,0.1497,0.145418,0.1497,0.106588
2,3.1514,2.838634,0.2786,0.280313,0.2786,0.247667
3,2.7364,2.449361,0.3713,0.38139,0.3713,0.348263
4,2.4257,2.261606,0.4218,0.426416,0.4218,0.404721
5,2.1739,2.101079,0.451,0.448953,0.451,0.430411
6,1.9622,2.019009,0.4724,0.477813,0.4724,0.46062
7,1.7825,1.928337,0.4895,0.487336,0.4895,0.476074
8,1.6321,1.885322,0.5042,0.511936,0.5042,0.498138


[I 2025-04-05 01:57:45,594] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0005508154239649416, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6681,3.23858,0.1777,0.160775,0.1777,0.134958
2,3.0341,2.713345,0.2905,0.313464,0.2905,0.266056
3,2.596,2.313953,0.3944,0.404994,0.3944,0.375639
4,2.2852,2.10393,0.4557,0.461186,0.4557,0.440608
5,2.0341,1.942199,0.4862,0.49033,0.4862,0.467316
6,1.8245,1.89313,0.4976,0.515998,0.4976,0.486205
7,1.6498,1.742187,0.5366,0.538977,0.5366,0.526137
8,1.4828,1.688712,0.5555,0.564964,0.5555,0.552552
9,1.3435,1.647985,0.5632,0.564673,0.5632,0.557179
10,1.2327,1.640317,0.5616,0.564912,0.5616,0.557291


[I 2025-04-05 02:11:50,849] Trial 53 finished with value: 0.557290560834215 and parameters: {'learning_rate': 0.0005508154239649416, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 51 with value: 0.5627182943716651.


Trial 54 with params: {'learning_rate': 0.000508087154464116, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.646,3.230996,0.1714,0.150313,0.1714,0.126463
2,3.0295,2.720677,0.299,0.310438,0.299,0.273208
3,2.6122,2.351772,0.3897,0.402028,0.3897,0.3682
4,2.3039,2.115177,0.4495,0.454418,0.4495,0.432697
5,2.0613,1.969434,0.4854,0.485933,0.4854,0.466781
6,1.8574,1.899078,0.4929,0.500651,0.4929,0.481027
7,1.6785,1.756661,0.5325,0.534989,0.5325,0.523741
8,1.5155,1.704427,0.5446,0.550217,0.5446,0.539609
9,1.3732,1.676374,0.5541,0.562245,0.5541,0.550548
10,1.2596,1.664262,0.5558,0.561135,0.5558,0.55208


[I 2025-04-05 02:25:54,560] Trial 54 finished with value: 0.5520802957177685 and parameters: {'learning_rate': 0.000508087154464116, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 51 with value: 0.5627182943716651.


Trial 55 with params: {'learning_rate': 0.004251166826739927, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8818,3.771796,0.0625,0.038921,0.0625,0.028328
2,3.6221,3.481202,0.1271,0.113906,0.1271,0.087908


[I 2025-04-05 02:28:37,796] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.001008729050504147, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7153,3.420909,0.1294,0.106983,0.1294,0.086142
2,3.2256,2.976877,0.2344,0.258502,0.2344,0.202593
3,2.8528,2.624506,0.312,0.320387,0.312,0.280583
4,2.5634,2.382121,0.3912,0.39449,0.3912,0.367937


[I 2025-04-05 02:34:21,159] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0004343999446451733, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6579,3.167352,0.1941,0.186111,0.1941,0.151421
2,2.9829,2.648338,0.3136,0.318464,0.3136,0.28678
3,2.5546,2.286558,0.4028,0.422546,0.4028,0.385563
4,2.2353,2.075066,0.4665,0.470002,0.4665,0.451309
5,1.99,1.926659,0.4914,0.49018,0.4914,0.473485
6,1.7769,1.848496,0.5074,0.518848,0.5074,0.498067
7,1.598,1.734981,0.5385,0.545923,0.5385,0.529095
8,1.43,1.68257,0.5594,0.56775,0.5594,0.55639
9,1.2915,1.642083,0.5655,0.569934,0.5655,0.561711
10,1.1779,1.647986,0.5595,0.565377,0.5595,0.556411


[I 2025-04-05 02:48:28,769] Trial 57 finished with value: 0.5564106150960106 and parameters: {'learning_rate': 0.0004343999446451733, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 51 with value: 0.5627182943716651.


Trial 58 with params: {'learning_rate': 0.00017172539312763194, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7905,3.366201,0.1553,0.148296,0.1553,0.112023
2,3.1824,2.863246,0.2723,0.277542,0.2723,0.242735


[I 2025-04-05 02:51:18,721] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0008538182645939727, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7187,3.391782,0.1303,0.105648,0.1303,0.086923
2,3.1981,2.910687,0.2512,0.263857,0.2512,0.222141
3,2.8057,2.580228,0.3299,0.343571,0.3299,0.3042
4,2.5132,2.308976,0.4059,0.400726,0.4059,0.383337


[I 2025-04-05 02:56:54,748] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00046762991988506683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6314,3.155241,0.1851,0.181033,0.1851,0.142761
2,2.9593,2.658316,0.3183,0.328583,0.3183,0.294379
3,2.5344,2.265567,0.4061,0.417947,0.4061,0.388217
4,2.227,2.049545,0.4661,0.470528,0.4661,0.451934
5,1.9822,1.91933,0.4926,0.497186,0.4926,0.476045
6,1.7786,1.858797,0.5063,0.517089,0.5063,0.495739
7,1.5996,1.725968,0.5402,0.541524,0.5402,0.528896
8,1.4287,1.666868,0.5629,0.573166,0.5629,0.559908
9,1.286,1.634158,0.5688,0.574323,0.5688,0.56501
10,1.1712,1.631455,0.5655,0.569914,0.5655,0.561327


[I 2025-04-05 03:11:00,318] Trial 60 finished with value: 0.5613274942713273 and parameters: {'learning_rate': 0.00046762991988506683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 51 with value: 0.5627182943716651.


Trial 61 with params: {'learning_rate': 0.00039123257945065004, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6729,3.204436,0.1814,0.161118,0.1814,0.139421
2,2.9963,2.658361,0.3083,0.31656,0.3083,0.283382
3,2.5494,2.288704,0.3994,0.41282,0.3994,0.381331
4,2.2369,2.057359,0.4654,0.473126,0.4654,0.451322
5,1.985,1.90614,0.5001,0.498914,0.5001,0.480777
6,1.778,1.836257,0.5154,0.524732,0.5154,0.50623
7,1.5978,1.731872,0.5398,0.539879,0.5398,0.527981
8,1.4314,1.670595,0.5612,0.56589,0.5612,0.556636
9,1.287,1.645591,0.5658,0.569392,0.5658,0.561786
10,1.1767,1.65284,0.562,0.567504,0.562,0.559078


[I 2025-04-05 03:25:06,076] Trial 61 finished with value: 0.5590776924168456 and parameters: {'learning_rate': 0.00039123257945065004, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.5}. Best is trial 51 with value: 0.5627182943716651.


Trial 62 with params: {'learning_rate': 0.0004135003791426372, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6342,3.179885,0.1838,0.197336,0.1838,0.144238
2,2.9786,2.687003,0.3058,0.320512,0.3058,0.281078
3,2.5555,2.296377,0.4017,0.415737,0.4017,0.381964
4,2.2518,2.08801,0.4535,0.453227,0.4535,0.437381


[I 2025-04-05 03:30:49,568] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0005569167278417476, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6391,3.205161,0.1843,0.186125,0.1843,0.143807
2,3.0112,2.715831,0.2948,0.31873,0.2948,0.272985
3,2.6045,2.344712,0.382,0.395894,0.382,0.361076
4,2.3069,2.126158,0.4453,0.451539,0.4453,0.430467
5,2.0635,1.994764,0.4744,0.474183,0.4744,0.455471
6,1.8648,1.891713,0.5002,0.510373,0.5002,0.489502
7,1.6816,1.76689,0.5295,0.531542,0.5295,0.518656
8,1.5147,1.706818,0.5483,0.558287,0.5483,0.544867
9,1.3706,1.679743,0.5515,0.557436,0.5515,0.546756
10,1.2557,1.66575,0.5542,0.560349,0.5542,0.550602


[I 2025-04-05 03:44:58,952] Trial 63 finished with value: 0.5506018499157365 and parameters: {'learning_rate': 0.0005569167278417476, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 51 with value: 0.5627182943716651.


Trial 64 with params: {'learning_rate': 0.0008630760655161954, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7095,3.335529,0.1461,0.115271,0.1461,0.100968
2,3.1547,2.877871,0.2587,0.277118,0.2587,0.230866


[I 2025-04-05 03:47:50,443] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.00018481111980232553, 'weight_decay': 0.006, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7582,3.305578,0.1685,0.154462,0.1685,0.126527
2,3.1179,2.79289,0.2818,0.288553,0.2818,0.25367
3,2.6943,2.404957,0.3775,0.38173,0.3775,0.354004
4,2.3705,2.201292,0.4317,0.442217,0.4317,0.418648
5,2.1163,2.041671,0.4746,0.470422,0.4746,0.455033
6,1.9,1.991484,0.4821,0.486786,0.4821,0.470003
7,1.7242,1.888582,0.4977,0.499083,0.4977,0.48664
8,1.5701,1.84659,0.5106,0.513206,0.5106,0.504274


[I 2025-04-05 03:59:05,224] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.001893125255522273, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7721,3.578997,0.1008,0.080385,0.1008,0.059212
2,3.3656,3.12211,0.1979,0.221661,0.1979,0.164669


[I 2025-04-05 04:01:53,453] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0003624582068387896, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6798,3.247763,0.1712,0.165037,0.1712,0.129498
2,3.0148,2.722231,0.3021,0.310477,0.3021,0.278297
3,2.5763,2.306081,0.4004,0.406331,0.4004,0.380657
4,2.2496,2.075123,0.4548,0.461377,0.4548,0.438338
5,1.9962,1.926955,0.4982,0.500353,0.4982,0.482765
6,1.7821,1.868032,0.5097,0.516094,0.5097,0.498184
7,1.6027,1.743814,0.5304,0.536615,0.5304,0.521188
8,1.4335,1.706254,0.548,0.56059,0.548,0.545639
9,1.2953,1.670292,0.5573,0.56255,0.5573,0.552569
10,1.1833,1.672337,0.5555,0.562481,0.5555,0.552812


[I 2025-04-05 04:15:40,802] Trial 67 finished with value: 0.5528118118581457 and parameters: {'learning_rate': 0.0003624582068387896, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 51 with value: 0.5627182943716651.
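
The log above interleaves per-epoch metric tables with Optuna status lines. In the completed trials shown here, the reported value matches the F1 of the last (10th) epoch, while pruned trials stop after 2, 4, or 8 epochs. To read a log like this quickly, the status lines can be tallied with a few lines of standard-library Python. This is a minimal sketch, not part of the notebook: the names `summarize` and `SAMPLE_LOG` are hypothetical, and the regular expressions assume the exact line shapes seen in this output.

```python
import re

# Line shapes assumed from the output above (not an official Optuna format spec).
FINISHED = re.compile(r"Trial (\d+) finished with value: ([-+0-9.eE]+)")
PRUNED = re.compile(r"Trial (\d+) pruned\.")

# Four status lines copied verbatim from the log above.
SAMPLE_LOG = """\
[I 2025-04-04 21:28:11,561] Trial 18 pruned.
[I 2025-04-04 21:42:28,675] Trial 19 finished with value: 0.5584374117068501 and parameters: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 19 with value: 0.5584374117068501.
[I 2025-04-04 21:53:35,382] Trial 20 pruned.
[I 2025-04-04 23:03:43,426] Trial 28 finished with value: 0.5534897104031348 and parameters: {'learning_rate': 0.0003736700317013303, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 19 with value: 0.5584374117068501.
"""


def summarize(log_text):
    """Return (finished, pruned, best): a dict of trial number -> value,
    a list of pruned trial numbers, and the number of the best finished trial."""
    finished = {}
    pruned = []
    for line in log_text.splitlines():
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    # Higher F1 is better, so the best trial maximizes the reported value.
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


if __name__ == "__main__":
    finished, pruned, best = summarize(SAMPLE_LOG)
    print("finished:", finished)
    print("pruned:", pruned)
    print("best trial:", best)
```

Running `summarize` on the full cell output gives the share of pruned trials and the best trial without scrolling through the tables. If the study object is still in memory, querying it directly is more robust than parsing text.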


Trial 68 with params: {'learning_rate': 0.00011594519273570718, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8539,3.455484,0.1401,0.121409,0.1401,0.099315
2,3.3195,3.046621,0.2331,0.239177,0.2331,0.196507
3,2.9727,2.697611,0.3116,0.308628,0.3116,0.280846
4,2.6865,2.496393,0.3631,0.356617,0.3631,0.337671


[I 2025-04-05 04:21:16,894] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0010791563783778637, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7097,3.395273,0.1319,0.107524,0.1319,0.087476
2,3.1677,2.893096,0.2492,0.256055,0.2492,0.218412


[I 2025-04-05 04:24:14,502] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0005700674878496552, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6502,3.224334,0.174,0.15707,0.174,0.126872
2,3.0257,2.740355,0.2895,0.306799,0.2895,0.265567
3,2.611,2.351147,0.3828,0.391392,0.3828,0.360337
4,2.3056,2.134445,0.4412,0.452289,0.4412,0.425655


[I 2025-04-05 04:29:41,379] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0001346951779320763, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8319,3.424865,0.1487,0.1442,0.1487,0.105598
2,3.2489,2.92116,0.2569,0.268199,0.2569,0.221727
3,2.8486,2.568456,0.3373,0.340508,0.3373,0.309368
4,2.5459,2.363431,0.3926,0.396525,0.3926,0.372641


[I 2025-04-05 04:35:22,430] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00037558096798764683, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6711,3.199445,0.1857,0.180358,0.1857,0.14194
2,3.0171,2.695029,0.3083,0.31435,0.3083,0.282971
3,2.5821,2.31211,0.3941,0.39917,0.3941,0.372477
4,2.2613,2.095468,0.4492,0.452,0.4492,0.433257


[I 2025-04-05 04:40:57,524] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.00025219446098755945, 'weight_decay': 0.009000000000000001, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6888,3.217709,0.179,0.163206,0.179,0.135457
2,3.0052,2.670681,0.3142,0.323365,0.3142,0.289322
3,2.5707,2.28886,0.4086,0.416185,0.4086,0.388461
4,2.2556,2.094449,0.4567,0.463586,0.4567,0.442253
5,2.0032,1.980801,0.4803,0.480967,0.4803,0.462613
6,1.7888,1.898688,0.502,0.511065,0.502,0.492878
7,1.608,1.794255,0.5257,0.524983,0.5257,0.514315
8,1.4436,1.758164,0.5357,0.542916,0.5357,0.530839


[I 2025-04-05 04:52:26,925] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0002952710041203322, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7027,3.277174,0.1656,0.159298,0.1656,0.120013
2,3.0418,2.69382,0.3074,0.307746,0.3074,0.280813
3,2.5862,2.310261,0.3998,0.415823,0.3998,0.382432
4,2.2651,2.091548,0.4548,0.461352,0.4548,0.437602


[I 2025-04-05 04:58:09,283] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0004956009904695325, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6465,3.213225,0.1759,0.159192,0.1759,0.131015
2,3.0118,2.701457,0.302,0.318357,0.302,0.276419
3,2.5788,2.302008,0.393,0.399845,0.393,0.372001
4,2.2612,2.099202,0.46,0.459313,0.46,0.442919
5,2.013,1.932457,0.4915,0.48755,0.4915,0.47279
6,1.8053,1.879653,0.5056,0.520624,0.5056,0.495612
7,1.6242,1.737005,0.5375,0.539208,0.5375,0.527215
8,1.4577,1.69039,0.5564,0.565324,0.5564,0.552688
9,1.3178,1.659116,0.5619,0.566351,0.5619,0.557465
10,1.2055,1.647128,0.5674,0.57133,0.5674,0.563224


[I 2025-04-05 05:12:17,678] Trial 75 finished with value: 0.5632237273971085 and parameters: {'learning_rate': 0.0004956009904695325, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 75 with value: 0.5632237273971085.


Trial 76 with params: {'learning_rate': 0.0005103406901583305, 'weight_decay': 0.006, 'warmup_steps': 6, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6342,3.157701,0.1918,0.193988,0.1918,0.145046
2,2.9829,2.67725,0.303,0.311607,0.303,0.278903
3,2.5751,2.319914,0.3935,0.406013,0.3935,0.374545
4,2.2702,2.094458,0.4579,0.467416,0.4579,0.4423
5,2.0248,1.936516,0.4911,0.48946,0.4911,0.472666
6,1.8201,1.859682,0.5076,0.518955,0.5076,0.496805
7,1.6373,1.718837,0.5382,0.542208,0.5382,0.529188
8,1.4712,1.665662,0.5554,0.560614,0.5554,0.550008


[I 2025-04-05 05:23:32,687] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00023214730968687815, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7187,3.240816,0.1784,0.176305,0.1784,0.133957
2,3.0418,2.692907,0.3096,0.313763,0.3096,0.28126
3,2.5899,2.313043,0.4001,0.409233,0.4001,0.38294
4,2.2665,2.109229,0.4496,0.457156,0.4496,0.435879
5,2.0155,1.975331,0.4752,0.47763,0.4752,0.457182
6,1.8034,1.911635,0.4941,0.499123,0.4941,0.482294
7,1.6229,1.818294,0.52,0.524446,0.52,0.510606
8,1.4608,1.790173,0.5296,0.540764,0.5296,0.526545


[I 2025-04-05 05:34:53,068] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0005863729932744796, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6603,3.213845,0.1792,0.187625,0.1792,0.136015
2,3.0089,2.719592,0.2913,0.319535,0.2913,0.267636
3,2.5888,2.305617,0.3982,0.419387,0.3982,0.378242
4,2.2757,2.096533,0.4562,0.463678,0.4562,0.44034
5,2.0324,1.947041,0.4911,0.4918,0.4911,0.471542
6,1.8267,1.874552,0.5046,0.51898,0.5046,0.494907
7,1.6477,1.722335,0.5428,0.546706,0.5428,0.532804
8,1.486,1.672346,0.5593,0.566749,0.5593,0.554977
9,1.3425,1.627911,0.5698,0.57827,0.5698,0.566985
10,1.2301,1.612969,0.5726,0.5764,0.5726,0.56913


[I 2025-04-05 05:48:59,530] Trial 78 finished with value: 0.5691301587796576 and parameters: {'learning_rate': 0.0005863729932744796, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 79 with params: {'learning_rate': 0.0017779310761021746, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7998,3.609597,0.0931,0.050404,0.0931,0.050351
2,3.4075,3.208645,0.174,0.19338,0.174,0.135195


[I 2025-04-05 05:51:53,082] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0004945054565072854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.624,3.174613,0.1897,0.188144,0.1897,0.148081
2,2.9781,2.698382,0.2933,0.309274,0.2933,0.270795
3,2.5671,2.310858,0.3968,0.413916,0.3968,0.377417
4,2.2675,2.110642,0.4572,0.466982,0.4572,0.442659
5,2.0248,1.954149,0.4886,0.48783,0.4886,0.471446
6,1.816,1.890813,0.5013,0.511789,0.5013,0.491586
7,1.6402,1.763284,0.5294,0.535083,0.5294,0.519246
8,1.476,1.701393,0.552,0.55963,0.552,0.546813
9,1.333,1.67295,0.5588,0.5637,0.5588,0.553535
10,1.2226,1.658054,0.5605,0.567954,0.5605,0.557833


[I 2025-04-05 06:06:08,177] Trial 80 finished with value: 0.5578327803151613 and parameters: {'learning_rate': 0.0004945054565072854, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 81 with params: {'learning_rate': 0.000565216135480852, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6214,3.178075,0.189,0.190623,0.189,0.146076
2,3.0011,2.708364,0.299,0.315054,0.299,0.275765
3,2.5827,2.337276,0.3832,0.387137,0.3832,0.358914
4,2.2743,2.100739,0.455,0.464388,0.455,0.439516
5,2.034,1.944716,0.486,0.485024,0.486,0.46581
6,1.8296,1.882905,0.4968,0.508585,0.4968,0.485481
7,1.652,1.737749,0.5365,0.536577,0.5365,0.52648
8,1.4838,1.682914,0.5543,0.560316,0.5543,0.550024


[I 2025-04-05 06:17:31,092] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0003299634160130272, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6558,3.174155,0.188,0.187547,0.188,0.147181
2,2.9931,2.682298,0.3099,0.316605,0.3099,0.283914
3,2.5692,2.322576,0.3943,0.40317,0.3943,0.373144
4,2.2577,2.080959,0.4576,0.461079,0.4576,0.440743
5,1.9965,1.937903,0.4897,0.484306,0.4897,0.471189
6,1.7767,1.888071,0.5009,0.511242,0.5009,0.489703
7,1.5948,1.759074,0.5348,0.536019,0.5348,0.523527
8,1.4301,1.718736,0.5501,0.557745,0.5501,0.54563
9,1.2901,1.692441,0.5538,0.558394,0.5538,0.549389
10,1.1799,1.699232,0.5484,0.55504,0.5484,0.544906


[I 2025-04-05 06:31:32,424] Trial 82 finished with value: 0.5449056721656885 and parameters: {'learning_rate': 0.0003299634160130272, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 2.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 83 with params: {'learning_rate': 0.00018971937685038228, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7687,3.332516,0.1644,0.156012,0.1644,0.121118
2,3.1589,2.815697,0.285,0.290793,0.285,0.251831


[I 2025-04-05 06:34:26,445] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.000712331080109812, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6486,3.235178,0.1707,0.156518,0.1707,0.125418
2,3.0423,2.758111,0.2784,0.301932,0.2784,0.252835
3,2.6341,2.388092,0.3765,0.390789,0.3765,0.355443
4,2.338,2.18031,0.4398,0.445298,0.4398,0.422909
5,2.1136,2.012144,0.4738,0.470386,0.4738,0.453209
6,1.9145,1.933056,0.4958,0.502181,0.4958,0.482702
7,1.7436,1.790641,0.5261,0.532332,0.5261,0.515132
8,1.5839,1.734588,0.5421,0.552681,0.5421,0.538182


[I 2025-04-05 06:45:40,314] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0008700404806800741, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7078,3.348373,0.1416,0.115904,0.1416,0.100327
2,3.1486,2.864375,0.2591,0.264016,0.2591,0.23127


[I 2025-04-05 06:48:31,484] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0011397484222048736, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7331,3.459831,0.126,0.113112,0.126,0.081519
2,3.2382,3.016346,0.2253,0.254275,0.2253,0.196091
3,2.8759,2.635337,0.3145,0.321005,0.3145,0.28506
4,2.5867,2.401284,0.3846,0.389587,0.3846,0.359067


[I 2025-04-05 06:54:08,305] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00025335316923329827, 'weight_decay': 0.004, 'warmup_steps': 8, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6848,3.193136,0.1829,0.177138,0.1829,0.139192
2,3.0031,2.655273,0.3177,0.328018,0.3177,0.294458
3,2.5564,2.271438,0.4083,0.41547,0.4083,0.38879
4,2.2286,2.061868,0.4637,0.470453,0.4637,0.448515
5,1.9706,1.93773,0.4865,0.482445,0.4865,0.468527
6,1.7531,1.881591,0.5038,0.517401,0.5038,0.496066
7,1.5729,1.772978,0.5245,0.528309,0.5245,0.514068
8,1.4142,1.737117,0.5393,0.549633,0.5393,0.535513


[I 2025-04-05 07:05:35,267] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.000128882436656059, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8386,3.426931,0.1434,0.114046,0.1434,0.100613
2,3.2924,2.99656,0.2425,0.239965,0.2425,0.205035


[I 2025-04-05 07:08:26,706] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 8.413515361322119e-05, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9455,3.573494,0.1144,0.105125,0.1144,0.069444
2,3.472,3.212219,0.1976,0.182989,0.1976,0.158298
3,3.158,2.8996,0.2695,0.265758,0.2695,0.233322
4,2.916,2.728688,0.3064,0.306635,0.3064,0.277899


[I 2025-04-05 07:14:13,421] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0007401345347696197, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6625,3.227157,0.1806,0.173571,0.1806,0.137247
2,3.05,2.803809,0.2717,0.296873,0.2717,0.246557
3,2.6465,2.393142,0.3777,0.402001,0.3777,0.357377
4,2.3446,2.196383,0.4307,0.436057,0.4307,0.412959
5,2.1099,2.021351,0.4742,0.477344,0.4742,0.455847
6,1.9144,1.937793,0.4953,0.500393,0.4953,0.483154
7,1.7428,1.807988,0.5257,0.529734,0.5257,0.515248
8,1.5832,1.728761,0.5426,0.550659,0.5426,0.537731


[I 2025-04-05 07:25:31,236] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0006486955941208022, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6538,3.221544,0.1799,0.171296,0.1799,0.136291
2,3.0186,2.758052,0.2897,0.305239,0.2897,0.262835
3,2.6104,2.351811,0.3848,0.39251,0.3848,0.364584
4,2.316,2.160309,0.439,0.439536,0.439,0.420574


[I 2025-04-05 07:31:11,362] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0003556595637366758, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6701,3.199239,0.1768,0.182232,0.1768,0.134488
2,3.0038,2.677498,0.3062,0.317795,0.3062,0.282392
3,2.55,2.265688,0.4082,0.423104,0.4082,0.38921
4,2.2259,2.068903,0.4637,0.474732,0.4637,0.448716
5,1.9675,1.907996,0.5013,0.499785,0.5013,0.482591
6,1.7509,1.849187,0.5089,0.518077,0.5089,0.498699
7,1.5634,1.73537,0.5316,0.535265,0.5316,0.522331
8,1.397,1.687904,0.5523,0.560121,0.5523,0.548171
9,1.2536,1.647953,0.5595,0.56268,0.5595,0.554614
10,1.1467,1.657552,0.5581,0.564537,0.5581,0.555237


[I 2025-04-05 07:45:19,068] Trial 92 finished with value: 0.5552369598887656 and parameters: {'learning_rate': 0.0003556595637366758, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 93 with params: {'learning_rate': 0.003508500948205609, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8756,3.793691,0.0624,0.027245,0.0624,0.028638
2,3.6059,3.484527,0.1226,0.131442,0.1226,0.089412


[I 2025-04-05 07:48:09,993] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.00035134172789189197, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6708,3.212366,0.1755,0.169566,0.1755,0.133949
2,2.9833,2.684854,0.303,0.31467,0.303,0.279552
3,2.5458,2.296745,0.4001,0.418929,0.4001,0.381047
4,2.2254,2.068615,0.4607,0.469586,0.4607,0.446841
5,1.9759,1.921124,0.4922,0.489737,0.4922,0.474664
6,1.7637,1.874554,0.5028,0.51546,0.5028,0.491766
7,1.5792,1.744183,0.5345,0.540314,0.5345,0.526634
8,1.4146,1.683826,0.554,0.559782,0.554,0.54979
9,1.2773,1.665444,0.561,0.56464,0.561,0.556001
10,1.1704,1.659911,0.5601,0.565341,0.5601,0.556161


[I 2025-04-05 08:02:21,260] Trial 94 finished with value: 0.5561608086866875 and parameters: {'learning_rate': 0.00035134172789189197, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 95 with params: {'learning_rate': 0.0011111842605147518, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7575,3.514023,0.1174,0.073128,0.1174,0.073729
2,3.2866,3.044026,0.2226,0.234417,0.2226,0.191553


[I 2025-04-05 08:05:10,671] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0342,3.709032,0.0882,0.060389,0.0882,0.048956
2,3.629,3.410169,0.1531,0.149948,0.1531,0.112914
3,3.3843,3.162672,0.2117,0.182445,0.2117,0.169793
4,3.1822,3.004786,0.2504,0.257198,0.2504,0.21691


[I 2025-04-05 08:10:51,321] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.000404248145845079, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6795,3.233549,0.1667,0.169207,0.1667,0.125829
2,3.0231,2.710011,0.3028,0.32247,0.3028,0.279546
3,2.5797,2.281775,0.4064,0.416315,0.4064,0.385742
4,2.2547,2.083081,0.4524,0.454128,0.4524,0.434593


[I 2025-04-05 08:16:32,889] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0005908507246441839, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6764,3.250204,0.1675,0.148327,0.1675,0.122993
2,3.0648,2.731042,0.2905,0.294921,0.2905,0.264179
3,2.6556,2.394137,0.367,0.378549,0.367,0.345651
4,2.3598,2.163019,0.4424,0.447249,0.4424,0.425682


[I 2025-04-05 08:22:19,620] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00028250700841433215, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7068,3.251197,0.1672,0.156192,0.1672,0.124325
2,3.0351,2.723002,0.2963,0.311602,0.2963,0.274031
3,2.5908,2.304167,0.395,0.400672,0.395,0.373749
4,2.2724,2.108074,0.4475,0.458636,0.4475,0.432656
5,2.017,1.962236,0.4831,0.481476,0.4831,0.463043
6,1.801,1.902403,0.4989,0.508954,0.4989,0.488324
7,1.6171,1.800809,0.5204,0.526651,0.5204,0.510408
8,1.4541,1.734949,0.5441,0.549753,0.5441,0.539661


[I 2025-04-05 08:33:49,275] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.71,3.262883,0.1724,0.170126,0.1724,0.127905
2,3.0469,2.695268,0.3119,0.328871,0.3119,0.282991
3,2.588,2.304135,0.4014,0.413832,0.4014,0.38243
4,2.2576,2.079791,0.4591,0.46638,0.4591,0.442644
5,2.0036,1.947973,0.4816,0.477152,0.4816,0.462348
6,1.7862,1.90145,0.5022,0.510842,0.5022,0.491554
7,1.601,1.797505,0.523,0.523872,0.523,0.512441
8,1.4378,1.753849,0.542,0.547558,0.542,0.537297


[I 2025-04-05 08:45:09,862] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00012070756143461181, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8371,3.451149,0.1437,0.112161,0.1437,0.101217
2,3.2868,2.984345,0.2433,0.254863,0.2433,0.207629


[I 2025-04-05 08:48:04,038] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.00044781459280434385, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6854,3.237597,0.1691,0.160577,0.1691,0.120455
2,3.0459,2.734692,0.298,0.312331,0.298,0.26941
3,2.6173,2.349161,0.388,0.401354,0.388,0.366046
4,2.3032,2.139406,0.4471,0.460249,0.4471,0.431733
5,2.0573,1.982985,0.4762,0.474574,0.4762,0.45635
6,1.8499,1.899724,0.5011,0.511832,0.5011,0.491474
7,1.6695,1.76589,0.5314,0.533306,0.5314,0.521379
8,1.503,1.711421,0.551,0.558235,0.551,0.547476
9,1.3608,1.679115,0.5575,0.562404,0.5575,0.552827
10,1.2492,1.675336,0.5559,0.558042,0.5559,0.550275


[I 2025-04-05 09:02:54,251] Trial 102 finished with value: 0.5502748144224853 and parameters: {'learning_rate': 0.00044781459280434385, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 103 with params: {'learning_rate': 0.0005634909457149228, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6751,3.239096,0.1716,0.170578,0.1716,0.126001
2,3.0366,2.753963,0.286,0.302767,0.286,0.257854
3,2.6299,2.369879,0.3805,0.392331,0.3805,0.35831
4,2.3223,2.125535,0.4459,0.452996,0.4459,0.428649


[I 2025-04-05 09:08:39,308] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00030668984177812433, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6851,3.217108,0.1797,0.180142,0.1797,0.136874
2,2.989,2.639498,0.3189,0.34313,0.3189,0.293528
3,2.535,2.228486,0.4199,0.427082,0.4199,0.40177
4,2.2103,2.037891,0.4661,0.469815,0.4661,0.452003
5,1.9584,1.912652,0.4992,0.497238,0.4992,0.480862
6,1.746,1.837658,0.513,0.523532,0.513,0.504325
7,1.563,1.731442,0.535,0.53788,0.535,0.526368
8,1.3947,1.699709,0.5473,0.555202,0.5473,0.543022
9,1.2612,1.661296,0.5621,0.56746,0.5621,0.558011
10,1.156,1.665707,0.5562,0.56322,0.5562,0.553527


[I 2025-04-05 09:22:43,004] Trial 104 finished with value: 0.5535265263349695 and parameters: {'learning_rate': 0.00030668984177812433, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 78 with value: 0.5691301587796576.


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.779,3.518141,0.1066,0.08013,0.1066,0.060125
2,3.3253,3.093205,0.2091,0.232946,0.2091,0.177929


[I 2025-04-05 09:25:33,271] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0005711724405570842, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6757,3.237067,0.1699,0.157465,0.1699,0.125002
2,3.0361,2.724617,0.2926,0.308482,0.2926,0.269837
3,2.6134,2.344586,0.3865,0.394997,0.3865,0.365949
4,2.3096,2.116645,0.4512,0.458553,0.4512,0.433669
5,2.0636,1.970739,0.4849,0.482715,0.4849,0.465935
6,1.8621,1.908617,0.4994,0.509444,0.4994,0.487627
7,1.6833,1.769986,0.532,0.535405,0.532,0.520831
8,1.519,1.703856,0.5506,0.556542,0.5506,0.54543
9,1.3807,1.667934,0.5621,0.567647,0.5621,0.556734
10,1.2676,1.661789,0.5621,0.56806,0.5621,0.558016


[I 2025-04-05 09:39:46,827] Trial 106 finished with value: 0.5580156037049804 and parameters: {'learning_rate': 0.0005711724405570842, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 5.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 107 with params: {'learning_rate': 0.00037527758408920114, 'weight_decay': 0.01, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6791,3.252585,0.1716,0.153207,0.1716,0.128101
2,3.0434,2.720226,0.2995,0.310226,0.2995,0.274138
3,2.5926,2.326954,0.3987,0.408676,0.3987,0.37751
4,2.2686,2.095288,0.454,0.455974,0.454,0.438065
5,2.012,1.942566,0.488,0.481883,0.488,0.469726
6,1.7995,1.876489,0.5047,0.518051,0.5047,0.496651
7,1.616,1.747732,0.5318,0.534525,0.5318,0.522256
8,1.4463,1.709694,0.5432,0.549791,0.5432,0.539544


[I 2025-04-05 09:51:22,180] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0003560186795084032, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.663,3.175787,0.1902,0.189179,0.1902,0.149659
2,2.9741,2.642148,0.3218,0.323905,0.3218,0.295926
3,2.5341,2.275467,0.4076,0.422065,0.4076,0.389421
4,2.217,2.075386,0.4615,0.472318,0.4615,0.44688
5,1.9728,1.920276,0.4955,0.490029,0.4955,0.476872
6,1.76,1.874513,0.5045,0.517289,0.5045,0.494389
7,1.5797,1.741308,0.5355,0.536686,0.5355,0.52612
8,1.4108,1.705279,0.5466,0.551668,0.5466,0.541486


[I 2025-04-05 10:02:51,893] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.000774134206313517, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7573,3.440696,0.1285,0.112659,0.1285,0.083215
2,3.2324,2.963362,0.2396,0.256257,0.2396,0.211351
3,2.8389,2.617167,0.3221,0.334981,0.3221,0.294653
4,2.5351,2.371664,0.3946,0.398074,0.3946,0.372563


[I 2025-04-05 10:08:36,579] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0002167850769926573, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7399,3.307293,0.1596,0.139619,0.1596,0.114565
2,3.0894,2.756393,0.2925,0.303158,0.2925,0.264717
3,2.6386,2.341549,0.3963,0.400112,0.3963,0.373441
4,2.306,2.13712,0.4431,0.448639,0.4431,0.426522
5,2.05,2.005191,0.4757,0.47322,0.4757,0.456128
6,1.8307,1.940005,0.4932,0.499579,0.4932,0.481594
7,1.6521,1.824829,0.5201,0.520692,0.5201,0.509115
8,1.4934,1.793877,0.5275,0.534216,0.5275,0.522931


[I 2025-04-05 10:19:57,495] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0003592408033717029, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.658,3.23235,0.1668,0.170556,0.1668,0.124716
2,2.9832,2.668415,0.3072,0.317041,0.3072,0.284114
3,2.5493,2.272436,0.41,0.425709,0.41,0.3936
4,2.2404,2.076964,0.4563,0.467193,0.4563,0.442704
5,1.9906,1.947249,0.4875,0.489922,0.4875,0.471139
6,1.78,1.878783,0.4966,0.506137,0.4966,0.486561
7,1.594,1.769258,0.5229,0.529982,0.5229,0.513261
8,1.4287,1.719043,0.5429,0.553068,0.5429,0.53991


[I 2025-04-05 10:31:22,866] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0007697887596040175, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6796,3.310141,0.156,0.126824,0.156,0.109599
2,3.1088,2.859867,0.263,0.272442,0.263,0.236288
3,2.7418,2.49639,0.344,0.353886,0.344,0.317138
4,2.4569,2.267251,0.413,0.425089,0.413,0.395599


[I 2025-04-05 10:37:06,307] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0006277433305756026, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6832,3.238889,0.178,0.168761,0.178,0.136467
2,3.0556,2.767245,0.2901,0.323118,0.2901,0.268921
3,2.6411,2.362357,0.3866,0.390748,0.3866,0.364181
4,2.3314,2.156333,0.4403,0.443268,0.4403,0.422603


[I 2025-04-05 10:42:41,126] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00034583452895718984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6765,3.218594,0.1738,0.164474,0.1738,0.130019
2,3.0203,2.694993,0.3104,0.329291,0.3104,0.288342
3,2.5799,2.291929,0.4027,0.413687,0.4027,0.385044
4,2.2569,2.078748,0.465,0.467335,0.465,0.449584
5,2.0043,1.943309,0.4896,0.484948,0.4896,0.471401
6,1.7906,1.86717,0.5072,0.521697,0.5072,0.497585
7,1.6086,1.764696,0.5273,0.527177,0.5273,0.516046
8,1.4412,1.713606,0.549,0.553832,0.549,0.5436
9,1.2999,1.696502,0.5522,0.555712,0.5522,0.546782
10,1.1922,1.682941,0.5503,0.555965,0.5503,0.547206


[I 2025-04-05 10:56:52,183] Trial 114 finished with value: 0.5472056839594505 and parameters: {'learning_rate': 0.00034583452895718984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 3.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 115 with params: {'learning_rate': 0.00017020878833662635, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7753,3.342466,0.1605,0.142019,0.1605,0.121923
2,3.1642,2.825353,0.2876,0.302522,0.2876,0.257625


[I 2025-04-05 10:59:40,124] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0007852534354465918, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6493,3.23357,0.1707,0.159407,0.1707,0.122625
2,3.0568,2.763625,0.2814,0.295932,0.2814,0.256963
3,2.661,2.377003,0.3783,0.39674,0.3783,0.360084
4,2.3627,2.187693,0.4375,0.439125,0.4375,0.419466
5,2.1299,2.029287,0.4682,0.457637,0.4682,0.445894
6,1.9299,1.937549,0.4897,0.497086,0.4897,0.477052
7,1.757,1.803873,0.5226,0.523844,0.5226,0.511156
8,1.5979,1.750256,0.5391,0.544057,0.5391,0.532736


[I 2025-04-05 11:10:56,545] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00033867727017348695, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6605,3.223538,0.1812,0.161957,0.1812,0.136945
2,2.9993,2.695926,0.3053,0.3165,0.3053,0.280058
3,2.5625,2.281074,0.4023,0.413411,0.4023,0.383597
4,2.2373,2.057954,0.4635,0.465893,0.4635,0.446562
5,1.9852,1.926935,0.4951,0.494836,0.4951,0.476055
6,1.7718,1.866736,0.5097,0.52055,0.5097,0.499279
7,1.5873,1.747057,0.5331,0.536198,0.5331,0.522418
8,1.4187,1.716133,0.5478,0.559649,0.5478,0.544301


[I 2025-04-05 11:22:17,618] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.004921981015910125, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8889,3.793572,0.0637,0.036154,0.0637,0.027835
2,3.6081,3.447325,0.1281,0.10003,0.1281,0.089142


[I 2025-04-05 11:25:07,653] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0005500129640587929, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6279,3.166459,0.192,0.183237,0.192,0.148012
2,2.9777,2.65869,0.3154,0.333618,0.3154,0.295103
3,2.5595,2.284002,0.4063,0.41745,0.4063,0.387419
4,2.2549,2.081845,0.4557,0.471429,0.4557,0.439797
5,2.0198,1.959615,0.4826,0.48014,0.4826,0.463623
6,1.8163,1.848994,0.5024,0.513781,0.5024,0.491744
7,1.642,1.747081,0.5322,0.536154,0.5322,0.523284
8,1.4775,1.678772,0.5565,0.563054,0.5565,0.553283
9,1.3396,1.64149,0.5629,0.568627,0.5629,0.558885
10,1.2279,1.653573,0.5571,0.565354,0.5571,0.554452


[I 2025-04-05 11:39:09,641] Trial 119 finished with value: 0.5544518784693386 and parameters: {'learning_rate': 0.0005500129640587929, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 78 with value: 0.5691301587796576.


Trial 120 with params: {'learning_rate': 0.0002595446692896394, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7057,3.276747,0.1665,0.175463,0.1665,0.123114
2,3.0477,2.724168,0.3026,0.320414,0.3026,0.280297
3,2.59,2.294394,0.4045,0.411182,0.4045,0.385234
4,2.2572,2.087729,0.4627,0.469741,0.4627,0.447246
5,2.002,1.944079,0.4909,0.486475,0.4909,0.474403
6,1.7867,1.897591,0.4975,0.501842,0.4975,0.485441
7,1.6042,1.785316,0.5301,0.526794,0.5301,0.517581
8,1.4443,1.746014,0.5448,0.55105,0.5448,0.541183


[I 2025-04-05 11:50:21,910] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9257,3.540062,0.1217,0.088843,0.1217,0.08097
2,3.4497,3.179615,0.2027,0.200045,0.2027,0.16474
3,3.1382,2.883268,0.271,0.282688,0.271,0.235104
4,2.8899,2.699405,0.3161,0.314918,0.3161,0.287959


[I 2025-04-05 11:56:08,045] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00044925277881239584, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6417,3.168018,0.1856,0.18427,0.1856,0.144649
2,2.9766,2.664026,0.3115,0.336648,0.3115,0.287708
3,2.5577,2.298248,0.3977,0.408273,0.3977,0.379432
4,2.2569,2.078959,0.458,0.46236,0.458,0.440646
5,2.0129,1.950061,0.4892,0.486517,0.4892,0.472032
6,1.8083,1.874496,0.5072,0.520451,0.5072,0.497597
7,1.6265,1.747485,0.5314,0.528622,0.5314,0.519503
8,1.4644,1.700067,0.5475,0.560211,0.5475,0.544797


[I 2025-04-05 12:07:27,056] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0013029904370982425, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.732,3.413918,0.1331,0.099722,0.1331,0.085854
2,3.2271,3.001542,0.223,0.246951,0.223,0.193327
3,2.8961,2.672043,0.3067,0.306067,0.3067,0.278574
4,2.6469,2.492839,0.3602,0.355168,0.3602,0.336599


[I 2025-04-05 12:13:07,401] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.00026255889410269946, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6996,3.248703,0.1716,0.15767,0.1716,0.129723
2,3.05,2.740788,0.2962,0.324245,0.2962,0.275669
3,2.6104,2.311274,0.3941,0.406467,0.3941,0.375473
4,2.2814,2.112242,0.449,0.461285,0.449,0.434468


[I 2025-04-05 12:18:51,152] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0006026288635087375, 'weight_decay': 0.006, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6771,3.244493,0.1811,0.166124,0.1811,0.141223
2,3.0582,2.756812,0.2889,0.313595,0.2889,0.265054
3,2.6326,2.351043,0.3784,0.388395,0.3784,0.355726
4,2.3314,2.160704,0.4306,0.434394,0.4306,0.413401


[I 2025-04-05 12:24:25,695] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0002874010845764581, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6648,3.197893,0.1811,0.178135,0.1811,0.137462
2,3.0012,2.674339,0.3177,0.328446,0.3177,0.293337
3,2.57,2.293006,0.4052,0.417722,0.4052,0.387286
4,2.2437,2.073905,0.4598,0.467603,0.4598,0.445211
5,1.996,1.934554,0.4927,0.491015,0.4927,0.475214
6,1.7787,1.883119,0.505,0.516256,0.505,0.495196
7,1.5977,1.777279,0.5217,0.522953,0.5217,0.51226
8,1.4352,1.733425,0.5405,0.547436,0.5405,0.537072


[I 2025-04-05 12:35:41,645] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00015544263879753605, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8096,3.388329,0.1518,0.134504,0.1518,0.11255
2,3.225,2.905725,0.2619,0.261176,0.2619,0.228716
3,2.8089,2.530401,0.3577,0.360584,0.3577,0.333432
4,2.4838,2.294261,0.4117,0.40826,0.4117,0.391269


[I 2025-04-05 12:41:18,795] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0006658926980267772, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6969,3.261634,0.165,0.139304,0.165,0.11864
2,3.0921,2.798867,0.2719,0.300905,0.2719,0.249553
3,2.692,2.438564,0.3654,0.382928,0.3654,0.339384
4,2.3962,2.221449,0.4295,0.437127,0.4295,0.410714


[I 2025-04-05 12:46:55,658] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0004891078547776553, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6806,3.254807,0.1684,0.164411,0.1684,0.122584
2,3.0225,2.695945,0.3061,0.322231,0.3061,0.279792
3,2.5857,2.30869,0.3946,0.407632,0.3946,0.375963
4,2.2774,2.097055,0.4524,0.461594,0.4524,0.43399


[I 2025-04-05 12:52:38,333] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.00027730875432680207, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6945,3.21475,0.1859,0.191494,0.1859,0.14398
2,3.0199,2.677092,0.3101,0.318782,0.3101,0.286002
3,2.5722,2.312561,0.3972,0.410742,0.3972,0.378855
4,2.2427,2.083873,0.4593,0.471953,0.4593,0.444451
5,1.9888,1.936216,0.4946,0.494325,0.4946,0.47666
6,1.7704,1.879974,0.5074,0.519574,0.5074,0.498811
7,1.5886,1.758499,0.5309,0.532242,0.5309,0.520745
8,1.4245,1.734364,0.5462,0.553364,0.5462,0.542414


[I 2025-04-05 13:03:53,394] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0005612567161548509, 'weight_decay': 0.01, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6978,3.253869,0.1656,0.168225,0.1656,0.124038
2,3.051,2.760561,0.2897,0.307516,0.2897,0.26641
3,2.6379,2.385658,0.3717,0.378789,0.3717,0.349661
4,2.3262,2.152612,0.4401,0.448084,0.4401,0.42373
5,2.0864,1.995708,0.4747,0.472584,0.4747,0.457278
6,1.8815,1.922899,0.4954,0.507683,0.4954,0.483793
7,1.7054,1.78841,0.5271,0.533488,0.5271,0.517802
8,1.5412,1.71994,0.5457,0.555887,0.5457,0.542297


[I 2025-04-05 13:15:12,645] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0003120290512861416, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6819,3.220646,0.1801,0.166658,0.1801,0.136751
2,2.9961,2.655161,0.319,0.333751,0.319,0.294513
3,2.5491,2.261607,0.4088,0.416547,0.4088,0.390501
4,2.2254,2.063121,0.4601,0.47543,0.4601,0.44806
5,1.9743,1.910966,0.4941,0.487721,0.4941,0.476664
6,1.7604,1.851157,0.5104,0.522314,0.5104,0.501564
7,1.5785,1.744563,0.5302,0.530746,0.5302,0.520154
8,1.4137,1.718656,0.5421,0.549793,0.5421,0.537732


[I 2025-04-05 13:26:39,784] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0003086373152677779, 'weight_decay': 0.006, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6762,3.220742,0.1812,0.161395,0.1812,0.137391
2,3.0113,2.669957,0.3094,0.311736,0.3094,0.281226
3,2.5686,2.298188,0.4007,0.410771,0.4007,0.381985
4,2.2413,2.084323,0.455,0.465637,0.455,0.441678
5,1.9866,1.933264,0.4914,0.485078,0.4914,0.472906
6,1.7718,1.89055,0.5047,0.516733,0.5047,0.495893
7,1.5887,1.754217,0.5316,0.532798,0.5316,0.519841
8,1.4205,1.710516,0.5458,0.551607,0.5458,0.540961


[I 2025-04-05 13:37:52,231] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.00043136300932718576, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.675,3.208631,0.1776,0.163286,0.1776,0.135525
2,3.0163,2.709228,0.3034,0.318899,0.3034,0.279917
3,2.5755,2.300497,0.4041,0.41259,0.4041,0.386332
4,2.2517,2.07841,0.4611,0.47063,0.4611,0.445603
5,2.0009,1.919753,0.493,0.49588,0.493,0.476794
6,1.7874,1.855943,0.5089,0.520051,0.5089,0.497619
7,1.6055,1.724042,0.5421,0.546773,0.5421,0.533526
8,1.4387,1.675099,0.5547,0.564127,0.5547,0.551538
9,1.2954,1.641937,0.5656,0.567955,0.5656,0.559671
10,1.1832,1.633918,0.5639,0.569174,0.5639,0.56057


[I 2025-04-05 13:52:07,661] Trial 134 finished with value: 0.5605699641240648 and parameters: {'learning_rate': 0.00043136300932718576, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 78 with value: 0.5691301587796576.


Trial 135 with params: {'learning_rate': 0.0003883470361447812, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6571,3.22132,0.1733,0.176005,0.1733,0.132623
2,3.001,2.672841,0.3129,0.32204,0.3129,0.286276
3,2.5612,2.297184,0.4026,0.416938,0.4026,0.384844
4,2.2466,2.103604,0.4553,0.468127,0.4553,0.440308
5,1.9992,1.958197,0.4811,0.48168,0.4811,0.462961
6,1.7896,1.890522,0.5025,0.513701,0.5025,0.493263
7,1.6057,1.758302,0.5299,0.536413,0.5299,0.522517
8,1.4374,1.720223,0.5457,0.553076,0.5457,0.541673


[I 2025-04-05 14:03:39,091] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.000611408701377147, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6243,3.184528,0.1808,0.202273,0.1808,0.13795
2,2.9935,2.714798,0.2898,0.313904,0.2898,0.266565
3,2.5839,2.338545,0.383,0.398959,0.383,0.361566
4,2.2813,2.09563,0.4581,0.474898,0.4581,0.444316
5,2.0333,1.941517,0.4887,0.487865,0.4887,0.469794
6,1.8271,1.86761,0.5006,0.516293,0.5006,0.489999
7,1.6502,1.731731,0.538,0.539958,0.538,0.52823
8,1.4812,1.674872,0.5559,0.567914,0.5559,0.55198
9,1.3421,1.641537,0.5647,0.568748,0.5647,0.559638
10,1.2282,1.626812,0.5716,0.575553,0.5716,0.567913


[I 2025-04-05 14:17:57,123] Trial 136 finished with value: 0.5679126341011765 and parameters: {'learning_rate': 0.000611408701377147, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 137 with params: {'learning_rate': 0.0007149221056101734, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6866,3.331955,0.1574,0.131124,0.1574,0.111146
2,3.1086,2.852677,0.2651,0.283895,0.2651,0.242276
3,2.6925,2.429412,0.3673,0.38626,0.3673,0.345404
4,2.3888,2.201743,0.426,0.432222,0.426,0.407659


[I 2025-04-05 14:23:41,987] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.00038096581751477634, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6875,3.212531,0.1801,0.189357,0.1801,0.135407
2,3.0022,2.65572,0.3194,0.32287,0.3194,0.294029
3,2.5665,2.291893,0.4032,0.41197,0.4032,0.385268
4,2.244,2.085232,0.4546,0.462869,0.4546,0.442284
5,1.9967,1.939795,0.4861,0.481865,0.4861,0.468638
6,1.7947,1.879138,0.5052,0.517677,0.5052,0.49506
7,1.61,1.758257,0.5292,0.532171,0.5292,0.518988
8,1.4444,1.711086,0.5455,0.556205,0.5455,0.543036
9,1.3016,1.684496,0.5502,0.557564,0.5502,0.546626
10,1.1937,1.681979,0.5495,0.556705,0.5495,0.546944


[I 2025-04-05 14:37:50,847] Trial 138 finished with value: 0.546943907886199 and parameters: {'learning_rate': 0.00038096581751477634, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 78 with value: 0.5691301587796576.


Trial 139 with params: {'learning_rate': 0.0011842764120852554, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7652,3.489461,0.1151,0.097646,0.1151,0.071511
2,3.2831,3.028569,0.2219,0.248568,0.2219,0.194672


[I 2025-04-05 14:40:42,772] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0007409424459998287, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6792,3.287168,0.1572,0.126957,0.1572,0.114963
2,3.0943,2.796626,0.2752,0.288688,0.2752,0.247105
3,2.7053,2.43638,0.3646,0.383794,0.3646,0.342736
4,2.4185,2.213806,0.4215,0.427082,0.4215,0.402874


[I 2025-04-05 14:46:23,332] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0011412168862343134, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7412,3.485937,0.1169,0.099715,0.1169,0.073841
2,3.2676,3.02734,0.2269,0.236857,0.2269,0.195575


[I 2025-04-05 14:49:13,992] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0003282999939011858, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6592,3.189828,0.1881,0.200466,0.1881,0.145834
2,2.9855,2.634585,0.3206,0.332922,0.3206,0.29759
3,2.5499,2.271428,0.41,0.418237,0.41,0.392136
4,2.2287,2.064287,0.4628,0.46917,0.4628,0.446108
5,1.98,1.921538,0.4955,0.491678,0.4955,0.479206
6,1.7674,1.829568,0.5192,0.526644,0.5192,0.508229
7,1.5842,1.742289,0.5366,0.541497,0.5366,0.526662
8,1.4174,1.69292,0.5548,0.563798,0.5548,0.55083
9,1.2811,1.663614,0.5584,0.56214,0.5584,0.554831
10,1.1745,1.667028,0.5605,0.566925,0.5605,0.557476


[I 2025-04-05 15:03:25,902] Trial 142 finished with value: 0.5574756552063597 and parameters: {'learning_rate': 0.0003282999939011858, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 143 with params: {'learning_rate': 0.0025305202817701693, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8204,3.685723,0.0792,0.037193,0.0792,0.03964
2,3.4801,3.28748,0.1607,0.15809,0.1607,0.124622


[I 2025-04-05 15:06:17,950] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.00017555339706899648, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7654,3.340562,0.1623,0.141167,0.1623,0.117915
2,3.1565,2.828258,0.2771,0.287313,0.2771,0.247326
3,2.7342,2.463228,0.3661,0.369493,0.3661,0.34481
4,2.4156,2.250537,0.4161,0.419116,0.4161,0.397414


[I 2025-04-05 15:11:58,949] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.00045152189960677647, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6369,3.18911,0.1825,0.177353,0.1825,0.138514
2,2.9937,2.668764,0.3102,0.321842,0.3102,0.287693
3,2.5664,2.297986,0.3978,0.415572,0.3978,0.378801
4,2.2564,2.067713,0.4651,0.472836,0.4651,0.450146
5,2.0113,1.943226,0.4884,0.485626,0.4884,0.4698
6,1.8046,1.868596,0.5053,0.520124,0.5053,0.495626
7,1.6239,1.749652,0.5323,0.536672,0.5323,0.522807
8,1.4587,1.698021,0.555,0.563405,0.555,0.551469
9,1.3163,1.661362,0.5644,0.5697,0.5644,0.559917
10,1.2069,1.656259,0.5613,0.568202,0.5613,0.558833


[I 2025-04-05 15:26:01,385] Trial 145 finished with value: 0.5588334293398208 and parameters: {'learning_rate': 0.00045152189960677647, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 5.5}. Best is trial 78 with value: 0.5691301587796576.


Trial 146 with params: {'learning_rate': 0.000519144623615968, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.645,3.206847,0.1775,0.19299,0.1775,0.137472
2,3.0149,2.697941,0.2991,0.314081,0.2991,0.272455
3,2.5984,2.332316,0.3869,0.40568,0.3869,0.366855
4,2.2953,2.141984,0.4505,0.458343,0.4505,0.43172
5,2.0505,1.964321,0.4872,0.482855,0.4872,0.469811
6,1.8435,1.872789,0.5105,0.518829,0.5105,0.500394
7,1.663,1.758709,0.5311,0.530902,0.5311,0.518947
8,1.4943,1.70032,0.5472,0.55452,0.5472,0.542385
9,1.3528,1.674165,0.5567,0.562255,0.5567,0.551854
10,1.2378,1.666303,0.5644,0.569206,0.5644,0.560693


[I 2025-04-05 15:40:27,308] Trial 146 finished with value: 0.5606933405620392 and parameters: {'learning_rate': 0.000519144623615968, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 78 with value: 0.5691301587796576.


Trial 147 with params: {'learning_rate': 0.0006243754388783268, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6584,3.223586,0.1748,0.158236,0.1748,0.127362
2,3.0354,2.749388,0.2898,0.295109,0.2898,0.264284
3,2.6307,2.373016,0.3785,0.3887,0.3785,0.358281
4,2.3221,2.180865,0.428,0.437586,0.428,0.412107


[I 2025-04-05 15:46:12,113] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.000401766081292308, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6735,3.23422,0.1732,0.161848,0.1732,0.131267
2,3.0288,2.706821,0.3074,0.319275,0.3074,0.281637
3,2.5931,2.318923,0.3924,0.40954,0.3924,0.374209
4,2.2755,2.098227,0.4548,0.45956,0.4548,0.43794


[I 2025-04-05 15:51:56,180] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.000397489159098293, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6624,3.208494,0.1764,0.174639,0.1764,0.134337
2,3.0019,2.666571,0.3141,0.314493,0.3141,0.288047
3,2.5595,2.285873,0.4094,0.425981,0.4094,0.393328
4,2.2428,2.069414,0.4605,0.467822,0.4605,0.445006
5,1.9858,1.933515,0.4874,0.482714,0.4874,0.470192
6,1.775,1.857854,0.5126,0.526089,0.5126,0.505382
7,1.5934,1.745511,0.5334,0.536616,0.5334,0.524154
8,1.426,1.698594,0.5524,0.560179,0.5524,0.548677
9,1.286,1.656166,0.5594,0.564904,0.5594,0.554819
10,1.174,1.66719,0.5563,0.562955,0.5563,0.553087


[I 2025-04-05 16:06:23,363] Trial 149 finished with value: 0.5530867062854452 and parameters: {'learning_rate': 0.000397489159098293, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 78 with value: 0.5691301587796576.


In [20]:
print(best_distill_random)

BestRun(run_id='78', objective=0.5691301587796576, hyperparameters={'learning_rate': 0.0005863729932744796, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}, run_summary=None)
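The printed `BestRun` carries the winning trial's hyperparameters as a plain dictionary. A minimal sketch of how they might be copied onto a training configuration follows. The `BestRun` namedtuple below is a local stand-in that only mirrors the printed fields, and `apply_best` and the dictionary-based `config` are hypothetical helpers for illustration; neither is part of `base` or `transformers`.

```python
from collections import namedtuple

# Local stand-in mirroring the fields printed above (not the transformers class itself).
BestRun = namedtuple("BestRun", ["run_id", "objective", "hyperparameters", "run_summary"])

best = BestRun(
    run_id="78",
    objective=0.5691301587796576,
    hyperparameters={
        "learning_rate": 0.0005863729932744796,
        "weight_decay": 0.004,
        "warmup_steps": 7,
        "lambda_param": 0.6000000000000001,
        "temperature": 5.5,
    },
    run_summary=None,
)


def apply_best(config, best_run):
    """Return a copy of `config` with the best trial's hyperparameters applied (hypothetical helper)."""
    updated = dict(config)
    updated.update(best_run.hyperparameters)
    return updated


# The starting values here are illustrative assumptions, not taken from the notebook.
final_config = apply_best({"num_train_epochs": 10, "learning_rate": 5e-5}, best)
```

Returning a copy leaves the default configuration untouched, so it can be reused for the other model variants.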


In [21]:
base.reset_seed()

## Search with standard training: fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [22]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [23]:
def hp_space(trial):
    # Search space: log-uniform learning rate, weight decay on a 1e-3 grid, integer warmup steps.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
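To make the three ranges concrete, here is a minimal standard-library sketch of what one draw from such a space looks like. It uses plain random sampling, not Optuna's TPE sampler, and the helper names `sample_log_uniform` and `sample_stepped` are hypothetical. The upper bound of 32 for `warmup_steps` is an assumption (the largest value visible in the log), since `warm_up` is defined elsewhere.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw a value from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + step * rng.randint(0, n_steps)


rng = random.Random(42)
sample = {
    # Equal probability mass per decade between 5e-5 and 5e-3.
    "learning_rate": sample_log_uniform(rng, 5e-5, 5e-3),
    # One of 0.000, 0.001, ..., 0.010.
    "weight_decay": sample_stepped(rng, 0.0, 1e-2, 1e-3),
    # Assumed upper bound; the notebook uses the variable warm_up here.
    "warmup_steps": rng.randint(0, 32),
}
```

Values such as `0.009000000000000001` in the trial log are the usual floating-point representation of a multiple of the step, not a different setting.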

Optuna configuration.

In [24]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
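The pruning pattern in the logs (trials stopped after epoch 2, 4 or 8, full runs lasting 10 epochs) is consistent with Hyperband rungs that double from a minimum resource. The sketch below computes such a rung schedule in plain Python. It is a simplified illustration, not Optuna's implementation, and the values `min_resource=2` and `max_resource=10` are assumptions inferred from the log, since `min_r` and `max_r` are defined elsewhere. The function name `hyperband_rungs` is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the epochs at which a trial may be pruned (simplified sketch)."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        # Later brackets start checking later, so they prune less aggressively.
        step = min_resource * reduction_factor ** bracket_id
        while step < max_resource:
            rungs.append(step)
            step *= reduction_factor
        brackets.append(rungs)
    return brackets


# Assumed values: min_resource=2, max_resource=10, reduction_factor=2.
schedule = hyperband_rungs(2, 10, 2)
```

Under these assumptions a trial can only be stopped at epochs 2, 4 or 8, which matches the lengths of the pruned runs in the output.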



Configuration of the trainer for the individual training runs.

In [25]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # A fresh MobileNetV2 (100 classes) passed through base.freeze_model for every trial
    model_init=lambda: base.freeze_model(base.get_mobilenet(100)),
)
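In every logged row the Recall column equals Accuracy, which is what a support-weighted average over classes produces. The sketch below shows that relationship in plain Python. It assumes `base.compute_metrics` uses weighted averaging, which the notebook does not confirm, and the function `weighted_metrics` is a hypothetical stand-in rather than the helper from `base`.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (illustrative stand-in)."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

Because each class recall is weighted by its share of the samples, the weighted recall collapses to the fraction of correct predictions, so F1 is the more informative of the logged columns for model selection.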
  

Search setup.

In [26]:
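best_base_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    # Objective: F1 on the evaluation set
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    # The study name is only a label; this search uses no distillation parameters
    study_name="Distill",
    n_trials=150,
)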

[I 2025-04-05 18:38:48,172] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3703,2.430507,0.4699,0.482332,0.4699,0.455055
2,2.3828,2.01231,0.5094,0.519077,0.5094,0.504219


[I 2025-04-05 18:41:25,909] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7215,1.930484,0.5126,0.535793,0.5126,0.505543
2,2.0043,1.758795,0.5393,0.558448,0.5393,0.538133
3,1.8524,1.716527,0.5393,0.552293,0.5393,0.535442
4,1.7748,1.681192,0.5478,0.560663,0.5478,0.543834


[I 2025-04-05 18:46:47,900] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2013,3.719972,0.3141,0.335759,0.3141,0.292585
2,3.5122,3.126707,0.4129,0.429416,0.4129,0.393761


[I 2025-04-05 18:49:26,368] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5934,1.861229,0.5182,0.553158,0.5182,0.511283
2,1.9438,1.745147,0.5345,0.564436,0.5345,0.532711
3,1.8061,1.711797,0.5379,0.557519,0.5379,0.535001
4,1.7348,1.685461,0.5473,0.565682,0.5473,0.54317


[I 2025-04-05 18:54:41,006] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4535,1.864609,0.5124,0.561249,0.5124,0.507492
2,1.9608,1.810378,0.5241,0.57363,0.5241,0.522543
3,1.8334,1.779548,0.528,0.561123,0.528,0.526038
4,1.7649,1.746295,0.5389,0.568263,0.5389,0.536951


[I 2025-04-05 18:59:58,739] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9242,3.219649,0.3899,0.41388,0.3899,0.368289
2,3.0193,2.579408,0.4636,0.475495,0.4636,0.450436
3,2.6234,2.304892,0.4813,0.480859,0.4813,0.466359
4,2.4228,2.16099,0.496,0.496919,0.496,0.485744


[I 2025-04-05 19:05:14,626] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1864,2.249094,0.4862,0.500047,0.4862,0.474747
2,2.2494,1.914652,0.5197,0.531623,0.5197,0.516116
3,2.033,1.819614,0.5252,0.531959,0.5252,0.519311
4,1.9309,1.758658,0.5376,0.544305,0.5376,0.533655


[I 2025-04-05 19:10:33,697] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0264,3.402926,0.3678,0.3955,0.3678,0.34566
2,3.1914,2.761351,0.4476,0.461833,0.4476,0.431885
3,2.7798,2.454459,0.468,0.469653,0.468,0.451168
4,2.5585,2.289977,0.4839,0.484059,0.4839,0.470926


[I 2025-04-05 19:15:52,809] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0866,2.178272,0.4907,0.504499,0.4907,0.48043
2,2.1982,1.879929,0.5239,0.536696,0.5239,0.521059
3,1.9973,1.796228,0.5276,0.535611,0.5276,0.522281
4,1.9009,1.740094,0.5403,0.54816,0.5403,0.53665


[I 2025-04-05 19:21:05,995] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9145,2.051825,0.501,0.518593,0.501,0.492923
2,2.1021,1.816164,0.5328,0.548514,0.5328,0.531174
3,1.9266,1.75319,0.5331,0.543686,0.5331,0.528495
4,1.84,1.706513,0.5442,0.553233,0.5442,0.540418
5,1.7792,1.705431,0.5435,0.544576,0.5435,0.535779
6,1.7383,1.683016,0.5525,0.55678,0.5525,0.547767
7,1.708,1.666734,0.5556,0.554669,0.5556,0.548964
8,1.6829,1.661314,0.5562,0.560497,0.5562,0.553258


[I 2025-04-05 19:31:32,972] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0026025741521183794, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4674,1.881006,0.509,0.560352,0.509,0.504502
2,1.9824,1.84018,0.5198,0.574775,0.5198,0.518945
3,1.8554,1.806351,0.5264,0.564267,0.5264,0.525006
4,1.7858,1.769498,0.5346,0.566427,0.5346,0.533348


[I 2025-04-05 19:36:59,758] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0003262588029927626, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2223,2.308174,0.4804,0.4923,0.4804,0.467913
2,2.2961,1.949712,0.5159,0.526945,0.5159,0.511722


[I 2025-04-05 19:39:43,346] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0009531187414107555, 'weight_decay': 0.005, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6389,1.891609,0.5144,0.543574,0.5144,0.50774
2,1.9714,1.745371,0.537,0.559467,0.537,0.536042
3,1.8273,1.708543,0.5394,0.553804,0.5394,0.535901
4,1.753,1.677835,0.548,0.56218,0.548,0.543802
5,1.6961,1.686555,0.5459,0.553651,0.5459,0.538219
6,1.6556,1.670564,0.5518,0.562406,0.5518,0.548199
7,1.6231,1.652366,0.5551,0.556731,0.5551,0.54867
8,1.5951,1.644734,0.5585,0.565139,0.5585,0.556383


[I 2025-04-05 19:50:25,315] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0009263363105887989, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.645,1.896021,0.5139,0.542484,0.5139,0.507347
2,1.9753,1.746643,0.5372,0.558917,0.5372,0.536196
3,1.8304,1.709244,0.5395,0.553268,0.5395,0.535863
4,1.7558,1.677954,0.5478,0.561619,0.5478,0.54353
5,1.6988,1.686253,0.5465,0.553512,0.5465,0.53871
6,1.6584,1.670197,0.5522,0.562348,0.5522,0.548548
7,1.6261,1.652167,0.5553,0.556651,0.5553,0.548833
8,1.5984,1.644755,0.5583,0.564912,0.5583,0.556201


[I 2025-04-05 20:00:49,569] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0017763026521482, 'weight_decay': 0.005, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4763,1.849999,0.5157,0.558096,0.5157,0.509388
2,1.9365,1.767085,0.5275,0.568302,0.5275,0.52547
3,1.8064,1.736886,0.5329,0.558394,0.5329,0.5303
4,1.7381,1.709143,0.5438,0.566755,0.5438,0.54033
5,1.6794,1.724633,0.5368,0.552694,0.5368,0.529212
6,1.6333,1.701643,0.5448,0.561841,0.5448,0.542083
7,1.5922,1.674762,0.5512,0.554029,0.5512,0.545021
8,1.5561,1.663939,0.5549,0.564983,0.5549,0.553421


[I 2025-04-05 20:11:36,155] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.002125688919623599, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5012,1.859596,0.513,0.558933,0.513,0.507056
2,1.9533,1.794556,0.5253,0.573089,0.5253,0.523567
3,1.8236,1.764305,0.5296,0.560135,0.5296,0.52751
4,1.7548,1.733225,0.5402,0.567694,0.5402,0.537738


[I 2025-04-05 20:16:48,989] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0001293425222493065, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8429,3.104883,0.4036,0.423321,0.4036,0.381787
2,2.9219,2.48461,0.4726,0.485346,0.4726,0.461485
3,2.5438,2.232519,0.4869,0.486114,0.4869,0.472922
4,2.3568,2.10056,0.503,0.50457,0.503,0.493897


[I 2025-04-05 20:22:08,280] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0017352383115840264, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.475,1.849325,0.5164,0.558121,0.5164,0.509956
2,1.9354,1.764285,0.5279,0.567566,0.5279,0.526003
3,1.8051,1.734002,0.5333,0.558628,0.5333,0.530786
4,1.7368,1.706554,0.5452,0.56795,0.5452,0.541623
5,1.6783,1.721966,0.5373,0.552385,0.5373,0.529738
6,1.6325,1.699615,0.5447,0.561549,0.5447,0.541966
7,1.5919,1.673239,0.5513,0.554381,0.5513,0.545134
8,1.5563,1.662472,0.5548,0.564693,0.5548,0.553257


[I 2025-04-05 20:32:46,311] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0001044907148504563, 'weight_decay': 0.006, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9979,3.331565,0.3773,0.404756,0.3773,0.35486
2,3.1187,2.680047,0.4555,0.468901,0.4555,0.441015


[I 2025-04-05 20:35:29,876] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00046730377985285565, 'weight_decay': 0.004, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0172,2.114443,0.4953,0.511049,0.4953,0.486329
2,2.1491,1.846298,0.5286,0.542747,0.5286,0.5264
3,1.9606,1.773043,0.5313,0.541177,0.5313,0.526357
4,1.869,1.721641,0.5427,0.551284,0.5427,0.539003
5,1.8062,1.717662,0.5427,0.542964,0.5427,0.534865
6,1.7645,1.692477,0.5499,0.553021,0.5499,0.544818
7,1.734,1.675916,0.5536,0.552207,0.5536,0.546823
8,1.7091,1.67047,0.5552,0.559216,0.5552,0.552264


[I 2025-04-05 20:46:09,501] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0036707750721263967, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5207,1.968837,0.4937,0.562302,0.4937,0.492102
2,2.0863,1.983174,0.5006,0.573479,0.5006,0.501894


[I 2025-04-05 20:48:54,409] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0024963497434331502, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4497,1.873663,0.511,0.561243,0.511,0.506721
2,1.9735,1.828506,0.5219,0.574853,0.5219,0.520659
3,1.8465,1.796191,0.5265,0.562379,0.5265,0.524976
4,1.7775,1.760594,0.5354,0.566135,0.5354,0.533971


[I 2025-04-05 20:54:11,490] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0026080652276327457, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4521,1.880337,0.5102,0.561568,0.5102,0.505809
2,1.9822,1.84035,0.5206,0.575044,0.5206,0.519585
3,1.8554,1.806467,0.5262,0.563622,0.5262,0.524848
4,1.7859,1.769491,0.5343,0.566705,0.5343,0.533356


[I 2025-04-05 20:59:30,181] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0010450619114319906, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6144,1.878795,0.5159,0.547201,0.5159,0.509258
2,1.9598,1.742736,0.5371,0.56131,0.5371,0.536098
3,1.8184,1.707263,0.5375,0.552764,0.5375,0.534
4,1.7452,1.678405,0.549,0.564243,0.549,0.544751
5,1.6884,1.688359,0.544,0.552818,0.544,0.536497
6,1.6475,1.672415,0.55,0.561919,0.55,0.546697
7,1.6141,1.653458,0.5553,0.557006,0.5553,0.548833
8,1.5854,1.645225,0.5572,0.563807,0.5572,0.554962
9,1.5645,1.627497,0.5573,0.555344,0.5573,0.552231
10,1.5403,1.636196,0.5596,0.559915,0.5596,0.556304


[I 2025-04-05 21:12:56,137] Trial 23 finished with value: 0.5563036082256244 and parameters: {'learning_rate': 0.0010450619114319906, 'weight_decay': 0.005, 'warmup_steps': 9}. Best is trial 23 with value: 0.5563036082256244.


Trial 24 with params: {'learning_rate': 0.0006099322054343215, 'weight_decay': 0.006, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8628,2.006729,0.5045,0.523113,0.5045,0.496639
2,2.0656,1.792938,0.5343,0.550965,0.5343,0.532853
3,1.8983,1.737767,0.5362,0.547845,0.5362,0.531908
4,1.8149,1.695081,0.5453,0.55517,0.5453,0.541244
5,1.7554,1.696444,0.5436,0.546611,0.5436,0.536225
6,1.7149,1.676329,0.5525,0.557967,0.5525,0.547888
7,1.6844,1.660035,0.5567,0.556015,0.5567,0.5501
8,1.6589,1.654343,0.5573,0.561443,0.5573,0.554412
9,1.6419,1.637935,0.5584,0.554903,0.5584,0.552448
10,1.6248,1.647134,0.5574,0.556949,0.5574,0.553469


[I 2025-04-05 21:26:21,620] Trial 24 finished with value: 0.553468690842466 and parameters: {'learning_rate': 0.0006099322054343215, 'weight_decay': 0.006, 'warmup_steps': 12}. Best is trial 23 with value: 0.5563036082256244.


Trial 25 with params: {'learning_rate': 0.0006626635968074869, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8173,1.979412,0.5074,0.526737,0.5074,0.499737
2,2.0438,1.780032,0.5356,0.55247,0.5356,0.534256
3,1.882,1.729594,0.5373,0.549183,0.5373,0.533089
4,1.8007,1.689398,0.5478,0.559572,0.5478,0.544177
5,1.7419,1.692258,0.5446,0.548288,0.5446,0.53735
6,1.7016,1.673349,0.5529,0.559359,0.5529,0.548514
7,1.6709,1.656929,0.5566,0.555957,0.5566,0.549982
8,1.6451,1.650994,0.558,0.562871,0.558,0.555362
9,1.6277,1.634408,0.5579,0.554513,0.5579,0.552034
10,1.6097,1.643636,0.5568,0.556236,0.5568,0.552863


[I 2025-04-05 21:39:39,088] Trial 25 finished with value: 0.552863055985839 and parameters: {'learning_rate': 0.0006626635968074869, 'weight_decay': 0.006, 'warmup_steps': 11}. Best is trial 23 with value: 0.5563036082256244.


Trial 26 with params: {'learning_rate': 0.000272913738561902, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3685,2.443574,0.4682,0.479305,0.4682,0.453033
2,2.3942,2.022187,0.5088,0.518442,0.5088,0.503449


[I 2025-04-05 21:42:22,114] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0010280960603638197, 'weight_decay': 0.006, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6371,1.882747,0.5152,0.545183,0.5152,0.508455
2,1.9631,1.743527,0.5364,0.560492,0.5364,0.53548
3,1.8205,1.707559,0.5384,0.553842,0.5384,0.535002
4,1.7468,1.678338,0.5484,0.563238,0.5484,0.544075
5,1.6899,1.688054,0.5439,0.552583,0.5439,0.536385
6,1.649,1.672101,0.5504,0.562101,0.5504,0.547103
7,1.6156,1.653266,0.5553,0.55697,0.5553,0.548813
8,1.5871,1.645153,0.5581,0.564482,0.5581,0.555833
9,1.5663,1.627462,0.5576,0.555785,0.5576,0.552572
10,1.5423,1.636158,0.5595,0.559785,0.5595,0.55617


[I 2025-04-05 21:55:45,123] Trial 27 finished with value: 0.5561698156953345 and parameters: {'learning_rate': 0.0010280960603638197, 'weight_decay': 0.006, 'warmup_steps': 17}. Best is trial 23 with value: 0.5563036082256244.


Trial 28 with params: {'learning_rate': 0.0009418131843411974, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.664,1.895958,0.5129,0.540907,0.5129,0.506111
2,1.9747,1.746486,0.5364,0.55869,0.5364,0.535459
3,1.8293,1.709034,0.539,0.552911,0.539,0.535347
4,1.7545,1.678052,0.5483,0.562375,0.5483,0.544063
5,1.6974,1.686543,0.5463,0.554093,0.5463,0.538695
6,1.6569,1.670494,0.552,0.562716,0.552,0.548456
7,1.6243,1.652375,0.5552,0.556712,0.5552,0.54873
8,1.5965,1.644818,0.5587,0.565178,0.5587,0.556567
9,1.5766,1.627325,0.5584,0.556623,0.5584,0.553426
10,1.554,1.636258,0.5578,0.557976,0.5578,0.554411


[I 2025-04-05 22:08:54,032] Trial 28 finished with value: 0.5544111507677794 and parameters: {'learning_rate': 0.0009418131843411974, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}. Best is trial 23 with value: 0.5563036082256244.


Trial 29 with params: {'learning_rate': 0.0023321838809476363, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4773,1.866771,0.512,0.558886,0.512,0.506747
2,1.9638,1.81282,0.5241,0.574974,0.5241,0.522741
3,1.8357,1.781794,0.5275,0.561219,0.5275,0.525722
4,1.7668,1.748391,0.5378,0.567763,0.5378,0.536044


[I 2025-04-05 22:14:08,727] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0003825540298871317, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.163,2.223916,0.4877,0.50232,0.4877,0.477061
2,2.2303,1.900995,0.5221,0.534655,0.5221,0.518854


[I 2025-04-05 22:16:46,397] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.001398891870057478, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5508,1.855316,0.5196,0.55649,0.5196,0.51276
2,1.9384,1.747609,0.534,0.566703,0.534,0.532061
3,1.8033,1.715313,0.5383,0.559076,0.5383,0.535496
4,1.733,1.689226,0.5464,0.565563,0.5464,0.542292
5,1.6756,1.702661,0.5425,0.554318,0.5425,0.534762
6,1.6324,1.684545,0.5482,0.562884,0.5482,0.54503
7,1.5955,1.662136,0.5527,0.554511,0.5527,0.546042
8,1.5634,1.652154,0.5573,0.565682,0.5573,0.555432
9,1.5384,1.633463,0.5575,0.556412,0.5575,0.552708
10,1.5085,1.640889,0.5578,0.557429,0.5578,0.554358


[I 2025-04-05 22:30:00,086] Trial 31 finished with value: 0.5543580527845232 and parameters: {'learning_rate': 0.001398891870057478, 'weight_decay': 0.007, 'warmup_steps': 21}. Best is trial 23 with value: 0.5563036082256244.


Trial 32 with params: {'learning_rate': 0.0006894965942485841, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8146,1.970104,0.508,0.527727,0.508,0.500398
2,2.0359,1.775162,0.5362,0.553696,0.5362,0.534927
3,1.8754,1.726368,0.5381,0.550715,0.5381,0.534097
4,1.7946,1.687237,0.5479,0.559774,0.5479,0.544184
5,1.736,1.690725,0.5449,0.548634,0.5449,0.537595
6,1.6957,1.672305,0.5532,0.56012,0.5532,0.548973
7,1.6649,1.655751,0.5563,0.556387,0.5563,0.549784
8,1.6388,1.649688,0.5585,0.563308,0.5585,0.555851
9,1.6212,1.633015,0.559,0.555679,0.559,0.553202
10,1.6027,1.642205,0.5569,0.556381,0.5569,0.553028


[I 2025-04-05 22:43:17,850] Trial 32 finished with value: 0.5530278879464016 and parameters: {'learning_rate': 0.0006894965942485841, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}. Best is trial 23 with value: 0.5563036082256244.


Trial 33 with params: {'learning_rate': 0.0029669031935158374, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4925,1.907477,0.5054,0.564686,0.5054,0.502301
2,2.0149,1.884098,0.5158,0.577564,0.5158,0.516297
3,1.888,1.841399,0.5222,0.561922,0.5222,0.520334
4,1.8163,1.79983,0.53,0.56555,0.53,0.52961


[I 2025-04-05 22:48:47,952] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0010432425642338363, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6383,1.881374,0.5155,0.546346,0.5155,0.50873
2,1.9618,1.743339,0.5366,0.560898,0.5366,0.53556
3,1.8194,1.707511,0.5383,0.553758,0.5383,0.53482
4,1.7458,1.67853,0.5487,0.563899,0.5487,0.544383
5,1.6888,1.688429,0.5442,0.553139,0.5442,0.536749
6,1.6478,1.672439,0.5498,0.561493,0.5498,0.546406
7,1.6143,1.653496,0.555,0.556772,0.555,0.548604
8,1.5857,1.645295,0.5576,0.564035,0.5576,0.555325
9,1.5647,1.627556,0.5574,0.555686,0.5574,0.55237
10,1.5404,1.636199,0.5595,0.559774,0.5595,0.556202


[I 2025-04-05 23:02:18,651] Trial 34 finished with value: 0.5562022930730277 and parameters: {'learning_rate': 0.0010432425642338363, 'weight_decay': 0.007, 'warmup_steps': 20}. Best is trial 23 with value: 0.5563036082256244.


Trial 35 with params: {'learning_rate': 0.0012269580182257612, 'weight_decay': 0.004, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5804,1.863154,0.518,0.551325,0.518,0.511034
2,1.9454,1.743075,0.5353,0.563757,0.5353,0.533817
3,1.8076,1.7093,0.5383,0.556994,0.5383,0.535284
4,1.7361,1.682624,0.548,0.565425,0.548,0.543853
5,1.6791,1.694555,0.5432,0.554118,0.5432,0.535574
6,1.6371,1.677922,0.5493,0.562355,0.5493,0.545927
7,1.6019,1.657327,0.5534,0.554708,0.5534,0.546589
8,1.5716,1.648061,0.5554,0.562839,0.5554,0.553297


[I 2025-04-05 23:13:02,799] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0007056962824864117, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7901,1.961473,0.5093,0.529267,0.5093,0.501899
2,2.0292,1.771679,0.5371,0.5547,0.5371,0.535836
3,1.8708,1.724302,0.5377,0.550168,0.5377,0.53375
4,1.7908,1.685905,0.5482,0.560287,0.5482,0.544538
5,1.7324,1.689852,0.5451,0.549034,0.5451,0.537802
6,1.6922,1.671713,0.5537,0.560771,0.5537,0.549492
7,1.6613,1.655107,0.5558,0.556117,0.5558,0.549377
8,1.6352,1.64897,0.5585,0.563602,0.5585,0.556014
9,1.6174,1.632224,0.5594,0.556032,0.5594,0.553577
10,1.5987,1.641451,0.5571,0.556496,0.5571,0.553182


[I 2025-04-05 23:26:34,556] Trial 36 finished with value: 0.5531815348578227 and parameters: {'learning_rate': 0.0007056962824864117, 'weight_decay': 0.01, 'warmup_steps': 13}. Best is trial 23 with value: 0.5563036082256244.


Trial 37 with params: {'learning_rate': 0.0015043412297556752, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5234,1.851959,0.5191,0.557462,0.5191,0.512422
2,1.9357,1.751771,0.5329,0.567047,0.5329,0.53097
3,1.8023,1.720327,0.5379,0.559794,0.5379,0.53498
4,1.7328,1.69411,0.5469,0.567857,0.5469,0.543077
5,1.6752,1.708326,0.5396,0.551367,0.5396,0.531873
6,1.6313,1.689038,0.5465,0.562468,0.5465,0.543733
7,1.5932,1.665441,0.5524,0.554678,0.5524,0.545934
8,1.5601,1.655147,0.5573,0.565816,0.5573,0.555443
9,1.5339,1.636129,0.5589,0.557968,0.5589,0.554033
10,1.5023,1.643175,0.5566,0.55639,0.5566,0.553233


[I 2025-04-05 23:40:03,420] Trial 37 finished with value: 0.5532334014419784 and parameters: {'learning_rate': 0.0015043412297556752, 'weight_decay': 0.006, 'warmup_steps': 16}. Best is trial 23 with value: 0.5563036082256244.


Trial 38 with params: {'learning_rate': 0.0006905335389941489, 'weight_decay': 0.007, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8316,1.972302,0.5083,0.528062,0.5083,0.500609
2,2.0374,1.775564,0.5358,0.553256,0.5358,0.534497
3,1.8759,1.726457,0.5381,0.550698,0.5381,0.534106
4,1.7948,1.687246,0.548,0.559882,0.548,0.544335
5,1.7361,1.69074,0.5447,0.54838,0.5447,0.537323
6,1.6957,1.672316,0.5527,0.559772,0.5527,0.548526
7,1.6647,1.655721,0.5554,0.555505,0.5554,0.548869
8,1.6387,1.649662,0.5584,0.56324,0.5584,0.555783
9,1.6209,1.632944,0.5595,0.556148,0.5595,0.553648
10,1.6025,1.642154,0.5568,0.556196,0.5568,0.552899


[I 2025-04-05 23:53:34,938] Trial 38 finished with value: 0.552898712245728 and parameters: {'learning_rate': 0.0006905335389941489, 'weight_decay': 0.007, 'warmup_steps': 27}. Best is trial 23 with value: 0.5563036082256244.


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2553,3.816,0.2895,0.307526,0.2895,0.269036
2,3.6143,3.248659,0.3977,0.414878,0.3977,0.377411


[I 2025-04-05 23:56:20,171] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 9.680397102249245e-05, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.0139,3.383967,0.3715,0.39791,0.3715,0.349187
2,3.1741,2.743117,0.4484,0.462492,0.4484,0.433018
3,2.7643,2.43947,0.4697,0.47171,0.4697,0.453049
4,2.5451,2.277102,0.4853,0.485489,0.4853,0.47265


[I 2025-04-06 00:01:43,390] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0008971160277340763, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6967,1.906316,0.5132,0.539614,0.5132,0.506005
2,1.9834,1.749472,0.5372,0.558507,0.5372,0.536202
3,1.8356,1.710679,0.5395,0.553009,0.5395,0.535832
4,1.7597,1.678532,0.5484,0.562149,0.5484,0.54441
5,1.7025,1.686287,0.5455,0.552196,0.5455,0.537878
6,1.662,1.67004,0.553,0.562948,0.553,0.549336
7,1.6298,1.652226,0.555,0.556351,0.555,0.548577
8,1.6023,1.644933,0.5578,0.564137,0.5578,0.555722
9,1.5828,1.62761,0.5588,0.556815,0.5588,0.553771
10,1.5609,1.636596,0.5574,0.557548,0.5574,0.554013


[I 2025-04-06 00:14:52,315] Trial 41 finished with value: 0.5540134060939599 and parameters: {'learning_rate': 0.0008971160277340763, 'weight_decay': 0.007, 'warmup_steps': 20}. Best is trial 23 with value: 0.5563036082256244.


Trial 42 with params: {'learning_rate': 0.0015492225246106178, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5275,1.852025,0.518,0.557174,0.518,0.511409
2,1.9362,1.754076,0.5315,0.566365,0.5315,0.529545
3,1.8029,1.722859,0.537,0.559865,0.537,0.534288
4,1.7335,1.69647,0.547,0.568448,0.547,0.543134
5,1.6756,1.710985,0.5389,0.551139,0.5389,0.531147
6,1.6313,1.691045,0.5463,0.562455,0.5463,0.543572
7,1.5927,1.666958,0.551,0.553262,0.551,0.544526
8,1.5591,1.656526,0.5567,0.565523,0.5567,0.55491


[I 2025-04-06 00:25:35,086] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0012095759269540086, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5826,1.864183,0.5178,0.550837,0.5178,0.510897
2,1.9463,1.742817,0.535,0.562825,0.535,0.533646
3,1.8083,1.708884,0.5386,0.556796,0.5386,0.535501
4,1.7367,1.682078,0.5481,0.565451,0.5481,0.54401
5,1.6797,1.693833,0.5431,0.553841,0.5431,0.535453
6,1.6378,1.677338,0.5491,0.56199,0.5491,0.545779
7,1.6028,1.656897,0.5534,0.554677,0.5534,0.546668
8,1.5727,1.647717,0.5552,0.562773,0.5552,0.553142


[I 2025-04-06 00:36:22,898] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0007680629189689585, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7627,1.940609,0.5111,0.53383,0.5111,0.503945
2,2.012,1.762293,0.5388,0.557421,0.5388,0.537517
3,1.8573,1.718422,0.5381,0.550844,0.5381,0.534143
4,1.7787,1.682308,0.5479,0.560388,0.5479,0.543955
5,1.7208,1.687606,0.545,0.550044,0.545,0.537692
6,1.6806,1.670346,0.553,0.560814,0.553,0.548948
7,1.6493,1.653381,0.5557,0.556067,0.5557,0.54917
8,1.6227,1.646909,0.5586,0.56419,0.5586,0.556234
9,1.6044,1.629969,0.5592,0.556248,0.5592,0.553681
10,1.5847,1.639114,0.5573,0.557199,0.5573,0.553701


[I 2025-04-06 00:49:40,127] Trial 44 finished with value: 0.553700889056108 and parameters: {'learning_rate': 0.0007680629189689585, 'weight_decay': 0.006, 'warmup_steps': 19}. Best is trial 23 with value: 0.5563036082256244.


Trial 45 with params: {'learning_rate': 6.06596987954659e-05, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.215,3.762294,0.3039,0.321706,0.3039,0.28282
2,3.563,3.191293,0.4045,0.420832,0.4045,0.384674


[I 2025-04-06 00:52:20,735] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0006498492649823818, 'weight_decay': 0.01, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8623,1.991053,0.506,0.525223,0.506,0.498195
2,2.0523,1.784172,0.535,0.552062,0.535,0.533609
3,1.8872,1.731867,0.537,0.549275,0.537,0.532821
4,1.8047,1.690892,0.5472,0.558469,0.5472,0.543455
5,1.7454,1.693308,0.5445,0.547877,0.5445,0.53714
6,1.705,1.674068,0.5535,0.559621,0.5535,0.549054
7,1.6742,1.657649,0.5561,0.555342,0.5561,0.549415
8,1.6484,1.651752,0.5579,0.562592,0.5579,0.555202


[I 2025-04-06 01:02:59,797] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0009473224542208239, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6533,1.89401,0.5127,0.541079,0.5127,0.506052
2,1.9731,1.745983,0.5365,0.55895,0.5365,0.535628
3,1.8284,1.708792,0.5393,0.553623,0.5393,0.535724
4,1.7538,1.677977,0.5482,0.562316,0.5482,0.543968
5,1.6968,1.686581,0.546,0.553975,0.546,0.538408
6,1.6563,1.67054,0.5519,0.562542,0.5519,0.548302
7,1.6237,1.652373,0.5551,0.556624,0.5551,0.548649
8,1.5958,1.644771,0.5586,0.565153,0.5586,0.556474
9,1.576,1.627287,0.5584,0.556547,0.5584,0.553384
10,1.5532,1.636223,0.5576,0.557762,0.5576,0.554217


[I 2025-04-06 01:16:23,260] Trial 47 finished with value: 0.554216603967467 and parameters: {'learning_rate': 0.0009473224542208239, 'weight_decay': 0.003, 'warmup_steps': 10}. Best is trial 23 with value: 0.5563036082256244.


Trial 48 with params: {'learning_rate': 0.0021732253224942765, 'weight_decay': 0.008, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5036,1.861337,0.5126,0.559218,0.5126,0.50665
2,1.9562,1.798709,0.5247,0.572933,0.5247,0.522925
3,1.8266,1.76834,0.5298,0.56107,0.5298,0.527887
4,1.7577,1.736761,0.54,0.567966,0.54,0.537683


[I 2025-04-06 01:21:50,499] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.001462818098752676, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5196,1.852018,0.52,0.556614,0.52,0.513424
2,1.9355,1.749773,0.5327,0.566099,0.5327,0.530711
3,1.802,1.71806,0.5383,0.559551,0.5383,0.535364
4,1.7325,1.691957,0.5463,0.566458,0.5463,0.54221


[I 2025-04-06 01:27:13,890] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.001294853802816282, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5565,1.858415,0.518,0.551728,0.518,0.511112
2,1.9411,1.74425,0.5357,0.564917,0.5357,0.533885
3,1.8049,1.71117,0.5369,0.556461,0.5369,0.534027
4,1.7341,1.684876,0.5471,0.565614,0.5471,0.543038


[I 2025-04-06 01:32:33,056] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0006068017740218379, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8571,2.007041,0.5044,0.522702,0.5044,0.496516
2,2.0661,1.793421,0.5342,0.550928,0.5342,0.532836
3,1.8991,1.738181,0.5359,0.547466,0.5359,0.531593
4,1.8157,1.695396,0.5455,0.555491,0.5455,0.541517


[I 2025-04-06 01:37:55,441] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0015712674667928963, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5044,1.85036,0.5169,0.556953,0.5169,0.51044
2,1.9346,1.754915,0.5315,0.566636,0.5315,0.52951
3,1.8024,1.723892,0.5368,0.559808,0.5368,0.533985
4,1.7334,1.697464,0.5472,0.569056,0.5472,0.543458
5,1.6755,1.712099,0.5387,0.550958,0.5387,0.530921
6,1.6311,1.691999,0.547,0.563398,0.547,0.544334
7,1.5923,1.667616,0.5506,0.552999,0.5506,0.544195
8,1.5585,1.657123,0.5565,0.565341,0.5565,0.554737


[I 2025-04-06 01:48:43,631] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0009634034071663754, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6367,1.890065,0.5142,0.543647,0.5142,0.507613
2,1.9699,1.744959,0.5369,0.559905,0.5369,0.535964
3,1.8262,1.708321,0.539,0.553237,0.539,0.535454
4,1.752,1.677858,0.5481,0.562251,0.5481,0.543863
5,1.6951,1.686726,0.5457,0.55368,0.5457,0.538102
6,1.6546,1.670744,0.5518,0.562656,0.5518,0.54824
7,1.6219,1.652449,0.5554,0.557081,0.5554,0.548973
8,1.5939,1.644781,0.5581,0.564716,0.5581,0.556018
9,1.5739,1.627258,0.5585,0.556639,0.5585,0.553506
10,1.551,1.636163,0.558,0.558221,0.558,0.5546


[I 2025-04-06 02:02:15,054] Trial 53 finished with value: 0.5546003704694045 and parameters: {'learning_rate': 0.0009634034071663754, 'weight_decay': 0.001, 'warmup_steps': 5}. Best is trial 23 with value: 0.5563036082256244.


Trial 54 with params: {'learning_rate': 0.0033344946634773728, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4854,1.936504,0.4989,0.561477,0.4989,0.496662
2,2.0488,1.932541,0.5081,0.574553,0.5081,0.508797
3,1.9233,1.877065,0.5179,0.560083,0.5179,0.515422
4,1.8493,1.830385,0.5273,0.56605,0.5273,0.527381


[I 2025-04-06 02:07:33,141] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0008953750478722926, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6823,1.904959,0.513,0.539696,0.513,0.505884
2,1.9824,1.749163,0.5373,0.558359,0.5373,0.536288
3,1.8352,1.710618,0.5394,0.552777,0.5394,0.535667
4,1.7596,1.678434,0.5484,0.56208,0.5484,0.544293
5,1.7025,1.68621,0.5456,0.552234,0.5456,0.538051
6,1.6621,1.669981,0.5529,0.562792,0.5529,0.549199
7,1.6299,1.652208,0.5554,0.556604,0.5554,0.548935
8,1.6025,1.644917,0.5578,0.56398,0.5578,0.555714


[I 2025-04-06 02:18:18,860] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0017368730614283924, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4749,1.849303,0.5164,0.55828,0.5164,0.50999
2,1.9355,1.764399,0.5279,0.567893,0.5279,0.526006
3,1.8052,1.734103,0.5335,0.5588,0.5335,0.530957
4,1.7368,1.706657,0.5452,0.567913,0.5452,0.541609


[I 2025-04-06 02:23:49,567] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0013478420121246933, 'weight_decay': 0.008, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5625,1.85737,0.5186,0.553881,0.5186,0.511744
2,1.9402,1.745957,0.5341,0.564674,0.5341,0.53218
3,1.8042,1.713219,0.5386,0.558667,0.5386,0.535735
4,1.7335,1.68707,0.5469,0.565432,0.5469,0.542716
5,1.6763,1.700093,0.5427,0.554239,0.5427,0.534941
6,1.6334,1.682479,0.5476,0.561767,0.5476,0.544315
7,1.597,1.660655,0.5528,0.554199,0.5528,0.545957
8,1.5655,1.650841,0.5573,0.565423,0.5573,0.555341


[I 2025-04-06 02:34:39,596] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.001403395212688073, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5375,1.854202,0.5196,0.555343,0.5196,0.512842
2,1.9373,1.747592,0.5335,0.565505,0.5335,0.531468
3,1.8028,1.71538,0.5385,0.559415,0.5385,0.535693
4,1.7327,1.68933,0.5468,0.566235,0.5468,0.542727
5,1.6754,1.702802,0.542,0.553902,0.542,0.53427
6,1.6322,1.684725,0.5481,0.562788,0.5481,0.544976
7,1.5953,1.662239,0.5528,0.55471,0.5528,0.546194
8,1.5632,1.652253,0.5571,0.565358,0.5571,0.555189


[I 2025-04-06 02:45:33,767] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0002471824952041614, 'weight_decay': 0.001, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4326,2.519855,0.4608,0.472929,0.4608,0.444739
2,2.4514,2.066922,0.5049,0.514276,0.5049,0.49888
3,2.1798,1.926258,0.5128,0.515422,0.5128,0.504385
4,2.0546,1.846089,0.5278,0.532504,0.5278,0.523


[I 2025-04-06 02:51:00,296] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.000672490532363344, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8056,1.974269,0.5082,0.527336,0.5082,0.500596
2,2.0398,1.777792,0.5366,0.553536,0.5366,0.535264
3,1.8791,1.728207,0.5375,0.549701,0.5375,0.533454
4,1.7982,1.688494,0.5479,0.559726,0.5479,0.544251
5,1.7395,1.691628,0.5447,0.548258,0.5447,0.537389
6,1.6993,1.672916,0.5532,0.559814,0.5532,0.548846
7,1.6686,1.656452,0.5561,0.555607,0.5561,0.549459
8,1.6427,1.650467,0.558,0.56282,0.558,0.555347


[I 2025-04-06 03:01:47,254] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0009731071946911398, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6388,1.889115,0.5142,0.543772,0.5142,0.507573
2,1.969,1.744735,0.5368,0.559852,0.5368,0.535805
3,1.8253,1.708165,0.5398,0.554259,0.5398,0.536284
4,1.7512,1.677915,0.5484,0.562864,0.5484,0.544173
5,1.6943,1.686914,0.5452,0.553334,0.5452,0.537657
6,1.6537,1.670921,0.5518,0.562612,0.5518,0.548268
7,1.6209,1.652547,0.5554,0.55712,0.5554,0.548996
8,1.5928,1.644782,0.5584,0.565062,0.5584,0.556232
9,1.5727,1.627255,0.5582,0.556322,0.5582,0.553158
10,1.5496,1.636132,0.5583,0.558543,0.5583,0.554911


[I 2025-04-06 03:15:22,533] Trial 61 finished with value: 0.5549110736414353 and parameters: {'learning_rate': 0.0009731071946911398, 'weight_decay': 0.004, 'warmup_steps': 8}. Best is trial 23 with value: 0.5563036082256244.


Trial 62 with params: {'learning_rate': 0.0007732308623231164, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7279,1.934656,0.5116,0.533682,0.5116,0.504513
2,2.0078,1.76054,0.5391,0.557817,0.5391,0.537824
3,1.8551,1.717617,0.539,0.55153,0.539,0.535052
4,1.7773,1.681816,0.5478,0.560498,0.5478,0.543785
5,1.7196,1.687317,0.545,0.549735,0.545,0.537624
6,1.6795,1.670149,0.5529,0.560895,0.5529,0.548865
7,1.6483,1.653218,0.5559,0.55637,0.5559,0.549343
8,1.6217,1.646702,0.5588,0.564311,0.5588,0.556371
9,1.6035,1.629753,0.5592,0.556203,0.5592,0.553696
10,1.5836,1.638942,0.5571,0.556836,0.5571,0.553447


[I 2025-04-06 03:28:42,279] Trial 62 finished with value: 0.5534469786840814 and parameters: {'learning_rate': 0.0007732308623231164, 'weight_decay': 0.0, 'warmup_steps': 4}. Best is trial 23 with value: 0.5563036082256244.


Trial 63 with params: {'learning_rate': 0.0009168644484741278, 'weight_decay': 0.005, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6989,1.903477,0.5129,0.540159,0.5129,0.505843
2,1.9809,1.748486,0.5369,0.558569,0.5369,0.535832
3,1.8333,1.710085,0.5391,0.552933,0.5391,0.535397
4,1.7577,1.678373,0.5479,0.561777,0.5479,0.543702
5,1.7004,1.686427,0.5461,0.55318,0.5461,0.538434
6,1.6598,1.670234,0.5528,0.563015,0.5528,0.549197
7,1.6274,1.652292,0.5551,0.556343,0.5551,0.548645
8,1.5997,1.644875,0.5583,0.564836,0.5583,0.556312
9,1.58,1.627493,0.5583,0.556481,0.5583,0.553266
10,1.5578,1.636421,0.558,0.558209,0.558,0.554673


[I 2025-04-06 03:42:20,134] Trial 63 finished with value: 0.554673493743172 and parameters: {'learning_rate': 0.0009168644484741278, 'weight_decay': 0.005, 'warmup_steps': 25}. Best is trial 23 with value: 0.5563036082256244.


Trial 64 with params: {'learning_rate': 0.0019664276292026975, 'weight_decay': 0.003, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5022,1.855061,0.5153,0.559196,0.5153,0.508929
2,1.9455,1.781449,0.5255,0.570472,0.5255,0.523484


[I 2025-04-06 03:45:01,665] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0003472532285159115, 'weight_decay': 0.005, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.234,2.284907,0.4822,0.494868,0.4822,0.470009
2,2.275,1.932293,0.5177,0.529214,0.5177,0.513986
3,2.0507,1.831521,0.524,0.530364,0.524,0.517786
4,1.9457,1.76817,0.5369,0.543283,0.5369,0.532889


[I 2025-04-06 03:50:23,023] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0007795504264807411, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7714,1.938954,0.5107,0.533721,0.5107,0.503546
2,2.0104,1.761233,0.5392,0.558302,0.5392,0.537925
3,1.8556,1.717639,0.5382,0.551118,0.5382,0.534227
4,1.7771,1.681862,0.548,0.560767,0.548,0.544198
5,1.7191,1.687378,0.5456,0.55066,0.5456,0.538262
6,1.6788,1.670207,0.553,0.56087,0.553,0.548877
7,1.6473,1.653183,0.5563,0.556769,0.5563,0.549746
8,1.6207,1.646616,0.5588,0.564378,0.5588,0.556482
9,1.6022,1.629645,0.5593,0.556464,0.5593,0.55384
10,1.5823,1.638779,0.5573,0.557295,0.5573,0.553801


[I 2025-04-06 04:03:52,660] Trial 66 finished with value: 0.55380140698542 and parameters: {'learning_rate': 0.0007795504264807411, 'weight_decay': 0.004, 'warmup_steps': 26}. Best is trial 23 with value: 0.5563036082256244.


Trial 67 with params: {'learning_rate': 0.0004288128175934878, 'weight_decay': 0.005, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0783,2.158489,0.4912,0.505668,0.4912,0.481462
2,2.1821,1.868181,0.5252,0.538386,0.5252,0.52247
3,1.9844,1.787765,0.5284,0.537523,0.5284,0.52335
4,1.8894,1.733189,0.5407,0.548787,0.5407,0.536939


[I 2025-04-06 04:09:21,364] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0009949902674558943, 'weight_decay': 0.004, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6244,1.884979,0.5156,0.545737,0.5156,0.509074
2,1.9655,1.743806,0.5368,0.560388,0.5368,0.535833
3,1.8228,1.707704,0.5386,0.553451,0.5386,0.535166
4,1.7491,1.677923,0.5491,0.56364,0.5491,0.544818
5,1.6923,1.687249,0.545,0.55329,0.545,0.537462
6,1.6516,1.671325,0.5512,0.562655,0.5512,0.547753
7,1.6187,1.652757,0.5551,0.55677,0.5551,0.548706
8,1.5904,1.644881,0.5577,0.564181,0.5577,0.555501
9,1.5701,1.627256,0.558,0.556095,0.558,0.552953
10,1.5466,1.636114,0.5592,0.559297,0.5592,0.555797


[I 2025-04-06 04:22:40,509] Trial 68 finished with value: 0.5557966760791586 and parameters: {'learning_rate': 0.0009949902674558943, 'weight_decay': 0.004, 'warmup_steps': 5}. Best is trial 23 with value: 0.5563036082256244.


Trial 69 with params: {'learning_rate': 0.0004903167395728889, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9698,2.088515,0.4974,0.514146,0.4974,0.488897
2,2.1302,1.834274,0.5305,0.545286,0.5305,0.52859


[I 2025-04-06 04:25:23,198] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.004920893904226793, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.623,2.078392,0.4883,0.560367,0.4883,0.486152
2,2.2436,2.177023,0.4824,0.575363,0.4824,0.483384


[I 2025-04-06 04:28:05,628] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.001157962531644347, 'weight_decay': 0.005, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5777,1.866558,0.5166,0.548759,0.5166,0.509488
2,1.9487,1.741992,0.535,0.561074,0.535,0.533778
3,1.8103,1.707765,0.5378,0.555241,0.5378,0.534669
4,1.7385,1.680521,0.5488,0.565597,0.5488,0.544625
5,1.6817,1.691775,0.5435,0.554056,0.5435,0.53582
6,1.6402,1.675554,0.5488,0.561787,0.5488,0.54563
7,1.6058,1.655597,0.5542,0.555564,0.5542,0.547585
8,1.5761,1.64671,0.5559,0.563223,0.5559,0.553799


[I 2025-04-06 04:38:43,651] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0009854810779434998, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.63,1.886694,0.5151,0.545122,0.5151,0.508595
2,1.9669,1.744154,0.537,0.560405,0.537,0.535995
3,1.8239,1.707889,0.5394,0.553888,0.5394,0.535865
4,1.75,1.677897,0.5489,0.56335,0.5489,0.544623
5,1.6931,1.687082,0.545,0.553268,0.545,0.537504
6,1.6525,1.671135,0.5513,0.562713,0.5513,0.547899
7,1.6196,1.652668,0.5552,0.556958,0.5552,0.548781
8,1.5915,1.644842,0.5579,0.564413,0.5579,0.555699


[I 2025-04-06 04:49:11,079] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2433,3.793987,0.295,0.313008,0.295,0.273984
2,3.5905,3.219809,0.4013,0.417927,0.4013,0.381251
3,3.187,2.874487,0.4277,0.433002,0.4277,0.406903
4,2.9348,2.671237,0.4536,0.455455,0.4536,0.435967


[I 2025-04-06 04:54:25,037] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.000985960570456549, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6279,1.886367,0.515,0.545007,0.515,0.508516
2,1.9667,1.744083,0.5368,0.560179,0.5368,0.535815
3,1.8238,1.707873,0.5394,0.553834,0.5394,0.535855
4,1.7499,1.677883,0.5489,0.563241,0.5489,0.544606
5,1.693,1.687083,0.5452,0.553406,0.5452,0.537652
6,1.6524,1.671118,0.5514,0.562855,0.5514,0.547978
7,1.6196,1.65266,0.5552,0.556917,0.5552,0.548763
8,1.5914,1.644854,0.5577,0.564225,0.5577,0.5555


[I 2025-04-06 05:05:20,369] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0011258498859330352, 'weight_decay': 0.005, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5775,1.868604,0.5169,0.548544,0.5169,0.509908
2,1.9508,1.741697,0.5354,0.561131,0.5354,0.534189
3,1.812,1.707332,0.5377,0.554379,0.5377,0.534347
4,1.74,1.679669,0.548,0.564337,0.548,0.543753
5,1.6832,1.690577,0.5426,0.552304,0.5426,0.534822
6,1.6419,1.674543,0.5495,0.561878,0.5495,0.546152
7,1.6079,1.654853,0.5548,0.556225,0.5548,0.548188
8,1.5785,1.646201,0.5555,0.562629,0.5555,0.553501


[I 2025-04-06 05:16:18,006] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 5.7423270605816206e-05, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.2437,3.808568,0.2914,0.309654,0.2914,0.271157
2,3.6109,3.247925,0.3979,0.415121,0.3979,0.377868


[I 2025-04-06 05:18:59,689] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0009729643387123101, 'weight_decay': 0.005, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6777,1.893299,0.5134,0.541826,0.5134,0.506616
2,1.9722,1.745721,0.5361,0.55936,0.5361,0.535128
3,1.8267,1.708567,0.5397,0.554081,0.5397,0.536123
4,1.7519,1.678161,0.5489,0.563455,0.5489,0.544594
5,1.6948,1.687054,0.5454,0.553537,0.5454,0.537848
6,1.654,1.67103,0.5516,0.562607,0.5516,0.54812
7,1.6211,1.652638,0.5553,0.556892,0.5553,0.548858
8,1.593,1.644879,0.5586,0.564811,0.5586,0.556338
9,1.5727,1.627327,0.5577,0.555836,0.5577,0.552659
10,1.5496,1.636136,0.5589,0.559182,0.5589,0.555575


[I 2025-04-06 05:32:22,176] Trial 77 finished with value: 0.5555745886783666 and parameters: {'learning_rate': 0.0009729643387123101, 'weight_decay': 0.005, 'warmup_steps': 26}. Best is trial 23 with value: 0.5563036082256244.


Trial 78 with params: {'learning_rate': 0.0010138527029455373, 'weight_decay': 0.005, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6666,1.887311,0.5145,0.543838,0.5145,0.50777
2,1.967,1.744451,0.536,0.559567,0.536,0.534923
3,1.8227,1.707934,0.5381,0.553476,0.5381,0.534721
4,1.7484,1.678362,0.5488,0.563546,0.5488,0.544511
5,1.6914,1.687831,0.5441,0.552647,0.5441,0.53653
6,1.6504,1.671817,0.5505,0.562201,0.5505,0.547111
7,1.6171,1.653104,0.5551,0.556728,0.5551,0.548622
8,1.5886,1.645084,0.5584,0.56471,0.5584,0.556099
9,1.5679,1.627428,0.5576,0.555515,0.5576,0.552447
10,1.5441,1.636127,0.5592,0.559451,0.5592,0.555873


[I 2025-04-06 05:45:48,398] Trial 78 finished with value: 0.5558732287134589 and parameters: {'learning_rate': 0.0010138527029455373, 'weight_decay': 0.005, 'warmup_steps': 28}. Best is trial 23 with value: 0.5563036082256244.


Trial 79 with params: {'learning_rate': 0.001428609983886525, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5499,1.854759,0.5198,0.557024,0.5198,0.512915
2,1.938,1.74879,0.5327,0.565319,0.5327,0.530727
3,1.8031,1.716678,0.5389,0.560378,0.5389,0.536098
4,1.733,1.690594,0.5462,0.565824,0.5462,0.54213


[I 2025-04-06 05:51:18,400] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0009290806936815941, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7048,1.902306,0.5126,0.540378,0.5126,0.505714
2,1.9799,1.748077,0.5361,0.557952,0.5361,0.535023
3,1.8322,1.709799,0.5389,0.552943,0.5389,0.535206
4,1.7566,1.678321,0.5481,0.562061,0.5481,0.543855
5,1.6993,1.686561,0.5462,0.553625,0.5462,0.538586
6,1.6586,1.670402,0.5525,0.562971,0.5525,0.548953
7,1.626,1.652338,0.5547,0.555945,0.5547,0.548209
8,1.5982,1.644861,0.5586,0.56497,0.5586,0.556505
9,1.5784,1.627413,0.558,0.55617,0.558,0.552994
10,1.5559,1.636321,0.5581,0.558262,0.5581,0.554756


[I 2025-04-06 06:04:43,326] Trial 80 finished with value: 0.5547562107313609 and parameters: {'learning_rate': 0.0009290806936815941, 'weight_decay': 0.005, 'warmup_steps': 30}. Best is trial 23 with value: 0.5563036082256244.


Trial 81 with params: {'learning_rate': 0.0003463961586813789, 'weight_decay': 0.001, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2357,2.28648,0.4819,0.494482,0.4819,0.469666
2,2.2761,1.933124,0.5176,0.529102,0.5176,0.513887


[I 2025-04-06 06:07:23,662] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0012315261197628753, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6055,1.865133,0.5174,0.551523,0.5174,0.510552
2,1.9473,1.743637,0.5345,0.563188,0.5345,0.533087
3,1.8084,1.709647,0.5381,0.556722,0.5381,0.535076
4,1.7365,1.682928,0.5482,0.566008,0.5482,0.544095
5,1.6794,1.694891,0.5431,0.554335,0.5431,0.535472
6,1.6372,1.678112,0.5485,0.5618,0.5485,0.545176
7,1.6019,1.657504,0.5535,0.554926,0.5535,0.54673
8,1.5715,1.648222,0.5562,0.563747,0.5562,0.554131


[I 2025-04-06 06:18:07,412] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.001225202721906863, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6093,1.8658,0.5174,0.551359,0.5174,0.510544
2,1.9479,1.7436,0.5342,0.562827,0.5342,0.532841
3,1.8088,1.709528,0.5381,0.556725,0.5381,0.53507
4,1.7368,1.682717,0.5483,0.565984,0.5483,0.544194
5,1.6796,1.69461,0.5428,0.554016,0.5428,0.535174
6,1.6375,1.677893,0.5486,0.561698,0.5486,0.54523
7,1.6022,1.657357,0.5535,0.555016,0.5535,0.54676
8,1.5719,1.648077,0.556,0.563496,0.556,0.553921


[I 2025-04-06 06:28:47,323] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0005496918029153406, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9611,2.05249,0.5003,0.518582,0.5003,0.492317
2,2.1001,1.813266,0.5325,0.548323,0.5325,0.530831
3,1.9226,1.750518,0.5324,0.543459,0.5324,0.527891
4,1.8355,1.704247,0.5441,0.553059,0.5441,0.540219


[I 2025-04-06 06:34:05,899] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.000828024960117199, 'weight_decay': 0.005, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.742,1.924409,0.5121,0.536019,0.5121,0.504863
2,1.9985,1.75558,0.5391,0.558948,0.5391,0.537741
3,1.8466,1.714273,0.5395,0.552563,0.5395,0.535538
4,1.7692,1.680061,0.5478,0.561155,0.5478,0.544016
5,1.7116,1.686542,0.546,0.551867,0.546,0.538772
6,1.6712,1.669842,0.5523,0.561439,0.5523,0.548456
7,1.6394,1.652526,0.5562,0.557153,0.5562,0.549729
8,1.6125,1.645684,0.5582,0.564067,0.5582,0.555914
9,1.5936,1.628535,0.5589,0.556309,0.5589,0.553594
10,1.5728,1.637605,0.5578,0.557907,0.5578,0.554321


[I 2025-04-06 06:47:28,899] Trial 85 finished with value: 0.5543210230065554 and parameters: {'learning_rate': 0.000828024960117199, 'weight_decay': 0.005, 'warmup_steps': 25}. Best is trial 23 with value: 0.5563036082256244.


Trial 86 with params: {'learning_rate': 0.001279526423818204, 'weight_decay': 0.006, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5833,1.861191,0.5184,0.552784,0.5184,0.511646
2,1.9437,1.744321,0.5349,0.564549,0.5349,0.533308
3,1.8062,1.710894,0.5372,0.556714,0.5372,0.534317
4,1.7349,1.684509,0.5469,0.565375,0.5469,0.542888


[I 2025-04-06 06:52:44,014] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0006626721866680967, 'weight_decay': 0.007, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.859,1.985909,0.5063,0.52598,0.5063,0.498577
2,2.0481,1.78153,0.5351,0.552365,0.5351,0.533755
3,1.8837,1.73015,0.5371,0.549322,0.5371,0.532974
4,1.8016,1.6897,0.548,0.559295,0.548,0.544289
5,1.7424,1.692439,0.5447,0.548043,0.5447,0.537352
6,1.702,1.673467,0.5525,0.558826,0.5525,0.548054
7,1.6711,1.656975,0.556,0.555431,0.556,0.549378
8,1.6453,1.651049,0.5582,0.562997,0.5582,0.55552


[I 2025-04-06 07:03:26,975] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0009332922647549137, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6828,1.899227,0.5132,0.541044,0.5132,0.506281
2,1.9774,1.747318,0.5365,0.558558,0.5365,0.535454
3,1.8309,1.709465,0.54,0.554056,0.54,0.536323
4,1.7557,1.678192,0.5484,0.562509,0.5484,0.544148
5,1.6985,1.686538,0.5462,0.553618,0.5462,0.538535
6,1.6579,1.670414,0.5522,0.56272,0.5522,0.548607
7,1.6254,1.652327,0.5552,0.556488,0.5552,0.548727
8,1.5976,1.644834,0.5582,0.564576,0.5582,0.556133
9,1.5778,1.62738,0.5581,0.556238,0.5581,0.553078
10,1.5553,1.636312,0.5579,0.558093,0.5579,0.554564


[I 2025-04-06 07:16:49,571] Trial 88 finished with value: 0.5545638426249121 and parameters: {'learning_rate': 0.0009332922647549137, 'weight_decay': 0.002, 'warmup_steps': 21}. Best is trial 23 with value: 0.5563036082256244.


Trial 89 with params: {'learning_rate': 0.0007369034991700315, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7977,1.95355,0.5101,0.531274,0.5101,0.502855
2,2.0223,1.767342,0.5379,0.556376,0.5379,0.536592
3,1.8646,1.721394,0.5374,0.550056,0.5374,0.533419
4,1.7849,1.68408,0.5482,0.560461,0.5482,0.544263
5,1.7267,1.688664,0.5449,0.549344,0.5449,0.537631
6,1.6863,1.670968,0.5533,0.560681,0.5533,0.54918
7,1.6551,1.65415,0.5553,0.555673,0.5553,0.548876
8,1.6288,1.647858,0.5592,0.564503,0.5592,0.556749
9,1.6107,1.631016,0.5584,0.555278,0.5584,0.552768
10,1.5915,1.640171,0.5572,0.556734,0.5572,0.553391


[I 2025-04-06 07:30:04,159] Trial 89 finished with value: 0.5533911126463115 and parameters: {'learning_rate': 0.0007369034991700315, 'weight_decay': 0.004, 'warmup_steps': 26}. Best is trial 23 with value: 0.5563036082256244.


Trial 90 with params: {'learning_rate': 0.003779632099489645, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5368,1.979159,0.4934,0.561862,0.4934,0.491519
2,2.0996,2.000544,0.4991,0.574556,0.4991,0.500481
3,1.9737,1.922954,0.5133,0.558887,0.5133,0.510535
4,1.8954,1.868999,0.5238,0.564448,0.5238,0.523493


[I 2025-04-06 07:35:30,663] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0006644544710681908, 'weight_decay': 0.006, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.842,1.982617,0.5065,0.526031,0.5065,0.498897
2,2.0458,1.780596,0.5355,0.552494,0.5355,0.534138
3,1.8826,1.729686,0.5377,0.549814,0.5377,0.533556
4,1.8008,1.689412,0.5482,0.559691,0.5482,0.544544
5,1.7418,1.692269,0.5447,0.548173,0.5447,0.537403
6,1.7014,1.673352,0.5527,0.55903,0.5527,0.548273
7,1.6706,1.656865,0.556,0.555436,0.556,0.54942
8,1.6448,1.650932,0.5581,0.562949,0.5581,0.555442


[I 2025-04-06 07:46:08,321] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0009103298611202828, 'weight_decay': 0.003, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7131,1.906109,0.5131,0.540117,0.5131,0.506008
2,1.9831,1.749178,0.5371,0.558274,0.5371,0.535908
3,1.8347,1.710422,0.5388,0.552526,0.5388,0.535115
4,1.7587,1.678493,0.5479,0.561609,0.5479,0.543757
5,1.7013,1.686419,0.5458,0.552782,0.5458,0.538135
6,1.6607,1.670191,0.5527,0.562843,0.5527,0.549082
7,1.6283,1.652279,0.5548,0.556019,0.5548,0.548395
8,1.6006,1.644909,0.5585,0.564953,0.5585,0.556481
9,1.5809,1.627511,0.5586,0.556854,0.5586,0.553622
10,1.5588,1.636463,0.5576,0.55768,0.5576,0.554221


[I 2025-04-06 07:59:34,192] Trial 92 finished with value: 0.5542210187104564 and parameters: {'learning_rate': 0.0009103298611202828, 'weight_decay': 0.003, 'warmup_steps': 30}. Best is trial 23 with value: 0.5563036082256244.


Trial 93 with params: {'learning_rate': 0.0006510639168400221, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8201,1.983897,0.5069,0.525734,0.5069,0.499058
2,2.0477,1.782375,0.5356,0.552298,0.5356,0.53422
3,1.8851,1.731122,0.5373,0.548927,0.5373,0.532998
4,1.8035,1.690455,0.5471,0.558393,0.5471,0.543362


[I 2025-04-06 08:04:52,831] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0022686353653773774, 'weight_decay': 0.003, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4583,1.863033,0.5129,0.561356,0.5129,0.507596
2,1.9584,1.806497,0.524,0.572674,0.524,0.522501
3,1.8307,1.77586,0.5287,0.561537,0.5287,0.526894
4,1.7623,1.743187,0.539,0.567548,0.539,0.536848


[I 2025-04-06 08:10:20,320] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0007306299981164373, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7647,1.950659,0.5104,0.530573,0.5104,0.503062
2,2.0207,1.767121,0.5378,0.555663,0.5378,0.536428
3,1.8646,1.721551,0.5383,0.550594,0.5383,0.534328
4,1.7854,1.684181,0.5479,0.560244,0.5479,0.544065
5,1.7273,1.688725,0.545,0.549347,0.545,0.537755
6,1.6872,1.671003,0.553,0.560299,0.553,0.548769
7,1.6562,1.654287,0.5556,0.555802,0.5556,0.549126
8,1.6299,1.648012,0.5589,0.564183,0.5589,0.55648
9,1.612,1.631196,0.559,0.555917,0.559,0.55335
10,1.5929,1.640397,0.5573,0.55692,0.5573,0.55349


[I 2025-04-06 08:24:15,815] Trial 95 finished with value: 0.5534897694936788 and parameters: {'learning_rate': 0.0007306299981164373, 'weight_decay': 0.005, 'warmup_steps': 9}. Best is trial 23 with value: 0.5563036082256244.


Trial 96 with params: {'learning_rate': 0.0009138125484702917, 'weight_decay': 0.006, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7048,1.904622,0.5128,0.53984,0.5128,0.505714
2,1.9819,1.748801,0.537,0.558399,0.537,0.535898
3,1.8339,1.710226,0.539,0.552743,0.539,0.535285
4,1.7581,1.678418,0.548,0.561786,0.548,0.543853
5,1.7008,1.686436,0.546,0.553019,0.546,0.538326
6,1.6602,1.670211,0.5529,0.562964,0.5529,0.54926
7,1.6278,1.65227,0.5549,0.556181,0.5549,0.548483
8,1.6001,1.644901,0.5586,0.565051,0.5586,0.556577
9,1.5804,1.627501,0.5585,0.556717,0.5585,0.553497
10,1.5583,1.636424,0.5577,0.557811,0.5577,0.554328


[I 2025-04-06 08:37:47,463] Trial 96 finished with value: 0.5543281987119283 and parameters: {'learning_rate': 0.0009138125484702917, 'weight_decay': 0.006, 'warmup_steps': 27}. Best is trial 23 with value: 0.5563036082256244.


Trial 97 with params: {'learning_rate': 0.0011732225675101303, 'weight_decay': 0.005, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6007,1.867756,0.5168,0.549352,0.5168,0.509831
2,1.9496,1.742625,0.5347,0.561579,0.5347,0.533479
3,1.8104,1.708273,0.538,0.555704,0.538,0.534872
4,1.7383,1.6811,0.5488,0.565536,0.5488,0.544601
5,1.6814,1.69248,0.5428,0.553363,0.5428,0.535092
6,1.6396,1.676113,0.5487,0.561517,0.5487,0.545451
7,1.6049,1.656038,0.5538,0.555223,0.5538,0.547167
8,1.5751,1.647037,0.5562,0.563598,0.5562,0.554195


[I 2025-04-06 08:48:30,347] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0017196160749035658, 'weight_decay': 0.004, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4795,1.849468,0.5161,0.557544,0.5161,0.509824
2,1.9353,1.763341,0.5282,0.567694,0.5282,0.526233


[I 2025-04-06 08:51:14,650] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00219394185201192, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4971,1.861807,0.5125,0.559295,0.5125,0.506469
2,1.9569,1.800413,0.524,0.572011,0.524,0.522079
3,1.8275,1.770035,0.5292,0.561062,0.5292,0.527288
4,1.7587,1.738183,0.5403,0.568359,0.5403,0.537971


[I 2025-04-06 08:56:38,101] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0007331645492171186, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7491,1.94773,0.5103,0.531005,0.5103,0.503117
2,2.0188,1.766271,0.5382,0.556169,0.5382,0.536875
3,1.8635,1.721144,0.5386,0.551001,0.5386,0.534731
4,1.7847,1.683905,0.5475,0.559875,0.5475,0.543692
5,1.7267,1.688541,0.5446,0.549057,0.5446,0.537352
6,1.6867,1.670891,0.553,0.560539,0.553,0.548836
7,1.6557,1.654152,0.5556,0.556101,0.5556,0.549243
8,1.6294,1.647913,0.5593,0.564608,0.5593,0.556833
9,1.6115,1.631063,0.5598,0.556697,0.5598,0.554135
10,1.5924,1.64032,0.5579,0.557495,0.5579,0.554104


[I 2025-04-06 09:09:50,728] Trial 100 finished with value: 0.5541036538029214 and parameters: {'learning_rate': 0.0007331645492171186, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 23 with value: 0.5563036082256244.


Trial 101 with params: {'learning_rate': 0.001119666481166137, 'weight_decay': 0.004, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5967,1.870878,0.5172,0.549421,0.5172,0.510282
2,1.9526,1.742131,0.5359,0.561427,0.5359,0.534716
3,1.8129,1.707441,0.5375,0.554307,0.5375,0.534191
4,1.7406,1.679653,0.5478,0.564047,0.5478,0.543636


[I 2025-04-06 09:15:07,766] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0010346762158897815, 'weight_decay': 0.002, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6524,1.883645,0.5151,0.545587,0.5151,0.508364
2,1.9638,1.743778,0.5365,0.560645,0.5365,0.535425
3,1.8205,1.707659,0.5381,0.553664,0.5381,0.53474
4,1.7466,1.678526,0.5486,0.563554,0.5486,0.544211
5,1.6896,1.688263,0.5442,0.552877,0.5442,0.53668
6,1.6486,1.672269,0.55,0.561652,0.55,0.546617
7,1.6152,1.653401,0.5552,0.55689,0.5552,0.548781
8,1.5865,1.645245,0.5578,0.564168,0.5578,0.555516


[I 2025-04-06 09:25:40,535] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0010809798876346172, 'weight_decay': 0.001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6214,1.876277,0.5165,0.548528,0.5165,0.509525
2,1.9573,1.74268,0.5355,0.560129,0.5355,0.534382
3,1.8161,1.707367,0.5378,0.553972,0.5378,0.534463
4,1.7431,1.679057,0.5482,0.563907,0.5482,0.54401
5,1.6862,1.689407,0.5437,0.552828,0.5437,0.536045
6,1.645,1.673371,0.5501,0.56194,0.5501,0.546672
7,1.6112,1.654124,0.5553,0.556996,0.5553,0.548854
8,1.5822,1.64565,0.5575,0.564011,0.5575,0.55527


[I 2025-04-06 09:36:20,578] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0005522339323570141, 'weight_decay': 0.003, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.941,2.047345,0.5003,0.518513,0.5003,0.492395
2,2.0967,1.811623,0.5329,0.548739,0.5329,0.531225
3,1.9208,1.749626,0.5328,0.543781,0.5328,0.52826
4,1.8341,1.703653,0.5445,0.553546,0.5445,0.540634


[I 2025-04-06 09:41:48,756] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0024664488192656544, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4937,1.874135,0.511,0.561037,0.511,0.506402
2,1.9745,1.826463,0.5215,0.575013,0.5215,0.520195


[I 2025-04-06 09:44:26,337] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0013029008990965954, 'weight_decay': 0.002, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5631,1.85872,0.5181,0.552639,0.5181,0.511249
2,1.9413,1.744593,0.5355,0.564892,0.5355,0.53363
3,1.805,1.711529,0.5372,0.556939,0.5372,0.534362
4,1.7341,1.685241,0.547,0.565246,0.547,0.54282


[I 2025-04-06 09:49:44,679] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0008035470190435819, 'weight_decay': 0.004, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7377,1.929065,0.5122,0.536455,0.5122,0.505178
2,2.0025,1.757707,0.5394,0.558755,0.5394,0.538085
3,1.8502,1.715653,0.5389,0.551986,0.5389,0.534965
4,1.7727,1.680773,0.5479,0.561046,0.5479,0.543992
5,1.715,1.68682,0.5458,0.551296,0.5458,0.538528
6,1.6747,1.669932,0.5523,0.561163,0.5523,0.548429
7,1.6432,1.652791,0.5566,0.557236,0.5566,0.550081
8,1.6164,1.646089,0.5589,0.564595,0.5589,0.556637
9,1.5978,1.629032,0.5592,0.556477,0.5592,0.553831
10,1.5775,1.638145,0.5575,0.557464,0.5575,0.554033


[I 2025-04-06 10:02:56,614] Trial 107 finished with value: 0.554033201039822 and parameters: {'learning_rate': 0.0008035470190435819, 'weight_decay': 0.004, 'warmup_steps': 17}. Best is trial 23 with value: 0.5563036082256244.


Trial 108 with params: {'learning_rate': 0.000723038535948288, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7913,1.956572,0.5099,0.530562,0.5099,0.502623
2,2.025,1.769104,0.5379,0.555842,0.5379,0.536571
3,1.8671,1.722607,0.5377,0.5502,0.5377,0.533727
4,1.7874,1.684819,0.5482,0.560263,0.5482,0.544429
5,1.7291,1.689145,0.545,0.549293,0.545,0.537721
6,1.6889,1.671262,0.5533,0.560355,0.5533,0.549042
7,1.6578,1.654538,0.5556,0.555929,0.5556,0.549199
8,1.6316,1.648312,0.5593,0.564315,0.5593,0.556795
9,1.6136,1.63152,0.559,0.555811,0.559,0.553306
10,1.5946,1.640703,0.5569,0.55643,0.5569,0.553069


[I 2025-04-06 10:16:33,708] Trial 108 finished with value: 0.5530693177239993 and parameters: {'learning_rate': 0.000723038535948288, 'weight_decay': 0.006, 'warmup_steps': 19}. Best is trial 23 with value: 0.5563036082256244.


Trial 109 with params: {'learning_rate': 0.0004258560209617247, 'weight_decay': 0.001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0889,2.163564,0.4911,0.505579,0.4911,0.481269
2,2.1856,1.870367,0.5249,0.538015,0.5249,0.522066
3,1.9867,1.789153,0.5283,0.537086,0.5283,0.523156
4,1.8913,1.734251,0.5409,0.549139,0.5409,0.537223


[I 2025-04-06 10:21:51,841] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0007439124775250257, 'weight_decay': 0.005, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7955,1.951344,0.5099,0.531458,0.5099,0.502733
2,2.0205,1.76632,0.538,0.556655,0.538,0.536728
3,1.8631,1.720772,0.5377,0.550324,0.5377,0.533671
4,1.7836,1.683666,0.5481,0.560594,0.5481,0.544213
5,1.7254,1.688419,0.5452,0.549648,0.5452,0.537894
6,1.6851,1.670821,0.5533,0.560773,0.5533,0.549164
7,1.6538,1.653973,0.5552,0.555429,0.5552,0.548709
8,1.6274,1.647636,0.5591,0.564404,0.5591,0.556658
9,1.6092,1.630741,0.5583,0.555167,0.5583,0.552646
10,1.5899,1.639916,0.557,0.556466,0.557,0.553168


[I 2025-04-06 10:35:17,917] Trial 110 finished with value: 0.5531676792449695 and parameters: {'learning_rate': 0.0007439124775250257, 'weight_decay': 0.005, 'warmup_steps': 27}. Best is trial 23 with value: 0.5563036082256244.


Trial 111 with params: {'learning_rate': 0.00044811859470369965, 'weight_decay': 0.01, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0511,2.136405,0.4943,0.509312,0.4943,0.484835
2,2.1654,1.856924,0.5268,0.540402,0.5268,0.524321


[I 2025-04-06 10:37:58,316] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.001577595782113658, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5095,1.850728,0.5168,0.556756,0.5168,0.510283
2,1.935,1.75533,0.5313,0.566883,0.5313,0.529391
3,1.8026,1.724335,0.5363,0.559577,0.5363,0.5335
4,1.7336,1.697859,0.5474,0.569225,0.5474,0.543662
5,1.6757,1.712514,0.5389,0.551266,0.5389,0.531131
6,1.6312,1.692324,0.5469,0.563135,0.5469,0.544083
7,1.5923,1.667839,0.5508,0.553244,0.5508,0.544405
8,1.5584,1.65735,0.5567,0.565526,0.5567,0.554938


[I 2025-04-06 10:48:33,017] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0005798888281166415, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9009,2.026495,0.5023,0.520352,0.5023,0.494372
2,2.0808,1.802029,0.5341,0.550418,0.5341,0.532496


[I 2025-04-06 10:51:10,950] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0001998398959947355, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5617,2.693976,0.4437,0.456102,0.4437,0.425601
2,2.5858,2.177856,0.4975,0.506847,0.4975,0.48999
3,2.2799,2.005504,0.5041,0.505797,0.5041,0.494548
4,2.138,1.911688,0.5215,0.52503,0.5215,0.515667


[I 2025-04-06 10:56:29,850] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0002654675046535602, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3764,2.46045,0.4667,0.477738,0.4667,0.451445
2,2.4078,2.03321,0.5079,0.517382,0.5079,0.502376


[I 2025-04-06 10:59:07,583] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0016167937400614625, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4934,1.849693,0.5171,0.558593,0.5171,0.510754
2,1.9343,1.757243,0.5299,0.566415,0.5299,0.527993
3,1.8028,1.726502,0.5354,0.5592,0.5354,0.532691
4,1.734,1.699841,0.5467,0.56852,0.5467,0.542881


[I 2025-04-06 11:04:28,991] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00012486032116326294, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8917,3.15763,0.3971,0.420019,0.3971,0.375452
2,2.9624,2.521046,0.4692,0.480629,0.4692,0.457036


[I 2025-04-06 11:07:01,753] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0011655686421056554, 'weight_decay': 0.006, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6093,1.869018,0.5164,0.549069,0.5164,0.509508
2,1.9508,1.742707,0.5349,0.561795,0.5349,0.533761
3,1.8111,1.708198,0.5378,0.555343,0.5378,0.534612
4,1.7388,1.680931,0.5488,0.565527,0.5488,0.54465
5,1.6818,1.692209,0.5431,0.553682,0.5431,0.535416
6,1.6401,1.675892,0.549,0.561688,0.549,0.545728
7,1.6055,1.655872,0.5539,0.555354,0.5539,0.547263
8,1.5757,1.646912,0.5561,0.563338,0.5561,0.554017


[I 2025-04-06 11:17:55,749] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0007074901451607933, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7647,1.95723,0.5096,0.529715,0.5096,0.502314
2,2.0266,1.770533,0.5365,0.553767,0.5365,0.535195
3,1.8695,1.723805,0.5381,0.550849,0.5381,0.534319
4,1.79,1.68559,0.5473,0.559628,0.5473,0.543503
5,1.7318,1.689643,0.5451,0.549205,0.5451,0.537981
6,1.6917,1.671571,0.5533,0.560461,0.5533,0.549103
7,1.6609,1.654967,0.5557,0.555884,0.5557,0.549225
8,1.6348,1.648858,0.5587,0.563773,0.5587,0.556175
9,1.6171,1.632109,0.5596,0.556393,0.5596,0.553889
10,1.5984,1.641389,0.5576,0.55716,0.5576,0.553774


[I 2025-04-06 11:31:27,326] Trial 119 finished with value: 0.5537736801894646 and parameters: {'learning_rate': 0.0007074901451607933, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 23 with value: 0.5563036082256244.


Trial 120 with params: {'learning_rate': 0.000891474684930734, 'weight_decay': 0.004, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.715,1.909351,0.5135,0.540177,0.5135,0.506483
2,1.9859,1.750275,0.5375,0.558275,0.5375,0.536291
3,1.837,1.711075,0.5394,0.552675,0.5394,0.535608
4,1.7608,1.678698,0.5482,0.561766,0.5482,0.544061
5,1.7034,1.686344,0.5459,0.552476,0.5459,0.538267
6,1.6628,1.670018,0.5529,0.562759,0.5529,0.549221
7,1.6306,1.652237,0.5549,0.556237,0.5549,0.548497
8,1.6031,1.645009,0.5576,0.563882,0.5576,0.555546


[I 2025-04-06 11:42:06,853] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.002345307746363271, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4734,1.867254,0.5121,0.55928,0.5121,0.506886
2,1.9644,1.814019,0.5241,0.575342,0.5241,0.52286
3,1.8364,1.782957,0.5273,0.561232,0.5273,0.525623
4,1.7676,1.749356,0.5376,0.567815,0.5376,0.535925


[I 2025-04-06 11:47:25,851] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0007327949445088886, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7592,1.949274,0.5105,0.53086,0.5105,0.503143
2,2.0197,1.766647,0.5383,0.556215,0.5383,0.536973
3,1.8639,1.72131,0.5386,0.550914,0.5386,0.534601
4,1.7849,1.683994,0.5478,0.560117,0.5478,0.543943


[I 2025-04-06 11:52:49,357] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.001430855806898148, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5369,1.853739,0.5197,0.556868,0.5197,0.512844
2,1.937,1.748689,0.5326,0.565298,0.5326,0.530678
3,1.8026,1.71667,0.5386,0.560066,0.5386,0.535789
4,1.7327,1.690596,0.5469,0.567063,0.5469,0.542861


[I 2025-04-06 11:58:16,325] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0038399418051914304, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5263,1.983842,0.4937,0.564062,0.4937,0.492737
2,2.105,2.008447,0.498,0.575082,0.498,0.499443


[I 2025-04-06 12:00:59,069] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0008593600121938434, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7126,1.914711,0.5125,0.538095,0.5125,0.505595
2,1.9905,1.752301,0.5384,0.558349,0.5384,0.536969
3,1.841,1.712371,0.5397,0.552854,0.5397,0.535881
4,1.7645,1.679166,0.5479,0.561646,0.5479,0.54398
5,1.7071,1.68629,0.5451,0.551701,0.5451,0.537823
6,1.6667,1.669812,0.5529,0.562611,0.5529,0.549162
7,1.6348,1.652287,0.5552,0.556224,0.5552,0.548753
8,1.6076,1.645258,0.5577,0.563632,0.5577,0.555441


[I 2025-04-06 12:11:53,488] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.001610525487083141, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5197,1.851489,0.5171,0.557484,0.5171,0.510562
2,1.9362,1.757301,0.5299,0.566939,0.5299,0.528186


[I 2025-04-06 12:14:39,016] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0012498474127265824, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5686,1.861107,0.5183,0.551714,0.5183,0.511405
2,1.9435,1.743346,0.5351,0.564259,0.5351,0.533618
3,1.8065,1.709859,0.5384,0.557231,0.5384,0.535425
4,1.7353,1.683305,0.548,0.565754,0.548,0.543851
5,1.6783,1.695501,0.5428,0.553914,0.5428,0.535058
6,1.6362,1.678722,0.5487,0.561778,0.5487,0.545337
7,1.6008,1.657884,0.553,0.554122,0.553,0.546088
8,1.5703,1.648506,0.5555,0.563044,0.5555,0.553444


[I 2025-04-06 12:25:32,443] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0009597163685943792, 'weight_decay': 0.005, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6482,1.891816,0.5129,0.541615,0.5129,0.50632
2,1.9712,1.745396,0.537,0.559814,0.537,0.536134
3,1.8269,1.708499,0.5392,0.553407,0.5392,0.535601
4,1.7525,1.677934,0.5484,0.562653,0.5484,0.54416
5,1.6956,1.686737,0.5459,0.553894,0.5459,0.538259
6,1.655,1.670716,0.552,0.562842,0.552,0.548456
7,1.6223,1.652457,0.5553,0.556943,0.5553,0.548863
8,1.5944,1.644768,0.5584,0.564957,0.5584,0.556301
9,1.5744,1.627273,0.5581,0.556221,0.5581,0.553071
10,1.5515,1.636181,0.558,0.55821,0.558,0.554584


[I 2025-04-06 12:38:59,548] Trial 128 finished with value: 0.5545844517939621 and parameters: {'learning_rate': 0.0009597163685943792, 'weight_decay': 0.005, 'warmup_steps': 10}. Best is trial 23 with value: 0.5563036082256244.


Trial 129 with params: {'learning_rate': 0.0005871225897499905, 'weight_decay': 0.006, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8771,2.01896,0.503,0.521131,0.503,0.495109
2,2.0755,1.799145,0.5333,0.549913,0.5333,0.531755


[I 2025-04-06 12:41:43,352] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0009200029571290606, 'weight_decay': 0.004, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6691,1.899644,0.5136,0.541161,0.5136,0.506696
2,1.9779,1.747548,0.5364,0.557982,0.5364,0.535321
3,1.8319,1.709691,0.5403,0.553936,0.5403,0.536558
4,1.7568,1.678175,0.548,0.561901,0.548,0.543807


[I 2025-04-06 12:46:58,630] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0008194068717653686, 'weight_decay': 0.004, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7077,1.921876,0.5133,0.537151,0.5133,0.506279
2,1.997,1.755311,0.5393,0.558857,0.5393,0.537953
3,1.8467,1.714365,0.5397,0.552472,0.5397,0.535813
4,1.7698,1.680063,0.5477,0.560918,0.5477,0.543781
5,1.7123,1.686492,0.5462,0.551998,0.5462,0.538991
6,1.6722,1.669767,0.5532,0.56237,0.5532,0.549384
7,1.6406,1.652551,0.5566,0.557438,0.5566,0.5501
8,1.6137,1.64575,0.558,0.563992,0.558,0.5558
9,1.5951,1.62865,0.5594,0.556875,0.5594,0.55413
10,1.5745,1.637774,0.5577,0.557865,0.5577,0.554313


[I 2025-04-06 13:00:37,079] Trial 131 finished with value: 0.5543125711890468 and parameters: {'learning_rate': 0.0008194068717653686, 'weight_decay': 0.004, 'warmup_steps': 7}. Best is trial 23 with value: 0.5563036082256244.


Trial 132 with params: {'learning_rate': 0.001520605327278336, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5131,1.851113,0.5188,0.557712,0.5188,0.512244
2,1.9349,1.752428,0.5325,0.566391,0.5325,0.530548


[I 2025-04-06 13:03:18,366] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 8.153014791034117e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1123,3.546494,0.3452,0.371238,0.3452,0.322727
2,3.3298,2.912759,0.433,0.448723,0.433,0.415967
3,2.9109,2.584352,0.4561,0.456278,0.4561,0.437579
4,2.6746,2.403954,0.4737,0.475369,0.4737,0.45961


[I 2025-04-06 13:08:32,587] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0012644663654047474, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5999,1.863215,0.5182,0.552449,0.5182,0.511455
2,1.9456,1.744252,0.5353,0.564662,0.5353,0.533711
3,1.8072,1.710547,0.538,0.557227,0.538,0.535088
4,1.7356,1.68403,0.5479,0.566163,0.5479,0.543867


[I 2025-04-06 13:13:54,918] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0006070964815669602, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8696,2.009065,0.504,0.522538,0.504,0.496083
2,2.0673,1.793867,0.5343,0.551162,0.5343,0.532848
3,1.8995,1.738335,0.5359,0.547525,0.5359,0.531552
4,1.8159,1.695464,0.5453,0.555278,0.5453,0.541326


[I 2025-04-06 13:19:16,985] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0016228517224790007, 'weight_decay': 0.005, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5141,1.851129,0.5172,0.558189,0.5172,0.510757
2,1.9359,1.757924,0.5294,0.566461,0.5294,0.52759


[I 2025-04-06 13:21:53,346] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0013343380445389036, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5234,1.854718,0.5209,0.555959,0.5209,0.514151
2,1.938,1.744996,0.5349,0.564672,0.5349,0.532902
3,1.8033,1.71215,0.5375,0.557254,0.5375,0.534611
4,1.7331,1.686142,0.5473,0.566197,0.5473,0.543147
5,1.676,1.698997,0.5418,0.553618,0.5418,0.534087
6,1.6334,1.681798,0.5485,0.562488,0.5485,0.545194
7,1.5974,1.660013,0.5529,0.554732,0.5529,0.54619
8,1.566,1.650386,0.5562,0.564244,0.5562,0.554311


[I 2025-04-06 13:32:45,208] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0011027846762588017, 'weight_decay': 0.001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5957,1.872025,0.5173,0.548972,0.5173,0.510374
2,1.9537,1.74206,0.5358,0.561055,0.5358,0.534641
3,1.8139,1.707268,0.5378,0.554104,0.5378,0.534452
4,1.7414,1.679314,0.5481,0.564019,0.5481,0.543889
5,1.6846,1.689955,0.5432,0.552208,0.5432,0.535376
6,1.6434,1.673925,0.55,0.562086,0.55,0.546635
7,1.6095,1.654477,0.5551,0.556732,0.5551,0.548569
8,1.5803,1.64587,0.5567,0.56348,0.5567,0.554601


[I 2025-04-06 13:43:29,501] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0016720006902773905, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.517,1.851552,0.5167,0.557815,0.5167,0.510227
2,1.937,1.76087,0.5287,0.567469,0.5287,0.526688


[I 2025-04-06 13:46:15,017] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0007351298771355616, 'weight_decay': 0.008, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7922,1.95325,0.5098,0.530852,0.5098,0.502561
2,2.0222,1.767407,0.538,0.556493,0.538,0.536655
3,1.8647,1.721498,0.5377,0.550317,0.5377,0.533726
4,1.7851,1.68413,0.548,0.560351,0.548,0.544103
5,1.7269,1.688698,0.5451,0.549522,0.5451,0.537854
6,1.6866,1.670996,0.5536,0.56091,0.5536,0.549464
7,1.6554,1.65419,0.5553,0.555675,0.5553,0.548885
8,1.6291,1.647904,0.5592,0.564426,0.5592,0.556715
9,1.6111,1.631072,0.5584,0.555374,0.5584,0.55279
10,1.5919,1.640254,0.5571,0.556608,0.5571,0.553267


[I 2025-04-06 13:59:34,443] Trial 140 finished with value: 0.5532669899623425 and parameters: {'learning_rate': 0.0007351298771355616, 'weight_decay': 0.008, 'warmup_steps': 23}. Best is trial 23 with value: 0.5563036082256244.


Trial 141 with params: {'learning_rate': 0.001280999683203616, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5917,1.861877,0.5184,0.553075,0.5184,0.511581
2,1.9444,1.744519,0.5348,0.564499,0.5348,0.533196
3,1.8065,1.711024,0.5371,0.556668,0.5371,0.53425
4,1.7351,1.684597,0.5475,0.565972,0.5475,0.543463


[I 2025-04-06 14:04:52,288] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0005089443712932172, 'weight_decay': 0.007, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9942,2.082109,0.4984,0.515365,0.4984,0.489748
2,2.1233,1.828444,0.5304,0.545576,0.5304,0.528494
3,1.9403,1.760668,0.5333,0.543726,0.5333,0.528562
4,1.851,1.711957,0.5435,0.552402,0.5435,0.539715


[I 2025-04-06 14:10:12,791] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0017667923731141764, 'weight_decay': 0.007, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5142,1.852194,0.515,0.557785,0.515,0.508741
2,1.9391,1.766953,0.5272,0.569274,0.5272,0.525314


[I 2025-04-06 14:12:48,192] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.000531164858035556, 'weight_decay': 0.004, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9652,2.063305,0.4997,0.516746,0.4997,0.491384
2,2.109,1.819384,0.5316,0.547226,0.5316,0.529796
3,1.9298,1.754721,0.5331,0.543883,0.5331,0.528558
4,1.842,1.707464,0.5444,0.553283,0.5444,0.540585


[I 2025-04-06 14:18:03,420] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0009535265448457227, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6923,1.897449,0.5133,0.541172,0.5133,0.506395
2,1.9758,1.746752,0.5354,0.557869,0.5354,0.534289
3,1.8292,1.709075,0.5395,0.553721,0.5395,0.535871
4,1.754,1.678219,0.5482,0.562482,0.5482,0.543959


[I 2025-04-06 14:23:18,518] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.000888926562716924, 'weight_decay': 0.007, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7116,1.909387,0.5132,0.539765,0.5132,0.506182
2,1.986,1.750345,0.5375,0.55866,0.5375,0.536405
3,1.8371,1.711153,0.5395,0.552851,0.5395,0.535727
4,1.761,1.678709,0.548,0.561641,0.548,0.543939
5,1.7036,1.686324,0.5457,0.552143,0.5457,0.538085
6,1.6631,1.670006,0.5529,0.562745,0.5529,0.549231
7,1.6309,1.652248,0.5547,0.55597,0.5547,0.548241
8,1.6034,1.645011,0.5577,0.563904,0.5577,0.555612
9,1.584,1.627691,0.5583,0.556208,0.5583,0.553192
10,1.5622,1.636675,0.5575,0.557659,0.5575,0.554092


[I 2025-04-06 14:36:48,383] Trial 146 finished with value: 0.5540919585206836 and parameters: {'learning_rate': 0.000888926562716924, 'weight_decay': 0.007, 'warmup_steps': 25}. Best is trial 23 with value: 0.5563036082256244.


Trial 147 with params: {'learning_rate': 0.0005305781275245706, 'weight_decay': 0.004, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9833,2.067219,0.4991,0.516276,0.4991,0.49064
2,2.1114,1.820399,0.5313,0.546715,0.5313,0.529446


[I 2025-04-06 14:39:22,089] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0010941210159963052, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5984,1.872946,0.5169,0.548535,0.5169,0.509881
2,1.9545,1.742102,0.536,0.560894,0.536,0.534916
3,1.8145,1.707214,0.5378,0.554071,0.5378,0.534473
4,1.7419,1.679145,0.5484,0.564194,0.5484,0.544165
5,1.6851,1.689675,0.5433,0.552345,0.5433,0.535558
6,1.644,1.673656,0.5501,0.562046,0.5501,0.546697
7,1.6102,1.654312,0.5551,0.556733,0.5551,0.548578
8,1.581,1.645759,0.5568,0.56343,0.5568,0.554627


[I 2025-04-06 14:50:16,288] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0006931015834947695, 'weight_decay': 0.006, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8275,1.97086,0.5083,0.528053,0.5083,0.500596
2,2.0363,1.774988,0.5358,0.553391,0.5358,0.534527
3,1.8751,1.726127,0.5382,0.550719,0.5382,0.534196
4,1.7942,1.687042,0.5478,0.559656,0.5478,0.544088
5,1.7355,1.690608,0.5449,0.548815,0.5449,0.53755
6,1.6951,1.672207,0.5527,0.559728,0.5527,0.548505
7,1.6642,1.655625,0.5554,0.555574,0.5554,0.548905
8,1.6381,1.649549,0.5584,0.563268,0.5584,0.555796
9,1.6203,1.632837,0.5592,0.555835,0.5592,0.553361
10,1.6018,1.642009,0.5568,0.556152,0.5568,0.552872


[I 2025-04-06 15:03:47,471] Trial 149 finished with value: 0.5528721344486837 and parameters: {'learning_rate': 0.0006931015834947695, 'weight_decay': 0.006, 'warmup_steps': 26}. Best is trial 23 with value: 0.5563036082256244.


In [27]:
print(best_base_head)

BestRun(run_id='23', objective=0.5563036082256244, hyperparameters={'learning_rate': 0.0010450619114319906, 'weight_decay': 0.005, 'warmup_steps': 9}, run_summary=None)
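
With 150 trials per study, reading the log by eye is slow. The following is a minimal sketch, not part of the original notebook, of how the Optuna log lines could be summarized with the standard library alone. The function name `summarize` and the regular expressions are illustrative assumptions; the two sample lines are copied from the log above.

```python
import re

# Patterns for the two kinds of Optuna status lines that appear in the log.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize(log_text):
    """Return (finished trials with their objective, pruned trial ids, best trial id)."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


# Two lines taken from the log above, used here only as sample input.
sample_log = (
    "[I 2025-04-06 12:38:59,548] Trial 128 finished with value: 0.5545844517939621 "
    "and parameters: {'learning_rate': 0.0009597163685943792, 'weight_decay': 0.005, "
    "'warmup_steps': 10}. Best is trial 23 with value: 0.5563036082256244.\n"
    "[I 2025-04-06 12:41:43,352] Trial 129 pruned. \n"
)

finished, pruned, best = summarize(sample_log)
print(finished, pruned, best)
```

The best trial reported this way covers only the lines passed in, so on a partial log it may differ from the `Best is trial ...` suffix that Optuna prints.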


In [28]:
base.reset_seed()

## Search with distillation, fine-tuning the classification head of the pretrained model
Configuration of the individual training runs.

In [29]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.


In [30]:
def hp_space(trial):
    # Search space: optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
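
The notebook does not show how `base.DistilTrainer` uses the two added hyperparameters, so the following is only a sketch of one common knowledge-distillation loss, written in plain Python for a single example. It is an assumption, not the author's implementation: here `lambda_param` weights the softened teacher/student KL term against the hard-label cross-entropy, and `temperature` softens both distributions. The function names `softmax` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE(student, label) + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy, computed at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the scale of the soft term comparable across temperatures.
    return (1 - lambda_param) * ce + lambda_param * temperature ** 2 * kl


student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]
print(distillation_loss(student, teacher, label=0, lambda_param=0.5, temperature=3.0))
```

Under this assumed form, `lambda_param = 0` reduces to ordinary supervised training and `lambda_param = 1` trains on the teacher's soft targets only; whether the notebook's trainer follows the same convention would need to be checked in `base`.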

Optuna configuration.

In [31]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
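
To make the pruner settings more concrete, here is a rough sketch of the epochs at which a Hyperband pruner with `reduction_factor=2` can stop a trial. It follows the bracket count used by Optuna, `floor(log_eta(max_resource / min_resource)) + 1`, and treats each bracket as successive halving that starts at a later rung. The values `min_resource=2` and `max_resource=10` are illustrative assumptions; the notebook's actual `min_r` and `max_r` are defined elsewhere. The helper name `hyperband_rungs` is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """List, per bracket, the steps at which a trial may be pruned."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        # Bracket `bracket_id` skips the first `bracket_id` rungs (later first check).
        step = min_resource * reduction_factor ** bracket_id
        while step <= max_resource:
            rungs.append(step)
            step *= reduction_factor
        brackets.append(rungs)
    return brackets


# Illustrative values only; not taken from the notebook.
for bracket_id, rungs in enumerate(hyperband_rungs(2, 10, 2)):
    print(f"bracket {bracket_id}: may prune after epochs {rungs}")
```

With these assumed values the possible pruning points are epochs 2, 4, and 8, which is consistent with the logs in this section, where trials are pruned after 2, 4, or 8 epochs or run the full 10.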



Configuration of the distillation trainer for the individual training runs.

In [32]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.freeze_model(base.get_mobilenet(100)),
)

Search setup.

In [33]:
best_distill_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-06 15:03:48,219] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1843,2.575029,0.4648,0.482607,0.4648,0.448079
2,2.5131,2.288072,0.5094,0.523166,0.5094,0.500863


[I 2025-04-06 15:06:24,659] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6494,3.211404,0.3692,0.413381,0.3692,0.350376
2,2.997,2.748021,0.447,0.482306,0.447,0.434754
3,2.7223,2.545853,0.4673,0.482737,0.4673,0.45147
4,2.5858,2.438436,0.4836,0.49421,0.4836,0.470674


[I 2025-04-06 15:11:48,358] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8606,3.595057,0.2766,0.328943,0.2766,0.263657
2,3.3686,3.171144,0.3777,0.43117,0.3777,0.362715


[I 2025-04-06 15:14:33,637] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5864,3.118317,0.3867,0.424244,0.3867,0.367425
2,2.9191,2.668497,0.4579,0.489765,0.4579,0.445939
3,2.6591,2.483768,0.476,0.487718,0.476,0.4609
4,2.5338,2.385017,0.491,0.501028,0.491,0.478947


[I 2025-04-06 15:19:54,171] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7419,2.237232,0.5108,0.53569,0.5108,0.501369
2,2.2985,2.113336,0.5345,0.55202,0.5345,0.528506
3,2.2176,2.085725,0.5351,0.548099,0.5351,0.529025
4,2.1804,2.05366,0.5436,0.552264,0.5436,0.537308


[I 2025-04-06 15:25:14,266] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.621,2.178011,0.5237,0.550126,0.5237,0.516562
2,2.3004,2.131871,0.5334,0.564431,0.5334,0.530013
3,2.241,2.093284,0.5351,0.556476,0.5351,0.527781
4,2.2101,2.064809,0.5433,0.564353,0.5433,0.541082
5,2.1805,2.07572,0.5398,0.54619,0.5398,0.527832
6,2.1561,2.050431,0.5485,0.563317,0.5485,0.543632
7,2.1332,2.024572,0.5562,0.56023,0.5562,0.54955
8,2.1125,2.01402,0.5556,0.561865,0.5556,0.550036


[I 2025-04-06 15:36:06,440] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7373,2.238625,0.5109,0.535225,0.5109,0.501456
2,2.2994,2.113935,0.535,0.552652,0.535,0.529197
3,2.2182,2.086066,0.5353,0.548484,0.5353,0.529381
4,2.1808,2.053954,0.544,0.552517,0.544,0.537662


[I 2025-04-06 15:41:25,720] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6008,2.178602,0.5232,0.552285,0.5232,0.51652
2,2.3101,2.141065,0.5306,0.565388,0.5306,0.527641
3,2.2533,2.098928,0.5352,0.55733,0.5352,0.528254
4,2.2223,2.070603,0.5433,0.565881,0.5433,0.541268
5,2.1917,2.081161,0.5395,0.547131,0.5395,0.527719
6,2.1659,2.055438,0.5466,0.562301,0.5466,0.541657
7,2.1414,2.027412,0.5552,0.560109,0.5552,0.548652
8,2.1186,2.01599,0.555,0.56158,0.555,0.549421


[I 2025-04-06 15:52:08,229] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6873,3.299731,0.3538,0.398627,0.3538,0.336361
2,3.0815,2.842435,0.4349,0.476068,0.4349,0.421794
3,2.8007,2.626016,0.4553,0.472103,0.4553,0.43836
4,2.6539,2.509852,0.4729,0.484401,0.4729,0.459022


[I 2025-04-06 15:57:30,449] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6979,2.211345,0.5175,0.541543,0.5175,0.508715
2,2.2868,2.106978,0.5336,0.553998,0.5336,0.527982


[I 2025-04-06 16:00:07,798] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6565,2.231266,0.5126,0.546913,0.5126,0.505248
2,2.418,2.206879,0.5152,0.554413,0.5152,0.510965
3,2.3632,2.161523,0.522,0.549023,0.522,0.514978
4,2.3273,2.110779,0.5359,0.558868,0.5359,0.532918
5,2.2829,2.127044,0.528,0.549156,0.528,0.518082
6,2.2431,2.082278,0.5409,0.558608,0.5409,0.534325
7,2.2055,2.050366,0.5483,0.558721,0.5483,0.543078
8,2.1666,2.036999,0.5522,0.561021,0.5522,0.545499


[I 2025-04-06 16:10:47,939] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5998,2.185242,0.5212,0.552308,0.5212,0.514588
2,2.3287,2.151972,0.5255,0.560121,0.5255,0.52186


[I 2025-04-06 16:13:23,775] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.002376024890572026, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6109,2.186328,0.5198,0.550645,0.5198,0.513272
2,2.3291,2.152442,0.5256,0.560062,0.5256,0.521954
3,2.2743,2.110023,0.531,0.555587,0.531,0.524144
4,2.2427,2.079614,0.5427,0.56601,0.5427,0.540905
5,2.2099,2.089396,0.5388,0.548973,0.5388,0.527417
6,2.1816,2.062729,0.5446,0.562453,0.5446,0.539878
7,2.1545,2.032026,0.5525,0.558961,0.5525,0.546176
8,2.1284,2.019786,0.5539,0.560811,0.5539,0.547888


[I 2025-04-06 16:24:02,121] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.003064104261670614, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6298,2.213461,0.5133,0.550389,0.5133,0.506857
2,2.3788,2.17628,0.5229,0.555739,0.5229,0.518574


[I 2025-04-06 16:26:45,940] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.003645100232010343, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6675,2.235353,0.5121,0.546977,0.5121,0.504539
2,2.4259,2.213633,0.5142,0.555727,0.5142,0.510284
3,2.3708,2.166213,0.5206,0.547877,0.5206,0.513388
4,2.3343,2.114013,0.5352,0.557945,0.5352,0.53175


[I 2025-04-06 16:31:59,078] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0003173012733215097, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0924,2.506249,0.4737,0.489321,0.4737,0.457947
2,2.4694,2.252324,0.5141,0.526167,0.5141,0.506033


[I 2025-04-06 16:34:36,202] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0007549727386624846, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7622,2.251497,0.5085,0.531671,0.5085,0.49877
2,2.3067,2.119044,0.534,0.550123,0.534,0.528151
3,2.2224,2.08923,0.5368,0.548318,0.5368,0.530549
4,2.1837,2.056424,0.5441,0.552593,0.5441,0.537683
5,2.1528,2.064205,0.5421,0.544227,0.5421,0.530797
6,2.1332,2.047974,0.5508,0.558263,0.5508,0.54418
7,2.1168,2.032084,0.5546,0.5542,0.5546,0.5461
8,2.104,2.021628,0.5552,0.558969,0.5552,0.550154
9,2.0925,2.013932,0.5557,0.553946,0.5557,0.547991
10,2.0794,2.020549,0.5541,0.553466,0.5541,0.548508


[I 2025-04-06 16:48:13,597] Trial 16 finished with value: 0.5485078304447538 and parameters: {'learning_rate': 0.0007549727386624846, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 16 with value: 0.5485078304447538.


Trial 17 with params: {'learning_rate': 0.0005427978294491535, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8703,2.319849,0.4977,0.516987,0.4977,0.486237
2,2.3489,2.151836,0.529,0.541765,0.529,0.522867
3,2.2497,2.110524,0.5324,0.542307,0.5324,0.525385
4,2.2044,2.073552,0.5429,0.549579,0.5429,0.5361


[I 2025-04-06 16:53:34,473] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0016751020144302176, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6126,2.177149,0.5241,0.549358,0.5241,0.517405
2,2.292,2.121442,0.534,0.561004,0.534,0.529858
3,2.2305,2.088442,0.536,0.556641,0.536,0.529254
4,2.1996,2.05995,0.5441,0.564274,0.5441,0.541395
5,2.1707,2.070512,0.541,0.546684,0.541,0.529037
6,2.1477,2.046141,0.5481,0.560738,0.5481,0.542818
7,2.1262,2.0223,0.5563,0.558935,0.5563,0.54933
8,2.1072,2.012522,0.556,0.562108,0.556,0.550715


[I 2025-04-06 17:04:17,039] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0037270936015570953, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6544,2.237303,0.5119,0.548025,0.5119,0.504078
2,2.4313,2.218105,0.5136,0.556524,0.5136,0.509833
3,2.3761,2.169445,0.5197,0.547479,0.5197,0.512535
4,2.3394,2.116493,0.5347,0.557693,0.5347,0.531169


[I 2025-04-06 17:09:41,789] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0009636177397386483, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7124,2.2203,0.5146,0.538082,0.5146,0.505385
2,2.2901,2.108384,0.5343,0.553503,0.5343,0.528636
3,2.2137,2.082787,0.537,0.552846,0.537,0.530884
4,2.1783,2.05167,0.5452,0.556361,0.5452,0.539467
5,2.1486,2.059867,0.5429,0.545451,0.5429,0.531079
6,2.129,2.042624,0.5503,0.559094,0.5503,0.543758
7,2.1118,2.025149,0.5558,0.556277,0.5558,0.547604
8,2.0981,2.015166,0.5566,0.561835,0.5566,0.551854


[I 2025-04-06 17:20:32,884] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00026350562892114474, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1861,2.597186,0.4617,0.477832,0.4617,0.444561
2,2.53,2.304005,0.5056,0.518132,0.5056,0.496477
3,2.3729,2.216392,0.5119,0.518887,0.5119,0.501732
4,2.3033,2.158407,0.5287,0.534042,0.5287,0.520703


[I 2025-04-06 17:25:51,274] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0006773029065026066, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8164,2.273969,0.5062,0.528802,0.5062,0.49569
2,2.3198,2.128494,0.5323,0.546357,0.5323,0.526088
3,2.2302,2.095044,0.536,0.546886,0.536,0.529401
4,2.1892,2.061136,0.5441,0.551598,0.5441,0.537467


[I 2025-04-06 17:31:09,555] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0018746308241276409, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6112,2.177171,0.5237,0.550438,0.5237,0.516578
2,2.3006,2.132209,0.5329,0.564183,0.5329,0.529565
3,2.2416,2.093486,0.5352,0.556452,0.5352,0.52789
4,2.2108,2.06508,0.5431,0.564687,0.5431,0.541026
5,2.1811,2.075985,0.5394,0.545874,0.5394,0.527425
6,2.1567,2.050685,0.5479,0.562801,0.5479,0.543054
7,2.1337,2.024691,0.5563,0.560336,0.5563,0.549631
8,2.1128,2.014117,0.556,0.562405,0.556,0.550463
9,2.0906,2.00275,0.556,0.556441,0.556,0.548586
10,2.0652,2.004477,0.5546,0.554034,0.5546,0.54934


[I 2025-04-06 17:44:35,373] Trial 23 finished with value: 0.5493400148572276 and parameters: {'learning_rate': 0.0018746308241276409, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 23 with value: 0.5493400148572276.


Trial 24 with params: {'learning_rate': 0.0007721421740081912, 'weight_decay': 0.006, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7688,2.249835,0.5083,0.532309,0.5083,0.498599
2,2.3055,2.118119,0.5339,0.550412,0.5339,0.528067
3,2.2216,2.0886,0.5359,0.547776,0.5359,0.529732
4,2.183,2.055938,0.5442,0.552628,0.5442,0.537727
5,2.1522,2.06371,0.5425,0.544622,0.5425,0.531105
6,2.1326,2.047412,0.5503,0.557819,0.5503,0.543661
7,2.1161,2.031394,0.555,0.554349,0.555,0.546386
8,2.1033,2.02098,0.5556,0.559556,0.5556,0.550614


[I 2025-04-06 17:55:21,257] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.002498665327711353, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6187,2.190858,0.519,0.550835,0.519,0.512376
2,2.3377,2.156171,0.5245,0.559119,0.5245,0.520857
3,2.2832,2.115049,0.5301,0.555039,0.5301,0.523164
4,2.2513,2.083027,0.5423,0.564481,0.5423,0.5401
5,2.2175,2.092707,0.5384,0.549697,0.5384,0.527309
6,2.188,2.065154,0.5443,0.562724,0.5443,0.539463
7,2.1599,2.034016,0.5525,0.559481,0.5525,0.546349
8,2.1324,2.021455,0.553,0.560383,0.553,0.546886


[I 2025-04-06 18:06:02,283] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.00251169249784892, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6017,2.189602,0.5202,0.5527,0.5202,0.513656
2,2.3376,2.155758,0.5257,0.560376,0.5257,0.522317


[I 2025-04-06 18:08:48,847] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0005564539967914668, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8883,2.318535,0.4974,0.516417,0.4974,0.485736
2,2.3473,2.149932,0.5288,0.541462,0.5288,0.522639
3,2.2479,2.108989,0.5327,0.542572,0.5327,0.525642
4,2.2028,2.072248,0.5433,0.550225,0.5433,0.53661
5,2.1696,2.077365,0.5394,0.539333,0.5394,0.527821
6,2.1489,2.060019,0.5488,0.553548,0.5488,0.541286
7,2.1327,2.045128,0.5523,0.55038,0.5523,0.54319
8,2.1202,2.03479,0.5537,0.556401,0.5537,0.548293


[I 2025-04-06 18:19:56,346] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0006710071932930264, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8133,2.274859,0.506,0.528463,0.506,0.495551
2,2.3204,2.129102,0.533,0.547165,0.533,0.526894
3,2.2307,2.095436,0.5354,0.546044,0.5354,0.528724
4,2.1896,2.061491,0.5444,0.551832,0.5444,0.53778
5,2.158,2.068372,0.5412,0.542506,0.5412,0.529805
6,2.138,2.05197,0.5496,0.555897,0.5496,0.542659
7,2.1218,2.036572,0.5527,0.551254,0.5527,0.543864
8,2.1092,2.026089,0.5553,0.558995,0.5553,0.550228
9,2.0983,2.018417,0.5549,0.55278,0.5549,0.547107
10,2.086,2.025248,0.5539,0.55279,0.5539,0.548013


[I 2025-04-06 18:33:30,784] Trial 28 finished with value: 0.548012936092902 and parameters: {'learning_rate': 0.0006710071932930264, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 3.0}. Best is trial 23 with value: 0.5493400148572276.


Trial 29 with params: {'learning_rate': 0.0008765974320770966, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7326,2.231374,0.5112,0.534597,0.5112,0.501637
2,2.2954,2.111345,0.5342,0.551892,0.5342,0.52817
3,2.216,2.084547,0.5349,0.548623,0.5349,0.528818
4,2.1794,2.052758,0.5445,0.554101,0.5445,0.538317
5,2.1493,2.060956,0.5436,0.546405,0.5436,0.531716
6,2.1298,2.044267,0.5499,0.557972,0.5499,0.543227
7,2.1129,2.027544,0.5554,0.555609,0.5554,0.547206
8,2.0997,2.017275,0.5565,0.561077,0.5565,0.551638
9,2.0871,2.009527,0.5568,0.55539,0.5568,0.549366
10,2.0727,2.015689,0.5542,0.553706,0.5542,0.548723


[I 2025-04-06 18:47:16,238] Trial 29 finished with value: 0.5487225389395013 and parameters: {'learning_rate': 0.0008765974320770966, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 23 with value: 0.5493400148572276.


Trial 30 with params: {'learning_rate': 0.000311584806759745, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1008,2.514373,0.472,0.487414,0.472,0.455993
2,2.4748,2.256905,0.5129,0.524938,0.5129,0.504877


[I 2025-04-06 18:49:56,717] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.00033346851249045406, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0901,2.490634,0.4761,0.493032,0.4761,0.460992
2,2.4579,2.241828,0.5166,0.529016,0.5166,0.508805
3,2.3228,2.172274,0.5215,0.5296,0.5215,0.512756
4,2.2628,2.122192,0.5333,0.537989,0.5333,0.525445


[I 2025-04-06 18:55:17,829] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0009606485201495702, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7073,2.219944,0.5145,0.538077,0.5145,0.505322
2,2.2899,2.108245,0.5337,0.552803,0.5337,0.527987
3,2.2135,2.082704,0.5368,0.552792,0.5368,0.530768
4,2.1782,2.051604,0.5453,0.556487,0.5453,0.539582
5,2.1486,2.059812,0.5429,0.545531,0.5429,0.531115
6,2.129,2.042628,0.5503,0.559091,0.5503,0.543743
7,2.1118,2.025186,0.5558,0.556335,0.5558,0.547631
8,2.0981,2.015191,0.5567,0.562062,0.5567,0.552034
9,2.0849,2.007335,0.5577,0.556319,0.5577,0.550332
10,2.0696,2.013186,0.5551,0.554421,0.5551,0.549589


[I 2025-04-06 19:08:32,661] Trial 32 finished with value: 0.549589137680664 and parameters: {'learning_rate': 0.0009606485201495702, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 32 with value: 0.549589137680664.


Trial 33 with params: {'learning_rate': 0.001300731823690408, 'weight_decay': 0.007, 'warmup_steps': 10, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.65,2.192189,0.5206,0.545553,0.5206,0.513059
2,2.2838,2.1075,0.5352,0.557902,0.5352,0.530214
3,2.2159,2.082489,0.5352,0.553834,0.5352,0.528893
4,2.1836,2.05354,0.5473,0.563123,0.5473,0.542898
5,2.1549,2.061907,0.5418,0.545857,0.5418,0.529672
6,2.1341,2.041079,0.5503,0.56078,0.5503,0.544175
7,2.115,2.020751,0.5566,0.55765,0.5566,0.5491
8,2.0993,2.011649,0.5565,0.562051,0.5565,0.551367
9,2.0828,2.00302,0.5577,0.55746,0.5577,0.550636
10,2.0638,2.007369,0.5556,0.554819,0.5556,0.550101


[I 2025-04-06 19:22:14,009] Trial 33 finished with value: 0.5501007381753635 and parameters: {'learning_rate': 0.001300731823690408, 'weight_decay': 0.007, 'warmup_steps': 10, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 34 with params: {'learning_rate': 0.0011742337070766063, 'weight_decay': 0.007, 'warmup_steps': 16, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6758,2.201527,0.5191,0.543376,0.5191,0.511023
2,2.2844,2.106552,0.5342,0.556373,0.5342,0.528945
3,2.2135,2.081847,0.5352,0.553331,0.5352,0.529073
4,2.1803,2.05224,0.5463,0.561086,0.5463,0.54158
5,2.1513,2.060298,0.5425,0.54576,0.5425,0.530379
6,2.1311,2.040877,0.5505,0.560126,0.5505,0.544181
7,2.1128,2.021674,0.5562,0.557538,0.5562,0.548628
8,2.0978,2.012331,0.5565,0.562168,0.5565,0.551641
9,2.0826,2.004045,0.5569,0.556486,0.5569,0.549857
10,2.065,2.008971,0.5551,0.554405,0.5551,0.549629


[I 2025-04-06 19:35:40,522] Trial 34 finished with value: 0.5496290779382731 and parameters: {'learning_rate': 0.0011742337070766063, 'weight_decay': 0.007, 'warmup_steps': 16, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 35 with params: {'learning_rate': 0.0003803835266203788, 'weight_decay': 0.008, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0406,2.439401,0.4834,0.500548,0.4834,0.469464
2,2.4241,2.213155,0.52,0.532216,0.52,0.512859
3,2.2996,2.152069,0.5264,0.533605,0.5264,0.518051
4,2.244,2.105996,0.5363,0.541781,0.5363,0.52866


[I 2025-04-06 19:41:08,687] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0011799791650979616, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6676,2.200345,0.5196,0.543658,0.5196,0.511559
2,2.2839,2.106309,0.5344,0.556659,0.5344,0.529179
3,2.2134,2.0817,0.5351,0.553045,0.5351,0.528902
4,2.1803,2.052163,0.5458,0.560517,0.5458,0.541062
5,2.1514,2.060263,0.5427,0.545885,0.5427,0.530537
6,2.1312,2.040818,0.5503,0.560072,0.5503,0.544023
7,2.1128,2.021554,0.5564,0.557668,0.5564,0.548836
8,2.0979,2.012266,0.5566,0.562104,0.5566,0.551675
9,2.0826,2.003955,0.5569,0.556499,0.5569,0.549847
10,2.0649,2.008852,0.5552,0.55448,0.5552,0.549726


[I 2025-04-06 19:54:23,374] Trial 36 finished with value: 0.5497257629933443 and parameters: {'learning_rate': 0.0011799791650979616, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 37 with params: {'learning_rate': 0.0019223581065850085, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.61,2.177505,0.5238,0.55093,0.5238,0.516614
2,2.3029,2.134589,0.5318,0.564254,0.5318,0.528612
3,2.2445,2.09477,0.5352,0.556307,0.5352,0.528026
4,2.2136,2.066482,0.5434,0.565761,0.5434,0.541493
5,2.1837,2.077307,0.5394,0.546102,0.5394,0.527478
6,2.1589,2.051913,0.5476,0.562369,0.5476,0.542615
7,2.1356,2.025323,0.5558,0.559828,0.5558,0.549073
8,2.1142,2.014563,0.5553,0.561488,0.5553,0.549724


[I 2025-04-06 20:05:14,089] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00241491520646606, 'weight_decay': 0.005, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6256,2.188787,0.5203,0.551824,0.5203,0.513709
2,2.3325,2.154154,0.5257,0.560573,0.5257,0.522045
3,2.2776,2.111926,0.5305,0.555221,0.5305,0.523689
4,2.2457,2.08083,0.5422,0.565124,0.5422,0.540263
5,2.2126,2.090591,0.5383,0.549254,0.5383,0.526981
6,2.1838,2.06368,0.5447,0.562658,0.5447,0.539936
7,2.1564,2.03282,0.552,0.558732,0.552,0.545744
8,2.1298,2.020414,0.553,0.560274,0.553,0.546981


[I 2025-04-06 20:16:10,655] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0004600036792601093, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.943,2.368953,0.492,0.509979,0.492,0.479544
2,2.3796,2.176812,0.525,0.537292,0.525,0.518555


[I 2025-04-06 20:18:57,492] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0009108969019015958, 'weight_decay': 0.006, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.723,2.226522,0.5123,0.535718,0.5123,0.5027
2,2.293,2.109911,0.5341,0.552059,0.5341,0.528087
3,2.2148,2.083694,0.536,0.550725,0.536,0.529922
4,2.1788,2.052199,0.5451,0.555066,0.5451,0.539046
5,2.1489,2.060401,0.5437,0.546784,0.5437,0.531995
6,2.1293,2.043523,0.55,0.55834,0.55,0.543338
7,2.1123,2.026533,0.5556,0.555783,0.5556,0.547396
8,2.0989,2.016381,0.557,0.562043,0.557,0.552259
9,2.0861,2.008569,0.5571,0.555875,0.5571,0.549734
10,2.0713,2.014586,0.5546,0.554148,0.5546,0.54911


[I 2025-04-06 20:32:12,858] Trial 40 finished with value: 0.5491096892692515 and parameters: {'learning_rate': 0.0009108969019015958, 'weight_decay': 0.006, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 41 with params: {'learning_rate': 0.0015826614327043955, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6213,2.179397,0.5249,0.548599,0.5249,0.517945
2,2.2891,2.11694,0.5366,0.562436,0.5366,0.532313
3,2.2262,2.086622,0.5347,0.554415,0.5347,0.52805
4,2.1951,2.058044,0.5442,0.563518,0.5442,0.541138
5,2.1663,2.068055,0.5414,0.546332,0.5414,0.529353
6,2.1439,2.04442,0.5494,0.561657,0.5494,0.543958
7,2.123,2.021543,0.5564,0.558222,0.5564,0.549234
8,2.1049,2.012009,0.5558,0.561608,0.5558,0.550582


[I 2025-04-06 20:43:03,078] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.001918227305777917, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6101,2.177478,0.5236,0.550684,0.5236,0.516449
2,2.3027,2.134387,0.5321,0.564569,0.5321,0.528962
3,2.2442,2.09466,0.5353,0.556303,0.5353,0.528059
4,2.2133,2.066356,0.5435,0.565694,0.5435,0.541584
5,2.1835,2.077212,0.5394,0.546183,0.5394,0.527506
6,2.1587,2.051798,0.5476,0.562494,0.5476,0.54265
7,2.1354,2.025259,0.5559,0.559956,0.5559,0.549191
8,2.1141,2.014524,0.5554,0.561653,0.5554,0.549853
9,2.0915,2.002887,0.5565,0.55685,0.5565,0.549025
10,2.0655,2.004432,0.5549,0.554428,0.5549,0.549605


[I 2025-04-06 20:56:42,285] Trial 42 finished with value: 0.5496054214622758 and parameters: {'learning_rate': 0.001918227305777917, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 43 with params: {'learning_rate': 0.00037516568345567503, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0244,2.439075,0.4824,0.498994,0.4824,0.46844
2,2.4249,2.214581,0.5201,0.532708,0.5201,0.513075


[I 2025-04-06 20:59:24,802] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0035395517796668716, 'weight_decay': 0.006, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6484,2.230084,0.5127,0.546869,0.5127,0.505332
2,2.4163,2.205282,0.5158,0.554601,0.5158,0.511562
3,2.3618,2.160574,0.5215,0.547999,0.5215,0.514403
4,2.326,2.110214,0.5362,0.55917,0.5362,0.533252


[I 2025-04-06 21:04:44,443] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0010192158482357939, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7027,2.21466,0.5164,0.540484,0.5164,0.50748
2,2.2878,2.107388,0.5348,0.554466,0.5348,0.529103
3,2.213,2.082178,0.5362,0.552987,0.5362,0.530322
4,2.1783,2.051491,0.5455,0.55771,0.5455,0.540106
5,2.1489,2.059633,0.5432,0.545975,0.5432,0.531272
6,2.1291,2.041856,0.5508,0.559752,0.5508,0.544309
7,2.1116,2.023926,0.5558,0.556477,0.5558,0.547766
8,2.0976,2.014156,0.5562,0.561946,0.5562,0.551609


[I 2025-04-06 21:15:19,172] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0013638301628415182, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6328,2.187483,0.5229,0.547415,0.5229,0.515481
2,2.2838,2.108345,0.5356,0.559034,0.5356,0.530906
3,2.2174,2.082987,0.5346,0.553323,0.5346,0.528309
4,2.1856,2.054202,0.546,0.562489,0.546,0.542003
5,2.1569,2.062946,0.5416,0.545622,0.5416,0.529365
6,2.1359,2.041421,0.5506,0.561351,0.5506,0.544584
7,2.1165,2.020584,0.5561,0.557489,0.5561,0.548815
8,2.1002,2.011488,0.5564,0.562042,0.5564,0.55124
9,2.0832,2.00265,0.5575,0.557178,0.5575,0.550419
10,2.0635,2.006733,0.5548,0.554233,0.5548,0.549406


[I 2025-04-06 21:28:29,570] Trial 46 finished with value: 0.5494064316334957 and parameters: {'learning_rate': 0.0013638301628415182, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 47 with params: {'learning_rate': 0.001317003447997662, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6552,2.191895,0.5197,0.543765,0.5197,0.512034
2,2.2843,2.108102,0.5359,0.55888,0.5359,0.531006
3,2.2165,2.082788,0.5352,0.553853,0.5352,0.528883
4,2.1842,2.053868,0.5465,0.562599,0.5465,0.542188
5,2.1555,2.062315,0.5422,0.546346,0.5422,0.530058
6,2.1347,2.0412,0.5504,0.560852,0.5504,0.544271
7,2.1155,2.020762,0.5562,0.557075,0.5562,0.548644
8,2.0995,2.01163,0.5566,0.562237,0.5566,0.55148


[I 2025-04-06 21:38:59,360] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.001977299807177304, 'weight_decay': 0.01, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6237,2.179445,0.5221,0.550747,0.5221,0.515048
2,2.3064,2.137853,0.5297,0.563584,0.5297,0.526736


[I 2025-04-06 21:41:37,805] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0015898708923464957, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6345,2.180435,0.5232,0.547446,0.5232,0.516365
2,2.2901,2.117952,0.5364,0.562481,0.5364,0.532205
3,2.2269,2.087048,0.535,0.555009,0.535,0.528336
4,2.1957,2.058432,0.5448,0.563929,0.5448,0.541731
5,2.1669,2.068472,0.5417,0.547178,0.5417,0.529731
6,2.1444,2.044667,0.549,0.56145,0.549,0.543633
7,2.1234,2.02166,0.5563,0.558348,0.5563,0.54915
8,2.1052,2.012101,0.5561,0.56202,0.5561,0.550872
9,2.0858,2.002297,0.5559,0.556121,0.5559,0.548814
10,2.0636,2.00533,0.5552,0.554654,0.5552,0.549945


[I 2025-04-06 21:54:56,151] Trial 49 finished with value: 0.5499452461069926 and parameters: {'learning_rate': 0.0015898708923464957, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 6.5}. Best is trial 33 with value: 0.5501007381753635.


Trial 50 with params: {'learning_rate': 0.0009737474826606135, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.716,2.219849,0.5148,0.538016,0.5148,0.505515
2,2.29,2.108348,0.5342,0.553645,0.5342,0.528572
3,2.2136,2.082756,0.5369,0.552972,0.5369,0.530849
4,2.1784,2.051698,0.5458,0.557301,0.5458,0.540171
5,2.1487,2.059831,0.5427,0.545285,0.5427,0.530896
6,2.129,2.042471,0.55,0.558898,0.55,0.543489
7,2.1117,2.024951,0.5557,0.556152,0.5557,0.54745
8,2.098,2.015006,0.5562,0.561607,0.5562,0.551479
9,2.0846,2.007123,0.5572,0.555757,0.5572,0.549798
10,2.0692,2.012871,0.5553,0.554602,0.5553,0.549733


[I 2025-04-06 22:08:14,185] Trial 50 finished with value: 0.5497327418534281 and parameters: {'learning_rate': 0.0009737474826606135, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 51 with params: {'learning_rate': 0.0012097977396930784, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6749,2.199369,0.5192,0.543559,0.5192,0.511228
2,2.2845,2.106896,0.535,0.557508,0.535,0.529929
3,2.2142,2.082058,0.5352,0.553545,0.5352,0.52904
4,2.1812,2.05264,0.5462,0.561124,0.5462,0.541552
5,2.1523,2.060725,0.5416,0.544759,0.5416,0.529466
6,2.1319,2.040897,0.5501,0.560034,0.5501,0.543869
7,2.1133,2.021345,0.5562,0.557405,0.5562,0.548689
8,2.0982,2.012102,0.557,0.562749,0.557,0.552146
9,2.0826,2.003733,0.5572,0.556781,0.5572,0.550104
10,2.0646,2.008476,0.5556,0.554898,0.5556,0.550076


[I 2025-04-06 22:21:37,186] Trial 51 finished with value: 0.5500759431296587 and parameters: {'learning_rate': 0.0012097977396930784, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 52 with params: {'learning_rate': 0.0012767749007847082, 'weight_decay': 0.004, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6615,2.19448,0.5207,0.545619,0.5207,0.513123
2,2.2842,2.107478,0.5351,0.557981,0.5351,0.530199
3,2.2155,2.082442,0.5349,0.553449,0.5349,0.528606
4,2.183,2.053395,0.547,0.562695,0.547,0.542563
5,2.1542,2.061631,0.5426,0.546613,0.5426,0.530487
6,2.1336,2.041054,0.5501,0.56035,0.5501,0.543961
7,2.1146,2.020885,0.5561,0.557089,0.5561,0.548535
8,2.099,2.011765,0.5567,0.562236,0.5567,0.551614
9,2.0827,2.003195,0.5576,0.557436,0.5576,0.550509
10,2.064,2.007656,0.5553,0.554565,0.5553,0.549828


[I 2025-04-06 22:34:57,854] Trial 52 finished with value: 0.5498278663775852 and parameters: {'learning_rate': 0.0012767749007847082, 'weight_decay': 0.004, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 53 with params: {'learning_rate': 0.0018502176798139323, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6199,2.177846,0.5239,0.550053,0.5239,0.516765
2,2.2999,2.131377,0.5333,0.564364,0.5333,0.529933
3,2.2405,2.093031,0.5355,0.557036,0.5355,0.52823
4,2.2095,2.064552,0.5433,0.564345,0.5433,0.541051
5,2.18,2.075422,0.5397,0.54599,0.5397,0.527678
6,2.1557,2.050192,0.5485,0.562968,0.5485,0.543539
7,2.1329,2.02445,0.5561,0.560091,0.5561,0.549467
8,2.1122,2.013944,0.5558,0.561976,0.5558,0.550255


[I 2025-04-06 22:45:50,293] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0010682978352424197, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7018,2.210956,0.5171,0.541196,0.5171,0.508382
2,2.2869,2.107074,0.5331,0.553616,0.5331,0.527469
3,2.213,2.082032,0.5353,0.552862,0.5353,0.52937
4,2.1788,2.051686,0.5461,0.5592,0.5461,0.541
5,2.1495,2.059701,0.5436,0.546684,0.5436,0.531549
6,2.1296,2.04147,0.5507,0.55986,0.5507,0.544331
7,2.1118,2.023081,0.5559,0.557027,0.5559,0.548079
8,2.0975,2.013468,0.5565,0.562166,0.5565,0.551829
9,2.0832,2.005399,0.5572,0.556034,0.5572,0.549868
10,2.0668,2.010754,0.5544,0.553776,0.5544,0.548973


[I 2025-04-06 22:59:04,927] Trial 54 finished with value: 0.5489725432622325 and parameters: {'learning_rate': 0.0010682978352424197, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 33 with value: 0.5501007381753635.


Trial 55 with params: {'learning_rate': 0.0014819839400398945, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6393,2.183656,0.5224,0.547595,0.5224,0.51559
2,2.287,2.112998,0.5365,0.56047,0.5365,0.531818
3,2.2222,2.084986,0.5348,0.553739,0.5348,0.528159
4,2.1906,2.056351,0.546,0.564143,0.546,0.542692
5,2.162,2.065776,0.5416,0.546448,0.5416,0.529439
6,2.1402,2.042903,0.5503,0.561671,0.5503,0.544511
7,2.1199,2.020987,0.5559,0.557128,0.5559,0.548543
8,2.1026,2.011705,0.556,0.561591,0.556,0.550776


[I 2025-04-06 23:09:50,825] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00023818519644301298, 'weight_decay': 0.002, 'warmup_steps': 17, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2523,2.657585,0.4517,0.469326,0.4517,0.433679
2,2.57,2.33795,0.5022,0.516289,0.5022,0.493001
3,2.3994,2.239901,0.5086,0.514628,0.5086,0.497509
4,2.3242,2.177748,0.5241,0.528968,0.5241,0.515485


[I 2025-04-06 23:15:14,021] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0009618184819517546, 'weight_decay': 0.006, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7312,2.222607,0.5134,0.536715,0.5134,0.503957
2,2.2915,2.109038,0.5337,0.552389,0.5337,0.527801


[I 2025-04-06 23:17:49,539] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0008004183817540932, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7631,2.244901,0.5102,0.535088,0.5102,0.500591
2,2.3027,2.116063,0.5345,0.551243,0.5345,0.528462
3,2.2199,2.087338,0.5348,0.547194,0.5348,0.528586
4,2.1819,2.054917,0.5438,0.552092,0.5438,0.537339


[I 2025-04-06 23:23:11,659] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.002117448929979038, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6257,2.181792,0.521,0.551545,0.521,0.51419
2,2.3142,2.144106,0.5297,0.564886,0.5297,0.526682
3,2.2573,2.101021,0.5348,0.557795,0.5348,0.527851
4,2.226,2.072348,0.5432,0.565847,0.5432,0.541024
5,2.1951,2.082808,0.5394,0.547476,0.5394,0.527703
6,2.1687,2.056983,0.5461,0.562105,0.5461,0.541232
7,2.1438,2.028311,0.5547,0.559906,0.5547,0.548085
8,2.1204,2.016706,0.5547,0.561254,0.5547,0.54902


[I 2025-04-06 23:33:51,992] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0004500482809310711, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9519,2.375902,0.4906,0.508555,0.4906,0.477994
2,2.3841,2.180471,0.5246,0.536735,0.5246,0.518092


[I 2025-04-06 23:36:31,056] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0029954620091949143, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6381,2.211757,0.513,0.549275,0.513,0.506359
2,2.3743,2.173433,0.5229,0.555305,0.5229,0.518559
3,2.3204,2.135944,0.528,0.552342,0.528,0.520878
4,2.2869,2.095712,0.5413,0.563929,0.5413,0.538977


[I 2025-04-06 23:41:44,875] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.000725829610496161, 'weight_decay': 0.002, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8153,2.264116,0.5062,0.529753,0.5062,0.496202
2,2.3138,2.123478,0.5331,0.54819,0.5331,0.526973


[I 2025-04-06 23:44:28,633] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0009470520922484956, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7255,2.223349,0.5131,0.536681,0.5131,0.503773
2,2.2916,2.109152,0.5333,0.551732,0.5333,0.527335
3,2.2143,2.083244,0.5371,0.552704,0.5371,0.530986
4,2.1786,2.051938,0.5452,0.555978,0.5452,0.539296
5,2.1488,2.0601,0.5426,0.545343,0.5426,0.530831
6,2.1292,2.042953,0.5503,0.558777,0.5503,0.543698
7,2.112,2.025613,0.5553,0.55577,0.5553,0.547106
8,2.0984,2.015584,0.5569,0.56214,0.5569,0.552264
9,2.0852,2.007724,0.5576,0.556154,0.5576,0.550205
10,2.07,2.013594,0.5554,0.554796,0.5554,0.549906


[I 2025-04-06 23:57:54,383] Trial 63 finished with value: 0.5499064176861697 and parameters: {'learning_rate': 0.0009470520922484956, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.5}. Best is trial 33 with value: 0.5501007381753635.


Trial 64 with params: {'learning_rate': 0.001109737697013382, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.685,2.206445,0.5178,0.541835,0.5178,0.509407
2,2.2853,2.106553,0.5332,0.554465,0.5332,0.527503


[I 2025-04-07 00:00:34,488] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0004415440076209003, 'weight_decay': 0.0, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9747,2.385421,0.4898,0.507784,0.4898,0.476946
2,2.3896,2.184592,0.5239,0.536,0.5239,0.517152
3,2.2763,2.132405,0.531,0.53851,0.531,0.522868
4,2.2254,2.090561,0.5388,0.545305,0.5388,0.531639


[I 2025-04-07 00:06:00,705] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0006628215954847815, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8265,2.278705,0.5047,0.527086,0.5047,0.493981
2,2.3227,2.130598,0.533,0.547012,0.533,0.526801


[I 2025-04-07 00:08:40,273] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 8.465954991738309e-05, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7129,3.33159,0.3477,0.392895,0.3477,0.329889
2,3.1082,2.869888,0.43,0.472449,0.43,0.41665
3,2.8226,2.647898,0.4532,0.472131,0.4532,0.43623
4,2.672,2.528625,0.4713,0.483331,0.4713,0.456919


[I 2025-04-07 00:13:59,991] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0032066183423216615, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6458,2.219547,0.5113,0.546972,0.5113,0.504296
2,2.3906,2.184669,0.5224,0.55573,0.5224,0.517787
3,2.3366,2.145377,0.5259,0.551487,0.5259,0.518581
4,2.3023,2.100955,0.5406,0.562375,0.5406,0.537932


[I 2025-04-07 00:19:24,649] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.001644898117607956, 'weight_decay': 0.004, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6271,2.178932,0.5236,0.548927,0.5236,0.516779
2,2.2916,2.120552,0.5353,0.562404,0.5353,0.531321
3,2.2294,2.088088,0.5353,0.55589,0.5353,0.528594
4,2.1983,2.059477,0.5441,0.564015,0.5441,0.54128
5,2.1694,2.069839,0.5412,0.546487,0.5412,0.529127
6,2.1466,2.045643,0.5484,0.561108,0.5484,0.543179
7,2.1252,2.0221,0.5562,0.558628,0.5562,0.549195
8,2.1065,2.012415,0.5557,0.561549,0.5557,0.550367


[I 2025-04-07 00:29:53,881] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.001986410280966956, 'weight_decay': 0.006, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6207,2.179306,0.5221,0.550732,0.5221,0.514997
2,2.3067,2.138168,0.5297,0.563668,0.5297,0.526701


[I 2025-04-07 00:32:31,749] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0025444206310488338, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6094,2.191562,0.519,0.552238,0.519,0.512471
2,2.3402,2.157036,0.525,0.559397,0.525,0.521374
3,2.2861,2.116712,0.5298,0.554785,0.5298,0.522887
4,2.2541,2.084075,0.5419,0.563652,0.5419,0.539546
5,2.22,2.093781,0.5378,0.549718,0.5378,0.526816
6,2.1902,2.065932,0.5438,0.56267,0.5438,0.538979
7,2.1617,2.034615,0.5525,0.55986,0.5525,0.546478
8,2.1337,2.022003,0.5527,0.560178,0.5527,0.546584


[I 2025-04-07 00:43:05,563] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0008982279386355596, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7339,2.22919,0.5117,0.535189,0.5117,0.502079
2,2.2943,2.110662,0.5333,0.551247,0.5333,0.527317


[I 2025-04-07 00:45:46,086] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0008088447211890071, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7774,2.245551,0.5094,0.533993,0.5094,0.499614
2,2.3031,2.116053,0.5343,0.550921,0.5343,0.528065
3,2.2199,2.087311,0.5352,0.547916,0.5352,0.529078
4,2.1818,2.054846,0.544,0.552557,0.544,0.537615


[I 2025-04-07 00:51:02,455] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0018701299195407175, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6249,2.178409,0.5236,0.550121,0.5236,0.516416
2,2.3011,2.132625,0.5328,0.564415,0.5328,0.529555
3,2.2418,2.093649,0.5352,0.556505,0.5352,0.527876
4,2.2108,2.0652,0.5437,0.565265,0.5437,0.541542
5,2.1811,2.076046,0.5395,0.545974,0.5395,0.527526
6,2.1567,2.050761,0.5481,0.56302,0.5481,0.543224
7,2.1337,2.024719,0.5565,0.560557,0.5565,0.549881
8,2.1128,2.014162,0.5558,0.562309,0.5558,0.550296


[I 2025-04-07 01:01:38,511] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0011185661541979522, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6925,2.206638,0.5175,0.541582,0.5175,0.509165
2,2.2857,2.106806,0.5332,0.554701,0.5332,0.527534
3,2.2132,2.081907,0.5354,0.552914,0.5354,0.52926
4,2.1794,2.0519,0.5463,0.560135,0.5463,0.541397
5,2.1503,2.059942,0.5432,0.546436,0.5432,0.530995
6,2.1302,2.041123,0.5516,0.561304,0.5516,0.545341
7,2.1121,2.022341,0.5555,0.556634,0.5555,0.547755
8,2.0976,2.012861,0.5562,0.562177,0.5562,0.551516
9,2.0828,2.004715,0.5571,0.556379,0.5571,0.549949
10,2.0659,2.009858,0.5544,0.553898,0.5544,0.548978


[I 2025-04-07 01:14:52,348] Trial 75 finished with value: 0.5489778933564194 and parameters: {'learning_rate': 0.0011185661541979522, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 33 with value: 0.5501007381753635.


Trial 76 with params: {'learning_rate': 0.002268693243966363, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6079,2.18326,0.5223,0.552844,0.5223,0.515784
2,2.3222,2.148929,0.5266,0.561702,0.5266,0.52313


[I 2025-04-07 01:17:38,617] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0012221587004880151, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6673,2.197943,0.5201,0.544268,0.5201,0.512122
2,2.284,2.106778,0.5349,0.557352,0.5349,0.529845
3,2.2143,2.082021,0.5354,0.553694,0.5354,0.52923
4,2.1814,2.052705,0.5462,0.561357,0.5462,0.541661
5,2.1526,2.060816,0.5418,0.545508,0.5418,0.5296
6,2.1322,2.040875,0.5501,0.560134,0.5501,0.543885
7,2.1135,2.021224,0.5562,0.557429,0.5562,0.548764
8,2.0983,2.012001,0.5569,0.562822,0.5569,0.552072
9,2.0826,2.003603,0.5576,0.557201,0.5576,0.550512
10,2.0645,2.008297,0.5558,0.555026,0.5558,0.55027


[I 2025-04-07 01:30:48,867] Trial 77 finished with value: 0.5502699382600482 and parameters: {'learning_rate': 0.0012221587004880151, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 77 with value: 0.5502699382600482.


Trial 78 with params: {'learning_rate': 0.0008010151333946589, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7737,2.246215,0.5095,0.533818,0.5095,0.499608
2,2.3035,2.11641,0.5345,0.551403,0.5345,0.52842
3,2.2202,2.087532,0.535,0.547371,0.535,0.528764
4,2.182,2.055025,0.5439,0.552349,0.5439,0.537436


[I 2025-04-07 01:36:09,525] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0008612279509342259, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7463,2.234841,0.5111,0.535636,0.5111,0.501607
2,2.2972,2.112429,0.5339,0.551834,0.5339,0.527872
3,2.2169,2.085187,0.5351,0.54861,0.5351,0.529046
4,2.1799,2.053219,0.5437,0.553087,0.5437,0.537498


[I 2025-04-07 01:41:25,777] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0031146825316494525, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6352,2.215677,0.5119,0.549058,0.5119,0.505224
2,2.383,2.179116,0.5223,0.555508,0.5223,0.518
3,2.3292,2.140923,0.5266,0.552183,0.5266,0.519464
4,2.2953,2.098576,0.5412,0.563631,0.5412,0.538751


[I 2025-04-07 01:46:47,788] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00046113530927947713, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9526,2.370429,0.4921,0.510382,0.4921,0.479619
2,2.3802,2.176991,0.5251,0.537449,0.5251,0.518717


[I 2025-04-07 01:49:28,915] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0015393901730468209, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6308,2.181224,0.5222,0.546387,0.5222,0.515222
2,2.2882,2.115267,0.5372,0.562547,0.5372,0.532784
3,2.2245,2.085905,0.5344,0.553718,0.5344,0.527753
4,2.1931,2.057343,0.5448,0.563281,0.5448,0.541595
5,2.1644,2.067101,0.5415,0.546889,0.5415,0.529426
6,2.1423,2.043747,0.55,0.561593,0.55,0.544335
7,2.1217,2.021248,0.5563,0.557747,0.5563,0.548969
8,2.1039,2.011855,0.5559,0.561924,0.5559,0.550712
9,2.0851,2.002259,0.5561,0.556058,0.5561,0.548983
10,2.0635,2.005568,0.5548,0.554272,0.5548,0.549494


[I 2025-04-07 02:02:42,982] Trial 82 finished with value: 0.5494935780170815 and parameters: {'learning_rate': 0.0015393901730468209, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 77 with value: 0.5502699382600482.


Trial 83 with params: {'learning_rate': 0.0006578058722600243, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8228,2.279271,0.5042,0.526152,0.5042,0.49352
2,2.3231,2.131073,0.533,0.546955,0.533,0.526825


[I 2025-04-07 02:05:25,391] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 5.286423289644344e-05, 'weight_decay': 0.008, 'warmup_steps': 19, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8666,3.611925,0.2688,0.320674,0.2688,0.256401
2,3.388,3.195921,0.3736,0.428666,0.3736,0.359054
3,3.104,2.948576,0.4132,0.444741,0.4132,0.394858
4,2.9293,2.804868,0.4352,0.456606,0.4352,0.418676


[I 2025-04-07 02:10:52,488] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1533,2.556923,0.4675,0.483968,0.4675,0.450928
2,2.5022,2.279719,0.5101,0.523195,0.5101,0.501535


[I 2025-04-07 02:13:28,509] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0002681159956916346, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2042,2.597476,0.4615,0.479132,0.4615,0.4445
2,2.5283,2.301219,0.507,0.51973,0.507,0.49793
3,2.3702,2.213726,0.5127,0.519724,0.5127,0.502631
4,2.3007,2.155914,0.5294,0.53494,0.5294,0.521449


[I 2025-04-07 02:18:44,743] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.002450630407210917, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.613,2.188742,0.5204,0.551893,0.5204,0.513895
2,2.3341,2.154685,0.5256,0.560306,0.5256,0.522096


[I 2025-04-07 02:21:21,638] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.000128882436656059, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5534,3.055027,0.3969,0.43198,0.3969,0.377438
2,2.8649,2.612415,0.4659,0.494174,0.4659,0.45418


[I 2025-04-07 02:24:00,449] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0006725671321371811, 'weight_decay': 0.005, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.814,2.27463,0.5063,0.528832,0.5063,0.49589
2,2.3203,2.128956,0.5327,0.54677,0.5327,0.526546
3,2.2306,2.095352,0.5357,0.54641,0.5357,0.529042
4,2.1895,2.061405,0.5443,0.551632,0.5443,0.537654


[I 2025-04-07 02:29:18,653] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0015407224719452383, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6321,2.181309,0.5223,0.546415,0.5223,0.515274
2,2.2883,2.115402,0.5374,0.562849,0.5374,0.533052
3,2.2245,2.085958,0.5345,0.553878,0.5345,0.527872
4,2.1932,2.057388,0.5447,0.563308,0.5447,0.541491
5,2.1645,2.067144,0.5415,0.546765,0.5415,0.529459
6,2.1424,2.043808,0.55,0.561586,0.55,0.544337
7,2.1217,2.021283,0.5565,0.558101,0.5565,0.549257
8,2.1039,2.011874,0.556,0.561933,0.556,0.550793
9,2.0851,2.002266,0.5559,0.555834,0.5559,0.548767
10,2.0635,2.005567,0.5549,0.554406,0.5549,0.549627


[I 2025-04-07 02:42:46,232] Trial 90 finished with value: 0.5496269380557683 and parameters: {'learning_rate': 0.0015407224719452383, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 77 with value: 0.5502699382600482.


Trial 91 with params: {'learning_rate': 0.0012457733887986462, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6568,2.195622,0.5206,0.545192,0.5206,0.512767
2,2.2836,2.106728,0.5348,0.557475,0.5348,0.52984
3,2.2145,2.08199,0.5357,0.554025,0.5357,0.529465
4,2.1819,2.052871,0.5466,0.561997,0.5466,0.542149
5,2.1531,2.061058,0.5416,0.545395,0.5416,0.52938
6,2.1327,2.040869,0.5501,0.560381,0.5501,0.543978
7,2.1139,2.021007,0.5559,0.556999,0.5559,0.548385
8,2.0985,2.011857,0.5569,0.562635,0.5569,0.551925
9,2.0826,2.003369,0.5574,0.557037,0.5574,0.550279
10,2.0642,2.007971,0.556,0.555354,0.556,0.550545


[I 2025-04-07 02:56:24,675] Trial 91 finished with value: 0.550545336158113 and parameters: {'learning_rate': 0.0012457733887986462, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 91 with value: 0.550545336158113.


Trial 92 with params: {'learning_rate': 0.0008846159350465202, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7565,2.233405,0.511,0.534445,0.511,0.501321
2,2.2967,2.111899,0.5333,0.551392,0.5333,0.527266
3,2.2165,2.084815,0.536,0.550537,0.536,0.529947
4,2.1797,2.052969,0.5437,0.553322,0.5437,0.537608


[I 2025-04-07 03:01:38,622] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.000746111405492015, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7714,2.254298,0.5076,0.530577,0.5076,0.497816
2,2.3082,2.120119,0.534,0.549951,0.534,0.528137
3,2.2233,2.089862,0.5371,0.548523,0.5371,0.530821
4,2.1842,2.056965,0.5432,0.55139,0.5432,0.536774


[I 2025-04-07 03:06:55,714] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0015839417228953388, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6264,2.179831,0.5237,0.54758,0.5237,0.516792
2,2.2894,2.117277,0.5367,0.562415,0.5367,0.532366
3,2.2264,2.086738,0.5347,0.554407,0.5347,0.528023
4,2.1952,2.058151,0.5443,0.563366,0.5443,0.541195
5,2.1664,2.068186,0.5417,0.546643,0.5417,0.529645
6,2.1441,2.044479,0.5491,0.561448,0.5491,0.543679
7,2.1231,2.021583,0.5563,0.55827,0.5563,0.54915
8,2.105,2.012038,0.5559,0.56171,0.5559,0.550641
9,2.0857,2.002274,0.5561,0.556272,0.5561,0.549004
10,2.0636,2.005358,0.5552,0.554609,0.5552,0.549952


[I 2025-04-07 03:20:18,957] Trial 94 finished with value: 0.5499521333942312 and parameters: {'learning_rate': 0.0015839417228953388, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 91 with value: 0.550545336158113.


Trial 95 with params: {'learning_rate': 0.0033929625600142947, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6338,2.224042,0.5118,0.547141,0.5118,0.504567
2,2.4041,2.194841,0.5196,0.555553,0.5196,0.515125


[I 2025-04-07 03:23:03,330] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.003471433469266581, 'weight_decay': 0.003, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6483,2.22793,0.5116,0.547807,0.5116,0.504388
2,2.4111,2.200989,0.5175,0.554801,0.5175,0.513159
3,2.3567,2.157442,0.5224,0.548711,0.5224,0.515164
4,2.3212,2.108186,0.5371,0.559841,0.5371,0.53421


[I 2025-04-07 03:28:23,943] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.002808343859880905, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6202,2.202837,0.5154,0.550793,0.5154,0.508793
2,2.3594,2.165227,0.5221,0.554846,0.5221,0.517995


[I 2025-04-07 03:31:04,423] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0008539448347494187, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7485,2.236003,0.511,0.536071,0.511,0.501627
2,2.2978,2.11282,0.5344,0.552149,0.5344,0.528349
3,2.2172,2.085408,0.535,0.548313,0.535,0.528992
4,2.1801,2.053365,0.5439,0.553089,0.5439,0.537715


[I 2025-04-07 03:36:18,225] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 8.710007471084877e-05, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7031,3.313637,0.3502,0.395454,0.3502,0.332843
2,3.0913,2.851147,0.4331,0.47513,0.4331,0.420096
3,2.8071,2.631902,0.4546,0.471967,0.4546,0.437699
4,2.6585,2.514392,0.4729,0.484201,0.4729,0.458653


[I 2025-04-07 03:41:31,431] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2141,2.599975,0.4605,0.478321,0.4605,0.443472
2,2.5292,2.30139,0.5072,0.520051,0.5072,0.49817
3,2.3702,2.213561,0.5124,0.519526,0.5124,0.502297
4,2.3005,2.155649,0.5291,0.534638,0.5291,0.521136


[I 2025-04-07 03:46:51,461] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00031502971397332646, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0978,2.510135,0.4731,0.489075,0.4731,0.457435
2,2.4719,2.25432,0.5135,0.525553,0.5135,0.505435


[I 2025-04-07 03:49:32,746] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0014621522126747307, 'weight_decay': 0.001, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6381,2.184215,0.5225,0.547409,0.5225,0.515686
2,2.2864,2.112117,0.5362,0.559739,0.5362,0.531479
3,2.2213,2.084629,0.5347,0.553571,0.5347,0.528084
4,2.1897,2.055985,0.5465,0.564372,0.5465,0.543017
5,2.1611,2.065265,0.5416,0.546271,0.5416,0.529417
6,2.1394,2.042638,0.5499,0.560782,0.5499,0.543992
7,2.1193,2.020897,0.5564,0.557675,0.5564,0.549078
8,2.1022,2.011644,0.5563,0.56194,0.5563,0.551094
9,2.0842,2.002387,0.5563,0.556082,0.5563,0.549193
10,2.0634,2.006001,0.5547,0.554204,0.5547,0.549375


[I 2025-04-07 04:03:02,214] Trial 102 finished with value: 0.5493751850912254 and parameters: {'learning_rate': 0.0014621522126747307, 'weight_decay': 0.001, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 91 with value: 0.550545336158113.


Trial 103 with params: {'learning_rate': 0.002666225461731558, 'weight_decay': 0.005, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6229,2.197425,0.518,0.553897,0.518,0.511648
2,2.3494,2.160851,0.5239,0.556725,0.5239,0.51968


[I 2025-04-07 04:05:40,103] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0013251662775994246, 'weight_decay': 0.003, 'warmup_steps': 14, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6528,2.191267,0.5199,0.544327,0.5199,0.512376
2,2.2843,2.108203,0.5358,0.558812,0.5358,0.530954
3,2.2167,2.082838,0.5349,0.553418,0.5349,0.528538
4,2.1845,2.053945,0.5463,0.562537,0.5463,0.542114
5,2.1558,2.062464,0.5417,0.545959,0.5417,0.529604
6,2.1349,2.041254,0.5502,0.560689,0.5502,0.54411
7,2.1156,2.02075,0.5564,0.557309,0.5564,0.548876
8,2.0997,2.011615,0.5566,0.562276,0.5566,0.551495
9,2.083,2.002885,0.5575,0.557503,0.5575,0.550495
10,2.0637,2.007128,0.5555,0.554859,0.5555,0.55007


[I 2025-04-07 04:18:56,525] Trial 104 finished with value: 0.5500696565528851 and parameters: {'learning_rate': 0.0013251662775994246, 'weight_decay': 0.003, 'warmup_steps': 14, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 91 with value: 0.550545336158113.


Trial 105 with params: {'learning_rate': 0.0001220905192290103, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5575,3.080242,0.392,0.426969,0.392,0.372269
2,2.8888,2.638749,0.4631,0.493101,0.4631,0.451214


[I 2025-04-07 04:21:32,915] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0002596014872391645, 'weight_decay': 0.001, 'warmup_steps': 6, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.196,2.605867,0.4603,0.476968,0.4603,0.443098
2,2.5357,2.308833,0.5051,0.518026,0.5051,0.495873
3,2.3766,2.219742,0.5112,0.518073,0.5112,0.500928
4,2.3062,2.161147,0.5282,0.533459,0.5282,0.520117


[I 2025-04-07 04:26:51,888] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0005292729978742482, 'weight_decay': 0.004, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8885,2.328202,0.4966,0.514671,0.4966,0.484809
2,2.3538,2.155649,0.5289,0.541168,0.5289,0.522698


[I 2025-04-07 04:29:38,639] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0006173002977665656, 'weight_decay': 0.01, 'warmup_steps': 8, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8334,2.290919,0.5019,0.522919,0.5019,0.490902
2,2.3304,2.136956,0.5305,0.544395,0.5305,0.524566


[I 2025-04-07 04:32:21,063] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.000932499644011594, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7143,2.223353,0.5133,0.536486,0.5133,0.503941
2,2.2914,2.109072,0.5338,0.552331,0.5338,0.527983
3,2.2142,2.083163,0.536,0.550905,0.536,0.529848
4,2.1785,2.05187,0.5452,0.555877,0.5452,0.539303
5,2.1487,2.060097,0.5431,0.545945,0.5431,0.531322
6,2.1291,2.043096,0.5501,0.558701,0.5501,0.54347
7,2.112,2.025893,0.5554,0.555825,0.5554,0.547221
8,2.0985,2.015835,0.5572,0.562454,0.5572,0.552565
9,2.0855,2.008011,0.5572,0.555936,0.5572,0.549827
10,2.0705,2.013923,0.5548,0.554176,0.5548,0.549283


[I 2025-04-07 04:45:42,572] Trial 109 finished with value: 0.5492829565946316 and parameters: {'learning_rate': 0.000932499644011594, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 91 with value: 0.550545336158113.


Trial 110 with params: {'learning_rate': 0.0007868535305842857, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7604,2.246455,0.51,0.534236,0.51,0.500359
2,2.3036,2.116796,0.5344,0.551189,0.5344,0.528571
3,2.2205,2.087801,0.536,0.548147,0.536,0.529821
4,2.1823,2.055314,0.5441,0.552551,0.5441,0.537625


[I 2025-04-07 04:51:13,573] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.000820337662225412, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7563,2.241261,0.5111,0.536432,0.5111,0.501675
2,2.3007,2.114704,0.5348,0.551835,0.5348,0.528723
3,2.2188,2.086525,0.5355,0.54831,0.5355,0.529401
4,2.1811,2.054267,0.5435,0.552162,0.5435,0.537202


[I 2025-04-07 04:56:33,175] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0013306302933194095, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.661,2.191767,0.5197,0.543916,0.5197,0.51215
2,2.2849,2.108632,0.5361,0.559281,0.5361,0.53126
3,2.2171,2.083068,0.535,0.55378,0.535,0.528703
4,2.1848,2.054137,0.5462,0.562327,0.5462,0.54197
5,2.1561,2.062649,0.5417,0.545949,0.5417,0.529576
6,2.1352,2.041351,0.5502,0.560665,0.5502,0.544092
7,2.1158,2.020769,0.5566,0.557596,0.5566,0.549104
8,2.0998,2.011643,0.5562,0.561833,0.5562,0.55108
9,2.083,2.002911,0.5569,0.556776,0.5569,0.549836
10,2.0637,2.007095,0.5554,0.554764,0.5554,0.549984


[I 2025-04-07 05:09:57,838] Trial 112 finished with value: 0.5499837175649144 and parameters: {'learning_rate': 0.0013306302933194095, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 91 with value: 0.550545336158113.


Trial 113 with params: {'learning_rate': 0.0017301364234223531, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6125,2.176874,0.5249,0.54988,0.5249,0.518026
2,2.2942,2.124451,0.5338,0.561263,0.5338,0.529618
3,2.2335,2.089754,0.5362,0.55707,0.5362,0.529414
4,2.2026,2.061263,0.5443,0.564423,0.5443,0.541792
5,2.1735,2.071996,0.5414,0.546751,0.5414,0.529332
6,2.1501,2.047326,0.5478,0.560619,0.5478,0.542652
7,2.1282,2.02291,0.5554,0.558313,0.5554,0.548455
8,2.1087,2.01291,0.5559,0.562014,0.5559,0.550532


[I 2025-04-07 05:20:37,160] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0010737426266900908, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6961,2.20998,0.5178,0.541958,0.5178,0.509154
2,2.2864,2.106879,0.5331,0.553643,0.5331,0.527326
3,2.2129,2.081916,0.5354,0.552892,0.5354,0.529499
4,2.1788,2.051648,0.5461,0.559188,0.5461,0.541033
5,2.1495,2.059661,0.5434,0.546438,0.5434,0.531338
6,2.1296,2.041382,0.5509,0.560045,0.5509,0.544514
7,2.1118,2.02299,0.5556,0.556762,0.5556,0.547786
8,2.0975,2.013384,0.5567,0.562514,0.5567,0.551991
9,2.0832,2.005301,0.5571,0.556031,0.5571,0.549808
10,2.0667,2.010642,0.5544,0.553794,0.5544,0.548948


[I 2025-04-07 05:33:44,598] Trial 114 finished with value: 0.5489483562943094 and parameters: {'learning_rate': 0.0010737426266900908, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 91 with value: 0.550545336158113.


Trial 115 with params: {'learning_rate': 0.004447300089343906, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6976,2.276694,0.5054,0.549975,0.5054,0.498286
2,2.4905,2.26932,0.5068,0.560832,0.5068,0.50319


[I 2025-04-07 05:36:24,140] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0024603806207838673, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6286,2.19048,0.5199,0.551753,0.5199,0.513395
2,2.3357,2.155502,0.5245,0.559188,0.5245,0.520859
3,2.2809,2.113768,0.5304,0.555303,0.5304,0.523588
4,2.2489,2.08209,0.5422,0.564391,0.5422,0.540062
5,2.2154,2.091829,0.5386,0.549474,0.5386,0.527385
6,2.1862,2.064557,0.5442,0.562327,0.5442,0.539329
7,2.1584,2.033511,0.552,0.558867,0.552,0.5458
8,2.1313,2.021029,0.5529,0.560101,0.5529,0.546788


[I 2025-04-07 05:47:06,718] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0022358032126326565, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6197,2.183562,0.5213,0.551658,0.5213,0.514499
2,2.3208,2.148269,0.5278,0.562796,0.5278,0.524395


[I 2025-04-07 05:49:52,220] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0008431228672940533, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7628,2.239121,0.5103,0.534849,0.5103,0.500659
2,2.2996,2.113792,0.5341,0.551736,0.5341,0.528034
3,2.218,2.085951,0.5352,0.548491,0.5352,0.529086
4,2.1806,2.053775,0.544,0.55267,0.544,0.537712


[I 2025-04-07 05:55:14,009] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0008929731721877744, 'weight_decay': 0.001, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7354,2.22994,0.5117,0.535206,0.5117,0.502149
2,2.2947,2.110877,0.5334,0.551359,0.5334,0.527364
3,2.2156,2.084274,0.5361,0.550536,0.5361,0.530068
4,2.1792,2.052601,0.5445,0.554247,0.5445,0.538343


[I 2025-04-07 06:00:37,886] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0033018056604635507, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6307,2.22108,0.5114,0.547698,0.5114,0.504433
2,2.3969,2.189226,0.5227,0.557732,0.5227,0.518189


[I 2025-04-07 06:03:20,351] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7141,3.32949,0.3471,0.392465,0.3471,0.329453
2,3.1053,2.865965,0.4306,0.473007,0.4306,0.417297
3,2.8191,2.644086,0.4538,0.472138,0.4538,0.436698
4,2.6686,2.525019,0.4721,0.484058,0.4721,0.457757


[I 2025-04-07 06:08:33,159] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00010692115265466455, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6335,3.18373,0.3749,0.417816,0.3749,0.356021
2,2.9727,2.722312,0.4496,0.483297,0.4496,0.437215


[I 2025-04-07 06:11:12,594] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0009486969324828044, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7161,2.222093,0.5136,0.536899,0.5136,0.504412
2,2.2909,2.108791,0.5334,0.551729,0.5334,0.527428
3,2.214,2.083019,0.5371,0.5527,0.5371,0.531028
4,2.1784,2.051798,0.5454,0.556315,0.5454,0.539573
5,2.1487,2.05999,0.5429,0.545539,0.5429,0.531092
6,2.1291,2.042855,0.5502,0.558852,0.5502,0.543606
7,2.1119,2.025517,0.5553,0.555876,0.5553,0.54714
8,2.0983,2.015501,0.557,0.562216,0.557,0.552358
9,2.0851,2.007652,0.5575,0.556113,0.5575,0.550101
10,2.07,2.013519,0.5555,0.554819,0.5555,0.549986


[I 2025-04-07 06:24:21,306] Trial 123 finished with value: 0.5499862143785127 and parameters: {'learning_rate': 0.0009486969324828044, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 91 with value: 0.550545336158113.


Trial 124 with params: {'learning_rate': 0.0008145228229438566, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7507,2.241273,0.5104,0.535472,0.5104,0.500972
2,2.3007,2.114803,0.5347,0.551678,0.5347,0.528688
3,2.2189,2.086576,0.5353,0.548166,0.5353,0.529241
4,2.1812,2.054362,0.5434,0.551893,0.5434,0.537047


[I 2025-04-07 06:29:45,833] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0012577953851738311, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6835,2.197487,0.5193,0.543891,0.5193,0.511484
2,2.2854,2.107852,0.5345,0.557575,0.5345,0.529593
3,2.2156,2.08263,0.5351,0.553909,0.5351,0.52876
4,2.1828,2.053386,0.5463,0.561844,0.5463,0.54187
5,2.1539,2.061539,0.542,0.546027,0.542,0.529868
6,2.1333,2.041128,0.5504,0.560822,0.5504,0.544356
7,2.1143,2.021093,0.5559,0.557049,0.5559,0.54838
8,2.0988,2.011908,0.5566,0.562458,0.5566,0.551593
9,2.0827,2.003386,0.5574,0.557103,0.5574,0.550233
10,2.0642,2.00791,0.555,0.5543,0.555,0.549546


[I 2025-04-07 06:42:58,345] Trial 125 finished with value: 0.5495455465454367 and parameters: {'learning_rate': 0.0012577953851738311, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 91 with value: 0.550545336158113.


Trial 126 with params: {'learning_rate': 0.0013023312833655271, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6613,2.19316,0.5198,0.544071,0.5198,0.512083
2,2.2845,2.107982,0.5357,0.55856,0.5357,0.530786
3,2.2162,2.082726,0.5348,0.553452,0.5348,0.528492
4,2.1838,2.053734,0.5468,0.562777,0.5468,0.542466
5,2.1551,2.062095,0.5423,0.546289,0.5423,0.530103
6,2.1343,2.041165,0.5505,0.560945,0.5505,0.54435
7,2.1152,2.020828,0.5566,0.55756,0.5566,0.549051
8,2.0993,2.011706,0.5567,0.562341,0.5567,0.55162
9,2.0829,2.003045,0.5576,0.55742,0.5576,0.550519
10,2.0639,2.007393,0.5552,0.554441,0.5552,0.54969


[I 2025-04-07 06:56:21,096] Trial 126 finished with value: 0.5496901769807039 and parameters: {'learning_rate': 0.0013023312833655271, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 91 with value: 0.550545336158113.


Trial 127 with params: {'learning_rate': 0.002929540061952398, 'weight_decay': 0.007, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.636,2.20907,0.5125,0.548088,0.5125,0.505835
2,2.3692,2.17048,0.522,0.554357,0.522,0.517682
3,2.3154,2.133172,0.5276,0.551936,0.5276,0.520454
4,2.2821,2.094112,0.5421,0.564734,0.5421,0.539697
5,2.2442,2.105514,0.5342,0.549407,0.5342,0.523647
6,2.2107,2.072043,0.5443,0.561669,0.5443,0.538384
7,2.1788,2.040366,0.5493,0.558243,0.5493,0.543693
8,2.1465,2.027753,0.552,0.560088,0.552,0.545576


[I 2025-04-07 07:07:10,783] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0014027866687194453, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6477,2.187314,0.5224,0.547282,0.5224,0.515321
2,2.2855,2.110252,0.5362,0.559264,0.5362,0.531478
3,2.2192,2.083838,0.535,0.554499,0.535,0.528738
4,2.1874,2.055099,0.5463,0.563089,0.5463,0.542332
5,2.1587,2.063972,0.5427,0.547113,0.5427,0.530558
6,2.1374,2.042017,0.5506,0.561378,0.5506,0.544613
7,2.1177,2.020765,0.5564,0.557568,0.5564,0.549035
8,2.1011,2.011579,0.5566,0.562333,0.5566,0.551495
9,2.0836,2.002562,0.5567,0.556273,0.5567,0.549578
10,2.0634,2.00644,0.5548,0.554071,0.5548,0.549336


[I 2025-04-07 07:20:36,693] Trial 128 finished with value: 0.5493358666196104 and parameters: {'learning_rate': 0.0014027866687194453, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 91 with value: 0.550545336158113.


Trial 129 with params: {'learning_rate': 0.0006712937288776745, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8334,2.277822,0.5055,0.52803,0.5055,0.494774
2,2.322,2.129825,0.5325,0.546621,0.5325,0.526212


[I 2025-04-07 07:23:13,528] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0015666322276299412, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.636,2.181028,0.5233,0.547841,0.5233,0.516384
2,2.2894,2.116842,0.5369,0.56239,0.5369,0.532504
3,2.2259,2.086603,0.5347,0.55417,0.5347,0.528023
4,2.1946,2.057957,0.5447,0.563603,0.5447,0.541534
5,2.1658,2.067891,0.5414,0.546859,0.5414,0.529417
6,2.1435,2.044281,0.5496,0.561568,0.5496,0.544098
7,2.1226,2.021482,0.5564,0.558107,0.5564,0.54916
8,2.1046,2.012006,0.556,0.562003,0.556,0.550786


[I 2025-04-07 07:34:05,288] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00123191106070283, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.684,2.198956,0.5194,0.543953,0.5194,0.511556
2,2.2852,2.107444,0.5345,0.557236,0.5345,0.52954
3,2.215,2.082384,0.5348,0.553241,0.5348,0.528494
4,2.182,2.053035,0.5461,0.561423,0.5461,0.541616
5,2.1531,2.061149,0.5416,0.545438,0.5416,0.529441
6,2.1326,2.041011,0.5506,0.560752,0.5506,0.544446
7,2.1138,2.021236,0.5559,0.557078,0.5559,0.548424
8,2.0985,2.012049,0.5569,0.562822,0.5569,0.552006
9,2.0827,2.003586,0.5574,0.557089,0.5574,0.55029
10,2.0644,2.008202,0.5553,0.554699,0.5553,0.549823


[I 2025-04-07 07:47:20,237] Trial 131 finished with value: 0.5498232520649929 and parameters: {'learning_rate': 0.00123191106070283, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 91 with value: 0.550545336158113.


Trial 132 with params: {'learning_rate': 0.003412570624582479, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6578,2.226828,0.5124,0.548972,0.5124,0.505514
2,2.4072,2.197861,0.5192,0.555504,0.5192,0.5147


[I 2025-04-07 07:50:03,035] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0010217584977757848, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.719,2.216237,0.5159,0.540101,0.5159,0.506877
2,2.2889,2.107878,0.5344,0.554445,0.5344,0.528807
3,2.2135,2.082474,0.5365,0.553896,0.5365,0.530683
4,2.1786,2.051732,0.5457,0.558193,0.5457,0.540358
5,2.1491,2.059781,0.5432,0.546035,0.5432,0.531333
6,2.1293,2.041912,0.5508,0.559496,0.5508,0.544297
7,2.1117,2.023977,0.5557,0.556229,0.5557,0.547553
8,2.0977,2.014165,0.556,0.561721,0.556,0.551375
9,2.0838,2.00621,0.5571,0.555858,0.5571,0.549791
10,2.0679,2.011752,0.5547,0.553995,0.5547,0.549213


[I 2025-04-07 08:03:34,992] Trial 133 finished with value: 0.5492127269615128 and parameters: {'learning_rate': 0.0010217584977757848, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 91 with value: 0.550545336158113.


Trial 134 with params: {'learning_rate': 0.0016071244824103939, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6264,2.179415,0.5235,0.547749,0.5235,0.516663
2,2.2902,2.118485,0.5362,0.562664,0.5362,0.532108
3,2.2275,2.087241,0.5355,0.555938,0.5355,0.528902
4,2.1964,2.058671,0.5443,0.563728,0.5443,0.541374
5,2.1676,2.068817,0.5418,0.546718,0.5418,0.529745
6,2.145,2.044918,0.5487,0.561088,0.5487,0.543354
7,2.1239,2.021781,0.556,0.558243,0.556,0.54887
8,2.1055,2.012161,0.5558,0.561641,0.5558,0.550566


[I 2025-04-07 08:14:28,380] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.000590370796460756, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8549,2.302032,0.5006,0.520441,0.5006,0.489345
2,2.3372,2.142209,0.53,0.5433,0.53,0.524165
3,2.2416,2.104012,0.5329,0.542808,0.5329,0.525907
4,2.198,2.06836,0.5433,0.550673,0.5433,0.536783


[I 2025-04-07 08:19:46,871] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0011102213069721187, 'weight_decay': 0.009000000000000001, 'warmup_steps': 27, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7032,2.208238,0.5173,0.54123,0.5173,0.508821
2,2.2865,2.107119,0.5329,0.554141,0.5329,0.527188
3,2.2134,2.08207,0.5352,0.552976,0.5352,0.529139
4,2.1795,2.051986,0.5462,0.559965,0.5462,0.541293
5,2.1502,2.059981,0.5433,0.546553,0.5433,0.531174
6,2.1302,2.041213,0.5514,0.56108,0.5514,0.545101
7,2.1121,2.022502,0.5554,0.556498,0.5554,0.547586
8,2.0976,2.012979,0.5562,0.562022,0.5562,0.551522
9,2.0829,2.004844,0.5568,0.556101,0.5568,0.54964
10,2.066,2.010014,0.5544,0.553789,0.5544,0.548947


[I 2025-04-07 08:33:09,461] Trial 136 finished with value: 0.5489473914134734 and parameters: {'learning_rate': 0.0011102213069721187, 'weight_decay': 0.009000000000000001, 'warmup_steps': 27, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 91 with value: 0.550545336158113.


Trial 137 with params: {'learning_rate': 0.0015601023409106901, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6452,2.181951,0.5231,0.547777,0.5231,0.516203
2,2.2897,2.116924,0.5367,0.562321,0.5367,0.532261
3,2.2258,2.08665,0.5348,0.5544,0.5348,0.528148
4,2.1944,2.057952,0.5446,0.563605,0.5446,0.541426
5,2.1656,2.067823,0.5414,0.546868,0.5414,0.529443
6,2.1433,2.044259,0.5495,0.561452,0.5495,0.543994
7,2.1225,2.021477,0.5564,0.558115,0.5564,0.549151
8,2.1045,2.012025,0.556,0.561963,0.556,0.550788


[I 2025-04-07 08:43:54,284] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.002819055822915683, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6158,2.202839,0.5159,0.55122,0.5159,0.509241
2,2.3599,2.165424,0.5218,0.554439,0.5218,0.517669
3,2.3063,2.128129,0.528,0.552389,0.528,0.520729
4,2.2736,2.091139,0.5428,0.564386,0.5428,0.540172
5,2.2368,2.101635,0.5361,0.549964,0.5361,0.525383
6,2.2045,2.07029,0.5437,0.561601,0.5437,0.538152
7,2.1736,2.038583,0.5506,0.559517,0.5506,0.54497
8,2.1427,2.025942,0.552,0.559599,0.552,0.545497


[I 2025-04-07 08:54:52,240] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0012045708434121336, 'weight_decay': 0.004, 'warmup_steps': 18, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6742,2.199601,0.5196,0.544084,0.5196,0.511535
2,2.2844,2.106818,0.535,0.557663,0.535,0.529945
3,2.2141,2.082001,0.5349,0.553294,0.5349,0.528735
4,2.1811,2.052578,0.5462,0.561077,0.5462,0.541539
5,2.1522,2.060633,0.5418,0.544903,0.5418,0.529629
6,2.1318,2.040886,0.55,0.559923,0.55,0.543792
7,2.1133,2.021388,0.5563,0.557573,0.5563,0.548792
8,2.0981,2.012129,0.5569,0.562611,0.5569,0.552032
9,2.0826,2.003771,0.5572,0.556797,0.5572,0.550137
10,2.0647,2.008542,0.5557,0.554948,0.5557,0.550155


[I 2025-04-07 09:08:32,655] Trial 139 finished with value: 0.5501551278739559 and parameters: {'learning_rate': 0.0012045708434121336, 'weight_decay': 0.004, 'warmup_steps': 18, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 91 with value: 0.550545336158113.


Trial 140 with params: {'learning_rate': 0.0006570479090201322, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8309,2.280707,0.5039,0.525888,0.5039,0.493087
2,2.3239,2.131479,0.5328,0.546821,0.5328,0.526613
3,2.2327,2.096909,0.535,0.545271,0.535,0.528221
4,2.191,2.062666,0.5444,0.551931,0.5444,0.537847


[I 2025-04-07 09:13:56,028] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0012102102599269135, 'weight_decay': 0.01, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6917,2.200921,0.5186,0.543175,0.5186,0.510566
2,2.2856,2.107431,0.5341,0.556457,0.5341,0.528965
3,2.2147,2.082374,0.5349,0.553223,0.5349,0.528699
4,2.1815,2.052877,0.5463,0.561442,0.5463,0.541698
5,2.1526,2.060899,0.5417,0.545234,0.5417,0.52956
6,2.1321,2.041002,0.5505,0.560502,0.5505,0.544347
7,2.1135,2.02143,0.5562,0.55742,0.5562,0.548682
8,2.0983,2.01216,0.557,0.562731,0.557,0.552106
9,2.0826,2.003764,0.5573,0.556845,0.5573,0.550216
10,2.0646,2.008501,0.555,0.55433,0.555,0.5495


[I 2025-04-07 09:27:35,073] Trial 141 finished with value: 0.5495001468733564 and parameters: {'learning_rate': 0.0012102102599269135, 'weight_decay': 0.01, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 91 with value: 0.550545336158113.


Trial 142 with params: {'learning_rate': 0.0010906692609828946, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6929,2.208511,0.5173,0.541435,0.5173,0.508717
2,2.286,2.106778,0.5332,0.554236,0.5332,0.527485
3,2.2129,2.081869,0.5353,0.552851,0.5353,0.529394
4,2.1789,2.051699,0.5459,0.559171,0.5459,0.540854
5,2.1497,2.059732,0.5431,0.54624,0.5431,0.531012
6,2.1298,2.041256,0.5511,0.560516,0.5511,0.544801
7,2.1119,2.022721,0.5554,0.556591,0.5554,0.547577
8,2.0975,2.013167,0.5565,0.562151,0.5565,0.55179
9,2.083,2.005056,0.5574,0.556532,0.5574,0.550179
10,2.0663,2.010327,0.5544,0.553756,0.5544,0.548926


[I 2025-04-07 09:40:58,649] Trial 142 finished with value: 0.5489258960126668 and parameters: {'learning_rate': 0.0010906692609828946, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 91 with value: 0.550545336158113.


Trial 143 with params: {'learning_rate': 0.0015621364392326888, 'weight_decay': 0.004, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6363,2.181175,0.5236,0.548024,0.5236,0.516689
2,2.2892,2.116627,0.5367,0.562148,0.5367,0.53225
3,2.2257,2.086535,0.5347,0.554146,0.5347,0.528034
4,2.1944,2.057878,0.5446,0.563409,0.5446,0.54143
5,2.1656,2.067777,0.5414,0.546866,0.5414,0.529416
6,2.1433,2.044198,0.5498,0.561693,0.5498,0.544266
7,2.1225,2.021442,0.5565,0.558164,0.5565,0.549248
8,2.1045,2.011979,0.556,0.561968,0.556,0.550774


[I 2025-04-07 09:51:38,983] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 6.847251037088202e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7855,3.463411,0.3168,0.366488,0.3168,0.30124
2,3.2357,3.014882,0.4053,0.453816,0.4053,0.390957
3,2.9451,2.776129,0.4371,0.461208,0.4371,0.419373
4,2.7808,2.644195,0.4573,0.472366,0.4573,0.441335


[I 2025-04-07 09:56:59,596] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.00015744600691410397, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4411,2.907518,0.4173,0.444573,0.4173,0.398232
2,2.7519,2.504772,0.4818,0.502377,0.4818,0.47091


[I 2025-04-07 09:59:40,633] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0006278205450938924, 'weight_decay': 0.006, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8337,2.288265,0.5024,0.522938,0.5024,0.491342
2,2.3287,2.13547,0.5313,0.54509,0.5313,0.525287
3,2.236,2.099587,0.5336,0.543881,0.5336,0.526733
4,2.1937,2.064849,0.5444,0.551961,0.5444,0.53803


[I 2025-04-07 10:04:57,224] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0013369611788629047, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6544,2.190859,0.5199,0.54465,0.5199,0.512441
2,2.2846,2.108525,0.536,0.559178,0.536,0.531066
3,2.2171,2.083031,0.5348,0.553664,0.5348,0.52846
4,2.185,2.054145,0.5461,0.562147,0.5461,0.541843
5,2.1563,2.062701,0.5417,0.545907,0.5417,0.529554
6,2.1353,2.041357,0.5506,0.561199,0.5506,0.544522
7,2.116,2.020725,0.5565,0.557635,0.5565,0.549057
8,2.0999,2.011603,0.5562,0.561816,0.5562,0.551068


[I 2025-04-07 10:15:26,815] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.002361667525517817, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6161,2.186422,0.5203,0.551539,0.5203,0.513947
2,2.3285,2.152302,0.5257,0.560239,0.5257,0.5221


[I 2025-04-07 10:18:10,444] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0010030210472888152, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6905,2.214449,0.5156,0.539547,0.5156,0.506771
2,2.2875,2.107025,0.5344,0.553682,0.5344,0.528753
3,2.2127,2.081971,0.5365,0.552779,0.5365,0.530499
4,2.178,2.051279,0.5449,0.556865,0.5449,0.539446
5,2.1486,2.059483,0.543,0.546276,0.543,0.531251
6,2.1289,2.041957,0.5505,0.559134,0.5505,0.54394
7,2.1115,2.024155,0.5559,0.556505,0.5559,0.547865
8,2.0977,2.014355,0.5563,0.56197,0.5563,0.551714
9,2.084,2.006403,0.5576,0.556469,0.5576,0.550339
10,2.0683,2.012084,0.5545,0.55381,0.5545,0.548984


[I 2025-04-07 10:31:33,872] Trial 149 finished with value: 0.5489836397092521 and parameters: {'learning_rate': 0.0010030210472888152, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 91 with value: 0.550545336158113.


In [34]:
print(best_distill_head)

BestRun(run_id='91', objective=0.550545336158113, hyperparameters={'learning_rate': 0.0012457733887986462, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.0}, run_summary=None)
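
The log suggests that the value recorded for a finished trial is the F1 of the last evaluated epoch, not the best epoch of that trial. In trial 141 above, epoch 8 reached an F1 of 0.552106, yet the trial finished with 0.5495 from epoch 10. A minimal sketch of the difference, using the rounded per-epoch F1 values printed for trial 141; `final_objective` and `best_epoch_objective` are hypothetical helpers for illustration, not part of `transformers`, Optuna, or `base`:

```python
# Per-epoch F1 values copied from the trial 141 log (rounded as printed).
trial_141_f1 = [0.510566, 0.528965, 0.528699, 0.541698, 0.52956,
                0.544347, 0.548682, 0.552106, 0.550216, 0.5495]


def final_objective(f1_per_epoch):
    """Objective as the log reports it: F1 of the last evaluated epoch."""
    return f1_per_epoch[-1]


def best_epoch_objective(f1_per_epoch):
    """Hypothetical alternative: best F1 reached in any epoch."""
    return max(f1_per_epoch)


print(final_objective(trial_141_f1))
print(best_epoch_objective(trial_141_f1))
```

Under this reading, the ranking of trials reflects how well a configuration ends its schedule, which can differ slightly from a ranking by peak F1.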


In [11]:
base.reset_seed()

## Search with standard training and fine-tuning of the pretrained model
Configuration of the individual training runs.

In [12]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges.

In [13]:
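def hp_space(trial):
    params = {
        # Learning rate is sampled on a log scale between 5e-5 and 5e-3.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay is sampled from the grid 0, 1e-3, ..., 1e-2.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Warm-up length is an integer between 0 and warm_up (inclusive).
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params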
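
With `log=True`, the learning rate is drawn uniformly in log space, so each decade between 5e-5 and 5e-3 gets equal weight under a purely random draw. The sketch below mimics the shape of this search space using only the standard library. It is an illustration, not Optuna's TPE sampler, and `sample_log_uniform` and `sample_stepped` are hypothetical helper names.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw from the grid low, low + step, ..., high (inclusive)."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(42)
learning_rates = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(10_000)]
weight_decays = [sample_stepped(rng, 0.0, 1e-2, 1e-3) for _ in range(10_000)]

# About half of the draws fall below the geometric midpoint 5e-4.
share_below_mid = sum(lr < 5e-4 for lr in learning_rates) / len(learning_rates)
print(f"share of learning rates below 5e-4: {share_below_mid:.2f}")
print(f"distinct weight-decay values: {len(set(weight_decays))}")
```

A linear scale would instead place roughly 90% of the draws above 5e-4, leaving the small learning rates that suit fine-tuning rarely explored.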

Optuna configuration.


In [14]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
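
Hyperband runs several successive-halving brackets, and each bracket checks trials for pruning at geometrically spaced resource levels. The sketch below lists those levels following the bracket count given in the Optuna documentation, floor(log_eta(max_resource / min_resource)) + 1. The values of `min_r` and `max_r` are set earlier in the notebook and are not shown in this section; the example assumes 2 and 10, which matches the log, where trials are pruned after epochs 2, 4, or 8 and otherwise finish after epoch 10. `hyperband_rungs` is a hypothetical helper and a simplification of what the pruner does internally.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the epochs at which a trial can be pruned."""
    brackets = []
    bracket_id = 0
    # One bracket for every power of reduction_factor that still fits.
    while min_resource * reduction_factor ** bracket_id <= max_resource:
        rungs = []
        rung = 0
        while True:
            step = min_resource * reduction_factor ** (bracket_id + rung)
            if step > max_resource:
                break
            rungs.append(step)
            rung += 1
        brackets.append(rungs)
        bracket_id += 1
    return brackets


# Assumed values: min_r = 2, max_r = 10 (not shown in this section).
for i, rungs in enumerate(hyperband_rungs(2, 10, 2)):
    print(f"bracket {i}: pruning checks after epochs {rungs}")
```

Brackets with a later first check give slow-starting configurations more epochs before they can be discarded.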



Trainer configuration for the individual training runs.

In [15]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_mobilenet(100),
)
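
In every row of the training logs, Recall equals Accuracy. Support-weighted averaging produces exactly this, so `base.compute_metrics` (not shown in this section) presumably reports weighted precision, recall, and F1. This is an inference from the logs, not something the notebook states. A minimal pure-Python sketch of such metrics, with `weighted_metrics` as a hypothetical stand-in:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(metrics)
```

Weighting each class recall by its support cancels the per-class denominators, which leaves the total number of correct predictions divided by the number of samples, that is, the accuracy.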
  


Search setup.


In [None]:
best_base_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-01 22:44:31,611] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6264,0.932228,0.7229,0.730839,0.7229,0.721594
2,0.6662,0.846432,0.7495,0.759946,0.7495,0.74881


[I 2025-04-01 22:47:17,504] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5289,1.153658,0.663,0.684659,0.663,0.65924
2,0.8322,1.02309,0.7064,0.728423,0.7064,0.703468
3,0.5365,0.950274,0.7399,0.752207,0.7399,0.739412
4,0.3492,0.978343,0.7441,0.757957,0.7441,0.744765


[I 2025-04-01 22:52:41,310] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6094,1.351212,0.6367,0.642072,0.6367,0.630582
2,1.1455,1.032156,0.7102,0.720496,0.7102,0.708836


[I 2025-04-01 22:55:28,420] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7563,1.341508,0.6187,0.641301,0.6187,0.61287
2,1.0412,1.11806,0.6812,0.7021,0.6812,0.680071
3,0.7178,1.0196,0.7094,0.724628,0.7094,0.708489
4,0.5069,1.027991,0.7214,0.735601,0.7214,0.721355


[I 2025-04-01 23:00:52,636] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0321,1.573341,0.5596,0.586553,0.5596,0.551358
2,1.2943,1.290666,0.634,0.655821,0.634,0.630061


[I 2025-04-01 23:03:34,284] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0838,1.073318,0.6913,0.697226,0.6913,0.688148
2,0.8627,0.903903,0.7356,0.747274,0.7356,0.735175
3,0.5522,0.834433,0.7547,0.762687,0.7547,0.753581
4,0.362,0.838887,0.7559,0.764612,0.7559,0.75642
5,0.234,0.86256,0.7579,0.766374,0.7579,0.758077
6,0.149,0.891186,0.7584,0.767065,0.7584,0.758837
7,0.0952,0.877653,0.7636,0.76751,0.7636,0.762625
8,0.0616,0.928996,0.7606,0.766083,0.7606,0.76067


[I 2025-04-01 23:14:47,636] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5482,0.964171,0.7141,0.725802,0.7141,0.713178
2,0.6666,0.862226,0.7465,0.755383,0.7465,0.744997
3,0.3818,0.867058,0.7514,0.761909,0.7514,0.750191
4,0.2165,0.89916,0.7578,0.774149,0.7578,0.759542


[I 2025-04-01 23:20:18,376] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2264,1.147199,0.6784,0.685399,0.6784,0.674831
2,0.9436,0.935505,0.7282,0.740042,0.7282,0.727978
3,0.6288,0.850105,0.7497,0.756703,0.7497,0.748315
4,0.4388,0.841823,0.7498,0.756406,0.7498,0.749855
5,0.3067,0.849415,0.7568,0.764749,0.7568,0.756522
6,0.2149,0.869797,0.7552,0.763459,0.7552,0.755527
7,0.152,0.857771,0.7619,0.765891,0.7619,0.760845
8,0.1093,0.899107,0.7583,0.763599,0.7583,0.758596


[I 2025-04-01 23:31:01,274] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4861,0.99052,0.7074,0.722299,0.7074,0.706154
2,0.6726,0.875135,0.7477,0.757995,0.7477,0.746663
3,0.3868,0.86629,0.7543,0.765573,0.7543,0.753044
4,0.2262,0.88581,0.7651,0.776898,0.7651,0.765857
5,0.1275,0.950612,0.7606,0.768351,0.7606,0.759415
6,0.0694,0.959534,0.7721,0.77837,0.7721,0.77187
7,0.0389,0.971564,0.7732,0.776505,0.7732,0.771707
8,0.0183,1.014284,0.78,0.784688,0.78,0.779016
9,0.0072,0.991929,0.784,0.786903,0.784,0.783115
10,0.0034,0.968233,0.7892,0.792291,0.7892,0.788522


[I 2025-04-01 23:44:44,648] Trial 8 finished with value: 0.7885223960150077 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.7885223960150077.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4604,1.039097,0.6961,0.711389,0.6961,0.693908
2,0.7198,0.931947,0.7282,0.744675,0.7282,0.727084


[I 2025-04-01 23:47:29,722] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5115,1.31053,0.6412,0.645802,0.6412,0.635638
2,1.1099,1.013544,0.7112,0.722051,0.7112,0.710142
3,0.7794,0.894533,0.7364,0.742967,0.7364,0.734848
4,0.5854,0.868159,0.7428,0.749821,0.7428,0.742277


[I 2025-04-01 23:53:02,463] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0008255712395727001, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5505,1.163899,0.6658,0.68586,0.6658,0.661057
2,0.842,1.022692,0.7077,0.723401,0.7077,0.705654


[I 2025-04-01 23:55:46,109] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 5.3573067071623195e-05, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8503,1.506562,0.6065,0.612625,0.6065,0.597602
2,1.2794,1.104916,0.6942,0.70341,0.6942,0.692174
3,0.915,0.954428,0.721,0.727422,0.721,0.719277
4,0.717,0.911051,0.7341,0.740279,0.7341,0.73287


[I 2025-04-02 00:01:10,262] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 5.372291923575569e-05, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7645,1.484807,0.6124,0.618593,0.6124,0.604602
2,1.2647,1.09956,0.6928,0.701994,0.6928,0.690716


[I 2025-04-02 00:03:49,156] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 8.840349414475647e-05, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3467,1.183426,0.6699,0.67576,0.6699,0.665348
2,0.9855,0.953215,0.7247,0.73689,0.7247,0.724344
3,0.6667,0.860092,0.7469,0.754089,0.7469,0.745847
4,0.4743,0.84536,0.7513,0.758653,0.7513,0.751277
5,0.3411,0.850877,0.7551,0.763297,0.7551,0.754857
6,0.2456,0.869463,0.7539,0.76166,0.7539,0.754031
7,0.1815,0.846662,0.7609,0.764888,0.7609,0.759749
8,0.1347,0.887347,0.7595,0.764911,0.7595,0.759875


[I 2025-04-02 00:14:33,819] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.000545387384751194, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4905,1.048185,0.6942,0.71109,0.6942,0.691766
2,0.7212,0.941277,0.7252,0.738104,0.7252,0.723
3,0.4367,0.897362,0.7503,0.762157,0.7503,0.749237
4,0.2679,0.927522,0.7576,0.767648,0.7576,0.757914


[I 2025-04-02 00:20:08,938] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00017466177826022436, 'weight_decay': 0.007, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8042,0.973249,0.7152,0.722419,0.7152,0.713817
2,0.736,0.852654,0.7521,0.7617,0.7521,0.752036
3,0.4297,0.823653,0.7607,0.769525,0.7607,0.760385
4,0.2516,0.855114,0.7616,0.77265,0.7616,0.762903
5,0.1395,0.896953,0.7615,0.770556,0.7615,0.761075
6,0.076,0.932645,0.7633,0.770627,0.7633,0.763352
7,0.0415,0.921039,0.7724,0.775011,0.7724,0.77154
8,0.0221,0.978512,0.7697,0.774349,0.7697,0.769139
9,0.0125,0.981338,0.7712,0.775176,0.7712,0.77072
10,0.0079,0.983248,0.775,0.778763,0.775,0.774286


[I 2025-04-02 00:33:58,291] Trial 16 finished with value: 0.7742859245391478 and parameters: {'learning_rate': 0.00017466177826022436, 'weight_decay': 0.007, 'warmup_steps': 12}. Best is trial 8 with value: 0.7885223960150077.


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.963,1.515077,0.575,0.602056,0.575,0.566616
2,1.2268,1.296283,0.6369,0.664756,0.6369,0.635179


[I 2025-04-02 00:36:43,905] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.00022338791112731283, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6832,0.952913,0.7198,0.730076,0.7198,0.719251
2,0.6884,0.848113,0.7517,0.760863,0.7517,0.751463
3,0.3895,0.828008,0.7647,0.775237,0.7647,0.764943
4,0.2182,0.86604,0.7667,0.778897,0.7667,0.768155
5,0.1158,0.929401,0.7619,0.773792,0.7619,0.761813
6,0.0608,0.943927,0.771,0.779416,0.771,0.771652
7,0.0326,0.936759,0.7762,0.779513,0.7762,0.775328
8,0.0172,0.991163,0.7759,0.782432,0.7759,0.776136


[I 2025-04-02 00:47:48,486] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.002961935479501581, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1843,1.693319,0.5232,0.555691,0.5232,0.512753
2,1.4059,1.390666,0.6091,0.645305,0.6091,0.607159
3,1.058,1.257208,0.6395,0.665309,0.6395,0.637084
4,0.8149,1.189191,0.6658,0.679422,0.6658,0.664866


[I 2025-04-02 00:53:15,466] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0007288044441792408, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4894,1.111494,0.6806,0.695459,0.6806,0.677398
2,0.8039,0.979261,0.7201,0.736544,0.7201,0.718998


[I 2025-04-02 00:55:59,918] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00020591268049360804, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.722,0.950486,0.7232,0.730703,0.7232,0.722136
2,0.6981,0.856159,0.7486,0.760927,0.7486,0.748165
3,0.3988,0.825973,0.7605,0.770205,0.7605,0.760215
4,0.2243,0.869417,0.7585,0.771371,0.7585,0.759764
5,0.1197,0.926962,0.7574,0.768065,0.7574,0.756925
6,0.0638,0.937445,0.7681,0.775019,0.7681,0.76812
7,0.0345,0.938368,0.7679,0.772626,0.7679,0.766666
8,0.018,0.975109,0.7754,0.780446,0.7754,0.775558
9,0.0095,0.98116,0.7782,0.781945,0.7782,0.777849
10,0.0055,0.977215,0.7754,0.779924,0.7754,0.774796


[I 2025-04-02 01:10:06,136] Trial 21 finished with value: 0.7747956005493851 and parameters: {'learning_rate': 0.00020591268049360804, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 8 with value: 0.7885223960150077.


Trial 22 with params: {'learning_rate': 0.0002947384149976914, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.575,0.956468,0.7166,0.729106,0.7166,0.71588
2,0.6612,0.848621,0.7525,0.761258,0.7525,0.751802
3,0.3723,0.848208,0.7607,0.772142,0.7607,0.76092
4,0.207,0.904789,0.7598,0.775486,0.7598,0.761343
5,0.1125,0.918541,0.7627,0.774056,0.7627,0.762425
6,0.0593,0.982794,0.764,0.77408,0.764,0.765123
7,0.0333,0.966289,0.7718,0.774292,0.7718,0.770161
8,0.016,1.001512,0.7798,0.784184,0.7798,0.779327


[I 2025-04-02 01:21:10,488] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 8.486897139951585e-05, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3224,1.199111,0.6683,0.674691,0.6683,0.664127
2,0.9981,0.959713,0.7235,0.734995,0.7235,0.723084
3,0.6809,0.863315,0.7419,0.748612,0.7419,0.740531
4,0.4894,0.843879,0.7497,0.756057,0.7497,0.749425


[I 2025-04-02 01:26:39,191] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.00042912910838429936, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4899,0.982765,0.7093,0.72485,0.7093,0.707466
2,0.6737,0.895411,0.7395,0.748191,0.7395,0.737544
3,0.3916,0.882815,0.7557,0.766717,0.7557,0.75548
4,0.2289,0.901241,0.7606,0.772613,0.7606,0.761697
5,0.1299,0.97387,0.7552,0.7659,0.7552,0.753775
6,0.0744,0.955601,0.7734,0.780816,0.7734,0.773568
7,0.0414,0.983055,0.7729,0.777135,0.7729,0.771671
8,0.0193,1.044738,0.7762,0.78336,0.7762,0.776319


[I 2025-04-02 01:37:47,952] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0027693395374376512, 'weight_decay': 0.0, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1346,1.671119,0.5312,0.559742,0.5312,0.519041
2,1.3614,1.387516,0.6133,0.64021,0.6133,0.609726
3,1.0217,1.219844,0.653,0.679475,0.653,0.652375


[I 2025-04-02 07:11:34,598] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0005146325625849073, 'weight_decay': 0.004, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5161,1.024985,0.6992,0.715551,0.6992,0.696954
2,0.7157,0.905347,0.7388,0.748434,0.7388,0.736637
3,0.4286,0.901881,0.7493,0.759629,0.7493,0.748656
4,0.2604,0.922422,0.7548,0.768296,0.7548,0.755141


[I 2025-04-02 07:17:07,901] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0003398456972043675, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.515,0.948235,0.7185,0.72999,0.7185,0.717422
2,0.6541,0.860566,0.7508,0.759921,0.7508,0.749947
3,0.3669,0.860695,0.7562,0.766638,0.7562,0.755969
4,0.2089,0.888229,0.7618,0.773632,0.7618,0.762539
5,0.1131,0.93333,0.765,0.775618,0.765,0.764811
6,0.0614,0.969641,0.7669,0.774992,0.7669,0.76668
7,0.0344,0.971111,0.7707,0.775935,0.7707,0.769839
8,0.0157,1.0085,0.7789,0.785444,0.7789,0.778507
9,0.0071,0.990517,0.7805,0.783923,0.7805,0.779818
10,0.0035,0.977247,0.7884,0.791552,0.7884,0.787436


[I 2025-04-02 07:30:36,642] Trial 65 finished with value: 0.7874364599143608 and parameters: {'learning_rate': 0.0003398456972043675, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}. Best is trial 8 with value: 0.7885223960150077.


Trial 66 with params: {'learning_rate': 0.0004489499112098391, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4856,1.02357,0.6991,0.716103,0.6991,0.697264
2,0.6841,0.883919,0.7401,0.753433,0.7401,0.739946
3,0.3965,0.880303,0.7566,0.767702,0.7566,0.756781
4,0.2336,0.906988,0.7592,0.769507,0.7592,0.759671


[I 2025-04-02 07:36:03,857] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.000720321408719951, 'weight_decay': 0.002, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5519,1.09961,0.6798,0.702058,0.6798,0.676927
2,0.8071,1.011114,0.7128,0.735025,0.7128,0.712003
3,0.511,0.93957,0.7406,0.756329,0.7406,0.741017
4,0.3315,0.963434,0.7432,0.75682,0.7432,0.743637


[I 2025-04-02 07:41:28,427] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.00042840131572042395, 'weight_decay': 0.003, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5438,1.005572,0.7052,0.719444,0.7052,0.704725
2,0.682,0.893455,0.7393,0.754458,0.7393,0.739298
3,0.3957,0.861104,0.76,0.771477,0.76,0.759019
4,0.2334,0.899743,0.7602,0.7713,0.7602,0.760183
5,0.1323,0.956187,0.7616,0.774363,0.7616,0.761356
6,0.0738,0.978759,0.7689,0.776026,0.7689,0.768761
7,0.0411,1.011426,0.7692,0.774949,0.7692,0.768469
8,0.0192,1.043067,0.776,0.780655,0.776,0.774972


[I 2025-04-02 07:52:31,675] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00021386813300884093, 'weight_decay': 0.005, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7105,0.942931,0.7214,0.728709,0.7214,0.720458
2,0.6923,0.838657,0.7532,0.761188,0.7532,0.75268
3,0.3922,0.826499,0.7643,0.773798,0.7643,0.763566
4,0.2191,0.855689,0.7603,0.770457,0.7603,0.761009
5,0.1179,0.917075,0.7625,0.772486,0.7625,0.762184
6,0.0614,0.9545,0.7679,0.776962,0.7679,0.768572
7,0.0338,0.945867,0.7682,0.771758,0.7682,0.767592
8,0.0171,0.978922,0.7743,0.78057,0.7743,0.774585


[I 2025-04-02 08:03:44,426] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.001507561280625615, 'weight_decay': 0.007, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7903,1.396598,0.5995,0.626387,0.5995,0.591624
2,1.0951,1.210514,0.6592,0.684198,0.6592,0.657605


[I 2025-04-02 08:06:36,001] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.000357382630993563, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4973,0.960604,0.7163,0.727965,0.7163,0.71526
2,0.658,0.8783,0.7434,0.755415,0.7434,0.742543
3,0.3711,0.847681,0.7599,0.770647,0.7599,0.759829
4,0.2115,0.867406,0.7673,0.776909,0.7673,0.767463
5,0.1169,0.936603,0.767,0.776191,0.767,0.76655
6,0.0642,0.979167,0.7687,0.775755,0.7687,0.768364
7,0.0345,0.989775,0.771,0.775338,0.771,0.77031
8,0.0167,1.017602,0.7746,0.780954,0.7746,0.774295


[I 2025-04-02 08:17:30,395] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00012663811278949076, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9975,1.044622,0.6994,0.706255,0.6994,0.69723
2,0.8299,0.888527,0.7406,0.751621,0.7406,0.740493
3,0.5228,0.828682,0.7579,0.7654,0.7579,0.756916
4,0.3362,0.847271,0.7547,0.764958,0.7547,0.755599


[I 2025-04-02 08:23:09,832] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7317,1.420748,0.6222,0.628338,0.6222,0.615205
2,1.2075,1.064615,0.7002,0.709874,0.7002,0.698535
3,0.8585,0.926533,0.729,0.736061,0.729,0.727672
4,0.6618,0.891106,0.7378,0.744247,0.7378,0.73674


[I 2025-04-02 08:28:39,586] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.00038346766832388047, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.482,0.971946,0.7135,0.725427,0.7135,0.712314
2,0.664,0.873482,0.7424,0.751475,0.7424,0.741428
3,0.379,0.880176,0.7558,0.766208,0.7558,0.75397
4,0.2194,0.894329,0.7631,0.77525,0.7631,0.764129
5,0.12,0.948759,0.7626,0.774491,0.7626,0.762082
6,0.0659,0.985827,0.766,0.774242,0.766,0.7657
7,0.0371,0.977751,0.7743,0.779075,0.7743,0.773834
8,0.0175,1.014941,0.7797,0.784066,0.7797,0.779176
9,0.0074,1.026035,0.7801,0.786972,0.7801,0.780103
10,0.0033,1.00665,0.7832,0.786938,0.7832,0.782512


[I 2025-04-02 08:42:07,889] Trial 74 finished with value: 0.7825118906915481 and parameters: {'learning_rate': 0.00038346766832388047, 'weight_decay': 0.007, 'warmup_steps': 2}. Best is trial 8 with value: 0.7885223960150077.


Trial 75 with params: {'learning_rate': 0.00042835609838931066, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4991,1.004133,0.7065,0.722761,0.7065,0.704541
2,0.6819,0.899904,0.7372,0.748424,0.7372,0.735824
3,0.3972,0.870026,0.7569,0.767189,0.7569,0.756453
4,0.233,0.92411,0.7554,0.767177,0.7554,0.756326


[I 2025-04-02 08:47:39,753] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00040369079538524926, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5327,0.964265,0.7162,0.728327,0.7162,0.71494
2,0.6716,0.878683,0.744,0.754953,0.744,0.742325
3,0.3887,0.866243,0.7552,0.767779,0.7552,0.755054
4,0.2247,0.919751,0.7568,0.771368,0.7568,0.757525


[I 2025-04-02 08:53:26,882] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.001007761125954244, 'weight_decay': 0.01, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6525,1.225699,0.6463,0.668135,0.6463,0.641922
2,0.9291,1.051527,0.6992,0.718051,0.6992,0.696902
3,0.627,0.989425,0.7244,0.744653,0.7244,0.725327
4,0.4242,0.968088,0.7343,0.746228,0.7343,0.734863


[I 2025-04-02 08:59:11,205] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00024924097254166083, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6106,0.938484,0.7222,0.732272,0.7222,0.721238
2,0.6712,0.846057,0.7529,0.761996,0.7529,0.752205
3,0.3755,0.831889,0.7609,0.771791,0.7609,0.760357
4,0.2097,0.890064,0.7597,0.772278,0.7597,0.760934
5,0.1111,0.946966,0.7614,0.772296,0.7614,0.760743
6,0.0598,0.975846,0.7667,0.773895,0.7667,0.766175
7,0.0324,0.95074,0.7718,0.775029,0.7718,0.77057
8,0.0163,1.015175,0.7734,0.779141,0.7734,0.772911


[I 2025-04-02 09:10:28,717] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.00032319939962330524, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5778,0.959839,0.7176,0.728599,0.7176,0.716576
2,0.6628,0.861553,0.75,0.759141,0.75,0.74908
3,0.3716,0.844273,0.7606,0.771743,0.7606,0.760407
4,0.2092,0.877021,0.7631,0.773935,0.7631,0.764258
5,0.1152,0.960272,0.7622,0.773711,0.7622,0.761517
6,0.0627,0.984321,0.7658,0.776769,0.7658,0.765957
7,0.0355,0.963603,0.7698,0.774786,0.7698,0.768618
8,0.0162,0.996585,0.7763,0.780663,0.7763,0.775933


[I 2025-04-02 09:21:36,642] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.00045152822009434404, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4711,1.001413,0.7024,0.718824,0.7024,0.700378
2,0.6819,0.881501,0.7409,0.752833,0.7409,0.740082
3,0.3996,0.885978,0.7537,0.764384,0.7537,0.753023
4,0.2382,0.895212,0.7604,0.77079,0.7604,0.761137
5,0.1344,0.949112,0.7599,0.769346,0.7599,0.759245
6,0.0751,0.995564,0.7602,0.770803,0.7602,0.760614
7,0.0411,0.990801,0.7707,0.775065,0.7707,0.769737
8,0.0208,1.021452,0.7803,0.7868,0.7803,0.780766
9,0.0079,1.012311,0.7851,0.788866,0.7851,0.784561
10,0.0035,1.002723,0.7856,0.789769,0.7856,0.784841


[I 2025-04-02 09:35:52,278] Trial 80 finished with value: 0.7848414791392144 and parameters: {'learning_rate': 0.00045152822009434404, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 8 with value: 0.7885223960150077.


Trial 81 with params: {'learning_rate': 0.00016911288291160924, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7891,0.967514,0.7164,0.724015,0.7164,0.715239
2,0.7379,0.855176,0.7481,0.7589,0.7481,0.748218
3,0.4341,0.821742,0.7573,0.765035,0.7573,0.756556
4,0.254,0.854047,0.7584,0.768724,0.7584,0.758813


[I 2025-04-02 09:41:26,002] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.000578275252392436, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4804,1.050711,0.6934,0.706889,0.6934,0.690227
2,0.7303,0.933109,0.7333,0.745598,0.7333,0.731742
3,0.4509,0.900556,0.7449,0.756504,0.7449,0.744007
4,0.2756,0.949282,0.7486,0.760747,0.7486,0.748441


[I 2025-04-02 09:46:54,169] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0005640241621813505, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4729,1.055894,0.6899,0.704956,0.6899,0.687564
2,0.728,0.917879,0.7318,0.741463,0.7318,0.729499


[I 2025-04-02 09:49:35,698] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00019342388687107164, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7101,0.957699,0.7164,0.725813,0.7164,0.715523
2,0.7076,0.841688,0.7524,0.761747,0.7524,0.752328
3,0.4054,0.82136,0.76,0.768048,0.76,0.758776
4,0.2303,0.868217,0.7586,0.768944,0.7586,0.759486
5,0.1248,0.914364,0.762,0.77249,0.762,0.762133
6,0.0656,0.952355,0.7646,0.771766,0.7646,0.76467
7,0.0359,0.930552,0.7681,0.771418,0.7681,0.76726
8,0.0191,0.990505,0.7733,0.779001,0.7733,0.773111


[I 2025-04-02 10:00:58,479] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0003320890357342316, 'weight_decay': 0.008, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.526,0.948613,0.7156,0.726818,0.7156,0.71498
2,0.6554,0.851779,0.7487,0.758181,0.7487,0.748455
3,0.3648,0.86,0.7567,0.768936,0.7567,0.756072
4,0.2058,0.900524,0.7602,0.770509,0.7602,0.760496
5,0.1103,0.93219,0.7662,0.776777,0.7662,0.766417
6,0.0616,0.980168,0.7631,0.769763,0.7631,0.762438
7,0.0336,1.004631,0.7659,0.771162,0.7659,0.764717
8,0.0164,1.004922,0.7796,0.785242,0.7796,0.779426
9,0.0073,1.008462,0.7839,0.788544,0.7839,0.783331
10,0.0034,1.007619,0.7803,0.784471,0.7803,0.779622


[I 2025-04-02 10:14:50,658] Trial 85 finished with value: 0.7796221004607269 and parameters: {'learning_rate': 0.0003320890357342316, 'weight_decay': 0.008, 'warmup_steps': 8}. Best is trial 8 with value: 0.7885223960150077.


Trial 86 with params: {'learning_rate': 0.0004216402991447079, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5104,0.994697,0.7085,0.722468,0.7085,0.706529
2,0.676,0.873422,0.7474,0.758187,0.7474,0.745514
3,0.3936,0.85755,0.76,0.770627,0.76,0.75978
4,0.2292,0.890353,0.7643,0.775012,0.7643,0.764646
5,0.1281,0.959592,0.7633,0.775092,0.7633,0.762999
6,0.0715,0.996013,0.7636,0.773512,0.7636,0.76439
7,0.04,0.9844,0.772,0.775193,0.772,0.770357
8,0.0189,1.028571,0.7792,0.784259,0.7792,0.77886
9,0.0082,1.008834,0.7819,0.787185,0.7819,0.781981
10,0.0036,1.009722,0.7852,0.789192,0.7852,0.784562


[I 2025-04-02 10:28:41,218] Trial 86 finished with value: 0.784561747390131 and parameters: {'learning_rate': 0.0004216402991447079, 'weight_decay': 0.006, 'warmup_steps': 14}. Best is trial 8 with value: 0.7885223960150077.


Trial 87 with params: {'learning_rate': 0.0004238161165039015, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4696,0.976862,0.7123,0.723169,0.7123,0.711006
2,0.6773,0.875296,0.7426,0.753417,0.7426,0.741978
3,0.3906,0.87293,0.7567,0.768263,0.7567,0.756206
4,0.2295,0.892954,0.7624,0.773049,0.7624,0.762874
5,0.1312,0.960648,0.7621,0.772703,0.7621,0.762397
6,0.0724,0.973965,0.7711,0.780401,0.7711,0.771591
7,0.041,0.993149,0.7718,0.776808,0.7718,0.771292
8,0.0203,1.022906,0.778,0.783014,0.778,0.777836
9,0.0077,1.025004,0.7811,0.784677,0.7811,0.77988
10,0.0034,1.011957,0.7831,0.786534,0.7831,0.781981


[I 2025-04-02 10:42:25,832] Trial 87 finished with value: 0.7819808910568892 and parameters: {'learning_rate': 0.0004238161165039015, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}. Best is trial 8 with value: 0.7885223960150077.


Trial 88 with params: {'learning_rate': 0.0008825970450490194, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5608,1.200338,0.6559,0.677924,0.6559,0.651728
2,0.8755,1.043922,0.7047,0.722582,0.7047,0.702451


[I 2025-04-02 10:45:16,843] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.00017994677753402992, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8377,0.965473,0.7167,0.724378,0.7167,0.715526
2,0.7322,0.852789,0.7491,0.760853,0.7491,0.749482
3,0.4252,0.819295,0.7601,0.770056,0.7601,0.760305
4,0.2458,0.858386,0.7597,0.771866,0.7597,0.761021
5,0.1353,0.900199,0.7609,0.771063,0.7609,0.760957
6,0.0731,0.930368,0.7659,0.773428,0.7659,0.766099
7,0.0401,0.930292,0.7697,0.772973,0.7697,0.768707
8,0.0218,0.976635,0.7729,0.778813,0.7729,0.773544


[I 2025-04-02 10:56:25,500] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0014029949346319095, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7554,1.370251,0.614,0.641,0.614,0.607882
2,1.072,1.156439,0.67,0.69063,0.67,0.667269
3,0.7578,1.097035,0.6964,0.714398,0.6964,0.695871
4,0.5382,1.052747,0.7098,0.724163,0.7098,0.709231


[I 2025-04-02 11:02:09,029] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0006199582040701769, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4928,1.073742,0.6883,0.708524,0.6883,0.686812
2,0.7581,0.962879,0.726,0.741283,0.726,0.72447


[I 2025-04-02 11:04:58,905] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0009277636633998966, 'weight_decay': 0.005, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5958,1.222582,0.6522,0.67272,0.6522,0.647836
2,0.8894,1.039416,0.7002,0.722956,0.7002,0.698698
3,0.5936,0.972793,0.7297,0.742629,0.7297,0.729831
4,0.3989,0.977187,0.7369,0.754943,0.7369,0.737771


[I 2025-04-02 11:10:32,757] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0004765188335468686, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5105,1.016357,0.7022,0.720099,0.7022,0.699532
2,0.6978,0.890449,0.7439,0.755249,0.7439,0.742659
3,0.4151,0.872662,0.7557,0.767939,0.7557,0.755607
4,0.2539,0.911118,0.7571,0.766133,0.7571,0.757651


[I 2025-04-02 11:16:15,096] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0004034971203341916, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5245,0.992044,0.7089,0.72544,0.7089,0.708827
2,0.6734,0.876017,0.7456,0.754321,0.7456,0.743616
3,0.3867,0.858102,0.756,0.767873,0.756,0.754926
4,0.2232,0.916405,0.7569,0.770044,0.7569,0.758062
5,0.1264,0.950567,0.7616,0.770854,0.7616,0.759916
6,0.0694,0.960886,0.7696,0.777453,0.7696,0.769789
7,0.0387,0.97013,0.7742,0.780618,0.7742,0.772976
8,0.0179,1.007719,0.7795,0.785282,0.7795,0.779201
9,0.0078,0.988982,0.7894,0.794828,0.7894,0.789461
10,0.0034,0.989369,0.7858,0.790125,0.7858,0.785305


[I 2025-04-02 11:30:25,770] Trial 94 finished with value: 0.7853050859532834 and parameters: {'learning_rate': 0.0004034971203341916, 'weight_decay': 0.006, 'warmup_steps': 19}. Best is trial 8 with value: 0.7885223960150077.


Trial 95 with params: {'learning_rate': 0.0002631690952758222, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6275,0.940284,0.7231,0.733209,0.7231,0.722593
2,0.6682,0.840177,0.7526,0.761408,0.7526,0.752076
3,0.3735,0.831515,0.7633,0.773582,0.7633,0.763225
4,0.2055,0.870394,0.7644,0.776274,0.7644,0.765928
5,0.11,0.920952,0.7674,0.777151,0.7674,0.767159
6,0.0589,0.945261,0.7699,0.776518,0.7699,0.769842
7,0.0311,0.966444,0.7703,0.775905,0.7703,0.769656
8,0.015,1.008355,0.772,0.777498,0.772,0.771847


[I 2025-04-02 11:41:33,700] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0003928745423540618, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5447,0.96824,0.712,0.725542,0.712,0.710415
2,0.6715,0.867982,0.747,0.75844,0.747,0.746325
3,0.3857,0.87039,0.7579,0.769706,0.7579,0.757404
4,0.2215,0.925175,0.7559,0.766671,0.7559,0.756636


[I 2025-04-02 11:47:11,990] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.000593671608535364, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5206,1.055532,0.6899,0.709672,0.6899,0.68699
2,0.7444,0.934252,0.7294,0.742064,0.7294,0.727963


[I 2025-04-02 11:50:03,175] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00024079733639861185, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6267,0.935226,0.7242,0.734156,0.7242,0.724388
2,0.6721,0.845849,0.7563,0.766101,0.7563,0.756175
3,0.3779,0.834593,0.7595,0.769207,0.7595,0.75933
4,0.2081,0.898121,0.7595,0.771747,0.7595,0.760688
5,0.1112,0.953489,0.7632,0.774668,0.7632,0.762896
6,0.0573,0.945908,0.7707,0.778195,0.7707,0.771115
7,0.0308,0.932878,0.7718,0.774846,0.7718,0.770759
8,0.0156,0.976273,0.7738,0.779563,0.7738,0.773773


[I 2025-04-02 12:01:06,894] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0002522484714214018, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6148,0.942399,0.7207,0.730898,0.7207,0.720397
2,0.6683,0.838012,0.7555,0.764835,0.7555,0.754895
3,0.3739,0.842519,0.7582,0.76994,0.7582,0.758065
4,0.207,0.865895,0.7637,0.773141,0.7637,0.764505
5,0.109,0.932766,0.762,0.771138,0.762,0.761328
6,0.0577,0.970555,0.7666,0.773818,0.7666,0.766388
7,0.0313,0.962226,0.7687,0.772468,0.7687,0.767589
8,0.0154,0.997954,0.775,0.780549,0.775,0.77494


[I 2025-04-02 12:12:07,218] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.00028559661075693666, 'weight_decay': 0.004, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5997,0.940944,0.7205,0.731033,0.7205,0.72023
2,0.6654,0.846418,0.754,0.763925,0.754,0.753623
3,0.3719,0.856477,0.7531,0.765067,0.7531,0.752957
4,0.2063,0.901445,0.7586,0.771258,0.7586,0.759789
5,0.1087,0.94039,0.7668,0.778052,0.7668,0.766288
6,0.0595,0.977723,0.7656,0.77286,0.7656,0.765186
7,0.0311,0.955866,0.7744,0.77745,0.7744,0.773786
8,0.0154,1.032542,0.7775,0.783886,0.7775,0.777309
9,0.0073,0.982206,0.7821,0.786155,0.7821,0.781539
10,0.0039,0.994139,0.7819,0.786481,0.7819,0.781171


[I 2025-04-02 12:25:52,673] Trial 100 finished with value: 0.7811705350033127 and parameters: {'learning_rate': 0.00028559661075693666, 'weight_decay': 0.004, 'warmup_steps': 16}. Best is trial 8 with value: 0.7885223960150077.


Trial 101 with params: {'learning_rate': 0.00044634333264700925, 'weight_decay': 0.005, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5003,1.000983,0.705,0.7196,0.705,0.704141
2,0.6867,0.886414,0.7419,0.75235,0.7419,0.740819
3,0.3976,0.880369,0.7547,0.765924,0.7547,0.753695
4,0.2338,0.918214,0.7577,0.769584,0.7577,0.758309


[I 2025-04-02 12:31:20,390] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 7.484211914264787e-05, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4705,1.26632,0.6506,0.656282,0.6506,0.645299
2,1.0664,0.991924,0.717,0.727563,0.717,0.716142
3,0.7404,0.880285,0.7382,0.745203,0.7382,0.736747
4,0.5478,0.859822,0.7464,0.752911,0.7464,0.745955


[I 2025-04-02 12:36:54,467] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.00028761398663328005, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5862,0.942992,0.7209,0.729195,0.7209,0.719655
2,0.659,0.856799,0.7483,0.758213,0.7483,0.747934
3,0.3686,0.845251,0.7564,0.766007,0.7564,0.755026
4,0.2065,0.874886,0.7653,0.777406,0.7653,0.767034
5,0.1077,0.941908,0.7645,0.774966,0.7645,0.764471
6,0.0591,0.95057,0.768,0.774919,0.768,0.767947
7,0.032,0.962252,0.7693,0.774302,0.7693,0.76824
8,0.0158,1.003033,0.7767,0.783081,0.7767,0.776825
9,0.007,1.000574,0.7772,0.781074,0.7772,0.776655
10,0.0039,0.992615,0.7795,0.783573,0.7795,0.778766


[I 2025-04-02 12:51:04,919] Trial 103 finished with value: 0.7787657425066393 and parameters: {'learning_rate': 0.00028761398663328005, 'weight_decay': 0.006, 'warmup_steps': 13}. Best is trial 8 with value: 0.7885223960150077.


Trial 104 with params: {'learning_rate': 0.0004098719331815038, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5124,0.982259,0.7093,0.72263,0.7093,0.708387
2,0.6739,0.886969,0.7397,0.751514,0.7397,0.738541
3,0.3902,0.871239,0.7552,0.766429,0.7552,0.754817
4,0.2271,0.901102,0.7598,0.773831,0.7598,0.761565
5,0.1279,0.953591,0.7611,0.771551,0.7611,0.760807
6,0.0701,1.000971,0.7678,0.776326,0.7678,0.768014
7,0.0403,1.018965,0.7669,0.770638,0.7669,0.764777
8,0.0173,1.047194,0.7782,0.784931,0.7782,0.777993
9,0.0076,1.015163,0.7803,0.78579,0.7803,0.780254
10,0.0034,1.001583,0.7831,0.786429,0.7831,0.782408


[I 2025-04-02 13:04:51,562] Trial 104 finished with value: 0.7824083888315204 and parameters: {'learning_rate': 0.0004098719331815038, 'weight_decay': 0.007, 'warmup_steps': 14}. Best is trial 8 with value: 0.7885223960150077.


Trial 105 with params: {'learning_rate': 0.000607255590412118, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5134,1.064301,0.6893,0.707496,0.6893,0.685746
2,0.7516,0.940616,0.7312,0.747599,0.7312,0.731443


[I 2025-04-02 13:07:42,991] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00041158715349045563, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5335,0.983679,0.7116,0.724287,0.7116,0.709693
2,0.6814,0.877451,0.7439,0.754708,0.7439,0.742113
3,0.3895,0.874074,0.7533,0.765963,0.7533,0.752808
4,0.2267,0.899763,0.7558,0.768181,0.7558,0.756808


[I 2025-04-02 13:13:10,630] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0009107158196806978, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5849,1.217269,0.6537,0.674219,0.6537,0.649667
2,0.8895,1.030041,0.7051,0.725219,0.7051,0.704145


[I 2025-04-02 13:15:49,883] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 6.222420760096916e-05, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6673,1.385452,0.6293,0.63509,0.6293,0.622696
2,1.1753,1.047447,0.7061,0.716029,0.7061,0.704549


[I 2025-04-02 13:18:37,280] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0009077054764506989, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5733,1.192062,0.6613,0.681527,0.6613,0.657231
2,0.8828,1.029796,0.7057,0.722406,0.7057,0.704196
3,0.5825,0.987195,0.7306,0.746267,0.7306,0.730017
4,0.3904,0.978079,0.7354,0.749145,0.7354,0.735682


[I 2025-04-02 13:24:08,023] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0006092442782400235, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4872,1.065003,0.6894,0.706812,0.6894,0.68698
2,0.7482,0.932011,0.7312,0.746247,0.7312,0.729586
3,0.4621,0.894985,0.7486,0.757257,0.7486,0.747975
4,0.2881,0.951655,0.7438,0.756144,0.7438,0.744507


[I 2025-04-02 13:29:37,890] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0004790778380726622, 'weight_decay': 0.008, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.518,1.022021,0.6994,0.718507,0.6994,0.697881
2,0.6987,0.899017,0.7405,0.752738,0.7405,0.738797


[I 2025-04-02 13:32:27,616] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0003912207989097152, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5362,0.989637,0.7071,0.721544,0.7071,0.705505
2,0.6682,0.852847,0.7519,0.763847,0.7519,0.75181
3,0.3826,0.859929,0.7591,0.77224,0.7591,0.758828
4,0.2223,0.871314,0.7641,0.775866,0.7641,0.765021
5,0.1246,0.940248,0.7625,0.77348,0.7625,0.761896
6,0.069,0.963006,0.7705,0.77627,0.7705,0.769916
7,0.0373,0.983509,0.774,0.778468,0.774,0.772994
8,0.0179,1.021391,0.7811,0.786337,0.7811,0.781108
9,0.0073,1.003059,0.7848,0.788564,0.7848,0.784367
10,0.0035,1.003689,0.7851,0.788063,0.7851,0.784137


[I 2025-04-02 13:46:28,978] Trial 112 finished with value: 0.78413743985824 and parameters: {'learning_rate': 0.0003912207989097152, 'weight_decay': 0.007, 'warmup_steps': 20}. Best is trial 8 with value: 0.7885223960150077.


Trial 113 with params: {'learning_rate': 0.0001827905090690507, 'weight_decay': 0.006, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8071,0.973108,0.7133,0.721359,0.7133,0.711176
2,0.7264,0.852632,0.7503,0.76015,0.7503,0.750296
3,0.4239,0.830963,0.7602,0.77012,0.7602,0.759575
4,0.2442,0.862152,0.7581,0.76813,0.7581,0.759175


[I 2025-04-02 13:52:04,959] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0001647529721505253, 'weight_decay': 0.008, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8459,0.98722,0.7098,0.717604,0.7098,0.708319
2,0.7521,0.861532,0.7472,0.757139,0.7472,0.746868
3,0.4484,0.81314,0.7643,0.772787,0.7643,0.763854
4,0.2638,0.851387,0.7619,0.771082,0.7619,0.762738
5,0.1501,0.904495,0.759,0.769072,0.759,0.758546
6,0.0825,0.919924,0.7629,0.769147,0.7629,0.762677
7,0.0453,0.912315,0.7684,0.770714,0.7684,0.767504
8,0.0249,0.969166,0.7698,0.775498,0.7698,0.769925


[I 2025-04-02 14:03:20,558] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0004288859629618163, 'weight_decay': 0.005, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5254,0.985019,0.7075,0.721066,0.7075,0.706429
2,0.6805,0.882877,0.746,0.760037,0.746,0.745469
3,0.3944,0.872007,0.7556,0.767498,0.7556,0.755441
4,0.2303,0.921837,0.7535,0.766703,0.7535,0.754331


[I 2025-04-02 14:09:11,479] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00013856845993946669, 'weight_decay': 0.008, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9798,1.02081,0.7033,0.709944,0.7033,0.701123
2,0.8027,0.88345,0.7418,0.752168,0.7418,0.741254
3,0.494,0.828656,0.7573,0.765086,0.7573,0.756357
4,0.3066,0.841813,0.7588,0.767569,0.7588,0.759462
5,0.1843,0.874668,0.7596,0.769136,0.7596,0.759423
6,0.1083,0.90915,0.7637,0.771212,0.7637,0.764022
7,0.0632,0.904551,0.7639,0.767388,0.7639,0.762975
8,0.0372,0.956099,0.7649,0.770959,0.7649,0.765371


[I 2025-04-02 14:20:50,823] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0027121193476131807, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1175,1.653715,0.5339,0.562131,0.5339,0.523402
2,1.3512,1.363977,0.6117,0.639265,0.6117,0.609302


[I 2025-04-02 14:23:32,612] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00025452295620522836, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6364,0.944914,0.7229,0.732871,0.7229,0.722375
2,0.6725,0.847971,0.7515,0.760686,0.7515,0.750953
3,0.3727,0.84699,0.7624,0.77174,0.7624,0.761228
4,0.209,0.891455,0.7619,0.774775,0.7619,0.763155
5,0.112,0.928108,0.7654,0.774575,0.7654,0.764505
6,0.0588,0.982658,0.7657,0.775481,0.7657,0.766258
7,0.0307,0.957769,0.7721,0.776763,0.7721,0.771552
8,0.0157,1.013988,0.7737,0.778654,0.7737,0.773315


[I 2025-04-02 14:34:36,302] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00040416264671282767, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4621,0.979942,0.7113,0.722374,0.7113,0.70889
2,0.672,0.894509,0.7402,0.751509,0.7402,0.739162
3,0.3876,0.868343,0.7568,0.768093,0.7568,0.756193
4,0.2212,0.906967,0.76,0.771652,0.76,0.760772
5,0.1255,0.991282,0.7591,0.77296,0.7591,0.759386
6,0.0695,1.00756,0.7625,0.772755,0.7625,0.76271
7,0.0375,0.979572,0.7766,0.779866,0.7766,0.775303
8,0.0178,1.031097,0.7799,0.787088,0.7799,0.780088
9,0.0078,1.006399,0.784,0.787715,0.784,0.783611
10,0.0036,1.006459,0.7835,0.786917,0.7835,0.782575


[I 2025-04-02 14:48:22,060] Trial 119 finished with value: 0.7825748979536127 and parameters: {'learning_rate': 0.00040416264671282767, 'weight_decay': 0.008, 'warmup_steps': 0}. Best is trial 8 with value: 0.7885223960150077.


Trial 120 with params: {'learning_rate': 0.0003869923305813519, 'weight_decay': 0.008, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5545,0.977556,0.7105,0.724296,0.7105,0.710253
2,0.6706,0.871436,0.746,0.756716,0.746,0.745474
3,0.3852,0.854584,0.7598,0.77077,0.7598,0.759959
4,0.2218,0.925704,0.754,0.768828,0.754,0.755549


[I 2025-04-02 14:53:57,219] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0009610655332625832, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6196,1.196719,0.6536,0.671356,0.6536,0.64926
2,0.9066,1.027422,0.7077,0.7247,0.7077,0.70637
3,0.605,1.006682,0.7214,0.742748,0.7214,0.721578
4,0.4082,0.975686,0.7356,0.749946,0.7356,0.735995


[I 2025-04-02 14:59:31,127] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0005812662098360384, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5026,1.06281,0.6871,0.704849,0.6871,0.684961
2,0.7383,0.942365,0.727,0.742197,0.727,0.725031


[I 2025-04-02 15:02:16,241] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.00033624577726887523, 'weight_decay': 0.008, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5581,0.964196,0.7173,0.728993,0.7173,0.716616
2,0.662,0.863955,0.7503,0.760965,0.7503,0.749776
3,0.3739,0.843254,0.7596,0.768592,0.7596,0.758321
4,0.2132,0.881787,0.7635,0.775413,0.7635,0.764388
5,0.116,0.950786,0.7633,0.776371,0.7633,0.763858
6,0.0639,0.968762,0.769,0.778342,0.769,0.769563
7,0.0355,0.974431,0.7758,0.779364,0.7758,0.774309
8,0.0161,1.005149,0.7769,0.783755,0.7769,0.777427
9,0.0067,1.007351,0.7814,0.786001,0.7814,0.780684
10,0.0035,0.987395,0.785,0.788616,0.785,0.784247


[I 2025-04-02 15:16:11,787] Trial 123 finished with value: 0.7842473663397957 and parameters: {'learning_rate': 0.00033624577726887523, 'weight_decay': 0.008, 'warmup_steps': 19}. Best is trial 8 with value: 0.7885223960150077.


Trial 124 with params: {'learning_rate': 0.0003762874009945916, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.536,0.964938,0.7159,0.728569,0.7159,0.715197
2,0.6657,0.855947,0.7528,0.762202,0.7528,0.751706
3,0.381,0.861638,0.7598,0.771629,0.7598,0.758972
4,0.2192,0.891572,0.7628,0.774565,0.7628,0.763788
5,0.1228,0.962327,0.7619,0.772025,0.7619,0.761443
6,0.067,0.989525,0.768,0.77708,0.768,0.768495
7,0.0364,0.970263,0.7738,0.777941,0.7738,0.772832
8,0.0176,1.010317,0.7826,0.788444,0.7826,0.782358
9,0.0075,1.010934,0.7876,0.792508,0.7876,0.787081
10,0.0035,0.991878,0.7878,0.791368,0.7878,0.786964


[I 2025-04-02 15:30:13,100] Trial 124 finished with value: 0.7869637346673577 and parameters: {'learning_rate': 0.0003762874009945916, 'weight_decay': 0.008, 'warmup_steps': 17}. Best is trial 8 with value: 0.7885223960150077.


Trial 125 with params: {'learning_rate': 0.00029518969395936513, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.573,0.941689,0.7204,0.73218,0.7204,0.720116
2,0.6576,0.846966,0.7509,0.760994,0.7509,0.75078
3,0.3684,0.840251,0.7625,0.772193,0.7625,0.761748
4,0.2051,0.874337,0.764,0.775355,0.764,0.764873
5,0.1083,0.946012,0.7611,0.774168,0.7611,0.761648
6,0.0567,0.985056,0.7679,0.77552,0.7679,0.767647
7,0.0322,0.968555,0.7717,0.778122,0.7717,0.771006
8,0.0159,1.037212,0.7773,0.785697,0.7773,0.778045
9,0.0072,0.997509,0.7807,0.785289,0.7807,0.780164
10,0.0037,1.008164,0.7804,0.784398,0.7804,0.779255


[I 2025-04-02 15:43:56,818] Trial 125 finished with value: 0.779254778659142 and parameters: {'learning_rate': 0.00029518969395936513, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 8 with value: 0.7885223960150077.


Trial 126 with params: {'learning_rate': 0.00020843080182189966, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6961,0.946422,0.7211,0.72783,0.7211,0.720135
2,0.6949,0.844049,0.7543,0.763556,0.7543,0.753992
3,0.3961,0.826658,0.7605,0.769076,0.7605,0.759934
4,0.2223,0.855181,0.761,0.771099,0.761,0.761596
5,0.1187,0.907246,0.7621,0.771215,0.7621,0.762027
6,0.0637,0.93133,0.7714,0.778461,0.7714,0.771672
7,0.0345,0.93796,0.7679,0.77158,0.7679,0.767082
8,0.018,0.986962,0.7736,0.779324,0.7736,0.773533


[I 2025-04-02 15:55:13,928] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0003849267606745277, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5282,0.957693,0.721,0.731508,0.721,0.719448
2,0.6699,0.865752,0.75,0.761266,0.75,0.748989
3,0.3865,0.86041,0.7556,0.767322,0.7556,0.754715
4,0.2212,0.884557,0.7644,0.77332,0.7644,0.764551
5,0.1205,0.92214,0.7692,0.775343,0.7692,0.767879
6,0.0679,0.973964,0.7698,0.776864,0.7698,0.769314
7,0.0362,0.973434,0.7747,0.778913,0.7747,0.773861
8,0.018,1.004223,0.7782,0.784231,0.7782,0.778157
9,0.0074,1.01055,0.7817,0.786334,0.7817,0.78065
10,0.0033,0.988915,0.7858,0.789079,0.7858,0.784721


[I 2025-04-02 16:09:20,694] Trial 127 finished with value: 0.7847208091730999 and parameters: {'learning_rate': 0.0003849267606745277, 'weight_decay': 0.007, 'warmup_steps': 16}. Best is trial 8 with value: 0.7885223960150077.


Trial 128 with params: {'learning_rate': 0.0004783928886344544, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4728,1.002707,0.7056,0.716644,0.7056,0.703588
2,0.6909,0.884501,0.7462,0.75837,0.7462,0.744961
3,0.4114,0.861504,0.7606,0.768249,0.7606,0.759446
4,0.2466,0.927711,0.7575,0.769042,0.7575,0.758192


[I 2025-04-02 16:14:58,300] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.00028759810545119136, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5929,0.954179,0.7159,0.727403,0.7159,0.715392
2,0.6626,0.842957,0.7525,0.761378,0.7525,0.751728
3,0.3689,0.843338,0.7622,0.773078,0.7622,0.761751
4,0.2056,0.896006,0.7599,0.77224,0.7599,0.761224
5,0.1088,0.927148,0.766,0.774253,0.766,0.765263
6,0.0592,0.959426,0.7711,0.779122,0.7711,0.771245
7,0.0326,0.964453,0.7743,0.778405,0.7743,0.772846
8,0.0159,1.009724,0.7821,0.788296,0.7821,0.782135
9,0.007,0.988714,0.7837,0.787012,0.7837,0.782755
10,0.0039,1.00006,0.7814,0.785603,0.7814,0.780794


[I 2025-04-02 16:28:55,340] Trial 129 finished with value: 0.780794366021662 and parameters: {'learning_rate': 0.00028759810545119136, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 8 with value: 0.7885223960150077.


Trial 130 with params: {'learning_rate': 0.000550447964184958, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4881,1.051244,0.6933,0.709622,0.6933,0.690327
2,0.721,0.905649,0.7385,0.748762,0.7385,0.737058


[I 2025-04-02 16:31:48,624] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0003472944811842624, 'weight_decay': 0.006, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5455,0.95507,0.7151,0.726844,0.7151,0.714038
2,0.6617,0.870854,0.7466,0.756694,0.7466,0.745753
3,0.3725,0.876411,0.7542,0.766857,0.7542,0.753213
4,0.211,0.904619,0.7578,0.771002,0.7578,0.758939
5,0.115,0.935811,0.7643,0.771783,0.7643,0.763295
6,0.0629,1.013104,0.7633,0.772735,0.7633,0.763839
7,0.034,0.982883,0.7698,0.774011,0.7698,0.768711
8,0.0164,1.022219,0.775,0.779884,0.775,0.774704


[I 2025-04-02 16:43:09,551] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.00011561810388848686, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0416,1.068828,0.6937,0.699729,0.6937,0.690791
2,0.8586,0.89852,0.7353,0.745886,0.7353,0.734842


[I 2025-04-02 16:45:50,696] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0029148480570998308, 'weight_decay': 0.001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1809,1.699538,0.5239,0.562224,0.5239,0.51419
2,1.382,1.392804,0.6131,0.64236,0.6131,0.609837
3,1.0405,1.250933,0.6464,0.672304,0.6464,0.645237
4,0.7992,1.174634,0.6769,0.694076,0.6769,0.676862


[I 2025-04-02 16:51:19,621] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0003218182337101714, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5581,0.954548,0.7185,0.731187,0.7185,0.717915
2,0.6595,0.867328,0.7427,0.754378,0.7427,0.742131
3,0.3714,0.863136,0.7587,0.769993,0.7587,0.758704
4,0.2068,0.904343,0.7611,0.774123,0.7611,0.761951
5,0.1132,0.951486,0.7613,0.772301,0.7613,0.761222
6,0.0608,0.963936,0.7704,0.77679,0.7704,0.770147
7,0.0343,0.970774,0.7726,0.77635,0.7726,0.771729
8,0.0158,1.018096,0.7797,0.786311,0.7797,0.780065
9,0.0077,1.007064,0.7828,0.787404,0.7828,0.782313
10,0.0035,0.987292,0.7869,0.790505,0.7869,0.78605


[I 2025-04-02 17:05:12,182] Trial 134 finished with value: 0.7860504083991725 and parameters: {'learning_rate': 0.0003218182337101714, 'weight_decay': 0.007, 'warmup_steps': 16}. Best is trial 8 with value: 0.7885223960150077.


Trial 135 with params: {'learning_rate': 0.00030594152502271113, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5739,0.953911,0.7198,0.731459,0.7198,0.719514
2,0.6609,0.854769,0.7485,0.758204,0.7485,0.747905
3,0.3682,0.853614,0.7585,0.769174,0.7585,0.758058
4,0.2071,0.880506,0.7574,0.768393,0.7574,0.758118


[I 2025-04-02 17:10:31,550] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0007877039909812065, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5491,1.116694,0.6763,0.69705,0.6763,0.673039
2,0.8332,0.986854,0.7176,0.735245,0.7176,0.714929


[I 2025-04-02 17:13:22,225] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0002781546133644776, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.576,0.94675,0.7204,0.730205,0.7204,0.719663
2,0.661,0.840851,0.7545,0.763816,0.7545,0.754438
3,0.3677,0.847545,0.762,0.772232,0.762,0.761382
4,0.2041,0.888093,0.7622,0.772091,0.7622,0.762902
5,0.1093,0.936101,0.765,0.774826,0.765,0.76517
6,0.0586,0.978553,0.7647,0.773287,0.7647,0.764805
7,0.0321,0.968312,0.7705,0.773963,0.7705,0.769343
8,0.0161,1.017783,0.7745,0.780266,0.7745,0.773727


[I 2025-04-02 17:24:44,391] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0003196740399120691, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5578,0.94699,0.719,0.730743,0.719,0.718988
2,0.6645,0.85646,0.7482,0.759987,0.7482,0.748163
3,0.3731,0.850847,0.7595,0.771296,0.7595,0.759579
4,0.2091,0.883867,0.7617,0.774163,0.7617,0.763038
5,0.114,0.957123,0.7608,0.774281,0.7608,0.761151
6,0.0619,0.970113,0.7681,0.777233,0.7681,0.768881
7,0.0327,0.974434,0.7745,0.779577,0.7745,0.773479
8,0.0161,1.054328,0.7722,0.780037,0.7722,0.772434


[I 2025-04-02 17:36:20,682] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0003619625649039109, 'weight_decay': 0.006, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5436,0.976258,0.7115,0.724933,0.7115,0.710459
2,0.6631,0.884721,0.7415,0.751383,0.7415,0.740412
3,0.3764,0.864117,0.7535,0.763262,0.7535,0.752543
4,0.2185,0.89301,0.7552,0.768282,0.7552,0.755849


[I 2025-04-02 17:42:16,441] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0008511553835609732, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5321,1.167605,0.6634,0.683855,0.6634,0.659068
2,0.8536,1.002133,0.7139,0.728632,0.7139,0.712122
3,0.5615,0.954153,0.74,0.754038,0.74,0.739986
4,0.3722,0.979102,0.7445,0.755679,0.7445,0.744971


[I 2025-04-02 17:48:09,326] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.00028971301977153915, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6017,0.944267,0.7189,0.728432,0.7189,0.717917
2,0.6666,0.852527,0.7493,0.758404,0.7493,0.748569
3,0.3721,0.856463,0.7582,0.769969,0.7582,0.758335
4,0.2083,0.880135,0.7562,0.766806,0.7562,0.756756


[I 2025-04-02 17:53:53,178] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0004809479234639863, 'weight_decay': 0.004, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5141,0.997465,0.7094,0.723312,0.7094,0.7071
2,0.7023,0.906999,0.7344,0.747335,0.7344,0.733761
3,0.4141,0.876974,0.7568,0.768048,0.7568,0.756459
4,0.2502,0.911046,0.7566,0.76975,0.7566,0.75747


[I 2025-04-02 17:59:36,996] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.00033031898513994696, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.519,0.952129,0.7149,0.725841,0.7149,0.713723
2,0.6551,0.855822,0.7492,0.758246,0.7492,0.748186
3,0.37,0.847918,0.763,0.772138,0.763,0.761996
4,0.2065,0.892877,0.7605,0.77469,0.7605,0.762356
5,0.1111,0.959684,0.7609,0.772659,0.7609,0.760723
6,0.0611,0.978859,0.7677,0.775578,0.7677,0.767686
7,0.0329,0.980423,0.7698,0.774542,0.7698,0.76858
8,0.0153,1.008954,0.7812,0.786187,0.7812,0.78074
9,0.0068,0.991453,0.7843,0.789252,0.7843,0.78363
10,0.0035,0.983773,0.7836,0.786847,0.7836,0.782658


[I 2025-04-02 18:13:19,780] Trial 143 finished with value: 0.782657856603074 and parameters: {'learning_rate': 0.00033031898513994696, 'weight_decay': 0.01, 'warmup_steps': 5}. Best is trial 8 with value: 0.7885223960150077.


Trial 144 with params: {'learning_rate': 0.0007479377948882576, 'weight_decay': 0.008, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5516,1.140384,0.6661,0.686541,0.6661,0.66373
2,0.8168,0.994134,0.718,0.733904,0.718,0.714838
3,0.5211,0.957658,0.7349,0.751576,0.7349,0.735065
4,0.3386,0.961922,0.7436,0.758443,0.7436,0.743747


[I 2025-04-02 18:19:00,772] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 7.231336253298902e-05, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4975,1.285788,0.6477,0.652807,0.6477,0.642055
2,1.0846,1.001066,0.7161,0.726414,0.7161,0.714869


[I 2025-04-02 18:21:44,133] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.00016064593848171558, 'weight_decay': 0.005, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8759,0.989035,0.7095,0.716153,0.7095,0.707471
2,0.7564,0.858308,0.7512,0.760977,0.7512,0.750758
3,0.45,0.817875,0.758,0.76577,0.758,0.757148
4,0.2689,0.848616,0.7565,0.766206,0.7565,0.757097


[I 2025-04-02 18:27:12,781] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.00037300699894477315, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5159,0.967606,0.7136,0.72572,0.7136,0.712816
2,0.6623,0.862521,0.7489,0.757254,0.7489,0.748024
3,0.3743,0.866692,0.7561,0.766318,0.7561,0.754947
4,0.2165,0.901265,0.7593,0.770758,0.7593,0.759535


[I 2025-04-02 18:32:57,282] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00037537554287465165, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.531,0.96555,0.7181,0.73089,0.7181,0.717623
2,0.6683,0.870007,0.7485,0.758453,0.7485,0.747329
3,0.3801,0.856204,0.7608,0.773363,0.7608,0.760384
4,0.2165,0.905064,0.7568,0.771629,0.7568,0.758103


[I 2025-04-02 18:38:44,950] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0004336974186584821, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.472,0.991572,0.7071,0.720574,0.7071,0.705228
2,0.6752,0.88838,0.7412,0.754081,0.7412,0.740856
3,0.3937,0.871773,0.7572,0.770145,0.7572,0.756088
4,0.2335,0.89938,0.7616,0.772784,0.7616,0.762336
5,0.1321,0.944402,0.7646,0.775747,0.7646,0.764671
6,0.0737,0.995261,0.7631,0.771544,0.7631,0.763176
7,0.0423,0.98747,0.7705,0.775231,0.7705,0.768964
8,0.0206,1.01683,0.7766,0.782365,0.7766,0.775963


[I 2025-04-02 18:49:56,159] Trial 149 pruned. 
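
Most trials in the log above end with `pruned`, so a quick tally helps show how much of the budget Hyperband cut short. The following is a minimal sketch that parses Optuna-style status lines with a regular expression. The sample lines are copied from the output above, and `tally_trials` is a hypothetical helper, not part of `base` or Optuna.

```python
import re
from collections import Counter

# Matches the two Optuna status lines that appear in the output.
STATUS = re.compile(r"Trial (\d+) (pruned|finished with value: ([0-9.]+))")


def tally_trials(log_text):
    """Count pruned/finished trials and collect the values of finished ones."""
    counts = Counter()
    finished = {}
    for match in STATUS.finditer(log_text):
        number = int(match.group(1))
        if match.group(2) == "pruned":
            counts["pruned"] += 1
        else:
            counts["finished"] += 1
            finished[number] = float(match.group(3))
    return counts, finished


sample = """
[I 2025-04-02 16:51:19,621] Trial 133 pruned.
[I 2025-04-02 17:05:12,182] Trial 134 finished with value: 0.7860504083991725 and parameters: {'learning_rate': 0.0003218182337101714, 'weight_decay': 0.007, 'warmup_steps': 16}. Best is trial 8 with value: 0.7885223960150077.
[I 2025-04-02 17:10:31,550] Trial 135 pruned.
"""
counts, finished = tally_trials(sample)
print(dict(counts), finished)
```

Run over the full cell output, the same function would give the pruned-to-finished ratio for the whole study.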


In [None]:
print(best_base_pretrained)

BestRun(run_id='8', objective=0.7885223960150077, hyperparameters={'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}, run_summary=None)
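
The printed `BestRun` is a named tuple, so its fields can be read by name and the winning hyperparameters reused for a final training run. The sketch below uses a stand-in `BestRun` type with the fields visible in the output above, so that it runs without `transformers`. The name `final_run_kwargs` is purely illustrative.

```python
from typing import Any, NamedTuple, Optional


class BestRun(NamedTuple):
    """Stand-in mirroring the fields printed above (not the library class)."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


best = BestRun(
    run_id="8",
    objective=0.7885223960150077,
    hyperparameters={
        "learning_rate": 0.00040842279473800845,
        "weight_decay": 0.008,
        "warmup_steps": 6,
    },
)

# Copy the hyperparameters so later edits do not mutate the search result.
final_run_kwargs = dict(best.hyperparameters)
print(f"trial {best.run_id}: F1 = {best.objective:.4f}")
print(final_run_kwargs)
```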


In [None]:
base.reset_seed()


## Search with distillation and fine-tuning of the pretrained model
Configuration of the individual training runs.


In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.

In [None]:
def hp_space(trial):
    # Optimizer settings plus the two distillation hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
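
The two extra entries, `lambda_param` and `temperature`, are the usual knobs of knowledge distillation. How `base.DistilTrainer` combines them is not shown in this notebook, so the following is only a minimal sketch of one common formulation, assumed rather than confirmed. The loss mixes the cross-entropy against the true label with the temperature-softened KL divergence from teacher to student, scaled by T². The function names and the direction of the `lambda_param` weighting are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - lambda_param) * hard + lambda_param * temperature**2 * soft


# Toy logits for a 3-class problem; the true class is index 0.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
loss_hard_only = distillation_loss(student, teacher, 0, lambda_param=0.0, temperature=6.0)
loss_mixed = distillation_loss(student, teacher, 0, lambda_param=0.5, temperature=6.0)
print(loss_hard_only, loss_mixed)
```

Under this assumed convention, `lambda_param = 0.0` (as drawn in trial 23) would reduce to plain cross-entropy training, while `lambda_param = 1.0` would train on the teacher's soft targets only. The opposite weighting is also used in practice, so the actual meaning depends on the `base.DistilTrainer` implementation.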

Optuna configuration.

In [None]:
# Hyperband pruner and a seeded multivariate TPE sampler.
pruner = optuna.pruners.HyperbandPruner(
    min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2
)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
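
The values of `min_r` and `max_r` are defined outside this section. As a rough sketch, assume `min_r = 2` and `max_r = 10`. This is a guess, but it is consistent with trials in the logs being pruned after epochs 2, 4, or 8 and finishing at epoch 10. The code below follows the bracket rule described in Optuna's Hyperband documentation to list the points at which each bracket can prune. The helper name `hyperband_rungs` is hypothetical, the cut-off at `max_resource` is a simplification, and the resource is expressed in epochs for readability. If the Trainer reports optimizer steps instead, the same pattern would hold in those units.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the resource values at which a trial may be pruned."""
    # Number of brackets: floor(log_eta(max / min)) + 1.
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        # Bracket k starts pruning later: its first rung is min * eta**k.
        step = min_resource * reduction_factor**bracket_id
        while step < max_resource:
            rungs.append(step)
            step *= reduction_factor
        brackets.append(rungs)
    return brackets


# Assumed values; min_r and max_r are not shown in this section.
rungs = hyperband_rungs(min_resource=2, max_resource=10, reduction_factor=2)
print(rungs)  # [[2, 4, 8], [4, 8], [8]]
```

Each trial is assigned to one bracket, which would explain why some trials are stopped after two epochs while others are only compared against their peers after four or eight.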



Configuration of the distillation trainer for the individual training runs.

In [None]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # model_init builds a fresh model at the start of every trial.
    model_init=lambda: base.get_mobilenet(100),
)
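
In every logged epoch the Accuracy and Recall columns are identical. That is what one would expect if `base.compute_metrics` uses support-weighted averaging, which is an assumption because the function is not shown here. A small pure-Python sketch of weighted precision, recall, and F1 illustrates why: weighting each class's recall by its support sums the true positives over all samples, which is exactly accuracy. The function name `weighted_metrics` is hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        p = true_pos[cls] / predicted[cls] if predicted[cls] else 0.0
        r = true_pos[cls] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n  # share of samples belonging to this class
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(true_pos.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for three classes with two samples each.
metrics = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(metrics)
```

The weighted F1 is an average of per-class F1 scores, so it need not lie between the weighted precision and recall, which matches the logged rows where F1 is slightly below both.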

Search setup.

In [None]:
best_distil_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-02 19:45:37,252] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6536,1.035308,0.7268,0.735513,0.7268,0.725206
2,0.8072,0.889646,0.7592,0.769321,0.7592,0.758276


[I 2025-04-02 19:48:27,503] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1729,1.250443,0.6812,0.685109,0.6812,0.676799
2,1.0577,1.021754,0.734,0.745075,0.734,0.733256
3,0.7763,0.927711,0.7507,0.76136,0.7507,0.74992
4,0.6132,0.895263,0.7577,0.766609,0.7577,0.757575


[I 2025-04-02 19:53:57,321] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6718,1.627864,0.6127,0.621803,0.6127,0.603079
2,1.382,1.219192,0.6941,0.703134,0.6941,0.691652


[I 2025-04-02 19:56:43,133] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0577,1.198026,0.6954,0.701302,0.6954,0.692194
2,1.0037,0.991039,0.7436,0.753465,0.7436,0.742478
3,0.729,0.910101,0.7526,0.763265,0.7526,0.75179
4,0.5693,0.87562,0.7634,0.771021,0.7634,0.763299


[I 2025-04-02 20:02:20,663] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5552,1.191641,0.6829,0.696603,0.6829,0.679988
2,0.9249,1.028466,0.7247,0.742468,0.7247,0.722185


[I 2025-04-02 20:05:12,431] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8677,1.5079,0.6096,0.634681,0.6096,0.604347
2,1.2318,1.228522,0.6758,0.693577,0.6758,0.672453
3,0.9403,1.095268,0.7022,0.721216,0.7022,0.702534
4,0.7375,1.032159,0.7206,0.739118,0.7206,0.722209


[I 2025-04-02 20:10:56,438] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5439,1.171266,0.6912,0.706968,0.6912,0.688169
2,0.9199,1.002996,0.7356,0.752101,0.7356,0.733922
3,0.6669,0.946842,0.7484,0.766415,0.7484,0.748139
4,0.5054,0.880969,0.7673,0.780818,0.7673,0.767956


[I 2025-04-02 20:16:11,522] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9138,1.524699,0.5997,0.617587,0.5997,0.59275
2,1.2863,1.279039,0.662,0.686359,0.662,0.66109
3,0.9943,1.139623,0.6936,0.712063,0.6936,0.692264
4,0.7873,1.056091,0.7159,0.731902,0.7159,0.716478


[I 2025-04-02 20:21:52,459] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2222,1.315597,0.6695,0.672921,0.6695,0.66487
2,1.1192,1.055944,0.7274,0.737321,0.7274,0.72624
3,0.834,0.950703,0.7463,0.755757,0.7463,0.745366
4,0.67,0.913919,0.7523,0.75997,0.7523,0.7519


[I 2025-04-02 20:27:28,115] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6314,1.24743,0.6744,0.692185,0.6744,0.672373
2,1.0034,1.076619,0.7131,0.73089,0.7131,0.710292


[I 2025-04-02 20:30:26,303] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5096,1.048298,0.7203,0.730295,0.7203,0.717794
2,0.7991,0.916485,0.7551,0.767687,0.7551,0.754253
3,0.5506,0.850185,0.768,0.782171,0.768,0.767665
4,0.4075,0.841571,0.7736,0.78475,0.7736,0.774048
5,0.3117,0.837429,0.7746,0.788135,0.7746,0.774759
6,0.2474,0.824906,0.7767,0.78806,0.7767,0.777468
7,0.207,0.787163,0.7817,0.788,0.7817,0.781382
8,0.1773,0.775787,0.7904,0.798823,0.7904,0.791033


[I 2025-04-02 20:41:47,560] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0014321301966915287, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7495,1.377743,0.6359,0.653122,0.6359,0.632311
2,1.1315,1.151607,0.6944,0.71706,0.6944,0.692053


[I 2025-04-02 20:44:43,627] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 9.686152689152715e-05, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1579,1.270877,0.6791,0.682614,0.6791,0.674786
2,1.0754,1.036322,0.7316,0.742509,0.7316,0.730595
3,0.795,0.935465,0.7539,0.763589,0.7539,0.75303
4,0.6318,0.897788,0.7563,0.763931,0.7563,0.755807


[I 2025-04-02 20:50:13,830] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0004052254440503788, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5541,1.037553,0.7271,0.736895,0.7271,0.725136
2,0.8016,0.911032,0.7563,0.767022,0.7563,0.755155
3,0.5552,0.855045,0.7676,0.780981,0.7676,0.767446
4,0.4094,0.822384,0.7785,0.792116,0.7785,0.779684


[I 2025-04-02 20:55:36,918] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0002967370539368567, 'weight_decay': 0.004, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.608,1.026142,0.7239,0.732081,0.7239,0.722475
2,0.8012,0.902276,0.7568,0.767409,0.7568,0.755648
3,0.5505,0.866159,0.7645,0.778186,0.7645,0.76471
4,0.4049,0.825418,0.7748,0.787518,0.7748,0.775971
5,0.3104,0.843406,0.7715,0.786662,0.7715,0.772036
6,0.2481,0.834558,0.7748,0.786544,0.7748,0.775949
7,0.2097,0.796362,0.7821,0.787942,0.7821,0.782002
8,0.1817,0.795882,0.7859,0.797117,0.7859,0.78732


[I 2025-04-02 21:06:39,845] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.586,1.207563,0.6831,0.699463,0.6831,0.679983
2,0.962,1.052505,0.7227,0.739387,0.7227,0.72009


[I 2025-04-02 21:09:35,830] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00022429163078221243, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.661,1.035688,0.7307,0.738229,0.7307,0.729052
2,0.8226,0.906759,0.7572,0.765733,0.7572,0.755837
3,0.5707,0.857994,0.7691,0.781891,0.7691,0.769207
4,0.4215,0.834959,0.7729,0.785377,0.7729,0.774206
5,0.326,0.842558,0.7713,0.78275,0.7713,0.771595
6,0.2629,0.845022,0.7704,0.781856,0.7704,0.771658
7,0.2224,0.811817,0.7763,0.782796,0.7763,0.775982
8,0.1943,0.811817,0.7769,0.7867,0.7769,0.777382


[I 2025-04-02 21:20:49,103] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0006412609358779237, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5249,1.103451,0.7081,0.720859,0.7081,0.704694
2,0.8607,0.967558,0.7415,0.755004,0.7415,0.740348
3,0.6115,0.879462,0.7624,0.776359,0.7624,0.761692
4,0.4583,0.862778,0.7709,0.785053,0.7709,0.771757


[I 2025-04-02 21:26:32,560] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 5.957853392927128e-05, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5812,1.558818,0.6258,0.632414,0.6258,0.61761
2,1.3275,1.186091,0.7021,0.712305,0.7021,0.700425


[I 2025-04-02 21:29:26,865] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5146,1.04742,0.7207,0.730311,0.7207,0.718319
2,0.8044,0.908964,0.7552,0.76546,0.7552,0.753525
3,0.5586,0.873998,0.763,0.779827,0.763,0.762886
4,0.4147,0.835129,0.774,0.785779,0.774,0.775261
5,0.3168,0.838266,0.776,0.789595,0.776,0.77575
6,0.2504,0.829749,0.7762,0.787695,0.7762,0.777259
7,0.2086,0.789721,0.783,0.790152,0.783,0.78324
8,0.1783,0.778205,0.7892,0.797828,0.7892,0.789374
9,0.157,0.766319,0.7921,0.801852,0.7921,0.793026
10,0.1428,0.767182,0.7891,0.796703,0.7891,0.789286


[I 2025-04-02 21:43:50,699] Trial 19 finished with value: 0.789285986657061 and parameters: {'learning_rate': 0.00045046258144846343, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 19 with value: 0.789285986657061.


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5435,1.048287,0.7213,0.731379,0.7213,0.719139
2,0.8026,0.905385,0.7581,0.766936,0.7581,0.755895
3,0.5561,0.859411,0.7672,0.781963,0.7672,0.767658
4,0.4111,0.825462,0.7779,0.789492,0.7779,0.779099
5,0.3141,0.856722,0.7713,0.785487,0.7713,0.770469
6,0.2489,0.829527,0.7734,0.787608,0.7734,0.774902
7,0.2078,0.79148,0.7842,0.790103,0.7842,0.783484
8,0.1773,0.781514,0.7831,0.791909,0.7831,0.783239


[I 2025-04-02 21:55:10,827] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0008087763473950767, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5364,1.148317,0.6946,0.706304,0.6946,0.69136
2,0.9169,1.023233,0.7272,0.745235,0.7272,0.724608
3,0.6643,0.947371,0.7482,0.765198,0.7482,0.748247
4,0.5009,0.899579,0.7636,0.780855,0.7636,0.765441


[I 2025-04-02 22:06:39,148] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0014554585466308638, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7378,1.35972,0.6425,0.654617,0.6425,0.638246
2,1.1332,1.184049,0.6819,0.704007,0.6819,0.678607


[I 2025-04-02 22:09:30,466] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0002377767876004004, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6607,1.032524,0.7299,0.736849,0.7299,0.728319
2,0.8194,0.909775,0.7566,0.766295,0.7566,0.755645
3,0.566,0.862287,0.7663,0.77956,0.7663,0.766506
4,0.4188,0.842661,0.7734,0.785807,0.7734,0.774588
5,0.3231,0.856587,0.7671,0.778281,0.7671,0.767258
6,0.2598,0.85032,0.7683,0.780717,0.7683,0.769398
7,0.2188,0.813085,0.7779,0.785403,0.7779,0.777806
8,0.1913,0.81393,0.7774,0.786875,0.7774,0.77807


[I 2025-04-02 22:20:51,650] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0007262420484343405, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5104,1.133978,0.7009,0.712862,0.7009,0.698182
2,0.8841,0.979693,0.7428,0.756848,0.7428,0.740984
3,0.6374,0.904074,0.7583,0.77403,0.7583,0.758868
4,0.478,0.876698,0.7667,0.783057,0.7667,0.768717


[I 2025-04-02 22:26:40,790] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0001351015408651554, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9207,1.139681,0.7063,0.712347,0.7063,0.703422
2,0.9464,0.963554,0.747,0.757033,0.747,0.746188
3,0.6803,0.897758,0.7577,0.769565,0.7577,0.757482
4,0.5226,0.868474,0.7664,0.775342,0.7664,0.766716
5,0.4165,0.865152,0.7624,0.771971,0.7624,0.761762
6,0.3443,0.874054,0.7649,0.775816,0.7649,0.766342
7,0.2956,0.845836,0.7723,0.777593,0.7723,0.771021
8,0.2602,0.850462,0.7733,0.781509,0.7733,0.77372


[I 2025-04-02 22:37:58,305] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0008504497585847681, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5535,1.161436,0.6923,0.703872,0.6923,0.687418
2,0.9299,1.010514,0.7325,0.74765,0.7325,0.729581
3,0.6757,0.942244,0.7478,0.765841,0.7478,0.74828
4,0.5135,0.890621,0.7663,0.779492,0.7663,0.767769


[I 2025-04-02 22:43:49,497] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0029678454905841976, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0975,1.706918,0.5547,0.580356,0.5547,0.548642
2,1.4361,1.390817,0.6367,0.657925,0.6367,0.633151
3,1.1294,1.24338,0.6697,0.689914,0.6697,0.667504
4,0.9146,1.133978,0.6982,0.71617,0.6982,0.69961


[I 2025-04-02 22:49:31,582] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0003192207808838057, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5607,1.014383,0.7304,0.738399,0.7304,0.72895
2,0.7889,0.89535,0.7597,0.769777,0.7597,0.758899
3,0.5441,0.866089,0.7672,0.780784,0.7672,0.766841
4,0.401,0.832091,0.7746,0.787297,0.7746,0.775887
5,0.3066,0.851244,0.7706,0.784799,0.7706,0.770626
6,0.2462,0.828794,0.7724,0.78404,0.7724,0.772977
7,0.2072,0.799911,0.7825,0.788315,0.7825,0.781989
8,0.18,0.796153,0.7863,0.796034,0.7863,0.78722


[I 2025-04-02 23:01:10,529] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0037867653604961434, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2226,1.811474,0.5273,0.55225,0.5273,0.51946
2,1.5372,1.467652,0.6136,0.639506,0.6136,0.610196


[I 2025-04-02 23:04:05,084] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0007243732057988554, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5815,1.125988,0.704,0.713819,0.704,0.701327
2,0.8948,0.981422,0.7392,0.754782,0.7392,0.737655
3,0.6405,0.911549,0.7567,0.77209,0.7567,0.756679
4,0.4849,0.88033,0.76,0.774708,0.76,0.760897


[I 2025-04-02 23:09:49,508] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0007704409738135844, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5251,1.166401,0.6996,0.715929,0.6996,0.696081
2,0.9019,0.995251,0.735,0.748377,0.735,0.732772
3,0.6491,0.906796,0.7557,0.771206,0.7557,0.755845
4,0.4907,0.888011,0.7614,0.774887,0.7614,0.762478


[I 2025-04-02 23:15:38,316] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0001663153417666553, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8363,1.092352,0.717,0.723839,0.717,0.715106
2,0.8907,0.937478,0.7506,0.760451,0.7506,0.749758
3,0.6285,0.884843,0.7618,0.773959,0.7618,0.761639
4,0.4733,0.851775,0.7673,0.777892,0.7673,0.768243


[I 2025-04-02 23:21:16,701] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 5.8367877335939255e-05, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5985,1.575837,0.6244,0.631757,0.6244,0.615622
2,1.3408,1.194062,0.6984,0.708541,0.6984,0.696533
3,1.0205,1.044827,0.7272,0.735044,0.7272,0.725523
4,0.8458,0.988611,0.7356,0.741956,0.7356,0.733959


[I 2025-04-02 23:26:50,491] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.00013674282555931614, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9245,1.1394,0.7089,0.713905,0.7089,0.705893
2,0.9454,0.962943,0.7442,0.754976,0.7442,0.743597
3,0.678,0.893138,0.7607,0.772985,0.7607,0.760677
4,0.5212,0.869179,0.7611,0.771268,0.7611,0.761697


[I 2025-04-02 23:32:41,907] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.00030608772315391097, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5901,1.012004,0.7309,0.738382,0.7309,0.729205
2,0.7948,0.888403,0.7622,0.770577,0.7622,0.760679
3,0.5449,0.844273,0.7732,0.784892,0.7732,0.77273
4,0.4023,0.821398,0.781,0.793232,0.781,0.782196
5,0.3079,0.836628,0.7757,0.785521,0.7757,0.774962
6,0.2461,0.828172,0.7783,0.789649,0.7783,0.779333
7,0.2082,0.794308,0.7845,0.792299,0.7845,0.784711
8,0.1805,0.792236,0.7851,0.794533,0.7851,0.785578


[I 2025-04-02 23:44:18,557] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0004808380515103859, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5055,1.048915,0.7216,0.730523,0.7216,0.719057
2,0.8113,0.911194,0.7556,0.766518,0.7556,0.75427
3,0.5647,0.878687,0.7633,0.778359,0.7633,0.762826
4,0.4184,0.84548,0.7732,0.785602,0.7732,0.774116
5,0.3189,0.859536,0.7671,0.781222,0.7671,0.76692
6,0.2529,0.826723,0.78,0.793781,0.78,0.781405
7,0.2096,0.781506,0.7901,0.796137,0.7901,0.790022
8,0.1789,0.767227,0.7915,0.799792,0.7915,0.792028


[I 2025-04-02 23:55:30,454] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0010871570492226124, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6208,1.252091,0.6687,0.681699,0.6687,0.663951
2,1.0127,1.111607,0.7076,0.727215,0.7076,0.705072
3,0.7454,0.9891,0.7366,0.752919,0.7366,0.736231
4,0.5779,0.928154,0.7514,0.765418,0.7514,0.751942


[I 2025-04-03 00:01:10,646] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00032063386881613944, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5601,1.030966,0.7249,0.736334,0.7249,0.72422
2,0.7929,0.897925,0.7576,0.768302,0.7576,0.756748
3,0.5428,0.853897,0.7676,0.781374,0.7676,0.76786
4,0.402,0.826208,0.7783,0.790676,0.7783,0.779579
5,0.3068,0.844025,0.7727,0.785206,0.7727,0.772418
6,0.2455,0.834321,0.7726,0.784537,0.7726,0.773898
7,0.2067,0.799068,0.7873,0.79526,0.7873,0.78766
8,0.1794,0.801566,0.7792,0.788055,0.7792,0.779595


[I 2025-04-03 00:12:28,636] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.0014780818159468043, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.762,1.386027,0.6383,0.657961,0.6383,0.63352
2,1.1273,1.159844,0.6878,0.707667,0.6878,0.685427


[I 2025-04-03 00:15:12,555] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0002902647894684532, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5861,1.026024,0.7277,0.734866,0.7277,0.725686
2,0.7988,0.884763,0.7667,0.77552,0.7667,0.765497
3,0.5474,0.856863,0.7653,0.778616,0.7653,0.765923
4,0.4032,0.822229,0.7751,0.787108,0.7751,0.776577
5,0.3087,0.837733,0.7714,0.785973,0.7714,0.77132
6,0.2481,0.82774,0.7773,0.788386,0.7773,0.777859
7,0.2098,0.789056,0.7844,0.791307,0.7844,0.784431
8,0.1819,0.789627,0.7847,0.79263,0.7847,0.785094


[I 2025-04-03 00:26:23,796] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0005445706888865506, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5131,1.077582,0.7119,0.722075,0.7119,0.709175
2,0.8305,0.934473,0.7492,0.759979,0.7492,0.74765
3,0.5798,0.885746,0.7623,0.77593,0.7623,0.761286
4,0.4334,0.854021,0.7733,0.788095,0.7733,0.774453
5,0.3302,0.850012,0.7704,0.785544,0.7704,0.770168
6,0.2597,0.828505,0.7815,0.793325,0.7815,0.782662
7,0.2142,0.78947,0.785,0.791983,0.785,0.785194
8,0.1811,0.763284,0.7927,0.800966,0.7927,0.79326
9,0.1578,0.755373,0.7927,0.801506,0.7927,0.79363
10,0.1428,0.752566,0.7973,0.803634,0.7973,0.797654


[I 2025-04-03 00:40:37,351] Trial 41 finished with value: 0.7976536507044625 and parameters: {'learning_rate': 0.0005445706888865506, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 42 with params: {'learning_rate': 0.0005835678229504949, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5226,1.078482,0.7162,0.727243,0.7162,0.712723
2,0.8431,0.94671,0.7486,0.761781,0.7486,0.747642
3,0.5949,0.879477,0.7589,0.772415,0.7589,0.759152
4,0.4443,0.87038,0.7646,0.782658,0.7646,0.76675
5,0.338,0.8527,0.7678,0.783548,0.7678,0.767401
6,0.2663,0.830814,0.7752,0.787101,0.7752,0.776144
7,0.2182,0.787478,0.7867,0.792697,0.7867,0.7862
8,0.1836,0.788493,0.7846,0.79507,0.7846,0.784914


[I 2025-04-03 00:52:16,637] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0032088988731785663, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1532,1.736821,0.5498,0.573475,0.5498,0.540339
2,1.4628,1.460268,0.6209,0.645329,0.6209,0.61841


[I 2025-04-03 00:55:10,681] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0005253938499055099, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.493,1.080392,0.7148,0.724113,0.7148,0.712677
2,0.8208,0.932987,0.7531,0.764391,0.7531,0.750825
3,0.5761,0.882127,0.7611,0.779877,0.7611,0.76177
4,0.4285,0.86417,0.7709,0.784978,0.7709,0.772547
5,0.3266,0.858757,0.7684,0.783719,0.7684,0.768489
6,0.2573,0.841922,0.7731,0.787883,0.7731,0.774489
7,0.2134,0.786982,0.7877,0.793719,0.7877,0.787431
8,0.1813,0.773333,0.7941,0.803232,0.7941,0.794312
9,0.1573,0.761475,0.7933,0.801901,0.7933,0.794336
10,0.1429,0.760793,0.794,0.800758,0.794,0.794337


[I 2025-04-03 01:09:35,106] Trial 44 finished with value: 0.7943374153030396 and parameters: {'learning_rate': 0.0005253938499055099, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 45 with params: {'learning_rate': 0.0006093255806103784, 'weight_decay': 0.004, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5064,1.083613,0.7096,0.719597,0.7096,0.707466
2,0.8477,0.963095,0.7436,0.756231,0.7436,0.741543
3,0.5986,0.874531,0.7659,0.779861,0.7659,0.765829
4,0.4478,0.872354,0.7626,0.77495,0.7626,0.763624


[I 2025-04-03 01:15:29,533] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0004055078422103245, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5075,1.048098,0.7213,0.730215,0.7213,0.718863
2,0.7982,0.909822,0.7531,0.765629,0.7531,0.752494
3,0.5516,0.857816,0.7725,0.785683,0.7725,0.772579
4,0.4074,0.832575,0.7767,0.791169,0.7767,0.777986
5,0.3112,0.847889,0.7744,0.789206,0.7744,0.774734
6,0.2467,0.822472,0.7835,0.794732,0.7835,0.784152
7,0.2065,0.783662,0.7918,0.798024,0.7918,0.791212
8,0.1778,0.777815,0.7882,0.797483,0.7882,0.789291
9,0.1566,0.767222,0.7932,0.802361,0.7932,0.79397
10,0.1432,0.756299,0.7967,0.802616,0.7967,0.79679


[I 2025-04-03 01:30:00,343] Trial 46 finished with value: 0.7967898445635785 and parameters: {'learning_rate': 0.0004055078422103245, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 47 with params: {'learning_rate': 0.00037490649098330484, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5234,1.034215,0.728,0.737123,0.728,0.726323
2,0.7895,0.904735,0.7551,0.764377,0.7551,0.753085
3,0.5444,0.87107,0.7644,0.780187,0.7644,0.764205
4,0.4022,0.837023,0.7698,0.781872,0.7698,0.770763
5,0.3087,0.850972,0.7695,0.784316,0.7695,0.769556
6,0.2455,0.813364,0.7794,0.792273,0.7794,0.780965
7,0.206,0.787343,0.7863,0.79174,0.7863,0.785748
8,0.1778,0.775704,0.7877,0.795808,0.7877,0.787664


[I 2025-04-03 01:41:21,352] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00010293611701022262, 'weight_decay': 0.005, 'warmup_steps': 5, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1137,1.244541,0.6837,0.687576,0.6837,0.679632
2,1.0502,1.016575,0.7358,0.745785,0.7358,0.734858


[I 2025-04-03 01:44:06,905] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0009035239021453025, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5664,1.175666,0.6929,0.7057,0.6929,0.690626
2,0.9508,1.025532,0.7228,0.738762,0.7228,0.720605
3,0.6947,0.95617,0.7464,0.758258,0.7464,0.744381
4,0.5316,0.893422,0.7562,0.767993,0.7562,0.756433


[I 2025-04-03 01:49:54,089] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.00026102834587097407, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6182,1.014216,0.733,0.740253,0.733,0.731261
2,0.8045,0.885023,0.7643,0.772933,0.7643,0.763504
3,0.5541,0.859959,0.7679,0.779607,0.7679,0.767567
4,0.4092,0.832151,0.7755,0.78614,0.7755,0.776243
5,0.3137,0.844088,0.7702,0.781248,0.7702,0.769956
6,0.252,0.826609,0.7741,0.78581,0.7741,0.775514
7,0.2135,0.80306,0.7789,0.785778,0.7789,0.778775
8,0.1862,0.799622,0.7859,0.794216,0.7859,0.786196


[I 2025-04-03 02:01:30,722] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00046560532599103875, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5733,1.052157,0.7176,0.728391,0.7176,0.716074
2,0.8166,0.930593,0.751,0.764422,0.751,0.749458
3,0.5692,0.864958,0.7644,0.776896,0.7644,0.763836
4,0.4221,0.83545,0.7765,0.789641,0.7765,0.777951
5,0.3206,0.84972,0.7732,0.789463,0.7732,0.772563
6,0.2549,0.830792,0.7758,0.787493,0.7758,0.776934
7,0.2116,0.786516,0.7814,0.787861,0.7814,0.781309
8,0.1793,0.772422,0.7896,0.799807,0.7896,0.790552


[I 2025-04-03 02:12:56,191] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.00034280320089867777, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5555,1.028831,0.7252,0.733806,0.7252,0.723549
2,0.7943,0.896038,0.7565,0.765949,0.7565,0.755234
3,0.5454,0.854584,0.7697,0.780561,0.7697,0.768924
4,0.4018,0.825329,0.7782,0.790748,0.7782,0.779664
5,0.3065,0.849302,0.7714,0.785918,0.7714,0.771615
6,0.2441,0.833185,0.7762,0.788928,0.7762,0.77765
7,0.2065,0.800974,0.7827,0.790367,0.7827,0.782588
8,0.1784,0.786965,0.7857,0.794347,0.7857,0.785916


[I 2025-04-03 02:24:25,446] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0011687448424065178, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6606,1.297336,0.658,0.6738,0.658,0.653902
2,1.0406,1.099425,0.7057,0.725978,0.7057,0.702619


[I 2025-04-03 02:27:09,182] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.00038353433987985434, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5504,1.035674,0.7234,0.733366,0.7234,0.721699
2,0.7973,0.898691,0.7583,0.766559,0.7583,0.756736
3,0.5472,0.857642,0.7714,0.784256,0.7714,0.770853
4,0.4052,0.822166,0.7799,0.793342,0.7799,0.780943
5,0.3083,0.845369,0.7681,0.78386,0.7681,0.768235
6,0.2459,0.817873,0.7778,0.7902,0.7778,0.779482
7,0.2064,0.789258,0.786,0.793015,0.786,0.78612
8,0.1778,0.781906,0.7841,0.79328,0.7841,0.784942


[I 2025-04-03 02:38:13,415] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 7.242888062473813e-05, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4322,1.431142,0.6507,0.654622,0.6507,0.644096
2,1.22,1.118451,0.7142,0.724855,0.7142,0.712815


[I 2025-04-03 02:41:05,789] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00045212925624749773, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4957,1.054515,0.7194,0.729894,0.7194,0.717723
2,0.8029,0.9088,0.757,0.769357,0.757,0.755672
3,0.5577,0.867087,0.7666,0.783614,0.7666,0.766832
4,0.4154,0.832014,0.777,0.789207,0.777,0.777631
5,0.3164,0.830248,0.7775,0.789469,0.7775,0.776574
6,0.2524,0.816859,0.7814,0.79262,0.7814,0.782702
7,0.2094,0.775196,0.7922,0.79808,0.7922,0.792228
8,0.1786,0.755583,0.7932,0.802375,0.7932,0.793808
9,0.1571,0.754452,0.7947,0.802969,0.7947,0.795379
10,0.1431,0.749223,0.7954,0.802082,0.7954,0.795738


[I 2025-04-03 02:55:19,862] Trial 56 finished with value: 0.795737830314855 and parameters: {'learning_rate': 0.00045212925624749773, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 57 with params: {'learning_rate': 0.0010151189787525263, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6085,1.233446,0.6758,0.691099,0.6758,0.672562
2,0.9923,1.055997,0.7188,0.740457,0.7188,0.717643


[I 2025-04-03 02:58:12,982] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0003972648594251077, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5142,1.04524,0.7201,0.728057,0.7201,0.717851
2,0.7948,0.893845,0.7606,0.76953,0.7606,0.758374
3,0.5462,0.871155,0.7638,0.779667,0.7638,0.764184
4,0.4029,0.831791,0.776,0.787999,0.776,0.77703
5,0.3085,0.849209,0.7698,0.783769,0.7698,0.769391
6,0.2462,0.826365,0.7748,0.785766,0.7748,0.776151
7,0.2058,0.784996,0.7854,0.792145,0.7854,0.785505
8,0.1779,0.77949,0.7888,0.798053,0.7888,0.78943


[I 2025-04-03 03:09:29,813] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0006307027249240012, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4965,1.08592,0.7124,0.723444,0.7124,0.709832
2,0.8532,0.970057,0.7423,0.754044,0.7423,0.739686
3,0.6079,0.892259,0.7646,0.779183,0.7646,0.764309
4,0.4555,0.866517,0.7661,0.781112,0.7661,0.767665


[I 2025-04-03 03:15:06,309] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0003020939879565185, 'weight_decay': 0.005, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6592,1.030829,0.73,0.737241,0.73,0.728303
2,0.8048,0.887583,0.7631,0.772376,0.7631,0.762053
3,0.5528,0.854843,0.7674,0.780885,0.7674,0.766582
4,0.4068,0.818587,0.7798,0.79373,0.7798,0.781623
5,0.3105,0.830735,0.7703,0.783282,0.7703,0.769685
6,0.249,0.813006,0.781,0.790874,0.781,0.781895
7,0.2096,0.791908,0.7836,0.789448,0.7836,0.782949
8,0.1817,0.789536,0.787,0.797243,0.787,0.787832


[I 2025-04-03 03:26:24,996] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0007590728685920777, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5227,1.121884,0.7084,0.720051,0.7084,0.706685
2,0.8978,0.987592,0.7384,0.751305,0.7384,0.736644
3,0.6452,0.917334,0.7577,0.770252,0.7577,0.757317
4,0.4894,0.886004,0.7592,0.772391,0.7592,0.760187


[I 2025-04-03 03:32:15,401] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0006728491105705886, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5098,1.112381,0.7099,0.718363,0.7099,0.707003
2,0.8729,0.974013,0.7413,0.755139,0.7413,0.739581
3,0.6203,0.900764,0.7582,0.772463,0.7582,0.758187
4,0.4676,0.870221,0.7647,0.778352,0.7647,0.765709


[I 2025-04-03 03:37:59,441] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0007470074115115749, 'weight_decay': 0.001, 'warmup_steps': 5, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5305,1.109027,0.7106,0.719344,0.7106,0.708173
2,0.8902,0.984769,0.7326,0.749139,0.7326,0.731419
3,0.644,0.91369,0.757,0.772785,0.757,0.757175
4,0.4871,0.894109,0.7628,0.77711,0.7628,0.764262


[I 2025-04-03 03:43:33,102] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00041098534227771127, 'weight_decay': 0.008, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5812,1.033357,0.7239,0.733385,0.7239,0.722312
2,0.8038,0.903171,0.7593,0.769975,0.7593,0.758025
3,0.5551,0.85665,0.7684,0.780972,0.7684,0.76815
4,0.409,0.842339,0.771,0.784347,0.771,0.771753
5,0.3123,0.850051,0.7668,0.782303,0.7668,0.766818
6,0.2496,0.821316,0.7804,0.791104,0.7804,0.780308
7,0.2088,0.782147,0.7923,0.798495,0.7923,0.791663
8,0.1787,0.780141,0.7881,0.797312,0.7881,0.788545


[I 2025-04-03 03:54:42,533] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0004684615730974478, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5007,1.050309,0.7227,0.730523,0.7227,0.72079
2,0.8097,0.916392,0.7567,0.768375,0.7567,0.755069
3,0.5635,0.866189,0.7666,0.779664,0.7666,0.766033
4,0.4199,0.836141,0.7715,0.784727,0.7715,0.773016
5,0.317,0.862657,0.7666,0.783785,0.7666,0.767642
6,0.2524,0.82974,0.7798,0.792766,0.7798,0.780866
7,0.2091,0.79581,0.785,0.790978,0.785,0.78499
8,0.1793,0.782427,0.7888,0.798086,0.7888,0.789306
9,0.1568,0.762583,0.7905,0.797999,0.7905,0.791112
10,0.1428,0.760096,0.7945,0.801192,0.7945,0.79514


[I 2025-04-03 04:09:01,937] Trial 65 finished with value: 0.7951395212902983 and parameters: {'learning_rate': 0.0004684615730974478, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 66 with params: {'learning_rate': 0.00035646078545282335, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.53,1.032126,0.7249,0.733619,0.7249,0.723515
2,0.7897,0.890689,0.7627,0.770673,0.7627,0.760667
3,0.5421,0.858647,0.7686,0.781683,0.7686,0.768414
4,0.4008,0.838557,0.7736,0.7877,0.7736,0.774722
5,0.3059,0.842132,0.7732,0.786476,0.7732,0.773183
6,0.245,0.823209,0.7731,0.783556,0.7731,0.774232
7,0.2062,0.794857,0.7847,0.79116,0.7847,0.784254
8,0.1779,0.785176,0.7861,0.796797,0.7861,0.786648


[I 2025-04-03 04:20:09,935] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0003505606704200355, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5381,1.028403,0.7273,0.737744,0.7273,0.726165
2,0.7877,0.891376,0.757,0.76788,0.757,0.756077
3,0.544,0.858126,0.7652,0.78252,0.7652,0.765267
4,0.4018,0.830477,0.7732,0.785329,0.7732,0.77401
5,0.3068,0.839904,0.7693,0.785145,0.7693,0.769151
6,0.2454,0.827803,0.7749,0.788063,0.7749,0.776489
7,0.2063,0.789877,0.7851,0.791029,0.7851,0.784505
8,0.1781,0.782741,0.7877,0.796937,0.7877,0.788376
9,0.1589,0.773026,0.7858,0.792753,0.7858,0.786236
10,0.1459,0.772683,0.7856,0.793467,0.7856,0.78582


[I 2025-04-03 04:34:47,397] Trial 67 finished with value: 0.7858197542831361 and parameters: {'learning_rate': 0.0003505606704200355, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 68 with params: {'learning_rate': 0.00028694394204558017, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5762,1.022018,0.7294,0.738451,0.7294,0.728303
2,0.7957,0.892995,0.7593,0.769014,0.7593,0.757765
3,0.5471,0.861778,0.7659,0.781909,0.7659,0.766285
4,0.4038,0.832722,0.7762,0.788929,0.7762,0.777437
5,0.3101,0.849783,0.7707,0.782552,0.7707,0.770535
6,0.2481,0.845267,0.7739,0.786522,0.7739,0.775102
7,0.2099,0.812582,0.7792,0.785908,0.7792,0.779083
8,0.1827,0.800838,0.7842,0.793838,0.7842,0.784743


[I 2025-04-03 04:45:58,141] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0003183848757718585, 'weight_decay': 0.01, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5916,1.021599,0.7317,0.740892,0.7317,0.730307
2,0.7969,0.883672,0.7634,0.771404,0.7634,0.761819
3,0.5461,0.85494,0.7704,0.783547,0.7704,0.770109
4,0.4043,0.81301,0.7776,0.790531,0.7776,0.778586
5,0.3081,0.838739,0.7755,0.786349,0.7755,0.775228
6,0.2464,0.82091,0.7774,0.786975,0.7774,0.778398
7,0.2075,0.7944,0.7808,0.786786,0.7808,0.780678
8,0.18,0.788524,0.7838,0.793192,0.7838,0.784349


[I 2025-04-03 04:57:28,208] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00045916045331831156, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4999,1.051926,0.7217,0.729914,0.7217,0.718546
2,0.8069,0.938388,0.7474,0.761171,0.7474,0.746184
3,0.5588,0.859227,0.7738,0.786861,0.7738,0.773555
4,0.4136,0.84653,0.7738,0.78697,0.7738,0.775402
5,0.3165,0.861857,0.7698,0.78444,0.7698,0.768979
6,0.2507,0.828645,0.7746,0.785563,0.7746,0.775489
7,0.2088,0.783777,0.7881,0.794981,0.7881,0.788348
8,0.1787,0.784361,0.7873,0.79748,0.7873,0.787744


[I 2025-04-03 05:09:02,037] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0005457294291189459, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4895,1.076505,0.7183,0.730273,0.7183,0.716531
2,0.8254,0.950672,0.7456,0.757931,0.7456,0.743871
3,0.5807,0.893637,0.7581,0.772993,0.7581,0.757991
4,0.4351,0.847091,0.7702,0.783285,0.7702,0.771207


[I 2025-04-03 05:14:50,600] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0002676640403688576, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6315,1.024076,0.7278,0.734477,0.7278,0.725558
2,0.807,0.900175,0.7569,0.765997,0.7569,0.755391
3,0.5552,0.854723,0.7673,0.779396,0.7673,0.766929
4,0.4094,0.834271,0.7707,0.782357,0.7707,0.771325


[I 2025-04-03 05:20:29,205] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.001702405101737721, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8439,1.463202,0.6185,0.637097,0.6185,0.61509
2,1.21,1.246048,0.6698,0.691366,0.6698,0.667691
3,0.9228,1.120249,0.7004,0.722521,0.7004,0.700358
4,0.7235,1.023896,0.7232,0.74163,0.7232,0.72467


[I 2025-04-03 05:26:18,482] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0005054237674155278, 'weight_decay': 0.004, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5202,1.055959,0.7177,0.728995,0.7177,0.715203
2,0.8187,0.920689,0.7506,0.762045,0.7506,0.748659
3,0.5737,0.884798,0.7622,0.781673,0.7622,0.763432
4,0.4246,0.84877,0.7727,0.784608,0.7727,0.773313
5,0.3226,0.858424,0.7721,0.789461,0.7721,0.772495
6,0.255,0.831946,0.7726,0.786664,0.7726,0.774301
7,0.2109,0.785809,0.7883,0.794029,0.7883,0.788162
8,0.1793,0.782021,0.7869,0.795871,0.7869,0.787347


[I 2025-04-03 05:37:21,630] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00012203923810149627, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.995,1.176405,0.6971,0.701827,0.6971,0.693993
2,0.9834,0.982194,0.7414,0.751563,0.7414,0.740928
3,0.7128,0.908846,0.7566,0.767956,0.7566,0.756266
4,0.5539,0.878698,0.7598,0.768424,0.7598,0.759676


[I 2025-04-03 05:43:07,956] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00032423698784873585, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6086,1.024937,0.73,0.737043,0.73,0.728135
2,0.7999,0.897244,0.7601,0.7693,0.7601,0.758694
3,0.5487,0.851255,0.7676,0.779312,0.7676,0.767509
4,0.404,0.822604,0.7788,0.790773,0.7788,0.779782
5,0.3093,0.843953,0.7709,0.785544,0.7709,0.771506
6,0.2468,0.83678,0.7736,0.786017,0.7736,0.7745
7,0.207,0.798705,0.7853,0.790103,0.7853,0.784827
8,0.1799,0.797096,0.7833,0.793539,0.7833,0.784176


[I 2025-04-03 05:54:45,835] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0007147452942038555, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.532,1.137464,0.6996,0.709573,0.6996,0.695642
2,0.8863,0.987463,0.7386,0.752077,0.7386,0.736219
3,0.6358,0.90872,0.7602,0.773143,0.7602,0.759528
4,0.4821,0.874616,0.7698,0.782756,0.7698,0.770438


[I 2025-04-03 06:00:39,450] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00035036902004205526, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.565,1.02846,0.7264,0.735074,0.7264,0.724901
2,0.7939,0.884782,0.7628,0.770995,0.7628,0.761002
3,0.5463,0.849595,0.7682,0.782124,0.7682,0.768121
4,0.4038,0.82317,0.7784,0.788816,0.7784,0.779272
5,0.309,0.829992,0.7725,0.783149,0.7725,0.771974
6,0.2457,0.818417,0.7762,0.788363,0.7762,0.777589
7,0.2072,0.802788,0.7824,0.789085,0.7824,0.782155
8,0.1785,0.792326,0.7845,0.793601,0.7845,0.785106


[I 2025-04-03 06:12:10,662] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 5.902380787515226e-05, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6077,1.573183,0.6248,0.631625,0.6248,0.616116
2,1.3378,1.191073,0.7014,0.711783,0.7014,0.699495


[I 2025-04-03 06:15:07,532] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0029063834285411286, 'weight_decay': 0.01, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1081,1.67232,0.5604,0.580626,0.5604,0.552561
2,1.4399,1.35886,0.6426,0.66235,0.6426,0.638386
3,1.1365,1.239005,0.6722,0.69169,0.6722,0.670441
4,0.9197,1.140461,0.6948,0.709839,0.6948,0.695031


[I 2025-04-03 06:20:42,391] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0002983245578253742, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.581,1.016023,0.7273,0.73657,0.7273,0.725981
2,0.7972,0.901461,0.7567,0.767634,0.7567,0.755789
3,0.5458,0.858674,0.7701,0.784577,0.7701,0.77
4,0.4019,0.831659,0.7717,0.785855,0.7717,0.77348
5,0.3078,0.846965,0.7696,0.782732,0.7696,0.769544
6,0.2474,0.821438,0.7775,0.788458,0.7775,0.778509
7,0.2085,0.800672,0.7789,0.7841,0.7789,0.778387
8,0.1818,0.799111,0.7877,0.797144,0.7877,0.788278
9,0.1623,0.788165,0.7858,0.793558,0.7858,0.786608
10,0.1499,0.789443,0.7869,0.794043,0.7869,0.787268


[I 2025-04-03 06:35:04,008] Trial 81 finished with value: 0.7872677412251712 and parameters: {'learning_rate': 0.0002983245578253742, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 82 with params: {'learning_rate': 0.0002009108269384491, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7106,1.050096,0.725,0.732208,0.725,0.723021
2,0.843,0.91485,0.754,0.762488,0.754,0.752738
3,0.5869,0.866136,0.7665,0.779091,0.7665,0.766679
4,0.4376,0.842075,0.7724,0.782846,0.7724,0.773147
5,0.34,0.850136,0.7704,0.782012,0.7704,0.770205
6,0.2747,0.858269,0.7711,0.783855,0.7711,0.772674
7,0.2329,0.825509,0.7752,0.78291,0.7752,0.774829
8,0.2032,0.819628,0.7775,0.785746,0.7775,0.777446


[I 2025-04-03 06:46:36,162] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00043031298943598515, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.511,1.044925,0.7192,0.728375,0.7192,0.717117
2,0.7993,0.918005,0.7531,0.763938,0.7531,0.751078
3,0.5519,0.871346,0.7667,0.78332,0.7667,0.766974
4,0.4088,0.850735,0.771,0.787457,0.771,0.772854


[I 2025-04-03 06:52:30,848] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00024307283925812103, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6638,1.032651,0.7286,0.735927,0.7286,0.72661
2,0.8187,0.897101,0.7597,0.769132,0.7597,0.759111
3,0.5643,0.860344,0.7663,0.780456,0.7663,0.766608
4,0.4176,0.82561,0.778,0.79056,0.778,0.77919
5,0.3209,0.841103,0.7712,0.78458,0.7712,0.771566
6,0.2582,0.84902,0.7729,0.784437,0.7729,0.773615
7,0.2187,0.802195,0.781,0.785962,0.781,0.780707
8,0.1904,0.802522,0.7808,0.790512,0.7808,0.781749


[I 2025-04-03 07:04:09,510] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0001528070886920543, 'weight_decay': 0.0, 'warmup_steps': 16, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8892,1.108318,0.7166,0.722994,0.7166,0.714254
2,0.9137,0.942403,0.7478,0.756803,0.7478,0.746717
3,0.6485,0.883839,0.7617,0.774385,0.7617,0.761865
4,0.493,0.85184,0.7675,0.776716,0.7675,0.767947


[I 2025-04-03 07:09:49,880] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0003474505044733744, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5623,1.025397,0.7256,0.736467,0.7256,0.724454
2,0.7941,0.895946,0.7582,0.768807,0.7582,0.756967
3,0.5487,0.856161,0.7699,0.786408,0.7699,0.770275
4,0.4041,0.819368,0.7763,0.788962,0.7763,0.777511
5,0.3083,0.835718,0.7753,0.788212,0.7753,0.775689
6,0.2473,0.823041,0.7762,0.789168,0.7762,0.777793
7,0.207,0.792381,0.7845,0.790612,0.7845,0.784335
8,0.1792,0.783046,0.7895,0.799954,0.7895,0.790159
9,0.1589,0.773056,0.7882,0.796757,0.7882,0.789122
10,0.1462,0.767048,0.7908,0.797616,0.7908,0.791478


[I 2025-04-03 07:24:22,406] Trial 86 finished with value: 0.7914784896831583 and parameters: {'learning_rate': 0.0003474505044733744, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 87 with params: {'learning_rate': 0.00018233150992508034, 'weight_decay': 0.004, 'warmup_steps': 10, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7865,1.070119,0.7178,0.724058,0.7178,0.715463
2,0.8699,0.922922,0.7512,0.76023,0.7512,0.750239
3,0.6105,0.879518,0.7637,0.776328,0.7637,0.763719


[I 2025-04-03 07:44:23,263] Trial 88 finished with value: 0.7946999533350013 and parameters: {'learning_rate': 0.000471402699620968, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 89 with params: {'learning_rate': 0.0005985024027356293, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5137,1.076301,0.7119,0.723139,0.7119,0.709578
2,0.8432,0.95018,0.7497,0.760074,0.7497,0.74753
3,0.597,0.899427,0.7608,0.776148,0.7608,0.760767
4,0.4484,0.874225,0.7649,0.78038,0.7649,0.765885


[I 2025-04-03 07:50:05,132] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.000928277511187833, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6037,1.183908,0.6896,0.699872,0.6896,0.685194
2,0.9617,1.051336,0.7191,0.733512,0.7191,0.714825
3,0.705,0.983706,0.7368,0.753654,0.7368,0.736919
4,0.536,0.894787,0.7574,0.76773,0.7574,0.757486


[I 2025-04-03 07:55:52,396] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0003446050030525901, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5322,1.037816,0.7219,0.731039,0.7219,0.720626
2,0.7926,0.901739,0.7588,0.767757,0.7588,0.757176
3,0.5452,0.868561,0.7662,0.7816,0.7662,0.766122
4,0.4004,0.839619,0.7701,0.784697,0.7701,0.77168


[I 2025-04-03 08:01:35,478] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0015837356481811218, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.793,1.386322,0.6332,0.651843,0.6332,0.627681
2,1.1635,1.190824,0.6836,0.705816,0.6836,0.681124
3,0.8841,1.075666,0.7092,0.731135,0.7092,0.709176
4,0.6903,0.992255,0.7384,0.75364,0.7384,0.738732


[I 2025-04-03 08:06:57,787] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0008050258424201395, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5428,1.162439,0.6926,0.707107,0.6926,0.689831
2,0.9155,0.99801,0.7354,0.749062,0.7354,0.733584


[I 2025-04-03 08:09:42,297] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 9.222884403020037e-05, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1994,1.294744,0.6751,0.678415,0.6751,0.670917
2,1.0979,1.044843,0.7289,0.739106,0.7289,0.728011
3,0.8149,0.944162,0.7493,0.75851,0.7493,0.74843
4,0.6512,0.907721,0.7535,0.760947,0.7535,0.753094


[I 2025-04-03 08:15:23,203] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0007756146753848378, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5843,1.136965,0.7031,0.715404,0.7031,0.700667
2,0.9125,1.015379,0.7304,0.74521,0.7304,0.728036


[I 2025-04-03 08:18:12,008] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6825,1.638576,0.61,0.617817,0.61,0.599549
2,1.3929,1.227008,0.6936,0.703659,0.6936,0.691321
3,1.0609,1.069435,0.7197,0.728201,0.7197,0.718013
4,0.882,1.006242,0.7324,0.739147,0.7324,0.730819


[I 2025-04-03 08:23:56,818] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00029470240760401894, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5893,1.023163,0.7276,0.737917,0.7276,0.726627
2,0.7949,0.895487,0.7585,0.767131,0.7585,0.756825
3,0.5464,0.860679,0.7674,0.780796,0.7674,0.767744
4,0.4017,0.833669,0.7755,0.788345,0.7755,0.77663
5,0.308,0.846829,0.7722,0.785943,0.7722,0.77262
6,0.2477,0.844283,0.7736,0.786514,0.7736,0.774937
7,0.2092,0.797583,0.7827,0.788732,0.7827,0.782429
8,0.1818,0.792622,0.7836,0.793499,0.7836,0.784344


[I 2025-04-03 08:35:27,827] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0008935329870063688, 'weight_decay': 0.004, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5755,1.186803,0.6843,0.697526,0.6843,0.681884
2,0.9453,1.033217,0.7209,0.741974,0.7209,0.718502


[I 2025-04-03 08:38:17,549] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0003696052957835063, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5354,1.031643,0.7268,0.736516,0.7268,0.724814
2,0.7934,0.897313,0.7618,0.772519,0.7618,0.760896
3,0.5457,0.858559,0.7706,0.78626,0.7706,0.770115
4,0.4025,0.827106,0.7765,0.788243,0.7765,0.77749
5,0.3083,0.838589,0.7746,0.787732,0.7746,0.774433
6,0.2459,0.830711,0.7731,0.783214,0.7731,0.773721
7,0.2067,0.790642,0.7869,0.793174,0.7869,0.786665
8,0.1784,0.783611,0.7867,0.796497,0.7867,0.787255
9,0.1581,0.772488,0.7873,0.796329,0.7873,0.788378
10,0.1449,0.768148,0.7885,0.794929,0.7885,0.788879


[I 2025-04-03 08:52:24,996] Trial 99 finished with value: 0.7888794548905014 and parameters: {'learning_rate': 0.0003696052957835063, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 100 with params: {'learning_rate': 0.0006072500353912349, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5269,1.095706,0.7121,0.721893,0.7121,0.709894
2,0.8477,0.959768,0.7421,0.755499,0.7421,0.740804
3,0.5975,0.877626,0.7635,0.778188,0.7635,0.764002
4,0.4488,0.853564,0.7727,0.785954,0.7727,0.773723
5,0.3421,0.858457,0.7684,0.780308,0.7684,0.767203
6,0.2678,0.815379,0.7809,0.79381,0.7809,0.782538
7,0.2191,0.776725,0.7889,0.796119,0.7889,0.789333
8,0.1849,0.764655,0.7939,0.800951,0.7939,0.794122
9,0.1589,0.754827,0.7949,0.804651,0.7949,0.795965
10,0.1439,0.750541,0.7967,0.804306,0.7967,0.797076


[I 2025-04-03 09:06:30,086] Trial 100 finished with value: 0.7970759758335499 and parameters: {'learning_rate': 0.0006072500353912349, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 101 with params: {'learning_rate': 0.00031502971397332646, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.555,1.025601,0.7299,0.738815,0.7299,0.728457
2,0.7922,0.906086,0.7602,0.770088,0.7602,0.7597
3,0.5438,0.857665,0.7698,0.781909,0.7698,0.769657
4,0.4011,0.837419,0.7732,0.786895,0.7732,0.774864
5,0.3067,0.851485,0.7671,0.78148,0.7671,0.767079
6,0.2459,0.827281,0.7738,0.785062,0.7738,0.775041
7,0.2068,0.79769,0.7842,0.789621,0.7842,0.783874
8,0.18,0.795778,0.7821,0.79223,0.7821,0.782626


[I 2025-04-03 09:17:53,139] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0001288686614173822, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9766,1.15654,0.7052,0.710822,0.7052,0.702159
2,0.9654,0.974011,0.7424,0.752283,0.7424,0.741396
3,0.6957,0.898619,0.7612,0.774009,0.7612,0.761405
4,0.5371,0.877446,0.7618,0.770865,0.7618,0.761848


[I 2025-04-03 09:23:45,967] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.000970360500299281, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5963,1.229109,0.6767,0.691389,0.6767,0.674886
2,0.973,1.060246,0.7154,0.735402,0.7154,0.71316


[I 2025-04-03 09:26:41,005] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00052807624021888, 'weight_decay': 0.008, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5294,1.07065,0.7161,0.72651,0.7161,0.714293
2,0.8281,0.940955,0.7524,0.765649,0.7524,0.751548
3,0.5777,0.886109,0.7616,0.777729,0.7616,0.761084
4,0.433,0.851754,0.7692,0.780521,0.7692,0.770233


[I 2025-04-03 09:32:29,778] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7522,1.375174,0.6367,0.657575,0.6367,0.63229
2,1.1076,1.145895,0.6955,0.721766,0.6955,0.691819


[I 2025-04-03 09:35:24,340] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0009917189464065082, 'weight_decay': 0.007, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6093,1.228127,0.6743,0.685115,0.6743,0.66949
2,0.9794,1.048191,0.7208,0.738374,0.7208,0.719103
3,0.7257,0.974072,0.7428,0.762223,0.7428,0.743043
4,0.5524,0.913091,0.7563,0.770878,0.7563,0.758033


[I 2025-04-03 09:41:00,981] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0002455412572573596, 'weight_decay': 0.008, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6514,1.027079,0.7286,0.73592,0.7286,0.726986
2,0.8181,0.894205,0.762,0.770272,0.762,0.761291
3,0.5629,0.862325,0.7648,0.778033,0.7648,0.764369
4,0.4153,0.826181,0.7743,0.786221,0.7743,0.77544
5,0.3215,0.844689,0.776,0.789561,0.776,0.77613
6,0.2577,0.836222,0.7739,0.785435,0.7739,0.774948
7,0.2183,0.800596,0.7844,0.790655,0.7844,0.78411
8,0.1901,0.800097,0.7848,0.794569,0.7848,0.785415


[I 2025-04-03 09:51:52,439] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00025047894986448546, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6517,1.030155,0.7262,0.734431,0.7262,0.724475
2,0.8128,0.895098,0.7604,0.769543,0.7604,0.758897
3,0.5602,0.866262,0.7658,0.781196,0.7658,0.766487
4,0.4126,0.829356,0.7779,0.788594,0.7779,0.778951
5,0.3167,0.847505,0.7701,0.782306,0.7701,0.770178
6,0.2553,0.837145,0.7735,0.784471,0.7735,0.774479
7,0.2157,0.806342,0.7808,0.786494,0.7808,0.780526
8,0.1879,0.805627,0.7815,0.791607,0.7815,0.782383


[I 2025-04-03 10:03:25,257] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.00031600956643731466, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5978,1.019311,0.7289,0.738594,0.7289,0.727729
2,0.7952,0.885558,0.7625,0.771395,0.7625,0.761248
3,0.5465,0.855895,0.77,0.784856,0.77,0.770421
4,0.4039,0.834444,0.775,0.7874,0.775,0.776091
5,0.3089,0.842336,0.7727,0.787092,0.7727,0.773008
6,0.2458,0.821995,0.7764,0.787685,0.7764,0.777634
7,0.2076,0.792869,0.783,0.790319,0.783,0.783101
8,0.1797,0.796765,0.7813,0.792397,0.7813,0.781873


[I 2025-04-03 10:14:30,938] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0003583046450453683, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.532,1.030318,0.7267,0.735687,0.7267,0.724657
2,0.7936,0.89959,0.757,0.765688,0.757,0.755265
3,0.5463,0.859935,0.768,0.782569,0.768,0.767977
4,0.4042,0.841541,0.7691,0.782311,0.7691,0.770104


[I 2025-04-03 10:20:24,242] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0004988049191073227, 'weight_decay': 0.002, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5067,1.039716,0.7261,0.73529,0.7261,0.724778
2,0.8122,0.935596,0.7452,0.757577,0.7452,0.743471


[I 2025-04-03 10:23:17,546] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0008748155000092894, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5482,1.207079,0.6777,0.693157,0.6777,0.674409
2,0.9396,1.0197,0.7296,0.750071,0.7296,0.729101
3,0.6837,0.954664,0.7359,0.752338,0.7359,0.735384
4,0.5212,0.900473,0.7583,0.770495,0.7583,0.759368


[I 2025-04-03 10:28:59,154] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 7.237269287150811e-05, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3736,1.418782,0.6507,0.654394,0.6507,0.644447
2,1.2113,1.11354,0.7159,0.726253,0.7159,0.714747


[I 2025-04-03 10:31:50,447] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0026456473704649522, 'weight_decay': 0.008, 'warmup_steps': 26, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.053,1.656789,0.569,0.590222,0.569,0.561357
2,1.3838,1.344704,0.644,0.668964,0.644,0.64196
3,1.0813,1.183438,0.6829,0.698588,0.6829,0.680772
4,0.8657,1.121364,0.6993,0.717976,0.6993,0.700095


[I 2025-04-03 10:37:32,732] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00037734598297808, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5319,1.032354,0.7226,0.732596,0.7226,0.720463
2,0.7936,0.906874,0.7608,0.770188,0.7608,0.75842
3,0.5455,0.857134,0.7684,0.783131,0.7684,0.768447
4,0.4046,0.842753,0.7736,0.78795,0.7736,0.775288
5,0.308,0.840697,0.7707,0.785681,0.7707,0.770218
6,0.2458,0.827767,0.7772,0.787498,0.7772,0.778101
7,0.206,0.793179,0.7837,0.789652,0.7837,0.783174
8,0.1781,0.785952,0.7835,0.792741,0.7835,0.783971


[I 2025-04-03 10:48:54,561] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0007487498568017309, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5266,1.136399,0.7012,0.713514,0.7012,0.698103
2,0.8982,0.977869,0.7379,0.750406,0.7379,0.735865
3,0.6488,0.923237,0.75,0.769963,0.75,0.749761
4,0.4905,0.8687,0.7653,0.77964,0.7653,0.766949


[I 2025-04-03 10:54:48,878] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00027584462589417874, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6182,1.02043,0.7291,0.736564,0.7291,0.727608
2,0.805,0.888139,0.7607,0.770769,0.7607,0.760171
3,0.5528,0.86521,0.7688,0.783691,0.7688,0.769176
4,0.4077,0.83483,0.7725,0.787447,0.7725,0.774327
5,0.3115,0.846133,0.7705,0.784745,0.7705,0.770826
6,0.2498,0.833033,0.7733,0.783923,0.7733,0.774149
7,0.2106,0.804059,0.7824,0.787443,0.7824,0.781923
8,0.1834,0.79853,0.7832,0.792795,0.7832,0.783841


[I 2025-04-03 11:06:07,495] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00039102001183633834, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5585,1.029563,0.728,0.737865,0.728,0.726195
2,0.803,0.913985,0.7505,0.762577,0.7505,0.749411
3,0.5517,0.879334,0.7632,0.779605,0.7632,0.763444
4,0.408,0.825465,0.7783,0.790082,0.7783,0.778855
5,0.3115,0.851494,0.7691,0.784771,0.7691,0.768617
6,0.2475,0.816539,0.781,0.791464,0.781,0.781865
7,0.2074,0.784394,0.7894,0.795486,0.7894,0.788949
8,0.1794,0.78682,0.7847,0.795572,0.7847,0.785542


[I 2025-04-03 11:17:32,203] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.00019468211892604518, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.727,1.055566,0.7264,0.733678,0.7264,0.72494
2,0.8494,0.911517,0.7553,0.763841,0.7553,0.754806
3,0.5907,0.865862,0.763,0.776339,0.763,0.763253
4,0.4411,0.846463,0.7665,0.777967,0.7665,0.767379


[I 2025-04-03 11:23:10,830] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00016104904333464902, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8611,1.095277,0.7189,0.725635,0.7189,0.71676
2,0.8971,0.937509,0.7511,0.760839,0.7511,0.750021
3,0.6348,0.883793,0.762,0.774853,0.762,0.762236
4,0.4803,0.850974,0.7683,0.778184,0.7683,0.768836


[I 2025-04-03 11:28:51,203] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2927,1.340519,0.6646,0.669287,0.6646,0.65954
2,1.1384,1.069219,0.7234,0.73422,0.7234,0.722311
3,0.8471,0.954788,0.7474,0.756813,0.7474,0.746357
4,0.6807,0.914559,0.7545,0.761992,0.7545,0.75404


[I 2025-04-03 11:34:28,486] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0005269600568909998, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4917,1.057202,0.7246,0.734207,0.7246,0.723438
2,0.8218,0.932353,0.7471,0.759475,0.7471,0.745587


[I 2025-04-03 11:37:21,482] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0003451580677471978, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.544,1.020678,0.7263,0.73492,0.7263,0.724371
2,0.789,0.901327,0.7574,0.768379,0.7574,0.756071
3,0.5436,0.847022,0.7713,0.78473,0.7713,0.771436
4,0.4011,0.819943,0.7754,0.786416,0.7754,0.776349
5,0.3068,0.83551,0.7768,0.79015,0.7768,0.776413
6,0.2449,0.834765,0.7765,0.787813,0.7765,0.777213
7,0.2057,0.782994,0.7865,0.793475,0.7865,0.786662
8,0.1791,0.7758,0.7856,0.793442,0.7856,0.785789


[I 2025-04-03 11:48:31,279] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0005303268808189231, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4895,1.058994,0.7221,0.733023,0.7221,0.719979
2,0.8269,0.929172,0.7521,0.764414,0.7521,0.750486
3,0.5769,0.881128,0.7626,0.781215,0.7626,0.762584
4,0.4326,0.859658,0.7682,0.783232,0.7682,0.76927


[I 2025-04-03 11:54:29,600] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0003910325152955209, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5459,1.044505,0.7213,0.731514,0.7213,0.71971
2,0.797,0.902183,0.7585,0.769001,0.7585,0.757291
3,0.5502,0.87521,0.7634,0.778397,0.7634,0.76336
4,0.4065,0.83298,0.7751,0.786796,0.7751,0.776041
5,0.3097,0.835567,0.7747,0.785408,0.7747,0.77405
6,0.245,0.833192,0.7741,0.786638,0.7741,0.77557
7,0.2068,0.790389,0.7837,0.79054,0.7837,0.784039
8,0.1778,0.782572,0.7879,0.798135,0.7879,0.788409
9,0.1577,0.77652,0.7892,0.799138,0.7892,0.79009
10,0.1438,0.770838,0.7859,0.793944,0.7859,0.786274


[I 2025-04-03 12:08:52,087] Trial 125 finished with value: 0.7862736060376101 and parameters: {'learning_rate': 0.0003910325152955209, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 126 with params: {'learning_rate': 0.0005430472318159993, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5329,1.072703,0.7156,0.725544,0.7156,0.712753
2,0.8326,0.927144,0.7516,0.764238,0.7516,0.750031
3,0.5851,0.89136,0.7605,0.778106,0.7605,0.760347
4,0.4358,0.861083,0.768,0.780584,0.768,0.768537


[I 2025-04-03 12:14:32,219] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00047125136560991014, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5178,1.055223,0.7235,0.733473,0.7235,0.72139
2,0.8096,0.921979,0.7522,0.763319,0.7522,0.750712
3,0.5628,0.871717,0.7655,0.779393,0.7655,0.765716
4,0.4187,0.858247,0.7696,0.784289,0.7696,0.770842
5,0.319,0.849403,0.7678,0.782882,0.7678,0.767804
6,0.2517,0.826541,0.7802,0.791143,0.7802,0.781132
7,0.2094,0.793969,0.7841,0.791261,0.7841,0.783807
8,0.1796,0.77024,0.7918,0.799456,0.7918,0.791961
9,0.1567,0.765346,0.7911,0.798752,0.7911,0.791317
10,0.1426,0.759958,0.7922,0.799536,0.7922,0.792611


[I 2025-04-03 12:29:05,031] Trial 127 finished with value: 0.792611150403494 and parameters: {'learning_rate': 0.00047125136560991014, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 128 with params: {'learning_rate': 0.00026581348581599117, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6456,1.021499,0.73,0.737045,0.73,0.728673
2,0.8086,0.900644,0.758,0.767466,0.758,0.756953
3,0.5564,0.865762,0.7687,0.782464,0.7687,0.76923
4,0.4103,0.829628,0.7763,0.787933,0.7763,0.777319
5,0.3147,0.845791,0.7711,0.783154,0.7711,0.770685
6,0.2538,0.836835,0.7724,0.784707,0.7724,0.773828
7,0.2137,0.803599,0.782,0.788037,0.782,0.781703
8,0.1858,0.800237,0.7842,0.792343,0.7842,0.784484


[I 2025-04-03 12:40:22,337] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.00034373083795222295, 'weight_decay': 0.006, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5537,1.023685,0.7278,0.73666,0.7278,0.72652
2,0.793,0.8909,0.757,0.767764,0.757,0.755749
3,0.5453,0.865661,0.7675,0.782832,0.7675,0.76792
4,0.4017,0.826187,0.7759,0.787656,0.7759,0.77703
5,0.3051,0.843739,0.7709,0.78535,0.7709,0.771346
6,0.2438,0.816621,0.7773,0.789151,0.7773,0.778899
7,0.2062,0.788295,0.7884,0.793054,0.7884,0.788154
8,0.1787,0.787839,0.7874,0.795291,0.7874,0.787752
9,0.1589,0.781181,0.787,0.796711,0.787,0.78824
10,0.1457,0.775361,0.7892,0.796626,0.7892,0.789895


[I 2025-04-03 12:54:24,509] Trial 129 finished with value: 0.7898952922649464 and parameters: {'learning_rate': 0.00034373083795222295, 'weight_decay': 0.006, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 130 with params: {'learning_rate': 0.00031277249867263547, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5958,1.021242,0.7304,0.737991,0.7304,0.728662
2,0.7973,0.896781,0.7611,0.772242,0.7611,0.760075
3,0.5484,0.8758,0.7651,0.779352,0.7651,0.765011
4,0.4053,0.827312,0.7784,0.789914,0.7784,0.77962
5,0.3097,0.836156,0.7736,0.787515,0.7736,0.773874
6,0.2482,0.827129,0.7776,0.788785,0.7776,0.778866
7,0.2081,0.798065,0.7848,0.790624,0.7848,0.78465
8,0.1811,0.791859,0.7878,0.796325,0.7878,0.788125
9,0.1615,0.788302,0.7884,0.79682,0.7884,0.789441
10,0.1485,0.790455,0.7875,0.795303,0.7875,0.788154


[I 2025-04-03 13:08:28,902] Trial 130 finished with value: 0.7881540742274943 and parameters: {'learning_rate': 0.00031277249867263547, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 131 with params: {'learning_rate': 0.00025921784029672647, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6512,1.024293,0.729,0.736679,0.729,0.727747
2,0.8107,0.897076,0.7576,0.76686,0.7576,0.756623
3,0.5564,0.859085,0.7657,0.779486,0.7657,0.76566
4,0.4116,0.83036,0.7734,0.786171,0.7734,0.774911
5,0.3162,0.850174,0.7691,0.783581,0.7691,0.769329
6,0.2529,0.84785,0.768,0.779957,0.768,0.768983
7,0.2141,0.807771,0.7801,0.785805,0.7801,0.779719
8,0.1863,0.797906,0.782,0.790127,0.782,0.782372


[I 2025-04-03 13:20:01,210] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0022970136252600734, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9569,1.574136,0.594,0.61781,0.594,0.589964
2,1.3191,1.302973,0.6557,0.677809,0.6557,0.653238


[I 2025-04-03 13:22:47,294] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.000170347548034281, 'weight_decay': 0.005, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8222,1.089325,0.7152,0.722247,0.7152,0.713092
2,0.8864,0.927155,0.7533,0.761611,0.7533,0.751819
3,0.6226,0.880719,0.7624,0.774692,0.7624,0.7626
4,0.4688,0.8453,0.7702,0.778639,0.7702,0.769965


[I 2025-04-03 13:28:31,808] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0005707752893286469, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5063,1.070927,0.7138,0.721427,0.7138,0.710509
2,0.8366,0.926014,0.7523,0.763356,0.7523,0.750578
3,0.5846,0.887896,0.7607,0.776622,0.7607,0.760115
4,0.4394,0.838348,0.7717,0.782771,0.7717,0.771998


[I 2025-04-03 13:34:11,809] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0004220440562565481, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5096,1.041755,0.7227,0.730601,0.7227,0.720059
2,0.7959,0.910315,0.76,0.77034,0.76,0.758958
3,0.5516,0.879138,0.7654,0.780278,0.7654,0.765618
4,0.4066,0.832072,0.7762,0.788664,0.7762,0.777021
5,0.3109,0.841106,0.7711,0.786676,0.7711,0.771774
6,0.2475,0.842739,0.7729,0.785073,0.7729,0.773632
7,0.2067,0.788056,0.7875,0.793706,0.7875,0.787658
8,0.1784,0.778356,0.7894,0.798702,0.7894,0.790212
9,0.1569,0.768641,0.7913,0.800622,0.7913,0.792344
10,0.1428,0.765654,0.7906,0.797564,0.7906,0.791204


[I 2025-04-03 13:48:36,328] Trial 135 finished with value: 0.7912039689936428 and parameters: {'learning_rate': 0.0004220440562565481, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 136 with params: {'learning_rate': 0.0005141143445999837, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5033,1.060687,0.7192,0.728546,0.7192,0.716607
2,0.8213,0.940783,0.745,0.757481,0.745,0.743408


[I 2025-04-03 13:51:33,935] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.000583026189741506, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5087,1.079361,0.713,0.722021,0.713,0.710088
2,0.8373,0.953401,0.7479,0.764549,0.7479,0.747348
3,0.5929,0.898341,0.7596,0.775297,0.7596,0.759291
4,0.4419,0.843603,0.7713,0.784808,0.7713,0.772948
5,0.3375,0.86059,0.7738,0.787025,0.7738,0.772877
6,0.2648,0.839462,0.7724,0.785956,0.7724,0.774023
7,0.2171,0.786959,0.7891,0.796077,0.7891,0.789633
8,0.1833,0.766969,0.7927,0.800999,0.7927,0.79328
9,0.1585,0.756746,0.7973,0.804016,0.7973,0.797935
10,0.143,0.749,0.796,0.803015,0.796,0.796864


[I 2025-04-03 14:05:34,125] Trial 137 finished with value: 0.7968644528423117 and parameters: {'learning_rate': 0.000583026189741506, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 138 with params: {'learning_rate': 0.00024329658801313379, 'weight_decay': 0.008, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6625,1.026013,0.7297,0.738203,0.7297,0.728603
2,0.8163,0.891836,0.761,0.769552,0.761,0.760383
3,0.5625,0.867593,0.7643,0.77891,0.7643,0.764349
4,0.4165,0.829411,0.7732,0.786369,0.7732,0.774543
5,0.3205,0.840181,0.7748,0.787201,0.7748,0.774823
6,0.2586,0.831592,0.7758,0.785666,0.7758,0.776623
7,0.2183,0.801944,0.7828,0.787652,0.7828,0.781916
8,0.1905,0.801391,0.7841,0.792429,0.7841,0.783966


[I 2025-04-03 14:17:00,604] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.00047369539323735213, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5311,1.057057,0.7154,0.723722,0.7154,0.71235
2,0.8146,0.917818,0.7506,0.760113,0.7506,0.748222
3,0.5634,0.856611,0.7687,0.780304,0.7687,0.768234
4,0.4199,0.848017,0.7728,0.786006,0.7728,0.7741
5,0.3172,0.847676,0.7693,0.782752,0.7693,0.768502
6,0.2527,0.820895,0.7792,0.788883,0.7792,0.77939
7,0.209,0.790146,0.7855,0.792071,0.7855,0.785255
8,0.1789,0.774763,0.7903,0.79994,0.7903,0.790526
9,0.1564,0.763106,0.7921,0.799801,0.7921,0.792166
10,0.142,0.76027,0.7922,0.798635,0.7922,0.792531


[I 2025-04-03 14:31:11,913] Trial 139 finished with value: 0.7925305806622915 and parameters: {'learning_rate': 0.00047369539323735213, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.


Trial 140 with params: {'learning_rate': 0.0006799575358945277, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.529,1.103698,0.7069,0.714783,0.7069,0.70475
2,0.8711,0.948156,0.7487,0.759366,0.7487,0.747496
3,0.6218,0.919163,0.7542,0.77149,0.7542,0.75343
4,0.4707,0.878731,0.7638,0.776604,0.7638,0.764493


[I 2025-04-03 14:36:46,503] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0005822659512883618, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5272,1.085661,0.715,0.724643,0.715,0.712194
2,0.8417,0.952022,0.7475,0.759387,0.7475,0.745249


[I 2025-04-03 14:39:37,016] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0008808511461478873, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5547,1.191018,0.688,0.703899,0.688,0.684029
2,0.9429,1.043761,0.7231,0.741195,0.7231,0.721266
3,0.6885,0.954081,0.7481,0.76749,0.7481,0.748651
4,0.5238,0.897767,0.7608,0.775181,0.7608,0.761886


[I 2025-04-03 14:44:56,317] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0002399007857128352, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6812,1.024658,0.7327,0.740029,0.7327,0.730773
2,0.8211,0.896205,0.7597,0.768434,0.7597,0.758428
3,0.565,0.859742,0.7674,0.779363,0.7674,0.767497
4,0.418,0.83519,0.7714,0.783677,0.7714,0.772509


[I 2025-04-03 14:50:48,349] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5857,1.573852,0.6242,0.631463,0.6242,0.615426
2,1.3407,1.195839,0.6991,0.710388,0.6991,0.697579
3,1.021,1.045535,0.7275,0.735482,0.7275,0.725985
4,0.8459,0.987381,0.7379,0.744381,0.7379,0.736343


[I 2025-04-03 14:56:32,402] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0016756791454873893, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.808,1.421714,0.6269,0.642482,0.6269,0.623925
2,1.1842,1.206648,0.6771,0.702199,0.6771,0.675937


[I 2025-04-03 14:59:13,742] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0005520796779671372, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5371,1.071296,0.7161,0.724449,0.7161,0.71348
2,0.8307,0.9357,0.7485,0.75917,0.7485,0.744822
3,0.584,0.890365,0.7615,0.7763,0.7615,0.760293
4,0.4374,0.854198,0.7699,0.785847,0.7699,0.770849
5,0.3313,0.874072,0.7689,0.783832,0.7689,0.768116
6,0.2617,0.832011,0.7768,0.788586,0.7768,0.777298
7,0.216,0.797984,0.7886,0.794582,0.7886,0.788644
8,0.1821,0.779552,0.7886,0.798674,0.7886,0.788939
9,0.1579,0.769115,0.7947,0.801766,0.7947,0.795163
10,0.1426,0.76505,0.7889,0.796235,0.7889,0.789445


[I 2025-04-03 15:13:26,205] Trial 146 finished with value: 0.789445314803039 and parameters: {'learning_rate': 0.0005520796779671372, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 6.5}. Best is trial 41 with value: 0.7976536507044625.


Trial 147 with params: {'learning_rate': 0.0005493095551636863, 'weight_decay': 0.01, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5659,1.079801,0.7107,0.722079,0.7107,0.709246
2,0.8344,0.943575,0.7486,0.761559,0.7486,0.747268
3,0.5894,0.883403,0.7631,0.779367,0.7631,0.762945
4,0.4383,0.854061,0.7686,0.781786,0.7686,0.769497


[I 2025-04-03 15:19:12,040] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.000866107581597076, 'weight_decay': 0.008, 'warmup_steps': 18, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5853,1.194386,0.6843,0.698541,0.6843,0.680506
2,0.938,1.030606,0.7276,0.742708,0.7276,0.72435


[I 2025-04-03 15:22:05,389] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0003177982910013901, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5885,1.024475,0.7275,0.736608,0.7275,0.725994
2,0.7953,0.897548,0.7569,0.76611,0.7569,0.755684
3,0.5435,0.861646,0.7644,0.779101,0.7644,0.76459
4,0.4005,0.829411,0.7743,0.787318,0.7743,0.775959
5,0.3078,0.839609,0.7742,0.787755,0.7742,0.77457
6,0.2466,0.846964,0.7704,0.783229,0.7704,0.77187
7,0.2068,0.801338,0.7844,0.790644,0.7844,0.784117
8,0.1795,0.788344,0.7843,0.793015,0.7843,0.784733


[I 2025-04-03 15:33:20,794] Trial 149 pruned. 
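
With 150 trials per study, the log above is long to read by eye. A minimal sketch of how the completed and pruned trials could be pulled out of the captured log text with regular expressions follows. It assumes the Optuna message format shown above ("Trial N finished with value: ..." and "Trial N pruned."). The names `FINISHED`, `PRUNED`, `summarize_log` and `sample_log` are hypothetical helpers introduced for illustration; they are not part of this notebook, `base`, or Optuna.

```python
import re

# Patterns assume the Optuna log lines exactly as they appear in the output above.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_log(log_text):
    """Return (finished, pruned, best) parsed from captured Optuna log text.

    finished: dict mapping trial number -> final objective value (F1 here)
    pruned:   list of trial numbers that were stopped early
    best:     (trial number, value) with the highest value, or None
    """
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    best = max(finished.items(), key=lambda kv: kv[1]) if finished else None
    return finished, pruned, best


# Three lines copied from the log above (trials 99, 100 and 101).
sample_log = """
[I 2025-04-03 08:52:24,996] Trial 99 finished with value: 0.7888794548905014 and parameters: {'learning_rate': 0.0003696052957835063, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 41 with value: 0.7976536507044625.
[I 2025-04-03 09:06:30,086] Trial 100 finished with value: 0.7970759758335499 and parameters: {'learning_rate': 0.0006072500353912349, 'weight_decay': 0.007, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 41 with value: 0.7976536507044625.
[I 2025-04-03 09:17:53,139] Trial 101 pruned.
"""

finished, pruned, best = summarize_log(sample_log)
print("finished:", finished)
print("pruned:", pruned)
print("best in this excerpt:", best)
```

The helper only sees the text it is given, so the "best" it reports is the best within that excerpt. The overall best trial of the study is the one named in the "Best is trial ..." suffix of the Optuna messages.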


In [None]:
print(best_distil_pretrained)

In [None]:
print("Best random init training score:", best_base_random)
print("Best random init distillation training score:", best_distill_random)
print("Best pretrained (head only) training score:", best_base_head)
print("Best pretrained distillation (head only) training score:", best_distill_head)
print("Best pretrained training score:", best_base_pretrained)
print("Best pretrained distillation training score:", best_distil_pretrained)