# Notebook for distillation training on the CIFAR10 dataset
In this notebook, MobileNetV2 is trained on the CIFAR10 dataset; a ViT fine-tuned on the same dataset serves as the teacher model.

MobileNetV2 is trained in three configurations: with random initialization; with ImageNet-pretrained weights while training only the classification head; and with ImageNet-pretrained weights while training the whole model. Each of the three configurations is trained in the standard way and also with distillation from the teacher model mentioned above.

Distillation uses the precomputed logits from the precompute_logits notebook.
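
As a rough illustration of how precomputed teacher logits can enter the training loss, the sketch below computes a common soft-target distillation loss for a single sample in plain Python. The temperature, the mixing weight `alpha`, and the function names are assumptions made for illustration; the loss actually used in this project lives in the `base` module and is not shown in this notebook.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Mix KL(teacher || student) on softened outputs with hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not values from the notebook.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kd = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Conventional T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd *= temperature ** 2
    # Standard cross-entropy against the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd + (1 - alpha) * ce

student = [2.0, 0.5, -1.0]   # made-up student logits
teacher = [3.0, 0.0, -2.0]   # made-up stand-in for precomputed teacher logits
loss = distillation_loss(student, teacher, label=0)
```

When the student reproduces the teacher logits exactly, the KL term vanishes and only the cross-entropy part remains.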

## Library imports and method definitions

In [2]:
from transformers import Trainer
from torch.utils.data import ConcatDataset
import pandas as pd
import optuna
import torch
import math
import base
import os

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


In [3]:
dataset_part = base.get_dataset_part()

Reset the random seed so that the results are reproducible.

In [4]:
base.reset_seed()
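
The body of `base.reset_seed()` is not shown in this notebook. A helper of this kind typically seeds every random number generator in use; the sketch below is one possible version under that assumption. The name `reset_seed_sketch` and the default seed 42 are hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random

def reset_seed_sketch(seed=42):
    """Seed the available RNGs; a stand-in, not the actual base.reset_seed()."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # all GPUs; harmless without CUDA
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

reset_seed_sketch(123)
first = random.random()
reset_seed_sketch(123)
second = random.random()  # same value as `first` after re-seeding
```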

In [5]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A40


Apply the transforms to the dataset.

In [6]:
DATASET = "cifar10"

In [7]:
transform = base.base_transforms()

# The last train batch is used as the eval split...
test = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TEST, transform=transform)
train = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.EVAL, transform=transform)

In [8]:
augment_transform = base.aug_transforms()
train_aug = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=augment_transform)
train_aug = base.remove_diff_pred_class(train, train_aug, pytorch_dataset=True)

Removing entries from augmented dataset that are different from the base one - based on saved logits:   0%|   …
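
Judging by the progress message, `base.remove_diff_pred_class` appears to drop augmented samples whose stored teacher logits predict a different class than the logits of the corresponding unaugmented sample. The sketch below illustrates that idea on plain lists; `argmax` and `matching_indices` are hypothetical helpers, and the real implementation in `base` may differ.

```python
def argmax(values):
    """Index of the largest value in a non-empty sequence."""
    return max(range(len(values)), key=values.__getitem__)

def matching_indices(base_logits, aug_logits):
    """Indices where base and augmented logits agree on the predicted class."""
    return [
        i
        for i, (base_row, aug_row) in enumerate(zip(base_logits, aug_logits))
        if argmax(base_row) == argmax(aug_row)
    ]

# Made-up logits for three samples and two classes.
base_logits = [[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]]
aug_logits = [[3.0, 0.0], [2.0, 1.0], [5.0, 1.0]]
kept = matching_indices(base_logits, aug_logits)  # sample 1 is dropped
```

The surviving indices could then be used to build a subset of the augmented dataset, for example with `torch.utils.data.Subset`.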

In [9]:
train_combo = ConcatDataset([train, train_aug])

In [10]:
# Check the class distribution --> good enough
df = pd.DataFrame(eval.labels)
print(df.value_counts())

0
5    1025
9    1022
3    1016
0    1014
1    1014
8    1003
4     997
6     980
7     977
2     952
Name: count, dtype: int64


### Standard training of the randomly initialized model

In [11]:
num_epochs = 10
batch_size = 128

In [12]:
# Convert epochs to steps
data_length = len(train)
min_r = math.ceil(data_length/batch_size)*3
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
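
The arithmetic in this cell can be checked with a small worked example. The dataset size of 40,000 samples is an assumption (it is not printed anywhere in the notebook), although it agrees with the largest `warmup_steps` value of 32 that appears in the trial log. The function name `step_budget` is hypothetical.

```python
import math

def step_budget(data_length, batch_size, num_epochs, min_epochs=3):
    """Reproduce the epoch-to-step conversion used for the pruner and warm-up."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "min_resource": steps_per_epoch * min_epochs,      # min_r in the notebook
        "max_resource": steps_per_epoch * num_epochs,      # max_r in the notebook
        "warmup_max": math.ceil(data_length / batch_size / 10),  # warm_up
    }

# 40_000 is an assumed training-set size, not a value taken from the notebook.
budget = step_budget(data_length=40_000, batch_size=128, num_epochs=10)
```

Note that the values are derived from `len(train)`, whereas the `Trainer` is later given `train_combo`, so the actual number of optimizer steps per epoch may be higher.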

In [13]:
def hp_space(trial):
    # Search space: log-uniform learning rate, weight decay on a 1e-3 grid,
    # and an integer number of warm-up steps up to warm_up.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
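
To give a feel for what this search space covers, the sketch below draws learning rates log-uniformly from the same range and lists the weight-decay grid. It is only an illustration of the ranges: in the notebook the values are proposed by Optuna's sampler, not by this code, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
learning_rates = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(1000)]

# With log=True, roughly half of the draws fall below the geometric midpoint 5e-4.
share_below_midpoint = sum(lr < 5e-4 for lr in learning_rates) / len(learning_rates)

# step=1e-3 on [0, 1e-2] gives eleven candidate weight-decay values.
weight_decays = [round(i * 1e-3, 3) for i in range(11)]
```

Sampling on a log scale spends as many trials between 5e-5 and 5e-4 as between 5e-4 and 5e-3, which is usually preferable for learning rates spanning several orders of magnitude.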

In [14]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
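
To make the pruner settings more concrete, the sketch below computes the number of Hyperband brackets and the step thresholds (rungs) at which a trial may be pruned. It is based on the formulas given in the Optuna documentation for `HyperbandPruner` and `SuccessiveHalvingPruner` and should be read as an approximation of the library's behaviour. The values 939 and 3130 assume 313 steps per epoch, which is an assumption about the dataset size; the helper names are hypothetical.

```python
import math

def n_brackets(min_resource, max_resource, reduction_factor):
    """Bracket count: floor(log_eta(max_resource / min_resource)) + 1."""
    return math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1

def rung_thresholds(min_resource, reduction_factor, bracket_id, n_rungs):
    """Steps at which successive halving evaluates a trial in a given bracket."""
    return [
        min_resource * reduction_factor ** (bracket_id + rung)
        for rung in range(n_rungs)
    ]

# Assumed values: 313 steps per epoch, 3 and 10 epochs as min and max resource.
brackets = n_brackets(min_resource=939, max_resource=3130, reduction_factor=2)
first_bracket = rung_thresholds(939, 2, bracket_id=0, n_rungs=3)
second_bracket = rung_thresholds(939, 2, bracket_id=1, n_rungs=2)
```

The log that follows shows trials being pruned after 2, 4 or 8 epochs, which fits a doubling schedule of this kind.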



In [15]:
base.reset_seed()

In [16]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random_hp-search", logging_dir=f"~/logs/{DATASET}/random_hp-search", epochs=num_epochs, batch_size=batch_size)

In [17]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init = lambda: base.get_random_init_mobilenet(10)
)
  

In [18]:
best_base_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-random",
    n_trials=150
)

[I 2025-04-07 19:01:45,083] A new study created in memory with name: Base-random


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5427,1.065142,0.6208,0.623963,0.620284,0.613948
2,0.9628,0.759808,0.7331,0.738952,0.732427,0.732924


[I 2025-04-07 19:04:12,798] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5006,1.144504,0.5928,0.59734,0.592522,0.58454
2,1.0259,0.81401,0.7104,0.715916,0.709735,0.710366
3,0.816,0.671274,0.7617,0.762585,0.761908,0.759309
4,0.6624,0.565211,0.8075,0.806563,0.807211,0.80474


[I 2025-04-07 19:09:02,231] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8574,1.379293,0.4893,0.491574,0.488432,0.482908
2,1.3445,1.101277,0.6045,0.60398,0.603354,0.600669


[I 2025-04-07 19:11:28,191] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6231,1.32958,0.5313,0.548042,0.531037,0.525008
2,1.1669,0.964668,0.6543,0.663736,0.65391,0.653673
3,0.9284,0.753278,0.7304,0.73247,0.730988,0.727117
4,0.7588,0.63186,0.7795,0.779585,0.778936,0.776624


[I 2025-04-07 19:16:18,256] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7563,1.428816,0.4663,0.475231,0.465089,0.445123
2,1.3266,1.113025,0.5974,0.606431,0.596822,0.596616


[I 2025-04-07 19:18:44,517] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7227,1.256654,0.547,0.554628,0.545314,0.541223
2,1.1546,0.905542,0.6781,0.680351,0.67762,0.677648
3,0.8807,0.749652,0.7319,0.73341,0.732233,0.729734
4,0.6903,0.660818,0.7699,0.77191,0.769883,0.767607


[I 2025-04-07 19:23:38,671] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5094,1.107722,0.6197,0.62215,0.619307,0.615312
2,0.9456,0.734451,0.7432,0.746782,0.742409,0.741554
3,0.7155,0.626333,0.7841,0.787591,0.784501,0.782859
4,0.5619,0.534782,0.8161,0.815003,0.815954,0.814281


[I 2025-04-07 19:28:32,534] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7856,1.332656,0.5256,0.52336,0.525053,0.517542
2,1.2336,0.994826,0.6434,0.643934,0.642563,0.640136
3,0.9485,0.818321,0.7081,0.709038,0.708424,0.705687
4,0.7655,0.73362,0.7483,0.74707,0.748012,0.74608


[I 2025-04-07 19:33:28,417] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4872,1.054851,0.6213,0.628116,0.620302,0.617278
2,0.9608,0.730269,0.7413,0.741515,0.740843,0.739362
3,0.7259,0.607033,0.7858,0.790273,0.786501,0.78352
4,0.5765,0.530689,0.8174,0.817476,0.817083,0.815615
5,0.4532,0.464232,0.8398,0.842907,0.839993,0.839548
6,0.3446,0.497679,0.8353,0.842388,0.835469,0.835426
7,0.2448,0.477332,0.8468,0.849816,0.847331,0.846445
8,0.1563,0.46752,0.8525,0.854765,0.852507,0.853036


[I 2025-04-07 19:43:20,619] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4612,1.03666,0.6288,0.641676,0.628829,0.626604
2,0.9706,0.759903,0.7301,0.73373,0.729677,0.729569


[I 2025-04-07 19:45:48,074] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8313,1.377955,0.4927,0.48981,0.492381,0.484175
2,1.3336,1.103166,0.6036,0.60571,0.602924,0.600989


[I 2025-04-07 19:48:14,424] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020781267255701565, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.768,1.523093,0.4461,0.458184,0.445516,0.436067
2,1.3504,1.131997,0.5919,0.613298,0.591169,0.59253
3,1.1036,0.891914,0.6804,0.678691,0.679951,0.675588
4,0.9327,0.798667,0.7139,0.715482,0.713077,0.711315


[I 2025-04-07 19:53:05,757] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0004229895735463087, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4859,1.110782,0.6152,0.623316,0.614341,0.609443
2,0.9661,0.741254,0.7405,0.747969,0.73938,0.74068
3,0.7329,0.620098,0.7854,0.786789,0.78591,0.782856
4,0.5805,0.545965,0.8105,0.81001,0.810228,0.808515


[I 2025-04-07 19:57:56,676] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0011126479210549016, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.588,1.270263,0.5389,0.557591,0.538251,0.535436
2,1.1583,0.944843,0.6561,0.670359,0.656142,0.658862
3,0.9451,0.78574,0.7182,0.722341,0.718655,0.715529
4,0.7918,0.67637,0.7647,0.764938,0.764333,0.762997


[I 2025-04-07 20:02:47,659] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00023364707944876568, 'weight_decay': 0.004, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5777,1.186656,0.6011,0.605681,0.600023,0.593609
2,0.9784,0.741133,0.7384,0.742166,0.737617,0.737166
3,0.7259,0.637795,0.7784,0.780735,0.778984,0.774866
4,0.5587,0.53662,0.8142,0.812951,0.814154,0.812237


[I 2025-04-07 20:07:37,728] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.00030237614474552116, 'weight_decay': 0.006, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5326,1.04803,0.6317,0.632249,0.631146,0.628438
2,0.9615,0.735034,0.7427,0.742734,0.742211,0.741112
3,0.7245,0.623521,0.782,0.782946,0.782292,0.779734
4,0.5643,0.528318,0.8213,0.818546,0.821349,0.818558
5,0.4319,0.492402,0.8339,0.836352,0.834063,0.832993
6,0.318,0.485993,0.8365,0.840302,0.836703,0.836648
7,0.212,0.496831,0.8444,0.849873,0.84492,0.844598
8,0.1241,0.512803,0.8483,0.851694,0.84816,0.848549


[I 2025-04-07 20:17:17,487] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 8.85713447869134e-05, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7662,1.285048,0.5359,0.531983,0.535286,0.531229
2,1.2349,1.032051,0.6305,0.63606,0.629854,0.62886


[I 2025-04-07 20:19:43,376] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00032926542216520094, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4971,1.116712,0.6135,0.621496,0.61304,0.610672
2,0.9556,0.733406,0.739,0.742431,0.738374,0.738666
3,0.7225,0.635378,0.7755,0.774537,0.775928,0.771573
4,0.5654,0.538325,0.8145,0.813137,0.814348,0.812073
5,0.4368,0.489832,0.8347,0.836427,0.834822,0.834097
6,0.3288,0.489033,0.8382,0.842724,0.838197,0.838642
7,0.2199,0.508145,0.8415,0.846259,0.842198,0.841448
8,0.1289,0.496058,0.8539,0.854711,0.853939,0.853867
9,0.064,0.548156,0.8501,0.857745,0.849978,0.852119
10,0.0293,0.553901,0.8549,0.857727,0.85514,0.85408


[I 2025-04-07 20:31:57,885] Trial 17 finished with value: 0.8540795731672388 and parameters: {'learning_rate': 0.00032926542216520094, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 17 with value: 0.8540795731672388.
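
Because most trials in this log are pruned, it can help to extract only the finished trials and their objective values. The sketch below parses Optuna status lines of the form shown above with regular expressions; the function name `summarize` is hypothetical, and the two sample lines are copied from this log.

```python
import re

FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")

def summarize(lines):
    """Return ({trial: objective} for finished trials, [pruned trial numbers])."""
    finished, pruned = {}, []
    for line in lines:
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    return finished, pruned

log_lines = [
    "[I 2025-04-07 19:04:12,798] Trial 0 pruned. ",
    "[I 2025-04-07 20:31:57,885] Trial 17 finished with value: 0.8540795731672388 "
    "and parameters: {'learning_rate': 0.00032926542216520094, 'weight_decay': 0.006, "
    "'warmup_steps': 15}. Best is trial 17 with value: 0.8540795731672388.",
]
finished, pruned = summarize(log_lines)
best_trial = max(finished, key=finished.get)  # trial with the highest eval F1
```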


Trial 18 with params: {'learning_rate': 0.001932521884180473, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7231,1.410253,0.4816,0.474785,0.479889,0.467568
2,1.2981,1.048713,0.6196,0.623175,0.619365,0.616743
3,1.0485,0.869444,0.6918,0.692428,0.691835,0.690054
4,0.8763,0.724389,0.7413,0.739573,0.740552,0.738151


[I 2025-04-07 20:36:48,064] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0003381803516136172, 'weight_decay': 0.005, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5347,1.118302,0.6063,0.614888,0.605239,0.603793
2,0.9901,0.785745,0.7255,0.733993,0.724423,0.7243


[I 2025-04-07 20:39:14,813] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0010115243099472419, 'weight_decay': 0.008, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6221,1.352096,0.5202,0.532959,0.519568,0.509998
2,1.1737,0.91591,0.6719,0.674345,0.671607,0.670642
3,0.9264,0.743622,0.7366,0.733467,0.736307,0.733335
4,0.7594,0.634915,0.7768,0.774519,0.77663,0.773535


[I 2025-04-07 20:44:07,330] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00025816334594795193, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5321,1.073621,0.6241,0.632995,0.623403,0.621544
2,0.9667,0.755793,0.7312,0.732498,0.730696,0.729811
3,0.7374,0.639073,0.7755,0.776296,0.775911,0.773971
4,0.569,0.542225,0.8157,0.814565,0.8157,0.813597
5,0.4356,0.514454,0.8226,0.825628,0.822623,0.821748
6,0.3133,0.513404,0.8291,0.832379,0.829426,0.829021
7,0.2052,0.53772,0.8348,0.838062,0.83533,0.834234
8,0.1156,0.528002,0.8412,0.841632,0.841351,0.841246


[I 2025-04-07 20:53:48,896] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0002879322945635685, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5211,1.035635,0.6354,0.636708,0.634871,0.629634
2,0.9503,0.753975,0.7376,0.73853,0.737299,0.737381
3,0.718,0.627059,0.7846,0.787273,0.785215,0.782403
4,0.5539,0.535578,0.8172,0.81568,0.817186,0.814429
5,0.425,0.493207,0.8312,0.83179,0.831466,0.829506
6,0.307,0.491113,0.8329,0.836014,0.833113,0.833529
7,0.2015,0.491305,0.8434,0.845704,0.843654,0.843451
8,0.1175,0.514665,0.8426,0.843628,0.842573,0.84244


[I 2025-04-07 21:03:31,552] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.00048566440658776434, 'weight_decay': 0.006, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5214,1.18062,0.594,0.612159,0.593634,0.591894
2,0.9695,0.719463,0.7462,0.748901,0.745238,0.745113
3,0.7366,0.631742,0.7812,0.783825,0.781837,0.777503
4,0.5882,0.539287,0.8141,0.812192,0.813978,0.811851


[I 2025-04-07 21:08:18,651] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 7.176203970997865e-05, 'weight_decay': 0.007, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8542,1.354275,0.5028,0.501875,0.502222,0.498619
2,1.3127,1.074652,0.613,0.609994,0.612204,0.609045
3,1.054,0.940428,0.6615,0.666785,0.661681,0.660904
4,0.8828,0.841766,0.7021,0.699609,0.702109,0.698637


[I 2025-04-07 21:13:08,416] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.00012069214017871715, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7098,1.241048,0.5513,0.565958,0.549799,0.547218
2,1.1276,0.923141,0.6765,0.679496,0.676233,0.676168


[I 2025-04-07 21:15:34,106] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.0007785351842157595, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5174,1.206861,0.5813,0.606788,0.581299,0.578884
2,1.0497,0.805629,0.7131,0.720004,0.712396,0.713345
3,0.8198,0.689798,0.7591,0.761255,0.759616,0.755622
4,0.6745,0.579729,0.7996,0.798302,0.799228,0.79694
5,0.5497,0.525498,0.8212,0.822609,0.821135,0.820268
6,0.4519,0.511949,0.8273,0.832755,0.827491,0.82796
7,0.3559,0.463852,0.8486,0.850927,0.84906,0.848359
8,0.2626,0.457452,0.8561,0.857319,0.856114,0.856196
9,0.1768,0.491259,0.8501,0.856586,0.849928,0.851735
10,0.1115,0.53298,0.8494,0.852977,0.849817,0.848848


[I 2025-04-07 21:27:47,854] Trial 26 finished with value: 0.8488482299172375 and parameters: {'learning_rate': 0.0007785351842157595, 'weight_decay': 0.006, 'warmup_steps': 14}. Best is trial 17 with value: 0.8540795731672388.


Trial 27 with params: {'learning_rate': 0.0008587756640811112, 'weight_decay': 0.006, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5399,1.167043,0.5837,0.585508,0.583288,0.575918
2,1.0655,0.841204,0.6998,0.710255,0.699301,0.701873


[I 2025-04-07 21:30:14,603] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.003229448413256766, 'weight_decay': 0.005, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8151,1.515236,0.4244,0.433584,0.423179,0.395718
2,1.4062,1.174001,0.5708,0.579296,0.569682,0.570348


[I 2025-04-07 21:32:40,375] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0023321838809476363, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8176,1.58817,0.3978,0.425354,0.396212,0.37405
2,1.428,1.209813,0.5592,0.583492,0.558417,0.559592
3,1.1722,0.975068,0.6453,0.649626,0.645068,0.641062
4,0.9848,0.852321,0.6971,0.692399,0.696618,0.692245


[I 2025-04-07 21:37:29,021] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0009505662433599415, 'weight_decay': 0.005, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5563,1.245783,0.5615,0.58076,0.561393,0.554077
2,1.0812,0.882107,0.6784,0.689433,0.677568,0.678992
3,0.876,0.732988,0.7417,0.740789,0.742077,0.737322
4,0.7237,0.608896,0.7858,0.784362,0.785451,0.783461
5,0.6034,0.559667,0.8104,0.814755,0.80987,0.809138
6,0.5049,0.4987,0.8319,0.83531,0.831879,0.83268
7,0.4108,0.4988,0.8332,0.837021,0.833899,0.832473
8,0.321,0.463939,0.8472,0.848651,0.847176,0.847622


[I 2025-04-07 21:47:09,790] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0004627060906564533, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4957,1.146458,0.6056,0.61157,0.605351,0.601195
2,0.9733,0.758677,0.7298,0.734661,0.728755,0.729032
3,0.7495,0.618189,0.7843,0.787952,0.784721,0.782672
4,0.5915,0.516898,0.8223,0.821981,0.822294,0.82068
5,0.4713,0.481934,0.8327,0.834017,0.832793,0.832062
6,0.368,0.46941,0.8442,0.847331,0.84453,0.844645
7,0.2676,0.456266,0.8544,0.854548,0.854842,0.853538
8,0.1757,0.451486,0.86,0.861386,0.860126,0.86033


[I 2025-04-07 21:56:50,595] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.000340964009750757, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4901,1.032212,0.6383,0.637399,0.637539,0.631746
2,0.927,0.700177,0.7545,0.761141,0.754051,0.756345
3,0.7021,0.589141,0.7926,0.797838,0.793291,0.791215
4,0.5504,0.498983,0.8284,0.826762,0.828558,0.826575
5,0.4255,0.479286,0.8384,0.838591,0.838474,0.837291
6,0.3187,0.464426,0.8451,0.848744,0.845269,0.845876
7,0.214,0.473129,0.8515,0.852003,0.851948,0.850878
8,0.1256,0.471601,0.8588,0.859724,0.85893,0.858875
9,0.0621,0.537268,0.8558,0.862718,0.855463,0.857737
10,0.0283,0.536606,0.8559,0.859389,0.856388,0.855826


[I 2025-04-07 22:08:59,558] Trial 32 finished with value: 0.8558262255325605 and parameters: {'learning_rate': 0.000340964009750757, 'weight_decay': 0.007, 'warmup_steps': 11}. Best is trial 32 with value: 0.8558262255325605.


Trial 33 with params: {'learning_rate': 0.0007212770325367088, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5274,1.274407,0.5512,0.580066,0.5509,0.540431
2,1.0493,0.841686,0.6946,0.705228,0.694164,0.694121


[I 2025-04-07 22:11:25,217] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.00026584133525867033, 'weight_decay': 0.007, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5198,1.077565,0.6211,0.624307,0.620231,0.614269
2,0.946,0.746592,0.7349,0.736584,0.734716,0.733044
3,0.7058,0.623752,0.7809,0.783286,0.78143,0.778496
4,0.5467,0.550188,0.8125,0.813109,0.812434,0.810278
5,0.4135,0.493509,0.8314,0.833942,0.831649,0.830705
6,0.2982,0.493117,0.8381,0.842412,0.838111,0.838959
7,0.1913,0.493975,0.8448,0.847608,0.845085,0.845164
8,0.1051,0.512863,0.8485,0.850902,0.84832,0.849049


[I 2025-04-07 22:21:09,068] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.00024241287144281533, 'weight_decay': 0.01, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5654,1.023029,0.6373,0.642234,0.636279,0.63512
2,0.9624,0.747833,0.7363,0.743507,0.735749,0.736486
3,0.712,0.632947,0.7788,0.78087,0.779665,0.774887
4,0.5516,0.527262,0.8208,0.819031,0.820819,0.818616
5,0.4159,0.490983,0.8274,0.831466,0.82765,0.826791
6,0.2979,0.513337,0.8319,0.835149,0.832232,0.832129
7,0.1887,0.51386,0.8385,0.840782,0.838926,0.838529
8,0.1064,0.541434,0.8401,0.840909,0.840108,0.839881


[I 2025-04-07 22:30:42,750] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.000807265399276915, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5504,1.177383,0.578,0.594168,0.577621,0.576241
2,1.079,0.839192,0.6972,0.69919,0.696459,0.695116
3,0.8555,0.704924,0.7495,0.750077,0.749596,0.747103
4,0.6949,0.591269,0.7935,0.790613,0.793113,0.790575
5,0.5724,0.545061,0.8114,0.813974,0.811405,0.810368
6,0.4728,0.514648,0.8247,0.828723,0.824619,0.825311
7,0.378,0.46729,0.8416,0.843018,0.842097,0.841029
8,0.2808,0.453268,0.8537,0.854004,0.853849,0.853482
9,0.1905,0.476235,0.8515,0.855466,0.851382,0.852564
10,0.1226,0.505212,0.8484,0.851902,0.848572,0.848112


[I 2025-04-07 22:42:55,303] Trial 36 finished with value: 0.8481120449948873 and parameters: {'learning_rate': 0.000807265399276915, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 32 with value: 0.8558262255325605.


Trial 37 with params: {'learning_rate': 0.0006309811728474125, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4985,1.107144,0.6106,0.621952,0.610761,0.607304
2,1.0208,0.827863,0.7062,0.715361,0.705941,0.707267


[I 2025-04-07 22:45:21,037] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0013981244819690566, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6486,1.304084,0.5293,0.532766,0.528312,0.521293
2,1.1884,0.948456,0.6606,0.667505,0.659508,0.659644


[I 2025-04-07 22:47:48,649] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8799,1.429491,0.4742,0.476281,0.473583,0.469549
2,1.409,1.165656,0.5762,0.574499,0.575302,0.571748
3,1.1655,1.054888,0.6248,0.63323,0.625163,0.622608
4,0.9976,0.92874,0.674,0.672362,0.673798,0.670111


[I 2025-04-07 22:52:37,946] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.004241076779716196, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8536,1.601351,0.3954,0.429964,0.394043,0.360778
2,1.493,1.337976,0.5002,0.548237,0.498464,0.498844


[I 2025-04-07 22:55:04,302] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.00047560677122606955, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.519,1.073971,0.616,0.621544,0.615881,0.611034
2,0.995,0.724847,0.7389,0.742957,0.738204,0.738316
3,0.7573,0.617082,0.7807,0.780999,0.781126,0.777766
4,0.6038,0.538092,0.811,0.809497,0.810841,0.808931
5,0.4821,0.491706,0.8325,0.835786,0.832497,0.83148
6,0.3803,0.471357,0.8405,0.842844,0.840561,0.840153
7,0.2773,0.476103,0.8449,0.849046,0.84536,0.845275
8,0.1839,0.451982,0.8576,0.859282,0.857584,0.858022
9,0.1041,0.516827,0.8557,0.863602,0.855531,0.857653
10,0.0536,0.50098,0.8596,0.861638,0.85985,0.859584


[I 2025-04-07 23:07:03,664] Trial 41 finished with value: 0.8595838761996756 and parameters: {'learning_rate': 0.00047560677122606955, 'weight_decay': 0.007, 'warmup_steps': 15}. Best is trial 41 with value: 0.8595838761996756.


Trial 42 with params: {'learning_rate': 0.0003727196987392605, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5015,1.1015,0.6158,0.618564,0.61531,0.610964
2,0.9524,0.754389,0.7337,0.741022,0.732983,0.733898
3,0.7279,0.618066,0.7822,0.781406,0.782595,0.779737
4,0.5772,0.527189,0.8197,0.818146,0.819777,0.817037
5,0.4498,0.503378,0.829,0.831054,0.829198,0.827677
6,0.3385,0.489404,0.8374,0.840815,0.837556,0.837256
7,0.2357,0.497817,0.847,0.850012,0.847438,0.846906
8,0.1454,0.49122,0.855,0.856063,0.854842,0.855226
9,0.0745,0.561791,0.8512,0.858724,0.850866,0.853012
10,0.035,0.555949,0.8524,0.854802,0.852685,0.852171


[I 2025-04-07 23:18:59,545] Trial 42 finished with value: 0.8521705545608315 and parameters: {'learning_rate': 0.0003727196987392605, 'weight_decay': 0.007, 'warmup_steps': 15}. Best is trial 41 with value: 0.8595838761996756.


Trial 43 with params: {'learning_rate': 0.0004516099095797917, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.494,1.093692,0.6247,0.62696,0.623551,0.619934
2,0.9716,0.756352,0.7353,0.737191,0.734403,0.733221
3,0.7418,0.636434,0.7817,0.782926,0.782183,0.778529
4,0.5859,0.526266,0.822,0.820283,0.821896,0.819113
5,0.4651,0.502749,0.8265,0.82973,0.826441,0.825371
6,0.3611,0.47406,0.842,0.846164,0.842132,0.84295
7,0.258,0.458577,0.8482,0.8522,0.848555,0.848734
8,0.1647,0.454312,0.859,0.86122,0.858935,0.859393
9,0.0907,0.512276,0.8573,0.862444,0.857287,0.858443
10,0.0439,0.502546,0.8612,0.863767,0.861496,0.860999


[I 2025-04-07 23:31:07,965] Trial 43 finished with value: 0.860999379458511 and parameters: {'learning_rate': 0.0004516099095797917, 'weight_decay': 0.006, 'warmup_steps': 16}. Best is trial 43 with value: 0.860999379458511.


Trial 44 with params: {'learning_rate': 0.00025886400515271336, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5505,1.111983,0.6068,0.613012,0.605632,0.598951
2,0.9654,0.733254,0.7434,0.745755,0.742701,0.741736
3,0.7166,0.615307,0.7809,0.782197,0.781465,0.778761
4,0.5532,0.538367,0.8167,0.817099,0.816613,0.81513
5,0.4196,0.511107,0.823,0.827836,0.823022,0.822715
6,0.303,0.520518,0.8313,0.834667,0.83145,0.831666
7,0.1982,0.534611,0.8332,0.836966,0.833726,0.833275
8,0.1117,0.527554,0.843,0.844994,0.842938,0.843512


[I 2025-04-07 23:40:48,388] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0002720307888096881, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5464,1.05446,0.6297,0.637873,0.628917,0.627004
2,0.9567,0.723167,0.7468,0.756066,0.746556,0.748046
3,0.7109,0.606155,0.7892,0.790826,0.789752,0.787488
4,0.5483,0.526861,0.8215,0.819843,0.821635,0.819548
5,0.4173,0.475707,0.8366,0.837576,0.836684,0.835931
6,0.3021,0.498075,0.8362,0.839832,0.836222,0.836375
7,0.1959,0.491318,0.8445,0.846239,0.844866,0.844096
8,0.1087,0.513385,0.8495,0.850191,0.849345,0.848979


[I 2025-04-07 23:50:29,532] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00012827851737332596, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6674,1.196389,0.5655,0.567788,0.564649,0.557683
2,1.076,0.859237,0.6988,0.697942,0.698365,0.696541


[I 2025-04-07 23:52:54,830] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00030406231678935524, 'weight_decay': 0.004, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5248,1.090395,0.6212,0.629566,0.621127,0.616733
2,0.9803,0.757239,0.7378,0.74025,0.736842,0.735963
3,0.7322,0.604588,0.7906,0.791412,0.79116,0.787509
4,0.5657,0.525721,0.8173,0.816766,0.817392,0.815373
5,0.4333,0.476392,0.8368,0.83672,0.836948,0.835584
6,0.3212,0.478503,0.8418,0.845496,0.841937,0.84246
7,0.2153,0.497917,0.8464,0.849366,0.846701,0.846259
8,0.1254,0.498794,0.8506,0.852895,0.850467,0.851133


[I 2025-04-08 00:02:35,909] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0003460372730824873, 'weight_decay': 0.006, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5008,1.001645,0.6445,0.650855,0.643788,0.643154
2,0.9509,0.724907,0.7416,0.744973,0.740925,0.742034
3,0.7265,0.613737,0.7828,0.785594,0.783276,0.781251
4,0.563,0.5169,0.8228,0.821184,0.822897,0.820541
5,0.4368,0.482663,0.8361,0.840758,0.83626,0.83572
6,0.3265,0.497866,0.8362,0.842594,0.836421,0.836986
7,0.2222,0.464842,0.8538,0.856799,0.854235,0.853959
8,0.1328,0.474345,0.8555,0.856145,0.855539,0.855215
9,0.0653,0.525043,0.8568,0.863131,0.856722,0.858405
10,0.03,0.534593,0.8546,0.85795,0.854896,0.854042


[I 2025-04-08 00:14:40,812] Trial 48 finished with value: 0.8540423421389167 and parameters: {'learning_rate': 0.0003460372730824873, 'weight_decay': 0.006, 'warmup_steps': 12}. Best is trial 43 with value: 0.860999379458511.


Trial 49 with params: {'learning_rate': 0.00032047620809669785, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5101,1.050431,0.6313,0.638537,0.630628,0.626996
2,0.9555,0.740793,0.7378,0.739972,0.737191,0.737037
3,0.722,0.63161,0.7795,0.781105,0.780053,0.776587
4,0.5585,0.524825,0.8195,0.817658,0.819344,0.817139
5,0.4325,0.497683,0.8287,0.833393,0.828855,0.828778
6,0.3189,0.502285,0.8376,0.841778,0.837807,0.837952
7,0.2145,0.479137,0.8488,0.851752,0.849225,0.848871
8,0.125,0.508164,0.8471,0.850433,0.846902,0.847769
9,0.0617,0.573323,0.8468,0.855457,0.84649,0.849088
10,0.0282,0.55403,0.8507,0.854341,0.851075,0.850066


[I 2025-04-08 00:26:47,629] Trial 49 finished with value: 0.8500657320593584 and parameters: {'learning_rate': 0.00032047620809669785, 'weight_decay': 0.006, 'warmup_steps': 11}. Best is trial 43 with value: 0.860999379458511.


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.744,1.563889,0.4276,0.447104,0.427032,0.407801
2,1.3352,1.105659,0.6005,0.605542,0.599433,0.595491


[I 2025-04-08 00:29:13,985] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0004890393330224985, 'weight_decay': 0.005, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4982,1.073264,0.6245,0.636153,0.625145,0.619489
2,0.9715,0.748943,0.7352,0.734954,0.735082,0.733027
3,0.7431,0.621734,0.7825,0.781099,0.782961,0.778301
4,0.5916,0.536144,0.8133,0.811573,0.813178,0.810741
5,0.4696,0.486808,0.8328,0.8372,0.832644,0.832929
6,0.368,0.468292,0.8403,0.844829,0.840506,0.841398
7,0.2669,0.454488,0.8503,0.853328,0.85086,0.849963
8,0.1759,0.443378,0.863,0.86386,0.862936,0.863016
9,0.0974,0.497963,0.8585,0.862713,0.858288,0.859529
10,0.0484,0.518587,0.8557,0.860668,0.856024,0.856065


[I 2025-04-08 00:41:19,604] Trial 51 finished with value: 0.8560652135720556 and parameters: {'learning_rate': 0.0004890393330224985, 'weight_decay': 0.005, 'warmup_steps': 15}. Best is trial 43 with value: 0.860999379458511.


Trial 52 with params: {'learning_rate': 0.00034393819048338754, 'weight_decay': 0.005, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.501,1.077322,0.6216,0.628623,0.621116,0.620499
2,0.9483,0.704701,0.7497,0.75352,0.748622,0.749287
3,0.714,0.614566,0.7899,0.789472,0.790649,0.785969
4,0.5602,0.519763,0.8196,0.81767,0.819558,0.817624
5,0.4407,0.497626,0.8326,0.834437,0.832795,0.831428
6,0.3319,0.476018,0.844,0.84622,0.843977,0.84437
7,0.2252,0.498153,0.8403,0.845327,0.840781,0.840676
8,0.1381,0.495086,0.8498,0.852418,0.849923,0.850518
9,0.0707,0.5465,0.8498,0.857781,0.849741,0.851887
10,0.0326,0.531847,0.8557,0.856302,0.85588,0.855036


[I 2025-04-08 00:53:26,628] Trial 52 finished with value: 0.8550356845653677 and parameters: {'learning_rate': 0.00034393819048338754, 'weight_decay': 0.005, 'warmup_steps': 13}. Best is trial 43 with value: 0.860999379458511.


Trial 53 with params: {'learning_rate': 0.0003539672802862629, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5176,1.018836,0.6354,0.632241,0.635145,0.627217
2,0.9638,0.739703,0.7392,0.74548,0.738565,0.74002
3,0.7245,0.636773,0.7787,0.78109,0.779274,0.775361
4,0.5702,0.524103,0.8196,0.818832,0.819658,0.817801
5,0.4415,0.492472,0.8325,0.835667,0.832622,0.831932
6,0.3312,0.503548,0.8355,0.840009,0.835783,0.835654
7,0.2235,0.513441,0.8428,0.849509,0.843296,0.843381
8,0.1329,0.485285,0.856,0.857183,0.855932,0.856077
9,0.0665,0.573355,0.847,0.854941,0.84672,0.849328
10,0.0307,0.562559,0.8514,0.85585,0.851745,0.851456


[I 2025-04-08 01:05:30,770] Trial 53 finished with value: 0.8514559168850916 and parameters: {'learning_rate': 0.0003539672802862629, 'weight_decay': 0.003, 'warmup_steps': 7}. Best is trial 43 with value: 0.860999379458511.


Trial 54 with params: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4947,1.038606,0.6345,0.63677,0.634374,0.630493
2,0.9505,0.725133,0.7468,0.751916,0.745985,0.746828
3,0.7231,0.619121,0.7832,0.787074,0.784045,0.780325
4,0.5732,0.508963,0.8231,0.820648,0.8231,0.820091
5,0.4505,0.47371,0.8374,0.838934,0.837447,0.837218
6,0.3434,0.462212,0.8472,0.849511,0.847371,0.84763
7,0.2409,0.465638,0.8502,0.85353,0.850551,0.850677
8,0.1485,0.47323,0.8519,0.854155,0.851848,0.852485
9,0.0779,0.521951,0.8579,0.863002,0.857725,0.859239
10,0.0368,0.541073,0.8588,0.860055,0.859059,0.858034


[I 2025-04-08 01:17:36,803] Trial 54 finished with value: 0.8580338240721312 and parameters: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 23}. Best is trial 43 with value: 0.860999379458511.


Trial 55 with params: {'learning_rate': 0.0003644598615777491, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.521,1.106031,0.6212,0.624195,0.62051,0.616866
2,0.9604,0.750465,0.7373,0.73984,0.736809,0.736443
3,0.735,0.637092,0.7793,0.782175,0.779803,0.777437
4,0.577,0.541676,0.8143,0.81232,0.814134,0.812011
5,0.4548,0.51429,0.8256,0.82594,0.825733,0.823465
6,0.345,0.504449,0.8308,0.836885,0.830801,0.832021
7,0.2378,0.47279,0.849,0.851197,0.849352,0.849001
8,0.1476,0.485525,0.8526,0.854048,0.852527,0.852983


[I 2025-04-08 01:27:17,150] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0007442764061691288, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5327,1.183053,0.5645,0.594173,0.564358,0.555619
2,1.0637,0.829041,0.7058,0.711076,0.70506,0.70396


[I 2025-04-08 01:29:42,481] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0006473370603586072, 'weight_decay': 0.004, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5189,1.163739,0.6,0.609243,0.599261,0.593437
2,1.0116,0.772149,0.7233,0.725681,0.722577,0.721874
3,0.7915,0.651766,0.7738,0.773394,0.774038,0.771652
4,0.6424,0.539625,0.8115,0.810801,0.811304,0.809186
5,0.5238,0.537871,0.8152,0.821282,0.815166,0.814458
6,0.4268,0.503169,0.8316,0.837516,0.831897,0.832509
7,0.3324,0.449023,0.8493,0.851242,0.849724,0.84912
8,0.2381,0.449149,0.8575,0.858532,0.85743,0.857751
9,0.1553,0.492235,0.8556,0.861416,0.855399,0.857125
10,0.0938,0.493244,0.8546,0.856824,0.85487,0.854527


[I 2025-04-08 01:41:48,414] Trial 57 finished with value: 0.8545269293395539 and parameters: {'learning_rate': 0.0006473370603586072, 'weight_decay': 0.004, 'warmup_steps': 19}. Best is trial 43 with value: 0.860999379458511.


Trial 58 with params: {'learning_rate': 0.0012676539210101936, 'weight_decay': 0.001, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6289,1.338926,0.5289,0.551345,0.528183,0.520058
2,1.1763,0.949906,0.6584,0.664653,0.657531,0.658359
3,0.9494,0.795316,0.7124,0.716804,0.712922,0.710999
4,0.7829,0.660872,0.7684,0.765563,0.768228,0.76513


[I 2025-04-08 01:46:41,435] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00017589467682639802, 'weight_decay': 0.0, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6206,1.08593,0.6161,0.616563,0.615441,0.612925
2,1.023,0.816292,0.712,0.716346,0.711331,0.711111


[I 2025-04-08 01:49:09,180] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00021038509794712006, 'weight_decay': 0.004, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5687,1.079595,0.6067,0.60908,0.606019,0.599591
2,0.9951,0.757365,0.73,0.732814,0.729586,0.729829
3,0.7525,0.634928,0.7787,0.778799,0.77907,0.776417
4,0.5733,0.56778,0.8046,0.804855,0.804502,0.802362


[I 2025-04-08 01:54:00,658] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0003568622546478908, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5046,1.072379,0.6202,0.619274,0.619977,0.61669
2,0.96,0.74716,0.7358,0.736664,0.734947,0.73251
3,0.7328,0.625453,0.7787,0.779287,0.77941,0.775301
4,0.5744,0.512766,0.8227,0.82042,0.822613,0.820539
5,0.4478,0.499501,0.8275,0.830427,0.827592,0.826355
6,0.338,0.480944,0.8397,0.843202,0.839668,0.840375
7,0.2299,0.482095,0.8487,0.851222,0.849099,0.848787
8,0.1411,0.4894,0.8517,0.852145,0.851642,0.851561
9,0.0709,0.574748,0.8427,0.851846,0.842216,0.844898
10,0.0328,0.564579,0.8525,0.855464,0.852816,0.852191


[I 2025-04-08 02:06:13,086] Trial 61 finished with value: 0.8521913430458852 and parameters: {'learning_rate': 0.0003568622546478908, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 43 with value: 0.860999379458511.


Trial 62 with params: {'learning_rate': 0.0006841808240093112, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5216,1.15538,0.5854,0.597592,0.58512,0.582338
2,1.0321,0.830362,0.7053,0.713056,0.704879,0.70513
3,0.812,0.67908,0.7636,0.765558,0.764163,0.761021
4,0.665,0.565797,0.8018,0.799588,0.801496,0.79877


[I 2025-04-08 02:11:03,744] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0004649481211854686, 'weight_decay': 0.003, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.505,1.089051,0.622,0.628585,0.621532,0.619512
2,0.9761,0.750437,0.7341,0.740225,0.733217,0.733328
3,0.7472,0.632588,0.7784,0.777785,0.778912,0.775088
4,0.5951,0.541827,0.8137,0.8117,0.813672,0.810962


[I 2025-04-08 02:15:53,049] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0004895998848602204, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5034,1.092915,0.6188,0.626433,0.618508,0.6161
2,0.9881,0.752441,0.7349,0.741917,0.734214,0.734525
3,0.7547,0.628979,0.78,0.780188,0.780561,0.776212
4,0.5962,0.523593,0.82,0.819884,0.819756,0.818243
5,0.4786,0.504055,0.8288,0.830547,0.828975,0.828278
6,0.3776,0.462716,0.844,0.846833,0.844079,0.844629
7,0.2781,0.466265,0.8507,0.853661,0.85121,0.850482
8,0.1846,0.470063,0.8568,0.857586,0.856862,0.856742
9,0.105,0.518988,0.8544,0.860656,0.854209,0.855744
10,0.0547,0.516905,0.86,0.861687,0.860381,0.859504


[I 2025-04-08 02:28:05,295] Trial 64 finished with value: 0.8595035884048002 and parameters: {'learning_rate': 0.0004895998848602204, 'weight_decay': 0.0, 'warmup_steps': 21}. Best is trial 43 with value: 0.860999379458511.


Trial 65 with params: {'learning_rate': 0.000161038922150065, 'weight_decay': 0.001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.655,1.105395,0.6011,0.60189,0.600131,0.594365
2,1.0317,0.80866,0.7116,0.713424,0.710771,0.709749


[I 2025-04-08 02:30:30,095] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00041868503303822663, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5216,1.086713,0.616,0.619872,0.616029,0.611214
2,0.9855,0.774248,0.7279,0.737122,0.727106,0.725588
3,0.7556,0.628362,0.7774,0.779761,0.778069,0.775319
4,0.596,0.546589,0.8124,0.8099,0.812449,0.808711
5,0.4712,0.479422,0.838,0.837089,0.838031,0.836942
6,0.3666,0.479386,0.8392,0.842179,0.839465,0.839641
7,0.2638,0.477106,0.8429,0.847115,0.843347,0.843371
8,0.1692,0.489846,0.8514,0.853562,0.851354,0.851653


[I 2025-04-08 02:40:09,408] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0008305060672103675, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5301,1.271048,0.5633,0.578809,0.563199,0.556978
2,1.0464,0.837787,0.6976,0.708271,0.696536,0.697698


[I 2025-04-08 02:42:36,276] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0005628323066366947, 'weight_decay': 0.0, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5378,1.146744,0.6012,0.604537,0.601142,0.595332
2,1.0193,0.792445,0.7166,0.715841,0.716145,0.713664


[I 2025-04-08 02:45:01,042] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.004618563219406311, 'weight_decay': 0.007, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.838,1.532537,0.4384,0.445948,0.437102,0.42184
2,1.4752,1.286532,0.5352,0.554363,0.534148,0.532731
3,1.2351,1.040697,0.6253,0.623495,0.624551,0.619949
4,1.0456,0.878956,0.6861,0.682059,0.685548,0.681578


[I 2025-04-08 02:49:52,107] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0003206465133882276, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5226,1.063318,0.6241,0.626467,0.623605,0.619034
2,0.9693,0.733512,0.7383,0.738186,0.738095,0.737452
3,0.7308,0.615228,0.7836,0.786271,0.784031,0.781736
4,0.5697,0.528582,0.8157,0.813651,0.815852,0.812725
5,0.442,0.511156,0.826,0.828507,0.826084,0.825272
6,0.3289,0.480463,0.8384,0.84133,0.83849,0.838676
7,0.2213,0.491275,0.8447,0.84958,0.845059,0.845329
8,0.1323,0.480979,0.85,0.853282,0.849883,0.850977


[I 2025-04-08 02:59:35,732] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.00044968444267194424, 'weight_decay': 0.004, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5012,1.128155,0.6037,0.625169,0.60341,0.595626
2,0.9742,0.756582,0.7414,0.746119,0.740282,0.738651
3,0.7447,0.623662,0.785,0.788159,0.785524,0.783684
4,0.5932,0.55257,0.8107,0.81061,0.81053,0.807508


[I 2025-04-08 03:04:26,314] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00044052683613976106, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5291,1.120525,0.6043,0.604741,0.604217,0.598746
2,0.9916,0.752768,0.7352,0.739295,0.734538,0.733475
3,0.7479,0.670685,0.7638,0.767871,0.764308,0.762307
4,0.5958,0.539805,0.8172,0.81526,0.817422,0.814107
5,0.4687,0.498668,0.8305,0.831058,0.830471,0.82897
6,0.3639,0.489214,0.8355,0.840849,0.835558,0.836525
7,0.2619,0.477692,0.848,0.850805,0.848442,0.847991
8,0.1679,0.462105,0.859,0.859424,0.858894,0.858794
9,0.0921,0.508761,0.8548,0.858222,0.854651,0.855944
10,0.0463,0.536986,0.8532,0.855207,0.853418,0.852797


[I 2025-04-08 03:16:30,689] Trial 72 finished with value: 0.852796791463285 and parameters: {'learning_rate': 0.00044052683613976106, 'weight_decay': 0.004, 'warmup_steps': 21}. Best is trial 43 with value: 0.860999379458511.


Trial 73 with params: {'learning_rate': 0.0005275683957624397, 'weight_decay': 0.006, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4963,1.0782,0.6244,0.628559,0.624011,0.620206
2,0.975,0.756336,0.7345,0.735819,0.733766,0.731498
3,0.762,0.65491,0.7689,0.772437,0.769576,0.766906
4,0.6137,0.549821,0.8118,0.810824,0.81159,0.809454


[I 2025-04-08 03:21:25,258] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0007804999041891258, 'weight_decay': 0.005, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5474,1.187857,0.5712,0.585323,0.570632,0.569978
2,1.0545,0.81568,0.7059,0.713181,0.704933,0.704752
3,0.8231,0.709494,0.7553,0.762608,0.75619,0.753247
4,0.6677,0.589016,0.7975,0.79621,0.797314,0.794201


[I 2025-04-08 03:26:28,445] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.000371916744428013, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4923,1.042065,0.6334,0.636652,0.632995,0.630492
2,0.9494,0.715121,0.749,0.754295,0.747965,0.747685
3,0.7171,0.620607,0.7846,0.786076,0.785239,0.78221
4,0.5646,0.52973,0.8192,0.817946,0.819027,0.816347
5,0.4389,0.47824,0.8378,0.839378,0.837804,0.83723
6,0.3294,0.492498,0.8374,0.841866,0.837723,0.837947
7,0.2264,0.473846,0.8471,0.852692,0.847574,0.847521
8,0.1374,0.477524,0.8561,0.855378,0.856219,0.855168
9,0.0701,0.539016,0.8496,0.856463,0.8492,0.851145
10,0.0323,0.538481,0.8543,0.857009,0.854503,0.854014


[I 2025-04-08 03:38:39,502] Trial 75 finished with value: 0.8540135857022273 and parameters: {'learning_rate': 0.000371916744428013, 'weight_decay': 0.007, 'warmup_steps': 4}. Best is trial 43 with value: 0.860999379458511.


Trial 76 with params: {'learning_rate': 0.0016267238580841548, 'weight_decay': 0.004, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6321,1.281575,0.5378,0.540576,0.537375,0.528349
2,1.1797,0.96414,0.653,0.664723,0.652292,0.653228
3,0.9411,0.771648,0.726,0.72752,0.726357,0.722691
4,0.7828,0.651962,0.7769,0.77514,0.77651,0.774603


[I 2025-04-08 03:43:35,373] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00019707032907244397, 'weight_decay': 0.005, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5858,1.072744,0.6136,0.616442,0.612911,0.608241
2,0.9976,0.758242,0.7305,0.733605,0.730033,0.729842
3,0.7501,0.672915,0.7653,0.768443,0.765968,0.762108
4,0.5783,0.546614,0.8081,0.806637,0.808083,0.806852


[I 2025-04-08 03:48:30,063] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0005595974537701847, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4949,1.0565,0.6243,0.629519,0.623735,0.62022
2,0.9745,0.751386,0.7347,0.742346,0.733464,0.733193
3,0.7578,0.632955,0.7808,0.779987,0.781169,0.77761
4,0.6122,0.548609,0.8095,0.8086,0.809195,0.806938


[I 2025-04-08 03:53:47,491] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0007880393892370808, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5349,1.133871,0.5977,0.602711,0.597391,0.591455
2,1.0329,0.792451,0.7157,0.724008,0.715013,0.715724
3,0.8114,0.679426,0.7642,0.762349,0.764592,0.760417
4,0.6597,0.551718,0.8072,0.806196,0.807038,0.805465
5,0.5414,0.523012,0.8242,0.827712,0.823991,0.823389
6,0.4397,0.482219,0.8353,0.838011,0.835529,0.835068
7,0.3432,0.473672,0.8412,0.840981,0.841756,0.839787
8,0.2513,0.453241,0.8563,0.856806,0.856421,0.856184
9,0.168,0.491015,0.8521,0.857151,0.851932,0.853528
10,0.1021,0.510346,0.8522,0.85353,0.852474,0.851719


[I 2025-04-08 04:06:02,212] Trial 79 finished with value: 0.8517193777628048 and parameters: {'learning_rate': 0.0007880393892370808, 'weight_decay': 0.002, 'warmup_steps': 14}. Best is trial 43 with value: 0.860999379458511.


Trial 80 with params: {'learning_rate': 0.0005539959276377236, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4781,1.137512,0.611,0.618719,0.610829,0.605073
2,0.976,0.753687,0.7304,0.73382,0.729869,0.73
3,0.757,0.645235,0.7757,0.775333,0.77593,0.772927
4,0.6086,0.531747,0.818,0.81709,0.817742,0.815479
5,0.4897,0.498505,0.8314,0.832228,0.831177,0.830441
6,0.3891,0.468211,0.8444,0.845757,0.844398,0.844178
7,0.2902,0.455318,0.8498,0.851934,0.850114,0.849675
8,0.1972,0.455969,0.8571,0.859315,0.856912,0.857712
9,0.1143,0.509091,0.8558,0.860967,0.855482,0.856956
10,0.0595,0.519831,0.8567,0.859504,0.856939,0.855929


[I 2025-04-08 04:18:15,398] Trial 80 finished with value: 0.8559290040450029 and parameters: {'learning_rate': 0.0005539959276377236, 'weight_decay': 0.006, 'warmup_steps': 15}. Best is trial 43 with value: 0.860999379458511.


Trial 81 with params: {'learning_rate': 7.323713197360346e-05, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8131,1.322417,0.5169,0.51326,0.516542,0.51189
2,1.2966,1.058997,0.623,0.625379,0.622824,0.620606
3,1.0331,0.900887,0.6765,0.676986,0.676755,0.673945
4,0.8586,0.815858,0.7157,0.714713,0.715417,0.712805


[I 2025-04-08 04:23:10,469] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0012315261197628753, 'weight_decay': 0.005, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6397,1.320097,0.5186,0.520813,0.518764,0.505038
2,1.1725,0.931208,0.6642,0.676075,0.662997,0.661742


[I 2025-04-08 04:25:37,826] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0007479738411934122, 'weight_decay': 0.006, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.539,1.311589,0.5605,0.576296,0.559705,0.556895
2,1.0464,0.803473,0.7103,0.710981,0.709744,0.705694
3,0.8267,0.67387,0.76,0.761263,0.759915,0.757274
4,0.677,0.578305,0.8001,0.79908,0.79976,0.797127


[I 2025-04-08 04:30:33,407] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00042137602038645587, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4851,1.067668,0.6247,0.631563,0.624431,0.619019
2,0.9556,0.723134,0.7445,0.746509,0.743894,0.742188
3,0.7289,0.620783,0.7827,0.784966,0.783529,0.779129
4,0.5799,0.529965,0.8164,0.81387,0.816265,0.81393


[I 2025-04-08 04:35:24,376] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00045483022280775075, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4848,1.100799,0.6286,0.633893,0.62838,0.624094
2,0.9603,0.707986,0.7511,0.75766,0.750377,0.751134
3,0.738,0.604716,0.7908,0.792948,0.7916,0.788068
4,0.5867,0.54012,0.8132,0.810571,0.813288,0.809648
5,0.464,0.489309,0.8301,0.834788,0.830249,0.829794
6,0.3578,0.48461,0.8404,0.844066,0.840759,0.840431
7,0.2562,0.460681,0.8484,0.851441,0.848895,0.848508
8,0.1635,0.473131,0.8582,0.859781,0.858025,0.858249
9,0.0894,0.516865,0.8531,0.859905,0.852852,0.854976
10,0.0433,0.512491,0.861,0.862678,0.861298,0.860801


[I 2025-04-08 04:47:39,313] Trial 85 finished with value: 0.8608013415281448 and parameters: {'learning_rate': 0.00045483022280775075, 'weight_decay': 0.007, 'warmup_steps': 14}. Best is trial 43 with value: 0.860999379458511.


Trial 86 with params: {'learning_rate': 0.0003636739376494433, 'weight_decay': 0.007, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4853,1.02661,0.6443,0.648693,0.644332,0.640491
2,0.9343,0.720018,0.7485,0.752098,0.747695,0.746857
3,0.7168,0.599487,0.7893,0.790839,0.789745,0.787256
4,0.566,0.525271,0.8224,0.820667,0.822371,0.819893
5,0.4454,0.493641,0.8334,0.83663,0.833682,0.831847
6,0.3357,0.476084,0.8413,0.844392,0.841358,0.84175
7,0.2342,0.482811,0.8451,0.848058,0.845518,0.845152
8,0.143,0.492528,0.851,0.852583,0.850874,0.851411


[I 2025-04-08 04:57:24,679] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0003242953397935434, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5153,1.074576,0.6332,0.641068,0.632749,0.629833
2,0.9416,0.721705,0.7435,0.748419,0.742645,0.742767
3,0.7082,0.602389,0.7875,0.787979,0.788003,0.784118
4,0.5549,0.524729,0.8198,0.818284,0.819862,0.818182
5,0.4333,0.496245,0.8333,0.835708,0.833503,0.83219
6,0.3214,0.477377,0.8419,0.844934,0.841997,0.84241
7,0.2121,0.485374,0.8456,0.848858,0.846165,0.845383
8,0.1251,0.497595,0.8505,0.851261,0.850603,0.850354


[I 2025-04-08 05:07:08,665] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005515578517768005, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.509,1.091985,0.6222,0.631956,0.621475,0.619796
2,0.9738,0.761796,0.7311,0.737332,0.730409,0.731442
3,0.756,0.617675,0.7849,0.789066,0.785628,0.78192
4,0.6067,0.534921,0.8174,0.816828,0.817337,0.815328
5,0.4881,0.491084,0.8317,0.833131,0.831532,0.831026
6,0.3867,0.487884,0.8367,0.841784,0.836808,0.837197
7,0.288,0.460041,0.8525,0.85384,0.852889,0.852008
8,0.1949,0.429193,0.8654,0.866273,0.865296,0.865591
9,0.1133,0.49117,0.8609,0.864974,0.860682,0.861971
10,0.061,0.500037,0.8601,0.86163,0.860294,0.859619


[I 2025-04-08 05:19:28,023] Trial 88 finished with value: 0.8596194856112195 and parameters: {'learning_rate': 0.0005515578517768005, 'weight_decay': 0.008, 'warmup_steps': 17}. Best is trial 43 with value: 0.860999379458511.


Trial 89 with params: {'learning_rate': 0.0009369884681024888, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5429,1.181562,0.5678,0.591564,0.567833,0.563854
2,1.0828,0.837336,0.7052,0.713538,0.704518,0.706197


[I 2025-04-08 05:21:54,712] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0008898714614975512, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5594,1.208608,0.5629,0.573147,0.563353,0.554681
2,1.0793,0.836516,0.7003,0.699979,0.700005,0.696849


[I 2025-04-08 05:24:21,518] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00038974637981961446, 'weight_decay': 0.008, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5159,1.075528,0.6266,0.627079,0.626266,0.623065
2,0.9605,0.725583,0.744,0.749063,0.743504,0.744342
3,0.7274,0.640416,0.7776,0.783981,0.778423,0.775898
4,0.577,0.532979,0.8167,0.814814,0.816472,0.814036
5,0.4551,0.502003,0.8309,0.831031,0.831033,0.829505
6,0.3469,0.487865,0.8391,0.84376,0.839167,0.839667
7,0.2421,0.477711,0.8455,0.847647,0.845816,0.845321
8,0.1507,0.472583,0.8548,0.855284,0.854568,0.854538


[I 2025-04-08 05:34:08,802] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0005205328953754246, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4973,1.171787,0.5905,0.608281,0.590172,0.588343
2,0.9936,0.760639,0.7339,0.734734,0.733296,0.732032
3,0.7748,0.640015,0.7697,0.772077,0.770023,0.7678
4,0.6181,0.54011,0.8108,0.808332,0.810642,0.80826


[I 2025-04-08 05:38:59,851] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0005735184271410196, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4995,1.187961,0.5881,0.600225,0.58743,0.582169
2,1.01,0.805327,0.712,0.713163,0.711184,0.708903
3,0.7925,0.665434,0.7701,0.769628,0.770243,0.766027
4,0.6416,0.57415,0.8014,0.799706,0.801244,0.797839


[I 2025-04-08 05:43:53,058] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0005205554929973225, 'weight_decay': 0.008, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5032,1.091817,0.6116,0.617333,0.61092,0.608425
2,0.9962,0.78073,0.7255,0.729959,0.724785,0.722938


[I 2025-04-08 05:46:18,259] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.000259868758852356, 'weight_decay': 0.006, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5321,1.044322,0.6324,0.637472,0.631749,0.629135
2,0.9678,0.769723,0.7294,0.739658,0.728952,0.731084
3,0.7255,0.608849,0.7909,0.791496,0.791221,0.788713
4,0.5591,0.535811,0.8163,0.816149,0.816143,0.813947
5,0.4202,0.494994,0.8327,0.835068,0.832651,0.832245
6,0.3052,0.503116,0.8346,0.838414,0.834898,0.835146
7,0.1992,0.518765,0.8425,0.843596,0.842811,0.841943
8,0.1159,0.520432,0.8477,0.849579,0.847403,0.847806


[I 2025-04-08 05:56:01,072] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 8.716968522655038e-05, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7595,1.292119,0.529,0.529797,0.528264,0.524174
2,1.2287,1.009403,0.6376,0.642342,0.636671,0.63802


[I 2025-04-08 05:58:28,170] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0006939083285346099, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5284,1.121425,0.5974,0.603167,0.596908,0.59347
2,1.0391,0.792624,0.7202,0.725298,0.719361,0.717517
3,0.8116,0.673631,0.7641,0.770285,0.76471,0.763107
4,0.6555,0.566073,0.8018,0.799958,0.801565,0.799374


[I 2025-04-08 06:03:24,497] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0035054904723296637, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8261,1.532245,0.4229,0.447372,0.421541,0.392777
2,1.3801,1.180344,0.5707,0.584704,0.569724,0.569936
3,1.1427,0.950539,0.6546,0.649576,0.654471,0.650008
4,0.9723,0.821644,0.7042,0.703061,0.703576,0.700633


[I 2025-04-08 06:08:16,767] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00018478584906746223, 'weight_decay': 0.005, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5992,1.088949,0.6133,0.613202,0.612742,0.608551
2,1.0047,0.792948,0.721,0.724783,0.720365,0.72132


[I 2025-04-08 06:10:44,059] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0002448271997840478, 'weight_decay': 0.008, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5563,1.152271,0.6003,0.613881,0.599317,0.59677
2,0.9654,0.792711,0.7217,0.72764,0.721033,0.721467


[I 2025-04-08 06:13:12,644] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00032599950579553674, 'weight_decay': 0.005, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5225,1.071432,0.6182,0.620438,0.617734,0.615153
2,0.979,0.764454,0.7293,0.728936,0.728614,0.726472
3,0.7371,0.621478,0.7833,0.784201,0.783823,0.781096
4,0.5719,0.543943,0.8155,0.814173,0.815384,0.813763
5,0.4425,0.49321,0.8362,0.839775,0.83622,0.835844
6,0.3276,0.491115,0.8383,0.841462,0.838398,0.83817
7,0.2221,0.507457,0.8413,0.846029,0.841572,0.841849
8,0.1319,0.503968,0.85,0.852274,0.849941,0.850257


[I 2025-04-08 06:23:01,319] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0005401989409621472, 'weight_decay': 0.001, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5073,1.167389,0.5916,0.607988,0.592097,0.58294
2,1.0001,0.795876,0.7157,0.720766,0.715112,0.714443


[I 2025-04-08 06:25:30,049] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0006419229335682162, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4939,1.065052,0.6122,0.62109,0.612338,0.608369
2,1.0022,0.779833,0.7223,0.722758,0.721531,0.721166
3,0.7785,0.619322,0.7858,0.7865,0.786061,0.784639
4,0.6273,0.536577,0.814,0.812741,0.81409,0.811653
5,0.5101,0.494903,0.8317,0.831189,0.831781,0.830496
6,0.4077,0.467381,0.8422,0.843493,0.842235,0.84195
7,0.313,0.44612,0.8521,0.854473,0.852479,0.852198
8,0.2188,0.455614,0.8573,0.861208,0.857444,0.857944
9,0.1357,0.498817,0.8543,0.859478,0.854224,0.855515
10,0.0765,0.483072,0.8585,0.860287,0.858884,0.858326


[I 2025-04-08 06:37:34,016] Trial 103 finished with value: 0.8583264096840422 and parameters: {'learning_rate': 0.0006419229335682162, 'weight_decay': 0.007, 'warmup_steps': 14}. Best is trial 43 with value: 0.860999379458511.


Trial 104 with params: {'learning_rate': 0.0007584661773953899, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5268,1.240192,0.5625,0.586044,0.562277,0.556489
2,1.0676,0.837963,0.7022,0.704028,0.701573,0.699294


[I 2025-04-08 06:39:59,398] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0004344710634820757, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5036,1.07496,0.6261,0.635236,0.625836,0.622544
2,0.96,0.738536,0.7416,0.7475,0.740788,0.741356
3,0.7354,0.649117,0.7741,0.777174,0.774841,0.770788
4,0.5801,0.519936,0.8148,0.812696,0.814548,0.812597
5,0.4591,0.472997,0.8351,0.835421,0.835105,0.834527
6,0.3513,0.479993,0.8359,0.840988,0.835938,0.836926
7,0.2461,0.471592,0.8503,0.853382,0.85084,0.850193
8,0.1551,0.475957,0.8558,0.857046,0.855898,0.85614
9,0.0816,0.525514,0.854,0.860247,0.853778,0.855477
10,0.04,0.519675,0.8596,0.862252,0.859844,0.859509


[I 2025-04-08 06:52:07,615] Trial 105 finished with value: 0.8595087088237501 and parameters: {'learning_rate': 0.0004344710634820757, 'weight_decay': 0.008, 'warmup_steps': 12}. Best is trial 43 with value: 0.860999379458511.


Trial 106 with params: {'learning_rate': 0.000544157354503906, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4902,1.06797,0.6246,0.635979,0.624107,0.622619
2,0.9866,0.787209,0.7211,0.730119,0.719867,0.721018


[I 2025-04-08 06:54:32,206] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0004010026928562447, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4982,1.034509,0.6311,0.643904,0.631135,0.629246
2,0.9537,0.743918,0.7404,0.747733,0.739764,0.741555
3,0.7245,0.621999,0.7837,0.786144,0.784358,0.781246
4,0.5697,0.520262,0.8196,0.818818,0.819693,0.818342
5,0.4505,0.494439,0.8328,0.835463,0.833005,0.831407
6,0.3426,0.473766,0.8447,0.847553,0.844906,0.844666
7,0.2419,0.4629,0.8521,0.854798,0.852494,0.852193
8,0.1505,0.472997,0.8581,0.859629,0.858019,0.858442
9,0.0788,0.509482,0.8538,0.860307,0.85366,0.85569
10,0.0378,0.531526,0.8511,0.855945,0.851561,0.850552


[I 2025-04-08 07:06:40,915] Trial 107 finished with value: 0.8505522269004289 and parameters: {'learning_rate': 0.0004010026928562447, 'weight_decay': 0.008, 'warmup_steps': 14}. Best is trial 43 with value: 0.860999379458511.


Trial 108 with params: {'learning_rate': 0.00029444085247009195, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5155,1.049456,0.6332,0.633397,0.63267,0.628358
2,0.9504,0.740491,0.7392,0.741556,0.738422,0.737865
3,0.7257,0.629874,0.7816,0.783205,0.781903,0.779813
4,0.5598,0.52493,0.8178,0.816752,0.817471,0.815876
5,0.4298,0.482967,0.8361,0.83703,0.836252,0.835307
6,0.3145,0.5115,0.8341,0.839586,0.834167,0.834446
7,0.2061,0.508775,0.8403,0.843167,0.840745,0.840319
8,0.1165,0.508012,0.8468,0.848876,0.846702,0.847434


[I 2025-04-08 07:16:24,284] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0009227778173118728, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5584,1.269652,0.5504,0.567069,0.550536,0.543828
2,1.1062,0.859568,0.6933,0.699326,0.692631,0.693289


[I 2025-04-08 07:18:50,209] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0007175042475164369, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5238,1.224821,0.5748,0.591257,0.575271,0.571554
2,1.0482,0.859029,0.6962,0.703229,0.695446,0.694406


[I 2025-04-08 07:21:13,391] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0005188005709593968, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5116,1.170676,0.5914,0.601691,0.591,0.584712
2,1.0095,0.800011,0.7137,0.716094,0.712886,0.710099
3,0.7833,0.644703,0.7763,0.775432,0.776806,0.772923
4,0.6298,0.561526,0.8039,0.801217,0.804052,0.799328


[I 2025-04-08 07:26:04,144] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0006353487089906261, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5018,1.148509,0.5963,0.61228,0.596327,0.592623
2,1.0054,0.757032,0.7275,0.731706,0.727138,0.726253
3,0.7912,0.665236,0.7699,0.77012,0.770204,0.767573
4,0.6425,0.567809,0.8025,0.803018,0.802489,0.799743


[I 2025-04-08 07:30:55,390] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00041158119971645843, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4996,1.106487,0.6108,0.619428,0.610599,0.604259
2,0.9601,0.746814,0.737,0.739935,0.736129,0.73475
3,0.7341,0.623855,0.7845,0.784623,0.784848,0.781609
4,0.5794,0.532161,0.8195,0.817167,0.81951,0.816112
5,0.4581,0.487665,0.8403,0.840382,0.840352,0.83919
6,0.3477,0.476928,0.8409,0.84359,0.841088,0.841111
7,0.2463,0.473532,0.8467,0.849038,0.847234,0.846258
8,0.1522,0.468823,0.8557,0.856451,0.855772,0.855788
9,0.0803,0.53656,0.8552,0.862896,0.855097,0.857373
10,0.0367,0.523415,0.8572,0.859153,0.857499,0.8569


[I 2025-04-08 07:43:05,217] Trial 113 finished with value: 0.8569002222062437 and parameters: {'learning_rate': 0.00041158119971645843, 'weight_decay': 0.01, 'warmup_steps': 8}. Best is trial 43 with value: 0.860999379458511.


Trial 114 with params: {'learning_rate': 0.0002719792289882392, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5055,0.972474,0.6555,0.654764,0.654691,0.649814
2,0.9372,0.752589,0.7355,0.741876,0.73472,0.73451
3,0.7096,0.631147,0.7811,0.78418,0.78177,0.777588
4,0.551,0.51987,0.8219,0.820868,0.821498,0.819439
5,0.4172,0.496747,0.8307,0.833826,0.83093,0.82961
6,0.3041,0.48364,0.8424,0.843873,0.842555,0.841939
7,0.199,0.505864,0.8442,0.846888,0.844531,0.84398
8,0.1127,0.49215,0.8544,0.854599,0.854393,0.854146
9,0.0558,0.567371,0.848,0.854294,0.847724,0.849123
10,0.0252,0.567608,0.8481,0.849462,0.848234,0.84676


[I 2025-04-08 07:55:25,104] Trial 114 finished with value: 0.8467599637186713 and parameters: {'learning_rate': 0.0002719792289882392, 'weight_decay': 0.01, 'warmup_steps': 9}. Best is trial 43 with value: 0.860999379458511.


Trial 115 with params: {'learning_rate': 0.00042597375951862053, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4931,1.119794,0.6039,0.619107,0.603569,0.601004
2,0.9704,0.735326,0.7381,0.741263,0.73751,0.738318
3,0.7436,0.639324,0.7777,0.779637,0.778334,0.774467
4,0.5921,0.534234,0.8136,0.812513,0.813581,0.812394
5,0.4703,0.489623,0.8317,0.83264,0.832024,0.831024
6,0.3656,0.485201,0.8348,0.838954,0.835048,0.834612
7,0.2633,0.473559,0.8458,0.847771,0.846426,0.844889
8,0.1683,0.483209,0.8544,0.856808,0.854222,0.85462


[I 2025-04-08 08:05:09,310] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0006640236016214589, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4905,1.108436,0.604,0.617086,0.60324,0.601488
2,0.9912,0.771161,0.7273,0.730133,0.726848,0.726841
3,0.7815,0.645101,0.7758,0.77692,0.776211,0.773998
4,0.6315,0.57528,0.803,0.802515,0.802717,0.800114


[I 2025-04-08 08:10:02,133] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00038988535016704984, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5059,1.033212,0.6315,0.641474,0.63067,0.629839
2,0.9526,0.69597,0.7577,0.758809,0.756839,0.755947
3,0.7202,0.6305,0.7818,0.784178,0.782324,0.77914
4,0.568,0.524026,0.8229,0.820943,0.822744,0.820604
5,0.4428,0.476234,0.8376,0.840331,0.83749,0.837277
6,0.3346,0.457487,0.8483,0.85371,0.848269,0.849709
7,0.2331,0.450852,0.8547,0.858351,0.855172,0.854968
8,0.143,0.469779,0.8573,0.856949,0.857489,0.856964
9,0.0736,0.511942,0.858,0.863484,0.857923,0.859551
10,0.0333,0.542162,0.8529,0.855686,0.853468,0.852337


[I 2025-04-08 08:22:18,551] Trial 117 finished with value: 0.8523367132117066 and parameters: {'learning_rate': 0.00038988535016704984, 'weight_decay': 0.0, 'warmup_steps': 18}. Best is trial 43 with value: 0.860999379458511.


Trial 118 with params: {'learning_rate': 0.0006306199182005229, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5069,1.189585,0.5783,0.604546,0.578562,0.574199
2,1.012,0.819058,0.7119,0.718027,0.710976,0.710007
3,0.7816,0.661161,0.7696,0.771908,0.770168,0.767316
4,0.6315,0.562663,0.8057,0.80368,0.805464,0.802397


[I 2025-04-08 08:27:10,338] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0004026915830407254, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5109,1.05558,0.6352,0.637309,0.634997,0.631011
2,0.9616,0.733054,0.7424,0.749061,0.74168,0.742273
3,0.734,0.644345,0.7795,0.777716,0.780001,0.7756
4,0.5794,0.535351,0.8173,0.815793,0.817054,0.815085
5,0.4562,0.503255,0.8275,0.829155,0.827531,0.826367
6,0.3429,0.481732,0.8393,0.845696,0.839067,0.840423
7,0.2383,0.503917,0.8425,0.846654,0.843014,0.842721
8,0.1474,0.489055,0.8525,0.856482,0.852263,0.852964
9,0.0771,0.539193,0.8511,0.857064,0.850942,0.852671
10,0.0361,0.523888,0.854,0.855179,0.854362,0.853124


[I 2025-04-08 08:39:23,991] Trial 119 finished with value: 0.8531240140261245 and parameters: {'learning_rate': 0.0004026915830407254, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}. Best is trial 43 with value: 0.860999379458511.


Trial 120 with params: {'learning_rate': 0.00012654035347595767, 'weight_decay': 0.01, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7001,1.187027,0.5743,0.572596,0.573558,0.56729
2,1.107,0.88433,0.6873,0.688799,0.686712,0.686733
3,0.8491,0.744014,0.7355,0.735385,0.736114,0.730667
4,0.6802,0.653082,0.7701,0.766425,0.769877,0.766633


[I 2025-04-08 08:44:11,593] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.00043597735306631104, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.509,1.035429,0.6347,0.633569,0.634566,0.630256
2,0.9727,0.757817,0.7368,0.738407,0.736146,0.733446
3,0.7463,0.639082,0.7779,0.782951,0.778793,0.775373
4,0.5908,0.546,0.8134,0.811983,0.813462,0.811013


[I 2025-04-08 08:49:06,233] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00039704482321128813, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4934,1.005378,0.642,0.645143,0.641683,0.639307
2,0.9484,0.734844,0.7477,0.748543,0.747242,0.746297
3,0.7214,0.611306,0.7862,0.785193,0.786684,0.783318
4,0.5689,0.53643,0.8142,0.812009,0.814147,0.81009
5,0.4497,0.482594,0.8355,0.836195,0.83556,0.834509
6,0.3418,0.459722,0.8459,0.848425,0.845917,0.84598
7,0.2365,0.466102,0.8512,0.853966,0.851502,0.85125
8,0.1475,0.464651,0.8579,0.859606,0.857733,0.858269
9,0.0777,0.545388,0.8546,0.862144,0.854295,0.856366
10,0.0372,0.512619,0.8604,0.861238,0.860627,0.859742


[I 2025-04-08 09:01:21,290] Trial 122 finished with value: 0.8597417413677594 and parameters: {'learning_rate': 0.00039704482321128813, 'weight_decay': 0.007, 'warmup_steps': 13}. Best is trial 43 with value: 0.860999379458511.


Trial 123 with params: {'learning_rate': 0.0005311042791442586, 'weight_decay': 0.002, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.515,1.120714,0.6023,0.609998,0.602056,0.597837
2,0.9843,0.769039,0.725,0.732597,0.724201,0.723144


[I 2025-04-08 09:03:49,187] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0004440581013499384, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4742,1.023704,0.6453,0.651418,0.644398,0.643076
2,0.9476,0.739055,0.7398,0.744555,0.739353,0.740114
3,0.7332,0.631283,0.782,0.782692,0.782698,0.778358
4,0.5857,0.539806,0.8166,0.815387,0.816434,0.814013
5,0.4666,0.473169,0.8337,0.837554,0.833622,0.833603
6,0.3623,0.466768,0.8439,0.845334,0.844149,0.843588
7,0.2643,0.479389,0.8484,0.854906,0.848946,0.848922
8,0.1715,0.458053,0.8588,0.860114,0.85885,0.859154
9,0.0932,0.505947,0.856,0.86311,0.8557,0.857635
10,0.0453,0.505118,0.8611,0.862557,0.861307,0.860719


[I 2025-04-08 09:16:06,126] Trial 124 finished with value: 0.8607189048342757 and parameters: {'learning_rate': 0.0004440581013499384, 'weight_decay': 0.007, 'warmup_steps': 11}. Best is trial 43 with value: 0.860999379458511.


Trial 125 with params: {'learning_rate': 0.0004576781071512883, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4845,1.057912,0.6284,0.644463,0.62752,0.624343
2,0.9669,0.725225,0.7453,0.745874,0.744622,0.7438
3,0.7404,0.63573,0.7759,0.776801,0.776515,0.771961
4,0.5891,0.520115,0.825,0.822943,0.825035,0.822878
5,0.4727,0.503199,0.8296,0.830406,0.829492,0.828536
6,0.3646,0.474511,0.8415,0.844541,0.841795,0.841471
7,0.2664,0.44602,0.8546,0.856188,0.854872,0.854424
8,0.171,0.438857,0.8592,0.860264,0.859171,0.85934
9,0.092,0.507897,0.8552,0.861212,0.854874,0.856744
10,0.0457,0.504342,0.8603,0.863618,0.860561,0.860599


[I 2025-04-08 09:28:21,229] Trial 125 finished with value: 0.8605988714678284 and parameters: {'learning_rate': 0.0004576781071512883, 'weight_decay': 0.008, 'warmup_steps': 10}. Best is trial 43 with value: 0.860999379458511.


Trial 126 with params: {'learning_rate': 0.00029954243831773066, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5317,1.098358,0.6158,0.614594,0.61505,0.608787
2,0.9589,0.733295,0.7412,0.7447,0.740303,0.739378
3,0.7202,0.621215,0.7835,0.785005,0.783764,0.781119
4,0.5588,0.533435,0.8174,0.816309,0.817325,0.814914
5,0.4322,0.483281,0.8364,0.837996,0.836587,0.835465
6,0.319,0.501021,0.8392,0.842763,0.83933,0.839571
7,0.2102,0.491063,0.8459,0.8482,0.846186,0.845924
8,0.122,0.525046,0.8434,0.844979,0.843564,0.843506


[I 2025-04-08 09:38:14,925] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0010184433189026677, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5721,1.273392,0.5479,0.555579,0.547773,0.535908
2,1.1168,0.902038,0.6714,0.675843,0.670652,0.669727


[I 2025-04-08 09:40:42,930] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0017284296976961264, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7044,1.403231,0.5,0.499679,0.499116,0.486471
2,1.2651,1.048548,0.6245,0.633206,0.623518,0.624906


[I 2025-04-08 09:43:10,591] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0003391993790703202, 'weight_decay': 0.007, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5039,1.032668,0.6361,0.642337,0.635753,0.634911
2,0.9605,0.741647,0.7398,0.744322,0.738861,0.738917
3,0.7315,0.609376,0.7833,0.786815,0.783879,0.78144
4,0.5717,0.547387,0.8146,0.815345,0.814266,0.812529
5,0.4432,0.495577,0.829,0.831894,0.829172,0.828351
6,0.3333,0.477843,0.8412,0.843676,0.841276,0.841131
7,0.2273,0.487983,0.8477,0.85219,0.848182,0.847797
8,0.1355,0.483419,0.8512,0.851531,0.851283,0.85105


[I 2025-04-08 09:52:54,913] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0006931522515587194, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5282,1.153086,0.5922,0.607034,0.591611,0.591176
2,1.0603,0.816638,0.7107,0.71634,0.709705,0.710549
3,0.8386,0.688404,0.7584,0.759917,0.758945,0.755896
4,0.6821,0.604104,0.7924,0.79077,0.792172,0.788142


[I 2025-04-08 09:57:44,733] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0008501942334479195, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5718,1.240018,0.5705,0.574559,0.569735,0.56396
2,1.0891,0.865118,0.693,0.696711,0.692617,0.691383


[I 2025-04-08 10:00:10,738] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.00041777637454537455, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5237,1.09439,0.6146,0.620005,0.614563,0.611611
2,0.979,0.742942,0.7355,0.741602,0.734763,0.735681
3,0.7459,0.617113,0.7866,0.786636,0.787044,0.783766
4,0.588,0.52158,0.8206,0.818208,0.820281,0.81755
5,0.4675,0.498502,0.8274,0.828676,0.827396,0.826493
6,0.3611,0.476732,0.8409,0.842202,0.84119,0.840495
7,0.2576,0.474331,0.8464,0.84964,0.846853,0.84661
8,0.1628,0.484433,0.8497,0.853145,0.849551,0.850458


[I 2025-04-08 10:09:57,579] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0005031861526570372, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5228,1.119258,0.6026,0.616311,0.602148,0.595
2,0.9886,0.768652,0.725,0.731935,0.72428,0.726028
3,0.7571,0.631311,0.78,0.782196,0.780536,0.77789
4,0.6105,0.533577,0.8203,0.819189,0.820142,0.817658
5,0.4887,0.491397,0.8377,0.837913,0.837652,0.836453
6,0.3882,0.471532,0.8395,0.842686,0.839499,0.839912
7,0.2845,0.460214,0.8528,0.855859,0.853231,0.852914
8,0.1889,0.461337,0.8571,0.857438,0.857156,0.856829
9,0.1069,0.496602,0.8552,0.859556,0.85483,0.85632
10,0.056,0.523832,0.8552,0.858563,0.855443,0.854872


[I 2025-04-08 10:22:12,347] Trial 133 finished with value: 0.8548724636609698 and parameters: {'learning_rate': 0.0005031861526570372, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 43 with value: 0.860999379458511.


Trial 134 with params: {'learning_rate': 0.0007313211371092708, 'weight_decay': 0.0, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5183,1.109458,0.6042,0.609572,0.604172,0.600396
2,1.0338,0.837664,0.6996,0.706403,0.698909,0.699155
3,0.8149,0.681023,0.7586,0.760096,0.759368,0.755063
4,0.6567,0.576828,0.8041,0.803798,0.803736,0.801164


[I 2025-04-08 10:27:06,723] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.00039472210100409996, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5246,1.191081,0.5809,0.595888,0.581282,0.572623
2,0.9878,0.75599,0.7327,0.739629,0.73181,0.732588
3,0.7529,0.630674,0.7817,0.781202,0.782059,0.778181
4,0.5884,0.546531,0.8133,0.810511,0.813277,0.810625


[I 2025-04-08 10:32:01,451] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0005280489047611394, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4964,1.139065,0.602,0.614478,0.601604,0.599153
2,0.9776,0.732148,0.74,0.744037,0.739311,0.739791
3,0.7591,0.607114,0.7889,0.791708,0.78941,0.787305
4,0.6056,0.52884,0.8146,0.812421,0.814413,0.811858
5,0.4832,0.500594,0.8291,0.830264,0.829072,0.827781
6,0.3814,0.486584,0.8388,0.84208,0.839156,0.838788
7,0.281,0.447608,0.8561,0.857262,0.85653,0.85532
8,0.1905,0.454799,0.8607,0.862291,0.860582,0.860827
9,0.11,0.491493,0.8617,0.86729,0.86147,0.863281
10,0.0573,0.505849,0.8617,0.863326,0.861856,0.86137


[I 2025-04-08 10:44:09,622] Trial 136 finished with value: 0.8613704644687414 and parameters: {'learning_rate': 0.0005280489047611394, 'weight_decay': 0.008, 'warmup_steps': 13}. Best is trial 136 with value: 0.8613704644687414.


Trial 137 with params: {'learning_rate': 0.00031388490765797727, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4998,1.019632,0.6412,0.639797,0.640534,0.636589
2,0.9416,0.732123,0.7416,0.749223,0.740801,0.742158
3,0.7126,0.613729,0.7836,0.784971,0.784029,0.780778
4,0.5528,0.524294,0.8202,0.817103,0.820286,0.81668
5,0.4255,0.500238,0.8321,0.837764,0.832211,0.831738
6,0.3155,0.492268,0.8384,0.844387,0.83836,0.839526
7,0.2111,0.499939,0.8418,0.845948,0.842161,0.842102
8,0.123,0.514426,0.8489,0.849658,0.848936,0.848853


[I 2025-04-08 10:53:50,392] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0007465165885637105, 'weight_decay': 0.008, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5372,1.202437,0.5729,0.586729,0.572641,0.568172
2,1.0687,0.81347,0.7119,0.719346,0.711272,0.712304


[I 2025-04-08 10:56:15,595] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0001772405333439467, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5906,1.099497,0.6038,0.606871,0.602946,0.599989
2,1.0103,0.793285,0.7134,0.721127,0.712607,0.714032
3,0.7664,0.677213,0.7602,0.76039,0.760459,0.757714
4,0.5874,0.588223,0.7965,0.795367,0.796573,0.794395


[I 2025-04-08 11:01:05,842] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.00014990503583045417, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6422,1.105486,0.6012,0.597121,0.600684,0.594815
2,1.0376,0.832187,0.708,0.711802,0.707506,0.707671


[I 2025-04-08 11:03:30,091] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0003287431275549471, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5253,1.032613,0.6389,0.643461,0.639024,0.634019
2,0.9567,0.739553,0.7381,0.74131,0.737195,0.737552
3,0.7206,0.613592,0.7835,0.78682,0.78392,0.780654
4,0.5592,0.515794,0.8197,0.817905,0.819797,0.817943
5,0.4335,0.494366,0.8301,0.834875,0.829992,0.829718
6,0.3233,0.472287,0.8433,0.844835,0.843397,0.843149
7,0.2187,0.477091,0.8471,0.850266,0.847529,0.84689
8,0.1308,0.478869,0.8537,0.855308,0.853496,0.853969


[I 2025-04-08 11:13:06,640] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0006894599223365288, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5008,1.222768,0.5851,0.59466,0.585036,0.580752
2,1.0207,0.800216,0.7174,0.722341,0.716723,0.715869


[I 2025-04-08 11:15:31,518] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0006231454450277718, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5053,1.095164,0.6035,0.631707,0.603838,0.59996
2,1.0065,0.80241,0.7096,0.715866,0.708852,0.708895
3,0.7858,0.663794,0.7676,0.773669,0.768417,0.765184
4,0.6361,0.542145,0.8119,0.809517,0.8118,0.808779


[I 2025-04-08 11:20:22,991] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0009245499714965943, 'weight_decay': 0.005, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5697,1.244776,0.5524,0.575369,0.552135,0.548747
2,1.1026,0.877733,0.6821,0.693099,0.681851,0.68149


[I 2025-04-08 11:22:48,709] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.00017670551682372925, 'weight_decay': 0.008, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6246,1.099335,0.6086,0.611302,0.607614,0.604407
2,1.0193,0.805441,0.7136,0.720243,0.712703,0.712617
3,0.7654,0.667334,0.7659,0.765623,0.766335,0.76306
4,0.5915,0.565492,0.8056,0.804379,0.805411,0.804078


[I 2025-04-08 11:27:41,156] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0017049097129868964, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7733,1.476004,0.4533,0.470066,0.453073,0.435554
2,1.3695,1.152891,0.5873,0.589989,0.586954,0.58677


[I 2025-04-08 11:30:06,125] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.00028969186584564016, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5369,1.042246,0.6272,0.629315,0.62613,0.620742
2,0.9581,0.773636,0.728,0.733551,0.727362,0.726831
3,0.7222,0.629288,0.7776,0.780721,0.778463,0.774652
4,0.5646,0.524192,0.8189,0.818248,0.818862,0.816994
5,0.4342,0.492741,0.8314,0.831554,0.831555,0.830276
6,0.3185,0.496008,0.837,0.842445,0.836967,0.83777
7,0.2099,0.504743,0.8399,0.842595,0.840423,0.839435
8,0.1226,0.487008,0.8496,0.85003,0.849658,0.849371


[I 2025-04-08 11:39:51,794] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0005005350996293366, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4946,1.129562,0.6113,0.620089,0.61104,0.606143
2,0.977,0.761083,0.7327,0.739908,0.731687,0.730955
3,0.7597,0.651948,0.7732,0.775197,0.77364,0.770788
4,0.6073,0.533318,0.8172,0.816079,0.8169,0.814824
5,0.4868,0.513817,0.8232,0.827947,0.823414,0.822361
6,0.3811,0.472339,0.843,0.846422,0.843006,0.8437
7,0.2814,0.451907,0.8526,0.852351,0.852906,0.851967
8,0.1892,0.453905,0.8578,0.857712,0.857869,0.857432
9,0.1075,0.508827,0.8527,0.858736,0.852542,0.854167
10,0.0558,0.518586,0.8551,0.857181,0.855476,0.854648


[I 2025-04-08 11:51:57,816] Trial 148 finished with value: 0.8546481014940118 and parameters: {'learning_rate': 0.0005005350996293366, 'weight_decay': 0.007, 'warmup_steps': 15}. Best is trial 136 with value: 0.8613704644687414.


Trial 149 with params: {'learning_rate': 0.00033278420397738303, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5051,1.068179,0.6215,0.640184,0.621841,0.615639
2,0.9373,0.747414,0.7405,0.750662,0.739552,0.741991
3,0.7085,0.633167,0.7821,0.7841,0.782782,0.778653
4,0.5571,0.532129,0.8172,0.816286,0.817077,0.815252
5,0.4308,0.501865,0.831,0.833005,0.831142,0.829452
6,0.32,0.491733,0.8367,0.842868,0.836959,0.837897
7,0.2156,0.509141,0.8402,0.844179,0.840719,0.84013
8,0.126,0.505209,0.8476,0.849351,0.847511,0.847995


[I 2025-04-08 12:01:39,594] Trial 149 pruned. 


In [19]:
print(best_base_random)

BestRun(run_id='136', objective=0.8613704644687414, hyperparameters={'learning_rate': 0.0005280489047611394, 'weight_decay': 0.008, 'warmup_steps': 13}, run_summary=None)
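
The trial log above is long, so it can help to condense it into a short summary. The following is a minimal sketch that assumes the log text has been captured as a string. The helper name `summarize_trials` and the regular expressions are illustrative and not part of `base`. The sample lines are copied from the output above.

```python
import ast
import re

# Patterns for the two Optuna log line types that appear in the output above.
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_trials(log_text):
    """Return ({trial: (value, params)}, [pruned trial numbers]) parsed from the log."""
    finished = {}
    pruned = []
    for line in log_text.splitlines():
        match = FINISHED.search(line)
        if match:
            number = int(match.group(1))
            value = float(match.group(2))
            params = ast.literal_eval(match.group(3))  # the dict is a plain Python literal
            finished[number] = (value, params)
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    return finished, pruned


sample_log = """\
[I 2025-04-08 10:44:09,622] Trial 136 finished with value: 0.8613704644687414 and parameters: {'learning_rate': 0.0005280489047611394, 'weight_decay': 0.008, 'warmup_steps': 13}. Best is trial 136 with value: 0.8613704644687414.
[I 2025-04-08 10:53:50,392] Trial 137 pruned.
[I 2025-04-08 11:51:57,816] Trial 148 finished with value: 0.8546481014940118 and parameters: {'learning_rate': 0.0005005350996293366, 'weight_decay': 0.007, 'warmup_steps': 15}. Best is trial 136 with value: 0.8613704644687414.
"""

finished, pruned = summarize_trials(sample_log)
best_trial = max(finished, key=lambda number: finished[number][0])
print(best_trial, finished[best_trial][0], pruned)
```

Only completed trials carry a final objective value, so pruned trials are listed separately rather than ranked.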


In [15]:
base.reset_seed()

In [16]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/random-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/random-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

In [17]:
def hp_space(trial):
    params = {
        # Optimizer and schedule hyperparameters.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific hyperparameters.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
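
The search space adds `lambda_param` and `temperature` on top of the optimizer settings. `base.DistilTrainer` is not shown in this notebook, so the following is only a sketch of one common formulation of the distillation loss, not necessarily the one used here. It assumes that `lambda_param` weights the soft-target term, that `1 - lambda_param` weights the hard-label cross-entropy, and that the KL term is scaled by the squared temperature. The code is dependency-free plain Python for a single example; a real trainer would compute the same quantities on batched tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE(student, label) + lambda * T^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-target term: KL divergence between the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft = kl * temperature ** 2  # T^2 keeps gradient magnitudes comparable across temperatures

    return (1 - lambda_param) * hard + lambda_param * soft


student = [1.0, 2.0, 0.5]
teacher = [0.5, 3.0, 0.2]
print(distillation_loss(student, teacher, label=1, lambda_param=0.5, temperature=4.0))
```

Under this assumed weighting, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's softened outputs alone.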

In [18]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
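
To make the pruning points in the logs easier to read, here is a simplified model of the epochs at which a Hyperband pruner with `reduction_factor=2` compares trials. It is a sketch, not Optuna's implementation. `min_r` and `max_r` are defined elsewhere in the notebook, so the values 2 and 10 below are assumptions, chosen because the logged trials are pruned after epochs 2, 4, or 8 and full runs last 10 epochs. The function name `hyperband_rungs` is hypothetical.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """List the checkpoint steps (rungs) of each successive-halving bracket."""
    # Number of brackets = floor(log_eta(max / min)) + 1, counted with integers
    # to avoid floating-point rounding in the logarithm.
    n_brackets = 0
    step = min_resource
    while step <= max_resource:
        n_brackets += 1
        step *= reduction_factor

    schedule = []
    for bracket in range(n_brackets):
        # Bracket k starts pruning later: its first rung is min_resource * eta**k.
        rungs = []
        step = min_resource * reduction_factor ** bracket
        while step <= max_resource:
            rungs.append(step)
            step *= reduction_factor
        schedule.append(rungs)
    return schedule


# Assumed values for min_r and max_r (not shown in this section).
for bracket, rungs in enumerate(hyperband_rungs(2, 10, 2)):
    print(f"bracket {bracket}: compared after epochs {rungs}")
```

Under these assumptions a trial assigned to the most aggressive bracket can be stopped as early as epoch 2, while a trial in the last bracket always runs to at least epoch 8.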



In [19]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(10),
)
  

In [20]:
best_distill_random = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-04 19:42:45,231] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0888,0.807692,0.6203,0.628576,0.619397,0.61555
2,0.708,0.561188,0.7501,0.757068,0.749581,0.750302


[I 2025-04-04 19:46:46,660] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2455,0.923873,0.5418,0.544969,0.540384,0.53058
2,0.8572,0.7162,0.6649,0.669705,0.664346,0.664257
3,0.6929,0.616682,0.72,0.719997,0.720385,0.716843
4,0.5865,0.545823,0.7601,0.762277,0.759708,0.757778


[I 2025-04-04 19:54:44,962] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3584,1.078577,0.4499,0.45548,0.448809,0.446114
2,1.059,0.905734,0.5571,0.560971,0.556376,0.553091


[I 2025-04-04 19:58:51,838] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2293,0.893336,0.5638,0.558867,0.562651,0.552366
2,0.8363,0.707009,0.6715,0.675573,0.671011,0.671508
3,0.6717,0.575971,0.738,0.738518,0.738469,0.736708
4,0.569,0.523743,0.7688,0.766555,0.768679,0.764758


[I 2025-04-04 20:06:51,732] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0454,0.8032,0.6109,0.62628,0.610223,0.605191
2,0.7371,0.58038,0.7437,0.74661,0.743012,0.743599


[I 2025-04-04 20:10:52,316] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1639,0.934297,0.5423,0.548239,0.541821,0.536026
2,0.8729,0.739925,0.6543,0.661707,0.653861,0.655372
3,0.7224,0.63006,0.714,0.713414,0.713897,0.712575
4,0.6299,0.563205,0.7507,0.752031,0.75059,0.747142


[I 2025-04-04 20:18:53,746] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.047,0.800141,0.6189,0.616167,0.618005,0.606947
2,0.7259,0.57256,0.7474,0.745643,0.746727,0.745405
3,0.5877,0.498714,0.7889,0.792073,0.789441,0.785492
4,0.509,0.441972,0.8143,0.815696,0.814021,0.811981


[I 2025-04-04 20:26:52,355] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1469,0.976053,0.5245,0.52923,0.52334,0.512522
2,0.8789,0.741445,0.6488,0.651505,0.6483,0.648466
3,0.7294,0.618645,0.7214,0.721685,0.721591,0.718424
4,0.6221,0.527279,0.766,0.76546,0.765606,0.763029
5,0.5449,0.479966,0.7909,0.798692,0.79082,0.789566
6,0.4847,0.452818,0.8105,0.811911,0.810604,0.810418
7,0.4334,0.425751,0.82,0.825094,0.820452,0.82085
8,0.3844,0.391139,0.8371,0.838953,0.837121,0.837433


[I 2025-04-04 20:43:00,861] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.266,0.956826,0.528,0.52424,0.527063,0.519597
2,0.8952,0.753035,0.644,0.648138,0.643506,0.642897
3,0.7294,0.652483,0.6985,0.69827,0.698823,0.695694
4,0.6246,0.588161,0.7292,0.728106,0.72881,0.725548


[I 2025-04-04 20:51:01,770] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.08,0.822504,0.5999,0.603864,0.599487,0.595703
2,0.7607,0.636464,0.7069,0.714601,0.705999,0.707264


[I 2025-04-04 20:55:01,669] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.004794768110099147, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.236,1.032127,0.4694,0.475387,0.468517,0.445156
2,0.9596,0.813946,0.6096,0.618732,0.608904,0.60837
3,0.8063,0.679601,0.6841,0.685032,0.6841,0.683808
4,0.699,0.585888,0.731,0.728887,0.730554,0.727068


[I 2025-04-04 21:03:10,835] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.00019050351120711566, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1125,0.805502,0.6148,0.619465,0.614243,0.609636
2,0.7378,0.605548,0.729,0.730617,0.728503,0.727368


[I 2025-04-04 21:07:12,060] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.0009349568983679941, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.051,0.80944,0.614,0.621669,0.612916,0.611851
2,0.7635,0.631504,0.7086,0.709787,0.708344,0.707109
3,0.6229,0.524133,0.7717,0.775658,0.772307,0.769081
4,0.5313,0.478677,0.7977,0.800023,0.797365,0.794786
5,0.4626,0.420595,0.8256,0.829178,0.825415,0.825075
6,0.4063,0.399141,0.8379,0.839774,0.838028,0.837943
7,0.3561,0.370574,0.8524,0.856807,0.85277,0.852957
8,0.3084,0.356942,0.8656,0.86651,0.865749,0.865723


[I 2025-04-04 21:23:04,416] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.00440198015702204, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2323,1.046488,0.4544,0.480318,0.454093,0.437665
2,0.9724,0.830042,0.5965,0.600369,0.595759,0.594834


[I 2025-04-04 21:27:04,292] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.00012341582656849432, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1998,0.879784,0.5691,0.564598,0.567885,0.555149
2,0.8073,0.678441,0.6861,0.688019,0.685384,0.685448
3,0.6506,0.589097,0.7322,0.734362,0.732891,0.729029
4,0.5519,0.51726,0.7685,0.772866,0.768214,0.765999
5,0.4681,0.489064,0.7851,0.793506,0.785132,0.783108
6,0.4025,0.484425,0.7945,0.798021,0.794885,0.794705
7,0.3446,0.475544,0.7982,0.802955,0.798669,0.798032
8,0.2953,0.463333,0.8066,0.80709,0.806586,0.806044


[I 2025-04-04 21:43:01,530] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0009349007798192055, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0525,0.818147,0.5995,0.605562,0.599208,0.594614
2,0.7471,0.607487,0.7203,0.726351,0.719656,0.721302


[I 2025-04-04 21:47:00,642] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0011826297699345555, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0604,0.86447,0.5788,0.604577,0.578426,0.574621
2,0.7712,0.637717,0.7103,0.716538,0.709359,0.71011
3,0.6354,0.54806,0.7591,0.758654,0.759388,0.756533
4,0.5432,0.47077,0.7998,0.803055,0.799422,0.797374
5,0.4731,0.42427,0.8215,0.827468,0.821321,0.820865
6,0.4157,0.397178,0.8329,0.836639,0.833282,0.833103
7,0.3635,0.366215,0.8531,0.855103,0.853382,0.853072
8,0.3162,0.354347,0.8602,0.86332,0.860052,0.860729
9,0.2732,0.359248,0.8553,0.862777,0.855013,0.857175
10,0.2404,0.358134,0.8578,0.85994,0.858042,0.857565


[I 2025-04-04 22:07:17,295] Trial 16 finished with value: 0.8575649528044199 and parameters: {'learning_rate': 0.0011826297699345555, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 16 with value: 0.8575649528044199.
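
The `finished with value` entry above reports the trial's objective, which here matches the final-epoch F1 in the table (0.857565). With dozens of trials, it can be easier to summarize the log programmatically than to read it. The following is a minimal sketch, assuming the cell output has been saved as plain text. `summarize_optuna_log` is a hypothetical helper, not part of Optuna, and the regular expressions target only the two message shapes seen in this log.

```python
import ast
import re

# "Trial N finished with value: V and parameters: {...}." as printed in this log.
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)
# "Trial N pruned." as printed in this log.
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_optuna_log(log_text):
    """Return ([(trial, value, params), ...], [pruned trial numbers])."""
    finished = [
        (int(m.group(1)), float(m.group(2)), ast.literal_eval(m.group(3)))
        for m in FINISHED.finditer(log_text)
    ]
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    return finished, pruned


# Two lines copied from the output of this notebook.
SAMPLE = (
    "[I 2025-04-04 22:07:17,295] Trial 16 finished with value: 0.8575649528044199 "
    "and parameters: {'learning_rate': 0.0011826297699345555, 'weight_decay': 0.006, "
    "'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}. "
    "Best is trial 16 with value: 0.8575649528044199.\n"
    "[I 2025-04-04 22:15:14,846] Trial 17 pruned. \n"
)

finished, pruned = summarize_optuna_log(SAMPLE)
best = max(finished, key=lambda item: item[1])
print("best finished trial:", best[0], "value:", best[1])
print("pruned trials:", pruned)
```

If the `study` object is still in memory, `study.trials_dataframe()` gives the same information directly; parsing the text is only useful when nothing but the saved output remains.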


Trial 17 with params: {'learning_rate': 0.0007861828594372495, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0266,0.792819,0.6271,0.648576,0.626282,0.6261
2,0.7176,0.556551,0.7586,0.761666,0.757954,0.75804
3,0.5789,0.494117,0.7894,0.79148,0.78985,0.786964
4,0.4981,0.421954,0.8234,0.824245,0.822878,0.821099


[I 2025-04-04 22:15:14,846] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0016751020144302176, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1335,0.929588,0.536,0.537248,0.535885,0.526532
2,0.8394,0.712418,0.6662,0.670133,0.665583,0.665413


[I 2025-04-04 22:19:14,096] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0033427703798544176, 'weight_decay': 0.006, 'warmup_steps': 6, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2539,1.087804,0.4357,0.437811,0.435415,0.418043
2,1.0205,0.906392,0.5547,0.575387,0.554257,0.55473
3,0.8763,0.756426,0.644,0.636893,0.643798,0.63759
4,0.8038,0.657672,0.6937,0.695347,0.693072,0.69122


[I 2025-04-04 22:27:13,212] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00042547607186766345, 'weight_decay': 0.004, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.028,0.760251,0.6451,0.650228,0.644379,0.640789
2,0.6895,0.544475,0.7631,0.769153,0.762507,0.764176
3,0.5539,0.475518,0.7957,0.796564,0.796383,0.792518
4,0.4688,0.417878,0.8286,0.831415,0.828128,0.826808
5,0.4041,0.392745,0.8375,0.846001,0.837336,0.837264
6,0.3446,0.371451,0.8513,0.85344,0.851573,0.851272
7,0.2936,0.364758,0.8533,0.8624,0.853811,0.854099
8,0.2491,0.33632,0.866,0.86796,0.866131,0.866265


[I 2025-04-04 22:43:16,528] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00017048302356543796, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1417,0.844577,0.5974,0.610055,0.596926,0.591727
2,0.7487,0.609955,0.7265,0.728259,0.726139,0.726036
3,0.5972,0.530175,0.7659,0.768013,0.766583,0.763382
4,0.4981,0.471143,0.7973,0.799737,0.796909,0.795027
5,0.4194,0.433482,0.8184,0.824119,0.818224,0.818269
6,0.3537,0.442014,0.8175,0.82125,0.817682,0.817569
7,0.2982,0.435452,0.8183,0.825437,0.818792,0.818803
8,0.2526,0.419537,0.829,0.83205,0.828761,0.829526


[I 2025-04-04 22:59:09,010] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.000825101252433044, 'weight_decay': 0.007, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0442,0.780208,0.6295,0.632017,0.629091,0.623573
2,0.732,0.601959,0.7328,0.737144,0.731903,0.732698
3,0.5944,0.499059,0.7827,0.785051,0.78326,0.779921
4,0.5084,0.452634,0.8078,0.808919,0.807444,0.804528


[I 2025-04-04 23:07:09,195] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0005970732440373251, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0416,0.797499,0.6173,0.620342,0.616694,0.611814
2,0.7268,0.565771,0.7526,0.756323,0.752382,0.753153
3,0.5857,0.496766,0.7861,0.78539,0.786745,0.78224
4,0.5008,0.441715,0.8135,0.813963,0.813424,0.810634
5,0.4333,0.394525,0.8351,0.841394,0.83513,0.835108
6,0.375,0.379681,0.849,0.851615,0.849239,0.849513
7,0.3239,0.361953,0.854,0.8596,0.854448,0.854525
8,0.276,0.350798,0.863,0.864832,0.863098,0.863475
9,0.2361,0.35343,0.8612,0.870383,0.860902,0.863377
10,0.2087,0.348973,0.8648,0.867616,0.86505,0.864653


[I 2025-04-04 23:27:03,204] Trial 23 finished with value: 0.8646525069344981 and parameters: {'learning_rate': 0.0005970732440373251, 'weight_decay': 0.003, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 23 with value: 0.8646525069344981.


Trial 24 with params: {'learning_rate': 0.0016905172330177797, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1868,0.95836,0.5221,0.517373,0.521294,0.50986
2,0.8848,0.725422,0.6572,0.669925,0.656642,0.658304


[I 2025-04-04 23:31:02,221] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.00036024945807297103, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0403,0.780013,0.6319,0.635965,0.631148,0.626917
2,0.6926,0.541932,0.7538,0.767237,0.75316,0.754825
3,0.5485,0.467512,0.7978,0.799311,0.798264,0.795243
4,0.4641,0.416933,0.8268,0.8271,0.826695,0.824139
5,0.3951,0.394706,0.8351,0.845703,0.835357,0.834398
6,0.3384,0.375046,0.8481,0.849916,0.848483,0.84814
7,0.2865,0.380053,0.8476,0.853243,0.848175,0.847384
8,0.2429,0.347082,0.8611,0.864147,0.861312,0.861272
9,0.2115,0.353044,0.8593,0.867229,0.859134,0.861545
10,0.1905,0.348463,0.8651,0.86726,0.865333,0.864823


[I 2025-04-04 23:51:05,005] Trial 25 finished with value: 0.8648234701306979 and parameters: {'learning_rate': 0.00036024945807297103, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 25 with value: 0.8648234701306979.


Trial 26 with params: {'learning_rate': 0.00030650985235136386, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0757,0.778768,0.6269,0.629881,0.626224,0.622126
2,0.7008,0.552312,0.7549,0.759307,0.754468,0.754028
3,0.5537,0.474092,0.7944,0.801222,0.794974,0.792195
4,0.4627,0.419662,0.827,0.828422,0.826896,0.825491
5,0.3916,0.394915,0.8359,0.843673,0.835791,0.836365
6,0.3313,0.372251,0.8522,0.853672,0.852507,0.852023
7,0.2788,0.386997,0.8445,0.852604,0.845253,0.845135
8,0.2365,0.356171,0.8574,0.859485,0.857601,0.857674


[I 2025-04-05 00:06:56,439] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0001239170664880913, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2072,0.914197,0.5458,0.553667,0.54526,0.53534
2,0.8137,0.669856,0.6912,0.694171,0.690682,0.690772
3,0.6599,0.592297,0.7303,0.729395,0.730524,0.727073
4,0.5604,0.524215,0.7697,0.770923,0.769666,0.767024
5,0.4785,0.490622,0.7856,0.794089,0.78572,0.783728
6,0.4083,0.494231,0.7891,0.797499,0.789522,0.788964
7,0.3492,0.475919,0.7943,0.802357,0.794673,0.795223
8,0.299,0.475996,0.7987,0.802622,0.798565,0.798757


[I 2025-04-05 00:22:48,606] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0008467446869010279, 'weight_decay': 0.001, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0455,0.81834,0.6111,0.620866,0.610193,0.605323
2,0.7409,0.616458,0.725,0.727916,0.724372,0.723981
3,0.6015,0.513857,0.7762,0.779927,0.776754,0.77272
4,0.5126,0.445544,0.809,0.810866,0.808686,0.806203
5,0.4449,0.416832,0.8249,0.830927,0.824468,0.824381
6,0.3886,0.391365,0.8399,0.842005,0.839934,0.839727
7,0.3391,0.362168,0.8513,0.856882,0.851742,0.852094
8,0.2918,0.340517,0.8656,0.867489,0.86567,0.865693
9,0.2514,0.346919,0.8609,0.867942,0.860553,0.86256
10,0.2213,0.348307,0.8632,0.866022,0.863438,0.862807


[I 2025-04-05 00:42:40,249] Trial 28 finished with value: 0.8628065640155954 and parameters: {'learning_rate': 0.0008467446869010279, 'weight_decay': 0.001, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 25 with value: 0.8648234701306979.


Trial 29 with params: {'learning_rate': 0.00043692695085257406, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0487,0.771498,0.6358,0.643978,0.635274,0.63239
2,0.7093,0.555102,0.7538,0.757929,0.752946,0.75317
3,0.567,0.479034,0.7961,0.799973,0.796715,0.793387
4,0.4799,0.423665,0.8282,0.829111,0.827916,0.82626
5,0.4108,0.390592,0.841,0.847929,0.840803,0.841128
6,0.3539,0.370454,0.8476,0.849138,0.847739,0.847741
7,0.2998,0.363135,0.8551,0.858815,0.855589,0.855406
8,0.2541,0.34239,0.8674,0.868474,0.867427,0.867654
9,0.2182,0.34678,0.8614,0.868816,0.86117,0.863407
10,0.1946,0.342653,0.8655,0.867183,0.865806,0.865132


[I 2025-04-05 01:02:28,873] Trial 29 finished with value: 0.8651321336019976 and parameters: {'learning_rate': 0.00043692695085257406, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.0, 'temperature': 6.5}. Best is trial 29 with value: 0.8651321336019976.


Trial 30 with params: {'learning_rate': 0.00015520595703559064, 'weight_decay': 0.0, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1651,0.848099,0.5916,0.594276,0.590682,0.583876
2,0.7712,0.631561,0.7147,0.719382,0.714243,0.714225


[I 2025-04-05 01:06:27,018] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0015535887254921874, 'weight_decay': 0.002, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1255,0.935174,0.5443,0.542226,0.543554,0.53411
2,0.8409,0.710986,0.6673,0.680447,0.666758,0.67073
3,0.6963,0.61321,0.7243,0.726578,0.724856,0.719913
4,0.5951,0.496732,0.7819,0.780944,0.781393,0.778707
5,0.5195,0.445266,0.8119,0.819483,0.811672,0.811067
6,0.4569,0.420915,0.8278,0.831444,0.828108,0.828377
7,0.4033,0.393905,0.8368,0.840931,0.837192,0.837212
8,0.3536,0.370557,0.8494,0.850307,0.849498,0.849647


[I 2025-04-05 01:22:21,914] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0006904091990644225, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0222,0.771805,0.6337,0.637182,0.633432,0.629662
2,0.7102,0.568737,0.7466,0.752949,0.746233,0.747752
3,0.5776,0.498146,0.7849,0.789277,0.785808,0.780694
4,0.4949,0.442317,0.8139,0.813992,0.813297,0.810686


[I 2025-04-05 01:30:18,856] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00021845547060944987, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1047,0.799834,0.619,0.624566,0.618364,0.615359
2,0.7277,0.575146,0.7408,0.747997,0.740335,0.741731
3,0.5722,0.520703,0.772,0.77578,0.772987,0.767431
4,0.4775,0.450933,0.8121,0.81436,0.812175,0.809049
5,0.4032,0.416783,0.824,0.829889,0.82414,0.822669
6,0.3421,0.391781,0.8418,0.844871,0.841819,0.842642
7,0.2881,0.392701,0.8394,0.847633,0.83973,0.840475
8,0.245,0.379428,0.8466,0.84792,0.846485,0.846838


[I 2025-04-05 01:46:15,955] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0004737432831999078, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0323,0.768342,0.6411,0.644747,0.640568,0.635033
2,0.7015,0.568633,0.7458,0.757894,0.745213,0.747467
3,0.5666,0.477528,0.7922,0.792774,0.792748,0.789478
4,0.4816,0.428065,0.8223,0.822574,0.821911,0.819583


[I 2025-04-05 01:54:25,851] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0009544307355171225, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0823,0.838779,0.5924,0.604382,0.59185,0.590074
2,0.764,0.615497,0.7149,0.725402,0.714096,0.716405
3,0.618,0.530281,0.7682,0.769261,0.768679,0.765022
4,0.5235,0.442252,0.8161,0.816621,0.815939,0.813053
5,0.4543,0.410392,0.8283,0.833497,0.828244,0.827676
6,0.3976,0.388829,0.8407,0.84394,0.841011,0.841199
7,0.3476,0.359159,0.8559,0.860513,0.856319,0.856178
8,0.2995,0.340443,0.8645,0.86614,0.864457,0.864866
9,0.2586,0.352186,0.8576,0.866402,0.857297,0.859699
10,0.2283,0.344148,0.8669,0.870116,0.867189,0.867029


[I 2025-04-05 02:14:25,912] Trial 35 finished with value: 0.8670294913989979 and parameters: {'learning_rate': 0.0009544307355171225, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 35 with value: 0.8670294913989979.


Trial 36 with params: {'learning_rate': 0.0002642225148645517, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0923,0.795726,0.6263,0.639088,0.626389,0.620107
2,0.7041,0.557732,0.7518,0.760415,0.751013,0.753248
3,0.5554,0.501107,0.7833,0.787683,0.784057,0.780058
4,0.4641,0.42862,0.8233,0.822757,0.8232,0.820613


[I 2025-04-05 02:22:27,915] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0007970736108827397, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0732,0.864111,0.5795,0.613451,0.579809,0.576073
2,0.7587,0.598864,0.729,0.733878,0.728493,0.728949
3,0.6137,0.536426,0.7667,0.7684,0.767246,0.762873
4,0.5224,0.444905,0.812,0.813433,0.811982,0.810248
5,0.4552,0.412198,0.8285,0.83392,0.82833,0.828079
6,0.3967,0.385507,0.8426,0.844639,0.84279,0.843042
7,0.3448,0.369677,0.8513,0.85575,0.851961,0.851646
8,0.2986,0.347342,0.8637,0.865464,0.863701,0.863846
9,0.257,0.35262,0.8604,0.867431,0.860184,0.861787
10,0.226,0.350475,0.8617,0.864375,0.86201,0.861519


[I 2025-04-05 02:42:28,245] Trial 37 finished with value: 0.8615188933144295 and parameters: {'learning_rate': 0.0007970736108827397, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 6.5}. Best is trial 35 with value: 0.8670294913989979.


Trial 38 with params: {'learning_rate': 0.0005409152016664283, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0227,0.754429,0.6507,0.651247,0.650011,0.643206
2,0.6976,0.561784,0.7519,0.75685,0.751395,0.752287
3,0.5638,0.464559,0.8038,0.805984,0.804301,0.801587
4,0.4793,0.41699,0.8288,0.828792,0.828728,0.826556
5,0.4144,0.382628,0.8421,0.849078,0.84199,0.842453
6,0.3566,0.369145,0.8545,0.856444,0.854745,0.854615
7,0.3061,0.349483,0.8601,0.86409,0.860578,0.860363
8,0.2611,0.333461,0.8691,0.870962,0.869123,0.869231
9,0.2238,0.336968,0.8667,0.873125,0.866442,0.868272
10,0.1995,0.334906,0.8693,0.871485,0.869536,0.869142


[I 2025-04-05 03:02:29,601] Trial 38 finished with value: 0.869142381884726 and parameters: {'learning_rate': 0.0005409152016664283, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 38 with value: 0.869142381884726.


Trial 39 with params: {'learning_rate': 0.0009568161048601548, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0506,0.826687,0.5976,0.616288,0.597421,0.59065
2,0.7463,0.595649,0.733,0.742961,0.732214,0.735005


[I 2025-04-05 03:06:29,263] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00028053353268463405, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0538,0.7384,0.6589,0.665745,0.658097,0.655074
2,0.6853,0.552234,0.7541,0.762187,0.753499,0.754759
3,0.5476,0.478593,0.7934,0.793212,0.794043,0.789881
4,0.4618,0.430759,0.8167,0.818137,0.816276,0.813669
5,0.3944,0.397842,0.8319,0.840245,0.83193,0.831809
6,0.3344,0.391621,0.8398,0.842396,0.840129,0.839532
7,0.2821,0.379989,0.8493,0.852678,0.849738,0.849398
8,0.2398,0.358686,0.8583,0.859485,0.858284,0.858429


[I 2025-04-05 03:22:25,136] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0013857870499512101, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0812,0.882192,0.5721,0.587977,0.571816,0.568148
2,0.7871,0.661887,0.6978,0.712026,0.697033,0.699494


[I 2025-04-05 03:26:24,927] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0022927403832826527, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1997,1.008502,0.4956,0.490433,0.495223,0.485343
2,0.9088,0.762371,0.6405,0.644977,0.638992,0.638398
3,0.7505,0.648996,0.7057,0.703699,0.705751,0.703692
4,0.6529,0.582417,0.7421,0.741746,0.741616,0.738426


[I 2025-04-05 03:34:24,617] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0004389844983603545, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0166,0.767241,0.6384,0.65011,0.637879,0.634748
2,0.6854,0.551742,0.7584,0.765114,0.758002,0.758774
3,0.548,0.481865,0.7898,0.79224,0.79033,0.786508
4,0.4619,0.415722,0.8339,0.834585,0.83362,0.831627
5,0.3968,0.397277,0.8346,0.844959,0.834731,0.833668
6,0.3404,0.369031,0.8511,0.854868,0.851329,0.851816
7,0.2897,0.353619,0.8605,0.865688,0.860763,0.861242
8,0.2456,0.33806,0.8669,0.867267,0.867103,0.866435
9,0.2121,0.347286,0.8645,0.871517,0.864198,0.866152
10,0.1903,0.337586,0.8686,0.871234,0.868855,0.868582


[I 2025-04-05 03:54:20,176] Trial 43 finished with value: 0.868582221425813 and parameters: {'learning_rate': 0.0004389844983603545, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 38 with value: 0.869142381884726.


Trial 44 with params: {'learning_rate': 0.0007040006959990907, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0339,0.817605,0.6089,0.611888,0.60842,0.60117
2,0.7321,0.579132,0.7397,0.744019,0.738892,0.739847
3,0.59,0.510578,0.7798,0.784737,0.780442,0.776744
4,0.5069,0.424829,0.8212,0.822249,0.820701,0.81873
5,0.4385,0.413819,0.8236,0.828212,0.823748,0.822175
6,0.3807,0.385679,0.8422,0.845081,0.842412,0.842524
7,0.3288,0.363365,0.853,0.857898,0.853459,0.853786
8,0.2822,0.344463,0.8621,0.863722,0.862068,0.862135
9,0.2418,0.343804,0.8619,0.869378,0.861674,0.863749
10,0.2126,0.344049,0.8638,0.865632,0.864117,0.863392


[I 2025-04-05 04:14:03,220] Trial 44 finished with value: 0.8633917664745759 and parameters: {'learning_rate': 0.0007040006959990907, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.5}. Best is trial 38 with value: 0.869142381884726.


Trial 45 with params: {'learning_rate': 0.00038746590003762167, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0427,0.803589,0.6215,0.641994,0.620688,0.618651
2,0.6986,0.555337,0.7526,0.756254,0.752279,0.752489
3,0.5564,0.482438,0.7948,0.797184,0.795414,0.792402
4,0.4678,0.428219,0.8219,0.825651,0.82189,0.820392


[I 2025-04-05 04:21:55,593] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00033960696383127, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.035,0.76845,0.6395,0.642922,0.639366,0.634899
2,0.6944,0.548157,0.7541,0.764475,0.753701,0.755631
3,0.5564,0.473469,0.7982,0.799892,0.798868,0.795511
4,0.4689,0.416751,0.8262,0.827329,0.825958,0.824201
5,0.3999,0.395993,0.8334,0.83701,0.83348,0.832708
6,0.3402,0.378868,0.8436,0.844975,0.843866,0.843572
7,0.289,0.374768,0.8489,0.855925,0.849304,0.84975
8,0.2443,0.353785,0.8561,0.857265,0.856075,0.856198


[I 2025-04-05 04:37:46,875] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0025789104733638904, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2021,0.999649,0.4964,0.503931,0.495578,0.488323
2,0.9225,0.760377,0.6365,0.641191,0.635611,0.636431


[I 2025-04-05 04:41:44,594] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0001879624388969655, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.118,0.800039,0.6185,0.624621,0.617731,0.614
2,0.7369,0.601801,0.7249,0.73009,0.724579,0.724597


[I 2025-04-05 04:45:41,035] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.000109532304193339, 'weight_decay': 0.004, 'warmup_steps': 7, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2107,0.907507,0.5565,0.547439,0.55556,0.54592
2,0.8319,0.690176,0.6779,0.680085,0.677251,0.677564
3,0.673,0.604315,0.7232,0.723701,0.72383,0.720377
4,0.5738,0.537816,0.7629,0.763243,0.762574,0.760766


[I 2025-04-05 04:53:37,163] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0021133792752108674, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1655,0.935227,0.5351,0.542464,0.534143,0.53001
2,0.887,0.743863,0.6542,0.662298,0.653573,0.656021
3,0.7319,0.623965,0.7143,0.710355,0.71456,0.71025
4,0.6271,0.540236,0.7602,0.761962,0.760043,0.757428


[I 2025-04-05 05:01:39,915] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0006656909591964138, 'weight_decay': 0.002, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0366,0.76668,0.6411,0.651531,0.640585,0.637082
2,0.7148,0.584874,0.7434,0.749207,0.742514,0.744031
3,0.581,0.486246,0.7881,0.790543,0.788744,0.785163
4,0.4963,0.437525,0.8191,0.821805,0.81872,0.816524


[I 2025-04-05 05:09:33,177] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.000611933117236963, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0391,0.748844,0.647,0.644394,0.646165,0.639335
2,0.7062,0.558166,0.7543,0.757539,0.753503,0.754063
3,0.5656,0.488806,0.7897,0.796295,0.790541,0.78472
4,0.4806,0.425041,0.8239,0.823229,0.823665,0.821057
5,0.4169,0.383888,0.8414,0.848448,0.841485,0.841314
6,0.361,0.368412,0.8508,0.854161,0.850761,0.851478
7,0.3115,0.357977,0.8551,0.863676,0.855537,0.85606
8,0.2658,0.335036,0.8679,0.869504,0.867981,0.868143
9,0.2287,0.338387,0.8667,0.875738,0.866422,0.869061
10,0.2026,0.334603,0.8708,0.873507,0.87113,0.871021


[I 2025-04-05 05:29:19,122] Trial 52 finished with value: 0.8710213219157834 and parameters: {'learning_rate': 0.000611933117236963, 'weight_decay': 0.001, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 52 with value: 0.8710213219157834.


Trial 53 with params: {'learning_rate': 0.0002858107784464028, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0734,0.78988,0.631,0.643372,0.629701,0.625842
2,0.7028,0.559888,0.7534,0.75751,0.752972,0.75436
3,0.5541,0.49051,0.7878,0.790723,0.788485,0.784464
4,0.4652,0.431812,0.8208,0.821732,0.820778,0.818302


[I 2025-04-05 05:37:13,784] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0027026130785766608, 'weight_decay': 0.01, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2179,1.05593,0.4618,0.472566,0.461611,0.449403
2,0.9259,0.780675,0.6289,0.630518,0.628607,0.628032
3,0.7676,0.666307,0.6906,0.687325,0.690675,0.687326
4,0.6675,0.58511,0.7331,0.737111,0.732583,0.729527


[I 2025-04-05 05:45:08,314] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0009327671909441211, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0734,0.841014,0.5958,0.596441,0.595319,0.587807
2,0.7515,0.618041,0.7156,0.722455,0.715086,0.715334


[I 2025-04-05 05:49:05,918] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0006629268840947008, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0648,0.80802,0.6146,0.635307,0.613827,0.611503
2,0.7228,0.565109,0.7525,0.75788,0.752171,0.753847
3,0.5828,0.503464,0.779,0.782947,0.77957,0.774227
4,0.4958,0.432184,0.8195,0.820597,0.818977,0.817041
5,0.4333,0.39849,0.8342,0.839981,0.833934,0.833834
6,0.3761,0.384493,0.844,0.84783,0.844183,0.844842
7,0.3245,0.353919,0.8592,0.862874,0.859498,0.859858
8,0.2787,0.339468,0.8686,0.870003,0.868824,0.868613
9,0.2394,0.348083,0.8627,0.871948,0.862274,0.865005
10,0.2118,0.339086,0.8681,0.872234,0.868384,0.868615


[I 2025-04-05 06:08:53,797] Trial 56 finished with value: 0.8686148223817811 and parameters: {'learning_rate': 0.0006629268840947008, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 52 with value: 0.8710213219157834.


Trial 57 with params: {'learning_rate': 0.0006753037251916961, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0572,0.785873,0.6289,0.627241,0.628548,0.623218
2,0.7203,0.566824,0.7457,0.752024,0.74541,0.746506
3,0.5805,0.494231,0.789,0.788065,0.789496,0.785654
4,0.4929,0.427354,0.8237,0.824608,0.823429,0.821567
5,0.4267,0.388217,0.839,0.846185,0.83909,0.838375
6,0.3717,0.371343,0.8467,0.849204,0.846813,0.847234
7,0.3196,0.352774,0.86,0.865025,0.860423,0.860616
8,0.2735,0.33405,0.8705,0.872232,0.870461,0.870962
9,0.2357,0.346094,0.861,0.871265,0.860556,0.863502
10,0.2078,0.333015,0.872,0.873731,0.872367,0.871896


[I 2025-04-05 06:28:38,218] Trial 57 finished with value: 0.8718955084590563 and parameters: {'learning_rate': 0.0006753037251916961, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 57 with value: 0.8718955084590563.


Trial 58 with params: {'learning_rate': 0.0010924905079098262, 'weight_decay': 0.0, 'warmup_steps': 29, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0864,0.854038,0.5832,0.59022,0.582298,0.579764
2,0.7692,0.643038,0.702,0.710415,0.701434,0.703768


[I 2025-04-05 06:32:35,017] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0006387418842694594, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0532,0.783772,0.6259,0.628386,0.625386,0.620958
2,0.7204,0.565073,0.7491,0.756304,0.748265,0.74993
3,0.5809,0.501421,0.7815,0.783856,0.782106,0.778265
4,0.4959,0.431189,0.8166,0.818834,0.815958,0.814218
5,0.4288,0.393606,0.839,0.840645,0.838969,0.838133
6,0.3719,0.382134,0.8463,0.85015,0.846474,0.846694
7,0.3226,0.359713,0.8575,0.863992,0.857792,0.858202
8,0.2756,0.340327,0.8679,0.869864,0.867995,0.868165
9,0.2361,0.352543,0.8626,0.871333,0.862207,0.864813
10,0.2085,0.338868,0.8703,0.872256,0.870596,0.870068


[I 2025-04-05 06:52:24,577] Trial 59 finished with value: 0.8700675962030944 and parameters: {'learning_rate': 0.0006387418842694594, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 57 with value: 0.8718955084590563.


Trial 60 with params: {'learning_rate': 0.0001763702953359746, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1558,0.82816,0.6031,0.607709,0.60235,0.598202
2,0.7543,0.612732,0.7206,0.723237,0.72032,0.719592


[I 2025-04-05 06:56:21,863] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0005943410799444305, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0522,0.769142,0.6365,0.637978,0.635702,0.627287
2,0.7127,0.563734,0.7474,0.749384,0.746992,0.747095
3,0.5767,0.48512,0.7934,0.793566,0.793824,0.789938
4,0.4906,0.425235,0.8256,0.825529,0.8254,0.822874
5,0.4233,0.386405,0.8415,0.844997,0.841385,0.840902
6,0.3664,0.384015,0.8459,0.850499,0.845985,0.846416
7,0.3154,0.362774,0.8528,0.860063,0.853289,0.853509
8,0.2695,0.332655,0.8701,0.871032,0.87005,0.870026
9,0.2322,0.337705,0.8665,0.874438,0.86622,0.868217
10,0.2054,0.327715,0.8754,0.876266,0.875452,0.875092


[I 2025-04-05 07:16:28,411] Trial 61 finished with value: 0.8750916744350314 and parameters: {'learning_rate': 0.0005943410799444305, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 62 with params: {'learning_rate': 0.0005415136364547395, 'weight_decay': 0.0, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0608,0.816844,0.609,0.623034,0.607893,0.606124
2,0.7217,0.56046,0.7564,0.760587,0.755713,0.757015
3,0.5737,0.511589,0.7822,0.78843,0.783235,0.77871
4,0.4871,0.43974,0.8163,0.817128,0.816087,0.813301


[I 2025-04-05 07:24:23,185] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.00045445835172350897, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0515,0.799779,0.6228,0.625791,0.622219,0.617609
2,0.6924,0.541623,0.7625,0.77046,0.76177,0.763293
3,0.5534,0.467485,0.8008,0.798677,0.801006,0.797438
4,0.4695,0.427167,0.8236,0.825809,0.823482,0.821445
5,0.4039,0.389768,0.8405,0.848758,0.840288,0.840647
6,0.3467,0.367555,0.8507,0.852309,0.850866,0.850531
7,0.2957,0.359131,0.8579,0.862041,0.858398,0.858197
8,0.2515,0.3388,0.8655,0.867505,0.865505,0.866043
9,0.217,0.344778,0.864,0.871868,0.863725,0.866182
10,0.194,0.33648,0.8678,0.86949,0.867983,0.867588


[I 2025-04-05 07:44:06,979] Trial 63 finished with value: 0.8675875923408454 and parameters: {'learning_rate': 0.00045445835172350897, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 64 with params: {'learning_rate': 0.0007507612726385802, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0408,0.773102,0.636,0.631565,0.635676,0.629392
2,0.7184,0.58051,0.743,0.757298,0.74188,0.744568
3,0.5835,0.491788,0.7879,0.789489,0.788391,0.785255
4,0.4969,0.437497,0.8183,0.819769,0.818024,0.81637


[I 2025-04-05 07:52:04,265] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0005491963586504306, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0482,0.822182,0.6147,0.626825,0.614302,0.609926
2,0.7148,0.570609,0.7441,0.75,0.743619,0.744174
3,0.5773,0.491294,0.7923,0.793757,0.793006,0.789723
4,0.4874,0.434424,0.8201,0.819585,0.819869,0.817169
5,0.4223,0.391291,0.8399,0.842768,0.839686,0.839553
6,0.3637,0.38515,0.8455,0.850071,0.845935,0.846392
7,0.3125,0.37529,0.8478,0.853933,0.848308,0.848443
8,0.266,0.346713,0.8647,0.865506,0.864782,0.86456
9,0.2293,0.348523,0.8629,0.869997,0.862622,0.864668
10,0.2034,0.345219,0.8642,0.866844,0.864533,0.864349


[I 2025-04-05 08:11:57,329] Trial 65 finished with value: 0.8643486581545314 and parameters: {'learning_rate': 0.0005491963586504306, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 66 with params: {'learning_rate': 0.00020604128044696195, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1324,0.800819,0.6196,0.617701,0.619121,0.612382
2,0.7384,0.605815,0.7233,0.723556,0.722895,0.72261


[I 2025-04-05 08:15:56,560] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0007949065617400258, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0692,0.840082,0.5907,0.59994,0.590698,0.585853
2,0.744,0.629638,0.7123,0.720192,0.711584,0.713117
3,0.6049,0.536238,0.7658,0.773648,0.766098,0.762343
4,0.5151,0.449369,0.8074,0.808994,0.807062,0.80474
5,0.449,0.394524,0.8388,0.840656,0.83891,0.837957
6,0.3925,0.380145,0.8436,0.848159,0.843848,0.844364
7,0.3416,0.366843,0.8521,0.856489,0.852733,0.852299
8,0.2946,0.344739,0.8629,0.86595,0.862922,0.86332


[I 2025-04-05 08:31:51,664] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0005612622308870744, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0594,0.831223,0.6108,0.624512,0.609934,0.606599
2,0.7161,0.579096,0.7449,0.748972,0.74424,0.744946
3,0.5777,0.501102,0.78,0.784147,0.780535,0.776704
4,0.4938,0.439218,0.8185,0.820076,0.818,0.815708
5,0.4259,0.402745,0.8347,0.838125,0.834529,0.833643
6,0.3682,0.376253,0.8448,0.847825,0.844845,0.845438
7,0.3159,0.371035,0.8495,0.854735,0.850092,0.849665
8,0.2705,0.347244,0.8646,0.866305,0.864777,0.864877
9,0.2328,0.34954,0.8608,0.868491,0.860572,0.86272
10,0.2062,0.34452,0.8637,0.866035,0.86403,0.863418


[I 2025-04-05 08:51:40,209] Trial 68 finished with value: 0.8634176684197248 and parameters: {'learning_rate': 0.0005612622308870744, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2932,0.976687,0.5202,0.517569,0.519124,0.511604
2,0.9236,0.775551,0.6279,0.632895,0.627018,0.626719


[I 2025-04-05 08:55:38,957] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00041655900616647824, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0303,0.809959,0.6288,0.636852,0.628098,0.622946
2,0.6917,0.531604,0.7733,0.777707,0.772722,0.773477
3,0.5591,0.478145,0.7982,0.798158,0.798457,0.793898
4,0.4727,0.429053,0.8261,0.82762,0.825818,0.823955
5,0.4076,0.387743,0.8394,0.843917,0.83935,0.838854
6,0.3493,0.371186,0.853,0.855102,0.853236,0.853383
7,0.2977,0.363945,0.855,0.858038,0.855541,0.854933
8,0.2526,0.348751,0.8629,0.864554,0.86307,0.863056


[I 2025-04-05 09:11:29,477] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0006397268309837025, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0609,0.787044,0.6262,0.627299,0.625402,0.619658
2,0.716,0.562077,0.7511,0.761283,0.750152,0.752762
3,0.5802,0.483168,0.7926,0.792821,0.792994,0.789833
4,0.4954,0.432329,0.82,0.822008,0.819506,0.816947
5,0.4303,0.387285,0.8392,0.841993,0.839198,0.838894
6,0.3747,0.382675,0.841,0.844572,0.841499,0.841425
7,0.3219,0.371851,0.85,0.855516,0.850633,0.850221
8,0.2739,0.344879,0.8629,0.86549,0.862925,0.863479
9,0.2354,0.348742,0.8633,0.869609,0.863094,0.864902
10,0.208,0.34881,0.8628,0.864873,0.863128,0.862434


[I 2025-04-05 09:31:32,688] Trial 71 finished with value: 0.8624336097910639 and parameters: {'learning_rate': 0.0006397268309837025, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 61 with value: 0.8750916744350314.


Trial 72 with params: {'learning_rate': 0.00030766786159889886, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0877,0.796021,0.6197,0.622146,0.618906,0.614479
2,0.6969,0.55336,0.7537,0.763053,0.753009,0.754464
3,0.5499,0.480925,0.7906,0.796381,0.79127,0.787194
4,0.4617,0.422162,0.8226,0.825644,0.822105,0.820075


[I 2025-04-05 09:39:30,419] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0005651480074944755, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0537,0.790732,0.6274,0.632052,0.626146,0.619915
2,0.7151,0.574284,0.741,0.743575,0.740237,0.739109
3,0.5831,0.49447,0.7844,0.784444,0.784675,0.781836
4,0.4989,0.445587,0.812,0.814672,0.811358,0.80877
5,0.4316,0.395261,0.8355,0.837887,0.835245,0.834928
6,0.3741,0.384094,0.8443,0.84677,0.844646,0.844581
7,0.3206,0.358845,0.8566,0.860108,0.856946,0.856865
8,0.2734,0.352376,0.8572,0.857688,0.857296,0.85695


[I 2025-04-05 09:55:33,785] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0006400705826765567, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0566,0.808071,0.6108,0.610973,0.609976,0.602953
2,0.7349,0.575065,0.7423,0.748405,0.741585,0.742304


[I 2025-04-05 09:59:31,881] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00046181742329344153, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0464,0.786114,0.6272,0.642031,0.627252,0.624639
2,0.6859,0.528509,0.7712,0.775998,0.770447,0.771202
3,0.5511,0.482887,0.7926,0.795984,0.793272,0.788746
4,0.4698,0.407356,0.8379,0.838171,0.83764,0.83582
5,0.4036,0.38524,0.8411,0.84486,0.841183,0.840115
6,0.3463,0.357977,0.8561,0.85802,0.85626,0.856517
7,0.2962,0.349802,0.8624,0.865698,0.862933,0.862396
8,0.2517,0.335366,0.8684,0.868684,0.868508,0.868152
9,0.2164,0.338489,0.8663,0.872718,0.866143,0.868035
10,0.1933,0.336811,0.8665,0.868839,0.866841,0.866398


[I 2025-04-05 10:19:22,903] Trial 75 finished with value: 0.8663978366072735 and parameters: {'learning_rate': 0.00046181742329344153, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 61 with value: 0.8750916744350314.


Trial 76 with params: {'learning_rate': 0.00029402502980269656, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0602,0.757488,0.644,0.649551,0.643457,0.639668
2,0.6932,0.55449,0.7528,0.757736,0.752172,0.75218
3,0.5498,0.469945,0.8014,0.805253,0.802118,0.797985
4,0.4611,0.42753,0.8243,0.828027,0.823981,0.822653
5,0.3914,0.394018,0.8368,0.842357,0.836989,0.835974
6,0.3334,0.381733,0.8437,0.847445,0.843766,0.844259
7,0.2808,0.387587,0.8426,0.8516,0.843212,0.843912
8,0.2384,0.356191,0.8585,0.8604,0.858605,0.858928


[I 2025-04-05 10:35:11,241] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0010195894030461663, 'weight_decay': 0.003, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0866,0.797971,0.616,0.616178,0.615053,0.612493
2,0.7626,0.621213,0.7167,0.720968,0.715812,0.716634
3,0.6248,0.530251,0.7675,0.768836,0.76791,0.763881
4,0.5327,0.466619,0.802,0.806319,0.801482,0.79941


[I 2025-04-05 10:43:06,636] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0011131092762533383, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0762,0.842978,0.5916,0.60502,0.59235,0.582926
2,0.7644,0.619358,0.7214,0.726667,0.720514,0.721205
3,0.6255,0.522296,0.7711,0.774281,0.771706,0.768244
4,0.5346,0.471532,0.7983,0.799636,0.797805,0.794618


[I 2025-04-05 10:51:03,699] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0021187070250976194, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1507,0.916894,0.542,0.559134,0.541296,0.532942
2,0.8412,0.697016,0.6818,0.690698,0.68087,0.682525


[I 2025-04-05 10:55:03,747] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0006807262799439026, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0297,0.791104,0.6191,0.628866,0.618934,0.612328
2,0.7103,0.558783,0.7468,0.754155,0.74656,0.747535
3,0.5771,0.499157,0.7808,0.785746,0.781207,0.778205
4,0.4933,0.437898,0.8149,0.814723,0.814638,0.811761
5,0.4271,0.395859,0.8347,0.840094,0.83471,0.833512
6,0.3734,0.375036,0.8493,0.851523,0.849416,0.849551
7,0.323,0.361745,0.86,0.865315,0.860466,0.860691
8,0.2757,0.334874,0.8654,0.867753,0.865522,0.86569
9,0.2373,0.345839,0.8621,0.87037,0.861936,0.863927
10,0.2105,0.336596,0.8675,0.869945,0.8679,0.867429


[I 2025-04-05 11:14:55,094] Trial 80 finished with value: 0.8674290990969714 and parameters: {'learning_rate': 0.0006807262799439026, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 61 with value: 0.8750916744350314.


Trial 81 with params: {'learning_rate': 0.0005759821844322483, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0357,0.830392,0.6038,0.625588,0.602831,0.601438
2,0.7101,0.557325,0.7528,0.754942,0.752377,0.752048
3,0.5732,0.482147,0.794,0.797023,0.794612,0.791132
4,0.4864,0.42112,0.8259,0.828753,0.825641,0.824675
5,0.4198,0.403873,0.8321,0.841316,0.832039,0.831581
6,0.3629,0.371736,0.8515,0.854261,0.851674,0.851832
7,0.3121,0.358485,0.8568,0.86156,0.857378,0.856881
8,0.2659,0.332605,0.8678,0.869945,0.867845,0.868105
9,0.2289,0.337548,0.8672,0.874546,0.866878,0.869061
10,0.2026,0.329616,0.8722,0.87427,0.87236,0.872165


[I 2025-04-05 11:34:41,869] Trial 81 finished with value: 0.8721651356716109 and parameters: {'learning_rate': 0.0005759821844322483, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 82 with params: {'learning_rate': 0.00047789786875556344, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0557,0.766588,0.6389,0.644656,0.63864,0.634721
2,0.7106,0.552717,0.755,0.756011,0.75445,0.754111
3,0.5661,0.487042,0.7916,0.793785,0.792237,0.788267
4,0.4796,0.422097,0.8272,0.828937,0.826801,0.825269
5,0.4115,0.389617,0.8385,0.846326,0.838326,0.838665
6,0.3528,0.364945,0.8561,0.857727,0.856126,0.856302
7,0.3005,0.360286,0.8548,0.859307,0.85527,0.8552
8,0.256,0.335297,0.8681,0.869663,0.868173,0.868311
9,0.2202,0.351623,0.8627,0.873695,0.862165,0.865311
10,0.1966,0.330737,0.8715,0.873038,0.871737,0.871342


[I 2025-04-05 11:54:32,955] Trial 82 finished with value: 0.8713415185681624 and parameters: {'learning_rate': 0.00047789786875556344, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 83 with params: {'learning_rate': 0.000742602614530702, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0547,0.790875,0.6234,0.6237,0.62265,0.617612
2,0.7296,0.579283,0.7439,0.749292,0.743288,0.744312
3,0.5915,0.506099,0.7777,0.780088,0.778273,0.77419
4,0.5024,0.44202,0.8121,0.812559,0.811786,0.809636


[I 2025-04-05 12:02:28,191] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.002136363323695305, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1577,1.009772,0.4864,0.497945,0.486334,0.469909
2,0.876,0.733798,0.6587,0.659621,0.658266,0.655826
3,0.7306,0.640129,0.7077,0.707478,0.707987,0.704145
4,0.6295,0.539839,0.7596,0.757396,0.759184,0.756011


[I 2025-04-05 12:10:24,629] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0627,0.760116,0.6417,0.64642,0.640737,0.637657
2,0.6971,0.559638,0.7531,0.75433,0.752536,0.752177
3,0.554,0.488975,0.7897,0.793427,0.790357,0.786724
4,0.4643,0.424854,0.8248,0.824802,0.824543,0.822839
5,0.3941,0.388606,0.8434,0.846298,0.843469,0.842649
6,0.334,0.388344,0.84,0.843501,0.840127,0.840878
7,0.2816,0.376153,0.8514,0.855866,0.851817,0.851425
8,0.2392,0.36055,0.8552,0.856421,0.855456,0.855302


[I 2025-04-05 12:26:14,379] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0006463540633440032, 'weight_decay': 0.003, 'warmup_steps': 6, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0253,0.782215,0.6359,0.63653,0.635456,0.631155
2,0.7039,0.556893,0.7556,0.764798,0.755038,0.756774
3,0.5722,0.493148,0.7878,0.790386,0.788463,0.783694
4,0.4886,0.436968,0.8157,0.817668,0.81539,0.813625
5,0.4232,0.397625,0.8393,0.844031,0.839107,0.839446
6,0.3683,0.381329,0.8449,0.846919,0.844947,0.845166
7,0.3176,0.356092,0.8594,0.862971,0.859757,0.85974
8,0.2719,0.345159,0.8621,0.863497,0.862055,0.86236


[I 2025-04-05 12:42:05,829] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0009138722048442266, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0415,0.808122,0.6149,0.629317,0.613786,0.612041
2,0.7339,0.585731,0.7349,0.752942,0.734309,0.737768


[I 2025-04-05 12:46:04,941] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.00048434724920382944, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0246,0.771782,0.6379,0.636541,0.637599,0.631222
2,0.6972,0.551118,0.7543,0.762543,0.753487,0.755159
3,0.5601,0.472508,0.8011,0.802878,0.801552,0.798236
4,0.475,0.424578,0.8205,0.819731,0.820613,0.817905


[I 2025-04-05 12:54:02,118] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0011473631598297122, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0995,0.863936,0.5684,0.57537,0.568561,0.562421
2,0.7982,0.670323,0.6908,0.696661,0.690208,0.689884
3,0.6562,0.582282,0.7453,0.748577,0.746057,0.741262
4,0.5564,0.473782,0.7974,0.800618,0.796774,0.794672


[I 2025-04-05 13:01:55,404] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.000928277511187833, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0767,0.888088,0.5641,0.592443,0.563501,0.561465
2,0.7632,0.627345,0.7166,0.72353,0.715693,0.716394
3,0.6265,0.525324,0.7747,0.777866,0.775106,0.77153
4,0.5356,0.483976,0.7946,0.796785,0.794327,0.790958


[I 2025-04-05 13:09:49,099] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00021177702946688744, 'weight_decay': 0.01, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.096,0.797041,0.6242,0.627395,0.623643,0.620605
2,0.7231,0.583435,0.7372,0.741803,0.736781,0.736394


[I 2025-04-05 13:13:45,367] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0015837356481811218, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1254,0.90072,0.5583,0.562695,0.557923,0.547898
2,0.8338,0.687269,0.6845,0.688119,0.684288,0.684596
3,0.6823,0.572362,0.7427,0.746983,0.743025,0.741374
4,0.5831,0.495076,0.7841,0.787091,0.783718,0.781656


[I 2025-04-05 13:21:35,279] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00021674079981057922, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1143,0.802411,0.6157,0.624316,0.614787,0.612169
2,0.7301,0.578775,0.737,0.737304,0.736319,0.735156


[I 2025-04-05 13:25:31,729] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.00047900432820591795, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0593,0.758367,0.6445,0.645118,0.643921,0.639015
2,0.7066,0.549798,0.7574,0.762758,0.756903,0.758092
3,0.5619,0.49362,0.7864,0.79028,0.787081,0.782597
4,0.479,0.422075,0.825,0.825197,0.824609,0.82269
5,0.4114,0.39004,0.841,0.844233,0.840991,0.840666
6,0.3532,0.369748,0.8518,0.854545,0.851957,0.85236
7,0.3012,0.360258,0.8553,0.859116,0.855684,0.855433
8,0.2545,0.338636,0.8659,0.867224,0.865927,0.865926
9,0.2185,0.353206,0.8593,0.869988,0.859023,0.86185
10,0.1955,0.340128,0.8673,0.868158,0.867561,0.866781


[I 2025-04-05 13:45:17,965] Trial 94 finished with value: 0.8667806491476027 and parameters: {'learning_rate': 0.00047900432820591795, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 61 with value: 0.8750916744350314.


Trial 95 with params: {'learning_rate': 0.0004655246943642847, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0334,0.763898,0.641,0.650896,0.640732,0.638018
2,0.6915,0.547672,0.7547,0.762713,0.753828,0.755822
3,0.5571,0.479179,0.7994,0.801332,0.799798,0.797328
4,0.4719,0.422412,0.8236,0.824397,0.823669,0.820919
5,0.4061,0.386657,0.8408,0.845999,0.84115,0.839969
6,0.3511,0.376153,0.8477,0.849412,0.847952,0.847646
7,0.2986,0.346019,0.8632,0.866302,0.863522,0.863573
8,0.2535,0.335761,0.8675,0.869702,0.867578,0.867912
9,0.2179,0.342053,0.8648,0.873886,0.864354,0.86701
10,0.1941,0.337507,0.867,0.868672,0.867277,0.866849


[I 2025-04-05 14:05:11,664] Trial 95 finished with value: 0.8668494465252478 and parameters: {'learning_rate': 0.0004655246943642847, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3628,1.080466,0.4613,0.463651,0.459779,0.45561
2,1.051,0.897224,0.5612,0.564555,0.560031,0.556366
3,0.8803,0.788867,0.6235,0.622343,0.62333,0.620345
4,0.773,0.70498,0.6702,0.66839,0.669787,0.666019


[I 2025-04-05 14:13:08,720] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0007001553726578342, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0489,0.791669,0.6201,0.631155,0.619657,0.615198
2,0.729,0.581434,0.7387,0.74504,0.738017,0.738063


[I 2025-04-05 14:17:08,423] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00040107833533529634, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.044,0.754119,0.6419,0.647742,0.641347,0.6368
2,0.6803,0.526613,0.7697,0.773445,0.769299,0.769938
3,0.5449,0.470815,0.7992,0.800996,0.799913,0.795665
4,0.4618,0.419868,0.8286,0.828258,0.828327,0.825788
5,0.3984,0.388032,0.8379,0.843321,0.837844,0.837659
6,0.3432,0.366712,0.8499,0.853855,0.850062,0.850709
7,0.2912,0.362752,0.8565,0.860725,0.856956,0.856661
8,0.2472,0.340536,0.8667,0.867307,0.866748,0.86647
9,0.2142,0.349383,0.8668,0.873115,0.866471,0.868374
10,0.1923,0.342383,0.8686,0.8704,0.868744,0.868393


[I 2025-04-05 14:37:04,669] Trial 98 finished with value: 0.8683930641339221 and parameters: {'learning_rate': 0.00040107833533529634, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 61 with value: 0.8750916744350314.


Trial 99 with params: {'learning_rate': 0.000481203099471237, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0218,0.758106,0.6436,0.644174,0.643263,0.637897
2,0.6913,0.552509,0.7566,0.764212,0.756114,0.756927
3,0.5566,0.488381,0.7878,0.792477,0.788817,0.782171
4,0.472,0.4257,0.8202,0.821801,0.819865,0.817283
5,0.408,0.399464,0.8314,0.83935,0.831636,0.830578
6,0.3501,0.367178,0.8529,0.854003,0.853184,0.852648
7,0.3009,0.356938,0.8583,0.863722,0.858834,0.858973
8,0.2551,0.331875,0.8688,0.870112,0.868901,0.868676
9,0.2198,0.339769,0.8667,0.874451,0.866443,0.868454
10,0.1966,0.337383,0.868,0.869837,0.868193,0.867517


[I 2025-04-05 14:56:55,512] Trial 99 finished with value: 0.8675166185360952 and parameters: {'learning_rate': 0.000481203099471237, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 0.1, 'temperature': 4.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 100 with params: {'learning_rate': 0.00045169827347201874, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0454,0.742828,0.6504,0.656435,0.649786,0.64766
2,0.6961,0.541689,0.7611,0.76564,0.760512,0.76138
3,0.5595,0.481679,0.7937,0.798293,0.794286,0.790128
4,0.4736,0.431105,0.8205,0.820095,0.820115,0.817842
5,0.4064,0.390879,0.8367,0.844725,0.836677,0.836541
6,0.3511,0.366063,0.8557,0.857391,0.855736,0.855926
7,0.3,0.370711,0.8507,0.856054,0.851187,0.851245
8,0.2543,0.344237,0.8653,0.866839,0.865342,0.865518
9,0.2192,0.354619,0.8593,0.868705,0.858918,0.861535
10,0.1955,0.347216,0.8635,0.865436,0.863661,0.863212


[I 2025-04-05 15:16:46,833] Trial 100 finished with value: 0.8632121862741018 and parameters: {'learning_rate': 0.00045169827347201874, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 101 with params: {'learning_rate': 0.00013686872154382955, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1952,0.88609,0.5733,0.578446,0.572718,0.566636
2,0.791,0.653315,0.7007,0.70496,0.699913,0.700083


[I 2025-04-05 15:20:44,500] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.000251026710679218, 'weight_decay': 0.002, 'warmup_steps': 24, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.104,0.798041,0.6255,0.636193,0.624949,0.623863
2,0.712,0.562551,0.7509,0.756355,0.750309,0.750231
3,0.5602,0.486641,0.7898,0.790832,0.790511,0.786194
4,0.4659,0.440461,0.8139,0.814506,0.813658,0.810768
5,0.3928,0.394261,0.8379,0.841362,0.837906,0.837215
6,0.3331,0.387875,0.8406,0.843467,0.840712,0.840785
7,0.279,0.379001,0.8472,0.852405,0.84749,0.848078
8,0.2374,0.366531,0.8517,0.852313,0.851823,0.851492


[I 2025-04-05 15:36:37,945] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.00027009583847554473, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0815,0.776132,0.6401,0.651462,0.639549,0.6368
2,0.7004,0.558677,0.7514,0.752702,0.750892,0.749704
3,0.5544,0.485587,0.7872,0.788368,0.787817,0.783841
4,0.4641,0.439964,0.8212,0.82248,0.820918,0.817786


[I 2025-04-05 15:44:35,105] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00024272123473916088, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0948,0.797525,0.6242,0.625784,0.623416,0.616526
2,0.7152,0.57699,0.7411,0.749324,0.740497,0.742058
3,0.5682,0.507826,0.7822,0.785071,0.782891,0.778128
4,0.4703,0.444565,0.8134,0.814096,0.81302,0.810298
5,0.3986,0.397219,0.8346,0.839612,0.834462,0.834131
6,0.3341,0.407626,0.8342,0.840475,0.834322,0.835005
7,0.2816,0.398682,0.8396,0.84831,0.840133,0.84052
8,0.2386,0.371387,0.8495,0.850614,0.849508,0.849419


[I 2025-04-05 16:00:23,640] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0011996241286754293, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0522,0.853109,0.5895,0.597717,0.588932,0.585008
2,0.7634,0.649002,0.699,0.709184,0.698102,0.69969


[I 2025-04-05 16:04:24,495] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0002456957319710178, 'weight_decay': 0.0, 'warmup_steps': 5, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0763,0.803286,0.6178,0.628955,0.617293,0.609628
2,0.7016,0.556733,0.7557,0.760097,0.755243,0.755718
3,0.5515,0.487116,0.7868,0.791541,0.787385,0.785168
4,0.461,0.428844,0.8209,0.822151,0.82076,0.81844
5,0.3892,0.404349,0.8336,0.838926,0.833723,0.832761
6,0.3274,0.394401,0.8396,0.842478,0.839788,0.839947
7,0.2751,0.39055,0.8408,0.846796,0.84123,0.841303
8,0.234,0.370434,0.8477,0.85076,0.847547,0.848279


[I 2025-04-05 16:20:17,175] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0001630679501877863, 'weight_decay': 0.002, 'warmup_steps': 9, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1484,0.86438,0.5895,0.597933,0.588631,0.584073
2,0.7625,0.627441,0.7151,0.715757,0.715022,0.714306


[I 2025-04-05 16:24:17,717] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0007728148177920344, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.051,0.783294,0.6303,0.633384,0.629706,0.625104
2,0.7268,0.566428,0.7445,0.749047,0.743612,0.744401
3,0.5955,0.521204,0.7719,0.776105,0.772618,0.767287
4,0.5102,0.454873,0.8107,0.812127,0.810788,0.807545


[I 2025-04-05 16:32:16,770] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0009882000147245508, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0724,0.824978,0.614,0.613203,0.612981,0.608715
2,0.7668,0.645142,0.6993,0.706482,0.698572,0.698648
3,0.6266,0.53652,0.7641,0.765626,0.764469,0.760707
4,0.5351,0.467288,0.8067,0.806388,0.80628,0.803481


[I 2025-04-05 16:40:14,695] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0007286491234542012, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0519,0.828352,0.6006,0.611799,0.599852,0.595363
2,0.7294,0.595512,0.7322,0.733088,0.731639,0.730545
3,0.592,0.512834,0.7758,0.775625,0.776303,0.771604
4,0.5076,0.446626,0.8151,0.817754,0.814794,0.812903
5,0.4433,0.40158,0.8334,0.840145,0.833263,0.833756
6,0.385,0.388098,0.84,0.844399,0.840073,0.840853
7,0.3341,0.361528,0.8564,0.86124,0.856782,0.857007
8,0.2848,0.344383,0.8615,0.863433,0.861567,0.861704


[I 2025-04-05 16:56:13,045] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0003029196447533764, 'weight_decay': 0.002, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0535,0.78723,0.6272,0.635778,0.626175,0.62503
2,0.6899,0.537817,0.7662,0.770012,0.765521,0.766488
3,0.551,0.488111,0.7882,0.794908,0.789194,0.784612
4,0.4624,0.440711,0.8137,0.816124,0.813376,0.81113


[I 2025-04-05 17:04:12,113] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0006791313503650992, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0446,0.829557,0.6013,0.620038,0.60021,0.597239
2,0.7219,0.571948,0.7453,0.746633,0.744747,0.743791
3,0.5857,0.503345,0.7846,0.787241,0.785066,0.781743
4,0.4992,0.445309,0.8151,0.814882,0.814743,0.81278
5,0.4353,0.399442,0.8349,0.837873,0.834884,0.834038
6,0.3803,0.383099,0.8415,0.845103,0.841617,0.842286
7,0.3288,0.367656,0.8512,0.855286,0.85165,0.851481
8,0.2826,0.35046,0.8601,0.861407,0.860083,0.860256


[I 2025-04-05 17:20:06,612] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00031877890716175736, 'weight_decay': 0.003, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0792,0.790556,0.6208,0.620829,0.620394,0.616866
2,0.7044,0.563111,0.7476,0.752044,0.746887,0.74798
3,0.5609,0.487672,0.7914,0.792308,0.791922,0.788033
4,0.4717,0.421661,0.8228,0.824751,0.822296,0.821202
5,0.4007,0.398552,0.8335,0.838322,0.833598,0.832624
6,0.3429,0.38243,0.8466,0.848944,0.846951,0.846575
7,0.2907,0.37569,0.8465,0.852909,0.846981,0.846795
8,0.2463,0.358169,0.8553,0.858642,0.855319,0.855641


[I 2025-04-05 17:36:03,953] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0004871328506739177, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0288,0.769443,0.635,0.63754,0.634318,0.627614
2,0.7009,0.551582,0.7537,0.759147,0.75307,0.753491
3,0.5674,0.496484,0.785,0.789023,0.785497,0.782364
4,0.4855,0.437282,0.8182,0.821552,0.818152,0.816322
5,0.4195,0.394552,0.8383,0.844906,0.838414,0.837879
6,0.3622,0.372775,0.8507,0.853381,0.851006,0.851031
7,0.3094,0.361365,0.855,0.859263,0.855439,0.854989
8,0.263,0.348607,0.8634,0.866407,0.863411,0.863866
9,0.2249,0.34958,0.859,0.86628,0.858767,0.860652
10,0.1995,0.342326,0.8657,0.868144,0.86592,0.865734


[I 2025-04-05 17:56:01,501] Trial 114 finished with value: 0.8657341787094742 and parameters: {'learning_rate': 0.0004871328506739177, 'weight_decay': 0.0, 'warmup_steps': 14, 'lambda_param': 0.1, 'temperature': 3.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 115 with params: {'learning_rate': 0.0006798657280438548, 'weight_decay': 0.001, 'warmup_steps': 11, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0283,0.766453,0.6431,0.643757,0.642407,0.637953
2,0.7184,0.571876,0.7391,0.750343,0.73829,0.741339


[I 2025-04-05 18:00:00,377] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00011264504731179041, 'weight_decay': 0.007, 'warmup_steps': 21, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2195,0.941606,0.5378,0.544422,0.536838,0.530624
2,0.8329,0.698546,0.6746,0.676869,0.67386,0.673288
3,0.67,0.601109,0.7242,0.724942,0.72468,0.720865
4,0.5681,0.539838,0.7598,0.76272,0.759409,0.756281


[I 2025-04-05 18:07:56,388] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0014579520501657554, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1196,0.874125,0.5764,0.576636,0.57663,0.569941
2,0.807,0.699632,0.6736,0.682721,0.672651,0.675259


[I 2025-04-05 18:11:54,322] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0006748372079391185, 'weight_decay': 0.0, 'warmup_steps': 8, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0351,0.812457,0.6123,0.633109,0.611874,0.608405
2,0.7178,0.582265,0.7379,0.74639,0.737557,0.738212


[I 2025-04-05 18:15:52,923] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.000435118026829303, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0587,0.788668,0.6236,0.640608,0.623185,0.620458
2,0.7061,0.555595,0.7574,0.762776,0.756672,0.758029
3,0.5602,0.478633,0.7989,0.799659,0.799514,0.795229
4,0.4736,0.425608,0.8254,0.824158,0.825203,0.822693
5,0.4067,0.389507,0.8381,0.845218,0.838095,0.837464
6,0.3489,0.377722,0.8456,0.847576,0.845892,0.84562
7,0.2988,0.36098,0.8581,0.864428,0.858595,0.858361
8,0.2534,0.336039,0.8678,0.870251,0.867799,0.868328
9,0.2181,0.350144,0.8613,0.872304,0.860718,0.863935
10,0.1954,0.338251,0.8688,0.870663,0.869008,0.868587


[I 2025-04-05 18:35:40,869] Trial 119 finished with value: 0.8685870629029303 and parameters: {'learning_rate': 0.000435118026829303, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 120 with params: {'learning_rate': 0.00028302501940019695, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0872,0.782493,0.63,0.637188,0.629309,0.628758
2,0.7065,0.563777,0.749,0.756814,0.748538,0.750538
3,0.5581,0.479138,0.7934,0.794224,0.793858,0.79075
4,0.4655,0.428165,0.8256,0.828032,0.825431,0.82369
5,0.3918,0.398036,0.8326,0.837987,0.83297,0.831557
6,0.3329,0.380569,0.8452,0.846648,0.845309,0.844853
7,0.2821,0.377036,0.8455,0.850385,0.846023,0.845659
8,0.2388,0.358938,0.8589,0.860847,0.858863,0.859301


[I 2025-04-05 18:51:35,478] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2937,0.984827,0.5121,0.511136,0.511289,0.507254
2,0.9331,0.794529,0.6125,0.615626,0.611957,0.610152
3,0.7542,0.682741,0.6861,0.687555,0.686587,0.683019
4,0.6445,0.588785,0.7363,0.737924,0.735847,0.733471


[I 2025-04-05 18:59:34,459] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0005601383782312898, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0465,0.784869,0.6308,0.639601,0.630624,0.623973
2,0.7103,0.555716,0.7558,0.756125,0.755312,0.754118
3,0.5709,0.493587,0.7843,0.78827,0.784948,0.780569
4,0.4895,0.424608,0.8254,0.826008,0.825264,0.823521
5,0.4203,0.387118,0.8431,0.845177,0.843118,0.842359
6,0.3636,0.373973,0.8476,0.85048,0.847929,0.847745
7,0.3119,0.370211,0.8525,0.857919,0.853113,0.852636
8,0.2647,0.338541,0.8669,0.868818,0.866852,0.867117
9,0.2277,0.348331,0.8621,0.871702,0.861825,0.864538
10,0.2022,0.335485,0.87,0.871624,0.870313,0.869751


[I 2025-04-05 19:19:23,932] Trial 122 finished with value: 0.8697505574306643 and parameters: {'learning_rate': 0.0005601383782312898, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 123 with params: {'learning_rate': 0.0007264936455554579, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0485,0.807664,0.616,0.624992,0.615147,0.612134
2,0.727,0.589714,0.7358,0.736252,0.735344,0.73447
3,0.5883,0.501667,0.7853,0.78902,0.785811,0.781764
4,0.5011,0.445235,0.8148,0.815354,0.814578,0.811394
5,0.4359,0.400821,0.836,0.839009,0.835891,0.835322
6,0.379,0.374414,0.8504,0.85368,0.850418,0.851232
7,0.3288,0.359229,0.8562,0.861094,0.856711,0.857055
8,0.2832,0.340846,0.8664,0.867314,0.866494,0.866452
9,0.2423,0.343855,0.8646,0.872077,0.864463,0.866342
10,0.2147,0.340689,0.8656,0.867444,0.865858,0.865313


[I 2025-04-05 19:39:15,201] Trial 123 finished with value: 0.8653130870057909 and parameters: {'learning_rate': 0.0007264936455554579, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 124 with params: {'learning_rate': 0.00034761202167212895, 'weight_decay': 0.0, 'warmup_steps': 19, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0557,0.77392,0.6364,0.640483,0.635616,0.628618
2,0.6859,0.534162,0.7582,0.765392,0.757893,0.758802
3,0.5463,0.469518,0.8015,0.805334,0.802158,0.799464
4,0.4607,0.422621,0.8291,0.830243,0.828638,0.826267
5,0.3898,0.384082,0.8392,0.843535,0.839448,0.838213
6,0.3335,0.371769,0.8506,0.853889,0.85077,0.850886
7,0.2822,0.367594,0.8534,0.85995,0.853943,0.854003
8,0.2392,0.344804,0.864,0.865003,0.864077,0.863798


[I 2025-04-05 19:55:04,360] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.00033740512568788885, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0577,0.752595,0.6455,0.646614,0.644327,0.63785
2,0.6846,0.544218,0.7621,0.76794,0.761391,0.763132
3,0.5459,0.48056,0.7937,0.799752,0.794519,0.790041
4,0.4621,0.425561,0.8215,0.824318,0.821361,0.819512
5,0.3947,0.391181,0.8374,0.843382,0.837412,0.837233
6,0.336,0.380804,0.8462,0.848894,0.846542,0.846348
7,0.2849,0.372373,0.8548,0.860576,0.855168,0.85533
8,0.242,0.353023,0.8588,0.861681,0.858972,0.859266


[I 2025-04-05 20:10:55,976] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0011938989878580916, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0923,0.878618,0.5801,0.594291,0.579335,0.57702
2,0.778,0.640586,0.709,0.710668,0.70828,0.708348


[I 2025-04-05 20:14:53,368] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00035355573243091304, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0723,0.781281,0.6296,0.635841,0.628823,0.622997
2,0.7026,0.563799,0.7473,0.756285,0.746751,0.749076
3,0.5581,0.482571,0.797,0.797599,0.797601,0.793784
4,0.4703,0.43425,0.8183,0.823204,0.81801,0.815941
5,0.4017,0.391323,0.8386,0.842114,0.838682,0.83777
6,0.3412,0.363831,0.8514,0.853071,0.851452,0.851492
7,0.2894,0.36473,0.8545,0.859028,0.855039,0.854878
8,0.2459,0.344116,0.8604,0.861443,0.860515,0.860635


[I 2025-04-05 20:30:47,673] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.00045561599315445124, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0513,0.779788,0.631,0.635056,0.630675,0.624783
2,0.7023,0.561188,0.7523,0.760547,0.751455,0.752718
3,0.56,0.487892,0.7924,0.798836,0.792866,0.789704
4,0.4739,0.419126,0.8257,0.826029,0.825493,0.823399
5,0.4076,0.386306,0.8417,0.845323,0.84166,0.841265
6,0.351,0.370015,0.8507,0.851994,0.85088,0.850726
7,0.3024,0.361045,0.8578,0.862838,0.858279,0.857848
8,0.2555,0.341408,0.8693,0.871304,0.869286,0.869529
9,0.2204,0.339079,0.8673,0.872429,0.866896,0.868617


[I 2025-04-05 20:58:30,819] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0006467075628720947, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0555,0.817908,0.613,0.627938,0.611588,0.605198
2,0.7168,0.567932,0.7514,0.754325,0.750676,0.751358
3,0.5782,0.489133,0.7866,0.787731,0.787021,0.783357
4,0.4898,0.433424,0.82,0.821514,0.819816,0.817518


[I 2025-04-05 21:06:26,431] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0005612567161548509, 'weight_decay': 0.01, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0511,0.767723,0.6357,0.637011,0.634583,0.629112
2,0.7209,0.569173,0.7389,0.746715,0.738552,0.739393
3,0.58,0.493733,0.785,0.787341,0.785544,0.781531
4,0.4921,0.445163,0.8121,0.814467,0.811653,0.80872


[I 2025-04-05 21:14:21,558] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0006314643160078739, 'weight_decay': 0.001, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0352,0.783804,0.6261,0.641038,0.625579,0.622245
2,0.7033,0.559637,0.7547,0.756482,0.754272,0.753469
3,0.5688,0.509118,0.7815,0.785774,0.781905,0.777409
4,0.4852,0.42878,0.8199,0.822325,0.81943,0.817194


[I 2025-04-05 21:22:19,304] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0004227936996421073, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0595,0.786311,0.6257,0.640511,0.625322,0.620804
2,0.7094,0.565962,0.7451,0.749888,0.745136,0.744543
3,0.5618,0.493892,0.7874,0.791411,0.788061,0.785075
4,0.4746,0.428382,0.8201,0.820973,0.819831,0.817674
5,0.4054,0.393204,0.837,0.842119,0.83693,0.836504
6,0.3465,0.37526,0.8484,0.851267,0.848584,0.848971
7,0.2931,0.372067,0.8487,0.854086,0.84936,0.849185
8,0.2496,0.347343,0.8634,0.864237,0.863441,0.863338


[I 2025-04-05 21:38:14,018] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0004945828691020691, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.047,0.774928,0.6297,0.639815,0.628915,0.626547
2,0.7074,0.554173,0.7585,0.76076,0.758054,0.758086
3,0.5698,0.488981,0.7928,0.793555,0.793417,0.789984
4,0.4837,0.43494,0.8245,0.823578,0.824342,0.821865
5,0.4158,0.394952,0.8396,0.8425,0.839721,0.838507
6,0.359,0.380993,0.8423,0.846451,0.842619,0.843041
7,0.3077,0.357395,0.8561,0.859472,0.856638,0.856155
8,0.2631,0.336908,0.8655,0.867125,0.865705,0.865774


[I 2025-04-05 21:54:08,078] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0007279168844919567, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0177,0.73627,0.6535,0.654309,0.653143,0.649538
2,0.7138,0.556395,0.7565,0.758551,0.755842,0.755905
3,0.5848,0.505095,0.7817,0.786518,0.782404,0.778018
4,0.5014,0.429619,0.818,0.817667,0.817511,0.815324
5,0.4373,0.410535,0.8264,0.830977,0.826369,0.825754
6,0.3819,0.391048,0.839,0.844447,0.839286,0.840109
7,0.3284,0.368205,0.8513,0.85571,0.851964,0.851476
8,0.2825,0.346733,0.8611,0.861993,0.861381,0.860658


[I 2025-04-05 22:09:57,254] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0001255724349917824, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2149,0.900287,0.5533,0.554221,0.551545,0.53988
2,0.8229,0.678252,0.6895,0.690375,0.689126,0.688109


[I 2025-04-05 22:13:53,912] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.000990730767914249, 'weight_decay': 0.0, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0788,0.928857,0.5529,0.582795,0.552545,0.548307
2,0.7651,0.62294,0.7148,0.724738,0.714037,0.715274
3,0.62,0.530055,0.7681,0.773031,0.768476,0.765742
4,0.5275,0.459689,0.8061,0.807369,0.805891,0.802772


[I 2025-04-05 22:21:47,027] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.00028219622473405674, 'weight_decay': 0.0, 'warmup_steps': 24, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0748,0.792643,0.6243,0.632079,0.623702,0.620501
2,0.7078,0.545596,0.7556,0.761435,0.755143,0.755304
3,0.5558,0.479981,0.7908,0.792542,0.79142,0.787279
4,0.4663,0.429242,0.8206,0.8214,0.820312,0.818199
5,0.3961,0.391423,0.837,0.840997,0.837275,0.836042
6,0.336,0.376605,0.8465,0.850565,0.846653,0.8469
7,0.2844,0.368893,0.8542,0.860029,0.854797,0.85438
8,0.241,0.359492,0.8553,0.857296,0.855311,0.855697


[I 2025-04-05 22:37:35,410] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.00037053489289304563, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0607,0.812055,0.6151,0.635834,0.614993,0.613422
2,0.691,0.539679,0.7669,0.770059,0.766198,0.765654
3,0.5535,0.476674,0.7965,0.799727,0.797272,0.793948
4,0.4626,0.419959,0.8265,0.826134,0.826506,0.823644
5,0.3963,0.389043,0.8392,0.844006,0.83921,0.838554
6,0.337,0.381393,0.8434,0.848049,0.843731,0.844175
7,0.2858,0.363764,0.8564,0.861433,0.856906,0.856746
8,0.2425,0.346384,0.8633,0.864544,0.863144,0.863318


[I 2025-04-05 22:53:24,613] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.00045619870379395616, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0567,0.771255,0.6355,0.644438,0.634946,0.633761
2,0.6969,0.539187,0.767,0.770489,0.76627,0.766842
3,0.5517,0.481492,0.7915,0.795495,0.792096,0.789026
4,0.4676,0.413043,0.8315,0.832862,0.83114,0.829313
5,0.4008,0.381208,0.847,0.852966,0.847095,0.84699
6,0.344,0.361487,0.86,0.862272,0.860056,0.860276
7,0.2939,0.356586,0.8575,0.864238,0.858047,0.858051
8,0.2487,0.338132,0.8649,0.866235,0.864942,0.865068
9,0.2155,0.343605,0.8645,0.872258,0.864232,0.866724
10,0.1919,0.334742,0.8695,0.872368,0.869744,0.869508


[I 2025-04-05 23:13:11,476] Trial 140 finished with value: 0.8695084014656335 and parameters: {'learning_rate': 0.00045619870379395616, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 141 with params: {'learning_rate': 0.00045542122819726687, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0666,0.815763,0.6128,0.623598,0.61269,0.608359
2,0.7156,0.564664,0.7505,0.757499,0.749494,0.751101
3,0.5681,0.477117,0.7966,0.799301,0.796904,0.795047
4,0.4799,0.423336,0.824,0.823949,0.823806,0.821317
5,0.4119,0.392621,0.8377,0.845006,0.837604,0.83767
6,0.354,0.376945,0.8455,0.848452,0.845598,0.845876
7,0.304,0.354143,0.8606,0.862894,0.860996,0.860507
8,0.259,0.335205,0.8687,0.870594,0.868771,0.869272
9,0.2218,0.337965,0.8657,0.872127,0.865552,0.867549
10,0.1978,0.333814,0.8707,0.871204,0.870944,0.870155


[I 2025-04-05 23:32:59,517] Trial 141 finished with value: 0.8701546791862735 and parameters: {'learning_rate': 0.00045542122819726687, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 61 with value: 0.8750916744350314.


Trial 142 with params: {'learning_rate': 0.0003625977150201981, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0581,0.75582,0.643,0.645992,0.642252,0.636983
2,0.6956,0.549975,0.759,0.768306,0.75803,0.760288
3,0.5525,0.470504,0.7964,0.795974,0.796902,0.79294
4,0.466,0.417997,0.8256,0.826584,0.825362,0.823104
5,0.397,0.38795,0.8384,0.841789,0.838259,0.837651
6,0.3396,0.37445,0.8448,0.848051,0.845017,0.845598
7,0.2888,0.369277,0.854,0.859668,0.854543,0.854361
8,0.2446,0.347448,0.8592,0.86233,0.859215,0.859906


[I 2025-04-05 23:48:50,765] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0028588774268147863, 'weight_decay': 0.001, 'warmup_steps': 27, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2442,1.065578,0.4491,0.468853,0.448526,0.444334
2,0.9767,0.823414,0.6056,0.618014,0.60508,0.605297


[I 2025-04-05 23:52:49,426] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3338,1.041074,0.4765,0.476813,0.475565,0.471237
2,0.9919,0.859299,0.5761,0.582305,0.575023,0.5729
3,0.8324,0.755065,0.6415,0.647665,0.641672,0.639509
4,0.7292,0.683715,0.6773,0.67509,0.677058,0.673133


[I 2025-04-06 00:00:44,435] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0004872120757412702, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0518,0.7824,0.6262,0.630126,0.625851,0.620799
2,0.6932,0.538877,0.7604,0.765257,0.759829,0.759998
3,0.5583,0.47301,0.7959,0.796672,0.796439,0.793927
4,0.4755,0.418221,0.8257,0.825805,0.825775,0.823338
5,0.4097,0.380988,0.8425,0.84907,0.842401,0.842957
6,0.354,0.379475,0.8477,0.851073,0.84781,0.848097
7,0.3029,0.349654,0.8631,0.866297,0.863449,0.863365
8,0.2573,0.340605,0.8661,0.866725,0.866258,0.865916


[I 2025-04-06 00:16:43,874] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0011607614784531854, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0847,0.864779,0.5817,0.586132,0.581511,0.574755
2,0.788,0.646037,0.7038,0.709609,0.703432,0.704519
3,0.6439,0.537493,0.7632,0.765577,0.763917,0.759794
4,0.5498,0.480836,0.7924,0.794961,0.79189,0.788977


[I 2025-04-06 00:24:43,637] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0005479819880329459, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0424,0.787082,0.6321,0.632137,0.63135,0.627851
2,0.6994,0.547336,0.7616,0.768194,0.760998,0.762122
3,0.5627,0.477757,0.7945,0.798572,0.795211,0.790691
4,0.4776,0.419595,0.8251,0.825885,0.825179,0.822933
5,0.4139,0.388188,0.84,0.844035,0.840251,0.83974
6,0.3589,0.369814,0.8507,0.852444,0.851044,0.850872
7,0.3079,0.35797,0.8587,0.86497,0.85924,0.859311
8,0.262,0.337418,0.8643,0.865213,0.86446,0.864388


[I 2025-04-06 00:40:39,960] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0008342452957696794, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0474,0.78629,0.6279,0.62471,0.627158,0.620713
2,0.7316,0.576808,0.7359,0.75407,0.735151,0.738266


[I 2025-04-06 00:44:37,904] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0007948125851896977, 'weight_decay': 0.0, 'warmup_steps': 17, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0481,0.843022,0.6014,0.608558,0.601168,0.595615
2,0.7386,0.589394,0.7363,0.742092,0.735589,0.736877
3,0.5979,0.509572,0.7848,0.789235,0.785407,0.78189
4,0.509,0.440768,0.8143,0.813443,0.813995,0.811257
5,0.4402,0.404485,0.8325,0.839022,0.832304,0.832647
6,0.3839,0.385286,0.8471,0.850472,0.847083,0.847701
7,0.3329,0.351667,0.86,0.862348,0.86027,0.860257
8,0.2877,0.344766,0.8638,0.863883,0.863938,0.863393


[I 2025-04-06 01:00:31,309] Trial 149 pruned. 


In [21]:
print(best_distill_random)

BestRun(run_id='61', objective=0.8750916744350314, hyperparameters={'learning_rate': 0.0005943410799444305, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}, run_summary=None)
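
The logs above interleave per-epoch tables with Optuna status lines, which makes it hard to see how many trials ran to completion. The following is a minimal sketch of how such status lines could be summarized. The helper `summarize` and the `sample` string are hypothetical illustrations (the sample lines are copied from the log above with the parameter dictionaries shortened to `{...}`), not part of the notebook or of `base`.

```python
import re

# Patterns for the two per-trial status lines that appear in the Optuna log.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize(log_text):
    """Return (finished trial -> value, pruned trial numbers, best finished trial)."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


# Hypothetical sample: three status lines from the log, parameters shortened.
sample = """\
[I 2025-04-05 17:36:03,953] Trial 113 pruned.
[I 2025-04-05 17:56:01,501] Trial 114 finished with value: 0.8657341787094742 and parameters: {...}. Best is trial 61 with value: 0.8750916744350314.
[I 2025-04-05 18:35:40,869] Trial 119 finished with value: 0.8685870629029303 and parameters: {...}. Best is trial 61 with value: 0.8750916744350314.
"""

finished, pruned, best = summarize(sample)
print(f"finished={len(finished)}, pruned={len(pruned)}, best among sample={best}")
```

Note that the sketch only ranks the trials it is given; the overall best run (trial 61) lies outside this sample.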


In [22]:
base.reset_seed()

In [23]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

In [24]:
def hp_space(trial):
    # Search space for the baseline run (no distillation parameters here).
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
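
Values such as `0.009000000000000001` or `0.30000000000000004` in the trial logs are binary floating-point artifacts of the stepped suggestions (`step=1e-3` above), not separate hyperparameter settings. A minimal sketch of how such values could be cleaned up for reporting is shown below; the helper `clean_stepped` is a hypothetical name introduced only for this illustration.

```python
from decimal import Decimal


def clean_stepped(value: float, step: float) -> float:
    """Round a stepped hyperparameter to the number of decimals of its step."""
    # Decimal places of the step, e.g. 1e-3 -> 3 and 0.1 -> 1.
    ndigits = max(0, -Decimal(str(step)).as_tuple().exponent)
    return round(value, ndigits)


# Multiplying a decimal step by an integer is generally not exact in binary.
assert 0.1 * 3 == 0.30000000000000004

print(clean_stepped(0.009000000000000001, 1e-3))
print(clean_stepped(0.30000000000000004, 0.1))
```

Rounding only at reporting time keeps the values Optuna actually sampled intact in the study itself.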

In [25]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
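
To make the pruning pattern in the logs easier to read, here is a rough, simplified model of where a Hyperband-style pruner with `reduction_factor=2` would place its checkpoints (rungs). It is not Optuna's code. The values `min_resource=2` and `max_resource=10` are assumptions, since `min_r` and `max_r` are defined elsewhere in the notebook, and the bracket count formula `floor(log_eta(max/min)) + 1` is likewise assumed. The function name `hyperband_rungs` is hypothetical.

```python
import math


def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the epochs at which a trial may be pruned (simplified)."""
    n_brackets = math.floor(
        math.log(max_resource / min_resource, reduction_factor)
    ) + 1
    brackets = []
    for bracket in range(n_brackets):
        rungs = []
        rung = 0
        while True:
            # Later brackets start pruning later (more patient brackets).
            step = min_resource * reduction_factor ** (bracket + rung)
            if step >= max_resource:
                break
            rungs.append(step)
            rung += 1
        brackets.append(rungs)
    return brackets


# Assumed values; the notebook's real min_r / max_r are not shown in this part.
rungs = hyperband_rungs(min_resource=2, max_resource=10, reduction_factor=2)
print(rungs)
```

Under these assumptions the checkpoints fall on epochs 2, 4 and 8, which would be consistent with the pruned trials in these logs mostly stopping after epoch 2, 4 or 8, while surviving trials run all 10 epochs.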



In [26]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # model_init is called for every trial, so each trial starts from a fresh model.
    model_init=lambda: base.freeze_model(base.get_mobilenet(10)),
)
  


In [27]:
best_base_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-head",
    n_trials=150
)
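
The search maximizes `metrics["eval_f1"]`. The implementation of `base.compute_metrics` is not shown here, so the following is only a sketch under the assumption that F1 is macro-averaged over classes. That assumption would fit the logs, where the F1 column is not the harmonic mean of the Precision and Recall columns. The function `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging mode)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes (not CIFAR10 data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
objective = macro_f1(y_true, y_pred)
print(round(objective, 6))
```

With macro averaging every class contributes equally to the objective, which suits a class-balanced dataset such as CIFAR10.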

[I 2025-04-06 01:09:59,492] A new study created in memory with name: Base-head


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.302,0.833241,0.7336,0.733584,0.732961,0.732005
2,0.9577,0.751208,0.7514,0.753175,0.751159,0.75054


[I 2025-04-06 01:12:47,512] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0766,0.74327,0.7486,0.751909,0.748149,0.747301
2,0.8901,0.698056,0.7635,0.766456,0.763349,0.76231
3,0.8617,0.6851,0.7639,0.764567,0.763986,0.761412
4,0.8453,0.669149,0.7719,0.777889,0.771828,0.770768


[I 2025-04-06 01:18:19,212] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7885,1.26664,0.6858,0.683996,0.684836,0.683432
2,1.2763,1.005274,0.7177,0.719031,0.71725,0.717187


[I 2025-04-06 01:21:08,911] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0576,0.725944,0.7513,0.756701,0.750983,0.750542
2,0.8878,0.689583,0.767,0.771254,0.76673,0.766732
3,0.8647,0.679627,0.7647,0.766716,0.764735,0.762616
4,0.8486,0.659375,0.7755,0.778448,0.775305,0.77409


[I 2025-04-06 01:26:45,758] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.007,0.714102,0.7517,0.755827,0.751489,0.749991
2,0.9085,0.703172,0.7607,0.769819,0.760709,0.761578
3,0.889,0.677859,0.7664,0.769292,0.766059,0.763926
4,0.8697,0.663915,0.7736,0.7755,0.773095,0.771817


[I 2025-04-06 01:32:26,978] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5742,1.030562,0.7099,0.708978,0.708992,0.70796
2,1.1033,0.862973,0.732,0.73324,0.731655,0.731456
3,1.0041,0.806938,0.7417,0.742724,0.741473,0.740408
4,0.9589,0.752726,0.7532,0.757961,0.75262,0.752543


[I 2025-04-06 01:38:05,078] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2317,0.800474,0.7383,0.738673,0.737719,0.73664
2,0.9322,0.731742,0.7576,0.759908,0.757379,0.756725
3,0.8877,0.707632,0.7586,0.759686,0.758446,0.756666
4,0.8653,0.688602,0.7682,0.775969,0.767823,0.767829


[I 2025-04-06 01:43:42,075] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6436,1.101316,0.7009,0.699547,0.699999,0.698834
2,1.1548,0.904027,0.7285,0.729617,0.72814,0.728005
3,1.0393,0.837384,0.7376,0.738944,0.737367,0.736339
4,0.9869,0.774057,0.7502,0.754374,0.749608,0.749367


[I 2025-04-06 01:49:14,823] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1905,0.788144,0.7402,0.740776,0.739643,0.738499
2,0.9227,0.724365,0.7591,0.761624,0.758891,0.758203
3,0.8815,0.701797,0.7603,0.761046,0.760185,0.75833
4,0.8604,0.684853,0.7685,0.776265,0.76816,0.768062


[I 2025-04-06 01:54:51,618] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1322,0.766486,0.7444,0.746167,0.743893,0.742784
2,0.9055,0.71033,0.7613,0.763795,0.761114,0.760216
3,0.8703,0.691351,0.7626,0.763033,0.762554,0.76041
4,0.8515,0.67724,0.7692,0.776336,0.768977,0.768452


[I 2025-04-06 02:00:24,563] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0015322576261213353, 'weight_decay': 0.003, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0469,0.721518,0.7512,0.756371,0.75103,0.750134
2,0.8911,0.687924,0.7664,0.770505,0.766219,0.766345
3,0.8692,0.680451,0.766,0.769698,0.765892,0.764012
4,0.8523,0.658335,0.7749,0.777057,0.774635,0.773506
5,0.8444,0.663977,0.7739,0.775543,0.773399,0.771652
6,0.8376,0.686833,0.7638,0.777413,0.763412,0.766462
7,0.8311,0.668813,0.7712,0.775995,0.771419,0.770961
8,0.8267,0.655642,0.7784,0.77989,0.778122,0.777998


[I 2025-04-06 02:11:23,903] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0025419498380802787, 'weight_decay': 0.002, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.032,0.718643,0.7515,0.756417,0.751234,0.749863
2,0.9154,0.710732,0.7569,0.769187,0.756964,0.757958
3,0.8965,0.678592,0.767,0.770033,0.766585,0.764599
4,0.876,0.668075,0.7713,0.773347,0.770683,0.769037
5,0.8658,0.671172,0.7728,0.775127,0.77251,0.770534
6,0.8577,0.690342,0.7632,0.776295,0.762834,0.765346
7,0.8467,0.660241,0.7733,0.77541,0.773357,0.773031
8,0.839,0.656556,0.7746,0.776406,0.774388,0.773931


[I 2025-04-06 02:22:36,800] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.003885078898153256, 'weight_decay': 0.005, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0501,0.770422,0.7362,0.749533,0.735776,0.733927
2,0.9579,0.733339,0.7491,0.77096,0.749075,0.75118
3,0.9392,0.712698,0.7586,0.769626,0.757788,0.756124
4,0.9141,0.692957,0.7658,0.769899,0.765151,0.762647


[I 2025-04-06 02:28:16,376] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0003085662132454162, 'weight_decay': 0.003, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2872,0.821574,0.7344,0.73446,0.733762,0.732719
2,0.9484,0.744045,0.7529,0.754832,0.752677,0.75204


[I 2025-04-06 02:31:05,516] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.003147266239250273, 'weight_decay': 0.0, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0277,0.737505,0.7454,0.753406,0.745121,0.743333
2,0.9324,0.724732,0.7504,0.769788,0.750366,0.752551
3,0.9151,0.691689,0.7633,0.768843,0.762715,0.760925
4,0.8923,0.682761,0.7688,0.772992,0.768029,0.76574


[I 2025-04-06 02:36:43,179] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0009911438163048463, 'weight_decay': 0.006, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.076,0.735414,0.7503,0.755121,0.749883,0.749397
2,0.8869,0.693965,0.7664,0.769961,0.766186,0.765552
3,0.8611,0.683133,0.7637,0.764752,0.763784,0.761255
4,0.8454,0.665206,0.7729,0.777778,0.772801,0.7715
5,0.8364,0.66235,0.7736,0.774998,0.773139,0.771971
6,0.8304,0.683742,0.7635,0.774656,0.7632,0.765852
7,0.8259,0.670642,0.7714,0.776288,0.771508,0.77126
8,0.8226,0.658866,0.7764,0.777861,0.776125,0.775999


[I 2025-04-06 02:48:01,419] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0011846019801520931, 'weight_decay': 0.007, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0617,0.729125,0.7515,0.756813,0.751171,0.750787
2,0.8867,0.691006,0.7671,0.771289,0.76684,0.76667
3,0.8629,0.680399,0.7642,0.765551,0.76427,0.762005
4,0.847,0.661337,0.774,0.777459,0.773829,0.772426
5,0.8385,0.662598,0.7727,0.774209,0.772184,0.770865
6,0.8323,0.684526,0.7634,0.774853,0.763058,0.765779
7,0.8272,0.670463,0.7709,0.776399,0.771087,0.770825
8,0.8236,0.657311,0.7768,0.778321,0.776531,0.776452


[I 2025-04-06 02:59:17,152] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.002034473844870723, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0254,0.714571,0.7524,0.756182,0.752288,0.750732
2,0.9019,0.694477,0.7628,0.768667,0.762718,0.763448
3,0.8814,0.679268,0.7666,0.770513,0.766349,0.764294
4,0.863,0.661653,0.7746,0.776744,0.774198,0.773144
5,0.8547,0.667212,0.7734,0.77484,0.773076,0.770968
6,0.8471,0.688926,0.7643,0.778515,0.763952,0.766896
7,0.8383,0.663433,0.772,0.774702,0.772161,0.771608
8,0.8325,0.655205,0.7763,0.777774,0.776059,0.77578


[I 2025-04-06 03:10:19,360] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0026868566033176914, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0135,0.720928,0.7507,0.755712,0.75043,0.748933
2,0.9188,0.714395,0.7556,0.769709,0.755639,0.756871


[I 2025-04-06 03:13:07,564] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0012889975483555737, 'weight_decay': 0.004, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0435,0.725503,0.7517,0.757121,0.751387,0.75097
2,0.8871,0.689497,0.7675,0.771759,0.767239,0.767219
3,0.8642,0.679563,0.7643,0.766167,0.764347,0.76217
4,0.8482,0.659565,0.7752,0.77815,0.775005,0.773755
5,0.84,0.662976,0.773,0.774527,0.772482,0.77104
6,0.8337,0.685068,0.7628,0.77469,0.762436,0.765207
7,0.8282,0.670214,0.7711,0.776537,0.77129,0.771021
8,0.8243,0.656691,0.7772,0.778675,0.776944,0.776836


[I 2025-04-06 03:24:20,937] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 7.828712010044815e-05, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7064,1.177606,0.6943,0.692611,0.69337,0.69206
2,1.2113,0.950784,0.7232,0.724438,0.722851,0.722738


[I 2025-04-06 03:27:06,687] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0032812185234454374, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0307,0.743352,0.7429,0.751778,0.742582,0.740702
2,0.9367,0.727186,0.7504,0.770939,0.750369,0.752684
3,0.9195,0.695885,0.7614,0.767885,0.760756,0.759078
4,0.8961,0.685683,0.7672,0.771827,0.766417,0.764035


[I 2025-04-06 03:32:42,452] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0009743336791517001, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0707,0.735641,0.7499,0.754555,0.749485,0.748954
2,0.8868,0.694141,0.7665,0.769886,0.766306,0.765621
3,0.861,0.683323,0.7637,0.764723,0.763791,0.76125
4,0.8453,0.665515,0.7725,0.777426,0.772406,0.771104


[I 2025-04-06 03:38:22,197] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0007902499286334201, 'weight_decay': 0.008, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.107,0.744983,0.7477,0.751117,0.747265,0.746422
2,0.891,0.698657,0.7637,0.76662,0.763535,0.762466
3,0.8621,0.6854,0.7644,0.764937,0.764482,0.761874
4,0.8456,0.669263,0.7715,0.777248,0.771425,0.770298
5,0.8361,0.664009,0.7722,0.773466,0.771863,0.770674
6,0.8301,0.683302,0.7626,0.773063,0.762359,0.764921
7,0.8259,0.671562,0.7708,0.774861,0.770822,0.770595
8,0.8229,0.661327,0.7747,0.776095,0.774402,0.774255


[I 2025-04-06 03:49:41,967] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0009275569639523581, 'weight_decay': 0.004, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0876,0.738096,0.7487,0.753108,0.748279,0.747673
2,0.8878,0.695269,0.7659,0.769096,0.76572,0.764924
3,0.8611,0.683998,0.763,0.763936,0.763098,0.760469
4,0.8452,0.666439,0.7722,0.777251,0.772115,0.770825


[I 2025-04-06 03:55:13,141] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.001372320291131418, 'weight_decay': 0.003, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0415,0.723655,0.7513,0.756625,0.751012,0.750471
2,0.8882,0.68868,0.7667,0.770839,0.766444,0.76648
3,0.8657,0.679543,0.7657,0.768265,0.765698,0.763701
4,0.8495,0.658657,0.7761,0.778713,0.77587,0.774677
5,0.8414,0.663319,0.7738,0.77539,0.773289,0.77179
6,0.8349,0.685617,0.7631,0.775608,0.762723,0.765596
7,0.8291,0.669902,0.7714,0.77678,0.771603,0.771286
8,0.8251,0.656262,0.7776,0.779037,0.777335,0.777215
9,0.8154,0.665501,0.7727,0.776185,0.772123,0.773391
10,0.8118,0.654233,0.7739,0.77636,0.773795,0.7731


[I 2025-04-06 04:09:16,509] Trial 25 finished with value: 0.7730996408345948 and parameters: {'learning_rate': 0.001372320291131418, 'weight_decay': 0.003, 'warmup_steps': 21}. Best is trial 25 with value: 0.7730996408345948.


Trial 26 with params: {'learning_rate': 0.001070312430949223, 'weight_decay': 0.005, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0491,0.731594,0.7506,0.75574,0.750212,0.749811
2,0.8859,0.692261,0.7667,0.770661,0.766461,0.766092
3,0.8613,0.681872,0.7638,0.765032,0.763873,0.761452
4,0.8457,0.663564,0.772,0.776312,0.77187,0.770471


[I 2025-04-06 04:14:49,819] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0007604397100532161, 'weight_decay': 0.001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0992,0.746189,0.7472,0.750327,0.746757,0.745811
2,0.8916,0.699302,0.7639,0.766756,0.763746,0.762665
3,0.8623,0.685563,0.764,0.764458,0.764088,0.761494
4,0.8457,0.669896,0.7715,0.777651,0.771414,0.770392
5,0.8362,0.66438,0.7727,0.773948,0.772363,0.77116
6,0.8302,0.683259,0.7632,0.773432,0.762954,0.765457
7,0.826,0.671812,0.7709,0.775001,0.770918,0.770729
8,0.8231,0.661826,0.7741,0.775392,0.7738,0.773634


[I 2025-04-06 04:26:05,144] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.0012536609613406392, 'weight_decay': 0.004, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0456,0.726466,0.7516,0.757107,0.751292,0.750899
2,0.8868,0.689889,0.7673,0.771595,0.767033,0.76698
3,0.8637,0.679735,0.7645,0.766006,0.764558,0.762347
4,0.8478,0.660091,0.7751,0.778244,0.774918,0.773614
5,0.8394,0.662822,0.7732,0.774691,0.772689,0.771269
6,0.8332,0.684832,0.763,0.77466,0.762639,0.765355
7,0.8279,0.670308,0.7709,0.776376,0.771087,0.770819
8,0.824,0.656893,0.7773,0.778766,0.777048,0.776946
9,0.8149,0.665836,0.7728,0.776303,0.772227,0.773519
10,0.8117,0.654817,0.7741,0.776522,0.773994,0.773291


[I 2025-04-06 04:40:12,750] Trial 28 finished with value: 0.773290513712457 and parameters: {'learning_rate': 0.0012536609613406392, 'weight_decay': 0.004, 'warmup_steps': 19}. Best is trial 28 with value: 0.773290513712457.


Trial 29 with params: {'learning_rate': 0.0015941853235222011, 'weight_decay': 0.002, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.037,0.720242,0.7507,0.755565,0.750574,0.7495
2,0.892,0.687743,0.7674,0.771643,0.767236,0.767478
3,0.8704,0.680703,0.766,0.769988,0.765878,0.764033
4,0.8534,0.658599,0.7742,0.776155,0.773929,0.772796
5,0.8455,0.664201,0.774,0.775578,0.773511,0.771737
6,0.8387,0.687218,0.7637,0.77742,0.763329,0.766367
7,0.8319,0.668227,0.7711,0.775734,0.771322,0.770846
8,0.8274,0.655459,0.7781,0.779657,0.777822,0.777707
9,0.8166,0.665275,0.7721,0.775703,0.771522,0.772803
10,0.8123,0.653481,0.7747,0.777104,0.774587,0.773921


[I 2025-04-06 04:54:17,813] Trial 29 finished with value: 0.7739209056911325 and parameters: {'learning_rate': 0.0015941853235222011, 'weight_decay': 0.002, 'warmup_steps': 25}. Best is trial 29 with value: 0.7739209056911325.


Trial 30 with params: {'learning_rate': 0.0014033968012030652, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0423,0.723165,0.751,0.756474,0.750741,0.750177
2,0.8887,0.688462,0.7674,0.771616,0.767139,0.767265
3,0.8664,0.679657,0.7658,0.768689,0.765779,0.763834
4,0.85,0.658417,0.7754,0.777816,0.775164,0.773993
5,0.8419,0.663457,0.7737,0.775242,0.773183,0.771602
6,0.8354,0.685847,0.7629,0.775673,0.762519,0.765458
7,0.8295,0.669735,0.7716,0.776764,0.771811,0.771436
8,0.8254,0.656121,0.7778,0.77927,0.777536,0.777418
9,0.8155,0.665446,0.773,0.776563,0.772413,0.773704
10,0.8118,0.654093,0.7739,0.776353,0.773795,0.773108


[I 2025-04-06 05:08:02,131] Trial 30 finished with value: 0.7731075878801307 and parameters: {'learning_rate': 0.0014033968012030652, 'weight_decay': 0.002, 'warmup_steps': 23}. Best is trial 29 with value: 0.7739209056911325.


Trial 31 with params: {'learning_rate': 0.0011226628713869803, 'weight_decay': 0.003, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.057,0.730562,0.751,0.756265,0.75063,0.750271
2,0.8862,0.69169,0.7666,0.770602,0.766349,0.766074
3,0.862,0.681154,0.7635,0.764793,0.76359,0.761228
4,0.8463,0.662555,0.7728,0.776602,0.772656,0.771226
5,0.8376,0.662354,0.7736,0.775156,0.773097,0.771873
6,0.8316,0.684203,0.7642,0.7754,0.763872,0.766537
7,0.8267,0.670493,0.7708,0.77612,0.770973,0.770713
8,0.8231,0.657745,0.7766,0.778103,0.776325,0.77623


[I 2025-04-06 05:19:07,896] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0019546634304671556, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0265,0.715155,0.7518,0.755707,0.751705,0.750135
2,0.8999,0.692309,0.7639,0.769154,0.763787,0.764456
3,0.8793,0.679889,0.7662,0.770291,0.765985,0.763975
4,0.8611,0.661167,0.775,0.777062,0.774601,0.773577
5,0.853,0.666588,0.7735,0.774878,0.773135,0.771083
6,0.8455,0.688696,0.7637,0.777985,0.763369,0.766406
7,0.837,0.66427,0.7726,0.775542,0.77281,0.772158
8,0.8315,0.65512,0.7769,0.77832,0.776659,0.776396
9,0.8191,0.665609,0.7707,0.774696,0.770083,0.771448
10,0.8135,0.652942,0.7751,0.777543,0.774995,0.774354


[I 2025-04-06 05:33:00,286] Trial 32 finished with value: 0.7743540775172536 and parameters: {'learning_rate': 0.0019546634304671556, 'weight_decay': 0.002, 'warmup_steps': 23}. Best is trial 32 with value: 0.7743540775172536.


Trial 33 with params: {'learning_rate': 0.002876741995359615, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0269,0.72732,0.749,0.755418,0.748748,0.747183
2,0.9244,0.719061,0.7522,0.768611,0.752226,0.753824
3,0.9066,0.684176,0.767,0.770734,0.766514,0.764604
4,0.8849,0.676048,0.7689,0.772268,0.768149,0.766022


[I 2025-04-06 05:38:27,792] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0010170469716486766, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0687,0.734251,0.7507,0.755648,0.750284,0.749853
2,0.8865,0.693411,0.7669,0.770585,0.766673,0.766125
3,0.8612,0.68274,0.7637,0.764745,0.763785,0.761271
4,0.8455,0.664671,0.7731,0.777775,0.772982,0.771658
5,0.8366,0.662273,0.774,0.775436,0.77353,0.772376
6,0.8306,0.683815,0.7634,0.77459,0.763095,0.765745
7,0.826,0.670597,0.7705,0.775544,0.77062,0.770406
8,0.8227,0.658607,0.7768,0.778284,0.776521,0.776405
9,0.8147,0.667124,0.7724,0.77613,0.771827,0.773224
10,0.8121,0.656647,0.7742,0.776717,0.774112,0.773353


[I 2025-04-06 05:52:39,301] Trial 34 finished with value: 0.7733528235968147 and parameters: {'learning_rate': 0.0010170469716486766, 'weight_decay': 0.001, 'warmup_steps': 23}. Best is trial 32 with value: 0.7743540775172536.


Trial 35 with params: {'learning_rate': 0.0006921418499820603, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1161,0.751631,0.7456,0.748596,0.745164,0.744149
2,0.8948,0.701781,0.7625,0.765124,0.762345,0.761259
3,0.8639,0.686498,0.7639,0.764254,0.763966,0.761447
4,0.8467,0.671672,0.7698,0.776276,0.769698,0.768809


[I 2025-04-06 05:58:21,704] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.004049761177508626, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0312,0.77349,0.737,0.751003,0.736517,0.734651
2,0.9631,0.733134,0.7497,0.77033,0.749724,0.751499


[I 2025-04-06 06:01:10,761] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0014386007442145413, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0304,0.722026,0.7509,0.756206,0.750668,0.749999
2,0.8889,0.688057,0.767,0.771125,0.76675,0.76686
3,0.8669,0.679734,0.765,0.768111,0.764982,0.763017
4,0.8505,0.658218,0.7755,0.777712,0.775264,0.774032
5,0.8425,0.663516,0.7739,0.775453,0.773389,0.771753
6,0.836,0.68604,0.7628,0.775796,0.762408,0.765367
7,0.8299,0.669495,0.7714,0.776522,0.771617,0.771215
8,0.8257,0.655958,0.778,0.779463,0.777735,0.777606
9,0.8157,0.665366,0.7731,0.776738,0.772511,0.773827
10,0.8119,0.653946,0.7745,0.776934,0.774395,0.773708


[I 2025-04-06 06:15:07,816] Trial 37 finished with value: 0.7737075171271725 and parameters: {'learning_rate': 0.0014386007442145413, 'weight_decay': 0.0, 'warmup_steps': 14}. Best is trial 32 with value: 0.7743540775172536.


Trial 38 with params: {'learning_rate': 0.0008236019566005199, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0778,0.741601,0.7489,0.75249,0.748461,0.747681
2,0.8892,0.697264,0.7644,0.767396,0.764266,0.763208
3,0.8613,0.684841,0.7645,0.765212,0.764589,0.761978
4,0.8452,0.668425,0.7725,0.778212,0.772438,0.771275
5,0.8358,0.663509,0.7725,0.773923,0.77214,0.770992
6,0.8299,0.683217,0.7625,0.772947,0.762242,0.764812
7,0.8257,0.671216,0.7704,0.774568,0.770434,0.770195
8,0.8227,0.660815,0.7759,0.777333,0.775613,0.775478


[I 2025-04-06 06:26:23,586] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.841,1.331052,0.6774,0.676049,0.676392,0.675063
2,1.3245,1.047421,0.7141,0.715675,0.713599,0.713572


[I 2025-04-06 06:29:09,494] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0021663411768748316, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0125,0.713842,0.7515,0.755471,0.751347,0.749825
2,0.9049,0.698431,0.7622,0.769427,0.762154,0.762987
3,0.8849,0.678407,0.7671,0.770555,0.766809,0.764729
4,0.8661,0.662464,0.7745,0.776391,0.774068,0.772924
5,0.8574,0.668138,0.7725,0.774052,0.772206,0.770067
6,0.8497,0.689193,0.7638,0.777656,0.763442,0.766219
7,0.8403,0.662245,0.772,0.774486,0.772133,0.771652
8,0.8341,0.65538,0.7759,0.777471,0.775682,0.775363


[I 2025-04-06 06:40:31,115] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7796,1.266245,0.6858,0.684009,0.684827,0.683441
2,1.2774,1.007153,0.7172,0.71857,0.716743,0.716727


[I 2025-04-06 06:43:19,280] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0016868185626433263, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0234,0.718446,0.7515,0.756021,0.751398,0.750144
2,0.8936,0.687937,0.7663,0.770735,0.766152,0.766542
3,0.8724,0.680909,0.7652,0.769334,0.765045,0.763099
4,0.8551,0.659142,0.7739,0.775907,0.77361,0.772521
5,0.8473,0.664638,0.7744,0.775755,0.773941,0.772076
6,0.8403,0.687657,0.7631,0.777209,0.762742,0.765846
7,0.8331,0.667262,0.771,0.775252,0.771215,0.770704
8,0.8283,0.65527,0.7781,0.77957,0.777827,0.777672
9,0.8172,0.665263,0.7715,0.775231,0.770915,0.772218
10,0.8125,0.653272,0.7746,0.777059,0.774496,0.773827


[I 2025-04-06 06:57:29,087] Trial 42 finished with value: 0.7738266436216666 and parameters: {'learning_rate': 0.0016868185626433263, 'weight_decay': 0.0, 'warmup_steps': 15}. Best is trial 32 with value: 0.7743540775172536.


Trial 43 with params: {'learning_rate': 0.0031207102109740248, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0204,0.735974,0.7463,0.753982,0.746014,0.744252
2,0.9314,0.724127,0.7507,0.770024,0.750685,0.752852


[I 2025-04-06 07:00:16,977] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0011469303032835437, 'weight_decay': 0.001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0478,0.729375,0.7508,0.756007,0.750437,0.750055
2,0.8861,0.691227,0.7671,0.77119,0.766849,0.766599
3,0.8622,0.680787,0.7639,0.76526,0.763986,0.761646
4,0.8465,0.662056,0.7731,0.776778,0.772943,0.771496
5,0.8379,0.662391,0.7731,0.774607,0.772597,0.771322
6,0.8318,0.684261,0.7641,0.775547,0.763773,0.766498
7,0.8269,0.670456,0.7708,0.776229,0.770978,0.770722
8,0.8233,0.65757,0.7765,0.778037,0.776227,0.776145


[I 2025-04-06 07:11:36,390] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.001857488415426479, 'weight_decay': 0.001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0291,0.716331,0.7524,0.756631,0.752297,0.75081
2,0.8977,0.690189,0.7638,0.768686,0.763659,0.76428
3,0.8768,0.680547,0.7656,0.769902,0.765426,0.7634
4,0.8589,0.660557,0.7744,0.776496,0.774032,0.772984
5,0.8509,0.665822,0.7735,0.774936,0.773087,0.771156
6,0.8436,0.688396,0.7638,0.77811,0.763464,0.766567
7,0.8356,0.665341,0.7723,0.775552,0.772514,0.771903
8,0.8304,0.655107,0.7776,0.779047,0.77734,0.777131
9,0.8184,0.665445,0.7709,0.774709,0.770301,0.771608
10,0.8131,0.653031,0.7753,0.777688,0.775188,0.774534


[I 2025-04-06 07:25:21,282] Trial 45 finished with value: 0.7745343349809349 and parameters: {'learning_rate': 0.001857488415426479, 'weight_decay': 0.001, 'warmup_steps': 24}. Best is trial 45 with value: 0.7745343349809349.


Trial 46 with params: {'learning_rate': 0.0012237374268108511, 'weight_decay': 0.0, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0402,0.72692,0.7518,0.757148,0.751493,0.751102
2,0.8864,0.690167,0.7675,0.771766,0.76722,0.767128
3,0.8631,0.679919,0.7641,0.765485,0.764167,0.76193
4,0.8473,0.660585,0.7741,0.777308,0.773928,0.772529
5,0.8389,0.66266,0.7734,0.774911,0.772901,0.771527
6,0.8328,0.684624,0.7633,0.774815,0.762946,0.76564
7,0.8275,0.670355,0.7708,0.776284,0.770984,0.77072
8,0.8238,0.657052,0.777,0.778498,0.776743,0.776653
9,0.8148,0.665925,0.7728,0.776309,0.772224,0.773526
10,0.8117,0.654981,0.7743,0.776657,0.774192,0.773476


[I 2025-04-06 07:39:21,111] Trial 46 finished with value: 0.7734759162859809 and parameters: {'learning_rate': 0.0012237374268108511, 'weight_decay': 0.0, 'warmup_steps': 13}. Best is trial 45 with value: 0.7745343349809349.


Trial 47 with params: {'learning_rate': 0.0019190018767284457, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.028,0.715582,0.7515,0.755537,0.751405,0.749874
2,0.8991,0.691485,0.7638,0.768999,0.763678,0.764362
3,0.8784,0.680153,0.7658,0.769946,0.765607,0.763552
4,0.8603,0.660976,0.7743,0.776364,0.773926,0.772881
5,0.8522,0.666305,0.7737,0.775124,0.77332,0.771322
6,0.8448,0.68858,0.7635,0.777909,0.76317,0.766264
7,0.8365,0.66464,0.7728,0.775825,0.772999,0.772381
8,0.8311,0.655103,0.7773,0.778737,0.777049,0.776818
9,0.8188,0.665555,0.7708,0.774772,0.770188,0.771544
10,0.8133,0.652972,0.7752,0.777641,0.77509,0.774464


[I 2025-04-06 07:53:27,425] Trial 47 finished with value: 0.774463534432444 and parameters: {'learning_rate': 0.0019190018767284457, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 45 with value: 0.7745343349809349.


Trial 48 with params: {'learning_rate': 0.0027539633997353424, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.029,0.723585,0.7496,0.755264,0.749345,0.747874
2,0.921,0.716214,0.7541,0.769034,0.754143,0.755475


[I 2025-04-06 07:56:17,320] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.001948584409848943, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0296,0.715372,0.7518,0.755764,0.75171,0.750164
2,0.8999,0.692225,0.7637,0.768946,0.76359,0.764255
3,0.8792,0.679934,0.7662,0.770263,0.765995,0.763977
4,0.861,0.66116,0.7749,0.776941,0.774509,0.773481
5,0.8529,0.666539,0.7735,0.77487,0.773127,0.771108
6,0.8454,0.688693,0.7637,0.778013,0.763366,0.766413
7,0.837,0.664322,0.7726,0.775542,0.77281,0.772158
8,0.8315,0.655141,0.7769,0.778343,0.776658,0.776399


[I 2025-04-06 08:07:31,958] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0025480346567739926, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0242,0.718231,0.7512,0.75604,0.750943,0.749527
2,0.9153,0.71075,0.7568,0.769133,0.75686,0.757857
3,0.8965,0.678553,0.7669,0.769944,0.766487,0.76451
4,0.876,0.668074,0.7712,0.773252,0.770578,0.768924


[I 2025-04-06 08:13:06,784] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.001575767139294298, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.031,0.720206,0.7512,0.756202,0.751059,0.75006
2,0.8915,0.687658,0.7674,0.771711,0.767233,0.767473
3,0.8699,0.680557,0.7657,0.769645,0.76558,0.763717
4,0.853,0.658446,0.7745,0.776498,0.774236,0.773107
5,0.8451,0.664076,0.7737,0.775251,0.773211,0.771414
6,0.8383,0.687074,0.7634,0.777151,0.763026,0.766086
7,0.8316,0.668399,0.7713,0.775999,0.771523,0.771047
8,0.8271,0.655506,0.7783,0.779858,0.778024,0.777917
9,0.8165,0.665269,0.7723,0.775893,0.771719,0.773002
10,0.8122,0.653522,0.7748,0.777219,0.774692,0.774028


[I 2025-04-06 08:27:18,733] Trial 51 finished with value: 0.7740279549146425 and parameters: {'learning_rate': 0.001575767139294298, 'weight_decay': 0.001, 'warmup_steps': 19}. Best is trial 45 with value: 0.7745343349809349.


Trial 52 with params: {'learning_rate': 0.002770702226298265, 'weight_decay': 0.001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0213,0.72368,0.75,0.755621,0.749746,0.748238
2,0.9212,0.716524,0.7539,0.769084,0.753947,0.755296
3,0.9032,0.681816,0.7675,0.770835,0.767052,0.765092
4,0.8819,0.673216,0.7695,0.772111,0.768786,0.766779


[I 2025-04-06 08:32:54,959] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0011814726706411407, 'weight_decay': 0.001, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0619,0.729189,0.7516,0.756953,0.751269,0.750897
2,0.8867,0.691031,0.767,0.771191,0.766742,0.766562
3,0.8628,0.680435,0.7641,0.76547,0.764172,0.761913
4,0.847,0.661405,0.7741,0.777605,0.773931,0.772524
5,0.8385,0.662595,0.7726,0.774116,0.772082,0.770767
6,0.8323,0.684506,0.7634,0.774853,0.763058,0.765779
7,0.8272,0.670457,0.7709,0.776399,0.771087,0.770825
8,0.8235,0.657325,0.7769,0.778415,0.77663,0.776548


[I 2025-04-06 08:44:02,814] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0013077995139628242, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.046,0.725212,0.7516,0.756979,0.75128,0.750849
2,0.8874,0.689336,0.7671,0.771369,0.766833,0.766861
3,0.8646,0.679528,0.7647,0.766698,0.764729,0.76259
4,0.8485,0.659319,0.7753,0.778249,0.775106,0.773876
5,0.8403,0.66307,0.7736,0.775148,0.773081,0.771654
6,0.834,0.685172,0.7627,0.774831,0.762334,0.765171
7,0.8284,0.670161,0.7712,0.776636,0.771394,0.771123
8,0.8245,0.656578,0.7771,0.778576,0.776839,0.776734
9,0.8151,0.665671,0.7726,0.776083,0.772022,0.773297
10,0.8117,0.654529,0.7738,0.776199,0.773698,0.772976


[I 2025-04-06 08:58:00,936] Trial 54 finished with value: 0.7729758691156943 and parameters: {'learning_rate': 0.0013077995139628242, 'weight_decay': 0.0, 'warmup_steps': 22}. Best is trial 45 with value: 0.7745343349809349.


Trial 55 with params: {'learning_rate': 0.0018148626158627419, 'weight_decay': 0.002, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.031,0.716946,0.752,0.756417,0.751904,0.750512
2,0.8967,0.689463,0.765,0.769754,0.764859,0.765429
3,0.8757,0.680763,0.7654,0.769642,0.765233,0.763201
4,0.858,0.660248,0.7741,0.77623,0.773751,0.772722
5,0.85,0.665535,0.7737,0.775026,0.773262,0.771371
6,0.8428,0.688291,0.7639,0.778242,0.763556,0.766701
7,0.835,0.66581,0.772,0.775514,0.772205,0.771633
8,0.8299,0.655118,0.7778,0.779259,0.777529,0.777362
9,0.8181,0.665404,0.7712,0.775017,0.770612,0.771925
10,0.8129,0.653081,0.7753,0.77775,0.775188,0.774532


[I 2025-04-06 09:12:06,341] Trial 55 finished with value: 0.7745322702279556 and parameters: {'learning_rate': 0.0018148626158627419, 'weight_decay': 0.002, 'warmup_steps': 25}. Best is trial 45 with value: 0.7745343349809349.


Trial 56 with params: {'learning_rate': 0.003404128178808177, 'weight_decay': 0.003, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0368,0.749223,0.7414,0.751257,0.741028,0.739046
2,0.9409,0.729311,0.7501,0.771824,0.750057,0.752497
3,0.9236,0.699818,0.7607,0.768148,0.760016,0.758352
4,0.8998,0.687964,0.7671,0.771881,0.766346,0.76385


[I 2025-04-06 09:17:39,469] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.001995564173285082, 'weight_decay': 0.002, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0279,0.714923,0.7521,0.755934,0.752004,0.750436
2,0.901,0.693438,0.7631,0.768669,0.763011,0.763698
3,0.8804,0.679584,0.7665,0.77048,0.766276,0.764245
4,0.8621,0.661441,0.7746,0.776756,0.774201,0.773154
5,0.8539,0.6669,0.7739,0.775309,0.77356,0.771467
6,0.8464,0.688814,0.7639,0.778202,0.76356,0.766547
7,0.8377,0.663844,0.7725,0.775387,0.772697,0.7721
8,0.832,0.65516,0.7765,0.777972,0.776252,0.775987


[I 2025-04-06 09:28:49,194] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00021771047684957567, 'weight_decay': 0.01, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3722,0.874197,0.7273,0.727197,0.726609,0.725769
2,0.9887,0.774624,0.7482,0.749706,0.747926,0.747481


[I 2025-04-06 09:31:37,878] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0014334657492170587, 'weight_decay': 0.002, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0455,0.722841,0.7504,0.75588,0.750168,0.749529
2,0.8893,0.688329,0.7673,0.771361,0.767055,0.767154
3,0.867,0.679833,0.7649,0.767983,0.764889,0.762922
4,0.8505,0.658307,0.7756,0.777833,0.775355,0.774132
5,0.8425,0.663571,0.7737,0.775295,0.773187,0.771555
6,0.836,0.686093,0.7626,0.775645,0.76221,0.765192
7,0.8299,0.669567,0.7715,0.776629,0.771713,0.771326
8,0.8257,0.656003,0.778,0.779476,0.777735,0.77761
9,0.8157,0.665403,0.7731,0.776706,0.772511,0.773815
10,0.8119,0.653986,0.7741,0.776538,0.773996,0.7733


[I 2025-04-06 09:45:26,950] Trial 59 finished with value: 0.7732995035888625 and parameters: {'learning_rate': 0.0014334657492170587, 'weight_decay': 0.002, 'warmup_steps': 27}. Best is trial 45 with value: 0.7745343349809349.


Trial 60 with params: {'learning_rate': 0.00018265618026664144, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4127,0.907209,0.7238,0.723795,0.72301,0.722227
2,1.0138,0.793913,0.7447,0.746123,0.74439,0.744059


[I 2025-04-06 09:48:17,736] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.002018174566413115, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0216,0.714501,0.7522,0.756047,0.752098,0.750546
2,0.9014,0.693943,0.7632,0.768873,0.76312,0.763811
3,0.8809,0.679379,0.7665,0.77042,0.766262,0.76422
4,0.8626,0.661523,0.7745,0.776629,0.774098,0.773045
5,0.8543,0.667036,0.7737,0.775079,0.773372,0.771261
6,0.8468,0.688849,0.7643,0.778526,0.763955,0.766904
7,0.838,0.663609,0.7723,0.775113,0.772481,0.77191
8,0.8323,0.655171,0.7765,0.777959,0.776257,0.775977


[I 2025-04-06 09:59:36,443] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0020036258431615902, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0248,0.714714,0.7519,0.75569,0.751799,0.750223
2,0.9011,0.69359,0.7632,0.768804,0.763116,0.763794
3,0.8806,0.679514,0.7665,0.770459,0.766266,0.764224
4,0.8622,0.661481,0.7745,0.77662,0.774102,0.773033
5,0.854,0.666934,0.7738,0.77519,0.773464,0.77136
6,0.8465,0.688816,0.7642,0.778396,0.763861,0.766809
7,0.8378,0.663751,0.7723,0.77518,0.77249,0.77191
8,0.8321,0.65516,0.7766,0.778056,0.776354,0.776081


[I 2025-04-06 10:10:59,253] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0036945481019053385, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0384,0.76229,0.7379,0.750289,0.737504,0.735588
2,0.9508,0.732752,0.75,0.772392,0.749934,0.752453
3,0.9329,0.708099,0.759,0.768728,0.758238,0.756546
4,0.9082,0.691645,0.7659,0.770451,0.765201,0.762679


[I 2025-04-06 10:16:37,846] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0014611665515794692, 'weight_decay': 0.001, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0432,0.722307,0.7504,0.755803,0.750196,0.749464
2,0.8897,0.688135,0.7671,0.771197,0.766873,0.766992
3,0.8676,0.679977,0.7653,0.768683,0.765241,0.76336
4,0.851,0.658244,0.7762,0.778437,0.775958,0.774775
5,0.843,0.663673,0.7737,0.775336,0.773191,0.771551
6,0.8364,0.68631,0.7628,0.775906,0.762405,0.765375
7,0.8302,0.669361,0.771,0.776115,0.771213,0.770816
8,0.826,0.655889,0.7782,0.779665,0.777934,0.777807
9,0.8158,0.665359,0.7729,0.776503,0.772315,0.773617
10,0.8119,0.653893,0.7743,0.77669,0.774191,0.773503


[I 2025-04-06 10:30:45,508] Trial 64 finished with value: 0.7735026910172633 and parameters: {'learning_rate': 0.0014611665515794692, 'weight_decay': 0.001, 'warmup_steps': 26}. Best is trial 45 with value: 0.7745343349809349.


Trial 65 with params: {'learning_rate': 0.00010546468583372021, 'weight_decay': 0.008, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6251,1.068721,0.7048,0.70369,0.703901,0.70283
2,1.1299,0.883317,0.7305,0.731738,0.730139,0.730013
3,1.0213,0.821577,0.7397,0.740869,0.73946,0.738418
4,0.9723,0.762832,0.7521,0.756286,0.75153,0.751322


[I 2025-04-06 10:36:22,039] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004239451106161721, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0593,0.779399,0.7353,0.749604,0.73481,0.733195
2,0.971,0.730734,0.7529,0.770694,0.753,0.754026


[I 2025-04-06 10:39:12,570] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0017467359698663553, 'weight_decay': 0.003, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0315,0.717886,0.752,0.756521,0.751898,0.750578
2,0.8951,0.688561,0.7653,0.769686,0.765173,0.765589
3,0.874,0.68093,0.7656,0.769857,0.765435,0.763435
4,0.8565,0.65973,0.7739,0.775784,0.77359,0.77248
5,0.8486,0.66504,0.774,0.775333,0.773562,0.771716
6,0.8415,0.688017,0.7633,0.777446,0.762946,0.766044
7,0.834,0.666604,0.7717,0.775559,0.771914,0.771354
8,0.8291,0.655191,0.7779,0.779385,0.77763,0.777477
9,0.8176,0.665332,0.7715,0.77528,0.770915,0.772225
10,0.8127,0.653182,0.7748,0.777247,0.774692,0.774019


[I 2025-04-06 10:53:21,823] Trial 67 finished with value: 0.7740188268968129 and parameters: {'learning_rate': 0.0017467359698663553, 'weight_decay': 0.003, 'warmup_steps': 24}. Best is trial 45 with value: 0.7745343349809349.


Trial 68 with params: {'learning_rate': 0.002565160182164868, 'weight_decay': 0.003, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0269,0.718762,0.7507,0.755568,0.750428,0.749044
2,0.9158,0.711274,0.7567,0.76925,0.75676,0.757801
3,0.897,0.678733,0.7671,0.770133,0.766689,0.764712
4,0.8765,0.668493,0.7712,0.773243,0.770573,0.76889


[I 2025-04-06 10:59:00,233] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0014187539722743078, 'weight_decay': 0.003, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0439,0.722993,0.7503,0.755865,0.75006,0.749439
2,0.889,0.688389,0.7675,0.771665,0.767253,0.767354
3,0.8667,0.679748,0.7656,0.768659,0.765589,0.763638
4,0.8503,0.658368,0.7757,0.778011,0.775455,0.774277
5,0.8422,0.663515,0.7738,0.775354,0.773294,0.771677
6,0.8357,0.685979,0.7627,0.775682,0.762317,0.765304
7,0.8297,0.669658,0.7715,0.776642,0.771712,0.771335
8,0.8255,0.656056,0.7781,0.779576,0.777834,0.777718
9,0.8156,0.665434,0.7728,0.776425,0.772212,0.773522
10,0.8119,0.65403,0.774,0.776428,0.773897,0.7732


[I 2025-04-06 11:12:57,270] Trial 69 finished with value: 0.7732004680092858 and parameters: {'learning_rate': 0.0014187539722743078, 'weight_decay': 0.003, 'warmup_steps': 25}. Best is trial 45 with value: 0.7745343349809349.


Trial 70 with params: {'learning_rate': 0.0007232030656500548, 'weight_decay': 0.002, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1109,0.74917,0.7464,0.749445,0.745966,0.745001
2,0.8933,0.700649,0.7628,0.765575,0.762645,0.761583


[I 2025-04-06 11:15:44,647] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.001886198183438793, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0306,0.716047,0.7513,0.755487,0.751198,0.749695
2,0.8984,0.690792,0.7643,0.769304,0.764161,0.764813
3,0.8776,0.680376,0.7652,0.76938,0.76501,0.762952
4,0.8596,0.660753,0.7743,0.776397,0.773934,0.772885
5,0.8515,0.666049,0.7735,0.774956,0.77311,0.771135
6,0.8442,0.688516,0.7636,0.777847,0.76327,0.766343
7,0.836,0.665007,0.7727,0.77585,0.77291,0.772287
8,0.8307,0.655113,0.7777,0.779161,0.777441,0.77723
9,0.8186,0.66549,0.771,0.77487,0.770385,0.771715
10,0.8132,0.652997,0.7753,0.777697,0.775192,0.774543


[I 2025-04-06 11:29:55,137] Trial 71 finished with value: 0.7745430633961783 and parameters: {'learning_rate': 0.001886198183438793, 'weight_decay': 0.004, 'warmup_steps': 26}. Best is trial 71 with value: 0.7745430633961783.


Trial 72 with params: {'learning_rate': 0.001774323505148733, 'weight_decay': 0.005, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0309,0.717497,0.7522,0.756627,0.752099,0.750739
2,0.8958,0.688895,0.7649,0.769441,0.764773,0.765259
3,0.8747,0.680893,0.7654,0.769668,0.765229,0.763201
4,0.8571,0.659928,0.7742,0.776303,0.773873,0.772823
5,0.8492,0.665233,0.7736,0.774878,0.773165,0.771298
6,0.842,0.688113,0.7636,0.777723,0.763251,0.766357
7,0.8344,0.666286,0.7716,0.775317,0.771816,0.771247
8,0.8294,0.655154,0.7779,0.779383,0.777623,0.777471
9,0.8178,0.66536,0.7712,0.774987,0.77062,0.771923
10,0.8128,0.653141,0.775,0.777424,0.774892,0.774229


[I 2025-04-06 11:44:09,041] Trial 72 finished with value: 0.7742290326441539 and parameters: {'learning_rate': 0.001774323505148733, 'weight_decay': 0.005, 'warmup_steps': 24}. Best is trial 71 with value: 0.7745430633961783.


Trial 73 with params: {'learning_rate': 0.0018232042107775785, 'weight_decay': 0.005, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0277,0.716687,0.7519,0.756247,0.751802,0.750377
2,0.8968,0.689549,0.7646,0.769362,0.764459,0.765034
3,0.8759,0.680714,0.7653,0.769545,0.765133,0.763098
4,0.8581,0.660295,0.7742,0.776327,0.773849,0.772812
5,0.8502,0.66557,0.7736,0.774947,0.773167,0.771265
6,0.8429,0.688298,0.7637,0.778024,0.763356,0.766482
7,0.8351,0.665721,0.7721,0.775547,0.772307,0.771716
8,0.83,0.655118,0.7779,0.779372,0.777628,0.777463
9,0.8181,0.6654,0.7713,0.775102,0.770709,0.772013
10,0.813,0.653069,0.7753,0.777718,0.77519,0.774532


[I 2025-04-06 11:58:21,504] Trial 73 finished with value: 0.774531545841616 and parameters: {'learning_rate': 0.0018232042107775785, 'weight_decay': 0.005, 'warmup_steps': 22}. Best is trial 71 with value: 0.7745430633961783.


Trial 74 with params: {'learning_rate': 0.004417007776501617, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0554,0.780444,0.736,0.749769,0.735479,0.734124
2,0.9772,0.729119,0.7545,0.770553,0.754647,0.755319


[I 2025-04-06 12:01:09,512] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0019158666126225944, 'weight_decay': 0.005, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0301,0.715689,0.7515,0.755579,0.751404,0.749879
2,0.8991,0.691459,0.764,0.769202,0.763873,0.764564
3,0.8783,0.680174,0.7656,0.769723,0.765407,0.763342
4,0.8602,0.660944,0.7742,0.776242,0.773829,0.772778
5,0.8522,0.666289,0.7737,0.775124,0.77332,0.771322
6,0.8448,0.688609,0.7637,0.778126,0.763373,0.766471
7,0.8365,0.664672,0.7728,0.775825,0.772999,0.772381
8,0.8311,0.6551,0.7774,0.778835,0.777149,0.776923
9,0.8188,0.665547,0.7709,0.774865,0.770286,0.771641
10,0.8133,0.652982,0.7752,0.777641,0.77509,0.774464


[I 2025-04-06 12:15:19,707] Trial 75 finished with value: 0.774463534432444 and parameters: {'learning_rate': 0.0019158666126225944, 'weight_decay': 0.005, 'warmup_steps': 26}. Best is trial 71 with value: 0.7745430633961783.


Trial 76 with params: {'learning_rate': 0.0019146849461574894, 'weight_decay': 0.005, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0291,0.715672,0.7515,0.755514,0.751405,0.749857
2,0.899,0.691376,0.7639,0.769132,0.763771,0.764471
3,0.8783,0.680189,0.7657,0.76982,0.765507,0.763444
4,0.8602,0.660946,0.7743,0.776393,0.773926,0.772885
5,0.8521,0.666269,0.7737,0.775095,0.77332,0.771314
6,0.8448,0.688591,0.7636,0.778019,0.763275,0.766367
7,0.8364,0.664699,0.7728,0.775825,0.772999,0.772381
8,0.8311,0.655115,0.7773,0.778721,0.777049,0.776813


[I 2025-04-06 12:26:43,929] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0025241058441380043, 'weight_decay': 0.006, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0232,0.717752,0.7518,0.756684,0.751553,0.750183
2,0.9146,0.710025,0.7575,0.769531,0.757559,0.758515
3,0.8957,0.678364,0.7663,0.769399,0.765893,0.763892
4,0.8754,0.667592,0.7718,0.773805,0.771194,0.769607


[I 2025-04-06 12:32:26,421] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00032978721519087825, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2597,0.812237,0.7351,0.735107,0.734472,0.733348
2,0.9414,0.738851,0.7556,0.757782,0.755372,0.754776
3,0.8939,0.713407,0.7575,0.758673,0.757348,0.755574
4,0.8703,0.692238,0.7673,0.775357,0.766894,0.76702


[I 2025-04-06 12:38:02,459] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0013340511922457711, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0526,0.725007,0.751,0.756289,0.750685,0.750181
2,0.888,0.689198,0.7664,0.770638,0.76613,0.766162
3,0.8652,0.679573,0.7648,0.766983,0.764819,0.762753
4,0.849,0.659008,0.7758,0.778533,0.775585,0.774383
5,0.8408,0.663201,0.7731,0.774638,0.772577,0.771107
6,0.8344,0.68539,0.7632,0.77551,0.762821,0.765692
7,0.8287,0.670097,0.7712,0.776532,0.771387,0.771098
8,0.8248,0.656454,0.7771,0.778556,0.77684,0.77672


[I 2025-04-06 12:49:15,404] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.00447310219573135, 'weight_decay': 0.005, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0628,0.780123,0.7361,0.74986,0.735579,0.734314
2,0.9796,0.728472,0.754,0.769007,0.754176,0.754645
3,0.9577,0.723908,0.7554,0.769983,0.754398,0.753536
4,0.9318,0.692928,0.7674,0.770659,0.766773,0.764919


[I 2025-04-06 12:54:57,184] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 7.323713197360346e-05, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7398,1.210469,0.6914,0.689608,0.690441,0.689078
2,1.2348,0.970118,0.7213,0.722507,0.720906,0.720775


[I 2025-04-06 12:57:49,233] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0016152179934449862, 'weight_decay': 0.005, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0341,0.719829,0.7508,0.755657,0.750673,0.749584
2,0.8924,0.687746,0.7674,0.771677,0.767232,0.767509
3,0.8709,0.680791,0.7658,0.769795,0.765664,0.763806
4,0.8538,0.658724,0.7744,0.776409,0.774114,0.773
5,0.8459,0.664298,0.7743,0.775861,0.77382,0.772014
6,0.8391,0.687328,0.7637,0.777404,0.763333,0.766375
7,0.8322,0.668016,0.771,0.775573,0.77122,0.770716
8,0.8276,0.655403,0.7782,0.779716,0.777927,0.777792
9,0.8167,0.665264,0.7717,0.775335,0.771118,0.772405
10,0.8123,0.653426,0.7747,0.777125,0.774588,0.773914


[I 2025-04-06 13:12:01,382] Trial 82 finished with value: 0.7739139977898413 and parameters: {'learning_rate': 0.0016152179934449862, 'weight_decay': 0.005, 'warmup_steps': 23}. Best is trial 71 with value: 0.7745430633961783.


Trial 83 with params: {'learning_rate': 0.0018800432432912505, 'weight_decay': 0.005, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0317,0.716171,0.7516,0.755781,0.751493,0.749995
2,0.8983,0.690704,0.7642,0.769179,0.764062,0.764705
3,0.8774,0.680418,0.7652,0.769489,0.765014,0.762985
4,0.8594,0.660717,0.7743,0.776406,0.773933,0.772887
5,0.8514,0.666033,0.7734,0.774871,0.773005,0.771032
6,0.8441,0.688502,0.7637,0.777954,0.763369,0.766437
7,0.8359,0.665072,0.7725,0.775746,0.772715,0.772112
8,0.8307,0.655124,0.7776,0.779058,0.777343,0.77713
9,0.8186,0.665493,0.7709,0.774761,0.770287,0.771618
10,0.8132,0.65301,0.7753,0.777697,0.775192,0.774543


[I 2025-04-06 13:25:48,143] Trial 83 finished with value: 0.7745430633961783 and parameters: {'learning_rate': 0.0018800432432912505, 'weight_decay': 0.005, 'warmup_steps': 27}. Best is trial 71 with value: 0.7745430633961783.


Trial 84 with params: {'learning_rate': 0.0021321940392215913, 'weight_decay': 0.005, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0293,0.714508,0.7516,0.755374,0.751465,0.749894
2,0.9045,0.697627,0.763,0.769981,0.762958,0.763782
3,0.8842,0.678567,0.7673,0.770842,0.767032,0.764939
4,0.8654,0.662369,0.7742,0.776122,0.773774,0.772651
5,0.8569,0.667993,0.7723,0.773851,0.771991,0.769843
6,0.8492,0.689166,0.7634,0.777343,0.76304,0.765858
7,0.8398,0.662524,0.7723,0.774845,0.772442,0.771945
8,0.8338,0.655357,0.7759,0.777432,0.775672,0.775369


[I 2025-04-06 13:37:01,363] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0029091534136538913, 'weight_decay': 0.008, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0313,0.728691,0.7485,0.754967,0.748226,0.746662
2,0.9255,0.719857,0.7519,0.768717,0.751926,0.753594


[I 2025-04-06 13:39:45,274] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0010658998215611014, 'weight_decay': 0.005, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0629,0.732503,0.7504,0.755569,0.750014,0.749627
2,0.8863,0.692582,0.7664,0.77034,0.766149,0.76576
3,0.8615,0.681987,0.764,0.765137,0.764066,0.761658
4,0.8458,0.663688,0.7724,0.776704,0.772262,0.770865


[I 2025-04-06 13:45:22,511] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0012208677352467372, 'weight_decay': 0.004, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0615,0.728123,0.7514,0.756806,0.751091,0.750723
2,0.887,0.690558,0.7674,0.771641,0.767127,0.766982
3,0.8634,0.680057,0.7645,0.765981,0.76456,0.762365
4,0.8475,0.660663,0.7745,0.777693,0.774318,0.772923
5,0.839,0.662754,0.7731,0.774598,0.772591,0.77121
6,0.8328,0.684726,0.7632,0.774672,0.762844,0.765512
7,0.8276,0.670409,0.7706,0.77601,0.770789,0.770498
8,0.8238,0.65709,0.7768,0.778296,0.776544,0.776449


[I 2025-04-06 13:56:26,541] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0010658746562619715, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0556,0.732101,0.7504,0.755467,0.750012,0.749591
2,0.8861,0.692439,0.7668,0.770736,0.76656,0.766185
3,0.8614,0.681938,0.764,0.765159,0.76407,0.761648
4,0.8458,0.663683,0.7722,0.776569,0.772066,0.770682


[I 2025-04-06 14:02:08,034] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.003414539309470307, 'weight_decay': 0.005, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0291,0.749313,0.7418,0.751609,0.741422,0.739472
2,0.941,0.729345,0.7503,0.77193,0.750259,0.752705
3,0.9237,0.700016,0.7608,0.768268,0.760114,0.758446
4,0.8999,0.688092,0.7669,0.771741,0.766149,0.763652


[I 2025-04-06 14:07:45,941] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0027151418309111987, 'weight_decay': 0.005, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0236,0.722212,0.7499,0.755322,0.749651,0.748203
2,0.9198,0.71523,0.7548,0.769155,0.754843,0.756096
3,0.9016,0.68077,0.7666,0.769778,0.766172,0.764216
4,0.8804,0.671864,0.7698,0.772285,0.76911,0.767153


[I 2025-04-06 14:13:24,068] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0008281307579749255, 'weight_decay': 0.005, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0953,0.742447,0.7486,0.752312,0.748152,0.747411
2,0.8896,0.697513,0.7643,0.767416,0.764159,0.763134


[I 2025-04-06 14:16:16,103] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0023883862599134117, 'weight_decay': 0.005, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0246,0.715757,0.7507,0.754982,0.75047,0.749015
2,0.911,0.705935,0.7595,0.769821,0.759536,0.760447
3,0.8916,0.677744,0.7677,0.770739,0.767321,0.765271
4,0.8718,0.665183,0.7734,0.77526,0.772836,0.771454
5,0.8624,0.669937,0.7727,0.774631,0.77242,0.770316
6,0.8544,0.689756,0.7631,0.77638,0.762741,0.765334
7,0.844,0.660788,0.7734,0.775479,0.773484,0.773085
8,0.8369,0.655995,0.775,0.776591,0.774759,0.774338


[I 2025-04-06 14:27:20,320] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 8.829328561458744e-05, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6739,1.13141,0.6976,0.696063,0.696707,0.695438
2,1.1764,0.921429,0.7271,0.72833,0.726735,0.726653


[I 2025-04-06 14:30:12,575] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0018109200944574327, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0259,0.716797,0.752,0.756413,0.751909,0.750497
2,0.8965,0.689314,0.7646,0.769357,0.764463,0.765032
3,0.8755,0.680749,0.7653,0.76956,0.765133,0.763104
4,0.8578,0.66016,0.7741,0.776193,0.773758,0.772707
5,0.8499,0.665449,0.7736,0.7749,0.773164,0.771258
6,0.8427,0.688206,0.764,0.778259,0.763653,0.766783
7,0.8349,0.665877,0.7722,0.775725,0.772409,0.771825
8,0.8298,0.655132,0.778,0.779461,0.777725,0.777568
9,0.818,0.665372,0.7712,0.775006,0.770612,0.77192
10,0.8129,0.65308,0.7752,0.777627,0.775089,0.774426


[I 2025-04-06 14:44:05,318] Trial 94 finished with value: 0.7744264407517025 and parameters: {'learning_rate': 0.0018109200944574327, 'weight_decay': 0.003, 'warmup_steps': 20}. Best is trial 71 with value: 0.7745430633961783.


Trial 95 with params: {'learning_rate': 0.0014433085079755897, 'weight_decay': 0.004, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0348,0.722166,0.7511,0.756443,0.750882,0.75021
2,0.8891,0.688073,0.7673,0.771383,0.767052,0.767145
3,0.8671,0.679803,0.7648,0.767965,0.764766,0.762822
4,0.8506,0.658247,0.7757,0.777957,0.775461,0.774236
5,0.8426,0.663555,0.7737,0.775308,0.773187,0.771548
6,0.8361,0.686131,0.7627,0.775816,0.762307,0.765283
7,0.83,0.669479,0.7713,0.776409,0.771514,0.771108
8,0.8257,0.655947,0.7781,0.779562,0.777836,0.77771
9,0.8157,0.665367,0.7731,0.776706,0.772511,0.773815
10,0.8119,0.653945,0.7744,0.776811,0.774296,0.773608


[I 2025-04-06 14:58:15,292] Trial 95 finished with value: 0.7736079057607188 and parameters: {'learning_rate': 0.0014433085079755897, 'weight_decay': 0.004, 'warmup_steps': 18}. Best is trial 71 with value: 0.7745430633961783.


Trial 96 with params: {'learning_rate': 0.0016178660421949948, 'weight_decay': 0.006, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0373,0.719937,0.7508,0.755592,0.750683,0.749555
2,0.8925,0.687815,0.7674,0.771684,0.767229,0.7675
3,0.871,0.680817,0.766,0.769994,0.765866,0.763999
4,0.8539,0.65877,0.7746,0.776588,0.774314,0.773209
5,0.846,0.664329,0.7742,0.77575,0.773723,0.771915
6,0.8391,0.68736,0.7635,0.777225,0.763139,0.766171
7,0.8322,0.668003,0.771,0.775573,0.77122,0.770716
8,0.8276,0.655406,0.7782,0.779716,0.777927,0.777792
9,0.8168,0.66527,0.7717,0.775334,0.771118,0.772403
10,0.8123,0.653448,0.7747,0.777125,0.774588,0.773914


[I 2025-04-06 15:12:13,842] Trial 96 finished with value: 0.7739139977898413 and parameters: {'learning_rate': 0.0016178660421949948, 'weight_decay': 0.006, 'warmup_steps': 26}. Best is trial 71 with value: 0.7745430633961783.


Trial 97 with params: {'learning_rate': 0.0024540802589428363, 'weight_decay': 0.004, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.022,0.716521,0.7518,0.756383,0.751558,0.750151
2,0.9127,0.707956,0.7586,0.769598,0.758647,0.759578


[I 2025-04-06 15:15:02,822] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0035054904723296637, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0187,0.752196,0.7404,0.750919,0.740018,0.737957
2,0.9437,0.730618,0.7505,0.772502,0.750449,0.752906


[I 2025-04-06 15:17:51,715] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0026386155044148475, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0273,0.720419,0.7505,0.755475,0.750237,0.748792
2,0.9178,0.713286,0.756,0.769504,0.756047,0.757206
3,0.8993,0.679609,0.7665,0.769541,0.766085,0.764108
4,0.8784,0.670126,0.7713,0.77339,0.770655,0.768831


[I 2025-04-06 15:23:20,547] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0013387157550011364, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.049,0.724721,0.751,0.756237,0.750686,0.750179
2,0.8879,0.689097,0.7664,0.77059,0.766128,0.766165
3,0.8652,0.679559,0.7646,0.76684,0.764618,0.762558
4,0.849,0.658951,0.7759,0.77863,0.775683,0.774492
5,0.8408,0.663203,0.7731,0.774651,0.772579,0.7711
6,0.8345,0.685417,0.7633,0.775601,0.762924,0.765796
7,0.8288,0.670061,0.7712,0.776535,0.771392,0.771098
8,0.8248,0.656424,0.7773,0.778738,0.77704,0.776918
9,0.8152,0.665598,0.7725,0.775952,0.771917,0.773182
10,0.8118,0.654386,0.7737,0.776138,0.773593,0.772888


[I 2025-04-06 15:37:37,857] Trial 100 finished with value: 0.7728877671110191 and parameters: {'learning_rate': 0.0013387157550011364, 'weight_decay': 0.004, 'warmup_steps': 26}. Best is trial 71 with value: 0.7745430633961783.


Trial 101 with params: {'learning_rate': 0.0008993841691795507, 'weight_decay': 0.003, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0733,0.738292,0.7493,0.75335,0.748871,0.748229
2,0.8877,0.695558,0.7657,0.768874,0.76553,0.764645
3,0.8609,0.684179,0.7635,0.764407,0.763601,0.760973
4,0.845,0.666933,0.7726,0.777827,0.772522,0.771258


[I 2025-04-06 15:43:03,823] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.00221232211277178, 'weight_decay': 0.003, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0221,0.714271,0.7509,0.754883,0.750732,0.749215
2,0.9063,0.700092,0.7624,0.770041,0.762367,0.763197


[I 2025-04-06 15:45:52,341] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.001986550845798475, 'weight_decay': 0.002, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.02,0.714662,0.7516,0.755513,0.751499,0.749934
2,0.9005,0.69305,0.7631,0.768629,0.763007,0.763703
3,0.8801,0.679632,0.7665,0.770495,0.766276,0.764249
4,0.8618,0.661283,0.7748,0.776954,0.7744,0.773363
5,0.8536,0.666774,0.7738,0.77519,0.773457,0.771362
6,0.8461,0.688754,0.764,0.778304,0.763662,0.766657
7,0.8375,0.663935,0.7724,0.775327,0.772605,0.771981
8,0.8319,0.655134,0.7764,0.777888,0.776164,0.775895


[I 2025-04-06 15:56:58,481] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 6.119956273045214e-05, 'weight_decay': 0.006, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8169,1.300717,0.6814,0.679644,0.680425,0.678996
2,1.3016,1.027244,0.7155,0.716928,0.71503,0.714977


[I 2025-04-06 15:59:49,255] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0006078662726350267, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1112,0.757957,0.7456,0.747934,0.745138,0.744029
2,0.8992,0.705241,0.7624,0.764928,0.762245,0.761168
3,0.8664,0.688218,0.7629,0.763236,0.762912,0.760591
4,0.8486,0.674197,0.7692,0.77607,0.769056,0.768318


[I 2025-04-06 16:05:27,517] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0011278812824820428, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0578,0.730475,0.751,0.756286,0.750639,0.75028
2,0.8863,0.691645,0.7668,0.770827,0.766546,0.766273
3,0.8621,0.68108,0.7636,0.76489,0.763686,0.761346
4,0.8464,0.662441,0.7732,0.776913,0.773048,0.771604


[I 2025-04-06 16:10:57,647] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.003310309541434644, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.032,0.744708,0.7417,0.750763,0.741356,0.73943
2,0.9377,0.727727,0.7506,0.771458,0.750564,0.752885
3,0.9204,0.696803,0.7615,0.768237,0.760841,0.759164
4,0.897,0.68629,0.7669,0.771519,0.766119,0.763675


[I 2025-04-06 16:16:27,237] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00024201301766163896, 'weight_decay': 0.0, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3474,0.856513,0.7316,0.731424,0.730903,0.729981
2,0.9753,0.764363,0.7492,0.750794,0.748928,0.748433
3,0.917,0.733452,0.7525,0.753507,0.752305,0.750667
4,0.8889,0.704585,0.763,0.770638,0.762477,0.762741


[I 2025-04-06 16:21:56,486] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0011631314317800405, 'weight_decay': 0.002, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0515,0.729175,0.7514,0.756583,0.75105,0.750637
2,0.8863,0.691078,0.7668,0.770951,0.766558,0.766318
3,0.8624,0.680604,0.7641,0.76546,0.764178,0.761863
4,0.8467,0.661727,0.7735,0.777071,0.77334,0.771901


[I 2025-04-06 16:27:20,454] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.00037970811347696283, 'weight_decay': 0.003, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2088,0.795258,0.7391,0.73952,0.738516,0.737438
2,0.9284,0.728889,0.7588,0.761049,0.758565,0.757898


[I 2025-04-06 16:30:05,846] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0016524097301234063, 'weight_decay': 0.002, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0319,0.71923,0.7514,0.756102,0.751288,0.750098
2,0.8931,0.687832,0.7667,0.771014,0.766546,0.766858
3,0.8717,0.680887,0.7658,0.769969,0.765647,0.763744
4,0.8545,0.658961,0.774,0.776009,0.773718,0.77261
5,0.8466,0.664485,0.7745,0.776007,0.774033,0.772211
6,0.8397,0.68755,0.7632,0.77717,0.762849,0.765932
7,0.8327,0.667628,0.771,0.77539,0.771216,0.770737
8,0.828,0.655322,0.7782,0.779701,0.77792,0.777783
9,0.817,0.665264,0.7716,0.775294,0.77102,0.772318
10,0.8124,0.653344,0.7748,0.777218,0.774693,0.77401


[I 2025-04-06 16:43:49,199] Trial 111 finished with value: 0.7740100195687489 and parameters: {'learning_rate': 0.0016524097301234063, 'weight_decay': 0.002, 'warmup_steps': 22}. Best is trial 71 with value: 0.7745430633961783.


Trial 112 with params: {'learning_rate': 0.002128791829494182, 'weight_decay': 0.003, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0293,0.714528,0.7517,0.755532,0.75157,0.750001
2,0.9044,0.697508,0.7627,0.769652,0.762655,0.763483
3,0.8841,0.678593,0.7673,0.770835,0.767032,0.764932
4,0.8653,0.662332,0.7742,0.776122,0.773774,0.772651


[I 2025-04-06 16:49:26,405] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0011444213135820712, 'weight_decay': 0.0, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0624,0.730261,0.7509,0.756156,0.750536,0.750166
2,0.8865,0.691516,0.767,0.771126,0.766756,0.766516
3,0.8623,0.6809,0.7638,0.765118,0.76388,0.761565
4,0.8466,0.662121,0.7731,0.776788,0.772942,0.771493
5,0.838,0.662456,0.7735,0.775035,0.772994,0.771747
6,0.8318,0.684335,0.7639,0.775194,0.76357,0.766253
7,0.8269,0.670507,0.7709,0.776242,0.771079,0.770808
8,0.8233,0.657586,0.7765,0.778003,0.776227,0.776133


[I 2025-04-06 17:00:38,363] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0016833724321484484, 'weight_decay': 0.003, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0343,0.718908,0.7516,0.756314,0.751486,0.750285
2,0.8938,0.688053,0.7667,0.77111,0.766548,0.766931
3,0.8725,0.680961,0.7649,0.769091,0.764745,0.762806
4,0.8552,0.659239,0.7738,0.775847,0.773512,0.772421


[I 2025-04-06 17:06:08,618] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0009298136548997302, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0781,0.737496,0.7489,0.753261,0.74848,0.747877
2,0.8874,0.695065,0.766,0.769154,0.765822,0.765018
3,0.861,0.683893,0.7631,0.76399,0.763207,0.760559
4,0.8451,0.66636,0.7722,0.777264,0.772115,0.770827


[I 2025-04-06 17:11:41,600] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.002569447300867259, 'weight_decay': 0.005, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0321,0.719231,0.751,0.75582,0.750719,0.749313
2,0.9161,0.71151,0.757,0.769623,0.757063,0.758115


[I 2025-04-06 17:14:30,512] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0011427471461749141, 'weight_decay': 0.003, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0625,0.730294,0.7508,0.756084,0.750438,0.75007
2,0.8865,0.691535,0.7672,0.771311,0.76696,0.766707
3,0.8623,0.680923,0.7638,0.765118,0.76388,0.761565
4,0.8466,0.662152,0.773,0.77668,0.772842,0.7714
5,0.8379,0.662433,0.7735,0.775034,0.772994,0.771738
6,0.8318,0.684325,0.764,0.775309,0.763675,0.766359
7,0.8269,0.670501,0.7709,0.776242,0.771079,0.770808
8,0.8233,0.657598,0.7765,0.778003,0.776227,0.776133


[I 2025-04-06 17:25:36,895] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.002045649786210483, 'weight_decay': 0.002, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0243,0.714466,0.7524,0.756183,0.752293,0.750755
2,0.9021,0.694789,0.7626,0.768572,0.762526,0.763261
3,0.8817,0.679176,0.7668,0.770726,0.766549,0.764498
4,0.8633,0.661726,0.7744,0.776501,0.77399,0.772913
5,0.8549,0.667268,0.7733,0.774788,0.772977,0.77088
6,0.8473,0.688919,0.7642,0.778301,0.763846,0.766777
7,0.8384,0.663324,0.7723,0.774964,0.772461,0.77191
8,0.8326,0.655222,0.7766,0.778106,0.776365,0.776082


[I 2025-04-06 17:36:41,465] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0006406713431828303, 'weight_decay': 0.01, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1169,0.755615,0.7458,0.748471,0.745348,0.744268
2,0.8974,0.703832,0.7625,0.764978,0.762354,0.761213


[I 2025-04-06 17:39:25,417] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0016601856874361, 'weight_decay': 0.005, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.036,0.719302,0.7515,0.756304,0.751379,0.750228
2,0.8934,0.687939,0.7668,0.771163,0.766647,0.766984
3,0.872,0.680945,0.7656,0.769763,0.765447,0.76353
4,0.8547,0.65906,0.7744,0.776408,0.774117,0.773019
5,0.8468,0.66455,0.7742,0.775602,0.773744,0.77189
6,0.8399,0.68757,0.7628,0.776826,0.762444,0.765551
7,0.8328,0.66756,0.7712,0.775585,0.771415,0.770934
8,0.8281,0.655312,0.7782,0.779701,0.77792,0.777783
9,0.817,0.665284,0.7716,0.775323,0.77102,0.772331
10,0.8124,0.653346,0.7747,0.777123,0.774595,0.773908


[I 2025-04-06 17:53:20,752] Trial 120 finished with value: 0.7739080727872003 and parameters: {'learning_rate': 0.0016601856874361, 'weight_decay': 0.005, 'warmup_steps': 26}. Best is trial 71 with value: 0.7745430633961783.


Trial 121 with params: {'learning_rate': 0.0015969565009125522, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0325,0.719982,0.7509,0.755819,0.75077,0.749721
2,0.8919,0.687657,0.7674,0.771708,0.767245,0.767492
3,0.8704,0.680672,0.7659,0.769889,0.76578,0.763936
4,0.8534,0.658596,0.774,0.775979,0.773726,0.772598


[I 2025-04-06 17:58:49,801] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.002594936799993447, 'weight_decay': 0.002, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0271,0.719409,0.7509,0.755875,0.750628,0.749234
2,0.9166,0.7121,0.7565,0.769618,0.756556,0.757689
3,0.898,0.679061,0.7668,0.769922,0.766396,0.764442
4,0.8773,0.669141,0.7711,0.77326,0.77046,0.768731


[I 2025-04-06 18:04:24,769] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.002975239040999052, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0255,0.730715,0.7485,0.755361,0.748217,0.746577
2,0.9272,0.721207,0.7522,0.769693,0.752212,0.754075


[I 2025-04-06 18:07:14,573] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0014980106465233168, 'weight_decay': 0.001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0383,0.721548,0.7506,0.755878,0.750411,0.749607
2,0.8902,0.687864,0.7666,0.770719,0.766388,0.766529
3,0.8683,0.68013,0.7649,0.768399,0.764815,0.76291
4,0.8516,0.65824,0.7759,0.77811,0.775638,0.77449
5,0.8436,0.663773,0.774,0.775626,0.773491,0.771798
6,0.837,0.686536,0.7634,0.776675,0.76301,0.766012
7,0.8307,0.66909,0.7708,0.775791,0.771015,0.770603
8,0.8263,0.655751,0.7781,0.779562,0.777821,0.777689
9,0.816,0.66533,0.7726,0.776212,0.772012,0.773311
10,0.812,0.65376,0.7747,0.777091,0.774588,0.773922


[I 2025-04-06 18:21:13,039] Trial 124 finished with value: 0.7739222189785081 and parameters: {'learning_rate': 0.0014980106465233168, 'weight_decay': 0.001, 'warmup_steps': 23}. Best is trial 71 with value: 0.7745430633961783.


Trial 125 with params: {'learning_rate': 0.0012675417463263157, 'weight_decay': 0.003, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0495,0.726315,0.7516,0.757119,0.751285,0.75089
2,0.8871,0.689842,0.7672,0.771454,0.766942,0.766868
3,0.864,0.679684,0.7642,0.765767,0.76425,0.762037
4,0.848,0.659891,0.7749,0.777833,0.774713,0.773417
5,0.8397,0.662906,0.7731,0.774635,0.772584,0.771159
6,0.8334,0.684956,0.7627,0.774534,0.762333,0.7651
7,0.828,0.670282,0.7711,0.776532,0.77129,0.771005
8,0.8242,0.656818,0.7773,0.778753,0.777041,0.776931
9,0.815,0.665799,0.7726,0.776117,0.772034,0.773326
10,0.8117,0.654747,0.7741,0.776522,0.773994,0.773291


[I 2025-04-06 18:35:04,654] Trial 125 finished with value: 0.773290513712457 and parameters: {'learning_rate': 0.0012675417463263157, 'weight_decay': 0.003, 'warmup_steps': 23}. Best is trial 71 with value: 0.7745430633961783.


Trial 126 with params: {'learning_rate': 0.001610525487083141, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0321,0.719819,0.7505,0.755413,0.750375,0.749293
2,0.8922,0.687679,0.7676,0.77187,0.767435,0.767694
3,0.8707,0.680741,0.7657,0.769744,0.765565,0.76371
4,0.8537,0.658668,0.7743,0.776301,0.774011,0.772897
5,0.8458,0.664244,0.7743,0.775855,0.77382,0.772016
6,0.839,0.687275,0.7638,0.777519,0.763431,0.766473
7,0.8321,0.66807,0.7712,0.77577,0.771417,0.77092
8,0.8275,0.655428,0.7781,0.779643,0.777825,0.777695
9,0.8167,0.665262,0.7718,0.775448,0.771223,0.772515
10,0.8123,0.65344,0.7747,0.777125,0.774588,0.773914


[I 2025-04-06 18:48:31,202] Trial 126 finished with value: 0.7739139977898413 and parameters: {'learning_rate': 0.001610525487083141, 'weight_decay': 0.007, 'warmup_steps': 21}. Best is trial 71 with value: 0.7745430633961783.


Trial 127 with params: {'learning_rate': 0.003943445800071269, 'weight_decay': 0.001, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0466,0.772128,0.7355,0.749016,0.735058,0.733171
2,0.9598,0.733247,0.7491,0.770689,0.749084,0.751127


[I 2025-04-06 18:51:16,966] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0016329313518799065, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0347,0.719629,0.7511,0.755965,0.750989,0.749861
2,0.8927,0.687808,0.767,0.771307,0.766831,0.767118
3,0.8713,0.680835,0.7659,0.770107,0.765757,0.763884
4,0.8541,0.658839,0.7743,0.776248,0.774017,0.772908
5,0.8463,0.664377,0.7744,0.775913,0.773924,0.772121
6,0.8394,0.687435,0.7632,0.777027,0.762846,0.765889
7,0.8324,0.667849,0.7709,0.775338,0.771116,0.770615
8,0.8278,0.655366,0.7781,0.779598,0.777822,0.777683
9,0.8169,0.665264,0.7716,0.775271,0.77102,0.772312
10,0.8124,0.653404,0.7748,0.777222,0.774693,0.774006


[I 2025-04-06 19:04:57,078] Trial 128 finished with value: 0.774005955770348 and parameters: {'learning_rate': 0.0016329313518799065, 'weight_decay': 0.0, 'warmup_steps': 24}. Best is trial 71 with value: 0.7745430633961783.


Trial 129 with params: {'learning_rate': 0.002382896461519086, 'weight_decay': 0.004, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0237,0.715641,0.7509,0.755204,0.75068,0.749215
2,0.9109,0.705733,0.7599,0.770034,0.759934,0.760798
3,0.8915,0.677721,0.7674,0.770412,0.767018,0.764972
4,0.8717,0.665103,0.7735,0.775387,0.772938,0.77156
5,0.8623,0.669877,0.7727,0.774655,0.772415,0.770327
6,0.8543,0.689767,0.7631,0.776396,0.762741,0.765336
7,0.8439,0.660826,0.7733,0.775384,0.773385,0.772984
8,0.8369,0.655979,0.7751,0.776647,0.774859,0.774432


[I 2025-04-06 19:15:57,912] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0016859792671156029, 'weight_decay': 0.004, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0288,0.718661,0.7514,0.756052,0.751285,0.750079
2,0.8937,0.687986,0.7665,0.770901,0.766351,0.766733
3,0.8725,0.680931,0.765,0.769163,0.764844,0.762903
4,0.8552,0.659193,0.7739,0.775907,0.77361,0.772521
5,0.8473,0.664633,0.7743,0.775739,0.773842,0.771998
6,0.8403,0.687678,0.7631,0.777209,0.762742,0.765846
7,0.8331,0.667259,0.7711,0.775372,0.771313,0.770812
8,0.8284,0.65527,0.7782,0.779669,0.777925,0.777776
9,0.8172,0.665263,0.7715,0.775231,0.770915,0.772218
10,0.8125,0.653285,0.7747,0.777158,0.774595,0.773931


[I 2025-04-06 19:29:48,969] Trial 130 finished with value: 0.7739313100469557 and parameters: {'learning_rate': 0.0016859792671156029, 'weight_decay': 0.004, 'warmup_steps': 20}. Best is trial 71 with value: 0.7745430633961783.


Trial 131 with params: {'learning_rate': 0.0014335464702994274, 'weight_decay': 0.0, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0432,0.722715,0.7506,0.756069,0.750366,0.749738
2,0.8892,0.688276,0.7674,0.771443,0.767154,0.767249
3,0.867,0.679847,0.7649,0.76798,0.764889,0.762924
4,0.8505,0.658313,0.7757,0.777953,0.775456,0.774239
5,0.8425,0.663561,0.7738,0.775399,0.773284,0.77166
6,0.8359,0.686092,0.7628,0.775865,0.762408,0.765384
7,0.8299,0.66953,0.7715,0.776629,0.771713,0.771326
8,0.8257,0.655981,0.778,0.77947,0.777735,0.777612
9,0.8157,0.665403,0.7731,0.77671,0.772511,0.773817
10,0.8119,0.653982,0.7742,0.776652,0.774096,0.77341


[I 2025-04-06 19:43:44,292] Trial 131 finished with value: 0.7734098913763919 and parameters: {'learning_rate': 0.0014335464702994274, 'weight_decay': 0.0, 'warmup_steps': 25}. Best is trial 71 with value: 0.7745430633961783.


Trial 132 with params: {'learning_rate': 0.0019153257014904671, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0362,0.715995,0.752,0.756155,0.751895,0.75042
2,0.8993,0.691567,0.7639,0.769112,0.763776,0.764464
3,0.8784,0.680232,0.7656,0.769738,0.765407,0.763349
4,0.8603,0.660985,0.7743,0.776331,0.773927,0.772876
5,0.8522,0.666338,0.7735,0.774971,0.773123,0.771149
6,0.8448,0.688625,0.7636,0.778019,0.763275,0.766367
7,0.8365,0.664685,0.7729,0.775906,0.773097,0.772473
8,0.8311,0.655124,0.7776,0.779048,0.777352,0.777122
9,0.8188,0.665555,0.7709,0.774865,0.770286,0.771641
10,0.8133,0.652978,0.7752,0.777641,0.77509,0.774464


[I 2025-04-06 19:57:22,115] Trial 132 finished with value: 0.774463534432444 and parameters: {'learning_rate': 0.0019153257014904671, 'weight_decay': 0.0, 'warmup_steps': 32}. Best is trial 71 with value: 0.7745430633961783.


Trial 133 with params: {'learning_rate': 0.0029148480570998308, 'weight_decay': 0.001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.033,0.729034,0.7486,0.755082,0.748324,0.746761
2,0.9257,0.719972,0.752,0.768891,0.752024,0.753709


[I 2025-04-06 20:00:06,887] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0016035790863720442, 'weight_decay': 0.001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0443,0.720482,0.7504,0.755272,0.75028,0.749173
2,0.8924,0.687894,0.7678,0.772046,0.767626,0.767876
3,0.8708,0.680815,0.7659,0.769878,0.765772,0.763908
4,0.8536,0.658694,0.7745,0.776462,0.774217,0.773098
5,0.8458,0.664303,0.7742,0.775748,0.773718,0.771923
6,0.8389,0.687286,0.7637,0.777396,0.763329,0.766364
7,0.8321,0.668137,0.7712,0.77577,0.771417,0.77093
8,0.8275,0.655437,0.7782,0.779734,0.777926,0.777797
9,0.8167,0.665284,0.7719,0.775514,0.771315,0.7726
10,0.8123,0.65346,0.7746,0.777035,0.77449,0.773818


[I 2025-04-06 20:14:05,819] Trial 134 finished with value: 0.7738184420694365 and parameters: {'learning_rate': 0.0016035790863720442, 'weight_decay': 0.001, 'warmup_steps': 32}. Best is trial 71 with value: 0.7745430633961783.


Trial 135 with params: {'learning_rate': 0.0011832468080808546, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0652,0.729323,0.7515,0.756813,0.751171,0.750787
2,0.8868,0.691061,0.7672,0.771375,0.76694,0.766764
3,0.8629,0.680445,0.764,0.765357,0.764073,0.761813
4,0.8471,0.66137,0.774,0.777457,0.773829,0.772427


[I 2025-04-06 20:19:39,668] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0016986117880025766, 'weight_decay': 0.0, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0391,0.718938,0.7515,0.756196,0.751401,0.750182
2,0.8943,0.68825,0.7661,0.770456,0.765956,0.766342
3,0.8729,0.681014,0.7654,0.769617,0.765251,0.763297
4,0.8555,0.659386,0.7737,0.775727,0.773415,0.772325
5,0.8477,0.664789,0.7742,0.775588,0.773753,0.771901
6,0.8406,0.687793,0.763,0.777181,0.762642,0.76576
7,0.8333,0.667131,0.771,0.775041,0.771205,0.770666
8,0.8285,0.655255,0.7781,0.77957,0.777826,0.777677
9,0.8173,0.665314,0.7715,0.775231,0.770915,0.772218
10,0.8126,0.653251,0.7746,0.777044,0.774496,0.773821


[I 2025-04-06 20:33:24,437] Trial 136 finished with value: 0.7738209833843731 and parameters: {'learning_rate': 0.0016986117880025766, 'weight_decay': 0.0, 'warmup_steps': 30}. Best is trial 71 with value: 0.7745430633961783.


Trial 137 with params: {'learning_rate': 0.001748856701589005, 'weight_decay': 0.004, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0336,0.717968,0.752,0.756534,0.751897,0.750584
2,0.8953,0.68862,0.7652,0.769566,0.765076,0.765483
3,0.8741,0.680944,0.7656,0.769836,0.765435,0.763431
4,0.8565,0.65973,0.774,0.775875,0.773687,0.772575


[I 2025-04-06 20:38:51,992] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0014084368172778302, 'weight_decay': 0.002, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0386,0.722905,0.7512,0.756706,0.750944,0.750379
2,0.8887,0.688356,0.7674,0.771613,0.767144,0.767267
3,0.8664,0.679638,0.7659,0.768867,0.765883,0.763942
4,0.85,0.658376,0.7756,0.777982,0.775358,0.77418
5,0.842,0.663439,0.7738,0.775345,0.773285,0.771693
6,0.8355,0.685879,0.7628,0.775615,0.762416,0.765363
7,0.8296,0.669714,0.7717,0.776882,0.771913,0.771538
8,0.8254,0.656112,0.778,0.779501,0.777734,0.777621
9,0.8156,0.66544,0.7731,0.77666,0.772512,0.773802
10,0.8118,0.654075,0.774,0.776418,0.773894,0.773193


[I 2025-04-06 20:52:43,630] Trial 138 finished with value: 0.7731930228205086 and parameters: {'learning_rate': 0.0014084368172778302, 'weight_decay': 0.002, 'warmup_steps': 20}. Best is trial 71 with value: 0.7745430633961783.


Trial 139 with params: {'learning_rate': 0.0001772405333439467, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4094,0.91173,0.7228,0.72278,0.722005,0.721214
2,1.0176,0.797052,0.7439,0.745355,0.743587,0.743304
3,0.9467,0.758436,0.7494,0.750265,0.749174,0.747803
4,0.9131,0.720253,0.7595,0.765916,0.758961,0.75918


[I 2025-04-06 20:58:17,497] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.002914017243549048, 'weight_decay': 0.002, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0219,0.728395,0.7483,0.754865,0.748044,0.746461
2,0.9253,0.719845,0.7525,0.769216,0.752527,0.754195


[I 2025-04-06 21:01:02,867] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0018763065849378617, 'weight_decay': 0.001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0267,0.715965,0.7519,0.756035,0.751799,0.750282
2,0.898,0.690519,0.7642,0.769179,0.764057,0.764707
3,0.8772,0.680431,0.7652,0.769468,0.765016,0.762978
4,0.8593,0.660653,0.7742,0.776303,0.773835,0.772783
5,0.8513,0.665954,0.7733,0.774746,0.7729,0.770929
6,0.844,0.688466,0.7637,0.77797,0.763369,0.766449
7,0.8359,0.665118,0.7727,0.776006,0.77291,0.772324
8,0.8306,0.655097,0.7778,0.779248,0.77754,0.77733
9,0.8185,0.665471,0.7708,0.774665,0.770189,0.771521
10,0.8132,0.653012,0.7753,0.777697,0.775192,0.774543


[I 2025-04-06 21:14:51,609] Trial 141 finished with value: 0.7745430633961783 and parameters: {'learning_rate': 0.0018763065849378617, 'weight_decay': 0.001, 'warmup_steps': 22}. Best is trial 71 with value: 0.7745430633961783.


Trial 142 with params: {'learning_rate': 0.0009044297332056876, 'weight_decay': 0.002, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0788,0.738421,0.7488,0.75293,0.748367,0.747726
2,0.8878,0.69557,0.766,0.769143,0.765827,0.76496
3,0.861,0.684156,0.7636,0.764515,0.763703,0.761066
4,0.8451,0.666864,0.7728,0.777985,0.772722,0.771451


[I 2025-04-06 21:20:31,597] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0021741682362094576, 'weight_decay': 0.001, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0261,0.714381,0.7509,0.754795,0.750747,0.749209
2,0.9055,0.698938,0.7621,0.769537,0.762066,0.762905
3,0.8854,0.678276,0.7673,0.770701,0.767018,0.764948
4,0.8664,0.6627,0.7745,0.776363,0.774056,0.772912
5,0.8577,0.66829,0.7723,0.773935,0.772007,0.769886
6,0.85,0.689245,0.764,0.777768,0.763645,0.766397
7,0.8405,0.662162,0.7719,0.774423,0.772029,0.771568
8,0.8343,0.655428,0.7759,0.777495,0.775675,0.775368


[I 2025-04-06 21:31:41,274] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.002836720193057427, 'weight_decay': 0.006, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0306,0.726226,0.7495,0.755662,0.749254,0.747689
2,0.9234,0.718225,0.7532,0.769337,0.753235,0.754785


[I 2025-04-06 21:34:27,242] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.004068820908484375, 'weight_decay': 0.0, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0539,0.775988,0.7346,0.748427,0.734147,0.732319
2,0.9646,0.732483,0.7497,0.769845,0.749731,0.751342
3,0.945,0.716254,0.7575,0.769987,0.756625,0.755224
4,0.9196,0.69342,0.7653,0.768946,0.764668,0.762315


[I 2025-04-06 21:39:58,061] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0017049097129868964, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0262,0.718284,0.7515,0.756142,0.751403,0.75017
2,0.8941,0.688091,0.7663,0.770737,0.766158,0.766571
3,0.8729,0.680914,0.7655,0.769695,0.76535,0.763395
4,0.8555,0.65933,0.7735,0.775465,0.773211,0.772101


[I 2025-04-06 21:45:37,750] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0011319995819311095, 'weight_decay': 0.006, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0599,0.730464,0.7509,0.756204,0.75054,0.750194
2,0.8863,0.691632,0.7672,0.771293,0.766955,0.766698
3,0.8622,0.681036,0.7638,0.765081,0.763888,0.761548
4,0.8464,0.662376,0.7732,0.776913,0.773048,0.771604
5,0.8378,0.662405,0.7736,0.775145,0.773092,0.771844
6,0.8317,0.684272,0.7641,0.775339,0.763773,0.766443
7,0.8268,0.670491,0.7709,0.776219,0.771079,0.77081
8,0.8232,0.657686,0.7764,0.777925,0.776125,0.776035


[I 2025-04-06 21:56:47,704] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0019456043343706092, 'weight_decay': 0.002, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0266,0.715247,0.7517,0.755686,0.751605,0.75006
2,0.8997,0.692088,0.7642,0.769434,0.764082,0.764753
3,0.8791,0.679958,0.7661,0.770169,0.765898,0.763862
4,0.8609,0.661131,0.7747,0.776709,0.774311,0.773267
5,0.8528,0.666512,0.7735,0.77487,0.773127,0.7711
6,0.8453,0.688659,0.7637,0.778013,0.763366,0.766413
7,0.8369,0.664345,0.7727,0.775636,0.772908,0.772257
8,0.8314,0.655118,0.7769,0.77833,0.776659,0.776396


[I 2025-04-06 22:07:53,019] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0007248597626666188, 'weight_decay': 0.002, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1176,0.749488,0.7462,0.749249,0.745771,0.7448
2,0.8935,0.700712,0.763,0.765795,0.762845,0.76179


[I 2025-04-06 22:10:44,090] Trial 149 pruned. 
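The study log above is long, so it can help to reduce it to the trials that finished and the trials that were pruned. The following is a minimal sketch that assumes the Optuna messages have been saved as plain text. `summarize` and `PATTERN` are hypothetical helpers, not part of `base`, and the sample lines are copied from the log above.

```python
import re

# Three lines copied verbatim from the log above, used as sample input.
LOG = """\
[I 2025-04-06 20:58:17,497] Trial 139 pruned.
[I 2025-04-06 21:14:51,609] Trial 141 finished with value: 0.7745430633961783 and parameters: {'learning_rate': 0.0018763065849378617, 'weight_decay': 0.001, 'warmup_steps': 22}. Best is trial 71 with value: 0.7745430633961783.
[I 2025-04-06 22:10:44,090] Trial 149 pruned.
"""

# Matches "Trial N pruned" or "Trial N finished with value: X".
PATTERN = re.compile(r"Trial (\d+) (pruned|finished with value: ([0-9.eE+-]+))")


def summarize(log_text):
    """Return ({trial_number: objective}, [pruned_trial_numbers])."""
    finished, pruned = {}, []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue  # metric rows, "Trial N with params" lines, blank lines
        number = int(match.group(1))
        if match.group(2) == "pruned":
            pruned.append(number)
        else:
            finished[number] = float(match.group(3))
    return finished, pruned


finished, pruned = summarize(LOG)
print("finished:", finished)
print("pruned:", pruned)
```

Sorting `finished` by value gives a quick ranking of the completed trials without scrolling through the per-epoch tables.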


In [28]:
print(best_base_head)

BestRun(run_id='71', objective=0.7745430633961783, hyperparameters={'learning_rate': 0.001886198183438793, 'weight_decay': 0.004, 'warmup_steps': 26}, run_summary=None)
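The selected hyperparameters can be copied onto a training-arguments object before a final run. This is a minimal sketch, assuming the arguments object exposes `learning_rate`, `weight_decay` and `warmup_steps` as plain attributes. `BestRunStub`, `args_stub` and `apply_best_hyperparameters` are hypothetical stand-ins so the snippet runs without the notebook's state.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical stand-in mirroring the fields printed above.
BestRunStub = namedtuple(
    "BestRunStub", ["run_id", "objective", "hyperparameters", "run_summary"]
)

best = BestRunStub(
    run_id="71",
    objective=0.7745430633961783,
    hyperparameters={
        "learning_rate": 0.001886198183438793,
        "weight_decay": 0.004,
        "warmup_steps": 26,
    },
    run_summary=None,
)

# Hypothetical stand-in for the training arguments object.
args_stub = SimpleNamespace(learning_rate=5e-5, weight_decay=0.0, warmup_steps=0)


def apply_best_hyperparameters(args, best_run):
    """Copy every tuned value onto args, refusing unknown field names."""
    for name, value in best_run.hyperparameters.items():
        if not hasattr(args, name):
            raise AttributeError(f"{name!r} is not a field of the arguments object")
        setattr(args, name, value)
    return args


apply_best_hyperparameters(args_stub, best)
print(args_stub)
```

Rejecting unknown names is a deliberate choice here: a misspelled key would otherwise create a new attribute silently and leave the real setting at its default.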


In [29]:
base.reset_seed()

In [30]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-head-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-head-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

In [31]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
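How `base.DistilTrainer` combines `lambda_param` and `temperature` is not shown in this section. A common formulation, which this sketch assumes, mixes the hard-label cross entropy with a temperature-softened KL term: `(1 - lambda) * CE + lambda * T**2 * KL(teacher || student)`. Which term `lambda_param` weights in the actual trainer is an assumption here, and `softmax` and `distillation_loss` are hypothetical helpers written in plain Python for a single sample.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed loss: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label term: cross entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0
    )
    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return (1.0 - lambda_param) * hard + lambda_param * temperature**2 * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
print(distillation_loss(student, teacher, label=0, lambda_param=0.4, temperature=4.0))
```

Under this assumed formulation, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's soft targets only.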

In [32]:
pruner = optuna.pruners.HyperbandPruner(
    min_resource=min_r,
    max_resource=max_r,
    reduction_factor=2,
    bootstrap_count=2,
)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
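The values of `min_r` and `max_r` are defined outside this section. As a rough illustration of where pruning decisions can fall: successive halving, on which Hyperband is built, evaluates a trial at steps of the form `min_resource * reduction_factor**k`. The hypothetical helper below lists those steps. The values `min_resource=2` and `max_resource=10` are assumptions, chosen because they are consistent with the logs in this notebook, where trials are pruned after epoch 2, 4 or 8.

```python
def pruning_checkpoints(min_resource, max_resource, reduction_factor):
    """List the steps min_resource * reduction_factor**k below max_resource."""
    if min_resource < 1:
        raise ValueError("min_resource must be at least 1")
    if reduction_factor < 2:
        raise ValueError("reduction_factor must be at least 2")
    steps = []
    step = min_resource
    while step < max_resource:
        steps.append(step)
        step *= reduction_factor
    return steps


# Assumed values; the notebook's min_r and max_r are not shown in this section.
print(pruning_checkpoints(min_resource=2, max_resource=10, reduction_factor=2))
```

Hyperband runs several brackets that start at different points in this sequence, so not every trial is checked at the earliest step.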



In [33]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.freeze_model(base.get_mobilenet(10)),
)

In [34]:
best_distill_head = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-06 23:16:45,934] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9171,0.685989,0.7363,0.73842,0.735711,0.735201
2,0.7669,0.652735,0.7584,0.759606,0.758088,0.757283


[I 2025-04-06 23:19:35,853] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0754,0.776661,0.7079,0.710108,0.706966,0.705957
2,0.8233,0.697766,0.7348,0.736089,0.734361,0.733982
3,0.7846,0.676069,0.7418,0.74447,0.741587,0.739988
4,0.768,0.654167,0.7513,0.758618,0.750731,0.750495


[I 2025-04-06 23:25:11,674] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2059,0.910968,0.6744,0.676167,0.673303,0.672011
2,0.9064,0.766328,0.7164,0.718687,0.715876,0.71552


[I 2025-04-06 23:28:00,306] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0443,0.757747,0.713,0.715073,0.7121,0.711138
2,0.8119,0.688797,0.7384,0.739554,0.737965,0.737585
3,0.7777,0.669792,0.7449,0.74743,0.744687,0.743073
4,0.7627,0.649885,0.7551,0.762852,0.754567,0.754362


[I 2025-04-06 23:33:34,896] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8286,0.652996,0.7529,0.758287,0.7526,0.75267
2,0.7546,0.637574,0.7678,0.7699,0.767474,0.767136
3,0.7447,0.633053,0.7654,0.772008,0.765033,0.764093
4,0.7398,0.621043,0.7734,0.777488,0.772643,0.772653


[I 2025-04-06 23:39:08,438] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8226,0.653565,0.7538,0.758464,0.75346,0.753048
2,0.7717,0.648858,0.7535,0.771043,0.753526,0.753927
3,0.7637,0.645191,0.7593,0.773756,0.75839,0.759293
4,0.7585,0.628342,0.7655,0.7712,0.76473,0.764984
5,0.7519,0.621355,0.7716,0.773567,0.771011,0.770332
6,0.7475,0.640859,0.7528,0.771638,0.752773,0.755671
7,0.7438,0.631821,0.7642,0.773711,0.764112,0.764108
8,0.7375,0.621862,0.7699,0.769918,0.769571,0.768872


[I 2025-04-06 23:50:15,005] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8253,0.653025,0.7531,0.758456,0.752787,0.752825
2,0.7544,0.637967,0.7682,0.770483,0.767872,0.767583
3,0.7444,0.632788,0.766,0.772177,0.765628,0.764611
4,0.7396,0.62091,0.7734,0.777402,0.772648,0.772617


[I 2025-04-06 23:55:48,310] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8177,0.655475,0.7529,0.755238,0.752504,0.751289
2,0.7759,0.651331,0.7522,0.769967,0.752395,0.752543
3,0.7676,0.646105,0.7582,0.773601,0.75728,0.758408
4,0.7621,0.626136,0.7681,0.77325,0.767391,0.767885
5,0.7548,0.622704,0.7709,0.772834,0.770278,0.769277
6,0.7502,0.64806,0.75,0.774756,0.750084,0.753146
7,0.7462,0.633295,0.7633,0.773179,0.763221,0.763043
8,0.7392,0.622447,0.7697,0.770005,0.769376,0.768718


[I 2025-04-07 00:06:56,040] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0909,0.79884,0.702,0.704009,0.701042,0.699996
2,0.8376,0.709644,0.7317,0.732957,0.731252,0.73085
3,0.7938,0.684731,0.7392,0.742359,0.738966,0.737412
4,0.7752,0.659964,0.75,0.756597,0.749388,0.749146


[I 2025-04-07 00:12:29,238] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8265,0.652566,0.7516,0.755616,0.751319,0.751282
2,0.7573,0.634213,0.7684,0.771492,0.768093,0.768249
3,0.7485,0.635399,0.7623,0.771784,0.761835,0.761077
4,0.7439,0.622382,0.7716,0.776323,0.770665,0.770752


[I 2025-04-07 00:18:02,865] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8424,0.725739,0.7235,0.748001,0.723859,0.720074
2,0.8062,0.672925,0.747,0.759286,0.747102,0.746643
3,0.7951,0.658213,0.7505,0.759101,0.749777,0.750527
4,0.785,0.643488,0.7585,0.769033,0.757643,0.760309
5,0.7756,0.628145,0.766,0.770713,0.765602,0.765718
6,0.7667,0.663293,0.7388,0.780489,0.738569,0.745008
7,0.7611,0.638442,0.7605,0.761879,0.760276,0.757815
8,0.7501,0.626449,0.7635,0.769893,0.763123,0.763282


[I 2025-04-07 00:29:07,092] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.821,0.662622,0.7536,0.755033,0.753125,0.751321
2,0.7825,0.647034,0.7524,0.763425,0.752604,0.752976


[I 2025-04-07 00:31:51,413] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.002376024890572026, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8253,0.662804,0.7532,0.754704,0.752741,0.750907
2,0.7826,0.646886,0.7524,0.763253,0.752601,0.752994
3,0.7733,0.646524,0.7567,0.772276,0.755857,0.757074
4,0.7673,0.624303,0.7666,0.771951,0.765861,0.766812
5,0.759,0.625212,0.7694,0.77191,0.768773,0.767677
6,0.754,0.656951,0.7449,0.776516,0.744986,0.748447
7,0.7497,0.634713,0.763,0.772483,0.762808,0.762418
8,0.7416,0.623384,0.7679,0.768752,0.767582,0.766966


[I 2025-04-07 00:43:00,034] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.003064104261670614, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8337,0.690125,0.7395,0.75041,0.739435,0.737455
2,0.7974,0.65645,0.7536,0.761111,0.75351,0.753751


[I 2025-04-07 00:45:46,516] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.003645100232010343, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8461,0.731471,0.721,0.746559,0.72134,0.717341
2,0.808,0.67587,0.7462,0.758958,0.746353,0.745884
3,0.7969,0.657389,0.7525,0.760511,0.751748,0.752761
4,0.7865,0.644962,0.7579,0.768107,0.757024,0.759673


[I 2025-04-07 00:51:22,257] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0003173012733215097, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8875,0.679213,0.7398,0.741819,0.739213,0.738718
2,0.7629,0.649738,0.7601,0.761422,0.759814,0.758983
3,0.7464,0.641562,0.7602,0.76225,0.760144,0.758515
4,0.7387,0.626886,0.7667,0.772844,0.766356,0.765498


[I 2025-04-07 00:56:52,682] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0007549727386624846, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8293,0.654339,0.753,0.758272,0.752636,0.752712
2,0.754,0.639741,0.7682,0.77131,0.767848,0.767827
3,0.7436,0.632374,0.7658,0.770187,0.765463,0.764205
4,0.7385,0.620321,0.7724,0.776479,0.771668,0.771636
5,0.7352,0.619561,0.7717,0.773594,0.771282,0.769968
6,0.7318,0.629478,0.7597,0.76966,0.759314,0.761395
7,0.731,0.62952,0.7665,0.771843,0.766619,0.766362
8,0.7283,0.621292,0.7686,0.768617,0.768161,0.767784


[I 2025-04-07 01:07:48,364] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.00432979299982574, 'weight_decay': 0.006, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8517,0.752608,0.7149,0.744385,0.715214,0.711523
2,0.8198,0.696515,0.7429,0.754794,0.743342,0.740844


[I 2025-04-07 01:10:36,420] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004952142492162866, 'weight_decay': 0.004, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8667,0.787158,0.7034,0.739434,0.703494,0.69916
2,0.8312,0.739311,0.7256,0.749118,0.726473,0.72299


[I 2025-04-07 01:13:23,165] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0024742711486919313, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8246,0.66499,0.7517,0.753377,0.751218,0.749415
2,0.7847,0.64443,0.754,0.762188,0.754146,0.754605
3,0.775,0.64732,0.7541,0.769636,0.753278,0.754403
4,0.7689,0.624125,0.7666,0.771858,0.765871,0.766835
5,0.7603,0.625861,0.7686,0.771261,0.767974,0.766937
6,0.7552,0.658859,0.7437,0.776609,0.7438,0.747334
7,0.7507,0.634983,0.7636,0.772515,0.763385,0.762842
8,0.7424,0.623727,0.7678,0.768901,0.767473,0.7669


[I 2025-04-07 01:24:25,838] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00018075272535631178, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9772,0.714662,0.7255,0.727298,0.724767,0.723862
2,0.7843,0.666387,0.7496,0.750995,0.749201,0.748863


[I 2025-04-07 01:27:12,603] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0002914650199349515, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9049,0.68355,0.7376,0.739704,0.737004,0.736499
2,0.7655,0.651751,0.7594,0.760721,0.759109,0.758294
3,0.7479,0.642895,0.7597,0.762112,0.759624,0.758091
4,0.7397,0.628168,0.7659,0.772338,0.765554,0.764758


[I 2025-04-07 01:32:37,286] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0007704382204769961, 'weight_decay': 0.008, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8291,0.653997,0.7528,0.758103,0.752451,0.752491
2,0.7541,0.639326,0.7683,0.771251,0.76796,0.767869
3,0.7437,0.632406,0.766,0.77079,0.765651,0.764449
4,0.7388,0.620485,0.7727,0.776827,0.771978,0.771963


[I 2025-04-07 01:38:09,428] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0011679499783391444, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8178,0.653185,0.7504,0.754459,0.750163,0.750036
2,0.759,0.634712,0.7671,0.7729,0.76675,0.767562
3,0.7506,0.635309,0.7623,0.770712,0.761791,0.761105
4,0.7459,0.624563,0.7701,0.776008,0.769082,0.769262
5,0.7415,0.619977,0.7705,0.771956,0.770197,0.768883
6,0.7377,0.629041,0.7607,0.772115,0.760295,0.762598
7,0.7356,0.626229,0.7681,0.771773,0.768087,0.767934
8,0.7317,0.620475,0.7697,0.769331,0.769323,0.768714
9,0.728,0.627102,0.762,0.767073,0.761118,0.762414
10,0.7247,0.619285,0.7719,0.774789,0.771753,0.770931


[I 2025-04-07 01:51:57,652] Trial 23 finished with value: 0.770931117904081 and parameters: {'learning_rate': 0.0011679499783391444, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 23 with value: 0.770931117904081.


Trial 24 with params: {'learning_rate': 0.0006167475117131566, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8383,0.660207,0.7535,0.759164,0.753057,0.753299
2,0.7538,0.642361,0.7645,0.768804,0.764035,0.76455
3,0.7423,0.633801,0.7663,0.76765,0.76605,0.764359
4,0.7369,0.618507,0.7727,0.776724,0.772012,0.771824


[I 2025-04-07 01:57:19,560] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.002741527169910745, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8261,0.675123,0.7476,0.751869,0.747158,0.745439
2,0.7905,0.644389,0.7545,0.759597,0.754547,0.754996
3,0.78,0.650968,0.7532,0.768081,0.752424,0.753407
4,0.7733,0.626129,0.7673,0.77368,0.766623,0.767999
5,0.764,0.626204,0.7696,0.772215,0.769029,0.768312
6,0.7584,0.661223,0.7415,0.777783,0.741546,0.746249
7,0.7536,0.635712,0.7653,0.772,0.765043,0.763964
8,0.7444,0.624651,0.7673,0.769329,0.766943,0.766513


[I 2025-04-07 02:08:30,874] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.001974013007215149, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8174,0.653851,0.7528,0.755974,0.752453,0.751618
2,0.7739,0.650408,0.7521,0.769905,0.752211,0.752446


[I 2025-04-07 02:11:17,956] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0009763056571587962, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8247,0.652108,0.7529,0.757526,0.752603,0.752628
2,0.7561,0.634865,0.768,0.770118,0.767677,0.767536
3,0.747,0.635121,0.7622,0.771598,0.761784,0.760963
4,0.7423,0.621694,0.7718,0.775994,0.770947,0.771003
5,0.7386,0.620499,0.7703,0.771533,0.769997,0.768383
6,0.7349,0.628873,0.7608,0.771799,0.760384,0.762586
7,0.7335,0.627304,0.7667,0.770251,0.766739,0.766498
8,0.7302,0.620542,0.7701,0.769801,0.769686,0.769188
9,0.7269,0.626515,0.7619,0.766793,0.761045,0.762331
10,0.7241,0.61939,0.7716,0.774476,0.771453,0.77061


[I 2025-04-07 02:25:10,938] Trial 27 finished with value: 0.7706103662570558 and parameters: {'learning_rate': 0.0009763056571587962, 'weight_decay': 0.007, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 23 with value: 0.770931117904081.


Trial 28 with params: {'learning_rate': 0.0008041816117029834, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8334,0.653584,0.753,0.758327,0.752677,0.752726
2,0.7544,0.638464,0.7683,0.770744,0.767973,0.767695
3,0.7443,0.632663,0.766,0.771758,0.765634,0.764617
4,0.7393,0.620847,0.773,0.777054,0.772255,0.772228
5,0.7359,0.6199,0.7713,0.773093,0.770906,0.769496
6,0.7325,0.629295,0.7599,0.770232,0.759495,0.761633
7,0.7316,0.629107,0.7673,0.772164,0.767408,0.767112
8,0.7287,0.621043,0.7695,0.769392,0.769061,0.768658
9,0.7259,0.626223,0.7625,0.766968,0.761671,0.762885
10,0.7236,0.619576,0.7719,0.774939,0.771754,0.770952


[I 2025-04-07 02:39:05,338] Trial 28 finished with value: 0.7709516419841524 and parameters: {'learning_rate': 0.0008041816117029834, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 29 with params: {'learning_rate': 0.0009169206205391921, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8272,0.652359,0.7527,0.757721,0.752399,0.752448
2,0.7554,0.635821,0.768,0.769651,0.767691,0.767305
3,0.746,0.634386,0.7624,0.770997,0.761995,0.761127
4,0.7412,0.621438,0.7721,0.776059,0.771298,0.771309


[I 2025-04-07 02:44:35,167] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00016483417644193386, 'weight_decay': 0.007, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.982,0.721166,0.724,0.725781,0.723259,0.722298
2,0.7886,0.67012,0.7481,0.749456,0.747697,0.747387


[I 2025-04-07 02:47:18,037] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.001301487150977713, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8267,0.655368,0.751,0.756089,0.750787,0.750768
2,0.7615,0.6371,0.7621,0.771711,0.761803,0.762999
3,0.7533,0.636513,0.7633,0.770993,0.762705,0.762086
4,0.7486,0.629371,0.768,0.775353,0.76694,0.76713
5,0.7436,0.619149,0.7704,0.772557,0.770021,0.769132
6,0.7398,0.629743,0.7604,0.772075,0.760056,0.762397
7,0.7372,0.626517,0.7674,0.771993,0.767389,0.767401
8,0.7329,0.620603,0.7694,0.769041,0.769046,0.768398


[I 2025-04-07 02:58:20,511] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.001173126058354271, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8178,0.653238,0.7506,0.754641,0.750365,0.750237
2,0.7591,0.634781,0.7667,0.772612,0.766357,0.767171
3,0.7507,0.635323,0.7627,0.771001,0.762183,0.761487
4,0.746,0.624707,0.7698,0.775782,0.768767,0.768943


[I 2025-04-07 03:03:52,895] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00048632798376135763, 'weight_decay': 0.01, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8636,0.667799,0.7473,0.75161,0.746777,0.746703
2,0.7556,0.642475,0.765,0.768269,0.764532,0.764708
3,0.7424,0.635861,0.7641,0.76457,0.763935,0.762119
4,0.7364,0.619704,0.7709,0.774971,0.77033,0.769672
5,0.7323,0.618881,0.7708,0.772351,0.770334,0.769549
6,0.7292,0.632408,0.7593,0.769113,0.758948,0.761041
7,0.7288,0.630266,0.7667,0.772314,0.766745,0.766597
8,0.7266,0.623464,0.7671,0.767467,0.766639,0.766387


[I 2025-04-07 03:14:57,894] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0007433567102958115, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8382,0.654981,0.7529,0.758301,0.752521,0.752617
2,0.7541,0.640148,0.7674,0.770698,0.767053,0.767071
3,0.7435,0.632442,0.7656,0.769653,0.765267,0.763973
4,0.7384,0.620251,0.7723,0.776344,0.77157,0.771529


[I 2025-04-07 03:20:34,200] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 5.817102176211476e-05, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1848,0.892003,0.6787,0.680209,0.677624,0.676294
2,0.8949,0.756953,0.7185,0.720456,0.717996,0.71762
3,0.828,0.717336,0.7259,0.730824,0.725579,0.724192
4,0.8009,0.681294,0.7431,0.749434,0.742423,0.742019


[I 2025-04-07 03:26:02,638] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0014083369557526964, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8186,0.656084,0.7503,0.756003,0.750073,0.750057
2,0.7633,0.639354,0.7591,0.77135,0.758871,0.760047
3,0.7553,0.638004,0.7629,0.771875,0.762229,0.761979
4,0.7504,0.632625,0.7672,0.775525,0.766106,0.766263


[I 2025-04-07 03:31:44,775] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.0002700275447331568, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9096,0.687128,0.7368,0.738947,0.736213,0.735704
2,0.7676,0.65336,0.7582,0.759374,0.757889,0.757131
3,0.7493,0.644095,0.7589,0.761641,0.758815,0.757322
4,0.7407,0.629304,0.7654,0.772159,0.765046,0.764308


[I 2025-04-07 03:37:10,210] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.000895042112871815, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8353,0.652796,0.7524,0.757514,0.752104,0.752136
2,0.7553,0.636313,0.7679,0.769601,0.767594,0.767203
3,0.7457,0.63411,0.7631,0.77118,0.762709,0.761788
4,0.7409,0.621382,0.7728,0.776738,0.772016,0.772
5,0.7374,0.620297,0.7707,0.772204,0.770349,0.768845
6,0.7338,0.629052,0.7602,0.771015,0.759775,0.761947
7,0.7326,0.628143,0.7671,0.770917,0.767163,0.76683
8,0.7295,0.62071,0.77,0.769777,0.769579,0.769117
9,0.7264,0.626343,0.762,0.766712,0.761144,0.762384
10,0.7239,0.619454,0.7717,0.774675,0.771548,0.770724


[I 2025-04-07 03:50:58,841] Trial 38 finished with value: 0.7707238869642621 and parameters: {'learning_rate': 0.000895042112871815, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 39 with params: {'learning_rate': 0.0013294294274437538, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8291,0.655816,0.7507,0.755929,0.750489,0.750407
2,0.7621,0.637747,0.7612,0.771618,0.760935,0.762164
3,0.7538,0.636919,0.7631,0.771037,0.762488,0.76193
4,0.7491,0.630441,0.768,0.775713,0.766914,0.767093


[I 2025-04-07 03:56:31,345] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0004597839616253116, 'weight_decay': 0.003, 'warmup_steps': 20, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8689,0.669223,0.7471,0.751011,0.746582,0.746449
2,0.7563,0.642847,0.765,0.767711,0.764557,0.764537
3,0.7426,0.636517,0.7633,0.763851,0.763141,0.761358
4,0.7364,0.620652,0.7706,0.774799,0.770052,0.769293
5,0.7322,0.619211,0.771,0.77242,0.77055,0.769786
6,0.7291,0.63284,0.7589,0.768871,0.758544,0.76067
7,0.7288,0.630347,0.7662,0.77167,0.766225,0.766104
8,0.7265,0.623809,0.7672,0.767563,0.766748,0.766493


[I 2025-04-07 04:07:22,371] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0003191940495908429, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9047,0.68012,0.7396,0.741802,0.738998,0.738546
2,0.7633,0.649953,0.7597,0.760986,0.759402,0.75853


[I 2025-04-07 04:10:04,268] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0012696352935288625, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8276,0.655002,0.7506,0.755486,0.750395,0.750341
2,0.761,0.636485,0.7634,0.772069,0.763089,0.764206
3,0.7527,0.636119,0.7634,0.771104,0.762828,0.762191
4,0.748,0.628192,0.7687,0.775569,0.767639,0.767771
5,0.7431,0.61928,0.7709,0.772833,0.770544,0.769523
6,0.7393,0.629531,0.7598,0.771459,0.759443,0.761777
7,0.7368,0.626384,0.7675,0.771812,0.767481,0.76743
8,0.7326,0.620574,0.7697,0.769332,0.769334,0.768694
9,0.7286,0.627472,0.762,0.76739,0.761115,0.762488
10,0.725,0.619259,0.7718,0.774683,0.771643,0.770843


[I 2025-04-07 04:23:48,764] Trial 42 finished with value: 0.7708428780081122 and parameters: {'learning_rate': 0.0012696352935288625, 'weight_decay': 0.003, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 43 with params: {'learning_rate': 0.0013303403872947518, 'weight_decay': 0.004, 'warmup_steps': 21, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8253,0.655642,0.7506,0.755832,0.750387,0.750329
2,0.762,0.637673,0.7611,0.7715,0.760837,0.762051
3,0.7538,0.636877,0.7631,0.771003,0.762487,0.761936
4,0.7491,0.630387,0.768,0.775686,0.766914,0.767083


[I 2025-04-07 04:29:15,390] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.003978050713448368, 'weight_decay': 0.004, 'warmup_steps': 19, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8495,0.743515,0.718,0.746735,0.718304,0.714258
2,0.814,0.684032,0.7436,0.755515,0.743843,0.742452
3,0.8026,0.654661,0.7552,0.762216,0.754467,0.75614
4,0.7917,0.64782,0.7598,0.768375,0.758867,0.761077


[I 2025-04-07 04:34:52,109] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0003597274039521202, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8957,0.675914,0.7415,0.744016,0.740939,0.740548
2,0.7606,0.647355,0.7623,0.763741,0.761956,0.76124
3,0.7449,0.639934,0.7607,0.762063,0.760614,0.758888
4,0.7376,0.625145,0.7689,0.774667,0.768523,0.76761


[I 2025-04-07 04:40:30,089] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.002651950335777674, 'weight_decay': 0.001, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8351,0.671671,0.7496,0.753163,0.749128,0.747484
2,0.7889,0.643054,0.7557,0.760846,0.755786,0.756158
3,0.7785,0.649748,0.7531,0.768274,0.752332,0.75334
4,0.772,0.62514,0.7675,0.773533,0.766817,0.768093
5,0.7629,0.626347,0.7687,0.771448,0.768103,0.767242
6,0.7574,0.660915,0.7411,0.776715,0.74116,0.745492
7,0.7527,0.635461,0.7652,0.772453,0.764959,0.763992
8,0.7438,0.62438,0.7671,0.768724,0.766743,0.766275


[I 2025-04-07 04:51:35,869] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0011318215135633493, 'weight_decay': 0.002, 'warmup_steps': 19, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8264,0.653281,0.7515,0.755515,0.751249,0.751158
2,0.7585,0.634404,0.7681,0.773176,0.767786,0.768511
3,0.75,0.635292,0.7626,0.771107,0.762111,0.76135
4,0.7453,0.623747,0.7702,0.775706,0.769207,0.769384


[I 2025-04-07 04:57:10,763] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0010142882440889334, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8269,0.652323,0.7514,0.755575,0.751098,0.751074
2,0.7567,0.63447,0.7684,0.770958,0.76807,0.768088
3,0.7477,0.635362,0.7623,0.771633,0.76185,0.761016
4,0.743,0.621956,0.7721,0.776371,0.771215,0.771291
5,0.7392,0.620536,0.7698,0.770954,0.769498,0.767932
6,0.7355,0.628832,0.7608,0.77186,0.760387,0.76259
7,0.7339,0.626961,0.7673,0.770834,0.767324,0.767107
8,0.7305,0.620495,0.7698,0.76949,0.769392,0.768875


[I 2025-04-07 05:07:59,007] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.00043406938493577594, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8706,0.670427,0.7456,0.749107,0.745079,0.744834
2,0.757,0.643493,0.7641,0.766355,0.763678,0.763428
3,0.7429,0.637232,0.7627,0.763327,0.76256,0.760787
4,0.7366,0.621748,0.7707,0.775306,0.770213,0.769401
5,0.7322,0.619628,0.7709,0.772234,0.770458,0.769747
6,0.7291,0.633275,0.7596,0.769621,0.759252,0.761375
7,0.7287,0.630479,0.7661,0.771683,0.766098,0.766073
8,0.7265,0.624187,0.7669,0.767287,0.766447,0.766192


[I 2025-04-07 05:18:46,748] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0015619163345157118, 'weight_decay': 0.002, 'warmup_steps': 18, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8225,0.656377,0.75,0.756376,0.749755,0.749561
2,0.766,0.643034,0.7552,0.770184,0.755058,0.755917
3,0.7582,0.640902,0.7623,0.773638,0.76153,0.761754
4,0.7532,0.633757,0.7656,0.773538,0.764602,0.764692


[I 2025-04-07 05:24:12,272] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0012567891316673302, 'weight_decay': 0.004, 'warmup_steps': 18, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8243,0.654641,0.7512,0.755977,0.750993,0.750938
2,0.7607,0.636185,0.7638,0.771938,0.763494,0.764531
3,0.7524,0.635948,0.7634,0.771044,0.762831,0.762151
4,0.7477,0.627653,0.7689,0.775484,0.767836,0.767918


[I 2025-04-07 05:29:43,578] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0007032096819574004, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8414,0.656394,0.7538,0.759145,0.753396,0.753505
2,0.754,0.641157,0.7664,0.770177,0.766016,0.766204
3,0.743,0.632693,0.765,0.768038,0.764671,0.763224
4,0.7379,0.61972,0.7719,0.775743,0.771185,0.771063
5,0.7345,0.619196,0.7716,0.773521,0.771168,0.769986
6,0.7312,0.629815,0.7594,0.769111,0.759023,0.761059
7,0.7305,0.629845,0.7662,0.771816,0.766327,0.766079
8,0.7278,0.621586,0.7687,0.768748,0.768256,0.76789
9,0.7253,0.626259,0.7622,0.766677,0.761383,0.762608
10,0.7234,0.619819,0.7716,0.774688,0.771449,0.770657


[I 2025-04-07 05:43:33,805] Trial 52 finished with value: 0.7706573233927353 and parameters: {'learning_rate': 0.0007032096819574004, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 53 with params: {'learning_rate': 0.00032792752760371554, 'weight_decay': 0.005, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8947,0.678568,0.7407,0.742801,0.74011,0.739642
2,0.7624,0.649252,0.76,0.761316,0.759699,0.758852


[I 2025-04-07 05:46:18,079] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0001324011031485879, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0272,0.743477,0.7161,0.718111,0.715249,0.714275
2,0.8026,0.681329,0.7419,0.743215,0.741464,0.741168
3,0.7717,0.664332,0.7472,0.749592,0.746969,0.745468
4,0.7581,0.646011,0.7562,0.764151,0.755673,0.75556


[I 2025-04-07 05:51:49,892] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0004121857112356016, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.871,0.671413,0.7442,0.747281,0.74368,0.743357
2,0.7577,0.644299,0.7636,0.765561,0.763214,0.762799
3,0.7433,0.637909,0.7622,0.763051,0.762089,0.760309
4,0.7368,0.622759,0.7704,0.775356,0.769956,0.769117


[I 2025-04-07 05:57:11,221] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0006573931482419404, 'weight_decay': 0.01, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8455,0.658512,0.7538,0.759284,0.753374,0.753541
2,0.7539,0.642086,0.7651,0.769237,0.764685,0.765063
3,0.7426,0.633244,0.7655,0.76751,0.765247,0.763636
4,0.7373,0.619068,0.7722,0.775968,0.771494,0.771326
5,0.7339,0.61883,0.7719,0.773777,0.771446,0.770369
6,0.7306,0.630211,0.7593,0.768904,0.758936,0.760926
7,0.7301,0.630037,0.7662,0.772018,0.76632,0.76611
8,0.7275,0.62189,0.7686,0.768733,0.768139,0.767816


[I 2025-04-07 06:08:17,102] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0011097554266708747, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8335,0.653385,0.7515,0.755672,0.751236,0.751191
2,0.7583,0.634356,0.768,0.772256,0.767695,0.76823
3,0.7496,0.635349,0.7628,0.771589,0.762319,0.761554
4,0.7449,0.623321,0.7702,0.775447,0.769216,0.769323


[I 2025-04-07 06:13:57,225] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0005365451703704387, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8525,0.664908,0.7497,0.754861,0.749181,0.749284
2,0.7545,0.642356,0.7658,0.76955,0.765303,0.765678
3,0.7421,0.634909,0.7649,0.765452,0.764715,0.762869
4,0.7364,0.618564,0.7722,0.775904,0.771581,0.771064
5,0.7325,0.618502,0.7715,0.773274,0.77103,0.770203
6,0.7294,0.631684,0.7593,0.768911,0.758962,0.761037
7,0.7291,0.630187,0.7658,0.771732,0.765895,0.765751
8,0.7267,0.6229,0.7678,0.768025,0.767344,0.767035


[I 2025-04-07 06:24:58,647] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0009438088990516876, 'weight_decay': 0.006, 'warmup_steps': 23, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8335,0.652529,0.7528,0.757531,0.752496,0.752519
2,0.7558,0.635402,0.7679,0.769693,0.767582,0.767283
3,0.7465,0.634849,0.7634,0.772456,0.762987,0.762169
4,0.7418,0.621562,0.7718,0.775797,0.770978,0.771001
5,0.7381,0.620445,0.7702,0.771491,0.769878,0.768308
6,0.7345,0.628944,0.7605,0.771438,0.760081,0.762311
7,0.7331,0.627616,0.7669,0.770593,0.766953,0.766677
8,0.7299,0.620597,0.7706,0.770296,0.770182,0.769691
9,0.7267,0.626445,0.7619,0.766702,0.761038,0.762295
10,0.724,0.619408,0.7715,0.774416,0.771349,0.770503


[I 2025-04-07 06:39:07,858] Trial 59 finished with value: 0.7705033164563245 and parameters: {'learning_rate': 0.0009438088990516876, 'weight_decay': 0.006, 'warmup_steps': 23, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 28 with value: 0.7709516419841524.


Trial 60 with params: {'learning_rate': 0.0001938612175720016, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9642,0.709037,0.7273,0.729142,0.72659,0.725728
2,0.7808,0.663552,0.751,0.752373,0.750615,0.75023


[I 2025-04-07 06:41:55,155] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0001407553820931298, 'weight_decay': 0.0, 'warmup_steps': 19, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0113,0.736247,0.7184,0.720299,0.717575,0.716587
2,0.7982,0.677865,0.7435,0.744811,0.743076,0.742761
3,0.769,0.661877,0.748,0.750419,0.747753,0.746281
4,0.756,0.644216,0.7568,0.764748,0.756274,0.756132


[I 2025-04-07 06:47:37,492] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0011319126215349317, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8294,0.653456,0.7513,0.755318,0.751048,0.750937
2,0.7586,0.63444,0.7679,0.773007,0.767579,0.768313
3,0.75,0.63531,0.7626,0.771095,0.762108,0.761358
4,0.7453,0.623785,0.7701,0.775605,0.769109,0.769277


[I 2025-04-07 06:52:59,862] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0011052704741368483, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8169,0.652439,0.7521,0.756086,0.751823,0.75182
2,0.7579,0.634161,0.7681,0.772195,0.767785,0.768253
3,0.7493,0.635297,0.7625,0.771358,0.762026,0.761256
4,0.7447,0.623057,0.7703,0.775428,0.769325,0.769392
5,0.7405,0.620337,0.7704,0.771681,0.770091,0.768633
6,0.7368,0.628872,0.7611,0.772393,0.760692,0.762941
7,0.7349,0.626402,0.7679,0.771373,0.767909,0.76774
8,0.7312,0.620451,0.7695,0.769188,0.769097,0.768551
9,0.7276,0.626886,0.762,0.766967,0.761127,0.76241
10,0.7245,0.619312,0.7717,0.774573,0.771551,0.770733


[I 2025-04-07 07:06:33,686] Trial 63 finished with value: 0.7707326054035463 and parameters: {'learning_rate': 0.0011052704741368483, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 28 with value: 0.7709516419841524.


Trial 64 with params: {'learning_rate': 0.0009736467928099919, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8177,0.651758,0.7529,0.757554,0.752606,0.752631
2,0.7559,0.634799,0.7681,0.770173,0.767785,0.767625
3,0.7469,0.635078,0.7619,0.771415,0.761487,0.760657
4,0.7422,0.621594,0.7717,0.775856,0.770858,0.770918
5,0.7385,0.620479,0.7703,0.771501,0.769997,0.768381
6,0.7348,0.628861,0.7608,0.771778,0.760381,0.76258
7,0.7334,0.627341,0.7667,0.770247,0.766739,0.76649
8,0.7301,0.620532,0.77,0.7697,0.769588,0.769085
9,0.7268,0.626493,0.7618,0.76666,0.76094,0.762209
10,0.7241,0.619378,0.7716,0.774485,0.771453,0.770614


[I 2025-04-07 07:20:05,619] Trial 64 finished with value: 0.7706136780750437 and parameters: {'learning_rate': 0.0009736467928099919, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 65 with params: {'learning_rate': 0.002621885896423578, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8239,0.670277,0.75,0.752677,0.749515,0.747739
2,0.7879,0.642905,0.7558,0.761525,0.75592,0.756337
3,0.7777,0.649168,0.7529,0.768305,0.752109,0.753188
4,0.7713,0.624734,0.767,0.772745,0.766304,0.767465


[I 2025-04-07 07:25:38,816] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0008569037366990733, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8232,0.652547,0.7524,0.757516,0.752098,0.752096
2,0.7547,0.637031,0.7677,0.769597,0.767386,0.766982
3,0.745,0.633322,0.7646,0.771744,0.764219,0.763222
4,0.7402,0.621145,0.7732,0.777153,0.772436,0.77242
5,0.7367,0.620132,0.7712,0.77278,0.770835,0.769321
6,0.7332,0.629134,0.7599,0.770499,0.759482,0.761628
7,0.7321,0.628575,0.7669,0.771189,0.766985,0.766665
8,0.7291,0.620833,0.7698,0.769624,0.769371,0.768933
9,0.7262,0.626263,0.7626,0.76716,0.761758,0.762961
10,0.7238,0.6195,0.7719,0.774904,0.771753,0.770949


[I 2025-04-07 07:39:31,989] Trial 66 finished with value: 0.7709485240556331 and parameters: {'learning_rate': 0.0008569037366990733, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 67 with params: {'learning_rate': 0.0008428796121128485, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8243,0.652739,0.7523,0.75767,0.751998,0.752048
2,0.7546,0.637407,0.7676,0.76969,0.767274,0.76693
3,0.7448,0.633095,0.7651,0.771731,0.764733,0.763764
4,0.7399,0.621055,0.7733,0.777401,0.77254,0.77257
5,0.7365,0.620072,0.7713,0.772937,0.770927,0.769437
6,0.733,0.629173,0.7596,0.770177,0.759178,0.761318
7,0.732,0.628727,0.7671,0.771401,0.767188,0.766835
8,0.729,0.620893,0.7696,0.769456,0.769169,0.768749
9,0.7261,0.626242,0.7627,0.767266,0.761859,0.763069
10,0.7237,0.619516,0.7719,0.774904,0.771753,0.770949


[I 2025-04-07 07:53:28,148] Trial 67 finished with value: 0.7709485240556331 and parameters: {'learning_rate': 0.0008428796121128485, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 28 with value: 0.7709516419841524.


Trial 68 with params: {'learning_rate': 0.0008350401275702886, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.824,0.652803,0.7526,0.758012,0.75229,0.752362
2,0.7545,0.637579,0.7678,0.769904,0.767478,0.767143
3,0.7446,0.63298,0.7656,0.772169,0.765229,0.764296
4,0.7398,0.621004,0.7735,0.777557,0.772738,0.772733
5,0.7364,0.620032,0.7712,0.772911,0.770825,0.769358
6,0.7329,0.629197,0.7597,0.770202,0.759278,0.761403
7,0.7319,0.628807,0.7671,0.771519,0.767192,0.766854
8,0.7289,0.620913,0.7696,0.769489,0.769163,0.768754
9,0.726,0.626245,0.7625,0.767029,0.76166,0.762868
10,0.7237,0.619528,0.772,0.774994,0.771852,0.771047


[I 2025-04-07 08:07:16,027] Trial 68 finished with value: 0.771047008351801 and parameters: {'learning_rate': 0.0008350401275702886, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 68 with value: 0.771047008351801.


Trial 69 with params: {'learning_rate': 0.0016690802669119614, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8139,0.656008,0.7505,0.7573,0.750179,0.750037
2,0.7678,0.644992,0.7548,0.77127,0.754698,0.755467


[I 2025-04-07 08:10:02,291] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0005321301627318898, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8471,0.66482,0.7489,0.753902,0.74839,0.748475
2,0.7545,0.642228,0.7661,0.76985,0.765618,0.765979
3,0.7421,0.63497,0.7649,0.765408,0.764714,0.762846
4,0.7363,0.618603,0.7721,0.775864,0.771481,0.770956
5,0.7325,0.618505,0.7712,0.772947,0.770733,0.769915
6,0.7294,0.631729,0.7589,0.768581,0.758561,0.760652
7,0.7291,0.630165,0.7659,0.771759,0.765994,0.765837
8,0.7267,0.622945,0.7676,0.767832,0.767144,0.766839


[I 2025-04-07 08:21:03,755] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0007489429461438716, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8289,0.654469,0.7529,0.758205,0.752528,0.752612
2,0.7539,0.639885,0.7682,0.771405,0.767845,0.767856
3,0.7435,0.632379,0.7659,0.770031,0.765567,0.764271
4,0.7384,0.620246,0.7725,0.776531,0.771771,0.771723
5,0.7351,0.619512,0.7718,0.77371,0.771382,0.770079
6,0.7317,0.629515,0.7597,0.769654,0.759314,0.761399
7,0.7309,0.629563,0.7665,0.771847,0.766619,0.766359
8,0.7282,0.621323,0.7686,0.768607,0.768162,0.767779


[I 2025-04-07 08:32:03,118] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.002395028549399942, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8214,0.662926,0.7531,0.75439,0.752624,0.75079
2,0.7829,0.646486,0.7525,0.763198,0.752705,0.753094


[I 2025-04-07 08:34:51,084] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0007173802752171283, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8308,0.655486,0.7537,0.759002,0.753312,0.75341
2,0.7538,0.640705,0.767,0.770635,0.766637,0.766767
3,0.7431,0.632534,0.7656,0.769029,0.765276,0.763885
4,0.738,0.619844,0.7724,0.77632,0.771683,0.771583
5,0.7346,0.61927,0.7716,0.773479,0.771181,0.769958
6,0.7313,0.629695,0.7594,0.769155,0.759022,0.761037
7,0.7306,0.629757,0.7662,0.771756,0.766327,0.766066
8,0.7279,0.621502,0.7686,0.76861,0.768155,0.76777


[I 2025-04-07 08:45:54,116] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 6.24006692401181e-05, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1704,0.874197,0.6826,0.683962,0.681559,0.680245
2,0.8835,0.747267,0.7215,0.72323,0.720997,0.720632


[I 2025-04-07 08:48:45,979] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0005895453039433079, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8416,0.661724,0.7521,0.757739,0.75163,0.751833
2,0.7539,0.642428,0.7647,0.768876,0.764233,0.764734
3,0.7421,0.634156,0.7661,0.767014,0.765876,0.764135
4,0.7367,0.618322,0.7732,0.77697,0.772507,0.772241
5,0.733,0.618453,0.771,0.772759,0.770534,0.769598
6,0.7299,0.630959,0.7585,0.767848,0.758144,0.76011
7,0.7295,0.630138,0.7664,0.772142,0.766514,0.766296
8,0.727,0.622414,0.7683,0.768513,0.767843,0.767541


[I 2025-04-07 09:00:02,598] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00033260162284746086, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8843,0.67745,0.741,0.743071,0.740423,0.739941
2,0.7618,0.648766,0.7601,0.761492,0.759795,0.758964


[I 2025-04-07 09:02:49,162] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0005334642190246767, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8463,0.664732,0.749,0.754023,0.748484,0.748579
2,0.7544,0.642215,0.766,0.769767,0.765509,0.765886
3,0.7421,0.63495,0.7649,0.765408,0.764714,0.762846
4,0.7363,0.61858,0.7721,0.775819,0.771474,0.770933
5,0.7325,0.61849,0.7711,0.772846,0.770629,0.769808
6,0.7294,0.631703,0.7588,0.76841,0.758461,0.760541
7,0.7291,0.630173,0.7659,0.771759,0.765994,0.765837
8,0.7267,0.62293,0.7675,0.767729,0.767042,0.766741


[I 2025-04-07 09:13:58,645] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00026700057451460336, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9062,0.687393,0.7375,0.739584,0.736913,0.736366
2,0.7678,0.653506,0.7581,0.75925,0.75778,0.757054
3,0.7495,0.644258,0.7585,0.761372,0.75841,0.756927
4,0.7408,0.629446,0.7652,0.772016,0.76484,0.764127


[I 2025-04-07 09:19:34,299] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0035106996924098538, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8442,0.723804,0.7241,0.747925,0.724473,0.720711
2,0.8056,0.671755,0.7478,0.759948,0.747887,0.747565


[I 2025-04-07 09:22:17,949] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0010713042654264288, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8156,0.652069,0.7519,0.7559,0.751623,0.751583
2,0.7573,0.634058,0.7684,0.771762,0.768096,0.768326
3,0.7487,0.635358,0.7622,0.771527,0.761728,0.760983
4,0.7441,0.622463,0.7717,0.776542,0.770759,0.770846
5,0.74,0.620467,0.7704,0.771603,0.770106,0.768598
6,0.7363,0.628816,0.7611,0.772315,0.760696,0.762934
7,0.7345,0.626561,0.7679,0.771336,0.767914,0.767711
8,0.7309,0.620449,0.7693,0.768984,0.768892,0.768355
9,0.7274,0.626774,0.7619,0.76682,0.761032,0.762307
10,0.7244,0.619325,0.7718,0.774689,0.771647,0.770826


[I 2025-04-07 09:35:55,537] Trial 80 finished with value: 0.7708255380417006 and parameters: {'learning_rate': 0.0010713042654264288, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 68 with value: 0.771047008351801.


Trial 81 with params: {'learning_rate': 0.0008970374183423795, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.823,0.652261,0.7526,0.757761,0.752302,0.75236
2,0.7551,0.63619,0.7678,0.769441,0.767504,0.767066
3,0.7456,0.634008,0.7632,0.771231,0.762819,0.761862
4,0.7409,0.621323,0.7729,0.77688,0.772114,0.772107
5,0.7373,0.620266,0.7705,0.77201,0.770151,0.768632
6,0.7337,0.62903,0.7604,0.771326,0.759974,0.762169
7,0.7326,0.628144,0.7669,0.770731,0.766959,0.766636
8,0.7295,0.620711,0.7699,0.769694,0.769477,0.76902
9,0.7264,0.626336,0.762,0.76675,0.761148,0.762397
10,0.7239,0.619451,0.7718,0.774804,0.771653,0.77083


[I 2025-04-07 09:49:32,181] Trial 81 finished with value: 0.7708303181123487 and parameters: {'learning_rate': 0.0008970374183423795, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 68 with value: 0.771047008351801.


Trial 82 with params: {'learning_rate': 0.002096872507181025, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8168,0.656109,0.7534,0.75555,0.753007,0.751702
2,0.7765,0.651511,0.7524,0.770047,0.75261,0.752719
3,0.7681,0.64611,0.7582,0.773667,0.757294,0.758445
4,0.7625,0.625982,0.7682,0.773274,0.767481,0.76798


[I 2025-04-07 09:54:56,641] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0008088686309741357, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8258,0.653197,0.7533,0.758711,0.752974,0.753022
2,0.7543,0.63828,0.7687,0.77115,0.768377,0.768124
3,0.7443,0.632647,0.766,0.771921,0.76564,0.764621
4,0.7394,0.620834,0.7733,0.777298,0.772555,0.772514
5,0.736,0.619913,0.7714,0.773227,0.771003,0.769602
6,0.7325,0.629279,0.76,0.770321,0.759594,0.761731
7,0.7316,0.629066,0.7672,0.77196,0.767306,0.766985
8,0.7287,0.621024,0.7695,0.769398,0.769066,0.768659


[I 2025-04-07 10:05:52,454] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.001277083900119427, 'weight_decay': 0.005, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8168,0.654536,0.751,0.755825,0.750789,0.750751
2,0.7609,0.636498,0.7626,0.771339,0.762301,0.763389
3,0.7527,0.636124,0.7634,0.771093,0.762826,0.762186
4,0.748,0.628252,0.7684,0.775353,0.767347,0.767524


[I 2025-04-07 10:11:26,595] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 10, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9052,0.684163,0.7371,0.739278,0.736508,0.736027
2,0.7658,0.652034,0.7591,0.760376,0.758808,0.757994


[I 2025-04-07 10:14:10,280] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0008709534506426192, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8226,0.652419,0.7523,0.75756,0.752002,0.752052
2,0.7548,0.636719,0.768,0.769825,0.767689,0.767279
3,0.7452,0.633549,0.7641,0.771642,0.763722,0.762785
4,0.7404,0.621212,0.7736,0.777583,0.772828,0.772823
5,0.7369,0.620177,0.7711,0.772618,0.770745,0.769224
6,0.7334,0.629094,0.7598,0.770656,0.75939,0.761573
7,0.7323,0.628428,0.7672,0.771369,0.767283,0.766947
8,0.7292,0.620787,0.7698,0.769631,0.769374,0.768943
9,0.7262,0.626284,0.7623,0.766961,0.761452,0.76268
10,0.7238,0.619475,0.7719,0.774933,0.771753,0.770951


[I 2025-04-07 10:27:58,206] Trial 86 finished with value: 0.770951179476348 and parameters: {'learning_rate': 0.0008709534506426192, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 68 with value: 0.771047008351801.


Trial 87 with params: {'learning_rate': 0.0011858870554994313, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.818,0.653426,0.7507,0.754871,0.750464,0.750344
2,0.7593,0.634945,0.7668,0.772844,0.766463,0.767278
3,0.7509,0.635367,0.7633,0.771584,0.762768,0.762136
4,0.7463,0.625085,0.7696,0.775716,0.768557,0.768745


[I 2025-04-07 10:33:38,563] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.002851868651933803, 'weight_decay': 0.01, 'warmup_steps': 19, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8336,0.680654,0.7449,0.7516,0.744593,0.742885
2,0.793,0.648164,0.7563,0.761475,0.756252,0.756544


[I 2025-04-07 10:36:25,101] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0008770945686960543, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8243,0.652465,0.7526,0.757844,0.752304,0.752351
2,0.7549,0.636613,0.7677,0.769463,0.767384,0.766966
3,0.7453,0.63367,0.7642,0.771883,0.763828,0.762904
4,0.7405,0.621232,0.7738,0.777781,0.773023,0.773018
5,0.737,0.62021,0.7707,0.772249,0.770337,0.768833
6,0.7335,0.62908,0.76,0.770854,0.75958,0.761763
7,0.7323,0.628365,0.7674,0.77147,0.767486,0.767135
8,0.7293,0.620775,0.77,0.769816,0.769574,0.769137
9,0.7263,0.626298,0.7621,0.766797,0.761252,0.762491
10,0.7238,0.61947,0.772,0.775016,0.771855,0.771043


[I 2025-04-07 10:49:54,026] Trial 89 finished with value: 0.7710429021154336 and parameters: {'learning_rate': 0.0008770945686960543, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 68 with value: 0.771047008351801.


Trial 90 with params: {'learning_rate': 0.0004157550952188889, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8628,0.670769,0.7452,0.748276,0.744674,0.744334
2,0.7574,0.643973,0.7635,0.765588,0.763119,0.762734
3,0.7432,0.637771,0.7619,0.762647,0.761791,0.759973
4,0.7367,0.622535,0.7703,0.775189,0.769851,0.769015
5,0.7322,0.619975,0.7707,0.771855,0.770265,0.769489
6,0.7291,0.633579,0.7591,0.769172,0.758747,0.760904
7,0.7287,0.630585,0.7665,0.772036,0.76648,0.766469
8,0.7266,0.624475,0.7667,0.767119,0.766249,0.765998


[I 2025-04-07 11:00:51,404] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0009640939810981695, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8197,0.651846,0.7527,0.757358,0.75241,0.752422
2,0.7558,0.635004,0.7681,0.770081,0.767784,0.767565
3,0.7468,0.634963,0.7631,0.77226,0.7627,0.761877
4,0.7421,0.621576,0.7719,0.776,0.771061,0.771095
5,0.7384,0.620465,0.7702,0.771463,0.769887,0.768288
6,0.7347,0.628881,0.7609,0.771838,0.760483,0.762675
7,0.7333,0.627439,0.7668,0.770282,0.766842,0.766555
8,0.7301,0.620555,0.7704,0.770104,0.769987,0.769489
9,0.7268,0.626479,0.7618,0.76666,0.76094,0.762209
10,0.7241,0.619395,0.7717,0.7746,0.771551,0.770708


[I 2025-04-07 11:14:19,354] Trial 91 finished with value: 0.7707078729011625 and parameters: {'learning_rate': 0.0009640939810981695, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 68 with value: 0.771047008351801.


Trial 92 with params: {'learning_rate': 0.0018454953055328997, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8148,0.654517,0.7535,0.758786,0.753148,0.752806
2,0.7712,0.648229,0.7537,0.770938,0.753696,0.754105
3,0.7633,0.644994,0.7594,0.773663,0.758492,0.759371
4,0.7582,0.628692,0.7661,0.771944,0.765336,0.765582


[I 2025-04-07 11:19:45,503] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0032572835209714033, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8329,0.70162,0.7339,0.75014,0.734077,0.731878
2,0.8011,0.663392,0.7514,0.760861,0.751388,0.751267


[I 2025-04-07 11:22:30,560] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.0011966852711537432, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8197,0.653661,0.7507,0.755014,0.750455,0.750382
2,0.7595,0.635122,0.7663,0.772495,0.765975,0.766787
3,0.7512,0.635435,0.7636,0.771827,0.763061,0.762457
4,0.7465,0.625479,0.77,0.776294,0.768953,0.769157
5,0.7419,0.619766,0.7704,0.771914,0.770066,0.768855
6,0.7382,0.629143,0.7602,0.771651,0.759807,0.762124
7,0.736,0.626221,0.7681,0.771929,0.768096,0.76799
8,0.732,0.620491,0.7696,0.769277,0.769228,0.768633
9,0.7281,0.627208,0.7617,0.766887,0.760818,0.762145
10,0.7247,0.619282,0.7717,0.774603,0.771552,0.770738


[I 2025-04-07 11:36:00,697] Trial 94 finished with value: 0.7707375710540674 and parameters: {'learning_rate': 0.0011966852711537432, 'weight_decay': 0.004, 'warmup_steps': 9, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 68 with value: 0.771047008351801.


Trial 95 with params: {'learning_rate': 0.0009749094125181565, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8194,0.651829,0.7529,0.757451,0.752607,0.752619
2,0.756,0.634856,0.7681,0.770171,0.767783,0.767615
3,0.7469,0.635071,0.7622,0.77154,0.761789,0.760964
4,0.7423,0.621633,0.7716,0.775782,0.770758,0.770834


[I 2025-04-07 11:41:24,920] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2092,0.915753,0.674,0.675905,0.672913,0.671643
2,0.9096,0.7691,0.716,0.718296,0.715466,0.715134
3,0.8365,0.725305,0.7251,0.730492,0.724773,0.72344
4,0.8069,0.686464,0.74,0.746151,0.739307,0.738814


[I 2025-04-07 11:46:47,182] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0007808068359114527, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8285,0.653774,0.753,0.758353,0.75266,0.752723
2,0.7541,0.639041,0.7684,0.771169,0.768074,0.767904
3,0.7439,0.632448,0.766,0.77103,0.765652,0.764474
4,0.7389,0.620593,0.7731,0.777135,0.772369,0.772328
5,0.7355,0.619744,0.7712,0.773101,0.770792,0.769454
6,0.7321,0.62938,0.7596,0.7698,0.759214,0.76132
7,0.7313,0.629314,0.767,0.772081,0.767105,0.766848
8,0.7285,0.621158,0.7689,0.768854,0.768464,0.76808


[I 2025-04-07 11:57:43,819] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.001003222583710779, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8193,0.651863,0.7523,0.756617,0.751996,0.752014
2,0.7564,0.634538,0.7678,0.770205,0.76747,0.767408
3,0.7475,0.635278,0.7624,0.771869,0.761956,0.761162
4,0.7428,0.621808,0.7718,0.776048,0.770929,0.771005
5,0.739,0.620527,0.7699,0.771105,0.769596,0.768046
6,0.7353,0.628832,0.7606,0.771683,0.760186,0.762421
7,0.7338,0.627064,0.7672,0.770713,0.767221,0.767006
8,0.7304,0.620497,0.7699,0.76961,0.769489,0.768992
9,0.727,0.62658,0.7617,0.766594,0.760846,0.762128
10,0.7242,0.619362,0.7716,0.774503,0.771446,0.770625


[I 2025-04-07 12:11:41,352] Trial 98 finished with value: 0.7706245722097269 and parameters: {'learning_rate': 0.001003222583710779, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 68 with value: 0.771047008351801.


Trial 99 with params: {'learning_rate': 0.0017422533204379319, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8152,0.655335,0.751,0.757358,0.750678,0.750471
2,0.7692,0.64649,0.7543,0.771085,0.754224,0.754972
3,0.7614,0.643865,0.76,0.773569,0.759142,0.759857
4,0.7563,0.630735,0.7664,0.772656,0.765598,0.765726


[I 2025-04-07 12:17:16,892] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0011122950327189845, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8225,0.652851,0.7517,0.755788,0.751436,0.751389
2,0.7581,0.634238,0.7682,0.772425,0.767898,0.768384
3,0.7495,0.635286,0.763,0.771719,0.762514,0.761763
4,0.7449,0.623273,0.7702,0.775436,0.769219,0.769322
5,0.7406,0.620299,0.7707,0.772016,0.770397,0.768955
6,0.7369,0.628883,0.7611,0.772374,0.760689,0.762936
7,0.735,0.626367,0.7679,0.771378,0.767915,0.76772
8,0.7313,0.620456,0.7698,0.769464,0.769404,0.768845
9,0.7277,0.62692,0.7618,0.766843,0.760924,0.762231
10,0.7245,0.619311,0.7718,0.774649,0.77165,0.770823


[I 2025-04-07 12:31:25,014] Trial 100 finished with value: 0.7708225692861899 and parameters: {'learning_rate': 0.0011122950327189845, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}. Best is trial 68 with value: 0.771047008351801.


Trial 101 with params: {'learning_rate': 0.0006548738468781863, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8423,0.658503,0.754,0.759514,0.753572,0.753748
2,0.7538,0.642085,0.7647,0.768903,0.764284,0.764683
3,0.7426,0.633277,0.7651,0.767038,0.764852,0.76322
4,0.7373,0.619017,0.7723,0.776127,0.771593,0.771439
5,0.7338,0.618802,0.7718,0.773687,0.771345,0.770282
6,0.7306,0.630233,0.7593,0.768904,0.758936,0.760926
7,0.73,0.630039,0.7662,0.772028,0.766318,0.766106
8,0.7275,0.621908,0.7686,0.768749,0.768138,0.76782


[I 2025-04-07 12:42:34,145] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.001057910969657289, 'weight_decay': 0.007, 'warmup_steps': 14, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8246,0.652466,0.7518,0.755817,0.751519,0.751486
2,0.7573,0.634197,0.7683,0.771402,0.767993,0.768153
3,0.7485,0.635385,0.7624,0.771883,0.76194,0.761193
4,0.7439,0.622391,0.7717,0.776422,0.770765,0.770851
5,0.7398,0.620493,0.77,0.771214,0.769695,0.768192
6,0.7361,0.628832,0.7611,0.772314,0.760696,0.762916
7,0.7344,0.626635,0.7676,0.771024,0.767617,0.767406
8,0.7309,0.620463,0.7692,0.768913,0.768795,0.768275


[I 2025-04-07 12:54:11,755] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0022200207360307494, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8325,0.658822,0.7526,0.753992,0.752191,0.750542
2,0.7794,0.650571,0.7511,0.767442,0.751348,0.751657


[I 2025-04-07 12:56:53,273] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0031268777922674553, 'weight_decay': 0.007, 'warmup_steps': 14, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8346,0.693677,0.7377,0.750311,0.737716,0.73576
2,0.7987,0.658784,0.7536,0.761755,0.753533,0.753653
3,0.787,0.656124,0.7513,0.76298,0.750556,0.750945
4,0.779,0.634126,0.7616,0.771622,0.760849,0.763169


[I 2025-04-07 13:02:31,293] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0008227778546752969, 'weight_decay': 0.01, 'warmup_steps': 16, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8338,0.65337,0.7526,0.757871,0.752288,0.75231
2,0.7546,0.637978,0.7681,0.770308,0.767775,0.767462
3,0.7445,0.632899,0.7659,0.772113,0.765523,0.764552
4,0.7396,0.620984,0.7731,0.777119,0.772347,0.772331
5,0.7362,0.620004,0.7711,0.772882,0.770716,0.769267
6,0.7327,0.629243,0.76,0.770607,0.759576,0.761744
7,0.7318,0.62892,0.7672,0.771841,0.767305,0.766992
8,0.7289,0.620965,0.7696,0.769482,0.769162,0.768755


[I 2025-04-07 13:13:30,046] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0006982420981323802, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8313,0.656209,0.7533,0.758651,0.752899,0.753006
2,0.7537,0.641141,0.7656,0.76948,0.765213,0.765441
3,0.7429,0.632713,0.7653,0.768275,0.764999,0.763493
4,0.7378,0.619578,0.7725,0.776388,0.771788,0.771672
5,0.7344,0.619107,0.7717,0.773605,0.771262,0.770091
6,0.7311,0.629839,0.759,0.768693,0.758622,0.760665
7,0.7304,0.629861,0.7662,0.771816,0.766327,0.766079
8,0.7278,0.621621,0.7688,0.768849,0.768354,0.767991


[I 2025-04-07 13:24:37,369] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0005413838540835159, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8513,0.664618,0.75,0.755157,0.749476,0.749596
2,0.7545,0.642359,0.766,0.769867,0.765519,0.76593
3,0.7421,0.634831,0.7648,0.765386,0.764613,0.762769
4,0.7364,0.618504,0.7724,0.776108,0.771767,0.771266
5,0.7326,0.61848,0.7713,0.77308,0.770837,0.770001
6,0.7295,0.63161,0.7591,0.768707,0.758762,0.760831
7,0.7291,0.63018,0.7659,0.771791,0.765991,0.765845
8,0.7267,0.622853,0.7679,0.768106,0.767437,0.767125


[I 2025-04-07 13:35:42,303] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.00045647890833104847, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8555,0.66861,0.7472,0.75091,0.746681,0.746498
2,0.756,0.642637,0.7649,0.767519,0.764464,0.764405


[I 2025-04-07 13:38:26,735] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0011419062611767409, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8215,0.653114,0.7507,0.754688,0.750454,0.750328
2,0.7586,0.634452,0.7677,0.77288,0.767365,0.768109
3,0.7501,0.635283,0.7628,0.771312,0.762309,0.761555
4,0.7455,0.623937,0.7703,0.775863,0.769302,0.769488
5,0.7411,0.620137,0.7708,0.772271,0.770502,0.769121
6,0.7374,0.62895,0.7608,0.772205,0.760391,0.762694
7,0.7353,0.626282,0.7685,0.772031,0.768502,0.768327
8,0.7315,0.620467,0.7698,0.769513,0.769413,0.768861
9,0.7278,0.627024,0.7617,0.766759,0.760826,0.762132
10,0.7246,0.619298,0.7719,0.774789,0.771753,0.770931


[I 2025-04-07 13:52:33,476] Trial 109 finished with value: 0.770931117904081 and parameters: {'learning_rate': 0.0011419062611767409, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 68 with value: 0.771047008351801.


Trial 110 with params: {'learning_rate': 0.00026046496674044905, 'weight_decay': 0.003, 'warmup_steps': 8, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9151,0.689162,0.7366,0.73865,0.736006,0.73545
2,0.7688,0.654245,0.7582,0.759242,0.757881,0.757131
3,0.75,0.644739,0.7577,0.760569,0.757607,0.756125
4,0.7413,0.629906,0.7648,0.771643,0.764451,0.76375


[I 2025-04-07 13:58:06,973] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0014439116858802404, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8184,0.656226,0.75,0.755799,0.749755,0.74972
2,0.7639,0.64021,0.7576,0.770846,0.757391,0.75858


[I 2025-04-07 14:00:56,268] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0019380910382424488, 'weight_decay': 0.01, 'warmup_steps': 7, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.818,0.653514,0.753,0.756693,0.752651,0.75193
2,0.7732,0.649958,0.7523,0.770185,0.752382,0.752674
3,0.7651,0.645707,0.7591,0.773939,0.758186,0.759182
4,0.7598,0.627247,0.7657,0.770949,0.764964,0.765283


[I 2025-04-07 14:06:34,928] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0004159495593249149, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.874,0.671422,0.7441,0.747268,0.743586,0.74326
2,0.7576,0.644192,0.7635,0.765505,0.763114,0.762705


[I 2025-04-07 14:09:25,444] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0009056774422885444, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8232,0.652228,0.7525,0.757555,0.752194,0.752239
2,0.7552,0.636008,0.7676,0.769273,0.767304,0.766882
3,0.7458,0.634159,0.7628,0.771103,0.762414,0.761511
4,0.741,0.621356,0.7725,0.776501,0.771703,0.771732
5,0.7375,0.620295,0.7709,0.772341,0.770551,0.769034
6,0.7339,0.629001,0.7602,0.771054,0.759775,0.761948
7,0.7327,0.628043,0.7669,0.770707,0.766965,0.766647
8,0.7296,0.620681,0.77,0.769773,0.769577,0.769111
9,0.7264,0.626352,0.762,0.766744,0.761144,0.762391
10,0.7239,0.619442,0.7718,0.77481,0.771653,0.770832


[I 2025-04-07 14:23:37,106] Trial 114 finished with value: 0.7708319353466178 and parameters: {'learning_rate': 0.0009056774422885444, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 68 with value: 0.771047008351801.


Trial 115 with params: {'learning_rate': 0.000658799017748327, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8325,0.657928,0.753,0.758543,0.752582,0.752742
2,0.7536,0.641821,0.7651,0.769317,0.764699,0.765095
3,0.7425,0.633201,0.7654,0.767382,0.765138,0.763494
4,0.7373,0.61899,0.7727,0.776488,0.772004,0.771846
5,0.7338,0.618787,0.772,0.773844,0.771542,0.770473
6,0.7306,0.630167,0.7594,0.76901,0.759038,0.761025
7,0.7301,0.630007,0.7662,0.772002,0.766316,0.7661
8,0.7275,0.62187,0.7684,0.768571,0.767945,0.76763


[I 2025-04-07 14:34:44,329] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0017578960938125499, 'weight_decay': 0.008, 'warmup_steps': 10, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8185,0.654793,0.7513,0.757347,0.750993,0.750743
2,0.7696,0.646944,0.7534,0.770291,0.753341,0.75399
3,0.7617,0.644064,0.7602,0.773816,0.759335,0.760063
4,0.7567,0.630353,0.7666,0.772838,0.765811,0.766007


[I 2025-04-07 14:40:11,399] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0013916423804027373, 'weight_decay': 0.006, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8167,0.655901,0.7505,0.755987,0.750278,0.750232
2,0.7629,0.638948,0.7596,0.771283,0.759373,0.760522


[I 2025-04-07 14:43:01,829] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0005136804308213444, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8495,0.665801,0.7485,0.753254,0.747982,0.748012
2,0.7548,0.642177,0.7659,0.769477,0.765415,0.765717
3,0.7421,0.635266,0.7642,0.7646,0.764015,0.762172
4,0.7363,0.618947,0.7715,0.775366,0.770879,0.770306


[I 2025-04-07 14:48:45,174] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0008894488646859071, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8362,0.652868,0.7524,0.757555,0.752109,0.752142
2,0.7553,0.636428,0.7676,0.769324,0.767287,0.766874
3,0.7456,0.634023,0.7634,0.771437,0.763017,0.762099
4,0.7408,0.621368,0.7729,0.776839,0.772114,0.7721
5,0.7373,0.620278,0.7705,0.772035,0.770146,0.768649
6,0.7337,0.629056,0.7603,0.771164,0.759875,0.76205
7,0.7325,0.6282,0.7673,0.771199,0.767373,0.767032
8,0.7294,0.620732,0.7698,0.76958,0.769378,0.768913
9,0.7264,0.626334,0.762,0.766717,0.761148,0.76239
10,0.7239,0.619457,0.7717,0.774682,0.771548,0.770727


[I 2025-04-07 15:03:15,581] Trial 119 finished with value: 0.7707272841440134 and parameters: {'learning_rate': 0.0008894488646859071, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 68 with value: 0.771047008351801.


Trial 120 with params: {'learning_rate': 0.0030673702350425963, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8307,0.690275,0.7413,0.752245,0.741194,0.739324
2,0.7974,0.656402,0.7543,0.761786,0.754199,0.754457


[I 2025-04-07 15:06:05,267] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0013774158434911351, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8127,0.655453,0.7507,0.75599,0.750474,0.750403
2,0.7626,0.638462,0.7603,0.771723,0.760053,0.761287
3,0.7546,0.637504,0.7632,0.771649,0.762551,0.762182
4,0.7498,0.631679,0.7675,0.775634,0.766404,0.766608


[I 2025-04-07 15:11:47,906] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.000997766320442339, 'weight_decay': 0.01, 'warmup_steps': 5, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.821,0.651974,0.7525,0.756855,0.752202,0.752195
2,0.7563,0.634617,0.7676,0.769923,0.76727,0.767185
3,0.7474,0.635266,0.7626,0.771975,0.762163,0.761351
4,0.7427,0.621787,0.7717,0.775917,0.770838,0.770908


[I 2025-04-07 15:17:40,340] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0013447443410259095, 'weight_decay': 0.007, 'warmup_steps': 10, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.819,0.655534,0.7514,0.756562,0.751188,0.75109
2,0.7622,0.637934,0.7611,0.771817,0.760836,0.762007
3,0.754,0.637021,0.7636,0.771649,0.762978,0.76249
4,0.7493,0.630786,0.7683,0.776153,0.767202,0.767363


[I 2025-04-07 15:23:38,416] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0004334031470282451, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8631,0.66998,0.7462,0.749663,0.745677,0.745419
2,0.7568,0.643376,0.7638,0.766026,0.763387,0.763093


[I 2025-04-07 15:26:42,633] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0034878517078439164, 'weight_decay': 0.002, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8455,0.722531,0.7242,0.747629,0.72457,0.720868
2,0.8052,0.671141,0.7477,0.75961,0.747785,0.74741
3,0.794,0.658466,0.7501,0.759185,0.749393,0.750134
4,0.7842,0.642388,0.7585,0.76932,0.757644,0.760376


[I 2025-04-07 15:32:20,640] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8362,0.652773,0.7524,0.757377,0.752102,0.752101
2,0.7554,0.636113,0.7683,0.769942,0.76799,0.767595
3,0.7459,0.634285,0.762,0.770443,0.761596,0.760724
4,0.7411,0.62142,0.7723,0.776263,0.771504,0.771524
5,0.7375,0.62033,0.7706,0.772058,0.770258,0.768754
6,0.7339,0.629022,0.76,0.770876,0.759577,0.76178
7,0.7327,0.628032,0.7668,0.770504,0.766865,0.766523
8,0.7296,0.620685,0.7701,0.769877,0.769677,0.769216
9,0.7265,0.626362,0.762,0.766769,0.761144,0.762401
10,0.7239,0.619449,0.7716,0.77456,0.771448,0.770611


[I 2025-04-07 15:46:55,910] Trial 126 finished with value: 0.7706111150248297 and parameters: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 3.5}. Best is trial 68 with value: 0.771047008351801.


Trial 127 with params: {'learning_rate': 0.0006461276985836845, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8391,0.658791,0.7539,0.759544,0.753468,0.753682
2,0.7537,0.642145,0.7651,0.769366,0.764677,0.765121
3,0.7425,0.633384,0.7654,0.767171,0.765148,0.763517
4,0.7372,0.618882,0.7726,0.776472,0.771899,0.771761
5,0.7337,0.618714,0.7718,0.773609,0.771344,0.770284
6,0.7305,0.630311,0.7592,0.768841,0.758842,0.760846
7,0.7299,0.630052,0.7663,0.772191,0.766419,0.766223
8,0.7274,0.621971,0.7685,0.768663,0.768041,0.767726


[I 2025-04-07 15:58:12,986] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0004825313546103289, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8514,0.667302,0.7477,0.751846,0.747183,0.747072
2,0.7553,0.642229,0.7649,0.767996,0.76443,0.764569
3,0.7423,0.635894,0.7637,0.764166,0.763546,0.76171
4,0.7363,0.619787,0.7706,0.774689,0.770029,0.769346
5,0.7322,0.618884,0.7708,0.772322,0.770344,0.769559
6,0.7291,0.632437,0.7593,0.769146,0.758952,0.761057
7,0.7288,0.630237,0.767,0.772574,0.767039,0.766894
8,0.7265,0.623501,0.7671,0.767429,0.766643,0.766368


[I 2025-04-07 16:09:10,298] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.000128606050919097, 'weight_decay': 0.004, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0233,0.745322,0.7161,0.718023,0.715242,0.714261
2,0.8041,0.682665,0.7414,0.742763,0.74097,0.74068


[I 2025-04-07 16:11:58,625] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0013055945701646568, 'weight_decay': 0.005, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8174,0.65499,0.7509,0.755914,0.750689,0.750679
2,0.7614,0.637077,0.7619,0.771516,0.761596,0.762771


[I 2025-04-07 16:14:42,014] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0005612567161548509, 'weight_decay': 0.01, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8611,0.664139,0.7509,0.756117,0.750395,0.750516
2,0.7545,0.64265,0.7659,0.769953,0.765428,0.765863
3,0.7422,0.634624,0.7656,0.766307,0.765404,0.763595
4,0.7366,0.618392,0.7735,0.777301,0.772832,0.772493
5,0.7328,0.61847,0.771,0.772696,0.770536,0.769645
6,0.7297,0.631348,0.7592,0.768693,0.758859,0.76084
7,0.7293,0.630199,0.7659,0.771728,0.765994,0.765838
8,0.7269,0.622664,0.7678,0.768023,0.767339,0.76704


[I 2025-04-07 16:25:38,462] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.001171378659338834, 'weight_decay': 0.005, 'warmup_steps': 6, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8186,0.653291,0.7506,0.754667,0.750367,0.750223
2,0.7591,0.634761,0.7669,0.772731,0.766552,0.767368
3,0.7506,0.635322,0.7624,0.770796,0.761885,0.761202
4,0.746,0.624666,0.7698,0.77577,0.768767,0.768934


[I 2025-04-07 16:31:05,613] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0009807989001418627, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8192,0.65183,0.7525,0.756984,0.752213,0.752218
2,0.7561,0.634764,0.7679,0.770038,0.767575,0.767438
3,0.747,0.635123,0.7623,0.771777,0.761883,0.761075
4,0.7424,0.621663,0.7716,0.775827,0.770743,0.770798
5,0.7386,0.620493,0.7702,0.771381,0.769894,0.768278
6,0.7349,0.628854,0.7608,0.771799,0.760384,0.762586
7,0.7335,0.627268,0.767,0.770539,0.767037,0.766801
8,0.7302,0.620528,0.77,0.7697,0.769588,0.769085
9,0.7269,0.626519,0.7618,0.76666,0.76094,0.762209
10,0.7241,0.619378,0.7716,0.774485,0.771453,0.770614


[I 2025-04-07 16:44:43,080] Trial 133 finished with value: 0.7706136780750437 and parameters: {'learning_rate': 0.0009807989001418627, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 68 with value: 0.771047008351801.


Trial 134 with params: {'learning_rate': 0.0007078235644524678, 'weight_decay': 0.005, 'warmup_steps': 10, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8364,0.656044,0.7538,0.759145,0.753404,0.753504
2,0.7539,0.640997,0.7665,0.770235,0.766123,0.766301
3,0.743,0.632629,0.7654,0.768567,0.765065,0.763644
4,0.7379,0.619756,0.7719,0.775793,0.771185,0.771072
5,0.7345,0.619204,0.7714,0.773299,0.77097,0.769768
6,0.7312,0.629777,0.7594,0.76908,0.759017,0.761044
7,0.7305,0.629821,0.7662,0.771782,0.766327,0.766073
8,0.7279,0.621554,0.7686,0.768621,0.768155,0.76777


[I 2025-04-07 16:55:46,933] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0022168006989622636, 'weight_decay': 0.005, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8236,0.659038,0.7526,0.754117,0.752192,0.750528
2,0.7791,0.65074,0.7509,0.76728,0.751144,0.75145
3,0.7704,0.64609,0.7585,0.774215,0.757617,0.758913
4,0.7646,0.625198,0.7675,0.772833,0.766768,0.767541


[I 2025-04-07 17:01:17,700] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 0.0003692612841231174, 'weight_decay': 0.002, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.881,0.674354,0.7423,0.744816,0.741756,0.741324
2,0.7597,0.646587,0.7628,0.764287,0.762479,0.761819


[I 2025-04-07 17:04:07,283] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0014655651353288315, 'weight_decay': 0.003, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8183,0.656253,0.75,0.755872,0.749757,0.749646
2,0.7643,0.640744,0.7574,0.771354,0.757207,0.75836
3,0.7563,0.638999,0.7625,0.772408,0.76179,0.761677
4,0.7515,0.633608,0.7662,0.774538,0.765125,0.765225


[I 2025-04-07 17:09:42,272] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0012032710688641614, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8202,0.653761,0.7511,0.755438,0.750861,0.750782
2,0.7596,0.63522,0.7661,0.772569,0.765772,0.766619
3,0.7513,0.63547,0.7635,0.771525,0.762958,0.762358
4,0.7466,0.62569,0.77,0.776286,0.768951,0.769141
5,0.742,0.619722,0.7702,0.771772,0.76986,0.76869
6,0.7383,0.629171,0.7597,0.771171,0.759313,0.761631
7,0.736,0.626232,0.7679,0.771769,0.767897,0.767786
8,0.732,0.620505,0.7694,0.769089,0.769033,0.768438
9,0.7282,0.627236,0.7617,0.766887,0.760818,0.762145
10,0.7248,0.61928,0.7717,0.774603,0.771552,0.770738


[I 2025-04-07 17:23:51,540] Trial 138 finished with value: 0.7707375710540674 and parameters: {'learning_rate': 0.0012032710688641614, 'weight_decay': 0.002, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 68 with value: 0.771047008351801.


Trial 139 with params: {'learning_rate': 0.0018115708376665036, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8271,0.654413,0.7525,0.757987,0.752154,0.751909
2,0.7708,0.648061,0.7537,0.771004,0.753683,0.754139


[I 2025-04-07 17:26:42,846] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0018027738075844498, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8206,0.654128,0.7529,0.758376,0.75257,0.752279
2,0.7705,0.647839,0.7533,0.770574,0.753278,0.753754
3,0.7626,0.644612,0.7599,0.773905,0.759006,0.759842
4,0.7575,0.629388,0.7666,0.772473,0.765831,0.766036


[I 2025-04-07 17:32:10,459] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0011111221341183373, 'weight_decay': 0.006, 'warmup_steps': 7, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8197,0.652689,0.7518,0.755789,0.751539,0.751481
2,0.758,0.63423,0.7682,0.772422,0.767895,0.768362
3,0.7495,0.635297,0.7627,0.771508,0.762218,0.761458
4,0.7449,0.623209,0.7704,0.775617,0.769419,0.769521


[I 2025-04-07 17:37:45,570] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0010811891653463791, 'weight_decay': 0.002, 'warmup_steps': 15, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8248,0.652682,0.7515,0.755632,0.751222,0.751217
2,0.7576,0.634163,0.7685,0.772001,0.768193,0.768453
3,0.749,0.635369,0.7623,0.77145,0.761826,0.761091
4,0.7443,0.62272,0.7716,0.77653,0.770656,0.770749
5,0.7402,0.620433,0.7703,0.77154,0.769998,0.76851
6,0.7365,0.62884,0.7611,0.772303,0.760694,0.762921
7,0.7347,0.626506,0.768,0.771462,0.768015,0.767828
8,0.7311,0.620457,0.7691,0.768783,0.7687,0.768157


[I 2025-04-07 17:48:46,903] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.002663651745866679, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8227,0.671495,0.7491,0.752642,0.748626,0.746963
2,0.7888,0.642995,0.7559,0.761116,0.755999,0.756378


[I 2025-04-07 17:51:32,672] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1852,0.892099,0.6784,0.679957,0.677326,0.676005
2,0.8949,0.756945,0.7185,0.720456,0.717996,0.71762
3,0.828,0.717306,0.7259,0.730824,0.725579,0.724192
4,0.8009,0.681276,0.7431,0.749434,0.742423,0.742019


[I 2025-04-07 17:57:10,452] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0005400007972757662, 'weight_decay': 0.002, 'warmup_steps': 8, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8502,0.664615,0.7501,0.755237,0.749576,0.74968
2,0.7545,0.642329,0.766,0.769851,0.765519,0.765926
3,0.7421,0.634845,0.7649,0.765494,0.764718,0.762875
4,0.7364,0.61852,0.7722,0.775907,0.771563,0.771043
5,0.7326,0.61848,0.7714,0.7732,0.770935,0.770106
6,0.7295,0.631628,0.7591,0.768732,0.758762,0.760842
7,0.7291,0.630178,0.7659,0.771791,0.765991,0.765845
8,0.7267,0.622865,0.7678,0.76803,0.767339,0.767033


[I 2025-04-07 18:08:18,362] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0012559808371099888, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8186,0.654396,0.751,0.75581,0.750794,0.750748
2,0.7606,0.636119,0.7637,0.771846,0.763399,0.764451
3,0.7523,0.635896,0.7639,0.771543,0.763339,0.762695
4,0.7476,0.627529,0.7693,0.775834,0.76824,0.768332
5,0.7428,0.619369,0.7706,0.772492,0.770259,0.76919
6,0.739,0.629431,0.7599,0.771525,0.759541,0.761863
7,0.7366,0.62633,0.7672,0.771422,0.767175,0.767134
8,0.7325,0.620551,0.7698,0.769471,0.769429,0.76881
9,0.7285,0.627418,0.7621,0.767449,0.761213,0.762572
10,0.7249,0.619265,0.7717,0.774586,0.771545,0.770747


[I 2025-04-07 18:22:05,647] Trial 146 finished with value: 0.7707468502676488 and parameters: {'learning_rate': 0.0012559808371099888, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 68 with value: 0.771047008351801.


Trial 147 with params: {'learning_rate': 0.0014768117334707435, 'weight_decay': 0.001, 'warmup_steps': 12, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8194,0.656303,0.7497,0.755704,0.749461,0.74934
2,0.7645,0.641029,0.7572,0.771388,0.757019,0.758074


[I 2025-04-07 18:24:49,490] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0007256138168366938, 'weight_decay': 0.008, 'warmup_steps': 6, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8328,0.655299,0.7538,0.759043,0.753415,0.753484
2,0.7539,0.640535,0.7678,0.771372,0.76743,0.76753
3,0.7432,0.632483,0.7655,0.769094,0.765174,0.763815
4,0.7381,0.619975,0.772,0.775938,0.771284,0.771198
5,0.7348,0.619339,0.7715,0.773393,0.771082,0.769855
6,0.7314,0.629653,0.7595,0.769368,0.75912,0.761165
7,0.7307,0.629712,0.7662,0.77166,0.76633,0.766046
8,0.728,0.621453,0.7686,0.768619,0.768159,0.767778


[I 2025-04-07 18:36:07,274] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.00069247511617349, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8375,0.656667,0.7537,0.759171,0.753298,0.753431
2,0.7538,0.641344,0.7655,0.7695,0.765109,0.765373
3,0.7429,0.632783,0.765,0.76772,0.764698,0.76317
4,0.7377,0.619541,0.7721,0.776025,0.771385,0.771273
5,0.7343,0.619081,0.7718,0.773689,0.771365,0.770208
6,0.731,0.629887,0.7592,0.768921,0.758825,0.760848
7,0.7304,0.629897,0.7662,0.771825,0.766327,0.766089
8,0.7277,0.621657,0.7688,0.768862,0.768354,0.767995


[I 2025-04-07 18:47:20,929] Trial 149 pruned. 


In [35]:
print(best_distill_head)

BestRun(run_id='68', objective=0.771047008351801, hyperparameters={'learning_rate': 0.0008350401275702886, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}, run_summary=None)
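
Values such as `0.30000000000000004` in the printed hyperparameters are ordinary binary floating-point residue from sampling on a stepped grid, not distinct settings. Below is a minimal sketch of tidying them before the values are reused or reported. The helper `tidy_hyperparameters` is hypothetical, and the per-key decimal counts are assumptions inferred from the values visible in the log (`weight_decay` in steps of 0.001, `lambda_param` of 0.1, `temperature` of 0.5), not taken from the search-space definition.

```python
# Hypothetical helper, not part of the notebook's `base` module.
# Decimal places per grid-sampled key are an assumption inferred from the log.
GRID_DECIMALS = {"weight_decay": 3, "lambda_param": 1, "temperature": 1}


def tidy_hyperparameters(params, grid_decimals=GRID_DECIMALS):
    """Return a copy of params with grid-sampled floats rounded to their step precision."""
    tidy = {}
    for name, value in params.items():
        if name in grid_decimals and isinstance(value, float):
            tidy[name] = round(value, grid_decimals[name])
        else:
            # Continuous values (e.g. learning_rate) and integers are kept untouched.
            tidy[name] = value
    return tidy


# Hyperparameters copied from the BestRun printed above.
best_params = {
    "learning_rate": 0.0008350401275702886,
    "weight_decay": 0.008,
    "warmup_steps": 1,
    "lambda_param": 0.30000000000000004,
    "temperature": 4.0,
}
print(tidy_hyperparameters(best_params))
```

Rounding only the grid-sampled keys leaves the continuously sampled `learning_rate` exactly as Optuna proposed it.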


In [14]:
base.reset_seed()

In [15]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained_hp-search",
    epochs=num_epochs,
    batch_size=batch_size,
)

In [16]:
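def hp_space(trial):
    """Search space without the distillation parameters (lambda_param, temperature)."""
    params = {
        # Learning rate is sampled on a log scale across two orders of magnitude.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay is sampled on a grid with step 1e-3.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params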
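
With `log=True`, `suggest_float` treats `learning_rate` on a logarithmic scale, so each decade between 5e-5 and 5e-3 is equally likely a priori. The following is a rough standard-library sketch of that prior only. It ignores the adaptive behaviour of the TPE sampler, and `sample_log_uniform` is a hypothetical helper, not an Optuna function.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_log_uniform(5e-5, 5e-3, rng) for _ in range(10_000)]

# The range spans two decades; the geometric midpoint 5e-4 splits the samples roughly in half.
below_midpoint = sum(s < 5e-4 for s in samples) / len(samples)
print(f"share below 5e-4: {below_midpoint:.2f}")
```

A linear-scale draw over the same interval would instead place about 90 % of the samples above 5e-4, which is why the log scale is the usual choice for learning rates.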

In [17]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
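
The logs in this notebook show trials being pruned only after epochs 2, 4 and 8, which is the pattern a Hyperband pruner with `reduction_factor=2` would be expected to produce. The sketch below reproduces the bracket and rung arithmetic as described in Optuna's documentation for `HyperbandPruner` and `SuccessiveHalvingPruner`. The values `min_r = 2` and `max_r = 10` are assumptions chosen to match the log (both are defined elsewhere in the notebook), `hyperband_rungs` is a hypothetical helper, and `bootstrap_count` is not modelled.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Map each bracket index to the steps at which its trials can be pruned."""
    # Number of brackets: floor(log_eta(max / min)) + 1, computed with integers.
    n_brackets = 1
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    schedule = {}
    for bracket in range(n_brackets):
        steps = []
        rung = 0
        # Rung completion point: min_resource * eta ** (bracket + rung).
        while min_resource * reduction_factor ** (bracket + rung) <= max_resource:
            steps.append(min_resource * reduction_factor ** (bracket + rung))
            rung += 1
        schedule[bracket] = steps
    return schedule


# Assumed values; the notebook defines min_r and max_r elsewhere.
schedule = hyperband_rungs(min_resource=2, max_resource=10, reduction_factor=2)
for bracket, steps in schedule.items():
    print(f"bracket {bracket}: compare trials after steps {steps}")
```

Under these assumptions a trial that survives its last comparison runs through to epoch 10, which would be consistent with the finished trials in the log.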



In [18]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # hyperparameter_search needs model_init so that every trial starts from a fresh model.
    model_init=lambda: base.get_mobilenet(10),
)

In [None]:
best_base_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-01 22:33:40,519] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4596,0.258939,0.9116,0.915043,0.911785,0.911368
2,0.1598,0.193215,0.9373,0.93789,0.937633,0.937199


[I 2025-04-01 22:37:42,262] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4847,0.330501,0.8928,0.897418,0.89309,0.892802
2,0.2498,0.261883,0.9116,0.913688,0.911897,0.911256
3,0.1611,0.242587,0.922,0.922544,0.922435,0.921389
4,0.1071,0.239998,0.9275,0.928043,0.927786,0.927322


[I 2025-04-01 22:45:41,256] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6549,0.280514,0.9025,0.90479,0.902698,0.902248
2,0.2363,0.208545,0.9289,0.92957,0.929122,0.928992


[I 2025-04-01 22:49:41,809] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5807,0.424856,0.858,0.866648,0.85809,0.857939
2,0.3234,0.300635,0.9005,0.902773,0.900615,0.900084
3,0.2222,0.253618,0.9165,0.916362,0.91682,0.916057
4,0.1523,0.245073,0.9244,0.924922,0.9244,0.924195


[I 2025-04-01 22:57:42,177] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7004,0.49795,0.8329,0.839325,0.83336,0.83211
2,0.416,0.368345,0.8774,0.880401,0.877676,0.876772


[I 2025-04-01 23:01:43,850] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5364,0.253377,0.9121,0.915445,0.912267,0.911884
2,0.177,0.189681,0.935,0.935363,0.935339,0.934906
3,0.077,0.226532,0.9334,0.933752,0.933946,0.933006
4,0.0336,0.247574,0.9333,0.935029,0.933397,0.933268
5,0.0153,0.244396,0.9407,0.940726,0.940968,0.94055
6,0.0083,0.270599,0.9386,0.939487,0.938865,0.938749
7,0.0038,0.294375,0.9358,0.936837,0.936163,0.935789
8,0.0022,0.262632,0.9422,0.942706,0.942396,0.942246


[I 2025-04-01 23:17:45,830] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4493,0.274062,0.9068,0.913297,0.907036,0.906187
2,0.1763,0.195734,0.9352,0.935676,0.935569,0.934999
3,0.095,0.217913,0.9316,0.932597,0.931847,0.931431
4,0.0563,0.243946,0.9348,0.935429,0.934911,0.934726


[I 2025-04-01 23:25:45,394] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5645,0.260435,0.9089,0.912147,0.90903,0.908718
2,0.1928,0.193117,0.9347,0.935183,0.935003,0.934737
3,0.0882,0.230885,0.9291,0.929558,0.929566,0.928725
4,0.0386,0.253303,0.9321,0.933788,0.932278,0.932043
5,0.017,0.241608,0.9399,0.940012,0.940033,0.939874
6,0.0092,0.273631,0.9375,0.938419,0.937748,0.937656
7,0.0051,0.284439,0.933,0.933767,0.933314,0.932962
8,0.0025,0.279241,0.9357,0.936283,0.935856,0.93573


[I 2025-04-01 23:41:44,960] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.429,0.276451,0.9062,0.910149,0.906231,0.906036
2,0.1808,0.223711,0.9268,0.928942,0.927123,0.926638
3,0.1013,0.211734,0.9349,0.934825,0.93514,0.934678
4,0.0634,0.221087,0.9381,0.939287,0.938144,0.937998
5,0.0354,0.232375,0.9404,0.940913,0.940554,0.940421
6,0.022,0.271839,0.9373,0.93892,0.937467,0.937396
7,0.0129,0.259222,0.9453,0.946062,0.945662,0.945345
8,0.0045,0.244425,0.9495,0.949972,0.949584,0.949597
9,0.002,0.252028,0.9523,0.95345,0.952423,0.952623
10,0.0008,0.238196,0.9522,0.952345,0.952488,0.952157


[I 2025-04-02 00:01:45,934] Trial 8 finished with value: 0.9521566790618318 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.9521566790618318.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4394,0.273684,0.9052,0.907007,0.905281,0.904879
2,0.2054,0.239878,0.9236,0.925431,0.924061,0.923221


[I 2025-04-02 00:05:45,090] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6129,0.275757,0.9034,0.906117,0.903579,0.903096
2,0.2267,0.20745,0.9292,0.929905,0.929392,0.929283
3,0.1197,0.22189,0.9287,0.92892,0.929099,0.92842
4,0.0593,0.23871,0.9311,0.932593,0.93124,0.931199


[I 2025-04-02 00:13:42,706] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0008255712395727001, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4941,0.339162,0.8863,0.891814,0.886396,0.886323
2,0.261,0.294965,0.9065,0.909797,0.90712,0.905976


[I 2025-04-02 00:17:41,411] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 5.3573067071623195e-05, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7275,0.296934,0.8966,0.899084,0.896827,0.896336
2,0.2652,0.218624,0.9246,0.92546,0.924844,0.924725
3,0.1558,0.222689,0.9284,0.928526,0.928849,0.928044
4,0.0898,0.232121,0.929,0.930588,0.929149,0.929108


[I 2025-04-02 00:25:39,553] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 5.372291923575569e-05, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6687,0.294927,0.8971,0.899439,0.897327,0.89677
2,0.2603,0.216586,0.9257,0.926453,0.92594,0.925826


[I 2025-04-02 00:29:38,082] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 8.840349414475647e-05, 'weight_decay': 0.006, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6058,0.26649,0.9066,0.909674,0.906784,0.906462
2,0.2027,0.198124,0.9327,0.933226,0.933003,0.932684
3,0.0974,0.221575,0.9302,0.930464,0.930702,0.929869
4,0.0433,0.243187,0.9324,0.933576,0.932643,0.932416
5,0.0193,0.243756,0.9382,0.938355,0.938374,0.9382
6,0.0097,0.275429,0.9346,0.935422,0.934879,0.934686
7,0.0059,0.286676,0.9328,0.933843,0.93313,0.932783
8,0.003,0.277253,0.9359,0.936732,0.936081,0.936038


[I 2025-04-02 00:45:34,922] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.000545387384751194, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4496,0.271639,0.907,0.910736,0.907077,0.907282
2,0.2048,0.228024,0.9266,0.927644,0.926682,0.926255
3,0.1235,0.226007,0.9289,0.928836,0.929214,0.928594
4,0.0764,0.214245,0.9388,0.938965,0.938908,0.938699


[I 2025-04-02 00:53:32,632] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.00017466177826022436, 'weight_decay': 0.007, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4734,0.242073,0.9176,0.921187,0.917709,0.917401
2,0.1577,0.185244,0.9399,0.940699,0.940312,0.939873
3,0.0686,0.216283,0.9364,0.936788,0.936904,0.936227
4,0.0332,0.269463,0.9322,0.934783,0.932472,0.932219
5,0.0176,0.237783,0.9431,0.942983,0.943436,0.942952
6,0.0099,0.281854,0.9386,0.939827,0.938816,0.938752
7,0.0049,0.288208,0.9403,0.941364,0.94064,0.940395
8,0.0022,0.265472,0.9437,0.94444,0.94391,0.943841
9,0.0011,0.272403,0.9458,0.947792,0.945872,0.946389
10,0.0007,0.259449,0.9437,0.944205,0.944066,0.943562


[I 2025-04-02 01:13:34,792] Trial 16 finished with value: 0.9435619186046218 and parameters: {'learning_rate': 0.00017466177826022436, 'weight_decay': 0.007, 'warmup_steps': 12}. Best is trial 8 with value: 0.9521566790618318.


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 26}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6687,0.499758,0.8363,0.84381,0.836724,0.835742
2,0.3894,0.341009,0.8897,0.891641,0.890034,0.88914


[I 2025-04-02 01:17:34,553] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.00022338791112731283, 'weight_decay': 0.006, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4563,0.243034,0.9166,0.919652,0.916659,0.916623
2,0.1566,0.191471,0.9386,0.939426,0.939028,0.938286
3,0.0721,0.212654,0.9354,0.935769,0.935848,0.935239
4,0.0387,0.243186,0.9361,0.937124,0.936367,0.936067


[I 2025-04-02 01:25:32,855] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.002961935479501581, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7642,0.545277,0.8123,0.827151,0.812526,0.81119
2,0.4592,0.419529,0.8567,0.862826,0.856901,0.856486
3,0.3297,0.347155,0.8799,0.881142,0.880589,0.878827
4,0.243,0.302252,0.9006,0.901041,0.900665,0.900001


[I 2025-04-02 01:33:31,401] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0007288044441792408, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4612,0.318291,0.895,0.897735,0.895141,0.894965
2,0.2419,0.250544,0.9188,0.920429,0.919288,0.918262


[I 2025-04-02 01:37:29,979] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.00012506179601739963, 'weight_decay': 0.006, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4982,0.252683,0.9119,0.915816,0.912031,0.911725
2,0.1709,0.188821,0.9351,0.935883,0.935412,0.935051


[I 2025-04-02 07:09:22,182] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.00041039947079278945, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4225,0.277413,0.9076,0.911268,0.907645,0.907573
2,0.1827,0.213697,0.9286,0.929813,0.928891,0.92829
3,0.1012,0.214136,0.933,0.933385,0.933314,0.932642
4,0.06,0.226174,0.9365,0.9371,0.936661,0.936493
5,0.0363,0.226138,0.9443,0.944496,0.944577,0.944175
6,0.0225,0.258011,0.9437,0.944562,0.943867,0.943931
7,0.0114,0.267923,0.9437,0.944644,0.944023,0.943614
8,0.0048,0.238382,0.9527,0.952877,0.952844,0.952766
9,0.002,0.245463,0.9517,0.952589,0.951688,0.951934
10,0.0008,0.245073,0.9496,0.949946,0.949831,0.949422


[I 2025-04-02 07:29:25,380] Trial 44 finished with value: 0.9494217412429234 and parameters: {'learning_rate': 0.00041039947079278945, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 8 with value: 0.9521566790618318.


Trial 45 with params: {'learning_rate': 0.000505806103535046, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4348,0.275241,0.9045,0.90782,0.904594,0.904308
2,0.1948,0.211046,0.9313,0.932147,0.931474,0.931175
3,0.1158,0.23589,0.929,0.929013,0.929501,0.928336
4,0.0742,0.2302,0.9358,0.936626,0.936011,0.935674


[I 2025-04-02 07:37:27,082] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0004633522311354366, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4452,0.276691,0.9055,0.908995,0.905766,0.905528
2,0.189,0.208018,0.9317,0.932329,0.931991,0.931385
3,0.1097,0.213408,0.9312,0.931597,0.931601,0.930916
4,0.0678,0.236832,0.9367,0.937203,0.936696,0.93642
5,0.0411,0.237798,0.9393,0.939968,0.939367,0.939253
6,0.0251,0.229367,0.9445,0.944957,0.944636,0.944614
7,0.0133,0.252009,0.9449,0.94564,0.945146,0.944957
8,0.0058,0.209962,0.953,0.953163,0.953154,0.953057
9,0.0017,0.23671,0.9527,0.953756,0.952793,0.952989
10,0.0007,0.229799,0.9512,0.951564,0.951495,0.951107


[I 2025-04-02 07:57:26,632] Trial 46 finished with value: 0.9511073105974992 and parameters: {'learning_rate': 0.0004633522311354366, 'weight_decay': 0.001, 'warmup_steps': 13}. Best is trial 8 with value: 0.9521566790618318.


Trial 47 with params: {'learning_rate': 0.0005251428386846794, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4562,0.295552,0.9026,0.908226,0.902813,0.902594
2,0.2059,0.227478,0.9258,0.927328,0.926107,0.925393


[I 2025-04-02 08:01:25,725] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0005876706272090374, 'weight_decay': 0.003, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4616,0.31186,0.8943,0.899844,0.894391,0.89425
2,0.2154,0.243152,0.9217,0.923673,0.922015,0.921072


[I 2025-04-02 08:05:25,411] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0005240492841791668, 'weight_decay': 0.0, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4601,0.294125,0.8991,0.902543,0.899258,0.899096
2,0.2052,0.23688,0.9252,0.926548,0.925564,0.92471
3,0.1232,0.215748,0.9297,0.92937,0.93007,0.929429
4,0.0759,0.231211,0.9346,0.935469,0.934538,0.934599


[I 2025-04-02 08:13:24,278] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7525,0.513382,0.8241,0.830767,0.824258,0.823755
2,0.4475,0.355347,0.8787,0.882587,0.87857,0.878358
3,0.3218,0.346968,0.8845,0.886814,0.885203,0.883763
4,0.2318,0.29362,0.9043,0.904922,0.904332,0.904271


[I 2025-04-02 08:21:21,379] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.00035902365609060815, 'weight_decay': 0.006, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4257,0.26931,0.9107,0.91533,0.910812,0.910946
2,0.1722,0.207761,0.9334,0.935085,0.933855,0.933056
3,0.0939,0.210878,0.9349,0.934839,0.935315,0.93467
4,0.0553,0.22591,0.9406,0.941474,0.94066,0.940409
5,0.0326,0.209824,0.9452,0.945232,0.94536,0.94511
6,0.0191,0.274986,0.9394,0.940647,0.939575,0.939608
7,0.0116,0.267393,0.9427,0.94363,0.942977,0.942702
8,0.004,0.246826,0.9496,0.949904,0.949818,0.949632
9,0.0014,0.256916,0.9498,0.951537,0.949816,0.950258
10,0.0007,0.252707,0.9489,0.949472,0.949153,0.948807


[I 2025-04-02 08:41:14,356] Trial 51 finished with value: 0.948806827216574 and parameters: {'learning_rate': 0.00035902365609060815, 'weight_decay': 0.006, 'warmup_steps': 5}. Best is trial 8 with value: 0.9521566790618318.


Trial 52 with params: {'learning_rate': 0.004803130612126116, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8818,0.605223,0.7976,0.804939,0.797663,0.796538
2,0.5253,0.463361,0.8437,0.849495,0.84401,0.84378
3,0.3926,0.387699,0.8719,0.871826,0.872407,0.870413
4,0.3023,0.336368,0.8871,0.887349,0.887226,0.88596


[I 2025-04-02 08:49:11,227] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.0004756716108624255, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4384,0.316162,0.8936,0.900613,0.893691,0.893776
2,0.1943,0.211697,0.9305,0.931066,0.930803,0.930165


[I 2025-04-02 08:53:10,055] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0003260637812444757, 'weight_decay': 0.003, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4325,0.259569,0.9128,0.916688,0.912923,0.912946
2,0.1641,0.190041,0.937,0.937631,0.937205,0.936912
3,0.0864,0.221394,0.9323,0.933266,0.932737,0.931879
4,0.0511,0.225855,0.9378,0.938852,0.938006,0.937821
5,0.0292,0.224798,0.9451,0.945182,0.945254,0.945051
6,0.018,0.249922,0.9443,0.945116,0.944573,0.944385
7,0.0098,0.25364,0.948,0.948586,0.948348,0.948038
8,0.0042,0.239034,0.9494,0.94994,0.94968,0.949406
9,0.0014,0.248985,0.9508,0.952164,0.950868,0.951237
10,0.0007,0.232947,0.949,0.949228,0.94933,0.948924


[I 2025-04-02 09:13:06,766] Trial 54 finished with value: 0.9489238785986833 and parameters: {'learning_rate': 0.0003260637812444757, 'weight_decay': 0.003, 'warmup_steps': 10}. Best is trial 8 with value: 0.9521566790618318.


Trial 55 with params: {'learning_rate': 0.0002118035053063769, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4359,0.246776,0.9136,0.917753,0.913823,0.913595
2,0.1557,0.19444,0.9368,0.937709,0.937211,0.936601
3,0.0708,0.219716,0.9376,0.93763,0.938017,0.937149
4,0.0374,0.238469,0.9358,0.936556,0.936079,0.935802


[I 2025-04-02 09:21:01,956] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.004913837305728667, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9866,0.677755,0.769,0.787074,0.768877,0.769546
2,0.6023,0.519267,0.8249,0.831917,0.82477,0.824716
3,0.459,0.445733,0.8497,0.850483,0.849995,0.848184
4,0.3597,0.378192,0.8767,0.877494,0.876798,0.876284


[I 2025-04-02 09:28:56,484] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.001055942623842628, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5312,0.397367,0.8674,0.874856,0.867671,0.867553
2,0.2929,0.282527,0.9072,0.909607,0.907333,0.906592


[I 2025-04-02 09:33:06,501] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00021771047684957567, 'weight_decay': 0.01, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4691,0.239897,0.9145,0.917677,0.91462,0.914352
2,0.1565,0.190422,0.9377,0.939113,0.938023,0.937619
3,0.0716,0.21307,0.9368,0.93709,0.93727,0.936576
4,0.0383,0.263822,0.9333,0.935238,0.93342,0.933479


[I 2025-04-02 09:41:04,750] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00020125892142886225, 'weight_decay': 0.001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4717,0.240194,0.9152,0.918782,0.915315,0.91519
2,0.1568,0.186881,0.9384,0.939157,0.938681,0.938341
3,0.07,0.211892,0.9351,0.935094,0.935463,0.934857
4,0.0362,0.25026,0.9371,0.937637,0.937278,0.936925
5,0.0188,0.241305,0.9421,0.942373,0.94215,0.942069
6,0.0122,0.280925,0.939,0.940043,0.939339,0.939019
7,0.006,0.291921,0.9394,0.940336,0.939692,0.939422
8,0.0026,0.257418,0.9435,0.943907,0.943723,0.943624


[I 2025-04-02 09:56:58,485] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0011700191952905836, 'weight_decay': 0.003, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5656,0.367279,0.8749,0.881345,0.875199,0.87502
2,0.3087,0.297849,0.9013,0.904194,0.901482,0.900865


[I 2025-04-02 10:00:57,332] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0003448810117040242, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4293,0.275853,0.9039,0.909786,0.904128,0.903937
2,0.1677,0.196095,0.936,0.937226,0.93624,0.935868
3,0.0886,0.217575,0.9347,0.934788,0.935035,0.934465
4,0.0532,0.236199,0.9372,0.93818,0.937222,0.937108
5,0.0315,0.22739,0.9451,0.945445,0.945102,0.945031
6,0.0195,0.242938,0.9445,0.945362,0.944692,0.944606
7,0.0107,0.25169,0.9458,0.946698,0.945995,0.94584
8,0.0037,0.241493,0.9507,0.9516,0.950658,0.950906
9,0.0016,0.270421,0.948,0.950004,0.947984,0.948508
10,0.0006,0.230747,0.9506,0.950795,0.95089,0.950564


[I 2025-04-02 10:20:52,270] Trial 61 finished with value: 0.9505640304789615 and parameters: {'learning_rate': 0.0003448810117040242, 'weight_decay': 0.003, 'warmup_steps': 7}. Best is trial 8 with value: 0.9521566790618318.


Trial 62 with params: {'learning_rate': 0.0011105964042853027, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.534,0.373523,0.877,0.884381,0.877043,0.876861
2,0.294,0.292457,0.9059,0.908104,0.90624,0.905498


[I 2025-04-02 10:24:55,955] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.00045375692612046855, 'weight_decay': 0.004, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4318,0.283093,0.9047,0.908875,0.904821,0.904351
2,0.1885,0.21577,0.9283,0.928865,0.928713,0.927832
3,0.1066,0.224973,0.9295,0.930147,0.929747,0.929389
4,0.0675,0.217545,0.9392,0.939453,0.939386,0.939064
5,0.0391,0.225563,0.9417,0.941916,0.941857,0.941552
6,0.0242,0.251059,0.9427,0.944788,0.94291,0.942893
7,0.0132,0.253253,0.9463,0.946876,0.946502,0.946266
8,0.0056,0.237393,0.9512,0.951496,0.951337,0.951204
9,0.0019,0.246278,0.9517,0.953448,0.951753,0.952229
10,0.0007,0.251356,0.949,0.949618,0.949279,0.948792


[I 2025-04-02 10:44:55,948] Trial 63 finished with value: 0.9487922013807746 and parameters: {'learning_rate': 0.00045375692612046855, 'weight_decay': 0.004, 'warmup_steps': 6}. Best is trial 8 with value: 0.9521566790618318.


Trial 64 with params: {'learning_rate': 0.0005366808566698341, 'weight_decay': 0.001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4502,0.304069,0.8973,0.901244,0.897092,0.89701
2,0.2055,0.244418,0.9214,0.923199,0.921655,0.920879


[I 2025-04-02 10:48:53,280] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0001149609886084542, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5203,0.246703,0.9131,0.916003,0.91325,0.912896
2,0.1769,0.189915,0.9363,0.937023,0.936655,0.936361
3,0.0765,0.230839,0.9321,0.932434,0.932602,0.931718
4,0.0334,0.254491,0.936,0.937418,0.936215,0.936041


[I 2025-04-02 10:56:49,868] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0003702687807283022, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4202,0.274084,0.9063,0.911236,0.906363,0.906195
2,0.1758,0.210233,0.9355,0.936794,0.935728,0.935348
3,0.0944,0.223561,0.933,0.933253,0.933459,0.932524
4,0.0575,0.243934,0.9377,0.938457,0.937844,0.937594
5,0.0342,0.227017,0.9439,0.944019,0.944109,0.943929
6,0.02,0.258485,0.9402,0.941106,0.94048,0.940316
7,0.0114,0.267453,0.9422,0.942692,0.942541,0.942122
8,0.0044,0.243956,0.9489,0.949427,0.949069,0.949067
9,0.0015,0.247317,0.9503,0.951272,0.950447,0.950731
10,0.0006,0.238732,0.9515,0.951724,0.951806,0.951457


[I 2025-04-02 11:16:40,820] Trial 66 finished with value: 0.9514567813019299 and parameters: {'learning_rate': 0.0003702687807283022, 'weight_decay': 0.007, 'warmup_steps': 1}. Best is trial 8 with value: 0.9521566790618318.


Trial 67 with params: {'learning_rate': 0.0002202686274287985, 'weight_decay': 0.001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4464,0.251149,0.9143,0.917737,0.914444,0.914238
2,0.1569,0.196232,0.9389,0.939089,0.939317,0.938629
3,0.0724,0.210522,0.9374,0.937335,0.937818,0.937112
4,0.0372,0.253586,0.9361,0.93827,0.936285,0.936389
5,0.0199,0.241849,0.9425,0.942772,0.942606,0.942487
6,0.0125,0.271748,0.9403,0.941512,0.940522,0.94045
7,0.0067,0.282776,0.9412,0.941796,0.941539,0.941129
8,0.0037,0.2556,0.9471,0.947849,0.947213,0.947393


[I 2025-04-02 11:32:32,970] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0002865173410993904, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4147,0.248098,0.9149,0.917917,0.915206,0.914826
2,0.159,0.194135,0.9372,0.937721,0.93756,0.937058
3,0.0796,0.214662,0.9352,0.935327,0.935516,0.934968
4,0.0466,0.240823,0.9366,0.937698,0.936743,0.93639
5,0.0261,0.248503,0.9404,0.941446,0.94042,0.940523
6,0.0168,0.245287,0.9449,0.945279,0.945113,0.944979
7,0.0091,0.260549,0.9442,0.94471,0.9445,0.944145
8,0.0034,0.227467,0.9507,0.951001,0.950849,0.950724
9,0.0011,0.246873,0.9514,0.95278,0.951403,0.95181
10,0.0006,0.227556,0.9514,0.951772,0.951628,0.951295


[I 2025-04-02 11:52:56,392] Trial 68 finished with value: 0.9512953166672959 and parameters: {'learning_rate': 0.0002865173410993904, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 8 with value: 0.9521566790618318.


Trial 69 with params: {'learning_rate': 0.0006981555057377885, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4625,0.297951,0.9016,0.904726,0.901815,0.901472
2,0.2333,0.263807,0.9126,0.915241,0.912825,0.912159


[I 2025-04-02 11:57:03,812] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0002572687835023352, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.424,0.244977,0.9147,0.918648,0.914924,0.914663
2,0.1572,0.196067,0.9352,0.935824,0.935627,0.93492
3,0.0758,0.219668,0.9351,0.935127,0.935488,0.934817
4,0.0413,0.23722,0.9402,0.941423,0.940232,0.940223
5,0.027,0.227606,0.9457,0.946119,0.945743,0.945749
6,0.0141,0.274661,0.9388,0.940076,0.938961,0.938897
7,0.0072,0.283793,0.9424,0.943296,0.942738,0.942302
8,0.0027,0.24185,0.9492,0.949473,0.949399,0.949246
9,0.001,0.2502,0.9501,0.951804,0.950122,0.950588
10,0.0006,0.241412,0.9507,0.950907,0.951018,0.950592


[I 2025-04-02 12:16:51,990] Trial 70 finished with value: 0.9505922235912692 and parameters: {'learning_rate': 0.0002572687835023352, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 8 with value: 0.9521566790618318.


Trial 71 with params: {'learning_rate': 0.0003255753216604302, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4165,0.27215,0.9079,0.913905,0.908246,0.907995
2,0.1649,0.201974,0.9327,0.933428,0.932996,0.932518
3,0.087,0.23628,0.9303,0.930831,0.930752,0.929906
4,0.051,0.269178,0.9341,0.935113,0.934368,0.933995


[I 2025-04-02 12:24:41,989] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 8.487013955212462e-05, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5596,0.262936,0.9066,0.909711,0.906779,0.906417
2,0.2032,0.199957,0.9324,0.932872,0.9327,0.932409
3,0.098,0.221832,0.9307,0.930999,0.931176,0.930425
4,0.0438,0.250092,0.9306,0.932604,0.930759,0.930695


[I 2025-04-02 12:32:34,795] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0001671895575548827, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.454,0.242071,0.9156,0.918999,0.915775,0.915751
2,0.1565,0.184538,0.9392,0.939957,0.939508,0.939187
3,0.0691,0.212904,0.9363,0.936235,0.936755,0.935938
4,0.0329,0.253469,0.9344,0.935751,0.934541,0.934454


[I 2025-04-02 12:40:32,735] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.00037845129038763524, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4205,0.2616,0.9107,0.91484,0.910843,0.910932
2,0.1741,0.227683,0.9293,0.930298,0.929772,0.929125


[I 2025-04-02 12:44:33,536] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00044496513129480827, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4239,0.256234,0.9139,0.916211,0.913862,0.91371
2,0.1863,0.220095,0.9291,0.929867,0.929442,0.928669
3,0.1086,0.207363,0.9331,0.933718,0.933242,0.93302
4,0.0654,0.226414,0.9357,0.936605,0.935828,0.935656


[I 2025-04-02 12:52:32,456] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0002885850487576123, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4183,0.251875,0.9155,0.917836,0.915759,0.915389
2,0.1576,0.201,0.9371,0.937306,0.937414,0.93686
3,0.0818,0.226346,0.9326,0.932711,0.933006,0.932202
4,0.0456,0.253964,0.9328,0.934129,0.932991,0.932889


[I 2025-04-02 13:00:30,069] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.001007761125954244, 'weight_decay': 0.01, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5421,0.360886,0.8793,0.884489,0.879309,0.878985
2,0.2869,0.310431,0.9023,0.905916,0.902342,0.901785
3,0.1886,0.248842,0.9176,0.91838,0.918117,0.917248
4,0.1271,0.246206,0.9263,0.926659,0.926546,0.926188


[I 2025-04-02 13:08:28,154] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0004054701644850641, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.428,0.251863,0.9148,0.91798,0.914838,0.914942
2,0.1798,0.214373,0.9318,0.932844,0.932156,0.931495
3,0.0992,0.216191,0.9343,0.934272,0.93461,0.934137
4,0.0596,0.254142,0.9338,0.934511,0.934014,0.933431


[I 2025-04-02 13:16:24,256] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.00033553947456535017, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4149,0.255233,0.9139,0.918025,0.913839,0.913875
2,0.1665,0.201596,0.9385,0.939226,0.9389,0.938289
3,0.0896,0.207881,0.9379,0.937836,0.938237,0.937724
4,0.0541,0.22359,0.9407,0.941645,0.940684,0.940669
5,0.03,0.220794,0.9454,0.945397,0.945542,0.945367
6,0.0158,0.254703,0.9433,0.944154,0.943459,0.943443
7,0.0101,0.25582,0.9465,0.946962,0.946751,0.946401
8,0.0035,0.224668,0.9524,0.952723,0.952527,0.952534
9,0.0014,0.257279,0.9511,0.952678,0.95118,0.951514
10,0.0006,0.250435,0.9496,0.950035,0.949923,0.949416


[I 2025-04-02 13:36:09,996] Trial 79 finished with value: 0.949415797168942 and parameters: {'learning_rate': 0.00033553947456535017, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}. Best is trial 8 with value: 0.9521566790618318.


Trial 80 with params: {'learning_rate': 0.0002768079009825162, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4325,0.262239,0.9092,0.91406,0.909419,0.909223
2,0.1571,0.194515,0.937,0.937588,0.937313,0.936832
3,0.0795,0.216502,0.937,0.937079,0.937413,0.936717
4,0.0457,0.22491,0.9386,0.939596,0.938771,0.938646
5,0.0259,0.237479,0.9416,0.942126,0.941858,0.941464
6,0.0134,0.263957,0.9408,0.941895,0.940999,0.941027
7,0.0077,0.260445,0.9456,0.946369,0.945857,0.945625
8,0.0034,0.242549,0.9502,0.950877,0.95039,0.950331
9,0.0013,0.252875,0.9493,0.950468,0.949415,0.949721
10,0.0007,0.236352,0.9503,0.950576,0.95061,0.950177


[I 2025-04-02 13:56:06,230] Trial 80 finished with value: 0.9501766116042074 and parameters: {'learning_rate': 0.0002768079009825162, 'weight_decay': 0.007, 'warmup_steps': 7}. Best is trial 8 with value: 0.9521566790618318.


Trial 81 with params: {'learning_rate': 0.0009092970178043692, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5083,0.340261,0.8864,0.889502,0.886404,0.886167
2,0.2682,0.277732,0.91,0.912048,0.91018,0.909279


[I 2025-04-02 14:00:07,181] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.000338835790848692, 'weight_decay': 0.004, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4277,0.286796,0.901,0.907634,0.901224,0.900574
2,0.1684,0.200376,0.935,0.935416,0.93549,0.934564
3,0.0889,0.221177,0.932,0.932315,0.932218,0.931655
4,0.0551,0.255894,0.9351,0.936578,0.935186,0.934977


[I 2025-04-02 14:08:06,074] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00021231174015027594, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4347,0.254857,0.9115,0.916664,0.911569,0.911556
2,0.1538,0.18167,0.9394,0.9406,0.939739,0.939343
3,0.07,0.219509,0.9364,0.936442,0.936853,0.935952
4,0.0368,0.244027,0.9372,0.938188,0.937325,0.937063
5,0.0206,0.234587,0.9452,0.945151,0.945433,0.945052
6,0.012,0.251888,0.9425,0.944129,0.942742,0.942948
7,0.0063,0.261238,0.9422,0.942953,0.94256,0.942168
8,0.003,0.24039,0.948,0.948482,0.948134,0.948124


[I 2025-04-02 14:24:01,164] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00034222837165809826, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4383,0.278039,0.904,0.909204,0.903954,0.903919
2,0.1718,0.214151,0.9315,0.932424,0.931866,0.931218
3,0.0882,0.218615,0.9337,0.934227,0.933896,0.933516
4,0.0528,0.242299,0.9365,0.937661,0.936535,0.936356
5,0.0313,0.240234,0.9415,0.941847,0.941572,0.941461
6,0.0187,0.245001,0.9448,0.944928,0.945038,0.944774
7,0.0096,0.262576,0.944,0.944871,0.944244,0.944053
8,0.0042,0.233487,0.9511,0.951472,0.951234,0.951174
9,0.0019,0.257968,0.9494,0.950926,0.949454,0.949766
10,0.0007,0.231309,0.9514,0.951487,0.951695,0.951328


[I 2025-04-02 14:43:56,413] Trial 84 finished with value: 0.9513284826570377 and parameters: {'learning_rate': 0.00034222837165809826, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}. Best is trial 8 with value: 0.9521566790618318.


Trial 85 with params: {'learning_rate': 0.0001193073828361285, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5172,0.251457,0.9107,0.913671,0.910782,0.910626
2,0.1731,0.19549,0.9363,0.936946,0.936606,0.936243
3,0.0756,0.241275,0.9327,0.933063,0.933252,0.932203
4,0.0321,0.254203,0.9326,0.934183,0.932801,0.932641


[I 2025-04-02 14:51:55,876] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.00022614776456424213, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4397,0.242809,0.9151,0.919095,0.915278,0.915367
2,0.1531,0.185428,0.9408,0.941456,0.941061,0.940797
3,0.072,0.208155,0.9376,0.937539,0.938012,0.937371
4,0.039,0.247631,0.9352,0.93702,0.935187,0.935241


[I 2025-04-02 14:59:51,914] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0004784243444433218, 'weight_decay': 0.008, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4461,0.276045,0.9079,0.912018,0.908059,0.907611
2,0.1934,0.231371,0.9217,0.923292,0.922018,0.921426


[I 2025-04-02 15:03:50,593] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0004155320206691301, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4443,0.264454,0.9136,0.917647,0.913642,0.913412
2,0.1824,0.208334,0.9301,0.931412,0.930403,0.929911


[I 2025-04-02 15:07:51,180] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0004528573190155414, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4336,0.294097,0.9035,0.908675,0.90365,0.903689
2,0.1918,0.19862,0.9336,0.934791,0.933865,0.933417
3,0.1073,0.228055,0.9309,0.931026,0.93125,0.930517
4,0.0671,0.239934,0.9347,0.935819,0.934721,0.934518


[I 2025-04-02 15:15:49,876] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 7.959945406253996e-05, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5723,0.265499,0.9054,0.907996,0.905591,0.905211
2,0.2098,0.203363,0.9296,0.930301,0.929893,0.92967
3,0.1048,0.224196,0.9292,0.929687,0.929666,0.928879
4,0.0484,0.246401,0.9321,0.933441,0.932325,0.932108


[I 2025-04-02 15:23:45,553] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00031540315205742727, 'weight_decay': 0.01, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4405,0.260113,0.9144,0.918272,0.914684,0.914298
2,0.165,0.201137,0.9332,0.93361,0.933481,0.933124
3,0.0839,0.207331,0.937,0.936871,0.937423,0.936747
4,0.0495,0.236019,0.938,0.938928,0.93801,0.938033
5,0.0289,0.225015,0.9428,0.942947,0.942921,0.942808
6,0.0172,0.242011,0.944,0.944727,0.944341,0.944055
7,0.0091,0.265632,0.9446,0.945582,0.944918,0.94464
8,0.004,0.239182,0.9487,0.94961,0.948778,0.948866


[I 2025-04-02 15:39:39,309] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0007645083024211361, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4754,0.348463,0.8841,0.891304,0.884218,0.884422
2,0.2467,0.266443,0.9125,0.915291,0.912832,0.912009
3,0.1563,0.244489,0.9226,0.922694,0.923089,0.922083
4,0.1008,0.227107,0.9312,0.932184,0.931184,0.931304


[I 2025-04-02 15:47:34,122] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.000249191551768324, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4447,0.256934,0.9141,0.918083,0.914279,0.914202
2,0.1564,0.200165,0.9374,0.938921,0.937707,0.937228
3,0.0756,0.210381,0.9366,0.936904,0.936927,0.936432
4,0.0426,0.243247,0.9387,0.93944,0.938684,0.938508
5,0.0238,0.219844,0.9473,0.947427,0.947375,0.947288
6,0.0131,0.26964,0.9428,0.943854,0.943036,0.943098
7,0.008,0.253344,0.9482,0.948773,0.948355,0.94828
8,0.0027,0.262967,0.9464,0.946974,0.946501,0.946461


[I 2025-04-02 16:03:28,539] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.00018516336845037087, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4434,0.241913,0.9181,0.922008,0.918345,0.917963
2,0.154,0.186559,0.9399,0.940713,0.940184,0.939788
3,0.0663,0.210663,0.9403,0.940393,0.940748,0.939989
4,0.0345,0.238283,0.9368,0.938027,0.93693,0.936755
5,0.0184,0.237693,0.9443,0.944522,0.944443,0.94435
6,0.0094,0.272656,0.9412,0.942124,0.941428,0.941355
7,0.0055,0.281536,0.9396,0.940195,0.939889,0.939632
8,0.0026,0.270998,0.9431,0.943425,0.943305,0.943158


[I 2025-04-02 16:19:17,788] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0003494640865090891, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4373,0.283235,0.9035,0.909907,0.903737,0.903567
2,0.1712,0.197555,0.9357,0.935986,0.935977,0.93544
3,0.0902,0.213319,0.9365,0.936315,0.936884,0.936281
4,0.0542,0.230519,0.9363,0.937312,0.936375,0.936414


[I 2025-04-02 16:27:12,256] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.004437495491946959, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8752,0.592728,0.7954,0.808884,0.795603,0.794721
2,0.5212,0.450045,0.8466,0.852675,0.846568,0.8466
3,0.3888,0.394555,0.8688,0.870294,0.869335,0.867267
4,0.2938,0.328174,0.8896,0.889173,0.889784,0.888808


[I 2025-04-02 16:35:08,962] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00041460912369498836, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4325,0.283143,0.9013,0.906122,0.901599,0.901278
2,0.1812,0.220181,0.933,0.93414,0.933436,0.932756
3,0.1033,0.212315,0.9348,0.935021,0.93501,0.934566
4,0.0618,0.226779,0.9375,0.938327,0.937499,0.937408
5,0.0357,0.234515,0.9422,0.942708,0.942255,0.942309
6,0.0231,0.259959,0.9386,0.939388,0.938786,0.938597
7,0.0127,0.240919,0.9458,0.946325,0.946016,0.945734
8,0.0055,0.225433,0.9515,0.951496,0.951685,0.951464
9,0.0015,0.241632,0.9508,0.9521,0.950834,0.951173
10,0.0009,0.235915,0.9512,0.951509,0.951466,0.951106


[I 2025-04-02 16:54:59,933] Trial 97 finished with value: 0.9511060819049098 and parameters: {'learning_rate': 0.00041460912369498836, 'weight_decay': 0.007, 'warmup_steps': 8}. Best is trial 8 with value: 0.9521566790618318.


Trial 98 with params: {'learning_rate': 0.0035054904723296637, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8342,0.524659,0.8214,0.831203,0.821556,0.821327
2,0.5,0.425653,0.8567,0.860815,0.856643,0.856321


[I 2025-04-02 16:58:57,203] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00027235923085987393, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4306,0.250109,0.913,0.916007,0.913241,0.913046
2,0.1561,0.201349,0.9353,0.936003,0.935688,0.935036
3,0.0767,0.204134,0.9375,0.937434,0.937842,0.937097
4,0.0448,0.248528,0.9352,0.936197,0.935432,0.934959


[I 2025-04-02 17:06:51,676] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0006507853611092277, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4584,0.305118,0.8984,0.90318,0.898471,0.898646
2,0.2268,0.259426,0.9149,0.917592,0.915267,0.91454
3,0.141,0.227838,0.9287,0.928235,0.929153,0.92811
4,0.0899,0.240398,0.9333,0.934129,0.933228,0.933149


[I 2025-04-02 17:14:47,954] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0005460685262143751, 'weight_decay': 0.006, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4487,0.271974,0.9089,0.912464,0.909173,0.908992
2,0.205,0.239458,0.9266,0.928287,0.926859,0.926274


[I 2025-04-02 17:18:47,131] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0004951787902530472, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.442,0.313499,0.8946,0.90139,0.894798,0.894422
2,0.1974,0.238053,0.9268,0.928374,0.927136,0.926304
3,0.1146,0.226373,0.9287,0.928454,0.929055,0.928428
4,0.0741,0.231084,0.9345,0.934918,0.934522,0.934329


[I 2025-04-02 17:26:43,107] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0004282613152726735, 'weight_decay': 0.007, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4375,0.262614,0.9125,0.915167,0.912648,0.91241
2,0.1828,0.205413,0.935,0.935343,0.93535,0.934784
3,0.1052,0.20607,0.9346,0.934974,0.934819,0.934585
4,0.0635,0.239399,0.9378,0.939196,0.937681,0.937757
5,0.0376,0.262699,0.9371,0.938221,0.937062,0.937175
6,0.0218,0.282234,0.9362,0.937308,0.936385,0.936371
7,0.0126,0.267586,0.9437,0.944657,0.944035,0.94371
8,0.0055,0.247499,0.9489,0.949692,0.949029,0.949126
9,0.0018,0.262593,0.9468,0.948432,0.946871,0.947276
10,0.0008,0.256869,0.9491,0.949492,0.949421,0.948945


[I 2025-04-02 17:46:42,036] Trial 103 finished with value: 0.9489453254802018 and parameters: {'learning_rate': 0.0004282613152726735, 'weight_decay': 0.007, 'warmup_steps': 11}. Best is trial 8 with value: 0.9521566790618318.


Trial 104 with params: {'learning_rate': 0.00025219070769811177, 'weight_decay': 0.004, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4414,0.251422,0.915,0.918855,0.915225,0.9149
2,0.1554,0.199605,0.9343,0.935241,0.934685,0.934124
3,0.0764,0.210886,0.9351,0.935346,0.935419,0.934964
4,0.0407,0.22838,0.9408,0.941326,0.941037,0.940887
5,0.0248,0.236525,0.9416,0.94172,0.941829,0.941552
6,0.0147,0.275301,0.9406,0.941734,0.940967,0.940606
7,0.0075,0.269681,0.9442,0.944931,0.94449,0.944145
8,0.0025,0.245764,0.9497,0.950359,0.949886,0.949806
9,0.0011,0.248253,0.9504,0.951782,0.950419,0.950847
10,0.0006,0.250529,0.9492,0.949638,0.949472,0.9491


[I 2025-04-02 18:06:43,389] Trial 104 finished with value: 0.9490999553817903 and parameters: {'learning_rate': 0.00025219070769811177, 'weight_decay': 0.004, 'warmup_steps': 11}. Best is trial 8 with value: 0.9521566790618318.


Trial 105 with params: {'learning_rate': 0.0004516365644745528, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4287,0.282898,0.9059,0.91004,0.906029,0.905592
2,0.1897,0.212998,0.9295,0.930764,0.929722,0.929383


[I 2025-04-02 18:10:44,343] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0005794247063685915, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4444,0.308591,0.8969,0.902517,0.897096,0.897071
2,0.2133,0.266202,0.917,0.919505,0.917369,0.916436
3,0.1309,0.243695,0.9208,0.920793,0.921264,0.920407
4,0.0831,0.223131,0.9367,0.937285,0.936875,0.936646
5,0.05,0.239119,0.9372,0.938234,0.937507,0.937079
6,0.0306,0.243712,0.9389,0.940101,0.939192,0.939072
7,0.0182,0.238163,0.946,0.946226,0.946296,0.946028
8,0.0071,0.231337,0.9503,0.950822,0.950402,0.950471
9,0.0025,0.257163,0.9495,0.951539,0.949595,0.949992
10,0.0008,0.229536,0.951,0.951442,0.951268,0.950964


[I 2025-04-02 18:30:44,914] Trial 106 finished with value: 0.950963941667039 and parameters: {'learning_rate': 0.0005794247063685915, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 8 with value: 0.9521566790618318.


Trial 107 with params: {'learning_rate': 0.0007012587471462421, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4645,0.299042,0.8971,0.900273,0.89737,0.896906
2,0.2354,0.23604,0.9244,0.925288,0.924821,0.923968


[I 2025-04-02 18:34:46,437] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0009865566020736737, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5155,0.390734,0.8701,0.878996,0.870101,0.870236
2,0.2809,0.267269,0.9127,0.914256,0.912839,0.912383


[I 2025-04-02 18:38:46,653] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0011178483841623244, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5285,0.398837,0.8648,0.870653,0.864878,0.864817
2,0.2968,0.28939,0.9035,0.906446,0.903637,0.903169
3,0.2021,0.273975,0.9085,0.908788,0.908926,0.907957
4,0.1397,0.240499,0.9251,0.925962,0.92506,0.924847


[I 2025-04-02 18:46:50,383] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.000517497192148971, 'weight_decay': 0.006, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4384,0.300199,0.8983,0.902718,0.898602,0.898337
2,0.1972,0.230761,0.9285,0.929334,0.928799,0.928081
3,0.1221,0.224369,0.9318,0.931574,0.932123,0.931454
4,0.0777,0.234861,0.9354,0.936345,0.935644,0.935385
5,0.0445,0.246146,0.9372,0.937246,0.937375,0.936879
6,0.0275,0.234334,0.9433,0.943896,0.943568,0.943478
7,0.0141,0.281594,0.9387,0.939995,0.938944,0.938815
8,0.0058,0.246659,0.951,0.951289,0.951171,0.951037
9,0.0022,0.249267,0.9501,0.950911,0.950266,0.950369
10,0.0008,0.259206,0.9472,0.947615,0.947514,0.946972


[I 2025-04-02 19:06:46,255] Trial 110 finished with value: 0.9469716124613534 and parameters: {'learning_rate': 0.000517497192148971, 'weight_decay': 0.006, 'warmup_steps': 4}. Best is trial 8 with value: 0.9521566790618318.


Trial 111 with params: {'learning_rate': 0.0002055999503680448, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4367,0.238857,0.9171,0.920574,0.917075,0.917155
2,0.1531,0.179751,0.9402,0.941189,0.940422,0.940288
3,0.0687,0.217457,0.9366,0.936949,0.937005,0.936346
4,0.036,0.228246,0.9409,0.941897,0.941007,0.940948
5,0.0188,0.231164,0.9445,0.945201,0.944795,0.944552
6,0.0123,0.289911,0.9383,0.939376,0.938589,0.938347
7,0.0059,0.275105,0.941,0.941724,0.9413,0.941026
8,0.0028,0.241143,0.9475,0.948194,0.947616,0.947712


[I 2025-04-02 19:22:43,718] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00029629833117863005, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4208,0.251816,0.913,0.916632,0.913119,0.912913
2,0.162,0.199313,0.934,0.934726,0.934405,0.933837
3,0.0823,0.223543,0.9315,0.931661,0.931862,0.931187
4,0.047,0.220974,0.9384,0.939263,0.938414,0.93825
5,0.0272,0.248387,0.9428,0.943824,0.943052,0.942819
6,0.0172,0.274166,0.9396,0.940934,0.939852,0.939762
7,0.0093,0.266441,0.9446,0.945028,0.944856,0.944563
8,0.0039,0.243857,0.9466,0.946851,0.946811,0.946647


[I 2025-04-02 19:38:32,630] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00026117555242364024, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.424,0.24508,0.9147,0.918858,0.914783,0.915049
2,0.1539,0.199966,0.9357,0.937339,0.936066,0.935629
3,0.0753,0.220996,0.9342,0.934395,0.934644,0.93388
4,0.0418,0.223331,0.9431,0.94395,0.943289,0.94322
5,0.0245,0.235141,0.9436,0.943529,0.943688,0.943453
6,0.0146,0.256657,0.9414,0.9421,0.941664,0.941468
7,0.0072,0.255416,0.945,0.945926,0.945329,0.944963
8,0.0033,0.228632,0.9518,0.951956,0.951983,0.951862
9,0.0012,0.234057,0.9523,0.953379,0.952394,0.952658
10,0.0006,0.234847,0.9503,0.950594,0.950609,0.950139


[I 2025-04-02 19:58:21,492] Trial 113 finished with value: 0.9501392047806873 and parameters: {'learning_rate': 0.00026117555242364024, 'weight_decay': 0.008, 'warmup_steps': 1}. Best is trial 8 with value: 0.9521566790618318.


Trial 114 with params: {'learning_rate': 0.0010093346757777623, 'weight_decay': 0.001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.525,0.396438,0.8701,0.878889,0.870344,0.870289
2,0.2849,0.275111,0.9111,0.913191,0.911321,0.910748
3,0.1902,0.269868,0.9113,0.912287,0.911773,0.910913
4,0.1301,0.245969,0.9245,0.92457,0.924727,0.924207


[I 2025-04-02 20:06:14,437] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0003777529601113003, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.422,0.27452,0.9091,0.91318,0.909316,0.909283
2,0.1749,0.22168,0.9305,0.931434,0.930841,0.93


[I 2025-04-02 20:10:13,718] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00025082607371371005, 'weight_decay': 0.004, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4381,0.256192,0.9131,0.917217,0.913259,0.913102
2,0.1573,0.197809,0.9388,0.939461,0.939124,0.938714
3,0.076,0.222512,0.9334,0.933818,0.933804,0.933137
4,0.0418,0.213182,0.9423,0.94276,0.942479,0.942307
5,0.0258,0.223049,0.9431,0.943221,0.943295,0.943019
6,0.0138,0.250838,0.9408,0.94157,0.940969,0.941042
7,0.0076,0.270383,0.9406,0.941194,0.940885,0.940551
8,0.0036,0.232576,0.9496,0.949845,0.949793,0.94964
9,0.0013,0.251624,0.9471,0.949082,0.947173,0.947609
10,0.0006,0.24185,0.9498,0.949832,0.950112,0.94962


[I 2025-04-02 20:30:02,619] Trial 116 finished with value: 0.9496195407064093 and parameters: {'learning_rate': 0.00025082607371371005, 'weight_decay': 0.004, 'warmup_steps': 8}. Best is trial 8 with value: 0.9521566790618318.


Trial 117 with params: {'learning_rate': 0.0027121193476131807, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7365,0.500415,0.8294,0.835568,0.829457,0.829197
2,0.4364,0.379094,0.8723,0.875679,0.872646,0.872181


[I 2025-04-02 20:34:01,453] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0004541128841106071, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4394,0.282112,0.9022,0.906961,0.902166,0.902292
2,0.1917,0.227595,0.9273,0.928472,0.92771,0.926977


[I 2025-04-02 20:38:00,841] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0003926712461639229, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.419,0.265468,0.9101,0.914023,0.910375,0.910165
2,0.1762,0.200813,0.9338,0.935124,0.934121,0.933675
3,0.0985,0.208934,0.9348,0.934678,0.935138,0.934547
4,0.0595,0.23686,0.9342,0.935478,0.934356,0.934025


[I 2025-04-02 20:45:59,839] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00012654035347595767, 'weight_decay': 0.01, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5468,0.253049,0.9118,0.915558,0.911929,0.911826
2,0.1725,0.191529,0.9356,0.935896,0.935951,0.935448
3,0.0742,0.226737,0.9311,0.931525,0.931641,0.930724
4,0.0323,0.240458,0.9367,0.937708,0.936871,0.936657


[I 2025-04-02 20:54:04,205] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.00019679072474745938, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4499,0.244668,0.9177,0.921248,0.91798,0.917694
2,0.1563,0.191107,0.9371,0.937769,0.937436,0.936994
3,0.0694,0.218528,0.9351,0.935499,0.935658,0.93466
4,0.0344,0.250162,0.9375,0.939164,0.937609,0.93753
5,0.0187,0.228048,0.9444,0.944477,0.944532,0.944319
6,0.0112,0.264136,0.9426,0.943386,0.942867,0.942805
7,0.0063,0.284276,0.943,0.943929,0.943279,0.943061
8,0.0023,0.254574,0.948,0.948222,0.948167,0.948065


[I 2025-04-02 21:10:06,815] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00017010968527914131, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4691,0.243934,0.9189,0.922463,0.919165,0.918885
2,0.156,0.183368,0.9405,0.941113,0.940828,0.940534
3,0.0674,0.221486,0.9339,0.933889,0.934403,0.933541
4,0.0325,0.247796,0.9336,0.935187,0.933812,0.933516


[I 2025-04-02 21:18:07,864] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0004900947174017381, 'weight_decay': 0.007, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4369,0.277699,0.9048,0.909513,0.904782,0.90473
2,0.195,0.217014,0.9288,0.9295,0.929187,0.928458
3,0.1124,0.230794,0.93,0.930013,0.930237,0.929626
4,0.072,0.26321,0.9286,0.930314,0.928708,0.928415


[I 2025-04-02 21:26:04,842] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.000310912568300806, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4333,0.261109,0.9116,0.91543,0.911734,0.911641
2,0.163,0.196939,0.9379,0.938486,0.93811,0.937756
3,0.0869,0.20539,0.9383,0.938376,0.938653,0.937961
4,0.0486,0.234713,0.9374,0.938454,0.937674,0.937361
5,0.0302,0.227316,0.9445,0.944703,0.944688,0.944472
6,0.017,0.260425,0.9419,0.943606,0.942043,0.942099
7,0.0088,0.26909,0.9443,0.944892,0.944572,0.944256
8,0.0036,0.238909,0.9512,0.951522,0.951292,0.951268
9,0.0014,0.250237,0.9514,0.952838,0.951431,0.95181
10,0.0006,0.258339,0.948,0.948463,0.948276,0.947794


[I 2025-04-02 21:45:59,131] Trial 124 finished with value: 0.9477938988229905 and parameters: {'learning_rate': 0.000310912568300806, 'weight_decay': 0.007, 'warmup_steps': 9}. Best is trial 8 with value: 0.9521566790618318.


Trial 125 with params: {'learning_rate': 0.0003503851949083734, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4204,0.255508,0.9116,0.91572,0.911717,0.911678
2,0.169,0.199988,0.9358,0.936855,0.936104,0.935711
3,0.0924,0.236516,0.9305,0.931461,0.930973,0.930153
4,0.0534,0.217342,0.9409,0.941751,0.940973,0.940989
5,0.0313,0.241631,0.9406,0.940592,0.940811,0.94049
6,0.0186,0.240896,0.9471,0.947373,0.947358,0.947178
7,0.01,0.273838,0.9436,0.944722,0.943887,0.943685


[I 2025-04-02 22:05:52,949] Trial 125 finished with value: 0.9508576806890584 and parameters: {'learning_rate': 0.0003503851949083734, 'weight_decay': 0.005, 'warmup_steps': 1}. Best is trial 8 with value: 0.9521566790618318.


Trial 126 with params: {'learning_rate': 0.00040590693849007297, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4259,0.288668,0.9031,0.909422,0.903235,0.90325
2,0.1795,0.218998,0.9289,0.92945,0.929294,0.928471


[I 2025-04-02 22:09:53,520] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0002116965709457561, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4332,0.255318,0.9108,0.915602,0.910947,0.911044
2,0.1507,0.193006,0.9368,0.937642,0.937149,0.93671
3,0.0693,0.225937,0.9335,0.933694,0.934023,0.933219
4,0.0368,0.245568,0.9361,0.936688,0.936374,0.93593
5,0.0205,0.248054,0.9398,0.940065,0.939919,0.939774
6,0.0129,0.25604,0.9428,0.943045,0.943082,0.942881
7,0.0068,0.259918,0.9436,0.944364,0.943908,0.943614
8,0.0026,0.24037,0.9468,0.947469,0.946934,0.946905


[I 2025-04-02 22:25:43,914] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0003149649013243245, 'weight_decay': 0.004, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4264,0.280261,0.9022,0.908486,0.902404,0.902091
2,0.164,0.201926,0.9366,0.937243,0.936923,0.936373
3,0.0862,0.201916,0.9359,0.935851,0.936312,0.935495
4,0.0481,0.248466,0.9374,0.938737,0.937476,0.9375
5,0.0283,0.216552,0.9465,0.94681,0.946644,0.946539
6,0.017,0.250335,0.9445,0.945339,0.944695,0.94476
7,0.0089,0.255749,0.9449,0.945795,0.945079,0.945007
8,0.0036,0.230174,0.9508,0.950964,0.950936,0.95086
9,0.0012,0.252152,0.9509,0.952385,0.950969,0.951351
10,0.0006,0.244812,0.9461,0.946471,0.946423,0.945779


[I 2025-04-02 22:45:17,632] Trial 128 finished with value: 0.9457787224288625 and parameters: {'learning_rate': 0.0003149649013243245, 'weight_decay': 0.004, 'warmup_steps': 5}. Best is trial 8 with value: 0.9521566790618318.


Trial 129 with params: {'learning_rate': 0.0004423957840569796, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4261,0.275625,0.909,0.912259,0.909314,0.909176
2,0.1838,0.212606,0.9323,0.933072,0.93273,0.931991
3,0.1054,0.234276,0.9289,0.92939,0.929474,0.928268
4,0.066,0.227652,0.9382,0.938763,0.938317,0.938002
5,0.0376,0.247934,0.9379,0.938764,0.937978,0.938048
6,0.0223,0.277127,0.9367,0.938181,0.937001,0.936935
7,0.0138,0.249654,0.9466,0.946883,0.946796,0.946662
8,0.006,0.242678,0.9495,0.950359,0.949691,0.949616
9,0.0023,0.252862,0.9502,0.951657,0.950286,0.950722
10,0.0008,0.239603,0.9495,0.950208,0.949712,0.949414


[I 2025-04-02 23:05:00,671] Trial 129 finished with value: 0.9494135066081819 and parameters: {'learning_rate': 0.0004423957840569796, 'weight_decay': 0.006, 'warmup_steps': 2}. Best is trial 8 with value: 0.9521566790618318.


Trial 130 with params: {'learning_rate': 0.00048169327353437164, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4305,0.289787,0.9014,0.906338,0.901394,0.901512
2,0.1946,0.231826,0.9251,0.926217,0.925357,0.924754


[I 2025-04-02 23:08:58,925] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0003263459985413098, 'weight_decay': 0.005, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4192,0.26636,0.9065,0.911491,0.906692,0.906383
2,0.1657,0.206889,0.9368,0.937091,0.937235,0.936321
3,0.0886,0.213127,0.9329,0.933063,0.933213,0.932675
4,0.0516,0.238136,0.9359,0.937694,0.935912,0.936029
5,0.029,0.233786,0.9437,0.943758,0.943808,0.943611
6,0.0186,0.250146,0.9429,0.943803,0.943159,0.943054
7,0.0095,0.259207,0.9443,0.944665,0.944577,0.944245
8,0.0039,0.239587,0.9474,0.94782,0.947568,0.947503


[I 2025-04-02 23:24:48,859] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.00038422165523671213, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4314,0.279041,0.9049,0.909558,0.905064,0.904756
2,0.1745,0.223454,0.9272,0.928442,0.927584,0.926697


[I 2025-04-02 23:28:45,086] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.00015876606653488976, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4755,0.248741,0.9132,0.917266,0.91336,0.913138
2,0.1591,0.193622,0.9372,0.938245,0.937591,0.937161
3,0.0711,0.218762,0.9357,0.936005,0.936157,0.935434
4,0.0321,0.25204,0.9357,0.937052,0.935822,0.935707
5,0.0166,0.252297,0.9414,0.941861,0.941678,0.941501
6,0.0096,0.287208,0.9395,0.94093,0.939755,0.939765
7,0.0046,0.286741,0.9398,0.940445,0.940086,0.939778
8,0.002,0.258383,0.9433,0.943656,0.943533,0.943336


[I 2025-04-02 23:44:31,714] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0002648402195834234, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.44,0.255577,0.9095,0.913332,0.909814,0.909345
2,0.159,0.183391,0.9396,0.940072,0.939909,0.939563
3,0.0794,0.206785,0.9362,0.936308,0.936557,0.936132
4,0.0453,0.2216,0.9437,0.94407,0.943928,0.943707
5,0.0243,0.239749,0.9423,0.942426,0.942521,0.942189
6,0.0148,0.251114,0.9434,0.944726,0.943572,0.943738
7,0.008,0.26668,0.9422,0.943032,0.942557,0.942118
8,0.0033,0.230596,0.9499,0.950498,0.950037,0.950042
9,0.001,0.249187,0.9498,0.951219,0.949873,0.95027
10,0.0006,0.237042,0.9518,0.95204,0.952132,0.951655


[I 2025-04-03 00:04:23,493] Trial 134 finished with value: 0.9516552987747655 and parameters: {'learning_rate': 0.0002648402195834234, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 8 with value: 0.9521566790618318.


Trial 135 with params: {'learning_rate': 0.0002045918662679625, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4521,0.243673,0.9157,0.918726,0.91592,0.915524
2,0.1545,0.186242,0.9393,0.940201,0.93964,0.939233
3,0.0715,0.215427,0.9345,0.934824,0.934997,0.934217
4,0.0378,0.262541,0.9363,0.937403,0.936527,0.936096
5,0.0209,0.242466,0.941,0.941215,0.941172,0.941009
6,0.0108,0.281892,0.9385,0.9399,0.938835,0.938641
7,0.0063,0.259393,0.9456,0.946254,0.945801,0.945713
8,0.0031,0.250599,0.9458,0.94618,0.945977,0.945909


[I 2025-04-03 00:20:11,207] Trial 135 pruned. 


Trial 136 with params: {'learning_rate': 8.251692766362866e-05, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6011,0.266653,0.9067,0.90938,0.906858,0.906416
2,0.2078,0.200735,0.929,0.929474,0.929334,0.92904


[I 2025-04-03 00:24:08,354] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.00026122213765811047, 'weight_decay': 0.01, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4458,0.242896,0.9176,0.920762,0.917749,0.91769
2,0.1582,0.197334,0.9374,0.938162,0.937644,0.93718
3,0.0786,0.208884,0.9359,0.935797,0.936335,0.935596
4,0.0452,0.228065,0.9431,0.943632,0.943232,0.943078
5,0.025,0.219209,0.9451,0.945407,0.945182,0.94516
6,0.0142,0.264594,0.9418,0.943199,0.942065,0.942037
7,0.0078,0.236841,0.9463,0.946532,0.94663,0.94632
8,0.0034,0.238622,0.948,0.948706,0.948119,0.948108


[I 2025-04-03 00:39:54,358] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0003504111700780832, 'weight_decay': 0.006, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4504,0.281624,0.9058,0.911812,0.906018,0.905859
2,0.174,0.195046,0.9371,0.937838,0.937303,0.936949
3,0.0921,0.227038,0.9297,0.929994,0.930065,0.929281
4,0.0549,0.220069,0.9402,0.940947,0.940276,0.940051
5,0.0318,0.220283,0.9453,0.945476,0.945332,0.945308
6,0.0191,0.271234,0.9424,0.943119,0.942558,0.942549
7,0.0105,0.289612,0.9404,0.941613,0.940742,0.940393
8,0.0042,0.237762,0.9506,0.951186,0.950705,0.950746
9,0.0014,0.253192,0.9521,0.953667,0.952133,0.952517
10,0.0007,0.242645,0.9508,0.950992,0.951076,0.950715


[I 2025-04-03 00:59:43,304] Trial 138 finished with value: 0.9507149763074032 and parameters: {'learning_rate': 0.0003504111700780832, 'weight_decay': 0.006, 'warmup_steps': 21}. Best is trial 8 with value: 0.9521566790618318.


Trial 139 with params: {'learning_rate': 0.0003939643406130682, 'weight_decay': 0.005, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4502,0.267056,0.9089,0.912132,0.908998,0.908607
2,0.1807,0.21069,0.9331,0.9337,0.933406,0.932931
3,0.1003,0.21511,0.9312,0.930962,0.931609,0.93087
4,0.0612,0.237291,0.9371,0.937342,0.937326,0.936885


[I 2025-04-03 01:07:39,851] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.000237861851096054, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4416,0.242789,0.915,0.917689,0.915068,0.914918
2,0.1559,0.184373,0.9431,0.943627,0.943362,0.94303
3,0.0734,0.21705,0.9355,0.93589,0.93591,0.935299
4,0.0397,0.220377,0.9443,0.944902,0.944526,0.944296
5,0.0223,0.230915,0.9434,0.943386,0.943559,0.943282
6,0.013,0.260395,0.9412,0.942137,0.941508,0.941332
7,0.0068,0.263633,0.9422,0.942756,0.942491,0.942291
8,0.0029,0.249378,0.9482,0.948734,0.948352,0.948258


[I 2025-04-03 01:23:25,782] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0008088753872268728, 'weight_decay': 0.007, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5009,0.341798,0.885,0.893969,0.885059,0.885594
2,0.2555,0.297094,0.9033,0.906419,0.903801,0.903019


[I 2025-04-03 01:27:28,074] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0005479199724864826, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4487,0.296265,0.8957,0.900703,0.895992,0.895505
2,0.2094,0.223075,0.9298,0.930656,0.930095,0.92942
3,0.1228,0.223652,0.928,0.927925,0.928518,0.927514
4,0.0788,0.21983,0.9364,0.937043,0.936462,0.93637
5,0.0474,0.238472,0.9388,0.939361,0.938983,0.938943
6,0.0298,0.24227,0.944,0.944829,0.944333,0.944014
7,0.0151,0.273512,0.9415,0.94225,0.941785,0.941573
8,0.0059,0.238145,0.9511,0.951339,0.95123,0.9512
9,0.0025,0.257004,0.9496,0.951038,0.949639,0.949958
10,0.001,0.243588,0.9502,0.950481,0.950467,0.950078


[I 2025-04-03 01:47:15,949] Trial 142 finished with value: 0.9500779072471459 and parameters: {'learning_rate': 0.0005479199724864826, 'weight_decay': 0.002, 'warmup_steps': 9}. Best is trial 8 with value: 0.9521566790618318.


Trial 143 with params: {'learning_rate': 0.0006289074352108747, 'weight_decay': 0.006, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.47,0.343896,0.8888,0.895961,0.88873,0.888743
2,0.2221,0.256516,0.9181,0.919883,0.918405,0.917741


[I 2025-04-03 01:51:13,141] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.00016146998593807586, 'weight_decay': 0.007, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4994,0.249647,0.9132,0.916645,0.913326,0.913051
2,0.1606,0.188881,0.9384,0.938949,0.938665,0.938395
3,0.0692,0.22324,0.9357,0.936091,0.936181,0.935437
4,0.0324,0.246117,0.9376,0.938801,0.937857,0.937658
5,0.0172,0.239174,0.9426,0.942507,0.942873,0.942499
6,0.0087,0.259347,0.942,0.942778,0.942269,0.942142
7,0.0049,0.29625,0.938,0.939073,0.93838,0.937903
8,0.0026,0.254558,0.9442,0.944852,0.944389,0.944224


[I 2025-04-03 02:07:20,637] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0002766308519922785, 'weight_decay': 0.002, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4678,0.244895,0.9164,0.919239,0.916623,0.916181
2,0.1623,0.198224,0.9366,0.937303,0.936978,0.936345
3,0.0811,0.21052,0.9352,0.935167,0.935618,0.934825
4,0.0453,0.240812,0.9377,0.938684,0.937662,0.937488
5,0.0252,0.23895,0.9434,0.94336,0.943624,0.943243
6,0.016,0.261733,0.9387,0.939352,0.938972,0.938717
7,0.0088,0.287212,0.9403,0.94126,0.940617,0.940286
8,0.0041,0.26361,0.9438,0.944361,0.944025,0.943818


[I 2025-04-03 02:23:06,931] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.00047473505776353647, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4296,0.291126,0.902,0.907465,0.902165,0.902207
2,0.1935,0.220848,0.9291,0.929861,0.929508,0.92876
3,0.1113,0.227783,0.9289,0.929719,0.929203,0.928601
4,0.0694,0.224721,0.9349,0.935172,0.935008,0.934624


[I 2025-04-03 02:31:04,534] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.00039008847424631585, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4353,0.310946,0.8952,0.903242,0.895482,0.894916
2,0.1756,0.219723,0.9299,0.930955,0.930192,0.929812


[I 2025-04-03 02:35:04,832] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0002728018504634417, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4188,0.243009,0.9135,0.917394,0.913725,0.913686
2,0.1557,0.195179,0.9362,0.936412,0.936463,0.936074
3,0.077,0.214537,0.9359,0.935876,0.9364,0.935453
4,0.0443,0.255471,0.9349,0.936557,0.93498,0.935083


[I 2025-04-03 02:43:04,278] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0004159019964208297, 'weight_decay': 0.003, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.449,0.272625,0.9103,0.913168,0.91031,0.910191
2,0.1842,0.217769,0.9284,0.930273,0.928638,0.928418
3,0.103,0.2149,0.9309,0.930603,0.931259,0.930556
4,0.0648,0.210533,0.9405,0.940802,0.940574,0.940344
5,0.036,0.239283,0.9407,0.940915,0.9408,0.940567
6,0.0243,0.260252,0.9398,0.940612,0.940106,0.939974
7,0.0127,0.260881,0.9441,0.944476,0.944323,0.944119
8,0.0045,0.252123,0.9487,0.949794,0.948752,0.949044
9,0.0019,0.259649,0.9484,0.949849,0.948443,0.948842
10,0.0007,0.239273,0.9508,0.950834,0.9511,0.950712


[I 2025-04-03 03:02:58,143] Trial 149 finished with value: 0.9507122835349253 and parameters: {'learning_rate': 0.0004159019964208297, 'weight_decay': 0.003, 'warmup_steps': 18}. Best is trial 8 with value: 0.9521566790618318.
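
When reading the log above, note that the value reported for a finished trial matches the F1 of the last epoch, not the best epoch. The short check below uses the per-epoch F1 values of Trial 138, copied from its table; the variable names are illustrative and are not part of the notebook.

```python
# F1 per epoch for Trial 138, copied from the table in the log above.
trial_138_f1 = [
    0.905859, 0.936949, 0.929281, 0.940051, 0.945308,
    0.942549, 0.940393, 0.950746, 0.952517, 0.950715,
]
# Objective value reported by Optuna for the same trial.
reported_value = 0.9507149763074032

last_f1 = trial_138_f1[-1]
best_f1 = max(trial_138_f1)
best_epoch = trial_138_f1.index(best_f1) + 1  # epochs are 1-based in the table

print(f"last epoch F1: {last_f1}, best F1: {best_f1} (epoch {best_epoch})")
print(f"reported objective: {reported_value}")
```

If the best epoch should drive the search instead, the objective would have to be computed differently; the notebook as written compares trials by their final-epoch F1.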


In [None]:
print(best_base_pretrained)

BestRun(run_id='8', objective=0.9521566790618318, hyperparameters={'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}, run_summary=None)
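
To cross-check a `BestRun` against the long log, the finished and pruned trials can be pulled out of the text with a regular expression. This is a minimal sketch that assumes the log format shown in this notebook; `summarize` and `SAMPLE_LOG` are hypothetical names, and the sample contains three lines copied from the output above.

```python
import ast
import re

# Patterns for the Optuna log lines as they appear in this notebook's output.
FINISHED = re.compile(
    r"Trial (?P<number>\d+) finished with value: (?P<value>\d+\.\d+) "
    r"and parameters: (?P<params>\{.*?\})\."
)
PRUNED = re.compile(r"Trial (?P<number>\d+) pruned\.")


def summarize(log_text):
    """Return (finished trials, pruned trial numbers, best finished trial)."""
    finished = [
        (int(m["number"]), float(m["value"]), ast.literal_eval(m["params"]))
        for m in FINISHED.finditer(log_text)
    ]
    pruned = [int(m["number"]) for m in PRUNED.finditer(log_text)]
    # The study maximizes F1, so the best trial has the largest value.
    best = max(finished, key=lambda item: item[1]) if finished else None
    return finished, pruned, best


SAMPLE_LOG = """\
[I 2025-04-03 00:04:23,493] Trial 134 finished with value: 0.9516552987747655 and parameters: {'learning_rate': 0.0002648402195834234, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 8 with value: 0.9521566790618318.
[I 2025-04-03 00:20:11,207] Trial 135 pruned.
[I 2025-04-03 00:59:43,304] Trial 138 finished with value: 0.9507149763074032 and parameters: {'learning_rate': 0.0003504111700780832, 'weight_decay': 0.006, 'warmup_steps': 21}. Best is trial 8 with value: 0.9521566790618318.
"""

finished, pruned, best = summarize(SAMPLE_LOG)
print("finished:", [number for number, _, _ in finished])
print("pruned:", pruned)
print("best of the sample:", best)
```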


In [None]:
base.reset_seed()

In [None]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/pretrained-KD_hp-search",
    logging_dir=f"~/logs/{DATASET}/pretrained-KD_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

In [None]:
def hp_space(trial):
    # Search space sampled by Optuna for every trial of the distillation study.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific: loss weighting and softmax temperature.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
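
The two extra parameters, `lambda_param` and `temperature`, are the usual knobs of a knowledge-distillation loss. The implementation inside `base.DistilTrainer` is not shown in this notebook, so the following is only a sketch of the common formulation (hard-label cross-entropy blended with a temperature-softened KL term), written in plain Python. That `lambda_param` weights the distillation term, and that the KL term is scaled by the squared temperature, are assumptions; the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Blend of cross-entropy and KL(teacher || student) for one sample."""
    # Hard-label cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Assumption: lambda_param weights the distillation term, and the KL term
    # is multiplied by T^2 to keep gradient magnitudes comparable.
    return (1 - lambda_param) * ce + lambda_param * temperature**2 * kl


# Made-up logits for a 3-class toy example.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
for lam in (0.0, 0.5, 1.0):
    print(lam, distillation_loss(student, teacher, label=0, lambda_param=lam, temperature=6.0))
```

Under this formulation, `lambda_param=0` reduces to ordinary supervised training and `lambda_param=1` trains on the teacher's logits only, which is why the search range covers both ends.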

In [None]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
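
The pruned trials in the logs stop after 2, 4, or 8 epochs. That pattern is what Hyperband with `reduction_factor=2` would produce if resources are counted in epochs with `min_r = 2` and `max_r = 10`; both values are defined outside this section, so they are assumptions here. The sketch below follows the bracket and rung rules described in the Optuna documentation for `HyperbandPruner` and `SuccessiveHalvingPruner`; it does not call Optuna, and `hyperband_rungs` is a hypothetical helper.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """List the pruning checkpoints (rungs) of each Hyperband bracket."""
    # Number of brackets: floor(log_eta(max_resource / min_resource)) + 1,
    # computed with integers to avoid floating-point rounding.
    n_brackets = 0
    while min_resource * reduction_factor**n_brackets <= max_resource:
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        rung = 0
        # Bracket k starts pruning later: min_resource * eta^(k + rung).
        while min_resource * reduction_factor ** (bracket_id + rung) <= max_resource:
            rungs.append(min_resource * reduction_factor ** (bracket_id + rung))
            rung += 1
        brackets.append(rungs)
    return brackets


# Assumed values: min_r = 2 and max_r = 10 epochs, reduction_factor = 2.
for bracket_id, rungs in enumerate(hyperband_rungs(2, 10, 2)):
    print(f"bracket {bracket_id}: may prune after epochs {rungs}")
```

Trials assigned to a later bracket are given more epochs before they can be pruned, which would explain why some weak-looking trials in the log still run to epoch 4 or 8.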



In [None]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    # Trainer calls model_init at the start of every trial, so each trial gets a fresh model.
    model_init=lambda: base.get_mobilenet(10),
)

In [None]:
best_distil_pretrained = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill",
    n_trials=150
)

[I 2025-04-03 03:03:15,965] A new study created in memory with name: Distill


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4029,0.257398,0.9176,0.921376,0.917812,0.917882
2,0.2259,0.202859,0.9419,0.942554,0.942269,0.941777


[I 2025-04-03 03:07:18,533] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.48,0.268041,0.9117,0.914355,0.911892,0.911718
2,0.2546,0.221763,0.9342,0.935541,0.934543,0.934129
3,0.2014,0.214814,0.9373,0.937447,0.93766,0.937073
4,0.1739,0.212626,0.9398,0.941491,0.939909,0.939876


[I 2025-04-03 03:15:19,340] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 27, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5613,0.299712,0.8991,0.901592,0.899306,0.898956
2,0.3001,0.24156,0.9257,0.926975,0.925914,0.925789


[I 2025-04-03 03:19:20,557] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4558,0.268095,0.9142,0.917642,0.914377,0.914425
2,0.2467,0.215573,0.9378,0.938337,0.938159,0.937558
3,0.1954,0.213312,0.9374,0.937472,0.937713,0.937167
4,0.1692,0.211055,0.9401,0.941606,0.94029,0.940154


[I 2025-04-03 03:27:19,560] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.407,0.312549,0.8883,0.899262,0.88826,0.889473
2,0.2697,0.247509,0.9194,0.922301,0.919631,0.919037


[I 2025-04-03 03:31:21,292] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4842,0.352732,0.8657,0.870129,0.865772,0.865401
2,0.3248,0.280891,0.8984,0.902546,0.89861,0.898124
3,0.2649,0.272378,0.9061,0.906961,0.90674,0.90527
4,0.2251,0.239197,0.9221,0.922473,0.922356,0.921836


[I 2025-04-03 03:39:17,196] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3963,0.299249,0.8945,0.900623,0.894388,0.89476
2,0.266,0.238634,0.924,0.925818,0.924213,0.923714


[I 2025-04-03 03:43:17,916] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4955,0.374531,0.8504,0.858528,0.850559,0.850172
2,0.3307,0.287969,0.8986,0.900516,0.89878,0.898103
3,0.2686,0.273344,0.903,0.903792,0.903493,0.902151
4,0.2313,0.246959,0.9184,0.92031,0.918426,0.918437


[I 2025-04-03 03:51:18,019] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4687,0.275665,0.9102,0.913148,0.910372,0.910178
2,0.2615,0.224651,0.9318,0.933005,0.932161,0.93178
3,0.2082,0.217234,0.9365,0.936582,0.93689,0.936174
4,0.1793,0.216145,0.9355,0.937579,0.935598,0.935627


[I 2025-04-03 03:59:19,150] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 17, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4267,0.304785,0.8923,0.896827,0.892556,0.892394
2,0.2852,0.253601,0.9139,0.915473,0.914028,0.913392


[I 2025-04-03 04:03:20,856] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 7.577669953489166e-05, 'weight_decay': 0.003, 'warmup_steps': 20, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.509,0.282935,0.9068,0.909718,0.906944,0.906776
2,0.2735,0.22888,0.9292,0.930452,0.929491,0.929195
3,0.2183,0.221771,0.9358,0.935914,0.936184,0.935483
4,0.1878,0.218118,0.936,0.937665,0.936099,0.93613
5,0.1683,0.204656,0.9407,0.941313,0.940865,0.940829
6,0.1578,0.208024,0.9384,0.939144,0.938689,0.938665
7,0.1516,0.212643,0.9344,0.935727,0.934744,0.934532
8,0.1471,0.202465,0.9401,0.94113,0.940286,0.940378


[I 2025-04-03 04:19:21,113] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 6.790606967091412e-05, 'weight_decay': 0.002, 'warmup_steps': 23, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.526,0.286933,0.9065,0.909067,0.906755,0.906425
2,0.2827,0.232385,0.9288,0.929837,0.929099,0.92883
3,0.2265,0.225254,0.9326,0.932888,0.933023,0.932259


[I 2025-04-03 04:35:21,098] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 6.179524839391358e-05, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5303,0.293213,0.9052,0.907929,0.905394,0.905129
2,0.2895,0.237619,0.9258,0.927141,0.926087,0.925866
3,0.233,0.228401,0.9329,0.9334,0.933353,0.932524
4,0.2011,0.224453,0.9333,0.935406,0.933384,0.933453


[I 2025-04-03 04:43:18,631] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.000279676327533877, 'weight_decay': 0.006, 'warmup_steps': 17, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3954,0.26423,0.9123,0.917047,0.912517,0.912659
2,0.2268,0.207689,0.9385,0.939629,0.938889,0.938319
3,0.1827,0.201762,0.9427,0.94261,0.94301,0.942567
4,0.1607,0.199217,0.944,0.944884,0.944061,0.944087
5,0.1486,0.186257,0.95,0.950644,0.950199,0.950192
6,0.142,0.185873,0.9486,0.948994,0.948803,0.948686
7,0.1385,0.188075,0.9481,0.948915,0.948361,0.948207
8,0.1356,0.177578,0.9516,0.952088,0.951795,0.951756


[I 2025-04-03 04:59:13,000] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.00034286724359860115, 'weight_decay': 0.009000000000000001, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3894,0.261465,0.915,0.918854,0.91519,0.915299
2,0.2292,0.214669,0.9345,0.935652,0.934776,0.93413
3,0.1847,0.214896,0.9368,0.937291,0.937319,0.936497
4,0.1627,0.205473,0.943,0.944324,0.943024,0.943061
5,0.149,0.183916,0.9516,0.952003,0.951805,0.951655
6,0.1424,0.187029,0.9487,0.949199,0.948875,0.948808
7,0.1382,0.180803,0.952,0.952574,0.95223,0.952092
8,0.1354,0.174294,0.9535,0.954015,0.953639,0.953721


[I 2025-04-03 05:15:10,842] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0007941317649660098, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4097,0.303595,0.8953,0.902365,0.895422,0.895725
2,0.2655,0.240893,0.9233,0.925335,0.923435,0.923021
3,0.2149,0.232109,0.9293,0.929794,0.929773,0.928638
4,0.1833,0.21266,0.9392,0.940372,0.939242,0.939185
5,0.163,0.193226,0.9463,0.946431,0.946561,0.946262
6,0.1497,0.188707,0.9492,0.950059,0.949382,0.949396
7,0.1425,0.183764,0.952,0.952233,0.952263,0.951971
8,0.1378,0.176808,0.9542,0.954579,0.954391,0.954279
9,0.1351,0.173595,0.9559,0.957119,0.955989,0.956279
10,0.1335,0.176371,0.9528,0.952974,0.953058,0.952782


[I 2025-04-03 05:35:02,788] Trial 16 finished with value: 0.9527815853348957 and parameters: {'learning_rate': 0.0007941317649660098, 'weight_decay': 0.009000000000000001, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 16 with value: 0.9527815853348957.


Trial 17 with params: {'learning_rate': 0.0002919622540240043, 'weight_decay': 0.01, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.387,0.264288,0.9125,0.917324,0.912668,0.912781
2,0.2265,0.213991,0.9359,0.936917,0.936299,0.935659
3,0.1821,0.210102,0.9419,0.942323,0.942296,0.941653
4,0.1608,0.207115,0.943,0.944276,0.943054,0.942878
5,0.1482,0.185738,0.9502,0.950498,0.950418,0.950261
6,0.1419,0.1831,0.9507,0.951041,0.950883,0.950829
7,0.1379,0.181314,0.9516,0.952356,0.951827,0.951748
8,0.1354,0.175803,0.9543,0.954688,0.954492,0.954455


[I 2025-04-03 05:50:54,437] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0002595115087810992, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3822,0.252478,0.9187,0.922109,0.918827,0.918884
2,0.2257,0.207377,0.9416,0.942027,0.942,0.941407
3,0.1817,0.209107,0.9381,0.938455,0.938578,0.937783
4,0.16,0.200168,0.9464,0.947623,0.94653,0.946434
5,0.1479,0.190561,0.9475,0.948396,0.947697,0.947622
6,0.1417,0.187122,0.9482,0.948979,0.948366,0.948472
7,0.138,0.184007,0.9482,0.948933,0.948498,0.94821
8,0.1356,0.17768,0.9522,0.952707,0.952366,0.952353


[I 2025-04-03 06:06:49,790] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0027034068363461865, 'weight_decay': 0.01, 'warmup_steps': 24, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5309,0.388465,0.848,0.857369,0.847972,0.848073
2,0.3549,0.30313,0.8907,0.893752,0.890736,0.890301
3,0.2918,0.284775,0.9012,0.901338,0.901651,0.900189
4,0.2492,0.255166,0.9137,0.915115,0.91358,0.913552


[I 2025-04-03 06:14:48,134] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0003465760054507741, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3797,0.256037,0.9177,0.920847,0.917736,0.917879
2,0.228,0.212034,0.9382,0.938521,0.938604,0.937824
3,0.1854,0.212255,0.9407,0.940797,0.94105,0.940333
4,0.1633,0.207945,0.9393,0.940948,0.93935,0.939267


[I 2025-04-03 06:22:46,220] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.0008680338407851465, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4168,0.294237,0.8987,0.902351,0.898639,0.898773
2,0.2719,0.25539,0.9131,0.915993,0.913125,0.912583
3,0.2204,0.236054,0.9245,0.924839,0.924968,0.924085
4,0.1865,0.212274,0.9382,0.93893,0.938122,0.938147
5,0.1651,0.199853,0.9427,0.942749,0.94287,0.942631
6,0.1512,0.190223,0.9481,0.948385,0.948327,0.948228
7,0.1432,0.191501,0.9465,0.947304,0.946915,0.946416
8,0.1384,0.177705,0.953,0.953336,0.953199,0.953097
9,0.1354,0.178773,0.9528,0.954483,0.952859,0.953228
10,0.1337,0.176249,0.9534,0.953489,0.953686,0.953382


[I 2025-04-03 06:42:37,989] Trial 21 finished with value: 0.953381701069801 and parameters: {'learning_rate': 0.0008680338407851465, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 21 with value: 0.953381701069801.


Trial 22 with params: {'learning_rate': 0.0013618190830901906, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4515,0.34632,0.8723,0.880066,0.87258,0.8722
2,0.3001,0.266654,0.9071,0.909812,0.907179,0.906707


[I 2025-04-03 06:46:39,041] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0005261925323189436, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3969,0.280272,0.9071,0.913197,0.907241,0.907554
2,0.2457,0.225712,0.9297,0.931044,0.930069,0.929348
3,0.1983,0.210287,0.9398,0.9401,0.940271,0.939483
4,0.1712,0.200616,0.9432,0.944193,0.943183,0.94325
5,0.1546,0.188652,0.9488,0.94895,0.949085,0.948751
6,0.1449,0.187532,0.9498,0.950338,0.950029,0.94997
7,0.14,0.180844,0.9517,0.952047,0.951982,0.951777
8,0.1367,0.174859,0.9555,0.956397,0.955685,0.955704
9,0.1343,0.173775,0.9559,0.957674,0.956002,0.956412
10,0.1329,0.17275,0.9579,0.95809,0.958181,0.95789


[I 2025-04-03 07:06:38,724] Trial 23 finished with value: 0.957889508474951 and parameters: {'learning_rate': 0.0005261925323189436, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 23 with value: 0.957889508474951.


Trial 24 with params: {'learning_rate': 0.00035212253552859024, 'weight_decay': 0.006, 'warmup_steps': 25, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3973,0.262589,0.9129,0.916877,0.913116,0.913284
2,0.2308,0.221553,0.9331,0.934674,0.933421,0.932688
3,0.1863,0.209477,0.9419,0.941914,0.942422,0.941533
4,0.1636,0.200622,0.9441,0.94506,0.944239,0.944135
5,0.15,0.188863,0.9487,0.948973,0.948925,0.94868
6,0.1424,0.184277,0.9513,0.952057,0.951538,0.951485
7,0.1384,0.181214,0.951,0.951517,0.951297,0.951097
8,0.1355,0.176039,0.953,0.953392,0.953224,0.953143


[I 2025-04-03 07:22:32,761] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0006584164829996585, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4032,0.288302,0.899,0.904942,0.898792,0.899301
2,0.2535,0.227254,0.9294,0.931085,0.92971,0.929152
3,0.2072,0.223925,0.9313,0.931548,0.931603,0.931035
4,0.1784,0.209593,0.9406,0.941485,0.940594,0.940527
5,0.1585,0.191861,0.9471,0.947364,0.947226,0.947218
6,0.1479,0.190369,0.949,0.94947,0.949168,0.949177
7,0.1414,0.184947,0.9525,0.95301,0.952762,0.952596
8,0.1371,0.178605,0.9533,0.954057,0.95345,0.953535
9,0.1348,0.17914,0.9538,0.955256,0.953881,0.954266
10,0.1332,0.179002,0.9537,0.953872,0.953951,0.953652


[I 2025-04-03 07:42:25,617] Trial 25 finished with value: 0.9536520129793555 and parameters: {'learning_rate': 0.0006584164829996585, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 23 with value: 0.957889508474951.


Trial 26 with params: {'learning_rate': 0.0007020330969419561, 'weight_decay': 0.006, 'warmup_steps': 22, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4025,0.281917,0.9044,0.908402,0.904282,0.90452
2,0.2578,0.238802,0.9228,0.925093,0.922941,0.922402


[I 2025-04-03 07:46:23,153] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.0018704128628130754, 'weight_decay': 0.005, 'warmup_steps': 24, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4848,0.348125,0.8673,0.874513,0.867493,0.867456
2,0.3226,0.277422,0.9024,0.905041,0.902604,0.90202
3,0.2605,0.256943,0.9145,0.914632,0.915072,0.913972
4,0.2241,0.23686,0.9255,0.926253,0.925598,0.925272


[I 2025-04-03 07:54:21,980] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.00014990576923674527, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4468,0.260252,0.9167,0.920202,0.916849,0.916925
2,0.2371,0.210175,0.9402,0.941093,0.940594,0.940083
3,0.1876,0.208822,0.942,0.941947,0.942308,0.941818
4,0.1638,0.207664,0.9404,0.942007,0.94054,0.940521
5,0.1501,0.192876,0.9456,0.945765,0.945891,0.945623
6,0.144,0.191424,0.9468,0.94738,0.94703,0.946985
7,0.1403,0.195959,0.9447,0.94569,0.945001,0.944773
8,0.1377,0.185828,0.9472,0.947955,0.947376,0.947417


[I 2025-04-03 08:10:15,722] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.00034411885087626086, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3945,0.260573,0.9172,0.921044,0.917442,0.917476
2,0.2299,0.21068,0.9371,0.938863,0.93741,0.936966
3,0.1861,0.206446,0.9431,0.943501,0.943463,0.942965
4,0.1636,0.200137,0.9441,0.945184,0.944117,0.944138
5,0.1493,0.184475,0.9507,0.951234,0.95089,0.950817
6,0.1422,0.180926,0.9518,0.952059,0.952064,0.951935
7,0.1383,0.180915,0.9512,0.951729,0.951506,0.951277
8,0.1354,0.174905,0.9537,0.954009,0.953924,0.953818
9,0.1335,0.176673,0.9547,0.956842,0.954732,0.95527
10,0.1324,0.177401,0.9521,0.952368,0.952403,0.952081


[I 2025-04-03 08:30:05,257] Trial 29 finished with value: 0.9520805477997882 and parameters: {'learning_rate': 0.00034411885087626086, 'weight_decay': 0.001, 'warmup_steps': 21, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 23 with value: 0.957889508474951.


Trial 30 with params: {'learning_rate': 0.0012988238377680513, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4548,0.334117,0.8776,0.884628,0.877516,0.877505
2,0.2992,0.268335,0.9073,0.910428,0.907482,0.906452


[I 2025-04-03 08:34:05,897] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0007961666216850647, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4143,0.286069,0.9,0.903227,0.899865,0.899964
2,0.2655,0.245106,0.9223,0.924472,0.922442,0.921724
3,0.2147,0.224168,0.9327,0.932752,0.933117,0.932311
4,0.1839,0.210547,0.9397,0.940646,0.939821,0.939767
5,0.1628,0.204644,0.9436,0.943749,0.943818,0.943373
6,0.1508,0.185546,0.9519,0.952415,0.952028,0.952107
7,0.1422,0.183747,0.9516,0.952114,0.951934,0.951614
8,0.138,0.176927,0.9531,0.953155,0.953318,0.953119
9,0.1351,0.177338,0.9556,0.9572,0.955648,0.956007
10,0.1334,0.177038,0.9537,0.953832,0.954016,0.953671


[I 2025-04-03 08:54:14,107] Trial 31 finished with value: 0.9536709526870333 and parameters: {'learning_rate': 0.0007961666216850647, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 23 with value: 0.957889508474951.


Trial 32 with params: {'learning_rate': 0.00036812264073506984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4,0.261134,0.9148,0.918555,0.914855,0.914927
2,0.2328,0.216685,0.9378,0.938572,0.938103,0.937455
3,0.1878,0.212028,0.9411,0.941679,0.941441,0.940952
4,0.1634,0.200578,0.9457,0.946766,0.945923,0.945819
5,0.1495,0.187934,0.9476,0.948191,0.947745,0.947734
6,0.1425,0.180545,0.9518,0.952261,0.952016,0.952027
7,0.1384,0.18192,0.9498,0.950205,0.950142,0.949771
8,0.1354,0.173987,0.9556,0.956002,0.95577,0.955724
9,0.1338,0.176376,0.9528,0.95463,0.952877,0.953305
10,0.1323,0.17713,0.9535,0.953885,0.953775,0.953489


[I 2025-04-03 09:14:07,989] Trial 32 finished with value: 0.9534887893018886 and parameters: {'learning_rate': 0.00036812264073506984, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 23 with value: 0.957889508474951.


Trial 33 with params: {'learning_rate': 0.00022622054335192507, 'weight_decay': 0.01, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4157,0.257947,0.9182,0.921471,0.918359,0.918352
2,0.2273,0.208046,0.9389,0.939957,0.939333,0.938686
3,0.182,0.208063,0.9408,0.940963,0.941186,0.940557
4,0.1605,0.205813,0.9432,0.944679,0.943186,0.943263
5,0.1475,0.189328,0.9475,0.948344,0.947608,0.947688
6,0.1421,0.189252,0.9462,0.946608,0.946418,0.946258
7,0.1384,0.191188,0.947,0.947769,0.947257,0.947081
8,0.1358,0.182497,0.9514,0.951878,0.951598,0.951513


[I 2025-04-03 09:30:02,492] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.00075298877519294, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4128,0.292756,0.8984,0.90466,0.898502,0.89892
2,0.2606,0.239998,0.92,0.922273,0.920304,0.919726


[I 2025-04-03 09:33:59,696] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0008718140900619618, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4202,0.306413,0.8909,0.89709,0.890816,0.891295
2,0.2718,0.249123,0.9166,0.918873,0.917058,0.916146
3,0.2188,0.234578,0.9283,0.928441,0.928643,0.927879
4,0.1887,0.219597,0.9339,0.935837,0.933814,0.934013


[I 2025-04-03 09:41:55,534] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0027467105678064636, 'weight_decay': 0.007, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5337,0.376889,0.8516,0.856818,0.851653,0.851626
2,0.3538,0.317255,0.8782,0.88286,0.878303,0.877704


[I 2025-04-03 09:45:53,527] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.00044351639002977673, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.399,0.268591,0.9107,0.914771,0.910677,0.910742
2,0.2367,0.2158,0.9349,0.936535,0.935143,0.934668
3,0.1923,0.214899,0.9378,0.93795,0.938118,0.937592
4,0.1674,0.208232,0.9412,0.942331,0.941194,0.941086
5,0.1514,0.191571,0.9449,0.944932,0.945163,0.944784
6,0.1435,0.181374,0.9526,0.95297,0.952731,0.952734
7,0.1385,0.18194,0.9522,0.952725,0.952399,0.952271
8,0.1357,0.177267,0.9533,0.953772,0.953475,0.953413
9,0.1338,0.177566,0.953,0.954995,0.953093,0.953547
10,0.1324,0.17675,0.9542,0.954358,0.954477,0.95417


[I 2025-04-03 10:05:44,803] Trial 37 finished with value: 0.9541699235587912 and parameters: {'learning_rate': 0.00044351639002977673, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 23 with value: 0.957889508474951.


Trial 38 with params: {'learning_rate': 0.000322195902622439, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4001,0.25518,0.9172,0.920709,0.917397,0.917593
2,0.229,0.20843,0.9382,0.939214,0.938449,0.938007
3,0.1847,0.203071,0.9411,0.940929,0.941474,0.940943
4,0.1617,0.202848,0.9432,0.944042,0.943338,0.943144
5,0.1495,0.189486,0.948,0.949242,0.948143,0.948168
6,0.1427,0.182989,0.9517,0.952416,0.951928,0.951948
7,0.1383,0.183857,0.9491,0.949591,0.949365,0.949153
8,0.1357,0.174851,0.9551,0.955409,0.955319,0.955212
9,0.1339,0.177679,0.9524,0.954606,0.95247,0.952943
10,0.1326,0.175385,0.9541,0.954408,0.954397,0.954077


[I 2025-04-03 10:25:43,521] Trial 38 finished with value: 0.9540769015633165 and parameters: {'learning_rate': 0.000322195902622439, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 39 with params: {'learning_rate': 0.00026451683103872954, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4076,0.256696,0.9185,0.922233,0.918569,0.918758
2,0.2277,0.204898,0.9421,0.943091,0.942468,0.941885
3,0.1818,0.209329,0.941,0.941485,0.941412,0.940729
4,0.1604,0.204184,0.9427,0.943865,0.942742,0.942717


[I 2025-04-03 10:33:42,610] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0001702345185051393, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4286,0.261621,0.9151,0.919332,0.915194,0.915485
2,0.2323,0.207403,0.9399,0.94056,0.94031,0.939729
3,0.1843,0.209643,0.9404,0.940241,0.94077,0.940215
4,0.1613,0.207912,0.9398,0.941332,0.939829,0.939862
5,0.1485,0.189833,0.9474,0.947689,0.947674,0.94744
6,0.1427,0.192157,0.9457,0.946282,0.945908,0.945888
7,0.1394,0.194529,0.9453,0.946346,0.945568,0.945436
8,0.1368,0.185685,0.9488,0.94947,0.949045,0.949033


[I 2025-04-03 10:49:50,604] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0008332898028831513, 'weight_decay': 0.007, 'warmup_steps': 27, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4165,0.291861,0.897,0.901062,0.897219,0.896969
2,0.2687,0.246905,0.9173,0.919146,0.917671,0.916718


[I 2025-04-03 10:54:48,835] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0004817410860900052, 'weight_decay': 0.005, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4021,0.272803,0.9095,0.914653,0.909718,0.90982
2,0.2397,0.227588,0.9302,0.931998,0.930557,0.929608
3,0.1948,0.214398,0.9394,0.939521,0.939762,0.939103
4,0.1684,0.20579,0.9415,0.942267,0.941552,0.941341
5,0.1524,0.191383,0.9494,0.949987,0.94952,0.949438
6,0.1444,0.182667,0.9518,0.95221,0.951952,0.951989
7,0.1391,0.181079,0.9509,0.951651,0.951208,0.950983
8,0.1358,0.172568,0.9568,0.957195,0.956981,0.956901
9,0.1339,0.172939,0.9563,0.958374,0.956384,0.956782
10,0.1325,0.172465,0.9555,0.955776,0.955751,0.955491


[I 2025-04-03 11:14:49,992] Trial 42 finished with value: 0.9554911716867691 and parameters: {'learning_rate': 0.0004817410860900052, 'weight_decay': 0.005, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 43 with params: {'learning_rate': 0.0003144108703125815, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4034,0.259182,0.9121,0.916307,0.912157,0.91233
2,0.2286,0.208686,0.9398,0.940506,0.940151,0.939553
3,0.1846,0.205502,0.9414,0.941707,0.941731,0.94121
4,0.1624,0.203941,0.9452,0.946311,0.945381,0.945261
5,0.1489,0.186971,0.9498,0.950374,0.949962,0.949835
6,0.1422,0.183063,0.9509,0.951404,0.951065,0.951138
7,0.1381,0.182326,0.9504,0.950813,0.950745,0.950377
8,0.1354,0.174013,0.9538,0.953883,0.954045,0.953884
9,0.1339,0.1786,0.9542,0.956063,0.954268,0.954699
10,0.1325,0.176814,0.9548,0.955134,0.955097,0.954822


[I 2025-04-03 11:34:50,145] Trial 43 finished with value: 0.9548218242134704 and parameters: {'learning_rate': 0.0003144108703125815, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 44 with params: {'learning_rate': 0.000526369209665657, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3995,0.280202,0.9062,0.911336,0.906354,0.906657
2,0.2441,0.222401,0.9304,0.931252,0.930777,0.930188
3,0.1979,0.223992,0.9321,0.932174,0.932533,0.931604
4,0.1725,0.207489,0.9415,0.942484,0.941437,0.941496
5,0.1543,0.189013,0.9493,0.949821,0.949404,0.949435
6,0.1449,0.187426,0.9492,0.949786,0.949442,0.949256
7,0.1397,0.179334,0.9525,0.952631,0.952788,0.952558
8,0.1362,0.175879,0.9549,0.955268,0.955082,0.95501
9,0.1341,0.17642,0.953,0.954713,0.953032,0.953508
10,0.1327,0.176822,0.9539,0.954175,0.954177,0.953825


[I 2025-04-03 11:55:01,233] Trial 44 finished with value: 0.953825036263573 and parameters: {'learning_rate': 0.000526369209665657, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 45 with params: {'learning_rate': 8.728082931090421e-05, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5032,0.278271,0.9092,0.912239,0.909348,0.909298
2,0.2648,0.223222,0.9305,0.931778,0.930823,0.930449
3,0.2105,0.217858,0.9364,0.936522,0.936745,0.936081
4,0.1811,0.215238,0.9369,0.938527,0.937001,0.93704


[I 2025-04-03 12:02:59,396] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.0007326672952757187, 'weight_decay': 0.002, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4112,0.29617,0.8967,0.901928,0.89681,0.896934
2,0.2603,0.241151,0.9217,0.923256,0.922035,0.921111
3,0.2116,0.237159,0.9257,0.926558,0.926249,0.9253
4,0.1822,0.216083,0.9338,0.936202,0.933669,0.933984


[I 2025-04-03 12:10:56,912] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00043434613647948115, 'weight_decay': 0.006, 'warmup_steps': 29, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.398,0.281852,0.9063,0.913985,0.906309,0.907171
2,0.2362,0.215842,0.9366,0.937692,0.936923,0.936334
3,0.192,0.209861,0.9411,0.940921,0.941367,0.940798
4,0.1679,0.206511,0.94,0.940527,0.940044,0.939864


[I 2025-04-03 12:18:54,646] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.00035415566716214843, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4011,0.270944,0.911,0.916489,0.911293,0.91162
2,0.2313,0.211451,0.9371,0.938129,0.937436,0.936904
3,0.1869,0.213248,0.9381,0.9388,0.938502,0.937836
4,0.1634,0.200964,0.9445,0.945583,0.944588,0.944483
5,0.1496,0.18522,0.95,0.950554,0.950168,0.950109
6,0.1422,0.179827,0.9537,0.954097,0.95393,0.953823
7,0.1381,0.1845,0.9488,0.949626,0.949077,0.94888
8,0.1353,0.173459,0.9559,0.956137,0.956099,0.955982
9,0.1336,0.175364,0.9555,0.956954,0.955594,0.955966
10,0.1324,0.176881,0.9543,0.954406,0.954622,0.954275


[I 2025-04-03 12:38:48,069] Trial 48 finished with value: 0.9542749400104178 and parameters: {'learning_rate': 0.00035415566716214843, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 49 with params: {'learning_rate': 0.0004768682136469425, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.399,0.266907,0.9136,0.917616,0.913685,0.913973
2,0.2403,0.22338,0.932,0.933431,0.932339,0.9315
3,0.1933,0.223308,0.9318,0.932297,0.93224,0.931333
4,0.1699,0.206915,0.9413,0.942553,0.94138,0.94128
5,0.1518,0.191169,0.9474,0.947684,0.94755,0.947335
6,0.1441,0.183653,0.9519,0.952316,0.952078,0.952107
7,0.1394,0.18387,0.949,0.949976,0.949274,0.949127
8,0.1357,0.177007,0.9532,0.953833,0.953315,0.953402


[I 2025-04-03 12:54:45,778] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0005153861472168533, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3989,0.278351,0.9053,0.910928,0.905278,0.905635
2,0.2427,0.214521,0.9345,0.935756,0.934726,0.934355
3,0.1977,0.214126,0.9405,0.940266,0.940866,0.940241
4,0.1709,0.20644,0.9411,0.942578,0.941021,0.941094
5,0.1537,0.188118,0.9491,0.949335,0.949263,0.949129
6,0.1452,0.185104,0.9519,0.95213,0.952122,0.95192
7,0.14,0.181963,0.953,0.953318,0.953226,0.953014
8,0.1365,0.173562,0.9566,0.956855,0.956749,0.956726
9,0.1342,0.175195,0.9537,0.955667,0.953754,0.954193
10,0.1328,0.176525,0.9544,0.954502,0.954701,0.954294


[I 2025-04-03 13:14:46,239] Trial 50 finished with value: 0.9542944914472097 and parameters: {'learning_rate': 0.0005153861472168533, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 51 with params: {'learning_rate': 0.001123849782237451, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4384,0.310325,0.891,0.895861,0.890994,0.891213
2,0.2882,0.255165,0.912,0.914022,0.912096,0.911711


[I 2025-04-03 13:18:45,893] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0003617058860384599, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3994,0.270923,0.9087,0.913294,0.908742,0.908917
2,0.2318,0.212421,0.9368,0.937699,0.937109,0.936582
3,0.1876,0.210517,0.9409,0.941217,0.941288,0.940682
4,0.1639,0.197548,0.9466,0.947575,0.946566,0.946713
5,0.1501,0.1897,0.9492,0.949716,0.949482,0.949173
6,0.1424,0.185273,0.9489,0.949661,0.949065,0.949131
7,0.1384,0.18287,0.9502,0.950638,0.950527,0.950219
8,0.1357,0.175108,0.9548,0.955151,0.954992,0.954944
9,0.1337,0.178526,0.953,0.954619,0.953095,0.953511
10,0.1325,0.178015,0.9524,0.952781,0.952689,0.952365


[I 2025-04-03 13:38:47,001] Trial 52 finished with value: 0.9523645685465724 and parameters: {'learning_rate': 0.0003617058860384599, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 53 with params: {'learning_rate': 0.0010564144200746963, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.433,0.31894,0.884,0.891816,0.884107,0.884346
2,0.2847,0.263832,0.9085,0.912557,0.908651,0.908386


[I 2025-04-03 13:42:48,477] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.00018591100871980046, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3977,0.253108,0.9212,0.924018,0.921344,0.921308
2,0.2285,0.201662,0.9434,0.944155,0.943799,0.943321
3,0.1827,0.202636,0.9441,0.943986,0.944511,0.943906
4,0.1603,0.205616,0.9402,0.941671,0.940334,0.940199
5,0.1478,0.187893,0.9493,0.949808,0.949489,0.949368
6,0.1421,0.186768,0.9505,0.950886,0.950736,0.950671
7,0.1389,0.188559,0.9474,0.948086,0.947718,0.947449
8,0.1365,0.178967,0.9529,0.953248,0.953129,0.953027


[I 2025-04-03 13:58:45,522] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.00022982979054722176, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4161,0.255641,0.9177,0.921409,0.917891,0.917856
2,0.2274,0.203017,0.9438,0.944414,0.944119,0.943681
3,0.1827,0.208137,0.9408,0.940878,0.941207,0.940451
4,0.1602,0.20197,0.9434,0.944834,0.943486,0.943399
5,0.1476,0.187489,0.9478,0.948283,0.947943,0.947867
6,0.1416,0.184211,0.9513,0.951706,0.951494,0.951444
7,0.1383,0.190238,0.9469,0.947444,0.947218,0.946875
8,0.136,0.179968,0.9505,0.9509,0.950725,0.950549


[I 2025-04-03 14:14:38,256] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.00023818113371569337, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4126,0.259775,0.9158,0.919524,0.915971,0.9161
2,0.2274,0.210055,0.9384,0.939675,0.938821,0.938198
3,0.1819,0.20824,0.9408,0.940867,0.941151,0.9406
4,0.1605,0.19972,0.9463,0.947572,0.946287,0.946362
5,0.1478,0.186656,0.9488,0.949407,0.949071,0.948916
6,0.1419,0.185344,0.9488,0.949519,0.948976,0.94905
7,0.1382,0.186411,0.9489,0.949676,0.949148,0.948965
8,0.1359,0.178989,0.9506,0.951104,0.950813,0.950753


[I 2025-04-03 14:30:37,598] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00039741571814277894, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3998,0.274394,0.9106,0.915469,0.910703,0.911056
2,0.2333,0.207099,0.9411,0.941549,0.941479,0.940784
3,0.1891,0.211995,0.9383,0.938709,0.938668,0.938146
4,0.1654,0.197596,0.9453,0.945966,0.945341,0.945261
5,0.1513,0.189477,0.9493,0.949951,0.949493,0.949382
6,0.1432,0.177946,0.9534,0.953705,0.953596,0.9535
7,0.1387,0.180313,0.9523,0.952772,0.952527,0.952347
8,0.1356,0.173103,0.9555,0.955844,0.955727,0.955609
9,0.1338,0.177412,0.9545,0.956622,0.954568,0.955031
10,0.1324,0.174329,0.956,0.956171,0.956256,0.955997


[I 2025-04-03 14:50:39,682] Trial 57 finished with value: 0.9559974740779698 and parameters: {'learning_rate': 0.00039741571814277894, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 58 with params: {'learning_rate': 0.0004316804170661842, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3986,0.278484,0.9084,0.914787,0.908368,0.909022
2,0.2361,0.214849,0.9356,0.93668,0.935916,0.935324
3,0.1924,0.221311,0.9349,0.935299,0.935257,0.934669
4,0.1666,0.199989,0.9461,0.946707,0.946175,0.945995
5,0.1523,0.189559,0.9496,0.950052,0.949771,0.949637
6,0.1438,0.181402,0.9522,0.952673,0.952361,0.95243
7,0.1388,0.186295,0.9481,0.948557,0.948367,0.948129
8,0.1359,0.177062,0.9521,0.952644,0.952309,0.952303


[I 2025-04-03 15:06:34,087] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00015652078478854955, 'weight_decay': 0.006, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4425,0.258413,0.9175,0.921092,0.91771,0.917809
2,0.2354,0.210799,0.9385,0.939548,0.93895,0.93826
3,0.1861,0.21159,0.9386,0.938586,0.938941,0.938347
4,0.1626,0.20541,0.943,0.944403,0.943081,0.943082
5,0.1494,0.18949,0.948,0.948309,0.948239,0.948075
6,0.1436,0.190756,0.946,0.946612,0.946256,0.946197
7,0.1397,0.195217,0.9438,0.944698,0.944106,0.94384
8,0.1375,0.187659,0.9471,0.947733,0.947274,0.947234


[I 2025-04-03 15:22:31,163] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00046762991988506683, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.374,0.269056,0.9095,0.914269,0.909787,0.90975
2,0.2367,0.216029,0.9342,0.935123,0.93462,0.933944
3,0.194,0.223577,0.9326,0.933635,0.933084,0.932317
4,0.1681,0.204632,0.9415,0.942214,0.941637,0.941359


[I 2025-04-03 15:30:28,223] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00432172380795687, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5872,0.399469,0.8339,0.840844,0.833886,0.834016
2,0.3986,0.344906,0.8628,0.870385,0.862784,0.863083
3,0.3289,0.31719,0.8797,0.882407,0.88037,0.878645
4,0.2838,0.279276,0.902,0.902292,0.901992,0.901185


[I 2025-04-03 15:38:27,411] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0004841398065350778, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4008,0.272525,0.91,0.914253,0.910016,0.910282
2,0.2395,0.216621,0.9359,0.936497,0.936305,0.935623
3,0.1961,0.222391,0.9334,0.933775,0.93389,0.932902
4,0.1691,0.211905,0.9371,0.938368,0.937182,0.936952


[I 2025-04-03 15:46:24,910] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0008609198888343718, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4135,0.295674,0.8959,0.90064,0.895986,0.896062
2,0.269,0.235576,0.925,0.926689,0.925145,0.924639
3,0.2194,0.232116,0.9303,0.930508,0.930616,0.929913
4,0.1862,0.215283,0.9371,0.937636,0.937208,0.936898


[I 2025-04-03 15:54:27,192] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.003794022270756288, 'weight_decay': 0.002, 'warmup_steps': 17, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5748,0.410398,0.8357,0.842392,0.835733,0.835692
2,0.3877,0.327541,0.8708,0.873714,0.871113,0.870669


[I 2025-04-03 15:58:28,580] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0007444972337830161, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4111,0.288993,0.8986,0.902763,0.898538,0.898654
2,0.2629,0.232055,0.9255,0.926943,0.925918,0.925198
3,0.2122,0.226912,0.9315,0.931846,0.931847,0.931187
4,0.1819,0.207396,0.9419,0.942572,0.941876,0.941698
5,0.1621,0.197757,0.9459,0.946144,0.946091,0.945963
6,0.1493,0.194528,0.9471,0.947729,0.947383,0.947305
7,0.1425,0.187932,0.9475,0.948305,0.947815,0.947577
8,0.1377,0.177599,0.9546,0.955015,0.954789,0.954808
9,0.135,0.181891,0.951,0.953501,0.951111,0.951649
10,0.1334,0.17638,0.9543,0.954431,0.954597,0.95431


[I 2025-04-03 16:18:29,901] Trial 65 finished with value: 0.9543103525571481 and parameters: {'learning_rate': 0.0007444972337830161, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 66 with params: {'learning_rate': 0.00071867637646695, 'weight_decay': 0.004, 'warmup_steps': 18, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4038,0.292262,0.8998,0.903931,0.899893,0.899816
2,0.2583,0.242517,0.9207,0.922743,0.920959,0.920277


[I 2025-04-03 16:22:29,246] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 8.465954991738309e-05, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4904,0.277801,0.909,0.912411,0.909167,0.909066
2,0.2653,0.225406,0.9315,0.932623,0.93182,0.931455
3,0.2111,0.219325,0.9348,0.934831,0.935183,0.934519
4,0.182,0.216242,0.9371,0.938989,0.937208,0.937266


[I 2025-04-03 16:30:26,868] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.000600122657064969, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3952,0.268318,0.913,0.916249,0.913129,0.913129
2,0.2485,0.223192,0.9308,0.93159,0.931049,0.930305
3,0.2022,0.229817,0.9294,0.92922,0.929875,0.928748
4,0.1758,0.20309,0.9426,0.943666,0.942663,0.942605
5,0.1559,0.189575,0.949,0.949274,0.949083,0.949033
6,0.1455,0.182577,0.952,0.952189,0.952254,0.952054
7,0.1408,0.179474,0.9544,0.954842,0.954606,0.954449
8,0.1369,0.173842,0.9564,0.956631,0.956583,0.956448
9,0.1344,0.173382,0.9577,0.959003,0.95775,0.95806
10,0.133,0.173941,0.9571,0.957302,0.957314,0.957103


[I 2025-04-03 16:50:13,869] Trial 68 finished with value: 0.9571025209540652 and parameters: {'learning_rate': 0.000600122657064969, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 69 with params: {'learning_rate': 0.0014850705657138861, 'weight_decay': 0.005, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4626,0.343944,0.8691,0.876347,0.86933,0.869238
2,0.3076,0.26254,0.9098,0.912627,0.909945,0.909563


[I 2025-04-03 16:54:11,358] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00034075573943386243, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3924,0.264074,0.9135,0.917112,0.913608,0.913716
2,0.2299,0.211081,0.9385,0.939668,0.938822,0.938305
3,0.1867,0.207277,0.9409,0.940891,0.941237,0.940779
4,0.1634,0.202853,0.9437,0.94443,0.943666,0.94367
5,0.149,0.186401,0.9496,0.950297,0.949876,0.94966
6,0.1423,0.183987,0.9502,0.95072,0.950414,0.950398
7,0.1383,0.181314,0.9505,0.951149,0.950744,0.950623
8,0.1354,0.174418,0.9547,0.955249,0.954843,0.954909
9,0.1338,0.176758,0.9537,0.955338,0.953778,0.954184
10,0.1324,0.176919,0.955,0.955228,0.955277,0.954989


[I 2025-04-03 17:14:09,709] Trial 70 finished with value: 0.954988956146399 and parameters: {'learning_rate': 0.00034075573943386243, 'weight_decay': 0.001, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 23 with value: 0.957889508474951.
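
With this many trials it can be easier to summarise the log programmatically than to read it. The sketch below is a hypothetical helper, not part of the notebook: it assumes the Optuna status lines keep the exact wording seen in this output ("Trial N finished with value: ..." and "Trial N pruned.") and extracts the finished values and the pruned trial numbers with regular expressions. The parameter dictionaries in the sample are abbreviated to `{...}` for brevity.

```python
import re

# Patterns match the status lines as they appear in this log (assumed stable).
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.]+)")
PRUNED = re.compile(r"Trial (\d+) pruned")

def summarize(log_text):
    """Return (finished values by trial, pruned trial numbers, best finished trial)."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best

sample = """\
[I 2025-04-03 16:22:29,246] Trial 66 pruned.
[I 2025-04-03 16:50:13,869] Trial 68 finished with value: 0.9571025209540652 and parameters: {...}. Best is trial 23 with value: 0.957889508474951.
[I 2025-04-03 17:14:09,709] Trial 70 finished with value: 0.954988956146399 and parameters: {...}. Best is trial 23 with value: 0.957889508474951.
"""

finished, pruned, best = summarize(sample)
# "best" is only the best among the parsed lines; the overall best (trial 23)
# lies outside this excerpt.
print(finished, pruned, best)
```

In the tables above, the reported value of a finished trial agrees with the F1 column of its last epoch (for example 0.957103 for trial 68), so the same summary could also be cross-checked against the per-epoch metrics.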


Trial 71 with params: {'learning_rate': 0.001288672260599229, 'weight_decay': 0.003, 'warmup_steps': 13, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4438,0.313457,0.8864,0.889233,0.886448,0.886223
2,0.2975,0.259929,0.912,0.91382,0.912163,0.911507
3,0.2405,0.243941,0.9222,0.922255,0.922677,0.921762
4,0.2047,0.228425,0.9298,0.930785,0.92971,0.929446


[I 2025-04-03 17:22:11,322] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00032775238249873245, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3896,0.256403,0.9191,0.922582,0.9194,0.919341
2,0.2292,0.203056,0.9422,0.942738,0.942414,0.942052
3,0.1852,0.214367,0.9376,0.937978,0.937921,0.937352
4,0.1622,0.201293,0.9456,0.946465,0.945749,0.945592
5,0.1481,0.185898,0.9507,0.951116,0.950901,0.950803
6,0.1417,0.180254,0.9522,0.952484,0.952332,0.952267
7,0.1384,0.18273,0.9501,0.950965,0.950368,0.950234
8,0.1354,0.176293,0.9537,0.954222,0.953839,0.953844


[I 2025-04-03 17:38:11,411] Trial 72 pruned. 
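
In the portion of the log shown here, every pruned trial stops after 2, 4, or 8 epochs. That pattern is what a successive-halving style pruner would produce with a minimum resource of 2 epochs and a reduction factor of 2, but the pruner configuration is not visible in this output, so this reading is an assumption. The hypothetical helper below only illustrates how such checkpoint epochs ("rungs") would be derived.

```python
def rung_epochs(min_resource, reduction_factor, max_epochs):
    """Epochs at which a successive-halving style pruner would evaluate a trial.

    ASSUMPTION: rungs sit at min_resource * reduction_factor**k and only
    epochs strictly before the final epoch are pruning checkpoints.
    """
    epochs = []
    epoch = min_resource
    while epoch < max_epochs:
        epochs.append(epoch)
        epoch *= reduction_factor
    return epochs

# Values chosen to match the pattern seen in the log (10-epoch trials).
print(rung_epochs(min_resource=2, reduction_factor=2, max_epochs=10))  # [2, 4, 8]
```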


Trial 73 with params: {'learning_rate': 0.000349305658541348, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3809,0.254245,0.92,0.923131,0.920086,0.920195
2,0.2294,0.216032,0.9351,0.936304,0.935542,0.934638
3,0.1857,0.218364,0.9344,0.934783,0.934907,0.934117
4,0.1638,0.20335,0.9441,0.945176,0.944196,0.944174
5,0.1489,0.185439,0.9491,0.949549,0.949313,0.949137
6,0.1422,0.180842,0.9532,0.953749,0.953365,0.95345
7,0.1386,0.177138,0.9536,0.95432,0.953858,0.953779
8,0.1353,0.172918,0.9547,0.955016,0.954904,0.954799
9,0.1337,0.173081,0.9547,0.956358,0.954756,0.955125


[I 2025-04-03 17:58:04,198] Trial 73 finished with value: 0.9555220597767097 and parameters: {'learning_rate': 0.000349305658541348, 'weight_decay': 0.001, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 23 with value: 0.957889508474951.


Trial 74 with params: {'learning_rate': 0.00023765537108854154, 'weight_decay': 0.001, 'warmup_steps': 8, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3924,0.25403,0.9174,0.920949,0.917567,0.917701
2,0.2257,0.203688,0.9428,0.943293,0.9432,0.942642
3,0.1822,0.212768,0.9377,0.93802,0.938217,0.937263
4,0.1607,0.203903,0.9429,0.944414,0.943036,0.943012


[I 2025-04-03 18:06:02,025] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00038346168267470635, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.38,0.268566,0.9103,0.915602,0.910525,0.910612
2,0.2305,0.218191,0.9317,0.932905,0.932087,0.931436
3,0.1883,0.216419,0.939,0.939185,0.939419,0.938721
4,0.1651,0.196729,0.9477,0.94863,0.94774,0.94776
5,0.151,0.186633,0.9511,0.951664,0.951206,0.951206
6,0.143,0.185175,0.9515,0.951865,0.951633,0.951543
7,0.1382,0.179803,0.9509,0.951493,0.951233,0.950982
8,0.1355,0.175766,0.9555,0.955818,0.955713,0.955563
9,0.1339,0.178865,0.9517,0.953677,0.951676,0.952168
10,0.1325,0.174728,0.955,0.95507,0.955291,0.954979


[I 2025-04-03 18:25:57,374] Trial 75 finished with value: 0.9549793462599284 and parameters: {'learning_rate': 0.00038346168267470635, 'weight_decay': 0.0, 'warmup_steps': 6, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 23 with value: 0.957889508474951.


Trial 76 with params: {'learning_rate': 0.0006793596447719977, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.395,0.281893,0.9029,0.906636,0.903073,0.903044
2,0.2559,0.23744,0.9225,0.924671,0.922673,0.922143


[I 2025-04-03 18:29:56,375] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00024702636080468015, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3845,0.256225,0.9177,0.921291,0.917855,0.917917
2,0.2255,0.203188,0.9426,0.943547,0.942926,0.942502
3,0.1815,0.211884,0.9378,0.937997,0.938223,0.937503
4,0.1601,0.203823,0.9424,0.94435,0.942532,0.942584
5,0.1477,0.185239,0.9514,0.952035,0.951494,0.951566
6,0.1417,0.186503,0.9492,0.949547,0.949438,0.949329
7,0.1384,0.184679,0.9504,0.95102,0.950655,0.950485
8,0.1357,0.177338,0.953,0.953506,0.953175,0.953178


[I 2025-04-03 18:46:15,643] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.00028817333026970874, 'weight_decay': 0.0, 'warmup_steps': 21, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3978,0.254544,0.9188,0.922118,0.918911,0.919067
2,0.2274,0.20582,0.9409,0.941649,0.941224,0.940746
3,0.1828,0.212963,0.9388,0.938992,0.939254,0.938499
4,0.1611,0.207491,0.9398,0.940927,0.939926,0.939829


[I 2025-04-03 18:54:13,131] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.00024605813270224314, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3939,0.254741,0.917,0.920529,0.917105,0.917241
2,0.2259,0.205302,0.9413,0.942336,0.941654,0.941175
3,0.1819,0.205516,0.9436,0.943682,0.943933,0.943465
4,0.1608,0.203782,0.9448,0.946073,0.944871,0.944916
5,0.1482,0.185342,0.9496,0.950247,0.94983,0.949783
6,0.1418,0.184484,0.9488,0.94939,0.948996,0.949029
7,0.1385,0.184953,0.9487,0.949418,0.948939,0.948901
8,0.1357,0.178205,0.9526,0.952999,0.952782,0.952733


[I 2025-04-03 19:10:08,537] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.00010473728714758331, 'weight_decay': 0.002, 'warmup_steps': 25, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4749,0.271525,0.9114,0.914757,0.911543,0.911495
2,0.2528,0.218594,0.9346,0.935838,0.934929,0.934494
3,0.2,0.213429,0.939,0.938939,0.939349,0.938754
4,0.1728,0.212392,0.9393,0.940919,0.939401,0.939387


[I 2025-04-03 19:18:04,951] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0014987865807439419, 'weight_decay': 0.001, 'warmup_steps': 10, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4588,0.338725,0.8771,0.883198,0.877015,0.877196
2,0.3079,0.274918,0.9038,0.905968,0.90392,0.903526


[I 2025-04-03 19:22:04,433] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0007120667435052716, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3904,0.302667,0.8919,0.89821,0.891907,0.892095
2,0.259,0.239122,0.9235,0.925327,0.923775,0.92318
3,0.2094,0.230123,0.9315,0.93149,0.931914,0.931083
4,0.1799,0.210752,0.9365,0.937315,0.936574,0.936434


[I 2025-04-03 19:30:01,822] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.00048791837592914523, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3764,0.26529,0.9122,0.916572,0.912441,0.91257
2,0.2381,0.224246,0.9307,0.932002,0.931051,0.930259


[I 2025-04-03 19:34:01,082] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0005533234110048413, 'weight_decay': 0.001, 'warmup_steps': 6, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3848,0.278853,0.9039,0.908456,0.903913,0.904056
2,0.2437,0.2249,0.9301,0.930762,0.930437,0.929839
3,0.1993,0.21853,0.9381,0.938277,0.938433,0.937886
4,0.1726,0.201283,0.9461,0.946563,0.946214,0.946014
5,0.154,0.186624,0.951,0.951297,0.951195,0.951049
6,0.1451,0.1853,0.9518,0.952271,0.951995,0.951991
7,0.14,0.179093,0.9538,0.95398,0.954031,0.953847
8,0.1363,0.176762,0.9531,0.953817,0.953307,0.953255


[I 2025-04-03 19:50:01,039] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0006501581887941969, 'weight_decay': 0.003, 'warmup_steps': 18, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3973,0.266952,0.9114,0.914188,0.911505,0.911432
2,0.254,0.22613,0.9281,0.929241,0.92834,0.92787


[I 2025-04-03 19:54:00,309] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0002399668884562016, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4018,0.249772,0.9209,0.92361,0.920909,0.920953
2,0.2267,0.206697,0.9391,0.940056,0.939469,0.938974
3,0.1825,0.211868,0.9386,0.938531,0.939014,0.93829
4,0.1599,0.194868,0.9462,0.947237,0.946279,0.946352
5,0.1477,0.187391,0.9491,0.94956,0.949354,0.94914
6,0.1418,0.18669,0.9485,0.948975,0.948653,0.94864
7,0.1381,0.1853,0.9476,0.94822,0.947904,0.947699
8,0.1357,0.178668,0.953,0.953547,0.953193,0.953155


[I 2025-04-03 20:09:53,973] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 8.570556610000305e-05, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4783,0.278318,0.9091,0.912081,0.909275,0.909116
2,0.264,0.224054,0.9323,0.933386,0.932641,0.932302
3,0.2101,0.21791,0.9365,0.936531,0.936849,0.936217
4,0.1809,0.217015,0.937,0.938617,0.937077,0.937097


[I 2025-04-03 20:17:52,016] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005277089447783195, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3984,0.289533,0.9043,0.910515,0.904271,0.904656
2,0.2428,0.235479,0.9244,0.926797,0.924702,0.923697


[I 2025-04-03 20:21:51,411] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0006826356451666915, 'weight_decay': 0.0, 'warmup_steps': 22, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4049,0.299118,0.898,0.903229,0.897888,0.898185
2,0.2564,0.235053,0.923,0.925034,0.92351,0.92269
3,0.2088,0.224894,0.9319,0.931855,0.932353,0.93146
4,0.1791,0.20274,0.9423,0.943369,0.942424,0.94232
5,0.1585,0.19128,0.947,0.947671,0.947255,0.947066
6,0.1482,0.185925,0.951,0.951148,0.951351,0.951093
7,0.1418,0.181031,0.9533,0.954012,0.953593,0.953441
8,0.1374,0.178125,0.953,0.953543,0.953261,0.953148


[I 2025-04-03 20:37:48,230] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0005785245690466829, 'weight_decay': 0.001, 'warmup_steps': 13, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3904,0.272471,0.9093,0.913009,0.909316,0.909404
2,0.2473,0.230882,0.9281,0.929413,0.928428,0.927536
3,0.203,0.231927,0.93,0.931197,0.930585,0.929259
4,0.173,0.205191,0.9395,0.940873,0.939377,0.93953


[I 2025-04-03 20:45:54,463] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.00033435279585866207, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3964,0.266467,0.9114,0.916481,0.911501,0.911732
2,0.2303,0.212417,0.9378,0.938382,0.938226,0.937548
3,0.1855,0.217533,0.9352,0.935782,0.935721,0.935011
4,0.1626,0.19972,0.9458,0.946655,0.945927,0.945871
5,0.1494,0.184725,0.9512,0.951497,0.951411,0.951223
6,0.1419,0.186437,0.9486,0.949473,0.948774,0.948712
7,0.1381,0.181823,0.9509,0.951562,0.951167,0.950967
8,0.1354,0.175861,0.9548,0.955296,0.954978,0.954913
9,0.1337,0.179238,0.9519,0.954184,0.951935,0.952475
10,0.1323,0.177505,0.9537,0.953942,0.953985,0.953663


[I 2025-04-03 21:05:48,151] Trial 91 finished with value: 0.9536631894090075 and parameters: {'learning_rate': 0.00033435279585866207, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 23 with value: 0.957889508474951.


Trial 92 with params: {'learning_rate': 0.0015837356481811218, 'weight_decay': 0.006, 'warmup_steps': 15, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4629,0.350439,0.869,0.876902,0.86915,0.868863
2,0.31,0.269719,0.9074,0.909668,0.907739,0.907012
3,0.2522,0.251803,0.9179,0.917918,0.918406,0.917372
4,0.2158,0.243839,0.9212,0.922793,0.921255,0.921024


[I 2025-04-03 21:13:44,504] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00033248595874725193, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4022,0.262441,0.9137,0.917861,0.913807,0.913943
2,0.23,0.210211,0.9376,0.938695,0.937889,0.937404
3,0.1859,0.208286,0.9419,0.942048,0.942164,0.94171
4,0.1638,0.19766,0.9462,0.946947,0.946194,0.946141
5,0.1487,0.18892,0.9505,0.950798,0.950781,0.950455
6,0.1419,0.182196,0.9529,0.95349,0.953096,0.95309
7,0.1383,0.181839,0.9508,0.951539,0.951076,0.950887
8,0.1355,0.173828,0.9556,0.955965,0.955738,0.95573
9,0.1338,0.17763,0.9531,0.955039,0.953174,0.953642
10,0.1325,0.176003,0.9532,0.953477,0.953455,0.953186


[I 2025-04-03 21:33:37,632] Trial 93 finished with value: 0.9531862792450981 and parameters: {'learning_rate': 0.00033248595874725193, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.5}. Best is trial 23 with value: 0.957889508474951.


Trial 94 with params: {'learning_rate': 0.0005575591548041156, 'weight_decay': 0.007, 'warmup_steps': 32, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4027,0.289662,0.8991,0.90605,0.89919,0.899559
2,0.2462,0.225491,0.9309,0.932349,0.931359,0.93058
3,0.1999,0.220416,0.934,0.934103,0.934397,0.933599
4,0.1723,0.20877,0.9387,0.939813,0.938783,0.938878


[I 2025-04-03 21:41:35,291] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0007529929232542638, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4095,0.299201,0.8938,0.899285,0.893669,0.893813
2,0.2633,0.244085,0.9231,0.92424,0.923406,0.922804


[I 2025-04-03 21:45:34,416] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0005489748531825249, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4013,0.28056,0.9057,0.911075,0.905695,0.906008
2,0.2457,0.234742,0.9241,0.926894,0.924377,0.923774
3,0.1992,0.224732,0.9326,0.933388,0.933067,0.932086
4,0.1721,0.201296,0.9434,0.943551,0.943572,0.943341
5,0.1553,0.19142,0.947,0.947178,0.947223,0.947
6,0.1456,0.18583,0.95,0.950416,0.95019,0.95014
7,0.14,0.183458,0.9527,0.953163,0.952991,0.952754
8,0.1363,0.179785,0.9522,0.952453,0.952438,0.952145


[I 2025-04-03 22:01:28,244] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0009623351495963801, 'weight_decay': 0.005, 'warmup_steps': 25, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.424,0.316428,0.8873,0.892321,0.887358,0.887295
2,0.279,0.260287,0.912,0.915634,0.912228,0.911739


[I 2025-04-03 22:05:27,763] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.00033455680017667716, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.389,0.25801,0.9168,0.920591,0.916925,0.917066
2,0.2289,0.214422,0.9387,0.939356,0.939032,0.938448
3,0.1852,0.21004,0.939,0.939071,0.939408,0.938685
4,0.1626,0.200809,0.9429,0.943994,0.943008,0.942985


[I 2025-04-03 22:13:44,816] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.00023558072727044885, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4139,0.254203,0.9184,0.921597,0.918564,0.918635
2,0.2265,0.206739,0.9411,0.941818,0.941421,0.94099
3,0.1824,0.20998,0.9393,0.939647,0.939675,0.93918
4,0.1603,0.198822,0.945,0.946202,0.945088,0.945025
5,0.1478,0.188076,0.948,0.948469,0.948124,0.948126
6,0.1415,0.184168,0.9493,0.949478,0.949646,0.94937
7,0.1383,0.188116,0.9475,0.948307,0.947825,0.947612
8,0.1358,0.178691,0.9502,0.950667,0.95047,0.950369


[I 2025-04-03 22:29:36,028] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0039682019969684, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5828,0.418315,0.8279,0.834185,0.828133,0.82738
2,0.397,0.336201,0.8669,0.868702,0.867214,0.866431
3,0.3267,0.30852,0.8856,0.885588,0.885983,0.884885
4,0.2793,0.283553,0.8969,0.899119,0.89675,0.895898


[I 2025-04-03 22:37:31,329] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00031429153815329186, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.401,0.250392,0.9219,0.924838,0.922051,0.921994
2,0.2285,0.209015,0.9401,0.940669,0.940461,0.939902
3,0.1847,0.213927,0.9387,0.939131,0.939073,0.938489
4,0.1633,0.201441,0.9463,0.947343,0.946303,0.946358
5,0.1489,0.184558,0.9519,0.951977,0.952037,0.95192
6,0.1418,0.180456,0.9535,0.953888,0.953674,0.953691
7,0.1382,0.183656,0.9519,0.952356,0.952216,0.951941
8,0.1355,0.177378,0.9524,0.952958,0.952647,0.952525


[I 2025-04-03 22:53:19,933] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0003366204043784816, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.403,0.257595,0.9182,0.921916,0.918272,0.918473
2,0.2292,0.204484,0.9405,0.941172,0.940834,0.940381
3,0.1848,0.214858,0.9379,0.938504,0.938288,0.937661
4,0.1631,0.19276,0.9492,0.950163,0.949289,0.949319
5,0.1493,0.188861,0.9488,0.950055,0.948978,0.948991
6,0.1419,0.182507,0.9516,0.95193,0.951825,0.951686
7,0.1381,0.183913,0.9493,0.950026,0.949551,0.949349
8,0.1353,0.176906,0.9541,0.954358,0.954311,0.954207
9,0.1336,0.177858,0.9545,0.956281,0.954574,0.95494
10,0.1324,0.178273,0.953,0.953356,0.953269,0.95298


[I 2025-04-03 23:13:12,699] Trial 102 finished with value: 0.9529799109748367 and parameters: {'learning_rate': 0.0003366204043784816, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 103 with params: {'learning_rate': 0.00027009583847554473, 'weight_decay': 0.005, 'warmup_steps': 16, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.396,0.259485,0.9171,0.920823,0.91732,0.917371
2,0.2278,0.212247,0.9392,0.939841,0.93957,0.938957
3,0.1825,0.209414,0.9395,0.939607,0.939857,0.939313
4,0.1613,0.200884,0.944,0.944989,0.944099,0.943981
5,0.1484,0.189279,0.9493,0.949744,0.949361,0.949331
6,0.1416,0.187299,0.9482,0.948439,0.948492,0.948202
7,0.1382,0.1879,0.9459,0.946733,0.946215,0.945959
8,0.1354,0.178068,0.9521,0.952484,0.952261,0.952195


[I 2025-04-03 23:29:12,144] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0003343942218512808, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3827,0.263696,0.9138,0.917854,0.914076,0.914073
2,0.228,0.209978,0.9389,0.939711,0.939254,0.938615
3,0.1852,0.20474,0.9425,0.942529,0.942738,0.942333
4,0.163,0.205573,0.941,0.942146,0.941134,0.940986
5,0.1492,0.185904,0.9498,0.950391,0.949905,0.949834
6,0.1423,0.184176,0.9497,0.95019,0.949824,0.949782
7,0.138,0.18304,0.9508,0.951569,0.951062,0.950866
8,0.1354,0.174487,0.9539,0.95444,0.954008,0.954088
9,0.1338,0.178569,0.9529,0.955101,0.952917,0.953487
10,0.1324,0.177251,0.9533,0.953321,0.953561,0.953191


[I 2025-04-03 23:49:04,152] Trial 104 finished with value: 0.9531914500794357 and parameters: {'learning_rate': 0.0003343942218512808, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 23 with value: 0.957889508474951.


Trial 105 with params: {'learning_rate': 0.00012335266074963035, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4451,0.265073,0.9162,0.919462,0.916394,0.916405
2,0.2436,0.214052,0.9375,0.938412,0.937933,0.937311
3,0.1927,0.210725,0.9397,0.939685,0.940038,0.939538
4,0.1675,0.209172,0.9424,0.943721,0.942521,0.942481


[I 2025-04-03 23:56:59,354] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0005424656137859003, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4008,0.278021,0.9055,0.910239,0.905634,0.905915
2,0.2441,0.235677,0.9222,0.923937,0.922539,0.921726
3,0.1988,0.223108,0.932,0.932124,0.932473,0.931572
4,0.1717,0.206058,0.9418,0.94253,0.941933,0.94169
5,0.155,0.193349,0.9472,0.947219,0.947452,0.947057
6,0.145,0.185842,0.9507,0.95119,0.950854,0.950807
7,0.1401,0.185319,0.9495,0.949836,0.949758,0.94952
8,0.1365,0.17561,0.9544,0.954935,0.95453,0.95455
9,0.1342,0.174703,0.953,0.954447,0.953081,0.953421
10,0.1327,0.176337,0.9537,0.95411,0.953952,0.953715


[I 2025-04-04 00:16:51,579] Trial 106 finished with value: 0.9537152987462051 and parameters: {'learning_rate': 0.0005424656137859003, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 23 with value: 0.957889508474951.


Trial 107 with params: {'learning_rate': 0.00025272221323181343, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3842,0.254926,0.9164,0.919843,0.916569,0.91654
2,0.2248,0.206897,0.9391,0.939963,0.939463,0.93889
3,0.1818,0.208247,0.9411,0.941367,0.941558,0.940841
4,0.161,0.200492,0.9443,0.945309,0.944418,0.944347
5,0.1475,0.187166,0.95,0.95057,0.950198,0.950053
6,0.1418,0.183118,0.9527,0.953169,0.952899,0.952858
7,0.1379,0.184483,0.9488,0.949447,0.949123,0.9488
8,0.1357,0.175806,0.955,0.955234,0.95516,0.95509
9,0.134,0.180723,0.9526,0.954516,0.95263,0.953113
10,0.1327,0.179606,0.9524,0.952771,0.952693,0.952398


[I 2025-04-04 00:36:45,331] Trial 107 finished with value: 0.9523977992474422 and parameters: {'learning_rate': 0.00025272221323181343, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 23 with value: 0.957889508474951.


Trial 108 with params: {'learning_rate': 0.0004684032801428743, 'weight_decay': 0.002, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.394,0.258135,0.9184,0.921116,0.918561,0.918553
2,0.2389,0.22276,0.9315,0.932866,0.931774,0.931351


[I 2025-04-04 00:40:44,894] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0003653498256090891, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3997,0.270258,0.9106,0.915262,0.910599,0.910923
2,0.2322,0.211501,0.9373,0.938117,0.937599,0.937016
3,0.1881,0.211737,0.9404,0.940547,0.940893,0.940015
4,0.1644,0.200883,0.9443,0.945189,0.944359,0.944316
5,0.1494,0.189106,0.9477,0.947949,0.947918,0.947722
6,0.1427,0.182479,0.9497,0.950313,0.9499,0.949964
7,0.1385,0.179242,0.9533,0.953845,0.953576,0.953365
8,0.1355,0.170796,0.9575,0.957876,0.957679,0.957669
9,0.1337,0.175392,0.9547,0.956388,0.954773,0.955162
10,0.1324,0.173873,0.9561,0.956348,0.956323,0.956095


[I 2025-04-04 01:00:36,851] Trial 109 finished with value: 0.9560953087964373 and parameters: {'learning_rate': 0.0003653498256090891, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


Trial 110 with params: {'learning_rate': 0.0007972031495118968, 'weight_decay': 0.003, 'warmup_steps': 25, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4128,0.299217,0.8924,0.896066,0.892392,0.892313
2,0.2653,0.239724,0.9222,0.923937,0.922499,0.921705
3,0.217,0.225012,0.9345,0.934427,0.934793,0.934181
4,0.1831,0.214099,0.9377,0.93888,0.937714,0.937745


[I 2025-04-04 01:08:37,580] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0005484498450776523, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4019,0.292426,0.9004,0.906104,0.900621,0.900836
2,0.2441,0.231567,0.926,0.928129,0.92627,0.925529


[I 2025-04-04 01:12:36,624] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00038359853654439655, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4011,0.264956,0.9139,0.917996,0.913993,0.914254
2,0.232,0.214767,0.935,0.93567,0.935399,0.934613
3,0.1887,0.216102,0.9359,0.937212,0.936341,0.93572
4,0.1651,0.201415,0.9449,0.946255,0.944966,0.944911
5,0.1497,0.187949,0.9507,0.951398,0.950897,0.950834
6,0.1428,0.179642,0.954,0.954065,0.954226,0.954028
7,0.1386,0.179468,0.9535,0.954255,0.953795,0.953576
8,0.1354,0.17439,0.9554,0.955628,0.955602,0.955476
9,0.1338,0.176267,0.9559,0.957428,0.955941,0.956294
10,0.1324,0.176478,0.955,0.955084,0.955299,0.954946


[I 2025-04-04 01:32:29,518] Trial 112 finished with value: 0.9549460729483927 and parameters: {'learning_rate': 0.00038359853654439655, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 23 with value: 0.957889508474951.


Trial 113 with params: {'learning_rate': 0.00014656299412440152, 'weight_decay': 0.004, 'warmup_steps': 32, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4483,0.262691,0.9137,0.917333,0.913827,0.913994
2,0.2371,0.209657,0.9395,0.940379,0.939883,0.939315
3,0.1876,0.210178,0.9405,0.940535,0.94091,0.940225
4,0.1639,0.205204,0.9426,0.943897,0.94262,0.942657


[I 2025-04-04 01:40:24,692] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.00036568776222109777, 'weight_decay': 0.002, 'warmup_steps': 24, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3955,0.273034,0.9069,0.912836,0.906974,0.90725
2,0.2317,0.2129,0.9384,0.938915,0.938811,0.938153
3,0.1874,0.2105,0.9389,0.939425,0.939235,0.938667
4,0.1641,0.203596,0.9426,0.943621,0.942693,0.942598
5,0.1498,0.185471,0.9516,0.952064,0.951736,0.951707
6,0.1428,0.184944,0.9496,0.949792,0.949839,0.949682
7,0.1384,0.179793,0.9527,0.952999,0.952947,0.952719
8,0.1355,0.176132,0.9549,0.955513,0.955096,0.955086
9,0.1337,0.176223,0.9549,0.956899,0.954953,0.955412
10,0.1324,0.176419,0.9547,0.954974,0.954927,0.954696


[I 2025-04-04 02:00:22,681] Trial 114 finished with value: 0.9546958314142076 and parameters: {'learning_rate': 0.00036568776222109777, 'weight_decay': 0.002, 'warmup_steps': 24, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 23 with value: 0.957889508474951.


Trial 115 with params: {'learning_rate': 0.0001606635499590532, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4375,0.259553,0.9168,0.92012,0.916957,0.917048
2,0.234,0.205884,0.9421,0.942807,0.94249,0.942016
3,0.1855,0.209673,0.9392,0.939429,0.939633,0.93902
4,0.1621,0.203771,0.9446,0.945988,0.944647,0.944609
5,0.1496,0.187517,0.95,0.950364,0.950157,0.950053
6,0.1433,0.189275,0.9475,0.947981,0.947722,0.94771
7,0.1397,0.194966,0.9432,0.944381,0.943458,0.943248
8,0.1372,0.18395,0.9493,0.950061,0.949495,0.949547


[I 2025-04-04 02:16:21,668] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.0007224051815629656, 'weight_decay': 0.001, 'warmup_steps': 28, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.408,0.293616,0.8955,0.901673,0.895504,0.895755
2,0.2609,0.232872,0.9269,0.928099,0.927212,0.926698
3,0.2105,0.226251,0.9309,0.930764,0.931192,0.93054
4,0.1803,0.217283,0.9364,0.938087,0.936504,0.936352


[I 2025-04-04 02:24:19,218] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00018843833782959934, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4201,0.255792,0.9197,0.923314,0.91983,0.919915
2,0.2293,0.208556,0.9394,0.940281,0.939812,0.939149
3,0.1824,0.202964,0.9445,0.944471,0.944859,0.94431
4,0.1601,0.196884,0.9471,0.948234,0.947148,0.947174
5,0.1479,0.185542,0.9489,0.949406,0.949157,0.949035
6,0.1423,0.18671,0.9484,0.948954,0.948633,0.948615
7,0.1387,0.189572,0.9468,0.947591,0.947081,0.94686
8,0.1364,0.181807,0.9508,0.951609,0.951011,0.951044


[I 2025-04-04 02:40:16,809] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00020904068575329813, 'weight_decay': 0.002, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4114,0.256667,0.9205,0.923733,0.920504,0.920704
2,0.2273,0.203497,0.9419,0.942633,0.942249,0.941844
3,0.182,0.207534,0.9403,0.940433,0.940599,0.94016
4,0.1605,0.197061,0.9466,0.947851,0.94671,0.946731
5,0.1477,0.189386,0.9467,0.947084,0.946956,0.946728
6,0.1417,0.186013,0.9484,0.949038,0.948605,0.948672
7,0.1383,0.188522,0.9455,0.946632,0.94575,0.945716
8,0.1361,0.181504,0.9497,0.950318,0.949868,0.949884


[I 2025-04-04 02:56:15,036] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0004494713044691342, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3993,0.268247,0.9135,0.91697,0.91365,0.913678
2,0.2362,0.211342,0.9382,0.938973,0.938465,0.937864
3,0.1924,0.213712,0.9406,0.940607,0.940904,0.940364
4,0.1681,0.211148,0.939,0.940088,0.938941,0.939049


[I 2025-04-04 03:04:13,844] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0001389613531617579, 'weight_decay': 0.0, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4296,0.260925,0.9172,0.920389,0.917387,0.917503
2,0.2379,0.2105,0.9388,0.939951,0.939254,0.938604
3,0.1887,0.208813,0.9409,0.940789,0.941192,0.940723
4,0.1647,0.209405,0.942,0.943569,0.942089,0.94204


[I 2025-04-04 03:12:14,293] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 5.962261310139537e-05, 'weight_decay': 0.008, 'warmup_steps': 9, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5299,0.294794,0.9039,0.906386,0.904109,0.903807
2,0.2921,0.238281,0.9269,0.928156,0.927157,0.92692
3,0.2356,0.228301,0.9323,0.932618,0.932714,0.931945
4,0.2035,0.226045,0.9324,0.934325,0.932503,0.932524


[I 2025-04-04 03:20:12,218] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00040617839537002485, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3963,0.274554,0.9084,0.913363,0.908563,0.908661
2,0.234,0.218153,0.934,0.935093,0.93427,0.933643


[I 2025-04-04 03:24:12,906] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0005295006960127749, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4017,0.260614,0.9137,0.916699,0.913659,0.913789
2,0.2435,0.229528,0.9281,0.92994,0.928387,0.927865
3,0.1999,0.217064,0.9367,0.936894,0.937213,0.936409
4,0.1726,0.207335,0.9394,0.940874,0.939438,0.939411


[I 2025-04-04 03:32:09,505] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0003941052352418532, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4,0.26968,0.9096,0.914713,0.909601,0.909896
2,0.2336,0.211436,0.9383,0.939781,0.938482,0.938137
3,0.1881,0.206301,0.9417,0.941782,0.942124,0.941444
4,0.1649,0.202493,0.9434,0.944457,0.943512,0.94334


[I 2025-04-04 03:40:06,275] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0001853339778656773, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3961,0.258779,0.9169,0.920775,0.917088,0.917194
2,0.2288,0.206694,0.9402,0.941457,0.940588,0.939962
3,0.183,0.205432,0.9414,0.941403,0.941745,0.941188
4,0.1608,0.204863,0.9415,0.942894,0.941661,0.941487
5,0.1481,0.185257,0.9493,0.950041,0.949517,0.949439
6,0.1424,0.186447,0.9496,0.95007,0.949819,0.94981
7,0.139,0.18554,0.9493,0.949989,0.949575,0.949415
8,0.1364,0.180512,0.9516,0.952018,0.951839,0.95173


[I 2025-04-04 03:55:58,975] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4202,0.318357,0.883,0.887945,0.882824,0.882948
2,0.2742,0.247952,0.9177,0.921025,0.918016,0.917407


[I 2025-04-04 03:59:58,404] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.000804493332968866, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4133,0.296535,0.8972,0.902918,0.897235,0.897636
2,0.2654,0.252096,0.9154,0.918571,0.915455,0.914703
3,0.2158,0.232258,0.9275,0.927738,0.927798,0.92716
4,0.1849,0.210437,0.941,0.941438,0.941113,0.940813
5,0.1622,0.196827,0.9458,0.945973,0.945896,0.945754
6,0.1513,0.197145,0.9453,0.945824,0.945507,0.945344
7,0.1432,0.187695,0.9481,0.948451,0.948416,0.948045
8,0.1383,0.176238,0.9551,0.955468,0.955238,0.955244
9,0.1354,0.179331,0.9535,0.95519,0.95356,0.953982
10,0.1337,0.175635,0.9547,0.954839,0.955022,0.954671


[I 2025-04-04 04:19:53,834] Trial 127 finished with value: 0.954671084153294 and parameters: {'learning_rate': 0.000804493332968866, 'weight_decay': 0.006, 'warmup_steps': 27, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}. Best is trial 23 with value: 0.957889508474951.


Trial 128 with params: {'learning_rate': 0.0011966406426631006, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4454,0.350857,0.8669,0.879267,0.866583,0.867371
2,0.294,0.257851,0.9144,0.915821,0.914504,0.913976
3,0.2364,0.241391,0.9216,0.921709,0.921984,0.921007
4,0.2016,0.226173,0.9318,0.932913,0.93186,0.931778


[I 2025-04-04 04:27:53,078] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0003177769971458836, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3945,0.26331,0.9132,0.917008,0.913218,0.913401
2,0.2287,0.21119,0.9382,0.939246,0.938516,0.937981
3,0.1837,0.210175,0.9423,0.942476,0.942659,0.94202
4,0.1622,0.200359,0.9465,0.947411,0.946551,0.946522
5,0.1485,0.182961,0.9533,0.953562,0.953451,0.953346
6,0.1421,0.180076,0.9525,0.953,0.952679,0.952737
7,0.1383,0.1785,0.954,0.95453,0.954194,0.954167
8,0.1355,0.173085,0.9554,0.955814,0.955562,0.955559
9,0.1337,0.178008,0.9545,0.956651,0.954531,0.955036
10,0.1325,0.175706,0.9543,0.954654,0.954585,0.95431


[I 2025-04-04 04:47:59,652] Trial 129 finished with value: 0.95431026289762 and parameters: {'learning_rate': 0.0003177769971458836, 'weight_decay': 0.001, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 23 with value: 0.957889508474951.


Trial 130 with params: {'learning_rate': 0.0003839573738753142, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3981,0.262505,0.9138,0.9175,0.913818,0.914022
2,0.2323,0.205863,0.9412,0.941591,0.941536,0.94117
3,0.1887,0.210559,0.9402,0.940147,0.940575,0.939878
4,0.1641,0.20058,0.9459,0.947101,0.945972,0.945913
5,0.1501,0.180433,0.9554,0.955718,0.95557,0.955449
6,0.1428,0.179149,0.9534,0.953936,0.953609,0.953639
7,0.1385,0.180383,0.9532,0.953569,0.953504,0.953217
8,0.1355,0.173166,0.9562,0.956622,0.956425,0.956324
9,0.1338,0.174292,0.9562,0.957916,0.956301,0.956652
10,0.1325,0.17308,0.9571,0.957178,0.957379,0.957051


[I 2025-04-04 05:07:54,011] Trial 130 finished with value: 0.9570510007552555 and parameters: {'learning_rate': 0.0003839573738753142, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 23 with value: 0.957889508474951.


Trial 131 with params: {'learning_rate': 0.0005612567161548509, 'weight_decay': 0.01, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4008,0.2691,0.9109,0.914296,0.910927,0.911094
2,0.2471,0.230381,0.9267,0.92856,0.927086,0.926184
3,0.1994,0.221788,0.9321,0.93212,0.93246,0.931855
4,0.1736,0.209359,0.9394,0.941334,0.939432,0.939549


[I 2025-04-04 05:15:53,668] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.001193850938882126, 'weight_decay': 0.006, 'warmup_steps': 28, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4445,0.334135,0.878,0.884548,0.878103,0.878098
2,0.295,0.255663,0.9132,0.915468,0.913379,0.912986


[I 2025-04-04 05:19:53,419] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.00020811096799338262, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4154,0.258876,0.9163,0.920071,0.91648,0.916523
2,0.2287,0.206137,0.9421,0.942651,0.942511,0.941915
3,0.1821,0.2117,0.9397,0.940112,0.94009,0.939526
4,0.1605,0.20244,0.9433,0.944376,0.943356,0.943324
5,0.148,0.190109,0.9458,0.946627,0.946065,0.94586
6,0.142,0.18781,0.9479,0.948544,0.948083,0.948099
7,0.1385,0.188273,0.9484,0.949129,0.94871,0.948461
8,0.1362,0.177585,0.9543,0.954726,0.954513,0.954459
9,0.1344,0.181665,0.9507,0.952798,0.950764,0.951276
10,0.1332,0.18116,0.9526,0.952853,0.952841,0.952567


[I 2025-04-04 05:39:49,934] Trial 133 finished with value: 0.9525671980625605 and parameters: {'learning_rate': 0.00020811096799338262, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.0, 'temperature': 4.5}. Best is trial 23 with value: 0.957889508474951.


Trial 134 with params: {'learning_rate': 0.002145873346098373, 'weight_decay': 0.006, 'warmup_steps': 21, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5032,0.361583,0.8588,0.865075,0.858866,0.858522
2,0.3339,0.282946,0.899,0.899964,0.89915,0.898688


[I 2025-04-04 05:43:49,339] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0004198055571070975, 'weight_decay': 0.005, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3945,0.26537,0.9137,0.916884,0.913838,0.913846
2,0.2346,0.213871,0.9376,0.938469,0.937931,0.937317
3,0.1904,0.211999,0.9385,0.938782,0.938878,0.938359
4,0.1664,0.196841,0.9459,0.946563,0.945907,0.945865
5,0.1513,0.193431,0.9465,0.947305,0.946662,0.94655
6,0.1428,0.180115,0.9524,0.952778,0.952571,0.952504
7,0.1389,0.180576,0.9523,0.952744,0.952512,0.952395
8,0.1356,0.175195,0.9542,0.954632,0.954363,0.954345
9,0.1338,0.173801,0.9552,0.956506,0.955316,0.955569
10,0.1324,0.176188,0.9531,0.953278,0.953381,0.953035


[I 2025-04-04 06:03:43,805] Trial 135 finished with value: 0.9530345892870917 and parameters: {'learning_rate': 0.0004198055571070975, 'weight_decay': 0.005, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 23 with value: 0.957889508474951.


Trial 136 with params: {'learning_rate': 0.0008245628898941667, 'weight_decay': 0.006, 'warmup_steps': 20, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4134,0.285975,0.9019,0.905627,0.901826,0.902002
2,0.2667,0.253224,0.9148,0.918763,0.915038,0.914611


[I 2025-04-04 06:07:43,903] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0003141350861291424, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4009,0.264147,0.9144,0.918079,0.914585,0.914515
2,0.2281,0.209531,0.937,0.937664,0.937382,0.936767
3,0.184,0.211221,0.9388,0.938759,0.939109,0.938441
4,0.1618,0.201386,0.9436,0.944708,0.94373,0.943606
5,0.1495,0.189729,0.9455,0.946047,0.945743,0.945607
6,0.1422,0.185831,0.9489,0.949107,0.949151,0.948992
7,0.1385,0.18296,0.9494,0.950067,0.949644,0.949522
8,0.1356,0.178604,0.9503,0.950533,0.950543,0.950368


[I 2025-04-04 06:23:38,502] Trial 137 pruned. 


Trial 138 with params: {'learning_rate': 0.0005894019596973103, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4041,0.288327,0.9004,0.905696,0.900076,0.900486
2,0.2498,0.228782,0.9293,0.930598,0.929576,0.928873
3,0.2027,0.222219,0.9348,0.934727,0.935256,0.934321
4,0.1742,0.21777,0.9339,0.934272,0.934115,0.933623


[I 2025-04-04 06:31:34,497] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0002653064801933545, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4077,0.257198,0.9175,0.920845,0.917707,0.917747
2,0.2279,0.202713,0.9428,0.943082,0.943084,0.942728
3,0.1827,0.213979,0.9386,0.939318,0.939064,0.938366
4,0.161,0.202716,0.944,0.944935,0.944108,0.94408
5,0.1477,0.185402,0.9501,0.951055,0.950273,0.950333
6,0.1419,0.18135,0.9515,0.952095,0.951629,0.951736
7,0.1382,0.183269,0.9509,0.951351,0.951133,0.950989
8,0.1356,0.174929,0.9543,0.954431,0.954533,0.954394
9,0.1339,0.178631,0.955,0.956702,0.9551,0.95547
10,0.1327,0.179794,0.9512,0.951504,0.951497,0.951178


[I 2025-04-04 06:51:23,713] Trial 139 finished with value: 0.9511777883729369 and parameters: {'learning_rate': 0.0002653064801933545, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 23 with value: 0.957889508474951.


Trial 140 with params: {'learning_rate': 0.0002120252908020281, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4182,0.259079,0.9182,0.921909,0.918295,0.918436
2,0.229,0.204097,0.9413,0.942009,0.941593,0.941115
3,0.1825,0.206377,0.9421,0.942134,0.942508,0.941833
4,0.16,0.204387,0.9431,0.944424,0.943185,0.943125
5,0.1476,0.184287,0.9514,0.952175,0.951487,0.9516
6,0.1422,0.185966,0.9513,0.951583,0.951503,0.951386
7,0.1384,0.185618,0.9499,0.950498,0.9502,0.949966
8,0.1359,0.179123,0.9529,0.953286,0.953121,0.95303


[I 2025-04-04 07:07:13,865] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0003184394005741729, 'weight_decay': 0.002, 'warmup_steps': 24, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3983,0.26287,0.9149,0.919048,0.915064,0.915243
2,0.2297,0.209409,0.9405,0.941398,0.940798,0.940351
3,0.185,0.210834,0.9402,0.940265,0.940555,0.939915
4,0.1622,0.202753,0.9444,0.945289,0.944484,0.944342
5,0.1481,0.186739,0.9489,0.949931,0.949029,0.949083
6,0.1419,0.186598,0.9475,0.94834,0.947721,0.947737
7,0.138,0.183662,0.9507,0.951044,0.95098,0.950659
8,0.1355,0.178212,0.9531,0.953342,0.95332,0.953149


[I 2025-04-04 07:23:03,910] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0004529782258355152, 'weight_decay': 0.002, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3882,0.263746,0.9148,0.918546,0.914992,0.915086
2,0.2377,0.219838,0.9333,0.934394,0.933568,0.932981
3,0.193,0.213118,0.9373,0.93741,0.937736,0.936987
4,0.1679,0.207754,0.9415,0.94264,0.941502,0.941559
5,0.1524,0.191635,0.9492,0.949493,0.949424,0.949213
6,0.1436,0.18396,0.9518,0.952406,0.952056,0.95197
7,0.1393,0.181375,0.9529,0.953232,0.953147,0.952957
8,0.136,0.176079,0.9516,0.952312,0.951855,0.951779


[I 2025-04-04 07:38:53,669] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.001027911901778042, 'weight_decay': 0.007, 'warmup_steps': 25, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4301,0.301105,0.8944,0.900265,0.894453,0.894535
2,0.2827,0.266356,0.9098,0.915062,0.909988,0.909584


[I 2025-04-04 07:42:50,872] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 11, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5359,0.293883,0.9016,0.903812,0.901806,0.901455
2,0.2941,0.238891,0.9264,0.927568,0.926603,0.926425
3,0.2373,0.228785,0.9325,0.932927,0.932905,0.932146
4,0.2052,0.225406,0.9312,0.933204,0.931312,0.9313


[I 2025-04-04 07:50:46,247] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0013285417618381474, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4529,0.332847,0.8767,0.883079,0.876841,0.876974
2,0.3022,0.248801,0.9149,0.916385,0.915003,0.914376


[I 2025-04-04 07:54:45,550] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.0011607614784531854, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4283,0.30892,0.8913,0.89497,0.891307,0.891273
2,0.2911,0.253821,0.914,0.916143,0.91418,0.913319
3,0.2343,0.242011,0.9244,0.925266,0.924904,0.924113
4,0.1979,0.24033,0.9227,0.924443,0.922806,0.922631


[I 2025-04-04 08:02:44,030] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0025651152134400176, 'weight_decay': 0.007, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5257,0.362495,0.8571,0.863948,0.857404,0.85707
2,0.3535,0.303458,0.8863,0.891198,0.886696,0.885017


[I 2025-04-04 08:06:46,747] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.003199645143713299, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.571,0.401291,0.8369,0.844332,0.836938,0.836957
2,0.3862,0.340433,0.8643,0.870963,0.864551,0.863232


[I 2025-04-04 08:10:46,902] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0005596762209877068, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.389,0.283152,0.9042,0.909652,0.904084,0.904652
2,0.2463,0.228468,0.9292,0.930023,0.929583,0.928733
3,0.2004,0.225721,0.9317,0.931462,0.932165,0.931057
4,0.1719,0.207539,0.9425,0.943298,0.942479,0.942537
5,0.155,0.191063,0.9478,0.948282,0.948104,0.947851
6,0.1451,0.189687,0.9475,0.948251,0.947817,0.947658
7,0.1406,0.184649,0.9502,0.950633,0.950586,0.950218
8,0.1365,0.178124,0.9541,0.954708,0.954343,0.954301
9,0.1342,0.17765,0.9524,0.953967,0.952533,0.95284
10,0.1327,0.176178,0.9534,0.953563,0.953736,0.953345


[I 2025-04-04 08:30:46,772] Trial 149 finished with value: 0.9533447571219416 and parameters: {'learning_rate': 0.0005596762209877068, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 23 with value: 0.957889508474951.


In [None]:
print(best_distil_pretrained)

BestRun(run_id='23', objective=0.957889508474951, hyperparameters={'learning_rate': 0.0005261925323189436, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}, run_summary=None)
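
The best run above reports `lambda_param` and `temperature`, the two distillation hyperparameters searched in every trial. The loss itself is defined in the `base` module, which is not shown here, so the following is only a framework-free sketch of one common formulation. It assumes that `lambda_param` weights the temperature-softened KL term and `1 - lambda_param` weights the hard-label cross-entropy; the function names `softmax` and `distillation_loss` are hypothetical and not taken from the notebook.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Per-sample distillation loss (ASSUMED weighting convention, see lead-in)."""
    # Hard-label cross-entropy of the student at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    soft = kl * temperature ** 2
    return (1.0 - lambda_param) * hard + lambda_param * soft


# Hypothetical logits for a 3-class toy example, using the best run's settings.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
loss = distillation_loss(student, teacher, label=0, lambda_param=0.6, temperature=6.0)
```

Under this assumed convention, `lambda_param = 0.0` reduces to plain cross-entropy training and `lambda_param = 1.0` trains on the teacher's soft targets only.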


In [None]:
print("Best random init training score: ", best_base_random)
print("Best random init distillation training score: ", best_distill_random)
print("Best pretrained (head only) training score: ", best_base_head)
print("Best pretrained distillation (head only) training score: ", best_distill_head)
print("Best pretrained training score: ", best_base_pretrained)
print("Best pretrained distillation training score: ", best_distil_pretrained)

NameError: name 'best_base_random' is not defined
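
The `NameError` above indicates that `best_base_random` does not exist in the current kernel session, most likely because the cell that defines it was not executed in this session (an assumption; the traceback alone does not say why). A minimal sketch of a summary that tolerates missing results is shown below. The helper `report_results` and the `demo_namespace` dictionary are hypothetical; in the notebook one would pass `globals()` instead.

```python
# (label, variable name) pairs taken from the summary cell above.
RESULTS = [
    ("Best random init training score", "best_base_random"),
    ("Best random init distillation training score", "best_distill_random"),
    ("Best pretrained (head only) training score", "best_base_head"),
    ("Best pretrained distillation (head only) training score", "best_distill_head"),
    ("Best pretrained training score", "best_base_pretrained"),
    ("Best pretrained distillation training score", "best_distil_pretrained"),
]


def report_results(results, namespace):
    """Return one summary line per result, flagging variables that are undefined."""
    lines = []
    for label, var_name in results:
        if var_name in namespace:
            lines.append(f"{label}: {namespace[var_name]}")
        else:
            lines.append(f"{label}: <{var_name} is not defined in this session>")
    return lines


# Hypothetical namespace in which only the last study has been run.
demo_namespace = {"best_distil_pretrained": 0.957889508474951}
summary = report_results(RESULTS, demo_namespace)
print("\n".join(summary))
```

Calling `report_results(RESULTS, globals())` in the notebook would print the results that are available and name the missing ones instead of stopping at the first undefined variable.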