# Notebook for distillation training on the CIFAR10 dataset
This notebook trains MobileNetV2 on the CIFAR10 dataset; the teacher model is a ViT fine-tuned on the same dataset.

MobileNetV2 is used in three setups: with random initialization, with an ImageNet-pretrained model where only the classification head is trained, and with the same pretrained model where the whole network is trained. Each of the three setups is trained both in the standard way and with distillation from the teacher model mentioned above.

Distillation uses logits precomputed in the precompute_logits notebook.

In [1]:
%pip install transformers[torch] huggingface_hub datasets evaluate torchvision optuna

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting huggingface_hub
  Downloading huggingface_hub-0.27.1-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting optuna
  Downloading optuna-4.1.0-py3-none-any.whl.metadata (16 kB)
Collecting transformers[torch]
  Downloading transformers-4.48.0-py3-none-any.whl.metadata (44 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers[torch])
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting accelerate>=0.26.0 (from transformers[torch])
  Downloading accelerate-1.2.1-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x

## Library imports and method definitions

In [1]:
from transformers import Trainer, EarlyStoppingCallback
import optuna
import torch
import math

import base

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


In [2]:
dataset_part = base.get_dataset_part()

In [3]:
base.reset_seed()

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Apply the transformations to the dataset.

In [6]:
x = base.get_dataset_part()

In [8]:
transform = base.base_transforms()

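# NOTE: `test` is also built from x.TRAIN, so evaluation would run on the training data.
# It should presumably select the held-out partition exposed by base.get_dataset_part().
test = base.CustomCIFAR10L(root='./data/10-logits', dataset_part=x.TRAIN, transform=transform)
train = base.CustomCIFAR10L(root='./data/10-logits', dataset_part=x.TRAIN, transform=transform)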

### Standard training of the randomly initialized model

In [9]:
training_args = base.get_training_args(output_dir="./results/cifar10-random", logging_dir='./logs/cifar10-random', remove_unused_columns=True, epochs=5)

In [10]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "adam_beta1": trial.suggest_float("adam_beta1", 0.9, 0.99, step=0.01),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params

In [11]:
# Convert epochs to optimizer steps: ceil(50000 / 128) steps per epoch, times the number of epochs
min_r = math.ceil(50000/128)*1
max_r = math.ceil(50000/128)*1

In [12]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
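
To make the pruner settings concrete, here is a small sketch of the arithmetic behind them. It follows the formulas given in the Optuna documentation for `HyperbandPruner` and `SuccessiveHalvingPruner` (number of brackets and rung promotion steps); the helper names are hypothetical, and reading 128 as the batch size is an assumption based on the cell above.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Bracket count: floor(log_rf(max_resource / min_resource)) + 1."""
    return math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1


def rung_steps(min_resource, reduction_factor, n_rungs, min_early_stopping_rate=0):
    """Steps at which a bracket's rungs complete (first bracket by default)."""
    return [
        min_resource * reduction_factor ** (min_early_stopping_rate + rung)
        for rung in range(n_rungs)
    ]


per_epoch = steps_per_epoch(50_000, 128)
print(per_epoch)                                    # 391
print(hyperband_brackets(per_epoch, per_epoch, 2))  # 1
print(rung_steps(per_epoch, 2, 4))                  # [391, 782, 1564, 3128]
```

With `min_r` equal to `max_r`, the formula gives a single bracket, so the pruner behaves like plain successive halving. Raising `max_r` toward the full training budget would add brackets.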



In [15]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics = base.compute_metrics,
    model_init = lambda: base.get_random_init_mobilenet(10),
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 4)]
  )
  

In [17]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Tessst",
    n_trials=150
)

[I 2025-03-08 00:58:55,115] A new study created in memory with name: Tessst


Trial 0 with params: {'learning_rate': 4.128205343826226e-05, 'weight_decay': 0.001, 'adam_beta1': 0.91}


[W 2025-03-08 00:58:55,686] Trial 0 failed with parameters: {'learning_rate': 4.128205343826226e-05, 'weight_decay': 0.001, 'adam_beta1': 0.91} because of the following error: RuntimeError('Caught RuntimeError in DataLoader worker process 0.\nOriginal Traceback (most recent call last):\n  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop\n    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]\n  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch\n    data = [self.dataset[idx] for idx in possibly_batched_index]\n  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>\n    data = [self.dataset[idx] for idx in possibly_batched_index]\n  File "/home/jovyan/base.py", line 100, in __getitem__\n    \'pixel_values\': image.to(self.device),\n  File "/usr/local/lib/python3.10/dist-packages/torchvision/tv_tensors/_tv_tensor.py", l

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/base.py", line 100, in __getitem__
    'pixel_values': image.to(self.device),
  File "/usr/local/lib/python3.10/dist-packages/torchvision/tv_tensors/_tv_tensor.py", line 77, in __torch_function__
    output = func(*args, **kwargs or dict())
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 305, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
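
The traceback points at `image.to(self.device)` inside `__getitem__` in `base.py`: the DataLoader forks worker processes, and a forked worker cannot initialize CUDA. A common remedy is to return CPU data from the dataset and leave the device transfer to the training loop (the Hugging Face `Trainer` moves each batch to its device itself). The sketch below shows only that shape. `CpuOnlyLogitDataset` and the `labels` and `logits` keys are hypothetical, because the real `base.CustomCIFAR10L` is not shown in this notebook.

```python
class CpuOnlyLogitDataset:
    """Hypothetical map-style dataset; a DataLoader only needs __len__ and __getitem__."""

    def __init__(self, images, labels, teacher_logits, transform=None):
        if not (len(images) == len(labels) == len(teacher_logits)):
            raise ValueError("images, labels and teacher_logits must have equal length")
        self.images = images
        self.labels = labels
        self.teacher_logits = teacher_logits
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        # Deliberately no .to(device): this code runs inside forked DataLoader
        # workers, where CUDA must not be touched.
        return {
            "pixel_values": image,
            "labels": self.labels[idx],
            "logits": self.teacher_logits[idx],
        }
```

Other options are to disable worker processes (`dataloader_num_workers=0` in `TrainingArguments`) or to use the `spawn` start method, as the error message itself suggests.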


In [None]:
print(best_trial.hyperparameters)

## Definition of distillation training

A class that adapts the Hugging Face Trainer for knowledge distillation. It now works with logits stored in the dataset.
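
`base.DistilTrainer` is not shown in this notebook, so its exact loss is unknown. As an illustration, the sketch below computes one common formulation for a single sample: `(1 - lambda_param) * CE + lambda_param * T^2 * KL(teacher || student)`, where both distributions are softened by `temperature`. Both the weighting and the `T^2` scaling are assumptions. It is written in plain Python to make the arithmetic visible; a real trainer would use batched tensor operations.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Assumed convention: scale by T^2 so gradients stay comparable across temperatures.
    kd = kl * temperature ** 2
    return (1 - lambda_param) * ce + lambda_param * kd


student = [1.0, 2.0, 0.5]
teacher = [0.8, 2.5, 0.1]
print(distillation_loss(student, teacher, label=1, lambda_param=0.9, temperature=6.5))
```

Under this formulation, `lambda_param = 0` reduces to ordinary cross-entropy training and `lambda_param = 1` trains on the teacher's soft targets alone.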

### Training the randomly initialized model with knowledge distillation

In [24]:
base.reset_seed()

In [25]:
training_args = base.get_training_args(output_dir="./results/cifar10-random-KD", logging_dir='./logs/cifar10-random-KD', remove_unused_columns=False, epochs=15)

In [26]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "adam_beta1": trial.suggest_float("adam_beta1", 0.9, 0.99, step=0.01),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
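
Parameters declared with `step` are drawn from a discrete grid `low, low + step, ..., high`. The short sketch below enumerates such a grid to show how many distinct values each stepped parameter can take, and why the trial logs contain values such as `0.6000000000000001`: they are ordinary binary floating-point results of `k * step`, not separate settings. The helper name is hypothetical, and how Optuna computes the grid internally is not shown here.

```python
def stepped_grid(low, high, step):
    """All values low + k * step that fit into [low, high]."""
    n_points = int(round((high - low) / step)) + 1
    return [low + k * step for k in range(n_points)]


lambda_grid = stepped_grid(0.0, 1.0, 0.1)
temperature_grid = stepped_grid(2.0, 7.0, 0.5)

print(len(lambda_grid))       # 11
print(lambda_grid[6])         # 0.6000000000000001
print(len(temperature_grid))  # 11
print(temperature_grid[:3])   # [2.0, 2.5, 3.0]
```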




In [27]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=base.compute_metrics,
    model_init=lambda: base.get_random_init_mobilenet(10),
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 4)]
)

In [28]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)



In [29]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    n_trials=100,
    study_name = "Distilation hp search"
)

[I 2025-01-10 23:15:42,747] A new study created in memory with name: Distilation hp search


Trial 0 with params: {'learning_rate': 1.0253509690168497e-05, 'weight_decay': 0.01, 'adam_beta1': 0.97, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7886,1.702865,0.2751,0.272882,0.2751,0.252327
2,1.564,1.732621,0.3477,0.348024,0.3477,0.335026
3,1.4649,1.731783,0.3853,0.386552,0.3853,0.376806
4,1.3975,1.757175,0.42,0.41732,0.42,0.410095
5,1.3473,1.746583,0.4365,0.436226,0.4365,0.428679
6,1.2997,1.751526,0.4599,0.454141,0.4599,0.448627
7,1.2635,1.736288,0.4755,0.473125,0.4755,0.468188
8,1.2326,1.775258,0.4694,0.462602,0.4694,0.455094


[I 2025-01-10 23:42:54,473] Trial 0 pruned. 
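
In every epoch table the Recall column equals the Accuracy column. That is what support-weighted averaging produces, so `base.compute_metrics` (not shown in this notebook) presumably reports weighted precision, recall and F1; this is an inference from the numbers, not a confirmed detail. The plain-Python sketch below, with a hypothetical function name, shows why the two columns coincide under that assumption.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        cls_precision = tp / predicted if predicted else 0.0
        cls_recall = tp / count
        denom = cls_precision + cls_recall
        cls_f1 = 2 * cls_precision * cls_recall / denom if denom else 0.0
        weight = count / n  # each class is weighted by its share of the true labels
        precision += weight * cls_precision
        recall += weight * cls_recall
        f1 += weight * cls_f1
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(metrics)
```

Weighted recall sums `(count / n) * (tp / count)` over the classes, which collapses to the total number of correct predictions divided by `n`, which is the accuracy.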


Trial 1 with params: {'learning_rate': 2.6364803038431666e-06, 'weight_decay': 0.0, 'adam_beta1': 0.98, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6176,1.461965,0.1547,0.140296,0.1547,0.105169
2,1.5752,1.444535,0.202,0.195464,0.202,0.161025


[I 2025-01-10 23:49:50,228] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.1364672700011182e-06, 'weight_decay': 0.01, 'adam_beta1': 0.98, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2282,2.060961,0.1057,0.082175,0.1057,0.031778
2,2.2009,2.040057,0.1376,0.107847,0.1376,0.086882
3,2.1717,2.027902,0.1638,0.128634,0.1638,0.115614
4,2.1477,1.997504,0.1874,0.167167,0.1874,0.132427
5,2.1303,1.983586,0.1936,0.17869,0.1936,0.145234
6,2.1016,1.965677,0.214,0.204685,0.214,0.168797
7,2.0732,1.936366,0.234,0.222323,0.234,0.194816
8,2.0465,1.923746,0.2391,0.223052,0.2391,0.204124


[I 2025-01-11 00:17:06,678] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 3.1261029103110603e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9669,1.803293,0.1641,0.18426,0.1641,0.11652
2,1.9,1.759229,0.2229,0.202938,0.2229,0.183592
3,1.7929,1.710603,0.2693,0.266797,0.2693,0.236143
4,1.7004,1.707074,0.2912,0.281414,0.2912,0.267588


[I 2025-01-11 00:30:41,956] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 4.480975918214949e-05, 'weight_decay': 0.001, 'adam_beta1': 0.92, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6197,1.558647,0.4243,0.420783,0.4243,0.414943
2,1.3161,1.476809,0.5173,0.514492,0.5173,0.512235


[I 2025-01-11 00:37:29,040] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00013157287601765647, 'weight_decay': 0.002, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3899,1.718354,0.5441,0.5437,0.5441,0.536819
2,0.9334,1.701452,0.6719,0.673959,0.6719,0.668729
3,0.7102,1.678886,0.7391,0.742763,0.7391,0.737488
4,0.571,1.730081,0.7715,0.770222,0.7715,0.76853


[I 2025-01-11 00:51:07,503] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 4.3625993625605605e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8421,0.988685,0.4108,0.415742,0.4108,0.399539
2,0.6334,1.066968,0.5072,0.508127,0.5072,0.499223
3,0.5156,1.124608,0.5765,0.57863,0.5765,0.568148
4,0.4448,1.203409,0.6245,0.61852,0.6245,0.616502
5,0.3944,1.190698,0.648,0.6583,0.648,0.647538
6,0.3545,1.262013,0.6657,0.675996,0.6657,0.661127
7,0.3226,1.261955,0.6881,0.699198,0.6881,0.68678
8,0.2954,1.276771,0.6907,0.697822,0.6907,0.686017
9,0.2723,1.358591,0.7066,0.719824,0.7066,0.707956
10,0.2508,1.3013,0.7179,0.726295,0.7179,0.718496


[I 2025-01-11 01:42:37,301] Trial 6 finished with value: 0.716753954361114 and parameters: {'learning_rate': 4.3625993625605605e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 6 with value: 0.716753954361114.


Trial 7 with params: {'learning_rate': 0.00015199881220083957, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0829,1.422456,0.5784,0.578594,0.5784,0.568207
2,0.6813,1.362785,0.6995,0.709137,0.6995,0.699944
3,0.5165,1.406807,0.759,0.769244,0.759,0.759764
4,0.4152,1.443983,0.7991,0.799926,0.7991,0.79757
5,0.3441,1.409517,0.802,0.816431,0.802,0.804768
6,0.2847,1.491206,0.8055,0.822915,0.8055,0.805115
7,0.2374,1.455112,0.8266,0.829447,0.8266,0.826007
8,0.2002,1.429668,0.8155,0.818673,0.8155,0.812998
9,0.1699,1.470028,0.828,0.832663,0.828,0.828802
10,0.1496,1.457331,0.8292,0.837223,0.8292,0.829864


[I 2025-01-11 02:33:55,765] Trial 7 finished with value: 0.8373784588057148 and parameters: {'learning_rate': 0.00015199881220083957, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 7 with value: 0.8373784588057148.


Trial 8 with params: {'learning_rate': 2.1348999901951977e-06, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2698,1.12955,0.1329,0.121772,0.1329,0.082015
2,1.235,1.132201,0.194,0.179378,0.194,0.149812
3,1.1976,1.155825,0.2257,0.20829,0.2257,0.185728
4,1.1548,1.206052,0.2538,0.240586,0.2538,0.213346


[I 2025-01-11 02:47:33,225] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 6.139426050898147e-05, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4256,1.559006,0.4493,0.446032,0.4493,0.443278
2,1.1253,1.533063,0.5383,0.544468,0.5383,0.532989
3,0.9181,1.493714,0.6188,0.620172,0.6188,0.615088
4,0.7768,1.489991,0.6716,0.668658,0.6716,0.666244
5,0.6795,1.473513,0.6874,0.698045,0.6874,0.688347
6,0.5954,1.484064,0.7155,0.724356,0.7155,0.713397
7,0.5284,1.512276,0.7231,0.733713,0.7231,0.721414
8,0.4721,1.568946,0.72,0.729592,0.72,0.714572


[I 2025-01-11 03:14:53,276] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0003145780170753732, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1237,1.311288,0.6503,0.653711,0.6503,0.649085
2,0.6868,1.221903,0.7617,0.764621,0.7617,0.760362
3,0.5275,1.252133,0.7907,0.797297,0.7907,0.789595
4,0.4416,1.161419,0.8131,0.818369,0.8131,0.811953
5,0.3754,1.185659,0.8235,0.835293,0.8235,0.825517
6,0.3115,1.184404,0.8269,0.843685,0.8269,0.82814
7,0.2628,1.189402,0.8373,0.843887,0.8373,0.837167
8,0.2221,1.195347,0.8432,0.845885,0.8432,0.842378


[I 2025-01-11 03:42:18,186] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 5.068963931664152e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8243,0.990263,0.4161,0.420404,0.4161,0.404493
2,0.6103,1.097165,0.5163,0.523295,0.5163,0.506319
3,0.4973,1.132581,0.5914,0.600944,0.5914,0.583704
4,0.4236,1.236222,0.6431,0.64404,0.6431,0.63824
5,0.3693,1.208286,0.6637,0.679152,0.6637,0.662686
6,0.3273,1.243094,0.6892,0.696202,0.6892,0.684186
7,0.2927,1.282907,0.7101,0.720073,0.7101,0.709275
8,0.2646,1.301071,0.7045,0.711767,0.7045,0.699778
9,0.2388,1.36939,0.7284,0.737829,0.7284,0.729318
10,0.2166,1.316465,0.7348,0.741131,0.7348,0.734529


[I 2025-01-11 04:33:09,224] Trial 11 finished with value: 0.7301273590912186 and parameters: {'learning_rate': 5.068963931664152e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 7 with value: 0.8373784588057148.


Trial 12 with params: {'learning_rate': 0.0003184805948017843, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1269,1.782518,0.6456,0.644614,0.6456,0.639541
2,0.6784,1.765137,0.7581,0.758229,0.7581,0.757148
3,0.5049,1.83463,0.7887,0.795524,0.7887,0.787921
4,0.4093,1.776625,0.8119,0.817004,0.8119,0.810212


[I 2025-01-11 04:46:43,908] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 8.787488401186728e-05, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.989,1.356385,0.4835,0.487195,0.4835,0.471346
2,0.6727,1.391599,0.628,0.62665,0.628,0.624568
3,0.5195,1.432552,0.6898,0.690251,0.6898,0.686758
4,0.4263,1.53613,0.7333,0.738533,0.7333,0.731936


[I 2025-01-11 05:00:21,374] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 1.0415089321753806e-05, 'weight_decay': 0.006, 'adam_beta1': 0.93, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2385,1.181042,0.285,0.287795,0.285,0.266046
2,1.1163,1.205641,0.3441,0.351066,0.3441,0.325646


[I 2025-01-11 05:07:06,808] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0002852725666675145, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0413,1.28395,0.6317,0.630216,0.6317,0.626811
2,0.6409,1.233595,0.7487,0.754463,0.7487,0.74984
3,0.4905,1.267297,0.7829,0.790874,0.7829,0.781609
4,0.4016,1.20492,0.818,0.820461,0.818,0.81683
5,0.3403,1.214085,0.8238,0.832285,0.8238,0.825097
6,0.2873,1.245446,0.8217,0.839079,0.8217,0.822717
7,0.2413,1.251313,0.8364,0.840486,0.8364,0.834936
8,0.203,1.23888,0.8377,0.841076,0.8377,0.835963


[I 2025-01-11 05:34:20,953] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0002187116642503092, 'weight_decay': 0.01, 'adam_beta1': 0.92, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2971,1.335056,0.6248,0.625254,0.6248,0.622233
2,0.8165,1.219661,0.723,0.728015,0.723,0.722543
3,0.6266,1.188132,0.7708,0.781029,0.7708,0.770099
4,0.5126,1.151904,0.8056,0.806604,0.8056,0.804274


[I 2025-01-11 05:47:46,286] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 6.214481745658709e-06, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3119,1.191361,0.2294,0.214232,0.2294,0.200955
2,1.2033,1.234918,0.2881,0.293524,0.2881,0.260904
3,1.145,1.243268,0.3329,0.334158,0.3329,0.317676
4,1.1106,1.258648,0.3486,0.352409,0.3486,0.333975
5,1.0868,1.268524,0.3655,0.366542,0.3655,0.350567
6,1.0614,1.270624,0.3884,0.381847,0.3884,0.374138
7,1.0413,1.271834,0.4017,0.397293,0.4017,0.389834
8,1.0227,1.294009,0.4003,0.398667,0.4003,0.380884


[I 2025-01-11 06:15:17,959] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0003091050760493089, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3821,1.455485,0.6448,0.645052,0.6448,0.64096
2,0.8551,1.331617,0.7637,0.763144,0.7637,0.761985
3,0.6524,1.312134,0.7941,0.799682,0.7941,0.793799
4,0.5303,1.237548,0.8124,0.820008,0.8124,0.811073
5,0.4474,1.251306,0.8237,0.83791,0.8237,0.826107
6,0.3693,1.269194,0.83,0.850854,0.83,0.832114
7,0.3092,1.220075,0.8431,0.846144,0.8431,0.842157
8,0.2549,1.23347,0.84,0.843833,0.84,0.837915


[I 2025-01-11 06:42:24,613] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0002947545082361056, 'weight_decay': 0.002, 'adam_beta1': 0.9500000000000001, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7155,1.524979,0.6149,0.608866,0.6149,0.607087
2,0.424,1.435207,0.7364,0.740865,0.7364,0.736614
3,0.3115,1.59425,0.781,0.79263,0.781,0.778794
4,0.2433,1.576003,0.816,0.818705,0.816,0.815122
5,0.1987,1.609864,0.8177,0.83187,0.8177,0.820314
6,0.1638,1.655583,0.8258,0.844658,0.8258,0.827353
7,0.1336,1.704596,0.8471,0.850965,0.8471,0.845868
8,0.1108,1.639132,0.8402,0.843759,0.8402,0.838149


[I 2025-01-11 07:09:18,343] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 5.405290206151555e-05, 'weight_decay': 0.008, 'adam_beta1': 0.98, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7944,1.574666,0.4293,0.431993,0.4293,0.420695
2,1.4265,1.415015,0.5297,0.529846,0.5297,0.527584


[I 2025-01-11 07:16:07,404] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 2.2633022690645107e-05, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.886,0.946618,0.3427,0.354704,0.3427,0.332316
2,0.7672,0.992865,0.4138,0.42179,0.4138,0.406242
3,0.6607,1.029289,0.4652,0.459924,0.4652,0.453186
4,0.5878,1.086193,0.5124,0.507281,0.5124,0.49936


[I 2025-01-11 07:29:41,604] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 6.933432092218376e-05, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9172,1.10726,0.46,0.464641,0.46,0.449178
2,0.6601,1.161521,0.5793,0.585803,0.5793,0.575174


[I 2025-01-11 07:36:24,259] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 3.9470832997428617e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3721,1.379144,0.4066,0.400644,0.4066,0.395556
2,1.1381,1.358323,0.4923,0.494324,0.4923,0.488143
3,0.9899,1.331263,0.5476,0.54646,0.5476,0.53895
4,0.8747,1.325292,0.5935,0.585784,0.5935,0.584996
5,0.7883,1.299066,0.6173,0.627335,0.6173,0.614917
6,0.7158,1.307998,0.6516,0.654654,0.6516,0.647616
7,0.6579,1.297888,0.6663,0.672733,0.6663,0.665435
8,0.6099,1.356095,0.6735,0.677372,0.6735,0.666561


[I 2025-01-11 08:03:52,827] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0003793850102112891, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6916,1.278946,0.6492,0.647749,0.6492,0.645275
2,0.4165,1.360292,0.7621,0.765113,0.7621,0.761884
3,0.3177,1.374305,0.7854,0.798105,0.7854,0.784177
4,0.2589,1.423625,0.8159,0.819163,0.8159,0.814824
5,0.2195,1.404794,0.8272,0.837718,0.8272,0.829139
6,0.1842,1.416215,0.8301,0.843833,0.8301,0.830604
7,0.1575,1.461539,0.8402,0.843364,0.8402,0.83885
8,0.1333,1.417961,0.8423,0.846273,0.8423,0.840867
9,0.1138,1.397598,0.86,0.861931,0.86,0.860237
10,0.0982,1.476766,0.8553,0.860966,0.8553,0.856152


[I 2025-01-11 08:54:49,191] Trial 24 finished with value: 0.8671829944838988 and parameters: {'learning_rate': 0.0003793850102112891, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 24 with value: 0.8671829944838988.


Trial 25 with params: {'learning_rate': 0.0004228288513803348, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7024,1.256235,0.6424,0.639874,0.6424,0.634701
2,0.423,1.292933,0.7521,0.755702,0.7521,0.750576
3,0.3196,1.367278,0.7844,0.795126,0.7844,0.781652
4,0.2631,1.393158,0.8196,0.82517,0.8196,0.81908
5,0.2225,1.411397,0.8297,0.838677,0.8297,0.831349
6,0.189,1.432035,0.8296,0.845622,0.8296,0.830122
7,0.16,1.412469,0.8457,0.849717,0.8457,0.844494
8,0.1354,1.436679,0.846,0.852594,0.846,0.84506
9,0.1152,1.414883,0.8594,0.862725,0.8594,0.859981
10,0.099,1.443976,0.8657,0.867906,0.8657,0.865846


[I 2025-01-11 09:45:49,132] Trial 25 finished with value: 0.8738118138794979 and parameters: {'learning_rate': 0.0004228288513803348, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 26 with params: {'learning_rate': 0.00043281328928984495, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7992,1.253574,0.6525,0.648017,0.6525,0.64667
2,0.4866,1.316876,0.7567,0.755435,0.7567,0.754533
3,0.3735,1.349306,0.7888,0.798435,0.7888,0.78831
4,0.3131,1.425681,0.8065,0.810942,0.8065,0.804619


[I 2025-01-11 09:59:47,928] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00037534084529860086, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6034,1.284188,0.6387,0.639859,0.6387,0.631366
2,0.3538,1.362511,0.7584,0.761668,0.7584,0.758155
3,0.263,1.41278,0.7841,0.792442,0.7841,0.782854
4,0.2135,1.400066,0.8143,0.818748,0.8143,0.813071
5,0.1774,1.409125,0.8235,0.83568,0.8235,0.825741
6,0.1474,1.459471,0.8255,0.841835,0.8255,0.826188
7,0.123,1.450803,0.8414,0.844553,0.8414,0.840299
8,0.1011,1.465175,0.8386,0.840102,0.8386,0.836257


[I 2025-01-11 10:27:50,576] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 9.219233712970501e-05, 'weight_decay': 0.005, 'adam_beta1': 0.92, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0851,1.38016,0.4981,0.497638,0.4981,0.492101
2,0.7433,1.405729,0.6397,0.639183,0.6397,0.635763


[I 2025-01-11 10:34:53,587] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 1.0849602255221522e-06, 'weight_decay': 0.006, 'adam_beta1': 0.96, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1797,2.011833,0.1034,0.078039,0.1034,0.027814
2,2.154,1.99178,0.1312,0.124741,0.1312,0.075844


[I 2025-01-11 10:41:54,932] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00021216455415222076, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3135,1.370996,0.6083,0.611466,0.6083,0.601512
2,0.819,1.265506,0.7398,0.7414,0.7398,0.739208
3,0.6215,1.243058,0.7841,0.792093,0.7841,0.784581
4,0.5059,1.16164,0.8125,0.819319,0.8125,0.812381
5,0.425,1.182377,0.8204,0.831541,0.8204,0.822571
6,0.3499,1.220746,0.8167,0.838126,0.8167,0.818091
7,0.291,1.198037,0.8353,0.83856,0.8353,0.83416
8,0.2428,1.195996,0.8336,0.836788,0.8336,0.831667
9,0.2047,1.180772,0.8459,0.849816,0.8459,0.846718
10,0.1796,1.16846,0.8473,0.850702,0.8473,0.847019


[I 2025-01-11 11:34:01,772] Trial 30 finished with value: 0.8496167270684216 and parameters: {'learning_rate': 0.00021216455415222076, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 25 with value: 0.8738118138794979.


Trial 31 with params: {'learning_rate': 0.00027098418702840296, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.418,1.229995,0.6477,0.64742,0.6477,0.642154
2,0.8765,1.058219,0.7506,0.752179,0.7506,0.750067
3,0.6721,1.014047,0.7884,0.793075,0.7884,0.787863
4,0.5506,0.9534,0.8088,0.811804,0.8088,0.807575
5,0.4574,0.921113,0.8245,0.835676,0.8245,0.826792
6,0.375,0.980552,0.8143,0.835954,0.8143,0.815514
7,0.3057,0.940524,0.8345,0.840773,0.8345,0.834334
8,0.2467,0.922114,0.8423,0.844498,0.8423,0.841316
9,0.2033,0.910348,0.8437,0.848333,0.8437,0.844511
10,0.172,0.895297,0.8514,0.854914,0.8514,0.852074


[I 2025-01-11 12:26:04,330] Trial 31 finished with value: 0.8513744443875344 and parameters: {'learning_rate': 0.00027098418702840296, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 32 with params: {'learning_rate': 0.0003799395139850002, 'weight_decay': 0.005, 'adam_beta1': 0.91, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4512,1.150584,0.6389,0.646061,0.6389,0.638284
2,0.8929,0.919537,0.7545,0.755443,0.7545,0.752587
3,0.6783,0.826226,0.8015,0.807867,0.8015,0.801384
4,0.5557,0.773081,0.8202,0.823981,0.8202,0.819443
5,0.4625,0.76212,0.8247,0.835271,0.8247,0.82672
6,0.3796,0.772061,0.8284,0.84604,0.8284,0.82958
7,0.3087,0.755879,0.8393,0.843393,0.8393,0.837367
8,0.2484,0.750461,0.8462,0.849439,0.8462,0.844791
9,0.1952,0.712601,0.8557,0.859704,0.8557,0.856292
10,0.1605,0.710891,0.8556,0.85969,0.8556,0.855718


[I 2025-01-11 13:18:27,869] Trial 32 finished with value: 0.8628454075424614 and parameters: {'learning_rate': 0.0003799395139850002, 'weight_decay': 0.005, 'adam_beta1': 0.91, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 25 with value: 0.8738118138794979.


Trial 33 with params: {'learning_rate': 0.0004990812311355335, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4798,1.009539,0.6331,0.6392,0.6331,0.633066
2,0.9211,0.760648,0.7316,0.736677,0.7316,0.730026
3,0.695,0.630021,0.7795,0.788684,0.7795,0.781613
4,0.5557,0.574791,0.8086,0.810611,0.8086,0.806002


[I 2025-01-11 13:32:32,202] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 9.29708981207645e-05, 'weight_decay': 0.005, 'adam_beta1': 0.93, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7459,1.426636,0.5012,0.502762,0.5012,0.495378
2,1.2413,1.17561,0.6362,0.641351,0.6362,0.636184
3,0.9789,1.017571,0.704,0.709359,0.704,0.704538
4,0.8039,0.956294,0.7388,0.736865,0.7388,0.735903


[I 2025-01-11 13:46:28,002] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 4.572836396686645e-05, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.785,1.570255,0.436,0.432786,0.436,0.43033
2,1.4305,1.417548,0.5295,0.527545,0.5295,0.525478
3,1.2316,1.311829,0.5891,0.589507,0.5891,0.584177
4,1.0792,1.252782,0.6368,0.633439,0.6368,0.632645


[I 2025-01-11 14:00:11,717] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0003859356054058685, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9065,1.28207,0.6468,0.644033,0.6468,0.642437
2,0.5635,1.262002,0.7636,0.763579,0.7636,0.762046
3,0.4365,1.323433,0.7836,0.793913,0.7836,0.782232
4,0.3637,1.288074,0.8132,0.816479,0.8132,0.811103
5,0.3072,1.302064,0.8268,0.836424,0.8268,0.828561
6,0.2622,1.322766,0.8219,0.843482,0.8219,0.823984
7,0.2248,1.350336,0.8328,0.8394,0.8328,0.831645
8,0.193,1.325045,0.8455,0.848474,0.8455,0.844756
9,0.1652,1.353049,0.8555,0.858503,0.8555,0.855899
10,0.1437,1.299347,0.86,0.86288,0.86,0.860351


[I 2025-01-11 14:54:26,109] Trial 36 finished with value: 0.8709704807575521 and parameters: {'learning_rate': 0.0003859356054058685, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 37 with params: {'learning_rate': 0.00019901387264934772, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6902,1.362734,0.6041,0.599677,0.6041,0.592705
2,0.4063,1.348853,0.7208,0.725907,0.7208,0.720539
3,0.299,1.428963,0.7681,0.776759,0.7681,0.766397
4,0.2363,1.586788,0.8014,0.805311,0.8014,0.799859


[I 2025-01-11 15:08:17,624] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00023856520750016358, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0587,1.3152,0.6239,0.623183,0.6239,0.616556
2,0.6631,1.258037,0.7361,0.743203,0.7361,0.736224
3,0.5079,1.252994,0.7806,0.786263,0.7806,0.779064
4,0.4169,1.261936,0.8085,0.811596,0.8085,0.806732


[I 2025-01-11 15:22:12,424] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00045696384172056527, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5716,1.181671,0.6232,0.618828,0.6232,0.609904
2,0.3426,1.265421,0.7483,0.753951,0.7483,0.747398
3,0.2534,1.370171,0.779,0.791682,0.779,0.776577
4,0.2055,1.365504,0.8187,0.822322,0.8187,0.818145
5,0.172,1.394804,0.8242,0.837193,0.8242,0.826127
6,0.1438,1.440999,0.8206,0.842226,0.8206,0.821904
7,0.1211,1.434578,0.841,0.846259,0.841,0.839607
8,0.1008,1.404397,0.8447,0.848553,0.8447,0.843648
9,0.085,1.415862,0.8579,0.861459,0.8579,0.858377
10,0.0703,1.454174,0.8674,0.871149,0.8674,0.86803


[I 2025-01-11 16:14:46,446] Trial 39 finished with value: 0.8776587205453099 and parameters: {'learning_rate': 0.00045696384172056527, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 39 with value: 0.8776587205453099.


Trial 40 with params: {'learning_rate': 0.00046283884311314297, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6955,1.284767,0.6617,0.663566,0.6617,0.661401
2,0.4193,1.398658,0.7439,0.746216,0.7439,0.741758
3,0.325,1.33933,0.7854,0.792931,0.7854,0.783726
4,0.2692,1.364603,0.8137,0.819035,0.8137,0.813588
5,0.2304,1.426043,0.8275,0.836265,0.8275,0.829261
6,0.1924,1.417498,0.8209,0.848351,0.8209,0.822798
7,0.1655,1.450359,0.8432,0.846801,0.8432,0.841714
8,0.1401,1.405748,0.8432,0.846986,0.8432,0.841676


[I 2025-01-11 16:43:00,887] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0004492853036662864, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7894,1.288988,0.6631,0.656524,0.6631,0.658261
2,0.4792,1.349342,0.7707,0.769506,0.7707,0.768591
3,0.3694,1.374668,0.8008,0.80688,0.8008,0.800345
4,0.3075,1.3504,0.8276,0.830897,0.8276,0.82654
5,0.2623,1.348997,0.8271,0.839663,0.8271,0.829198
6,0.2226,1.370739,0.822,0.849394,0.822,0.82359
7,0.1925,1.381156,0.8478,0.852191,0.8478,0.847091
8,0.1645,1.376281,0.8507,0.853034,0.8507,0.849287
9,0.1413,1.381923,0.8612,0.863807,0.8612,0.861553
10,0.1233,1.405284,0.8722,0.875284,0.8722,0.872568


[I 2025-01-11 17:35:06,242] Trial 41 finished with value: 0.8765415304329757 and parameters: {'learning_rate': 0.0004492853036662864, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 39 with value: 0.8776587205453099.


Trial 42 with params: {'learning_rate': 0.0002463607584658398, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7278,1.253034,0.6243,0.63262,0.6243,0.62135
2,0.4379,1.238319,0.7465,0.749025,0.7465,0.745394
3,0.3257,1.316248,0.79,0.798487,0.79,0.789647
4,0.2623,1.345136,0.8163,0.817374,0.8163,0.814526
5,0.2199,1.349074,0.8196,0.830568,0.8196,0.821419
6,0.1819,1.406902,0.8173,0.839249,0.8173,0.818564
7,0.154,1.371623,0.8396,0.84464,0.8396,0.838614
8,0.1298,1.382133,0.8422,0.842194,0.8422,0.840405


[I 2025-01-11 18:01:56,861] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00046961970042811346, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5877,1.215994,0.6518,0.648491,0.6518,0.648254
2,0.341,1.349505,0.7496,0.75139,0.7496,0.74811
3,0.2559,1.390585,0.7884,0.796989,0.7884,0.785812
4,0.209,1.467788,0.8185,0.82271,0.8185,0.816978
5,0.1751,1.471954,0.8359,0.847928,0.8359,0.838432
6,0.1452,1.518123,0.8291,0.849757,0.8291,0.83087
7,0.1212,1.515039,0.8541,0.857532,0.8541,0.853624
8,0.1009,1.556748,0.8589,0.860569,0.8589,0.858167
9,0.0848,1.49293,0.8677,0.870548,0.8677,0.867616
10,0.0709,1.561019,0.8735,0.875499,0.8735,0.873578


[I 2025-01-11 18:52:28,780] Trial 43 finished with value: 0.8858248729707295 and parameters: {'learning_rate': 0.00046961970042811346, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.
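
Note on reading this output: with dozens of trials it is easier to summarize the Optuna status lines programmatically than to scan them by eye. The sketch below parses lines of the form shown in this log, collects the objective value of each finished trial and the numbers of the pruned ones, and reports the best trial. The function name `summarize` is hypothetical, the parameter dictionaries in the sample are abbreviated to `{...}`, and "best" is taken as the maximum because the log's own "Best is trial" messages follow the highest value.

```python
import re

# Patterns for the two Optuna status lines that appear in this log.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize(log_text):
    """Return ({trial: value} for finished trials, [pruned trials], best trial or None)."""
    finished = {}
    pruned = []
    for line in log_text.splitlines():
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    # Assumes a maximized objective, as the "Best is trial" messages suggest.
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


# Status lines copied from the log above, with parameter dicts abbreviated.
sample = """\
[I 2025-01-11 14:54:26,109] Trial 36 finished with value: 0.8709704807575521 and parameters: {...}. Best is trial 25 with value: 0.8738118138794979.
[I 2025-01-11 15:08:17,624] Trial 37 pruned.
[I 2025-01-11 16:14:46,446] Trial 39 finished with value: 0.8776587205453099 and parameters: {...}. Best is trial 39 with value: 0.8776587205453099.
[I 2025-01-11 18:52:28,780] Trial 43 finished with value: 0.8858248729707295 and parameters: {...}. Best is trial 43 with value: 0.8858248729707295.
"""

finished, pruned, best = summarize(sample)
print(finished, pruned, best)
```

When the `optuna.Study` object is still in memory, its `best_trial` and `trials` attributes give the same information directly; parsing is mainly useful for saved notebook output like this one.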


Trial 44 with params: {'learning_rate': 0.00025519265205474733, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9013,1.257122,0.5707,0.566639,0.5707,0.562672
2,0.554,1.269668,0.7035,0.711764,0.7035,0.701764


[I 2025-01-11 18:59:12,689] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0003263037810993442, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0327,1.334026,0.6375,0.638157,0.6375,0.631509
2,0.6485,1.275128,0.7496,0.754803,0.7496,0.750375
3,0.5024,1.318381,0.7781,0.789375,0.7781,0.776437
4,0.4133,1.297801,0.8199,0.821668,0.8199,0.818518
5,0.3497,1.289945,0.8286,0.841296,0.8286,0.831052
6,0.2958,1.321932,0.8286,0.846048,0.8286,0.829323
7,0.2507,1.310881,0.8479,0.849041,0.8479,0.846796
8,0.2124,1.29381,0.841,0.843695,0.841,0.839662


[I 2025-01-11 19:25:58,616] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00046060721841702554, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.588,1.243928,0.646,0.653111,0.646,0.643215
2,0.3392,1.397877,0.762,0.760694,0.762,0.759044
3,0.2577,1.355182,0.7886,0.798067,0.7886,0.7862
4,0.2109,1.447465,0.8118,0.817687,0.8118,0.809745


[I 2025-01-11 19:39:24,234] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 1.595459785536451e-05, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2173,1.797093,0.3255,0.327294,0.3255,0.311943
2,1.8492,1.609195,0.4024,0.400314,0.4024,0.395336
3,1.6852,1.507228,0.4422,0.441575,0.4422,0.434329
4,1.5696,1.420815,0.478,0.471337,0.478,0.471556
5,1.4922,1.382496,0.494,0.495126,0.494,0.487831
6,1.415,1.331288,0.5166,0.510293,0.5166,0.509079
7,1.3578,1.290264,0.5297,0.53309,0.5297,0.528013
8,1.3034,1.274297,0.5435,0.534216,0.5435,0.533715


[I 2025-01-11 20:06:19,216] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0004625348911823293, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7015,1.238532,0.6451,0.649318,0.6451,0.640392
2,0.4304,1.330368,0.7484,0.748319,0.7484,0.746099
3,0.3299,1.386025,0.7882,0.795579,0.7882,0.786497
4,0.2683,1.436722,0.8163,0.820853,0.8163,0.815899
5,0.2289,1.422635,0.8325,0.845305,0.8325,0.835084
6,0.1929,1.422383,0.8301,0.85185,0.8301,0.831845
7,0.1648,1.530168,0.8497,0.853484,0.8497,0.849078
8,0.1398,1.47831,0.8534,0.855175,0.8534,0.852631
9,0.1189,1.451202,0.861,0.864914,0.861,0.861793
10,0.102,1.45424,0.8659,0.869839,0.8659,0.865782


[I 2025-01-11 20:56:45,972] Trial 48 finished with value: 0.8722818934913524 and parameters: {'learning_rate': 0.0004625348911823293, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 49 with params: {'learning_rate': 0.00035541307282489136, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6832,1.375139,0.6546,0.655136,0.6546,0.652668
2,0.3853,1.537217,0.7452,0.749158,0.7452,0.744259
3,0.2856,1.640092,0.7857,0.797382,0.7857,0.784355
4,0.2308,1.722376,0.8236,0.825153,0.8236,0.822126
5,0.193,1.618815,0.8209,0.840908,0.8209,0.824038
6,0.1568,1.633891,0.8114,0.845589,0.8114,0.813613
7,0.1304,1.647384,0.8471,0.850539,0.8471,0.846718
8,0.106,1.71509,0.8439,0.846968,0.8439,0.842646
9,0.088,1.647469,0.8599,0.864801,0.8599,0.860777
10,0.073,1.705866,0.8589,0.863402,0.8589,0.859411


[I 2025-01-11 21:47:31,139] Trial 49 finished with value: 0.8652733633674432 and parameters: {'learning_rate': 0.00035541307282489136, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 50 with params: {'learning_rate': 1.4906411453668724e-05, 'weight_decay': 0.006, 'adam_beta1': 0.99, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9229,1.819587,0.3027,0.304969,0.3027,0.283734
2,1.6639,1.762099,0.3716,0.368427,0.3716,0.359411
3,1.5466,1.729057,0.4214,0.418347,0.4214,0.415116
4,1.4549,1.729645,0.4535,0.448427,0.4535,0.444096


[I 2025-01-11 22:01:03,774] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0004531114112911217, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5725,1.144344,0.6449,0.641533,0.6449,0.639098
2,0.3373,1.262625,0.7615,0.762877,0.7615,0.759994
3,0.2527,1.313183,0.7833,0.791651,0.7833,0.78238
4,0.2062,1.443748,0.8264,0.828571,0.8264,0.824991
5,0.1726,1.399581,0.8238,0.838107,0.8238,0.825815
6,0.1445,1.427893,0.832,0.848881,0.832,0.832978
7,0.1218,1.430148,0.8503,0.854062,0.8503,0.849392
8,0.1006,1.42511,0.8498,0.851575,0.8498,0.848346
9,0.0845,1.422237,0.8594,0.862478,0.8594,0.85987
10,0.0705,1.449125,0.8708,0.872797,0.8708,0.870968


[I 2025-01-11 22:51:59,998] Trial 51 finished with value: 0.8743060530203361 and parameters: {'learning_rate': 0.0004531114112911217, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 52 with params: {'learning_rate': 0.0003831497817546985, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7029,1.261312,0.6501,0.649399,0.6501,0.647205
2,0.4198,1.315597,0.7451,0.746442,0.7451,0.74413
3,0.315,1.347724,0.7951,0.804109,0.7951,0.794107
4,0.258,1.396214,0.8157,0.818347,0.8157,0.814516
5,0.2172,1.410561,0.8325,0.839975,0.8325,0.833942
6,0.1839,1.429918,0.8247,0.846385,0.8247,0.825525
7,0.1565,1.42642,0.8493,0.853378,0.8493,0.848905
8,0.1315,1.446174,0.8477,0.849882,0.8477,0.845879
9,0.1128,1.41973,0.8619,0.86492,0.8619,0.862228
10,0.0967,1.464174,0.8621,0.86575,0.8621,0.86208


[I 2025-01-11 23:42:39,680] Trial 52 finished with value: 0.872400461284441 and parameters: {'learning_rate': 0.0003831497817546985, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 53 with params: {'learning_rate': 0.00045056610967625073, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5664,1.187231,0.6505,0.657811,0.6505,0.649783
2,0.3278,1.306952,0.758,0.762057,0.758,0.757132
3,0.2464,1.322633,0.7883,0.799035,0.7883,0.786175
4,0.2022,1.402987,0.8175,0.821891,0.8175,0.815779
5,0.1689,1.391924,0.8267,0.839976,0.8267,0.829091
6,0.1408,1.388519,0.8254,0.843675,0.8254,0.82563
7,0.1186,1.432646,0.8517,0.855095,0.8517,0.851612
8,0.0986,1.419662,0.8493,0.853423,0.8493,0.847043
9,0.0834,1.41943,0.8623,0.865443,0.8623,0.862749
10,0.0697,1.510705,0.87,0.872533,0.87,0.869931


[I 2025-01-12 00:33:40,285] Trial 53 finished with value: 0.8751704899821069 and parameters: {'learning_rate': 0.00045056610967625073, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 54 with params: {'learning_rate': 0.0002478748406512204, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7152,1.245915,0.6377,0.634879,0.6377,0.632066
2,0.4216,1.260237,0.7406,0.745826,0.7406,0.740171
3,0.3158,1.324512,0.7739,0.785218,0.7739,0.772164
4,0.2569,1.366827,0.8074,0.812111,0.8074,0.80644
5,0.2164,1.36514,0.8207,0.835425,0.8207,0.823268
6,0.1792,1.388149,0.8182,0.838016,0.8182,0.819217
7,0.1516,1.412604,0.8385,0.841388,0.8385,0.837079
8,0.1268,1.369675,0.8361,0.840232,0.8361,0.83455
9,0.1087,1.426545,0.8474,0.850248,0.8474,0.847944
10,0.0941,1.406598,0.8484,0.853185,0.8484,0.848647


[I 2025-01-12 01:25:18,787] Trial 54 finished with value: 0.856482541042779 and parameters: {'learning_rate': 0.0002478748406512204, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 55 with params: {'learning_rate': 0.0002777061852751291, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7154,1.265537,0.6432,0.644334,0.6432,0.641572
2,0.4226,1.233998,0.7416,0.74086,0.7416,0.740116
3,0.3168,1.324529,0.7884,0.796119,0.7884,0.787129
4,0.2568,1.349289,0.822,0.826216,0.822,0.821448
5,0.2118,1.351048,0.8158,0.837939,0.8158,0.820295
6,0.1777,1.411008,0.8155,0.843515,0.8155,0.815654
7,0.1502,1.362862,0.8436,0.849623,0.8436,0.843003
8,0.1254,1.380464,0.8452,0.849028,0.8452,0.844074
9,0.107,1.386908,0.8525,0.856553,0.8525,0.852675
10,0.0928,1.381501,0.8615,0.863874,0.8615,0.861856


[I 2025-01-12 02:15:50,711] Trial 55 finished with value: 0.8683141389636606 and parameters: {'learning_rate': 0.0002777061852751291, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 56 with params: {'learning_rate': 0.00022602735076646376, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6203,1.183349,0.6213,0.622412,0.6213,0.612671
2,0.3605,1.287261,0.7385,0.742721,0.7385,0.737857
3,0.259,1.354306,0.7841,0.791745,0.7841,0.782518
4,0.2064,1.397598,0.8062,0.812286,0.8062,0.804988
5,0.1691,1.389987,0.8127,0.829802,0.8127,0.815004
6,0.1367,1.416513,0.8245,0.844825,0.8245,0.825883
7,0.113,1.422673,0.837,0.840116,0.837,0.83596
8,0.0917,1.453497,0.8317,0.838889,0.8317,0.829059
9,0.0765,1.394009,0.8509,0.852518,0.8509,0.850864
10,0.065,1.440511,0.8542,0.85772,0.8542,0.854478


[I 2025-01-12 03:06:12,748] Trial 56 finished with value: 0.8597978433223968 and parameters: {'learning_rate': 0.00022602735076646376, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 57 with params: {'learning_rate': 5.787570113829524e-06, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2402,1.142492,0.2196,0.219743,0.2196,0.192182
2,1.1393,1.24001,0.2782,0.286726,0.2782,0.250406
3,1.0806,1.269627,0.3208,0.315597,0.3208,0.30457
4,1.0476,1.304913,0.3335,0.33258,0.3335,0.317021


[I 2025-01-12 03:19:35,934] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 1.231563481729302e-06, 'weight_decay': 0.004, 'adam_beta1': 0.98, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3446,2.17429,0.1056,0.078202,0.1056,0.032554
2,2.3153,2.149431,0.1432,0.105174,0.1432,0.089903


[I 2025-01-12 03:26:17,998] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0004642383073750495, 'weight_decay': 0.001, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5691,1.233639,0.6543,0.651634,0.6543,0.647331
2,0.3358,1.280628,0.7544,0.758939,0.7544,0.754273
3,0.2522,1.385224,0.7938,0.802305,0.7938,0.791473
4,0.2064,1.434259,0.8149,0.81882,0.8149,0.813452
5,0.1717,1.402337,0.8278,0.842897,0.8278,0.830677
6,0.1429,1.446408,0.8235,0.843246,0.8235,0.824278
7,0.1208,1.444575,0.8486,0.852122,0.8486,0.847699
8,0.1002,1.398068,0.8411,0.845437,0.8411,0.839439


[I 2025-01-12 03:53:04,315] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 2.0114989721951363e-06, 'weight_decay': 0.01, 'adam_beta1': 0.92, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9738,1.818494,0.1301,0.159924,0.1301,0.077898
2,1.9324,1.799802,0.1788,0.169433,0.1788,0.126256
3,1.8824,1.786375,0.2096,0.208945,0.2096,0.169384
4,1.8157,1.767383,0.2496,0.239191,0.2496,0.206236
5,1.7505,1.763251,0.2613,0.263678,0.2613,0.223109
6,1.7008,1.768387,0.2856,0.284008,0.2856,0.255128
7,1.672,1.774733,0.3034,0.300129,0.3034,0.279214
8,1.6555,1.788482,0.3052,0.305785,0.3052,0.281433


[I 2025-01-12 04:19:50,784] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00020547175772221783, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8841,1.277411,0.5992,0.60032,0.5992,0.59339
2,0.5432,1.373044,0.7239,0.729369,0.7239,0.722451
3,0.4132,1.306624,0.7748,0.778992,0.7748,0.773278
4,0.3326,1.357619,0.8089,0.81136,0.8089,0.807764
5,0.2745,1.344297,0.8151,0.824061,0.8151,0.816564
6,0.2294,1.419493,0.7948,0.829683,0.7948,0.796912
7,0.1939,1.429683,0.8264,0.831241,0.8264,0.824633
8,0.1633,1.423889,0.8263,0.830176,0.8263,0.824704
9,0.1384,1.383594,0.8427,0.846476,0.8427,0.843471
10,0.1211,1.386492,0.8437,0.849682,0.8437,0.844546


[I 2025-01-12 05:10:05,955] Trial 61 finished with value: 0.8522741761332213 and parameters: {'learning_rate': 0.00020547175772221783, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 62 with params: {'learning_rate': 0.0003484792854068305, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6093,1.277689,0.6398,0.63567,0.6398,0.631859
2,0.3471,1.315071,0.7595,0.76258,0.7595,0.75816
3,0.2554,1.34734,0.7817,0.794574,0.7817,0.779181
4,0.2057,1.439406,0.8199,0.823123,0.8199,0.818785
5,0.1714,1.440905,0.8356,0.844743,0.8356,0.837133
6,0.1407,1.457467,0.8242,0.84875,0.8242,0.82562
7,0.1177,1.440848,0.8428,0.846955,0.8428,0.841749
8,0.0965,1.484125,0.8531,0.854205,0.8531,0.851812
9,0.0811,1.515239,0.8565,0.859128,0.8565,0.856548
10,0.0678,1.486511,0.8646,0.867725,0.8646,0.864778


[I 2025-01-12 06:00:35,772] Trial 62 finished with value: 0.8670797648612293 and parameters: {'learning_rate': 0.0003484792854068305, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 63 with params: {'learning_rate': 0.0004029175776852982, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5714,1.220726,0.643,0.644368,0.643,0.638497
2,0.3338,1.327,0.7682,0.772393,0.7682,0.76805
3,0.2476,1.326104,0.7937,0.80051,0.7937,0.791482
4,0.2024,1.406905,0.8141,0.818642,0.8141,0.81253
5,0.1694,1.365042,0.8343,0.844979,0.8343,0.836435
6,0.1404,1.417827,0.822,0.844553,0.822,0.823055
7,0.1183,1.409877,0.8479,0.853095,0.8479,0.8472
8,0.0984,1.430831,0.8463,0.848275,0.8463,0.844538
9,0.0817,1.434605,0.8597,0.863071,0.8597,0.859821
10,0.0681,1.490489,0.8721,0.873145,0.8721,0.871998


[I 2025-01-12 06:52:12,583] Trial 63 finished with value: 0.8777763015151621 and parameters: {'learning_rate': 0.0004029175776852982, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 64 with params: {'learning_rate': 6.864904704881755e-06, 'weight_decay': 0.002, 'adam_beta1': 0.99, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.11,1.019336,0.2261,0.21996,0.2261,0.187894
2,1.0261,1.102088,0.2889,0.300043,0.2889,0.257785


[I 2025-01-12 06:59:07,496] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0004363816115314198, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6905,1.241322,0.6494,0.651085,0.6494,0.64741
2,0.4153,1.290467,0.7488,0.752865,0.7488,0.748572
3,0.3178,1.384224,0.7857,0.79175,0.7857,0.784493
4,0.2628,1.337967,0.8147,0.815933,0.8147,0.812813


[I 2025-01-12 07:12:52,478] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00044959317353520316, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5712,1.212543,0.6584,0.65005,0.6584,0.650909
2,0.3344,1.258147,0.7585,0.763974,0.7585,0.757953
3,0.2496,1.33466,0.7856,0.795855,0.7856,0.784262
4,0.2044,1.40945,0.819,0.819677,0.819,0.817068
5,0.1688,1.380033,0.8296,0.84267,0.8296,0.832439
6,0.1413,1.411902,0.8313,0.853928,0.8313,0.832874
7,0.1198,1.432328,0.8398,0.843922,0.8398,0.838293
8,0.1005,1.429365,0.8517,0.854593,0.8517,0.850597
9,0.0835,1.428589,0.8602,0.86296,0.8602,0.86022
10,0.0707,1.425039,0.862,0.86643,0.862,0.862153


[I 2025-01-12 08:03:52,185] Trial 66 finished with value: 0.8759907173919406 and parameters: {'learning_rate': 0.00044959317353520316, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 67 with params: {'learning_rate': 0.0003956722053375569, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7151,1.270366,0.6136,0.622422,0.6136,0.606748
2,0.4317,1.279424,0.7529,0.756936,0.7529,0.7531
3,0.3241,1.377097,0.7965,0.80145,0.7965,0.79549
4,0.264,1.425211,0.8223,0.824222,0.8223,0.820914
5,0.2209,1.434808,0.8286,0.837403,0.8286,0.830154
6,0.1878,1.442715,0.827,0.848017,0.827,0.82889
7,0.1571,1.435834,0.84,0.845464,0.84,0.838882
8,0.1338,1.393293,0.8505,0.854355,0.8505,0.849698
9,0.1137,1.423122,0.8583,0.860655,0.8583,0.858624
10,0.0987,1.44241,0.8695,0.872345,0.8695,0.869997


[I 2025-01-12 08:54:50,257] Trial 67 finished with value: 0.8744338458613091 and parameters: {'learning_rate': 0.0003956722053375569, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 68 with params: {'learning_rate': 6.690323842125524e-05, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9521,1.142009,0.4485,0.442523,0.4485,0.431446
2,0.6838,1.211815,0.5746,0.575594,0.5746,0.570158
3,0.5462,1.244645,0.644,0.650213,0.644,0.639875
4,0.4585,1.324679,0.6858,0.684489,0.6858,0.683258
5,0.3985,1.285033,0.7065,0.716962,0.7065,0.707751
6,0.3494,1.330732,0.7179,0.733543,0.7179,0.716083
7,0.3081,1.377229,0.7324,0.742102,0.7324,0.732435
8,0.2739,1.410678,0.7343,0.741184,0.7343,0.729315


[I 2025-01-12 09:21:58,405] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 1.641098718467686e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7646,1.632309,0.3224,0.322723,0.3224,0.310357
2,1.5373,1.576596,0.3956,0.39044,0.3956,0.385793
3,1.4255,1.539807,0.4339,0.433933,0.4339,0.426201
4,1.3366,1.525565,0.4772,0.470579,0.4772,0.467076
5,1.2749,1.508068,0.4846,0.481374,0.4846,0.473924
6,1.2192,1.495919,0.5107,0.506739,0.5107,0.499349
7,1.1733,1.467834,0.5264,0.526857,0.5264,0.52037
8,1.1304,1.475381,0.5386,0.530489,0.5386,0.527916


[I 2025-01-12 09:49:08,163] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00030182759810243553, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.959,1.385059,0.6144,0.621084,0.6144,0.608788
2,0.5983,1.29928,0.7411,0.745286,0.7411,0.74016
3,0.4547,1.332471,0.7864,0.794503,0.7864,0.785328
4,0.3721,1.364244,0.8104,0.813837,0.8104,0.808832
5,0.3166,1.354055,0.8272,0.835394,0.8272,0.828371
6,0.2676,1.361334,0.8223,0.844657,0.8223,0.823464
7,0.2262,1.37748,0.8404,0.843523,0.8404,0.839257
8,0.1927,1.365576,0.8413,0.844824,0.8413,0.839655
9,0.1646,1.383029,0.8515,0.860234,0.8515,0.853449
10,0.1442,1.351121,0.855,0.860718,0.855,0.855991


[I 2025-01-12 10:40:10,311] Trial 70 finished with value: 0.8622894858963633 and parameters: {'learning_rate': 0.00030182759810243553, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 71 with params: {'learning_rate': 2.0955490522832793e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.93, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7202,1.558227,0.1349,0.244938,0.1349,0.082619
2,1.6821,1.537885,0.1822,0.170344,0.1822,0.135921


[I 2025-01-12 10:46:55,530] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0001855972168731866, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6822,1.173651,0.5773,0.57139,0.5773,0.566525
2,0.4128,1.368703,0.7086,0.710703,0.7086,0.706417


[I 2025-01-12 10:53:42,560] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.00039266426177087855, 'weight_decay': 0.001, 'adam_beta1': 0.96, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8108,1.296561,0.6309,0.622571,0.6309,0.623694
2,0.4956,1.286179,0.749,0.751653,0.749,0.747022
3,0.3768,1.331921,0.787,0.794385,0.787,0.786713
4,0.309,1.332732,0.813,0.816592,0.813,0.810671


[I 2025-01-12 11:07:13,505] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004635618184844074, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5977,1.224767,0.6301,0.62189,0.6301,0.61773
2,0.356,1.363783,0.7474,0.750754,0.7474,0.746903
3,0.2671,1.392329,0.7843,0.791692,0.7843,0.784012
4,0.2196,1.445268,0.8108,0.815622,0.8108,0.809317


[I 2025-01-12 11:20:43,036] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00017913124680222592, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6659,1.149636,0.5741,0.567465,0.5741,0.565611
2,0.3988,1.303488,0.7003,0.711014,0.7003,0.700918
3,0.2917,1.39273,0.7631,0.772425,0.7631,0.76285
4,0.2274,1.387176,0.804,0.80575,0.804,0.802087
5,0.1858,1.33994,0.8031,0.813083,0.8031,0.804849
6,0.1505,1.346369,0.8091,0.831412,0.8091,0.810704
7,0.1235,1.45635,0.8277,0.831613,0.8277,0.826534
8,0.1009,1.409286,0.8294,0.832729,0.8294,0.827832
9,0.0838,1.438518,0.836,0.840919,0.836,0.837246
10,0.0704,1.477541,0.8404,0.846457,0.8404,0.841503


[I 2025-01-12 12:11:24,492] Trial 75 finished with value: 0.8440720677795472 and parameters: {'learning_rate': 0.00017913124680222592, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 76 with params: {'learning_rate': 0.0003417843930342611, 'weight_decay': 0.001, 'adam_beta1': 0.9400000000000001, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6463,1.268328,0.6487,0.646496,0.6487,0.640131
2,0.3657,1.421195,0.7601,0.769011,0.7601,0.759757
3,0.2706,1.456168,0.7791,0.790036,0.7791,0.777236
4,0.2207,1.502619,0.8125,0.814583,0.8125,0.811125


[I 2025-01-12 12:24:56,902] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0003574884842049779, 'weight_decay': 0.007, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8062,1.288148,0.6489,0.652379,0.6489,0.646976
2,0.4849,1.296116,0.7669,0.766325,0.7669,0.765782
3,0.3701,1.327177,0.7886,0.798807,0.7886,0.787572
4,0.3077,1.380444,0.8185,0.821533,0.8185,0.816768
5,0.2602,1.353925,0.832,0.84178,0.832,0.833907
6,0.219,1.362232,0.8272,0.848201,0.8272,0.828797
7,0.1877,1.382081,0.8515,0.856893,0.8515,0.850756
8,0.1612,1.38978,0.8489,0.851327,0.8489,0.847858
9,0.1381,1.387812,0.8584,0.860952,0.8584,0.858621
10,0.1207,1.393427,0.8578,0.864996,0.8578,0.859107


[I 2025-01-12 13:16:01,394] Trial 77 finished with value: 0.8714966360973179 and parameters: {'learning_rate': 0.0003574884842049779, 'weight_decay': 0.007, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 78 with params: {'learning_rate': 1.4109296750080319e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2372,1.823756,0.3122,0.314047,0.3122,0.299432
2,1.8749,1.642306,0.3825,0.38383,0.3825,0.373883


[I 2025-01-12 13:22:47,540] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0003115299566108808, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7114,1.182421,0.6405,0.642176,0.6405,0.638043
2,0.4215,1.265371,0.7556,0.757166,0.7556,0.753627
3,0.3139,1.306841,0.792,0.799257,0.792,0.791819
4,0.2544,1.411318,0.8116,0.813346,0.8116,0.809821


[I 2025-01-12 13:36:19,634] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0002364627605200556, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8484,1.258717,0.6249,0.625333,0.6249,0.620909
2,0.5098,1.225958,0.749,0.752207,0.749,0.747513
3,0.3804,1.30163,0.7773,0.789177,0.7773,0.775583
4,0.3094,1.317922,0.8071,0.812966,0.8071,0.805362


[I 2025-01-12 13:49:52,609] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00044262042328409995, 'weight_decay': 0.002, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9173,1.329443,0.6635,0.66381,0.6635,0.659913
2,0.5634,1.334371,0.7624,0.763062,0.7624,0.760846
3,0.4364,1.364579,0.7953,0.805183,0.7953,0.793917
4,0.3648,1.34758,0.8163,0.820467,0.8163,0.815071
5,0.3104,1.348466,0.8295,0.838251,0.8295,0.831725
6,0.2663,1.398615,0.8327,0.848839,0.8327,0.833378
7,0.2274,1.389033,0.8431,0.847162,0.8431,0.841353
8,0.196,1.380251,0.8535,0.857436,0.8535,0.852435
9,0.1681,1.39147,0.8626,0.865162,0.8626,0.863044
10,0.1462,1.360361,0.8689,0.872545,0.8689,0.869535


[I 2025-01-12 14:40:46,051] Trial 81 finished with value: 0.8743726935021687 and parameters: {'learning_rate': 0.00044262042328409995, 'weight_decay': 0.002, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 82 with params: {'learning_rate': 0.00015353998736973145, 'weight_decay': 0.004, 'adam_beta1': 0.93, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9739,1.312528,0.5498,0.55359,0.5498,0.545216
2,0.6156,1.36317,0.6976,0.698589,0.6976,0.696211


[I 2025-01-12 14:47:39,802] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0001039223534291677, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7506,1.144625,0.5071,0.501807,0.5071,0.489401
2,0.4858,1.245931,0.6434,0.644518,0.6434,0.640951
3,0.3715,1.294773,0.7101,0.715046,0.7101,0.709105
4,0.3031,1.377183,0.7513,0.749328,0.7513,0.749196
5,0.2534,1.364346,0.7567,0.770094,0.7567,0.75981
6,0.2114,1.415021,0.7655,0.784047,0.7655,0.765305
7,0.1772,1.413678,0.7776,0.785061,0.7776,0.77517
8,0.1479,1.435403,0.7745,0.783802,0.7745,0.771914


[I 2025-01-12 15:14:48,210] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00041171821898432144, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5963,1.245507,0.6411,0.641344,0.6411,0.63836
2,0.3488,1.372714,0.7438,0.74726,0.7438,0.743226
3,0.2636,1.356272,0.7855,0.797706,0.7855,0.784463
4,0.2127,1.436002,0.8234,0.826196,0.8234,0.82259
5,0.1771,1.423725,0.8222,0.839238,0.8222,0.825595
6,0.1462,1.480301,0.8245,0.846825,0.8245,0.826766
7,0.1225,1.520857,0.8471,0.850082,0.8471,0.846109
8,0.1028,1.514488,0.8386,0.842734,0.8386,0.837175


[I 2025-01-12 15:42:24,057] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0003920548272987626, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1331,1.340253,0.6338,0.634431,0.6338,0.631451
2,0.7212,1.286713,0.744,0.744377,0.744,0.743108


[I 2025-01-12 15:49:09,124] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 1.3242306956820676e-06, 'weight_decay': 0.005, 'adam_beta1': 0.96, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4943,1.337988,0.1096,0.124339,0.1096,0.038882
2,1.4687,1.327524,0.1505,0.117016,0.1505,0.099355


[I 2025-01-12 15:55:54,695] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0004848843553616508, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8837,1.347309,0.6713,0.671428,0.6713,0.66856
2,0.5458,1.346245,0.7659,0.768075,0.7659,0.765547
3,0.4267,1.335642,0.7818,0.791926,0.7818,0.780476
4,0.3635,1.294378,0.8236,0.823784,0.8236,0.822226
5,0.3122,1.309952,0.8296,0.838509,0.8296,0.831436
6,0.2664,1.321174,0.8356,0.850761,0.8356,0.83639
7,0.2316,1.306218,0.8505,0.854228,0.8505,0.850018
8,0.1993,1.303053,0.8488,0.851771,0.8488,0.84707
9,0.1721,1.322272,0.8642,0.868085,0.8642,0.86506
10,0.1498,1.310831,0.8668,0.869428,0.8668,0.866835


[I 2025-01-12 16:47:42,711] Trial 87 finished with value: 0.8703901422924485 and parameters: {'learning_rate': 0.0004848843553616508, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 88 with params: {'learning_rate': 0.0004644055221486097, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9563,1.44413,0.6611,0.659136,0.6611,0.65815
2,0.5875,1.437727,0.7617,0.759724,0.7617,0.758886
3,0.4563,1.454561,0.7875,0.797594,0.7875,0.787066
4,0.3805,1.399281,0.8132,0.815237,0.8132,0.811384
5,0.3243,1.422024,0.8306,0.840237,0.8306,0.832724
6,0.276,1.44269,0.8219,0.841227,0.8219,0.822599
7,0.2398,1.463379,0.8464,0.851225,0.8464,0.845863
8,0.2051,1.438038,0.8492,0.852037,0.8492,0.848075
9,0.1755,1.440405,0.867,0.86881,0.867,0.867352
10,0.1526,1.462412,0.8654,0.867467,0.8654,0.865249


[I 2025-01-12 17:38:25,204] Trial 88 finished with value: 0.8735363122179471 and parameters: {'learning_rate': 0.0004644055221486097, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 89 with params: {'learning_rate': 4.5985694990966176e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss


[W 2025-01-12 17:40:41,577] Trial 89 failed with parameters: {'learning_rate': 4.5985694990966176e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 2.5} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 250, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2536, in _inner_training_loop
    and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
KeyboardInterrupt
[W 2025-01-12 17:40:41,578] Trial 89 failed with value None.


KeyboardInterrupt: 
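
Every trial above samples `lambda_param` and `temperature` alongside the optimizer settings. The notebook's own loss lives in the `base` module, which is not shown here, so the following is only a minimal sketch of how these two hyperparameters commonly enter a distillation objective. It assumes that `lambda_param` weights the soft-target term and that the KL term is scaled by the squared temperature. The function names `softmax` and `distillation_loss` are hypothetical and are not taken from the notebook.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      lambda_param, temperature):
    """Blend of hard-label cross-entropy and soft-target KL divergence.

    Assumed form (not confirmed by the notebook):
        (1 - lambda) * CE(student, label)
        + lambda * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label term: ordinary cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-target term: KL divergence between the softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    return (1.0 - lambda_param) * ce + lambda_param * temperature ** 2 * kl


# Example with the hyperparameters sampled for trial 81 (toy logits, 3 classes).
loss = distillation_loss([1.0, 2.0, 0.5], [0.2, 3.0, 0.1], label=1,
                         lambda_param=0.7, temperature=5.5)
print(loss)
```

Under this assumed form, `lambda_param = 1.0` (as in trials 83 and 84) drops the hard-label term entirely. If the logged Training Loss is the blended quantity, its values are not directly comparable across trials that use different `lambda_param` or `temperature`, whereas the accuracy and F1 columns are.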