# Notebook for distillation training on the CIFAR10 dataset
This notebook trains MobileNetV2 on the CIFAR10 dataset, using a ViT fine-tuned on the same dataset as the teacher model.

MobileNetV2 is used in three configurations: with random initialization, with only the classification head trained on top of a model pretrained on ImageNet, and with the whole pretrained model trained. Each of these three tasks is trained both in the standard way and with distillation from the teacher model mentioned above.

Distillation uses precomputed logits from the precompute_logits notebook.

In [1]:
%pip install transformers[torch] huggingface_hub datasets evaluate torchvision optuna

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python -m pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


## Library imports and method definitions

In [1]:
from transformers import Trainer, TrainingArguments, MobileNetV2Config, MobileNetV2ForImageClassification, EarlyStoppingCallback
from torchvision import transforms
from torch.utils.data import Dataset
import torch.nn.functional as F
from PIL import Image
import torch.nn as nn
import numpy as np
import evaluate
import random
import pickle
import optuna
import torch
import math
import os 

Resets the random seed so that results are reproducible.
Some of these settings can probably be removed.

TODO: Remove the unnecessary settings.

In [2]:
def reset_seed(seed=42):
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) 
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ['PYTHONHASHSEED'] = str(seed)

A new wrapper that works directly with the files of the downloaded and modified CIFAR10 dataset.
Loading the dataset with the method used previously is not possible because the checksum differs.

The logits are now also loaded directly from the dataset.

In [3]:
class CustomCIFAR10(Dataset):
    def __init__(self, root, train=True, transform=None, target_transform=None):
        self.root = root
        self.train = train
        self.transform = transform
        self.target_transform = target_transform

        self.data = []
        self.targets = []
        self.logits = []
        
        if self.train:
            # The training split is stored in five batch files.
            for i in range(1, 6):
                data_file = os.path.join(self.root, 'cifar-10-batches-py', f'data_batch_{i}')
                with open(data_file, 'rb') as fo:
                    batch = pickle.load(fo, encoding='bytes')
                self.data.append(batch[b'data'])
                self.targets.extend(batch[b'labels'])
                self.logits.extend(batch[b'logits'])
        else:
            data_file = os.path.join(self.root, 'cifar-10-batches-py', 'test_batch')
            with open(data_file, 'rb') as fo:
                batch = pickle.load(fo, encoding='bytes')
            self.data.append(batch[b'data'])
            self.targets.extend(batch[b'labels'])
            self.logits.extend(batch[b'logits'])

        self.data = np.concatenate(self.data, axis=0)
        self.targets = np.array(self.targets)
        self.logits = np.array(self.logits)


    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        image = self.data[index].reshape(3, 32, 32).transpose(1, 2, 0)
        label = self.targets[index]
        logit = self.logits[index]
        
        image = Image.fromarray(image.astype('uint8'), 'RGB')
        logit = torch.tensor(logit, dtype=torch.float)
        if self.transform:
            image = self.transform(image)

        if self.target_transform:
            # Apply the transform to the label (the original referenced an undefined name).
            label = self.target_transform(label)
            
        return {
            'pixel_values': image,
            'labels': label,
            'logits': logit
        }
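The flat-to-image conversion in `__getitem__` relies on the CIFAR-10 batch layout, where each row stores 1024 red values, then 1024 green, then 1024 blue, with every colour plane in row-major order. The plain-Python sketch below spells out the index arithmetic that `reshape(3, 32, 32).transpose(1, 2, 0)` performs. The helper names `flat_index` and `to_hwc` are illustrative assumptions and not part of the notebook.

```python
CHANNELS, HEIGHT, WIDTH = 3, 32, 32


def flat_index(row, col, channel):
    """Position of one pixel value inside a flat CIFAR-10 row of 3072 values."""
    return channel * HEIGHT * WIDTH + row * WIDTH + col


def to_hwc(flat_row):
    """Rebuild a height x width x channel nested list from a flat row."""
    return [
        [
            [flat_row[flat_index(row, col, ch)] for ch in range(CHANNELS)]
            for col in range(WIDTH)
        ]
        for row in range(HEIGHT)
    ]


# Use the flat positions themselves as pixel values to make the mapping visible.
flat = list(range(CHANNELS * HEIGHT * WIDTH))
image = to_hwc(flat)
print(image[0][0])    # the R, G and B values of the top-left pixel
print(image[31][31])  # the R, G and B values of the bottom-right pixel
```

The top-left pixel pulls its three channels from positions 0, 1024 and 2048, which is why a plain `reshape(32, 32, 3)` would scramble the image.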


Definition of the evaluation metrics (accuracy and macro-averaged precision, recall, and F1) used during model training.

In [17]:
accuracy_metric = evaluate.load("accuracy")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    pred, labels = eval_pred
    predictions = np.argmax(pred, axis=1)
    
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average='macro', zero_division = 0)
    recall = recall_metric.compute(predictions=predictions, references=labels, average='macro', zero_division = 0)
    f1 = f1_metric.compute(predictions=predictions, references=labels, average='macro')

    return {
        "accuracy": accuracy["accuracy"],
        "precision": precision["precision"],
        "recall": recall["recall"],
        "f1": f1["f1"]
    }
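To make the `average='macro'` setting concrete, here is a dependency-free sketch that computes the same four quantities by hand on a tiny made-up example. The function `macro_metrics` and the toy labels are assumptions for illustration only; the notebook itself relies on the `evaluate` library.

```python
def macro_metrics(predictions, references, num_classes):
    """Accuracy plus macro-averaged precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    for cls in range(num_classes):
        tp = sum(1 for p, r in zip(predictions, references) if p == cls and r == cls)
        predicted = sum(1 for p in predictions if p == cls)
        actual = sum(1 for r in references if r == cls)
        # Fall back to 0.0 when a denominator is empty, mirroring zero_division=0.
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for p, r in zip(predictions, references) if p == r) / len(references)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy data: three classes with two samples each.
toy_references = [0, 0, 1, 1, 2, 2]
toy_predictions = [0, 1, 1, 1, 2, 0]
toy_result = macro_metrics(toy_predictions, toy_references, num_classes=3)
print(toy_result)
```

When every class has the same number of samples, as in the CIFAR10 test split, macro recall equals accuracy, which is why the Accuracy and Recall columns in the training logs are identical.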

Training arguments for the trainer.

In [18]:
class Custom_training_args(TrainingArguments):
    def __init__(self, lambda_param, temperature, *args, **kwargs):
        super().__init__(*args, **kwargs)    
        self.lambda_param = lambda_param
        self.temperature = temperature

In [19]:
def get_training_args(output_dir:str, logging_dir:str, remove_unused_columns:bool):
    return (
        Custom_training_args(
        output_dir=output_dir,
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="epoch",
        learning_rate=5e-5,  # default value
        per_device_train_batch_size=128,
        per_device_eval_batch_size=128,
        num_train_epochs=20,
        weight_decay=0.01,
        seed=42,  # default value
        metric_for_best_model="f1",
        fp16=True, 
        logging_dir=logging_dir,
        remove_unused_columns=remove_unused_columns,
        lambda_param = 0.5, 
        temperature = 5
    ))
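The two extra arguments, `lambda_param` and `temperature`, are the usual knobs of a knowledge-distillation loss. This section does not show how the trainer combines them, so the sketch below assumes one common formulation: a weighted sum of the hard-label cross-entropy and a temperature-softened KL divergence scaled by the squared temperature. It is written in plain Python for a single sample, and `softmax` and `distillation_loss` are hypothetical helper names.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      lambda_param=0.5, temperature=5.0):
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the unsoftened student distribution.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - lambda_param) * ce + lambda_param * temperature ** 2 * kl


# Made-up logits for a three-class example.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
loss = distillation_loss(student, teacher, label=0)
print(f"{loss:.4f}")
```

With `lambda_param = 0.5` both terms carry equal weight, and the softened term vanishes when the student reproduces the teacher's logits exactly.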

Randomly initialized MobileNetV2.

In [20]:
def get_random_init_mobilenet():
    reset_seed(42)
    student_config = MobileNetV2Config()
    student_config.num_labels = 10
    return MobileNetV2ForImageClassification(student_config)

In [21]:
reset_seed(42)

In [22]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Applying the transformations to the dataset.

In [23]:
transform = transforms.Compose([
    transforms.Resize((224, 224)), 
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
print(os.getcwd())
test = CustomCIFAR10(root='./data/10-logits', train=False, transform=transform)
train = CustomCIFAR10(root='./data/10-logits', train=True, transform=transform)

/home/jovyan
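As a quick check on the value range produced by the transform above: `ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then maps that interval to [-1, 1]. The sketch below reproduces the per-value arithmetic in plain Python; the function names are illustrative assumptions.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value to [0, 1], as ToTensor does."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalisation: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0, 128, 255):
    print(pixel, normalize(to_unit_range(pixel)))
```

Black pixels end up at -1 and white pixels at 1, so the network input is roughly centred around zero.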


### Standard training of the randomly initialized model

In [24]:
training_args = get_training_args("./results/cifar10-random", './logs/cifar10-random', True)

In [25]:
def hp_space(trial):
    params =  {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "adam_beta1" : trial.suggest_float("adam_beta1", 0.9, 0.99, step=0.01)
    }   
    print(f"Trial {trial.number} with params: {params}")
    return params
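The search space mixes two kinds of ranges: `log=True` asks for the learning rate to be drawn in the log domain, and `step=...` restricts a value to an evenly spaced grid. The stdlib-only sketch below illustrates both ideas with plain uniform draws. It does not reproduce Optuna's TPE sampler, and the helper names are assumptions.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw a value from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(42)
learning_rates = [sample_log_uniform(rng, 1e-6, 5e-4) for _ in range(1000)]
weight_decays = [sample_stepped(rng, 0.0, 1e-2, 1e-3) for _ in range(1000)]

print(min(learning_rates), max(learning_rates))
print(sorted(set(weight_decays)))
```

Sampling in the log domain gives each order of magnitude between 1e-6 and 5e-4 a comparable share of trials. The stepped grid also explains values such as 0.009000000000000001 in the logs, which are floating-point artefacts of multiplying the step.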

In [26]:
# Convert epochs to optimizer steps
min_r = math.ceil(50000/128)*3
max_r = math.ceil(50000/128)*20
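The arithmetic behind these two bounds, written out as a worked example. It assumes a single device and no gradient accumulation, so that one optimizer step consumes one batch of 128 of the 50,000 training images. The variable names here are illustrative.

```python
import math

TRAIN_SAMPLES = 50_000  # size of the CIFAR10 training split
BATCH_SIZE = 128        # per_device_train_batch_size on a single device

# The last, partial batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
min_resource = steps_per_epoch * 3    # steps in 3 epochs
max_resource = steps_per_epoch * 20   # steps in 20 epochs

print(steps_per_epoch)  # 391
print(min_resource)     # 1173
print(max_resource)     # 7820
```

Expressing the pruner's resources in steps only makes sense if intermediate results are reported per optimizer step rather than per epoch; that is the assumption behind this conversion.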

In [27]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)



In [28]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=compute_metrics,
    model_init=get_random_init_mobilenet
  )
  

In [16]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Test",
    n_trials=150
)

[I 2025-01-04 21:20:33,926] A new study created in memory with name: Test


Trial 0 with params: {'learning_rate': 1.0253509690168497e-05, 'weight_decay': 0.01, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3246,1.952858,0.2791,0.279871,0.2791,0.259486
2,1.9593,1.712043,0.3508,0.347448,0.3508,0.342523
3,1.8109,1.617426,0.3922,0.389229,0.3922,0.385845


[I 2025-01-04 21:27:26,053] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 4.128205343826226e-05, 'weight_decay': 0.001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9742,1.53719,0.4307,0.429679,0.4307,0.424683
2,1.5412,1.330349,0.5157,0.515273,0.5157,0.510939
3,1.3125,1.170791,0.5809,0.580427,0.5809,0.578425
4,1.1445,1.056477,0.6237,0.618623,0.6237,0.617743
5,1.0126,1.003619,0.6454,0.659922,0.6454,0.646621
6,0.9023,0.950919,0.6697,0.673489,0.6697,0.665197


[I 2025-01-04 21:41:07,363] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.4347159517201402e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4605,2.282299,0.1106,0.10778,0.1106,0.042568
2,2.4206,2.251575,0.1534,0.117842,0.1534,0.102382
3,2.386,2.220645,0.1817,0.190924,0.1817,0.128917
4,2.3441,2.159038,0.2124,0.210841,0.2124,0.158581
5,2.2976,2.121108,0.2225,0.213822,0.2225,0.180521
6,2.2322,2.050332,0.251,0.245453,0.251,0.211239
7,2.1638,1.967224,0.2743,0.267993,0.2743,0.240347
8,2.1014,1.944384,0.2863,0.281919,0.2863,0.259355
9,2.0626,1.88449,0.2925,0.290812,0.2925,0.268828
10,2.0277,1.870312,0.3018,0.294115,0.3018,0.283157


[I 2025-01-04 22:08:40,232] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 8.14829321010529e-05, 'weight_decay': 0.0, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8941,1.430043,0.4765,0.470782,0.4765,0.469335
2,1.3893,1.165547,0.5748,0.576051,0.5748,0.573984
3,1.1102,0.983707,0.6453,0.653153,0.6453,0.645667
4,0.9095,0.865909,0.6929,0.693976,0.6929,0.687767
5,0.7547,0.792643,0.7203,0.733455,0.7203,0.722677
6,0.6192,0.748712,0.7448,0.752064,0.7448,0.744119


[I 2025-01-04 22:21:34,354] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0001764971584817573, 'weight_decay': 0.002, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6406,1.127816,0.5875,0.588188,0.5875,0.585114
2,1.0209,0.820439,0.7112,0.718462,0.7112,0.713173
3,0.7381,0.670809,0.7674,0.771259,0.7674,0.766575
4,0.5715,0.586457,0.802,0.804205,0.802,0.801382
5,0.4413,0.590939,0.7975,0.805855,0.7975,0.798904
6,0.3355,0.586605,0.8099,0.820852,0.8099,0.810224
7,0.2506,0.587743,0.8146,0.818786,0.8146,0.813713
8,0.1754,0.60947,0.8218,0.82409,0.8218,0.819391
9,0.1259,0.609261,0.8274,0.830458,0.8274,0.827604
10,0.0875,0.614316,0.8285,0.83009,0.8285,0.828583


[I 2025-01-04 22:47:15,137] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 3.1261029103110603e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4351,2.242303,0.1635,0.136943,0.1635,0.111866
2,2.3502,2.14284,0.2235,0.210075,0.2235,0.178484
3,2.2007,1.968147,0.2729,0.274187,0.2729,0.247437
4,2.0476,1.852924,0.3054,0.295098,0.3054,0.287945
5,1.9712,1.791265,0.3225,0.318366,0.3225,0.305945
6,1.9223,1.761298,0.3419,0.334431,0.3419,0.330193
7,1.8883,1.718303,0.3538,0.348998,0.3538,0.345933
8,1.8572,1.729753,0.3525,0.350417,0.3525,0.33989
9,1.8349,1.68352,0.361,0.359587,0.361,0.35687
10,1.8103,1.661988,0.3759,0.369264,0.3759,0.368133


[I 2025-01-04 23:13:38,212] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 1.4648955132800731e-05, 'weight_decay': 0.003, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2563,1.826764,0.3087,0.31461,0.3087,0.292582
2,1.8794,1.647966,0.3774,0.375282,0.3774,0.368815
3,1.7187,1.529115,0.4315,0.429389,0.4315,0.422839
4,1.5983,1.434979,0.4706,0.466812,0.4706,0.463697
5,1.4963,1.387527,0.4919,0.492083,0.4919,0.487411
6,1.4176,1.318753,0.5203,0.516597,0.5203,0.513999
7,1.3518,1.283824,0.5315,0.53481,0.5315,0.530667
8,1.2935,1.290219,0.537,0.528555,0.537,0.52647
9,1.2428,1.210784,0.5538,0.562601,0.5538,0.554204
10,1.1966,1.201471,0.5659,0.563295,0.5659,0.560735


[I 2025-01-04 23:59:14,928] Trial 6 finished with value: 0.5950815756680115 and parameters: {'learning_rate': 1.4648955132800731e-05, 'weight_decay': 0.003, 'adam_beta1': 0.96}. Best is trial 6 with value: 0.5950815756680115.


Trial 7 with params: {'learning_rate': 2.379522116387725e-06, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4468,2.260634,0.1454,0.1549,0.1454,0.097441
2,2.3863,2.205066,0.1901,0.161216,0.1901,0.136788
3,2.3062,2.111651,0.2298,0.221321,0.2298,0.189907
4,2.1789,1.956355,0.2692,0.257775,0.2692,0.234927
5,2.0694,1.879248,0.2902,0.288061,0.2902,0.261908
6,2.0033,1.830837,0.3178,0.311658,0.3178,0.29864


[I 2025-01-05 00:12:52,817] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 1.7018418817029176e-05, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1899,1.748065,0.3472,0.346124,0.3472,0.338002
2,1.8072,1.573016,0.4137,0.411779,0.4137,0.406236
3,1.6486,1.469678,0.4623,0.45938,0.4623,0.456181
4,1.5265,1.391242,0.4942,0.486963,0.4942,0.486008
5,1.43,1.337853,0.5166,0.519818,0.5166,0.512017
6,1.3466,1.266646,0.5423,0.535865,0.5423,0.534215
7,1.2734,1.223051,0.5558,0.561654,0.5558,0.556145
8,1.2093,1.206107,0.5643,0.557,0.5643,0.556482
9,1.154,1.157013,0.5804,0.590095,0.5804,0.582638
10,1.1061,1.135454,0.5876,0.585054,0.5876,0.583652


[I 2025-01-05 00:55:40,712] Trial 8 finished with value: 0.6208557778745601 and parameters: {'learning_rate': 1.7018418817029176e-05, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 8 with value: 0.6208557778745601.


Trial 9 with params: {'learning_rate': 2.4428866967349976e-05, 'weight_decay': 0.006, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1049,1.678595,0.3648,0.362437,0.3648,0.356758
2,1.71,1.476397,0.4493,0.444266,0.4493,0.444005
3,1.5237,1.371078,0.4975,0.492691,0.4975,0.490335
4,1.3954,1.277173,0.537,0.531524,0.537,0.529376
5,1.2839,1.230579,0.5537,0.559545,0.5537,0.550563
6,1.1989,1.151359,0.5843,0.583187,0.5843,0.577479
7,1.1141,1.112761,0.6025,0.608018,0.6025,0.602562
8,1.0469,1.099864,0.609,0.605735,0.609,0.601062
9,0.9814,1.062936,0.6153,0.629036,0.6153,0.619208
10,0.9276,1.023877,0.6338,0.634735,0.6338,0.632403


[I 2025-01-05 01:38:23,211] Trial 9 finished with value: 0.6587412372988884 and parameters: {'learning_rate': 2.4428866967349976e-05, 'weight_decay': 0.006, 'adam_beta1': 0.9}. Best is trial 9 with value: 0.6587412372988884.


Trial 10 with params: {'learning_rate': 0.00026497374689934315, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5501,1.07052,0.6095,0.61599,0.6095,0.60885
2,0.9387,0.791795,0.7222,0.733242,0.7222,0.722489
3,0.6961,0.640728,0.7764,0.782317,0.7764,0.775712


[I 2025-01-05 01:44:45,380] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 6.527343955903165e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3908,2.129595,0.2269,0.214195,0.2269,0.193048
2,2.1013,1.828567,0.3235,0.322282,0.3235,0.307401
3,1.9237,1.732433,0.3536,0.352255,0.3536,0.347266


[I 2025-01-05 01:51:07,505] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 3.134890324531348e-05, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0415,1.604173,0.3981,0.387857,0.3981,0.389202
2,1.6263,1.405563,0.482,0.480198,0.482,0.477872
3,1.409,1.245762,0.5494,0.543667,0.5494,0.540694
4,1.2537,1.147284,0.5863,0.57934,0.5863,0.577982
5,1.1328,1.100701,0.6041,0.614502,0.6041,0.603709
6,1.0259,1.017457,0.6387,0.640086,0.6387,0.634207


[I 2025-01-05 02:03:56,905] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 1.6056817955297837e-05, 'weight_decay': 0.007, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2195,1.784407,0.3307,0.334054,0.3307,0.318914
2,1.835,1.59344,0.4042,0.406138,0.4042,0.397507
3,1.6717,1.488459,0.4516,0.446642,0.4516,0.444709
4,1.5608,1.411375,0.4861,0.481367,0.4861,0.479865
5,1.4684,1.374187,0.4949,0.500257,0.4949,0.491805
6,1.3935,1.297013,0.5264,0.521843,0.5264,0.517804


[I 2025-01-05 02:16:47,622] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 1.947699682751316e-06, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4531,2.270793,0.1296,0.113251,0.1296,0.075413
2,2.4031,2.227194,0.174,0.142878,0.174,0.120905
3,2.3436,2.164479,0.2065,0.191618,0.2065,0.159332
4,2.2618,2.048353,0.2522,0.23654,0.2522,0.209989
5,2.1583,1.963016,0.2683,0.263557,0.2683,0.230846
6,2.0769,1.900415,0.292,0.29083,0.292,0.264963
7,2.0239,1.836032,0.3111,0.304448,0.3111,0.292289
8,1.9822,1.836321,0.3138,0.311642,0.3138,0.295086
9,1.9562,1.789328,0.3188,0.315088,0.3188,0.307131
10,1.932,1.776399,0.3336,0.325596,0.3336,0.324586


[I 2025-01-05 02:42:24,410] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 3.7646643049236884e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.002,1.577376,0.4059,0.402007,0.4059,0.392889
2,1.5902,1.369401,0.4973,0.497588,0.4973,0.49068
3,1.3685,1.21385,0.5634,0.559047,0.5634,0.557309
4,1.1987,1.10486,0.6036,0.598958,0.6036,0.596596
5,1.0733,1.061325,0.6223,0.639277,0.6223,0.624395
6,0.9606,0.978991,0.6552,0.657911,0.6552,0.651362
7,0.8636,0.932991,0.6701,0.677782,0.6701,0.670745
8,0.7748,0.893197,0.6885,0.685967,0.6885,0.684532
9,0.6987,0.878446,0.6965,0.707104,0.6965,0.699116
10,0.6292,0.860722,0.6993,0.705631,0.6993,0.698749


[I 2025-01-05 03:25:18,187] Trial 15 finished with value: 0.7115095943026561 and parameters: {'learning_rate': 3.7646643049236884e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}. Best is trial 15 with value: 0.7115095943026561.


Trial 16 with params: {'learning_rate': 4.793568139083467e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9636,1.517127,0.4326,0.426099,0.4326,0.425027
2,1.5189,1.301817,0.5258,0.524072,0.5258,0.52115
3,1.2805,1.13664,0.5954,0.593616,0.5954,0.588324
4,1.0965,1.018689,0.6371,0.63767,0.6371,0.633892
5,0.9563,0.948012,0.6657,0.674683,0.6657,0.667007
6,0.8327,0.873027,0.6947,0.701483,0.6947,0.693779
7,0.7312,0.848679,0.7033,0.713526,0.7033,0.704211
8,0.639,0.84095,0.7082,0.709359,0.7082,0.703627
9,0.5556,0.813897,0.7222,0.730861,0.7222,0.724781
10,0.4772,0.819984,0.7233,0.733256,0.7233,0.723457


[I 2025-01-05 04:07:55,253] Trial 16 finished with value: 0.7264228400259334 and parameters: {'learning_rate': 4.793568139083467e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}. Best is trial 16 with value: 0.7264228400259334.


Trial 17 with params: {'learning_rate': 4.5954116842262964e-05, 'weight_decay': 0.01, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9592,1.498164,0.4406,0.434725,0.4406,0.429227
2,1.5103,1.284363,0.5348,0.53596,0.5348,0.53189
3,1.2699,1.130308,0.5946,0.593785,0.5946,0.590546
4,1.0968,1.015123,0.6339,0.633395,0.6339,0.628984
5,0.9578,0.960816,0.6583,0.674505,0.6583,0.660201
6,0.8375,0.880179,0.6891,0.695175,0.6891,0.687079
7,0.7295,0.840937,0.7079,0.714594,0.7079,0.708108
8,0.6416,0.842875,0.7126,0.713827,0.7126,0.70855
9,0.5563,0.829616,0.7202,0.728773,0.7202,0.722622
10,0.4817,0.818117,0.7218,0.727749,0.7218,0.721353


[I 2025-01-05 04:50:31,890] Trial 17 finished with value: 0.7324007888083972 and parameters: {'learning_rate': 4.5954116842262964e-05, 'weight_decay': 0.01, 'adam_beta1': 0.93}. Best is trial 17 with value: 0.7324007888083972.


Trial 18 with params: {'learning_rate': 0.00013765420776152537, 'weight_decay': 0.006, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7098,1.246266,0.5434,0.55002,0.5434,0.541676
2,1.0932,0.892218,0.6882,0.688161,0.6882,0.686755
3,0.8157,0.725626,0.7488,0.755123,0.7488,0.748851
4,0.6321,0.618335,0.788,0.788621,0.788,0.787188
5,0.4919,0.60125,0.7934,0.802161,0.7934,0.794982
6,0.3776,0.600538,0.8021,0.815965,0.8021,0.803029


[I 2025-01-05 05:03:21,236] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00013579422085040563, 'weight_decay': 0.01, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7034,1.203514,0.5646,0.568082,0.5646,0.562523
2,1.0906,0.919844,0.6724,0.686609,0.6724,0.672239
3,0.7997,0.729169,0.7411,0.748621,0.7411,0.741454


[I 2025-01-05 05:09:45,228] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00017908732484905353, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6761,1.188751,0.5694,0.569628,0.5694,0.568315
2,1.0513,0.830241,0.7062,0.705964,0.7062,0.70553
3,0.7493,0.679388,0.7621,0.767416,0.7621,0.761431
4,0.5746,0.594829,0.7961,0.796734,0.7961,0.793515
5,0.4486,0.587099,0.8003,0.809539,0.8003,0.801703
6,0.3393,0.560434,0.8142,0.820768,0.8142,0.813883
7,0.2484,0.601537,0.8162,0.820883,0.8162,0.815266
8,0.1772,0.615484,0.8165,0.819076,0.8165,0.813967
9,0.1265,0.621376,0.825,0.832725,0.825,0.826762
10,0.0896,0.653146,0.8222,0.829261,0.8222,0.822789


[I 2025-01-05 05:52:17,516] Trial 20 finished with value: 0.8436343253621142 and parameters: {'learning_rate': 0.00017908732484905353, 'weight_decay': 0.01, 'adam_beta1': 0.96}. Best is trial 20 with value: 0.8436343253621142.


Trial 21 with params: {'learning_rate': 0.00024796752162416176, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.604,1.125922,0.595,0.592714,0.595,0.58988
2,1.0082,0.819158,0.7081,0.7107,0.7081,0.70588
3,0.7359,0.703378,0.7554,0.764378,0.7554,0.754871
4,0.571,0.572495,0.8047,0.805637,0.8047,0.80164
5,0.4527,0.588036,0.7942,0.80576,0.7942,0.794658
6,0.3546,0.56029,0.8126,0.826051,0.8126,0.812595
7,0.2675,0.581054,0.8168,0.825978,0.8168,0.816683
8,0.2006,0.555858,0.8316,0.83469,0.8316,0.830775
9,0.1427,0.595301,0.8284,0.839054,0.8284,0.830749
10,0.1048,0.573688,0.8383,0.842008,0.8383,0.838236


[I 2025-01-05 06:34:57,766] Trial 21 finished with value: 0.8460513731033876 and parameters: {'learning_rate': 0.00024796752162416176, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}. Best is trial 21 with value: 0.8460513731033876.


Trial 22 with params: {'learning_rate': 0.0004413674387972158, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5263,1.072006,0.6091,0.606467,0.6091,0.602693
2,0.976,0.848625,0.6972,0.704602,0.6972,0.696241
3,0.7541,0.691066,0.7536,0.757052,0.7536,0.753133
4,0.6088,0.630164,0.7815,0.785131,0.7815,0.779897
5,0.5171,0.610278,0.7904,0.801282,0.7904,0.79152
6,0.4315,0.544594,0.8145,0.827902,0.8145,0.815427
7,0.3552,0.572938,0.8153,0.824265,0.8153,0.815738
8,0.2818,0.549921,0.8265,0.83069,0.8265,0.824541
9,0.2303,0.536593,0.8358,0.841596,0.8358,0.836641
10,0.1788,0.520882,0.843,0.844831,0.843,0.842888


[I 2025-01-05 07:17:39,913] Trial 22 finished with value: 0.86291730276543 and parameters: {'learning_rate': 0.0004413674387972158, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.99}. Best is trial 22 with value: 0.86291730276543.


Trial 23 with params: {'learning_rate': 0.00040595483908833195, 'weight_decay': 0.008, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6046,1.151673,0.5766,0.572685,0.5766,0.567392
2,1.0462,0.877985,0.6785,0.68861,0.6785,0.68058
3,0.7993,0.707619,0.7486,0.751764,0.7486,0.748546
4,0.6486,0.644641,0.7777,0.779473,0.7777,0.776084
5,0.5362,0.642922,0.7768,0.792114,0.7768,0.777777
6,0.4519,0.595346,0.7975,0.817441,0.7975,0.798586


[I 2025-01-05 07:30:27,576] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0003306866048037238, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5338,1.049941,0.6124,0.613865,0.6124,0.60947
2,0.9302,0.742202,0.7376,0.737658,0.7376,0.735975
3,0.6826,0.646021,0.7788,0.784694,0.7788,0.777208
4,0.5521,0.557602,0.8142,0.815036,0.8142,0.812262
5,0.4443,0.53778,0.8192,0.826836,0.8192,0.820028
6,0.3591,0.541463,0.8192,0.836789,0.8192,0.821371
7,0.2847,0.49979,0.8369,0.841481,0.8369,0.836534
8,0.223,0.523813,0.838,0.841018,0.838,0.836406
9,0.1645,0.518979,0.8455,0.852752,0.8455,0.847362
10,0.1239,0.493854,0.8542,0.857621,0.8542,0.854677


[I 2025-01-05 07:56:04,290] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.00015871762956826775, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7296,1.264098,0.5315,0.525612,0.5315,0.521144
2,1.1628,0.948111,0.6611,0.666969,0.6611,0.660386
3,0.8737,0.768148,0.7302,0.730452,0.7302,0.729357


[I 2025-01-05 08:02:27,685] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 5.2161675322108636e-05, 'weight_decay': 0.005, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0434,1.576046,0.4048,0.399431,0.4048,0.393377
2,1.5752,1.320534,0.5168,0.507791,0.5168,0.509507
3,1.3187,1.174596,0.5763,0.576565,0.5763,0.572134


[I 2025-01-05 08:08:51,458] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 4.848784343928481e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9818,1.528348,0.4314,0.429014,0.4314,0.421205
2,1.5096,1.275821,0.5307,0.528413,0.5307,0.527289
3,1.2682,1.131454,0.5953,0.594555,0.5953,0.589652


[I 2025-01-05 08:15:15,614] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.00022623511831356313, 'weight_decay': 0.007, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6436,1.151175,0.5855,0.581571,0.5855,0.580988
2,1.0089,0.81074,0.7108,0.714458,0.7108,0.711026
3,0.7413,0.698268,0.7549,0.760881,0.7549,0.751885
4,0.5745,0.59562,0.7954,0.796538,0.7954,0.794411
5,0.4617,0.579388,0.8007,0.809625,0.8007,0.801779
6,0.3654,0.548162,0.8187,0.826444,0.8187,0.818501
7,0.2701,0.579607,0.8163,0.825852,0.8163,0.817796
8,0.2016,0.579981,0.8172,0.819854,0.8172,0.815249
9,0.149,0.602065,0.8223,0.830631,0.8223,0.824271
10,0.1043,0.593555,0.8306,0.835582,0.8306,0.831068


[I 2025-01-05 08:40:50,503] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 6.187797075267839e-05, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9265,1.449535,0.4644,0.461455,0.4644,0.459055
2,1.4263,1.206546,0.5602,0.565743,0.5602,0.556031
3,1.16,1.018002,0.6329,0.635851,0.6329,0.630489


[I 2025-01-05 08:47:14,410] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00040351618359122916, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5302,1.107742,0.5958,0.594683,0.5958,0.587103
2,1.0136,0.847365,0.6985,0.707735,0.6985,0.701026
3,0.7764,0.718215,0.754,0.760479,0.754,0.754562
4,0.6245,0.641368,0.7779,0.779005,0.7779,0.774775
5,0.5215,0.614973,0.7881,0.800707,0.7881,0.78849
6,0.4405,0.55185,0.8099,0.824994,0.8099,0.810899
7,0.3565,0.55271,0.8205,0.827275,0.8205,0.820768
8,0.2914,0.544716,0.8305,0.83093,0.8305,0.828199
9,0.2286,0.526493,0.8388,0.845154,0.8388,0.840409
10,0.1805,0.542583,0.8371,0.839678,0.8371,0.836822


[I 2025-01-05 09:12:49,493] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 7.501161458092744e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8637,1.435934,0.4681,0.470104,0.4681,0.464792
2,1.3459,1.111851,0.6015,0.598183,0.6015,0.597492
3,1.0483,0.924457,0.6729,0.676566,0.6729,0.671784


[I 2025-01-05 09:19:13,623] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00017987487341337253, 'weight_decay': 0.01, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6673,1.183231,0.567,0.56266,0.567,0.561249
2,1.0639,0.870205,0.6949,0.701186,0.6949,0.694775
3,0.7816,0.72269,0.7461,0.752602,0.7461,0.744001
4,0.5954,0.612397,0.7898,0.788797,0.7898,0.788245
5,0.4656,0.589297,0.7974,0.807671,0.7974,0.798929
6,0.3608,0.579994,0.8064,0.816853,0.8064,0.807168
7,0.2716,0.610317,0.8062,0.816345,0.8062,0.806156
8,0.1961,0.655771,0.8074,0.812743,0.8074,0.804916
9,0.1357,0.607155,0.8213,0.826973,0.8213,0.82247
10,0.0999,0.624281,0.829,0.834222,0.829,0.830235


[I 2025-01-05 09:44:49,029] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0001375060054089062, 'weight_decay': 0.007, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7838,1.317146,0.5127,0.506138,0.5127,0.501142
2,1.2444,1.021243,0.6281,0.632958,0.6281,0.627122
3,0.9596,0.866364,0.6954,0.701867,0.6954,0.695338
4,0.7532,0.762638,0.7332,0.73041,0.7332,0.729612
5,0.5955,0.668178,0.7669,0.77407,0.7669,0.767181
6,0.4703,0.6301,0.7794,0.790219,0.7794,0.780411
7,0.3614,0.645798,0.7854,0.793198,0.7854,0.785507
8,0.2724,0.691113,0.7817,0.785478,0.7817,0.779639
9,0.1902,0.684846,0.7914,0.803737,0.7914,0.794863
10,0.1322,0.728731,0.7882,0.79406,0.7882,0.788138


[I 2025-01-05 10:10:22,513] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 2.8038934896473335e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0774,1.644676,0.381,0.374977,0.381,0.370814
2,1.676,1.456053,0.4671,0.465184,0.4671,0.464632
3,1.4847,1.319991,0.5186,0.515205,0.5186,0.514397
4,1.3385,1.219909,0.5548,0.553928,0.5548,0.549895
5,1.2181,1.165114,0.5807,0.591533,0.5807,0.578669
6,1.1184,1.084092,0.6175,0.615457,0.6175,0.612354
7,1.0295,1.044356,0.6285,0.632898,0.6285,0.627711
8,0.954,1.00784,0.6463,0.641398,0.6463,0.641087
9,0.8821,0.996263,0.6467,0.660349,0.6467,0.650152
10,0.8202,0.95187,0.6606,0.66081,0.6606,0.659115


[I 2025-01-05 10:35:49,533] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0004925521491380374, 'weight_decay': 0.008, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4936,1.065401,0.6153,0.616797,0.6153,0.614399
2,0.9496,0.801986,0.7178,0.727551,0.7178,0.718673
3,0.7259,0.658534,0.7678,0.77433,0.7678,0.766697
4,0.5983,0.578264,0.8011,0.805685,0.8011,0.797801
5,0.4941,0.583973,0.7995,0.812949,0.7995,0.801593
6,0.4078,0.510527,0.8295,0.836786,0.8295,0.829517
7,0.336,0.53764,0.8276,0.833611,0.8276,0.826932
8,0.2732,0.536804,0.8312,0.835083,0.8312,0.829237
9,0.2115,0.502514,0.8457,0.851378,0.8457,0.846653
10,0.1695,0.484192,0.854,0.85749,0.854,0.853897


[I 2025-01-05 11:01:18,947] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 2.075398068791511e-06, 'weight_decay': 0.006, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4498,2.267393,0.1372,0.157429,0.1372,0.084057
2,2.4011,2.226266,0.1767,0.132207,0.1767,0.1235
3,2.3467,2.171937,0.2049,0.203067,0.2049,0.161622


[I 2025-01-05 11:07:41,466] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 3.92291334425912e-06, 'weight_decay': 0.0, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.43,2.229822,0.1785,0.174986,0.1785,0.125259
2,2.322,2.088967,0.2432,0.232828,0.2432,0.206842
3,2.129,1.899266,0.2954,0.292112,0.2954,0.277154
4,1.9891,1.797882,0.3216,0.313088,0.3216,0.311877
5,1.922,1.750868,0.3359,0.332163,0.3359,0.325653
6,1.8741,1.70966,0.3563,0.347011,0.3563,0.344639
7,1.8355,1.667521,0.3666,0.359263,0.3666,0.35771
8,1.8022,1.665559,0.3758,0.372505,0.3758,0.362887
9,1.773,1.63011,0.3919,0.392158,0.3919,0.387649
10,1.7466,1.609582,0.3987,0.392177,0.3987,0.390862


[I 2025-01-05 11:33:16,645] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00018168575078529773, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6516,1.166662,0.5741,0.573484,0.5741,0.570754
2,1.0166,0.830036,0.7045,0.714556,0.7045,0.705619
3,0.724,0.654419,0.7735,0.780981,0.7735,0.773478
4,0.5589,0.586825,0.7983,0.798835,0.7983,0.797153
5,0.4344,0.582735,0.8017,0.810444,0.8017,0.802967
6,0.3245,0.584696,0.8053,0.821958,0.8053,0.806304
7,0.239,0.588599,0.817,0.823193,0.817,0.815664
8,0.172,0.608273,0.8202,0.821215,0.8202,0.817997
9,0.1237,0.591844,0.8288,0.834629,0.8288,0.83003
10,0.0851,0.625611,0.8287,0.835308,0.8287,0.829591


[I 2025-01-05 12:15:55,227] Trial 38 finished with value: 0.8405213542354458 and parameters: {'learning_rate': 0.00018168575078529773, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 39 with params: {'learning_rate': 0.0002947977049247464, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5445,1.035609,0.6227,0.62702,0.6227,0.620208
2,0.935,0.755529,0.7345,0.743601,0.7345,0.73472
3,0.6858,0.619067,0.7835,0.789564,0.7835,0.78403
4,0.5442,0.580627,0.8047,0.804374,0.8047,0.803141
5,0.4433,0.551817,0.8136,0.82091,0.8136,0.814933
6,0.3483,0.543,0.8264,0.838439,0.8264,0.827464
7,0.2688,0.540105,0.828,0.833298,0.828,0.826846
8,0.2016,0.561683,0.8319,0.836564,0.8319,0.830493
9,0.1474,0.540595,0.84,0.84672,0.84,0.841428
10,0.1096,0.565936,0.8394,0.845226,0.8394,0.839216


[I 2025-01-05 12:58:29,990] Trial 39 finished with value: 0.860222532824702 and parameters: {'learning_rate': 0.0002947977049247464, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 40 with params: {'learning_rate': 0.00020652942222134038, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6026,1.095366,0.602,0.603605,0.602,0.600169
2,0.987,0.824548,0.7104,0.708952,0.7104,0.707536
3,0.721,0.656982,0.7696,0.774285,0.7696,0.767133
4,0.5627,0.591322,0.7942,0.792547,0.7942,0.791528
5,0.4466,0.571206,0.8063,0.815896,0.8063,0.808007
6,0.3423,0.564947,0.8174,0.833951,0.8174,0.819564
7,0.2522,0.566043,0.8227,0.827047,0.8227,0.82121
8,0.1845,0.597047,0.8204,0.823936,0.8204,0.818025
9,0.1337,0.59116,0.8285,0.838044,0.8285,0.83056
10,0.0949,0.608517,0.8334,0.837099,0.8334,0.833408


[I 2025-01-05 13:41:12,177] Trial 40 finished with value: 0.8506766774228713 and parameters: {'learning_rate': 0.00020652942222134038, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 41 with params: {'learning_rate': 0.0004276250043872556, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.465,1.002748,0.6335,0.630586,0.6335,0.627556
2,0.9202,0.740912,0.7415,0.745255,0.7415,0.74014
3,0.6867,0.639018,0.7771,0.784604,0.7771,0.776443
4,0.5534,0.59532,0.7962,0.797429,0.7962,0.794177
5,0.4686,0.558515,0.8051,0.812614,0.8051,0.805744
6,0.379,0.511824,0.8265,0.838594,0.8265,0.826624
7,0.3047,0.551869,0.8264,0.831876,0.8264,0.824582
8,0.2471,0.600445,0.8246,0.831169,0.8246,0.821463
9,0.1903,0.511897,0.8458,0.850208,0.8458,0.846498
10,0.1478,0.508903,0.8506,0.853311,0.8506,0.850274


[I 2025-01-05 14:23:54,193] Trial 41 finished with value: 0.8663758905377342 and parameters: {'learning_rate': 0.0004276250043872556, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}. Best is trial 41 with value: 0.8663758905377342.


Trial 42 with params: {'learning_rate': 0.0004670686683720471, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4563,0.988651,0.6338,0.63298,0.6338,0.628775
2,0.8962,0.741054,0.7441,0.748267,0.7441,0.742493
3,0.6734,0.619009,0.7855,0.79321,0.7855,0.785067
4,0.5529,0.531752,0.8175,0.819251,0.8175,0.816925
5,0.4538,0.560722,0.8073,0.822262,0.8073,0.8105
6,0.3752,0.527844,0.8252,0.842409,0.8252,0.826647
7,0.309,0.555326,0.827,0.833333,0.827,0.825437
8,0.252,0.524185,0.8379,0.84081,0.8379,0.836136
9,0.197,0.472335,0.8519,0.855355,0.8519,0.852688
10,0.1497,0.51406,0.8488,0.852637,0.8488,0.847909


[I 2025-01-05 14:49:31,192] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0004725747942797692, 'weight_decay': 0.008, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5202,1.049342,0.6168,0.616421,0.6168,0.612083
2,0.9645,0.779385,0.7232,0.730057,0.7232,0.722855
3,0.7218,0.655909,0.7668,0.776302,0.7668,0.767587
4,0.5814,0.577393,0.8027,0.803525,0.8027,0.799994
5,0.483,0.537759,0.8149,0.819047,0.8149,0.815666
6,0.3965,0.536287,0.8172,0.841473,0.8172,0.820237
7,0.3229,0.513649,0.8319,0.837585,0.8319,0.830164
8,0.2594,0.508824,0.8396,0.842581,0.8396,0.837451
9,0.2007,0.499723,0.8437,0.849736,0.8437,0.844718
10,0.155,0.519655,0.8469,0.853087,0.8469,0.847738


[I 2025-01-05 15:32:11,073] Trial 43 finished with value: 0.8734249573321554 and parameters: {'learning_rate': 0.0004725747942797692, 'weight_decay': 0.008, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 44 with params: {'learning_rate': 0.0002586202470904171, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.572,1.047924,0.6226,0.623142,0.6226,0.6201
2,0.9539,0.788371,0.725,0.727569,0.725,0.724647
3,0.6946,0.650969,0.7722,0.778209,0.7722,0.772804
4,0.5383,0.583792,0.7985,0.800123,0.7985,0.797708
5,0.4297,0.58956,0.7993,0.809862,0.7993,0.800428
6,0.3355,0.550477,0.8176,0.836395,0.8176,0.820118
7,0.2543,0.571033,0.8207,0.82696,0.8207,0.819353
8,0.1926,0.573277,0.8255,0.829198,0.8255,0.823739
9,0.1382,0.566545,0.8368,0.844027,0.8368,0.838013
10,0.1034,0.565774,0.8443,0.84653,0.8443,0.844147


[I 2025-01-05 16:14:50,466] Trial 44 finished with value: 0.8570214282825903 and parameters: {'learning_rate': 0.0002586202470904171, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 45 with params: {'learning_rate': 0.0002772250200432083, 'weight_decay': 0.007, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5561,1.047056,0.6191,0.616475,0.6191,0.615019
2,0.9374,0.771624,0.728,0.730108,0.728,0.726273
3,0.6842,0.642774,0.7797,0.78775,0.7797,0.780383
4,0.5397,0.551972,0.8052,0.80588,0.8052,0.803279
5,0.431,0.558962,0.8135,0.822577,0.8135,0.814913
6,0.34,0.554033,0.8219,0.836892,0.8219,0.823944
7,0.2618,0.571558,0.8256,0.832446,0.8256,0.824782
8,0.196,0.569202,0.832,0.835001,0.832,0.83016
9,0.1414,0.554201,0.8375,0.845658,0.8375,0.839308
10,0.1052,0.565717,0.8438,0.846079,0.8438,0.843327


[I 2025-01-05 16:57:22,535] Trial 45 finished with value: 0.8610621552023506 and parameters: {'learning_rate': 0.0002772250200432083, 'weight_decay': 0.007, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 46 with params: {'learning_rate': 0.0004397901974583414, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4823,1.051332,0.618,0.61644,0.618,0.613575
2,0.9168,0.745865,0.7366,0.746159,0.7366,0.73634
3,0.6945,0.640588,0.7728,0.779592,0.7728,0.772126
4,0.5601,0.57268,0.8046,0.804024,0.8046,0.80225
5,0.4696,0.560456,0.8064,0.812624,0.8064,0.806883
6,0.3819,0.506327,0.8302,0.839951,0.8302,0.8307
7,0.3063,0.5073,0.8331,0.839835,0.8331,0.832498
8,0.2445,0.547802,0.8318,0.833204,0.8318,0.82896
9,0.1942,0.515192,0.8412,0.84665,0.8412,0.842383
10,0.1439,0.517146,0.8489,0.853143,0.8489,0.849248


[I 2025-01-05 17:39:56,002] Trial 46 finished with value: 0.8680805975880987 and parameters: {'learning_rate': 0.0004397901974583414, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 47 with params: {'learning_rate': 0.0004797607389782365, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4959,1.08307,0.6109,0.615528,0.6109,0.608715
2,0.9584,0.770279,0.7316,0.736569,0.7316,0.731607
3,0.7143,0.635144,0.7778,0.787581,0.7778,0.778356
4,0.577,0.584406,0.802,0.802784,0.802,0.800133
5,0.478,0.543707,0.8134,0.818064,0.8134,0.814049
6,0.3977,0.542697,0.8202,0.834782,0.8202,0.820668
7,0.3224,0.529765,0.832,0.837915,0.832,0.831131
8,0.2643,0.553548,0.8264,0.832056,0.8264,0.824555
9,0.2055,0.475171,0.8543,0.857102,0.8543,0.854533
10,0.1576,0.521419,0.8476,0.853592,0.8476,0.848236


[I 2025-01-05 18:22:51,755] Trial 47 finished with value: 0.8720004894730644 and parameters: {'learning_rate': 0.0004797607389782365, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 48 with params: {'learning_rate': 0.0003492552435215607, 'weight_decay': 0.002, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5225,1.026445,0.6289,0.634853,0.6289,0.62711
2,0.9372,0.793807,0.7236,0.726657,0.7236,0.723118
3,0.7034,0.65558,0.7682,0.77488,0.7682,0.767834
4,0.5599,0.567577,0.8043,0.805656,0.8043,0.801937
5,0.4542,0.570614,0.8088,0.819051,0.8088,0.810261
6,0.3666,0.543063,0.8239,0.832641,0.8239,0.823988
7,0.2884,0.551899,0.8269,0.833216,0.8269,0.825696
8,0.2208,0.568634,0.8331,0.837188,0.8331,0.830852
9,0.1643,0.519863,0.8443,0.848539,0.8443,0.845071
10,0.1225,0.51418,0.8528,0.853625,0.8528,0.852203


[I 2025-01-05 19:05:47,283] Trial 48 finished with value: 0.8628823283516527 and parameters: {'learning_rate': 0.0003492552435215607, 'weight_decay': 0.002, 'adam_beta1': 0.96}. Best is trial 43 with value: 0.8734249573321554.


Trial 49 with params: {'learning_rate': 0.0003267485508168292, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5037,1.02345,0.6326,0.629438,0.6326,0.6286
2,0.9216,0.760325,0.7338,0.735409,0.7338,0.733889
3,0.6831,0.616931,0.7885,0.796538,0.7885,0.787626
4,0.5444,0.562224,0.8098,0.81057,0.8098,0.80843
5,0.4408,0.543788,0.8159,0.823848,0.8159,0.817148
6,0.3548,0.551357,0.8169,0.831343,0.8169,0.818734
7,0.2802,0.529237,0.8305,0.835374,0.8305,0.8293
8,0.2159,0.516477,0.8365,0.837907,0.8365,0.834413
9,0.1623,0.517359,0.8486,0.858577,0.8486,0.850679
10,0.119,0.524445,0.8496,0.852097,0.8496,0.849438


[I 2025-01-05 19:48:45,071] Trial 49 finished with value: 0.8650364930541079 and parameters: {'learning_rate': 0.0003267485508168292, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 50 with params: {'learning_rate': 0.000448494834140905, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4748,0.999107,0.6378,0.637541,0.6378,0.635556
2,0.9235,0.764875,0.7381,0.742248,0.7381,0.736283
3,0.6857,0.650859,0.769,0.784805,0.769,0.770706
4,0.5587,0.553325,0.814,0.815186,0.814,0.812562
5,0.4617,0.548922,0.8086,0.818335,0.8086,0.810682
6,0.3787,0.5417,0.8193,0.837565,0.8193,0.821123
7,0.3083,0.50752,0.8386,0.843065,0.8386,0.837577
8,0.2413,0.558687,0.8308,0.83435,0.8308,0.828627
9,0.1872,0.500504,0.8507,0.855137,0.8507,0.851544
10,0.1435,0.492879,0.85,0.855243,0.85,0.851166


[I 2025-01-05 20:31:26,854] Trial 50 finished with value: 0.8705269585305098 and parameters: {'learning_rate': 0.000448494834140905, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 51 with params: {'learning_rate': 0.00022905223619638782, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5841,1.10386,0.6022,0.603365,0.6022,0.598135
2,0.9664,0.805039,0.7175,0.719092,0.7175,0.717316
3,0.707,0.64581,0.776,0.783406,0.776,0.776257
4,0.5492,0.567129,0.8066,0.805918,0.8066,0.804425
5,0.4374,0.573411,0.8037,0.81079,0.8037,0.804765
6,0.3437,0.572338,0.8123,0.830737,0.8123,0.813826


[I 2025-01-05 20:44:15,392] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0003572576187741018, 'weight_decay': 0.003, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4752,1.011231,0.6409,0.654688,0.6409,0.640656
2,0.8802,0.715536,0.7446,0.750261,0.7446,0.744742
3,0.6615,0.627614,0.783,0.790018,0.783,0.781081
4,0.5325,0.54604,0.814,0.81537,0.814,0.8129
5,0.4367,0.546844,0.8124,0.825626,0.8124,0.814315
6,0.3532,0.541202,0.8221,0.839542,0.8221,0.823461


[I 2025-01-05 20:57:05,827] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00011040902092007652, 'weight_decay': 0.001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7724,1.349078,0.5101,0.524677,0.5101,0.505291
2,1.19,0.959483,0.6558,0.65621,0.6558,0.654239
3,0.8923,0.780131,0.7273,0.730935,0.7273,0.72542


[I 2025-01-05 21:03:31,158] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0004016091664777296, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4854,1.019874,0.6322,0.631339,0.6322,0.629722
2,0.9295,0.759615,0.7334,0.738021,0.7334,0.732286
3,0.6956,0.667049,0.7671,0.780671,0.7671,0.766386
4,0.5689,0.577947,0.7997,0.79876,0.7997,0.797395
5,0.4722,0.578991,0.8016,0.812196,0.8016,0.802677
6,0.3847,0.549861,0.82,0.832715,0.82,0.820019


[I 2025-01-05 21:16:21,759] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.00040570395318183824, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4997,1.023191,0.6276,0.638368,0.6276,0.628595
2,0.9233,0.737692,0.742,0.743085,0.742,0.741157
3,0.6867,0.651854,0.7782,0.7859,0.7782,0.77771
4,0.558,0.576176,0.8076,0.809609,0.8076,0.806115
5,0.4619,0.544779,0.8142,0.825653,0.8142,0.816864
6,0.3733,0.529681,0.8234,0.839911,0.8234,0.82527
7,0.3005,0.530598,0.8329,0.840793,0.8329,0.833411
8,0.2384,0.519069,0.839,0.841863,0.839,0.837501
9,0.1859,0.470977,0.8574,0.859697,0.8574,0.858002
10,0.1384,0.499792,0.8516,0.853549,0.8516,0.851573


[I 2025-01-05 21:59:01,117] Trial 55 finished with value: 0.8691381024646935 and parameters: {'learning_rate': 0.00040570395318183824, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 56 with params: {'learning_rate': 0.000442388035837816, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5035,1.052023,0.6156,0.622884,0.6156,0.613331
2,0.9658,0.818383,0.7103,0.714292,0.7103,0.709703
3,0.7357,0.703441,0.7522,0.764883,0.7522,0.752767
4,0.5973,0.587103,0.8024,0.801475,0.8024,0.800251
5,0.4907,0.568354,0.8036,0.817451,0.8036,0.806715
6,0.4054,0.561609,0.8164,0.833848,0.8164,0.816844


[I 2025-01-05 22:11:46,460] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00015451028113544184, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6745,1.157302,0.5834,0.588872,0.5834,0.581887
2,1.0556,0.859545,0.6962,0.706529,0.6962,0.696158
3,0.7735,0.710457,0.7541,0.757504,0.7541,0.75381
4,0.6051,0.631469,0.781,0.782455,0.781,0.780149
5,0.473,0.635361,0.7842,0.793067,0.7842,0.784963
6,0.3595,0.630759,0.7922,0.812003,0.7922,0.793914


[I 2025-01-05 22:24:35,111] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0004265256770682201, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4796,1.02131,0.6273,0.635027,0.6273,0.627584
2,0.9211,0.746995,0.7416,0.744051,0.7416,0.740491
3,0.683,0.64442,0.7756,0.781738,0.7756,0.775658
4,0.5548,0.58562,0.8059,0.808576,0.8059,0.80405
5,0.4573,0.544259,0.8203,0.827257,0.8203,0.821745
6,0.3728,0.517218,0.8227,0.839108,0.8227,0.823905
7,0.3002,0.532418,0.8278,0.834855,0.8278,0.826466
8,0.2363,0.520577,0.8387,0.841837,0.8387,0.836894
9,0.1821,0.491377,0.8553,0.857783,0.8553,0.855895
10,0.1405,0.527167,0.8474,0.850395,0.8474,0.846851


[I 2025-01-05 23:07:18,718] Trial 58 finished with value: 0.870244789846579 and parameters: {'learning_rate': 0.0004265256770682201, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 59 with params: {'learning_rate': 0.0003444410121164827, 'weight_decay': 0.001, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5123,1.033352,0.6242,0.625736,0.6242,0.620589
2,0.9388,0.756097,0.735,0.737943,0.735,0.734245
3,0.6984,0.675718,0.7693,0.782734,0.7693,0.77051
4,0.5609,0.570574,0.805,0.808212,0.805,0.803758
5,0.4616,0.560951,0.8066,0.818768,0.8066,0.808924
6,0.3711,0.54432,0.8173,0.839123,0.8173,0.819845
7,0.2961,0.563864,0.8219,0.82728,0.8219,0.820527
8,0.2265,0.559359,0.8293,0.833029,0.8293,0.827388
9,0.1708,0.497884,0.851,0.856588,0.851,0.852357
10,0.1295,0.522955,0.8485,0.852111,0.8485,0.849078


[I 2025-01-05 23:50:06,296] Trial 59 finished with value: 0.8635176173765824 and parameters: {'learning_rate': 0.0003444410121164827, 'weight_decay': 0.001, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 60 with params: {'learning_rate': 0.0003311460639031371, 'weight_decay': 0.005, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4983,0.994967,0.6408,0.644601,0.6408,0.637629
2,0.8898,0.726301,0.7466,0.744651,0.7466,0.744756
3,0.657,0.629337,0.7844,0.792082,0.7844,0.783435
4,0.5209,0.557179,0.8115,0.813942,0.8115,0.810122
5,0.426,0.560725,0.8132,0.823096,0.8132,0.815486
6,0.3386,0.542596,0.8232,0.844791,0.8232,0.825387
7,0.2592,0.578227,0.8237,0.832134,0.8237,0.821836
8,0.2019,0.536198,0.8412,0.843731,0.8412,0.839165
9,0.1483,0.507593,0.8516,0.85675,0.8516,0.852504
10,0.1116,0.517639,0.8536,0.858036,0.8536,0.853783


[I 2025-01-06 00:33:00,846] Trial 60 finished with value: 0.868428585207249 and parameters: {'learning_rate': 0.0003311460639031371, 'weight_decay': 0.005, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 61 with params: {'learning_rate': 0.0003523966961192257, 'weight_decay': 0.006, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4669,0.986324,0.6436,0.65189,0.6436,0.643738
2,0.8942,0.750718,0.7374,0.743662,0.7374,0.736818
3,0.6659,0.619064,0.7838,0.790967,0.7838,0.783333
4,0.5357,0.54677,0.8141,0.815339,0.8141,0.81239
5,0.4428,0.557886,0.8056,0.822502,0.8056,0.808679
6,0.3517,0.549823,0.8154,0.837919,0.8154,0.818001


[I 2025-01-06 00:45:52,764] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.00019775145102347727, 'weight_decay': 0.005, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5741,1.101775,0.599,0.596099,0.599,0.593851
2,0.9652,0.790364,0.7221,0.723805,0.7221,0.721135
3,0.7016,0.681907,0.7633,0.776884,0.7633,0.761906
4,0.5452,0.583881,0.8021,0.802764,0.8021,0.800616
5,0.4272,0.566688,0.8053,0.814781,0.8053,0.807536
6,0.3301,0.60713,0.807,0.826464,0.807,0.807575


[I 2025-01-06 00:58:41,915] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0004472893403094078, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.501,1.018426,0.6287,0.630333,0.6287,0.625007
2,0.9385,0.792908,0.7216,0.72691,0.7216,0.720806
3,0.7099,0.672514,0.7704,0.784173,0.7704,0.770667
4,0.5716,0.553957,0.8085,0.810627,0.8085,0.807715
5,0.4699,0.542509,0.811,0.819024,0.811,0.812912
6,0.3846,0.573377,0.8097,0.832863,0.8097,0.8111
7,0.3105,0.5265,0.834,0.838771,0.834,0.833181
8,0.2459,0.533909,0.8328,0.836044,0.8328,0.830638
9,0.193,0.486675,0.8488,0.853554,0.8488,0.849419
10,0.1455,0.496321,0.8546,0.857965,0.8546,0.854569


[I 2025-01-06 01:41:35,249] Trial 63 finished with value: 0.8657352607464006 and parameters: {'learning_rate': 0.0004472893403094078, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 64 with params: {'learning_rate': 0.000429611397476057, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.442,1.005649,0.6334,0.635353,0.6334,0.631819
2,0.892,0.774661,0.7263,0.734686,0.7263,0.723331
3,0.6618,0.638024,0.7784,0.787745,0.7784,0.77776
4,0.5373,0.553533,0.8101,0.809299,0.8101,0.807624
5,0.4469,0.555553,0.811,0.822908,0.811,0.812637
6,0.3626,0.558757,0.817,0.837094,0.817,0.818927
7,0.2908,0.52567,0.8373,0.84217,0.8373,0.836478
8,0.2258,0.553724,0.8324,0.834611,0.8324,0.830474
9,0.176,0.479681,0.851,0.853557,0.851,0.851325
10,0.1296,0.525169,0.8522,0.856691,0.8522,0.851873


[I 2025-01-06 02:24:26,857] Trial 64 finished with value: 0.868353644895224 and parameters: {'learning_rate': 0.000429611397476057, 'weight_decay': 0.005, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 65 with params: {'learning_rate': 0.00034975166753541603, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4819,0.970475,0.6398,0.644578,0.6398,0.635729
2,0.8646,0.710227,0.7556,0.757745,0.7556,0.754642
3,0.6423,0.645422,0.7797,0.78992,0.7797,0.780125
4,0.5197,0.537317,0.8161,0.817518,0.8161,0.814382
5,0.4237,0.510032,0.8205,0.827558,0.8205,0.821922
6,0.3404,0.535378,0.8246,0.843021,0.8246,0.825927
7,0.2673,0.52942,0.8359,0.840861,0.8359,0.835183
8,0.204,0.546578,0.8353,0.840103,0.8353,0.832477
9,0.1551,0.493083,0.8499,0.853152,0.8499,0.850126
10,0.1154,0.557393,0.8467,0.8519,0.8467,0.84597


[I 2025-01-06 03:07:13,602] Trial 65 finished with value: 0.8653114598917085 and parameters: {'learning_rate': 0.00034975166753541603, 'weight_decay': 0.005, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 66 with params: {'learning_rate': 0.00040470030266719775, 'weight_decay': 0.004, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4686,1.016645,0.6322,0.639199,0.6322,0.629123
2,0.895,0.714659,0.7486,0.753312,0.7486,0.749137
3,0.6669,0.656501,0.772,0.78108,0.772,0.771896
4,0.5419,0.566078,0.8071,0.808088,0.8071,0.805918
5,0.4474,0.523593,0.821,0.830585,0.821,0.822778
6,0.3609,0.581764,0.8091,0.839269,0.8091,0.812208


[I 2025-01-06 03:20:00,138] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 9.335272885246343e-05, 'weight_decay': 0.006, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7957,1.355059,0.5029,0.501718,0.5029,0.49921
2,1.2429,1.014603,0.6363,0.635225,0.6363,0.634468
3,0.9461,0.827621,0.7043,0.713812,0.7043,0.705246
4,0.7466,0.702152,0.7581,0.755833,0.7581,0.755708
5,0.5926,0.655342,0.7675,0.779398,0.7675,0.769906
6,0.4678,0.663406,0.7745,0.788596,0.7745,0.774928


[I 2025-01-06 03:32:52,094] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 3.0237449631860023e-05, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0551,1.620177,0.384,0.377958,0.384,0.372818
2,1.6547,1.420216,0.4784,0.476308,0.4784,0.472678
3,1.4494,1.285144,0.5378,0.534039,0.5378,0.531652


[I 2025-01-06 03:39:18,472] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00044913226856979936, 'weight_decay': 0.005, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.468,1.006453,0.6324,0.644623,0.6324,0.631177
2,0.9181,0.740215,0.7378,0.740151,0.7378,0.736084
3,0.6834,0.639592,0.779,0.791253,0.779,0.780677
4,0.5583,0.553223,0.8129,0.812512,0.8129,0.811597
5,0.4566,0.542028,0.8146,0.828162,0.8146,0.816846
6,0.3768,0.54527,0.8197,0.839236,0.8197,0.822054
7,0.3046,0.530746,0.8285,0.837387,0.8285,0.82774
8,0.2441,0.513483,0.8415,0.843294,0.8415,0.839556
9,0.1866,0.488621,0.8532,0.855764,0.8532,0.85292
10,0.1459,0.509703,0.853,0.85857,0.853,0.853293


[I 2025-01-06 04:04:56,771] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00030285916646651407, 'weight_decay': 0.006, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5183,1.045423,0.6239,0.625953,0.6239,0.622024
2,0.9148,0.720611,0.7442,0.746582,0.7442,0.743937
3,0.6626,0.634525,0.7797,0.786354,0.7797,0.779195
4,0.5319,0.575112,0.8047,0.808651,0.8047,0.803058
5,0.4255,0.526435,0.8194,0.830211,0.8194,0.821532
6,0.3397,0.574135,0.8135,0.840988,0.8135,0.815764
7,0.2618,0.531765,0.8355,0.838998,0.8355,0.834718
8,0.1973,0.533049,0.8356,0.839508,0.8356,0.833668
9,0.1453,0.508038,0.8467,0.850928,0.8467,0.847487
10,0.1071,0.53218,0.8466,0.852401,0.8466,0.847646


[I 2025-01-06 04:47:48,262] Trial 70 finished with value: 0.8674378267475584 and parameters: {'learning_rate': 0.00030285916646651407, 'weight_decay': 0.006, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 71 with params: {'learning_rate': 0.00045551296610184743, 'weight_decay': 0.004, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4669,0.983371,0.641,0.639384,0.641,0.638166
2,0.9158,0.755632,0.7352,0.738605,0.7352,0.734607
3,0.6981,0.653298,0.7766,0.780846,0.7766,0.776257
4,0.5684,0.566161,0.8062,0.809173,0.8062,0.805218
5,0.4764,0.538676,0.8154,0.823458,0.8154,0.817038
6,0.3894,0.538664,0.8181,0.833867,0.8181,0.818813


[I 2025-01-06 05:00:39,442] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0004855269556893471, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4801,1.013938,0.6306,0.636074,0.6306,0.630806
2,0.9139,0.741974,0.7439,0.750552,0.7439,0.744114
3,0.6834,0.638821,0.7747,0.785596,0.7747,0.774936
4,0.5611,0.573721,0.8079,0.807724,0.8079,0.806453
5,0.4749,0.53218,0.8182,0.824668,0.8182,0.819236
6,0.3903,0.531885,0.8214,0.834738,0.8214,0.822503
7,0.3235,0.512437,0.8343,0.840801,0.8343,0.833714
8,0.261,0.528239,0.8344,0.838386,0.8344,0.831999
9,0.2078,0.488125,0.8483,0.853126,0.8483,0.84927
10,0.1568,0.497539,0.8544,0.85899,0.8544,0.854505


[I 2025-01-06 05:43:31,016] Trial 72 finished with value: 0.8721757676811153 and parameters: {'learning_rate': 0.0004855269556893471, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 73 with params: {'learning_rate': 0.0004931227153023962, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4963,1.039001,0.6185,0.631263,0.6185,0.615784
2,0.9309,0.777544,0.7249,0.727488,0.7249,0.721888
3,0.7031,0.646621,0.7759,0.786035,0.7759,0.776156
4,0.566,0.580712,0.8,0.801221,0.8,0.798277
5,0.472,0.539491,0.8166,0.825056,0.8166,0.818308
6,0.3852,0.527739,0.8256,0.842178,0.8256,0.826789
7,0.3136,0.545101,0.8283,0.834271,0.8283,0.826542
8,0.2516,0.557794,0.8304,0.834445,0.8304,0.827944
9,0.1977,0.493158,0.8496,0.853416,0.8496,0.850175
10,0.1522,0.520611,0.8461,0.850482,0.8461,0.845889


[I 2025-01-06 06:26:14,054] Trial 73 finished with value: 0.8672765399160488 and parameters: {'learning_rate': 0.0004931227153023962, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 74 with params: {'learning_rate': 0.0003660005345388166, 'weight_decay': 0.006, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4977,1.008491,0.6309,0.630904,0.6309,0.627708
2,0.9101,0.761845,0.7344,0.741753,0.7344,0.734019
3,0.6869,0.628953,0.7774,0.786135,0.7774,0.778511
4,0.5507,0.532757,0.8181,0.817527,0.8181,0.817007
5,0.4476,0.562788,0.8076,0.817025,0.8076,0.808857
6,0.3615,0.555791,0.8152,0.834577,0.8152,0.81681


[I 2025-01-06 06:39:06,765] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0004267365683134317, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4833,1.009558,0.6333,0.638699,0.6333,0.633473
2,0.8964,0.725079,0.7445,0.746264,0.7445,0.74364
3,0.6688,0.642739,0.7773,0.787814,0.7773,0.777885
4,0.5446,0.533775,0.8192,0.820136,0.8192,0.818136
5,0.4486,0.533399,0.8177,0.824557,0.8177,0.819164
6,0.372,0.5299,0.8241,0.841676,0.8241,0.826139
7,0.298,0.523146,0.8294,0.834344,0.8294,0.82861
8,0.2376,0.546413,0.8328,0.837791,0.8328,0.831008
9,0.1797,0.484547,0.8489,0.852671,0.8489,0.849174
10,0.1363,0.505192,0.8519,0.854956,0.8519,0.852045


[I 2025-01-06 07:21:51,277] Trial 75 finished with value: 0.8655523846377962 and parameters: {'learning_rate': 0.0004267365683134317, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 76 with params: {'learning_rate': 3.721571397975755e-06, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4303,2.237418,0.1728,0.135045,0.1728,0.120363
2,2.342,2.13602,0.224,0.213535,0.224,0.188456
3,2.1945,1.957101,0.2784,0.277683,0.2784,0.250162
4,2.0337,1.832001,0.3146,0.304911,0.3146,0.299912
5,1.9492,1.770352,0.327,0.324455,0.327,0.315685
6,1.8955,1.729565,0.3513,0.345534,0.3513,0.341256


[I 2025-01-06 07:34:38,948] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00011231663323072297, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7602,1.297926,0.5284,0.530748,0.5284,0.524786
2,1.1754,0.963881,0.6576,0.654774,0.6576,0.655149
3,0.8852,0.794462,0.7157,0.720883,0.7157,0.715679


[I 2025-01-06 07:41:04,181] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.000447602055169624, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4848,0.988222,0.6469,0.653563,0.6469,0.648013
2,0.9089,0.751412,0.7358,0.742914,0.7358,0.735643
3,0.6976,0.660852,0.7723,0.781041,0.7723,0.77324
4,0.5715,0.56272,0.8037,0.804317,0.8037,0.802065
5,0.4725,0.551282,0.8142,0.825291,0.8142,0.816193
6,0.3843,0.587186,0.8131,0.835408,0.8131,0.815222
7,0.3146,0.525934,0.8306,0.836641,0.8306,0.830484
8,0.2516,0.562216,0.8284,0.833504,0.8284,0.825899
9,0.2018,0.522993,0.8419,0.849751,0.8419,0.843796
10,0.1521,0.540394,0.8451,0.84979,0.8451,0.845102


[I 2025-01-06 08:23:45,203] Trial 78 finished with value: 0.8634235867779394 and parameters: {'learning_rate': 0.000447602055169624, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 79 with params: {'learning_rate': 0.0004000670460098183, 'weight_decay': 0.007, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4562,0.96469,0.6538,0.654271,0.6538,0.650009
2,0.863,0.702417,0.7551,0.758365,0.7551,0.753652
3,0.6474,0.651694,0.7769,0.786126,0.7769,0.776767
4,0.5313,0.559874,0.8095,0.811477,0.8095,0.807406
5,0.4422,0.521907,0.8186,0.827151,0.8186,0.820194
6,0.3574,0.54995,0.8222,0.840171,0.8222,0.82428
7,0.2914,0.519902,0.8392,0.842617,0.8392,0.838282
8,0.2262,0.531603,0.8356,0.838005,0.8356,0.833685
9,0.1761,0.478396,0.8551,0.857029,0.8551,0.854941
10,0.1301,0.500594,0.854,0.856883,0.854,0.854186


[I 2025-01-06 09:06:40,687] Trial 79 finished with value: 0.8658422177623833 and parameters: {'learning_rate': 0.0004000670460098183, 'weight_decay': 0.007, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 80 with params: {'learning_rate': 0.00041799722774633744, 'weight_decay': 0.003, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4774,1.029866,0.6229,0.622256,0.6229,0.615935
2,0.919,0.750097,0.739,0.741464,0.739,0.737599
3,0.6822,0.668776,0.7687,0.77612,0.7687,0.768633
4,0.5529,0.559449,0.8099,0.81078,0.8099,0.808312
5,0.451,0.551913,0.8118,0.819375,0.8118,0.813604
6,0.3686,0.565648,0.8161,0.840289,0.8161,0.818305
7,0.2957,0.541114,0.8339,0.83775,0.8339,0.832484
8,0.2317,0.525971,0.8366,0.841185,0.8366,0.834997
9,0.1766,0.51759,0.8456,0.849341,0.8456,0.846299
10,0.1327,0.533382,0.8501,0.852684,0.8501,0.849111


[I 2025-01-06 09:49:15,337] Trial 80 finished with value: 0.8689689700730387 and parameters: {'learning_rate': 0.00041799722774633744, 'weight_decay': 0.003, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 81 with params: {'learning_rate': 0.00019257852267789502, 'weight_decay': 0.0, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5993,1.08993,0.6077,0.607928,0.6077,0.604713
2,0.9635,0.80249,0.7165,0.720815,0.7165,0.714942
3,0.7036,0.652361,0.7747,0.780826,0.7747,0.774378
4,0.5488,0.565565,0.8073,0.809251,0.8073,0.806495
5,0.4226,0.558582,0.8087,0.819513,0.8087,0.810758
6,0.3173,0.586938,0.8125,0.833585,0.8125,0.814518


[I 2025-01-06 10:02:06,899] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0003609902553553248, 'weight_decay': 0.002, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5027,1.024951,0.6261,0.630354,0.6261,0.624042
2,0.9159,0.727593,0.7419,0.746034,0.7419,0.741832
3,0.6707,0.644396,0.7767,0.784447,0.7767,0.77803
4,0.5389,0.561981,0.8088,0.808999,0.8088,0.80714
5,0.4397,0.581985,0.8019,0.816281,0.8019,0.805214
6,0.3511,0.549123,0.8196,0.835577,0.8196,0.820377
7,0.28,0.526889,0.8336,0.83713,0.8336,0.833164
8,0.2151,0.541592,0.8365,0.837327,0.8365,0.83418
9,0.1613,0.526104,0.8431,0.848855,0.8431,0.84431
10,0.1213,0.562318,0.8425,0.850482,0.8425,0.842413


[I 2025-01-06 10:44:42,475] Trial 82 finished with value: 0.8641234643600398 and parameters: {'learning_rate': 0.0003609902553553248, 'weight_decay': 0.002, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 83 with params: {'learning_rate': 0.00010582806610219601, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7684,1.333896,0.5096,0.515478,0.5096,0.505844
2,1.1938,0.966519,0.6528,0.65474,0.6528,0.652583
3,0.8931,0.814032,0.7136,0.723497,0.7136,0.714398


[I 2025-01-06 10:51:06,209] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.000430315470339567, 'weight_decay': 0.003, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4489,0.986564,0.6429,0.652847,0.6429,0.640177
2,0.8742,0.71361,0.7488,0.751844,0.7488,0.747174
3,0.6633,0.631513,0.7821,0.789316,0.7821,0.782447
4,0.5436,0.55926,0.8111,0.813371,0.8111,0.809399
5,0.4485,0.551704,0.8099,0.82138,0.8099,0.811146
6,0.366,0.573404,0.8096,0.833508,0.8096,0.811509


[I 2025-01-06 11:03:52,869] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 7.33845823015502e-05, 'weight_decay': 0.004, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8709,1.434637,0.4767,0.478081,0.4767,0.468887
2,1.354,1.116524,0.6003,0.599948,0.6003,0.598039
3,1.0461,0.925452,0.6722,0.678963,0.6722,0.670578


[I 2025-01-06 11:10:19,174] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0004686238940265176, 'weight_decay': 0.006, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4828,1.067545,0.6147,0.616233,0.6147,0.613784
2,0.9367,0.758618,0.7404,0.747841,0.7404,0.740866
3,0.6999,0.651856,0.7709,0.780034,0.7709,0.771354
4,0.5694,0.551811,0.8101,0.810179,0.8101,0.80852
5,0.4728,0.528005,0.8193,0.824367,0.8193,0.819714
6,0.3932,0.508082,0.8282,0.838517,0.8282,0.82893
7,0.3189,0.545018,0.8255,0.832265,0.8255,0.823295
8,0.2544,0.52248,0.8381,0.84046,0.8381,0.835732
9,0.1961,0.471568,0.8535,0.858236,0.8535,0.854827
10,0.1498,0.498642,0.8547,0.856142,0.8547,0.854141


[I 2025-01-06 11:36:06,624] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00020604669292026708, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6011,1.083391,0.6063,0.608053,0.6063,0.603271
2,0.9819,0.798708,0.7195,0.730051,0.7195,0.721361
3,0.7143,0.675274,0.7667,0.776219,0.7667,0.76687
4,0.5549,0.578859,0.8025,0.803954,0.8025,0.801226
5,0.4372,0.552844,0.8118,0.818038,0.8118,0.812621
6,0.335,0.573637,0.8102,0.82623,0.8102,0.810927


[I 2025-01-06 11:48:51,650] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0003375974608332158, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4978,0.98746,0.6353,0.643702,0.6353,0.633812
2,0.8919,0.732354,0.7398,0.748187,0.7398,0.74
3,0.6713,0.641148,0.7791,0.78711,0.7791,0.778527
4,0.544,0.558513,0.8096,0.810764,0.8096,0.808352
5,0.4448,0.551292,0.8119,0.822292,0.8119,0.81463
6,0.3545,0.552477,0.8213,0.839149,0.8213,0.82293
7,0.2725,0.573513,0.821,0.826372,0.821,0.818805
8,0.2085,0.554744,0.8352,0.838027,0.8352,0.832871
9,0.1592,0.555709,0.8352,0.84085,0.8352,0.835407
10,0.1164,0.569493,0.8444,0.848724,0.8444,0.845263


[I 2025-01-06 12:31:27,514] Trial 88 finished with value: 0.8655746142035923 and parameters: {'learning_rate': 0.0003375974608332158, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 89 with params: {'learning_rate': 2.6182343503528787e-06, 'weight_decay': 0.006, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4427,2.254329,0.1582,0.135416,0.1582,0.106412
2,2.3742,2.186322,0.1969,0.177306,0.1969,0.146771
3,2.2724,2.05983,0.2471,0.236901,0.2471,0.208843
4,2.1291,1.916776,0.2861,0.278441,0.2861,0.261006
5,2.0327,1.846283,0.3002,0.294921,0.3002,0.274112
6,1.9725,1.801134,0.3285,0.324926,0.3285,0.312327
7,1.9339,1.757089,0.3403,0.332348,0.3403,0.327599
8,1.9004,1.763123,0.3397,0.334785,0.3397,0.325425
9,1.8765,1.719727,0.3478,0.345125,0.3478,0.341148
10,1.854,1.70819,0.3537,0.345373,0.3537,0.345172


[I 2025-01-06 12:57:18,150] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0001268966887509396, 'weight_decay': 0.004, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7198,1.235088,0.5542,0.550633,0.5542,0.551028
2,1.1143,0.912102,0.6762,0.684734,0.6762,0.676012
3,0.8231,0.727061,0.7441,0.749168,0.7441,0.743565
4,0.6314,0.63495,0.781,0.779171,0.781,0.778519
5,0.4934,0.625153,0.7884,0.799095,0.7884,0.790648
6,0.3705,0.639049,0.7908,0.813735,0.7908,0.792634


[I 2025-01-06 13:10:05,777] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0004292963049723308, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4834,1.008501,0.6352,0.637733,0.6352,0.631453
2,0.9218,0.735325,0.7401,0.745496,0.7401,0.740171
3,0.6891,0.669961,0.7694,0.781772,0.7694,0.769579
4,0.5583,0.560746,0.8126,0.81153,0.8126,0.810646
5,0.4615,0.53064,0.8168,0.826499,0.8168,0.818998
6,0.3752,0.528574,0.8205,0.839893,0.8205,0.822198
7,0.302,0.546023,0.8293,0.836442,0.8293,0.828479
8,0.2388,0.512451,0.8395,0.841021,0.8395,0.838169
9,0.1844,0.499047,0.8482,0.853243,0.8482,0.849416
10,0.1387,0.525033,0.8511,0.856674,0.8511,0.851951


[I 2025-01-06 13:52:41,551] Trial 91 finished with value: 0.8661125228947864 and parameters: {'learning_rate': 0.0004292963049723308, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 92 with params: {'learning_rate': 5.417104271994568e-05, 'weight_decay': 0.004, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9195,1.494215,0.4479,0.445528,0.4479,0.441923
2,1.4674,1.252064,0.5364,0.542701,0.5364,0.530771
3,1.2072,1.065434,0.6196,0.622542,0.6196,0.618188
4,1.0177,0.952708,0.6629,0.659633,0.6629,0.658616
5,0.8727,0.891043,0.6847,0.697729,0.6847,0.687187
6,0.7541,0.827185,0.7099,0.718938,0.7099,0.708344


[I 2025-01-06 14:05:28,218] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 1.7230959137136504e-05, 'weight_decay': 0.0, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.188,1.758128,0.3376,0.334327,0.3376,0.328221
2,1.815,1.585962,0.4082,0.4058,0.4082,0.40269
3,1.6471,1.465144,0.4554,0.452772,0.4554,0.44776
4,1.5294,1.388459,0.4922,0.486299,0.4922,0.485305
5,1.4374,1.33866,0.5144,0.520698,0.5144,0.51052
6,1.3592,1.273064,0.5377,0.532705,0.5377,0.529253


[I 2025-01-06 14:18:26,521] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.00022213623144275155, 'weight_decay': 0.005, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6123,1.12719,0.5944,0.588529,0.5944,0.588651
2,0.9858,0.804674,0.7166,0.721927,0.7166,0.717569
3,0.7217,0.657851,0.7715,0.775922,0.7715,0.770085
4,0.5646,0.596874,0.7949,0.794804,0.7949,0.792611
5,0.4466,0.579802,0.7989,0.811948,0.7989,0.801686
6,0.3462,0.598399,0.8036,0.824908,0.8036,0.806


[I 2025-01-06 14:31:13,459] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0004237366326187549, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4949,0.981784,0.639,0.641251,0.639,0.636717
2,0.9152,0.777847,0.7296,0.734782,0.7296,0.729424
3,0.6953,0.642383,0.7794,0.789802,0.7794,0.780098
4,0.5657,0.552816,0.8139,0.812935,0.8139,0.812158
5,0.4712,0.555103,0.8068,0.813565,0.8068,0.808061
6,0.3854,0.529548,0.8253,0.842837,0.8253,0.826998
7,0.3109,0.519009,0.8323,0.837774,0.8323,0.830987
8,0.2466,0.517887,0.8356,0.838843,0.8356,0.834065
9,0.1928,0.485076,0.8504,0.855142,0.8504,0.851515
10,0.1447,0.499124,0.8539,0.857063,0.8539,0.854281


[I 2025-01-06 15:13:56,128] Trial 95 finished with value: 0.8719877942652667 and parameters: {'learning_rate': 0.0004237366326187549, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 96 with params: {'learning_rate': 0.000413148563825221, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.471,0.994665,0.6384,0.637454,0.6384,0.635661
2,0.9096,0.75106,0.7354,0.74139,0.7354,0.735864
3,0.6696,0.617798,0.7851,0.791071,0.7851,0.784615
4,0.5446,0.548153,0.8137,0.815269,0.8137,0.812327
5,0.4451,0.548388,0.8131,0.821297,0.8131,0.814996
6,0.3659,0.539871,0.8214,0.839258,0.8214,0.82342
7,0.2941,0.531826,0.8319,0.838061,0.8319,0.831296
8,0.231,0.558802,0.8319,0.83706,0.8319,0.829613
9,0.1749,0.500682,0.8485,0.85284,0.8485,0.849266
10,0.134,0.519961,0.855,0.856999,0.855,0.854983


[I 2025-01-06 15:39:52,802] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00048761131220534826, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4663,1.004144,0.6375,0.641706,0.6375,0.638331
2,0.9064,0.742697,0.741,0.744793,0.741,0.739731
3,0.6861,0.638436,0.7829,0.788011,0.7829,0.78191
4,0.5598,0.560233,0.807,0.808854,0.807,0.805733
5,0.4697,0.522531,0.8226,0.829503,0.8226,0.824442
6,0.3853,0.546408,0.8166,0.835934,0.8166,0.818696
7,0.3193,0.527354,0.8292,0.833227,0.8292,0.827673
8,0.2596,0.517769,0.8381,0.840048,0.8381,0.836439
9,0.1986,0.476436,0.8521,0.855492,0.8521,0.852541
10,0.155,0.473792,0.8584,0.861753,0.8584,0.858836


[I 2025-01-06 16:22:28,864] Trial 97 finished with value: 0.8709650111005172 and parameters: {'learning_rate': 0.00048761131220534826, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 98 with params: {'learning_rate': 0.00047100232840302893, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4794,0.995511,0.6367,0.642089,0.6367,0.634788
2,0.9261,0.741851,0.7414,0.746798,0.7414,0.741291
3,0.7007,0.655505,0.7727,0.783538,0.7727,0.772378
4,0.5705,0.582737,0.8012,0.803523,0.8012,0.798767
5,0.4814,0.557349,0.8112,0.820417,0.8112,0.812184
6,0.3962,0.580275,0.8087,0.829953,0.8087,0.810244


[I 2025-01-06 16:35:16,269] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0004514735259044771, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.473,1.005352,0.6357,0.640436,0.6357,0.634453
2,0.9087,0.76104,0.7372,0.743965,0.7372,0.736132
3,0.6867,0.637072,0.7771,0.783378,0.7771,0.776669
4,0.5579,0.564109,0.8092,0.815156,0.8092,0.809342
5,0.4638,0.541249,0.8146,0.826223,0.8146,0.817479
6,0.3803,0.548409,0.8153,0.840204,0.8153,0.816718
7,0.3057,0.516029,0.8305,0.835836,0.8305,0.829574
8,0.2471,0.531576,0.8341,0.837817,0.8341,0.83291
9,0.1904,0.478365,0.853,0.855514,0.853,0.853358
10,0.1438,0.536494,0.8439,0.851127,0.8439,0.844961


[I 2025-01-06 17:17:54,965] Trial 99 finished with value: 0.8721532469666696 and parameters: {'learning_rate': 0.0004514735259044771, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 100 with params: {'learning_rate': 0.0002641183491967019, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5647,1.029215,0.63,0.636395,0.63,0.628918
2,0.9402,0.747744,0.7379,0.742299,0.7379,0.73641
3,0.6738,0.633208,0.7797,0.784876,0.7797,0.779192
4,0.537,0.550297,0.8137,0.81523,0.8137,0.813207
5,0.429,0.531248,0.8207,0.826095,0.8207,0.820692
6,0.3358,0.541818,0.8211,0.841328,0.8211,0.823765
7,0.2583,0.537493,0.8328,0.837989,0.8328,0.831854
8,0.192,0.549773,0.8334,0.835756,0.8334,0.831329
9,0.1414,0.527465,0.8432,0.849311,0.8432,0.844258
10,0.1073,0.557513,0.8474,0.853141,0.8474,0.846951


[I 2025-01-06 18:00:43,756] Trial 100 finished with value: 0.8645121750354681 and parameters: {'learning_rate': 0.0002641183491967019, 'weight_decay': 0.006, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 101 with params: {'learning_rate': 0.0004951724773049825, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4542,1.014381,0.6353,0.641768,0.6353,0.637136
2,0.8979,0.769314,0.7311,0.736012,0.7311,0.727837
3,0.6844,0.66199,0.7732,0.781736,0.7732,0.773357
4,0.56,0.560628,0.8075,0.810163,0.8075,0.805838
5,0.4654,0.534379,0.8177,0.824763,0.8177,0.819316
6,0.3855,0.561054,0.8154,0.839704,0.8154,0.816886
7,0.3126,0.526515,0.8308,0.838331,0.8308,0.830913
8,0.2499,0.535112,0.8334,0.836048,0.8334,0.831657
9,0.2003,0.490494,0.8506,0.856048,0.8506,0.851891
10,0.1519,0.509538,0.8525,0.857347,0.8525,0.853488


[I 2025-01-06 18:43:22,041] Trial 101 finished with value: 0.8624748676835965 and parameters: {'learning_rate': 0.0004951724773049825, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 102 with params: {'learning_rate': 0.00040261834393029834, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4875,1.007465,0.6324,0.632277,0.6324,0.630101
2,0.9133,0.755406,0.7312,0.737288,0.7312,0.729748
3,0.679,0.634687,0.7772,0.788998,0.7772,0.778897
4,0.5554,0.546372,0.8147,0.816119,0.8147,0.813842
5,0.4549,0.550271,0.8114,0.817887,0.8114,0.812602
6,0.3738,0.567093,0.8133,0.836038,0.8133,0.816189


[I 2025-01-06 18:56:14,008] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0004875046222887296, 'weight_decay': 0.007, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4544,1.035429,0.6227,0.62232,0.6227,0.615881
2,0.9159,0.746902,0.7405,0.741408,0.7405,0.738548
3,0.6921,0.657186,0.7727,0.783978,0.7727,0.773849
4,0.567,0.580375,0.8001,0.802367,0.8001,0.798884
5,0.4661,0.562836,0.8075,0.818758,0.8075,0.809514
6,0.3864,0.554367,0.8162,0.838492,0.8162,0.818517
7,0.316,0.507879,0.8345,0.839755,0.8345,0.833284
8,0.2522,0.54694,0.831,0.834344,0.831,0.82914
9,0.1969,0.481735,0.8528,0.855952,0.8528,0.853646
10,0.1538,0.478611,0.8612,0.86355,0.8612,0.861523


[I 2025-01-06 19:38:53,766] Trial 103 finished with value: 0.8729018320751306 and parameters: {'learning_rate': 0.0004875046222887296, 'weight_decay': 0.007, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 104 with params: {'learning_rate': 0.00038101676079616616, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4759,1.007977,0.6269,0.627276,0.6269,0.624763
2,0.9029,0.780982,0.7298,0.73418,0.7298,0.727996
3,0.676,0.638755,0.7763,0.786122,0.7763,0.776412
4,0.5465,0.551865,0.8109,0.812641,0.8109,0.810214
5,0.4497,0.51921,0.8225,0.829793,0.8225,0.823836
6,0.3669,0.53587,0.824,0.840371,0.824,0.825559
7,0.2897,0.549359,0.8257,0.831182,0.8257,0.82443
8,0.2294,0.521268,0.8405,0.842635,0.8405,0.838135
9,0.1767,0.501853,0.8478,0.851534,0.8478,0.848564
10,0.1303,0.52705,0.8502,0.856399,0.8502,0.850934


[I 2025-01-06 20:21:27,762] Trial 104 finished with value: 0.8673069480019088 and parameters: {'learning_rate': 0.00038101676079616616, 'weight_decay': 0.006, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 105 with params: {'learning_rate': 1.6562808358868155e-06, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4572,2.277532,0.1206,0.124015,0.1206,0.059839
2,2.4126,2.240326,0.166,0.134109,0.166,0.112822
3,2.3694,2.198128,0.1933,0.185263,0.1933,0.141917
4,2.3121,2.119181,0.2269,0.202366,0.2269,0.175262
5,2.2402,2.049466,0.2459,0.238973,0.2459,0.20453
6,2.1515,1.966135,0.2749,0.273657,0.2749,0.242066
7,2.0867,1.894638,0.2919,0.284931,0.2919,0.266964
8,2.0356,1.885482,0.2959,0.292269,0.2959,0.273417
9,2.0062,1.832391,0.3081,0.305261,0.3081,0.291479
10,1.9766,1.823234,0.3194,0.312048,0.3194,0.307714


[I 2025-01-06 20:46:59,831] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00033832242794776723, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5041,1.021198,0.6295,0.637126,0.6295,0.626644
2,0.8904,0.722517,0.7484,0.753113,0.7484,0.74647
3,0.6601,0.621556,0.7816,0.789003,0.7816,0.781997
4,0.5283,0.563579,0.807,0.808238,0.807,0.804537
5,0.4368,0.519542,0.8239,0.832784,0.8239,0.82586
6,0.3467,0.545855,0.8216,0.844708,0.8216,0.82374
7,0.2692,0.522023,0.8351,0.838209,0.8351,0.834164
8,0.2068,0.538181,0.8344,0.838972,0.8344,0.832128
9,0.1532,0.500001,0.8511,0.853477,0.8511,0.85167
10,0.1126,0.565213,0.8453,0.851347,0.8453,0.845753


[I 2025-01-06 21:29:33,388] Trial 106 finished with value: 0.8698852189150866 and parameters: {'learning_rate': 0.00033832242794776723, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 107 with params: {'learning_rate': 0.00028522255066289827, 'weight_decay': 0.007, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5133,1.007248,0.6325,0.634758,0.6325,0.631105
2,0.9179,0.731844,0.7426,0.745861,0.7426,0.741407
3,0.6639,0.593407,0.7962,0.803744,0.7962,0.796694
4,0.5211,0.533475,0.8196,0.818742,0.8196,0.817881
5,0.4254,0.554023,0.8127,0.820069,0.8127,0.813794
6,0.3326,0.57059,0.815,0.839477,0.815,0.817328
7,0.2539,0.554289,0.8292,0.834969,0.8292,0.82702
8,0.1923,0.55371,0.8385,0.841047,0.8385,0.836847
9,0.1397,0.50961,0.8531,0.855862,0.8531,0.853658
10,0.1022,0.567822,0.845,0.852402,0.845,0.845209


[I 2025-01-06 22:12:05,882] Trial 107 finished with value: 0.863185979768234 and parameters: {'learning_rate': 0.00028522255066289827, 'weight_decay': 0.007, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 108 with params: {'learning_rate': 0.0004365276969073769, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4887,1.019611,0.6295,0.638153,0.6295,0.629674
2,0.9135,0.751291,0.7363,0.741747,0.7363,0.733969
3,0.6863,0.646845,0.7734,0.77937,0.7734,0.77269
4,0.5588,0.565889,0.8064,0.8097,0.8064,0.805643
5,0.4655,0.547127,0.8102,0.822986,0.8102,0.812744
6,0.3779,0.560665,0.8159,0.838858,0.8159,0.818546
7,0.3108,0.532496,0.827,0.832483,0.827,0.825579
8,0.2463,0.524314,0.837,0.839808,0.837,0.835077
9,0.1888,0.471913,0.8543,0.855554,0.8543,0.854048
10,0.1479,0.52281,0.8528,0.857168,0.8528,0.852818


[I 2025-01-06 22:54:44,884] Trial 108 finished with value: 0.8727676220851027 and parameters: {'learning_rate': 0.0004365276969073769, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 109 with params: {'learning_rate': 0.000212997016148891, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5766,1.094815,0.6031,0.612541,0.6031,0.605233
2,0.9812,0.812522,0.7172,0.718901,0.7172,0.715546
3,0.7178,0.655792,0.7713,0.776905,0.7713,0.770252
4,0.5547,0.569587,0.8069,0.805872,0.8069,0.80568
5,0.429,0.552165,0.8132,0.820541,0.8132,0.814902
6,0.3266,0.570732,0.8129,0.833972,0.8129,0.81536
7,0.2408,0.556546,0.8258,0.832277,0.8258,0.825673
8,0.1763,0.599999,0.8241,0.826453,0.8241,0.821476
9,0.1256,0.549609,0.839,0.841442,0.839,0.839362
10,0.0912,0.602112,0.8349,0.838596,0.8349,0.835332


[I 2025-01-06 23:20:21,924] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0002399718778688677, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5515,1.038404,0.6239,0.622679,0.6239,0.618936
2,0.9206,0.76678,0.7295,0.736039,0.7295,0.729121
3,0.6595,0.620691,0.7884,0.795524,0.7884,0.789411
4,0.5198,0.569419,0.8067,0.810447,0.8067,0.806308
5,0.4113,0.55324,0.8089,0.81752,0.8089,0.810595
6,0.3156,0.585042,0.8104,0.834598,0.8104,0.814136


[I 2025-01-06 23:33:08,914] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.00029279311575263313, 'weight_decay': 0.01, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5381,1.029084,0.6271,0.629944,0.6271,0.625677
2,0.9177,0.743338,0.7429,0.745048,0.7429,0.74118
3,0.6735,0.632915,0.7803,0.789758,0.7803,0.781466
4,0.5259,0.565544,0.8091,0.809425,0.8091,0.807847
5,0.4262,0.567103,0.8066,0.821537,0.8066,0.809399
6,0.3374,0.560736,0.8182,0.834808,0.8182,0.82036
7,0.2549,0.531882,0.8358,0.838637,0.8358,0.834667
8,0.1963,0.547865,0.8385,0.841163,0.8385,0.837171
9,0.1459,0.5297,0.844,0.847759,0.844,0.844576
10,0.1067,0.56984,0.8448,0.848234,0.8448,0.844341


[I 2025-01-06 23:58:43,844] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00042883235294515854, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4586,1.035847,0.6268,0.630918,0.6268,0.624924
2,0.8933,0.758939,0.7339,0.738334,0.7339,0.732454
3,0.6731,0.636689,0.7782,0.785521,0.7782,0.777974
4,0.55,0.560111,0.807,0.806917,0.807,0.805357
5,0.4553,0.518992,0.8232,0.831827,0.8232,0.825028
6,0.3727,0.536765,0.8229,0.840531,0.8229,0.824673
7,0.2975,0.537818,0.83,0.835377,0.83,0.829098
8,0.24,0.543594,0.8294,0.833039,0.8294,0.827152
9,0.1853,0.48844,0.8534,0.855882,0.8534,0.853524
10,0.1433,0.518822,0.8521,0.856005,0.8521,0.851279


[I 2025-01-07 00:41:22,585] Trial 112 finished with value: 0.8706206914500061 and parameters: {'learning_rate': 0.00042883235294515854, 'weight_decay': 0.008, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 113 with params: {'learning_rate': 0.0004111791189875075, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.472,0.995304,0.6375,0.634364,0.6375,0.630972
2,0.8953,0.728746,0.7493,0.747151,0.7493,0.746951
3,0.6783,0.646693,0.78,0.788866,0.78,0.779676
4,0.5469,0.564003,0.8063,0.808938,0.8063,0.805246
5,0.4516,0.553311,0.8081,0.817002,0.8081,0.810078
6,0.367,0.592281,0.8018,0.830401,0.8018,0.804261


[I 2025-01-07 00:54:08,962] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0003722199724674618, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4863,1.007537,0.6326,0.636623,0.6326,0.629187
2,0.8848,0.73654,0.7441,0.748527,0.7441,0.742066
3,0.6645,0.638198,0.7791,0.788339,0.7791,0.779073
4,0.5326,0.544711,0.8116,0.813698,0.8116,0.810422
5,0.443,0.512094,0.8287,0.833874,0.8287,0.829566
6,0.3519,0.568748,0.8166,0.841472,0.8166,0.817022


[I 2025-01-07 01:06:56,829] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0004963807424292574, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4547,1.010476,0.6363,0.64722,0.6363,0.636405
2,0.8967,0.739689,0.7429,0.74679,0.7429,0.74096
3,0.6859,0.650148,0.7769,0.782966,0.7769,0.777242
4,0.563,0.572751,0.8052,0.807688,0.8052,0.803526
5,0.47,0.538843,0.8106,0.822436,0.8106,0.813435
6,0.393,0.552786,0.82,0.834168,0.82,0.820603
7,0.3211,0.578589,0.8209,0.828237,0.8209,0.819488
8,0.2586,0.551265,0.8343,0.837307,0.8343,0.832736
9,0.2034,0.498952,0.845,0.850238,0.845,0.846242
10,0.157,0.507991,0.8517,0.85713,0.8517,0.852682


[I 2025-01-07 01:49:39,681] Trial 115 finished with value: 0.8655737625501635 and parameters: {'learning_rate': 0.0004963807424292574, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 116 with params: {'learning_rate': 0.0004742032194648361, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4774,1.002081,0.6328,0.644258,0.6328,0.632684
2,0.9102,0.720287,0.746,0.749287,0.746,0.746166
3,0.6886,0.670612,0.7679,0.77866,0.7679,0.766967
4,0.5748,0.562095,0.8086,0.810384,0.8086,0.806697
5,0.4785,0.545969,0.8151,0.822396,0.8151,0.815168
6,0.3987,0.5171,0.8229,0.836807,0.8229,0.824455
7,0.3272,0.533114,0.8292,0.83479,0.8292,0.828167
8,0.2644,0.517907,0.834,0.835867,0.834,0.83226
9,0.2077,0.483381,0.849,0.851318,0.849,0.849532
10,0.1586,0.501144,0.8542,0.858107,0.8542,0.854797


[I 2025-01-07 02:32:38,287] Trial 116 finished with value: 0.8666231450272142 and parameters: {'learning_rate': 0.0004742032194648361, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 117 with params: {'learning_rate': 0.00016769203705822665, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6556,1.142616,0.5826,0.590841,0.5826,0.584549
2,1.0333,0.819238,0.7122,0.717916,0.7122,0.712892
3,0.7374,0.644444,0.7788,0.784087,0.7788,0.779574
4,0.5644,0.590818,0.7983,0.800921,0.7983,0.796929
5,0.4406,0.571379,0.8004,0.81036,0.8004,0.802439
6,0.3356,0.602637,0.8019,0.827367,0.8019,0.804206


[I 2025-01-07 02:45:25,489] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00045039424742782536, 'weight_decay': 0.0, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4709,1.01819,0.6282,0.628457,0.6282,0.626227
2,0.9275,0.786155,0.7183,0.724879,0.7183,0.716573
3,0.7026,0.661289,0.7673,0.777702,0.7673,0.765997
4,0.5689,0.574138,0.804,0.802679,0.804,0.800934
5,0.4778,0.554265,0.8102,0.818739,0.8102,0.8117
6,0.3974,0.523596,0.8262,0.836739,0.8262,0.826352
7,0.327,0.552846,0.8263,0.83605,0.8263,0.825359
8,0.2617,0.591155,0.8207,0.826989,0.8207,0.818889
9,0.2004,0.509233,0.8404,0.849822,0.8404,0.842428
10,0.1519,0.499016,0.8551,0.857505,0.8551,0.855394


In [None]:
print(best_trial.hyperparameters)

## Distillation training definition

A class that adapts the Hugging Face `Trainer` for knowledge distillation. It now works with teacher logits stored in the dataset rather than computing them during training.

In [17]:
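class ImageDistilTrainer(Trainer):
    """Trainer mixing the student's own loss with a KL loss on precomputed teacher logits."""

    def __init__(self, model_init, *args, **kwargs):
        self.model_init = model_init
        # KL divergence between the softened student and teacher distributions.
        self.loss_function = nn.KLDivLoss(reduction="batchmean")

        super().__init__(model_init=model_init, *args, **kwargs)

        self.student = self.model_init()
        # `temperature` and `lambda_param` are read from the training arguments.
        self.temperature = self.args.temperature
        self.lambda_param = self.args.lambda_param

        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.student.to(device)

    # `num_items_in_batch` is unused; it is accepted so the override also works
    # with transformers versions that pass it to compute_loss.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Teacher logits arrive with the batch; remove them before the forward pass.
        logits = inputs.pop("logits")

        student_output = model(**inputs)
        # Re-read both values on every call: the hyperparameter search changes them per trial.
        self.lambda_param = self.args.lambda_param
        self.temperature = self.args.temperature

        # Soften both distributions with the same temperature.
        soft_teacher = F.softmax(logits / self.temperature, dim=-1)
        soft_student = F.log_softmax(student_output.logits / self.temperature, dim=-1)

        # Scale by T^2 so the soft-target term stays comparable across temperatures.
        distillation_loss = self.loss_function(soft_student, soft_teacher) * (self.temperature ** 2)

        # Loss the student computes itself from the ground-truth labels.
        student_target_loss = student_output.loss

        # lambda_param = 0 keeps only the label loss, lambda_param = 1 only the distillation loss.
        loss = ((1. - self.lambda_param) * student_target_loss + self.lambda_param * distillation_loss)
        return (loss, student_output) if return_outputs else loss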
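
The loss combined in `compute_loss` can be traced by hand on a single example. The following is a minimal sketch in plain Python, not the notebook's implementation: the helper names (`softmax`, `kl_divergence`, `distillation_loss`) and the toy logits are hypothetical, and it assumes the student's own loss is the usual cross-entropy against the true label.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature, lambda_param):
    # Soft-target term: KL(teacher || student) at the given temperature, scaled by T^2.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kd = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Hard-target term: cross-entropy against the true label (assumed student loss).
    ce = -math.log(softmax(student_logits)[label])
    return (1.0 - lambda_param) * ce + lambda_param * kd

# Toy logits for a 3-class problem (illustrative values only).
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]

hard_only = distillation_loss(student, teacher, label=0, temperature=3.0, lambda_param=0.0)
soft_only = distillation_loss(student, teacher, label=0, temperature=3.0, lambda_param=1.0)
mixed = distillation_loss(student, teacher, label=0, temperature=3.0, lambda_param=0.9)
```

With `lambda_param=0.0` only the label loss remains, with `lambda_param=1.0` only the distillation term, and intermediate values interpolate linearly between the two.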

### Training the randomly initialized model with knowledge distillation

In [18]:
reset_seed(42)

In [19]:
training_args = get_training_args("./results/cifar10-random-KD", './logs/cifar10-random-KD', False)

In [20]:
def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64]),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
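
Three of the ranges in `hp_space` are discretized by their `step` argument, so only the learning rate is sampled continuously. A small sketch in plain Python (without Optuna) of the values each stepped range can take; the helper `stepped_values` is hypothetical and only mirrors the `low + k * step` grid that a stepped `suggest_float` draws from.

```python
import math

def stepped_values(low, high, step):
    # Enumerate low, low + step, ..., up to high; the small epsilon absorbs float noise.
    n = int(math.floor((high - low) / step + 1e-9))
    return [round(low + k * step, 10) for k in range(n + 1)]

weight_decays = stepped_values(0, 1e-2, 1e-3)   # 0.0, 0.001, ..., 0.01
lambda_params = stepped_values(0, 1, 0.1)       # 0.0, 0.1, ..., 1.0
temperatures = stepped_values(2, 7, 0.5)        # 2.0, 2.5, ..., 7.0
batch_sizes = [16, 32, 64]

# Number of distinct discrete combinations (the learning rate stays continuous).
n_discrete = len(weight_decays) * len(lambda_params) * len(temperatures) * len(batch_sizes)
```

Each stepped range has 11 values here, which gives a sense of how sparsely 80 trials cover the discrete part of the space.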

In [21]:
trainer = ImageDistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=compute_metrics,
    model_init=get_random_init_mobilenet,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

In [22]:
pruner = optuna.pruners.HyperbandPruner(min_resource=1, reduction_factor=4)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)



In [23]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    n_trials=80
)

[I 2025-01-02 10:31:58,319] A new study created in memory with name: no-name-c949043e-bbe5-4784-8e94-3711ffa24f5e


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8261,1.794448,0.3562,0.363023,0.3562,0.332477
2,1.6639,1.638281,0.429,0.451496,0.429,0.430032
3,1.506,1.536666,0.4901,0.496502,0.4901,0.486227
4,1.3961,1.514397,0.5165,0.521185,0.5165,0.506952
5,1.3117,1.508902,0.5278,0.575985,0.5278,0.532914
6,1.2622,1.367924,0.5864,0.578568,0.5864,0.576121
7,1.1986,1.423449,0.5768,0.601245,0.5768,0.572559
8,1.1257,1.765431,0.5099,0.563286,0.5099,0.494695
9,1.0585,1.310685,0.6264,0.646086,0.6264,0.620627
10,1.0291,1.226045,0.662,0.67427,0.662,0.660745


[I 2025-01-02 11:14:36,692] Trial 0 finished with value: 0.6493830756403727 and parameters: {'learning_rate': 1.0253509690168497e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.001, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 0 with value: 0.6493830756403727.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6508,1.645658,0.6416,0.651808,0.6416,0.630255
2,0.4678,1.807167,0.7252,0.751262,0.7252,0.725268
3,0.372,1.774964,0.796,0.79762,0.796,0.793375
4,0.3116,1.782879,0.8207,0.821734,0.8207,0.81995
5,0.2664,1.785201,0.8013,0.82813,0.8013,0.804786
6,0.2327,1.71434,0.8237,0.833181,0.8237,0.820774
7,0.2029,1.73078,0.816,0.830533,0.816,0.8146
8,0.1778,1.671794,0.7674,0.791711,0.7674,0.761982
9,0.1544,1.754976,0.85,0.855467,0.85,0.849485
10,0.136,1.758484,0.8534,0.860455,0.8534,0.8544


[I 2025-01-02 12:06:48,211] Trial 1 finished with value: 0.8775070142695549 and parameters: {'learning_rate': 0.00021766241123453658, 'per_device_train_batch_size': 32, 'weight_decay': 0.01, 'lambda_param': 0.9, 'temperature': 3.0}. Best is trial 1 with value: 0.8775070142695549.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0266,1.83874,0.1926,0.190376,0.1926,0.143647


[I 2025-01-02 12:09:50,409] Trial 2 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1866,2.000362,0.1742,0.134926,0.1742,0.122639
2,2.0727,1.905863,0.2219,0.216746,0.2219,0.173206
3,1.9723,1.817852,0.272,0.263115,0.272,0.25065
4,1.8452,1.777897,0.2949,0.285927,0.2949,0.278934
5,1.8184,1.783804,0.3045,0.331433,0.3045,0.292158
6,1.773,1.753153,0.3229,0.330373,0.3229,0.307589
7,1.7537,1.781673,0.3105,0.309499,0.3105,0.294976
8,1.7208,2.030673,0.261,0.288998,0.261,0.224095
9,1.6927,1.684682,0.3663,0.370945,0.3663,0.352001
10,1.6895,1.673737,0.3747,0.377942,0.3747,0.369316


[I 2025-01-02 13:10:45,151] Trial 3 finished with value: 0.39022170376656556 and parameters: {'learning_rate': 2.379522116387725e-06, 'per_device_train_batch_size': 64, 'weight_decay': 0.008, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 1 with value: 0.8775070142695549.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6648,1.092367,0.4227,0.459306,0.4227,0.390251


[I 2025-01-02 13:13:50,371] Trial 4 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6859,1.289853,0.5999,0.617377,0.5999,0.590284


[I 2025-01-02 13:16:53,827] Trial 5 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7116,1.602961,0.1753,0.142933,0.1753,0.130146


[I 2025-01-02 13:20:11,131] Trial 6 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4267,1.416727,0.2528,0.251181,0.2528,0.229749


[I 2025-01-02 13:23:28,723] Trial 7 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1714,1.999753,0.1432,0.110177,0.1432,0.092197


[I 2025-01-02 13:26:31,907] Trial 8 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9743,1.83692,0.3201,0.3136,0.3201,0.298563


[I 2025-01-02 13:29:36,746] Trial 9 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6415,1.609443,0.6555,0.670329,0.6555,0.649023
2,0.4677,1.665924,0.7134,0.749679,0.7134,0.71601
3,0.3789,1.665458,0.7837,0.793046,0.7837,0.782119


[I 2025-01-02 13:38:45,576] Trial 10 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5397,1.72343,0.4542,0.473658,0.4542,0.431718


[I 2025-01-02 13:42:01,117] Trial 11 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0416,1.937631,0.2213,0.216447,0.2213,0.1908


[I 2025-01-02 13:45:16,357] Trial 12 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8073,1.686657,0.6353,0.649883,0.6353,0.627057
2,0.5961,1.635921,0.7302,0.764407,0.7302,0.733458
3,0.4773,1.564477,0.7897,0.796995,0.7897,0.786955
4,0.4012,1.509111,0.8153,0.816522,0.8153,0.814655
5,0.3592,1.624529,0.806,0.831573,0.806,0.810094
6,0.3157,1.57329,0.8287,0.841215,0.8287,0.828446
7,0.2735,1.636031,0.829,0.839301,0.829,0.827579
8,0.2444,1.514421,0.8112,0.82412,0.8112,0.80896
9,0.2129,1.593595,0.8513,0.856352,0.8513,0.850958
10,0.1863,1.598731,0.8642,0.867108,0.8642,0.864882


[I 2025-01-02 14:51:05,124] Trial 13 finished with value: 0.8775142231826768 and parameters: {'learning_rate': 0.00010562422487867538, 'per_device_train_batch_size': 16, 'weight_decay': 0.006, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 13 with value: 0.8775142231826768.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8636,1.897928,0.5118,0.516317,0.5118,0.485337
2,0.641,2.023914,0.6657,0.698177,0.6657,0.66941
3,0.4992,1.995993,0.7334,0.740914,0.7334,0.727995
4,0.4063,2.107063,0.781,0.782289,0.781,0.777834
5,0.3424,2.191655,0.7478,0.795487,0.7478,0.754354
6,0.2992,2.152781,0.8003,0.812695,0.8003,0.797772
7,0.2569,2.111333,0.7903,0.80694,0.7903,0.788578
8,0.2212,2.132159,0.7699,0.788331,0.7699,0.764204
9,0.1868,2.192748,0.8197,0.834518,0.8197,0.818955
10,0.1566,2.109633,0.8454,0.847213,0.8454,0.845268


[I 2025-01-02 15:36:57,665] Trial 14 finished with value: 0.8393768114963797 and parameters: {'learning_rate': 5.070319408723441e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.006, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 13 with value: 0.8775142231826768.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9173,1.454623,0.6557,0.663914,0.6557,0.648298


[I 2025-01-02 15:40:02,363] Trial 15 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0642,1.465229,0.6686,0.681861,0.6686,0.668477
2,0.8165,1.360149,0.7421,0.761589,0.7421,0.743705


[I 2025-01-02 15:46:34,346] Trial 16 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1126,1.311957,0.4418,0.479723,0.4418,0.40881


[I 2025-01-02 15:49:36,026] Trial 17 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5256,1.792656,0.4155,0.425195,0.4155,0.39269


[I 2025-01-02 15:52:38,130] Trial 18 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9531,1.179795,0.3425,0.347448,0.3425,0.322543


[I 2025-01-02 15:55:41,693] Trial 19 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9927,1.475629,0.5558,0.570293,0.5558,0.53655


[I 2025-01-02 15:58:57,928] Trial 20 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9215,1.833634,0.5451,0.56258,0.5451,0.526838


[I 2025-01-02 16:02:14,403] Trial 21 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2212,1.650682,0.3403,0.346941,0.3403,0.318851


[I 2025-01-02 16:05:31,073] Trial 22 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.734,2.036732,0.6393,0.663486,0.6393,0.633929


[I 2025-01-02 16:08:47,252] Trial 23 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.594,1.468202,0.6346,0.659192,0.6346,0.629873


[I 2025-01-02 16:11:51,706] Trial 24 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6798,1.906213,0.6414,0.660885,0.6414,0.632942


[I 2025-01-02 16:14:55,823] Trial 25 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7194,1.694801,0.6473,0.662804,0.6473,0.640999


[I 2025-01-02 16:18:12,209] Trial 26 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9313,1.908035,0.6298,0.645091,0.6298,0.622301


[I 2025-01-02 16:21:16,050] Trial 27 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9379,1.426069,0.563,0.575587,0.563,0.540515


[I 2025-01-02 16:24:32,325] Trial 28 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9227,1.432718,0.6136,0.619687,0.6136,0.595488


[I 2025-01-02 16:27:48,218] Trial 29 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1975,1.556503,0.4274,0.450959,0.4274,0.409366


[I 2025-01-02 16:30:54,160] Trial 30 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3001,1.957914,0.2562,0.25312,0.2562,0.227429


[I 2025-01-02 16:33:57,126] Trial 31 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2843,2.159734,0.178,0.170701,0.178,0.130561


[I 2025-01-02 16:37:13,903] Trial 32 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9151,1.848692,0.3279,0.325475,0.3279,0.306867


[I 2025-01-02 16:40:18,042] Trial 33 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7446,1.533749,0.5156,0.537144,0.5156,0.492623


[I 2025-01-02 16:43:35,296] Trial 34 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6718,1.563451,0.4275,0.465244,0.4275,0.411582


[I 2025-01-02 16:46:52,288] Trial 35 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.795,1.661769,0.388,0.401225,0.388,0.364767


[I 2025-01-02 16:50:09,319] Trial 36 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.105,1.519232,0.5907,0.600787,0.5907,0.576703


[I 2025-01-02 16:53:12,069] Trial 37 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7513,1.651152,0.2473,0.243097,0.2473,0.223189


[I 2025-01-02 16:56:15,048] Trial 38 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9926,1.823636,0.5743,0.597146,0.5743,0.563294


[I 2025-01-02 16:59:32,107] Trial 39 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3369,1.677657,0.36,0.373771,0.36,0.341022


[I 2025-01-02 17:02:35,414] Trial 40 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.329,2.152216,0.1382,0.116866,0.1382,0.084491


[I 2025-01-02 17:05:38,226] Trial 41 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9139,1.728208,0.2069,0.199192,0.2069,0.164699


[I 2025-01-02 17:08:40,898] Trial 42 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2475,1.8881,0.2716,0.270982,0.2716,0.244194


[I 2025-01-02 17:11:43,044] Trial 43 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4339,1.587345,0.4032,0.426729,0.4032,0.373247


[I 2025-01-02 17:15:00,289] Trial 44 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9294,1.715594,0.6177,0.636102,0.6177,0.604075


[I 2025-01-02 17:18:04,823] Trial 45 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9583,1.835287,0.3054,0.295889,0.3054,0.279486


[I 2025-01-02 17:21:08,155] Trial 46 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7165,1.402947,0.6234,0.63648,0.6234,0.611129


[I 2025-01-02 17:24:12,097] Trial 47 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.92,1.839371,0.5049,0.525058,0.5049,0.484643


[I 2025-01-02 17:27:28,696] Trial 48 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6656,1.776729,0.3457,0.360688,0.3457,0.317261


[I 2025-01-02 17:30:45,151] Trial 49 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6235,1.582114,0.6191,0.617891,0.6191,0.597189


[I 2025-01-02 17:33:49,449] Trial 50 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0977,1.372472,0.6512,0.667586,0.6512,0.646812


[I 2025-01-02 17:37:06,060] Trial 51 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7649,1.179881,0.5718,0.583125,0.5718,0.560764


[I 2025-01-02 17:40:09,002] Trial 52 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0512,1.703946,0.5877,0.610459,0.5877,0.572156


[I 2025-01-02 17:43:27,231] Trial 53 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0096,0.895696,0.2205,0.215084,0.2205,0.186628


[I 2025-01-02 17:46:30,657] Trial 54 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1238,1.408782,0.6416,0.65533,0.6416,0.638465


[I 2025-01-02 17:49:47,553] Trial 55 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0247,1.767879,0.4539,0.473591,0.4539,0.430663


[I 2025-01-02 17:52:51,986] Trial 56 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7054,1.701337,0.3864,0.410349,0.3864,0.363824


[I 2025-01-02 17:56:09,362] Trial 57 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9377,1.742919,0.2577,0.248751,0.2577,0.226796


[I 2025-01-02 17:59:11,668] Trial 58 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2921,1.788777,0.4656,0.496162,0.4656,0.440067


[I 2025-01-02 18:02:28,588] Trial 59 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2658,1.626632,0.3021,0.321277,0.3021,0.279703


[I 2025-01-02 18:05:46,758] Trial 60 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.714,1.62575,0.6495,0.666292,0.6495,0.641938


[I 2025-01-02 18:08:52,208] Trial 61 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1942,1.513356,0.6262,0.653919,0.6262,0.623126


[I 2025-01-02 18:11:54,478] Trial 62 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1225,1.419581,0.6398,0.658429,0.6398,0.638277


[I 2025-01-02 18:14:58,139] Trial 63 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8484,1.764161,0.2486,0.24747,0.2486,0.227665


[I 2025-01-02 18:18:14,721] Trial 64 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9052,1.743548,0.1456,0.129052,0.1456,0.096428


[I 2025-01-02 18:21:16,968] Trial 65 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2166,1.072678,0.614,0.631255,0.614,0.603854
2,0.9373,0.988015,0.657,0.698383,0.657,0.660612
3,0.7622,0.743426,0.7493,0.761935,0.7493,0.747534
4,0.6453,0.711066,0.7707,0.777858,0.7707,0.768524
5,0.5827,0.633615,0.7793,0.807959,0.7793,0.783624
6,0.5,0.722539,0.7801,0.801005,0.7801,0.773408
7,0.4413,0.743682,0.784,0.804393,0.784,0.783129
8,0.3678,1.221466,0.7243,0.751061,0.7243,0.710855


[I 2025-01-02 18:45:53,795] Trial 66 finished with value: 0.7108547303462907 and parameters: {'learning_rate': 0.0004192037404572582, 'per_device_train_batch_size': 32, 'weight_decay': 0.003, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 13 with value: 0.8775142231826768.


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1901,1.213143,0.6133,0.631634,0.6133,0.599124


[I 2025-01-02 18:48:57,411] Trial 67 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.241,1.092313,0.6043,0.62028,0.6043,0.594696


[I 2025-01-02 18:52:14,084] Trial 68 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.318,1.321736,0.5601,0.578742,0.5601,0.534668


[I 2025-01-02 18:55:31,834] Trial 69 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1984,1.200755,0.6198,0.640503,0.6198,0.608948


[I 2025-01-02 18:58:35,890] Trial 70 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1979,1.085242,0.6149,0.634268,0.6149,0.604737


[I 2025-01-02 19:01:39,791] Trial 71 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3289,2.153829,0.1377,0.113611,0.1377,0.085904


[I 2025-01-02 19:04:42,067] Trial 72 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7528,1.753912,0.3764,0.382832,0.3764,0.353604


[I 2025-01-02 19:07:59,703] Trial 73 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2316,1.519022,0.3475,0.357001,0.3475,0.322437


[I 2025-01-02 19:11:05,193] Trial 74 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6653,1.486519,0.5762,0.590446,0.5762,0.561158


[I 2025-01-02 19:14:08,355] Trial 75 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6702,1.912372,0.6303,0.640771,0.6303,0.620805


[I 2025-01-02 19:17:10,884] Trial 76 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2416,1.490284,0.5533,0.57234,0.5533,0.541789


[I 2025-01-02 19:20:27,459] Trial 77 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5943,1.438909,0.1707,0.148205,0.1707,0.118925


[I 2025-01-02 19:23:30,635] Trial 78 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6093,1.698447,0.6154,0.631447,0.6154,0.606751


[I 2025-01-02 19:26:47,257] Trial 79 pruned. 
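
In the log above, six of the 80 trials ran to completion (trials 0, 1, 3, 13, 14 and 66) and the remaining 74 were pruned; the best reported value is 0.8775142231826768 for trial 13. A long log like this is easier to review when summarized programmatically. The following is a minimal sketch that extracts trial outcomes from the Optuna log text with a regular expression. The helper `summarize_trials` is hypothetical, and `sample_log` holds three lines copied from the output above.

```python
import re

# Matches "Trial <n> finished with value: <float>" or "Trial <n> pruned".
# "Best is trial ..." is not matched because of the lowercase "t".
TRIAL_RE = re.compile(r"Trial (\d+) (?:finished with value: ([0-9.eE+-]+)|pruned)")


def summarize_trials(log_text):
    """Return ({trial_number: value} for finished trials, [pruned trial numbers])."""
    finished = {}
    pruned = []
    for match in TRIAL_RE.finditer(log_text):
        number = int(match.group(1))
        if match.group(2) is not None:
            finished[number] = float(match.group(2))
        else:
            pruned.append(number)
    return finished, pruned


sample_log = """\
[I 2025-01-02 12:06:48,211] Trial 1 finished with value: 0.8775070142695549 and parameters: {'learning_rate': 0.00021766241123453658, 'per_device_train_batch_size': 32, 'weight_decay': 0.01, 'lambda_param': 0.9, 'temperature': 3.0}. Best is trial 1 with value: 0.8775070142695549.
[I 2025-01-02 12:09:50,409] Trial 2 pruned.
[I 2025-01-02 14:51:05,124] Trial 13 finished with value: 0.8775142231826768 and parameters: {'learning_rate': 0.00010562422487867538, 'per_device_train_batch_size': 16, 'weight_decay': 0.006, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 13 with value: 0.8775142231826768.
"""

finished, pruned = summarize_trials(sample_log)
best = max(finished, key=finished.get)
print(f"finished={sorted(finished)}, pruned={pruned}, best={best}")
# → finished=[1, 13], pruned=[2], best=13
```

When the study object itself is available, querying it directly is more robust than parsing text; the regex approach is only useful when nothing but the captured output remains.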
