# Hyperparameter search for a BiLSTM model on the TREC (fine) dataset

This notebook searches for optimal hyperparameters for a BiLSTM model with an unfrozen (trainable) embedding layer on the TREC (fine) dataset. Hyperparameters are searched for both the original and the augmented dataset, and for both standard training and distillation.

The search uses the Optuna library with the Hyperband pruning algorithm, combined with a TPE sampler. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each variant.

## Library imports and basic setup

In [1]:
from transformers import BasicTokenizer, Trainer
from datasets import concatenate_datasets, load_from_disk
import kagglehub
import optuna
import torch
import math
import base

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


Reset the random seed so that the results are reproducible.

In [None]:
base.reset_seed()
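The `base` module is not shown in this notebook, so what `base.reset_seed()` does exactly is unknown. As a rough sketch only, a seed-reset helper for this kind of setup often looks like the following. The default seed value of 42 and the set of libraries being seeded are assumptions, not the author's implementation.

```python
import random


def reset_seed(seed: int = 42) -> None:
    """Hypothetical stand-in for base.reset_seed(); the real helper is not shown."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global RNG, if NumPy is installed
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # PyTorch default generator
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    except ImportError:
        pass


reset_seed(0)
first = random.random()
reset_seed(0)
second = random.random()
print(first == second)  # the same seed yields the same random stream
```

Calling the helper again before each search (as the notebook does) makes every search start from the same random state.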

Check GPU availability.

In [3]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Load the embeddings.

Load the dataset and apply basic preprocessing (tokenization, building a vocabulary of all tokens, building an index for the GloVe embeddings).

In [4]:
my_glove = kagglehub.dataset_download("thanakomsn/glove6b300dtxt")
print(my_glove)

/home/jovyan/.cache/kagglehub/datasets/thanakomsn/glove6b300dtxt/versions/1


In [None]:
GLOVE_FILE = f"{my_glove}/glove.6B.300d.txt"
DATASET = "trec"

In [6]:
train_data = load_from_disk(f"~/data/{DATASET}/train-logits_fine")
eval_data = load_from_disk(f"~/data/{DATASET}/eval-logits_fine")
test_data = load_from_disk(f"~/data/{DATASET}/test-logits_fine")

all_train_data = load_from_disk(f"~/data/{DATASET}/train-logits-augmented_fine")

# Eval, test and augmented train splits together; used only to build the vocabulary.
all_data = concatenate_datasets([eval_data, test_data, all_train_data])
tokenizer = BasicTokenizer(do_lower_case=True)

Tokenization.

In [7]:
train_data_tokens = [tokenizer.tokenize(e["sentence"]) for e in train_data]
eval_data_tokens = [tokenizer.tokenize(e["sentence"]) for e in eval_data]
test_data_tokens = [tokenizer.tokenize(e["sentence"]) for e in test_data]

all_train_data_tokens = [tokenizer.tokenize(e["sentence"]) for e in all_train_data]

all_data_tokens = [tokenizer.tokenize(e["sentence"]) for e in all_data]

Collect all unique tokens in the dataset.

In [8]:
vocab = base.get_vocab(all_data_tokens)

Assign an index to each token.

In [None]:
word_index = dict(zip(vocab, range(len(vocab))))
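`base.get_vocab` is not shown, so the following is only a sketch of one plausible behaviour: collecting unique tokens in first-seen order and then numbering them exactly as the cell above does. The function body and the toy sentences are assumptions for illustration.

```python
def get_vocab(tokenized_sentences):
    """Hypothetical stand-in for base.get_vocab: unique tokens in first-seen order."""
    seen = {}
    for tokens in tokenized_sentences:
        for token in tokens:
            seen.setdefault(token, None)  # dicts preserve insertion order
    return list(seen)


# Toy input in the same shape as all_data_tokens (a list of token lists).
sentences = [["what", "is", "glove", "?"], ["what", "is", "trec", "?"]]
vocab = get_vocab(sentences)
word_index = dict(zip(vocab, range(len(vocab))))
print(word_index)
# {'what': 0, 'is': 1, 'glove': 2, '?': 3, 'trec': 4}
```

A deterministic ordering matters here: if the vocabulary order changed between runs, the same token would map to a different embedding row.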

Build the index of GloVe embeddings.

In [10]:
embeddings_index = base.get_embeddings_indeces(GLOVE_FILE)

Found 400000 word vectors.
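The GloVe text file stores one word per line, followed by its space-separated vector components. A minimal sketch of how such a file can be turned into a word-to-vector dictionary is shown below. `load_glove` is a hypothetical name, plain Python lists are used instead of arrays to keep the example dependency-free, and the real `base.get_embeddings_indeces` may differ.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines into {word: [float, ...]} (illustrative only)."""
    embeddings_index = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        embeddings_index[word] = [float(v) for v in values]
    return embeddings_index


# Two made-up 3-dimensional vectors standing in for glove.6B.300d.txt.
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
embeddings_index = load_glove(sample)
print(f"Found {len(embeddings_index)} word vectors.")
```

With a real file, the same function can be fed an open file handle, for example `open(GLOVE_FILE, encoding="utf-8")`.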


Define the vocabulary size and the embedding dimension.

In [11]:
print(len(vocab))
num_tokens = len(vocab) + 2
embedding_dim = 300

8766


Map tokens (their indices) to embeddings. Some tokens were not found in GloVe (215 of 8,766), which is not a problem.

In [12]:
embedding_matrix = base.get_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index)

Converted 8551 words (215) misses
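The hit and miss counts above come from looking up every vocabulary token in the GloVe index. The sketch below shows one way this lookup could work, assuming that tokens without a pre-trained vector keep an all-zero row. `build_embedding_matrix` is a hypothetical name, and the actual `base.get_embedding_matrix` is not shown and may initialise missing rows differently.

```python
def build_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index):
    """Illustrative only: copy known vectors into a zero-initialised matrix."""
    matrix = [[0.0] * embedding_dim for _ in range(num_tokens)]
    hits = misses = 0
    for word, i in word_index.items():
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = list(vector)  # token found in the pre-trained index
            hits += 1
        else:
            misses += 1               # row stays at zeros
    return matrix, hits, misses


toy_word_index = {"the": 0, "cat": 1, "zzz": 2}
toy_embeddings = {"the": [1.0, 2.0], "cat": [3.0, 4.0]}
matrix, hits, misses = build_embedding_matrix(
    num_tokens=len(toy_word_index) + 2,  # mirrors len(vocab) + 2 in the notebook
    embedding_dim=2,
    word_index=toy_word_index,
    embeddings_index=toy_embeddings,
)
print(f"Converted {hits} words ({misses} misses)")
```

Because the embedding layer is not frozen in this notebook, rows for missing tokens can still be updated during training.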


Convert the tokens in each dataset split to their indices.

In [13]:
train_data_index = [[word_index[token] for token in tokens] for tokens in train_data_tokens]
eval_data_index = [[word_index[token] for token in tokens] for tokens in eval_data_tokens]
test_data_index = [[word_index[token] for token in tokens] for tokens in test_data_tokens]

all_train_data_index = [[word_index[token] for token in tokens] for tokens in all_train_data_tokens]

Pad all records to the same length.

In [14]:
train_padded_data = [base.padd(ids, 60) for ids in train_data_index]
eval_padded_data = [base.padd(ids, 60) for ids in eval_data_index]
test_padded_data = [base.padd(ids, 60) for ids in test_data_index]

all_train_padded_data = [base.padd(ids, 60) for ids in all_train_data_index]
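`base.padd` is not shown. A minimal sketch of a fixed-length padding helper follows, under three assumptions that may not match the real helper: padding is appended on the right, longer sequences are truncated, and the padding ID is 0. The real code may use a dedicated padding index instead.

```python
def padd(ids, max_len, pad_id=0):
    """Hypothetical stand-in for base.padd: right-pad or truncate to max_len."""
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))


print(padd([5, 6, 7], 5))         # [5, 6, 7, 0, 0]
print(padd(list(range(10)), 4))   # [0, 1, 2, 3]
```

Fixed-length rows are what allows the padded lists to be added to the dataset as a regular `input_ids` column and batched without a custom collator.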

Add the token IDs to each dataset split.

In [15]:
train_data = train_data.add_column("input_ids", train_padded_data)
eval_data = eval_data.add_column("input_ids", eval_padded_data)
test_data = test_data.add_column("input_ids", test_padded_data)

all_train_data = all_train_data.add_column("input_ids", all_train_padded_data)

Basic training configuration for the search. Optuna works with steps rather than epochs, so the conversion is computed below.

The minimum training length is 5 epochs and the maximum is 15 epochs. The maximum number of warm-up steps is set to 10% of the first epoch.

In [16]:
num_epochs = 15
batch_size = 128

In [17]:
data_length = len(train_data)
min_r = math.ceil(data_length/batch_size)*5
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
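To make the epoch-to-step conversion concrete, here is the same arithmetic with a hypothetical training set of 4,000 examples. The real value of `len(train_data)` is not printed in this notebook, so the figure is an assumption used only for illustration.

```python
import math

data_length = 4000  # hypothetical; the notebook uses len(train_data)
batch_size = 128
num_epochs = 15

steps_per_epoch = math.ceil(data_length / batch_size)  # ceil(31.25) = 32
min_r = steps_per_epoch * 5                            # 5 epochs  -> 160 steps
max_r = steps_per_epoch * num_epochs                   # 15 epochs -> 480 steps
warm_up = math.ceil(data_length / batch_size / 10)     # ceil(3.125) = 4 steps

print(steps_per_epoch, min_r, max_r, warm_up)
# 32 160 480 4
```

The rounding up matters because the last, partially filled batch of an epoch still counts as one optimizer step.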

## Search with standard training on the original dataset

Definition of the searched hyperparameters and their ranges.

In [18]:
def hp_space(trial):
    # Search space: log-uniform learning rate, stepped weight decay, integer warm-up.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
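The three ranges in this search space describe different kinds of distributions: `log=True` samples the learning rate uniformly in log space, `step=1e-3` restricts weight decay to a grid of eleven values, and `suggest_int` includes both endpoints. The sketch below imitates these distributions with plain uniform random sampling. It is not the TPE sampler used in the notebook, and the upper warm-up bound of 4 is a hypothetical value.

```python
import math
import random

rng = random.Random(42)


def sample_log_uniform(low, high):
    """Sample uniformly in log space, so each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# step=1e-3 between 0 and 1e-2 gives the grid 0.000, 0.001, ..., 0.010.
weight_decay_grid = [round(k * 1e-3, 3) for k in range(11)]

learning_rate = sample_log_uniform(5e-5, 5e-3)
weight_decay = rng.choice(weight_decay_grid)
warmup_steps = rng.randint(0, 4)  # both endpoints included; 4 is hypothetical

print({"learning_rate": learning_rate,
       "weight_decay": weight_decay,
       "warmup_steps": warmup_steps})
```

Sampling the learning rate in log space avoids spending most trials in the upper decade of the range, which a linear scale between 5e-5 and 5e-3 would do.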

Optuna configuration.

In [19]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
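According to the Optuna documentation, `HyperbandPruner` derives its number of brackets from `min_resource`, `max_resource` and `reduction_factor` as floor(log_eta(max_resource / min_resource)) + 1. The sketch below evaluates that formula for the 5-epoch minimum and 15-epoch maximum used here. The value of 32 steps per epoch is hypothetical, and it cancels out of the ratio anyway.

```python
import math


def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Bracket count as documented for Optuna's HyperbandPruner (illustrative)."""
    ratio = max_resource / min_resource
    return math.floor(math.log(ratio, reduction_factor)) + 1


steps_per_epoch = 32  # hypothetical; only the 15/5 ratio matters
n_brackets = hyperband_brackets(
    min_resource=5 * steps_per_epoch,
    max_resource=15 * steps_per_epoch,
    reduction_factor=2,
)
print(n_brackets)  # floor(log2(3)) + 1 = 2
```

Each bracket runs successive halving with a different starting budget, which would be consistent with the logs below, where some trials are pruned after 5 epochs and others only later.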



Create the model with an unfrozen (trainable) embedding layer.

In [20]:
def get_BiLSTM():
    return base.BiLSTMClassifier(embedding_matrix=embedding_matrix, embedding_dim=embedding_dim, fc_dim=400, hidden_dim=300, output_dim=50, freeze_embed=False)

In [21]:
base.reset_seed()

Configuration of the individual training runs.

In [22]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/bilstm-base-embedd_fine_hp-search", logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd_fine_hp-search", epochs=num_epochs, batch_size=batch_size)

Trainer configuration for the individual training runs.

In [23]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=get_BiLSTM,  # a fresh model is created for every trial
)
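`base.compute_metrics` is not shown, but the logged metrics look consistent with macro-averaging over the 50 fine-grained classes: rows with recall 0.02 (that is, 1/50) are what a model predicting a single class for every input would produce. The sketch below is a plain-Python macro precision/recall/F1, not the author's implementation. The toy labels (1,091 examples, 193 of them in class 0) are hypothetical and chosen only so that the result matches such a row.

```python
def macro_prf(y_true, y_pred, num_classes):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = sum(1 for t in y_true if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels: class 0 has 193 examples, the rest are spread over classes 1..49.
y_true = [0] * 193 + [1 + i % 49 for i in range(898)]
y_pred = [0] * len(y_true)  # a degenerate model that always predicts class 0

precision, recall, f1 = macro_prf(y_true, y_pred, num_classes=50)
print(round(precision, 6), round(recall, 6), round(f1, 6))
# 0.003538 0.02 0.006012
```

This is why macro F1 can stay far below accuracy early in training: a model that ignores rare classes is penalised once per ignored class.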
  

Configure and run the search.

In [24]:
best_trial_normal = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-embedd",
    n_trials=150
)
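The search prints one Optuna log line per trial, either `Trial N pruned.` or `Trial N finished with value: ...`. When the output is long, a small parser can summarise it. The sketch below is an illustrative helper (the names `summarise`, `FINISHED` and `PRUNED` are made up here) applied to three lines copied from the log of this run.

```python
import re

FINISHED = re.compile(r"Trial (\d+) finished with value: ([-+0-9.eE]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarise(log_lines):
    """Return ({trial_number: objective}, [pruned_trial_numbers]) from Optuna log lines."""
    finished, pruned = {}, []
    for line in log_lines:
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    return finished, pruned


log_lines = [
    "[I 2025-03-23 01:14:13,090] Trial 0 pruned. ",
    "[I 2025-03-23 01:17:37,567] Trial 4 finished with value: 0.6093633706169189 "
    "and parameters: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, "
    "'warmup_steps': 0}. Best is trial 4 with value: 0.6093633706169189.",
    "[I 2025-03-23 01:19:22,924] Trial 6 finished with value: 0.31590272628134014 "
    "and parameters: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, "
    "'warmup_steps': 3}. Best is trial 4 with value: 0.6093633706169189.",
]

finished, pruned = summarise(log_lines)
best = max(finished, key=finished.get)
print(best, finished[best], pruned)
```

For the full search, the same information is also available programmatically from the object returned by `hyperparameter_search`.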

[I 2025-03-23 01:13:28,224] A new study created in memory with name: Base-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4618,3.04708,0.189734,0.022144,0.024338,0.013021
2,2.8146,2.617949,0.372136,0.038624,0.07836,0.049747
3,2.4438,2.280283,0.439047,0.068191,0.099587,0.071883
4,2.1558,2.05949,0.483043,0.09543,0.12539,0.100944
5,1.9556,1.895859,0.529789,0.112293,0.150968,0.123581
6,1.7639,1.807152,0.538955,0.176345,0.16147,0.146206
7,1.6278,1.713703,0.568286,0.176519,0.182097,0.166172
8,1.5259,1.628252,0.5967,0.249123,0.203762,0.190628
9,1.4153,1.578906,0.598533,0.230872,0.21006,0.19937
10,1.3227,1.542505,0.613199,0.229849,0.22344,0.214511


[I 2025-03-23 01:14:13,090] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1047,2.543267,0.363886,0.066008,0.076156,0.052049
2,2.21,1.959974,0.525206,0.125123,0.15094,0.123267
3,1.725,1.604249,0.588451,0.209772,0.195137,0.182757
4,1.3189,1.381563,0.652612,0.341937,0.276152,0.27734
5,1.0251,1.253919,0.683776,0.355704,0.328093,0.327607


[I 2025-03-23 01:14:34,568] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8273,3.660262,0.176902,0.003538,0.02,0.006012
2,3.3667,3.163603,0.176902,0.003538,0.02,0.006012
3,3.1284,3.041841,0.190651,0.032719,0.023633,0.011853
4,2.9722,2.911999,0.307974,0.038347,0.055073,0.03821
5,2.8837,2.804605,0.351054,0.040293,0.068702,0.047243
6,2.7627,2.725765,0.349221,0.036505,0.068988,0.043481
7,2.6819,2.657447,0.36297,0.038682,0.072393,0.047294
8,2.628,2.600378,0.377635,0.03946,0.077756,0.050565
9,2.5712,2.555525,0.383135,0.038974,0.079715,0.051491
10,2.5266,2.525966,0.379468,0.038622,0.079665,0.051008


[I 2025-03-23 01:15:21,951] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0303,2.36296,0.4033,0.111774,0.095519,0.077746
2,1.9547,1.757104,0.562786,0.167262,0.181701,0.157759
3,1.4062,1.348982,0.67461,0.29262,0.298996,0.288091
4,0.9498,1.236822,0.675527,0.411654,0.352556,0.357483
5,0.6498,1.160943,0.714024,0.466775,0.417837,0.422833


[I 2025-03-23 01:15:44,021] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7074,2.058103,0.505041,0.132806,0.145703,0.124629
2,1.6638,1.472297,0.63428,0.260963,0.270114,0.256845
3,1.0867,1.241883,0.699358,0.380607,0.375831,0.362172
4,0.6369,1.160062,0.719523,0.471829,0.430245,0.434721
5,0.331,1.20041,0.75527,0.579004,0.529319,0.532956
6,0.182,1.270732,0.75802,0.598675,0.533349,0.546755
7,0.0864,1.348416,0.759853,0.655133,0.586484,0.603988
8,0.0453,1.482355,0.769936,0.656258,0.593561,0.604529
9,0.0243,1.50953,0.76352,0.645696,0.596514,0.604005
10,0.0095,1.520208,0.767186,0.666919,0.603014,0.62152


[I 2025-03-23 01:17:37,567] Trial 4 finished with value: 0.6093633706169189 and parameters: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 4 with value: 0.6093633706169189.


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6978,3.245064,0.176902,0.003538,0.02,0.006012
2,3.1307,3.003498,0.192484,0.015556,0.024304,0.012622
3,2.9102,2.787781,0.362053,0.039807,0.071243,0.045003
4,2.7032,2.625978,0.371219,0.038126,0.074823,0.048994
5,2.5769,2.483708,0.401467,0.063314,0.085088,0.056882


[I 2025-03-23 01:18:01,580] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.408,2.95808,0.261228,0.038292,0.047003,0.031393
2,2.6859,2.432035,0.412466,0.063949,0.089168,0.060228
3,2.2419,2.105584,0.47571,0.100401,0.11863,0.091181
4,1.9579,1.872701,0.52429,0.135196,0.151501,0.127131
5,1.7344,1.720331,0.571952,0.199296,0.187406,0.168119
6,1.5261,1.666177,0.582035,0.21234,0.198664,0.183596
7,1.3763,1.539977,0.60495,0.247672,0.217818,0.209527
8,1.256,1.455747,0.626948,0.274909,0.236447,0.233025
9,1.1336,1.405638,0.637947,0.285989,0.261665,0.260708
10,1.0214,1.397861,0.643446,0.296704,0.261728,0.26419


[I 2025-03-23 01:19:22,924] Trial 6 finished with value: 0.31590272628134014 and parameters: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 3}. Best is trial 4 with value: 0.6093633706169189.


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7497,3.381554,0.176902,0.003538,0.02,0.006012
2,3.1947,3.080284,0.176902,0.003538,0.02,0.006012
3,2.9959,2.874931,0.358387,0.043991,0.069258,0.046528
4,2.8007,2.734558,0.361137,0.037883,0.071881,0.047006
5,2.6995,2.615865,0.381302,0.039351,0.078697,0.051624


[I 2025-03-23 01:19:46,833] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3245,2.882922,0.28231,0.040252,0.051416,0.034161
2,2.5955,2.338784,0.434464,0.065695,0.09839,0.068952
3,2.1463,2.007175,0.504125,0.138443,0.134604,0.109606
4,1.8521,1.783487,0.530706,0.137137,0.151696,0.129574
5,1.6329,1.645471,0.591201,0.232489,0.203465,0.189478


[I 2025-03-23 01:20:17,860] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2375,2.821208,0.382218,0.046404,0.08342,0.057103
2,2.4543,2.208674,0.455545,0.109076,0.108701,0.086816
3,1.9967,1.854265,0.532539,0.153834,0.149094,0.12815
4,1.6552,1.605096,0.603116,0.232545,0.2037,0.188975
5,1.397,1.451104,0.635197,0.248384,0.253934,0.24025
6,1.1597,1.40823,0.630614,0.273224,0.264781,0.252446
7,0.979,1.299767,0.663611,0.332368,0.296517,0.294701
8,0.8503,1.243276,0.68011,0.404579,0.33495,0.337911
9,0.719,1.208756,0.698442,0.438716,0.371216,0.378733
10,0.6048,1.197417,0.697525,0.399627,0.359213,0.368436


[I 2025-03-23 01:22:14,266] Trial 9 finished with value: 0.4075841718228686 and parameters: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.6093633706169189.


Trial 10 with params: {'learning_rate': 0.004518165681587256, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5201,1.82129,0.566453,0.198411,0.198047,0.184979
2,1.3537,1.286499,0.666361,0.341277,0.335087,0.318119
3,0.7569,1.105533,0.742438,0.476189,0.450608,0.447393
4,0.3736,1.086664,0.754354,0.556492,0.53581,0.530784
5,0.1599,1.16237,0.772686,0.654368,0.600407,0.606158
6,0.0611,1.232382,0.774519,0.638706,0.619799,0.613933
7,0.0167,1.374407,0.786434,0.729272,0.672993,0.674588
8,0.0075,1.404443,0.787351,0.728791,0.668146,0.67751
9,0.0028,1.458347,0.784601,0.716618,0.661144,0.668076
10,0.0014,1.488412,0.785518,0.711091,0.659553,0.665467


[I 2025-03-23 01:23:34,001] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7441,2.074675,0.484876,0.124774,0.132292,0.111426
2,1.7034,1.492146,0.636114,0.264968,0.264221,0.253602
3,1.1415,1.267918,0.692942,0.372729,0.352285,0.343549
4,0.6989,1.184391,0.71494,0.509216,0.432071,0.447849
5,0.3876,1.179192,0.736939,0.52221,0.513085,0.505986
6,0.2314,1.256458,0.738772,0.541986,0.491266,0.499312
7,0.1225,1.32397,0.747021,0.595675,0.54153,0.544287
8,0.0586,1.3896,0.756187,0.60849,0.572834,0.573961
9,0.0297,1.513923,0.752521,0.615733,0.594599,0.587115
10,0.0158,1.463669,0.762603,0.629271,0.608741,0.603862


[I 2025-03-23 01:24:49,673] Trial 11 finished with value: 0.5955664872419971 and parameters: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.6093633706169189.


Trial 12 with params: {'learning_rate': 0.0033049565193748773, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6951,1.910719,0.51879,0.121588,0.14851,0.120513
2,1.4991,1.373355,0.661778,0.314171,0.303258,0.297693
3,0.9312,1.119032,0.725023,0.411803,0.376867,0.37505
4,0.4652,1.106371,0.742438,0.59483,0.532005,0.542036
5,0.2022,1.220057,0.76077,0.634731,0.594565,0.60132
6,0.1009,1.234824,0.777269,0.668305,0.631231,0.63811
7,0.0384,1.348499,0.775435,0.62883,0.626894,0.61499
8,0.0117,1.432703,0.776352,0.681517,0.655128,0.656218
9,0.0057,1.454526,0.779102,0.665008,0.67052,0.656589
10,0.0031,1.514759,0.787351,0.668379,0.673099,0.659331


[I 2025-03-23 01:25:44,113] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0018997871267974278, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8677,2.176776,0.446379,0.107365,0.115331,0.098868
2,1.7808,1.522784,0.618698,0.248601,0.240538,0.230516
3,1.1742,1.243648,0.689276,0.364285,0.343141,0.337349
4,0.7289,1.150123,0.704858,0.444606,0.408402,0.411231
5,0.4198,1.088321,0.746104,0.557051,0.50166,0.503064
6,0.2515,1.13226,0.756187,0.558314,0.546891,0.537488
7,0.1195,1.256455,0.753437,0.631215,0.561817,0.579126
8,0.0525,1.33118,0.76077,0.640798,0.587028,0.594102
9,0.0263,1.434608,0.756187,0.623208,0.56993,0.577444
10,0.0163,1.448779,0.759853,0.62779,0.60929,0.60603


[I 2025-03-23 01:27:13,916] Trial 13 finished with value: 0.6078067222206694 and parameters: {'learning_rate': 0.0018997871267974278, 'weight_decay': 0.005, 'warmup_steps': 2}. Best is trial 4 with value: 0.6093633706169189.


Trial 14 with params: {'learning_rate': 0.002120746655142563, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.903,2.202122,0.433547,0.103217,0.110566,0.097134
2,1.7748,1.542752,0.612282,0.273535,0.246284,0.234243
3,1.1432,1.217694,0.695692,0.383453,0.352425,0.346399
4,0.6768,1.138082,0.712191,0.460431,0.417661,0.424079
5,0.3885,1.155979,0.740605,0.543403,0.501776,0.507043
6,0.2217,1.202444,0.76077,0.560738,0.520923,0.52486
7,0.1023,1.352687,0.76077,0.643624,0.568328,0.587294
8,0.0445,1.420037,0.758937,0.62557,0.575667,0.581729
9,0.0246,1.446387,0.761687,0.6067,0.598989,0.593097
10,0.0128,1.430048,0.769936,0.657222,0.624042,0.622393


[I 2025-03-23 01:28:00,985] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.003827341260767903, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6981,1.962881,0.527039,0.166235,0.164134,0.144484
2,1.4986,1.379467,0.661778,0.323604,0.299682,0.294397
3,0.8427,1.111262,0.742438,0.406134,0.420423,0.407814
4,0.42,1.113673,0.729606,0.535745,0.491464,0.499618
5,0.1673,1.142024,0.772686,0.657962,0.606896,0.615822
6,0.0588,1.303704,0.776352,0.69785,0.674288,0.666187
7,0.0258,1.356265,0.772686,0.674432,0.64772,0.646805
8,0.0072,1.356857,0.785518,0.734004,0.666559,0.683294
9,0.0027,1.410357,0.783685,0.702836,0.659095,0.669009
10,0.0018,1.433722,0.780935,0.704575,0.659336,0.66729


[I 2025-03-23 01:29:26,388] Trial 15 finished with value: 0.6756248675615067 and parameters: {'learning_rate': 0.003827341260767903, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 15 with value: 0.6756248675615067.


Trial 16 with params: {'learning_rate': 0.0010018348952328356, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.134,2.50103,0.371219,0.057513,0.078737,0.054415
2,2.141,1.89784,0.532539,0.141685,0.156804,0.133403
3,1.6132,1.486546,0.626031,0.288612,0.237249,0.231135
4,1.1486,1.294579,0.672777,0.371731,0.324783,0.325347


[I 2025-03-23 01:31:22,148] Trial 17 finished with value: 0.6565911707440389 and parameters: {'learning_rate': 0.003147329048348789, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 15 with value: 0.6756248675615067.


Trial 18 with params: {'learning_rate': 0.004371089537104322, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8126,1.956105,0.501375,0.139999,0.142271,0.118094
2,1.5183,1.386899,0.658112,0.304872,0.293029,0.287662
3,0.865,1.097845,0.745188,0.455872,0.441633,0.433922
4,0.4157,1.147899,0.75527,0.60528,0.540809,0.55236
5,0.1682,1.323602,0.749771,0.635053,0.607466,0.597492
6,0.0765,1.295193,0.771769,0.670544,0.658506,0.646431
7,0.0226,1.398121,0.776352,0.704799,0.675591,0.672732
8,0.0075,1.479596,0.776352,0.675451,0.645898,0.642071
9,0.0052,1.483587,0.780935,0.683134,0.665084,0.659494
10,0.0033,1.511001,0.774519,0.670321,0.65075,0.64963


[I 2025-03-23 01:32:39,147] Trial 18 finished with value: 0.6717366051630941 and parameters: {'learning_rate': 0.004371089537104322, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 15 with value: 0.6756248675615067.


Trial 19 with params: {'learning_rate': 0.0029388128961488516, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8403,2.061924,0.483043,0.12899,0.135251,0.114749
2,1.6446,1.447596,0.63703,0.31524,0.268819,0.266517
3,1.0018,1.135792,0.722273,0.400778,0.38757,0.378854
4,0.519,1.131162,0.730522,0.53014,0.475083,0.484507
5,0.2598,1.144506,0.751604,0.609125,0.588444,0.58614
6,0.1231,1.235215,0.764436,0.626731,0.591358,0.585519
7,0.0485,1.300737,0.780018,0.686237,0.657684,0.651504
8,0.0187,1.37236,0.776352,0.697027,0.646968,0.650643
9,0.0106,1.398336,0.779102,0.666394,0.636942,0.637905
10,0.0043,1.446267,0.782768,0.674325,0.628772,0.637382


[I 2025-03-23 01:33:42,217] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0029427035280916295, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7703,2.001158,0.494042,0.122897,0.136089,0.115163
2,1.5955,1.454533,0.64528,0.322841,0.285568,0.281821
3,1.0013,1.159889,0.71769,0.384238,0.384728,0.371324
4,0.5374,1.212744,0.713107,0.526862,0.483381,0.482683
5,0.2676,1.224541,0.748854,0.570289,0.529326,0.535918
6,0.1373,1.270856,0.754354,0.620272,0.594214,0.589943
7,0.0597,1.407501,0.766269,0.650677,0.583665,0.59842
8,0.0264,1.510525,0.75802,0.639695,0.6017,0.608143
9,0.0102,1.548684,0.769019,0.666805,0.609678,0.620697
10,0.0047,1.575394,0.769019,0.679129,0.625454,0.635509


[I 2025-03-23 01:35:08,032] Trial 20 finished with value: 0.6339438810952206 and parameters: {'learning_rate': 0.0029427035280916295, 'weight_decay': 0.01, 'warmup_steps': 1}. Best is trial 15 with value: 0.6756248675615067.


Trial 21 with params: {'learning_rate': 0.004803389671600374, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.936,2.060746,0.491292,0.140563,0.142133,0.120525
2,1.588,1.436522,0.643446,0.289519,0.298411,0.284256
3,0.9372,1.112362,0.722273,0.429369,0.420071,0.413904
4,0.435,1.188905,0.733272,0.628175,0.538255,0.554097
5,0.1797,1.380209,0.754354,0.61077,0.563014,0.559626
6,0.0752,1.331232,0.789184,0.648785,0.655191,0.635041
7,0.0311,1.357482,0.784601,0.681508,0.661035,0.652429
8,0.0097,1.47375,0.780935,0.641922,0.658331,0.634208
9,0.0059,1.5185,0.782768,0.667146,0.66995,0.651851
10,0.002,1.530321,0.784601,0.669208,0.665326,0.653613


[I 2025-03-23 01:36:16,526] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.004562123537732074, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6372,1.931593,0.549038,0.187866,0.191062,0.171514
2,1.4136,1.307736,0.666361,0.321091,0.323221,0.311118
3,0.7584,1.061657,0.746104,0.502659,0.451791,0.458547
4,0.3434,1.149559,0.75802,0.573738,0.547527,0.543755
5,0.1239,1.26954,0.771769,0.674638,0.631566,0.628891
6,0.0414,1.320454,0.783685,0.698556,0.656706,0.655065
7,0.0157,1.342127,0.787351,0.703613,0.671103,0.669274
8,0.0083,1.405443,0.792851,0.697637,0.659443,0.664438
9,0.0027,1.420628,0.7956,0.686732,0.662738,0.659179
10,0.0016,1.450091,0.788268,0.678953,0.650897,0.650684


[I 2025-03-23 01:38:09,586] Trial 22 finished with value: 0.6610818401177522 and parameters: {'learning_rate': 0.004562123537732074, 'weight_decay': 0.004, 'warmup_steps': 4}. Best is trial 15 with value: 0.6756248675615067.


Trial 23 with params: {'learning_rate': 0.0028614046317231518, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8259,2.076087,0.472044,0.135953,0.131754,0.11443
2,1.6356,1.43489,0.63703,0.307616,0.266271,0.258913
3,0.9989,1.162546,0.710357,0.403923,0.391238,0.382604
4,0.5315,1.188882,0.72044,0.484526,0.443854,0.446368
5,0.2506,1.165924,0.758937,0.606234,0.580734,0.578105
6,0.1295,1.256052,0.769019,0.658014,0.595917,0.606145
7,0.0572,1.319365,0.791017,0.684192,0.634054,0.641172
8,0.0201,1.37684,0.775435,0.664133,0.626748,0.62902
9,0.0096,1.455644,0.76352,0.664377,0.634762,0.632292
10,0.0074,1.47193,0.774519,0.658413,0.624043,0.62891


[I 2025-03-23 01:39:03,319] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.004814920167490347, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9386,2.081138,0.490376,0.142438,0.14024,0.119318
2,1.5825,1.424045,0.646196,0.306038,0.300484,0.29039
3,0.9243,1.089708,0.730522,0.457275,0.452348,0.446367
4,0.4452,1.165573,0.728689,0.555671,0.47799,0.492683
5,0.1917,1.349086,0.749771,0.632083,0.602045,0.590347
6,0.0743,1.319082,0.770852,0.643662,0.64982,0.622228
7,0.0319,1.326555,0.787351,0.660705,0.651178,0.638324
8,0.0084,1.425962,0.793767,0.682305,0.65762,0.65267
9,0.0036,1.462553,0.781852,0.676876,0.663797,0.654086
10,0.0024,1.475283,0.793767,0.704083,0.675754,0.672443


[I 2025-03-23 01:40:22,953] Trial 24 finished with value: 0.6734869672192978 and parameters: {'learning_rate': 0.004814920167490347, 'weight_decay': 0.003, 'warmup_steps': 3}. Best is trial 15 with value: 0.6756248675615067.


Trial 25 with params: {'learning_rate': 0.004865736242064892, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.599,1.849709,0.533456,0.180579,0.174611,0.163327
2,1.3889,1.268042,0.681027,0.376093,0.340712,0.334934
3,0.7169,0.987638,0.764436,0.522551,0.486918,0.486819
4,0.2986,1.041291,0.772686,0.602699,0.568262,0.558509
5,0.0941,1.174311,0.790101,0.754333,0.670618,0.686014
6,0.028,1.282744,0.787351,0.723811,0.690649,0.688563
7,0.0138,1.243534,0.799267,0.723376,0.712111,0.696877
8,0.0055,1.280089,0.800183,0.755041,0.704963,0.712447
9,0.0028,1.316965,0.80385,0.754404,0.70613,0.711834
10,0.0012,1.326707,0.804766,0.763772,0.710415,0.719549


[I 2025-03-23 01:41:41,639] Trial 25 finished with value: 0.7213101626581778 and parameters: {'learning_rate': 0.004865736242064892, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 26 with params: {'learning_rate': 0.0033416184346350187, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7719,1.980291,0.494959,0.149309,0.139904,0.121845
2,1.5576,1.451876,0.644363,0.299148,0.281512,0.277301
3,0.9373,1.122128,0.732356,0.414514,0.386734,0.386841
4,0.4906,1.13492,0.729606,0.537231,0.480087,0.490181
5,0.2186,1.180928,0.76077,0.599024,0.548067,0.555695
6,0.1102,1.279609,0.753437,0.666724,0.61437,0.618803
7,0.0439,1.336927,0.791934,0.688515,0.6403,0.644447
8,0.0132,1.429161,0.776352,0.672178,0.631513,0.638805
9,0.0063,1.429268,0.785518,0.677934,0.650299,0.649936
10,0.003,1.469766,0.786434,0.684889,0.651809,0.653164


[I 2025-03-23 01:43:48,749] Trial 26 finished with value: 0.6356749828244298 and parameters: {'learning_rate': 0.0033416184346350187, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 27 with params: {'learning_rate': 0.004953871093788108, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6123,1.878022,0.528873,0.164893,0.169943,0.154163
2,1.3897,1.284062,0.673694,0.367298,0.354368,0.349926
3,0.7201,0.991807,0.756187,0.493386,0.470143,0.464924
4,0.3058,1.048346,0.780935,0.643252,0.59334,0.598772
5,0.1026,1.21217,0.785518,0.698751,0.658149,0.656603
6,0.0336,1.214613,0.789184,0.675289,0.674999,0.656393
7,0.0115,1.214378,0.796517,0.706362,0.682954,0.676826
8,0.0043,1.277633,0.802933,0.721723,0.694102,0.694741
9,0.0021,1.281569,0.802016,0.723009,0.69226,0.69423
10,0.0011,1.284712,0.8011,0.717336,0.679537,0.681633


[I 2025-03-23 01:45:16,850] Trial 27 finished with value: 0.6828390283635766 and parameters: {'learning_rate': 0.004953871093788108, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 28 with params: {'learning_rate': 0.004683316894202572, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6105,1.849745,0.522456,0.185133,0.166448,0.154986
2,1.4056,1.266468,0.686526,0.368121,0.349328,0.34152
3,0.7326,1.017172,0.749771,0.472892,0.436905,0.434371
4,0.3262,1.032541,0.775435,0.580966,0.552984,0.553654
5,0.1015,1.238917,0.776352,0.698585,0.646439,0.645876
6,0.046,1.224474,0.782768,0.643488,0.638364,0.625482
7,0.0185,1.292734,0.785518,0.685965,0.669205,0.658537
8,0.0038,1.297135,0.796517,0.686787,0.653551,0.656394
9,0.002,1.312185,0.7956,0.682534,0.664606,0.660223
10,0.0009,1.33624,0.799267,0.687901,0.663318,0.662367


[I 2025-03-23 01:46:32,428] Trial 28 finished with value: 0.6782644055290712 and parameters: {'learning_rate': 0.004683316894202572, 'weight_decay': 0.005, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 29 with params: {'learning_rate': 0.004355274973374928, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5872,1.874108,0.51879,0.197228,0.177192,0.165591
2,1.3695,1.303984,0.660862,0.335322,0.325314,0.317412
3,0.7672,1.078695,0.750687,0.526601,0.479897,0.477637
4,0.364,1.083585,0.761687,0.614549,0.535368,0.556008
5,0.1358,1.260652,0.766269,0.702591,0.628349,0.637483
6,0.0414,1.32612,0.776352,0.692182,0.641245,0.641853
7,0.0106,1.355332,0.788268,0.716384,0.678175,0.676612
8,0.0037,1.381285,0.784601,0.690997,0.654259,0.657505
9,0.0013,1.396262,0.785518,0.698817,0.663221,0.665966
10,0.0008,1.453804,0.782768,0.70833,0.653831,0.663225


[I 2025-03-23 01:48:06,067] Trial 29 finished with value: 0.6658067828082923 and parameters: {'learning_rate': 0.004355274973374928, 'weight_decay': 0.006, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 30 with params: {'learning_rate': 0.004145084741809081, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.73,1.94582,0.520623,0.162164,0.159237,0.138472
2,1.4809,1.330659,0.667278,0.314021,0.298719,0.292331
3,0.8275,1.102251,0.746104,0.406111,0.413787,0.400689
4,0.3905,1.136841,0.753437,0.581715,0.520256,0.533942
5,0.1662,1.214995,0.770852,0.703857,0.612558,0.630284
6,0.0604,1.30477,0.770852,0.685159,0.639447,0.643056
7,0.026,1.457585,0.758937,0.669379,0.611332,0.621387
8,0.0097,1.50504,0.775435,0.687225,0.643627,0.652172
9,0.0035,1.539057,0.777269,0.691868,0.638895,0.653615
10,0.0017,1.571544,0.774519,0.671562,0.630179,0.639613


[I 2025-03-23 01:49:43,193] Trial 30 finished with value: 0.6479926696500735 and parameters: {'learning_rate': 0.004145084741809081, 'weight_decay': 0.0, 'warmup_steps': 3}. Best is trial 25 with value: 0.7213101626581778.


Trial 31 with params: {'learning_rate': 0.0038001844261600906, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7769,1.960599,0.505958,0.146299,0.144238,0.124669
2,1.5063,1.376375,0.656279,0.297736,0.282982,0.276626
3,0.8698,1.090423,0.729606,0.416618,0.415797,0.406252
4,0.4186,1.20933,0.745188,0.566976,0.515173,0.521505
5,0.1894,1.243021,0.765353,0.661614,0.587352,0.598096
6,0.0805,1.347746,0.76077,0.639777,0.616537,0.609526
7,0.0304,1.447702,0.772686,0.650887,0.639887,0.625148
8,0.0152,1.596207,0.764436,0.653302,0.576096,0.593825
9,0.0095,1.529472,0.786434,0.67819,0.636863,0.637177
10,0.0036,1.523494,0.786434,0.683702,0.641622,0.644417


[I 2025-03-23 01:50:43,609] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00418927378916555, 'weight_decay': 0.007, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7333,1.939941,0.520623,0.156689,0.15845,0.135165
2,1.4739,1.32364,0.659028,0.339577,0.300662,0.297566
3,0.8234,1.110932,0.745188,0.449794,0.429045,0.422921
4,0.3886,1.159059,0.748854,0.588986,0.506134,0.524667
5,0.1546,1.270865,0.773602,0.663712,0.627697,0.620774
6,0.0525,1.366603,0.770852,0.636566,0.613382,0.611829
7,0.021,1.497915,0.764436,0.662702,0.634594,0.636719
8,0.0066,1.494757,0.776352,0.67545,0.657809,0.654485
9,0.0075,1.51085,0.785518,0.681134,0.648135,0.655594
10,0.0025,1.547698,0.781852,0.723644,0.662126,0.677434


[I 2025-03-23 01:52:00,821] Trial 32 finished with value: 0.6596834148380606 and parameters: {'learning_rate': 0.00418927378916555, 'weight_decay': 0.007, 'warmup_steps': 3}. Best is trial 25 with value: 0.7213101626581778.


Trial 33 with params: {'learning_rate': 0.0007553826543667807, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1844,2.652326,0.349221,0.060531,0.073599,0.046815
2,2.2838,2.022257,0.510541,0.111326,0.141403,0.116583
3,1.794,1.650575,0.579285,0.189302,0.182992,0.167425
4,1.3934,1.428132,0.647113,0.306397,0.266639,0.260779
5,1.0888,1.275105,0.670027,0.333643,0.322278,0.31291


[I 2025-03-23 01:52:32,616] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0037505272312465423, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6518,1.942752,0.486709,0.125599,0.134489,0.11848
2,1.5076,1.314947,0.683776,0.347296,0.32749,0.321615
3,0.8728,1.062228,0.736939,0.40892,0.415284,0.401752
4,0.4169,1.077328,0.745188,0.576968,0.514235,0.519071
5,0.1711,1.207553,0.76352,0.66784,0.604157,0.617207
6,0.0674,1.283899,0.766269,0.663441,0.623437,0.619843
7,0.0325,1.373885,0.787351,0.687305,0.669663,0.656403
8,0.0115,1.379911,0.780018,0.711329,0.666459,0.668443
9,0.004,1.415115,0.789184,0.716352,0.673123,0.678509
10,0.004,1.415416,0.789184,0.700456,0.672297,0.668314


[I 2025-03-23 01:54:23,546] Trial 34 finished with value: 0.659835978544302 and parameters: {'learning_rate': 0.0037505272312465423, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 35 with params: {'learning_rate': 0.0042382179189959755, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6065,1.942245,0.520623,0.177602,0.170057,0.153618
2,1.47,1.338692,0.669111,0.31822,0.324966,0.313763
3,0.8538,1.05338,0.745188,0.437028,0.443392,0.434104
4,0.3916,1.136145,0.743355,0.556925,0.524018,0.52085
5,0.152,1.222311,0.775435,0.67424,0.649491,0.648861
6,0.0515,1.304896,0.789184,0.743774,0.706434,0.707347
7,0.0223,1.375645,0.7956,0.70833,0.693573,0.685093
8,0.0074,1.458068,0.791934,0.750929,0.691298,0.697239
9,0.0038,1.480292,0.789184,0.754164,0.691708,0.704613
10,0.0021,1.483945,0.7956,0.761481,0.69267,0.706678


[I 2025-03-23 01:55:43,067] Trial 35 finished with value: 0.7028985334545514 and parameters: {'learning_rate': 0.0042382179189959755, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 36 with params: {'learning_rate': 0.003843897582851462, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6534,1.946696,0.498625,0.147084,0.144338,0.127698
2,1.5109,1.338596,0.67461,0.356749,0.320437,0.31899
3,0.8792,1.075737,0.742438,0.427364,0.423141,0.411715
4,0.4091,1.068504,0.748854,0.587156,0.543715,0.545708
5,0.1702,1.172562,0.773602,0.675505,0.597346,0.611472
6,0.0585,1.22339,0.788268,0.622208,0.614006,0.604879
7,0.0258,1.325891,0.789184,0.71384,0.686413,0.681653
8,0.0102,1.367193,0.791017,0.722199,0.678098,0.680294
9,0.0055,1.392292,0.789184,0.700065,0.654564,0.658212
10,0.0034,1.426013,0.794684,0.717579,0.691579,0.682419


[I 2025-03-23 01:57:05,081] Trial 36 finished with value: 0.6786017042852298 and parameters: {'learning_rate': 0.003843897582851462, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 37 with params: {'learning_rate': 0.004421330145677686, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6079,1.909256,0.529789,0.204249,0.174306,0.158739
2,1.4355,1.355464,0.659945,0.315564,0.322616,0.308141
3,0.8045,1.07661,0.738772,0.455533,0.457794,0.445603
4,0.368,1.108314,0.745188,0.61541,0.554409,0.559943
5,0.145,1.26837,0.76077,0.656947,0.629958,0.617804
6,0.0443,1.298018,0.784601,0.674911,0.664338,0.65352
7,0.0174,1.395426,0.781852,0.689967,0.67235,0.662873
8,0.0062,1.408794,0.796517,0.723178,0.688822,0.688423
9,0.0039,1.433704,0.787351,0.701003,0.667865,0.668424
10,0.0022,1.441528,0.797434,0.730211,0.68397,0.69177


[I 2025-03-23 01:58:46,875] Trial 37 finished with value: 0.6634377475776292 and parameters: {'learning_rate': 0.004421330145677686, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 38 with params: {'learning_rate': 0.001771167012891248, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9089,2.214333,0.439047,0.10156,0.110682,0.095509
2,1.8283,1.624895,0.591201,0.231186,0.218252,0.207017
3,1.2231,1.257951,0.681943,0.336021,0.327289,0.313104
4,0.7675,1.154993,0.704858,0.470756,0.387303,0.397088
5,0.4614,1.086631,0.741522,0.488696,0.468202,0.464727


[I 2025-03-23 01:59:18,328] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8428,3.714679,0.176902,0.003538,0.02,0.006012
2,3.4421,3.19821,0.176902,0.003538,0.02,0.006012
3,3.1579,3.084574,0.176902,0.003538,0.02,0.006012
4,3.0304,2.975934,0.260312,0.030831,0.041949,0.031117
5,2.9453,2.863856,0.334555,0.043052,0.062946,0.044451
6,2.8239,2.783784,0.352887,0.038367,0.069456,0.044752
7,2.7463,2.721063,0.347388,0.036991,0.067497,0.043297
8,2.6951,2.66599,0.369386,0.038212,0.074908,0.048454
9,2.6407,2.624079,0.373052,0.03814,0.075991,0.049152
10,2.5986,2.595084,0.371219,0.038298,0.077522,0.049866


[I 2025-03-23 02:00:15,862] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.004913713007022662, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7482,1.986072,0.505041,0.139815,0.149945,0.121912
2,1.5178,1.384065,0.648029,0.323506,0.305999,0.30422
3,0.8519,1.114633,0.749771,0.456535,0.446431,0.44536
4,0.3651,1.156953,0.764436,0.592538,0.547361,0.550709
5,0.1421,1.274854,0.759853,0.646169,0.592841,0.598246
6,0.052,1.240823,0.786434,0.708298,0.672659,0.673859
7,0.0201,1.36431,0.790101,0.733666,0.673517,0.690133
8,0.0091,1.411035,0.792851,0.705664,0.682681,0.686442
9,0.0027,1.455878,0.790101,0.715223,0.679218,0.688223
10,0.0014,1.470037,0.791934,0.69549,0.66983,0.675083


[I 2025-03-23 02:01:45,846] Trial 40 finished with value: 0.6878713058207612 and parameters: {'learning_rate': 0.004913713007022662, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 41 with params: {'learning_rate': 0.003114875907093967, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7357,2.007958,0.514207,0.134236,0.151755,0.124457
2,1.5915,1.428653,0.63978,0.321372,0.275238,0.274343
3,0.9797,1.147783,0.713107,0.413459,0.388306,0.385547
4,0.5271,1.135581,0.733272,0.554624,0.508237,0.516298
5,0.246,1.192904,0.769936,0.653086,0.593228,0.605222
6,0.1137,1.260386,0.777269,0.618979,0.615822,0.598736
7,0.0522,1.368077,0.771769,0.645595,0.621795,0.618575
8,0.0218,1.470407,0.778185,0.68137,0.627307,0.630913
9,0.0081,1.521767,0.788268,0.667294,0.655864,0.643743
10,0.0033,1.529555,0.785518,0.648275,0.628874,0.623257


[I 2025-03-23 02:02:39,823] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.004894606498664555, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6079,1.860595,0.525206,0.158408,0.163883,0.148663
2,1.3877,1.299842,0.666361,0.356223,0.338979,0.33499
3,0.719,1.00622,0.750687,0.478865,0.465292,0.453978
4,0.3053,1.050087,0.771769,0.613015,0.608015,0.589898
5,0.0992,1.1755,0.782768,0.676844,0.648964,0.645647
6,0.0404,1.244917,0.783685,0.709598,0.681253,0.678951
7,0.0112,1.280062,0.791017,0.707351,0.669912,0.674297
8,0.0034,1.340548,0.791934,0.712172,0.671273,0.675862
9,0.0021,1.403189,0.790101,0.714241,0.677516,0.680066
10,0.0012,1.375676,0.7956,0.722708,0.689856,0.690392


[I 2025-03-23 02:04:06,971] Trial 42 finished with value: 0.694663307427776 and parameters: {'learning_rate': 0.004894606498664555, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 43 with params: {'learning_rate': 0.004729482921657465, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5998,1.827202,0.534372,0.182814,0.184182,0.171515
2,1.3923,1.303505,0.673694,0.331846,0.31928,0.310323
3,0.7359,0.996685,0.753437,0.494754,0.477535,0.471747
4,0.3249,1.038955,0.771769,0.625981,0.575348,0.57498
5,0.1056,1.129902,0.790101,0.686905,0.629667,0.636462
6,0.0388,1.205533,0.794684,0.700408,0.671789,0.671927
7,0.0123,1.215195,0.799267,0.697813,0.694912,0.681294
8,0.0046,1.280458,0.799267,0.720864,0.684276,0.682251
9,0.0018,1.306853,0.800183,0.698161,0.674003,0.669144
10,0.001,1.325592,0.804766,0.703465,0.680873,0.675262


[I 2025-03-23 02:06:06,958] Trial 43 finished with value: 0.6766778687622806 and parameters: {'learning_rate': 0.004729482921657465, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 44 with params: {'learning_rate': 0.004984407009707917, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6064,1.863992,0.538038,0.178319,0.177309,0.164287
2,1.3834,1.270263,0.67461,0.341591,0.331795,0.325421
3,0.7082,0.99098,0.756187,0.493848,0.463555,0.462803
4,0.3018,1.081406,0.769936,0.609271,0.584025,0.573941
5,0.1017,1.199824,0.780935,0.694481,0.659891,0.644257
6,0.046,1.173518,0.804766,0.721055,0.703722,0.695572
7,0.0183,1.209322,0.796517,0.72764,0.701174,0.700893
8,0.0061,1.234619,0.7956,0.708731,0.695324,0.689868
9,0.0032,1.264813,0.804766,0.75095,0.699539,0.708635
10,0.0013,1.286515,0.809349,0.754253,0.707513,0.715632


[I 2025-03-23 02:08:31,890] Trial 44 finished with value: 0.7107978081969203 and parameters: {'learning_rate': 0.004984407009707917, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 45 with params: {'learning_rate': 0.0049947698763480675, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6039,1.857072,0.527039,0.167699,0.172588,0.158724
2,1.3786,1.267068,0.68011,0.380511,0.347758,0.347577
3,0.7036,0.992915,0.753437,0.519179,0.477254,0.474816
4,0.3005,1.061528,0.762603,0.586597,0.569669,0.556918
5,0.1009,1.193467,0.777269,0.689996,0.630925,0.636136
6,0.0403,1.253944,0.787351,0.6908,0.664532,0.659079
7,0.0165,1.247171,0.797434,0.710727,0.697988,0.680859
8,0.0034,1.283796,0.7956,0.714175,0.686461,0.68582
9,0.0019,1.326752,0.7956,0.70115,0.690443,0.679008
10,0.0009,1.339853,0.796517,0.703369,0.685925,0.673261


[I 2025-03-23 02:11:55,361] Trial 45 finished with value: 0.6790959900153802 and parameters: {'learning_rate': 0.0049947698763480675, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 46 with params: {'learning_rate': 0.004500879645988984, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5673,1.892609,0.523373,0.180108,0.179551,0.162531
2,1.4454,1.302483,0.672777,0.340486,0.334851,0.324986
3,0.7993,1.073125,0.741522,0.447106,0.461137,0.438057
4,0.371,1.033177,0.766269,0.586626,0.576476,0.568493
5,0.131,1.162757,0.787351,0.720697,0.681085,0.680777
6,0.0517,1.223812,0.780018,0.673899,0.675645,0.659174
7,0.0167,1.314066,0.796517,0.6837,0.668543,0.661183
8,0.0087,1.392811,0.799267,0.655135,0.642072,0.637126
9,0.0024,1.392883,0.79835,0.67263,0.657182,0.654696
10,0.0015,1.423862,0.802016,0.681937,0.665944,0.664427


[I 2025-03-23 02:12:54,440] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.004383876709152397, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6072,1.914959,0.543538,0.201106,0.18328,0.16811
2,1.4458,1.334852,0.661778,0.31854,0.316048,0.305432
3,0.803,1.087169,0.736022,0.444995,0.443374,0.433935
4,0.3727,1.11275,0.745188,0.556202,0.528021,0.521363
5,0.1423,1.205151,0.764436,0.66959,0.639566,0.638315
6,0.0509,1.210046,0.784601,0.722129,0.671837,0.679119
7,0.0174,1.335843,0.783685,0.694428,0.652474,0.658026
8,0.0084,1.375738,0.790101,0.686047,0.66521,0.660497
9,0.0036,1.446988,0.791934,0.692496,0.665916,0.666486
10,0.0016,1.448051,0.796517,0.694656,0.671066,0.667979


[I 2025-03-23 02:14:16,225] Trial 47 finished with value: 0.6741425573709166 and parameters: {'learning_rate': 0.004383876709152397, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 48 with params: {'learning_rate': 0.00015433736178353414, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6001,3.16886,0.176902,0.003538,0.02,0.006012
2,3.04,2.879914,0.293309,0.041179,0.052243,0.030889
3,2.7611,2.632001,0.379468,0.041612,0.077843,0.050776
4,2.5302,2.445545,0.40055,0.062251,0.084836,0.057569
5,2.3784,2.286758,0.44363,0.077831,0.101143,0.07572
6,2.2104,2.179693,0.462878,0.095979,0.111346,0.086617
7,2.1062,2.094776,0.48121,0.104473,0.121114,0.097052
8,2.0322,2.029707,0.494042,0.112607,0.12816,0.105881
9,1.9553,1.976974,0.510541,0.119913,0.135126,0.110127
10,1.8931,1.950931,0.508708,0.132232,0.140078,0.116568


[I 2025-03-23 02:15:09,516] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0017794082568283853, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8664,2.17998,0.451879,0.107366,0.117602,0.099541
2,1.7975,1.580057,0.60495,0.242409,0.226857,0.21621
3,1.2074,1.257417,0.676444,0.373964,0.338562,0.329048
4,0.7586,1.139868,0.710357,0.452436,0.406279,0.41169
5,0.4478,1.070799,0.748854,0.52554,0.477884,0.483618


[I 2025-03-23 02:15:38,874] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.004623949798145367, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5186,1.867957,0.536205,0.185825,0.192856,0.175859
2,1.3701,1.266418,0.667278,0.366657,0.338298,0.326812
3,0.7463,1.029473,0.749771,0.473419,0.455772,0.45437
4,0.3516,1.076379,0.762603,0.611383,0.571834,0.569539
5,0.1307,1.19692,0.779102,0.693756,0.613122,0.624631
6,0.0512,1.217603,0.780018,0.681162,0.640525,0.640153
7,0.0277,1.214038,0.792851,0.698344,0.698293,0.680242
8,0.0106,1.286606,0.792851,0.684074,0.674653,0.664774
9,0.003,1.324072,0.793767,0.713623,0.691638,0.68881
10,0.0019,1.32886,0.79835,0.68232,0.68529,0.67116


[I 2025-03-23 02:17:28,180] Trial 50 finished with value: 0.6643380885716959 and parameters: {'learning_rate': 0.004623949798145367, 'weight_decay': 0.0, 'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.


Trial 51 with params: {'learning_rate': 0.004881552006829856, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6093,1.861052,0.537122,0.168181,0.174415,0.158684
2,1.3882,1.257098,0.67736,0.360612,0.349872,0.344918
3,0.7197,0.982429,0.76352,0.499572,0.491187,0.477421
4,0.3099,1.030322,0.777269,0.609171,0.581146,0.578491
5,0.0933,1.206084,0.775435,0.661548,0.634513,0.624501
6,0.0325,1.180418,0.7956,0.705124,0.696401,0.688133
7,0.0123,1.238388,0.797434,0.690575,0.674143,0.66624
8,0.0045,1.283256,0.790101,0.703303,0.68713,0.683944
9,0.0018,1.305905,0.7956,0.717748,0.685898,0.686384
10,0.0011,1.314716,0.797434,0.717227,0.689803,0.688413


[I 2025-03-23 02:18:52,176] Trial 51 finished with value: 0.6870946472890591 and parameters: {'learning_rate': 0.004881552006829856, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 52 with params: {'learning_rate': 0.004907379897357111, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6066,1.873615,0.531622,0.164549,0.169578,0.152581
2,1.3937,1.257678,0.68011,0.355518,0.344352,0.338048
3,0.7182,0.992554,0.747021,0.44525,0.443789,0.430023
4,0.3139,1.047043,0.771769,0.615917,0.594577,0.588477
5,0.1101,1.183473,0.783685,0.695073,0.635483,0.640613
6,0.041,1.239169,0.788268,0.706793,0.702132,0.688642
7,0.013,1.236451,0.799267,0.712314,0.704935,0.695167
8,0.0034,1.304808,0.7956,0.706551,0.69059,0.676986
9,0.0019,1.33745,0.799267,0.715346,0.695864,0.690613
10,0.0009,1.348537,0.797434,0.71223,0.694573,0.688228


[I 2025-03-23 02:20:20,622] Trial 52 finished with value: 0.685831644034061 and parameters: {'learning_rate': 0.004907379897357111, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 53 with params: {'learning_rate': 0.00481352109424765, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6015,1.864507,0.545371,0.187939,0.187958,0.174632
2,1.4093,1.249669,0.68286,0.34107,0.335715,0.32788
3,0.7415,1.020654,0.754354,0.483917,0.479287,0.471319
4,0.3212,1.058109,0.771769,0.612497,0.615868,0.591828
5,0.1081,1.210896,0.762603,0.659554,0.653358,0.638373
6,0.0455,1.24055,0.773602,0.670008,0.683125,0.653682
7,0.0161,1.294772,0.785518,0.664221,0.654456,0.645305
8,0.0043,1.336099,0.788268,0.697634,0.692856,0.681421
9,0.0018,1.382237,0.785518,0.704622,0.690275,0.684043
10,0.001,1.398614,0.783685,0.702854,0.682894,0.678185


[I 2025-03-23 02:22:12,128] Trial 53 finished with value: 0.6809269342754584 and parameters: {'learning_rate': 0.00481352109424765, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 54 with params: {'learning_rate': 0.004214577072436996, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6038,1.946366,0.527039,0.211085,0.17857,0.162707
2,1.477,1.322742,0.68011,0.362898,0.332735,0.32897
3,0.8368,1.0682,0.748854,0.450821,0.44023,0.428947
4,0.3976,1.150346,0.732356,0.538879,0.514025,0.507382
5,0.1466,1.289364,0.76352,0.683905,0.620998,0.633351
6,0.0559,1.311751,0.779102,0.678166,0.663826,0.652845
7,0.025,1.397299,0.780018,0.687927,0.672218,0.661027
8,0.0093,1.433891,0.791017,0.710152,0.70005,0.690451
9,0.0034,1.502269,0.791017,0.713377,0.674294,0.679288
10,0.0019,1.511535,0.797434,0.748945,0.693926,0.705178


[I 2025-03-23 02:23:34,010] Trial 54 finished with value: 0.7023461612608972 and parameters: {'learning_rate': 0.004214577072436996, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 55 with params: {'learning_rate': 0.0002606336830980987, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4271,3.042253,0.191567,0.02113,0.025567,0.014634
2,2.816,2.625931,0.374885,0.03875,0.077831,0.049246
3,2.4607,2.307727,0.434464,0.067451,0.098087,0.070452
4,2.1853,2.091954,0.485793,0.102396,0.126441,0.102448
5,1.9989,1.944002,0.519707,0.105314,0.143341,0.116966
6,1.8162,1.845982,0.538038,0.175599,0.159057,0.142402
7,1.6839,1.749111,0.55912,0.168847,0.174263,0.158327
8,1.5863,1.682132,0.573786,0.2234,0.188038,0.173886
9,1.4791,1.622328,0.586618,0.231393,0.194007,0.182068
10,1.3879,1.578954,0.593034,0.21858,0.20403,0.192835


[I 2025-03-23 02:24:29,630] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.004457642247699223, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6138,1.904621,0.506874,0.158212,0.159666,0.146613
2,1.4579,1.358445,0.660862,0.317526,0.310531,0.302404
3,0.8054,1.109825,0.736939,0.443759,0.453736,0.436964
4,0.3739,1.095419,0.752521,0.582834,0.545736,0.547418
5,0.1462,1.246913,0.764436,0.644263,0.60039,0.604501
6,0.0554,1.310476,0.769019,0.668122,0.63116,0.625095
7,0.0242,1.317419,0.786434,0.674215,0.662633,0.653246
8,0.0066,1.353485,0.787351,0.686412,0.671321,0.666453
9,0.0028,1.44207,0.780935,0.676581,0.65286,0.652145
10,0.0017,1.458212,0.778185,0.671166,0.650837,0.646548


[I 2025-03-23 02:25:59,924] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0018647096088531545, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9073,2.210394,0.446379,0.103658,0.114256,0.098974
2,1.8097,1.56152,0.611366,0.241508,0.233371,0.225989
3,1.2027,1.249274,0.690192,0.386461,0.342892,0.332211
4,0.7552,1.163237,0.710357,0.452311,0.394411,0.401039
5,0.4489,1.100758,0.738772,0.514888,0.469206,0.471929
6,0.2694,1.157704,0.758937,0.604753,0.542544,0.556047
7,0.146,1.269333,0.76077,0.6162,0.563793,0.577189
8,0.0792,1.332519,0.761687,0.639213,0.585217,0.595223
9,0.0398,1.427207,0.76077,0.640085,0.598811,0.605716
10,0.0221,1.410894,0.770852,0.639718,0.624268,0.622348


[I 2025-03-23 02:26:54,808] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.00021771047684957567, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5184,3.107121,0.209899,0.008169,0.028571,0.012391
2,2.9325,2.750684,0.366636,0.040975,0.072032,0.044118
3,2.6131,2.456717,0.405133,0.044035,0.085463,0.05645
4,2.3402,2.236025,0.448213,0.074737,0.104044,0.078356
5,2.1487,2.06811,0.498625,0.095613,0.128368,0.103491
6,1.9659,1.963911,0.505041,0.128426,0.137657,0.114421
7,1.8442,1.866287,0.526123,0.128567,0.15009,0.129391
8,1.7648,1.80319,0.547204,0.145444,0.166166,0.145647
9,1.6636,1.750362,0.565536,0.182957,0.177151,0.159174
10,1.5827,1.706449,0.55637,0.150345,0.174097,0.155872


[I 2025-03-23 02:28:00,372] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0017831790121617555, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8042,2.123026,0.479377,0.133046,0.129399,0.109514
2,1.7518,1.556587,0.614115,0.267417,0.239795,0.231373
3,1.214,1.293476,0.678277,0.346486,0.327505,0.316388
4,0.7773,1.204986,0.692942,0.421439,0.397475,0.394439
5,0.4645,1.144842,0.728689,0.493814,0.468103,0.46838


[I 2025-03-23 02:28:24,240] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0047545536863348595, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7286,1.970443,0.503208,0.155338,0.147842,0.120925
2,1.4931,1.371742,0.660862,0.340893,0.325426,0.321535
3,0.8394,1.122166,0.76077,0.467484,0.463839,0.459781
4,0.3743,1.103695,0.746104,0.606285,0.523035,0.537451
5,0.1439,1.227916,0.76352,0.629169,0.60716,0.601005
6,0.0548,1.267056,0.774519,0.646858,0.645385,0.630951
7,0.0211,1.314227,0.793767,0.683994,0.670318,0.66441
8,0.0064,1.362227,0.799267,0.67865,0.666094,0.660906
9,0.0019,1.380233,0.799267,0.702821,0.670216,0.672207
10,0.0012,1.400349,0.805683,0.69919,0.668229,0.67244


[I 2025-03-23 02:30:18,377] Trial 60 finished with value: 0.6743965350861519 and parameters: {'learning_rate': 0.0047545536863348595, 'weight_decay': 0.003, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 61 with params: {'learning_rate': 0.0023745898790282086, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8528,2.112212,0.469294,0.113442,0.123564,0.109008
2,1.7039,1.477304,0.641613,0.302367,0.271317,0.266383
3,1.0706,1.24483,0.708524,0.385523,0.369238,0.350021
4,0.6219,1.165429,0.722273,0.476783,0.426146,0.439226
5,0.3483,1.121987,0.76352,0.592094,0.517569,0.528644


[I 2025-03-23 02:30:44,692] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.00460788929886214, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.719,1.997979,0.495875,0.128243,0.144492,0.114694
2,1.5213,1.368467,0.662695,0.32756,0.315053,0.309345
3,0.842,1.096335,0.743355,0.453895,0.443266,0.441617
4,0.3729,1.158988,0.746104,0.577879,0.520095,0.529241
5,0.1496,1.204701,0.759853,0.624365,0.627008,0.60553
6,0.0554,1.268599,0.786434,0.675761,0.654151,0.64837
7,0.0262,1.340502,0.779102,0.702312,0.667701,0.668927
8,0.0091,1.38503,0.797434,0.714873,0.686744,0.690319
9,0.0038,1.416582,0.791017,0.714788,0.683018,0.685447
10,0.0019,1.438439,0.793767,0.718279,0.681798,0.686178


[I 2025-03-23 02:32:11,037] Trial 62 finished with value: 0.6840149402640165 and parameters: {'learning_rate': 0.00460788929886214, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 63 with params: {'learning_rate': 0.0023355115210666284, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8393,2.110776,0.468378,0.117161,0.127459,0.109826
2,1.701,1.493309,0.622365,0.298613,0.249224,0.249741
3,1.0862,1.202816,0.703025,0.383148,0.373488,0.363844
4,0.6262,1.164374,0.72044,0.491121,0.459732,0.46233
5,0.333,1.109324,0.765353,0.590064,0.529131,0.535869


[I 2025-03-23 02:32:43,724] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00011912397327149118, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6903,3.232957,0.176902,0.003538,0.02,0.006012
2,3.1233,2.991135,0.208066,0.038205,0.028521,0.017357
3,2.9023,2.77807,0.363886,0.041336,0.071465,0.046452
4,2.6902,2.611262,0.378552,0.039083,0.077458,0.050794
5,2.5599,2.46523,0.4033,0.062856,0.085861,0.057132
6,2.3955,2.351495,0.419798,0.067408,0.091828,0.063227
7,2.2927,2.275528,0.44363,0.10689,0.102518,0.077186
8,2.2219,2.205717,0.456462,0.10269,0.107666,0.083079
9,2.1521,2.150584,0.467461,0.099434,0.111672,0.088273
10,2.0982,2.122705,0.479377,0.092341,0.120623,0.095102


[I 2025-03-23 02:33:42,346] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0049945587080890644, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6099,1.866994,0.525206,0.173341,0.168945,0.157145
2,1.3807,1.290663,0.673694,0.340829,0.335073,0.326232
3,0.6994,0.976703,0.750687,0.509482,0.470983,0.46675
4,0.2949,1.079262,0.772686,0.605682,0.556346,0.560169
5,0.1003,1.289724,0.768103,0.722838,0.639614,0.651817
6,0.0379,1.203441,0.790101,0.731064,0.689696,0.694346
7,0.0148,1.213403,0.7956,0.708586,0.673177,0.671431
8,0.0076,1.302941,0.794684,0.708483,0.677755,0.674055
9,0.0024,1.309303,0.797434,0.70353,0.681076,0.674126
10,0.0011,1.330266,0.792851,0.700247,0.674406,0.667828


[I 2025-03-23 02:34:38,765] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004853382366338703, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7385,1.988575,0.498625,0.137036,0.146399,0.120384
2,1.4965,1.355894,0.663611,0.338558,0.321501,0.320418
3,0.8326,1.092692,0.747021,0.478204,0.442052,0.446346
4,0.365,1.100623,0.76077,0.566937,0.534328,0.536189
5,0.1403,1.19911,0.773602,0.656837,0.60399,0.613576
6,0.0551,1.297904,0.781852,0.679367,0.648029,0.646276
7,0.0278,1.306336,0.8011,0.736813,0.682331,0.692566
8,0.0093,1.38405,0.788268,0.7215,0.680015,0.684562
9,0.0038,1.410256,0.791934,0.7243,0.68142,0.687959
10,0.0021,1.447539,0.791934,0.715,0.683183,0.683731


[I 2025-03-23 02:36:36,268] Trial 66 finished with value: 0.6672856198333256 and parameters: {'learning_rate': 0.004853382366338703, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 67 with params: {'learning_rate': 0.004524867403337216, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.549,1.880043,0.538955,0.218888,0.196125,0.184052
2,1.3767,1.328584,0.664528,0.34107,0.326975,0.32121
3,0.755,1.037213,0.761687,0.496003,0.482871,0.475249
4,0.3335,1.040103,0.775435,0.640713,0.614198,0.608604
5,0.1237,1.199106,0.770852,0.715921,0.654169,0.653945
6,0.0488,1.26579,0.775435,0.695741,0.663395,0.660167
7,0.0159,1.300137,0.8011,0.716916,0.695154,0.689415
8,0.0048,1.372962,0.802016,0.718506,0.696826,0.688785
9,0.0023,1.356749,0.80385,0.718138,0.691455,0.689426
10,0.001,1.380285,0.80385,0.716147,0.697928,0.692513


[I 2025-03-23 02:38:06,613] Trial 67 finished with value: 0.6973129429691035 and parameters: {'learning_rate': 0.004524867403337216, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 68 with params: {'learning_rate': 0.0002204153721176256, 'weight_decay': 0.006, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5251,3.108799,0.210816,0.008227,0.02881,0.012499
2,2.9331,2.748363,0.362053,0.043802,0.070317,0.042043
3,2.6107,2.454324,0.406049,0.045008,0.086073,0.057217
4,2.3364,2.233972,0.447296,0.077424,0.103907,0.07909
5,2.142,2.069122,0.493126,0.09739,0.126011,0.101999


[I 2025-03-23 02:38:33,251] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.001813269953669265, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8642,2.173804,0.456462,0.110841,0.119247,0.101398
2,1.7868,1.52812,0.615032,0.260905,0.230274,0.223181
3,1.1846,1.243748,0.684693,0.376758,0.33745,0.334278
4,0.7299,1.139896,0.705775,0.481902,0.424154,0.433916
5,0.4255,1.092291,0.754354,0.539565,0.501347,0.500527
6,0.2566,1.16809,0.757104,0.624947,0.565808,0.574023
7,0.1251,1.248589,0.75527,0.616246,0.567733,0.570882
8,0.0596,1.319818,0.762603,0.660219,0.601439,0.617487
9,0.0315,1.35604,0.764436,0.642903,0.58841,0.599727
10,0.0237,1.443258,0.75527,0.6427,0.6122,0.613219


[I 2025-03-23 02:39:34,959] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.004841859090454132, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6019,1.857653,0.522456,0.169551,0.165673,0.152982
2,1.3932,1.285911,0.675527,0.371109,0.333901,0.328308
3,0.726,0.996184,0.752521,0.47275,0.458842,0.448226
4,0.315,1.070398,0.767186,0.567283,0.560154,0.54941
5,0.1068,1.136894,0.789184,0.708184,0.664353,0.667774
6,0.0334,1.248065,0.778185,0.684641,0.688848,0.674648
7,0.0095,1.257386,0.797434,0.705829,0.681562,0.679226
8,0.0032,1.32549,0.79835,0.69917,0.675393,0.668409
9,0.0016,1.339273,0.797434,0.67812,0.663453,0.65221
10,0.0014,1.349291,0.802933,0.700973,0.683978,0.673


[I 2025-03-23 02:41:17,443] Trial 70 finished with value: 0.6798114022341065 and parameters: {'learning_rate': 0.004841859090454132, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 71 with params: {'learning_rate': 0.004980829809661793, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9772,2.121351,0.462878,0.143586,0.130334,0.114618
2,1.5853,1.45312,0.646196,0.303803,0.292535,0.286493
3,0.9274,1.119262,0.736022,0.455999,0.436794,0.428886
4,0.4442,1.147437,0.736939,0.539038,0.517995,0.508769
5,0.1945,1.290035,0.756187,0.624089,0.591992,0.592378


[I 2025-03-23 02:41:41,252] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00010295616529943657, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7104,3.289429,0.176902,0.003538,0.02,0.006012
2,3.1604,3.053917,0.176902,0.003538,0.02,0.006012
3,2.9612,2.84073,0.361137,0.041833,0.070428,0.045279
4,2.7646,2.693192,0.363886,0.036998,0.07309,0.047266
5,2.6553,2.567701,0.390467,0.040339,0.081857,0.053461
6,2.5049,2.459469,0.392301,0.03938,0.082281,0.052663
7,2.408,2.385456,0.405133,0.066488,0.08685,0.060316
8,2.343,2.323535,0.421632,0.062245,0.092733,0.065344
9,2.2798,2.273882,0.442713,0.084152,0.101443,0.075987
10,2.2259,2.241269,0.450046,0.115968,0.104896,0.0792


[I 2025-03-23 02:42:50,918] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0004201995563692489, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3988,2.924447,0.306141,0.035862,0.058804,0.037647
2,2.6324,2.362136,0.428964,0.055896,0.095548,0.065647
3,2.1548,2.015408,0.508708,0.141935,0.137042,0.110735
4,1.8557,1.773242,0.548121,0.145455,0.162876,0.140164
5,1.6269,1.629467,0.584785,0.200283,0.197367,0.178658


[I 2025-03-23 02:43:14,521] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0019284295911141784, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8682,2.171213,0.446379,0.106944,0.11502,0.097865
2,1.7759,1.501182,0.626948,0.272084,0.245145,0.237652
3,1.163,1.232244,0.694775,0.370639,0.347125,0.340718
4,0.7139,1.166399,0.711274,0.511934,0.444604,0.458029
5,0.4157,1.118786,0.744271,0.557975,0.504043,0.508926
6,0.2503,1.146872,0.764436,0.603831,0.556929,0.557164
7,0.1202,1.272423,0.745188,0.619767,0.548639,0.565694
8,0.0556,1.310319,0.771769,0.646211,0.612327,0.616318
9,0.0231,1.397819,0.769936,0.651491,0.596944,0.607979
10,0.0158,1.362458,0.771769,0.625596,0.617878,0.612832


[I 2025-03-23 02:44:02,359] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.004920670165554563, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7476,1.984262,0.499542,0.130076,0.14419,0.114642
2,1.5124,1.378944,0.654445,0.336298,0.307658,0.308304
3,0.8517,1.100782,0.742438,0.441847,0.435405,0.430443
4,0.38,1.093695,0.76077,0.618735,0.543391,0.560551
5,0.1507,1.255961,0.762603,0.655092,0.599908,0.613247
6,0.0493,1.257191,0.789184,0.702966,0.662965,0.661381
7,0.0189,1.376587,0.786434,0.667302,0.631056,0.637103
8,0.0066,1.379778,0.796517,0.731546,0.679858,0.690153
9,0.0024,1.407827,0.79835,0.736463,0.673239,0.690353
10,0.0012,1.424986,0.794684,0.734604,0.672027,0.688582


[I 2025-03-23 02:45:28,147] Trial 75 finished with value: 0.6732383261678107 and parameters: {'learning_rate': 0.004920670165554563, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 76 with params: {'learning_rate': 0.0022348035008633725, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9054,2.186831,0.439963,0.119121,0.113182,0.100969
2,1.7505,1.5163,0.620532,0.260018,0.251725,0.24212
3,1.0938,1.186476,0.706691,0.395208,0.377284,0.368229
4,0.6285,1.156708,0.704858,0.497121,0.425629,0.438183
5,0.3628,1.168952,0.738772,0.543883,0.490771,0.489701
6,0.1992,1.244321,0.75527,0.559771,0.538341,0.536156
7,0.0958,1.275914,0.765353,0.650852,0.588972,0.606958
8,0.0363,1.388545,0.76352,0.682814,0.593857,0.61202
9,0.0237,1.371136,0.773602,0.634252,0.585707,0.59802
10,0.0118,1.44394,0.773602,0.693402,0.622311,0.638812


[I 2025-03-23 02:46:20,425] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.004717407077747297, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6098,1.852235,0.519707,0.173758,0.169673,0.157045
2,1.39,1.291315,0.681943,0.344927,0.337724,0.32822
3,0.7348,1.0196,0.761687,0.494007,0.474333,0.466311
4,0.3184,1.04434,0.767186,0.589818,0.571082,0.563964
5,0.0947,1.178852,0.777269,0.675033,0.665427,0.646962
6,0.0401,1.199538,0.797434,0.696226,0.686099,0.67302
7,0.0137,1.257732,0.790101,0.696434,0.686308,0.672743
8,0.0032,1.304862,0.7956,0.706823,0.701541,0.686391
9,0.0015,1.337325,0.792851,0.698286,0.684029,0.671355
10,0.001,1.372423,0.796517,0.702163,0.693346,0.677083


[I 2025-03-23 02:47:43,677] Trial 77 finished with value: 0.6763656869666181 and parameters: {'learning_rate': 0.004717407077747297, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 78 with params: {'learning_rate': 0.0045035146863825915, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5564,1.844328,0.550871,0.197334,0.195581,0.183266
2,1.3714,1.302733,0.672777,0.35246,0.335225,0.329308
3,0.7523,1.079797,0.748854,0.475686,0.44916,0.447089
4,0.3455,1.061422,0.764436,0.610123,0.593937,0.585859
5,0.1229,1.192319,0.768103,0.676235,0.657331,0.642377
6,0.0384,1.310364,0.782768,0.660106,0.669105,0.640443
7,0.0221,1.337662,0.780018,0.687362,0.679145,0.664707
8,0.0067,1.389929,0.786434,0.66871,0.663388,0.647674
9,0.0023,1.388449,0.799267,0.683026,0.670526,0.662684
10,0.0012,1.412202,0.8011,0.675239,0.663507,0.656927


[I 2025-03-23 02:48:39,154] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.001463573748350829, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9442,2.252505,0.43813,0.123068,0.110201,0.098097
2,1.8892,1.65559,0.583868,0.227598,0.199193,0.186484
3,1.3301,1.290492,0.678277,0.324943,0.314075,0.304909
4,0.87,1.183804,0.694775,0.396277,0.375553,0.370346
5,0.5623,1.120829,0.734189,0.479918,0.449963,0.451704
6,0.3569,1.174166,0.734189,0.563865,0.492169,0.498872
7,0.2108,1.24405,0.740605,0.587757,0.515475,0.532626
8,0.1198,1.295278,0.738772,0.636774,0.567575,0.579693
9,0.081,1.319462,0.756187,0.620255,0.569954,0.578529
10,0.0426,1.385539,0.749771,0.619398,0.575926,0.58169


[I 2025-03-23 02:50:12,074] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.004039888371460781, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7568,1.939409,0.502291,0.169274,0.145493,0.124235
2,1.4532,1.357416,0.659945,0.31897,0.310444,0.302759
3,0.8155,1.14643,0.72044,0.407761,0.411483,0.399231
4,0.3951,1.112237,0.748854,0.596626,0.534819,0.543713
5,0.1488,1.33358,0.76352,0.70631,0.636146,0.64316
6,0.0654,1.315778,0.781852,0.685899,0.656464,0.65681
7,0.0247,1.435869,0.771769,0.709043,0.64273,0.656844
8,0.01,1.497995,0.785518,0.721117,0.655971,0.672761
9,0.003,1.497283,0.788268,0.698027,0.660291,0.668479
10,0.0014,1.518649,0.791017,0.706879,0.662125,0.67237


[I 2025-03-23 02:52:04,059] Trial 80 finished with value: 0.665788346552795 and parameters: {'learning_rate': 0.004039888371460781, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 81 with params: {'learning_rate': 0.004622400542875729, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8789,2.036081,0.491292,0.123053,0.137133,0.113811
2,1.559,1.43387,0.648029,0.305535,0.292244,0.284698
3,0.9123,1.091472,0.741522,0.464454,0.418999,0.422729
4,0.4383,1.162717,0.740605,0.579866,0.532713,0.53691
5,0.1835,1.22549,0.767186,0.66186,0.620228,0.615726
6,0.063,1.296907,0.782768,0.671402,0.658346,0.649065
7,0.02,1.405185,0.781852,0.668858,0.677823,0.652393
8,0.0102,1.497809,0.793767,0.672069,0.634003,0.638825
9,0.003,1.49439,0.800183,0.703111,0.676659,0.67825
10,0.0021,1.535754,0.7956,0.681712,0.659393,0.659284


[I 2025-03-23 02:53:24,592] Trial 81 finished with value: 0.6594280177441721 and parameters: {'learning_rate': 0.004622400542875729, 'weight_decay': 0.001, 'warmup_steps': 3}. Best is trial 25 with value: 0.7213101626581778.


Trial 82 with params: {'learning_rate': 0.004060072359830231, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7551,1.939791,0.508708,0.164973,0.148377,0.129241
2,1.4447,1.335397,0.668194,0.316524,0.321289,0.304638
3,0.8049,1.158021,0.725023,0.415231,0.422766,0.410737
4,0.3901,1.12303,0.751604,0.616545,0.560664,0.562395
5,0.144,1.449858,0.745188,0.63681,0.586907,0.580181


[I 2025-03-23 02:53:52,755] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0047056945931189045, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6123,1.864723,0.514207,0.162419,0.164686,0.154115
2,1.3996,1.313996,0.67736,0.339365,0.328989,0.318796
3,0.7256,1.014715,0.752521,0.477055,0.452756,0.449513
4,0.3177,1.097291,0.754354,0.585,0.54381,0.542431
5,0.1185,1.199813,0.771769,0.667596,0.664187,0.640859
6,0.0474,1.228514,0.786434,0.69789,0.676065,0.668578
7,0.0177,1.264101,0.79835,0.719295,0.707939,0.69193
8,0.0059,1.371672,0.793767,0.717179,0.704186,0.687532
9,0.003,1.318546,0.811182,0.709528,0.709497,0.69303
10,0.0016,1.337618,0.805683,0.708157,0.70675,0.6893


[I 2025-03-23 02:55:23,008] Trial 83 finished with value: 0.6873707280956435 and parameters: {'learning_rate': 0.0047056945931189045, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 84 with params: {'learning_rate': 0.0044648854682187555, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5731,1.885307,0.526123,0.208872,0.184297,0.172969
2,1.3949,1.307626,0.671861,0.345733,0.326624,0.322285
3,0.7738,1.079352,0.757104,0.515684,0.494429,0.484024
4,0.347,1.044651,0.769936,0.613858,0.566999,0.570552
5,0.126,1.179668,0.775435,0.691165,0.63122,0.630937
6,0.0422,1.25015,0.784601,0.694717,0.680963,0.672285
7,0.0129,1.304984,0.791017,0.687371,0.693184,0.67126
8,0.0034,1.345018,0.7956,0.707278,0.692269,0.684603
9,0.0016,1.383323,0.791934,0.702375,0.694377,0.680577
10,0.0015,1.402315,0.793767,0.700801,0.686067,0.677512


[I 2025-03-23 02:56:52,400] Trial 84 finished with value: 0.6740486656901745 and parameters: {'learning_rate': 0.0044648854682187555, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 85 with params: {'learning_rate': 0.00019155912937031863, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5447,3.1286,0.181485,0.013147,0.021098,0.008131
2,2.9773,2.795344,0.351054,0.029685,0.06645,0.040145
3,2.6711,2.530945,0.393217,0.042017,0.082949,0.054016
4,2.4176,2.316707,0.424381,0.064283,0.093731,0.066767
5,2.2366,2.155044,0.469294,0.090041,0.112747,0.087917
6,2.0594,2.042758,0.499542,0.120007,0.130706,0.105497
7,1.9449,1.951049,0.514207,0.14215,0.141915,0.119669
8,1.8682,1.888618,0.52429,0.117785,0.148598,0.125604
9,1.7764,1.833952,0.544455,0.165388,0.159133,0.139397
10,1.7018,1.790419,0.543538,0.14592,0.164583,0.1442


[I 2025-03-23 02:57:52,675] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0026081168603375926, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7526,2.061607,0.486709,0.127816,0.136427,0.116673
2,1.6524,1.47374,0.638863,0.312427,0.275412,0.269338
3,1.0525,1.227737,0.707608,0.379053,0.378853,0.36255
4,0.5864,1.235451,0.712191,0.51753,0.461609,0.468676
5,0.3048,1.154261,0.758937,0.612119,0.545065,0.556358


[I 2025-03-23 02:58:52,609] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0018929104410607966, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.904,2.215165,0.439963,0.104319,0.112524,0.097353
2,1.8027,1.569387,0.606783,0.234739,0.227833,0.218879
3,1.192,1.260513,0.692942,0.357784,0.342331,0.329991
4,0.7399,1.171358,0.707608,0.441465,0.401745,0.402023
5,0.4327,1.089131,0.749771,0.52182,0.469407,0.477889
6,0.2592,1.215482,0.736022,0.521396,0.498205,0.491141
7,0.1327,1.253886,0.759853,0.614374,0.573968,0.583544
8,0.0635,1.406408,0.747938,0.580492,0.568743,0.565294
9,0.0354,1.420068,0.754354,0.635183,0.574187,0.58945
10,0.0159,1.408235,0.752521,0.646184,0.597006,0.608785


[I 2025-03-23 02:59:45,145] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0005004376561176635, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3148,2.870702,0.287809,0.040643,0.058003,0.035718
2,2.5383,2.259258,0.447296,0.089544,0.104571,0.078129
3,2.0597,1.906886,0.52429,0.156811,0.145255,0.12311
4,1.7312,1.678609,0.571952,0.186003,0.179376,0.163898
5,1.4737,1.502692,0.621448,0.251461,0.233831,0.22546
6,1.227,1.431027,0.631531,0.288342,0.261432,0.254845
7,1.0359,1.30653,0.664528,0.333109,0.288795,0.289239
8,0.9063,1.259595,0.67736,0.365683,0.32207,0.325045
9,0.7842,1.214566,0.697525,0.405061,0.365813,0.369983
10,0.6635,1.220434,0.688359,0.387159,0.350019,0.35503


[I 2025-03-23 03:00:43,481] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.004853598281197902, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7383,1.987831,0.500458,0.15647,0.147666,0.122554
2,1.5007,1.362801,0.658112,0.343794,0.320832,0.320849
3,0.84,1.099293,0.753437,0.462593,0.445709,0.447418
4,0.3613,1.102346,0.767186,0.598971,0.549444,0.55726
5,0.1391,1.232906,0.770852,0.645917,0.602358,0.61079
6,0.0488,1.261309,0.790101,0.678042,0.660716,0.6549
7,0.0197,1.328881,0.802016,0.717938,0.681695,0.689948
8,0.0054,1.421105,0.797434,0.759767,0.68881,0.709035
9,0.0033,1.412692,0.796517,0.714113,0.671839,0.678332
10,0.0017,1.427398,0.802933,0.749185,0.698462,0.71053


[I 2025-03-23 03:02:27,359] Trial 89 finished with value: 0.6900363100484875 and parameters: {'learning_rate': 0.004853598281197902, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 90 with params: {'learning_rate': 0.004803675985652374, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5321,1.913612,0.526123,0.19765,0.175573,0.159594
2,1.3766,1.245868,0.679193,0.350043,0.33749,0.325065
3,0.7346,0.999723,0.748854,0.457614,0.45759,0.444945
4,0.3456,1.094483,0.771769,0.61475,0.576512,0.577758
5,0.1539,1.218783,0.773602,0.687301,0.623491,0.630656
6,0.0539,1.299622,0.767186,0.659307,0.643289,0.627068
7,0.0182,1.285324,0.781852,0.64139,0.643935,0.626003
8,0.005,1.363949,0.791934,0.687115,0.653248,0.652771
9,0.002,1.372457,0.797434,0.694179,0.666591,0.664442
10,0.001,1.390328,0.800183,0.706125,0.688368,0.675381


[I 2025-03-23 03:03:50,200] Trial 90 finished with value: 0.6764646887318382 and parameters: {'learning_rate': 0.004803675985652374, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.


Trial 91 with params: {'learning_rate': 0.004579398486854087, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7169,2.00335,0.495875,0.135847,0.145073,0.115745
2,1.524,1.383311,0.656279,0.324272,0.308813,0.303585
3,0.8425,1.069365,0.743355,0.447888,0.448523,0.436659
4,0.3722,1.143167,0.754354,0.599352,0.545677,0.553518
5,0.1493,1.241435,0.765353,0.649062,0.617517,0.607789
6,0.0603,1.259708,0.783685,0.655415,0.628086,0.619077
7,0.0235,1.31685,0.793767,0.68164,0.668709,0.667557
8,0.0087,1.339175,0.793767,0.682338,0.664259,0.65853
9,0.0034,1.381967,0.791934,0.679187,0.650968,0.650697
10,0.0016,1.39878,0.79835,0.682339,0.664548,0.659588


[I 2025-03-23 03:05:16,037] Trial 91 finished with value: 0.6581612704397952 and parameters: {'learning_rate': 0.004579398486854087, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 92 with params: {'learning_rate': 0.004530891476610411, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6147,1.93744,0.517874,0.172212,0.165066,0.154751
2,1.4641,1.366851,0.661778,0.345981,0.30759,0.302717
3,0.814,1.088016,0.732356,0.433563,0.4396,0.423711
4,0.3662,1.099209,0.762603,0.616875,0.555852,0.562271
5,0.1344,1.239331,0.768103,0.67601,0.629409,0.626169
6,0.0534,1.290886,0.791934,0.694474,0.677656,0.667661
7,0.0183,1.368973,0.779102,0.669695,0.645676,0.640896
8,0.0054,1.421626,0.787351,0.701125,0.673265,0.672947
9,0.002,1.449575,0.785518,0.692131,0.662711,0.662822
10,0.0011,1.467165,0.785518,0.676436,0.65909,0.655201


[I 2025-03-23 03:06:13,364] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.00398205129911087, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7109,1.986792,0.494959,0.17242,0.152777,0.135139
2,1.4587,1.343805,0.648029,0.304279,0.285403,0.283309
3,0.8073,1.102572,0.731439,0.435658,0.408061,0.405246
4,0.3775,1.158704,0.740605,0.586372,0.518311,0.531931
5,0.1581,1.287936,0.765353,0.710665,0.612915,0.633623
6,0.0624,1.411349,0.750687,0.618365,0.604445,0.585665
7,0.0327,1.417233,0.780018,0.668532,0.631929,0.633313
8,0.0147,1.464305,0.780935,0.696737,0.644692,0.647833
9,0.004,1.491859,0.780935,0.699895,0.661981,0.665169
10,0.0021,1.5198,0.789184,0.70542,0.671346,0.672241


[I 2025-03-23 03:07:34,976] Trial 93 finished with value: 0.6452901914000516 and parameters: {'learning_rate': 0.00398205129911087, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 94 with params: {'learning_rate': 0.004152998977531388, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5999,1.938733,0.528873,0.177594,0.175596,0.15724
2,1.4735,1.323288,0.670944,0.322254,0.315292,0.310934
3,0.8434,1.066011,0.746104,0.43003,0.43776,0.424643
4,0.3881,1.110134,0.745188,0.556611,0.511071,0.51492
5,0.1413,1.233458,0.778185,0.652852,0.645782,0.636186
6,0.0519,1.294385,0.787351,0.692435,0.665792,0.665294
7,0.0231,1.396367,0.7956,0.771019,0.697905,0.709941
8,0.0091,1.470193,0.789184,0.714608,0.676604,0.677028
9,0.0041,1.472601,0.784601,0.745554,0.69934,0.699059
10,0.0038,1.462325,0.789184,0.742865,0.701123,0.699638


[I 2025-03-23 03:09:12,538] Trial 94 finished with value: 0.7182828169393953 and parameters: {'learning_rate': 0.004152998977531388, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 95 with params: {'learning_rate': 0.002537959209551782, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7462,2.05447,0.484876,0.127668,0.135875,0.117464
2,1.6419,1.447049,0.63428,0.333209,0.27302,0.273096
3,1.044,1.229323,0.701192,0.381553,0.369753,0.356217
4,0.5763,1.243596,0.706691,0.471441,0.420482,0.428549
5,0.3058,1.169995,0.757104,0.584432,0.539393,0.546909
6,0.1634,1.243849,0.771769,0.617482,0.576243,0.57742
7,0.0716,1.315742,0.769936,0.652529,0.598692,0.605253
8,0.0317,1.419812,0.769019,0.674191,0.623992,0.627642
9,0.0155,1.453619,0.774519,0.697482,0.633892,0.643679
10,0.0093,1.434226,0.780018,0.682586,0.640936,0.643681


[I 2025-03-23 03:11:09,216] Trial 95 finished with value: 0.6451138586498673 and parameters: {'learning_rate': 0.002537959209551782, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 96 with params: {'learning_rate': 0.004390211478337816, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6236,1.952348,0.527956,0.196981,0.179471,0.166914
2,1.4847,1.333119,0.667278,0.330769,0.32078,0.313573
3,0.8402,1.05957,0.742438,0.461876,0.448179,0.441048
4,0.3808,1.168925,0.738772,0.556885,0.53315,0.521644
5,0.1561,1.18154,0.76352,0.660942,0.636655,0.629836
6,0.0519,1.27927,0.777269,0.696916,0.671695,0.66593
7,0.0189,1.389714,0.772686,0.706334,0.663623,0.668151
8,0.0072,1.438823,0.781852,0.74627,0.681346,0.693711
9,0.0036,1.474704,0.780935,0.737878,0.696402,0.699106
10,0.0019,1.486396,0.786434,0.74045,0.700823,0.701467


[I 2025-03-23 03:12:51,628] Trial 96 finished with value: 0.7019073274538614 and parameters: {'learning_rate': 0.004390211478337816, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 97 with params: {'learning_rate': 0.004381110504419244, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6209,1.945159,0.530706,0.179579,0.178284,0.162656
2,1.4826,1.330782,0.675527,0.341334,0.324798,0.319406
3,0.8318,1.053843,0.746104,0.447807,0.450146,0.434248
4,0.3849,1.151638,0.725023,0.562737,0.531667,0.520351
5,0.1573,1.175363,0.758937,0.664457,0.661634,0.646896
6,0.0502,1.245817,0.791017,0.730865,0.711764,0.697192
7,0.0238,1.24033,0.786434,0.698259,0.691446,0.679415
8,0.0078,1.307433,0.788268,0.736977,0.694921,0.698702
9,0.0026,1.355125,0.786434,0.710241,0.68468,0.678251
10,0.0015,1.382838,0.789184,0.728215,0.69699,0.696554


[I 2025-03-23 03:15:13,106] Trial 97 finished with value: 0.6967954165257169 and parameters: {'learning_rate': 0.004381110504419244, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 98 with params: {'learning_rate': 0.00412846365825856, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6008,1.954133,0.51879,0.1747,0.174943,0.156787
2,1.4817,1.301775,0.678277,0.324823,0.320302,0.309702
3,0.8513,1.057013,0.737855,0.433295,0.443933,0.425385
4,0.4094,1.094958,0.738772,0.541266,0.516996,0.511724
5,0.1597,1.235925,0.775435,0.665014,0.62221,0.620306
6,0.0645,1.260678,0.789184,0.705575,0.669469,0.670208
7,0.032,1.314578,0.792851,0.713042,0.679686,0.674838
8,0.0117,1.377777,0.792851,0.71401,0.667761,0.671851
9,0.0033,1.423998,0.805683,0.729413,0.698058,0.695425
10,0.0026,1.442693,0.804766,0.734816,0.701768,0.697383


[I 2025-03-23 03:16:36,761] Trial 98 finished with value: 0.7072872145377835 and parameters: {'learning_rate': 0.00412846365825856, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 99 with params: {'learning_rate': 0.0013272391439310605, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9446,2.292738,0.428048,0.113417,0.103342,0.086327
2,1.9331,1.72296,0.571952,0.1809,0.187013,0.163761
3,1.4029,1.330882,0.681027,0.314964,0.301041,0.297199
4,0.9471,1.225371,0.679193,0.3703,0.352287,0.349259
5,0.6609,1.101892,0.730522,0.448672,0.413183,0.414448


[I 2025-03-23 03:17:01,224] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.0033250820507622265, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6867,1.980087,0.494042,0.142758,0.141569,0.126048
2,1.5474,1.36394,0.669111,0.316872,0.306832,0.302515
3,0.9081,1.092298,0.724106,0.401329,0.389203,0.379641
4,0.444,1.166121,0.715857,0.552702,0.511545,0.514795
5,0.206,1.158677,0.776352,0.674514,0.60265,0.618517
6,0.0783,1.275024,0.781852,0.726666,0.672869,0.680006
7,0.0336,1.332681,0.776352,0.626117,0.621624,0.608746
8,0.0127,1.413289,0.780018,0.664279,0.659651,0.647193
9,0.006,1.407236,0.783685,0.675548,0.650702,0.649569
10,0.0028,1.444054,0.787351,0.695858,0.660887,0.662143


[I 2025-03-23 03:18:00,597] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0028417955836135254, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7622,2.025831,0.494959,0.132116,0.139809,0.117737
2,1.5899,1.363799,0.657195,0.302298,0.29274,0.278183
3,1.0092,1.164149,0.71494,0.388666,0.381796,0.37282
4,0.5588,1.20796,0.71494,0.488664,0.457717,0.454716
5,0.2942,1.207824,0.757104,0.552108,0.547148,0.537601
6,0.1524,1.388189,0.738772,0.55744,0.507076,0.514894
7,0.0722,1.393281,0.768103,0.630265,0.594184,0.596174
8,0.031,1.47569,0.76352,0.609106,0.597172,0.5906
9,0.0174,1.559746,0.752521,0.614778,0.601897,0.595563
10,0.0061,1.585093,0.759853,0.636321,0.611395,0.608961


[I 2025-03-23 03:18:53,365] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0020628229507993015, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8294,2.124561,0.469294,0.116228,0.126725,0.111581
2,1.7181,1.520767,0.622365,0.283303,0.244825,0.242358
3,1.1401,1.23464,0.688359,0.357188,0.343454,0.332328
4,0.6681,1.141003,0.709441,0.478602,0.416001,0.424151
5,0.3712,1.074094,0.762603,0.554667,0.525802,0.523854


[I 2025-03-23 03:19:23,476] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0043049846927293465, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6148,1.95226,0.517874,0.187798,0.177037,0.162903
2,1.49,1.323149,0.665445,0.320297,0.317806,0.309214
3,0.8321,1.054367,0.745188,0.448181,0.448574,0.431967
4,0.3895,1.125963,0.731439,0.547503,0.530106,0.516148
5,0.1496,1.343265,0.758937,0.696302,0.622372,0.632062
6,0.0593,1.330167,0.774519,0.721544,0.681965,0.678003
7,0.0254,1.370605,0.780018,0.702232,0.665347,0.666511
8,0.0087,1.439674,0.775435,0.67686,0.648216,0.649448
9,0.0033,1.468499,0.787351,0.674921,0.651832,0.650892
10,0.0016,1.504538,0.791934,0.699103,0.663887,0.66644


[I 2025-03-23 03:20:56,419] Trial 103 finished with value: 0.6669747989276148 and parameters: {'learning_rate': 0.0043049846927293465, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 104 with params: {'learning_rate': 0.00480481682907343, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7365,1.975295,0.504125,0.135518,0.146886,0.118738
2,1.5015,1.373519,0.663611,0.332214,0.313671,0.311943
3,0.8482,1.113523,0.748854,0.450745,0.43162,0.431787
4,0.3779,1.143942,0.757104,0.584255,0.548465,0.549379
5,0.1534,1.270177,0.768103,0.697023,0.603147,0.623254
6,0.0555,1.292149,0.783685,0.686953,0.644005,0.643045
7,0.024,1.338688,0.788268,0.736278,0.692016,0.698155
8,0.0112,1.423915,0.777269,0.71215,0.661443,0.672373
9,0.0052,1.42528,0.791017,0.717529,0.664853,0.676509
10,0.0015,1.445766,0.793767,0.726406,0.67047,0.682067


[I 2025-03-23 03:23:16,506] Trial 104 finished with value: 0.6830988770780095 and parameters: {'learning_rate': 0.00480481682907343, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 105 with params: {'learning_rate': 0.0006078662726350267, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1934,2.764461,0.330889,0.035997,0.068166,0.042983
2,2.3774,2.130673,0.477544,0.093664,0.121251,0.096611
3,1.9095,1.760736,0.551787,0.181228,0.165213,0.147841
4,1.547,1.540573,0.622365,0.288764,0.229488,0.223385
5,1.2695,1.377241,0.646196,0.263395,0.27292,0.262658
6,1.0231,1.325728,0.656279,0.310212,0.315626,0.298465
7,0.8395,1.269913,0.679193,0.380599,0.325362,0.334899
8,0.7076,1.222893,0.691109,0.406575,0.361666,0.365425
9,0.5797,1.1819,0.703025,0.414247,0.393988,0.393848
10,0.473,1.190159,0.704858,0.445356,0.399701,0.409233


[I 2025-03-23 03:24:14,380] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.004581384813730785, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7181,2.012002,0.495875,0.12895,0.145261,0.115424
2,1.5244,1.366845,0.661778,0.326114,0.313053,0.307585
3,0.8405,1.065773,0.743355,0.456694,0.445184,0.436668
4,0.3757,1.181683,0.739688,0.586481,0.535359,0.540644
5,0.1483,1.230788,0.764436,0.666064,0.62137,0.622499
6,0.0549,1.272392,0.787351,0.66178,0.64339,0.634988
7,0.0211,1.321833,0.791017,0.717691,0.686658,0.687823
8,0.0079,1.419537,0.782768,0.736202,0.679188,0.694624
9,0.004,1.410134,0.796517,0.72015,0.682934,0.690274
10,0.0022,1.424211,0.793767,0.724099,0.686584,0.688567


[I 2025-03-23 03:25:42,722] Trial 106 finished with value: 0.6954174191056004 and parameters: {'learning_rate': 0.004581384813730785, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 107 with params: {'learning_rate': 0.004992784601613193, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7514,2.012715,0.491292,0.169537,0.14371,0.123223
2,1.4822,1.362843,0.650779,0.319407,0.308365,0.30131
3,0.8112,1.087144,0.747938,0.436194,0.438202,0.426779
4,0.3554,1.108789,0.756187,0.585877,0.537151,0.546036
5,0.1416,1.188709,0.780935,0.686534,0.626061,0.636103
6,0.0454,1.29454,0.773602,0.692269,0.676056,0.664821
7,0.0173,1.397915,0.776352,0.691118,0.654085,0.655061
8,0.0061,1.418376,0.787351,0.713255,0.653596,0.667434
9,0.0022,1.439578,0.791017,0.694315,0.659641,0.664057
10,0.001,1.464411,0.7956,0.705679,0.664318,0.671488


[I 2025-03-23 03:27:26,505] Trial 107 finished with value: 0.6667812535368698 and parameters: {'learning_rate': 0.004992784601613193, 'weight_decay': 0.003, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 108 with params: {'learning_rate': 0.004849674406341833, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7396,1.982191,0.496792,0.157085,0.147131,0.122472
2,1.5014,1.372294,0.658112,0.337998,0.323377,0.321015
3,0.8452,1.114213,0.741522,0.43283,0.421879,0.418377
4,0.3787,1.106856,0.762603,0.577017,0.550799,0.552216
5,0.1474,1.235743,0.767186,0.69994,0.640712,0.652634
6,0.0531,1.256312,0.788268,0.673335,0.654832,0.65047
7,0.0227,1.360447,0.780935,0.675838,0.652756,0.647071
8,0.0096,1.424501,0.793767,0.710733,0.681719,0.682291
9,0.0034,1.452188,0.792851,0.694077,0.672395,0.671139
10,0.0014,1.464808,0.796517,0.703268,0.671967,0.674412


[I 2025-03-23 03:28:47,806] Trial 108 finished with value: 0.6771043425519284 and parameters: {'learning_rate': 0.004849674406341833, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 109 with params: {'learning_rate': 0.0025308359748445483, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.749,2.06497,0.484876,0.119945,0.137529,0.118081
2,1.6528,1.475878,0.629698,0.314973,0.274205,0.274577
3,1.0656,1.198949,0.711274,0.400328,0.382506,0.371632
4,0.5785,1.207026,0.712191,0.472604,0.426604,0.432578
5,0.3103,1.160112,0.756187,0.589066,0.529011,0.541796


[I 2025-03-23 03:29:23,133] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.002323282456822122, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7477,2.078821,0.482126,0.126382,0.136281,0.119865
2,1.6667,1.452355,0.635197,0.316198,0.260356,0.259171
3,1.0714,1.209779,0.702108,0.395549,0.362524,0.353995
4,0.622,1.227758,0.713107,0.483396,0.437658,0.438604
5,0.3273,1.159113,0.741522,0.613623,0.527813,0.549936


[I 2025-03-23 03:29:48,353] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.004221367866873884, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6025,1.936073,0.528873,0.193196,0.17596,0.157815
2,1.4761,1.348158,0.668194,0.339536,0.316644,0.311056
3,0.8344,1.063938,0.745188,0.438954,0.428294,0.426263
4,0.3896,1.165404,0.738772,0.548108,0.51079,0.504693
5,0.1462,1.244262,0.769019,0.679218,0.613296,0.628317
6,0.0552,1.318464,0.784601,0.696373,0.647295,0.654382
7,0.0216,1.385982,0.783685,0.692436,0.673132,0.664224
8,0.0086,1.465402,0.782768,0.727574,0.653785,0.668275
9,0.0029,1.47878,0.784601,0.719176,0.665293,0.677847
10,0.0015,1.495837,0.785518,0.714486,0.666865,0.676245


[I 2025-03-23 03:31:11,064] Trial 111 finished with value: 0.6898345473056596 and parameters: {'learning_rate': 0.004221367866873884, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 112 with params: {'learning_rate': 0.0039738469171652, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5002,1.907851,0.505958,0.200454,0.178298,0.159691
2,1.4066,1.282016,0.672777,0.3614,0.338785,0.332802
3,0.8184,1.054904,0.743355,0.459741,0.443697,0.432678
4,0.4072,1.112623,0.752521,0.599914,0.543753,0.555503
5,0.186,1.153615,0.765353,0.669671,0.613767,0.615396
6,0.0636,1.251378,0.777269,0.673238,0.681653,0.655129
7,0.0277,1.325836,0.791017,0.731683,0.675451,0.682327
8,0.0106,1.424587,0.789184,0.704327,0.659809,0.658304
9,0.004,1.432625,0.796517,0.714081,0.697751,0.687448
10,0.0022,1.434082,0.7956,0.725455,0.692613,0.688505


[I 2025-03-23 03:32:45,989] Trial 112 finished with value: 0.6836926030083901 and parameters: {'learning_rate': 0.0039738469171652, 'weight_decay': 0.0, 'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.


Trial 113 with params: {'learning_rate': 0.004179608982717893, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5079,1.860662,0.542621,0.211561,0.186462,0.172335
2,1.3852,1.284073,0.672777,0.337759,0.342543,0.328625
3,0.7849,1.040574,0.754354,0.489053,0.464452,0.459542
4,0.3805,1.105081,0.750687,0.612871,0.523849,0.546056
5,0.1573,1.198977,0.766269,0.675337,0.622539,0.628507
6,0.0634,1.239295,0.786434,0.683139,0.645863,0.63904
7,0.023,1.358901,0.792851,0.692698,0.651908,0.656521
8,0.0073,1.3671,0.790101,0.732045,0.679219,0.690709
9,0.0028,1.364677,0.796517,0.72231,0.685752,0.690331
10,0.0015,1.403196,0.797434,0.723194,0.684416,0.688459


[I 2025-03-23 03:34:31,243] Trial 113 finished with value: 0.6785743004771484 and parameters: {'learning_rate': 0.004179608982717893, 'weight_decay': 0.005, 'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.


Trial 114 with params: {'learning_rate': 0.004484810349007265, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6225,1.947088,0.501375,0.164696,0.157506,0.147959
2,1.4731,1.356921,0.664528,0.336597,0.309271,0.299539
3,0.8386,1.092939,0.736022,0.441639,0.431825,0.426347
4,0.3723,1.127577,0.752521,0.5918,0.537958,0.543554
5,0.1426,1.264449,0.76352,0.646759,0.599183,0.597356


[I 2025-03-23 03:35:04,487] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.003712159645067156, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6503,1.930828,0.495875,0.142253,0.142037,0.125537
2,1.5093,1.320899,0.684693,0.347215,0.325111,0.318504
3,0.8733,1.054933,0.734189,0.429538,0.430641,0.420147
4,0.4298,1.117362,0.732356,0.550959,0.49178,0.500681
5,0.1767,1.260749,0.75802,0.656931,0.59737,0.60155
6,0.0677,1.236702,0.771769,0.693801,0.638997,0.642953
7,0.0287,1.292064,0.796517,0.749018,0.70206,0.703943
8,0.0131,1.373078,0.790101,0.730945,0.70079,0.693142
9,0.0042,1.380237,0.791017,0.719197,0.6705,0.67877
10,0.0032,1.423546,0.791934,0.714538,0.682281,0.680159


[I 2025-03-23 03:36:29,196] Trial 115 finished with value: 0.6814626099141844 and parameters: {'learning_rate': 0.003712159645067156, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 116 with params: {'learning_rate': 0.0048568272115202425, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7416,1.98106,0.501375,0.138202,0.146425,0.118455
2,1.4985,1.385362,0.656279,0.327785,0.318416,0.313242
3,0.8384,1.129607,0.745188,0.44331,0.434476,0.432797
4,0.3741,1.114634,0.766269,0.580324,0.537641,0.545528
5,0.1513,1.230471,0.772686,0.670541,0.623058,0.625061
6,0.0582,1.229939,0.788268,0.706002,0.668855,0.666586
7,0.0172,1.314667,0.793767,0.726942,0.708263,0.700541
8,0.0074,1.40785,0.792851,0.722594,0.684324,0.683427
9,0.0033,1.431039,0.793767,0.714948,0.672497,0.676644
10,0.0019,1.443027,0.796517,0.712155,0.678299,0.675685


[I 2025-03-23 03:37:29,442] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.004990423195816227, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7526,1.99191,0.492209,0.151883,0.141989,0.116444
2,1.4921,1.363151,0.655362,0.350687,0.31761,0.318915
3,0.8151,1.105607,0.745188,0.456589,0.431349,0.431457
4,0.3622,1.111331,0.76352,0.623781,0.561082,0.573832
5,0.1516,1.273436,0.754354,0.68942,0.618885,0.627614
6,0.0476,1.257328,0.787351,0.678251,0.668511,0.659096
7,0.0176,1.381793,0.780018,0.682194,0.645359,0.64766
8,0.0057,1.438357,0.790101,0.707019,0.669815,0.676257
9,0.0024,1.443712,0.794684,0.696958,0.663221,0.668816
10,0.0011,1.47505,0.796517,0.701616,0.672112,0.676083


[I 2025-03-23 03:39:22,718] Trial 117 finished with value: 0.6683611982398666 and parameters: {'learning_rate': 0.004990423195816227, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 118 with params: {'learning_rate': 0.001835474334112618, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8683,2.184316,0.448213,0.10878,0.116312,0.099855
2,1.7942,1.56105,0.609533,0.264829,0.234084,0.223228
3,1.1987,1.243105,0.684693,0.370693,0.340419,0.333847
4,0.7462,1.143643,0.709441,0.451212,0.408838,0.410916
5,0.4383,1.091154,0.748854,0.576938,0.494137,0.505367
6,0.264,1.154283,0.769019,0.62182,0.548852,0.566001
7,0.1305,1.262005,0.745188,0.61896,0.521157,0.547899
8,0.0672,1.283354,0.759853,0.614584,0.562624,0.573695
9,0.037,1.35786,0.75527,0.62801,0.591784,0.591631
10,0.0194,1.422892,0.758937,0.635645,0.596483,0.598643


[I 2025-03-23 03:40:16,223] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0006406713431828303, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.233,2.741828,0.380385,0.038947,0.082989,0.051265
2,2.3788,2.120363,0.467461,0.099191,0.114403,0.09087
3,1.8998,1.745261,0.553621,0.178479,0.166103,0.150605
4,1.5336,1.528644,0.625115,0.278102,0.234808,0.227481
5,1.2444,1.336477,0.659028,0.275898,0.290221,0.278046


[I 2025-03-23 03:41:22,567] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.004592536613105015, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6266,1.881029,0.519707,0.162505,0.168767,0.155015
2,1.4248,1.282117,0.679193,0.354829,0.329735,0.324674
3,0.7575,1.009186,0.754354,0.461939,0.449459,0.448352
4,0.3306,1.072836,0.765353,0.603288,0.562749,0.564478
5,0.1126,1.162655,0.772686,0.723618,0.651084,0.651855
6,0.037,1.200727,0.781852,0.680668,0.668311,0.652317
7,0.0141,1.227566,0.806599,0.720312,0.680583,0.678683
8,0.0058,1.320483,0.7956,0.718165,0.687364,0.67553
9,0.0017,1.306582,0.79835,0.732554,0.699446,0.693709
10,0.0009,1.326706,0.79835,0.722823,0.678187,0.680329


[I 2025-03-23 03:43:19,227] Trial 120 finished with value: 0.6961972394290988 and parameters: {'learning_rate': 0.004592536613105015, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 121 with params: {'learning_rate': 0.002570766043234812, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7445,2.055817,0.494959,0.141121,0.14212,0.120276
2,1.6419,1.432785,0.637947,0.320412,0.274605,0.276627
3,1.0263,1.17152,0.712191,0.389158,0.371414,0.363822
4,0.5686,1.20754,0.708524,0.461321,0.438983,0.439412
5,0.3055,1.136742,0.765353,0.626765,0.555231,0.569015


[I 2025-03-23 03:43:46,722] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.004300669624298167, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5985,1.950876,0.527956,0.202083,0.185161,0.173143
2,1.427,1.281309,0.678277,0.350991,0.335384,0.33228
3,0.7845,1.052228,0.748854,0.465989,0.433363,0.431303
4,0.3567,1.074666,0.75527,0.582954,0.555371,0.549669
5,0.128,1.325199,0.766269,0.731424,0.613923,0.640542
6,0.062,1.2545,0.792851,0.713615,0.659425,0.660212
7,0.0221,1.298055,0.780018,0.686142,0.670864,0.655889
8,0.0078,1.359217,0.800183,0.728556,0.68569,0.692196
9,0.0026,1.385689,0.796517,0.692,0.669696,0.667458
10,0.0018,1.39856,0.7956,0.701697,0.669122,0.668884


[I 2025-03-23 03:46:16,523] Trial 122 finished with value: 0.6674642057207252 and parameters: {'learning_rate': 0.004300669624298167, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 123 with params: {'learning_rate': 0.002285659024267276, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7497,2.081579,0.482126,0.127108,0.134998,0.117171
2,1.6676,1.464217,0.633364,0.343215,0.273694,0.27532
3,1.0853,1.239421,0.698442,0.39466,0.36184,0.348452
4,0.6364,1.203827,0.708524,0.47284,0.429931,0.433598
5,0.3378,1.143346,0.740605,0.588123,0.512441,0.53079


[I 2025-03-23 03:46:42,842] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.003755212299396064, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6918,1.967965,0.517874,0.160497,0.161101,0.142847
2,1.502,1.380555,0.658112,0.327257,0.297743,0.2977
3,0.8615,1.108326,0.744271,0.447677,0.441351,0.431351
4,0.4196,1.114354,0.730522,0.528642,0.506102,0.503332
5,0.1823,1.227228,0.777269,0.699237,0.625247,0.639445
6,0.0728,1.305594,0.778185,0.672392,0.662357,0.649365
7,0.0237,1.400628,0.780935,0.708299,0.648739,0.65706
8,0.0093,1.445007,0.788268,0.666675,0.64802,0.644934
9,0.0032,1.469345,0.784601,0.677216,0.659466,0.657972
10,0.0024,1.504317,0.790101,0.69266,0.670399,0.666744


[I 2025-03-23 03:48:02,361] Trial 124 finished with value: 0.6667459412113138 and parameters: {'learning_rate': 0.003755212299396064, 'weight_decay': 0.004, 'warmup_steps': 3}. Best is trial 25 with value: 0.7213101626581778.


Trial 125 with params: {'learning_rate': 0.004595391471126412, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6271,1.878132,0.51879,0.163104,0.172245,0.159274
2,1.4135,1.310287,0.678277,0.351871,0.328037,0.323305
3,0.7393,1.013584,0.749771,0.475818,0.439977,0.439956
4,0.327,1.051677,0.784601,0.633646,0.609484,0.604599
5,0.1031,1.146853,0.780935,0.657746,0.638825,0.62553
6,0.0401,1.212117,0.789184,0.694727,0.661558,0.655836
7,0.0182,1.280874,0.782768,0.696412,0.674413,0.665248
8,0.006,1.32416,0.794684,0.723201,0.679743,0.679434
9,0.0029,1.323191,0.794684,0.712706,0.69529,0.685701
10,0.0013,1.346344,0.793767,0.730812,0.688383,0.685186


[I 2025-03-23 03:49:25,604] Trial 125 finished with value: 0.6896784381158768 and parameters: {'learning_rate': 0.004595391471126412, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 126 with params: {'learning_rate': 0.0047593561775193145, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6068,1.864537,0.519707,0.167874,0.160319,0.146182
2,1.3962,1.262622,0.691109,0.371399,0.352856,0.34567
3,0.7181,1.008496,0.758937,0.507065,0.483292,0.475186
4,0.315,1.100651,0.769936,0.608395,0.603821,0.588568
5,0.1033,1.222262,0.785518,0.715161,0.65265,0.658649
6,0.0389,1.222482,0.783685,0.681809,0.673924,0.662853
7,0.0121,1.263826,0.802933,0.744881,0.700182,0.696751
8,0.0034,1.301895,0.79835,0.724411,0.689476,0.685412
9,0.0023,1.326021,0.7956,0.75528,0.685592,0.693953
10,0.0015,1.314209,0.80385,0.746571,0.700816,0.700003


[I 2025-03-23 03:50:50,680] Trial 126 finished with value: 0.6982372335684731 and parameters: {'learning_rate': 0.0047593561775193145, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 127 with params: {'learning_rate': 0.0047309014371615535, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5978,1.829958,0.536205,0.184245,0.183702,0.172131
2,1.3956,1.323671,0.673694,0.33274,0.32142,0.308292
3,0.7377,0.994376,0.747938,0.481131,0.445501,0.443698
4,0.3267,1.074904,0.770852,0.610893,0.574765,0.581212
5,0.1119,1.201298,0.774519,0.660391,0.634683,0.627646
6,0.0454,1.193841,0.794684,0.683406,0.699204,0.679997
7,0.0132,1.295434,0.780935,0.728531,0.715017,0.703034
8,0.006,1.372912,0.790101,0.732515,0.696248,0.689932
9,0.0025,1.353838,0.790101,0.701446,0.677178,0.671598
10,0.0013,1.375841,0.793767,0.737474,0.702264,0.705361


[I 2025-03-23 03:52:29,981] Trial 127 finished with value: 0.6797208006567468 and parameters: {'learning_rate': 0.0047309014371615535, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 128 with params: {'learning_rate': 0.0023420167662334024, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8684,2.14531,0.458295,0.113538,0.119509,0.104832
2,1.7152,1.480362,0.641613,0.292965,0.271776,0.263685
3,1.0826,1.188676,0.709441,0.371213,0.378657,0.358588
4,0.6147,1.143249,0.710357,0.478251,0.424446,0.43568
5,0.3368,1.088714,0.753437,0.553314,0.520089,0.522314


[I 2025-03-23 03:52:57,419] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0049426452728303875, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6086,1.867443,0.522456,0.166323,0.166038,0.152513
2,1.384,1.291711,0.670027,0.362774,0.344461,0.341871
3,0.7207,0.994511,0.758937,0.490687,0.469535,0.463159
4,0.3167,1.073702,0.772686,0.596296,0.59207,0.576504
5,0.1068,1.237394,0.765353,0.663522,0.619761,0.608614
6,0.0426,1.248435,0.790101,0.684317,0.691951,0.673025
7,0.0119,1.253829,0.7956,0.707817,0.686267,0.678263
8,0.0034,1.277769,0.802933,0.699067,0.688563,0.677722
9,0.0025,1.323723,0.805683,0.715437,0.690106,0.68671
10,0.001,1.337428,0.808433,0.714249,0.695635,0.689186


[I 2025-03-23 03:54:36,543] Trial 129 finished with value: 0.6794700587798279 and parameters: {'learning_rate': 0.0049426452728303875, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 130 with params: {'learning_rate': 0.001752984764568054, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9548,2.226461,0.434464,0.129802,0.111057,0.097842
2,1.8185,1.590142,0.601283,0.237269,0.225796,0.213133
3,1.235,1.260967,0.678277,0.332489,0.328545,0.315572
4,0.7729,1.154012,0.700275,0.433423,0.396627,0.398641
5,0.4548,1.129551,0.742438,0.522128,0.490886,0.484933
6,0.2768,1.220978,0.753437,0.544943,0.501721,0.507969
7,0.1522,1.301165,0.752521,0.597721,0.545819,0.555952
8,0.0791,1.392216,0.749771,0.610357,0.56473,0.57159
9,0.0415,1.408738,0.748854,0.608148,0.593962,0.586575
10,0.0217,1.452235,0.761687,0.632671,0.602228,0.604205


[I 2025-03-23 03:55:32,187] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.004931816282594176, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6099,1.867096,0.52154,0.158942,0.162189,0.147245
2,1.3888,1.26623,0.673694,0.35066,0.344839,0.33794
3,0.7133,0.997956,0.75527,0.502427,0.463168,0.463832
4,0.3055,1.087723,0.775435,0.631864,0.60069,0.595469
5,0.1071,1.204441,0.764436,0.67798,0.64162,0.633906
6,0.0407,1.258906,0.783685,0.707294,0.685366,0.673947
7,0.0125,1.2811,0.789184,0.68325,0.685566,0.665737
8,0.0046,1.324303,0.802016,0.715083,0.697621,0.6924
9,0.0025,1.351679,0.796517,0.715435,0.691717,0.686676
10,0.0011,1.364938,0.793767,0.711947,0.689202,0.683127


[I 2025-03-23 03:57:17,250] Trial 131 finished with value: 0.6933157115819694 and parameters: {'learning_rate': 0.004931816282594176, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 132 with params: {'learning_rate': 0.002620047077125068, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7233,1.984903,0.515124,0.13643,0.145229,0.121273
2,1.5728,1.41514,0.654445,0.292932,0.294436,0.283327
3,0.9723,1.216333,0.695692,0.375923,0.374114,0.35747
4,0.5401,1.217355,0.715857,0.541099,0.443849,0.469335
5,0.2741,1.193492,0.758937,0.632295,0.584145,0.589301
6,0.1486,1.23227,0.770852,0.685764,0.620945,0.631585
7,0.0591,1.400126,0.76077,0.650485,0.576399,0.598247
8,0.0231,1.371042,0.778185,0.655211,0.632401,0.627326
9,0.0099,1.449735,0.780935,0.67459,0.6202,0.629714
10,0.0055,1.496438,0.776352,0.679136,0.623862,0.632968


[I 2025-03-23 03:58:54,334] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.004928181224285774, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6057,1.866169,0.527039,0.163751,0.165076,0.150915
2,1.3778,1.298575,0.670944,0.355081,0.334784,0.329664
3,0.7069,1.003138,0.751604,0.507771,0.471137,0.467758
4,0.3117,1.017875,0.76352,0.609617,0.599527,0.57982
5,0.1011,1.157898,0.776352,0.64621,0.634898,0.615932
6,0.0381,1.139685,0.800183,0.708165,0.717661,0.697786
7,0.0142,1.207594,0.796517,0.73872,0.68657,0.692209
8,0.0042,1.26847,0.797434,0.708308,0.684208,0.679482
9,0.0023,1.260774,0.802016,0.708154,0.698437,0.690696
10,0.0009,1.277626,0.802016,0.707577,0.697965,0.689467


[I 2025-03-23 04:01:01,640] Trial 133 finished with value: 0.68946176844969 and parameters: {'learning_rate': 0.004928181224285774, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 134 with params: {'learning_rate': 0.0029738141228889195, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.808,2.027359,0.486709,0.129236,0.134982,0.11809
2,1.5978,1.354162,0.68011,0.322231,0.31522,0.30796
3,0.9644,1.16048,0.71769,0.422982,0.38975,0.380331
4,0.517,1.17041,0.725023,0.505321,0.465707,0.46687
5,0.2406,1.213617,0.754354,0.599019,0.537164,0.545915
6,0.1192,1.319995,0.769936,0.630028,0.596856,0.601604
7,0.0484,1.362758,0.771769,0.660527,0.627883,0.625299
8,0.0216,1.438807,0.780018,0.658208,0.635475,0.632142
9,0.0077,1.455593,0.789184,0.679847,0.624507,0.635702
10,0.0036,1.483627,0.784601,0.658639,0.627902,0.630312


[I 2025-03-23 04:02:04,881] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.004599155892559169, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6244,1.879805,0.517874,0.160106,0.171533,0.154901
2,1.4254,1.29767,0.68286,0.36263,0.338507,0.332032
3,0.7402,1.016313,0.753437,0.463055,0.469086,0.4499
4,0.3184,1.036271,0.767186,0.565241,0.555168,0.544125
5,0.104,1.265861,0.765353,0.690801,0.618578,0.623904
6,0.0422,1.28671,0.775435,0.722225,0.686344,0.679703
7,0.0152,1.237209,0.791934,0.699502,0.682655,0.675618
8,0.0051,1.318015,0.793767,0.711055,0.678283,0.679031
9,0.0031,1.342423,0.794684,0.707708,0.685682,0.681319
10,0.0016,1.372201,0.794684,0.70746,0.684483,0.677977


[I 2025-03-23 04:04:22,317] Trial 135 finished with value: 0.6717710220483413 and parameters: {'learning_rate': 0.004599155892559169, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 136 with params: {'learning_rate': 0.004837834216626757, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6048,1.855423,0.529789,0.162918,0.167366,0.152719
2,1.396,1.269907,0.673694,0.361558,0.345624,0.336931
3,0.7425,0.981026,0.748854,0.484665,0.459383,0.451584
4,0.3193,1.05634,0.767186,0.605116,0.59313,0.580298
5,0.0995,1.149238,0.780018,0.709162,0.670319,0.666714
6,0.0358,1.209643,0.791934,0.680924,0.672312,0.660315
7,0.0166,1.198859,0.79835,0.701388,0.695313,0.680852
8,0.0055,1.2424,0.805683,0.709009,0.693588,0.686869
9,0.0018,1.274412,0.802016,0.688095,0.663004,0.660864
10,0.0011,1.291468,0.802016,0.689526,0.671694,0.665747


[I 2025-03-23 04:06:04,169] Trial 136 finished with value: 0.6711538677958893 and parameters: {'learning_rate': 0.004837834216626757, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 137 with params: {'learning_rate': 0.004855174569230301, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6046,1.849496,0.532539,0.17757,0.174005,0.161878
2,1.3936,1.278316,0.678277,0.346263,0.333925,0.325834
3,0.7221,0.991056,0.754354,0.51828,0.473287,0.472312
4,0.3151,1.049272,0.772686,0.607527,0.583793,0.579507
5,0.1017,1.173125,0.796517,0.711983,0.656027,0.656679
6,0.0341,1.244843,0.789184,0.682773,0.688454,0.667693
7,0.0123,1.290653,0.789184,0.693468,0.678022,0.665514
8,0.0048,1.300401,0.79835,0.71386,0.690492,0.682573
9,0.0022,1.332834,0.797434,0.720226,0.678559,0.678716
10,0.0011,1.354522,0.79835,0.73383,0.678579,0.685303


[I 2025-03-23 04:07:33,170] Trial 137 finished with value: 0.6863337009437221 and parameters: {'learning_rate': 0.004855174569230301, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 138 with params: {'learning_rate': 0.0026117019956397175, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7251,1.992836,0.508708,0.135792,0.143499,0.119855
2,1.574,1.448608,0.656279,0.301368,0.30108,0.290582
3,0.9997,1.213425,0.71769,0.385492,0.382153,0.3663
4,0.5539,1.189451,0.710357,0.469741,0.433193,0.437703
5,0.2921,1.200895,0.75802,0.64651,0.57434,0.586204


[I 2025-03-23 04:07:58,972] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.004942851028828938, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6066,1.860553,0.519707,0.161416,0.161667,0.14691
2,1.3839,1.298767,0.670027,0.367294,0.343631,0.341607
3,0.7222,1.008041,0.75527,0.481678,0.461854,0.460238
4,0.3037,1.041628,0.777269,0.662978,0.637583,0.633306
5,0.0989,1.230331,0.762603,0.646769,0.63916,0.624178
6,0.045,1.239884,0.785518,0.69465,0.679311,0.666293
7,0.015,1.272341,0.779102,0.730522,0.686277,0.690492
8,0.0077,1.32121,0.784601,0.712417,0.677925,0.681079
9,0.0047,1.320112,0.787351,0.697662,0.681308,0.67639
10,0.0016,1.334686,0.787351,0.689302,0.684043,0.670981


[I 2025-03-23 04:09:17,189] Trial 139 finished with value: 0.6705347506495062 and parameters: {'learning_rate': 0.004942851028828938, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 25 with value: 0.7213101626581778.


Trial 140 with params: {'learning_rate': 0.0021267752792944274, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7946,2.092466,0.474794,0.123948,0.128683,0.113589
2,1.7001,1.479635,0.631531,0.330721,0.254477,0.254819
3,1.1187,1.228839,0.703941,0.364288,0.355258,0.3421
4,0.6684,1.13141,0.72044,0.464775,0.432096,0.43302
5,0.3628,1.130708,0.745188,0.5658,0.511911,0.52016


[I 2025-03-23 04:09:43,116] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0044953543928124085, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5927,1.887592,0.537122,0.179812,0.183931,0.166084
2,1.431,1.373002,0.648029,0.327961,0.306848,0.302011
3,0.808,1.057945,0.750687,0.488762,0.470151,0.46154
4,0.3682,1.080298,0.75527,0.610352,0.547677,0.556995
5,0.1421,1.202208,0.782768,0.67691,0.646269,0.642071
6,0.0565,1.218817,0.770852,0.717027,0.695653,0.682404
7,0.0279,1.34721,0.777269,0.684893,0.656227,0.657321
8,0.0071,1.36789,0.784601,0.71291,0.657921,0.666538
9,0.0035,1.395903,0.789184,0.706985,0.663703,0.665703
10,0.0013,1.413617,0.790101,0.713009,0.676038,0.673585


[I 2025-03-23 04:11:11,376] Trial 141 finished with value: 0.6822585774224865 and parameters: {'learning_rate': 0.0044953543928124085, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 142 with params: {'learning_rate': 0.002970323072724632, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7626,1.990119,0.491292,0.119519,0.134019,0.112632
2,1.5825,1.434847,0.647113,0.307485,0.275634,0.272785
3,0.9916,1.155356,0.72319,0.381606,0.384034,0.372498
4,0.52,1.213924,0.72044,0.498223,0.467415,0.465546
5,0.2608,1.22354,0.752521,0.626233,0.521418,0.537692


[I 2025-03-23 04:11:35,277] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.00417279246681903, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5999,1.930132,0.526123,0.179246,0.173303,0.153834
2,1.468,1.332177,0.666361,0.3362,0.309869,0.307237
3,0.8392,1.085074,0.746104,0.454347,0.442448,0.432795
4,0.3999,1.150576,0.730522,0.560394,0.500239,0.508133
5,0.153,1.226564,0.777269,0.675985,0.615692,0.623101
6,0.0612,1.278409,0.780935,0.68411,0.647464,0.643614
7,0.0248,1.343814,0.788268,0.716339,0.678576,0.680454
8,0.0138,1.448585,0.789184,0.717438,0.688991,0.68362
9,0.0033,1.433058,0.797434,0.716943,0.68003,0.68064
10,0.0019,1.457137,0.794684,0.730276,0.691237,0.692982


[I 2025-03-23 04:13:04,001] Trial 143 finished with value: 0.6910918403348414 and parameters: {'learning_rate': 0.00417279246681903, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 144 with params: {'learning_rate': 0.004843550464624874, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7398,1.984685,0.496792,0.135768,0.146396,0.119098
2,1.5135,1.39262,0.64528,0.308282,0.292051,0.287182
3,0.8529,1.116028,0.746104,0.445723,0.430017,0.42968
4,0.3792,1.115868,0.770852,0.605457,0.569239,0.574388
5,0.141,1.262142,0.762603,0.64275,0.60362,0.606277
6,0.054,1.297557,0.773602,0.639386,0.64338,0.623284
7,0.0221,1.327121,0.792851,0.714812,0.670501,0.676881
8,0.0066,1.411327,0.792851,0.705466,0.662378,0.671532
9,0.0023,1.447223,0.7956,0.714058,0.674303,0.683814
10,0.0014,1.436134,0.799267,0.723255,0.6841,0.691584


[I 2025-03-23 04:14:31,870] Trial 144 finished with value: 0.6904001806911194 and parameters: {'learning_rate': 0.004843550464624874, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 25 with value: 0.7213101626581778.


Trial 145 with params: {'learning_rate': 0.0009846999981940965, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0872,2.484695,0.382218,0.060389,0.082291,0.057651
2,2.1245,1.896887,0.526123,0.142538,0.151683,0.129077
3,1.604,1.49059,0.636114,0.279277,0.242514,0.238402
4,1.143,1.295856,0.67461,0.354289,0.334325,0.331061
5,0.8391,1.161847,0.701192,0.431687,0.387052,0.383221
6,0.6021,1.133584,0.706691,0.443752,0.415749,0.413317
7,0.4094,1.177178,0.719523,0.501854,0.412768,0.432185
8,0.2976,1.163395,0.730522,0.561796,0.521788,0.527773
9,0.1993,1.170859,0.731439,0.542819,0.496122,0.505647
10,0.1383,1.237852,0.737855,0.610862,0.578827,0.580636


[I 2025-03-23 04:15:31,839] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.001976177147571617, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.882,2.198911,0.447296,0.104273,0.116722,0.100333
2,1.7757,1.541972,0.619615,0.249833,0.24609,0.237496
3,1.1809,1.242907,0.693859,0.407213,0.354585,0.348715
4,0.7201,1.150612,0.713107,0.455067,0.406091,0.411399
5,0.411,1.128784,0.736939,0.578168,0.473023,0.492227


[I 2025-03-23 04:15:55,959] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.004695301719138056, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5229,1.826525,0.551787,0.186691,0.190216,0.173467
2,1.3424,1.304868,0.656279,0.336392,0.326308,0.312856
3,0.7437,1.026971,0.753437,0.476086,0.461549,0.455642
4,0.3572,1.051061,0.762603,0.612217,0.581802,0.570325
5,0.1442,1.207239,0.773602,0.733684,0.673474,0.680558
6,0.0571,1.179284,0.787351,0.690815,0.696695,0.682069
7,0.0205,1.25067,0.784601,0.714409,0.703885,0.69278
8,0.0085,1.341349,0.781852,0.683398,0.673148,0.659704
9,0.0032,1.354189,0.789184,0.704185,0.685975,0.676527
10,0.0012,1.360254,0.787351,0.679901,0.674526,0.660981


[I 2025-03-23 04:17:17,269] Trial 147 finished with value: 0.6715317407739333 and parameters: {'learning_rate': 0.004695301719138056, 'weight_decay': 0.003, 'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.


Trial 148 with params: {'learning_rate': 0.0021960779673047405, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9164,2.192074,0.44363,0.104488,0.113677,0.099753
2,1.768,1.507256,0.628781,0.264674,0.255848,0.241793
3,1.1201,1.210473,0.701192,0.428385,0.362594,0.359856
4,0.6571,1.109886,0.713107,0.473417,0.427416,0.434709
5,0.3728,1.143373,0.752521,0.536336,0.505119,0.50474
6,0.2038,1.221412,0.754354,0.539074,0.541116,0.526224
7,0.0978,1.351081,0.753437,0.601839,0.569406,0.569866
8,0.0406,1.393696,0.759853,0.63572,0.60181,0.600294
9,0.0184,1.511722,0.762603,0.652554,0.598873,0.605533
10,0.0111,1.511352,0.762603,0.634614,0.598893,0.606013


[I 2025-03-23 04:18:14,062] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.004869791072319661, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7411,1.989647,0.496792,0.137951,0.144474,0.118082
2,1.4885,1.374242,0.659028,0.344742,0.318835,0.320574
3,0.8251,1.105261,0.747021,0.4743,0.436791,0.441362
4,0.3734,1.124272,0.753437,0.566859,0.532435,0.53408
5,0.1427,1.28125,0.769936,0.669757,0.625789,0.632301
6,0.0503,1.290808,0.787351,0.705799,0.672961,0.674463
7,0.0218,1.336129,0.789184,0.693077,0.645688,0.655009
8,0.0069,1.41909,0.79835,0.727016,0.680836,0.680176
9,0.002,1.452958,0.79835,0.718131,0.668737,0.676562
10,0.0011,1.48029,0.8011,0.71526,0.674737,0.678505


[I 2025-03-23 04:19:12,171] Trial 149 pruned. 
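The Optuna log lines above can be tallied programmatically rather than read by eye. The following is a minimal sketch, assuming the log has been saved as plain text; the helper name `summarise` and the regular expressions are illustrative and only cover the two line shapes visible in this output.

```python
import re

# Patterns for the two Optuna log line shapes seen in this notebook's output.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([-+0-9.eE]+)")
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarise(log_text):
    """Return (finished, pruned): a dict trial -> objective and a list of pruned trial numbers."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    return finished, pruned


# Three lines copied from the log above (parameters shortened for readability).
sample = (
    "[I 2025-03-23 04:17:17,269] Trial 147 finished with value: 0.6715317407739333 "
    "and parameters: {'warmup_steps': 0}. Best is trial 25 with value: 0.7213101626581778.\n"
    "[I 2025-03-23 04:18:14,062] Trial 148 pruned.\n"
    "[I 2025-03-23 04:19:12,171] Trial 149 pruned.\n"
)

finished, pruned = summarise(sample)
best_in_sample = max(finished, key=finished.get)
print(f"finished={len(finished)} pruned={len(pruned)} best_in_sample={best_in_sample}")
```

Run over the full log, the same helper gives the share of trials that Hyperband stopped early and the best completed trial.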


In [25]:
print(best_trial_normal)

BestRun(run_id='25', objective=0.7213101626581778, hyperparameters={'learning_rate': 0.004865736242064892, 'weight_decay': 0.001, 'warmup_steps': 2}, run_summary=None)
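The printed `BestRun` is a named tuple, so its `hyperparameters` dictionary can be copied onto the training arguments for a final run. The sketch below uses stand-in classes (`BestRunLike`, `ArgsLike`) and a hypothetical helper `apply_best` so that it runs without `transformers`; only the four field names and the values are taken from the output above.

```python
from typing import Any, NamedTuple, Optional


class BestRunLike(NamedTuple):
    """Stand-in with the same four fields as the BestRun printed above."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


class ArgsLike:
    """Hypothetical minimal stand-in for a training-arguments object."""
    learning_rate = 5e-5
    weight_decay = 0.0
    warmup_steps = 0


def apply_best(args, best_run):
    """Copy every tuned hyperparameter onto args, failing on unknown names."""
    for name, value in best_run.hyperparameters.items():
        if not hasattr(args, name):
            raise AttributeError(f"unknown training argument: {name}")
        setattr(args, name, value)
    return args


best = BestRunLike(
    run_id="25",
    objective=0.7213101626581778,
    hyperparameters={
        "learning_rate": 0.004865736242064892,
        "weight_decay": 0.001,
        "warmup_steps": 2,
    },
)
final_args = apply_best(ArgsLike(), best)
print(final_args.learning_rate, final_args.weight_decay, final_args.warmup_steps)
```

Failing on unknown names is a deliberate choice: a misspelled hyperparameter would otherwise be set silently and have no effect on training.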


In [26]:
base.reset_seed()

## Hyperparameter search with distillation on the original dataset
Configuration of the individual training runs.

In [27]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd_fine_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd_fine_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.


In [28]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
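The stepped ranges produce values such as `0.6000000000000001` and `0.009000000000000001` in the trial logs. This is ordinary floating-point rounding of a multiple of the step, not a separate grid point. A minimal sketch of how such values could be rounded back to the grid for reporting follows; the helper name `snap` is hypothetical and not part of the notebook.

```python
from decimal import Decimal


def snap(value, step):
    """Round value to the number of decimal places used by step."""
    decimals = max(0, -Decimal(str(step)).as_tuple().exponent)
    return round(value, decimals)


raw = 6 * 0.1                             # sixth grid point of lambda_param
print(raw == 0.6)                         # the raw product is not exactly 0.6
print(snap(raw, 0.1))                     # rounded to one decimal place
print(snap(0.009000000000000001, 1e-3))   # weight_decay value from the logs
```

Rounding matters mainly when grouping trials by a hyperparameter value, where `0.6` and `0.6000000000000001` would otherwise land in different groups.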

Optuna configuration.

In [29]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
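Optuna derives the number of Hyperband brackets from the three pruner arguments as `floor(log_eta(max_resource / min_resource)) + 1`, where `eta` is `reduction_factor`. The sketch below computes that count with integer arithmetic. The values of `min_r` and `max_r` are set earlier in the notebook and are not visible in this section, so the numbers used here (5 and 20) are placeholders only.

```python
def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Number of Hyperband brackets: floor(log_eta(max / min)) + 1."""
    if min_resource < 1 or max_resource < min_resource or reduction_factor < 2:
        raise ValueError(
            "expected 1 <= min_resource <= max_resource and reduction_factor >= 2"
        )
    brackets = 1
    budget = min_resource
    # Each multiplication by the reduction factor that still fits adds one bracket.
    while budget * reduction_factor <= max_resource:
        budget *= reduction_factor
        brackets += 1
    return brackets


# Placeholder resources; the notebook's real min_r / max_r are defined elsewhere.
n_brackets = hyperband_brackets(min_resource=5, max_resource=20, reduction_factor=2)
print(n_brackets)
```

With more brackets, a larger share of trials runs under a less aggressive pruning schedule, which trades search speed for a lower risk of stopping slow starters too early.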



Configuration of the distillation trainer for the individual training runs.

In [30]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
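`base.DistilTrainer` is defined outside this notebook, so its exact loss is not shown here. The sketch below illustrates one common way that `lambda_param` and `temperature` enter a distillation loss: cross-entropy on the hard label mixed with a temperature-scaled KL divergence between teacher and student distributions. Which term `lambda_param` weights, and the `temperature ** 2` factor, are assumptions about the trainer rather than facts taken from it.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE(hard label) + lambda * T^2 * KL(teacher || student)."""
    # Cross-entropy of the student against the hard label, at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - lambda_param) * ce + lambda_param * temperature ** 2 * kl


# Toy logits for a three-class problem; values are made up for illustration.
student = [2.0, 0.0, -1.0]
teacher = [3.0, 0.5, -2.0]
print(distillation_loss(student, teacher, label=0, lambda_param=0.8, temperature=4.5))
```

Under this assumed form, the search space's endpoints are the two pure cases: `lambda_param=0` trains on hard labels only and `lambda_param=1` trains on the teacher's soft targets only.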
  

Setting up the search.

In [31]:
best_trial_distill = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-embedd",
    n_trials=150
)

[I 2025-03-23 04:19:12,662] A new study created in memory with name: Distill-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2392,2.006961,0.179652,0.024221,0.020668,0.007371
2,1.8425,1.698756,0.380385,0.040339,0.07875,0.050465
3,1.6046,1.49399,0.44088,0.067901,0.100723,0.072117
4,1.4345,1.376686,0.473877,0.104446,0.117615,0.09316
5,1.3234,1.28942,0.509624,0.097179,0.133923,0.107066


[I 2025-03-23 04:19:42,129] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3802,2.145343,0.176902,0.003538,0.02,0.006012
2,2.086,2.01727,0.176902,0.003538,0.02,0.006012
3,1.9506,1.860448,0.359303,0.042002,0.070585,0.045422
4,1.8084,1.753822,0.35472,0.036939,0.07025,0.045432
5,1.7278,1.666985,0.384051,0.039128,0.079655,0.051755
6,1.6305,1.594813,0.40055,0.041341,0.084744,0.054696
7,1.5714,1.547364,0.407883,0.068124,0.087629,0.061929
8,1.5257,1.51054,0.429881,0.069388,0.095509,0.068482
9,1.4923,1.478102,0.44088,0.069847,0.099588,0.072023
10,1.4611,1.461319,0.450046,0.08716,0.103577,0.076663


[I 2025-03-23 04:20:37,359] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4411,2.361915,0.176902,0.003538,0.02,0.006012
2,2.2247,2.101008,0.176902,0.003538,0.02,0.006012
3,2.0939,2.045839,0.176902,0.003538,0.02,0.006012
4,2.0152,1.974783,0.219065,0.025944,0.031291,0.020595
5,1.945,1.888219,0.329056,0.04338,0.061039,0.042593


[I 2025-03-23 04:21:02,845] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3518,2.106797,0.176902,0.003538,0.02,0.006012
2,2.0609,1.969829,0.206233,0.016942,0.027942,0.01627
3,1.9137,1.823671,0.36022,0.04066,0.069826,0.044264
4,1.7659,1.707406,0.371219,0.037515,0.074595,0.04808
5,1.6783,1.61633,0.399633,0.050289,0.08406,0.055314
6,1.5777,1.541568,0.419798,0.068037,0.091055,0.061317
7,1.5155,1.490457,0.43538,0.070137,0.098149,0.072156
8,1.4688,1.457933,0.44363,0.0921,0.10206,0.0767
9,1.4338,1.421856,0.468378,0.089344,0.110252,0.082012
10,1.4015,1.407136,0.468378,0.099407,0.112284,0.086411


[I 2025-03-23 04:22:04,036] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0391,1.667182,0.368469,0.064026,0.076105,0.050111
2,1.4728,1.348601,0.480293,0.104142,0.119147,0.096925
3,1.2112,1.12588,0.571036,0.188467,0.175669,0.156925
4,0.9715,0.981287,0.637947,0.25611,0.23546,0.223773
5,0.8004,0.89442,0.679193,0.29435,0.287372,0.279348
6,0.6535,0.813386,0.700275,0.363738,0.320993,0.316193
7,0.5375,0.777171,0.729606,0.389982,0.360114,0.36142
8,0.4601,0.742478,0.721357,0.39396,0.365038,0.366123
9,0.3875,0.725085,0.746104,0.421643,0.397129,0.397107
10,0.3326,0.720452,0.752521,0.462141,0.440623,0.440744


[I 2025-03-23 04:23:07,381] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8755,1.454821,0.449129,0.125883,0.10781,0.090143
2,1.2569,1.17339,0.560037,0.180116,0.175078,0.154892
3,0.9336,0.87829,0.681027,0.313922,0.299134,0.289429
4,0.6464,0.789448,0.709441,0.380858,0.345239,0.344306
5,0.4671,0.691353,0.750687,0.404716,0.404267,0.397615
6,0.336,0.6679,0.752521,0.478456,0.444573,0.450278
7,0.2496,0.658564,0.756187,0.505468,0.454264,0.463439
8,0.1991,0.647931,0.75802,0.548585,0.514119,0.511188
9,0.1617,0.636361,0.768103,0.55795,0.527394,0.529483
10,0.1389,0.628145,0.771769,0.596277,0.542678,0.553756


[I 2025-03-23 04:24:40,194] Trial 5 finished with value: 0.5995307738374851 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.5995307738374851.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0187,1.659681,0.372136,0.043928,0.076615,0.050403
2,1.461,1.333927,0.490376,0.101909,0.123803,0.100935
3,1.2101,1.122935,0.575619,0.194116,0.178034,0.157605
4,0.9828,0.991101,0.636114,0.277543,0.228735,0.213876
5,0.8176,0.913212,0.667278,0.291963,0.271502,0.266633
6,0.6711,0.832393,0.698442,0.350983,0.313955,0.307451
7,0.5533,0.795283,0.714024,0.38162,0.344024,0.345981
8,0.4718,0.752437,0.733272,0.425697,0.376716,0.378597
9,0.397,0.723269,0.743355,0.467259,0.401612,0.413227
10,0.3364,0.719278,0.754354,0.497994,0.453828,0.46148


[I 2025-03-23 04:25:59,174] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8023,1.406276,0.48121,0.110184,0.125395,0.10239
2,1.2142,1.094507,0.590284,0.174672,0.199057,0.176066
3,0.8896,0.869998,0.692026,0.352052,0.314769,0.306702
4,0.624,0.772633,0.71769,0.407451,0.358375,0.364572
5,0.4318,0.689803,0.747938,0.450659,0.432865,0.430222
6,0.3116,0.665374,0.761687,0.513187,0.462713,0.472983
7,0.2283,0.673769,0.761687,0.53834,0.479398,0.491262
8,0.1793,0.646033,0.766269,0.538051,0.519083,0.516809
9,0.1477,0.645976,0.765353,0.578494,0.524933,0.536625
10,0.1263,0.641549,0.769936,0.592224,0.546547,0.554587


[I 2025-03-23 04:27:17,270] Trial 7 finished with value: 0.6351698136085316 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 7 with value: 0.6351698136085316.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3841,2.179567,0.176902,0.003538,0.02,0.006012
2,2.1043,2.043896,0.176902,0.003538,0.02,0.006012
3,1.9926,1.903975,0.353804,0.03276,0.066687,0.042641
4,1.8566,1.809246,0.344638,0.038619,0.066793,0.04365
5,1.785,1.728371,0.363886,0.037022,0.072667,0.047062
6,1.6942,1.660489,0.378552,0.039246,0.078283,0.050621
7,1.6372,1.612992,0.383135,0.040613,0.078906,0.051884
8,1.5918,1.573467,0.401467,0.041771,0.084729,0.054877
9,1.5593,1.54291,0.417965,0.067047,0.090513,0.060648
10,1.5274,1.525288,0.424381,0.064645,0.093199,0.065161


[I 2025-03-23 04:28:10,391] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0094,1.615434,0.386801,0.043979,0.082219,0.05362
2,1.4114,1.308573,0.502291,0.117481,0.132345,0.113139
3,1.1374,1.04927,0.607699,0.218242,0.201266,0.18803
4,0.8759,0.920514,0.658112,0.300254,0.263711,0.257582
5,0.6999,0.829509,0.706691,0.365404,0.320485,0.313936
6,0.5467,0.756741,0.726856,0.420575,0.368261,0.372319
7,0.4365,0.725854,0.740605,0.420319,0.385043,0.384708
8,0.3595,0.708179,0.741522,0.461372,0.418681,0.422082
9,0.2923,0.701813,0.752521,0.48487,0.459885,0.458867
10,0.25,0.685214,0.75802,0.500564,0.476826,0.476599


[I 2025-03-23 04:29:43,876] Trial 9 finished with value: 0.4988478214831171 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 7 with value: 0.6351698136085316.


Trial 10 with params: {'learning_rate': 0.004794768110099147, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6735,1.263177,0.51879,0.115801,0.154572,0.124017
2,1.0103,0.889256,0.672777,0.283122,0.297982,0.277192
3,0.6437,0.732105,0.744271,0.388424,0.392772,0.379148
4,0.4079,0.67809,0.75802,0.496275,0.470469,0.468512
5,0.2593,0.61943,0.784601,0.524893,0.521885,0.511482
6,0.1778,0.599186,0.790101,0.616238,0.57757,0.583522
7,0.1303,0.593499,0.796517,0.687715,0.636566,0.647626
8,0.105,0.584718,0.8011,0.741238,0.668243,0.683265
9,0.0918,0.578655,0.805683,0.72444,0.668868,0.680615
10,0.0855,0.572899,0.806599,0.731334,0.666299,0.680864


[I 2025-03-23 04:31:06,744] Trial 10 finished with value: 0.7004773282546286 and parameters: {'learning_rate': 0.004794768110099147, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 11 with params: {'learning_rate': 0.0024992304877613537, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8023,1.390247,0.482126,0.127781,0.130804,0.112629
2,1.1883,1.048627,0.607699,0.233169,0.211489,0.189106
3,0.8482,0.837139,0.703025,0.322553,0.329341,0.314585
4,0.5718,0.757783,0.718607,0.369734,0.365838,0.35758
5,0.3884,0.670404,0.757104,0.449375,0.429268,0.428486
6,0.2765,0.64741,0.764436,0.517112,0.480404,0.484375
7,0.2,0.631792,0.765353,0.537707,0.499324,0.500643
8,0.1599,0.624525,0.774519,0.596715,0.54907,0.556778
9,0.132,0.620293,0.776352,0.647258,0.573869,0.591569
10,0.1138,0.610912,0.773602,0.64117,0.591335,0.599597


[I 2025-03-23 04:32:30,466] Trial 11 finished with value: 0.6516149506193659 and parameters: {'learning_rate': 0.0024992304877613537, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 12 with params: {'learning_rate': 0.002582261142330005, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8117,1.404533,0.469294,0.110115,0.126439,0.102243
2,1.1739,1.049009,0.611366,0.226514,0.224146,0.209246
3,0.8142,0.821449,0.701192,0.318207,0.341157,0.321964
4,0.5484,0.726362,0.733272,0.377107,0.382986,0.370227
5,0.3724,0.651915,0.753437,0.486136,0.443432,0.449855
6,0.2642,0.633743,0.771769,0.497035,0.478351,0.474159
7,0.1905,0.629189,0.776352,0.586562,0.518409,0.529619
8,0.152,0.622722,0.774519,0.631191,0.575222,0.58501
9,0.1259,0.611024,0.780935,0.628669,0.575498,0.589191
10,0.1079,0.612646,0.780935,0.64186,0.594198,0.605574


[I 2025-03-23 04:34:20,953] Trial 12 finished with value: 0.6249656524090532 and parameters: {'learning_rate': 0.002582261142330005, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 13 with params: {'learning_rate': 0.00440198015702204, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6975,1.244689,0.530706,0.113907,0.151229,0.122744
2,1.0351,0.901364,0.671861,0.282102,0.290129,0.273514
3,0.6757,0.741492,0.731439,0.385917,0.377188,0.363121
4,0.4341,0.681526,0.743355,0.467735,0.43676,0.435962
5,0.2797,0.637688,0.775435,0.549098,0.508618,0.509084
6,0.1956,0.617082,0.780018,0.632715,0.578828,0.589213
7,0.1414,0.608062,0.791934,0.646172,0.589815,0.603863
8,0.1118,0.596262,0.802016,0.671114,0.640337,0.646984
9,0.0955,0.593676,0.807516,0.682444,0.649333,0.653084
10,0.0869,0.589725,0.809349,0.686543,0.652071,0.657463


[I 2025-03-23 04:35:42,593] Trial 13 finished with value: 0.6937974477122353 and parameters: {'learning_rate': 0.00440198015702204, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 14 with params: {'learning_rate': 0.0012737019172890734, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9643,1.547887,0.4033,0.071925,0.088166,0.062284
2,1.3521,1.271718,0.527039,0.158891,0.15133,0.133743
3,1.0779,0.975136,0.648029,0.237525,0.23986,0.224081
4,0.7969,0.865937,0.681027,0.340184,0.301836,0.300883
5,0.6089,0.762754,0.72319,0.355257,0.352859,0.343457
6,0.4603,0.718303,0.736022,0.431696,0.399514,0.398179
7,0.3524,0.693887,0.750687,0.450504,0.430878,0.429453
8,0.2812,0.685066,0.750687,0.488775,0.460653,0.463265
9,0.2305,0.679023,0.753437,0.496944,0.470097,0.472668
10,0.1976,0.666374,0.750687,0.483135,0.481043,0.470426


[I 2025-03-23 04:36:34,004] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0023694846855495865, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8027,1.397377,0.482126,0.11745,0.129901,0.107817
2,1.1969,1.064731,0.59945,0.21829,0.204879,0.186126
3,0.8635,0.870586,0.68286,0.328296,0.316267,0.30234
4,0.5913,0.760447,0.710357,0.384816,0.362322,0.35889
5,0.403,0.678678,0.751604,0.437321,0.421471,0.421138
6,0.2877,0.664996,0.76352,0.533922,0.464769,0.477835
7,0.2114,0.652731,0.75802,0.552425,0.477897,0.491285
8,0.1688,0.631706,0.768103,0.579045,0.529557,0.54013
9,0.1378,0.625385,0.776352,0.610652,0.5537,0.568539
10,0.1203,0.62965,0.775435,0.628941,0.566866,0.582612


[I 2025-03-23 04:37:52,335] Trial 15 finished with value: 0.642108737426113 and parameters: {'learning_rate': 0.0023694846855495865, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 16 with params: {'learning_rate': 0.002734346005205536, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7898,1.367379,0.493126,0.135221,0.138563,0.116659
2,1.1623,1.03332,0.612282,0.213828,0.216829,0.196021
3,0.8133,0.833888,0.702108,0.359292,0.329462,0.31682
4,0.5509,0.740276,0.731439,0.434511,0.390541,0.391986
5,0.3652,0.666775,0.759853,0.497601,0.453003,0.458825
6,0.2568,0.633166,0.773602,0.53012,0.505789,0.504738
7,0.1844,0.633193,0.761687,0.545307,0.492996,0.502454
8,0.1497,0.619732,0.783685,0.560406,0.544625,0.543044
9,0.1232,0.608126,0.789184,0.669731,0.623004,0.629699
10,0.1077,0.608777,0.787351,0.641661,0.621905,0.621239


[I 2025-03-23 04:39:06,312] Trial 16 finished with value: 0.643001812791519 and parameters: {'learning_rate': 0.002734346005205536, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 17 with params: {'learning_rate': 0.003525963832830755, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7856,1.333953,0.487626,0.104641,0.127578,0.101201
2,1.1119,0.985137,0.640697,0.25307,0.255364,0.245334
3,0.7276,0.763329,0.724106,0.34481,0.363855,0.34845
4,0.4574,0.695086,0.752521,0.460114,0.434912,0.433055
5,0.305,0.644898,0.767186,0.531863,0.497263,0.498695
6,0.2114,0.621636,0.776352,0.588061,0.535233,0.547013
7,0.1536,0.617629,0.786434,0.627919,0.55861,0.576031
8,0.1209,0.606443,0.792851,0.6324,0.595869,0.599797
9,0.1033,0.603323,0.797434,0.648227,0.629057,0.628649
10,0.0927,0.608474,0.791934,0.672892,0.641124,0.646356


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--precision/155d3220d6cd4a6553f12da68eeb3d1f97cf431206304a4bc6e2d564c29502e9 (last modified on Fri Jan 10 23:13:59 2025) since it couldn't be found locally at evaluate-metric--precision, or remotely on the Hugging Face Hub.
[I 2025-03-23 04:40:28,484] Trial 17 finished with value: 0.6498041404403969 and parameters: {'learning_rate': 0.003525963832830755, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 18 with params: {'learning_rate': 0.0046276741627411236, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7593,1.317247,0.494959,0.111852,0.13923,0.114175
2,1.0591,0.929041,0.665445,0.28123,0.283473,0.266202
3,0.6743,0.766273,0.729606,0.356271,0.372502,0.350021
4,0.4244,0.680306,0.749771,0.469461,0.444117,0.444029
5,0.2719,0.622839,0.786434,0.541365,0.511402,0.515929
6,0.1857,0.618508,0.780935,0.598261,0.563618,0.571109
7,0.1358,0.601865,0.799267,0.67953,0.642476,0.649887
8,0.1068,0.58497,0.8011,0.671071,0.643003,0.645139
9,0.094,0.580693,0.800183,0.696977,0.645767,0.658155
10,0.0868,0.585978,0.802016,0.735184,0.681888,0.695113


[I 2025-03-23 04:42:07,270] Trial 18 finished with value: 0.6672795065178876 and parameters: {'learning_rate': 0.0046276741627411236, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 19 with params: {'learning_rate': 0.0027288210583899194, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7916,1.370731,0.485793,0.132385,0.135612,0.11381
2,1.1644,1.032526,0.611366,0.215122,0.217553,0.197378
3,0.8132,0.831047,0.698442,0.338315,0.329887,0.315814
4,0.5476,0.741358,0.731439,0.406466,0.385742,0.381065
5,0.3635,0.667204,0.75527,0.485531,0.447186,0.452687
6,0.2572,0.638451,0.769936,0.542619,0.497214,0.499184
7,0.1823,0.633086,0.774519,0.557294,0.509184,0.516032
8,0.1465,0.630621,0.782768,0.584035,0.55739,0.556209
9,0.1215,0.610423,0.787351,0.646915,0.587461,0.596497
10,0.1062,0.610251,0.784601,0.647413,0.611306,0.616162


[I 2025-03-23 04:43:40,548] Trial 19 finished with value: 0.6465644163904785 and parameters: {'learning_rate': 0.0027288210583899194, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 20 with params: {'learning_rate': 0.0005440056024291415, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0927,1.81566,0.36297,0.041584,0.076065,0.050528
2,1.6019,1.463149,0.439047,0.087944,0.100279,0.075008
3,1.3472,1.270045,0.508708,0.137435,0.135852,0.111021
4,1.1648,1.126516,0.587534,0.172976,0.187335,0.167163
5,1.0223,1.042597,0.611366,0.211856,0.208915,0.186294


[I 2025-03-23 04:44:10,634] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.004570653236073519, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.754,1.31196,0.502291,0.113664,0.141055,0.116169
2,1.0582,0.929282,0.665445,0.283849,0.28484,0.268959
3,0.6752,0.762514,0.730522,0.374074,0.381424,0.362675
4,0.4163,0.6736,0.754354,0.466909,0.455602,0.451258
5,0.2698,0.621822,0.777269,0.53563,0.512495,0.514344
6,0.1854,0.602877,0.790101,0.633636,0.575047,0.585457
7,0.1342,0.597053,0.792851,0.682357,0.616723,0.633503
8,0.1087,0.582624,0.802016,0.684326,0.651359,0.656353
9,0.0951,0.575341,0.799267,0.674516,0.643799,0.648829
10,0.0881,0.578713,0.796517,0.691858,0.659748,0.663635


[I 2025-03-23 04:45:30,571] Trial 21 finished with value: 0.6927574243355825 and parameters: {'learning_rate': 0.004570653236073519, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 22 with params: {'learning_rate': 0.004100430205484654, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7976,1.3435,0.48121,0.115415,0.13294,0.109721
2,1.0875,0.939017,0.658112,0.292828,0.2758,0.265299
3,0.6904,0.737996,0.733272,0.356538,0.380287,0.36297
4,0.4324,0.677348,0.75802,0.476528,0.450849,0.445817
5,0.2818,0.638326,0.768103,0.528663,0.506289,0.505902
6,0.1961,0.629797,0.781852,0.64543,0.56585,0.583033
7,0.1451,0.621041,0.786434,0.613053,0.59012,0.58886
8,0.113,0.60436,0.79835,0.648692,0.631884,0.627717
9,0.0974,0.600181,0.804766,0.646894,0.637021,0.631289
10,0.0881,0.60316,0.792851,0.658241,0.639512,0.638289


[I 2025-03-23 04:47:26,980] Trial 22 finished with value: 0.6533162242786535 and parameters: {'learning_rate': 0.004100430205484654, 'weight_decay': 0.008, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 5.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 23 with params: {'learning_rate': 0.0038665609148511673, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.748,1.294598,0.513291,0.123573,0.143545,0.122153
2,1.0796,0.972172,0.64253,0.257505,0.269944,0.254992
3,0.7083,0.763488,0.719523,0.339377,0.363419,0.341384
4,0.4445,0.677865,0.751604,0.477531,0.43626,0.437137
5,0.2888,0.630996,0.762603,0.500383,0.479276,0.476836
6,0.2019,0.603101,0.784601,0.605597,0.555286,0.560014
7,0.1455,0.590035,0.791934,0.641338,0.601772,0.610237
8,0.118,0.578155,0.794684,0.670886,0.639614,0.645768
9,0.0999,0.55889,0.807516,0.683374,0.644147,0.653618
10,0.0903,0.562946,0.811182,0.684429,0.661945,0.662911


[I 2025-03-23 04:48:49,469] Trial 23 finished with value: 0.6701440581247562 and parameters: {'learning_rate': 0.0038665609148511673, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 24 with params: {'learning_rate': 0.004449158826250523, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6939,1.250431,0.531622,0.11612,0.151836,0.122529
2,1.0381,0.897986,0.665445,0.278674,0.28504,0.265457
3,0.676,0.738254,0.733272,0.365999,0.374829,0.356052
4,0.4354,0.657737,0.753437,0.447955,0.437371,0.431261
5,0.2781,0.637373,0.770852,0.54561,0.497392,0.501006
6,0.196,0.605738,0.793767,0.615852,0.570117,0.575751
7,0.1403,0.592371,0.792851,0.635449,0.593055,0.598135
8,0.1115,0.587208,0.796517,0.663629,0.618734,0.625924
9,0.096,0.585414,0.797434,0.707925,0.628437,0.652837
10,0.0879,0.582235,0.802016,0.703783,0.638235,0.657365


[I 2025-03-23 04:50:10,683] Trial 24 finished with value: 0.699263329186024 and parameters: {'learning_rate': 0.004449158826250523, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 25 with params: {'learning_rate': 0.004023407071721637, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.748,1.300995,0.503208,0.118177,0.139779,0.119548
2,1.0853,0.946944,0.663611,0.263298,0.279591,0.263286
3,0.7034,0.753761,0.731439,0.355777,0.374892,0.351693
4,0.4435,0.661831,0.754354,0.463281,0.446026,0.44125
5,0.2897,0.624132,0.771769,0.511139,0.491579,0.489013
6,0.2031,0.600232,0.791934,0.605145,0.577875,0.575538
7,0.1432,0.592371,0.791017,0.628088,0.594485,0.600796
8,0.1163,0.580598,0.8011,0.671319,0.642245,0.638232
9,0.0989,0.572113,0.80385,0.716444,0.677898,0.685279
10,0.0892,0.570003,0.805683,0.715269,0.674279,0.679376


[I 2025-03-23 04:51:34,919] Trial 25 finished with value: 0.6789959043686141 and parameters: {'learning_rate': 0.004023407071721637, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 26 with params: {'learning_rate': 0.003190537155437768, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.72,1.305617,0.509624,0.117471,0.143446,0.121336
2,1.0969,0.973204,0.648029,0.273472,0.268687,0.252651
3,0.7364,0.784907,0.715857,0.352364,0.349849,0.335572
4,0.4885,0.697906,0.735105,0.414857,0.412764,0.404813
5,0.3182,0.63651,0.772686,0.537883,0.475464,0.487762
6,0.222,0.613884,0.766269,0.517656,0.49916,0.496806
7,0.1598,0.613989,0.784601,0.618034,0.557884,0.570705
8,0.1271,0.596428,0.791934,0.637709,0.580828,0.590139
9,0.1065,0.593391,0.793767,0.640224,0.591896,0.602454
10,0.0951,0.590974,0.791934,0.696298,0.619688,0.638543


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Sat Oct 12 13:56:14 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
[I 2025-03-23 04:52:57,591] Trial 26 finished with value: 0.6398260275956891 and parameters: {'learning_rate': 0.003190537155437768, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 27 with params: {'learning_rate': 0.004578355474373815, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6894,1.247833,0.536205,0.116095,0.15581,0.125985
2,1.0282,0.891048,0.663611,0.276382,0.286844,0.266119
3,0.6676,0.729686,0.738772,0.383597,0.386059,0.370868
4,0.42,0.686884,0.753437,0.485095,0.453642,0.452594
5,0.2679,0.617415,0.789184,0.557795,0.531371,0.526852
6,0.1837,0.601748,0.791017,0.635103,0.577269,0.587304
7,0.1381,0.596583,0.793767,0.660336,0.619341,0.622267
8,0.1123,0.591045,0.7956,0.695459,0.641416,0.654128
9,0.0971,0.586328,0.799267,0.726748,0.66595,0.68211
10,0.0886,0.581114,0.80385,0.7437,0.685088,0.696775


[I 2025-03-23 04:54:18,474] Trial 27 finished with value: 0.6986664321670952 and parameters: {'learning_rate': 0.004578355474373815, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 28 with params: {'learning_rate': 0.004307583426714664, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6937,1.263519,0.52154,0.136668,0.148994,0.12333
2,1.0413,0.935533,0.670944,0.300422,0.297575,0.282137
3,0.6792,0.735226,0.735105,0.371748,0.380121,0.366063
4,0.4412,0.671904,0.757104,0.473878,0.442904,0.443498
5,0.2817,0.64148,0.775435,0.540343,0.514728,0.512884
6,0.1937,0.621961,0.782768,0.589821,0.561738,0.561772
7,0.137,0.610985,0.787351,0.652705,0.620475,0.62253
8,0.1107,0.599876,0.796517,0.682374,0.639544,0.648968
9,0.0961,0.599571,0.792851,0.678917,0.624165,0.634148
10,0.0877,0.59422,0.802016,0.688465,0.643637,0.653844


[I 2025-03-23 04:55:44,780] Trial 28 finished with value: 0.677714796564872 and parameters: {'learning_rate': 0.004307583426714664, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 29 with params: {'learning_rate': 0.004969081512163701, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6746,1.271964,0.517874,0.11875,0.155246,0.126313
2,1.0088,0.893116,0.670944,0.314445,0.291356,0.276763
3,0.6354,0.735429,0.736939,0.387801,0.38598,0.372193
4,0.4063,0.658482,0.762603,0.491069,0.471623,0.46944
5,0.2515,0.647702,0.774519,0.541305,0.520896,0.51937
6,0.1789,0.612618,0.790101,0.674628,0.614409,0.626478
7,0.1272,0.602073,0.799267,0.700052,0.644942,0.659036
8,0.1016,0.595599,0.8011,0.710054,0.649244,0.666542
9,0.0893,0.595006,0.794684,0.723012,0.651647,0.673097
10,0.0843,0.594599,0.79835,0.738742,0.660206,0.683641


[I 2025-03-23 04:57:10,693] Trial 29 finished with value: 0.6900634333581257 and parameters: {'learning_rate': 0.004969081512163701, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 30 with params: {'learning_rate': 0.002621228828422614, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7997,1.385171,0.490376,0.133105,0.138326,0.11986
2,1.1786,1.050342,0.605866,0.213131,0.214696,0.197074
3,0.8321,0.836071,0.704858,0.320705,0.331992,0.317767
4,0.5629,0.733396,0.726856,0.376467,0.372094,0.365346
5,0.3774,0.66922,0.754354,0.495683,0.443591,0.453003
6,0.2706,0.643191,0.765353,0.495344,0.48438,0.479646
7,0.1904,0.625406,0.774519,0.572718,0.507445,0.520742
8,0.1518,0.613055,0.780018,0.605861,0.567018,0.574743
9,0.1258,0.60669,0.778185,0.605622,0.565514,0.573198
10,0.1089,0.605543,0.784601,0.651187,0.611034,0.617079


[I 2025-03-23 04:58:05,067] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0043272379034676525, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6937,1.262847,0.52429,0.13686,0.150139,0.122066
2,1.0444,0.930241,0.662695,0.300153,0.285697,0.269046
3,0.6857,0.748806,0.728689,0.361629,0.374572,0.355403
4,0.4499,0.670686,0.75527,0.457746,0.437394,0.433777
5,0.2859,0.640432,0.776352,0.546051,0.505679,0.507448
6,0.1974,0.617012,0.781852,0.597563,0.537102,0.545798
7,0.1392,0.610009,0.791934,0.657075,0.592747,0.607186
8,0.1121,0.594353,0.80385,0.687429,0.632538,0.642012
9,0.0974,0.592547,0.805683,0.706868,0.643409,0.65629
10,0.0888,0.592096,0.802016,0.698968,0.650213,0.658645


[I 2025-03-23 05:00:07,097] Trial 31 finished with value: 0.6778138072153763 and parameters: {'learning_rate': 0.0043272379034676525, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 32 with params: {'learning_rate': 0.0006430043333997293, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0997,1.766576,0.36572,0.041189,0.076896,0.050015
2,1.5625,1.43688,0.451879,0.090338,0.106331,0.082837
3,1.3052,1.223223,0.535289,0.134757,0.151839,0.128059
4,1.0968,1.066531,0.60495,0.192669,0.195195,0.176583
5,0.9235,0.965701,0.651696,0.293206,0.24739,0.239708


[I 2025-03-23 05:00:36,389] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0036010714826518737, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7428,1.297906,0.499542,0.116825,0.138912,0.115517
2,1.0833,0.945347,0.658112,0.26417,0.276345,0.261536
3,0.709,0.752394,0.729606,0.345296,0.363125,0.346262
4,0.4515,0.662895,0.745188,0.448941,0.421849,0.423574
5,0.2864,0.619459,0.770852,0.536007,0.497196,0.50535
6,0.2072,0.586895,0.782768,0.578256,0.538402,0.541251
7,0.1478,0.591152,0.788268,0.648032,0.596923,0.608247
8,0.1186,0.578373,0.793767,0.654763,0.627055,0.626557
9,0.1002,0.565157,0.799267,0.675451,0.644325,0.648116
10,0.0904,0.566068,0.799267,0.68526,0.649083,0.654341


[I 2025-03-23 05:03:03,420] Trial 33 finished with value: 0.6655733307319109 and parameters: {'learning_rate': 0.0036010714826518737, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 34 with params: {'learning_rate': 0.004461637002173936, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6957,1.247303,0.530706,0.115315,0.150294,0.121296
2,1.0387,0.900556,0.670027,0.279249,0.288465,0.269192
3,0.6737,0.744537,0.733272,0.403071,0.382795,0.371004
4,0.4349,0.658186,0.75802,0.447902,0.431813,0.423908
5,0.2749,0.62592,0.786434,0.582829,0.526988,0.531951
6,0.193,0.601906,0.796517,0.633602,0.578821,0.588026
7,0.1358,0.591237,0.800183,0.643067,0.603794,0.613051
8,0.111,0.580104,0.806599,0.667466,0.637974,0.640403
9,0.0974,0.577395,0.805683,0.671771,0.635615,0.64197
10,0.0887,0.575862,0.806599,0.66906,0.642651,0.645717


[I 2025-03-23 05:05:14,878] Trial 34 finished with value: 0.6748640051743533 and parameters: {'learning_rate': 0.004461637002173936, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 35 with params: {'learning_rate': 5.817102176211476e-05, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.433,2.341235,0.176902,0.003538,0.02,0.006012
2,2.2005,2.093047,0.176902,0.003538,0.02,0.006012
3,2.085,2.031008,0.176902,0.003538,0.02,0.006012
4,1.9917,1.941039,0.296059,0.043169,0.051745,0.036206
5,1.9187,1.863829,0.337305,0.040523,0.063619,0.042964
6,1.8366,1.806294,0.346471,0.037004,0.067342,0.042053
7,1.785,1.763044,0.346471,0.037232,0.06718,0.042892
8,1.7458,1.725252,0.368469,0.03841,0.074433,0.048036
9,1.7145,1.695612,0.373052,0.03829,0.075678,0.048853
10,1.6846,1.677593,0.376719,0.039159,0.078595,0.050766


[I 2025-03-23 05:06:12,029] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.004081300256029174, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6964,1.265447,0.52154,0.116768,0.147333,0.122944
2,1.0588,0.96995,0.650779,0.265567,0.274009,0.254534
3,0.6993,0.749415,0.724106,0.378642,0.3626,0.352163
4,0.4533,0.679719,0.75527,0.453286,0.440953,0.436769
5,0.2909,0.644876,0.770852,0.509642,0.489329,0.488292
6,0.2058,0.612083,0.781852,0.562582,0.534981,0.529486
7,0.1476,0.604857,0.784601,0.655556,0.598722,0.610287
8,0.1159,0.581883,0.800183,0.660035,0.618243,0.625378
9,0.1007,0.579881,0.791934,0.655387,0.612,0.620323
10,0.0917,0.576998,0.797434,0.669748,0.63982,0.641372


[I 2025-03-23 05:07:30,311] Trial 36 finished with value: 0.685028245829431 and parameters: {'learning_rate': 0.004081300256029174, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 37 with params: {'learning_rate': 5.431299921217806e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4338,2.350368,0.176902,0.003538,0.02,0.006012
2,2.2156,2.099802,0.176902,0.003538,0.02,0.006012
3,2.0929,2.045092,0.176902,0.003538,0.02,0.006012
4,2.0143,1.974318,0.229148,0.029077,0.03382,0.024389
5,1.9463,1.890027,0.331806,0.044617,0.061743,0.043131
6,1.8629,1.832117,0.345555,0.039668,0.066959,0.043933
7,1.8113,1.789997,0.345555,0.038662,0.066929,0.043178
8,1.7722,1.751075,0.367553,0.038842,0.073809,0.047708
9,1.7409,1.722089,0.366636,0.0379,0.073668,0.047547
10,1.7114,1.704117,0.371219,0.03909,0.076779,0.04983


[I 2025-03-23 05:08:18,944] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2947,2.079124,0.176902,0.003538,0.02,0.006012
2,2.0002,1.887628,0.296059,0.038663,0.051837,0.029139
3,1.8177,1.721051,0.366636,0.040975,0.073453,0.047542
4,1.666,1.60251,0.401467,0.042869,0.083998,0.055655
5,1.5713,1.507827,0.44088,0.076379,0.099882,0.07367
6,1.4693,1.43843,0.457379,0.094558,0.106593,0.079988
7,1.4105,1.391469,0.476627,0.105144,0.116483,0.089914
8,1.3662,1.367099,0.47846,0.124036,0.120842,0.09817
9,1.329,1.331833,0.496792,0.113717,0.127017,0.101512
10,1.2931,1.315832,0.498625,0.13654,0.130585,0.103077


[I 2025-03-23 05:09:21,235] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.002301922512292658, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.804,1.400515,0.484876,0.114261,0.130294,0.109592
2,1.1989,1.075919,0.595784,0.21497,0.205202,0.185161
3,0.8631,0.876132,0.684693,0.342509,0.315085,0.303302
4,0.5975,0.769218,0.711274,0.36764,0.357939,0.352616
5,0.4123,0.675156,0.752521,0.479067,0.424954,0.429494


[I 2025-03-23 05:10:18,644] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.003133231742693726, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7281,1.308739,0.507791,0.115125,0.142937,0.119645
2,1.1002,0.962756,0.653529,0.277926,0.268917,0.255578
3,0.7436,0.792714,0.713107,0.343601,0.354789,0.339289
4,0.4916,0.714388,0.732356,0.402108,0.395814,0.386592
5,0.3271,0.642236,0.765353,0.491587,0.451643,0.461456
6,0.2275,0.61487,0.774519,0.540595,0.516155,0.515514
7,0.1676,0.615665,0.778185,0.607773,0.537368,0.548576
8,0.135,0.600079,0.785518,0.609458,0.5755,0.576659
9,0.1115,0.588914,0.791017,0.63772,0.581312,0.590901
10,0.0986,0.591572,0.789184,0.647003,0.598642,0.60826


[I 2025-03-23 05:12:10,758] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0021765353923214447, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8032,1.401713,0.485793,0.108948,0.128332,0.104885
2,1.2084,1.118083,0.582035,0.16889,0.197002,0.174422
3,0.8896,0.877842,0.68011,0.318312,0.302167,0.288889
4,0.626,0.774434,0.716774,0.387927,0.35309,0.353977
5,0.4326,0.700503,0.742438,0.448752,0.408782,0.410663


[I 2025-03-23 05:13:08,988] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.003310420311455098, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7464,1.302392,0.505041,0.107808,0.13867,0.111989
2,1.0961,0.98692,0.649863,0.279862,0.271965,0.257768
3,0.7226,0.769413,0.72594,0.376778,0.36407,0.347555
4,0.4581,0.668354,0.753437,0.456493,0.435664,0.431922
5,0.2921,0.621067,0.768103,0.520997,0.486557,0.487754
6,0.208,0.596095,0.782768,0.568988,0.530154,0.5362
7,0.1509,0.589579,0.785518,0.634843,0.573758,0.585836
8,0.1201,0.583609,0.791934,0.649068,0.605893,0.609078
9,0.1027,0.569694,0.802016,0.686813,0.649348,0.656403
10,0.0916,0.57112,0.8011,0.685739,0.648194,0.654232


[I 2025-03-23 05:14:37,737] Trial 42 finished with value: 0.6790623567177275 and parameters: {'learning_rate': 0.003310420311455098, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.5}. Best is trial 10 with value: 0.7004773282546286.


Trial 43 with params: {'learning_rate': 0.001667724683323363, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8464,1.453943,0.460128,0.105803,0.114539,0.09542
2,1.2616,1.149929,0.56187,0.178032,0.176725,0.157483
3,0.9555,0.90594,0.667278,0.29612,0.275988,0.26948
4,0.6862,0.807373,0.702108,0.382802,0.32509,0.327279
5,0.5066,0.705897,0.740605,0.402136,0.386346,0.382535


[I 2025-03-23 05:15:04,296] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0004014407821893915, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2063,1.915751,0.295142,0.059587,0.054374,0.043716
2,1.7334,1.567618,0.428048,0.066074,0.093492,0.063237
3,1.4632,1.37572,0.477544,0.078298,0.117335,0.090226
4,1.2996,1.259388,0.527039,0.126197,0.149568,0.127697
5,1.1704,1.143402,0.573786,0.150946,0.17685,0.15521
6,1.0359,1.08724,0.613199,0.174661,0.209901,0.184979
7,0.9431,1.02242,0.631531,0.233275,0.22375,0.204834
8,0.8688,0.988806,0.637947,0.276463,0.229269,0.222654
9,0.7982,0.952376,0.649863,0.325496,0.252856,0.25351
10,0.7332,0.924053,0.662695,0.315364,0.269245,0.265256


[I 2025-03-23 05:16:09,527] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.004229168606699789, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8095,1.348306,0.474794,0.117876,0.131453,0.110988
2,1.09,0.940948,0.651696,0.274058,0.268862,0.25937
3,0.6922,0.749698,0.734189,0.383105,0.387585,0.372066
4,0.4368,0.672064,0.76077,0.498501,0.463316,0.4642
5,0.2817,0.636159,0.765353,0.527041,0.505749,0.506099
6,0.1999,0.632684,0.782768,0.575547,0.55592,0.552181
7,0.1502,0.618128,0.785518,0.631471,0.591765,0.597854
8,0.1157,0.60297,0.793767,0.665395,0.641424,0.642961
9,0.0983,0.597067,0.79835,0.697294,0.66636,0.669941
10,0.0884,0.595907,0.79835,0.712798,0.672059,0.680576


[I 2025-03-23 05:17:34,485] Trial 45 finished with value: 0.691276104901086 and parameters: {'learning_rate': 0.004229168606699789, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 10 with value: 0.7004773282546286.


Trial 46 with params: {'learning_rate': 0.00044789989803155166, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1742,1.939689,0.318057,0.04715,0.067392,0.05093
2,1.7117,1.552579,0.429881,0.075995,0.094327,0.064913
3,1.4542,1.35331,0.48121,0.100135,0.120452,0.092851
4,1.2818,1.227194,0.538955,0.128775,0.154712,0.129432
5,1.1426,1.110398,0.590284,0.172643,0.186242,0.163333
6,0.9984,1.050256,0.617782,0.19341,0.213382,0.187861
7,0.8918,0.973964,0.646196,0.251662,0.237652,0.221827
8,0.8086,0.941653,0.652612,0.302476,0.254519,0.252191
9,0.7355,0.906223,0.67461,0.332625,0.280178,0.27975
10,0.666,0.872798,0.686526,0.361113,0.297929,0.298241


[I 2025-03-23 05:18:35,353] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.00017209337253776082, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2987,2.073207,0.176902,0.003538,0.02,0.006012
2,1.982,1.86377,0.336389,0.041747,0.062609,0.039635
3,1.7821,1.684826,0.384968,0.043332,0.079922,0.052773
4,1.6243,1.55546,0.414299,0.063339,0.088547,0.06027
5,1.5188,1.45837,0.455545,0.072803,0.105332,0.078612


[I 2025-03-23 05:19:04,497] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.004349031425050886, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7513,1.277896,0.51879,0.110785,0.145949,0.11747
2,1.0687,0.930618,0.659945,0.283997,0.27914,0.267977
3,0.6802,0.742177,0.735105,0.389738,0.385921,0.36907
4,0.4159,0.66681,0.753437,0.447858,0.43735,0.434706
5,0.2691,0.621318,0.781852,0.518253,0.516633,0.50569
6,0.1878,0.602092,0.785518,0.610625,0.57528,0.576353
7,0.1376,0.593592,0.793767,0.665032,0.625988,0.633251
8,0.1086,0.57632,0.796517,0.671923,0.645251,0.646028
9,0.0962,0.572413,0.8011,0.710015,0.660432,0.674567
10,0.088,0.575268,0.80385,0.735711,0.69382,0.704151


[I 2025-03-23 05:20:28,910] Trial 48 finished with value: 0.7110402816073218 and parameters: {'learning_rate': 0.004349031425050886, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 49 with params: {'learning_rate': 0.0028076644489777973, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8038,1.37044,0.482126,0.110082,0.129978,0.105553
2,1.1601,1.091148,0.601283,0.240459,0.238098,0.226019
3,0.8004,0.807007,0.699358,0.330399,0.338369,0.319389
4,0.5299,0.709731,0.736939,0.419895,0.404482,0.395774
5,0.3543,0.653304,0.76077,0.494556,0.45739,0.460409
6,0.2498,0.619345,0.779102,0.534861,0.50397,0.506999
7,0.1794,0.614789,0.772686,0.587652,0.529236,0.543656
8,0.1419,0.607429,0.782768,0.622746,0.569457,0.58045
9,0.1188,0.605203,0.792851,0.658107,0.61268,0.620992
10,0.1024,0.59982,0.793767,0.631857,0.599494,0.601255


[I 2025-03-23 05:21:27,898] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.004209361180122759, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7469,1.297484,0.52154,0.112266,0.147285,0.120915
2,1.0801,0.940325,0.667278,0.316702,0.285583,0.271902
3,0.6961,0.754079,0.727773,0.367161,0.374367,0.355618
4,0.4346,0.675008,0.749771,0.421463,0.430758,0.419516
5,0.2833,0.628225,0.773602,0.519778,0.494466,0.496936
6,0.1947,0.614913,0.785518,0.594191,0.562196,0.563401
7,0.1417,0.601254,0.788268,0.66991,0.617723,0.624163
8,0.1122,0.584654,0.797434,0.667126,0.635864,0.638983
9,0.0964,0.576454,0.805683,0.672955,0.645164,0.648854
10,0.0877,0.574446,0.804766,0.7287,0.674207,0.687225


[I 2025-03-23 05:22:56,541] Trial 50 finished with value: 0.6906894809594195 and parameters: {'learning_rate': 0.004209361180122759, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 51 with params: {'learning_rate': 0.00046560532599103875, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1791,1.948729,0.313474,0.04599,0.067241,0.049628
2,1.7108,1.548675,0.429881,0.068,0.094829,0.065832
3,1.4505,1.346724,0.48396,0.083069,0.123845,0.096477
4,1.2679,1.213153,0.548121,0.130092,0.159117,0.134269
5,1.129,1.099811,0.588451,0.172187,0.187095,0.163734


[I 2025-03-23 05:23:24,986] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.004875510457226452, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6663,1.282533,0.508708,0.115894,0.156248,0.126709
2,1.0032,0.876336,0.681943,0.325283,0.301091,0.284561
3,0.6446,0.733003,0.736022,0.383968,0.381909,0.371235
4,0.4089,0.670785,0.75802,0.486385,0.457512,0.456496
5,0.2614,0.636913,0.777269,0.510317,0.507225,0.498917
6,0.1802,0.605573,0.786434,0.609117,0.555679,0.564476
7,0.1311,0.599246,0.790101,0.665994,0.6117,0.620903
8,0.1073,0.589511,0.792851,0.711921,0.634748,0.653961
9,0.0932,0.584705,0.791017,0.733221,0.636981,0.665212
10,0.0854,0.58769,0.790101,0.72879,0.64513,0.671276


[I 2025-03-23 05:24:47,258] Trial 52 finished with value: 0.677087644394118 and parameters: {'learning_rate': 0.004875510457226452, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 53 with params: {'learning_rate': 0.0045799626154511995, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.861,1.35902,0.472961,0.111669,0.129593,0.103838
2,1.1042,1.001811,0.638863,0.243337,0.255707,0.239788
3,0.7002,0.743572,0.735105,0.388872,0.384222,0.372815
4,0.4312,0.679303,0.76077,0.463189,0.457825,0.449891
5,0.2722,0.63929,0.780018,0.526059,0.510045,0.508885
6,0.1916,0.619179,0.786434,0.611073,0.577483,0.583466
7,0.1335,0.605271,0.792851,0.668114,0.633251,0.638962
8,0.1094,0.594204,0.802933,0.663836,0.648021,0.644548
9,0.0953,0.591298,0.799267,0.697616,0.656936,0.660636
10,0.087,0.593353,0.802933,0.67965,0.664937,0.662087


[I 2025-03-23 05:26:42,995] Trial 53 finished with value: 0.6885210123719115 and parameters: {'learning_rate': 0.0045799626154511995, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 54 with params: {'learning_rate': 0.003637966255528044, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7183,1.291634,0.507791,0.115854,0.142463,0.119928
2,1.0846,0.982491,0.644363,0.26735,0.268969,0.250403
3,0.7284,0.792169,0.700275,0.320802,0.342324,0.319743
4,0.471,0.712393,0.728689,0.427566,0.408856,0.404003
5,0.3045,0.637168,0.767186,0.523696,0.480039,0.487906
6,0.214,0.62834,0.779102,0.576936,0.53921,0.539459
7,0.1543,0.615813,0.781852,0.587928,0.559759,0.562098
8,0.1232,0.604928,0.782768,0.616722,0.580969,0.584781
9,0.1052,0.598343,0.782768,0.668235,0.602451,0.614925
10,0.0941,0.599425,0.790101,0.708267,0.640425,0.654216


[I 2025-03-23 05:28:44,489] Trial 54 finished with value: 0.6484007792066694 and parameters: {'learning_rate': 0.003637966255528044, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 55 with params: {'learning_rate': 0.004425853229733686, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.752,1.272287,0.514207,0.107343,0.14271,0.113674
2,1.063,0.9383,0.664528,0.291733,0.283687,0.270009
3,0.6804,0.739019,0.731439,0.361403,0.373257,0.358867
4,0.4179,0.671453,0.758937,0.463128,0.440462,0.438213
5,0.2649,0.620497,0.780018,0.520658,0.511388,0.504091
6,0.1855,0.598989,0.796517,0.58122,0.574323,0.568406
7,0.1352,0.601705,0.792851,0.658408,0.62269,0.628494
8,0.1107,0.587147,0.796517,0.67556,0.649359,0.649563
9,0.0958,0.581202,0.802016,0.706582,0.657932,0.668592
10,0.087,0.581574,0.804766,0.711871,0.666727,0.674106


[I 2025-03-23 05:30:48,305] Trial 55 finished with value: 0.7052700153540644 and parameters: {'learning_rate': 0.004425853229733686, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 56 with params: {'learning_rate': 0.0001413812546509425, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3319,2.100262,0.176902,0.003538,0.02,0.006012
2,2.0278,1.918404,0.283226,0.039241,0.048617,0.027393
3,1.8505,1.752053,0.364803,0.040199,0.072355,0.046535
4,1.698,1.631967,0.392301,0.041261,0.081307,0.053601
5,1.5996,1.533185,0.431714,0.070743,0.095491,0.06793
6,1.4953,1.461129,0.450962,0.076922,0.103861,0.076998
7,1.4338,1.41179,0.464711,0.103047,0.110143,0.083595
8,1.3867,1.379355,0.474794,0.128479,0.116982,0.094254
9,1.3502,1.349793,0.491292,0.113187,0.122921,0.09801
10,1.3152,1.331763,0.499542,0.134454,0.13007,0.104013


[I 2025-03-23 05:32:17,184] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.002905749449806912, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8032,1.373108,0.479377,0.103448,0.127027,0.104322
2,1.1568,1.057719,0.611366,0.231641,0.241028,0.225831
3,0.7857,0.800209,0.716774,0.352125,0.352198,0.336445
4,0.5172,0.712438,0.745188,0.443691,0.410977,0.41208
5,0.34,0.669086,0.754354,0.488304,0.453464,0.457395
6,0.24,0.646249,0.765353,0.536796,0.507637,0.508723
7,0.1771,0.635352,0.777269,0.598144,0.542821,0.552317
8,0.1407,0.627999,0.777269,0.58953,0.571806,0.570172
9,0.1172,0.615561,0.781852,0.635749,0.582418,0.592283
10,0.101,0.612442,0.789184,0.6535,0.613195,0.621533


[I 2025-03-23 05:33:09,099] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.003917219444976533, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7046,1.265049,0.519707,0.114555,0.147483,0.122509
2,1.0641,0.947634,0.650779,0.267779,0.273678,0.253172
3,0.7014,0.751354,0.72594,0.367472,0.363779,0.350141
4,0.4475,0.689066,0.753437,0.461218,0.438094,0.435718
5,0.2909,0.644859,0.777269,0.541975,0.498016,0.501993
6,0.2035,0.614486,0.778185,0.557679,0.533795,0.53176
7,0.1502,0.59738,0.785518,0.647813,0.588476,0.597009
8,0.1182,0.583483,0.793767,0.670175,0.626191,0.630887
9,0.1008,0.575337,0.7956,0.67782,0.631745,0.637188
10,0.0915,0.577844,0.79835,0.695383,0.653094,0.656938


[I 2025-03-23 05:34:42,106] Trial 58 finished with value: 0.6922162467612832 and parameters: {'learning_rate': 0.003917219444976533, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 59 with params: {'learning_rate': 0.004390881449579889, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7492,1.27725,0.52429,0.111018,0.147774,0.118115
2,1.0638,0.932857,0.663611,0.288569,0.284207,0.270376
3,0.6761,0.74403,0.731439,0.368758,0.37532,0.359351
4,0.4217,0.664032,0.759853,0.434512,0.440265,0.430457
5,0.2699,0.619258,0.784601,0.540722,0.505517,0.506102
6,0.1893,0.59825,0.7956,0.604715,0.59211,0.585372
7,0.1378,0.592849,0.79835,0.654865,0.631566,0.633113
8,0.1092,0.584215,0.796517,0.646704,0.641079,0.631705
9,0.0965,0.574822,0.8011,0.68076,0.658883,0.660097
10,0.0889,0.576548,0.805683,0.690437,0.67231,0.670462


[I 2025-03-23 05:36:18,734] Trial 59 finished with value: 0.6865016341921395 and parameters: {'learning_rate': 0.004390881449579889, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 60 with params: {'learning_rate': 0.00046762991988506683, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1249,1.892472,0.364803,0.047248,0.080478,0.056389
2,1.6595,1.503268,0.437214,0.087784,0.097667,0.070357
3,1.3985,1.305921,0.504125,0.130626,0.131353,0.103416
4,1.2218,1.190262,0.552704,0.158004,0.157888,0.13709
5,1.0938,1.08911,0.593951,0.2001,0.190815,0.172499


[I 2025-03-23 05:36:51,092] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.003512294423914155, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7418,1.297164,0.505958,0.117037,0.143244,0.120285
2,1.0878,0.949115,0.653529,0.262338,0.270236,0.258226
3,0.7128,0.761418,0.724106,0.34125,0.355899,0.338324
4,0.4614,0.659537,0.753437,0.43712,0.426372,0.420015
5,0.2928,0.623097,0.772686,0.539964,0.500039,0.508231
6,0.209,0.599986,0.787351,0.58636,0.545799,0.55244
7,0.1512,0.588431,0.789184,0.639661,0.596581,0.603304
8,0.119,0.577298,0.799267,0.654577,0.619164,0.625818
9,0.1023,0.565689,0.802016,0.692394,0.647303,0.656514
10,0.0916,0.565566,0.802016,0.680213,0.640747,0.64684


[I 2025-03-23 05:38:24,723] Trial 61 finished with value: 0.6632763420129081 and parameters: {'learning_rate': 0.003512294423914155, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 62 with params: {'learning_rate': 0.001176258636582346, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9434,1.544281,0.407883,0.087893,0.090098,0.065142
2,1.357,1.266115,0.525206,0.133877,0.149663,0.1311
3,1.0842,0.993159,0.638863,0.233733,0.227635,0.213683
4,0.8247,0.889582,0.672777,0.320692,0.284103,0.277942
5,0.646,0.787742,0.713107,0.348718,0.335408,0.327453


[I 2025-03-23 05:38:54,733] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.004690595139428901, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.684,1.284896,0.51604,0.137551,0.153054,0.126764
2,0.9968,0.900639,0.670944,0.32051,0.297346,0.288523
3,0.6178,0.713893,0.743355,0.37343,0.394048,0.377961
4,0.3811,0.667261,0.76077,0.531333,0.489056,0.490974
5,0.2425,0.630244,0.777269,0.593248,0.527212,0.538723
6,0.1684,0.600887,0.787351,0.669023,0.601919,0.618586
7,0.1245,0.584874,0.799267,0.691497,0.626518,0.645876
8,0.0999,0.577942,0.8011,0.676749,0.637473,0.646503
9,0.0873,0.57472,0.799267,0.733569,0.653558,0.676656
10,0.0802,0.572714,0.8011,0.739619,0.660263,0.682757


[I 2025-03-23 05:40:44,635] Trial 63 finished with value: 0.6772420690940563 and parameters: {'learning_rate': 0.004690595139428901, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 64 with params: {'learning_rate': 0.004351395889525563, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6931,1.262611,0.527956,0.136789,0.152302,0.124183
2,1.045,0.925996,0.660862,0.284066,0.283471,0.264851
3,0.6843,0.743344,0.72594,0.387469,0.373078,0.359714
4,0.4406,0.668899,0.753437,0.448184,0.425136,0.422533
5,0.2805,0.63667,0.774519,0.551392,0.508126,0.515063
6,0.1939,0.607181,0.791017,0.635597,0.587712,0.596971
7,0.1362,0.604887,0.792851,0.659271,0.616681,0.625766
8,0.109,0.595532,0.793767,0.709179,0.63498,0.651681
9,0.0955,0.592299,0.799267,0.710318,0.65105,0.662671
10,0.0869,0.591611,0.79835,0.728177,0.667706,0.681356


[I 2025-03-23 05:42:04,232] Trial 64 finished with value: 0.6828084191530754 and parameters: {'learning_rate': 0.004351395889525563, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 65 with params: {'learning_rate': 0.0029471514420167755, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8057,1.37737,0.472961,0.105433,0.124945,0.103876
2,1.1583,1.035707,0.618698,0.225596,0.232275,0.214317
3,0.784,0.794245,0.71769,0.345253,0.349796,0.334955
4,0.5097,0.718005,0.742438,0.404262,0.401131,0.390743
5,0.3425,0.660122,0.757104,0.468495,0.457163,0.453479
6,0.2446,0.642653,0.774519,0.536749,0.510108,0.508309
7,0.1757,0.621709,0.782768,0.574159,0.539568,0.542048
8,0.1395,0.622863,0.784601,0.613208,0.588994,0.589855
9,0.1153,0.610088,0.781852,0.645926,0.616825,0.620515
10,0.102,0.610221,0.786434,0.655759,0.611011,0.620629


[I 2025-03-23 05:42:55,990] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004387816666803014, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7459,1.285431,0.496792,0.122901,0.139526,0.115073
2,1.0449,0.895759,0.67461,0.345985,0.292551,0.285424
3,0.6474,0.735849,0.743355,0.387882,0.406665,0.386572
4,0.4055,0.65337,0.765353,0.501827,0.479158,0.476289
5,0.248,0.621313,0.777269,0.556721,0.52344,0.523325
6,0.1733,0.596767,0.805683,0.660579,0.615132,0.62817
7,0.1263,0.608883,0.788268,0.657369,0.617331,0.626882
8,0.1005,0.581015,0.80385,0.722763,0.654757,0.675517
9,0.0879,0.579837,0.807516,0.731972,0.659787,0.682702
10,0.081,0.578046,0.808433,0.720421,0.655485,0.674962


[I 2025-03-23 05:44:11,482] Trial 66 finished with value: 0.6809580645972915 and parameters: {'learning_rate': 0.004387816666803014, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 4.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 67 with params: {'learning_rate': 0.0007558916993340009, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0533,1.691572,0.367553,0.042225,0.076723,0.048886
2,1.5006,1.382942,0.470211,0.112021,0.114581,0.092032
3,1.2409,1.174457,0.557287,0.172566,0.165036,0.143574
4,1.0188,1.009907,0.631531,0.237169,0.218381,0.202346
5,0.8466,0.933054,0.669111,0.289859,0.275667,0.269128
6,0.6985,0.837478,0.697525,0.32879,0.303712,0.294247
7,0.5803,0.797637,0.716774,0.393093,0.338547,0.342707
8,0.5016,0.758492,0.725023,0.383163,0.365229,0.36375
9,0.4263,0.731074,0.747021,0.436762,0.396102,0.39942
10,0.3658,0.7243,0.750687,0.438,0.429697,0.425123


[I 2025-03-23 05:46:20,992] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.002698755042487853, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8081,1.389212,0.474794,0.109543,0.127764,0.104252
2,1.1676,1.037995,0.609533,0.214842,0.221829,0.207148
3,0.8085,0.809723,0.710357,0.355585,0.351244,0.331273
4,0.5388,0.722866,0.730522,0.393643,0.385757,0.375145
5,0.3598,0.655752,0.761687,0.474492,0.454052,0.450178
6,0.2553,0.631794,0.770852,0.511431,0.489149,0.487642
7,0.1838,0.625746,0.769019,0.510984,0.495923,0.492821
8,0.1486,0.624175,0.784601,0.630758,0.594539,0.598879
9,0.1231,0.611448,0.782768,0.663179,0.601806,0.615692
10,0.105,0.609134,0.786434,0.66591,0.618346,0.627034


[I 2025-03-23 05:47:39,488] Trial 68 finished with value: 0.6656874639582117 and parameters: {'learning_rate': 0.002698755042487853, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4122,2.255752,0.176902,0.003538,0.02,0.006012
2,2.1337,2.06175,0.176902,0.003538,0.02,0.006012
3,2.0293,1.941733,0.306141,0.033842,0.053882,0.037718
4,1.8926,1.847364,0.327223,0.038222,0.060939,0.039729
5,1.8227,1.765334,0.356554,0.037586,0.070414,0.04607


[I 2025-03-23 05:48:03,426] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.004196947150531088, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7029,1.267608,0.533456,0.127865,0.152536,0.126027
2,1.0494,0.923855,0.662695,0.258156,0.277664,0.257759
3,0.6837,0.751568,0.718607,0.363948,0.369885,0.35414
4,0.4406,0.670613,0.75802,0.484425,0.470925,0.467261
5,0.2803,0.626422,0.782768,0.550692,0.520172,0.520741
6,0.1932,0.611796,0.790101,0.631962,0.569396,0.584078
7,0.1402,0.598403,0.791017,0.672202,0.606363,0.621604
8,0.1137,0.579205,0.80385,0.690617,0.649797,0.659224
9,0.0978,0.576579,0.802933,0.709938,0.651249,0.665111
10,0.0891,0.572809,0.807516,0.709503,0.65767,0.669784


[I 2025-03-23 05:49:53,541] Trial 70 finished with value: 0.690198548359122 and parameters: {'learning_rate': 0.004196947150531088, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 71 with params: {'learning_rate': 0.004613618118454668, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6854,1.261436,0.533456,0.11694,0.156405,0.126213
2,1.0282,0.886924,0.666361,0.284986,0.284398,0.267304
3,0.6615,0.734132,0.736022,0.400939,0.387972,0.37535
4,0.4183,0.681707,0.754354,0.470258,0.454997,0.450026
5,0.2644,0.629789,0.780018,0.58244,0.534156,0.534384
6,0.1797,0.60318,0.787351,0.636834,0.584323,0.5903
7,0.1342,0.601969,0.782768,0.672247,0.61553,0.624634
8,0.1079,0.593662,0.792851,0.708638,0.652049,0.66429
9,0.0934,0.585767,0.797434,0.71693,0.658724,0.675057
10,0.0861,0.583876,0.794684,0.740071,0.668078,0.687879


[I 2025-03-23 05:51:14,506] Trial 71 finished with value: 0.6943785779009096 and parameters: {'learning_rate': 0.004613618118454668, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 72 with params: {'learning_rate': 0.004882057336518716, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.666,1.283203,0.508708,0.117917,0.156796,0.128024
2,1.0008,0.881783,0.67736,0.301666,0.297604,0.278708
3,0.6382,0.732053,0.738772,0.385339,0.383571,0.372507
4,0.3994,0.668937,0.756187,0.479733,0.474199,0.46611
5,0.2582,0.62198,0.790101,0.585634,0.543202,0.544069
6,0.1753,0.606705,0.783685,0.611625,0.566161,0.573561
7,0.1277,0.596352,0.79835,0.675737,0.629234,0.639543
8,0.1029,0.582664,0.800183,0.715887,0.65028,0.663647
9,0.0909,0.578621,0.802933,0.721063,0.658579,0.670192
10,0.0841,0.577171,0.802016,0.762722,0.673904,0.695868


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--recall/11f90e583db35601050aed380d48e83202a896976b9608432fba9244fb447f24 (last modified on Fri Jan 10 23:14:00 2025) since it couldn't be found locally at evaluate-metric--recall, or remotely on the Hugging Face Hub.
[I 2025-03-23 05:52:48,040] Trial 72 finished with value: 0.7075281509899812 and parameters: {'learning_rate': 0.004882057336518716, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 73 with params: {'learning_rate': 0.0035070921798970225, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.741,1.294876,0.508708,0.116435,0.144809,0.121511
2,1.0845,0.945285,0.662695,0.285502,0.279844,0.269183
3,0.7091,0.764151,0.71769,0.337933,0.354017,0.336106
4,0.4609,0.662883,0.752521,0.407005,0.420349,0.40782
5,0.2962,0.628991,0.769019,0.539873,0.478478,0.49047
6,0.211,0.600307,0.781852,0.563939,0.521195,0.52651
7,0.1532,0.589047,0.789184,0.633149,0.580211,0.591367
8,0.1193,0.58338,0.794684,0.681849,0.639901,0.64751
9,0.1011,0.574457,0.799267,0.688163,0.641725,0.655031
10,0.0916,0.574823,0.799267,0.697511,0.651556,0.660137


[I 2025-03-23 05:54:28,375] Trial 73 finished with value: 0.66264424819819 and parameters: {'learning_rate': 0.0035070921798970225, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 74 with params: {'learning_rate': 0.000347802741623925, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2024,1.939161,0.272227,0.03928,0.044898,0.033699
2,1.7693,1.614007,0.409716,0.064429,0.087083,0.057647
3,1.5162,1.410361,0.473877,0.077453,0.115307,0.087474
4,1.3478,1.301748,0.504125,0.116107,0.137091,0.115142
5,1.224,1.19459,0.553621,0.137504,0.162044,0.139425


[I 2025-03-23 05:54:56,537] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.003993661603343735, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6947,1.264532,0.51879,0.11698,0.14565,0.120641
2,1.0556,0.971467,0.652612,0.292797,0.279621,0.266016
3,0.7001,0.754828,0.72594,0.358727,0.37074,0.351842
4,0.4493,0.668886,0.747021,0.459687,0.446316,0.442011
5,0.2868,0.633313,0.775435,0.506589,0.487332,0.487327
6,0.2017,0.613626,0.783685,0.589788,0.55122,0.555339
7,0.1486,0.606982,0.790101,0.669297,0.602983,0.619586
8,0.1187,0.591362,0.796517,0.668717,0.630675,0.636694
9,0.1002,0.577608,0.7956,0.69342,0.630801,0.645937
10,0.0901,0.577588,0.802016,0.69849,0.64522,0.657773


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Sat Oct 12 13:56:14 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
[I 2025-03-23 05:56:40,544] Trial 75 finished with value: 0.6602540322378974 and parameters: {'learning_rate': 0.003993661603343735, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 76 with params: {'learning_rate': 0.0030252285160228564, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7331,1.331125,0.494959,0.116548,0.137689,0.115017
2,1.1161,0.978085,0.646196,0.281152,0.259446,0.243599
3,0.7549,0.803226,0.704858,0.346824,0.344396,0.328454
4,0.506,0.708145,0.727773,0.395166,0.395831,0.382273
5,0.3331,0.639066,0.767186,0.495105,0.460671,0.466714
6,0.2367,0.623537,0.772686,0.575113,0.519098,0.522828
7,0.1674,0.608761,0.793767,0.601886,0.536208,0.549419
8,0.1337,0.604091,0.791934,0.636862,0.584933,0.596
9,0.1111,0.59432,0.792851,0.640505,0.590054,0.599454
10,0.0993,0.591814,0.7956,0.643535,0.607304,0.610658


[I 2025-03-23 05:57:32,197] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.004093951584192879, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6941,1.261529,0.52154,0.135904,0.147511,0.122447
2,1.053,0.961521,0.651696,0.282673,0.281117,0.262524
3,0.6949,0.752691,0.726856,0.346513,0.362509,0.339878
4,0.453,0.673764,0.756187,0.461824,0.447795,0.446298
5,0.29,0.634855,0.777269,0.536421,0.501346,0.500926
6,0.2042,0.614959,0.785518,0.579053,0.553305,0.54989
7,0.1494,0.60129,0.787351,0.630074,0.601133,0.601273
8,0.1167,0.589901,0.790101,0.648207,0.614404,0.615728
9,0.1012,0.583923,0.7956,0.672645,0.623176,0.631359
10,0.0908,0.580472,0.797434,0.664017,0.632991,0.635414


[I 2025-03-23 05:59:28,136] Trial 77 finished with value: 0.6774320100428306 and parameters: {'learning_rate': 0.004093951584192879, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 78 with params: {'learning_rate': 0.004118641713930016, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7551,1.30054,0.499542,0.116181,0.139197,0.118215
2,1.0857,0.938332,0.658112,0.28344,0.285529,0.269093
3,0.7083,0.774542,0.724106,0.36568,0.371686,0.348484
4,0.449,0.673442,0.747021,0.458181,0.433174,0.429571
5,0.2837,0.634342,0.762603,0.535459,0.498002,0.499447
6,0.2027,0.61303,0.791017,0.63482,0.578732,0.581834
7,0.1462,0.590275,0.793767,0.669249,0.598066,0.612844
8,0.1177,0.573269,0.802933,0.688388,0.640433,0.648427
9,0.0989,0.566061,0.804766,0.729189,0.663298,0.682792
10,0.0891,0.565376,0.802933,0.739755,0.683029,0.695733


[I 2025-03-23 06:00:47,562] Trial 78 finished with value: 0.6975307032136965 and parameters: {'learning_rate': 0.004118641713930016, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 79 with params: {'learning_rate': 0.0022050552541174246, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8162,1.410025,0.472044,0.102133,0.121896,0.100472
2,1.203,1.074903,0.587534,0.204859,0.201301,0.183103
3,0.86,0.836594,0.693859,0.340527,0.321376,0.310703
4,0.5953,0.749287,0.721357,0.367512,0.365794,0.359079
5,0.4089,0.678803,0.747021,0.461032,0.42712,0.427669


[I 2025-03-23 06:01:13,934] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.004021233574889254, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7002,1.269234,0.522456,0.116588,0.147945,0.122356
2,1.0629,0.953076,0.654445,0.268547,0.274308,0.254187
3,0.7058,0.750441,0.721357,0.369311,0.365465,0.354967
4,0.4529,0.680819,0.747938,0.43786,0.433409,0.426992
5,0.2898,0.641519,0.764436,0.491274,0.468322,0.470405
6,0.2071,0.616173,0.781852,0.593822,0.55095,0.558179
7,0.153,0.596413,0.792851,0.65415,0.602039,0.612601
8,0.1181,0.582318,0.8011,0.67813,0.634813,0.64208
9,0.1001,0.577557,0.80385,0.675613,0.635709,0.641886
10,0.0895,0.578141,0.806599,0.705331,0.650203,0.66271


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--precision/155d3220d6cd4a6553f12da68eeb3d1f97cf431206304a4bc6e2d564c29502e9 (last modified on Fri Jan 10 23:13:59 2025) since it couldn't be found locally at evaluate-metric--precision, or remotely on the Hugging Face Hub.
[I 2025-03-23 06:02:42,315] Trial 80 finished with value: 0.6570261610921893 and parameters: {'learning_rate': 0.004021233574889254, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 5.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 81 with params: {'learning_rate': 0.002584929964962992, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7995,1.385079,0.485793,0.126078,0.134652,0.115137
2,1.1805,1.047377,0.60495,0.212787,0.214668,0.19639
3,0.8385,0.828639,0.703941,0.315997,0.330253,0.314516
4,0.5642,0.741566,0.729606,0.405957,0.378926,0.378044
5,0.3819,0.666026,0.761687,0.506427,0.447328,0.454393


[I 2025-03-23 06:03:07,287] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0028412072722884454, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.802,1.373186,0.486709,0.113784,0.129663,0.106083
2,1.155,1.046158,0.613199,0.252852,0.236017,0.224443
3,0.791,0.802958,0.707608,0.339493,0.347876,0.330936
4,0.5203,0.722322,0.733272,0.422311,0.400968,0.395066
5,0.3486,0.667635,0.753437,0.460674,0.445303,0.444912
6,0.2498,0.643109,0.76352,0.553601,0.494417,0.501922
7,0.1832,0.628072,0.785518,0.617523,0.551245,0.560273
8,0.1433,0.627066,0.778185,0.630465,0.57993,0.587572
9,0.1205,0.620255,0.780935,0.650103,0.603267,0.612005
10,0.103,0.614713,0.785518,0.65159,0.611237,0.617875


[I 2025-03-23 06:03:58,984] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.004241582138534045, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6944,1.259117,0.516957,0.134703,0.144476,0.117311
2,1.0399,0.948308,0.661778,0.279006,0.289368,0.271792
3,0.6778,0.737369,0.730522,0.375904,0.37389,0.360113
4,0.4309,0.660636,0.76077,0.482588,0.466154,0.464684
5,0.271,0.621832,0.779102,0.549358,0.518266,0.518736
6,0.188,0.619022,0.776352,0.578792,0.572349,0.560542
7,0.1367,0.599763,0.787351,0.669913,0.62257,0.631423
8,0.109,0.586968,0.788268,0.656015,0.636511,0.635053
9,0.0946,0.578829,0.79835,0.689991,0.64261,0.653076
10,0.0867,0.581904,0.79835,0.720988,0.669372,0.681481


[I 2025-03-23 06:05:22,192] Trial 83 finished with value: 0.6990679989740144 and parameters: {'learning_rate': 0.004241582138534045, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 84 with params: {'learning_rate': 0.0030730068944669698, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7314,1.319555,0.499542,0.116188,0.138911,0.11612
2,1.1129,0.977397,0.651696,0.281067,0.262932,0.251242
3,0.7489,0.807513,0.702108,0.359279,0.342142,0.327996
4,0.5047,0.713769,0.738772,0.439194,0.411361,0.406449
5,0.328,0.650652,0.76077,0.500083,0.457317,0.46547
6,0.2319,0.635931,0.769936,0.544653,0.508105,0.507064
7,0.1684,0.62518,0.772686,0.60374,0.532992,0.54658
8,0.1381,0.6145,0.779102,0.601552,0.562604,0.565824
9,0.1129,0.602631,0.790101,0.646485,0.584233,0.597402
10,0.0997,0.601852,0.785518,0.667093,0.605586,0.616928


[I 2025-03-23 06:06:51,119] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.00028731625417467325, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2239,1.992364,0.185151,0.032479,0.022169,0.009677
2,1.8282,1.687532,0.385885,0.042939,0.079617,0.051276
3,1.5934,1.483288,0.44363,0.068292,0.101496,0.073016
4,1.4247,1.369462,0.474794,0.124034,0.119342,0.096357
5,1.3139,1.280041,0.508708,0.105816,0.134456,0.107319


[I 2025-03-23 06:07:24,202] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.004268748067128073, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7537,1.29736,0.509624,0.107321,0.140732,0.113427
2,1.0827,0.946323,0.64528,0.292933,0.271965,0.260188
3,0.6984,0.73662,0.727773,0.394789,0.378455,0.36721
4,0.4273,0.691082,0.754354,0.479941,0.448452,0.45009
5,0.2733,0.634414,0.775435,0.541829,0.510651,0.511847
6,0.189,0.604878,0.794684,0.652067,0.609073,0.616088
7,0.1384,0.59811,0.791017,0.659812,0.607876,0.616302
8,0.1094,0.582349,0.802016,0.691695,0.638571,0.655281
9,0.0963,0.575425,0.8011,0.709328,0.631665,0.650796
10,0.0875,0.578362,0.8011,0.709044,0.635931,0.655289


[I 2025-03-23 06:09:02,677] Trial 86 finished with value: 0.6563648055402789 and parameters: {'learning_rate': 0.004268748067128073, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 87 with params: {'learning_rate': 0.00025335316923329827, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2376,2.018377,0.178735,0.024258,0.020453,0.006943
2,1.8652,1.732006,0.366636,0.040179,0.075027,0.048036
3,1.6452,1.541695,0.428048,0.071743,0.09413,0.066267
4,1.4815,1.422158,0.467461,0.086215,0.113505,0.089649
5,1.3716,1.332612,0.494959,0.092357,0.125697,0.099437


[I 2025-03-23 06:09:27,931] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0035735650691701547, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7415,1.29729,0.505958,0.119839,0.143214,0.120002
2,1.0859,0.938508,0.657195,0.259845,0.272059,0.257461
3,0.7083,0.759817,0.726856,0.342897,0.359621,0.341243
4,0.456,0.660138,0.751604,0.44007,0.43091,0.429591
5,0.2911,0.618639,0.770852,0.55951,0.502233,0.512988
6,0.2056,0.592999,0.787351,0.594642,0.549835,0.553577
7,0.1506,0.589766,0.794684,0.679721,0.602458,0.622092
8,0.119,0.58185,0.792851,0.670091,0.625263,0.63356
9,0.1008,0.571481,0.797434,0.677058,0.643168,0.650288
10,0.0907,0.569602,0.800183,0.698413,0.663,0.667471


[I 2025-03-23 06:11:17,483] Trial 88 finished with value: 0.6688832866070271 and parameters: {'learning_rate': 0.0035735650691701547, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 5.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 89 with params: {'learning_rate': 0.0014449117481998073, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9235,1.509603,0.414299,0.113159,0.093597,0.073317
2,1.3142,1.235098,0.541705,0.144567,0.164484,0.145399
3,1.0232,0.937677,0.657195,0.274976,0.25227,0.240286
4,0.741,0.830802,0.689276,0.380392,0.318617,0.321606
5,0.5617,0.745733,0.736939,0.393233,0.372416,0.365696
6,0.4193,0.701949,0.744271,0.463305,0.419676,0.424706
7,0.3196,0.680214,0.759853,0.502339,0.445022,0.451963
8,0.2522,0.663057,0.761687,0.502993,0.474802,0.477101
9,0.2039,0.660332,0.761687,0.523452,0.487266,0.48719
10,0.1777,0.648012,0.769019,0.542013,0.506262,0.508272


[I 2025-03-23 06:12:09,129] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.004630868767980423, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6835,1.265295,0.522456,0.115833,0.154463,0.123987
2,1.0257,0.895469,0.670027,0.295674,0.289596,0.273235
3,0.6593,0.731497,0.736939,0.385334,0.385461,0.372945
4,0.4212,0.662725,0.76352,0.468356,0.472446,0.464076
5,0.264,0.623776,0.784601,0.549831,0.523789,0.517622
6,0.183,0.605042,0.791934,0.643417,0.576522,0.590536
7,0.1398,0.602775,0.799267,0.723987,0.646006,0.660642
8,0.1104,0.593629,0.797434,0.697484,0.645287,0.648663
9,0.0939,0.585222,0.800183,0.722961,0.652425,0.672847
10,0.0863,0.583186,0.805683,0.722926,0.666437,0.679303


[I 2025-03-23 06:13:30,451] Trial 90 finished with value: 0.6982281942265937 and parameters: {'learning_rate': 0.004630868767980423, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 91 with params: {'learning_rate': 0.002003405276313808, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8085,1.409405,0.47571,0.103152,0.122836,0.099711
2,1.2206,1.091149,0.592117,0.177726,0.200801,0.178524
3,0.8959,0.870474,0.681943,0.326117,0.299662,0.289843
4,0.633,0.780924,0.716774,0.379264,0.350346,0.351944
5,0.4467,0.698785,0.747938,0.451784,0.413356,0.415489


[I 2025-03-23 06:14:27,202] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0008846159350465202, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0539,1.66477,0.36572,0.049185,0.07514,0.049113
2,1.4667,1.33457,0.48396,0.100503,0.120459,0.096515
3,1.197,1.106779,0.575619,0.198289,0.177146,0.157346
4,0.9473,0.962188,0.64253,0.295046,0.243619,0.235965
5,0.7678,0.875604,0.694775,0.333759,0.304164,0.297337
6,0.6152,0.78905,0.708524,0.361637,0.341493,0.334268
7,0.5008,0.745606,0.728689,0.38794,0.367766,0.367181
8,0.4199,0.726827,0.736939,0.421099,0.392596,0.392719
9,0.3495,0.705067,0.756187,0.480744,0.427582,0.434401
10,0.2952,0.708513,0.749771,0.475668,0.458452,0.458108


[I 2025-03-23 06:15:32,867] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.004899894216064316, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6649,1.287416,0.498625,0.114815,0.151165,0.122286
2,0.9955,0.868363,0.681943,0.329493,0.299597,0.284685
3,0.6294,0.718089,0.739688,0.401736,0.391778,0.383905
4,0.395,0.654344,0.761687,0.494187,0.464467,0.465815
5,0.2499,0.616936,0.790101,0.582989,0.527712,0.532767
6,0.1685,0.611333,0.779102,0.592034,0.552951,0.559977
7,0.1257,0.585608,0.793767,0.657593,0.626597,0.627688
8,0.1001,0.577338,0.791017,0.686894,0.632372,0.644462
9,0.088,0.572978,0.8011,0.713579,0.651035,0.665114
10,0.0831,0.570481,0.794684,0.73102,0.65521,0.676016


[I 2025-03-23 06:16:56,070] Trial 93 finished with value: 0.7065615110223932 and parameters: {'learning_rate': 0.004899894216064316, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}. Best is trial 48 with value: 0.7110402816073218.


Trial 94 with params: {'learning_rate': 0.003591677491027874, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7184,1.292874,0.511457,0.116475,0.144629,0.121936
2,1.0882,0.980808,0.649863,0.279902,0.272643,0.254432
3,0.7307,0.786427,0.702108,0.321329,0.338492,0.319123
4,0.475,0.708769,0.738772,0.429203,0.417565,0.412934
5,0.3102,0.629402,0.772686,0.497425,0.478767,0.478609
6,0.2161,0.618665,0.775435,0.558972,0.522037,0.520334
7,0.1568,0.615984,0.787351,0.615615,0.561032,0.572123
8,0.1262,0.596601,0.791017,0.632269,0.585215,0.590803
9,0.1065,0.596715,0.790101,0.65721,0.599361,0.610838
10,0.0949,0.591486,0.793767,0.701862,0.653482,0.661682


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--f1/34c46321f42186df33a6260966e34a368f14868d9cc2ba47d142112e2800d233 (last modified on Fri Jan 10 23:14:01 2025) since it couldn't be found locally at evaluate-metric--f1, or remotely on the Hugging Face Hub.
[I 2025-03-23 06:18:58,655] Trial 94 finished with value: 0.6631647393442958 and parameters: {'learning_rate': 0.003591677491027874, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 95 with params: {'learning_rate': 0.004814716414721807, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6764,1.281763,0.511457,0.113459,0.151405,0.119382
2,1.0104,0.891584,0.673694,0.31833,0.300862,0.287759
3,0.6447,0.728811,0.747938,0.390503,0.39061,0.37849
4,0.4121,0.681523,0.753437,0.4726,0.450862,0.451175
5,0.2607,0.613481,0.784601,0.572582,0.532916,0.531476
6,0.1822,0.601509,0.786434,0.571135,0.560623,0.554987
7,0.1291,0.595096,0.792851,0.648295,0.616803,0.615901
8,0.106,0.586727,0.796517,0.690137,0.62886,0.641477
9,0.0937,0.578172,0.800183,0.700513,0.647771,0.656308
10,0.087,0.572737,0.802933,0.750751,0.675846,0.695301


[I 2025-03-23 06:20:58,367] Trial 95 finished with value: 0.699242166291199 and parameters: {'learning_rate': 0.004814716414721807, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 96 with params: {'learning_rate': 0.004611630299927095, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7579,1.318367,0.496792,0.111816,0.139508,0.114819
2,1.0583,0.919553,0.665445,0.284023,0.284674,0.268264
3,0.6698,0.767185,0.732356,0.35949,0.378079,0.360287
4,0.4231,0.668855,0.759853,0.457595,0.459491,0.451599
5,0.267,0.629762,0.769019,0.527919,0.48633,0.491117
6,0.1853,0.610149,0.7956,0.642534,0.61094,0.610909
7,0.1372,0.602007,0.791934,0.674063,0.642749,0.646528
8,0.1087,0.591023,0.799267,0.719856,0.673533,0.683811
9,0.0951,0.581234,0.802933,0.719893,0.670815,0.682274
10,0.0873,0.585153,0.800183,0.754279,0.697363,0.712986


[I 2025-03-23 06:22:19,080] Trial 96 finished with value: 0.7037106446410308 and parameters: {'learning_rate': 0.004611630299927095, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 97 with params: {'learning_rate': 0.002048638261871512, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8038,1.407703,0.477544,0.111123,0.124213,0.102394
2,1.2161,1.092647,0.589368,0.174802,0.198463,0.17645
3,0.8907,0.871724,0.68011,0.324822,0.292734,0.282303
4,0.6276,0.782086,0.709441,0.377892,0.345506,0.347831
5,0.4408,0.697399,0.745188,0.427336,0.405626,0.403118


[I 2025-03-23 06:22:53,746] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0045853254041206114, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6857,1.256685,0.534372,0.117465,0.156917,0.126682
2,1.0271,0.893343,0.662695,0.284454,0.281625,0.261274
3,0.6647,0.741264,0.735105,0.36099,0.383046,0.364539
4,0.4212,0.686494,0.751604,0.47021,0.446678,0.446623
5,0.2666,0.621845,0.784601,0.544431,0.525975,0.517801
6,0.1828,0.604245,0.789184,0.649481,0.601333,0.604378
7,0.1383,0.598744,0.796517,0.680575,0.635007,0.645517
8,0.1105,0.594789,0.799267,0.71364,0.656703,0.670891
9,0.0952,0.585538,0.799267,0.712418,0.655494,0.668625
10,0.0861,0.587738,0.799267,0.738317,0.675571,0.691484


[I 2025-03-23 06:24:33,802] Trial 98 finished with value: 0.6888882095115363 and parameters: {'learning_rate': 0.0045853254041206114, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 99 with params: {'learning_rate': 0.0030276512232906465, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7712,1.335976,0.505958,0.110627,0.13833,0.112144
2,1.1376,1.034984,0.616865,0.251844,0.240799,0.2289
3,0.7744,0.794429,0.707608,0.343207,0.344441,0.331666
4,0.4996,0.697559,0.741522,0.415559,0.399377,0.393035
5,0.3394,0.653208,0.75527,0.457114,0.443303,0.441917
6,0.2364,0.623813,0.775435,0.557582,0.516257,0.517664
7,0.1689,0.617085,0.786434,0.614174,0.546464,0.560962
8,0.1364,0.602734,0.789184,0.631354,0.58771,0.595303
9,0.1127,0.590558,0.800183,0.675124,0.617937,0.6328
10,0.0981,0.593231,0.8011,0.694307,0.640454,0.653372


[I 2025-03-23 06:26:37,991] Trial 99 finished with value: 0.6719781496237293 and parameters: {'learning_rate': 0.0030276512232906465, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 100 with params: {'learning_rate': 0.0035380643535597904, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8031,1.350543,0.477544,0.11402,0.127198,0.105305
2,1.114,0.976622,0.641613,0.266483,0.254862,0.247682
3,0.7288,0.768264,0.72319,0.351442,0.363519,0.346093
4,0.468,0.699891,0.740605,0.441851,0.412383,0.408781
5,0.309,0.645902,0.762603,0.513126,0.468296,0.473349
6,0.2185,0.635691,0.777269,0.60702,0.536904,0.547281
7,0.1533,0.621327,0.780018,0.64697,0.568504,0.588977
8,0.1216,0.61599,0.780935,0.617545,0.578258,0.587095
9,0.1051,0.60123,0.787351,0.669976,0.620393,0.632812
10,0.0927,0.60094,0.794684,0.727782,0.67163,0.682989


[I 2025-03-23 06:28:14,954] Trial 100 finished with value: 0.6741122997945798 and parameters: {'learning_rate': 0.0035380643535597904, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}. Best is trial 48 with value: 0.7110402816073218.


Trial 101 with params: {'learning_rate': 0.0026771444114211643, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7988,1.385842,0.486709,0.131363,0.135779,0.115309
2,1.1762,1.051308,0.606783,0.231697,0.217296,0.198781
3,0.8243,0.834276,0.700275,0.32627,0.331542,0.317308
4,0.558,0.749151,0.730522,0.410915,0.386666,0.383069
5,0.3698,0.66312,0.766269,0.482185,0.453,0.45561


[I 2025-03-23 06:28:45,906] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.0048482139769710254, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6689,1.281911,0.508708,0.113753,0.153728,0.122058
2,1.006,0.889239,0.67736,0.306298,0.291778,0.273962
3,0.637,0.723969,0.745188,0.393451,0.388491,0.376163
4,0.4061,0.650793,0.762603,0.496795,0.465112,0.469134
5,0.2541,0.615344,0.783685,0.547232,0.523567,0.520832
6,0.1724,0.602636,0.784601,0.645953,0.583767,0.593633
7,0.1258,0.595556,0.787351,0.67691,0.625185,0.629498
8,0.1021,0.588854,0.792851,0.742613,0.648738,0.67246
9,0.0889,0.579979,0.791934,0.702482,0.636536,0.648449
10,0.0828,0.577886,0.796517,0.745343,0.664412,0.68616


[I 2025-03-23 06:30:13,199] Trial 102 finished with value: 0.7136158284489457 and parameters: {'learning_rate': 0.0048482139769710254, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 102 with value: 0.7136158284489457.


Trial 103 with params: {'learning_rate': 0.0045169428491710585, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6958,1.255367,0.531622,0.114085,0.150299,0.118679
2,1.0333,0.897283,0.670027,0.300253,0.297935,0.283028
3,0.6728,0.748223,0.737855,0.389335,0.378571,0.363961
4,0.4208,0.699761,0.746104,0.445021,0.440497,0.435018
5,0.2745,0.620352,0.791934,0.553707,0.53392,0.536114
6,0.1809,0.608331,0.791017,0.616193,0.584758,0.586511
7,0.1344,0.599472,0.793767,0.640333,0.608156,0.609057
8,0.11,0.591881,0.794684,0.675254,0.642429,0.644121
9,0.0938,0.58422,0.800183,0.712128,0.660359,0.672031
10,0.087,0.580753,0.802933,0.737476,0.683333,0.696112


[I 2025-03-23 06:32:21,071] Trial 103 finished with value: 0.7012270121376564 and parameters: {'learning_rate': 0.0045169428491710585, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 6.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 104 with params: {'learning_rate': 0.0017692457376394383, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8366,1.445589,0.462878,0.101751,0.114613,0.093423
2,1.2542,1.158623,0.558203,0.17904,0.175103,0.156409
3,0.9493,0.89103,0.666361,0.304635,0.277313,0.267054
4,0.6737,0.81049,0.704858,0.390595,0.337994,0.342595
5,0.4912,0.70773,0.740605,0.403202,0.391869,0.382686
6,0.3563,0.689756,0.747021,0.423877,0.425085,0.418631
7,0.2656,0.66379,0.75802,0.5145,0.455746,0.466318
8,0.2082,0.653433,0.764436,0.53008,0.509521,0.507192
9,0.1697,0.651255,0.768103,0.582893,0.535966,0.541959
10,0.1445,0.647427,0.766269,0.586434,0.533821,0.543423


[I 2025-03-23 06:33:16,778] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.004268282121754096, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6921,1.25787,0.519707,0.135315,0.146366,0.120288
2,1.0392,0.94551,0.659028,0.291472,0.283005,0.26849
3,0.679,0.739995,0.732356,0.356234,0.37255,0.354794
4,0.4357,0.667783,0.750687,0.466703,0.443487,0.44374
5,0.2764,0.639708,0.769019,0.520229,0.498084,0.496552
6,0.1948,0.616599,0.782768,0.607374,0.561231,0.563014
7,0.1415,0.602978,0.783685,0.657521,0.606422,0.613991
8,0.1141,0.59907,0.789184,0.656938,0.62119,0.623924
9,0.0983,0.582813,0.794684,0.682099,0.636223,0.642043
10,0.088,0.583421,0.800183,0.716955,0.664199,0.67671


[I 2025-03-23 06:35:19,023] Trial 105 finished with value: 0.7052581394911107 and parameters: {'learning_rate': 0.004268282121754096, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 6.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 106 with params: {'learning_rate': 0.0015069759441065558, 'weight_decay': 0.01, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9404,1.499837,0.414299,0.070317,0.092601,0.069253
2,1.313,1.202625,0.554537,0.149606,0.168623,0.147923
3,1.0118,0.928536,0.659028,0.304782,0.264034,0.253172
4,0.7267,0.833657,0.691109,0.354977,0.323288,0.323167
5,0.5423,0.731301,0.734189,0.387259,0.381713,0.371091
6,0.4011,0.690634,0.753437,0.478435,0.438006,0.444158
7,0.2992,0.684062,0.76352,0.51094,0.457207,0.462542
8,0.2376,0.652877,0.76352,0.516967,0.492012,0.492982
9,0.1918,0.648864,0.762603,0.536388,0.496493,0.500103
10,0.1639,0.6442,0.766269,0.575393,0.526644,0.533334


[I 2025-03-23 06:36:18,158] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.0024654486096163638, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8026,1.393032,0.482126,0.134904,0.130687,0.112927
2,1.189,1.048921,0.607699,0.219515,0.212627,0.193464
3,0.851,0.844614,0.695692,0.329824,0.327318,0.31108
4,0.5765,0.762381,0.722273,0.386405,0.369768,0.364996
5,0.3967,0.669647,0.756187,0.456907,0.436137,0.433829


[I 2025-03-23 06:36:45,112] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0036813993785011186, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7445,1.305396,0.498625,0.119363,0.1399,0.117113
2,1.0848,0.933244,0.656279,0.262496,0.273064,0.259348
3,0.7068,0.759083,0.71769,0.345021,0.3543,0.338929
4,0.4523,0.667508,0.751604,0.430623,0.429424,0.424657
5,0.2874,0.626986,0.76352,0.546754,0.489382,0.497309
6,0.2016,0.600403,0.789184,0.585466,0.55942,0.559645
7,0.1478,0.584882,0.794684,0.638352,0.600247,0.608866
8,0.1153,0.578118,0.796517,0.658704,0.640545,0.634701
9,0.1,0.566407,0.8011,0.700607,0.654131,0.664982
10,0.0904,0.569195,0.796517,0.689398,0.64707,0.653245


[I 2025-03-23 06:37:38,994] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0015417291918135428, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.859,1.470587,0.450046,0.104105,0.109277,0.090309
2,1.2775,1.175713,0.55912,0.176334,0.175702,0.158236
3,0.9821,0.919453,0.660862,0.270482,0.26771,0.257588
4,0.7098,0.816157,0.696609,0.368017,0.324447,0.325592
5,0.5316,0.719588,0.736022,0.371985,0.371992,0.36484
6,0.3996,0.704043,0.749771,0.445499,0.43284,0.428969
7,0.2971,0.682943,0.747938,0.480018,0.420405,0.429182
8,0.2379,0.661784,0.766269,0.523467,0.491751,0.493594
9,0.1926,0.646108,0.764436,0.533733,0.477043,0.489026
10,0.1648,0.647671,0.764436,0.559383,0.507886,0.51637


[I 2025-03-23 06:39:19,807] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.00493376619351952, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6711,1.275579,0.511457,0.11812,0.155421,0.126706
2,1.0059,0.884115,0.68011,0.300421,0.300761,0.280308
3,0.6417,0.731078,0.744271,0.395853,0.388372,0.379943
4,0.4005,0.65102,0.770852,0.508111,0.481719,0.482831
5,0.2508,0.624125,0.777269,0.559617,0.525956,0.526141
6,0.1688,0.603064,0.786434,0.635797,0.606156,0.60976
7,0.1264,0.588084,0.792851,0.654996,0.630141,0.631483
8,0.1028,0.581849,0.797434,0.716674,0.656386,0.670311
9,0.0904,0.576045,0.799267,0.703986,0.654336,0.664714
10,0.0843,0.575489,0.800183,0.759472,0.689286,0.708708


[I 2025-03-23 06:40:50,412] Trial 110 finished with value: 0.7069579959281276 and parameters: {'learning_rate': 0.00493376619351952, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 6.0}. Best is trial 102 with value: 0.7136158284489457.


Trial 111 with params: {'learning_rate': 0.004238505821560262, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6942,1.243162,0.519707,0.136326,0.151711,0.130968
2,1.0109,0.901806,0.665445,0.308734,0.295133,0.284802
3,0.621,0.719764,0.751604,0.408093,0.407024,0.390387
4,0.3893,0.640621,0.768103,0.512859,0.483388,0.485457
5,0.2425,0.612842,0.780935,0.594868,0.527926,0.536219
6,0.1671,0.600938,0.7956,0.666509,0.612789,0.621812
7,0.1247,0.580171,0.802933,0.673993,0.629953,0.638406
8,0.0983,0.570854,0.80385,0.70688,0.631712,0.649798
9,0.0871,0.568744,0.810266,0.739198,0.65852,0.682252
10,0.0805,0.569682,0.806599,0.730231,0.653392,0.675262


[I 2025-03-23 06:42:08,642] Trial 111 finished with value: 0.7078934605473404 and parameters: {'learning_rate': 0.004238505821560262, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 112 with params: {'learning_rate': 0.0030129062109167924, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.786,1.336392,0.494959,0.110138,0.131924,0.10696
2,1.1227,0.99365,0.629698,0.284178,0.244127,0.233297
3,0.7437,0.787951,0.724106,0.354568,0.363006,0.348844
4,0.4835,0.701362,0.739688,0.433757,0.415115,0.414778
5,0.3238,0.649259,0.761687,0.505907,0.477966,0.478357
6,0.2317,0.631619,0.777269,0.526346,0.50414,0.503445
7,0.1678,0.615183,0.780935,0.585408,0.545787,0.553695
8,0.1316,0.608642,0.780935,0.593142,0.574727,0.574924
9,0.1106,0.595905,0.793767,0.646014,0.616745,0.620129
10,0.0969,0.595576,0.783685,0.648011,0.609529,0.615058


[I 2025-03-23 06:43:09,261] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00414365205219229, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6981,1.268539,0.527039,0.137491,0.150991,0.124236
2,1.0519,0.954093,0.654445,0.257032,0.279236,0.258536
3,0.6935,0.746292,0.730522,0.38386,0.376278,0.363441
4,0.4478,0.672389,0.754354,0.453124,0.440738,0.437728
5,0.2849,0.642557,0.775435,0.520375,0.496563,0.494821
6,0.1961,0.608667,0.781852,0.618145,0.565554,0.574834
7,0.1418,0.598273,0.790101,0.657353,0.615622,0.622353
8,0.1125,0.587274,0.796517,0.658246,0.621787,0.625887
9,0.0988,0.583021,0.799267,0.672081,0.639781,0.63899
10,0.0891,0.578378,0.802016,0.706405,0.658316,0.663632


[I 2025-03-23 06:44:32,279] Trial 113 finished with value: 0.6799989536345923 and parameters: {'learning_rate': 0.00414365205219229, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 114 with params: {'learning_rate': 0.002820580790543889, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8034,1.371622,0.480293,0.109505,0.129126,0.104425
2,1.1611,1.084515,0.597617,0.229519,0.230628,0.217975
3,0.8017,0.811351,0.703025,0.33476,0.343456,0.324627
4,0.5318,0.707992,0.741522,0.404234,0.395959,0.38606
5,0.3528,0.64928,0.759853,0.504334,0.452383,0.459053
6,0.2496,0.621726,0.777269,0.540919,0.4927,0.501198
7,0.1799,0.618472,0.773602,0.609067,0.527507,0.546935
8,0.1425,0.609908,0.783685,0.618173,0.582119,0.589921
9,0.1171,0.602146,0.793767,0.645003,0.594141,0.605385
10,0.1033,0.600375,0.788268,0.649712,0.606331,0.616726


[I 2025-03-23 06:45:31,440] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.004891137600008801, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6682,1.280806,0.516957,0.136855,0.15759,0.127749
2,1.0023,0.874284,0.679193,0.322363,0.29776,0.281735
3,0.6403,0.72826,0.736939,0.396288,0.390633,0.378735
4,0.4033,0.658956,0.75802,0.482857,0.472102,0.466559
5,0.2546,0.626334,0.782768,0.567221,0.527881,0.529374
6,0.1747,0.600422,0.780018,0.620977,0.56066,0.570576
7,0.1288,0.591748,0.787351,0.67382,0.615808,0.627798
8,0.1034,0.586925,0.790101,0.717253,0.647045,0.663822
9,0.0906,0.580469,0.788268,0.711946,0.657338,0.66613
10,0.0838,0.578058,0.7956,0.751788,0.665301,0.688386


[I 2025-03-23 06:46:54,281] Trial 115 finished with value: 0.6816904172159767 and parameters: {'learning_rate': 0.004891137600008801, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 6.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 116 with params: {'learning_rate': 0.0028773676876397737, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8007,1.372838,0.485793,0.100521,0.129441,0.102811
2,1.1539,1.063595,0.611366,0.232753,0.243501,0.227367
3,0.7848,0.805194,0.705775,0.330262,0.346883,0.327142
4,0.5173,0.715276,0.739688,0.420782,0.403,0.393868
5,0.3437,0.672222,0.754354,0.483861,0.450633,0.453876
6,0.242,0.646379,0.771769,0.556624,0.493829,0.503957
7,0.1799,0.633534,0.777269,0.614614,0.527239,0.541851
8,0.1413,0.637581,0.777269,0.62216,0.583497,0.588554
9,0.1174,0.630934,0.775435,0.619953,0.568782,0.57902
10,0.1016,0.623506,0.778185,0.647903,0.598341,0.603803


[I 2025-03-23 06:48:12,079] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.0017351885698858327, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8808,1.460234,0.447296,0.10293,0.106508,0.087037
2,1.2644,1.175705,0.55637,0.159685,0.17445,0.152692
3,0.9417,0.891005,0.679193,0.327903,0.292942,0.285156
4,0.6633,0.780183,0.708524,0.366315,0.33932,0.336471
5,0.4818,0.705117,0.740605,0.385845,0.389234,0.382474


[I 2025-03-23 06:48:36,987] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.003987665683413407, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7142,1.326956,0.499542,0.138447,0.141778,0.118361
2,1.0305,0.942738,0.666361,0.297709,0.286457,0.271804
3,0.6571,0.754262,0.731439,0.362724,0.393053,0.369917
4,0.4215,0.653408,0.767186,0.503198,0.466692,0.470112
5,0.2635,0.634248,0.776352,0.565112,0.520939,0.523423
6,0.1945,0.606428,0.788268,0.588099,0.574792,0.567276
7,0.1363,0.600719,0.788268,0.645337,0.613984,0.615908
8,0.109,0.596126,0.790101,0.649282,0.629522,0.62036
9,0.0958,0.586954,0.8011,0.697171,0.648918,0.657457
10,0.0863,0.583324,0.799267,0.712103,0.656406,0.668048


[I 2025-03-23 06:50:03,961] Trial 118 finished with value: 0.6810047771497374 and parameters: {'learning_rate': 0.003987665683413407, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 119 with params: {'learning_rate': 5.416926623119094e-05, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4419,2.364656,0.176902,0.003538,0.02,0.006012
2,2.229,2.102414,0.176902,0.003538,0.02,0.006012
3,2.0955,2.048342,0.176902,0.003538,0.02,0.006012
4,2.0196,1.977173,0.207149,0.023809,0.028088,0.017795
5,1.9503,1.892963,0.329973,0.044012,0.061367,0.043034
6,1.8647,1.833714,0.341888,0.0397,0.065892,0.043154
7,1.8125,1.790522,0.342805,0.037936,0.066135,0.042255
8,1.7737,1.752182,0.36297,0.038317,0.072474,0.046628
9,1.742,1.722966,0.362053,0.037162,0.072444,0.04653
10,1.7127,1.704317,0.371219,0.038816,0.076927,0.049799


[I 2025-03-23 06:51:00,837] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0033899839944756356, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7167,1.299785,0.507791,0.118465,0.14229,0.120404
2,1.0896,0.969929,0.653529,0.275075,0.270214,0.256017
3,0.7399,0.796046,0.705775,0.330504,0.340043,0.325094
4,0.4809,0.708188,0.733272,0.408229,0.400965,0.390363
5,0.3172,0.635499,0.776352,0.503885,0.492997,0.48848
6,0.2231,0.616064,0.777269,0.51322,0.517655,0.501543
7,0.1591,0.618382,0.791017,0.607203,0.568647,0.572981
8,0.1271,0.596531,0.7956,0.620155,0.584321,0.586061
9,0.1063,0.593857,0.8011,0.660412,0.622365,0.627891
10,0.0952,0.594163,0.796517,0.652872,0.619316,0.621795


[I 2025-03-23 06:51:50,056] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.004399672847514077, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.75,1.273485,0.526123,0.112034,0.14839,0.118653
2,1.0633,0.935063,0.663611,0.285426,0.284529,0.268995
3,0.6764,0.750177,0.727773,0.366251,0.372662,0.356401
4,0.4225,0.665825,0.75527,0.427823,0.441968,0.429502
5,0.2668,0.610618,0.787351,0.531962,0.51754,0.515783
6,0.1835,0.601305,0.787351,0.611234,0.596636,0.590691
7,0.1351,0.587017,0.8011,0.664924,0.633369,0.638445
8,0.1081,0.575828,0.7956,0.651843,0.633863,0.634505
9,0.0957,0.576276,0.8011,0.671827,0.645649,0.647616
10,0.0876,0.572363,0.802016,0.699207,0.66163,0.66978


[I 2025-03-23 06:53:14,541] Trial 121 finished with value: 0.6849612730597602 and parameters: {'learning_rate': 0.004399672847514077, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 102 with value: 0.7136158284489457.


Trial 122 with params: {'learning_rate': 0.00010692115265466455, 'weight_decay': 0.004, 'warmup_steps': 4, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3739,2.13129,0.176902,0.003538,0.02,0.006012
2,2.0799,2.006038,0.176902,0.003538,0.02,0.006012
3,1.9397,1.850958,0.36022,0.042808,0.071357,0.046492
4,1.7965,1.741848,0.356554,0.036472,0.07068,0.045741
5,1.7143,1.652421,0.386801,0.039034,0.080449,0.052029


[I 2025-03-23 06:53:40,379] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.004731743980712873, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.677,1.263458,0.508708,0.112563,0.149092,0.118797
2,1.0111,0.895858,0.673694,0.318923,0.298225,0.283798
3,0.6491,0.721278,0.746104,0.390929,0.391871,0.380946
4,0.4152,0.671081,0.759853,0.480832,0.473649,0.467366
5,0.2653,0.624136,0.780018,0.564345,0.511703,0.521828
6,0.1831,0.601169,0.789184,0.627581,0.569609,0.576486
7,0.1332,0.601357,0.796517,0.712348,0.619171,0.640453
8,0.1069,0.592778,0.7956,0.701113,0.641178,0.653711
9,0.0958,0.58132,0.802016,0.741219,0.653392,0.679598
10,0.0871,0.580872,0.8011,0.76239,0.676265,0.701794


[I 2025-03-23 06:55:20,763] Trial 123 finished with value: 0.6957711864148481 and parameters: {'learning_rate': 0.004731743980712873, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 7.0}. Best is trial 102 with value: 0.7136158284489457.


Trial 124 with params: {'learning_rate': 0.0038394869640170255, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7028,1.266732,0.522456,0.114635,0.147651,0.12257
2,1.0629,0.962557,0.657195,0.281746,0.280666,0.260016
3,0.7016,0.768784,0.715857,0.363792,0.360626,0.347446
4,0.4516,0.684267,0.745188,0.439128,0.439688,0.429708
5,0.2906,0.633803,0.774519,0.522143,0.487647,0.491961
6,0.2081,0.610967,0.782768,0.588951,0.552922,0.553778
7,0.1495,0.601921,0.791017,0.618425,0.566363,0.574307
8,0.1174,0.581226,0.800183,0.658754,0.629274,0.627337
9,0.1014,0.579822,0.800183,0.666031,0.631559,0.633923
10,0.0904,0.576964,0.804766,0.686643,0.652056,0.656381


[I 2025-03-23 06:56:11,240] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0031267077064537764, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7287,1.309199,0.500458,0.114928,0.13914,0.116079
2,1.1046,0.972953,0.654445,0.281393,0.268365,0.255432
3,0.7494,0.794636,0.712191,0.345274,0.353209,0.338535
4,0.4969,0.711829,0.733272,0.40492,0.401106,0.392075
5,0.3259,0.650157,0.765353,0.495001,0.4492,0.459649
6,0.2276,0.62449,0.765353,0.518773,0.496354,0.496772
7,0.1687,0.612904,0.784601,0.610431,0.548267,0.558997
8,0.1359,0.597623,0.788268,0.633684,0.58969,0.592898
9,0.111,0.589976,0.792851,0.650299,0.594644,0.601515
10,0.0992,0.586462,0.791017,0.646239,0.603497,0.608619


[I 2025-03-23 06:57:06,581] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0453,1.660847,0.370302,0.06449,0.076535,0.050881
2,1.4572,1.328791,0.489459,0.109157,0.123153,0.100875
3,1.1866,1.094554,0.581118,0.196992,0.181729,0.161772
4,0.9334,0.949141,0.647113,0.279522,0.245213,0.236151
5,0.7575,0.86621,0.692026,0.336833,0.304226,0.298154


[I 2025-03-23 06:57:34,636] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.004671748002195041, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9223,1.36441,0.47571,0.084018,0.124324,0.09718
2,1.1127,0.958497,0.651696,0.288374,0.276381,0.267575
3,0.6972,0.733408,0.731439,0.386099,0.378983,0.366534
4,0.4327,0.658306,0.769936,0.464529,0.463789,0.45233
5,0.2784,0.626823,0.784601,0.552307,0.515868,0.520122
6,0.19,0.614805,0.788268,0.570024,0.570689,0.561743
7,0.1349,0.59774,0.8011,0.653251,0.604545,0.614933
8,0.1103,0.588073,0.797434,0.672823,0.640237,0.640098
9,0.0964,0.580508,0.800183,0.689273,0.662641,0.661718
10,0.0869,0.578672,0.799267,0.76534,0.695262,0.713068


[I 2025-03-23 06:58:49,142] Trial 127 finished with value: 0.7179674682647288 and parameters: {'learning_rate': 0.004671748002195041, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 6.0}. Best is trial 127 with value: 0.7179674682647288.


Trial 128 with params: {'learning_rate': 0.0033056010482458006, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7859,1.335976,0.488543,0.108026,0.1283,0.103642
2,1.1099,0.970089,0.647113,0.26294,0.261909,0.248954
3,0.7283,0.767754,0.725023,0.355369,0.372092,0.354113
4,0.4682,0.684548,0.747938,0.458609,0.421759,0.423063
5,0.3038,0.637547,0.767186,0.524215,0.479054,0.484343
6,0.2185,0.631112,0.783685,0.56968,0.533444,0.536772
7,0.1634,0.610927,0.790101,0.622179,0.579947,0.589315
8,0.1267,0.602645,0.789184,0.591121,0.586961,0.579553
9,0.107,0.595178,0.799267,0.644267,0.628032,0.625186
10,0.0944,0.594291,0.791017,0.656679,0.626173,0.629374


[I 2025-03-23 06:59:55,077] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.002617026118659144, 'weight_decay': 0.004, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8701,1.402655,0.470211,0.108752,0.123099,0.104382
2,1.1867,1.049905,0.606783,0.235298,0.22652,0.212396
3,0.8193,0.827925,0.699358,0.335702,0.335146,0.321678
4,0.5471,0.721177,0.737855,0.397123,0.394689,0.381937
5,0.3757,0.674399,0.753437,0.492733,0.445816,0.451454


[I 2025-03-23 07:01:02,064] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.002861225832814764, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8576,1.399396,0.470211,0.106617,0.124283,0.10144
2,1.1744,1.030674,0.619615,0.255802,0.232771,0.222561
3,0.7965,0.818753,0.703025,0.335349,0.338976,0.325773
4,0.5246,0.709628,0.737855,0.411183,0.403534,0.393722
5,0.3546,0.671876,0.745188,0.46859,0.430954,0.437141


[I 2025-03-23 07:01:27,278] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0005612567161548509, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1391,1.821534,0.385885,0.041313,0.083758,0.053818
2,1.6191,1.478385,0.434464,0.070449,0.097376,0.07036
3,1.3553,1.26401,0.512374,0.141478,0.137656,0.113902
4,1.1594,1.120926,0.578368,0.169145,0.178463,0.158209
5,1.0049,1.026689,0.613199,0.19297,0.203536,0.18441
6,0.8615,0.96023,0.647113,0.266751,0.252806,0.23964
7,0.7476,0.884859,0.671861,0.330191,0.280886,0.273882
8,0.6622,0.859699,0.692026,0.381079,0.308288,0.310676
9,0.593,0.82955,0.703941,0.373567,0.327457,0.331118
10,0.5224,0.803932,0.703941,0.342934,0.330383,0.328742


[I 2025-03-23 07:03:47,067] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 8.502224922764362e-05, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3993,2.212968,0.176902,0.003538,0.02,0.006012
2,2.1143,2.050555,0.176902,0.003538,0.02,0.006012
3,2.0045,1.916226,0.347388,0.031626,0.06511,0.041145
4,1.8673,1.819939,0.341888,0.040339,0.065427,0.04264
5,1.7954,1.737896,0.36297,0.037407,0.072303,0.04697


[I 2025-03-23 07:04:31,182] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.004229744029857695, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6941,1.256981,0.51879,0.13495,0.145705,0.119534
2,1.0412,0.944069,0.667278,0.286348,0.295496,0.277482
3,0.6823,0.729275,0.733272,0.36492,0.372545,0.358676
4,0.4367,0.666411,0.754354,0.460693,0.451642,0.445177
5,0.2757,0.62333,0.777269,0.525094,0.51135,0.507573
6,0.1903,0.601816,0.790101,0.622026,0.578356,0.584178
7,0.1355,0.59664,0.791017,0.679707,0.622653,0.63242
8,0.1084,0.585642,0.794684,0.667297,0.629391,0.636451
9,0.0956,0.578734,0.797434,0.688513,0.646139,0.654054
10,0.0877,0.575392,0.800183,0.704431,0.656539,0.666267


[I 2025-03-23 07:05:55,756] Trial 133 finished with value: 0.6941481825263685 and parameters: {'learning_rate': 0.004229744029857695, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 127 with value: 0.7179674682647288.


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4259,2.312083,0.176902,0.003538,0.02,0.006012
2,2.1708,2.079444,0.176902,0.003538,0.02,0.006012
3,2.067,1.998495,0.191567,0.03806,0.023812,0.012622
4,1.9505,1.899827,0.31439,0.038547,0.056896,0.037853
5,1.8813,1.827998,0.342805,0.039377,0.065423,0.043088


[I 2025-03-23 07:06:20,332] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.004990254581484993, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.683,1.279088,0.51604,0.148428,0.153199,0.128378
2,0.9839,0.882939,0.676444,0.318628,0.30398,0.293795
3,0.6134,0.707876,0.742438,0.414154,0.39922,0.393625
4,0.3777,0.643333,0.764436,0.494896,0.483356,0.482403
5,0.239,0.623499,0.780935,0.560354,0.528672,0.534158
6,0.1637,0.617324,0.785518,0.653844,0.602908,0.615063
7,0.1191,0.604583,0.785518,0.666359,0.605527,0.620585
8,0.0991,0.594032,0.792851,0.685803,0.631977,0.643813
9,0.0874,0.591066,0.792851,0.694327,0.629122,0.64359
10,0.0803,0.587121,0.791934,0.69068,0.627654,0.642332


[I 2025-03-23 07:07:53,071] Trial 135 finished with value: 0.6670795154438508 and parameters: {'learning_rate': 0.004990254581484993, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 127 with value: 0.7179674682647288.


Trial 136 with params: {'learning_rate': 0.0033531539562348615, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7186,1.3002,0.506874,0.114729,0.142177,0.118371
2,1.0903,0.965494,0.648946,0.272237,0.265886,0.250988
3,0.7365,0.790719,0.710357,0.334726,0.34614,0.329869
4,0.478,0.711802,0.736022,0.432961,0.414231,0.409391
5,0.3102,0.624559,0.771769,0.495941,0.474163,0.476223
6,0.2192,0.621233,0.775435,0.542026,0.525295,0.52039
7,0.1578,0.61497,0.787351,0.63061,0.570018,0.583623
8,0.1253,0.604491,0.786434,0.622908,0.578155,0.583876
9,0.1069,0.599268,0.793767,0.647779,0.587695,0.598979
10,0.0963,0.596106,0.796517,0.67734,0.646499,0.650799


[I 2025-03-23 07:09:22,993] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.004793165986580557, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6733,1.25946,0.517874,0.115767,0.154772,0.124358
2,1.0057,0.892708,0.672777,0.283309,0.297406,0.278799
3,0.6432,0.724594,0.739688,0.375308,0.381991,0.369872
4,0.4052,0.666773,0.754354,0.481709,0.463721,0.463939
5,0.2596,0.615581,0.791017,0.549838,0.529843,0.526793
6,0.1773,0.590593,0.79835,0.654395,0.600641,0.607083
7,0.1287,0.602738,0.79835,0.689041,0.625839,0.639257
8,0.1037,0.581241,0.806599,0.744066,0.676414,0.691678
9,0.0911,0.579924,0.80385,0.726885,0.666949,0.68067
10,0.0846,0.574758,0.806599,0.779334,0.688714,0.710152


[I 2025-03-23 07:10:41,911] Trial 137 finished with value: 0.7165932751518682 and parameters: {'learning_rate': 0.004793165986580557, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}. Best is trial 127 with value: 0.7179674682647288.


Trial 138 with params: {'learning_rate': 0.0028198700542429625, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7596,1.337309,0.498625,0.121911,0.136068,0.113013
2,1.1286,0.993586,0.632447,0.257579,0.232101,0.21268
3,0.7887,0.814028,0.710357,0.350693,0.342246,0.331519
4,0.5302,0.730202,0.734189,0.427234,0.401482,0.398688
5,0.3508,0.675769,0.756187,0.498818,0.448631,0.460235
6,0.2613,0.633933,0.768103,0.519276,0.492801,0.489662
7,0.1831,0.622617,0.772686,0.55366,0.504856,0.509914
8,0.1481,0.618816,0.783685,0.595962,0.564406,0.567929
9,0.123,0.6104,0.783685,0.632473,0.598794,0.605399
10,0.107,0.602456,0.792851,0.666227,0.633842,0.640505


[I 2025-03-23 07:12:09,503] Trial 138 finished with value: 0.6545168218779771 and parameters: {'learning_rate': 0.0028198700542429625, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}. Best is trial 127 with value: 0.7179674682647288.


Trial 139 with params: {'learning_rate': 0.002900602557532512, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8025,1.374484,0.48121,0.102056,0.127725,0.10332
2,1.1555,1.057134,0.608616,0.229466,0.236754,0.22265
3,0.7867,0.806224,0.710357,0.339027,0.348801,0.33057
4,0.5203,0.713838,0.746104,0.452596,0.411551,0.410897
5,0.3432,0.666716,0.757104,0.465214,0.457518,0.451359


[I 2025-03-23 07:12:34,445] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.004253096471193693, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.753,1.296353,0.509624,0.107526,0.140449,0.113731
2,1.0812,0.949974,0.648029,0.291849,0.272676,0.261877
3,0.6969,0.74335,0.732356,0.392425,0.384864,0.368953
4,0.4308,0.676339,0.758937,0.475934,0.452321,0.450774
5,0.277,0.634485,0.773602,0.536127,0.511833,0.514395
6,0.1894,0.609393,0.783685,0.581233,0.562604,0.557779
7,0.14,0.594156,0.793767,0.673379,0.612476,0.6273
8,0.1163,0.588336,0.79835,0.666955,0.625011,0.632383
9,0.0989,0.573934,0.802016,0.664539,0.628359,0.635378
10,0.0887,0.5764,0.8011,0.690882,0.633808,0.648097


[I 2025-03-23 07:14:47,982] Trial 140 finished with value: 0.6726561033288394 and parameters: {'learning_rate': 0.004253096471193693, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 7.0}. Best is trial 127 with value: 0.7179674682647288.


Trial 141 with params: {'learning_rate': 0.004999442141173464, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6731,1.256709,0.525206,0.118171,0.158455,0.127609
2,1.0033,0.892052,0.670944,0.290482,0.28881,0.267587
3,0.6376,0.729604,0.743355,0.413917,0.395707,0.386587
4,0.3983,0.658636,0.75802,0.49882,0.443783,0.450411
5,0.2553,0.629242,0.769936,0.527838,0.500784,0.50155
6,0.1774,0.612054,0.788268,0.617954,0.572794,0.577129
7,0.1358,0.593952,0.802016,0.700008,0.643558,0.657036
8,0.1062,0.585102,0.802933,0.704255,0.654683,0.666428
9,0.0925,0.579049,0.802016,0.743063,0.670126,0.689606
10,0.0843,0.579338,0.802016,0.762997,0.677031,0.703399


[I 2025-03-23 07:16:17,270] Trial 141 finished with value: 0.7244345195390912 and parameters: {'learning_rate': 0.004999442141173464, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}. Best is trial 141 with value: 0.7244345195390912.


Trial 142 with params: {'learning_rate': 0.00295175099052809, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7437,1.320963,0.509624,0.127393,0.141731,0.119072
2,1.1229,1.013536,0.628781,0.290857,0.243197,0.233603
3,0.7677,0.80775,0.714024,0.339867,0.347853,0.330678
4,0.5157,0.730202,0.733272,0.375153,0.391591,0.375359
5,0.3424,0.656507,0.762603,0.487757,0.452963,0.456523
6,0.2419,0.627349,0.775435,0.520514,0.500664,0.494732
7,0.1738,0.615302,0.785518,0.596251,0.539362,0.552131
8,0.1372,0.614998,0.787351,0.64662,0.583855,0.595148
9,0.117,0.60304,0.787351,0.63949,0.582644,0.596792
10,0.1023,0.6034,0.7956,0.65211,0.630837,0.632932


[I 2025-03-23 07:17:16,677] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0035718780603691954, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.787,1.328075,0.493126,0.106806,0.130663,0.104127
2,1.1074,0.969889,0.63978,0.2559,0.255908,0.247122
3,0.7196,0.757278,0.72594,0.345671,0.370343,0.353346
4,0.4498,0.69308,0.747938,0.464626,0.439935,0.43861
5,0.2998,0.642664,0.766269,0.526043,0.489867,0.494245
6,0.2071,0.62407,0.780018,0.613966,0.554181,0.56675
7,0.1518,0.618162,0.784601,0.63983,0.568467,0.587394
8,0.1188,0.59613,0.793767,0.657826,0.634085,0.633797
9,0.1009,0.5993,0.7956,0.669363,0.627326,0.637154
10,0.0914,0.597938,0.793767,0.682006,0.644386,0.652678


[I 2025-03-23 07:18:11,139] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 5.8193477735771966e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4329,2.341201,0.176902,0.003538,0.02,0.006012
2,2.2005,2.092989,0.176902,0.003538,0.02,0.006012
3,2.0849,2.030647,0.176902,0.003538,0.02,0.006012
4,1.9914,1.940628,0.296975,0.043222,0.052019,0.036362
5,1.9184,1.863606,0.335472,0.04031,0.063189,0.042644
6,1.8364,1.806154,0.347388,0.037064,0.06758,0.04217
7,1.7848,1.762957,0.346471,0.037223,0.06718,0.042897
8,1.7458,1.72517,0.369386,0.038652,0.074796,0.048384
9,1.7146,1.69574,0.372136,0.038183,0.075463,0.048705
10,1.6849,1.677736,0.375802,0.039183,0.07838,0.050649


[I 2025-03-23 07:19:19,477] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.004868214053540535, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7707,1.313534,0.498625,0.112051,0.140406,0.114266
2,1.0532,0.926946,0.664528,0.301594,0.285162,0.270573
3,0.6716,0.760797,0.72594,0.377182,0.368421,0.349853
4,0.4198,0.672866,0.758937,0.478944,0.462406,0.457812
5,0.2628,0.651199,0.775435,0.52009,0.513488,0.505982
6,0.1844,0.622223,0.782768,0.645655,0.573216,0.593123
7,0.1364,0.617788,0.791017,0.664991,0.633476,0.638669
8,0.1079,0.605983,0.794684,0.688518,0.643726,0.653878
9,0.0949,0.591149,0.797434,0.685609,0.650427,0.657925
10,0.0867,0.593504,0.800183,0.723373,0.663435,0.679647


[I 2025-03-23 07:20:42,358] Trial 145 finished with value: 0.6951664850090813 and parameters: {'learning_rate': 0.004868214053540535, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 141 with value: 0.7244345195390912.


Trial 146 with params: {'learning_rate': 0.004502116779253223, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6961,1.255037,0.52429,0.112177,0.147005,0.116577
2,1.0339,0.905115,0.673694,0.282255,0.296316,0.278094
3,0.6802,0.753002,0.728689,0.381432,0.373923,0.359901
4,0.4393,0.697512,0.750687,0.457837,0.435177,0.438472
5,0.2855,0.639004,0.783685,0.561018,0.519922,0.522679
6,0.1934,0.607043,0.785518,0.597986,0.574563,0.574671
7,0.1457,0.606007,0.789184,0.637893,0.614793,0.617604
8,0.1154,0.598234,0.792851,0.676111,0.642084,0.64724
9,0.0986,0.589375,0.792851,0.712271,0.652969,0.668257
10,0.0892,0.588185,0.7956,0.720413,0.651843,0.671271


[I 2025-03-23 07:22:52,916] Trial 146 finished with value: 0.6912458650914217 and parameters: {'learning_rate': 0.004502116779253223, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 141 with value: 0.7244345195390912.


Trial 147 with params: {'learning_rate': 0.003541938066688561, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7171,1.29944,0.505041,0.116183,0.140987,0.117288
2,1.0906,0.984746,0.648946,0.274544,0.270584,0.254875
3,0.7413,0.789245,0.709441,0.33218,0.34221,0.328244
4,0.4781,0.718752,0.729606,0.399304,0.402768,0.390716
5,0.3178,0.623341,0.770852,0.507084,0.473176,0.474549
6,0.2217,0.615622,0.782768,0.590438,0.539267,0.543452
7,0.1576,0.605235,0.782768,0.610548,0.554098,0.562661
8,0.1266,0.597255,0.789184,0.62791,0.586519,0.592488
9,0.1082,0.589453,0.791017,0.63493,0.59779,0.599816
10,0.0945,0.59126,0.796517,0.660472,0.632599,0.628708


[I 2025-03-23 07:23:48,224] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.003911768052522711, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7492,1.296109,0.509624,0.119337,0.141604,0.120638
2,1.0847,0.954217,0.653529,0.258881,0.271103,0.256962
3,0.7106,0.748866,0.727773,0.346624,0.36853,0.349308
4,0.4419,0.676705,0.752521,0.486292,0.442425,0.44709
5,0.287,0.615687,0.779102,0.518675,0.503303,0.499933
6,0.1974,0.603129,0.777269,0.591114,0.553096,0.553294
7,0.1442,0.58463,0.79835,0.668887,0.601725,0.613287
8,0.1164,0.577741,0.793767,0.654347,0.62702,0.626377
9,0.0988,0.559081,0.811182,0.717225,0.655027,0.667206
10,0.0893,0.555744,0.810266,0.713223,0.660072,0.666599


[I 2025-03-23 07:25:23,734] Trial 148 finished with value: 0.6914117762871147 and parameters: {'learning_rate': 0.003911768052522711, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 7.0}. Best is trial 141 with value: 0.7244345195390912.


Trial 149 with params: {'learning_rate': 0.0030843245155453645, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7296,1.31312,0.501375,0.118281,0.140116,0.117797
2,1.106,0.979593,0.647113,0.277149,0.264993,0.250147
3,0.7482,0.806822,0.706691,0.337754,0.348813,0.32888
4,0.4992,0.704607,0.732356,0.398933,0.393113,0.382659
5,0.3256,0.633707,0.766269,0.492365,0.463112,0.467973
6,0.2292,0.625932,0.770852,0.529966,0.503541,0.503412
7,0.1666,0.619782,0.780018,0.584847,0.536926,0.545888
8,0.1372,0.612348,0.782768,0.629641,0.56624,0.577316
9,0.1128,0.603116,0.790101,0.655179,0.590251,0.603692
10,0.1005,0.602743,0.785518,0.653416,0.585631,0.599917


[I 2025-03-23 07:26:19,501] Trial 149 pruned. 


In [32]:
print(best_trial_distill)

BestRun(run_id='141', objective=0.7244345195390912, hyperparameters={'learning_rate': 0.004999442141173464, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 6.5}, run_summary=None)
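The reported best run can be cross-checked against the study log. The following is a minimal sketch that only relies on the shape of the log lines shown above; the `summarize` helper and the `LOG` sample are illustrative names, not part of the notebook, and the parameter dictionaries are abbreviated to `{...}` for brevity.

```python
import re

# Sample lines copied from the study log above (parameter dicts abbreviated).
LOG = """\
[I 2025-03-23 07:16:17,270] Trial 141 finished with value: 0.7244345195390912 and parameters: {...}. Best is trial 141 with value: 0.7244345195390912.
[I 2025-03-23 07:17:16,677] Trial 142 pruned.
[I 2025-03-23 07:20:42,358] Trial 145 finished with value: 0.6951664850090813 and parameters: {...}. Best is trial 141 with value: 0.7244345195390912.
"""

FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned")


def summarize(log_text):
    """Return (finished trials with objective, pruned trial numbers, best trial number)."""
    finished = {int(n): float(v) for n, v in FINISHED.findall(log_text)}
    pruned = [int(n) for n in PRUNED.findall(log_text)]
    # The study maximizes the objective, so the best trial has the largest value.
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


finished, pruned, best = summarize(LOG)
print(f"finished={len(finished)} pruned={len(pruned)} best={best}")
```

Run over the full captured output, the same helper would also give the share of pruned trials, which is a quick indicator of how aggressive the pruner was.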


Recalculating the step counts to account for the change in dataset size.

In [33]:
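data_length = len(all_train_data)
# Pruner resources are expressed in training steps: 5 epochs at minimum, num_epochs at maximum.
min_r = math.ceil(data_length / batch_size) * 5
max_r = math.ceil(data_length / batch_size) * num_epochs
# Upper bound for warmup_steps: roughly one tenth of an epoch.
warm_up = math.ceil(data_length / batch_size / 10)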
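As a worked example of the arithmetic above, the sketch below wraps the same three formulas in a function and evaluates them for made-up numbers. The dataset size, batch size, and epoch count are hypothetical and chosen only for illustration; the notebook's actual values are defined elsewhere.

```python
import math


def search_budget(data_length, batch_size, num_epochs, min_epochs=5):
    """Mirror the notebook's formulas for min_r, max_r and warm_up."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    min_r = steps_per_epoch * min_epochs
    max_r = steps_per_epoch * num_epochs
    warm_up = math.ceil(data_length / batch_size / 10)
    return min_r, max_r, warm_up


# Hypothetical values: 10,000 training examples, batch size 32, 10 epochs.
min_r, max_r, warm_up = search_budget(10_000, 32, 10)
print(min_r, max_r, warm_up)
```

With these numbers an epoch takes 313 steps, so the earliest pruning point falls at step 1565 and the full budget at step 3130. A larger (augmented) dataset raises all three values proportionally, which is why they are recomputed here.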

In [34]:
base.reset_seed()

## Search with normal training on the augmented dataset
Configuration of the individual training runs.

In [35]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/bilstm-base-embedd-aug_fine_hp-search", logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd-aug_fine_hp-search", epochs=num_epochs, batch_size=batch_size)

Definition of the searched hyperparameters and their ranges.

In [36]:
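def hp_space(trial):
    params = {
        # Log-uniform range spanning two orders of magnitude.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Discrete grid from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Integer number of warmup steps, bounded by warm_up.
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params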
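To illustrate what the two kinds of ranges mean, here is a small standard-library sketch. It uses plain random draws as a stand-in and is not Optuna's TPE sampler; `sample_log_uniform` and `sample_stepped` are hypothetical helpers written for this example.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw uniformly in log space, so each decade is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_stepped(rng, low, high, step):
    """Draw from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(42)
lrs = [sample_log_uniform(rng, 5e-5, 5e-3) for _ in range(1000)]
wds = [sample_stepped(rng, 0.0, 1e-2, 1e-3) for _ in range(1000)]

# About half of the learning rates fall below the geometric midpoint 5e-4.
share_low = sum(lr < 5e-4 for lr in lrs) / len(lrs)
print(f"share of learning rates below 5e-4: {share_low:.2f}")
print(f"distinct weight_decay values: {len(set(wds))}")
```

Building grid values as a multiple of the step is one plausible source of entries such as `0.009000000000000001` in the trial parameters: the product of an integer and `0.001` is not always exactly representable in floating point.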

Optuna configuration.

In [37]:
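# Hyperband pruning between min_r and max_r training steps; the TPE sampler is seeded for reproducibility.
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)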
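The effect of `min_resource`, `max_resource`, and `reduction_factor` can be sketched without Optuna. The function below is a simplified reading of Hyperband, not Optuna's implementation: it assumes that bracket `k` starts checking at `min_resource * reduction_factor**k` and that each later rung multiplies the resource by `reduction_factor`. The step count per epoch is hypothetical.

```python
def hyperband_rungs(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the resource levels at which a trial may be pruned."""
    # Count brackets: one for every k with min_resource * eta**k <= max_resource.
    n_brackets = 1
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        rungs = []
        level = min_resource * reduction_factor ** bracket_id
        while level <= max_resource:
            rungs.append(level)
            level *= reduction_factor
        brackets.append(rungs)
    return brackets


steps_per_epoch = 313  # hypothetical value, for illustration only
print(hyperband_rungs(5 * steps_per_epoch, 10 * steps_per_epoch, 2))
```

Under these assumptions, a minimum of 5 epochs, a maximum of 10 epochs, and a reduction factor of 2 give two brackets with pruning points after 5 and 10 epochs. That would be consistent with the log, where pruned trials stop after either 5 or 10 epochs.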



Configuration of the trainer for the individual training runs.

In [38]:
trainer = Trainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    # model_init is called at the start of every trial, so each trial gets a fresh model.
    model_init=lambda: get_BiLSTM(),
)

Search setup.

In [39]:
best_trial_normal_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-aug-embedd",
    n_trials=150
)

[I 2025-03-23 07:26:19,829] A new study created in memory with name: Base-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 39}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6969,1.183622,0.701192,0.41682,0.357296,0.360198
2,0.4535,1.065017,0.759853,0.578596,0.528995,0.536193
3,0.1469,1.177877,0.764436,0.605606,0.577759,0.578389
4,0.062,1.296311,0.773602,0.696767,0.629443,0.644098
5,0.0326,1.44001,0.776352,0.661277,0.621002,0.628949


[I 2025-03-23 07:27:51,676] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9836,1.056263,0.757104,0.593924,0.569768,0.564541
2,0.0857,1.353773,0.777269,0.681933,0.646921,0.647852
3,0.0238,1.620157,0.773602,0.682327,0.66097,0.655332
4,0.0146,1.48164,0.793767,0.68681,0.702357,0.67998
5,0.0073,1.678156,0.799267,0.701752,0.698028,0.692367
6,0.0037,1.590192,0.802933,0.753122,0.66926,0.69342
7,0.0042,1.562722,0.802016,0.729795,0.684905,0.691895
8,0.002,1.692148,0.80385,0.765414,0.702517,0.719576
9,0.0026,1.705157,0.8011,0.738288,0.717158,0.714316
10,0.001,1.799118,0.805683,0.739525,0.70639,0.712403


[I 2025-03-23 07:31:10,645] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 32}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.589,2.102561,0.469294,0.092682,0.116674,0.092409
2,1.705,1.686185,0.577452,0.189233,0.192232,0.176318
3,1.2996,1.460604,0.631531,0.277801,0.249748,0.240915
4,1.0137,1.318764,0.656279,0.285971,0.28722,0.280754
5,0.8008,1.245366,0.681027,0.376707,0.349828,0.3495


[I 2025-03-23 07:32:41,922] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8587,1.079097,0.792851,0.68792,0.642017,0.646327
2,0.042,1.209382,0.802016,0.780962,0.697801,0.712671
3,0.013,1.379915,0.814849,0.756898,0.731731,0.729849
4,0.0082,1.346084,0.826764,0.762554,0.761437,0.749835
5,0.0057,1.497526,0.818515,0.766899,0.738671,0.731969
6,0.004,1.643467,0.812099,0.78318,0.704196,0.725123
7,0.0042,1.61998,0.796517,0.727948,0.729748,0.716054
8,0.0007,1.739771,0.812099,0.751914,0.712261,0.716444
9,0.0016,1.75294,0.809349,0.783076,0.737334,0.745796
10,0.0005,1.812656,0.813932,0.80336,0.724314,0.746681


[I 2025-03-23 07:35:35,940] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5947,1.166061,0.802016,0.792461,0.688269,0.716003
2,0.0277,1.267956,0.794684,0.786681,0.695547,0.72044
3,0.0127,1.306381,0.813932,0.784161,0.732398,0.740999
4,0.0052,1.549858,0.808433,0.776194,0.730481,0.733887
5,0.0052,1.530725,0.8011,0.742754,0.739772,0.722128
6,0.0045,1.581686,0.805683,0.770918,0.72914,0.734958
7,0.002,1.696232,0.800183,0.745867,0.723443,0.720462
8,0.0009,1.74072,0.812099,0.75644,0.743795,0.738606
9,0.0006,1.916093,0.813932,0.785798,0.745892,0.752726
10,0.0015,1.843879,0.815765,0.779674,0.753218,0.754811


[I 2025-03-23 07:38:27,294] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2591,1.71676,0.562786,0.184174,0.173029,0.156846
2,1.212,1.311756,0.670027,0.321834,0.309131,0.30179
3,0.7548,1.160527,0.711274,0.414225,0.381338,0.388355
4,0.4748,1.137289,0.72319,0.492668,0.426227,0.442035
5,0.3108,1.159407,0.728689,0.532988,0.473465,0.488237
6,0.2055,1.165028,0.736939,0.561271,0.51144,0.516194
7,0.137,1.245952,0.736939,0.59272,0.554532,0.5573
8,0.098,1.281162,0.742438,0.599198,0.554123,0.560048
9,0.0723,1.327518,0.743355,0.611014,0.608093,0.596892
10,0.0543,1.367975,0.743355,0.610342,0.606426,0.593684


[I 2025-03-23 07:41:55,602] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 33}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4992,1.079563,0.736022,0.465594,0.426854,0.431696
2,0.2913,1.150187,0.766269,0.632485,0.59397,0.602984
3,0.0852,1.32538,0.774519,0.645728,0.653112,0.636371
4,0.0362,1.437714,0.784601,0.729555,0.664413,0.679837
5,0.0195,1.496242,0.786434,0.683546,0.670032,0.664461
6,0.0108,1.573777,0.785518,0.716069,0.682746,0.687291
7,0.0073,1.602113,0.790101,0.710241,0.675239,0.677796
8,0.0051,1.688624,0.791017,0.713287,0.673012,0.67733
9,0.0033,1.693257,0.797434,0.709856,0.691802,0.683741
10,0.0026,1.799218,0.794684,0.707639,0.676676,0.677828


[I 2025-03-23 07:45:23,873] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3691,1.850562,0.536205,0.16427,0.149985,0.126674
2,1.3928,1.444559,0.636114,0.273794,0.255441,0.242473
3,0.953,1.251507,0.68286,0.381513,0.323663,0.327766
4,0.6594,1.174851,0.707608,0.412081,0.382192,0.387002
5,0.4661,1.151371,0.719523,0.455122,0.413736,0.418005
6,0.3344,1.153267,0.730522,0.488782,0.469647,0.468396
7,0.2425,1.185596,0.737855,0.600402,0.506863,0.526553
8,0.184,1.221483,0.726856,0.553704,0.4925,0.50599
9,0.142,1.260898,0.727773,0.552087,0.520025,0.521443
10,0.1123,1.291904,0.736939,0.587215,0.564701,0.564114


[I 2025-03-23 07:48:30,416] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3824,1.042589,0.736022,0.451636,0.42673,0.428374
2,0.2466,1.145764,0.774519,0.63675,0.60593,0.611527
3,0.0706,1.349052,0.75802,0.609799,0.619888,0.601872
4,0.0306,1.481056,0.784601,0.665547,0.667591,0.654368
5,0.0155,1.66196,0.775435,0.667033,0.637904,0.632882
6,0.0092,1.575597,0.787351,0.662761,0.671767,0.653603
7,0.0065,1.669225,0.787351,0.674093,0.657903,0.65352
8,0.0044,1.768787,0.791017,0.640837,0.67781,0.649256
9,0.0038,1.777608,0.785518,0.655058,0.680542,0.65712
10,0.0023,1.778033,0.793767,0.649758,0.66678,0.646272


[I 2025-03-23 07:52:03,813] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1919,1.038065,0.749771,0.519128,0.482004,0.484048
2,0.156,1.246522,0.780935,0.663835,0.664121,0.654594
3,0.0455,1.583624,0.75527,0.647489,0.635721,0.624822
4,0.0203,1.522424,0.778185,0.696611,0.676902,0.669312
5,0.0114,1.582084,0.781852,0.674239,0.663558,0.659713


[I 2025-03-23 07:53:39,724] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.004518165681587256, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4668,1.237694,0.80385,0.780605,0.724406,0.739807
2,0.0279,1.272206,0.810266,0.764207,0.740913,0.736551
3,0.0123,1.568806,0.811182,0.799417,0.747001,0.756071
4,0.0085,1.666445,0.820348,0.762166,0.746622,0.742599
5,0.0073,1.611883,0.825848,0.821727,0.761253,0.778953
6,0.0042,1.89912,0.814849,0.774974,0.747763,0.747633
7,0.0057,2.027904,0.812099,0.779115,0.758225,0.756899
8,0.0029,2.123118,0.820348,0.786547,0.753504,0.760784
9,0.0007,2.148042,0.817599,0.786889,0.761744,0.761815
10,0.0004,2.222501,0.821265,0.803202,0.761854,0.771384


[I 2025-03-23 07:58:07,935] Trial 10 finished with value: 0.7675468468538934 and parameters: {'learning_rate': 0.004518165681587256, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 10 with value: 0.7675468468538934.


Trial 11 with params: {'learning_rate': 0.004258197772781102, 'weight_decay': 0.003, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4818,1.177161,0.818515,0.803591,0.754196,0.764338
2,0.0257,1.405775,0.805683,0.800196,0.760911,0.762501
3,0.0142,1.633,0.810266,0.781081,0.748614,0.745643
4,0.0094,1.70257,0.809349,0.771464,0.736513,0.738338
5,0.0075,2.008786,0.813016,0.807866,0.740942,0.75541
6,0.0072,1.999798,0.797434,0.758542,0.74708,0.739509
7,0.0052,1.99723,0.802933,0.765256,0.77254,0.759391
8,0.0029,2.187817,0.813016,0.78722,0.772952,0.766274
9,0.0013,2.165557,0.814849,0.764293,0.771333,0.74972
10,0.0009,2.219382,0.807516,0.764028,0.762471,0.746305


[I 2025-03-23 08:02:40,026] Trial 11 finished with value: 0.7527508695952502 and parameters: {'learning_rate': 0.004258197772781102, 'weight_decay': 0.003, 'warmup_steps': 0}. Best is trial 10 with value: 0.7675468468538934.


Trial 12 with params: {'learning_rate': 0.003924266210177522, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4717,1.159924,0.817599,0.793544,0.724802,0.743915
2,0.0242,1.351095,0.812099,0.781151,0.722767,0.733959
3,0.0106,1.556843,0.808433,0.757392,0.716486,0.723747
4,0.0077,1.666331,0.812099,0.801878,0.755507,0.763531
5,0.0081,1.879015,0.805683,0.799301,0.754256,0.757815
6,0.007,1.995714,0.809349,0.828258,0.769135,0.788305
7,0.0038,2.025027,0.809349,0.741098,0.762516,0.73758
8,0.0021,2.100856,0.807516,0.75147,0.729492,0.730778
9,0.0013,2.106781,0.813932,0.742752,0.743565,0.731946
10,0.001,2.102062,0.811182,0.751748,0.749205,0.74026


[I 2025-03-23 08:05:35,607] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.004662832745740382, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4536,1.187899,0.7956,0.788087,0.703641,0.728531
2,0.027,1.412024,0.808433,0.786641,0.753599,0.753022
3,0.0122,1.576688,0.807516,0.781611,0.751894,0.744405
4,0.0092,1.666844,0.802016,0.780127,0.768504,0.759445
5,0.0095,1.808217,0.813932,0.790055,0.758723,0.759438
6,0.0092,1.955886,0.809349,0.790788,0.743971,0.757025
7,0.0041,2.181951,0.799267,0.792507,0.760842,0.759752
8,0.0036,2.176749,0.807516,0.795439,0.72556,0.746762
9,0.0021,2.303178,0.809349,0.784677,0.756332,0.753229
10,0.0009,2.22172,0.823098,0.813522,0.774948,0.783574


[I 2025-03-23 08:10:19,913] Trial 13 finished with value: 0.779634660594181 and parameters: {'learning_rate': 0.004662832745740382, 'weight_decay': 0.0, 'warmup_steps': 7}. Best is trial 13 with value: 0.779634660594181.


Trial 14 with params: {'learning_rate': 0.004628369539464375, 'weight_decay': 0.0, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.473,1.220313,0.821265,0.831774,0.755705,0.776053
2,0.0239,1.41328,0.789184,0.740538,0.721249,0.714881
3,0.0156,1.547998,0.806599,0.759327,0.730518,0.729275
4,0.0108,1.742529,0.805683,0.789823,0.749203,0.749325
5,0.0113,1.794988,0.809349,0.754936,0.733279,0.727294
6,0.0048,2.059991,0.8011,0.768024,0.733479,0.733891
7,0.0033,1.914076,0.813932,0.769321,0.758467,0.74882
8,0.0012,1.993087,0.807516,0.777293,0.763063,0.755268
9,0.0013,2.058779,0.808433,0.782038,0.764334,0.758914
10,0.0007,2.100249,0.808433,0.776832,0.75039,0.748602


[I 2025-03-23 08:13:21,317] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0022925419343973524, 'weight_decay': 0.004, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6343,1.088734,0.809349,0.766589,0.680666,0.706467
2,0.0254,1.267806,0.8011,0.767723,0.722984,0.732953
3,0.0102,1.292837,0.802016,0.748704,0.732672,0.729864
4,0.0079,1.514646,0.802933,0.763609,0.7313,0.730934
5,0.0055,1.541546,0.816682,0.780858,0.737345,0.739981
6,0.0046,1.571126,0.812099,0.782381,0.750035,0.751472
7,0.0023,1.706429,0.821265,0.777491,0.724187,0.735106
8,0.0015,1.733543,0.818515,0.746517,0.713333,0.716056
9,0.0007,1.750172,0.817599,0.774551,0.733614,0.739007
10,0.0003,1.784742,0.819432,0.754961,0.718195,0.725279


[I 2025-03-23 08:18:00,263] Trial 15 finished with value: 0.7282679242977995 and parameters: {'learning_rate': 0.0022925419343973524, 'weight_decay': 0.004, 'warmup_steps': 28}. Best is trial 13 with value: 0.779634660594181.


Trial 16 with params: {'learning_rate': 0.001678964059886791, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7133,1.13051,0.802933,0.763175,0.696735,0.708879
2,0.0312,1.299944,0.814849,0.767477,0.723128,0.729876
3,0.0124,1.492425,0.80385,0.768172,0.705333,0.719477
4,0.0071,1.53116,0.797434,0.754029,0.674897,0.692951
5,0.0055,1.596769,0.815765,0.778778,0.741561,0.739126
6,0.0043,1.605402,0.814849,0.778671,0.72047,0.73003
7,0.0022,1.702171,0.806599,0.731371,0.706146,0.699389
8,0.001,1.847528,0.810266,0.777327,0.719742,0.73215
9,0.0017,1.693916,0.817599,0.769493,0.725547,0.732848
10,0.0006,1.709056,0.811182,0.779762,0.72679,0.734572


[I 2025-03-23 08:20:58,449] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 43}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6929,1.069817,0.802016,0.780561,0.711267,0.727414
2,0.0276,1.282529,0.7956,0.7882,0.679585,0.711243
3,0.0111,1.329448,0.818515,0.788394,0.752872,0.753063
4,0.0086,1.413236,0.819432,0.804566,0.737128,0.754886
5,0.0033,1.513331,0.816682,0.787164,0.765897,0.761874
6,0.003,1.555993,0.820348,0.813929,0.756293,0.767598
7,0.0033,1.621171,0.819432,0.790029,0.761126,0.761498
8,0.0017,1.658409,0.819432,0.776176,0.748171,0.747526
9,0.0009,1.713542,0.826764,0.781027,0.75425,0.753639
10,0.001,1.752093,0.816682,0.770104,0.736933,0.734809


[I 2025-03-23 08:26:02,542] Trial 17 finished with value: 0.7496704469359724 and parameters: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 43}. Best is trial 13 with value: 0.779634660594181.


Trial 18 with params: {'learning_rate': 0.0026868566033176914, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5951,1.188874,0.794684,0.748261,0.677162,0.696297
2,0.0263,1.254473,0.816682,0.786093,0.730452,0.743856
3,0.0112,1.251635,0.825848,0.787147,0.768603,0.769196
4,0.0063,1.326558,0.827681,0.800084,0.75197,0.763148
5,0.0041,1.399493,0.818515,0.758459,0.746647,0.741866
6,0.0035,1.498774,0.822181,0.785681,0.755306,0.759343
7,0.0044,1.648891,0.812099,0.763739,0.746577,0.74512
8,0.003,1.656858,0.813932,0.744783,0.747305,0.7351
9,0.001,1.728976,0.819432,0.760183,0.75407,0.747689
10,0.0004,1.755949,0.825848,0.765039,0.751188,0.749926


[I 2025-03-23 08:31:28,612] Trial 18 finished with value: 0.7584292091170731 and parameters: {'learning_rate': 0.0026868566033176914, 'weight_decay': 0.01, 'warmup_steps': 18}. Best is trial 13 with value: 0.779634660594181.


Trial 19 with params: {'learning_rate': 0.00017098269191031398, 'weight_decay': 0.005, 'warmup_steps': 53}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0549,1.455306,0.633364,0.257532,0.237218,0.221218
2,0.8542,1.117802,0.71494,0.402023,0.393893,0.388793
3,0.4161,1.087906,0.753437,0.5883,0.50897,0.523085
4,0.2073,1.169716,0.741522,0.609103,0.507475,0.535543
5,0.1131,1.253202,0.751604,0.630098,0.579476,0.588848


[I 2025-03-23 08:33:13,720] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0030680129968586258, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5274,1.12666,0.814849,0.773684,0.718919,0.731794
2,0.0239,1.288818,0.810266,0.796575,0.73705,0.754446
3,0.0097,1.362837,0.822181,0.770519,0.762676,0.754853
4,0.0086,1.553317,0.806599,0.740511,0.722531,0.719664
5,0.0055,1.5054,0.811182,0.767866,0.726086,0.732512
6,0.0043,1.615647,0.817599,0.769722,0.746613,0.74443
7,0.0031,1.736788,0.815765,0.747645,0.75275,0.734822
8,0.0014,1.895216,0.808433,0.755725,0.737261,0.732704
9,0.0017,2.039249,0.807516,0.753551,0.727433,0.721885
10,0.0016,2.089985,0.805683,0.759509,0.724744,0.726116


[I 2025-03-23 08:36:20,231] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.004682495750020224, 'weight_decay': 0.01, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4833,1.19052,0.80385,0.775556,0.728413,0.734635
2,0.0224,1.266452,0.818515,0.758083,0.730911,0.732853
3,0.0142,1.598839,0.808433,0.768384,0.736826,0.738827
4,0.0145,1.694582,0.807516,0.766851,0.748214,0.743729
5,0.0069,1.889169,0.810266,0.776158,0.749849,0.748544
6,0.0043,1.9324,0.80385,0.761026,0.715121,0.722082
7,0.0026,2.048628,0.812099,0.751166,0.734902,0.730222
8,0.0016,2.12806,0.809349,0.763132,0.733361,0.731383
9,0.001,2.162999,0.816682,0.779456,0.750908,0.749258
10,0.0012,2.215756,0.810266,0.755685,0.735228,0.729101


[I 2025-03-23 08:40:01,198] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0005022101714155331, 'weight_decay': 0.01, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2679,1.000048,0.764436,0.563212,0.519114,0.523643
2,0.1643,1.223707,0.780935,0.647693,0.671208,0.645249
3,0.0455,1.412023,0.784601,0.704455,0.670336,0.663863
4,0.0211,1.503948,0.788268,0.720057,0.684538,0.683548
5,0.0114,1.542283,0.797434,0.73842,0.694076,0.697382
6,0.0071,1.662594,0.802933,0.729698,0.701184,0.69662
7,0.0051,1.670808,0.806599,0.727696,0.708361,0.704478
8,0.0032,1.789505,0.802016,0.712917,0.695342,0.687789
9,0.0032,1.798412,0.8011,0.68889,0.713058,0.688715
10,0.0017,1.83804,0.80385,0.731397,0.700516,0.70015


[I 2025-03-23 08:44:30,322] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.003396207225612089, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4958,1.155842,0.808433,0.771471,0.742046,0.742075
2,0.0229,1.3366,0.79835,0.778493,0.725552,0.728478
3,0.0125,1.463622,0.807516,0.770046,0.725403,0.732149
4,0.007,1.552222,0.804766,0.748446,0.732034,0.725013
5,0.005,1.68024,0.809349,0.777994,0.748474,0.748204
6,0.0057,1.68535,0.797434,0.764297,0.722388,0.727955
7,0.0054,1.814343,0.806599,0.766275,0.759488,0.746764
8,0.0026,1.874515,0.7956,0.741508,0.74802,0.734812
9,0.0008,1.900164,0.807516,0.754039,0.74558,0.738797
10,0.0008,1.864729,0.808433,0.74979,0.740865,0.734377


[I 2025-03-23 08:47:34,274] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0011443213529451842, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8399,1.074849,0.785518,0.660256,0.624704,0.623815
2,0.0501,1.257891,0.802933,0.758685,0.711792,0.716886
3,0.0152,1.435646,0.804766,0.753351,0.708941,0.715499
4,0.0098,1.464896,0.791934,0.748356,0.691303,0.697651
5,0.0056,1.690569,0.8011,0.698045,0.697887,0.681326
6,0.0033,1.738487,0.794684,0.709298,0.686611,0.686145
7,0.0021,1.671896,0.8011,0.73449,0.693557,0.699488
8,0.002,1.886913,0.79835,0.778829,0.689929,0.713858
9,0.0022,1.683133,0.807516,0.777741,0.745089,0.744501
10,0.0011,1.74109,0.810266,0.772981,0.733146,0.738238


[I 2025-03-23 08:52:23,088] Trial 24 finished with value: 0.7430091996215838 and parameters: {'learning_rate': 0.0011443213529451842, 'weight_decay': 0.007, 'warmup_steps': 19}. Best is trial 13 with value: 0.779634660594181.


Trial 25 with params: {'learning_rate': 0.0016711971947234283, 'weight_decay': 0.01, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7009,1.120887,0.7956,0.72451,0.658587,0.67286
2,0.0333,1.28292,0.802016,0.774807,0.724043,0.732383
3,0.0128,1.36283,0.804766,0.740907,0.732033,0.722962
4,0.0064,1.541023,0.806599,0.790108,0.729042,0.73913
5,0.0039,1.574768,0.800183,0.75413,0.750576,0.732936
6,0.0041,1.530482,0.807516,0.793576,0.736766,0.748661
7,0.0033,1.486574,0.808433,0.763683,0.725168,0.730234
8,0.0019,1.617234,0.805683,0.760027,0.733844,0.730127
9,0.0016,1.603543,0.79835,0.756328,0.742458,0.724062
10,0.0006,1.734084,0.807516,0.763872,0.740564,0.735811


[I 2025-03-23 08:55:48,181] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.003014055889062914, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5143,1.113861,0.804766,0.804475,0.719492,0.740133
2,0.0266,1.341733,0.802016,0.796542,0.746936,0.753169
3,0.0095,1.40094,0.816682,0.790195,0.770889,0.763475
4,0.0083,1.556226,0.813016,0.756776,0.744956,0.73434
5,0.004,1.788786,0.814849,0.82127,0.757004,0.770791
6,0.0043,1.907423,0.802016,0.792761,0.732265,0.743462
7,0.0034,1.73192,0.810266,0.746331,0.747076,0.732496
8,0.0016,1.796,0.818515,0.777808,0.749268,0.750993
9,0.0012,1.847778,0.812099,0.802869,0.748717,0.758661
10,0.0005,1.794258,0.824015,0.795277,0.757363,0.762633


[I 2025-03-23 09:00:47,305] Trial 26 finished with value: 0.7729609527052977 and parameters: {'learning_rate': 0.003014055889062914, 'weight_decay': 0.001, 'warmup_steps': 3}. Best is trial 13 with value: 0.779634660594181.


Trial 27 with params: {'learning_rate': 0.002635753194028497, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5463,1.152335,0.814849,0.789104,0.748438,0.757958
2,0.028,1.337698,0.815765,0.796896,0.743048,0.749497
3,0.0115,1.372303,0.824015,0.798193,0.747504,0.754379
4,0.0085,1.371634,0.821265,0.761369,0.757739,0.750661
5,0.0049,1.527475,0.822181,0.74725,0.754632,0.73683
6,0.0023,1.63283,0.824931,0.79113,0.765883,0.76564
7,0.0036,1.709661,0.827681,0.765592,0.753316,0.750996
8,0.002,1.739487,0.822181,0.792102,0.747804,0.755408
9,0.0022,1.764032,0.821265,0.758031,0.754427,0.742942
10,0.0006,1.804973,0.823098,0.780169,0.757913,0.757051


[I 2025-03-23 09:06:30,533] Trial 27 finished with value: 0.7616979440325409 and parameters: {'learning_rate': 0.002635753194028497, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 28 with params: {'learning_rate': 0.004726456682217401, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4683,1.157964,0.808433,0.793945,0.731502,0.747117
2,0.027,1.385195,0.793767,0.803243,0.732024,0.746036
3,0.0171,1.485286,0.808433,0.807365,0.751862,0.759483
4,0.0095,1.514141,0.802016,0.773153,0.760502,0.746524
5,0.0082,1.754296,0.802016,0.751014,0.727469,0.723984


[I 2025-03-23 09:07:58,536] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.004630793727993228, 'weight_decay': 0.001, 'warmup_steps': 34}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5217,1.217504,0.811182,0.786723,0.699666,0.723707
2,0.0241,1.400061,0.791934,0.769172,0.732607,0.734403
3,0.0147,1.573571,0.799267,0.786636,0.743468,0.749147
4,0.0107,1.796769,0.800183,0.770054,0.716887,0.727023
5,0.008,1.860566,0.794684,0.776476,0.744668,0.737815
6,0.006,2.084652,0.800183,0.77527,0.71928,0.732743
7,0.0038,2.163518,0.800183,0.76933,0.738462,0.738649
8,0.0028,2.235659,0.807516,0.779903,0.720455,0.732704
9,0.0032,2.325445,0.793767,0.765982,0.719855,0.723282
10,0.001,2.409752,0.80385,0.779495,0.74781,0.746051


[I 2025-03-23 09:12:25,531] Trial 29 finished with value: 0.7444851092675613 and parameters: {'learning_rate': 0.004630793727993228, 'weight_decay': 0.001, 'warmup_steps': 34}. Best is trial 13 with value: 0.779634660594181.


Trial 30 with params: {'learning_rate': 0.004984317502296361, 'weight_decay': 0.004, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4693,1.283641,0.786434,0.76812,0.70868,0.721506
2,0.0249,1.532685,0.796517,0.777072,0.723292,0.73784
3,0.0158,1.614516,0.804766,0.746383,0.726832,0.720663
4,0.0097,1.799582,0.802933,0.718906,0.722533,0.707922
5,0.0114,2.155432,0.80385,0.762831,0.743877,0.73985
6,0.0087,2.171151,0.794684,0.75228,0.714554,0.721325
7,0.006,2.314867,0.802933,0.778646,0.738321,0.74288
8,0.0026,2.372493,0.796517,0.761115,0.745195,0.740083
9,0.0012,2.45577,0.79835,0.764061,0.758838,0.743416
10,0.0009,2.535129,0.7956,0.760093,0.740317,0.731968


[I 2025-03-23 09:15:30,751] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.002506380782513315, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5659,1.188728,0.79835,0.772235,0.727601,0.73646
2,0.0268,1.324873,0.806599,0.756887,0.725934,0.727261
3,0.0123,1.357028,0.813016,0.77782,0.7526,0.749769
4,0.0071,1.450133,0.818515,0.772508,0.754209,0.749921
5,0.0035,1.468301,0.818515,0.766369,0.746325,0.737268
6,0.0027,1.65594,0.819432,0.75442,0.74596,0.738702
7,0.0036,1.71541,0.813932,0.765031,0.740275,0.734218
8,0.0029,1.815661,0.814849,0.761896,0.739758,0.739789
9,0.0024,1.841856,0.818515,0.784871,0.756008,0.757897
10,0.001,1.859608,0.819432,0.772651,0.740753,0.742618


[I 2025-03-23 09:20:23,326] Trial 31 finished with value: 0.7434748284489434 and parameters: {'learning_rate': 0.002506380782513315, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 13 with value: 0.779634660594181.


Trial 32 with params: {'learning_rate': 0.0018684134344300722, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6355,1.033157,0.802933,0.750498,0.703818,0.711142
2,0.0307,1.360353,0.797434,0.765978,0.729422,0.733091
3,0.013,1.425778,0.8011,0.720966,0.702237,0.698544
4,0.0062,1.586945,0.806599,0.769806,0.736403,0.739309
5,0.006,1.574225,0.809349,0.773625,0.720932,0.732074
6,0.0036,1.662368,0.804766,0.752647,0.714915,0.72208
7,0.0025,1.686843,0.806599,0.774413,0.741981,0.746323
8,0.0014,1.765176,0.814849,0.789351,0.744998,0.752932
9,0.0009,1.951802,0.804766,0.777056,0.754462,0.753459
10,0.0006,1.912475,0.810266,0.750613,0.747352,0.736523


[I 2025-03-23 09:25:04,530] Trial 32 finished with value: 0.7475339785740591 and parameters: {'learning_rate': 0.0018684134344300722, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 33 with params: {'learning_rate': 0.0011263994981969983, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8257,1.053605,0.793767,0.711469,0.671793,0.676176
2,0.0541,1.255613,0.79835,0.693227,0.695528,0.677991
3,0.0172,1.478835,0.790101,0.707597,0.675997,0.678544
4,0.0083,1.491373,0.79835,0.744759,0.69883,0.702431
5,0.0055,1.495533,0.809349,0.699432,0.713645,0.691144


[I 2025-03-23 09:26:52,369] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.003860319332460499, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5023,1.158407,0.804766,0.780476,0.69595,0.717086
2,0.023,1.331811,0.806599,0.749441,0.736671,0.729049
3,0.0117,1.472198,0.814849,0.77045,0.723501,0.732044
4,0.009,1.65629,0.817599,0.800632,0.754418,0.759386
5,0.0082,1.694439,0.805683,0.755861,0.717076,0.724051
6,0.0034,1.750659,0.809349,0.730301,0.71628,0.711305
7,0.003,1.809451,0.802933,0.759844,0.71661,0.72203
8,0.0013,1.947204,0.80385,0.706861,0.721219,0.705183
9,0.0004,2.000636,0.802933,0.736702,0.716966,0.716756
10,0.0007,1.977137,0.809349,0.759923,0.729424,0.733456


[I 2025-03-23 09:29:51,948] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.004179496680828633, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4777,1.193691,0.800183,0.798785,0.739025,0.749638
2,0.0282,1.355038,0.802933,0.76862,0.744671,0.738656
3,0.0119,1.577486,0.807516,0.772584,0.73874,0.741537
4,0.0096,1.688079,0.806599,0.811448,0.734352,0.754911
5,0.0055,1.685304,0.802933,0.760654,0.723243,0.727541
6,0.0055,1.821715,0.805683,0.758871,0.727258,0.728852
7,0.0035,1.978003,0.804766,0.757614,0.758305,0.746906
8,0.0028,2.093589,0.804766,0.750066,0.737384,0.731276
9,0.0023,2.250566,0.799267,0.7461,0.750681,0.732267
10,0.0016,2.234441,0.804766,0.760864,0.765939,0.745328


[I 2025-03-23 09:34:53,416] Trial 35 finished with value: 0.7479326234912006 and parameters: {'learning_rate': 0.004179496680828633, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 13 with value: 0.779634660594181.


Trial 36 with params: {'learning_rate': 0.0010966304217962216, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8305,1.155026,0.777269,0.678739,0.617755,0.62758
2,0.0544,1.302282,0.791017,0.712062,0.698106,0.692639
3,0.0177,1.506472,0.789184,0.707886,0.688606,0.683074
4,0.0091,1.552545,0.804766,0.782615,0.725995,0.738439
5,0.0045,1.651246,0.793767,0.73461,0.684278,0.694315
6,0.0033,1.709487,0.809349,0.750398,0.697595,0.7084
7,0.0036,1.611933,0.811182,0.755816,0.722181,0.723692
8,0.0031,1.620627,0.805683,0.764596,0.72032,0.726371
9,0.0021,1.665049,0.807516,0.741426,0.716351,0.712242
10,0.0014,1.707931,0.807516,0.720326,0.701105,0.702648


[I 2025-03-23 09:37:48,046] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.004827485420691993, 'weight_decay': 0.0, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4722,1.248196,0.809349,0.75445,0.716398,0.722311
2,0.0252,1.382413,0.812099,0.770919,0.742051,0.734348
3,0.0157,1.564174,0.811182,0.750631,0.739191,0.730061
4,0.0105,1.714787,0.807516,0.791371,0.749788,0.752363
5,0.008,1.731613,0.807516,0.763197,0.766828,0.754381
6,0.0068,1.839373,0.806599,0.763917,0.739238,0.734264
7,0.0031,1.968677,0.813932,0.757438,0.746891,0.739215
8,0.0015,1.969698,0.817599,0.778611,0.752777,0.751957
9,0.0011,2.120498,0.815765,0.754076,0.738686,0.729926
10,0.001,2.055596,0.819432,0.799924,0.753019,0.759554


[I 2025-03-23 09:42:21,281] Trial 37 finished with value: 0.7495559321626888 and parameters: {'learning_rate': 0.004827485420691993, 'weight_decay': 0.0, 'warmup_steps': 9}. Best is trial 13 with value: 0.779634660594181.


Trial 38 with params: {'learning_rate': 0.0019138674507153104, 'weight_decay': 0.0, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6482,1.092198,0.797434,0.742513,0.66268,0.679002
2,0.0315,1.230438,0.810266,0.774583,0.721503,0.727552
3,0.0099,1.342349,0.806599,0.742596,0.728012,0.714811
4,0.0076,1.36125,0.815765,0.769743,0.722224,0.727662
5,0.008,1.44566,0.807516,0.731154,0.730927,0.718699


[I 2025-03-23 09:44:04,599] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 49}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6752,2.17818,0.445463,0.087443,0.104011,0.079912
2,1.7998,1.770579,0.549038,0.149369,0.167827,0.149486
3,1.411,1.539793,0.605866,0.263892,0.222907,0.21212
4,1.1351,1.388474,0.646196,0.279134,0.266966,0.261471
5,0.9266,1.302521,0.661778,0.347208,0.314976,0.312912
6,0.7599,1.240571,0.684693,0.377024,0.35472,0.355307
7,0.6337,1.212997,0.697525,0.422264,0.379235,0.384857
8,0.5392,1.206912,0.710357,0.477389,0.412805,0.420414
9,0.4665,1.202449,0.714024,0.467602,0.423421,0.424834
10,0.4091,1.202296,0.711274,0.453377,0.425069,0.425077


[I 2025-03-23 09:47:08,059] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0021595443303790377, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5861,1.206002,0.788268,0.766408,0.71037,0.721412
2,0.0294,1.323301,0.790101,0.776027,0.738373,0.739884
3,0.0134,1.434784,0.802933,0.744826,0.740751,0.7285
4,0.0068,1.496598,0.812099,0.751471,0.728714,0.722673
5,0.0035,1.622909,0.811182,0.772872,0.743434,0.744261
6,0.0031,1.658084,0.808433,0.728762,0.723631,0.710944
7,0.0034,1.665543,0.808433,0.722544,0.745053,0.719952
8,0.0026,1.761616,0.807516,0.746998,0.708137,0.710373
9,0.0026,1.751211,0.804766,0.771409,0.740168,0.73694
10,0.0007,1.794065,0.806599,0.761495,0.725266,0.726677


[I 2025-03-23 09:50:10,133] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5725,2.10997,0.466544,0.093817,0.115129,0.091676
2,1.7158,1.700491,0.580202,0.194271,0.190066,0.175419
3,1.3195,1.478776,0.63428,0.279009,0.247615,0.238374
4,1.0437,1.344764,0.659945,0.320648,0.280211,0.278106
5,0.8325,1.261806,0.683776,0.380942,0.34475,0.346675
6,0.6649,1.210523,0.702108,0.425344,0.396462,0.395864
7,0.5398,1.197605,0.716774,0.472346,0.425588,0.426353
8,0.449,1.19461,0.71494,0.480707,0.427512,0.430206
9,0.3812,1.207783,0.71494,0.501548,0.44978,0.458979
10,0.3276,1.206027,0.722273,0.492207,0.462367,0.465947


[I 2025-03-23 09:53:38,485] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0038640888535385835, 'weight_decay': 0.002, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4876,1.170691,0.816682,0.804153,0.734661,0.756091
2,0.0241,1.373049,0.794684,0.762154,0.723553,0.725434
3,0.0114,1.494138,0.797434,0.786987,0.751547,0.755871
4,0.0101,1.738419,0.811182,0.768804,0.753702,0.742174
5,0.0059,1.761809,0.818515,0.774015,0.754433,0.744261
6,0.0057,1.943116,0.809349,0.727384,0.740774,0.723517
7,0.0034,2.100062,0.809349,0.74647,0.737611,0.727352
8,0.0025,2.075096,0.813016,0.769577,0.75406,0.747237
9,0.0012,2.145041,0.806599,0.763401,0.751766,0.747549
10,0.0009,2.20751,0.813016,0.759005,0.762106,0.750065


[I 2025-03-23 09:58:11,914] Trial 42 finished with value: 0.743439264119405 and parameters: {'learning_rate': 0.0038640888535385835, 'weight_decay': 0.002, 'warmup_steps': 8}. Best is trial 13 with value: 0.779634660594181.


Trial 43 with params: {'learning_rate': 0.004406905284977779, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4638,1.265581,0.7956,0.761871,0.711291,0.72525
2,0.0267,1.363743,0.815765,0.802102,0.722987,0.744083
3,0.0128,1.512166,0.810266,0.767666,0.757008,0.751622
4,0.0107,1.615543,0.80385,0.793102,0.754615,0.757792
5,0.0086,1.757625,0.815765,0.764457,0.74319,0.743237
6,0.004,1.829646,0.817599,0.782352,0.738303,0.746528
7,0.0026,2.152097,0.797434,0.752501,0.73038,0.726895
8,0.003,2.118153,0.796517,0.748656,0.71223,0.717161
9,0.0048,2.17749,0.787351,0.753854,0.735533,0.734652
10,0.0013,2.36351,0.785518,0.75518,0.735976,0.732692


[I 2025-03-23 10:02:45,615] Trial 43 finished with value: 0.7335659762675205 and parameters: {'learning_rate': 0.004406905284977779, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 44 with params: {'learning_rate': 0.0024432682280190778, 'weight_decay': 0.008, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6056,1.125771,0.799267,0.781772,0.72044,0.737388
2,0.0253,1.314052,0.809349,0.786215,0.716165,0.732304
3,0.0118,1.439166,0.802016,0.772769,0.723188,0.73519
4,0.0061,1.524253,0.815765,0.772882,0.734107,0.736231
5,0.0046,1.543579,0.806599,0.762683,0.718617,0.722558


[I 2025-03-23 10:04:14,191] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0024791706391598892, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5617,1.214166,0.793767,0.756081,0.69107,0.704024
2,0.026,1.379372,0.808433,0.75517,0.71625,0.719964
3,0.0108,1.352163,0.813932,0.749445,0.730166,0.725114
4,0.0071,1.385755,0.813016,0.799038,0.742621,0.756936
5,0.0048,1.59173,0.813016,0.774182,0.760875,0.750099
6,0.0031,1.544003,0.827681,0.793492,0.766943,0.768306
7,0.0048,1.636619,0.813932,0.782543,0.728569,0.738361
8,0.0006,1.686659,0.813932,0.772967,0.726786,0.735743
9,0.0007,1.764671,0.826764,0.77596,0.74919,0.749152
10,0.0012,1.639112,0.823098,0.810277,0.76421,0.772812


[I 2025-03-23 10:09:14,796] Trial 45 finished with value: 0.753084808919524 and parameters: {'learning_rate': 0.0024791706391598892, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 13 with value: 0.779634660594181.


Trial 46 with params: {'learning_rate': 0.004971057573624711, 'weight_decay': 0.01, 'warmup_steps': 35}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5143,1.213375,0.808433,0.779326,0.718487,0.731146
2,0.0249,1.404993,0.810266,0.768474,0.735393,0.737119
3,0.0141,1.524786,0.799267,0.78521,0.722887,0.737888
4,0.0114,1.825702,0.806599,0.760291,0.71216,0.717484
5,0.009,1.751116,0.809349,0.763588,0.721542,0.729188
6,0.0066,1.962123,0.809349,0.78246,0.723373,0.73347
7,0.0067,2.142573,0.805683,0.742934,0.72527,0.720992
8,0.0033,2.224216,0.813932,0.766681,0.732727,0.729738
9,0.0017,2.146095,0.812099,0.767857,0.736293,0.733319
10,0.0008,2.099426,0.816682,0.780847,0.729492,0.736705


[I 2025-03-23 10:12:23,937] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0038584186109008483, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4981,1.172321,0.802016,0.73838,0.683612,0.691954
2,0.0227,1.217766,0.815765,0.766381,0.765859,0.746397
3,0.012,1.408377,0.813932,0.760386,0.747956,0.727659
4,0.0096,1.441671,0.821265,0.749771,0.764958,0.744948
5,0.0047,1.74276,0.814849,0.763437,0.748486,0.73786
6,0.0045,1.795202,0.815765,0.795732,0.779851,0.776032
7,0.0034,1.951021,0.811182,0.777848,0.768531,0.759306
8,0.0041,1.908769,0.822181,0.777751,0.768396,0.759611
9,0.0027,2.075082,0.816682,0.745763,0.759871,0.742534
10,0.0014,1.993261,0.818515,0.784804,0.783454,0.769087


[I 2025-03-23 10:18:00,417] Trial 47 finished with value: 0.760532568762144 and parameters: {'learning_rate': 0.0038584186109008483, 'weight_decay': 0.008, 'warmup_steps': 13}. Best is trial 13 with value: 0.779634660594181.


Trial 48 with params: {'learning_rate': 0.0010383698487536448, 'weight_decay': 0.002, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8804,1.050853,0.8011,0.71082,0.646467,0.661252
2,0.0572,1.267416,0.80385,0.749453,0.690312,0.705401
3,0.0173,1.484474,0.800183,0.736447,0.680344,0.694071
4,0.0085,1.516734,0.804766,0.779101,0.705787,0.722913
5,0.0045,1.727074,0.797434,0.755226,0.719512,0.720041
6,0.0055,1.603816,0.813932,0.794005,0.736791,0.746784
7,0.0041,1.563945,0.809349,0.751415,0.706284,0.712383
8,0.0019,1.703203,0.813932,0.770848,0.721217,0.72849
9,0.0021,1.744693,0.8011,0.762933,0.721289,0.722756
10,0.0014,1.823689,0.804766,0.774995,0.730972,0.738591


[I 2025-03-23 10:23:43,073] Trial 48 finished with value: 0.7403669343032981 and parameters: {'learning_rate': 0.0010383698487536448, 'weight_decay': 0.002, 'warmup_steps': 14}. Best is trial 13 with value: 0.779634660594181.


Trial 49 with params: {'learning_rate': 0.0012863044292388266, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.78,1.080247,0.792851,0.667754,0.617399,0.630787
2,0.0471,1.32792,0.791934,0.715085,0.676672,0.681046
3,0.0127,1.484968,0.79835,0.730496,0.719999,0.712067
4,0.0105,1.473054,0.810266,0.765506,0.728723,0.73683
5,0.0044,1.5175,0.816682,0.767079,0.712034,0.722733


[I 2025-03-23 10:25:15,724] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.004355234699071093, 'weight_decay': 0.008, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4774,1.193135,0.805683,0.790249,0.7383,0.749572
2,0.0225,1.414309,0.787351,0.676544,0.660705,0.654078
3,0.015,1.565872,0.802016,0.769269,0.728875,0.732347
4,0.0092,1.78065,0.799267,0.759796,0.743506,0.729144
5,0.0088,2.038741,0.799267,0.767014,0.757331,0.746874
6,0.0061,1.880713,0.802933,0.758813,0.760592,0.74722
7,0.0061,1.888576,0.812099,0.794477,0.753482,0.75802
8,0.002,2.037581,0.817599,0.786954,0.786628,0.774135
9,0.0006,2.062641,0.816682,0.787759,0.780033,0.769232
10,0.0003,2.082693,0.818515,0.792408,0.782556,0.775542


[I 2025-03-23 10:30:00,559] Trial 50 finished with value: 0.7761613922375242 and parameters: {'learning_rate': 0.004355234699071093, 'weight_decay': 0.008, 'warmup_steps': 15}. Best is trial 13 with value: 0.779634660594181.


Trial 51 with params: {'learning_rate': 0.004843208211127261, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4825,1.176789,0.817599,0.780625,0.716259,0.729792
2,0.0237,1.430037,0.804766,0.764127,0.723588,0.729521
3,0.0134,1.653817,0.811182,0.76466,0.728693,0.730001
4,0.0101,1.624285,0.810266,0.763864,0.7263,0.731649
5,0.0119,1.846046,0.791017,0.731832,0.71179,0.708155
6,0.0075,1.967184,0.8011,0.744248,0.706594,0.710231
7,0.0032,2.129177,0.792851,0.726722,0.70192,0.697276
8,0.0025,2.093289,0.79835,0.738711,0.703534,0.703242
9,0.002,2.120218,0.796517,0.741587,0.714263,0.712782
10,0.0007,2.191417,0.787351,0.71985,0.722168,0.70365


[I 2025-03-23 10:33:15,248] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 6.1005881023266626e-05, 'weight_decay': 0.007, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6047,2.145963,0.453712,0.09544,0.108739,0.08521
2,1.7611,1.74082,0.557287,0.15171,0.173346,0.155091
3,1.3729,1.509688,0.622365,0.274643,0.23722,0.227761
4,1.0984,1.363811,0.648029,0.289976,0.271594,0.269018
5,0.8885,1.280502,0.678277,0.372624,0.332703,0.332022


[I 2025-03-23 10:34:31,922] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.003038200183162917, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5364,1.160966,0.815765,0.772787,0.742398,0.744159
2,0.0236,1.234668,0.812099,0.781455,0.781111,0.765839
3,0.0116,1.277538,0.819432,0.786284,0.769561,0.762828
4,0.0076,1.448502,0.815765,0.779054,0.725704,0.729251
5,0.0048,1.595866,0.816682,0.790695,0.753613,0.756613
6,0.0042,1.663271,0.815765,0.802309,0.760066,0.768531
7,0.0038,1.778995,0.816682,0.752809,0.743071,0.73432
8,0.0032,1.845986,0.804766,0.760473,0.754277,0.742557
9,0.0016,1.863276,0.808433,0.784291,0.736159,0.744417
10,0.0008,1.888594,0.817599,0.794384,0.763201,0.766518


[I 2025-03-23 10:39:10,443] Trial 53 finished with value: 0.7682044996535828 and parameters: {'learning_rate': 0.003038200183162917, 'weight_decay': 0.008, 'warmup_steps': 11}. Best is trial 13 with value: 0.779634660594181.


Trial 54 with params: {'learning_rate': 0.000403916017640712, 'weight_decay': 0.0, 'warmup_steps': 39}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4394,1.037761,0.747021,0.492242,0.453512,0.459329
2,0.2417,1.145301,0.785518,0.679187,0.641118,0.647305
3,0.0683,1.3742,0.762603,0.617951,0.623742,0.608805
4,0.0277,1.498208,0.785518,0.675863,0.64227,0.640676
5,0.0158,1.570726,0.785518,0.658279,0.635526,0.635671


[I 2025-03-23 10:40:46,854] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.003818518023295936, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4879,1.1453,0.815765,0.786375,0.748509,0.750857
2,0.0253,1.298261,0.810266,0.775678,0.749155,0.750118
3,0.011,1.546445,0.804766,0.785191,0.749041,0.753943
4,0.0097,1.559726,0.804766,0.775055,0.753501,0.749703
5,0.0086,1.697779,0.812099,0.780504,0.740972,0.744954
6,0.0078,1.742282,0.809349,0.762049,0.744123,0.743646
7,0.0043,1.967928,0.807516,0.751117,0.745773,0.73414
8,0.0024,1.993317,0.807516,0.781027,0.749019,0.752681
9,0.0021,2.012321,0.807516,0.777283,0.747878,0.750123
10,0.001,2.077168,0.808433,0.773936,0.743621,0.745255


[I 2025-03-23 10:45:47,017] Trial 55 finished with value: 0.7453721633983119 and parameters: {'learning_rate': 0.003818518023295936, 'weight_decay': 0.007, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 56 with params: {'learning_rate': 0.004353069464048556, 'weight_decay': 0.007, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4883,1.131044,0.813932,0.777462,0.716456,0.733621
2,0.0223,1.53206,0.799267,0.771795,0.716863,0.728693
3,0.0152,1.509876,0.802933,0.755442,0.7366,0.730658
4,0.0137,1.723907,0.800183,0.76515,0.735941,0.737234
5,0.0067,1.91018,0.793767,0.776508,0.766258,0.747176
6,0.0041,1.91274,0.814849,0.773917,0.762777,0.752053
7,0.0036,2.062488,0.809349,0.771263,0.770377,0.756713
8,0.0033,2.012957,0.810266,0.769353,0.761033,0.745951
9,0.0015,1.987703,0.820348,0.780153,0.772503,0.756531
10,0.0005,2.037271,0.816682,0.776209,0.760524,0.750404


[I 2025-03-23 10:50:42,798] Trial 56 finished with value: 0.7610967700498602 and parameters: {'learning_rate': 0.004353069464048556, 'weight_decay': 0.007, 'warmup_steps': 17}. Best is trial 13 with value: 0.779634660594181.


Trial 57 with params: {'learning_rate': 0.0005415956091573914, 'weight_decay': 0.007, 'warmup_steps': 48}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.266,1.002697,0.765353,0.579606,0.534555,0.538767
2,0.1457,1.268991,0.786434,0.664609,0.656605,0.646964
3,0.0415,1.441624,0.779102,0.731383,0.659457,0.674971
4,0.0194,1.445254,0.791017,0.725381,0.698979,0.697176
5,0.01,1.602303,0.804766,0.738734,0.732425,0.716422
6,0.0062,1.658326,0.800183,0.754511,0.708615,0.711106
7,0.0056,1.576107,0.806599,0.748487,0.718733,0.721042
8,0.0033,1.745488,0.805683,0.758863,0.694378,0.706516
9,0.0033,1.732007,0.806599,0.751736,0.71494,0.717045
10,0.0017,1.857234,0.807516,0.742069,0.717507,0.712672


[I 2025-03-23 10:54:02,494] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0037715393130696977, 'weight_decay': 0.0, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5047,1.152008,0.814849,0.784943,0.738185,0.746607
2,0.024,1.370697,0.806599,0.772043,0.735751,0.732965
3,0.0128,1.431583,0.818515,0.789578,0.770844,0.760352
4,0.0101,1.619916,0.809349,0.790801,0.748092,0.749498
5,0.0055,1.796386,0.813016,0.760372,0.745497,0.732356
6,0.003,1.856862,0.819432,0.803567,0.76259,0.769631
7,0.0039,1.87095,0.812099,0.778242,0.732107,0.735428
8,0.0022,1.900902,0.820348,0.794242,0.74816,0.755677
9,0.0014,1.964376,0.809349,0.752262,0.734457,0.728586
10,0.0014,2.064843,0.80385,0.769835,0.731615,0.733887


[I 2025-03-23 10:58:46,643] Trial 58 finished with value: 0.7374388466785846 and parameters: {'learning_rate': 0.0037715393130696977, 'weight_decay': 0.0, 'warmup_steps': 18}. Best is trial 13 with value: 0.779634660594181.


Trial 59 with params: {'learning_rate': 0.0002471824952041614, 'weight_decay': 0.001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7532,1.239599,0.681943,0.369112,0.305817,0.303324
2,0.5463,1.090577,0.747938,0.580671,0.493943,0.519319
3,0.198,1.150612,0.759853,0.597372,0.547411,0.557485
4,0.0858,1.266178,0.769936,0.671279,0.600211,0.616175
5,0.0421,1.416942,0.775435,0.679753,0.633885,0.639479


[I 2025-03-23 11:00:23,193] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.0011700191952905836, 'weight_decay': 0.003, 'warmup_steps': 52}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8968,1.041215,0.80385,0.746731,0.688578,0.69322
2,0.0474,1.247054,0.791934,0.695471,0.695424,0.685682
3,0.0141,1.391679,0.808433,0.710483,0.732315,0.711353
4,0.0083,1.4431,0.814849,0.790015,0.72481,0.736203
5,0.0059,1.5023,0.809349,0.747607,0.707452,0.715623
6,0.0039,1.620778,0.813016,0.752437,0.709156,0.718758
7,0.0027,1.599041,0.822181,0.797249,0.730559,0.747188
8,0.0014,1.670292,0.819432,0.776837,0.735502,0.739505
9,0.0019,1.61935,0.820348,0.784325,0.739026,0.748928
10,0.0009,1.716976,0.813016,0.778093,0.731638,0.74195


[I 2025-03-23 11:04:48,262] Trial 60 finished with value: 0.7381930091258846 and parameters: {'learning_rate': 0.0011700191952905836, 'weight_decay': 0.003, 'warmup_steps': 52}. Best is trial 13 with value: 0.779634660594181.


Trial 61 with params: {'learning_rate': 0.0027645426758578515, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5872,1.199222,0.80385,0.77347,0.716991,0.729341
2,0.0272,1.230061,0.811182,0.758828,0.759678,0.745225
3,0.0113,1.368437,0.815765,0.775189,0.72475,0.732427
4,0.005,1.460644,0.815765,0.775013,0.729409,0.742671
5,0.0031,1.538176,0.813016,0.754375,0.729917,0.725829


[I 2025-03-23 11:06:12,587] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.003372930008215549, 'weight_decay': 0.006, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5527,1.175596,0.804766,0.803885,0.724125,0.745246
2,0.0242,1.274807,0.829514,0.795203,0.733245,0.74786
3,0.0112,1.586537,0.813016,0.792699,0.721297,0.73326
4,0.0075,1.571286,0.810266,0.783899,0.733578,0.740744
5,0.0039,1.634659,0.815765,0.782162,0.752648,0.752089
6,0.0055,1.7613,0.807516,0.799797,0.764612,0.769104
7,0.0065,1.702921,0.812099,0.770171,0.735492,0.73795
8,0.0026,1.837529,0.817599,0.803446,0.747046,0.753775
9,0.0015,1.789244,0.819432,0.79444,0.752733,0.756764
10,0.0004,1.814325,0.826764,0.791732,0.761889,0.759865


[I 2025-03-23 11:11:02,145] Trial 62 finished with value: 0.757307476143584 and parameters: {'learning_rate': 0.003372930008215549, 'weight_decay': 0.006, 'warmup_steps': 27}. Best is trial 13 with value: 0.779634660594181.


Trial 63 with params: {'learning_rate': 0.0031188700880251234, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5318,1.15128,0.802933,0.761388,0.704574,0.714732
2,0.023,1.355713,0.806599,0.791449,0.752881,0.752564
3,0.0086,1.471722,0.820348,0.796904,0.761114,0.765252
4,0.0081,1.580192,0.802933,0.764126,0.744148,0.738352
5,0.006,1.589945,0.8011,0.752312,0.776386,0.750213
6,0.0043,1.751604,0.814849,0.771003,0.76641,0.749065
7,0.0034,1.776834,0.811182,0.813154,0.768935,0.776932
8,0.0017,1.916103,0.811182,0.805112,0.729917,0.749672
9,0.0007,1.920663,0.819432,0.796766,0.778156,0.770362
10,0.0003,1.9513,0.824015,0.800059,0.75333,0.75818


[I 2025-03-23 11:15:37,103] Trial 63 finished with value: 0.7646580413752853 and parameters: {'learning_rate': 0.0031188700880251234, 'weight_decay': 0.008, 'warmup_steps': 11}. Best is trial 13 with value: 0.779634660594181.


Trial 64 with params: {'learning_rate': 0.002757564439707834, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5555,1.216767,0.790101,0.725469,0.681588,0.682572
2,0.0252,1.282326,0.805683,0.774076,0.763144,0.756946
3,0.01,1.415437,0.811182,0.770643,0.733332,0.737258
4,0.0077,1.463648,0.802016,0.777476,0.738555,0.740671
5,0.0039,1.653231,0.814849,0.800112,0.746133,0.755535
6,0.0037,1.673394,0.804766,0.771402,0.724372,0.72739
7,0.002,1.69736,0.813016,0.761587,0.762637,0.750636
8,0.0011,1.813727,0.815765,0.776523,0.763216,0.754095
9,0.0018,1.818839,0.811182,0.746668,0.750075,0.73665
10,0.002,1.862398,0.805683,0.762616,0.758571,0.747224


[I 2025-03-23 11:20:13,466] Trial 64 finished with value: 0.7403078306934053 and parameters: {'learning_rate': 0.002757564439707834, 'weight_decay': 0.008, 'warmup_steps': 10}. Best is trial 13 with value: 0.779634660594181.


Trial 65 with params: {'learning_rate': 0.0009290880813825519, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9116,1.101243,0.777269,0.647001,0.606246,0.610203
2,0.0678,1.323919,0.789184,0.693948,0.678671,0.671232
3,0.0188,1.450952,0.799267,0.711756,0.674414,0.673346
4,0.0102,1.618023,0.799267,0.753539,0.690098,0.701704
5,0.0063,1.572473,0.800183,0.728679,0.700345,0.695269


[I 2025-03-23 11:21:51,249] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00018223195416198636, 'weight_decay': 0.01, 'warmup_steps': 39}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9917,1.416579,0.635197,0.256039,0.242489,0.225972
2,0.8024,1.101826,0.724106,0.415659,0.419199,0.410177
3,0.3807,1.089176,0.752521,0.569121,0.513676,0.521669
4,0.186,1.183014,0.749771,0.614776,0.548182,0.567217
5,0.0998,1.260585,0.753437,0.638165,0.600644,0.607301
6,0.0583,1.347587,0.752521,0.639935,0.600718,0.601311
7,0.0346,1.45402,0.757104,0.624253,0.607322,0.604769
8,0.0236,1.528239,0.756187,0.628626,0.623267,0.616346
9,0.0166,1.568274,0.756187,0.621657,0.622131,0.613039
10,0.012,1.62342,0.76077,0.640497,0.617225,0.615302


[I 2025-03-23 11:25:02,250] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 0.0018818788890061495, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6373,1.154816,0.792851,0.698814,0.671834,0.67348
2,0.0313,1.271776,0.802016,0.773109,0.700622,0.709385
3,0.0126,1.347043,0.816682,0.805488,0.729697,0.74919
4,0.0078,1.521112,0.807516,0.728708,0.711955,0.706138
5,0.005,1.553158,0.808433,0.719043,0.720654,0.706168


[I 2025-03-23 11:26:30,440] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0027993438371199044, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5404,1.119609,0.806599,0.764161,0.715729,0.725939
2,0.0276,1.231793,0.804766,0.758124,0.743066,0.740042
3,0.0113,1.563568,0.805683,0.774157,0.757849,0.746342
4,0.0064,1.562569,0.80385,0.759483,0.736705,0.730929
5,0.0037,1.532834,0.806599,0.754942,0.757513,0.747705
6,0.0032,1.714067,0.808433,0.765314,0.722783,0.726867
7,0.0055,1.713494,0.800183,0.749395,0.720963,0.725215
8,0.0036,1.628622,0.817599,0.746497,0.769962,0.74878
9,0.0011,1.740757,0.80385,0.724922,0.763842,0.730748
10,0.0006,1.755144,0.811182,0.749366,0.768747,0.746299


[I 2025-03-23 11:31:10,652] Trial 68 finished with value: 0.7418444366331671 and parameters: {'learning_rate': 0.0027993438371199044, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 69 with params: {'learning_rate': 0.0016709698577836338, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6947,1.169829,0.791934,0.68253,0.646758,0.646599
2,0.0316,1.297198,0.7956,0.775291,0.688609,0.706845
3,0.0126,1.515734,0.804766,0.750296,0.724466,0.71935
4,0.0082,1.390714,0.810266,0.791126,0.729492,0.742269
5,0.0046,1.494376,0.804766,0.747949,0.709378,0.714782
6,0.002,1.628044,0.807516,0.789916,0.707992,0.733189
7,0.0049,1.584133,0.814849,0.770991,0.722101,0.73053
8,0.0022,1.714096,0.80385,0.782627,0.719728,0.733192
9,0.0015,1.712901,0.797434,0.749011,0.714188,0.711863
10,0.0011,1.703008,0.799267,0.736855,0.721489,0.712953


[I 2025-03-23 11:34:22,899] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0049092627494412115, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4784,1.10793,0.820348,0.769152,0.738059,0.741612
2,0.0242,1.455427,0.813932,0.776487,0.721582,0.726745
3,0.017,1.566537,0.816682,0.801603,0.752827,0.762323
4,0.0127,1.802661,0.810266,0.800526,0.754422,0.764548
5,0.0083,1.942418,0.805683,0.787227,0.72994,0.742804
6,0.0051,2.071431,0.797434,0.74928,0.754093,0.735174
7,0.0042,2.167802,0.80385,0.747773,0.749105,0.730761
8,0.0033,2.240275,0.805683,0.760944,0.73454,0.733784
9,0.0028,2.319421,0.808433,0.783959,0.745497,0.749986
10,0.001,2.301008,0.813016,0.773458,0.757209,0.751635


[I 2025-03-23 11:39:03,354] Trial 70 finished with value: 0.7507127004186763 and parameters: {'learning_rate': 0.0049092627494412115, 'weight_decay': 0.003, 'warmup_steps': 5}. Best is trial 13 with value: 0.779634660594181.


Trial 71 with params: {'learning_rate': 0.00496296059767105, 'weight_decay': 0.007, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4686,1.294675,0.805683,0.78515,0.734938,0.746812
2,0.0263,1.477303,0.80385,0.733852,0.711396,0.697682
3,0.0138,1.660707,0.80385,0.71056,0.692331,0.68753
4,0.012,1.657015,0.80385,0.710165,0.723416,0.701608
5,0.0062,1.952494,0.807516,0.763991,0.734154,0.732603
6,0.0095,2.111265,0.8011,0.739459,0.731636,0.715083
7,0.0065,2.22744,0.80385,0.747877,0.724474,0.713748
8,0.0045,2.313539,0.805683,0.749041,0.719934,0.715368
9,0.002,2.462088,0.802933,0.721437,0.723904,0.710021
10,0.0018,2.292375,0.808433,0.752508,0.738024,0.73162


[I 2025-03-23 11:42:02,757] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0025151783920388717, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5845,1.179897,0.800183,0.793192,0.692355,0.718949
2,0.0259,1.261829,0.818515,0.733943,0.735351,0.723347
3,0.0111,1.378578,0.813932,0.753951,0.731166,0.728623
4,0.0052,1.458997,0.828598,0.757529,0.758297,0.745974
5,0.0048,1.562424,0.829514,0.781638,0.742807,0.745614
6,0.0039,1.523189,0.828598,0.790921,0.784822,0.778771
7,0.003,1.641895,0.818515,0.773067,0.77136,0.756192
8,0.0019,1.796531,0.824015,0.805805,0.744969,0.761975
9,0.0022,1.736333,0.815765,0.786339,0.746448,0.750952
10,0.0013,1.732137,0.828598,0.77293,0.770752,0.757471


[I 2025-03-23 11:46:24,094] Trial 72 finished with value: 0.7611890116141211 and parameters: {'learning_rate': 0.0025151783920388717, 'weight_decay': 0.006, 'warmup_steps': 13}. Best is trial 13 with value: 0.779634660594181.


Trial 73 with params: {'learning_rate': 0.0018187748036461002, 'weight_decay': 0.005, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6742,1.11252,0.807516,0.713811,0.667289,0.674737
2,0.0315,1.325914,0.802933,0.764977,0.736449,0.733118
3,0.0112,1.444617,0.805683,0.764077,0.7272,0.731665
4,0.0093,1.524935,0.807516,0.784454,0.728124,0.737413
5,0.0047,1.510333,0.812099,0.781217,0.741305,0.742637
6,0.0029,1.690436,0.809349,0.778865,0.696399,0.716922
7,0.0025,1.669828,0.814849,0.761636,0.72923,0.734011
8,0.0015,1.750695,0.805683,0.773294,0.734289,0.733785
9,0.0013,1.803229,0.802016,0.749148,0.708564,0.709381
10,0.0011,1.729207,0.810266,0.763696,0.716386,0.72439


[I 2025-03-23 11:49:21,541] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0024034860130271573, 'weight_decay': 0.006, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5848,1.186096,0.799267,0.788791,0.70311,0.724449
2,0.0274,1.396482,0.807516,0.784602,0.72318,0.7358
3,0.0116,1.482134,0.8011,0.754943,0.717721,0.720111
4,0.0064,1.523942,0.80385,0.768234,0.7452,0.740615
5,0.0058,1.528141,0.811182,0.789278,0.715705,0.729643
6,0.006,1.582404,0.810266,0.760423,0.722807,0.727141
7,0.003,1.671153,0.806599,0.758133,0.722917,0.725635
8,0.0006,1.747307,0.805683,0.777119,0.720267,0.733504
9,0.0008,1.69061,0.818515,0.768033,0.737457,0.739585
10,0.0013,1.688484,0.815765,0.778674,0.729266,0.736257


[I 2025-03-23 11:53:54,379] Trial 74 finished with value: 0.7373535075423672 and parameters: {'learning_rate': 0.0024034860130271573, 'weight_decay': 0.006, 'warmup_steps': 8}. Best is trial 13 with value: 0.779634660594181.


Trial 75 with params: {'learning_rate': 0.004422010877558593, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4694,1.255906,0.804766,0.75522,0.723681,0.725401
2,0.0256,1.342613,0.820348,0.808912,0.764468,0.771382
3,0.0128,1.512272,0.818515,0.760956,0.77792,0.757658
4,0.0061,1.81495,0.802016,0.760529,0.754748,0.739239
5,0.0097,1.756705,0.80385,0.760919,0.729182,0.731156
6,0.0079,2.031118,0.814849,0.772906,0.737496,0.738634
7,0.0049,2.128639,0.806599,0.758464,0.748831,0.74078
8,0.0017,2.181846,0.810266,0.760236,0.747788,0.737349
9,0.0008,2.281986,0.805683,0.749766,0.733195,0.724893
10,0.001,2.183625,0.805683,0.739084,0.750213,0.730873


[I 2025-03-23 11:57:02,813] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.00024402219890752626, 'weight_decay': 0.005, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.764,1.25775,0.668194,0.333927,0.291226,0.287078
2,0.5646,1.05749,0.757104,0.565224,0.513444,0.525914
3,0.2068,1.141331,0.764436,0.596456,0.558027,0.563621
4,0.0888,1.26707,0.765353,0.631767,0.576442,0.58498
5,0.0455,1.378479,0.774519,0.681351,0.621763,0.634682
6,0.0254,1.464264,0.779102,0.666354,0.629764,0.624803
7,0.0152,1.526834,0.773602,0.632821,0.624974,0.612805
8,0.011,1.554708,0.781852,0.643828,0.649615,0.635971
9,0.0084,1.572502,0.779102,0.648292,0.647423,0.63711
10,0.0052,1.711874,0.782768,0.65813,0.646314,0.636426


[I 2025-03-23 12:00:08,255] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.004491724688092141, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.475,1.083874,0.813932,0.769682,0.724021,0.730386
2,0.0241,1.358401,0.804766,0.772622,0.751027,0.742141
3,0.0129,1.535531,0.802933,0.756912,0.745997,0.738433
4,0.0107,1.707307,0.7956,0.733679,0.724321,0.71534
5,0.0091,1.952965,0.79835,0.770711,0.733201,0.735077
6,0.0059,2.00432,0.802933,0.757658,0.733597,0.728231
7,0.0045,2.099762,0.811182,0.787404,0.752271,0.751412
8,0.0021,2.147537,0.806599,0.770561,0.738902,0.734335
9,0.0017,2.149896,0.811182,0.766186,0.751307,0.744181
10,0.0016,2.049294,0.804766,0.757608,0.746112,0.737741


[I 2025-03-23 12:03:22,675] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.0018128339749173702, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6533,1.064533,0.802016,0.741325,0.672419,0.690963
2,0.0327,1.279767,0.7956,0.743524,0.703949,0.708921
3,0.0113,1.323216,0.809349,0.745956,0.705715,0.708656
4,0.0077,1.441897,0.809349,0.797638,0.715951,0.734558
5,0.0054,1.3532,0.820348,0.747098,0.740981,0.730036
6,0.0033,1.504168,0.817599,0.725388,0.724319,0.716422
7,0.0029,1.591303,0.815765,0.756927,0.731942,0.730992
8,0.0018,1.711283,0.810266,0.737552,0.718686,0.711493
9,0.0015,1.663707,0.810266,0.764808,0.747176,0.743168
10,0.0007,1.755438,0.819432,0.772205,0.741199,0.74408


[I 2025-03-23 12:07:55,780] Trial 78 finished with value: 0.7575626226956108 and parameters: {'learning_rate': 0.0018128339749173702, 'weight_decay': 0.0, 'warmup_steps': 4}. Best is trial 13 with value: 0.779634660594181.


Trial 79 with params: {'learning_rate': 0.001897697386667094, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6631,1.074101,0.802933,0.704419,0.658916,0.661704
2,0.0305,1.28437,0.800183,0.781438,0.684365,0.704326
3,0.0112,1.385556,0.808433,0.724065,0.718964,0.709057
4,0.0065,1.451129,0.819432,0.770012,0.718967,0.722718
5,0.0052,1.562389,0.799267,0.744677,0.730118,0.721994
6,0.0034,1.573673,0.815765,0.794765,0.733895,0.747927
7,0.0037,1.65271,0.817599,0.788544,0.701709,0.718651
8,0.0017,1.719537,0.824931,0.779384,0.721645,0.730016
9,0.0006,1.753744,0.824931,0.776337,0.71648,0.729889
10,0.0006,1.748398,0.824015,0.76702,0.70746,0.716678


[I 2025-03-23 12:11:13,319] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0017995533248948395, 'weight_decay': 0.001, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6823,1.047378,0.806599,0.725115,0.695279,0.699593
2,0.0325,1.280012,0.79835,0.762737,0.695155,0.709905
3,0.0112,1.454636,0.797434,0.769718,0.747717,0.743328
4,0.0074,1.510691,0.80385,0.771189,0.699379,0.717885
5,0.004,1.55949,0.809349,0.770064,0.7312,0.734838
6,0.0061,1.58462,0.819432,0.783422,0.724249,0.739154
7,0.0035,1.666021,0.810266,0.777504,0.736688,0.739683
8,0.0019,1.723106,0.809349,0.772608,0.737228,0.737165
9,0.0009,1.834213,0.804766,0.745189,0.729503,0.723786
10,0.001,1.786196,0.809349,0.746374,0.738772,0.729838


[I 2025-03-23 12:14:08,435] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.004566103424056557, 'weight_decay': 0.006, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4687,1.143381,0.810266,0.80386,0.707113,0.733323
2,0.0242,1.336883,0.809349,0.785672,0.751905,0.75137
3,0.0119,1.632978,0.7956,0.749429,0.713938,0.719377
4,0.015,1.719161,0.791934,0.775952,0.721641,0.727858
5,0.0074,1.738061,0.805683,0.778702,0.76263,0.755591
6,0.0042,1.889236,0.811182,0.779023,0.737977,0.74405
7,0.005,2.169382,0.796517,0.711833,0.721236,0.701414
8,0.0062,2.272681,0.792851,0.730965,0.712388,0.708323
9,0.0015,2.278277,0.7956,0.723129,0.73384,0.712939
10,0.0019,2.229947,0.797434,0.729798,0.730059,0.709256


[I 2025-03-23 12:17:26,350] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0029145502405617234, 'weight_decay': 0.007, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5585,1.18628,0.79835,0.76286,0.701914,0.711339
2,0.0246,1.2518,0.805683,0.799771,0.747453,0.757751
3,0.0111,1.3952,0.812099,0.785051,0.738081,0.743054
4,0.007,1.434903,0.817599,0.804401,0.737869,0.752429
5,0.0034,1.529614,0.811182,0.788013,0.747293,0.751389
6,0.0049,1.646721,0.809349,0.780717,0.762823,0.758149
7,0.004,1.740518,0.79835,0.75895,0.740418,0.735777
8,0.0033,1.727938,0.806599,0.775436,0.756706,0.751961
9,0.0014,1.853909,0.808433,0.75757,0.752631,0.745061
10,0.0005,1.911194,0.806599,0.767316,0.755539,0.749195


[I 2025-03-23 12:22:17,073] Trial 82 finished with value: 0.7537482461993733 and parameters: {'learning_rate': 0.0029145502405617234, 'weight_decay': 0.007, 'warmup_steps': 16}. Best is trial 13 with value: 0.779634660594181.


Trial 83 with params: {'learning_rate': 0.0027921473299157467, 'weight_decay': 0.006, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5855,1.149419,0.8011,0.752471,0.714817,0.719911
2,0.0248,1.245442,0.811182,0.785164,0.724817,0.733234
3,0.0109,1.314784,0.813016,0.788628,0.748517,0.754872
4,0.0063,1.448038,0.818515,0.777407,0.741887,0.747329
5,0.0055,1.492662,0.815765,0.789703,0.739668,0.750629
6,0.0036,1.551695,0.824931,0.783884,0.757099,0.756481
7,0.0035,1.725748,0.813932,0.791903,0.746692,0.755839
8,0.0019,1.833142,0.814849,0.783088,0.762962,0.756564
9,0.0012,1.833766,0.821265,0.788625,0.749436,0.754862
10,0.0003,1.849435,0.824015,0.795901,0.750757,0.760548


[I 2025-03-23 12:27:05,263] Trial 83 finished with value: 0.7547528344391294 and parameters: {'learning_rate': 0.0027921473299157467, 'weight_decay': 0.006, 'warmup_steps': 19}. Best is trial 13 with value: 0.779634660594181.


Trial 84 with params: {'learning_rate': 0.0034164075921058103, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5301,1.18146,0.809349,0.780308,0.728433,0.73863
2,0.0256,1.27204,0.809349,0.785765,0.725664,0.734899
3,0.0114,1.391427,0.800183,0.762997,0.731525,0.729647
4,0.0084,1.521795,0.802933,0.772549,0.752276,0.747375
5,0.007,1.637667,0.812099,0.763933,0.744462,0.736566
6,0.0041,1.754935,0.811182,0.795435,0.750294,0.753229
7,0.0031,1.871327,0.802933,0.782334,0.743014,0.744444
8,0.0022,1.909367,0.806599,0.761812,0.725704,0.725322
9,0.0008,2.014921,0.808433,0.793405,0.736743,0.747821
10,0.0004,1.977918,0.812099,0.783838,0.761748,0.75181


[I 2025-03-23 12:31:57,482] Trial 84 finished with value: 0.74301178029523 and parameters: {'learning_rate': 0.0034164075921058103, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20}. Best is trial 13 with value: 0.779634660594181.


Trial 85 with params: {'learning_rate': 0.0040832586556585546, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4711,1.099778,0.811182,0.789095,0.736439,0.747816
2,0.0253,1.319194,0.802016,0.778927,0.753473,0.748182
3,0.0144,1.374607,0.816682,0.792641,0.758267,0.763768
4,0.0111,1.668214,0.8011,0.760383,0.760479,0.754315
5,0.0052,1.73038,0.814849,0.789432,0.758801,0.760058
6,0.0029,1.755827,0.808433,0.751392,0.744614,0.734101
7,0.0033,1.837409,0.814849,0.769977,0.74634,0.741864
8,0.0028,1.952303,0.808433,0.731455,0.740555,0.721321
9,0.0015,1.992267,0.822181,0.789031,0.763514,0.763765
10,0.0023,2.0141,0.813016,0.733859,0.751237,0.730212


[I 2025-03-23 12:35:10,809] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0006921009053376534, 'weight_decay': 0.004, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0914,1.026844,0.769936,0.657481,0.574087,0.593206
2,0.1018,1.296042,0.786434,0.678629,0.659835,0.653517
3,0.0282,1.444117,0.778185,0.693893,0.683191,0.672503
4,0.0152,1.492747,0.785518,0.738882,0.683088,0.696798
5,0.0083,1.444875,0.806599,0.750873,0.731024,0.729359


[I 2025-03-23 12:36:39,314] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.004514328255479344, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4672,1.180782,0.810266,0.755468,0.72637,0.725599
2,0.0262,1.326027,0.813932,0.776315,0.753803,0.750027
3,0.0136,1.550508,0.804766,0.759162,0.733915,0.733095
4,0.0092,1.616881,0.799267,0.74214,0.721596,0.71401
5,0.0117,1.730657,0.816682,0.771888,0.789185,0.765198
6,0.0069,1.971028,0.808433,0.777681,0.7455,0.749203
7,0.0045,2.002271,0.806599,0.774489,0.757069,0.751203
8,0.0016,1.96152,0.816682,0.766791,0.746325,0.745269
9,0.0006,2.074196,0.817599,0.784977,0.7512,0.753418
10,0.0005,2.097481,0.816682,0.78506,0.760195,0.759311


[I 2025-03-23 12:41:30,274] Trial 87 finished with value: 0.7634555502159719 and parameters: {'learning_rate': 0.004514328255479344, 'weight_decay': 0.008, 'warmup_steps': 10}. Best is trial 13 with value: 0.779634660594181.


Trial 88 with params: {'learning_rate': 0.0034092942567819423, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5031,1.180046,0.802933,0.762748,0.706121,0.714949
2,0.0257,1.394494,0.809349,0.790314,0.758857,0.757804
3,0.0122,1.516145,0.805683,0.768211,0.72848,0.733409
4,0.007,1.553963,0.810266,0.763992,0.743226,0.738637
5,0.004,1.671301,0.827681,0.788179,0.773198,0.768959
6,0.0056,1.739674,0.804766,0.764123,0.721473,0.724718
7,0.0046,1.903708,0.806599,0.760894,0.752989,0.744572
8,0.0019,1.951705,0.805683,0.765733,0.728762,0.732827
9,0.002,1.89129,0.807516,0.771616,0.744755,0.742602
10,0.0005,1.974192,0.810266,0.776278,0.742911,0.747701


[I 2025-03-23 12:46:17,349] Trial 88 finished with value: 0.7412234306151723 and parameters: {'learning_rate': 0.0034092942567819423, 'weight_decay': 0.008, 'warmup_steps': 7}. Best is trial 13 with value: 0.779634660594181.


Trial 89 with params: {'learning_rate': 0.0042050800729586685, 'weight_decay': 0.008, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4629,1.126681,0.808433,0.796627,0.756329,0.760405
2,0.0247,1.354624,0.804766,0.780863,0.728877,0.740857
3,0.0135,1.497177,0.809349,0.799119,0.746384,0.753073
4,0.0099,1.649678,0.79835,0.763292,0.707749,0.719588
5,0.0102,1.861531,0.806599,0.803108,0.729798,0.749925
6,0.0054,2.112978,0.800183,0.801588,0.736019,0.750116
7,0.0029,2.036061,0.813016,0.786629,0.742031,0.747905
8,0.0013,2.059587,0.806599,0.808948,0.745459,0.75717
9,0.0014,2.133846,0.806599,0.80975,0.746023,0.764544
10,0.0011,2.190656,0.799267,0.773423,0.718845,0.729356


[I 2025-03-23 12:49:27,328] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.004490769040450828, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4771,1.216132,0.804766,0.76539,0.70304,0.720887
2,0.0257,1.365523,0.788268,0.747026,0.713478,0.716461
3,0.0118,1.450234,0.815765,0.809903,0.770745,0.777196
4,0.0103,1.714232,0.804766,0.771033,0.742795,0.741221
5,0.0051,1.843724,0.813932,0.760029,0.762421,0.745054
6,0.006,1.786366,0.813932,0.790453,0.743314,0.750671
7,0.0062,1.968398,0.813932,0.744987,0.72927,0.72675
8,0.0021,2.033052,0.806599,0.762276,0.757277,0.747679
9,0.0025,2.091722,0.820348,0.778575,0.752118,0.755861
10,0.0008,2.136867,0.822181,0.759263,0.746666,0.745035


[I 2025-03-23 12:54:14,328] Trial 90 finished with value: 0.7520083980670702 and parameters: {'learning_rate': 0.004490769040450828, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}. Best is trial 13 with value: 0.779634660594181.


Trial 91 with params: {'learning_rate': 0.0049360969097463556, 'weight_decay': 0.007, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.476,1.234848,0.799267,0.767919,0.733307,0.737074
2,0.0235,1.470999,0.790101,0.749906,0.731432,0.724414
3,0.0181,1.749858,0.783685,0.735879,0.692002,0.695743
4,0.0103,1.736645,0.8011,0.760749,0.730845,0.732691
5,0.0077,1.806593,0.810266,0.75801,0.741833,0.72969
6,0.0071,1.935298,0.806599,0.766956,0.737215,0.735388
7,0.0038,2.189791,0.802016,0.767272,0.764424,0.749304
8,0.0036,2.057847,0.807516,0.753167,0.741134,0.732052
9,0.0014,2.254854,0.810266,0.767877,0.760616,0.746268
10,0.0012,2.206888,0.814849,0.760446,0.762909,0.749925


[I 2025-03-23 12:59:02,113] Trial 91 finished with value: 0.7515437586697903 and parameters: {'learning_rate': 0.0049360969097463556, 'weight_decay': 0.007, 'warmup_steps': 23}. Best is trial 13 with value: 0.779634660594181.


Trial 92 with params: {'learning_rate': 0.004131315861392035, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4992,1.135446,0.807516,0.794523,0.739672,0.752278
2,0.0236,1.27488,0.812099,0.800548,0.736374,0.749787
3,0.0097,1.524566,0.811182,0.777032,0.743301,0.746694
4,0.0097,1.606405,0.810266,0.776316,0.720369,0.735896
5,0.0078,1.869191,0.805683,0.784812,0.732887,0.743427
6,0.006,1.825462,0.813932,0.793316,0.755651,0.762638
7,0.0053,1.945005,0.817599,0.799866,0.749623,0.76414
8,0.0014,2.065252,0.814849,0.778903,0.738754,0.74926
9,0.002,2.022614,0.819432,0.789081,0.747688,0.758769
10,0.0009,1.99239,0.824015,0.791232,0.773869,0.77109


[I 2025-03-23 13:04:03,151] Trial 92 finished with value: 0.7698049996770794 and parameters: {'learning_rate': 0.004131315861392035, 'weight_decay': 0.008, 'warmup_steps': 16}. Best is trial 13 with value: 0.779634660594181.


Trial 93 with params: {'learning_rate': 0.004325694262904242, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4916,1.168259,0.809349,0.800221,0.734549,0.750093
2,0.0251,1.313081,0.810266,0.775506,0.73694,0.740249
3,0.012,1.532569,0.80385,0.752356,0.731449,0.730786
4,0.0073,1.62732,0.805683,0.756997,0.756134,0.738766
5,0.0083,1.868028,0.809349,0.770965,0.754539,0.747522
6,0.0057,1.944774,0.805683,0.751685,0.732905,0.730692
7,0.0048,1.880001,0.807516,0.749775,0.746524,0.732728
8,0.0008,1.990659,0.804766,0.767652,0.738198,0.737291
9,0.0015,1.932784,0.80385,0.739035,0.745171,0.73181
10,0.0012,1.964881,0.804766,0.745003,0.749055,0.736969


[I 2025-03-23 13:08:39,462] Trial 93 finished with value: 0.7449452129231298 and parameters: {'learning_rate': 0.004325694262904242, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}. Best is trial 13 with value: 0.779634660594181.


Trial 94 with params: {'learning_rate': 0.003094843965306177, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5276,1.138805,0.80385,0.752546,0.690353,0.705239
2,0.0226,1.289175,0.811182,0.751477,0.761734,0.741137
3,0.0118,1.45495,0.804766,0.746374,0.700817,0.709391
4,0.0073,1.51939,0.808433,0.785367,0.748937,0.749189
5,0.0075,1.785997,0.796517,0.758953,0.714462,0.719928


[I 2025-03-23 13:10:24,868] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.001654618769365048, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6708,1.111776,0.793767,0.763657,0.684335,0.705553
2,0.0352,1.241132,0.806599,0.767335,0.731665,0.733125
3,0.0114,1.399836,0.800183,0.751274,0.734407,0.724038
4,0.0089,1.538292,0.800183,0.725064,0.674683,0.678755
5,0.0047,1.619198,0.808433,0.774327,0.712871,0.726529
6,0.0035,1.707302,0.805683,0.78166,0.726422,0.737726
7,0.0046,1.644679,0.794684,0.750734,0.715317,0.715511
8,0.0023,1.703354,0.807516,0.786686,0.728101,0.740337
9,0.001,1.800326,0.819432,0.790954,0.731123,0.74693
10,0.0003,1.774765,0.814849,0.788319,0.747179,0.754733


[I 2025-03-23 13:15:32,123] Trial 95 finished with value: 0.7537987453950556 and parameters: {'learning_rate': 0.001654618769365048, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 96 with params: {'learning_rate': 0.0028728094920982524, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5428,1.111666,0.810266,0.804182,0.749699,0.760111
2,0.0251,1.248207,0.828598,0.797727,0.76793,0.766528
3,0.0102,1.380347,0.830431,0.799556,0.774809,0.774241
4,0.0058,1.588301,0.816682,0.802135,0.779351,0.774831
5,0.0053,1.603869,0.821265,0.807974,0.775876,0.778786
6,0.0074,1.736408,0.822181,0.780422,0.74978,0.75022
7,0.0025,1.788671,0.835014,0.816862,0.779206,0.781995
8,0.0009,1.839437,0.835014,0.816991,0.776979,0.78229
9,0.0012,1.859368,0.828598,0.80186,0.774263,0.774626
10,0.0018,1.986399,0.818515,0.788815,0.770927,0.766266


[I 2025-03-23 13:20:50,463] Trial 96 finished with value: 0.7586761616105865 and parameters: {'learning_rate': 0.0028728094920982524, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 13 with value: 0.779634660594181.


Trial 97 with params: {'learning_rate': 0.003637007894606513, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4816,1.18742,0.808433,0.73911,0.694521,0.70308
2,0.0265,1.396165,0.807516,0.767955,0.740593,0.740004
3,0.011,1.473657,0.810266,0.771838,0.750887,0.746892
4,0.0095,1.561582,0.804766,0.778165,0.724795,0.736159
5,0.0064,1.729407,0.812099,0.783611,0.764698,0.758163
6,0.0048,1.818236,0.8011,0.777779,0.755611,0.75293
7,0.0026,1.979128,0.807516,0.792196,0.760642,0.761383
8,0.0013,1.976364,0.810266,0.798101,0.755201,0.759283
9,0.0017,1.92936,0.809349,0.783031,0.765829,0.760655
10,0.0008,1.930218,0.812099,0.78536,0.773287,0.768097


[I 2025-03-23 13:25:55,232] Trial 97 finished with value: 0.7654289190769522 and parameters: {'learning_rate': 0.003637007894606513, 'weight_decay': 0.001, 'warmup_steps': 0}. Best is trial 13 with value: 0.779634660594181.


Trial 98 with params: {'learning_rate': 0.004770636308895639, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4767,1.315168,0.80385,0.744667,0.716741,0.714557
2,0.0252,1.473081,0.809349,0.795681,0.748368,0.746432
3,0.0142,1.593708,0.807516,0.750576,0.733342,0.72675
4,0.008,1.774083,0.80385,0.762277,0.753775,0.744528
5,0.0086,1.947458,0.812099,0.780302,0.744298,0.74769
6,0.0083,1.917055,0.808433,0.77305,0.740049,0.745413
7,0.0044,1.969574,0.814849,0.758466,0.75104,0.73954
8,0.0021,2.079033,0.810266,0.751126,0.752798,0.740205
9,0.0007,2.119117,0.813932,0.749234,0.748397,0.736214
10,0.0003,2.171443,0.814849,0.760644,0.747102,0.740683


[I 2025-03-23 13:30:47,722] Trial 98 finished with value: 0.7512935782847603 and parameters: {'learning_rate': 0.004770636308895639, 'weight_decay': 0.002, 'warmup_steps': 11}. Best is trial 13 with value: 0.779634660594181.


Trial 99 with params: {'learning_rate': 0.0018381381214642078, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6372,1.155079,0.802016,0.736261,0.680085,0.686384
2,0.0317,1.329831,0.806599,0.765558,0.727103,0.732926
3,0.0129,1.378904,0.808433,0.751316,0.73497,0.727865
4,0.0093,1.474408,0.813016,0.779767,0.715268,0.729932
5,0.0038,1.557519,0.813932,0.770547,0.749312,0.744238
6,0.003,1.669967,0.809349,0.758964,0.719709,0.721934
7,0.004,1.563091,0.817599,0.768101,0.73098,0.733723
8,0.0015,1.757652,0.804766,0.766308,0.701851,0.717196
9,0.0013,1.720255,0.810266,0.774425,0.709699,0.716614
10,0.0007,1.745342,0.821265,0.792696,0.740515,0.74608


[I 2025-03-23 13:35:55,719] Trial 99 finished with value: 0.7435935031513329 and parameters: {'learning_rate': 0.0018381381214642078, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 100 with params: {'learning_rate': 0.002966696183887599, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5194,1.163732,0.809349,0.75839,0.733558,0.73499
2,0.025,1.234952,0.824931,0.829912,0.77363,0.785529
3,0.011,1.332437,0.815765,0.775952,0.743485,0.748761
4,0.0062,1.614516,0.804766,0.779286,0.769368,0.756662
5,0.0061,1.573173,0.823098,0.791173,0.758632,0.761661
6,0.0056,1.706329,0.818515,0.78591,0.752462,0.759404
7,0.0033,1.780064,0.819432,0.804896,0.775716,0.775607
8,0.0017,1.816944,0.820348,0.800864,0.757933,0.766214
9,0.0017,1.798251,0.818515,0.781253,0.742457,0.744204
10,0.0015,1.870608,0.825848,0.783206,0.738245,0.748782


[I 2025-03-23 13:41:17,083] Trial 100 finished with value: 0.7599893593347555 and parameters: {'learning_rate': 0.002966696183887599, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 101 with params: {'learning_rate': 0.0033202459818762564, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.504,1.143926,0.794684,0.767323,0.728001,0.730959
2,0.025,1.349778,0.805683,0.782172,0.715853,0.730276
3,0.0112,1.640084,0.806599,0.773257,0.725661,0.732942
4,0.0084,1.653337,0.809349,0.784988,0.739144,0.745286
5,0.0065,1.683324,0.822181,0.790293,0.748306,0.755354
6,0.0037,1.821291,0.79835,0.762283,0.713645,0.720144
7,0.0038,1.907995,0.813932,0.80644,0.754595,0.764711
8,0.0014,1.960645,0.810266,0.808545,0.748759,0.760388
9,0.0012,1.930545,0.816682,0.793942,0.749341,0.753555
10,0.0003,1.964779,0.815765,0.813507,0.769302,0.776988


[I 2025-03-23 13:46:01,941] Trial 101 finished with value: 0.7695419470824194 and parameters: {'learning_rate': 0.0033202459818762564, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 102 with params: {'learning_rate': 0.003378214763391322, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4949,1.193386,0.807516,0.797591,0.721655,0.744206
2,0.0243,1.322361,0.802016,0.741937,0.742466,0.7252
3,0.0131,1.449284,0.80385,0.791371,0.727805,0.744778
4,0.0087,1.588872,0.817599,0.780095,0.769238,0.764692
5,0.0047,1.653825,0.813932,0.776164,0.753438,0.750725
6,0.0026,1.855582,0.817599,0.787747,0.727501,0.745434
7,0.0019,1.871926,0.815765,0.774722,0.736581,0.741219
8,0.0031,1.980417,0.810266,0.77748,0.709023,0.72004
9,0.0037,1.934192,0.816682,0.775324,0.76531,0.750612
10,0.001,1.935484,0.826764,0.789426,0.759316,0.763639


[I 2025-03-23 13:50:54,225] Trial 102 finished with value: 0.7647937706582241 and parameters: {'learning_rate': 0.003378214763391322, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 103 with params: {'learning_rate': 0.0030077615972653846, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5168,1.10973,0.808433,0.804164,0.771943,0.771408
2,0.023,1.331859,0.808433,0.768255,0.769934,0.756077
3,0.0133,1.293985,0.811182,0.755986,0.770412,0.751074
4,0.0078,1.525706,0.8011,0.752202,0.751827,0.732456
5,0.0053,1.50549,0.813932,0.787774,0.739214,0.751746
6,0.0043,1.691668,0.815765,0.776073,0.738594,0.742743
7,0.0025,1.644791,0.813932,0.766737,0.743289,0.737372
8,0.0014,1.686977,0.828598,0.801999,0.758999,0.766801
9,0.001,1.749454,0.824015,0.762352,0.750234,0.746753
10,0.0007,1.774116,0.826764,0.76846,0.749968,0.749348


[I 2025-03-23 13:55:36,719] Trial 103 finished with value: 0.7482143260584911 and parameters: {'learning_rate': 0.0030077615972653846, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 104 with params: {'learning_rate': 6.119956273045214e-05, 'weight_decay': 0.006, 'warmup_steps': 41}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6387,2.145449,0.455545,0.099237,0.10977,0.086966
2,1.7574,1.733836,0.56187,0.1845,0.177997,0.163653
3,1.3607,1.503837,0.620532,0.283953,0.239355,0.233281
4,1.0832,1.35741,0.653529,0.300127,0.275496,0.272853
5,0.8741,1.278745,0.67461,0.372242,0.340611,0.342396
6,0.7095,1.223959,0.694775,0.428036,0.377079,0.382719
7,0.586,1.207064,0.703025,0.448859,0.399638,0.406172
8,0.4942,1.204795,0.713107,0.48321,0.427829,0.431519
9,0.4239,1.206608,0.71769,0.502927,0.456763,0.461031
10,0.368,1.207552,0.713107,0.469487,0.437828,0.439389


[I 2025-03-23 13:58:53,075] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.0032359402660410573, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5081,1.169317,0.8011,0.800683,0.750257,0.759425
2,0.0274,1.210222,0.813932,0.821045,0.743014,0.763895
3,0.0102,1.431447,0.806599,0.774119,0.717916,0.734066
4,0.0074,1.527717,0.802016,0.738516,0.734878,0.717999
5,0.0057,1.512177,0.824931,0.805785,0.768978,0.771538
6,0.0039,1.791615,0.799267,0.759713,0.730758,0.729317
7,0.0022,1.917941,0.808433,0.805697,0.771007,0.77284
8,0.0026,1.816078,0.813932,0.788606,0.767096,0.760212
9,0.0025,1.88472,0.815765,0.789892,0.761362,0.759277
10,0.0009,1.859285,0.813016,0.783677,0.760077,0.754013


[I 2025-03-23 14:03:47,297] Trial 105 finished with value: 0.7634191247434435 and parameters: {'learning_rate': 0.0032359402660410573, 'weight_decay': 0.001, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 106 with params: {'learning_rate': 0.003404128667410356, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4928,1.204561,0.813016,0.789795,0.708031,0.733103
2,0.0236,1.31086,0.817599,0.7931,0.735516,0.749948
3,0.012,1.463792,0.809349,0.809886,0.760369,0.769134
4,0.0094,1.496958,0.815765,0.767792,0.755433,0.742992
5,0.0078,1.631825,0.812099,0.751056,0.733928,0.731294
6,0.0052,1.675727,0.826764,0.798264,0.740661,0.759378
7,0.0031,1.948285,0.815765,0.785708,0.746243,0.753908
8,0.0018,2.077856,0.813932,0.784893,0.731979,0.740799
9,0.0021,1.970502,0.814849,0.811536,0.762226,0.770065
10,0.0007,2.040566,0.809349,0.803255,0.75749,0.764554


[I 2025-03-23 14:08:19,242] Trial 106 finished with value: 0.761333677440735 and parameters: {'learning_rate': 0.003404128667410356, 'weight_decay': 0.0, 'warmup_steps': 0}. Best is trial 13 with value: 0.779634660594181.


Trial 107 with params: {'learning_rate': 0.004988786638690904, 'weight_decay': 0.001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4727,1.258983,0.8011,0.755187,0.706267,0.719119
2,0.0254,1.489227,0.789184,0.752682,0.687226,0.700074
3,0.0208,1.7068,0.796517,0.716164,0.713602,0.698064
4,0.0104,1.906599,0.805683,0.761312,0.740566,0.738608
5,0.0052,1.995055,0.792851,0.73256,0.735863,0.722686


[I 2025-03-23 14:09:58,939] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0031345051249434043, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5275,1.122039,0.802933,0.781608,0.727436,0.739127
2,0.0227,1.286574,0.817599,0.814857,0.752431,0.763922
3,0.0097,1.311297,0.822181,0.771424,0.755306,0.749161
4,0.0059,1.539173,0.813016,0.759435,0.744772,0.733399
5,0.0089,1.634687,0.815765,0.777669,0.759486,0.751305
6,0.0048,1.694051,0.809349,0.782136,0.739371,0.741887
7,0.0052,1.667398,0.815765,0.789193,0.751028,0.74919
8,0.0019,1.746703,0.813932,0.789817,0.755289,0.756823
9,0.0012,1.795673,0.818515,0.77147,0.757129,0.743736
10,0.0005,1.803216,0.818515,0.753137,0.750901,0.73648


[I 2025-03-23 14:13:17,589] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.00476609484265147, 'weight_decay': 0.0, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4743,1.161639,0.811182,0.789479,0.74387,0.751851
2,0.0268,1.388607,0.810266,0.767965,0.738026,0.742103
3,0.0152,1.507153,0.814849,0.755417,0.743255,0.73694
4,0.0106,1.622116,0.80385,0.721695,0.717367,0.702062
5,0.0084,1.809958,0.80385,0.742458,0.738155,0.73021
6,0.0054,1.998385,0.800183,0.795373,0.741816,0.750756
7,0.0025,2.063232,0.811182,0.746098,0.724018,0.722646
8,0.0011,2.165238,0.809349,0.745212,0.721375,0.721619
9,0.0027,2.265161,0.808433,0.733955,0.739411,0.721384
10,0.0021,2.260518,0.813932,0.765479,0.756754,0.74469


[I 2025-03-23 14:18:22,338] Trial 109 finished with value: 0.7518884711501105 and parameters: {'learning_rate': 0.00476609484265147, 'weight_decay': 0.0, 'warmup_steps': 6}. Best is trial 13 with value: 0.779634660594181.


Trial 110 with params: {'learning_rate': 0.004443318718817076, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4678,1.178115,0.810266,0.77346,0.727967,0.737082
2,0.0261,1.337455,0.807516,0.79843,0.723632,0.743484
3,0.0126,1.689915,0.791017,0.772592,0.727442,0.733611
4,0.0135,1.659189,0.80385,0.792262,0.752752,0.749583
5,0.0061,1.8094,0.807516,0.809845,0.754064,0.769203
6,0.0046,1.846026,0.811182,0.794301,0.757543,0.763277
7,0.0022,1.887875,0.817599,0.791522,0.776054,0.775136
8,0.0029,2.018765,0.816682,0.793454,0.773726,0.771052
9,0.0014,2.123225,0.818515,0.811075,0.784074,0.78466
10,0.0006,2.229062,0.814849,0.801796,0.788154,0.781545


[I 2025-03-23 14:22:57,060] Trial 110 finished with value: 0.7740440335068783 and parameters: {'learning_rate': 0.004443318718817076, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 13 with value: 0.779634660594181.


Trial 111 with params: {'learning_rate': 0.0034601545991787734, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5007,1.167316,0.805683,0.750709,0.720812,0.722153
2,0.0259,1.350183,0.806599,0.731768,0.729644,0.717901
3,0.0123,1.448449,0.806599,0.769074,0.732691,0.736608
4,0.0064,1.598638,0.804766,0.753164,0.733863,0.731813
5,0.0056,1.708547,0.818515,0.81913,0.749045,0.759932
6,0.0042,1.82508,0.819432,0.795648,0.765646,0.770235
7,0.0059,1.891431,0.810266,0.807333,0.763779,0.77405
8,0.0032,1.802207,0.809349,0.774704,0.749892,0.750421
9,0.0016,1.82197,0.816682,0.785953,0.769756,0.764943
10,0.0008,1.888779,0.821265,0.7975,0.765618,0.767233


[I 2025-03-23 14:27:49,243] Trial 111 finished with value: 0.7725431926410201 and parameters: {'learning_rate': 0.0034601545991787734, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 13 with value: 0.779634660594181.


Trial 112 with params: {'learning_rate': 0.002213133037714334, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5917,1.160938,0.804766,0.743994,0.672106,0.689935
2,0.0302,1.224349,0.806599,0.754083,0.7168,0.719836
3,0.0107,1.348918,0.810266,0.754196,0.729365,0.730758
4,0.0058,1.596655,0.804766,0.749606,0.718418,0.7144
5,0.0045,1.684822,0.800183,0.78736,0.718758,0.732606
6,0.0038,1.799438,0.797434,0.740989,0.686411,0.698807
7,0.004,1.758074,0.7956,0.767794,0.708767,0.719568
8,0.0039,1.727443,0.806599,0.746559,0.732906,0.727795
9,0.0017,1.73254,0.810266,0.75057,0.741437,0.737586
10,0.0008,1.786584,0.809349,0.727731,0.716998,0.71203


[I 2025-03-23 14:30:51,824] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.00405991963845122, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4793,1.219766,0.805683,0.767282,0.73811,0.735969
2,0.0265,1.391076,0.802933,0.779172,0.70938,0.725333
3,0.0123,1.618519,0.799267,0.784618,0.690142,0.714253
4,0.0089,1.701037,0.802016,0.758753,0.724792,0.725277
5,0.0066,1.94298,0.814849,0.777144,0.724867,0.736945
6,0.0042,2.081167,0.802933,0.745589,0.72595,0.727643
7,0.0023,2.177679,0.807516,0.763299,0.736417,0.736976
8,0.0027,2.204479,0.799267,0.727389,0.710492,0.702112
9,0.0023,2.137875,0.812099,0.783736,0.740813,0.749021
10,0.0007,2.224526,0.811182,0.777381,0.744674,0.75032


[I 2025-03-23 14:35:38,283] Trial 113 finished with value: 0.7511605529817311 and parameters: {'learning_rate': 0.00405991963845122, 'weight_decay': 0.002, 'warmup_steps': 4}. Best is trial 13 with value: 0.779634660594181.


Trial 114 with params: {'learning_rate': 0.0048920199821464085, 'weight_decay': 0.003, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4787,1.142115,0.816682,0.789162,0.765567,0.766304
2,0.0259,1.481208,0.816682,0.778927,0.75232,0.749492
3,0.015,1.450935,0.814849,0.812511,0.781469,0.783629
4,0.011,1.666548,0.8011,0.727754,0.739101,0.71785
5,0.0091,1.8616,0.814849,0.82027,0.750046,0.765756
6,0.008,1.88286,0.811182,0.794824,0.730335,0.747045
7,0.0067,1.966561,0.807516,0.79501,0.722738,0.739425
8,0.0032,2.05336,0.815765,0.809706,0.753655,0.767047
9,0.0018,2.153552,0.815765,0.819954,0.768653,0.774279
10,0.0007,2.076447,0.821265,0.820274,0.76834,0.777959


[I 2025-03-23 14:40:12,661] Trial 114 finished with value: 0.7766897417077893 and parameters: {'learning_rate': 0.0048920199821464085, 'weight_decay': 0.003, 'warmup_steps': 5}. Best is trial 13 with value: 0.779634660594181.


Trial 115 with params: {'learning_rate': 0.0022783434927615793, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5825,1.19081,0.800183,0.787784,0.706964,0.730787
2,0.0287,1.357854,0.802016,0.740958,0.697316,0.704271
3,0.0104,1.483605,0.802016,0.741777,0.741687,0.725358
4,0.0081,1.486894,0.812099,0.780198,0.762979,0.751578
5,0.0047,1.573455,0.817599,0.788888,0.774955,0.764723
6,0.0044,1.675867,0.799267,0.760462,0.712953,0.720538
7,0.0046,1.597874,0.819432,0.773563,0.758338,0.752015
8,0.002,1.713483,0.820348,0.771386,0.739875,0.740826
9,0.001,1.723929,0.812099,0.757652,0.743994,0.737669
10,0.0003,1.732474,0.825848,0.778853,0.766391,0.764074


[I 2025-03-23 14:45:02,008] Trial 115 finished with value: 0.7594219910089476 and parameters: {'learning_rate': 0.0022783434927615793, 'weight_decay': 0.004, 'warmup_steps': 0}. Best is trial 13 with value: 0.779634660594181.


Trial 116 with params: {'learning_rate': 0.003523221778570819, 'weight_decay': 0.003, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5032,1.141372,0.814849,0.777279,0.736825,0.740803
2,0.0253,1.364297,0.811182,0.771127,0.748034,0.746649
3,0.0106,1.430277,0.821265,0.754616,0.759914,0.746007
4,0.0103,1.573912,0.819432,0.779749,0.758937,0.751352
5,0.0055,1.782744,0.815765,0.765372,0.740552,0.735386
6,0.0034,1.978144,0.823098,0.790424,0.749596,0.751829
7,0.003,1.904149,0.824015,0.796366,0.774943,0.770032
8,0.0025,2.076192,0.823098,0.773141,0.764544,0.755533
9,0.0012,2.05501,0.828598,0.778341,0.766993,0.761304
10,0.0011,2.025795,0.825848,0.780017,0.770808,0.765385


[I 2025-03-23 14:50:02,283] Trial 116 finished with value: 0.7937875646913016 and parameters: {'learning_rate': 0.003523221778570819, 'weight_decay': 0.003, 'warmup_steps': 6}. Best is trial 116 with value: 0.7937875646913016.


Trial 117 with params: {'learning_rate': 0.004786469215576022, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4406,1.181008,0.814849,0.785379,0.744083,0.752462
2,0.0264,1.4143,0.804766,0.799356,0.717364,0.739436
3,0.0134,1.608405,0.810266,0.799179,0.751292,0.754911
4,0.0081,1.778414,0.811182,0.804014,0.7603,0.768327
5,0.0067,1.769411,0.814849,0.776585,0.754865,0.752073
6,0.0044,2.153281,0.810266,0.786834,0.718649,0.737083
7,0.0053,2.165667,0.7956,0.744772,0.729405,0.721731
8,0.0021,2.403399,0.802016,0.745711,0.734808,0.729793
9,0.0022,2.507596,0.799267,0.781732,0.729085,0.736884
10,0.0009,2.478691,0.800183,0.767044,0.732898,0.737722


[I 2025-03-23 14:53:05,531] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0030773312027973882, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5165,1.126014,0.80385,0.791617,0.732197,0.741013
2,0.0254,1.366932,0.808433,0.768248,0.743117,0.737018
3,0.0099,1.432221,0.811182,0.786665,0.74933,0.75692
4,0.0101,1.493425,0.808433,0.784759,0.745049,0.74273
5,0.006,1.544111,0.817599,0.818543,0.778776,0.784152
6,0.0029,1.571597,0.822181,0.807843,0.740026,0.755417
7,0.0026,1.638344,0.815765,0.763086,0.75401,0.745955
8,0.0004,1.718419,0.824931,0.805713,0.763296,0.76739
9,0.0005,1.800386,0.821265,0.78929,0.772976,0.766891
10,0.0008,1.778319,0.813016,0.77658,0.753049,0.745826


[I 2025-03-23 14:56:06,120] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.003052546896065959, 'weight_decay': 0.004, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.54,1.162053,0.79835,0.787144,0.737949,0.749018
2,0.0235,1.262059,0.816682,0.792605,0.722339,0.735503
3,0.0105,1.327821,0.813932,0.769136,0.757829,0.745416
4,0.0073,1.605948,0.793767,0.721077,0.74612,0.716963
5,0.0059,1.801293,0.809349,0.752231,0.749802,0.737088
6,0.0037,1.826989,0.805683,0.752964,0.72782,0.726477
7,0.0029,1.891252,0.812099,0.773287,0.747288,0.740525
8,0.0022,1.856559,0.807516,0.762999,0.736158,0.735149
9,0.0006,1.915536,0.813016,0.77707,0.75414,0.746915
10,0.0007,1.993113,0.813016,0.758776,0.730171,0.729503


[I 2025-03-23 14:59:30,209] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.003693594959986811, 'weight_decay': 0.003, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5064,1.179776,0.813016,0.777281,0.711319,0.730013
2,0.0239,1.421684,0.80385,0.771898,0.70595,0.719917
3,0.0124,1.423801,0.811182,0.75048,0.733711,0.729361
4,0.0076,1.629113,0.7956,0.743741,0.715327,0.717799
5,0.0068,1.848756,0.792851,0.769419,0.734314,0.726467


[I 2025-03-23 15:01:15,848] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.004319585126398173, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4851,1.171208,0.805683,0.789976,0.732484,0.747227
2,0.025,1.436698,0.807516,0.798802,0.742742,0.755857
3,0.0143,1.607347,0.815765,0.784305,0.74228,0.746805
4,0.0101,1.604859,0.816682,0.803302,0.752363,0.765155
5,0.0077,1.884219,0.802933,0.796569,0.761738,0.766539
6,0.0074,2.099085,0.791017,0.762778,0.719107,0.727321
7,0.0053,2.169357,0.796517,0.765544,0.74339,0.738726
8,0.003,2.191521,0.807516,0.770201,0.735822,0.736705
9,0.001,2.371378,0.793767,0.773856,0.745485,0.74141
10,0.001,2.440258,0.8011,0.764551,0.735792,0.733142


[I 2025-03-23 15:04:09,121] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.003689368618153253, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4799,1.265684,0.7956,0.776346,0.697099,0.717616
2,0.0259,1.336637,0.811182,0.779787,0.740116,0.744405
3,0.0103,1.383748,0.812099,0.763607,0.763084,0.748253
4,0.0105,1.600257,0.8011,0.770657,0.749722,0.746868
5,0.0064,1.745876,0.813016,0.804037,0.761524,0.766332
6,0.0047,1.748494,0.810266,0.767924,0.752369,0.74993
7,0.0028,2.062273,0.80385,0.781977,0.748289,0.751136
8,0.0013,2.107751,0.809349,0.770085,0.76409,0.756616
9,0.0015,2.011588,0.804766,0.764635,0.754015,0.744625
10,0.0006,2.098513,0.809349,0.769761,0.761175,0.755133


[I 2025-03-23 15:08:49,313] Trial 122 finished with value: 0.7564262438431695 and parameters: {'learning_rate': 0.003689368618153253, 'weight_decay': 0.0, 'warmup_steps': 0}. Best is trial 116 with value: 0.7937875646913016.


Trial 123 with params: {'learning_rate': 0.0033674148180455844, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5319,1.215648,0.797434,0.748096,0.730894,0.725416
2,0.0256,1.35338,0.80385,0.764457,0.746723,0.738326
3,0.0117,1.545908,0.815765,0.753693,0.740413,0.726237
4,0.0065,1.581519,0.809349,0.758213,0.760274,0.747687
5,0.0044,1.682598,0.814849,0.788082,0.762786,0.762402
6,0.0048,1.752624,0.809349,0.778108,0.743403,0.747969
7,0.0024,1.913144,0.794684,0.775168,0.756985,0.748881
8,0.0013,1.994002,0.809349,0.76451,0.734646,0.735662
9,0.0012,1.910923,0.813016,0.75732,0.751759,0.74166
10,0.0007,2.035961,0.808433,0.778055,0.725442,0.736939


[I 2025-03-23 15:11:54,433] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0034387562177036458, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4876,1.221378,0.804766,0.746893,0.718875,0.717654
2,0.0238,1.470063,0.79835,0.765811,0.708821,0.712714
3,0.0141,1.465336,0.805683,0.755519,0.736796,0.730425
4,0.0076,1.386344,0.818515,0.750846,0.751859,0.735477
5,0.0049,1.498176,0.814849,0.754998,0.74507,0.739876
6,0.0032,1.690849,0.813016,0.769543,0.748086,0.744923
7,0.0035,1.80774,0.813016,0.760023,0.739525,0.732246
8,0.0022,1.76376,0.814849,0.783825,0.750102,0.750284
9,0.0031,1.988559,0.800183,0.781766,0.725614,0.736745
10,0.0012,2.049634,0.809349,0.766298,0.740419,0.739772


[I 2025-03-23 15:14:52,891] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.00023456763594861504, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7724,1.294142,0.663611,0.303391,0.282328,0.277581
2,0.6107,1.080897,0.745188,0.5239,0.472693,0.487327
3,0.2335,1.154612,0.757104,0.601307,0.556027,0.561056
4,0.1007,1.256484,0.768103,0.659379,0.589855,0.604834
5,0.0512,1.413933,0.758937,0.646217,0.592965,0.603596
6,0.0301,1.470436,0.76077,0.672321,0.641492,0.637604
7,0.0182,1.593524,0.769936,0.657835,0.629892,0.62904
8,0.0117,1.598156,0.766269,0.646092,0.648186,0.630922
9,0.0082,1.606295,0.769936,0.637761,0.666283,0.639806
10,0.0061,1.712922,0.775435,0.660116,0.661857,0.646272


[I 2025-03-23 15:18:22,943] Trial 125 pruned. 


Trial 126 with params: {'learning_rate': 0.0042937838024760784, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4782,1.385594,0.802016,0.809059,0.71008,0.739424
2,0.026,1.390832,0.80385,0.747709,0.74564,0.730411
3,0.0129,1.470949,0.807516,0.759661,0.7639,0.747822
4,0.0091,1.732487,0.813016,0.76486,0.737707,0.73918
5,0.006,1.806373,0.806599,0.766278,0.723478,0.730165


[I 2025-03-23 15:19:49,110] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.0019748842068581015, 'weight_decay': 0.002, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6372,1.121768,0.804766,0.777916,0.714458,0.730072
2,0.0293,1.317918,0.805683,0.786906,0.724493,0.740381
3,0.0102,1.435829,0.807516,0.790369,0.776045,0.768101
4,0.0081,1.425058,0.807516,0.760836,0.719388,0.720751
5,0.0051,1.491168,0.824015,0.81103,0.762043,0.7689
6,0.0066,1.570757,0.800183,0.756885,0.715186,0.718327
7,0.0026,1.649737,0.817599,0.769648,0.726098,0.732061
8,0.0011,1.661948,0.820348,0.760511,0.740951,0.737013
9,0.0012,1.696222,0.818515,0.767893,0.731861,0.738785
10,0.0008,1.741814,0.818515,0.778133,0.737218,0.742565


[I 2025-03-23 15:24:49,558] Trial 127 finished with value: 0.7459979537931674 and parameters: {'learning_rate': 0.0019748842068581015, 'weight_decay': 0.002, 'warmup_steps': 9}. Best is trial 116 with value: 0.7937875646913016.


Trial 128 with params: {'learning_rate': 0.004880044745033502, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5069,1.263137,0.80385,0.786361,0.695971,0.717971
2,0.0287,1.371228,0.804766,0.795891,0.765875,0.766671
3,0.0135,1.559436,0.813932,0.798515,0.757151,0.765635
4,0.0098,1.767983,0.800183,0.762939,0.740396,0.730316
5,0.0095,1.892501,0.820348,0.804781,0.765162,0.769172
6,0.0103,1.907278,0.800183,0.765934,0.748368,0.743917
7,0.0053,2.107246,0.802016,0.767325,0.743213,0.741414
8,0.0028,2.076214,0.809349,0.745701,0.742802,0.732551
9,0.0015,2.124664,0.799267,0.751639,0.757789,0.744245
10,0.0012,2.277431,0.79835,0.749777,0.72773,0.72846


[I 2025-03-23 15:28:13,820] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.001948861261427512, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6256,1.162205,0.794684,0.758412,0.690788,0.703407
2,0.0311,1.296372,0.813016,0.736178,0.727219,0.716349
3,0.0131,1.375831,0.808433,0.742915,0.741446,0.729241
4,0.0059,1.48578,0.806599,0.781244,0.730862,0.73834
5,0.0039,1.613406,0.787351,0.730611,0.730834,0.719702


[I 2025-03-23 15:29:57,898] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0008027687508333916, 'weight_decay': 0.01, 'warmup_steps': 36}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0193,1.028442,0.780935,0.648935,0.615012,0.614335
2,0.0764,1.299293,0.791934,0.769051,0.699218,0.712743
3,0.0237,1.467385,0.788268,0.715311,0.666507,0.668789
4,0.0111,1.450518,0.80385,0.769562,0.723761,0.727795
5,0.0076,1.521905,0.809349,0.736131,0.710293,0.702637


[I 2025-03-23 15:31:40,133] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.004991816860557416, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4515,1.184793,0.811182,0.800881,0.755736,0.763889
2,0.0246,1.424854,0.802016,0.761185,0.711578,0.713178
3,0.0157,1.527683,0.813016,0.784877,0.723779,0.740157
4,0.0123,1.786299,0.822181,0.774923,0.749248,0.748871
5,0.0096,1.901493,0.818515,0.811024,0.743679,0.758991
6,0.0059,2.035082,0.814849,0.785178,0.735259,0.747354
7,0.0032,2.18574,0.804766,0.758656,0.724681,0.727743
8,0.0055,2.127706,0.805683,0.76872,0.743154,0.739727
9,0.0026,2.330597,0.804766,0.761113,0.744262,0.736377
10,0.0009,2.329532,0.814849,0.784571,0.765503,0.757078


[I 2025-03-23 15:37:14,184] Trial 131 finished with value: 0.7500083458513422 and parameters: {'learning_rate': 0.004991816860557416, 'weight_decay': 0.004, 'warmup_steps': 4}. Best is trial 116 with value: 0.7937875646913016.


Trial 132 with params: {'learning_rate': 0.0027816829009973763, 'weight_decay': 0.002, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5532,1.169954,0.796517,0.770063,0.698193,0.715045
2,0.0244,1.338204,0.802933,0.727986,0.673026,0.6824
3,0.0113,1.384179,0.815765,0.752322,0.725808,0.723948
4,0.005,1.566321,0.810266,0.769412,0.722246,0.728439
5,0.0051,1.68729,0.80385,0.749673,0.717564,0.715172


[I 2025-03-23 15:38:55,441] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 8.153014791034117e-05, 'weight_decay': 0.0, 'warmup_steps': 40}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4809,1.955027,0.511457,0.10751,0.135638,0.109381
2,1.5186,1.539018,0.613199,0.275866,0.231178,0.220464
3,1.0891,1.324631,0.660862,0.327364,0.288682,0.284186
4,0.799,1.223699,0.689276,0.40877,0.337141,0.345971
5,0.5915,1.171001,0.715857,0.449155,0.413532,0.416714
6,0.4462,1.153399,0.71769,0.466407,0.4355,0.437534
7,0.3404,1.175797,0.725023,0.47695,0.447694,0.451384
8,0.2697,1.200045,0.736022,0.575863,0.507342,0.520118
9,0.2164,1.22918,0.732356,0.572388,0.510038,0.521313
10,0.1777,1.253211,0.735105,0.566796,0.522709,0.529797


[I 2025-03-23 15:42:27,713] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 9.351316075703443e-05, 'weight_decay': 0.01, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3818,1.862322,0.531622,0.124564,0.14685,0.122285
2,1.4008,1.448575,0.630614,0.261248,0.249319,0.237664
3,0.9601,1.252557,0.683776,0.36527,0.325368,0.325146
4,0.6674,1.169847,0.711274,0.407378,0.383532,0.38648
5,0.4733,1.149225,0.72044,0.474472,0.439487,0.442078


[I 2025-03-23 15:44:22,023] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.004427647143845693, 'weight_decay': 0.002, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4932,1.288219,0.792851,0.749186,0.695748,0.709927
2,0.0252,1.3839,0.802016,0.760234,0.743585,0.729587
3,0.0118,1.649871,0.799267,0.796722,0.734859,0.749078
4,0.0082,1.75617,0.807516,0.791938,0.741206,0.749151
5,0.0115,2.03597,0.791934,0.767933,0.727914,0.732334
6,0.0077,1.962838,0.809349,0.783831,0.772267,0.765653
7,0.0041,2.213513,0.805683,0.775601,0.753918,0.750965
8,0.0026,2.263817,0.805683,0.746211,0.759219,0.736514
9,0.0013,2.186975,0.808433,0.775069,0.755309,0.747411
10,0.0007,2.209644,0.815765,0.769343,0.765984,0.752596


[I 2025-03-23 15:50:05,819] Trial 135 finished with value: 0.752466646488279 and parameters: {'learning_rate': 0.004427647143845693, 'weight_decay': 0.002, 'warmup_steps': 12}. Best is trial 116 with value: 0.7937875646913016.


Trial 136 with params: {'learning_rate': 0.00482307088151126, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4569,1.219745,0.804766,0.772948,0.740448,0.743394
2,0.0275,1.502836,0.810266,0.813951,0.719363,0.749039
3,0.013,1.475395,0.808433,0.775545,0.745391,0.745183
4,0.0144,1.863708,0.805683,0.783942,0.736342,0.74768
5,0.0092,1.759415,0.807516,0.785395,0.730511,0.737879
6,0.0051,2.04018,0.806599,0.807274,0.762568,0.765584
7,0.0031,2.330248,0.804766,0.788029,0.751099,0.749273
8,0.0024,2.422564,0.810266,0.771203,0.745308,0.746097
9,0.0024,2.353822,0.799267,0.767953,0.762277,0.751758
10,0.0017,2.458487,0.806599,0.794847,0.758598,0.758361


[I 2025-03-23 15:55:58,362] Trial 136 finished with value: 0.7676020586053011 and parameters: {'learning_rate': 0.00482307088151126, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 116 with value: 0.7937875646913016.


Trial 137 with params: {'learning_rate': 0.004950854201942154, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4541,1.210709,0.813932,0.778713,0.728649,0.731809
2,0.0269,1.339861,0.818515,0.828312,0.757714,0.776295
3,0.0132,1.5275,0.797434,0.788532,0.736869,0.74548
4,0.0142,1.591769,0.802016,0.807474,0.75179,0.759274
5,0.0097,1.804937,0.80385,0.813394,0.733519,0.754075
6,0.006,1.865645,0.809349,0.800592,0.749052,0.760062
7,0.0027,2.22299,0.802933,0.77501,0.735652,0.740623
8,0.0018,2.223391,0.799267,0.8016,0.740717,0.756539
9,0.0013,2.212561,0.813016,0.774187,0.733706,0.742004
10,0.0013,2.394407,0.808433,0.784812,0.748407,0.751606


[I 2025-03-23 16:01:32,798] Trial 137 finished with value: 0.7524818989219766 and parameters: {'learning_rate': 0.004950854201942154, 'weight_decay': 0.001, 'warmup_steps': 0}. Best is trial 116 with value: 0.7937875646913016.


Trial 138 with params: {'learning_rate': 0.00013086161901401913, 'weight_decay': 0.004, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1583,1.65209,0.586618,0.207373,0.192799,0.176022
2,1.1191,1.269618,0.665445,0.326213,0.313073,0.308271
3,0.6593,1.140983,0.714024,0.449276,0.399129,0.406777
4,0.394,1.142068,0.728689,0.514623,0.453021,0.466275
5,0.2427,1.174876,0.741522,0.643415,0.509221,0.542399
6,0.1551,1.217062,0.750687,0.597386,0.533381,0.542629
7,0.0992,1.2874,0.745188,0.579573,0.551633,0.550885
8,0.0699,1.334429,0.742438,0.600169,0.577494,0.577343
9,0.0507,1.401385,0.748854,0.598715,0.60571,0.593673
10,0.037,1.435294,0.756187,0.625623,0.613472,0.609413


[I 2025-03-23 16:05:13,471] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.004899076388857493, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4639,1.248749,0.809349,0.820593,0.748953,0.765215
2,0.0283,1.469979,0.802016,0.775291,0.746209,0.745969
3,0.0146,1.490615,0.799267,0.773663,0.740329,0.742263
4,0.0106,1.68958,0.800183,0.791938,0.744493,0.747
5,0.01,1.904507,0.805683,0.795344,0.741092,0.748701
6,0.008,2.021977,0.802933,0.763948,0.731774,0.733501
7,0.0057,2.102346,0.797434,0.743786,0.73213,0.727402
8,0.0033,2.166515,0.799267,0.771399,0.726098,0.738253
9,0.0022,2.298526,0.802016,0.776444,0.734457,0.740539
10,0.0007,2.339888,0.791934,0.754478,0.711437,0.721324


[I 2025-03-23 16:09:08,528] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0005890956406546818, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1414,1.022663,0.764436,0.589853,0.526544,0.534411
2,0.1355,1.303725,0.784601,0.69862,0.670793,0.667598
3,0.0384,1.496082,0.770852,0.643478,0.640694,0.630165
4,0.0175,1.479848,0.789184,0.691178,0.663751,0.661324
5,0.0093,1.593944,0.793767,0.717136,0.677621,0.680853
6,0.0056,1.66775,0.792851,0.723656,0.682287,0.689135
7,0.005,1.590292,0.805683,0.728542,0.696524,0.693732
8,0.0032,1.766319,0.787351,0.718466,0.673542,0.680476
9,0.0033,1.777365,0.794684,0.727846,0.684982,0.691921
10,0.0019,1.822505,0.8011,0.746696,0.678688,0.692309


[I 2025-03-23 16:13:09,054] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.004392098032029998, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4806,1.269282,0.807516,0.791343,0.714952,0.735652
2,0.027,1.292167,0.818515,0.787566,0.733606,0.745724
3,0.0134,1.440746,0.809349,0.802006,0.732984,0.749452
4,0.0089,1.615981,0.813932,0.748969,0.772334,0.739753
5,0.0084,1.758935,0.802933,0.775529,0.743489,0.741948
6,0.0051,1.779253,0.8011,0.740077,0.744486,0.730117
7,0.0028,2.002877,0.809349,0.781783,0.750163,0.749826
8,0.0009,2.058925,0.812099,0.775835,0.742497,0.743528
9,0.0037,2.176723,0.808433,0.786182,0.76853,0.763291
10,0.0017,2.371676,0.808433,0.79738,0.754836,0.759302


[I 2025-03-23 16:19:02,505] Trial 141 finished with value: 0.7629584315449236 and parameters: {'learning_rate': 0.004392098032029998, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 116 with value: 0.7937875646913016.


Trial 142 with params: {'learning_rate': 0.0025643195563959742, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5791,1.205216,0.80385,0.788243,0.697362,0.722698
2,0.027,1.375453,0.8011,0.756479,0.715025,0.717482
3,0.0111,1.40537,0.813016,0.788823,0.755837,0.754247
4,0.0057,1.483147,0.816682,0.77865,0.764443,0.755755
5,0.004,1.688812,0.808433,0.752942,0.732499,0.728646
6,0.0055,1.730297,0.802016,0.786064,0.706016,0.726474
7,0.0039,1.740054,0.811182,0.746765,0.728203,0.718542
8,0.0008,1.847494,0.815765,0.762386,0.724723,0.728343
9,0.0008,1.833796,0.816682,0.789954,0.736586,0.745939
10,0.0005,1.881188,0.814849,0.791752,0.733353,0.745993


[I 2025-03-23 16:24:53,445] Trial 142 finished with value: 0.7408823032477889 and parameters: {'learning_rate': 0.0025643195563959742, 'weight_decay': 0.0, 'warmup_steps': 7}. Best is trial 116 with value: 0.7937875646913016.


Trial 143 with params: {'learning_rate': 0.0014920286077545577, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7089,1.091645,0.791934,0.758308,0.670779,0.696886
2,0.0381,1.336074,0.7956,0.717154,0.694865,0.694332
3,0.0139,1.391886,0.799267,0.741662,0.694065,0.694484
4,0.0086,1.332684,0.804766,0.744561,0.715262,0.712603
5,0.0049,1.447489,0.813016,0.748202,0.740987,0.731625


[I 2025-03-23 16:26:53,765] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.004902407746704626, 'weight_decay': 0.003, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4498,1.172976,0.818515,0.775387,0.734724,0.744408
2,0.026,1.41245,0.806599,0.763325,0.743986,0.739271
3,0.0167,1.521426,0.807516,0.791612,0.748452,0.75204
4,0.0099,1.733302,0.792851,0.755561,0.732285,0.731252
5,0.0102,1.93117,0.802016,0.771804,0.733277,0.736695
6,0.0077,2.091787,0.784601,0.728616,0.699295,0.68874
7,0.0067,2.117998,0.794684,0.733955,0.717558,0.712633
8,0.0036,2.18032,0.796517,0.767528,0.712149,0.724209
9,0.0013,2.255899,0.802933,0.752143,0.734905,0.733124
10,0.0007,2.299518,0.813016,0.765973,0.7431,0.743956


[I 2025-03-23 16:32:44,298] Trial 144 finished with value: 0.74770341806721 and parameters: {'learning_rate': 0.004902407746704626, 'weight_decay': 0.003, 'warmup_steps': 7}. Best is trial 116 with value: 0.7937875646913016.


Trial 145 with params: {'learning_rate': 0.0002766308519922785, 'weight_decay': 0.002, 'warmup_steps': 51}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7159,1.184922,0.699358,0.397336,0.356212,0.3542
2,0.4546,1.075888,0.751604,0.595292,0.532049,0.542659
3,0.1498,1.172501,0.767186,0.643381,0.592423,0.60113
4,0.0587,1.314366,0.777269,0.670063,0.637129,0.640859
5,0.0311,1.448665,0.780935,0.688425,0.638522,0.645549


[I 2025-03-23 16:34:45,662] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.001853709869121236, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6706,1.091149,0.809349,0.743967,0.699834,0.70769
2,0.03,1.432319,0.780018,0.751502,0.68557,0.69039
3,0.0125,1.288223,0.808433,0.745263,0.724024,0.722848
4,0.0065,1.50542,0.809349,0.786973,0.723801,0.735866
5,0.006,1.521161,0.804766,0.769167,0.729681,0.735077
6,0.0031,1.631243,0.809349,0.770849,0.711288,0.724154
7,0.0028,1.714686,0.813932,0.787848,0.734513,0.741625
8,0.0028,1.722634,0.805683,0.788325,0.716225,0.728948
9,0.001,1.762076,0.815765,0.79609,0.739977,0.749733
10,0.0009,1.793142,0.812099,0.765129,0.737824,0.732848


[I 2025-03-23 16:38:52,615] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.004115075722360922, 'weight_decay': 0.0, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5025,1.239849,0.802933,0.761524,0.720523,0.721486
2,0.0249,1.47096,0.804766,0.809696,0.742392,0.757531
3,0.012,1.602412,0.80385,0.790613,0.735817,0.746084
4,0.0095,1.717075,0.797434,0.781743,0.729971,0.734829
5,0.0085,1.859591,0.808433,0.792616,0.746714,0.752968
6,0.0036,2.020965,0.805683,0.765871,0.728614,0.732013
7,0.0033,2.052903,0.811182,0.777107,0.750674,0.746519
8,0.0008,2.108755,0.80385,0.78787,0.752407,0.756045
9,0.0013,2.144482,0.8011,0.743464,0.749448,0.734592
10,0.0005,2.162688,0.814849,0.771764,0.735159,0.738734


[I 2025-03-23 16:42:45,703] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.00224473743989911, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5889,1.203988,0.810266,0.765514,0.712079,0.718195
2,0.0294,1.356908,0.806599,0.750163,0.688214,0.700871
3,0.0116,1.384105,0.816682,0.78705,0.74745,0.748715
4,0.0079,1.442833,0.813016,0.795194,0.739312,0.751858
5,0.005,1.449193,0.816682,0.774763,0.743851,0.746819
6,0.0028,1.518874,0.822181,0.781815,0.75546,0.757788
7,0.003,1.605371,0.812099,0.775318,0.751073,0.750486
8,0.002,1.630836,0.819432,0.78377,0.737719,0.750147
9,0.0022,1.590193,0.816682,0.787483,0.739817,0.752582
10,0.0004,1.710855,0.819432,0.785137,0.758758,0.760907


[I 2025-03-23 16:48:23,127] Trial 148 finished with value: 0.7586174858058865 and parameters: {'learning_rate': 0.00224473743989911, 'weight_decay': 0.001, 'warmup_steps': 0}. Best is trial 116 with value: 0.7937875646913016.


Trial 149 with params: {'learning_rate': 0.003937875022867262, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4828,1.15724,0.806599,0.772973,0.719817,0.733405
2,0.0247,1.310796,0.805683,0.773378,0.739149,0.740696
3,0.0133,1.542389,0.814849,0.799324,0.753782,0.763041
4,0.008,1.679979,0.812099,0.782599,0.744096,0.749502
5,0.0072,1.604445,0.813016,0.786253,0.743238,0.75006
6,0.0058,1.780284,0.813932,0.766107,0.756718,0.752112
7,0.0035,1.943498,0.812099,0.808192,0.748132,0.762382
8,0.0022,1.965499,0.822181,0.809607,0.762409,0.776422
9,0.0024,1.989234,0.818515,0.787282,0.770648,0.766751
10,0.0006,2.034948,0.819432,0.775175,0.746415,0.748304


[I 2025-03-23 16:54:01,050] Trial 149 finished with value: 0.7585518183584314 and parameters: {'learning_rate': 0.003937875022867262, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 116 with value: 0.7937875646913016.


In [40]:
print(best_trial_normal_aug)

BestRun(run_id='116', objective=0.7937875646913016, hyperparameters={'learning_rate': 0.003523221778570819, 'weight_decay': 0.003, 'warmup_steps': 6}, run_summary=None)
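The printed `BestRun` is a named tuple, so its fields can be read by name. The sketch below uses a stand-in `BestRun` type (hypothetical, defined here only so the snippet runs without `transformers`) filled with the values printed above. Copying the hyperparameters onto an arguments object with `setattr` is an assumption about how the result might be reused for a final training run; the helper `apply_hyperparameters` is not part of this notebook.

```python
from types import SimpleNamespace
from typing import Any, NamedTuple, Optional


class BestRun(NamedTuple):
    """Stand-in with the same fields as the object printed above."""
    run_id: str
    objective: float
    hyperparameters: dict
    run_summary: Optional[Any] = None


# Values copied from the printed result of the search.
best = BestRun(
    run_id="116",
    objective=0.7937875646913016,
    hyperparameters={
        "learning_rate": 0.003523221778570819,
        "weight_decay": 0.003,
        "warmup_steps": 6,
    },
)


def apply_hyperparameters(args, hyperparameters):
    """Hypothetical helper: copy each tuned value onto an arguments object."""
    for name, value in hyperparameters.items():
        setattr(args, name, value)
    return args


# SimpleNamespace stands in for a real training-arguments object.
args = apply_hyperparameters(SimpleNamespace(), best.hyperparameters)
print(best.run_id, round(best.objective, 4), args.learning_rate)
```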


In [41]:
base.reset_seed()

## Search with distillation on the augmented dataset
Configuration of the individual training runs.

In [42]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd-aug_fine_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd-aug_fine_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters (`lambda_param` and `temperature`).

In [43]:
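def hp_space(trial):
    params = {
        # Learning rate is sampled log-uniformly between 5e-5 and 5e-3.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Weight decay is sampled on a grid from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # warm_up (the upper bound) is defined earlier in the notebook.
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation hyperparameters consumed by base.DistilTrainer.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params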
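To make the role of the two distillation hyperparameters concrete, here is a minimal pure-Python sketch of a common knowledge-distillation loss for a single example. It is not the code of `base.DistilTrainer`, which is not shown here. In particular, the assumption that `lambda_param` weights the soft (teacher) term and `1 - lambda_param` the hard-label term, and the `temperature ** 2` scaling, follow the usual formulation and may differ from the notebook's implementation. The function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      lambda_param, temperature):
    """Weighted sum of hard-label cross-entropy and a softened KL term.

    Assumption: lambda_param = 0 means training on labels only,
    lambda_param = 1 means matching the teacher only.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    # The temperature ** 2 factor keeps the soft-term gradients comparable in scale.
    return (1 - lambda_param) * hard + lambda_param * temperature ** 2 * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.0, -2.0]
print(distillation_loss(student, teacher, label=0,
                        lambda_param=0.4, temperature=3.0))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative preferences among the incorrect classes.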

Optuna configuration.

In [44]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
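As a rough illustration of what `min_resource`, `max_resource` and `reduction_factor` control, the sketch below enumerates the Hyperband brackets and the steps at which each bracket would consider pruning. It assumes the usual bracket count of floor(log_eta(max_resource / min_resource)) + 1 and a successive-halving schedule within each bracket. The values 1 and 10 are hypothetical, since `min_r` and `max_r` are defined earlier in the notebook, and `hyperband_schedule` is an illustrative helper, not an Optuna function.

```python
def hyperband_schedule(min_resource, max_resource, reduction_factor):
    """Return, per bracket, the steps at which trials may be pruned."""
    # Count brackets with integer arithmetic to avoid floating-point log issues.
    n_brackets = 0
    resource = min_resource
    while resource <= max_resource:
        n_brackets += 1
        resource *= reduction_factor

    schedule = []
    for bracket in range(n_brackets):
        # Later brackets start pruning later (a higher early-stopping rate).
        step = min_resource * reduction_factor ** bracket
        rungs = []
        while step <= max_resource:
            rungs.append(step)
            step *= reduction_factor
        schedule.append(rungs)
    return schedule


# Hypothetical resources: 1 to 10 evaluation steps, halving at each rung.
for bracket, rungs in enumerate(hyperband_schedule(1, 10, 2)):
    print(f"bracket {bracket}: pruning checks at steps {rungs}")
```

Trials assigned to later brackets are given more evaluations before they can be pruned, which is one reason different pruned trials in the log can stop after different numbers of epochs.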



Configuration of the distillation trainer for the individual training runs.

In [45]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [46]:
best_trial_distill_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-aug-embedd",
    n_trials=150
)

[I 2025-03-23 16:54:01,333] A new study created in memory with name: Distill-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 39, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1602,0.828101,0.694775,0.314388,0.306172,0.295042
2,0.4251,0.634547,0.780935,0.511661,0.485083,0.485877
3,0.2228,0.585156,0.790101,0.628787,0.552567,0.57256
4,0.148,0.560218,0.807516,0.718019,0.629717,0.658331
5,0.115,0.551896,0.808433,0.72444,0.660381,0.68124
6,0.099,0.552129,0.808433,0.740795,0.665557,0.690026
7,0.09,0.556047,0.813932,0.743962,0.66997,0.695364
8,0.0838,0.546292,0.811182,0.75292,0.676842,0.703096
9,0.0798,0.546623,0.817599,0.768858,0.703305,0.726095
10,0.0764,0.549268,0.815765,0.770501,0.698507,0.722482


[I 2025-03-23 16:57:43,863] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 46, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5483,1.21448,0.533456,0.149299,0.149957,0.127398
2,0.9451,0.961617,0.640697,0.275194,0.240149,0.227544
3,0.6853,0.843234,0.699358,0.3148,0.309228,0.299187
4,0.5219,0.780473,0.718607,0.347993,0.336779,0.334301
5,0.4144,0.736022,0.732356,0.487096,0.398194,0.413387


[I 2025-03-23 16:59:37,436] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 44, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7489,1.436007,0.447296,0.076214,0.103267,0.076314
2,1.2302,1.218206,0.534372,0.143126,0.154995,0.1349
3,1.0325,1.089041,0.603116,0.176025,0.204122,0.18062
4,0.8787,0.99128,0.63703,0.24525,0.23039,0.214374
5,0.7584,0.922158,0.663611,0.289131,0.260506,0.251395
6,0.6632,0.874701,0.676444,0.289641,0.282143,0.273979
7,0.5898,0.84258,0.694775,0.319969,0.308026,0.298989
8,0.5355,0.819012,0.706691,0.351545,0.32522,0.323144
9,0.4924,0.803757,0.72044,0.346397,0.346485,0.337611
10,0.4574,0.786679,0.721357,0.37466,0.35398,0.352494


[I 2025-03-23 17:03:31,105] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4909,1.162162,0.571036,0.172397,0.175795,0.153874
2,0.8824,0.913192,0.664528,0.291164,0.263801,0.255059
3,0.6201,0.802372,0.718607,0.336396,0.335842,0.326133
4,0.4597,0.744857,0.732356,0.407586,0.362208,0.370113
5,0.3559,0.700082,0.741522,0.518637,0.424887,0.446737


[I 2025-03-23 17:05:31,592] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7194,0.574088,0.80385,0.608003,0.54769,0.558788
2,0.1414,0.525428,0.816682,0.779544,0.719337,0.737331
3,0.0909,0.537397,0.802933,0.753815,0.678426,0.702555
4,0.0776,0.53448,0.819432,0.794951,0.7132,0.740187
5,0.0711,0.514693,0.828598,0.816691,0.725722,0.757024
6,0.0685,0.501149,0.831347,0.81489,0.742515,0.763698
7,0.0667,0.513635,0.827681,0.802207,0.73667,0.754324
8,0.0647,0.516264,0.824931,0.812841,0.736674,0.761323
9,0.0643,0.513561,0.827681,0.788533,0.726855,0.744571
10,0.063,0.514203,0.825848,0.815361,0.730555,0.757112


[I 2025-03-23 17:11:24,219] Trial 4 finished with value: 0.7582117365009416 and parameters: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 15, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 4 with value: 0.7582117365009416.


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5343,0.523954,0.818515,0.733647,0.670001,0.687897
2,0.0948,0.516216,0.824015,0.789013,0.731801,0.748445
3,0.075,0.510974,0.826764,0.801381,0.727906,0.750873
4,0.0686,0.489869,0.839597,0.848622,0.773855,0.797834
5,0.0667,0.493281,0.834097,0.834772,0.758312,0.782426
6,0.0645,0.490046,0.834097,0.842934,0.764597,0.787457
7,0.0633,0.494952,0.829514,0.833229,0.769128,0.789024
8,0.0623,0.495944,0.840513,0.830241,0.769963,0.788067
9,0.0616,0.496528,0.83593,0.834533,0.778136,0.795526
10,0.0609,0.498318,0.833181,0.837129,0.778852,0.797177


[I 2025-03-23 17:15:09,076] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7144,0.58606,0.792851,0.583721,0.517708,0.535426
2,0.1453,0.534085,0.814849,0.706706,0.648384,0.667254
3,0.0922,0.52764,0.813016,0.788918,0.696072,0.723894
4,0.079,0.507429,0.829514,0.802778,0.710941,0.742101
5,0.0717,0.510525,0.828598,0.827827,0.726747,0.761076
6,0.0686,0.49735,0.829514,0.82676,0.737222,0.764733
7,0.0671,0.511653,0.828598,0.828907,0.727594,0.761924
8,0.0651,0.499508,0.832264,0.839478,0.738842,0.771477
9,0.0637,0.503626,0.833181,0.840958,0.745704,0.777025
10,0.0629,0.496943,0.836847,0.840156,0.750501,0.778641


[I 2025-03-23 17:21:01,094] Trial 6 finished with value: 0.7766027706671251 and parameters: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 3, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 6 with value: 0.7766027706671251.


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4983,0.51996,0.825848,0.761884,0.684322,0.706379
2,0.0939,0.512258,0.831347,0.809737,0.747451,0.761387
3,0.0752,0.489111,0.843263,0.813006,0.772531,0.778806
4,0.0693,0.496291,0.836847,0.849963,0.768117,0.790923
5,0.0661,0.495877,0.83868,0.806736,0.756276,0.768471
6,0.0652,0.501422,0.830431,0.835035,0.76864,0.78655
7,0.0653,0.492285,0.83868,0.841884,0.777194,0.794601
8,0.0627,0.492289,0.84143,0.83409,0.789594,0.799615
9,0.062,0.502446,0.835014,0.836271,0.776595,0.791333
10,0.0612,0.491026,0.83593,0.838812,0.775373,0.792001


[I 2025-03-23 17:25:01,893] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5722,1.278282,0.513291,0.120969,0.13682,0.111763
2,1.0313,1.040085,0.614115,0.190076,0.212735,0.191201
3,0.7839,0.905552,0.661778,0.286514,0.264502,0.253711
4,0.6177,0.831847,0.694775,0.315341,0.303553,0.29666
5,0.5027,0.782872,0.716774,0.409186,0.350994,0.353024


[I 2025-03-23 17:27:03,416] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6629,0.552586,0.807516,0.657214,0.591696,0.609745
2,0.1218,0.514165,0.825848,0.765944,0.705733,0.723462
3,0.0839,0.506827,0.831347,0.794827,0.725158,0.746543
4,0.074,0.51024,0.829514,0.795011,0.722103,0.74527
5,0.0687,0.491704,0.837764,0.824108,0.739204,0.768452
6,0.0665,0.498606,0.84143,0.832343,0.748013,0.775569
7,0.0656,0.508505,0.835014,0.815942,0.742866,0.763505
8,0.0638,0.508964,0.834097,0.810207,0.73865,0.760242
9,0.0628,0.507368,0.83593,0.818207,0.740456,0.764786
10,0.0616,0.501603,0.836847,0.823364,0.74392,0.77003


[I 2025-03-23 17:33:05,476] Trial 9 finished with value: 0.7698258481620249 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 6 with value: 0.7766027706671251.


Trial 10 with params: {'learning_rate': 0.0004285183260552018, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9604,0.697679,0.750687,0.418557,0.400787,0.396583
2,0.2706,0.575507,0.79835,0.581282,0.544975,0.55158
3,0.1432,0.55569,0.804766,0.732834,0.63335,0.662411
4,0.1057,0.542235,0.813016,0.771812,0.662488,0.696343
5,0.0897,0.535863,0.815765,0.776011,0.707444,0.727644
6,0.0814,0.531441,0.819432,0.81526,0.725852,0.755371
7,0.0767,0.531787,0.815765,0.807238,0.720043,0.748822
8,0.0728,0.536457,0.813932,0.782523,0.707934,0.730558
9,0.0702,0.524727,0.819432,0.779736,0.709187,0.731646
10,0.0682,0.532748,0.818515,0.795923,0.721987,0.744488


[I 2025-03-23 17:37:01,778] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0010950817605907411, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6545,0.544329,0.809349,0.665037,0.601634,0.61929
2,0.1187,0.513275,0.827681,0.788088,0.725238,0.746115
3,0.0819,0.512731,0.828598,0.79325,0.7241,0.746225
4,0.0735,0.498797,0.83868,0.807652,0.743345,0.763374
5,0.069,0.494545,0.84143,0.8138,0.747132,0.768652
6,0.0669,0.493178,0.840513,0.816498,0.752214,0.770864
7,0.0649,0.490764,0.839597,0.801985,0.734945,0.756907
8,0.063,0.501863,0.84143,0.784845,0.739029,0.752093
9,0.0625,0.497158,0.83868,0.786152,0.730908,0.74767
10,0.0618,0.495671,0.834097,0.778359,0.73633,0.747489


[I 2025-03-23 17:41:03,348] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.00018632288627671892, 'weight_decay': 0.006, 'warmup_steps': 3, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2976,0.984301,0.627864,0.208711,0.2178,0.193139
2,0.6395,0.761968,0.728689,0.37735,0.353209,0.34879
3,0.3873,0.676441,0.758937,0.484724,0.427889,0.442078
4,0.26,0.637823,0.773602,0.54231,0.490596,0.49752
5,0.1911,0.620004,0.776352,0.616943,0.541886,0.563687
6,0.1512,0.602423,0.784601,0.645119,0.588545,0.603184
7,0.1278,0.591716,0.794684,0.656032,0.588866,0.610139
8,0.1135,0.58544,0.792851,0.69433,0.6017,0.630878
9,0.1038,0.58434,0.8011,0.709371,0.630836,0.655191
10,0.0974,0.584025,0.8011,0.71383,0.624195,0.651606


[I 2025-03-23 17:45:03,244] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0002609239853516876, 'weight_decay': 0.004, 'warmup_steps': 35, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1888,0.853838,0.689276,0.310113,0.294908,0.285437
2,0.4606,0.649462,0.777269,0.520641,0.467528,0.47303
3,0.2442,0.589748,0.7956,0.589185,0.547463,0.557635
4,0.1613,0.568715,0.806599,0.710359,0.606161,0.639047
5,0.1232,0.563586,0.80385,0.729751,0.649219,0.673277


[I 2025-03-23 17:47:03,074] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0016538876791015451, 'weight_decay': 0.002, 'warmup_steps': 49, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5785,0.517932,0.821265,0.75393,0.66517,0.692783
2,0.0992,0.500695,0.830431,0.826334,0.757035,0.778999
3,0.0752,0.484791,0.829514,0.809315,0.740783,0.761052
4,0.0699,0.486504,0.83593,0.838301,0.758484,0.783613
5,0.0659,0.48509,0.840513,0.845177,0.768344,0.793615
6,0.0645,0.488369,0.826764,0.845695,0.762573,0.787128
7,0.063,0.494992,0.835014,0.848223,0.773538,0.797048
8,0.0624,0.499687,0.836847,0.84299,0.775024,0.794754
9,0.0615,0.487317,0.835014,0.845523,0.778881,0.798947
10,0.0611,0.485161,0.836847,0.844987,0.780921,0.799348


[I 2025-03-23 17:52:58,677] Trial 14 finished with value: 0.800320986413265 and parameters: {'learning_rate': 0.0016538876791015451, 'weight_decay': 0.002, 'warmup_steps': 49, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 14 with value: 0.800320986413265.


Trial 15 with params: {'learning_rate': 0.003420477557521518, 'weight_decay': 0.002, 'warmup_steps': 7, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4316,0.509464,0.821265,0.806836,0.724953,0.750514
2,0.0872,0.502925,0.827681,0.825616,0.756131,0.775367
3,0.073,0.494678,0.832264,0.82529,0.749952,0.769754
4,0.0701,0.505028,0.83593,0.836826,0.755573,0.777116
5,0.0667,0.504817,0.832264,0.819665,0.754942,0.77217
6,0.0647,0.516459,0.831347,0.818981,0.763843,0.777969
7,0.0641,0.530076,0.823098,0.812594,0.740333,0.765668
8,0.0639,0.509877,0.828598,0.826194,0.748111,0.770167
9,0.0625,0.495512,0.830431,0.822216,0.743108,0.763972
10,0.0612,0.512947,0.830431,0.826215,0.753414,0.775886


[I 2025-03-23 17:56:50,459] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0010325844507139503, 'weight_decay': 0.003, 'warmup_steps': 45, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6966,0.546572,0.804766,0.625321,0.575189,0.586914
2,0.1233,0.520357,0.826764,0.798126,0.734654,0.753525
3,0.0844,0.510484,0.829514,0.785668,0.73151,0.747888
4,0.0735,0.501274,0.83593,0.813611,0.744079,0.76663
5,0.0691,0.506845,0.830431,0.803271,0.73204,0.754431
6,0.0662,0.49994,0.83593,0.819807,0.742281,0.768099
7,0.0644,0.49715,0.833181,0.814351,0.739691,0.763774
8,0.0631,0.497652,0.842346,0.821411,0.755398,0.776069
9,0.063,0.504703,0.829514,0.812082,0.733786,0.759225
10,0.0625,0.508444,0.831347,0.808459,0.741445,0.761047


[I 2025-03-23 18:02:40,264] Trial 16 finished with value: 0.7742056049108453 and parameters: {'learning_rate': 0.0010325844507139503, 'weight_decay': 0.003, 'warmup_steps': 45, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 14 with value: 0.800320986413265.


Trial 17 with params: {'learning_rate': 0.001012579519268219, 'weight_decay': 0.001, 'warmup_steps': 32, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6771,0.552468,0.805683,0.66067,0.598866,0.613696
2,0.124,0.507882,0.828598,0.782761,0.725758,0.741633
3,0.0847,0.51675,0.824931,0.787131,0.717587,0.738988
4,0.0747,0.505076,0.831347,0.814617,0.736249,0.761355
5,0.0696,0.491021,0.83593,0.816415,0.737266,0.764701


[I 2025-03-23 18:04:38,822] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004200247568420068, 'weight_decay': 0.003, 'warmup_steps': 45, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4455,0.513162,0.824015,0.799586,0.725447,0.747365
2,0.0854,0.522075,0.824015,0.810438,0.718112,0.745332
3,0.0731,0.494833,0.83593,0.83302,0.757692,0.781843
4,0.0687,0.506132,0.835014,0.80466,0.768771,0.776885
5,0.0658,0.499176,0.839597,0.840472,0.769681,0.790657
6,0.0648,0.49846,0.836847,0.821986,0.771965,0.784366
7,0.0639,0.503645,0.830431,0.828454,0.770362,0.786521
8,0.0633,0.509955,0.836847,0.824274,0.765311,0.783313
9,0.0617,0.514311,0.83593,0.824245,0.784152,0.792468
10,0.0609,0.520999,0.831347,0.84285,0.764933,0.786393


[I 2025-03-23 18:08:33,458] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.003585510562480817, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4532,0.510607,0.824931,0.81932,0.74174,0.767453
2,0.0851,0.51759,0.830431,0.80581,0.773853,0.778819
3,0.0721,0.499942,0.827681,0.829042,0.758868,0.781754
4,0.0688,0.496447,0.836847,0.835439,0.772317,0.791117
5,0.0656,0.492451,0.839597,0.829843,0.791915,0.803114
6,0.0639,0.492065,0.836847,0.809741,0.773168,0.780343
7,0.0629,0.491265,0.83868,0.82469,0.790811,0.795427
8,0.0626,0.503968,0.840513,0.806988,0.77401,0.78093
9,0.0623,0.4898,0.847846,0.836983,0.796516,0.805941
10,0.0608,0.507196,0.832264,0.827234,0.774272,0.788591


[I 2025-03-23 18:14:38,615] Trial 19 finished with value: 0.7990139079310553 and parameters: {'learning_rate': 0.003585510562480817, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 14 with value: 0.800320986413265.


Trial 20 with params: {'learning_rate': 0.0030196592552662557, 'weight_decay': 0.003, 'warmup_steps': 51, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4818,0.496975,0.826764,0.801591,0.722321,0.745513
2,0.0863,0.530612,0.825848,0.813326,0.755297,0.768912
3,0.0721,0.496073,0.83593,0.841015,0.78049,0.798439
4,0.0681,0.507391,0.837764,0.839064,0.784682,0.802087
5,0.0655,0.50032,0.84418,0.833308,0.792446,0.806166
6,0.0639,0.503276,0.840513,0.835399,0.784784,0.799981
7,0.0636,0.496985,0.84418,0.836048,0.795244,0.807325
8,0.0628,0.510875,0.83593,0.822795,0.787821,0.796004
9,0.0623,0.502685,0.84418,0.824658,0.794189,0.802766
10,0.0609,0.498126,0.84418,0.834876,0.790766,0.805609


[I 2025-03-23 18:20:32,054] Trial 20 finished with value: 0.7984233525114354 and parameters: {'learning_rate': 0.0030196592552662557, 'weight_decay': 0.003, 'warmup_steps': 51, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 14 with value: 0.800320986413265.


Trial 21 with params: {'learning_rate': 0.002710262435115171, 'weight_decay': 0.0, 'warmup_steps': 53, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4996,0.511307,0.833181,0.811361,0.731479,0.756088
2,0.0875,0.509612,0.824015,0.807606,0.739905,0.758625
3,0.072,0.498231,0.826764,0.795729,0.742438,0.757835
4,0.0678,0.495705,0.832264,0.819412,0.751829,0.771692
5,0.0656,0.504354,0.834097,0.845912,0.775732,0.796823
6,0.0634,0.498141,0.835014,0.849453,0.782044,0.801736
7,0.0632,0.501353,0.824015,0.819925,0.751138,0.77238
8,0.0632,0.495371,0.839597,0.84199,0.772241,0.793808
9,0.0616,0.495327,0.832264,0.840399,0.764151,0.785435
10,0.0605,0.493363,0.837764,0.842119,0.778855,0.796572


[I 2025-03-23 18:26:34,084] Trial 21 finished with value: 0.8027729331961035 and parameters: {'learning_rate': 0.002710262435115171, 'weight_decay': 0.0, 'warmup_steps': 53, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 21 with value: 0.8027729331961035.


Trial 22 with params: {'learning_rate': 0.002894423211984817, 'weight_decay': 0.005, 'warmup_steps': 49, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4836,0.536548,0.825848,0.751382,0.707519,0.711707
2,0.0864,0.528465,0.824931,0.807002,0.73985,0.759565
3,0.0718,0.508289,0.829514,0.8023,0.747141,0.761291
4,0.0687,0.513589,0.824931,0.79169,0.768847,0.772958
5,0.066,0.506355,0.831347,0.791918,0.757015,0.765381


[I 2025-03-23 18:28:31,924] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.0014427910213420786, 'weight_decay': 0.0, 'warmup_steps': 49, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6123,0.533901,0.817599,0.719731,0.638429,0.662531
2,0.1041,0.518933,0.823098,0.786939,0.73294,0.748147
3,0.077,0.505275,0.826764,0.782971,0.73491,0.749399
4,0.0703,0.507835,0.832264,0.792046,0.736257,0.752918
5,0.0664,0.505303,0.831347,0.795814,0.724753,0.747239
6,0.0656,0.501243,0.83593,0.825368,0.748594,0.774972
7,0.0641,0.523651,0.825848,0.825451,0.736309,0.763764
8,0.0626,0.510586,0.833181,0.800051,0.743565,0.761452
9,0.0621,0.509755,0.833181,0.833909,0.744405,0.773627
10,0.0613,0.50724,0.832264,0.809887,0.744597,0.763749


[I 2025-03-23 18:34:29,286] Trial 23 finished with value: 0.7846675482753193 and parameters: {'learning_rate': 0.0014427910213420786, 'weight_decay': 0.0, 'warmup_steps': 49, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 21 with value: 0.8027729331961035.


Trial 24 with params: {'learning_rate': 0.0004903718039707595, 'weight_decay': 0.002, 'warmup_steps': 48, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9358,0.658012,0.769019,0.465542,0.458087,0.449714
2,0.2295,0.544104,0.809349,0.672096,0.598647,0.618836
3,0.1251,0.54511,0.810266,0.755254,0.668816,0.696903
4,0.0963,0.523523,0.818515,0.810036,0.720986,0.751027
5,0.0835,0.525612,0.822181,0.799598,0.724944,0.749779


[I 2025-03-23 18:36:26,319] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.0019012579473822386, 'weight_decay': 0.0, 'warmup_steps': 48, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5537,0.51217,0.829514,0.71297,0.662975,0.677671
2,0.094,0.505112,0.831347,0.786255,0.730602,0.746517
3,0.0729,0.501342,0.83868,0.786582,0.724318,0.744115
4,0.0678,0.495926,0.83868,0.816397,0.749213,0.768491
5,0.0655,0.486658,0.845096,0.809321,0.758553,0.773241
6,0.0638,0.491827,0.835014,0.795895,0.760076,0.768185
7,0.0633,0.519584,0.831347,0.793413,0.756297,0.765974
8,0.0621,0.500195,0.84143,0.796783,0.764939,0.770864
9,0.0608,0.503269,0.836847,0.829953,0.767177,0.786109
10,0.0602,0.488276,0.83868,0.80309,0.759158,0.769833


[I 2025-03-23 18:42:06,163] Trial 25 finished with value: 0.788152228511128 and parameters: {'learning_rate': 0.0019012579473822386, 'weight_decay': 0.0, 'warmup_steps': 48, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 21 with value: 0.8027729331961035.


Trial 26 with params: {'learning_rate': 0.0026410410114791035, 'weight_decay': 0.003, 'warmup_steps': 42, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4909,0.508783,0.824931,0.812566,0.726327,0.752106
2,0.0872,0.51893,0.830431,0.807994,0.752955,0.766608
3,0.0716,0.494398,0.836847,0.812786,0.753765,0.769494
4,0.068,0.507889,0.831347,0.803949,0.759677,0.770439
5,0.0656,0.513889,0.834097,0.818311,0.760335,0.773716
6,0.064,0.511374,0.835014,0.832054,0.762533,0.780594
7,0.0631,0.512645,0.839597,0.837159,0.792571,0.803352
8,0.0623,0.526412,0.833181,0.826211,0.764965,0.781344
9,0.0612,0.526686,0.827681,0.838601,0.772798,0.792681
10,0.0604,0.521245,0.831347,0.832061,0.77973,0.795137


[I 2025-03-23 18:48:14,281] Trial 26 finished with value: 0.7979944766226619 and parameters: {'learning_rate': 0.0026410410114791035, 'weight_decay': 0.003, 'warmup_steps': 42, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 21 with value: 0.8027729331961035.


Trial 27 with params: {'learning_rate': 0.004851030350712429, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4051,0.508765,0.832264,0.81534,0.741354,0.764632
2,0.0852,0.512141,0.829514,0.843117,0.767031,0.789688
3,0.0732,0.489144,0.836847,0.849194,0.782025,0.802476
4,0.0692,0.499079,0.835014,0.845235,0.77622,0.796065
5,0.0664,0.516149,0.825848,0.836731,0.768444,0.787745
6,0.0645,0.509365,0.831347,0.830231,0.770869,0.787546
7,0.0637,0.488893,0.839597,0.847272,0.781655,0.801881
8,0.0632,0.528726,0.828598,0.832269,0.766076,0.786483
9,0.062,0.516679,0.828598,0.831042,0.771143,0.786048
10,0.0611,0.517785,0.830431,0.840081,0.77263,0.79363


[I 2025-03-23 18:53:56,966] Trial 27 finished with value: 0.797044322762393 and parameters: {'learning_rate': 0.004851030350712429, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 21 with value: 0.8027729331961035.


Trial 28 with params: {'learning_rate': 0.0026252793477683457, 'weight_decay': 0.0, 'warmup_steps': 45, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4939,0.537715,0.814849,0.762837,0.704429,0.718825
2,0.0875,0.514132,0.830431,0.799199,0.746039,0.759255
3,0.0721,0.489821,0.84143,0.853281,0.785239,0.805112
4,0.0679,0.502625,0.845096,0.823361,0.778142,0.789283
5,0.0662,0.503405,0.842346,0.841915,0.80105,0.811482
6,0.0646,0.501593,0.839597,0.840869,0.769554,0.792705
7,0.063,0.499722,0.840513,0.815614,0.761032,0.774223
8,0.0619,0.507185,0.83593,0.817047,0.764351,0.776529
9,0.0611,0.492053,0.846013,0.844695,0.79662,0.808836
10,0.0604,0.494222,0.846013,0.841797,0.79303,0.805866


[I 2025-03-23 18:59:49,068] Trial 28 finished with value: 0.8150654028379515 and parameters: {'learning_rate': 0.0026252793477683457, 'weight_decay': 0.0, 'warmup_steps': 45, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 28 with value: 0.8150654028379515.


Trial 29 with params: {'learning_rate': 0.004465858399905994, 'weight_decay': 0.002, 'warmup_steps': 49, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4364,0.518347,0.824931,0.815881,0.732812,0.75739
2,0.0864,0.505647,0.827681,0.802788,0.739075,0.758367
3,0.0729,0.502882,0.833181,0.816006,0.76193,0.778874
4,0.069,0.521417,0.831347,0.828614,0.769071,0.782932
5,0.0672,0.506332,0.83593,0.853551,0.789326,0.809829
6,0.0659,0.528867,0.826764,0.840449,0.779107,0.796755
7,0.0642,0.527337,0.827681,0.846127,0.768656,0.793334
8,0.0628,0.525426,0.834097,0.843886,0.783373,0.798683
9,0.0619,0.506171,0.833181,0.844365,0.796288,0.807693
10,0.0609,0.509391,0.834097,0.848053,0.781102,0.800621


[I 2025-03-23 19:05:45,564] Trial 29 finished with value: 0.8166135562260471 and parameters: {'learning_rate': 0.004465858399905994, 'weight_decay': 0.002, 'warmup_steps': 49, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 30 with params: {'learning_rate': 0.0028033393546551383, 'weight_decay': 0.001, 'warmup_steps': 50, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4893,0.512476,0.825848,0.777925,0.70954,0.729417
2,0.0869,0.503784,0.833181,0.79827,0.73595,0.750482
3,0.0717,0.491515,0.83593,0.808869,0.753587,0.767301
4,0.068,0.505803,0.832264,0.797563,0.745676,0.758553
5,0.0657,0.507303,0.828598,0.819324,0.752623,0.769871
6,0.0641,0.526204,0.824931,0.833867,0.76622,0.78319
7,0.0639,0.516542,0.836847,0.843786,0.783669,0.801377
8,0.0625,0.528698,0.829514,0.83018,0.770149,0.787634
9,0.0616,0.518849,0.834097,0.833563,0.771647,0.789282
10,0.0604,0.51591,0.833181,0.825375,0.771343,0.785275


[I 2025-03-23 19:09:44,202] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.002441653327883922, 'weight_decay': 0.0, 'warmup_steps': 43, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.504,0.511792,0.821265,0.783796,0.705182,0.730371
2,0.0888,0.509232,0.832264,0.809815,0.752892,0.767947
3,0.0728,0.50185,0.833181,0.790808,0.740015,0.752342
4,0.0676,0.509536,0.823098,0.823833,0.74697,0.768583
5,0.0657,0.50376,0.834097,0.810619,0.76593,0.776151
6,0.0645,0.518876,0.831347,0.837555,0.768114,0.789
7,0.0638,0.518838,0.836847,0.844479,0.78251,0.803252
8,0.0622,0.508618,0.842346,0.833793,0.79674,0.805617
9,0.0615,0.502573,0.839597,0.830308,0.781352,0.795667
10,0.0607,0.499526,0.835014,0.81701,0.788232,0.795827


[I 2025-03-23 19:15:44,706] Trial 31 finished with value: 0.7963097667524502 and parameters: {'learning_rate': 0.002441653327883922, 'weight_decay': 0.0, 'warmup_steps': 43, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 32 with params: {'learning_rate': 0.0025776224987252992, 'weight_decay': 0.0, 'warmup_steps': 51, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5037,0.507238,0.831347,0.78676,0.718727,0.740833
2,0.0882,0.502087,0.827681,0.80058,0.752805,0.764054
3,0.0726,0.483599,0.837764,0.796562,0.747196,0.759876
4,0.069,0.492798,0.832264,0.806387,0.749257,0.765917
5,0.0655,0.482086,0.835014,0.81359,0.764468,0.778055
6,0.0634,0.487443,0.837764,0.836413,0.782911,0.799316
7,0.0625,0.477958,0.845096,0.841452,0.790354,0.803375
8,0.0617,0.491739,0.84143,0.851224,0.79191,0.808784
9,0.0613,0.486181,0.84143,0.842359,0.784302,0.800694
10,0.0604,0.485542,0.840513,0.843456,0.788722,0.804108


[I 2025-03-23 19:21:38,729] Trial 32 finished with value: 0.8016484152701004 and parameters: {'learning_rate': 0.0025776224987252992, 'weight_decay': 0.0, 'warmup_steps': 51, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 33 with params: {'learning_rate': 0.0032717228407858536, 'weight_decay': 0.001, 'warmup_steps': 52, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4731,0.507855,0.829514,0.76647,0.706587,0.722087
2,0.0855,0.519816,0.830431,0.832678,0.76597,0.78527
3,0.0726,0.501155,0.839597,0.835909,0.783602,0.799262
4,0.0685,0.513385,0.836847,0.833633,0.781368,0.794472
5,0.0663,0.509796,0.83868,0.822582,0.78948,0.794462
6,0.0645,0.495443,0.842346,0.841379,0.785248,0.800034
7,0.0632,0.503289,0.839597,0.83427,0.793255,0.800834
8,0.062,0.507835,0.839597,0.835646,0.787435,0.797045
9,0.0618,0.527365,0.831347,0.83233,0.778488,0.79176
10,0.0614,0.504035,0.836847,0.822243,0.779196,0.78992


[I 2025-03-23 19:27:20,234] Trial 33 finished with value: 0.7911225963288924 and parameters: {'learning_rate': 0.0032717228407858536, 'weight_decay': 0.001, 'warmup_steps': 52, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 34 with params: {'learning_rate': 0.00351676940778005, 'weight_decay': 0.0, 'warmup_steps': 51, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4583,0.494043,0.831347,0.80429,0.735113,0.755001
2,0.0846,0.50106,0.830431,0.818177,0.755946,0.771269
3,0.0718,0.483035,0.83868,0.842752,0.781223,0.798426
4,0.0686,0.493855,0.840513,0.851348,0.779823,0.798242
5,0.0656,0.475559,0.846013,0.849948,0.784375,0.804124
6,0.0635,0.490203,0.834097,0.849852,0.772957,0.795234
7,0.064,0.523507,0.826764,0.821419,0.777083,0.785939
8,0.0628,0.515995,0.835014,0.836946,0.778088,0.793428
9,0.0614,0.506255,0.84418,0.854868,0.795669,0.812635
10,0.0609,0.499054,0.836847,0.830845,0.778676,0.792643


[I 2025-03-23 19:31:13,337] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.004997538254510304, 'weight_decay': 0.004, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4283,0.517113,0.827681,0.820162,0.744107,0.762781
2,0.0853,0.506098,0.832264,0.848524,0.763601,0.789301
3,0.0731,0.516694,0.825848,0.842404,0.778676,0.797082
4,0.0694,0.512258,0.83593,0.848139,0.787178,0.804018
5,0.0682,0.48947,0.848763,0.859843,0.804705,0.81968
6,0.0661,0.49759,0.840513,0.853928,0.793751,0.811253
7,0.0646,0.496555,0.84143,0.851713,0.793695,0.809361
8,0.0627,0.515784,0.840513,0.842454,0.784328,0.797785
9,0.0625,0.524414,0.835014,0.821751,0.772185,0.785524
10,0.0611,0.507151,0.836847,0.846841,0.781525,0.798547


[I 2025-03-23 19:36:51,427] Trial 35 finished with value: 0.8072561173372403 and parameters: {'learning_rate': 0.004997538254510304, 'weight_decay': 0.004, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 36 with params: {'learning_rate': 0.004703505030295058, 'weight_decay': 0.006, 'warmup_steps': 33, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4129,0.494126,0.831347,0.814628,0.751267,0.768127
2,0.0843,0.51084,0.826764,0.832149,0.759202,0.781123
3,0.0723,0.495011,0.830431,0.825945,0.76722,0.7841
4,0.0683,0.509526,0.823098,0.842826,0.760088,0.784704
5,0.0666,0.513884,0.826764,0.835395,0.773507,0.789595
6,0.0644,0.51756,0.826764,0.824226,0.769478,0.782996
7,0.0638,0.503235,0.837764,0.854563,0.794078,0.811163
8,0.0628,0.520882,0.825848,0.82759,0.775146,0.788675
9,0.0618,0.524545,0.825848,0.834753,0.785592,0.795275
10,0.0608,0.510015,0.830431,0.849156,0.783344,0.80011


[I 2025-03-23 19:42:51,807] Trial 36 finished with value: 0.8041886610096034 and parameters: {'learning_rate': 0.004703505030295058, 'weight_decay': 0.006, 'warmup_steps': 33, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 37 with params: {'learning_rate': 0.003866617600263156, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.435,0.49608,0.825848,0.798149,0.725428,0.747265
2,0.0857,0.515296,0.821265,0.800372,0.738679,0.755978
3,0.0721,0.502807,0.826764,0.81455,0.755936,0.771906
4,0.0684,0.523114,0.823098,0.810709,0.755371,0.770944
5,0.0665,0.521744,0.823098,0.814259,0.754604,0.772621
6,0.0647,0.506621,0.83593,0.825186,0.767507,0.784644
7,0.0641,0.521583,0.824931,0.818176,0.75055,0.76812
8,0.0629,0.515516,0.831347,0.830243,0.782769,0.795228
9,0.0617,0.501363,0.833181,0.840718,0.777513,0.796699
10,0.0608,0.498629,0.834097,0.830611,0.762667,0.782741


[I 2025-03-23 19:48:57,050] Trial 37 finished with value: 0.8024096578835546 and parameters: {'learning_rate': 0.003866617600263156, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 38 with params: {'learning_rate': 0.004659526112804844, 'weight_decay': 0.003, 'warmup_steps': 36, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.419,0.513917,0.826764,0.817417,0.753428,0.773584
2,0.0846,0.510671,0.834097,0.814565,0.754671,0.771551
3,0.0722,0.511497,0.836847,0.817161,0.753154,0.773163
4,0.0689,0.515443,0.834097,0.824215,0.763733,0.78261
5,0.066,0.524278,0.834097,0.829789,0.758134,0.780563
6,0.0647,0.525074,0.825848,0.822754,0.758905,0.778918
7,0.0634,0.525584,0.824931,0.830187,0.765651,0.7847
8,0.0621,0.539468,0.823098,0.843545,0.762034,0.787694
9,0.0622,0.518099,0.833181,0.838032,0.776259,0.794714
10,0.061,0.516303,0.831347,0.831629,0.765146,0.786211


[I 2025-03-23 19:54:42,589] Trial 38 finished with value: 0.7882623874429516 and parameters: {'learning_rate': 0.004659526112804844, 'weight_decay': 0.003, 'warmup_steps': 36, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 39 with params: {'learning_rate': 0.0039054798962149476, 'weight_decay': 0.007, 'warmup_steps': 35, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4416,0.496147,0.830431,0.797652,0.741882,0.756919
2,0.0849,0.507841,0.826764,0.818532,0.764667,0.77632
3,0.0726,0.497615,0.836847,0.836005,0.773619,0.790195
4,0.0693,0.496156,0.845096,0.852151,0.786941,0.804648
5,0.0667,0.493927,0.840513,0.844404,0.777768,0.798768
6,0.0644,0.492454,0.833181,0.854567,0.778359,0.801529
7,0.0635,0.499114,0.834097,0.858121,0.784422,0.806964
8,0.0627,0.505028,0.834097,0.852234,0.780361,0.799898
9,0.0623,0.521824,0.833181,0.850075,0.793725,0.810895
10,0.0612,0.513692,0.832264,0.848646,0.781862,0.802241


[I 2025-03-23 20:00:19,583] Trial 39 finished with value: 0.8038975310542412 and parameters: {'learning_rate': 0.0039054798962149476, 'weight_decay': 0.007, 'warmup_steps': 35, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 40 with params: {'learning_rate': 0.0016630487413335422, 'weight_decay': 0.006, 'warmup_steps': 43, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5694,0.52217,0.823098,0.723083,0.656,0.678309
2,0.0985,0.507162,0.816682,0.756519,0.717358,0.726885
3,0.0752,0.483857,0.839597,0.80432,0.733657,0.75381
4,0.0688,0.481873,0.84143,0.824933,0.747781,0.773573
5,0.0663,0.484973,0.83593,0.794364,0.734607,0.75439
6,0.0639,0.48689,0.83593,0.768609,0.732603,0.74176
7,0.0633,0.492756,0.83868,0.800959,0.741861,0.760243
8,0.0623,0.504728,0.83593,0.784619,0.743572,0.752334
9,0.0614,0.500966,0.832264,0.774171,0.734614,0.74396
10,0.0606,0.491697,0.83593,0.807963,0.742061,0.761273


[I 2025-03-23 20:04:02,595] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0019960741576737478, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5254,0.532444,0.824931,0.721107,0.679059,0.68675
2,0.0931,0.504474,0.835014,0.803145,0.755329,0.76515
3,0.0739,0.497127,0.83593,0.784105,0.731857,0.744219
4,0.069,0.512728,0.827681,0.819312,0.742793,0.763211
5,0.0658,0.498034,0.83593,0.841309,0.775006,0.79278
6,0.0642,0.501744,0.83868,0.840897,0.762075,0.782181
7,0.0629,0.519125,0.823098,0.818525,0.76071,0.772959
8,0.0622,0.500575,0.832264,0.813739,0.753788,0.768151
9,0.0615,0.516455,0.834097,0.844235,0.770571,0.792987
10,0.0608,0.510279,0.831347,0.838101,0.765929,0.786072


[I 2025-03-23 20:07:54,930] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0043045330345562235, 'weight_decay': 0.01, 'warmup_steps': 47, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4402,0.516713,0.826764,0.826345,0.758757,0.77664
2,0.086,0.497911,0.843263,0.827967,0.766825,0.78364
3,0.0731,0.505034,0.833181,0.825619,0.775172,0.788546
4,0.0693,0.514057,0.833181,0.823136,0.758857,0.777407
5,0.0664,0.480869,0.843263,0.840867,0.784295,0.800857
6,0.0652,0.493608,0.840513,0.829834,0.768721,0.787228
7,0.0634,0.501392,0.834097,0.841281,0.765818,0.788338
8,0.0627,0.505717,0.832264,0.837929,0.769184,0.789014
9,0.0617,0.500056,0.83593,0.826055,0.770681,0.784538
10,0.0609,0.51062,0.829514,0.838598,0.765686,0.786709


[I 2025-03-23 20:13:46,861] Trial 42 finished with value: 0.7913798652731852 and parameters: {'learning_rate': 0.0043045330345562235, 'weight_decay': 0.01, 'warmup_steps': 47, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 43 with params: {'learning_rate': 0.002847510313101469, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4776,0.513028,0.820348,0.788157,0.717696,0.737747
2,0.0868,0.529271,0.827681,0.808567,0.743985,0.762475
3,0.0733,0.50081,0.840513,0.822178,0.762175,0.778994
4,0.0693,0.493061,0.840513,0.832723,0.785583,0.798846
5,0.0662,0.489163,0.84418,0.839464,0.792091,0.804812
6,0.0641,0.49396,0.84143,0.838559,0.790041,0.804978
7,0.0629,0.493075,0.837764,0.839429,0.789035,0.802764
8,0.0625,0.502916,0.833181,0.821549,0.786208,0.793592
9,0.0614,0.488922,0.83593,0.810606,0.770246,0.778102
10,0.0609,0.490617,0.83868,0.836347,0.777588,0.793427


[I 2025-03-23 20:17:33,514] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0038572668318417173, 'weight_decay': 0.006, 'warmup_steps': 42, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4492,0.500703,0.833181,0.805728,0.751676,0.765929
2,0.0862,0.511678,0.835014,0.818248,0.754655,0.766752
3,0.0728,0.494863,0.842346,0.848124,0.789215,0.802268
4,0.0686,0.490373,0.83868,0.851869,0.784401,0.803858
5,0.0665,0.48461,0.843263,0.854312,0.795855,0.813462
6,0.0657,0.497048,0.836847,0.843404,0.791745,0.806124
7,0.0647,0.500649,0.837764,0.851518,0.792517,0.809823
8,0.0633,0.503631,0.842346,0.842638,0.798879,0.809588
9,0.0621,0.486394,0.843263,0.845924,0.796952,0.810004
10,0.0608,0.492772,0.840513,0.849311,0.78508,0.803126


[I 2025-03-23 20:23:31,793] Trial 44 finished with value: 0.8135621776780739 and parameters: {'learning_rate': 0.0038572668318417173, 'weight_decay': 0.006, 'warmup_steps': 42, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 45 with params: {'learning_rate': 0.004321423875456281, 'weight_decay': 0.005, 'warmup_steps': 40, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.438,0.522409,0.821265,0.829536,0.734221,0.761674
2,0.0871,0.513484,0.832264,0.833133,0.757248,0.779381
3,0.0733,0.498981,0.84143,0.836699,0.772922,0.791639
4,0.0689,0.500521,0.842346,0.853784,0.789803,0.810717
5,0.0662,0.498282,0.842346,0.844556,0.771841,0.79581
6,0.0644,0.526293,0.825848,0.839931,0.769368,0.788165
7,0.0648,0.503268,0.84143,0.824548,0.776946,0.788897
8,0.0637,0.508613,0.83593,0.832244,0.766186,0.783088
9,0.0629,0.51751,0.832264,0.844084,0.777167,0.798433
10,0.0613,0.511031,0.83868,0.859153,0.783647,0.80956


[I 2025-03-23 20:29:18,445] Trial 45 finished with value: 0.8010874326428291 and parameters: {'learning_rate': 0.004321423875456281, 'weight_decay': 0.005, 'warmup_steps': 40, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 46 with params: {'learning_rate': 0.0034541687714900503, 'weight_decay': 0.006, 'warmup_steps': 53, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4628,0.500263,0.83593,0.792685,0.7411,0.753041
2,0.0848,0.522409,0.827681,0.821628,0.762835,0.778224
3,0.0726,0.515289,0.830431,0.830906,0.775377,0.789626
4,0.0687,0.51369,0.835014,0.826136,0.766854,0.784139
5,0.0656,0.515148,0.830431,0.820833,0.771641,0.783497
6,0.0638,0.508449,0.827681,0.807839,0.768789,0.77566
7,0.0623,0.51362,0.832264,0.819236,0.783969,0.792218
8,0.0619,0.515146,0.83593,0.814803,0.789866,0.792537
9,0.0618,0.527869,0.830431,0.825322,0.77718,0.790148
10,0.0607,0.538547,0.824931,0.826131,0.782617,0.795689


[I 2025-03-23 20:35:11,941] Trial 46 finished with value: 0.7943821620684658 and parameters: {'learning_rate': 0.0034541687714900503, 'weight_decay': 0.006, 'warmup_steps': 53, 'lambda_param': 0.8, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 47 with params: {'learning_rate': 0.00039294592429744307, 'weight_decay': 0.01, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0063,0.723135,0.740605,0.394065,0.378494,0.373013
2,0.2917,0.574859,0.800183,0.571848,0.538854,0.545538
3,0.1512,0.572088,0.80385,0.693031,0.613915,0.641999
4,0.1085,0.545252,0.817599,0.779747,0.679354,0.712983
5,0.0918,0.546203,0.813016,0.75581,0.693457,0.711947


[I 2025-03-23 20:37:07,161] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.001875393492963395, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5385,0.523353,0.817599,0.755609,0.654477,0.68904
2,0.0957,0.503596,0.836847,0.816601,0.743378,0.766577
3,0.0755,0.501973,0.834097,0.830863,0.751861,0.777345
4,0.0689,0.493923,0.83868,0.833325,0.743975,0.771669
5,0.0662,0.492899,0.836847,0.836905,0.765869,0.788779
6,0.0653,0.503662,0.831347,0.82287,0.738306,0.767844
7,0.0637,0.500416,0.828598,0.824745,0.731214,0.761954
8,0.0625,0.505179,0.826764,0.817337,0.743836,0.765211
9,0.0622,0.505588,0.827681,0.799196,0.744401,0.759863
10,0.0614,0.505484,0.829514,0.829498,0.755303,0.777705


[I 2025-03-23 20:41:12,179] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.0036715622744215304, 'weight_decay': 0.003, 'warmup_steps': 37, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4486,0.509585,0.822181,0.799946,0.751419,0.764889
2,0.0863,0.510119,0.832264,0.814828,0.758876,0.770468
3,0.073,0.500621,0.839597,0.817569,0.77056,0.783061
4,0.0689,0.491059,0.840513,0.844856,0.779088,0.795373
5,0.0662,0.494563,0.84143,0.849275,0.787751,0.80324
6,0.0645,0.487017,0.840513,0.820573,0.772749,0.784815
7,0.0632,0.487769,0.840513,0.846718,0.779377,0.799831
8,0.0629,0.501113,0.83868,0.846622,0.774063,0.792301
9,0.0621,0.496923,0.83868,0.844704,0.774973,0.791959
10,0.0609,0.499087,0.839597,0.834921,0.769938,0.786797


[I 2025-03-23 20:46:53,111] Trial 49 finished with value: 0.7983732457984409 and parameters: {'learning_rate': 0.0036715622744215304, 'weight_decay': 0.003, 'warmup_steps': 37, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 50 with params: {'learning_rate': 0.00011155354646039437, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4885,1.190367,0.550871,0.148744,0.16117,0.137623
2,0.9143,0.943616,0.648946,0.2813,0.246752,0.234033
3,0.6493,0.821968,0.704858,0.354416,0.326879,0.323157
4,0.485,0.760816,0.727773,0.451665,0.367385,0.378481
5,0.3796,0.719638,0.736022,0.496846,0.406099,0.425828
6,0.3095,0.689629,0.75527,0.492867,0.455053,0.456896
7,0.26,0.666254,0.75802,0.492332,0.448253,0.458927
8,0.2247,0.652828,0.764436,0.518891,0.480578,0.487308
9,0.1989,0.642731,0.773602,0.548669,0.517197,0.523438
10,0.1803,0.637735,0.771769,0.567607,0.523264,0.53546


[I 2025-03-23 20:50:41,927] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.004008432231366527, 'weight_decay': 0.005, 'warmup_steps': 45, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4425,0.523863,0.820348,0.808039,0.726757,0.752544
2,0.0854,0.525811,0.827681,0.812115,0.745246,0.763972
3,0.0725,0.517601,0.826764,0.806078,0.746881,0.765393
4,0.0692,0.52493,0.831347,0.809713,0.760757,0.773816
5,0.0668,0.519295,0.83593,0.811716,0.76389,0.777342
6,0.0646,0.518767,0.830431,0.820924,0.751569,0.770109
7,0.0636,0.517064,0.832264,0.817262,0.751868,0.770225
8,0.0631,0.520093,0.834097,0.841733,0.754076,0.781097
9,0.0617,0.521354,0.827681,0.855717,0.756509,0.788876
10,0.0608,0.506537,0.837764,0.864236,0.773875,0.801608


[I 2025-03-23 20:56:11,256] Trial 51 finished with value: 0.7926553412921613 and parameters: {'learning_rate': 0.004008432231366527, 'weight_decay': 0.005, 'warmup_steps': 45, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 52 with params: {'learning_rate': 0.0022716543852100545, 'weight_decay': 0.01, 'warmup_steps': 43, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5182,0.513766,0.827681,0.749088,0.694693,0.710488
2,0.0897,0.505983,0.83868,0.812823,0.755895,0.769302
3,0.073,0.508661,0.828598,0.803997,0.751677,0.761893
4,0.0682,0.495028,0.83868,0.827919,0.760878,0.780764
5,0.0659,0.482127,0.83593,0.83476,0.758117,0.781908
6,0.0638,0.49092,0.830431,0.79785,0.735508,0.752441
7,0.0633,0.493016,0.827681,0.82177,0.74721,0.768464
8,0.0622,0.511067,0.819432,0.79505,0.747172,0.755539
9,0.0615,0.505063,0.829514,0.838766,0.761722,0.785502
10,0.0605,0.496453,0.834097,0.836283,0.767797,0.788205


[I 2025-03-23 21:01:43,512] Trial 52 finished with value: 0.8042500886554348 and parameters: {'learning_rate': 0.0022716543852100545, 'weight_decay': 0.01, 'warmup_steps': 43, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 53 with params: {'learning_rate': 0.004675347933389907, 'weight_decay': 0.01, 'warmup_steps': 42, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4298,0.51257,0.826764,0.805185,0.741324,0.759784
2,0.0857,0.530223,0.813016,0.810357,0.733985,0.756839
3,0.0722,0.517447,0.825848,0.823974,0.750538,0.774465
4,0.069,0.51186,0.831347,0.817582,0.755165,0.773705
5,0.0679,0.5257,0.831347,0.844167,0.772941,0.794746
6,0.0656,0.527915,0.824931,0.840288,0.764276,0.787146
7,0.0639,0.505537,0.83593,0.839492,0.771326,0.792034
8,0.0632,0.524713,0.828598,0.833818,0.767732,0.787378
9,0.0622,0.526172,0.827681,0.842486,0.763029,0.787196
10,0.0611,0.524675,0.824015,0.838663,0.768326,0.790743


[I 2025-03-23 21:05:41,722] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0017069240629884299, 'weight_decay': 0.008, 'warmup_steps': 42, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.57,0.527991,0.821265,0.737378,0.678034,0.696102
2,0.0973,0.516121,0.831347,0.786237,0.725974,0.744088
3,0.075,0.504927,0.834097,0.806178,0.732414,0.754139
4,0.0696,0.489917,0.846929,0.82019,0.752453,0.771219
5,0.0664,0.491993,0.837764,0.792362,0.730377,0.749335
6,0.0646,0.497096,0.84143,0.82178,0.755605,0.772752
7,0.0632,0.486419,0.83868,0.83247,0.753327,0.776483
8,0.062,0.501201,0.836847,0.814317,0.748912,0.766467
9,0.0616,0.488549,0.83593,0.816903,0.750056,0.771028
10,0.0611,0.480664,0.84143,0.812641,0.74409,0.764086


[I 2025-03-23 21:09:41,828] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 7.242888062473813e-05, 'weight_decay': 0.001, 'warmup_steps': 39, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6558,1.339505,0.494042,0.117322,0.124881,0.098627
2,1.1143,1.11134,0.597617,0.17657,0.199064,0.176305
3,0.8825,0.970315,0.640697,0.242055,0.235921,0.215914
4,0.7165,0.88139,0.679193,0.292184,0.277022,0.270499
5,0.5975,0.82642,0.696609,0.338804,0.317169,0.311357


[I 2025-03-23 21:11:29,411] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0012356364232774745, 'weight_decay': 0.009000000000000001, 'warmup_steps': 47, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6436,0.533056,0.823098,0.7503,0.641551,0.673725
2,0.1101,0.509394,0.827681,0.799361,0.740957,0.759296
3,0.0795,0.498214,0.830431,0.81528,0.73232,0.760916
4,0.0725,0.496508,0.832264,0.803501,0.737054,0.75913
5,0.0672,0.507196,0.829514,0.826633,0.73666,0.768357
6,0.0658,0.495426,0.837764,0.830068,0.76,0.779477
7,0.0655,0.508548,0.83868,0.840564,0.767209,0.789929
8,0.063,0.493075,0.837764,0.836106,0.761207,0.785675
9,0.0622,0.483394,0.845096,0.855529,0.78049,0.804328
10,0.0618,0.484028,0.843263,0.841784,0.761748,0.785805


[I 2025-03-23 21:15:19,899] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.001271907097053504, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6118,0.555474,0.809349,0.726422,0.644342,0.669868
2,0.1118,0.51259,0.824931,0.795628,0.7357,0.749111
3,0.0802,0.511262,0.820348,0.809119,0.740861,0.76065
4,0.0724,0.497629,0.826764,0.84324,0.748964,0.779771
5,0.0681,0.491355,0.833181,0.830603,0.754122,0.779428
6,0.0663,0.49638,0.833181,0.824848,0.752913,0.775539
7,0.0652,0.503133,0.830431,0.813844,0.736205,0.759016
8,0.064,0.508463,0.827681,0.81307,0.75465,0.772536
9,0.0634,0.501329,0.832264,0.817447,0.746295,0.768213
10,0.0619,0.501671,0.834097,0.814388,0.761828,0.776505


[I 2025-03-23 21:19:05,355] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.003402145244727973, 'weight_decay': 0.004, 'warmup_steps': 50, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4602,0.505967,0.831347,0.826772,0.737572,0.760974
2,0.0845,0.51899,0.831347,0.825435,0.754864,0.773636
3,0.0723,0.502521,0.831347,0.843937,0.764321,0.788243
4,0.0677,0.503346,0.837764,0.851789,0.766624,0.790763
5,0.0657,0.495773,0.842346,0.847307,0.783873,0.803121
6,0.0635,0.493867,0.842346,0.850395,0.770272,0.793419
7,0.0636,0.50332,0.84143,0.861244,0.785843,0.807372
8,0.0634,0.516437,0.832264,0.836553,0.771329,0.787849
9,0.0616,0.506831,0.842346,0.853497,0.79311,0.810455
10,0.0607,0.492919,0.84143,0.846419,0.781078,0.800937


[I 2025-03-23 21:25:00,452] Trial 58 finished with value: 0.8069552060567473 and parameters: {'learning_rate': 0.003402145244727973, 'weight_decay': 0.004, 'warmup_steps': 50, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 59 with params: {'learning_rate': 0.0011646794332754787, 'weight_decay': 0.003, 'warmup_steps': 52, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6705,0.536746,0.816682,0.672533,0.611079,0.630515
2,0.1147,0.518354,0.824015,0.808141,0.739015,0.759624
3,0.0815,0.49848,0.834097,0.797051,0.740776,0.757473
4,0.0717,0.505627,0.830431,0.823239,0.74635,0.770375
5,0.0692,0.508159,0.827681,0.809175,0.735662,0.758951
6,0.0654,0.503417,0.831347,0.811483,0.735519,0.760313
7,0.0643,0.497667,0.829514,0.794881,0.74213,0.7549
8,0.0634,0.50098,0.833181,0.806091,0.739781,0.761125
9,0.0622,0.500846,0.830431,0.824894,0.747425,0.771824
10,0.0618,0.500543,0.827681,0.810443,0.733967,0.756996


[I 2025-03-23 21:28:55,529] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00017559280388301614, 'weight_decay': 0.0, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3244,1.001476,0.626948,0.199256,0.216898,0.191202
2,0.6692,0.775968,0.721357,0.369542,0.343201,0.337546
3,0.4123,0.685684,0.757104,0.474596,0.411003,0.424889
4,0.2812,0.646626,0.766269,0.541968,0.470554,0.487501
5,0.2067,0.628237,0.779102,0.603906,0.536802,0.55425


[I 2025-03-23 21:30:52,631] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.003527831241528447, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.458,0.509099,0.826764,0.800814,0.729492,0.750916
2,0.0856,0.491728,0.842346,0.827016,0.758492,0.777344
3,0.0721,0.486526,0.84418,0.849842,0.798998,0.812797
4,0.0678,0.503314,0.835014,0.851695,0.777654,0.801346
5,0.0663,0.491494,0.843263,0.849401,0.788649,0.80667
6,0.0642,0.507069,0.837764,0.833563,0.751004,0.776639
7,0.0636,0.503652,0.836847,0.823327,0.758991,0.777808
8,0.0625,0.511928,0.836847,0.84107,0.762211,0.786809
9,0.0613,0.522023,0.834097,0.843304,0.773573,0.795694
10,0.0605,0.504891,0.837764,0.837703,0.776625,0.793641


[I 2025-03-23 21:36:45,181] Trial 61 finished with value: 0.7952485071593125 and parameters: {'learning_rate': 0.003527831241528447, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 62 with params: {'learning_rate': 0.0025762569969124956, 'weight_decay': 0.003, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4972,0.509183,0.836847,0.792919,0.717559,0.739256
2,0.0881,0.531246,0.824015,0.798583,0.754111,0.76367
3,0.0718,0.506021,0.828598,0.809212,0.739684,0.758934
4,0.068,0.504444,0.830431,0.845968,0.766858,0.791326
5,0.0653,0.492327,0.837764,0.841908,0.773663,0.794971
6,0.0638,0.478711,0.84418,0.853065,0.776954,0.795917
7,0.0631,0.484933,0.84418,0.853817,0.794132,0.812357
8,0.0618,0.489502,0.843263,0.84167,0.795125,0.808933
9,0.0612,0.494004,0.842346,0.839574,0.802399,0.811651
10,0.0611,0.489056,0.836847,0.837796,0.777375,0.794509


[I 2025-03-23 21:42:39,108] Trial 62 finished with value: 0.8054750067828738 and parameters: {'learning_rate': 0.0025762569969124956, 'weight_decay': 0.003, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 63 with params: {'learning_rate': 0.0029806884373051743, 'weight_decay': 0.003, 'warmup_steps': 47, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4795,0.511198,0.828598,0.802574,0.730065,0.74824
2,0.0862,0.510055,0.836847,0.806711,0.764652,0.772842
3,0.0715,0.505271,0.83593,0.818303,0.756182,0.77491
4,0.0675,0.508765,0.828598,0.819825,0.770323,0.78258
5,0.0655,0.494832,0.839597,0.827485,0.788135,0.794771
6,0.0639,0.507317,0.830431,0.808068,0.738986,0.758329
7,0.0635,0.515569,0.831347,0.820958,0.776174,0.78518
8,0.0622,0.505974,0.84143,0.80285,0.763226,0.770868
9,0.0612,0.507076,0.834097,0.818355,0.776965,0.787401
10,0.0602,0.511317,0.83593,0.826654,0.781162,0.793733


[I 2025-03-23 21:48:33,607] Trial 63 finished with value: 0.7991480451077783 and parameters: {'learning_rate': 0.0029806884373051743, 'weight_decay': 0.003, 'warmup_steps': 47, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 64 with params: {'learning_rate': 0.0016322565087383226, 'weight_decay': 0.01, 'warmup_steps': 35, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5713,0.533166,0.824015,0.747802,0.672155,0.69739
2,0.0994,0.501545,0.833181,0.809908,0.754017,0.768061
3,0.076,0.49198,0.839597,0.825213,0.748416,0.772471
4,0.0697,0.490881,0.839597,0.843423,0.763972,0.787572
5,0.0669,0.490109,0.836847,0.831785,0.773778,0.789234
6,0.0652,0.49501,0.833181,0.82283,0.751759,0.772986
7,0.0632,0.501305,0.830431,0.838993,0.763805,0.786649
8,0.0627,0.507308,0.832264,0.834017,0.76982,0.789284
9,0.0626,0.500642,0.833181,0.827992,0.769411,0.786856
10,0.0615,0.494596,0.839597,0.839158,0.774321,0.793823


[I 2025-03-23 21:52:34,073] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0010980410665600046, 'weight_decay': 0.0, 'warmup_steps': 39, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6636,0.543467,0.806599,0.666092,0.593583,0.612619
2,0.1184,0.512377,0.830431,0.796314,0.746072,0.755956
3,0.0828,0.503487,0.832264,0.797988,0.733769,0.75451
4,0.0733,0.49117,0.847846,0.845334,0.760697,0.788555
5,0.0696,0.485946,0.846013,0.834614,0.760094,0.786094
6,0.0663,0.495332,0.840513,0.825441,0.752555,0.777245
7,0.0641,0.493583,0.842346,0.832036,0.753282,0.780686
8,0.0633,0.496161,0.837764,0.816341,0.753677,0.772126
9,0.0627,0.495313,0.84143,0.828748,0.762228,0.784478
10,0.0618,0.497011,0.840513,0.816563,0.75256,0.773184


[I 2025-03-23 21:56:21,743] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.000978699068432172, 'weight_decay': 0.003, 'warmup_steps': 48, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7106,0.547838,0.804766,0.642588,0.577628,0.589899
2,0.1267,0.503034,0.828598,0.772832,0.720899,0.734154
3,0.0863,0.513106,0.822181,0.798026,0.715414,0.739692
4,0.0751,0.503112,0.828598,0.838955,0.744098,0.775153
5,0.0702,0.504419,0.827681,0.825112,0.735313,0.7672


[I 2025-03-23 21:58:16,257] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 8.465954991738309e-05, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5972,1.283118,0.505958,0.111974,0.131079,0.103455
2,1.0415,1.047929,0.609533,0.196541,0.210297,0.190476
3,0.7987,0.912542,0.651696,0.277857,0.252901,0.242199
4,0.6339,0.838661,0.693859,0.316776,0.301585,0.296627
5,0.5186,0.78701,0.718607,0.391268,0.349771,0.351236
6,0.4335,0.753994,0.736022,0.419897,0.375554,0.373665
7,0.3707,0.724944,0.745188,0.501091,0.41789,0.436254
8,0.3265,0.708452,0.753437,0.503322,0.437261,0.449413
9,0.2916,0.695078,0.761687,0.502593,0.463625,0.470055
10,0.2651,0.681791,0.758937,0.509904,0.456039,0.469293


[I 2025-03-23 22:02:00,860] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.004214429118553423, 'weight_decay': 0.008, 'warmup_steps': 47, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4388,0.503083,0.83593,0.834411,0.731379,0.763725
2,0.0853,0.520385,0.831347,0.820574,0.752289,0.769623
3,0.0726,0.503949,0.840513,0.850127,0.770603,0.792904
4,0.0683,0.498832,0.831347,0.823294,0.772715,0.786573
5,0.0664,0.496747,0.837764,0.84198,0.771592,0.791198
6,0.0649,0.508935,0.833181,0.842432,0.788712,0.800736
7,0.0639,0.510881,0.835014,0.829559,0.769234,0.785884
8,0.0629,0.520157,0.840513,0.857705,0.781284,0.80547
9,0.062,0.542528,0.827681,0.84961,0.773682,0.796709
10,0.0609,0.540855,0.833181,0.839159,0.785127,0.801386


[I 2025-03-23 22:07:55,914] Trial 68 finished with value: 0.8098498672645345 and parameters: {'learning_rate': 0.004214429118553423, 'weight_decay': 0.008, 'warmup_steps': 47, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 69 with params: {'learning_rate': 0.004472599869526768, 'weight_decay': 0.008, 'warmup_steps': 52, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4349,0.509652,0.826764,0.802438,0.719712,0.745706
2,0.0848,0.519596,0.823098,0.825065,0.74424,0.766789
3,0.0726,0.502745,0.832264,0.838252,0.780709,0.796635
4,0.0687,0.506527,0.837764,0.853663,0.781939,0.804053
5,0.0668,0.499374,0.83593,0.826093,0.777123,0.787732
6,0.0662,0.500071,0.836847,0.840767,0.773609,0.794032
7,0.0653,0.499372,0.839597,0.844893,0.788057,0.802614
8,0.0634,0.499092,0.846929,0.848433,0.792115,0.805012
9,0.062,0.489833,0.851512,0.847433,0.790583,0.805224
10,0.0611,0.481087,0.849679,0.84672,0.78892,0.804747


[I 2025-03-23 22:13:38,095] Trial 69 finished with value: 0.8036494942720772 and parameters: {'learning_rate': 0.004472599869526768, 'weight_decay': 0.008, 'warmup_steps': 52, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 70 with params: {'learning_rate': 0.0038230971394073356, 'weight_decay': 0.003, 'warmup_steps': 53, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4533,0.503718,0.830431,0.833085,0.727795,0.757903
2,0.0848,0.49853,0.835014,0.827186,0.775953,0.789051
3,0.0724,0.483968,0.845096,0.85145,0.780316,0.801334
4,0.0683,0.495639,0.842346,0.824679,0.762314,0.779092
5,0.067,0.51358,0.825848,0.815662,0.763645,0.773064


[I 2025-03-23 22:15:40,615] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.001978681421601604, 'weight_decay': 0.003, 'warmup_steps': 37, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5368,0.527464,0.817599,0.744152,0.689499,0.705742
2,0.0932,0.508611,0.827681,0.792965,0.727599,0.747245
3,0.0732,0.499565,0.832264,0.773032,0.720919,0.735511
4,0.0685,0.499826,0.832264,0.807886,0.740199,0.761313
5,0.0655,0.492275,0.83593,0.814763,0.74879,0.767916
6,0.0639,0.499566,0.831347,0.815409,0.742176,0.765837
7,0.0635,0.501063,0.831347,0.8073,0.7498,0.76702
8,0.0626,0.501261,0.828598,0.811247,0.750438,0.766829
9,0.0615,0.486006,0.836847,0.811566,0.748943,0.767004
10,0.0608,0.493332,0.830431,0.835648,0.754253,0.778869


[I 2025-03-23 22:19:40,221] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0046380136898955355, 'weight_decay': 0.009000000000000001, 'warmup_steps': 46, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4344,0.50976,0.835014,0.825362,0.739951,0.766049
2,0.0863,0.502105,0.827681,0.821226,0.753144,0.773132
3,0.0735,0.498073,0.832264,0.844761,0.773523,0.795144
4,0.0698,0.519813,0.829514,0.83957,0.772758,0.792621
5,0.067,0.522295,0.83593,0.822811,0.784421,0.791249
6,0.0658,0.51455,0.830431,0.838614,0.77251,0.791943
7,0.0642,0.502959,0.83593,0.832792,0.783469,0.793013
8,0.0631,0.518367,0.837764,0.844845,0.785391,0.801506
9,0.0625,0.515249,0.837764,0.846105,0.787377,0.803937
10,0.0616,0.538225,0.828598,0.843161,0.772567,0.793201


[I 2025-03-23 22:23:23,390] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.004521128233874042, 'weight_decay': 0.006, 'warmup_steps': 50, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.434,0.504845,0.830431,0.822291,0.753804,0.774622
2,0.0852,0.495965,0.83593,0.825156,0.769485,0.78588
3,0.0724,0.486983,0.83868,0.853777,0.790286,0.808612
4,0.0684,0.495866,0.84418,0.852412,0.787413,0.805207
5,0.0661,0.489582,0.846013,0.834056,0.772979,0.791185
6,0.0647,0.487022,0.845096,0.862275,0.782234,0.806778
7,0.0639,0.506918,0.840513,0.845253,0.79151,0.807423
8,0.0636,0.505681,0.840513,0.858373,0.786714,0.809214
9,0.062,0.497256,0.845096,0.858401,0.7921,0.811906
10,0.0608,0.498737,0.845096,0.852272,0.79784,0.811134


[I 2025-03-23 22:29:14,543] Trial 73 finished with value: 0.8045398420688145 and parameters: {'learning_rate': 0.004521128233874042, 'weight_decay': 0.006, 'warmup_steps': 50, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 74 with params: {'learning_rate': 0.0019206908111904088, 'weight_decay': 0.006, 'warmup_steps': 48, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5519,0.516003,0.822181,0.749753,0.675517,0.695043
2,0.0943,0.503579,0.825848,0.814339,0.742268,0.762391
3,0.0742,0.492061,0.836847,0.817355,0.757121,0.77503
4,0.0685,0.491047,0.83868,0.814802,0.760731,0.775462
5,0.0663,0.47383,0.846013,0.843767,0.769205,0.792304
6,0.064,0.464969,0.843263,0.825342,0.760298,0.779701
7,0.0626,0.473489,0.840513,0.829084,0.765632,0.783775
8,0.0621,0.482608,0.836847,0.829695,0.763268,0.780547
9,0.0618,0.473156,0.84143,0.848471,0.766069,0.790817
10,0.0611,0.471009,0.83868,0.831759,0.7763,0.788974


[I 2025-03-23 22:33:11,736] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.003975674711808358, 'weight_decay': 0.006, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4406,0.517243,0.828598,0.81193,0.732185,0.755956
2,0.0843,0.513677,0.828598,0.798499,0.760989,0.768492
3,0.0722,0.519819,0.829514,0.837335,0.783383,0.797669
4,0.0685,0.497239,0.835014,0.822017,0.790379,0.796162
5,0.0663,0.506767,0.835014,0.807429,0.776199,0.783902
6,0.0644,0.490704,0.835014,0.836517,0.784648,0.800554
7,0.0631,0.51358,0.832264,0.804606,0.764543,0.776184
8,0.0625,0.538856,0.824931,0.829136,0.77453,0.790722
9,0.0619,0.551255,0.826764,0.821266,0.784501,0.793122
10,0.0614,0.527216,0.829514,0.828249,0.78741,0.79762


[I 2025-03-23 22:38:56,722] Trial 75 finished with value: 0.791341304562223 and parameters: {'learning_rate': 0.003975674711808358, 'weight_decay': 0.006, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 76 with params: {'learning_rate': 0.00304258923612374, 'weight_decay': 0.001, 'warmup_steps': 41, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.467,0.511738,0.832264,0.80519,0.719265,0.745412
2,0.0862,0.516986,0.821265,0.78999,0.723601,0.743075
3,0.0719,0.510683,0.832264,0.822609,0.759134,0.777555
4,0.0676,0.511751,0.829514,0.813157,0.75769,0.772823
5,0.0665,0.515232,0.833181,0.819966,0.764676,0.780994
6,0.0648,0.512281,0.833181,0.815702,0.769539,0.782945
7,0.0634,0.521314,0.831347,0.806007,0.750216,0.762773
8,0.0619,0.552345,0.828598,0.81474,0.757885,0.774756
9,0.0614,0.521371,0.834097,0.814645,0.771444,0.78235
10,0.0605,0.500235,0.839597,0.830217,0.774309,0.79052


[I 2025-03-23 22:42:49,155] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.003246766593079066, 'weight_decay': 0.004, 'warmup_steps': 53, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4756,0.494985,0.83868,0.814756,0.748243,0.768581
2,0.0854,0.511743,0.830431,0.838687,0.749023,0.775411
3,0.0717,0.499212,0.83593,0.835348,0.773396,0.788309
4,0.0684,0.510763,0.837764,0.843145,0.756786,0.778401
5,0.066,0.492605,0.840513,0.836035,0.777958,0.792433
6,0.0646,0.502553,0.832264,0.831499,0.766341,0.782319
7,0.0634,0.509219,0.837764,0.859325,0.796289,0.814926
8,0.0619,0.505297,0.839597,0.847521,0.79062,0.806621
9,0.0615,0.504007,0.831347,0.845439,0.788387,0.804155
10,0.0607,0.498224,0.840513,0.854307,0.784641,0.803496


[I 2025-03-23 22:48:39,162] Trial 77 finished with value: 0.8153570297518971 and parameters: {'learning_rate': 0.003246766593079066, 'weight_decay': 0.004, 'warmup_steps': 53, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 78 with params: {'learning_rate': 0.004163287540569293, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4403,0.506312,0.824931,0.788968,0.716557,0.735485
2,0.0856,0.501664,0.837764,0.821248,0.753969,0.770635
3,0.0725,0.483433,0.84418,0.850875,0.784874,0.803321
4,0.0697,0.501444,0.834097,0.804526,0.750338,0.764714
5,0.067,0.507764,0.831347,0.835901,0.770865,0.78961
6,0.0648,0.479566,0.84143,0.817175,0.768108,0.780042
7,0.0641,0.496287,0.835014,0.828375,0.777066,0.790804
8,0.0626,0.504832,0.840513,0.829105,0.778798,0.793509
9,0.0619,0.506462,0.83593,0.828295,0.774711,0.791027
10,0.0612,0.491218,0.84418,0.846246,0.784907,0.803156


[I 2025-03-23 22:54:17,150] Trial 78 finished with value: 0.8008865519027895 and parameters: {'learning_rate': 0.004163287540569293, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 79 with params: {'learning_rate': 0.002766481588357058, 'weight_decay': 0.006, 'warmup_steps': 52, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4962,0.504525,0.831347,0.786679,0.728348,0.743772
2,0.0867,0.514757,0.832264,0.809103,0.747302,0.764185
3,0.0726,0.500853,0.837764,0.804568,0.752507,0.767073
4,0.0668,0.506249,0.829514,0.805002,0.745658,0.761416
5,0.0653,0.503955,0.83593,0.815026,0.767385,0.781429
6,0.0645,0.516029,0.826764,0.810974,0.748543,0.765777
7,0.0633,0.510954,0.824015,0.805434,0.757647,0.770423
8,0.0622,0.500592,0.83593,0.824847,0.785529,0.796525
9,0.0612,0.498346,0.836847,0.84363,0.789339,0.805433
10,0.0604,0.498232,0.836847,0.831116,0.79042,0.800602


[I 2025-03-23 23:00:09,669] Trial 79 finished with value: 0.800835421084273 and parameters: {'learning_rate': 0.002766481588357058, 'weight_decay': 0.006, 'warmup_steps': 52, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 80 with params: {'learning_rate': 0.0014378085652697649, 'weight_decay': 0.005, 'warmup_steps': 47, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.61,0.527288,0.819432,0.712095,0.654823,0.67232
2,0.1038,0.497474,0.833181,0.818214,0.738618,0.765051
3,0.0765,0.488224,0.83593,0.816475,0.754304,0.772951
4,0.07,0.500341,0.833181,0.814823,0.756316,0.773161
5,0.067,0.499186,0.840513,0.830574,0.755693,0.780545
6,0.0647,0.498217,0.835014,0.823022,0.750574,0.772963
7,0.0632,0.49632,0.832264,0.794001,0.734571,0.753157
8,0.062,0.495097,0.836847,0.819608,0.753676,0.773437
9,0.0612,0.489377,0.842346,0.824742,0.760328,0.779959
10,0.0608,0.490423,0.839597,0.821498,0.754071,0.774183


[I 2025-03-23 23:04:05,535] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00463808281527665, 'weight_decay': 0.006, 'warmup_steps': 49, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.439,0.51504,0.83593,0.835082,0.765938,0.784008
2,0.0857,0.524609,0.825848,0.843153,0.768886,0.791441
3,0.0727,0.514306,0.828598,0.828544,0.778854,0.792978
4,0.0689,0.548056,0.824015,0.807136,0.764855,0.774684
5,0.0677,0.543551,0.825848,0.828494,0.775561,0.7922
6,0.065,0.552412,0.819432,0.833305,0.772508,0.789218
7,0.0637,0.549637,0.824931,0.828248,0.771793,0.78655
8,0.063,0.552301,0.826764,0.825423,0.761667,0.778999
9,0.0618,0.537561,0.825848,0.832885,0.782639,0.796014
10,0.061,0.53585,0.827681,0.808765,0.761701,0.772761


[I 2025-03-23 23:07:48,637] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0002726307018738496, 'weight_decay': 0.005, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.151,0.834734,0.703025,0.321824,0.311837,0.300532
2,0.4438,0.653389,0.775435,0.519483,0.477606,0.482716
3,0.233,0.604725,0.780018,0.631379,0.54159,0.564414
4,0.1543,0.580852,0.799267,0.694091,0.602926,0.632594
5,0.1194,0.571301,0.800183,0.72662,0.652368,0.673884
6,0.1025,0.571861,0.812099,0.757462,0.686813,0.709479
7,0.0926,0.566139,0.812099,0.754786,0.685737,0.707249
8,0.086,0.55913,0.809349,0.78344,0.680056,0.71634
9,0.0812,0.564495,0.811182,0.759997,0.689139,0.712357
10,0.0778,0.560561,0.806599,0.771278,0.69136,0.717029


[I 2025-03-23 23:11:41,640] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.004901331812145459, 'weight_decay': 0.001, 'warmup_steps': 44, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4255,0.521142,0.818515,0.806181,0.732549,0.753072
2,0.0863,0.535465,0.819432,0.807434,0.74823,0.765161
3,0.0739,0.52053,0.823098,0.824553,0.762605,0.778043
4,0.0704,0.53677,0.821265,0.799863,0.766239,0.770023
5,0.0678,0.527854,0.829514,0.815605,0.771017,0.779665


[I 2025-03-23 23:13:35,166] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 5.286423289644344e-05, 'weight_decay': 0.008, 'warmup_steps': 31, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7532,1.447758,0.44088,0.075581,0.100963,0.073857
2,1.2455,1.233027,0.525206,0.132302,0.146179,0.124201
3,1.0532,1.107711,0.605866,0.180011,0.203717,0.181277
4,0.903,1.007232,0.629698,0.240363,0.224092,0.20814
5,0.7827,0.936494,0.655362,0.288518,0.253118,0.241535
6,0.687,0.886884,0.675527,0.292597,0.279722,0.272716
7,0.6131,0.85392,0.686526,0.305903,0.298696,0.290178
8,0.5583,0.828807,0.697525,0.34333,0.312646,0.307873
9,0.5145,0.813567,0.715857,0.368061,0.338649,0.336337
10,0.4786,0.79679,0.712191,0.372275,0.338203,0.339087


[I 2025-03-23 23:17:21,799] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.004848193454734499, 'weight_decay': 0.005, 'warmup_steps': 52, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4332,0.512481,0.833181,0.816789,0.764435,0.777078
2,0.0861,0.534921,0.830431,0.789715,0.739323,0.750543
3,0.0735,0.518582,0.831347,0.815762,0.760911,0.774306
4,0.0694,0.537396,0.829514,0.818429,0.782881,0.78703
5,0.0679,0.527851,0.830431,0.840332,0.784782,0.798351
6,0.0661,0.539095,0.824015,0.820893,0.751107,0.771034
7,0.0645,0.544655,0.821265,0.824985,0.776328,0.790923
8,0.0628,0.53489,0.824931,0.823118,0.776733,0.784865
9,0.0626,0.547349,0.824931,0.830917,0.785521,0.792432
10,0.0612,0.542226,0.826764,0.838355,0.778246,0.796026


[I 2025-03-23 23:23:09,332] Trial 85 finished with value: 0.798557522166817 and parameters: {'learning_rate': 0.004848193454734499, 'weight_decay': 0.005, 'warmup_steps': 52, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 86 with params: {'learning_rate': 0.004401091456869476, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 0.9, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4321,0.503515,0.827681,0.796171,0.735031,0.750126
2,0.0853,0.508912,0.830431,0.820823,0.748866,0.768043
3,0.0727,0.502441,0.831347,0.829094,0.769567,0.786423
4,0.0689,0.519287,0.829514,0.815967,0.747695,0.764609
5,0.068,0.511979,0.840513,0.826567,0.76689,0.783948
6,0.0651,0.505776,0.842346,0.827281,0.767075,0.784304
7,0.065,0.517499,0.829514,0.822448,0.766808,0.781821
8,0.0628,0.509866,0.837764,0.837482,0.770236,0.789046
9,0.0618,0.519482,0.831347,0.820739,0.76312,0.779023
10,0.0608,0.501286,0.84143,0.846738,0.7804,0.800709


[I 2025-03-23 23:28:57,939] Trial 86 finished with value: 0.8005441812508924 and parameters: {'learning_rate': 0.004401091456869476, 'weight_decay': 0.004, 'warmup_steps': 51, 'lambda_param': 0.9, 'temperature': 3.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 87 with params: {'learning_rate': 0.0047486802136833565, 'weight_decay': 0.003, 'warmup_steps': 52, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4336,0.512654,0.831347,0.82615,0.753952,0.774957
2,0.0851,0.517219,0.827681,0.811991,0.760079,0.775946
3,0.0734,0.501581,0.843263,0.813796,0.766916,0.781004
4,0.0689,0.523935,0.835014,0.818866,0.778908,0.788279
5,0.0673,0.501347,0.839597,0.832907,0.799464,0.807142
6,0.0654,0.504649,0.832264,0.822822,0.778954,0.790446
7,0.0642,0.505548,0.833181,0.825805,0.784495,0.795465
8,0.0631,0.508675,0.846013,0.84084,0.786877,0.804308
9,0.0624,0.531982,0.829514,0.834266,0.783883,0.798856
10,0.0612,0.520237,0.835014,0.841367,0.784466,0.802369


[I 2025-03-23 23:34:53,419] Trial 87 finished with value: 0.8053289900826553 and parameters: {'learning_rate': 0.0047486802136833565, 'weight_decay': 0.003, 'warmup_steps': 52, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 88 with params: {'learning_rate': 0.0019658555325810473, 'weight_decay': 0.003, 'warmup_steps': 45, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5486,0.51852,0.820348,0.742984,0.672527,0.694125
2,0.0932,0.502169,0.826764,0.775489,0.727107,0.738356
3,0.0741,0.490893,0.835014,0.776611,0.726687,0.741765
4,0.0687,0.490941,0.835014,0.790878,0.742408,0.754606
5,0.0664,0.493144,0.840513,0.819406,0.757564,0.777261


[I 2025-03-23 23:36:45,608] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0004908898893145527, 'weight_decay': 0.003, 'warmup_steps': 9, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9024,0.664386,0.767186,0.462913,0.454717,0.449498
2,0.2311,0.56293,0.797434,0.634129,0.577831,0.592468
3,0.1256,0.554612,0.799267,0.723865,0.636046,0.664685
4,0.0965,0.538596,0.815765,0.796116,0.68903,0.723218
5,0.0844,0.533142,0.818515,0.784475,0.706125,0.729515
6,0.0765,0.525579,0.819432,0.795097,0.710831,0.74014
7,0.0722,0.533475,0.816682,0.797209,0.712852,0.742202
8,0.0697,0.530253,0.814849,0.790323,0.698324,0.728015
9,0.0678,0.517292,0.815765,0.788214,0.706522,0.735051
10,0.0662,0.519682,0.815765,0.789354,0.704607,0.733372


[I 2025-03-23 23:40:30,664] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.002278065566595687, 'weight_decay': 0.004, 'warmup_steps': 52, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5274,0.512137,0.832264,0.78005,0.711996,0.731802
2,0.0904,0.509222,0.827681,0.820663,0.743133,0.766094
3,0.0728,0.50441,0.826764,0.78002,0.735573,0.745031
4,0.0686,0.496427,0.834097,0.786655,0.72777,0.743646
5,0.0662,0.493372,0.84143,0.810889,0.757679,0.772939
6,0.0637,0.500766,0.832264,0.798413,0.739216,0.756906
7,0.0632,0.50203,0.833181,0.813028,0.746169,0.766422
8,0.0618,0.495662,0.836847,0.842235,0.777001,0.797223
9,0.0616,0.509919,0.84143,0.843297,0.775636,0.795842
10,0.061,0.495263,0.836847,0.843467,0.779775,0.799421


[I 2025-03-23 23:46:27,347] Trial 90 finished with value: 0.7958489099851839 and parameters: {'learning_rate': 0.002278065566595687, 'weight_decay': 0.004, 'warmup_steps': 52, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 91 with params: {'learning_rate': 0.0032566782451976927, 'weight_decay': 0.005, 'warmup_steps': 50, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4674,0.508817,0.834097,0.787293,0.71991,0.737261
2,0.0841,0.515378,0.835014,0.81601,0.764565,0.77542
3,0.0714,0.543891,0.820348,0.805317,0.757261,0.77058
4,0.0685,0.517101,0.830431,0.815247,0.768959,0.779486
5,0.0659,0.521809,0.830431,0.824603,0.78355,0.792337
6,0.064,0.53016,0.823098,0.832059,0.762165,0.782543
7,0.0632,0.507198,0.83868,0.828297,0.773171,0.787391
8,0.062,0.516923,0.826764,0.825743,0.778156,0.79056
9,0.0614,0.51287,0.827681,0.811391,0.762262,0.776043
10,0.0604,0.508554,0.830431,0.826613,0.778332,0.790248


[I 2025-03-23 23:50:13,004] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.003989929587495216, 'weight_decay': 0.009000000000000001, 'warmup_steps': 39, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4426,0.506347,0.831347,0.794748,0.733078,0.750405
2,0.0851,0.503576,0.829514,0.812123,0.746813,0.764615
3,0.073,0.508317,0.836847,0.821605,0.765819,0.780638
4,0.0692,0.513364,0.824931,0.843114,0.75451,0.782891
5,0.0663,0.493353,0.843263,0.849562,0.785763,0.805217
6,0.0642,0.497623,0.835014,0.848151,0.771118,0.794322
7,0.0636,0.49978,0.840513,0.858465,0.786466,0.807569
8,0.0623,0.511526,0.84143,0.859109,0.784622,0.807158
9,0.0617,0.513117,0.830431,0.843406,0.783704,0.799792
10,0.0606,0.503344,0.837764,0.852799,0.779648,0.799994


[I 2025-03-23 23:55:52,540] Trial 92 finished with value: 0.8095706559193131 and parameters: {'learning_rate': 0.003989929587495216, 'weight_decay': 0.009000000000000001, 'warmup_steps': 39, 'lambda_param': 0.0, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 93 with params: {'learning_rate': 0.0038698177903088407, 'weight_decay': 0.01, 'warmup_steps': 35, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.443,0.521971,0.823098,0.830653,0.73312,0.759386
2,0.0865,0.507148,0.829514,0.821363,0.757493,0.775071
3,0.0718,0.485685,0.837764,0.848287,0.781226,0.800203
4,0.0686,0.500254,0.83868,0.843947,0.790453,0.803237
5,0.0668,0.510041,0.829514,0.836982,0.782044,0.795554
6,0.065,0.512861,0.825848,0.843331,0.759515,0.782248
7,0.0642,0.509472,0.833181,0.848723,0.770592,0.795414
8,0.0626,0.521755,0.826764,0.845518,0.768002,0.791135
9,0.0615,0.515424,0.827681,0.847863,0.762317,0.786193
10,0.0607,0.518506,0.822181,0.845797,0.760909,0.786035


[I 2025-03-23 23:59:39,933] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.003282051440173333, 'weight_decay': 0.007, 'warmup_steps': 33, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4598,0.522239,0.823098,0.806012,0.724826,0.751451
2,0.0871,0.496244,0.833181,0.809698,0.737854,0.758514
3,0.0728,0.484031,0.843263,0.809552,0.76349,0.774544
4,0.0692,0.50493,0.843263,0.847077,0.77239,0.791657
5,0.0666,0.503019,0.83868,0.84265,0.767837,0.787754
6,0.0644,0.49906,0.836847,0.81818,0.753725,0.770259
7,0.0635,0.506352,0.842346,0.847219,0.780498,0.795549
8,0.0627,0.50185,0.843263,0.834715,0.785753,0.795823
9,0.0621,0.513019,0.83868,0.846201,0.789468,0.802535
10,0.0614,0.497967,0.837764,0.832523,0.759038,0.776719


[I 2025-03-24 00:03:34,117] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0025595765928725024, 'weight_decay': 0.009000000000000001, 'warmup_steps': 45, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4967,0.510133,0.83593,0.793328,0.722414,0.741242
2,0.0873,0.521108,0.821265,0.812105,0.732409,0.754914
3,0.072,0.505084,0.832264,0.797453,0.735071,0.753017
4,0.0675,0.505492,0.833181,0.817773,0.745795,0.765892
5,0.0656,0.507771,0.832264,0.798358,0.751349,0.760969


[I 2025-03-24 00:05:32,073] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.00015972356535382792, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3636,1.046471,0.620532,0.198715,0.211822,0.186257
2,0.7244,0.811874,0.712191,0.325593,0.331768,0.318871
3,0.4631,0.720133,0.747938,0.449038,0.393684,0.403317
4,0.3215,0.672159,0.754354,0.502309,0.44508,0.460432
5,0.2395,0.646955,0.767186,0.560648,0.502714,0.516899
6,0.1903,0.616462,0.780018,0.59993,0.544998,0.559297
7,0.1582,0.607822,0.789184,0.630556,0.578302,0.591563
8,0.1379,0.597545,0.789184,0.63452,0.575855,0.592497
9,0.1232,0.598472,0.794684,0.671452,0.611774,0.629938
10,0.1138,0.594602,0.791017,0.68303,0.606379,0.62916


[I 2025-03-24 00:09:24,320] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0026997999471158555, 'weight_decay': 0.004, 'warmup_steps': 44, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4916,0.494523,0.829514,0.797521,0.715918,0.740709
2,0.0861,0.51062,0.834097,0.81437,0.742724,0.763333
3,0.0715,0.48672,0.833181,0.80766,0.741878,0.760961
4,0.0679,0.501688,0.83593,0.848142,0.747629,0.780766
5,0.0665,0.500398,0.840513,0.814302,0.752427,0.768898


[I 2025-03-24 00:11:25,358] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0001147772186457988, 'weight_decay': 0.008, 'warmup_steps': 48, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5133,1.171862,0.562786,0.163736,0.168142,0.145038
2,0.886,0.915424,0.664528,0.29331,0.262608,0.254038
3,0.622,0.809262,0.71769,0.339352,0.333483,0.324312
4,0.4632,0.748848,0.733272,0.405126,0.364729,0.371024
5,0.359,0.704293,0.750687,0.534493,0.440588,0.460287


[I 2025-03-24 00:13:23,557] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.004806748004880436, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.433,0.509609,0.829514,0.805502,0.735179,0.754425
2,0.0852,0.492207,0.83593,0.82627,0.754861,0.772156
3,0.0739,0.491357,0.84143,0.836291,0.78123,0.794865
4,0.0704,0.503533,0.830431,0.846661,0.760703,0.783104
5,0.0679,0.511214,0.840513,0.851346,0.787972,0.806842
6,0.065,0.510843,0.836847,0.845061,0.768665,0.789445
7,0.064,0.505744,0.837764,0.848151,0.766423,0.790597
8,0.0631,0.505568,0.846929,0.865548,0.787176,0.808619
9,0.0626,0.526224,0.842346,0.852646,0.77865,0.800941
10,0.0614,0.504801,0.843263,0.854835,0.778534,0.801487


[I 2025-03-24 00:19:17,796] Trial 99 finished with value: 0.8063897466457337 and parameters: {'learning_rate': 0.004806748004880436, 'weight_decay': 0.003, 'warmup_steps': 46, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 50, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1878,0.836378,0.690192,0.312031,0.30018,0.291701
2,0.441,0.64507,0.781852,0.510558,0.484679,0.482564
3,0.2334,0.591211,0.792851,0.63565,0.549583,0.573846
4,0.1551,0.568875,0.808433,0.691414,0.609529,0.634563
5,0.12,0.563834,0.814849,0.754602,0.677404,0.701593
6,0.1029,0.55592,0.819432,0.785548,0.701136,0.729416
7,0.0929,0.561841,0.811182,0.751133,0.6791,0.702781
8,0.0861,0.551644,0.813932,0.775821,0.694719,0.720522
9,0.0815,0.554309,0.814849,0.776796,0.71242,0.732258
10,0.0781,0.561481,0.817599,0.799029,0.720226,0.745488


[I 2025-03-24 00:23:06,873] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.00011850132046636438, 'weight_decay': 0.001, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4829,1.156116,0.572869,0.167591,0.176354,0.154566
2,0.8719,0.907656,0.669111,0.290913,0.269691,0.261256
3,0.6102,0.799311,0.72044,0.339879,0.338704,0.331123
4,0.4517,0.743141,0.733272,0.407148,0.36245,0.368435
5,0.3503,0.701574,0.747021,0.512871,0.429619,0.448351


[I 2025-03-24 00:25:03,053] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.004691069866402871, 'weight_decay': 0.001, 'warmup_steps': 35, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4184,0.503948,0.829514,0.82576,0.750346,0.772007
2,0.085,0.507652,0.834097,0.85431,0.779647,0.802423
3,0.0719,0.505868,0.83593,0.848946,0.784107,0.802081
4,0.069,0.514478,0.828598,0.859692,0.780141,0.802113
5,0.0669,0.509269,0.827681,0.843052,0.781731,0.79801
6,0.0646,0.506681,0.836847,0.84479,0.788618,0.806555
7,0.0638,0.521046,0.829514,0.843946,0.779772,0.796597
8,0.0629,0.523724,0.832264,0.851595,0.793113,0.811544
9,0.0614,0.532281,0.825848,0.846415,0.787408,0.804181
10,0.0607,0.535602,0.825848,0.847502,0.788781,0.806467


[I 2025-03-24 00:30:50,486] Trial 102 finished with value: 0.8025169878320161 and parameters: {'learning_rate': 0.004691069866402871, 'weight_decay': 0.001, 'warmup_steps': 35, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 103 with params: {'learning_rate': 0.004282308816611479, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4261,0.508823,0.824931,0.805804,0.738815,0.758466
2,0.0847,0.50577,0.832264,0.810875,0.754778,0.771506
3,0.0722,0.486513,0.83868,0.849772,0.79345,0.807766
4,0.0687,0.517706,0.823098,0.817066,0.753266,0.773417
5,0.0667,0.490348,0.837764,0.832898,0.76782,0.78833
6,0.0652,0.50243,0.835014,0.845497,0.775874,0.794806
7,0.0643,0.517708,0.828598,0.849926,0.780721,0.800325
8,0.0626,0.518679,0.831347,0.856958,0.782156,0.804127
9,0.0621,0.499846,0.836847,0.855762,0.783478,0.802859
10,0.0608,0.509978,0.831347,0.847128,0.774661,0.797508


[I 2025-03-24 00:36:50,274] Trial 103 finished with value: 0.8097390280426646 and parameters: {'learning_rate': 0.004282308816611479, 'weight_decay': 0.002, 'warmup_steps': 32, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 104 with params: {'learning_rate': 0.004714048137370582, 'weight_decay': 0.001, 'warmup_steps': 35, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4177,0.503972,0.831347,0.806542,0.746211,0.763435
2,0.0847,0.494203,0.83868,0.827281,0.769613,0.786355
3,0.0728,0.496802,0.836847,0.833338,0.776071,0.79385
4,0.0693,0.519081,0.830431,0.831806,0.755128,0.779255
5,0.0667,0.520888,0.832264,0.838446,0.776635,0.795185
6,0.0651,0.517413,0.83593,0.853034,0.781557,0.802789
7,0.0636,0.509851,0.835014,0.854253,0.772039,0.798354
8,0.0627,0.511808,0.835014,0.854059,0.773773,0.796651
9,0.0619,0.507158,0.84143,0.855666,0.787972,0.808597
10,0.0606,0.51965,0.832264,0.8551,0.781071,0.804182


[I 2025-03-24 00:42:29,613] Trial 104 finished with value: 0.8101494921609463 and parameters: {'learning_rate': 0.004714048137370582, 'weight_decay': 0.001, 'warmup_steps': 35, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 105 with params: {'learning_rate': 0.0021215851709965733, 'weight_decay': 0.001, 'warmup_steps': 37, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5219,0.505223,0.833181,0.756275,0.702076,0.715441
2,0.0907,0.511521,0.820348,0.779145,0.724651,0.739079
3,0.0737,0.491192,0.835014,0.825202,0.753563,0.774065
4,0.0684,0.498073,0.830431,0.85124,0.75181,0.780471
5,0.0653,0.488483,0.840513,0.853527,0.770483,0.793034
6,0.0638,0.495958,0.835014,0.823357,0.749162,0.768593
7,0.0626,0.497938,0.833181,0.853382,0.76187,0.788719
8,0.0622,0.483432,0.84418,0.854576,0.790904,0.806124
9,0.0618,0.511211,0.831347,0.843202,0.766473,0.789336
10,0.0607,0.493833,0.835014,0.849372,0.767827,0.79091


[I 2025-03-24 00:46:15,474] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0045915168483194775, 'weight_decay': 0.002, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.436,0.509079,0.831347,0.817502,0.746303,0.766254
2,0.0857,0.508599,0.83593,0.824941,0.7532,0.774059
3,0.0729,0.506283,0.842346,0.840909,0.768974,0.790378
4,0.0684,0.518321,0.834097,0.854139,0.782931,0.804341
5,0.067,0.504114,0.840513,0.837299,0.78052,0.794579
6,0.0646,0.51431,0.835014,0.84635,0.786585,0.804595
7,0.0636,0.516429,0.833181,0.831864,0.772039,0.789191
8,0.0625,0.523963,0.835014,0.83764,0.781514,0.799185
9,0.0616,0.515655,0.83593,0.833968,0.78703,0.799401
10,0.0607,0.508884,0.840513,0.836064,0.78876,0.80196


[I 2025-03-24 00:52:04,607] Trial 106 finished with value: 0.7999728831397702 and parameters: {'learning_rate': 0.0045915168483194775, 'weight_decay': 0.002, 'warmup_steps': 48, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 107 with params: {'learning_rate': 0.0022490554064737212, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5019,0.529748,0.813016,0.708765,0.653604,0.66787
2,0.0904,0.531998,0.822181,0.801779,0.72894,0.745631
3,0.0735,0.505745,0.828598,0.831829,0.753238,0.777042
4,0.0687,0.495753,0.836847,0.847862,0.755827,0.782111
5,0.0662,0.489993,0.842346,0.831172,0.774312,0.787847
6,0.0648,0.527807,0.820348,0.811691,0.751057,0.764989
7,0.0634,0.515394,0.825848,0.837588,0.758362,0.779815
8,0.0623,0.504991,0.837764,0.844667,0.773395,0.792948
9,0.0615,0.49878,0.827681,0.830863,0.77275,0.78746
10,0.0608,0.499905,0.832264,0.847747,0.768997,0.791525


[I 2025-03-24 00:55:47,928] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.004216760298372613, 'weight_decay': 0.001, 'warmup_steps': 53, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.439,0.513224,0.828598,0.772075,0.734487,0.741923
2,0.085,0.519424,0.830431,0.814267,0.753262,0.766855
3,0.0724,0.494538,0.839597,0.817255,0.764736,0.777378
4,0.0689,0.508931,0.830431,0.825892,0.783505,0.791062
5,0.067,0.495552,0.835014,0.808642,0.769055,0.77794


[I 2025-03-24 00:57:40,810] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0016417048475562202, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5594,0.521501,0.821265,0.726501,0.649408,0.675464
2,0.0986,0.49951,0.839597,0.813691,0.750437,0.766346
3,0.0758,0.49495,0.840513,0.81298,0.758163,0.772537
4,0.0701,0.490681,0.83593,0.817066,0.743002,0.765395
5,0.0667,0.487204,0.845096,0.833394,0.773806,0.789854
6,0.0652,0.494917,0.832264,0.805568,0.743405,0.759511
7,0.0636,0.494241,0.833181,0.822027,0.753525,0.773534
8,0.0625,0.492818,0.839597,0.831308,0.782905,0.794401
9,0.0621,0.495306,0.837764,0.838185,0.778301,0.795357
10,0.0614,0.48771,0.836847,0.836864,0.777049,0.795323


[I 2025-03-24 01:03:32,014] Trial 109 finished with value: 0.8062092051404537 and parameters: {'learning_rate': 0.0016417048475562202, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 110 with params: {'learning_rate': 0.0019330351855928218, 'weight_decay': 0.01, 'warmup_steps': 49, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5505,0.526739,0.822181,0.717693,0.667153,0.680919
2,0.0936,0.505746,0.829514,0.793078,0.719852,0.743032
3,0.0742,0.499208,0.833181,0.78525,0.726133,0.74398
4,0.0694,0.496268,0.828598,0.803391,0.737854,0.756431
5,0.0665,0.494353,0.836847,0.817349,0.738339,0.763229
6,0.0644,0.502162,0.837764,0.822397,0.756418,0.773294
7,0.0633,0.491389,0.83593,0.815693,0.74388,0.765526
8,0.0616,0.503155,0.83868,0.839944,0.7632,0.785787
9,0.0613,0.498217,0.83868,0.84217,0.755572,0.783723
10,0.0607,0.500117,0.833181,0.836466,0.758889,0.783878


[I 2025-03-24 01:07:33,291] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.0032529409451113093, 'weight_decay': 0.002, 'warmup_steps': 24, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4522,0.520294,0.822181,0.812965,0.727661,0.750443
2,0.0856,0.513608,0.822181,0.811026,0.729853,0.755394
3,0.0726,0.510308,0.828598,0.813292,0.764984,0.774472
4,0.068,0.511022,0.833181,0.826982,0.76886,0.78448
5,0.0662,0.494766,0.84418,0.859087,0.78395,0.806103
6,0.0643,0.488164,0.837764,0.813452,0.759099,0.772493
7,0.0632,0.498012,0.836847,0.838339,0.781705,0.797651
8,0.0621,0.500796,0.833181,0.837988,0.76958,0.786552
9,0.0613,0.487149,0.84418,0.862813,0.799305,0.817449
10,0.0612,0.50345,0.834097,0.824667,0.751129,0.772235


[I 2025-03-24 01:11:20,532] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0016347137689858614, 'weight_decay': 0.001, 'warmup_steps': 38, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5777,0.539279,0.815765,0.707641,0.65589,0.67179
2,0.0991,0.502996,0.828598,0.806852,0.733512,0.753813
3,0.0761,0.503651,0.835014,0.817517,0.754946,0.773099
4,0.0702,0.507883,0.824931,0.788945,0.722837,0.745405
5,0.0666,0.49329,0.835014,0.808329,0.748545,0.765861
6,0.0653,0.489817,0.833181,0.808166,0.741667,0.762984
7,0.063,0.496172,0.837764,0.834701,0.756578,0.779651
8,0.0619,0.495417,0.835014,0.837121,0.75555,0.782172
9,0.0615,0.492237,0.833181,0.814561,0.745755,0.767492
10,0.0609,0.496376,0.833181,0.810604,0.745957,0.765125


[I 2025-03-24 01:15:01,647] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0016090297554161844, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5732,0.529301,0.819432,0.726737,0.660491,0.682771
2,0.0995,0.496884,0.840513,0.820451,0.754077,0.775093
3,0.0762,0.501386,0.831347,0.793971,0.736256,0.753355
4,0.0697,0.485941,0.847846,0.82151,0.755042,0.774369
5,0.0664,0.485451,0.846013,0.84404,0.769026,0.792061
6,0.0647,0.494995,0.836847,0.817573,0.749867,0.769701
7,0.0634,0.490963,0.846013,0.820609,0.763301,0.779699
8,0.0624,0.507229,0.84418,0.819344,0.760424,0.777925
9,0.0617,0.503417,0.839597,0.812109,0.758114,0.775017
10,0.0608,0.497728,0.837764,0.815231,0.756652,0.774493


[I 2025-03-24 01:18:51,299] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0014921861051929602, 'weight_decay': 0.0, 'warmup_steps': 13, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.581,0.548817,0.813016,0.732041,0.647948,0.6756
2,0.1047,0.501475,0.836847,0.789478,0.729585,0.7483
3,0.0779,0.492363,0.837764,0.791341,0.746394,0.757018
4,0.0715,0.515644,0.829514,0.826555,0.739694,0.768081
5,0.0681,0.493223,0.83868,0.832301,0.755965,0.780061
6,0.0654,0.497275,0.833181,0.804496,0.737842,0.757194
7,0.0641,0.492415,0.840513,0.813322,0.75255,0.767592
8,0.0631,0.505184,0.842346,0.818761,0.749372,0.770308
9,0.0628,0.502201,0.83593,0.817552,0.757589,0.775691
10,0.0616,0.494398,0.836847,0.821554,0.759866,0.777401


[I 2025-03-24 01:22:51,007] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.00404547081000335, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4253,0.518921,0.824015,0.801642,0.72478,0.747346
2,0.0847,0.514659,0.832264,0.818692,0.756405,0.768922
3,0.0725,0.498631,0.831347,0.848703,0.771103,0.793963
4,0.0696,0.523182,0.829514,0.820412,0.752563,0.770805
5,0.0665,0.5154,0.832264,0.836781,0.777191,0.793297
6,0.0646,0.510654,0.826764,0.805744,0.763859,0.771912
7,0.0641,0.526881,0.826764,0.834935,0.767571,0.78422
8,0.0624,0.529408,0.830431,0.826103,0.767456,0.783274
9,0.0613,0.530281,0.824931,0.833131,0.773366,0.791439
10,0.0609,0.525671,0.830431,0.847633,0.782001,0.801612


[I 2025-03-24 01:28:35,351] Trial 115 finished with value: 0.7729726591044757 and parameters: {'learning_rate': 0.00404547081000335, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 116 with params: {'learning_rate': 0.00287320164867111, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.475,0.51325,0.821265,0.809214,0.71595,0.74412
2,0.0873,0.51155,0.83593,0.819988,0.751503,0.772328
3,0.0729,0.494756,0.837764,0.830602,0.766436,0.783268
4,0.0687,0.493245,0.84143,0.837289,0.7771,0.793491
5,0.0658,0.497192,0.837764,0.818556,0.78109,0.786148
6,0.0641,0.497896,0.836847,0.8247,0.785521,0.794695
7,0.0632,0.480157,0.842346,0.82993,0.79547,0.802252
8,0.0623,0.496368,0.843263,0.831096,0.79781,0.804075
9,0.0621,0.489516,0.846013,0.837478,0.791936,0.803207
10,0.0616,0.501635,0.836847,0.839436,0.785424,0.797908


[I 2025-03-24 01:34:32,724] Trial 116 finished with value: 0.8155179730795615 and parameters: {'learning_rate': 0.00287320164867111, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 117 with params: {'learning_rate': 0.003583567811734399, 'weight_decay': 0.0, 'warmup_steps': 38, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4515,0.522362,0.821265,0.802896,0.7352,0.757162
2,0.0862,0.504285,0.828598,0.802065,0.737616,0.756123
3,0.0725,0.501627,0.836847,0.814803,0.767876,0.778474
4,0.0688,0.505308,0.83868,0.836023,0.76784,0.787763
5,0.0667,0.507378,0.836847,0.812586,0.760784,0.774456


[I 2025-03-24 01:36:23,257] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0030773738256811346, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.467,0.511758,0.828598,0.802888,0.716675,0.741607
2,0.0872,0.506999,0.830431,0.809971,0.75353,0.767349
3,0.0724,0.502974,0.836847,0.835656,0.778091,0.792808
4,0.0681,0.501986,0.840513,0.853648,0.792302,0.811588
5,0.0657,0.500255,0.837764,0.838156,0.799059,0.809959
6,0.0641,0.509226,0.837764,0.835253,0.786731,0.801511
7,0.0632,0.512892,0.83593,0.843467,0.79091,0.806973
8,0.0621,0.518427,0.834097,0.848611,0.785219,0.802102
9,0.062,0.514674,0.836847,0.850476,0.785492,0.80521
10,0.0605,0.511487,0.830431,0.847626,0.779971,0.801368


[I 2025-03-24 01:42:01,984] Trial 118 finished with value: 0.8036873478942849 and parameters: {'learning_rate': 0.0030773738256811346, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 119 with params: {'learning_rate': 5.416926623119094e-05, 'weight_decay': 0.003, 'warmup_steps': 43, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7533,1.442614,0.44363,0.073652,0.101608,0.07426
2,1.2388,1.22556,0.532539,0.143364,0.153977,0.133928
3,1.0413,1.096789,0.605866,0.178354,0.204565,0.181814
4,0.8884,0.998386,0.630614,0.218594,0.223031,0.203785
5,0.7684,0.928204,0.659945,0.292137,0.256796,0.24712
6,0.6732,0.881507,0.675527,0.29102,0.278744,0.271581
7,0.6,0.848736,0.690192,0.301369,0.302965,0.294043
8,0.5455,0.824895,0.701192,0.350973,0.317358,0.315532
9,0.5021,0.810526,0.71769,0.370476,0.34252,0.338901
10,0.4667,0.792945,0.718607,0.376366,0.348129,0.347023


[I 2025-03-24 01:45:44,495] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.00016104904333464902, 'weight_decay': 0.009000000000000001, 'warmup_steps': 26, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3732,1.032864,0.626948,0.201816,0.213833,0.189541
2,0.7133,0.803122,0.708524,0.343348,0.32199,0.313615
3,0.4543,0.713249,0.765353,0.523273,0.431348,0.449412
4,0.3135,0.663544,0.759853,0.547379,0.447714,0.473014
5,0.2319,0.644991,0.768103,0.567662,0.513888,0.523669


[I 2025-03-24 01:47:39,331] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.002351888054867701, 'weight_decay': 0.003, 'warmup_steps': 42, 'lambda_param': 0.7000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5067,0.513251,0.824015,0.782617,0.712164,0.732337
2,0.0888,0.519701,0.823098,0.80609,0.747357,0.763687
3,0.0725,0.496377,0.836847,0.815244,0.752821,0.769606
4,0.0681,0.506718,0.832264,0.830966,0.75074,0.775709
5,0.0661,0.503723,0.83593,0.838792,0.770196,0.791535
6,0.0642,0.494123,0.837764,0.83411,0.746648,0.772773
7,0.0628,0.4958,0.837764,0.849458,0.763688,0.790213
8,0.0618,0.500098,0.83593,0.843494,0.759668,0.785265
9,0.0616,0.499998,0.828598,0.833888,0.757847,0.779817
10,0.0607,0.490953,0.833181,0.814629,0.764369,0.776336


[I 2025-03-24 01:51:31,262] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.00021703879021406816, 'weight_decay': 0.0, 'warmup_steps': 38, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2657,0.914615,0.655362,0.278852,0.247047,0.234063
2,0.5431,0.70614,0.754354,0.461489,0.404348,0.412704
3,0.3083,0.629059,0.781852,0.554683,0.503273,0.509622
4,0.2047,0.599996,0.792851,0.602992,0.550925,0.567672
5,0.152,0.59182,0.791934,0.663594,0.599594,0.618276


[I 2025-03-24 01:53:19,001] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0008636165666655091, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7124,0.565491,0.796517,0.620822,0.558238,0.573946
2,0.1373,0.526259,0.819432,0.790984,0.673868,0.713826
3,0.0896,0.527901,0.811182,0.800448,0.700637,0.73364
4,0.0767,0.511179,0.829514,0.820325,0.730023,0.759919
5,0.0708,0.50616,0.830431,0.809735,0.728161,0.756589
6,0.0676,0.497742,0.826764,0.814673,0.727845,0.75774
7,0.0654,0.510467,0.824015,0.821522,0.719751,0.753934
8,0.0642,0.511196,0.825848,0.805981,0.727958,0.754533
9,0.0633,0.515082,0.820348,0.81613,0.727357,0.755351
10,0.0628,0.51097,0.823098,0.809445,0.731079,0.754486


[I 2025-03-24 01:57:25,021] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0037850685498194885, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4441,0.499852,0.825848,0.779758,0.722828,0.737174
2,0.0846,0.501893,0.834097,0.813957,0.748944,0.76285
3,0.0722,0.496704,0.834097,0.843389,0.764925,0.789063
4,0.0681,0.499293,0.839597,0.82455,0.762265,0.780954
5,0.067,0.500925,0.840513,0.837334,0.778665,0.795544
6,0.0646,0.520643,0.830431,0.828538,0.778411,0.79086
7,0.0632,0.522131,0.831347,0.814571,0.766798,0.776386
8,0.0625,0.506182,0.831347,0.835482,0.748811,0.775707
9,0.0613,0.503003,0.837764,0.835593,0.77956,0.793516
10,0.0604,0.498364,0.837764,0.854108,0.779387,0.802578


[I 2025-03-24 02:03:18,412] Trial 124 finished with value: 0.8020772845829689 and parameters: {'learning_rate': 0.0037850685498194885, 'weight_decay': 0.0, 'warmup_steps': 32, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 125 with params: {'learning_rate': 0.0037236877089750543, 'weight_decay': 0.006, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4462,0.51358,0.828598,0.799079,0.735486,0.753285
2,0.0861,0.525121,0.823098,0.801873,0.74104,0.754237
3,0.0724,0.519127,0.828598,0.781469,0.727652,0.74292
4,0.0696,0.513151,0.828598,0.816747,0.751993,0.771037
5,0.0672,0.511518,0.828598,0.827556,0.760673,0.780154
6,0.0652,0.515718,0.824931,0.83323,0.761629,0.779887
7,0.0632,0.497309,0.83868,0.846743,0.791585,0.805511
8,0.0623,0.509811,0.83593,0.851484,0.793213,0.806706
9,0.0621,0.503243,0.837764,0.836551,0.786581,0.799622
10,0.0609,0.496776,0.83593,0.848227,0.79017,0.808462


[I 2025-03-24 02:08:52,565] Trial 125 finished with value: 0.8076165616295597 and parameters: {'learning_rate': 0.0037236877089750543, 'weight_decay': 0.006, 'warmup_steps': 37, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 126 with params: {'learning_rate': 0.004543033021161845, 'weight_decay': 0.008, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4056,0.496854,0.833181,0.790965,0.743454,0.756114
2,0.0841,0.502002,0.835014,0.822338,0.754531,0.773024
3,0.0732,0.501514,0.833181,0.841918,0.77804,0.795263
4,0.0693,0.523095,0.828598,0.828863,0.764338,0.783651
5,0.0667,0.519252,0.83593,0.845298,0.789332,0.807001
6,0.0648,0.522626,0.827681,0.84039,0.775987,0.794614
7,0.0634,0.517272,0.824015,0.834805,0.772268,0.789142
8,0.0628,0.521717,0.830431,0.842556,0.78382,0.800545
9,0.0625,0.50751,0.833181,0.846225,0.765082,0.789095
10,0.0613,0.515454,0.829514,0.848653,0.773196,0.796385


[I 2025-03-24 02:14:47,381] Trial 126 finished with value: 0.7959063087684686 and parameters: {'learning_rate': 0.004543033021161845, 'weight_decay': 0.008, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 127 with params: {'learning_rate': 0.004039828931622969, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4291,0.514725,0.828598,0.816427,0.75953,0.775939
2,0.0846,0.496796,0.830431,0.813718,0.749399,0.769472
3,0.0722,0.485744,0.842346,0.842892,0.777603,0.79792
4,0.068,0.49065,0.837764,0.821231,0.763603,0.779908
5,0.0657,0.498174,0.836847,0.843354,0.771357,0.791336
6,0.0645,0.494559,0.83593,0.833452,0.768156,0.786589
7,0.0635,0.502554,0.835014,0.857545,0.790161,0.809818
8,0.0623,0.513991,0.830431,0.82796,0.762229,0.77987
9,0.0612,0.499019,0.831347,0.829048,0.761958,0.781354
10,0.0605,0.515556,0.827681,0.843446,0.767227,0.78937


[I 2025-03-24 02:20:40,583] Trial 127 finished with value: 0.7910124584210991 and parameters: {'learning_rate': 0.004039828931622969, 'weight_decay': 0.006, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 128 with params: {'learning_rate': 0.003587026982078836, 'weight_decay': 0.007, 'warmup_steps': 42, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4452,0.511664,0.821265,0.797767,0.714247,0.739028
2,0.0844,0.528404,0.824015,0.801055,0.745063,0.76025
3,0.0719,0.501467,0.830431,0.796597,0.742164,0.758469
4,0.0683,0.512083,0.832264,0.817887,0.741845,0.765133
5,0.0666,0.526868,0.830431,0.817973,0.763605,0.77814
6,0.0643,0.525645,0.827681,0.82416,0.770004,0.785536
7,0.0629,0.508303,0.835014,0.83138,0.769559,0.788576
8,0.0619,0.531413,0.823098,0.818846,0.764709,0.780001
9,0.0618,0.513436,0.830431,0.822291,0.767266,0.781637
10,0.0607,0.50908,0.839597,0.82147,0.777907,0.789288


[I 2025-03-24 02:26:44,906] Trial 128 finished with value: 0.8019134815642508 and parameters: {'learning_rate': 0.003587026982078836, 'weight_decay': 0.007, 'warmup_steps': 42, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 129 with params: {'learning_rate': 0.0006712937288776745, 'weight_decay': 0.005, 'warmup_steps': 43, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8208,0.600702,0.787351,0.529072,0.495578,0.500699
2,0.1711,0.53511,0.813932,0.73845,0.655173,0.681
3,0.1022,0.535528,0.815765,0.818895,0.72187,0.7518
4,0.0835,0.515492,0.821265,0.821163,0.721088,0.753432
5,0.0753,0.515051,0.816682,0.814702,0.713751,0.74699


[I 2025-03-24 02:28:44,810] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.004493818637467672, 'weight_decay': 0.009000000000000001, 'warmup_steps': 44, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4356,0.509969,0.830431,0.818272,0.734169,0.763676
2,0.0864,0.504491,0.821265,0.83108,0.748096,0.770987
3,0.0729,0.488979,0.843263,0.847735,0.789262,0.805762
4,0.0696,0.510746,0.826764,0.824708,0.763459,0.781832
5,0.0674,0.506674,0.829514,0.819909,0.768179,0.781344


[I 2025-03-24 02:30:47,872] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.00436889782563019, 'weight_decay': 0.004, 'warmup_steps': 36, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4296,0.514495,0.824931,0.800245,0.746647,0.759657
2,0.0856,0.530715,0.825848,0.804078,0.746541,0.764164
3,0.0722,0.524643,0.819432,0.837185,0.770113,0.787916
4,0.0694,0.5311,0.831347,0.855311,0.776062,0.799639
5,0.0676,0.543161,0.827681,0.849253,0.774822,0.79752
6,0.0649,0.532629,0.825848,0.803013,0.758296,0.77028
7,0.064,0.524565,0.829514,0.806133,0.757063,0.770969
8,0.0629,0.523022,0.829514,0.832039,0.766633,0.784907
9,0.062,0.526714,0.832264,0.829619,0.769065,0.787821
10,0.0611,0.539365,0.824015,0.83022,0.756706,0.781918


[I 2025-03-24 02:34:37,208] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0017417038107094816, 'weight_decay': 0.002, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5452,0.526488,0.813932,0.722875,0.6489,0.672157
2,0.0969,0.510121,0.826764,0.79928,0.739874,0.754551
3,0.0758,0.504395,0.836847,0.837329,0.775204,0.789388
4,0.0705,0.500958,0.833181,0.830876,0.756855,0.778301
5,0.0661,0.49971,0.83593,0.841185,0.775622,0.79365
6,0.0646,0.517493,0.824931,0.807615,0.750977,0.763466
7,0.0634,0.501202,0.835014,0.8211,0.761814,0.776067
8,0.0622,0.499302,0.836847,0.828613,0.765547,0.783002
9,0.0615,0.500418,0.836847,0.836352,0.768549,0.787921
10,0.0615,0.502024,0.835014,0.833669,0.764336,0.781261


[I 2025-03-24 02:38:28,073] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0016799222735319251, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5624,0.536734,0.811182,0.728551,0.654707,0.676742
2,0.0978,0.505461,0.832264,0.81209,0.742099,0.762452
3,0.0761,0.508588,0.832264,0.813882,0.742735,0.762362
4,0.0696,0.49625,0.843263,0.823729,0.752262,0.773294
5,0.0668,0.496461,0.839597,0.81024,0.751834,0.766976
6,0.0646,0.490387,0.833181,0.813919,0.746111,0.767817
7,0.0641,0.499986,0.836847,0.81632,0.751112,0.769522
8,0.0629,0.50792,0.835014,0.837809,0.753268,0.779225
9,0.062,0.495957,0.83868,0.839162,0.766684,0.789441
10,0.0609,0.503559,0.836847,0.843869,0.774614,0.796781


[I 2025-03-24 02:44:16,885] Trial 133 finished with value: 0.795951613515239 and parameters: {'learning_rate': 0.0016799222735319251, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 2.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 23, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6777,1.375856,0.47571,0.116954,0.11604,0.091291
2,1.1593,1.155246,0.578368,0.168869,0.184821,0.163987
3,0.9428,1.017073,0.626948,0.191257,0.220475,0.196228
4,0.7791,0.922462,0.659028,0.295956,0.255552,0.248653
5,0.658,0.860447,0.67736,0.290591,0.288081,0.279753


[I 2025-03-24 02:46:08,326] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.004277668534483279, 'weight_decay': 0.002, 'warmup_steps': 40, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4371,0.505659,0.832264,0.810559,0.734564,0.756732
2,0.086,0.516899,0.826764,0.800662,0.750172,0.764454
3,0.0723,0.494269,0.831347,0.860155,0.774141,0.801389
4,0.0688,0.509844,0.830431,0.811113,0.754712,0.768059
5,0.0665,0.501561,0.83868,0.834642,0.791505,0.80212
6,0.0652,0.523762,0.824015,0.793905,0.735096,0.749372
7,0.0652,0.526415,0.827681,0.827642,0.778533,0.791879
8,0.0627,0.513967,0.832264,0.838439,0.785782,0.800797
9,0.0621,0.521116,0.834097,0.846979,0.783377,0.80354
10,0.0608,0.512182,0.830431,0.838126,0.781934,0.798346


[I 2025-03-24 02:51:50,764] Trial 135 finished with value: 0.8074465443518513 and parameters: {'learning_rate': 0.004277668534483279, 'weight_decay': 0.002, 'warmup_steps': 40, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 136 with params: {'learning_rate': 0.002778241534096457, 'weight_decay': 0.002, 'warmup_steps': 41, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4778,0.508233,0.827681,0.796784,0.713307,0.73753
2,0.0866,0.485412,0.839597,0.836948,0.754696,0.778393
3,0.0717,0.475779,0.839597,0.823797,0.755999,0.775302
4,0.0672,0.492347,0.83868,0.846767,0.770719,0.793463
5,0.0656,0.489344,0.84418,0.847573,0.774596,0.796084
6,0.0648,0.505341,0.83593,0.839372,0.79049,0.802876
7,0.0631,0.495328,0.839597,0.843418,0.781372,0.800123
8,0.0618,0.50509,0.843263,0.847455,0.799782,0.812427
9,0.0616,0.506766,0.837764,0.845955,0.798461,0.811343
10,0.0609,0.495772,0.84143,0.847924,0.790184,0.808342


[I 2025-03-24 02:57:34,714] Trial 136 finished with value: 0.8059584259298288 and parameters: {'learning_rate': 0.002778241534096457, 'weight_decay': 0.002, 'warmup_steps': 41, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 137 with params: {'learning_rate': 0.004635675894565288, 'weight_decay': 0.002, 'warmup_steps': 33, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4178,0.498499,0.829514,0.812479,0.745602,0.764244
2,0.0843,0.493499,0.835014,0.834276,0.762902,0.782562
3,0.0721,0.486179,0.840513,0.849493,0.783373,0.805074
4,0.0692,0.496315,0.842346,0.853802,0.780467,0.804167
5,0.0666,0.491264,0.83868,0.839612,0.792458,0.802618
6,0.0648,0.493228,0.837764,0.848671,0.783811,0.802444
7,0.064,0.517186,0.833181,0.843716,0.781668,0.798983
8,0.0628,0.529854,0.831347,0.846579,0.775476,0.796527
9,0.0623,0.51395,0.839597,0.848275,0.78673,0.804379
10,0.0609,0.511964,0.837764,0.856399,0.78213,0.804167


[I 2025-03-24 03:03:23,256] Trial 137 finished with value: 0.8114897843829706 and parameters: {'learning_rate': 0.004635675894565288, 'weight_decay': 0.002, 'warmup_steps': 33, 'lambda_param': 0.9, 'temperature': 5.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 138 with params: {'learning_rate': 0.003886970268099752, 'weight_decay': 0.001, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4521,0.524136,0.822181,0.825754,0.743052,0.76931
2,0.0864,0.519172,0.831347,0.836697,0.762579,0.784948
3,0.0725,0.511759,0.837764,0.827403,0.764303,0.783454
4,0.0691,0.512036,0.831347,0.837438,0.772272,0.789854
5,0.0668,0.536037,0.828598,0.842057,0.775866,0.794822
6,0.065,0.518285,0.835014,0.832126,0.776327,0.793147
7,0.0633,0.512768,0.836847,0.81776,0.777904,0.787591
8,0.0625,0.518662,0.83593,0.832066,0.771321,0.787368
9,0.0618,0.545178,0.833181,0.842079,0.780881,0.800035
10,0.061,0.507979,0.836847,0.83964,0.778205,0.797944


[I 2025-03-24 03:09:14,165] Trial 138 finished with value: 0.8027976391669068 and parameters: {'learning_rate': 0.003886970268099752, 'weight_decay': 0.001, 'warmup_steps': 42, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 139 with params: {'learning_rate': 0.0046008524528405505, 'weight_decay': 0.001, 'warmup_steps': 49, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4399,0.505845,0.834097,0.816165,0.754256,0.771824
2,0.0863,0.505075,0.832264,0.82433,0.753418,0.773098
3,0.0741,0.493532,0.839597,0.841567,0.790382,0.805619
4,0.0694,0.498203,0.845096,0.847933,0.788027,0.805202
5,0.0673,0.510332,0.837764,0.85636,0.786426,0.808726
6,0.0656,0.508085,0.839597,0.85805,0.781072,0.804063
7,0.0645,0.526614,0.834097,0.83332,0.780544,0.795113
8,0.0635,0.510593,0.83868,0.855928,0.781813,0.803103
9,0.0622,0.500006,0.839597,0.85421,0.781785,0.803011
10,0.0611,0.502357,0.84143,0.848832,0.792579,0.807819


[I 2025-03-24 03:14:53,603] Trial 139 finished with value: 0.7981063854328009 and parameters: {'learning_rate': 0.0046008524528405505, 'weight_decay': 0.001, 'warmup_steps': 49, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 140 with params: {'learning_rate': 0.003924879085141806, 'weight_decay': 0.003, 'warmup_steps': 34, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4412,0.509937,0.822181,0.807628,0.718764,0.746258
2,0.0858,0.522401,0.827681,0.850606,0.759834,0.786674
3,0.0726,0.499574,0.83593,0.839035,0.778113,0.796682
4,0.0685,0.512575,0.840513,0.842112,0.758275,0.784218
5,0.0669,0.513614,0.835014,0.833386,0.77334,0.791473
6,0.0654,0.504469,0.83593,0.846933,0.77333,0.795857
7,0.0646,0.532795,0.827681,0.81207,0.770033,0.779718
8,0.0629,0.517517,0.835014,0.817835,0.766755,0.780884
9,0.0617,0.514913,0.837764,0.812968,0.773364,0.782672
10,0.0609,0.495163,0.843263,0.810542,0.771769,0.782074


[I 2025-03-24 03:18:37,452] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.004998790279779423, 'weight_decay': 0.002, 'warmup_steps': 34, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4057,0.499798,0.834097,0.832831,0.751499,0.779398
2,0.0845,0.529007,0.823098,0.823339,0.756885,0.775902
3,0.0724,0.497882,0.842346,0.867648,0.779398,0.808875
4,0.0686,0.499229,0.840513,0.862178,0.791585,0.81466
5,0.0668,0.51889,0.840513,0.858588,0.78315,0.806701
6,0.0651,0.509337,0.836847,0.840524,0.789339,0.804796
7,0.0639,0.513331,0.840513,0.862164,0.783487,0.808988
8,0.0635,0.514766,0.840513,0.847931,0.789567,0.807575
9,0.062,0.520255,0.836847,0.85324,0.780767,0.805776
10,0.0609,0.501315,0.840513,0.850615,0.787514,0.807363


[I 2025-03-24 03:24:17,191] Trial 141 finished with value: 0.8065845023680183 and parameters: {'learning_rate': 0.004998790279779423, 'weight_decay': 0.002, 'warmup_steps': 34, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 142 with params: {'learning_rate': 0.004304840964521802, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4251,0.512961,0.822181,0.784034,0.726075,0.740957
2,0.0851,0.505761,0.833181,0.821556,0.737725,0.763732
3,0.0729,0.507341,0.835014,0.822659,0.762074,0.779342
4,0.0689,0.505378,0.840513,0.862676,0.784212,0.806693
5,0.0664,0.497778,0.83593,0.846703,0.769478,0.790579
6,0.0647,0.503792,0.835014,0.846179,0.759602,0.783369
7,0.064,0.510403,0.832264,0.829728,0.745457,0.771072
8,0.0633,0.50868,0.84143,0.832188,0.761476,0.782399
9,0.062,0.495142,0.846929,0.829043,0.773811,0.790729
10,0.0609,0.497965,0.849679,0.843841,0.767281,0.792699


[I 2025-03-24 03:30:14,117] Trial 142 finished with value: 0.7862444043323364 and parameters: {'learning_rate': 0.004304840964521802, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 143 with params: {'learning_rate': 0.004954595124574099, 'weight_decay': 0.01, 'warmup_steps': 44, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4229,0.525332,0.817599,0.805881,0.734784,0.754419
2,0.0852,0.525328,0.831347,0.808816,0.747579,0.761237
3,0.073,0.538885,0.826764,0.84321,0.783402,0.798859
4,0.0695,0.547063,0.827681,0.828378,0.756902,0.77836
5,0.0673,0.532568,0.831347,0.842782,0.770952,0.788927
6,0.0654,0.531041,0.831347,0.828841,0.751286,0.772502
7,0.0638,0.53299,0.830431,0.856557,0.776032,0.800086
8,0.0638,0.563266,0.827681,0.840068,0.771807,0.790394
9,0.0623,0.552362,0.825848,0.833717,0.773487,0.791779
10,0.0613,0.556208,0.823098,0.833599,0.771627,0.790685


[I 2025-03-24 03:34:11,923] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.004062992839455107, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4253,0.50748,0.831347,0.822903,0.743695,0.76808
2,0.0852,0.540877,0.817599,0.816168,0.740269,0.762122
3,0.0723,0.515512,0.828598,0.83998,0.778408,0.796638
4,0.069,0.516015,0.830431,0.842202,0.78087,0.797089
5,0.066,0.506754,0.831347,0.836743,0.78832,0.798084
6,0.0643,0.51076,0.831347,0.838798,0.774104,0.792672
7,0.064,0.522301,0.826764,0.827979,0.783191,0.793886
8,0.0624,0.521615,0.830431,0.830712,0.777014,0.789526
9,0.0619,0.515238,0.829514,0.83007,0.779929,0.794199
10,0.0608,0.518055,0.832264,0.833891,0.779178,0.795743


[I 2025-03-24 03:40:05,457] Trial 144 finished with value: 0.7977382606380758 and parameters: {'learning_rate': 0.004062992839455107, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 29 with value: 0.8166135562260471.


Trial 145 with params: {'learning_rate': 0.003758568415973629, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4396,0.513056,0.821265,0.811545,0.728596,0.754459
2,0.0867,0.522424,0.821265,0.820314,0.744323,0.766545
3,0.0725,0.507286,0.831347,0.825947,0.773771,0.783331
4,0.0693,0.508417,0.840513,0.826765,0.759636,0.778517
5,0.0662,0.527043,0.835014,0.826967,0.778445,0.788161
6,0.0646,0.513505,0.834097,0.833402,0.768272,0.788043
7,0.0637,0.52662,0.835014,0.844883,0.782831,0.800651
8,0.0628,0.511567,0.834097,0.84007,0.789506,0.803315
9,0.0616,0.513179,0.836847,0.844204,0.788334,0.8032
10,0.0606,0.507977,0.839597,0.863566,0.779856,0.804367


[I 2025-03-24 03:45:52,541] Trial 145 finished with value: 0.811896478793711 and parameters: {'learning_rate': 0.003758568415973629, 'weight_decay': 0.002, 'warmup_steps': 28, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 146 with params: {'learning_rate': 0.003106665066028978, 'weight_decay': 0.002, 'warmup_steps': 39, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4634,0.524298,0.819432,0.808028,0.7188,0.744493
2,0.0868,0.511338,0.828598,0.82257,0.75546,0.772837
3,0.0721,0.492481,0.83593,0.82573,0.752117,0.775093
4,0.0681,0.506363,0.833181,0.828669,0.778049,0.789214
5,0.0669,0.494314,0.829514,0.781638,0.762908,0.760859
6,0.0645,0.502467,0.829514,0.798135,0.746747,0.759266
7,0.0636,0.486737,0.84418,0.844377,0.78782,0.801393
8,0.0623,0.504586,0.839597,0.827736,0.768936,0.781558
9,0.0614,0.496935,0.837764,0.839414,0.765513,0.785783
10,0.0604,0.498951,0.837764,0.834595,0.778055,0.792839


[I 2025-03-24 03:51:33,023] Trial 146 finished with value: 0.7860815022839646 and parameters: {'learning_rate': 0.003106665066028978, 'weight_decay': 0.002, 'warmup_steps': 39, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 29 with value: 0.8166135562260471.


Trial 147 with params: {'learning_rate': 0.0031491427651468345, 'weight_decay': 0.0, 'warmup_steps': 33, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.467,0.529361,0.823098,0.799378,0.718558,0.742527
2,0.0868,0.512177,0.830431,0.808481,0.738401,0.757844
3,0.0731,0.510373,0.834097,0.830867,0.774253,0.789162
4,0.0691,0.529957,0.828598,0.817398,0.738388,0.762603
5,0.0665,0.522505,0.826764,0.823135,0.764525,0.780937


[I 2025-03-24 03:53:16,522] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.003199645143713299, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4254,0.524632,0.814849,0.796281,0.726267,0.749085
2,0.0872,0.521625,0.830431,0.810055,0.762792,0.77439
3,0.0731,0.505664,0.833181,0.821729,0.764867,0.781121
4,0.0692,0.509963,0.832264,0.818436,0.774012,0.783384
5,0.0665,0.508577,0.834097,0.839329,0.779575,0.7964
6,0.0654,0.51782,0.829514,0.845525,0.773168,0.795912
7,0.0647,0.516677,0.834097,0.844438,0.782051,0.800258
8,0.0631,0.518138,0.823098,0.804469,0.753054,0.765774
9,0.0624,0.529241,0.823098,0.826259,0.759657,0.777712
10,0.0612,0.513264,0.827681,0.835369,0.76737,0.785918


[I 2025-03-24 03:56:21,053] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0041337567860579225, 'weight_decay': 0.001, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4177,0.499621,0.826764,0.79534,0.737995,0.751829
2,0.0854,0.506764,0.829514,0.838186,0.770784,0.786074
3,0.074,0.511343,0.823098,0.8263,0.752159,0.769307
4,0.0696,0.523154,0.828598,0.831806,0.759033,0.777205
5,0.0665,0.51045,0.83593,0.844635,0.783174,0.800067
6,0.0646,0.517992,0.829514,0.833363,0.779586,0.793468
7,0.0636,0.521977,0.833181,0.822346,0.780836,0.788704
8,0.0627,0.523429,0.830431,0.837805,0.784446,0.799817
9,0.0623,0.521101,0.828598,0.840038,0.775335,0.795016
10,0.0611,0.522667,0.828598,0.823946,0.763096,0.780421


[I 2025-03-24 03:59:24,354] Trial 149 pruned. 


In [47]:
print(best_trial_distill_aug)

BestRun(run_id='29', objective=0.8166135562260471, hyperparameters={'learning_rate': 0.004465858399905994, 'weight_decay': 0.002, 'warmup_steps': 49, 'lambda_param': 0.9, 'temperature': 2.0}, run_summary=None)
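
The printed `BestRun` has the fields `run_id`, `objective`, `hyperparameters` and `run_summary`. The sketch below shows one possible way to unpack the winning configuration for a final training run. It is a minimal illustration: `BestRunLike` is a hypothetical stand-in that only mirrors the printed fields (in the notebook, `best_trial_distill_aug` would be used directly), and grouping `lambda_param` and `temperature` as distillation settings is an assumption based on their names.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class BestRunLike(NamedTuple):
    """Hypothetical stand-in mirroring the fields of the printed BestRun."""
    run_id: str
    objective: float
    hyperparameters: Dict[str, Any]
    run_summary: Optional[Any] = None


# Values copied from the printed output above.
best = BestRunLike(
    run_id="29",
    objective=0.8166135562260471,
    hyperparameters={
        "learning_rate": 0.004465858399905994,
        "weight_decay": 0.002,
        "warmup_steps": 49,
        "lambda_param": 0.9,
        "temperature": 2.0,
    },
)


def split_hyperparameters(
    hp: Dict[str, Any],
    distill_keys: Tuple[str, ...] = ("lambda_param", "temperature"),
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat dict into optimizer settings and distillation settings.

    The choice of `distill_keys` is an assumption made for this sketch.
    """
    optim = {k: v for k, v in hp.items() if k not in distill_keys}
    distill = {k: v for k, v in hp.items() if k in distill_keys}
    return optim, distill


optim_args, distill_args = split_hyperparameters(best.hyperparameters)
print(f"trial {best.run_id}: objective = {best.objective:.4f}")
print("optimizer settings:", optim_args)
print("distillation settings:", distill_args)
```

Keeping the two groups separate makes it easier to pass the optimizer settings to the training arguments and the remaining values to whatever component computes the distillation loss.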
