# Hyperparameter search for a BiLSTM model on the TREC (coarse) dataset

This notebook searches for the best hyperparameters on the TREC (coarse) dataset for a BiLSTM model with an unfrozen (trainable) embedding layer. The search is run on both the original and the augmented dataset, for standard training as well as for distillation.

The search uses the Optuna library: trials are sampled with TPE and pruned with the Hyperband algorithm. The best configuration is selected by F1 score, and 150 hyperparameter combinations are tried for each variant.

## Library imports and basic setup

In [1]:
from transformers import BasicTokenizer, Trainer
from datasets import concatenate_datasets, load_from_disk
import kagglehub
import optuna
import torch
import math
import base

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


Reset the random seed so that the results are reproducible.

In [2]:
base.reset_seed()

Check whether a GPU is available.

In [3]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Download the GloVe embeddings.

Load the dataset and apply basic preprocessing (tokenization, building the vocabulary of all tokens, building the index for the GloVe embeddings).

In [4]:
my_glove = kagglehub.dataset_download("thanakomsn/glove6b300dtxt")
print(my_glove)

/home/jovyan/.cache/kagglehub/datasets/thanakomsn/glove6b300dtxt/versions/1


In [5]:
GLOVE_FILE = f"{my_glove}/glove.6B.300d.txt"
DATASET = "trec"

In [6]:
train_data = load_from_disk(f"~/data/{DATASET}/train-logits_coarse")
eval_data = load_from_disk(f"~/data/{DATASET}/eval-logits_coarse")
test_data = load_from_disk(f"~/data/{DATASET}/test-logits_coarse")

all_train_data = load_from_disk(f"~/data/{DATASET}/train-logits-augmented_coarse")

all_data = concatenate_datasets([
    load_from_disk(f"~/data/{DATASET}/{split}")
    for split in ["eval-logits_coarse", "test-logits_coarse", "train-logits-augmented_coarse"]
])
tokenizer = BasicTokenizer(do_lower_case=True)

Tokenization.

In [7]:
train_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), train_data))
eval_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), eval_data))
test_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), test_data))

all_train_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), all_train_data))

all_data_tokens = list(map(lambda e: tokenizer.tokenize(e["sentence"]), all_data))

Collect all unique tokens in the dataset.

In [8]:
vocab = base.get_vocab(all_data_tokens)

Assign an index to each token.

In [9]:
word_index = dict(zip(vocab, range(len(vocab))))

Load the GloVe vectors into a word-to-vector index.

In [10]:
embeddings_index = base.get_embeddings_indeces(GLOVE_FILE)

Found 400000 word vectors.


Define the vocabulary size and the embedding dimension.

In [11]:
print(len(vocab))
num_tokens = len(vocab) + 2
embedding_dim = 300

8766


Link the tokens (via their indices) to the embeddings. Some tokens were not found in GloVe (215 of the 8,766 in the output below), which is not a problem here.

In [12]:
embedding_matrix = base.get_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index)

Converted 8551 words (215) misses
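The helpers `base.get_embeddings_indeces` and `base.get_embedding_matrix` are not shown in this notebook. The following is only a minimal sketch of what such a pipeline might look like, assuming the GloVe file stores one `word v1 v2 ...` entry per line and that tokens without a pretrained vector keep an all-zero row. The function names `parse_glove_lines` and `build_embedding_matrix` and the toy data are hypothetical, and the real helpers may initialize misses differently.

```python
def parse_glove_lines(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    index = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        index[word] = [float(v) for v in values]
    return index


def build_embedding_matrix(num_tokens, embedding_dim, word_index, embeddings_index):
    """Row i holds the vector of the token with index i; misses stay all-zero (assumption)."""
    matrix = [[0.0] * embedding_dim for _ in range(num_tokens)]
    hits = misses = 0
    for word, i in word_index.items():
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = vector
            hits += 1
        else:
            misses += 1
    return matrix, hits, misses


# Toy data standing in for the GloVe file and the notebook's word_index.
toy_glove = ["what 0.1 0.2 0.3", "is 0.4 0.5 0.6"]
toy_word_index = {"what": 0, "is": 1, "trec": 2}

matrix, hits, misses = build_embedding_matrix(
    num_tokens=len(toy_word_index) + 2,  # mirrors `len(vocab) + 2` in the notebook
    embedding_dim=3,
    word_index=toy_word_index,
    embeddings_index=parse_glove_lines(toy_glove),
)
print(f"Converted {hits} words ({misses}) misses")
```

Because the embedding layer is trained together with the rest of the model in this notebook, rows for missed tokens can still be updated during training.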


Map the tokens in each split of the dataset to their indices.

In [13]:
train_data_index = list(map(lambda x: list(map(lambda y: word_index[y], x)),train_data_tokens))
eval_data_index = list(map(lambda x: list(map(lambda y: word_index[y], x)),eval_data_tokens))
test_data_index = list(map(lambda x: list(map(lambda y: word_index[y], x)),test_data_tokens))

all_train_data_index = list(map(lambda x: list(map(lambda y: word_index[y], x)),all_train_data_tokens))
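The direct lookup `word_index[y]` works without a fallback because the vocabulary is built from all splits together, so every token has an index. A small illustration with toy data follows; `build_vocab` is a hypothetical stand-in for `base.get_vocab`, whose actual ordering of tokens is not shown in the notebook.

```python
def build_vocab(tokenized_sentences):
    """Hypothetical stand-in for base.get_vocab: unique tokens in first-seen order."""
    seen = {}
    for sentence in tokenized_sentences:
        for token in sentence:
            seen.setdefault(token, None)
    return list(seen)


# Toy tokenized splits.
train_tokens = [["what", "is", "glove", "?"]]
test_tokens = [["who", "is", "it", "?"]]

# The vocabulary covers every split, so no lookup below can raise KeyError.
vocab = build_vocab(train_tokens + test_tokens)
word_index = dict(zip(vocab, range(len(vocab))))

train_ids = [[word_index[t] for t in sentence] for sentence in train_tokens]
test_ids = [[word_index[t] for t in sentence] for sentence in test_tokens]
print(train_ids, test_ids)
```

If the vocabulary were built from the training split only, unseen evaluation tokens would need an explicit out-of-vocabulary index instead.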

Pad all records to a common length of 60 tokens.

In [14]:
train_padded_data = list(map(lambda x: base.padd(x,60), train_data_index))
eval_padded_data = list(map(lambda x: base.padd(x,60), eval_data_index))
test_padded_data = list(map(lambda x: base.padd(x,60), test_data_index))

all_train_padded_data = list(map(lambda x: base.padd(x,60), all_train_data_index))
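The implementation of `base.padd` is not part of this notebook. As a rough sketch only, a fixed-length helper could look like the one below. Right-side padding, truncation of longer inputs, and the explicit `pad_id` argument are all assumptions, and `pad_or_truncate` is a hypothetical name.

```python
def pad_or_truncate(ids, max_len, pad_id):
    """Return exactly max_len ids: cut longer inputs, right-pad shorter ones with pad_id."""
    clipped = ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


short = pad_or_truncate([5, 8, 2], max_len=6, pad_id=0)
long = pad_or_truncate(list(range(10)), max_len=6, pad_id=0)
print(short)  # [5, 8, 2, 0, 0, 0]
print(long)   # [0, 1, 2, 3, 4, 5]
```

Whichever padding id the real helper uses, it should not collide with the index of a real token, so that padded positions stay distinguishable.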

Add the token IDs to each split of the dataset.

In [15]:
train_data = train_data.add_column("input_ids", train_padded_data)
eval_data = eval_data.add_column("input_ids", eval_padded_data)
test_data = test_data.add_column("input_ids", test_padded_data)

all_train_data = all_train_data.add_column("input_ids", all_train_padded_data)

Basic training configuration for the search. Optuna works with steps rather than epochs here, so the values are converted below.

The minimum training length is 5 epochs and the maximum is 15 epochs. The maximum number of warm-up steps is set to 10 % of the first epoch.

In [16]:
num_epochs = 15
batch_size = 128

In [17]:
data_length = len(train_data)
min_r = math.ceil(data_length/batch_size)*5
max_r = math.ceil(data_length/batch_size)*num_epochs
warm_up = math.ceil(data_length/batch_size/10)
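To make the conversion concrete, here is the same arithmetic wrapped in a small function and evaluated for an illustrative training set of 5,000 examples. That size is a made-up value, not the actual length of `train_data`, and `search_budget` is a hypothetical helper name.

```python
import math


def search_budget(data_length, batch_size, min_epochs, max_epochs):
    """Convert epoch-based limits into optimizer steps, as done in the cell above."""
    steps_per_epoch = math.ceil(data_length / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "min_resource": steps_per_epoch * min_epochs,
        "max_resource": steps_per_epoch * max_epochs,
        # Upper bound for warm-up: 10 % of the steps in one epoch, rounded up.
        "max_warmup_steps": math.ceil(data_length / batch_size / 10),
    }


budget = search_budget(data_length=5000, batch_size=128, min_epochs=5, max_epochs=15)
print(budget)
# {'steps_per_epoch': 40, 'min_resource': 200, 'max_resource': 600, 'max_warmup_steps': 4}
```

With these illustrative numbers, one epoch is 40 steps, so the pruner's resource range of 200 to 600 steps corresponds to 5 to 15 epochs.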

## Search with standard training on the original dataset

Definition of the searched hyperparameters and their ranges.

In [18]:
def hp_space(trial):
    # Search space: log-uniform learning rate, weight decay on a 1e-3 grid,
    # and an integer number of warm-up steps bounded by `warm_up`.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params

Optuna configuration.

In [19]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
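To give a feel for what `min_resource`, `max_resource` and `reduction_factor` imply, the sketch below estimates the number of Hyperband brackets and the steps at which each bracket could prune a trial. It reflects a simplified reading of how Optuna's Hyperband pruner is documented (one successive-halving pruner per bracket, with checkpoints at `min_resource * reduction_factor ** k`) and is not Optuna's implementation. The function `hyperband_schedule`, the cutoff at `max_resource`, and the example values of 200 and 600 steps are assumptions.

```python
import math


def hyperband_schedule(min_resource, max_resource, reduction_factor):
    """Approximate pruning checkpoints (in steps) for each Hyperband bracket."""
    n_brackets = math.floor(math.log(max_resource / min_resource, reduction_factor)) + 1
    schedule = {}
    for bracket in range(n_brackets):
        steps = []
        rung = 0
        while True:
            # Bracket b starts pruning later: its first checkpoint is min_resource * factor**b.
            step = min_resource * reduction_factor ** (bracket + rung)
            if step >= max_resource:  # simplification: ignore checkpoints at or past the budget
                break
            steps.append(step)
            rung += 1
        schedule[bracket] = steps
    return schedule


# Illustrative values: 40 steps per epoch, 5 to 15 epochs, reduction factor 2.
schedule = hyperband_schedule(min_resource=200, max_resource=600, reduction_factor=2)
print(schedule)  # {0: [200, 400], 1: [400]}
```

Under these assumptions, checkpoints fall after 5 and 10 epochs, which would be consistent with the pruned trials in the logs below stopping after epoch 5 or epoch 10.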



Build the model with the unfrozen embedding layer.

In [20]:
def get_BiLSTM():
    return base.BiLSTMClassifier(embedding_matrix=embedding_matrix, embedding_dim=embedding_dim, fc_dim=400, hidden_dim=300, output_dim=6, freeze_embed=False)

In [21]:
base.reset_seed()

Configuration of the individual training runs.

In [22]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/bilstm-base-embedd_coarse_hp-search", logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd_coarse_hp-search", epochs=num_epochs, batch_size=batch_size)

Trainer configuration for the individual training runs.

In [23]:
trainer = Trainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init = lambda: get_BiLSTM()
)
  

Set up and run the search.

In [24]:
best_trial_normal = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-embedd",
    n_trials=150
)

[I 2025-03-21 23:19:49,612] A new study created in memory with name: Base-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5878,1.34269,0.479377,0.383846,0.37055,0.333946
2,1.1348,0.940124,0.661778,0.59508,0.554501,0.565719
3,0.7734,0.735816,0.733272,0.620977,0.626803,0.618899
4,0.5798,0.658441,0.768103,0.667579,0.653776,0.655763
5,0.495,0.595582,0.794684,0.668816,0.679259,0.673455
6,0.4224,0.579615,0.79835,0.666671,0.683705,0.67423
7,0.357,0.572807,0.814849,0.688591,0.693982,0.69016
8,0.3182,0.566859,0.804766,0.676982,0.688753,0.681398
9,0.2759,0.573188,0.812099,0.684657,0.692726,0.686888
10,0.2416,0.622248,0.792851,0.776659,0.696067,0.701843


[I 2025-03-21 23:20:46,213] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3297,0.918127,0.659945,0.563858,0.564229,0.557041
2,0.694,0.618592,0.782768,0.66517,0.666245,0.664563
3,0.425,0.539052,0.816682,0.696689,0.697573,0.692205
4,0.3077,0.535889,0.826764,0.701832,0.704088,0.700685
5,0.2211,0.50164,0.846929,0.853538,0.774816,0.796563


[I 2025-03-21 23:21:30,143] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7203,1.655326,0.340972,0.362765,0.249777,0.17962
2,1.6099,1.561061,0.406966,0.245009,0.309278,0.20534
3,1.4954,1.405919,0.439963,0.391338,0.333151,0.264549
4,1.3239,1.258679,0.505958,0.538663,0.397588,0.37188
5,1.2004,1.15083,0.598533,0.547825,0.496304,0.500888
6,1.085,1.067062,0.632447,0.545349,0.533723,0.529509
7,1.0122,1.005855,0.650779,0.561525,0.546719,0.549242
8,0.9516,0.962923,0.656279,0.557335,0.557921,0.555301
9,0.9106,0.935768,0.662695,0.561598,0.562791,0.55912
10,0.8656,0.913392,0.667278,0.561897,0.570308,0.561815


[I 2025-03-21 23:22:17,281] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2239,0.771044,0.71494,0.633035,0.608659,0.610924
2,0.5524,0.557495,0.805683,0.679905,0.687805,0.683262
3,0.3171,0.552796,0.818515,0.871164,0.703504,0.713591
4,0.2019,0.506205,0.851512,0.864917,0.807664,0.826968
5,0.12,0.554299,0.846929,0.863021,0.802143,0.823622


[I 2025-03-21 23:22:39,637] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.051,0.653274,0.783685,0.673051,0.665745,0.666266
2,0.4337,0.506056,0.820348,0.688126,0.701508,0.692963
3,0.213,0.499795,0.851512,0.828201,0.797006,0.808271
4,0.1021,0.563385,0.850596,0.862434,0.796267,0.818201
5,0.0399,0.590622,0.863428,0.828853,0.835692,0.831703
6,0.0118,0.728003,0.861595,0.874093,0.815036,0.836486
7,0.0051,0.773751,0.857929,0.812816,0.838619,0.822087
8,0.0053,0.824343,0.859762,0.846939,0.821174,0.832164
9,0.0016,0.807195,0.868011,0.857802,0.828841,0.841275
10,0.0004,0.840121,0.866178,0.84833,0.827274,0.836608


[I 2025-03-21 23:23:51,949] Trial 4 finished with value: 0.8366135421499248 and parameters: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 0}. Best is trial 4 with value: 0.8366135421499248.


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6807,1.590803,0.342805,0.372201,0.251293,0.186013
2,1.4921,1.353353,0.442713,0.351813,0.342132,0.285472
3,1.2258,1.118858,0.6022,0.549687,0.498955,0.505882
4,1.0117,0.961686,0.660862,0.556657,0.561144,0.556366
5,0.8778,0.861609,0.709441,0.603395,0.60412,0.601987


[I 2025-03-21 23:24:14,862] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5351,1.256817,0.529789,0.556784,0.421831,0.418135
2,1.0105,0.840479,0.704858,0.619302,0.597586,0.60464
3,0.6513,0.679327,0.75527,0.638446,0.649441,0.636681
4,0.5017,0.612315,0.791934,0.683579,0.6746,0.675109
5,0.42,0.560739,0.802016,0.671633,0.686991,0.67855
6,0.3474,0.567049,0.79835,0.665354,0.685288,0.673111
7,0.2826,0.535882,0.830431,0.865974,0.715633,0.720376
8,0.2402,0.543051,0.821265,0.815787,0.728745,0.741181
9,0.1956,0.567383,0.819432,0.801592,0.716033,0.725545
10,0.1567,0.596941,0.819432,0.83424,0.760526,0.782479


[I 2025-03-21 23:26:17,330] Trial 6 finished with value: 0.7927515386785537 and parameters: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 3}. Best is trial 4 with value: 0.8366135421499248.


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6924,1.612135,0.336389,0.202616,0.246299,0.175816
2,1.5467,1.441526,0.418882,0.373917,0.317777,0.231663
3,1.3219,1.225729,0.512374,0.543368,0.403908,0.380597
4,1.1238,1.064648,0.627864,0.527604,0.534566,0.525925
5,0.9873,0.957166,0.658112,0.557748,0.56082,0.55579


[I 2025-03-21 23:26:42,960] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4898,1.215313,0.546288,0.568252,0.442749,0.449711
2,0.9596,0.798781,0.713107,0.627353,0.60861,0.613084
3,0.6137,0.652742,0.767186,0.648887,0.658355,0.6478
4,0.4686,0.596058,0.791934,0.681177,0.673777,0.673855
5,0.3919,0.547953,0.807516,0.675483,0.690648,0.682342


[I 2025-03-21 23:27:07,904] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4212,1.061715,0.609533,0.561692,0.508312,0.507757
2,0.8308,0.723814,0.737855,0.638586,0.628488,0.63011
3,0.5178,0.591983,0.791017,0.675612,0.67517,0.669682
4,0.39,0.541385,0.822181,0.698722,0.699358,0.697528
5,0.3163,0.522519,0.825848,0.690345,0.707186,0.697832
6,0.2471,0.548187,0.833181,0.827469,0.750142,0.760954
7,0.1817,0.513965,0.84143,0.843957,0.762302,0.781254
8,0.1342,0.563301,0.84418,0.845589,0.802234,0.816782
9,0.0977,0.622711,0.837764,0.850743,0.794027,0.813634
10,0.0684,0.621197,0.83593,0.833647,0.798912,0.813564


[I 2025-03-21 23:29:04,429] Trial 9 finished with value: 0.8046762508072982 and parameters: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.8366135421499248.


Trial 10 with params: {'learning_rate': 0.004518165681587256, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9801,0.625198,0.783685,0.67793,0.669669,0.664526
2,0.3644,0.443788,0.863428,0.835275,0.832123,0.832722
3,0.1084,0.571377,0.860678,0.855154,0.813381,0.828145
4,0.0409,0.630783,0.870761,0.881638,0.832596,0.849958
5,0.0107,0.650962,0.873511,0.862399,0.834762,0.845962
6,0.0059,0.706581,0.867094,0.862435,0.811122,0.829513
7,0.0008,0.724276,0.874427,0.869946,0.827073,0.842972
8,0.0002,0.7563,0.874427,0.883399,0.826935,0.847058
9,0.0001,0.776717,0.87626,0.885187,0.828268,0.848632
10,0.0001,0.791012,0.875344,0.884443,0.82727,0.84776


[I 2025-03-21 23:29:52,181] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0787,0.667226,0.771769,0.683049,0.654582,0.659884
2,0.457,0.551264,0.810266,0.687562,0.694763,0.685235
3,0.232,0.461999,0.854262,0.824184,0.798947,0.80936
4,0.1168,0.610886,0.843263,0.862598,0.788504,0.813158
5,0.0534,0.583403,0.871677,0.825046,0.831044,0.827486
6,0.0159,0.673272,0.873511,0.830448,0.830829,0.830245
7,0.0059,0.778952,0.868011,0.846285,0.837504,0.841383
8,0.0017,0.761372,0.870761,0.848899,0.822017,0.83356
9,0.001,0.776145,0.867094,0.849587,0.828141,0.837653
10,0.001,0.810396,0.866178,0.847308,0.828026,0.836372


[I 2025-03-21 23:31:17,008] Trial 11 finished with value: 0.8318923354071618 and parameters: {'learning_rate': 0.0020056372842325635, 'weight_decay': 0.006, 'warmup_steps': 0}. Best is trial 4 with value: 0.8366135421499248.


Trial 12 with params: {'learning_rate': 0.0033049565193748773, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9697,0.648098,0.771769,0.671028,0.6575,0.658299
2,0.3964,0.489776,0.84143,0.811535,0.790816,0.796955
3,0.1563,0.464671,0.864345,0.835006,0.807271,0.818462
4,0.0542,0.533143,0.869844,0.846329,0.827891,0.836115
5,0.0194,0.710688,0.860678,0.852358,0.823294,0.834251
6,0.0122,0.704817,0.871677,0.875244,0.815476,0.834208
7,0.0044,0.719346,0.871677,0.85295,0.830051,0.839787
8,0.001,0.796094,0.875344,0.872674,0.823782,0.842855
9,0.0003,0.81908,0.877177,0.873844,0.825142,0.844104
10,0.0001,0.826274,0.87626,0.872546,0.824378,0.843121


[I 2025-03-21 23:32:03,701] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0018997871267974278, 'weight_decay': 0.005, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1194,0.732695,0.743355,0.675418,0.623768,0.63672
2,0.5095,0.515839,0.817599,0.686715,0.698454,0.69194
3,0.2604,0.479781,0.846929,0.828055,0.737529,0.749504
4,0.1383,0.610536,0.831347,0.843533,0.780841,0.801102
5,0.0544,0.70017,0.84418,0.82668,0.809479,0.815168
6,0.0303,0.650373,0.859762,0.826805,0.830871,0.827594
7,0.0214,0.717581,0.866178,0.877609,0.817082,0.839028
8,0.0047,0.711347,0.858845,0.861135,0.810273,0.829846
9,0.0012,0.708396,0.869844,0.862761,0.838553,0.849371
10,0.0007,0.772974,0.867094,0.880212,0.817493,0.840461


[I 2025-03-21 23:33:33,928] Trial 13 finished with value: 0.8445909909234567 and parameters: {'learning_rate': 0.0018997871267974278, 'weight_decay': 0.005, 'warmup_steps': 2}. Best is trial 13 with value: 0.8445909909234567.


Trial 14 with params: {'learning_rate': 0.002120746655142563, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.093,0.68993,0.761687,0.676022,0.643526,0.651337
2,0.4703,0.500849,0.832264,0.704846,0.708502,0.705908
3,0.2349,0.474434,0.864345,0.851765,0.807423,0.824048
4,0.1174,0.60604,0.846929,0.851492,0.794363,0.813846
5,0.0473,0.574275,0.872594,0.860534,0.823156,0.838052
6,0.0213,0.668924,0.869844,0.84653,0.811409,0.825872
7,0.0058,0.718356,0.864345,0.83518,0.831989,0.832922
8,0.0011,0.751392,0.864345,0.838026,0.82583,0.831445
9,0.0005,0.781847,0.867094,0.841122,0.827795,0.83399
10,0.0003,0.804675,0.867094,0.849007,0.827759,0.837283


[I 2025-03-21 23:34:31,958] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.003827341260767903, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0225,0.767774,0.754354,0.679734,0.636473,0.643023
2,0.401,0.458658,0.856095,0.87109,0.790823,0.81634
3,0.1452,0.533039,0.853346,0.840272,0.798261,0.814426
4,0.0509,0.663978,0.857929,0.85881,0.802835,0.822172
5,0.0196,0.689632,0.865261,0.846245,0.842874,0.843814
6,0.0133,0.730771,0.864345,0.849205,0.82016,0.83012
7,0.0057,0.72559,0.872594,0.852872,0.833443,0.84209
8,0.0009,0.787121,0.867094,0.851902,0.810663,0.826399
9,0.0003,0.81005,0.868011,0.863898,0.811629,0.830314
10,0.0002,0.824,0.868011,0.851496,0.811859,0.826824


[I 2025-03-21 23:35:51,678] Trial 15 finished with value: 0.8276608175167749 and parameters: {'learning_rate': 0.003827341260767903, 'weight_decay': 0.008, 'warmup_steps': 3}. Best is trial 13 with value: 0.8445909909234567.


Trial 16 with params: {'learning_rate': 0.0011533205291771927, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2348,0.790956,0.71494,0.631652,0.608795,0.611272
2,0.5744,0.560559,0.805683,0.68081,0.68736,0.683524
3,0.3425,0.552568,0.814849,0.70621,0.691148,0.693584
4,0.2276,0.521338,0.843263,0.79556,0.728004,0.731128
5,0.1331,0.578954,0.847846,0.862867,0.792731,0.816368


[I 2025-03-21 23:36:31,219] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0005057170281699356, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4544,1.143972,0.571952,0.577094,0.467412,0.477846
2,0.868,0.729007,0.739688,0.634933,0.632459,0.630832
3,0.5423,0.605326,0.787351,0.664102,0.673626,0.665059
4,0.4082,0.546636,0.814849,0.693581,0.693495,0.691734
5,0.3319,0.520166,0.823098,0.690095,0.702611,0.696016
6,0.2556,0.528284,0.827681,0.815802,0.734292,0.744574
7,0.1827,0.530161,0.833181,0.844778,0.773958,0.794397
8,0.1454,0.534714,0.84143,0.853975,0.799887,0.817643
9,0.1032,0.60968,0.83868,0.8546,0.804201,0.823027
10,0.0697,0.61897,0.830431,0.829527,0.795107,0.809902


[I 2025-03-21 23:37:19,816] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0008191030676050509, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3335,0.906286,0.656279,0.582961,0.552744,0.560209
2,0.6914,0.615427,0.782768,0.66603,0.666869,0.664895
3,0.4143,0.535735,0.810266,0.694005,0.691893,0.689359
4,0.2957,0.531343,0.831347,0.712301,0.705022,0.705555
5,0.2109,0.50247,0.846929,0.857303,0.783188,0.805953
6,0.1328,0.56066,0.846013,0.828765,0.794897,0.806803
7,0.0846,0.583896,0.851512,0.862239,0.805667,0.824987
8,0.0478,0.599714,0.854262,0.866039,0.817955,0.835965
9,0.0245,0.726959,0.833181,0.84988,0.789954,0.811481
10,0.0159,0.759648,0.83868,0.817463,0.797419,0.804484


[I 2025-03-21 23:38:09,500] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00031715506418016835, 'weight_decay': 0.006, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5711,1.343031,0.468378,0.547856,0.365287,0.335621
2,1.1016,0.911594,0.670027,0.604075,0.564562,0.57465
3,0.7337,0.691057,0.757104,0.640346,0.647078,0.641499
4,0.5378,0.630179,0.780018,0.670237,0.665286,0.664618
5,0.4562,0.578603,0.797434,0.667918,0.682642,0.674306


[I 2025-03-21 23:38:33,407] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.004391486310509663, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0596,0.704239,0.757104,0.671976,0.639527,0.643977
2,0.408,0.478651,0.849679,0.880693,0.743134,0.752394
3,0.1392,0.521864,0.848763,0.843101,0.784343,0.805444
4,0.0495,0.58837,0.868011,0.882594,0.808803,0.833325
5,0.0141,0.725849,0.856095,0.861503,0.800145,0.821018
6,0.0039,0.765149,0.865261,0.874707,0.809855,0.831005
7,0.0009,0.751888,0.874427,0.864972,0.835295,0.847727
8,0.0002,0.777844,0.873511,0.864441,0.834636,0.847237
9,0.0001,0.792055,0.875344,0.865861,0.835969,0.848648
10,0.0001,0.803094,0.87626,0.876603,0.836967,0.853004


[I 2025-03-21 23:39:47,924] Trial 20 finished with value: 0.8536858457668641 and parameters: {'learning_rate': 0.004391486310509663, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 20 with value: 0.8536858457668641.


Trial 21 with params: {'learning_rate': 0.004279483560254982, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0153,0.648241,0.790101,0.710568,0.666821,0.676042
2,0.3776,0.452141,0.861595,0.862048,0.779721,0.79854
3,0.144,0.544706,0.846013,0.850592,0.79148,0.812911
4,0.0502,0.52706,0.87626,0.850775,0.816895,0.830648
5,0.0121,0.701163,0.87901,0.845255,0.838838,0.840939
6,0.0062,0.695923,0.869844,0.884214,0.81845,0.842616
7,0.0018,0.694176,0.880843,0.890465,0.840303,0.859444
8,0.0004,0.74027,0.88176,0.890741,0.840233,0.859691
9,0.0002,0.776699,0.87901,0.889454,0.838227,0.857834
10,0.0001,0.782251,0.880843,0.89053,0.839706,0.859226


[I 2025-03-21 23:41:03,429] Trial 21 finished with value: 0.8599168771006523 and parameters: {'learning_rate': 0.004279483560254982, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 21 with value: 0.8599168771006523.


Trial 22 with params: {'learning_rate': 0.004827531108315613, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9948,0.725149,0.772686,0.669365,0.656983,0.654705
2,0.3707,0.427047,0.861595,0.813632,0.82372,0.817529
3,0.1099,0.508557,0.859762,0.842046,0.803975,0.819242
4,0.0433,0.538804,0.874427,0.884028,0.82819,0.847207
5,0.012,0.710267,0.868011,0.86573,0.819959,0.837522
6,0.0046,0.777959,0.867094,0.87476,0.811828,0.8324
7,0.001,0.791445,0.871677,0.869755,0.8238,0.841523
8,0.0019,0.835491,0.866178,0.877142,0.819849,0.840301
9,0.0003,0.843979,0.867094,0.845284,0.82075,0.831036
10,0.0001,0.858663,0.867094,0.879163,0.820383,0.841607


[I 2025-03-21 23:42:58,272] Trial 22 finished with value: 0.8424658856232389 and parameters: {'learning_rate': 0.004827531108315613, 'weight_decay': 0.004, 'warmup_steps': 4}. Best is trial 21 with value: 0.8599168771006523.


Trial 23 with params: {'learning_rate': 0.0034001365132481838, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0143,0.723515,0.756187,0.661829,0.641457,0.640755
2,0.4159,0.463739,0.852429,0.858907,0.781164,0.801473
3,0.1572,0.514035,0.853346,0.816022,0.797021,0.804362
4,0.0632,0.572463,0.868928,0.85596,0.821912,0.834746
5,0.0217,0.671778,0.862511,0.822817,0.832349,0.825942
6,0.0095,0.700459,0.861595,0.861782,0.814592,0.832597
7,0.0036,0.782004,0.868011,0.828681,0.821184,0.824252
8,0.0022,0.82867,0.865261,0.841577,0.817895,0.827777
9,0.0008,0.836851,0.866178,0.82139,0.818575,0.819711
10,0.0002,0.848407,0.865261,0.832677,0.817631,0.82415


[I 2025-03-21 23:43:56,703] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0048184234461823355, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0577,0.649699,0.778185,0.671694,0.661004,0.66133
2,0.38,0.459961,0.851512,0.848802,0.761435,0.778586
3,0.124,0.606258,0.849679,0.846902,0.784357,0.806435
4,0.0436,0.614911,0.874427,0.886846,0.824503,0.846989
5,0.0106,0.750995,0.862511,0.828724,0.814466,0.820034
6,0.0047,0.803279,0.855179,0.863691,0.822223,0.834397
7,0.0046,0.726104,0.871677,0.845201,0.832739,0.838234
8,0.001,0.834751,0.865261,0.856042,0.828292,0.838409
9,0.0002,0.837155,0.868011,0.868502,0.830297,0.844648
10,0.0001,0.845225,0.868011,0.868316,0.830248,0.844668


[I 2025-03-21 23:45:07,359] Trial 24 finished with value: 0.8447108472332268 and parameters: {'learning_rate': 0.0048184234461823355, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 25 with params: {'learning_rate': 0.0037555097582367652, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0638,0.751239,0.739688,0.663315,0.62681,0.630268
2,0.4298,0.468497,0.847846,0.875008,0.742183,0.749885
3,0.1503,0.518148,0.867094,0.851675,0.806866,0.824004
4,0.0523,0.589959,0.861595,0.846547,0.808207,0.821574
5,0.0214,0.617574,0.873511,0.855287,0.834259,0.843372
6,0.0081,0.815647,0.865261,0.874146,0.788921,0.813542
7,0.0044,0.760283,0.873511,0.879833,0.816616,0.837269
8,0.005,0.769545,0.870761,0.832178,0.832365,0.831615
9,0.0017,0.843588,0.867094,0.879711,0.809423,0.833428
10,0.0003,0.867432,0.869844,0.879248,0.811681,0.834594


[I 2025-03-21 23:46:01,946] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.004244290028895047, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0037,0.620583,0.773602,0.66872,0.658428,0.657356
2,0.3878,0.464503,0.857012,0.846731,0.806902,0.822415
3,0.1275,0.532025,0.855179,0.832978,0.814718,0.821298
4,0.0391,0.696489,0.850596,0.862034,0.80423,0.82404
5,0.0102,0.66905,0.867094,0.864938,0.828174,0.843557
6,0.0109,0.638019,0.868928,0.811838,0.831513,0.81953
7,0.0049,0.731963,0.870761,0.815447,0.83197,0.822564
8,0.0006,0.785967,0.877177,0.857936,0.836031,0.845888
9,0.0002,0.819907,0.874427,0.855832,0.833465,0.843477
10,0.0001,0.835998,0.873511,0.854937,0.832715,0.842677


[I 2025-03-21 23:47:17,176] Trial 26 finished with value: 0.842836443846063 and parameters: {'learning_rate': 0.004244290028895047, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 21 with value: 0.8599168771006523.


Trial 27 with params: {'learning_rate': 0.0049534770480705505, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0579,0.661661,0.769019,0.661654,0.65403,0.652441
2,0.3799,0.470192,0.854262,0.855046,0.774007,0.791628
3,0.1218,0.575942,0.853346,0.864967,0.796751,0.820867
4,0.0411,0.65227,0.861595,0.86622,0.80063,0.824793
5,0.0117,0.695287,0.863428,0.849239,0.837345,0.840062
6,0.0035,0.777866,0.865261,0.835104,0.809323,0.820108
7,0.0034,0.706696,0.877177,0.841815,0.846426,0.8439
8,0.0009,0.808452,0.866178,0.850363,0.828769,0.837591
9,0.0003,0.79476,0.874427,0.854259,0.835284,0.843372
10,0.0001,0.811943,0.871677,0.84913,0.82375,0.834364


[I 2025-03-21 23:48:07,319] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.004092452351192612, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0144,0.679007,0.773602,0.682242,0.656643,0.660845
2,0.3961,0.454048,0.861595,0.872687,0.79592,0.82
3,0.1371,0.520856,0.854262,0.877533,0.798564,0.824584
4,0.0518,0.624068,0.843263,0.850223,0.790713,0.811203
5,0.0255,0.668769,0.865261,0.854185,0.818459,0.833042
6,0.0083,0.757956,0.860678,0.860092,0.817016,0.832071
7,0.0046,0.771783,0.858845,0.839537,0.813041,0.823969
8,0.0006,0.863271,0.863428,0.864016,0.816735,0.834925
9,0.0002,0.877055,0.865261,0.864419,0.818417,0.836112
10,0.0002,0.894208,0.864345,0.863401,0.81775,0.835287


[I 2025-03-21 23:49:07,865] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.0039545935251655345, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0593,0.716644,0.753437,0.667345,0.640435,0.641854
2,0.4171,0.472894,0.853346,0.878117,0.791819,0.813907
3,0.1408,0.516055,0.865261,0.840365,0.807526,0.820645
4,0.0494,0.622088,0.865261,0.841887,0.821172,0.828218
5,0.0181,0.669212,0.875344,0.862544,0.825668,0.840485
6,0.0084,0.812743,0.865261,0.848526,0.816254,0.830038
7,0.0023,0.79422,0.870761,0.866082,0.834248,0.846234
8,0.0004,0.812341,0.868011,0.878417,0.820213,0.841263
9,0.0002,0.831579,0.868928,0.880364,0.83033,0.849525
10,0.0001,0.842382,0.869844,0.880224,0.831525,0.850047


[I 2025-03-21 23:51:16,125] Trial 29 finished with value: 0.85160594926881 and parameters: {'learning_rate': 0.0039545935251655345, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 30 with params: {'learning_rate': 0.0011993214992789504, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2455,0.777893,0.715857,0.632465,0.609518,0.611879
2,0.5642,0.561117,0.800183,0.672701,0.68367,0.677435
3,0.33,0.550609,0.813016,0.863792,0.699666,0.708302
4,0.2178,0.498605,0.847846,0.863913,0.803395,0.824517
5,0.126,0.573919,0.852429,0.847444,0.804335,0.82143
6,0.0751,0.586097,0.845096,0.799554,0.819711,0.807507
7,0.0333,0.611286,0.867094,0.851604,0.835989,0.843142
8,0.0163,0.701147,0.852429,0.866158,0.825155,0.841679
9,0.0135,0.718914,0.857929,0.873709,0.828988,0.847198
10,0.0043,0.744995,0.853346,0.866087,0.817006,0.835706


[I 2025-03-21 23:52:14,254] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.004023538706654465, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0622,0.701152,0.766269,0.672901,0.652229,0.653157
2,0.4142,0.480773,0.851512,0.875909,0.781957,0.801993
3,0.14,0.51321,0.866178,0.841307,0.806724,0.820618
4,0.0427,0.57964,0.862511,0.863411,0.816172,0.833491
5,0.0124,0.655616,0.872594,0.857651,0.842084,0.849231
6,0.0101,0.732021,0.869844,0.872303,0.795575,0.816092
7,0.0035,0.799281,0.868011,0.837806,0.820859,0.827813
8,0.0005,0.829141,0.868928,0.853691,0.821741,0.83392
9,0.0002,0.841223,0.868928,0.863495,0.822398,0.837541
10,0.0001,0.850264,0.870761,0.854368,0.824216,0.835943


[I 2025-03-21 23:53:52,282] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.004199033700803008, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0604,0.693912,0.752521,0.665594,0.634978,0.638716
2,0.4084,0.478128,0.851512,0.710812,0.727692,0.71797
3,0.1406,0.534752,0.858845,0.844122,0.811158,0.825025
4,0.0507,0.595177,0.868928,0.867801,0.808136,0.829949
5,0.0157,0.62759,0.878093,0.870248,0.819681,0.837501
6,0.0055,0.792218,0.868928,0.878945,0.831609,0.848575
7,0.004,0.758388,0.869844,0.854996,0.812138,0.828429
8,0.0012,0.773425,0.869844,0.855375,0.82249,0.835702
9,0.0003,0.779434,0.873511,0.854018,0.817244,0.830506
10,0.0002,0.798736,0.878093,0.85747,0.820198,0.834069


[I 2025-03-21 23:54:41,252] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.00412787973403558, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0134,0.675215,0.772686,0.689241,0.655847,0.66032
2,0.3843,0.448431,0.850596,0.866505,0.786576,0.812029
3,0.133,0.528074,0.857929,0.835712,0.801377,0.815067
4,0.0494,0.636366,0.857012,0.862344,0.798854,0.822138
5,0.0197,0.754992,0.865261,0.876931,0.807629,0.83083


[I 2025-03-21 23:55:05,274] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0022363183764139427, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0952,0.714378,0.750687,0.68528,0.631783,0.644761
2,0.4866,0.490463,0.831347,0.699215,0.709515,0.703831
3,0.2242,0.465809,0.858845,0.810152,0.775099,0.787758
4,0.11,0.615675,0.843263,0.8675,0.788691,0.814143
5,0.0409,0.591181,0.866178,0.852029,0.810042,0.825718
6,0.0231,0.688311,0.852429,0.826979,0.807738,0.81494
7,0.0157,0.640291,0.874427,0.839825,0.833754,0.836556
8,0.0031,0.729904,0.862511,0.852421,0.81438,0.829886
9,0.0007,0.74274,0.872594,0.861704,0.831905,0.844746
10,0.0004,0.76702,0.872594,0.861605,0.832003,0.844728


[I 2025-03-21 23:57:18,195] Trial 34 finished with value: 0.8432681408709374 and parameters: {'learning_rate': 0.0022363183764139427, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 35 with params: {'learning_rate': 0.004480812890680047, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.065,0.667222,0.776352,0.682525,0.656238,0.660833
2,0.3943,0.480748,0.855179,0.882212,0.783132,0.805697
3,0.1381,0.627955,0.829514,0.828729,0.778878,0.796867
4,0.0536,0.607617,0.869844,0.868274,0.811436,0.831526
5,0.0139,0.663138,0.878093,0.861679,0.83737,0.848141
6,0.0041,0.706738,0.871677,0.875838,0.83219,0.849611
7,0.0028,0.655399,0.88176,0.880132,0.841233,0.856602
8,0.0011,0.708161,0.88176,0.881238,0.840751,0.856856
9,0.0003,0.739152,0.879927,0.879812,0.838941,0.855538
10,0.0001,0.763412,0.875344,0.876577,0.835518,0.85225


[I 2025-03-21 23:58:54,436] Trial 35 finished with value: 0.8557982244729976 and parameters: {'learning_rate': 0.004480812890680047, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 36 with params: {'learning_rate': 5.370203809578854e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7274,1.673502,0.35472,0.351594,0.260506,0.199984
2,1.6305,1.589349,0.422548,0.211616,0.319738,0.238964
3,1.5507,1.492479,0.455545,0.229067,0.342135,0.273806
4,1.4218,1.348961,0.452796,0.387963,0.346077,0.296624
5,1.2945,1.246065,0.508708,0.542829,0.401508,0.381893
6,1.1898,1.165513,0.593951,0.559501,0.489534,0.493424
7,1.1167,1.102597,0.601283,0.544775,0.498115,0.502966
8,1.0528,1.051763,0.63978,0.547036,0.541377,0.538169
9,1.0101,1.020493,0.63703,0.542807,0.539283,0.535479
10,0.9695,0.997896,0.636114,0.538411,0.541216,0.533585


[I 2025-03-21 23:59:45,539] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 0.004811087422288426, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0007,0.626401,0.788268,0.682648,0.668841,0.668484
2,0.3554,0.461481,0.863428,0.846089,0.808734,0.821865
3,0.1102,0.574468,0.847846,0.852795,0.799572,0.820016
4,0.0396,0.658506,0.846929,0.848668,0.795221,0.812391
5,0.0116,0.762147,0.860678,0.806577,0.824749,0.813906


[I 2025-03-22 00:00:09,327] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.004842245634933987, 'weight_decay': 0.004, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0154,0.636182,0.794684,0.695611,0.67612,0.675701
2,0.3517,0.438935,0.857929,0.860397,0.81231,0.829542
3,0.1072,0.60742,0.839597,0.869192,0.785627,0.812998
4,0.0359,0.621488,0.87626,0.874919,0.825164,0.844334
5,0.0097,0.681618,0.87626,0.88872,0.82673,0.84894
6,0.0031,0.746273,0.877177,0.865213,0.827746,0.842644
7,0.002,0.729376,0.879927,0.862643,0.830606,0.843395
8,0.0002,0.767429,0.873511,0.881488,0.824842,0.845074
9,0.0001,0.778409,0.874427,0.869418,0.825471,0.842256
10,0.0001,0.788238,0.874427,0.858704,0.825206,0.838749


[I 2025-03-22 00:01:25,807] Trial 38 finished with value: 0.8394531610916324 and parameters: {'learning_rate': 0.004842245634933987, 'weight_decay': 0.004, 'warmup_steps': 1}. Best is trial 21 with value: 0.8599168771006523.


Trial 39 with params: {'learning_rate': 5.7801019639330395e-05, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7282,1.67005,0.351971,0.355927,0.2585,0.19962
2,1.625,1.58122,0.412466,0.21757,0.31305,0.220911
3,1.5342,1.464029,0.445463,0.223409,0.335474,0.265126
4,1.3857,1.314756,0.48121,0.54256,0.373102,0.33822
5,1.2599,1.210845,0.541705,0.546347,0.434881,0.430625
6,1.1497,1.127232,0.615032,0.545379,0.513754,0.51302
7,1.0762,1.063747,0.628781,0.557234,0.52493,0.529917
8,1.0137,1.016929,0.647113,0.550934,0.548946,0.545261
9,0.973,0.987607,0.644363,0.547329,0.546432,0.542673
10,0.9301,0.96519,0.64528,0.54555,0.549524,0.54192


[I 2025-03-22 00:02:25,025] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.0013895077245751437, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1955,0.797641,0.724106,0.650252,0.614703,0.617379
2,0.5372,0.563429,0.802933,0.67598,0.68582,0.679467
3,0.3084,0.540442,0.819432,0.701629,0.694941,0.694801
4,0.1883,0.544398,0.84143,0.858038,0.791369,0.811532
5,0.1013,0.557323,0.861595,0.865021,0.821367,0.839064
6,0.075,0.621253,0.843263,0.860387,0.800789,0.820288
7,0.0207,0.706071,0.858845,0.873703,0.829114,0.847042
8,0.0068,0.717919,0.863428,0.866675,0.832061,0.846529
9,0.0042,0.768979,0.857012,0.858373,0.818723,0.834972
10,0.0038,0.790146,0.856095,0.85521,0.809133,0.826901


[I 2025-03-22 00:03:26,778] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 6.459897452290429e-05, 'weight_decay': 0.0, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7137,1.650393,0.342805,0.357006,0.251137,0.179517
2,1.6074,1.558575,0.417049,0.227722,0.31643,0.220973
3,1.4932,1.404898,0.439047,0.391167,0.332325,0.265115
4,1.3244,1.260494,0.504125,0.538075,0.395573,0.36857
5,1.2024,1.152542,0.597617,0.54793,0.496259,0.501156
6,1.0883,1.071859,0.633364,0.546405,0.534174,0.529668
7,1.0174,1.011178,0.64528,0.555793,0.542441,0.544531
8,0.9588,0.969859,0.648946,0.551349,0.551102,0.548365
9,0.9187,0.941944,0.656279,0.557124,0.556422,0.553343
10,0.874,0.91993,0.666361,0.561076,0.568733,0.560886


[I 2025-03-22 00:04:13,723] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.004569345608858716, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0704,0.657625,0.771769,0.673885,0.653958,0.656037
2,0.3849,0.474327,0.856095,0.882261,0.775912,0.796064
3,0.1273,0.654111,0.824931,0.844809,0.775146,0.798657
4,0.0527,0.579361,0.866178,0.861449,0.810285,0.827696
5,0.0118,0.7663,0.865261,0.86257,0.809429,0.827574


[I 2025-03-22 00:05:02,219] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0018152073227572087, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1206,0.735762,0.744271,0.670175,0.635478,0.636318
2,0.491,0.534895,0.808433,0.680008,0.690789,0.683922
3,0.2595,0.492452,0.847846,0.773902,0.728248,0.733888
4,0.1402,0.573685,0.853346,0.85732,0.809935,0.825971
5,0.0704,0.607755,0.860678,0.838875,0.821828,0.829076
6,0.0451,0.624368,0.859762,0.871932,0.80343,0.826587
7,0.0098,0.730082,0.866178,0.840297,0.834865,0.836926
8,0.0047,0.765745,0.859762,0.847392,0.813769,0.827023
9,0.0016,0.756077,0.861595,0.841103,0.832434,0.836542
10,0.0013,0.841284,0.854262,0.850071,0.789355,0.809922


[I 2025-03-22 00:05:59,657] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.004085348367302571, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0588,0.691079,0.76077,0.667651,0.646639,0.64668
2,0.4041,0.461972,0.851512,0.878532,0.735759,0.736928
3,0.1354,0.493662,0.865261,0.855338,0.809081,0.826425
4,0.0484,0.55844,0.863428,0.863819,0.80661,0.827359
5,0.0203,0.685067,0.868928,0.897301,0.792994,0.818958


[I 2025-03-22 00:06:38,098] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.004994119017945273, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0543,0.687187,0.770852,0.666166,0.655707,0.654362
2,0.3799,0.491222,0.849679,0.807563,0.762164,0.77095
3,0.1234,0.571561,0.858845,0.868063,0.801127,0.825133
4,0.0414,0.614333,0.868928,0.853435,0.811572,0.827117
5,0.0093,0.807722,0.872594,0.874453,0.824482,0.841862
6,0.0021,0.827635,0.873511,0.883632,0.825927,0.845631
7,0.0029,0.781988,0.871677,0.881322,0.814794,0.836991
8,0.0022,0.792806,0.879927,0.863513,0.849117,0.855057
9,0.0005,0.792332,0.88176,0.88039,0.841132,0.856765
10,0.0001,0.812215,0.878093,0.888351,0.82897,0.850255


[I 2025-03-22 00:07:51,597] Trial 45 finished with value: 0.8510026521663452 and parameters: {'learning_rate': 0.004994119017945273, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 46 with params: {'learning_rate': 0.004080387355373358, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9983,0.64694,0.761687,0.66028,0.649537,0.648062
2,0.3803,0.466031,0.854262,0.841453,0.805571,0.820349
3,0.1247,0.509786,0.855179,0.84834,0.816911,0.829849
4,0.045,0.61395,0.861595,0.865757,0.815588,0.833329
5,0.0125,0.679279,0.865261,0.844858,0.829059,0.835643
6,0.0068,0.794851,0.868928,0.879395,0.801691,0.826041
7,0.002,0.75495,0.873511,0.859668,0.825087,0.839061
8,0.0007,0.823219,0.877177,0.864054,0.826654,0.842055
9,0.0006,0.818034,0.878093,0.854581,0.828095,0.839466
10,0.0002,0.826462,0.878093,0.871595,0.818882,0.838054


[I 2025-03-22 00:09:08,040] Trial 46 finished with value: 0.836935635565866 and parameters: {'learning_rate': 0.004080387355373358, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 21 with value: 0.8599168771006523.


Trial 47 with params: {'learning_rate': 0.0007101947085849762, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.393,0.983076,0.637947,0.585428,0.531113,0.538248
2,0.7441,0.636891,0.778185,0.669821,0.662874,0.662311
3,0.4493,0.555041,0.812099,0.693035,0.691865,0.688918
4,0.3284,0.547158,0.822181,0.702428,0.698947,0.697547
5,0.2461,0.503206,0.843263,0.851455,0.771228,0.793266
6,0.1717,0.539734,0.849679,0.846574,0.797404,0.814453
7,0.1133,0.554631,0.843263,0.853339,0.790685,0.810491
8,0.0745,0.566819,0.846013,0.859003,0.812699,0.829697
9,0.0444,0.676046,0.836847,0.851923,0.793766,0.814727
10,0.024,0.720231,0.84143,0.855161,0.807334,0.824602


[I 2025-03-22 00:10:06,313] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0020003456430374125, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1096,0.742694,0.743355,0.693668,0.626321,0.640113
2,0.4807,0.517136,0.815765,0.687718,0.697119,0.691562
3,0.2467,0.475392,0.851512,0.82462,0.775876,0.79382
4,0.1276,0.613841,0.848763,0.867919,0.797166,0.818872
5,0.0593,0.602007,0.861595,0.850508,0.82546,0.835089
6,0.0293,0.684791,0.857929,0.873379,0.798596,0.824222
7,0.0069,0.708797,0.868928,0.866693,0.81981,0.838018
8,0.0036,0.721543,0.865261,0.876231,0.817575,0.83878
9,0.0009,0.764155,0.864345,0.854868,0.82519,0.837967
10,0.0006,0.81855,0.864345,0.867264,0.834111,0.848388


[I 2025-03-22 00:11:29,726] Trial 48 finished with value: 0.842518186084479 and parameters: {'learning_rate': 0.0020003456430374125, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 49 with params: {'learning_rate': 0.004894929388957926, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9981,0.64297,0.781852,0.682109,0.66378,0.665433
2,0.3623,0.469365,0.854262,0.840026,0.799639,0.814355
3,0.1106,0.660077,0.84418,0.842326,0.786083,0.806842
4,0.0412,0.62997,0.866178,0.877575,0.811426,0.832487
5,0.0182,0.746154,0.868011,0.810795,0.830248,0.818428


[I 2025-03-22 00:11:53,661] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.004755700589617742, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0592,0.67897,0.770852,0.677177,0.652091,0.656229
2,0.3802,0.452178,0.857929,0.837151,0.812601,0.822119
3,0.1254,0.556687,0.836847,0.847037,0.777469,0.800488
4,0.0492,0.556023,0.872594,0.882276,0.815644,0.83717
5,0.0121,0.7783,0.866178,0.852031,0.808262,0.82528


[I 2025-03-22 00:12:16,954] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.004869156800524472, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0538,0.646301,0.775435,0.665031,0.660218,0.657462
2,0.381,0.475933,0.850596,0.881139,0.75254,0.767427
3,0.1242,0.566456,0.857929,0.853622,0.801193,0.82091
4,0.0457,0.601923,0.873511,0.877558,0.830592,0.849331
5,0.0157,0.675961,0.866178,0.852271,0.820653,0.832697
6,0.0069,0.727133,0.863428,0.852923,0.816132,0.831083
7,0.0007,0.732265,0.866178,0.85351,0.819342,0.832835
8,0.0002,0.765409,0.872594,0.859328,0.824,0.838281
9,0.0001,0.786456,0.871677,0.858995,0.823272,0.837794
10,0.0001,0.804208,0.870761,0.8583,0.822605,0.837094


[I 2025-03-22 00:13:48,735] Trial 51 finished with value: 0.837736815791689 and parameters: {'learning_rate': 0.004869156800524472, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 52 with params: {'learning_rate': 0.004895548155384432, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0545,0.654109,0.771769,0.663283,0.657471,0.655084
2,0.3809,0.471501,0.857012,0.830545,0.785979,0.79891
3,0.1305,0.558908,0.857929,0.866733,0.800699,0.824266
4,0.0461,0.630516,0.868928,0.873388,0.818376,0.838165
5,0.0108,0.714583,0.859762,0.850927,0.833565,0.839416
6,0.0022,0.734141,0.869844,0.873333,0.839864,0.853569
7,0.0016,0.798495,0.864345,0.869162,0.83746,0.848269
8,0.0009,0.786414,0.871677,0.871313,0.832281,0.847857
9,0.0001,0.807117,0.871677,0.872033,0.832316,0.848229
10,0.0001,0.812526,0.870761,0.870972,0.831601,0.847375


[I 2025-03-22 00:15:03,505] Trial 52 finished with value: 0.845529857731262 and parameters: {'learning_rate': 0.004895548155384432, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 53 with params: {'learning_rate': 0.00021967416393079315, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6219,1.438879,0.492209,0.424851,0.380299,0.358599
2,1.2474,1.06785,0.615949,0.552502,0.514009,0.519476
3,0.9014,0.84978,0.68561,0.585043,0.588177,0.578996
4,0.6871,0.710782,0.757104,0.652346,0.643944,0.64439
5,0.5804,0.651216,0.773602,0.662371,0.6584,0.657344
6,0.5063,0.612296,0.792851,0.665354,0.677751,0.671245
7,0.4404,0.60708,0.7956,0.673212,0.678778,0.67493
8,0.3942,0.586938,0.793767,0.67082,0.679075,0.67344
9,0.3556,0.585346,0.791934,0.666151,0.678023,0.670455
10,0.3202,0.597128,0.802933,0.673151,0.685573,0.678196


[I 2025-03-22 00:16:19,456] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0014993382390299488, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1769,0.771945,0.72319,0.649835,0.616071,0.618301
2,0.5283,0.58646,0.784601,0.662775,0.673698,0.661813
3,0.2974,0.517216,0.827681,0.703773,0.702778,0.700475
4,0.1719,0.534844,0.847846,0.844163,0.804557,0.818555
5,0.0828,0.599007,0.867094,0.864874,0.807862,0.828764


[I 2025-03-22 00:16:42,343] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.002864469152099005, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0595,0.636975,0.771769,0.673525,0.657168,0.659929
2,0.4369,0.474571,0.843263,0.709064,0.719518,0.713974
3,0.1846,0.510416,0.854262,0.839957,0.805656,0.819012
4,0.0771,0.63116,0.851512,0.862731,0.806171,0.825822
5,0.0369,0.616709,0.872594,0.871769,0.822703,0.841387
6,0.0091,0.760843,0.861595,0.861784,0.802363,0.824157
7,0.0095,0.699971,0.861595,0.820577,0.834431,0.825582
8,0.0034,0.758732,0.857012,0.797241,0.829117,0.809187
9,0.0015,0.794232,0.868011,0.842802,0.838975,0.840501
10,0.0004,0.830126,0.868011,0.844127,0.838556,0.840843


[I 2025-03-22 00:17:51,260] Trial 55 finished with value: 0.8347241048928394 and parameters: {'learning_rate': 0.002864469152099005, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 21 with value: 0.8599168771006523.


Trial 56 with params: {'learning_rate': 0.004324182099109418, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0434,0.697908,0.76077,0.661955,0.649372,0.646201
2,0.4029,0.455724,0.857012,0.884383,0.740907,0.742503
3,0.1244,0.489002,0.870761,0.849934,0.81228,0.827024
4,0.0459,0.658451,0.863428,0.864836,0.806536,0.827313
5,0.0176,0.687293,0.868011,0.83054,0.829458,0.829869


[I 2025-03-22 00:18:18,298] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0047015763232524715, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0048,0.630674,0.794684,0.685744,0.672809,0.673537
2,0.3575,0.444575,0.863428,0.842146,0.799523,0.814307
3,0.1149,0.587647,0.847846,0.872307,0.792164,0.819583
4,0.046,0.643641,0.865261,0.877132,0.817485,0.839147
5,0.0114,0.686866,0.879927,0.877108,0.838986,0.854266
6,0.0023,0.749582,0.878093,0.875886,0.837592,0.8531
7,0.0004,0.783209,0.878093,0.876164,0.837349,0.853125
8,0.0002,0.79925,0.877177,0.875089,0.836682,0.852268
9,0.0001,0.819191,0.877177,0.875518,0.836633,0.852431
10,0.0001,0.831061,0.873511,0.871976,0.833869,0.84924


[I 2025-03-22 00:19:43,699] Trial 57 finished with value: 0.8495674795424012 and parameters: {'learning_rate': 0.0047015763232524715, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 21 with value: 0.8599168771006523.


Trial 58 with params: {'learning_rate': 0.004747158710398896, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9924,0.667997,0.772686,0.681146,0.658782,0.657749
2,0.3606,0.443498,0.859762,0.815008,0.832236,0.821773
3,0.1072,0.538041,0.857012,0.854523,0.808598,0.826928
4,0.044,0.525537,0.885426,0.883308,0.843733,0.85988
5,0.0104,0.682163,0.877177,0.874109,0.843852,0.857115
6,0.0031,0.832448,0.868011,0.879997,0.829611,0.847732
7,0.0005,0.842878,0.868928,0.867898,0.819639,0.83833
8,0.0001,0.860511,0.866178,0.865431,0.817727,0.836169
9,0.0001,0.871485,0.867094,0.865928,0.818707,0.836934
10,0.0001,0.884196,0.868011,0.866648,0.819374,0.837624


[I 2025-03-22 00:21:38,498] Trial 58 finished with value: 0.8460610643777473 and parameters: {'learning_rate': 0.004747158710398896, 'weight_decay': 0.002, 'warmup_steps': 4}. Best is trial 21 with value: 0.8599168771006523.


Trial 59 with params: {'learning_rate': 0.0048602160405686, 'weight_decay': 0.01, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0164,0.656468,0.771769,0.663934,0.658849,0.654327
2,0.3786,0.468866,0.857929,0.828801,0.831164,0.8271
3,0.1155,0.546696,0.857012,0.860291,0.816667,0.834216
4,0.0433,0.646049,0.857012,0.87049,0.814596,0.832088
5,0.0212,0.63278,0.868011,0.878845,0.823289,0.842025
6,0.004,0.683212,0.864345,0.852044,0.81885,0.831981
7,0.0012,0.696516,0.866178,0.836296,0.838988,0.83725
8,0.0003,0.728073,0.868011,0.851156,0.840114,0.845116
9,0.0001,0.746665,0.866178,0.849858,0.838502,0.843681
10,0.0001,0.759813,0.865261,0.849142,0.837836,0.84298


[I 2025-03-22 00:23:14,988] Trial 59 finished with value: 0.8365192082737324 and parameters: {'learning_rate': 0.0048602160405686, 'weight_decay': 0.01, 'warmup_steps': 0}. Best is trial 21 with value: 0.8599168771006523.


Trial 60 with params: {'learning_rate': 0.004783042124121322, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0017,0.630045,0.789184,0.6847,0.669947,0.670382
2,0.3578,0.465971,0.857929,0.840523,0.804192,0.817181
3,0.1095,0.53546,0.855179,0.841049,0.80675,0.820149
4,0.0411,0.663793,0.851512,0.852537,0.799069,0.816598
5,0.012,0.683473,0.861595,0.82671,0.827383,0.825816
6,0.0129,0.824481,0.857012,0.846308,0.797517,0.816204
7,0.0054,0.749738,0.868011,0.84657,0.818922,0.83062
8,0.0007,0.762607,0.87626,0.858993,0.816817,0.833119
9,0.0003,0.814934,0.871677,0.85696,0.812896,0.830086
10,0.0001,0.828666,0.871677,0.856457,0.813146,0.830007


[I 2025-03-22 00:24:00,496] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.0021393164813841184, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1043,0.660449,0.769019,0.67137,0.653456,0.657018
2,0.4478,0.550027,0.807516,0.675643,0.694364,0.681438
3,0.2184,0.470619,0.852429,0.808221,0.7799,0.790574
4,0.1137,0.654373,0.83868,0.832195,0.795438,0.808532
5,0.0533,0.599348,0.868011,0.879959,0.80917,0.833284
6,0.0233,0.625107,0.856095,0.814138,0.810698,0.811936
7,0.0098,0.733158,0.864345,0.856791,0.807453,0.824702
8,0.0068,0.73912,0.860678,0.872673,0.812839,0.834572
9,0.0025,0.778965,0.866178,0.861546,0.808308,0.827611
10,0.0042,0.748791,0.863428,0.836676,0.825831,0.830105


[I 2025-03-22 00:24:57,770] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.004860453055013602, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9994,0.73558,0.775435,0.673501,0.659552,0.657059
2,0.364,0.42082,0.861595,0.816798,0.824179,0.819621
3,0.11,0.532176,0.861595,0.865037,0.80427,0.825622
4,0.0425,0.54651,0.87901,0.886901,0.832346,0.85046
5,0.011,0.718311,0.87626,0.89007,0.835439,0.856465
6,0.0057,0.728901,0.873511,0.853136,0.834792,0.842684
7,0.0014,0.787136,0.879927,0.86527,0.830369,0.844586
8,0.0003,0.82571,0.880843,0.860772,0.839908,0.849215
9,0.0001,0.84446,0.880843,0.860239,0.840469,0.849257
10,0.0001,0.85811,0.88176,0.869383,0.841185,0.853217


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--precision/155d3220d6cd4a6553f12da68eeb3d1f97cf431206304a4bc6e2d564c29502e9 (last modified on Fri Jan 10 23:13:59 2025) since it couldn't be found locally at evaluate-metric--precision, or remotely on the Hugging Face Hub.
[I 2025-03-22 00:26:30,656] Trial 62 finished with value: 0.8461412842571834 and parameters: {'learning_rate': 0.004860453055013602, 'weight_decay': 0.002, 'warmup_steps': 4}. Best is trial 21 with value: 0.8599168771006523.


Trial 63 with params: {'learning_rate': 0.0024579778748657277, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.103,0.770232,0.743355,0.681716,0.633238,0.636809
2,0.4457,0.524586,0.812099,0.683446,0.695486,0.688019
3,0.2166,0.508645,0.837764,0.83833,0.768,0.786813
4,0.0992,0.649037,0.840513,0.83803,0.768543,0.78966
5,0.0392,0.565376,0.874427,0.823873,0.834041,0.828137


[I 2025-03-22 00:27:06,977] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00011912397327149118, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6792,1.588017,0.345555,0.357086,0.253487,0.191026
2,1.4845,1.341117,0.452796,0.350717,0.351039,0.299068
3,1.2135,1.10572,0.613199,0.551529,0.510551,0.516152
4,0.9998,0.952542,0.663611,0.56062,0.56322,0.559151
5,0.8661,0.852706,0.713107,0.606916,0.607334,0.605156
6,0.7652,0.78912,0.731439,0.620572,0.622244,0.620791
7,0.6899,0.757585,0.737855,0.634757,0.625057,0.62716
8,0.6322,0.722799,0.758937,0.639272,0.649731,0.643791
9,0.5977,0.700534,0.76077,0.64732,0.646658,0.644591
10,0.55,0.690178,0.766269,0.640916,0.656665,0.648137


[I 2025-03-22 00:27:54,146] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.00010546468583372021, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6927,1.606316,0.336389,0.205224,0.246286,0.175899
2,1.5285,1.401851,0.424381,0.371588,0.322931,0.242974
3,1.2796,1.178456,0.546288,0.550961,0.43693,0.427664
4,1.0699,1.01574,0.637947,0.534338,0.543691,0.533638
5,0.9334,0.909667,0.68561,0.582052,0.583771,0.581349


[I 2025-03-22 00:28:48,592] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.004814585044790494, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0139,0.614143,0.802016,0.703642,0.680657,0.684228
2,0.357,0.42915,0.860678,0.825083,0.832138,0.827944
3,0.1063,0.524698,0.857012,0.864435,0.800317,0.823651
4,0.0387,0.600177,0.868928,0.873354,0.826748,0.845795
5,0.0131,0.630693,0.88176,0.878051,0.841905,0.856399
6,0.0029,0.742234,0.874427,0.886275,0.833524,0.853847
7,0.0006,0.765229,0.87626,0.875892,0.836396,0.852416
8,0.0003,0.798657,0.878093,0.889639,0.837843,0.857904
9,0.0001,0.817076,0.880843,0.890627,0.840038,0.859584
10,0.0001,0.830853,0.87901,0.889335,0.838607,0.858215


[I 2025-03-22 00:30:53,212] Trial 66 finished with value: 0.861541151855867 and parameters: {'learning_rate': 0.004814585044790494, 'weight_decay': 0.001, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 67 with params: {'learning_rate': 0.004069054697997178, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.027,0.609996,0.786434,0.682634,0.665974,0.669694
2,0.394,0.438568,0.852429,0.834167,0.815732,0.824094
3,0.1198,0.475479,0.873511,0.87762,0.822745,0.843867
4,0.0397,0.587358,0.869844,0.868706,0.832585,0.846387
5,0.016,0.728504,0.868928,0.848873,0.831891,0.839216
6,0.0077,0.737546,0.855179,0.841676,0.813031,0.823314
7,0.0043,0.786852,0.862511,0.827524,0.826744,0.826655
8,0.0009,0.802541,0.869844,0.859057,0.831997,0.84341
9,0.0003,0.828146,0.872594,0.86139,0.833614,0.845484
10,0.0002,0.846615,0.871677,0.859973,0.833211,0.844565


[I 2025-03-22 00:32:24,809] Trial 67 finished with value: 0.8455566785532618 and parameters: {'learning_rate': 0.004069054697997178, 'weight_decay': 0.001, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 68 with params: {'learning_rate': 0.004438886731630525, 'weight_decay': 0.0, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0135,0.631599,0.781852,0.679671,0.666623,0.664589
2,0.3801,0.423137,0.854262,0.810095,0.80016,0.80469
3,0.117,0.515054,0.859762,0.86315,0.802018,0.824605
4,0.0409,0.638445,0.846013,0.857398,0.809774,0.828138
5,0.0222,0.733081,0.858845,0.860956,0.81088,0.829285


[I 2025-03-22 00:32:51,661] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.0018218011565438292, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1381,0.779772,0.727773,0.66503,0.619352,0.623826
2,0.4881,0.555963,0.802016,0.672178,0.68894,0.678016
3,0.2589,0.542192,0.825848,0.792294,0.728711,0.7444
4,0.1361,0.522186,0.860678,0.874533,0.815663,0.835267
5,0.059,0.548914,0.864345,0.870853,0.798778,0.820665
6,0.0269,0.607426,0.868011,0.861892,0.838099,0.848273
7,0.0109,0.731661,0.861595,0.837601,0.822988,0.828987
8,0.0038,0.71407,0.867094,0.866907,0.828181,0.843907
9,0.0011,0.735844,0.870761,0.862264,0.839756,0.849703
10,0.0006,0.762799,0.868011,0.860129,0.837786,0.847795


[I 2025-03-22 00:34:16,025] Trial 69 finished with value: 0.8475233545173868 and parameters: {'learning_rate': 0.0018218011565438292, 'weight_decay': 0.001, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 70 with params: {'learning_rate': 0.004895275999318223, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9992,0.646762,0.786434,0.685478,0.666619,0.66806
2,0.3608,0.459237,0.859762,0.828362,0.805097,0.814102
3,0.1122,0.578256,0.853346,0.855124,0.795357,0.817026
4,0.0403,0.61568,0.867094,0.831434,0.822271,0.825302
5,0.0124,0.696251,0.873511,0.823632,0.824212,0.823626
6,0.0037,0.858199,0.857929,0.875906,0.8018,0.826376
7,0.0032,0.794948,0.868928,0.847969,0.83231,0.837625
8,0.0025,0.803233,0.872594,0.836794,0.815636,0.824184
9,0.0002,0.799755,0.875344,0.841124,0.8279,0.833402
10,0.0002,0.814456,0.873511,0.847038,0.826187,0.834544


[I 2025-03-22 00:35:27,155] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0017699093202578372, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1476,0.756067,0.743355,0.661737,0.634563,0.634348
2,0.4928,0.565095,0.794684,0.666384,0.68269,0.670827
3,0.2646,0.500641,0.83868,0.810703,0.740675,0.755006
4,0.1524,0.540529,0.846013,0.837824,0.804134,0.816056
5,0.0723,0.566841,0.868928,0.829573,0.818364,0.823517


[I 2025-03-22 00:35:51,577] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00010295616529943657, 'weight_decay': 0.005, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6818,1.600025,0.338222,0.372012,0.247681,0.178047
2,1.5214,1.396883,0.428048,0.375727,0.326391,0.249147
3,1.279,1.181142,0.546288,0.554391,0.436393,0.428389
4,1.0749,1.022605,0.636114,0.533633,0.542044,0.532354
5,0.9432,0.921949,0.670027,0.568975,0.571563,0.567646
6,0.8429,0.856029,0.699358,0.59988,0.592553,0.594462
7,0.7708,0.81374,0.718607,0.615402,0.609706,0.610661
8,0.7081,0.774565,0.738772,0.622216,0.631523,0.626379
9,0.6684,0.750498,0.746104,0.636282,0.632879,0.631139
10,0.6197,0.736253,0.744271,0.622781,0.637436,0.629317


[I 2025-03-22 00:37:18,917] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.0004201995563692489, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5144,1.198807,0.563703,0.571444,0.456746,0.46563
2,0.9476,0.799079,0.72044,0.619788,0.612518,0.613439
3,0.6013,0.631522,0.772686,0.653741,0.661987,0.653121
4,0.4572,0.582992,0.808433,0.691866,0.687731,0.687202
5,0.3785,0.537615,0.809349,0.678153,0.691303,0.684355


[I 2025-03-22 00:37:48,423] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0015478504391193734, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1752,0.783244,0.716774,0.643068,0.612488,0.611777
2,0.5229,0.569702,0.7956,0.675643,0.682282,0.672216
3,0.2976,0.537189,0.824931,0.761636,0.707799,0.715749
4,0.1773,0.59656,0.831347,0.836841,0.775105,0.79277
5,0.0924,0.588346,0.863428,0.834431,0.797522,0.811167
6,0.039,0.6012,0.859762,0.810423,0.803897,0.806267
7,0.0147,0.676064,0.865261,0.866509,0.835243,0.848539
8,0.0048,0.709058,0.863428,0.840427,0.83479,0.837289
9,0.0024,0.773701,0.860678,0.860398,0.812558,0.831177
10,0.0013,0.821517,0.857929,0.857129,0.810875,0.828777


[I 2025-03-22 00:38:52,674] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.002058219135364151, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1074,0.686518,0.769019,0.674156,0.653634,0.656374
2,0.4551,0.512489,0.808433,0.676503,0.693417,0.683617
3,0.226,0.506709,0.839597,0.821388,0.757042,0.776843
4,0.1147,0.565088,0.854262,0.869118,0.801264,0.82271
5,0.0493,0.570641,0.864345,0.818345,0.835446,0.825616


[I 2025-03-22 00:39:15,170] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.002237075771185007, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0957,0.720881,0.748854,0.689962,0.629512,0.643374
2,0.4842,0.492318,0.830431,0.699644,0.70797,0.703271
3,0.2229,0.45774,0.859762,0.816636,0.793469,0.803417
4,0.1063,0.679631,0.83868,0.864113,0.787319,0.811422
5,0.0465,0.562465,0.861595,0.830145,0.806896,0.816521
6,0.0292,0.634857,0.875344,0.884647,0.81554,0.839019
7,0.0094,0.680518,0.864345,0.846523,0.825453,0.834801
8,0.0024,0.777829,0.860678,0.860017,0.802741,0.823689
9,0.0022,0.766844,0.865261,0.862779,0.818592,0.835432
10,0.0004,0.793153,0.862511,0.842108,0.815353,0.826848


[I 2025-03-22 00:40:07,783] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.004868373253771223, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0168,0.611112,0.8011,0.686036,0.682791,0.680241
2,0.3533,0.427986,0.860678,0.849272,0.814142,0.828355
3,0.1011,0.581667,0.855179,0.846665,0.816687,0.829448
4,0.0399,0.682341,0.857012,0.846869,0.812514,0.825223
5,0.0128,0.675176,0.859762,0.86206,0.822155,0.837986
6,0.0045,0.748773,0.877177,0.886682,0.817673,0.841073
7,0.0029,0.723413,0.878093,0.872925,0.82831,0.84535
8,0.0006,0.738074,0.878093,0.8746,0.828487,0.84629
9,0.0002,0.750921,0.87626,0.872918,0.827073,0.84478
10,0.0001,0.760962,0.87626,0.872646,0.827073,0.844647


[I 2025-03-22 00:41:25,609] Trial 77 finished with value: 0.8444864138222137 and parameters: {'learning_rate': 0.004868373253771223, 'weight_decay': 0.0, 'warmup_steps': 1}. Best is trial 66 with value: 0.861541151855867.


Trial 78 with params: {'learning_rate': 0.0032916411926229043, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0308,0.670548,0.758937,0.658819,0.647431,0.645884
2,0.4352,0.49151,0.84143,0.79449,0.789641,0.790707
3,0.1647,0.492513,0.854262,0.859533,0.798723,0.820225
4,0.0611,0.592537,0.863428,0.876026,0.805986,0.829545
5,0.0218,0.594176,0.875344,0.849552,0.835106,0.841603
6,0.0137,0.792951,0.856095,0.861392,0.783976,0.804549
7,0.0063,0.753459,0.869844,0.880366,0.821458,0.842785
8,0.001,0.792604,0.875344,0.887711,0.824934,0.84755
9,0.0004,0.801009,0.875344,0.884573,0.826574,0.84755
10,0.0002,0.834058,0.874427,0.884123,0.825562,0.846803


[I 2025-03-22 00:42:53,195] Trial 78 finished with value: 0.8459427673907798 and parameters: {'learning_rate': 0.0032916411926229043, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 79 with params: {'learning_rate': 0.00245906566009336, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0925,0.765742,0.747938,0.681684,0.631337,0.64051
2,0.4676,0.483031,0.836847,0.706977,0.71284,0.709237
3,0.2139,0.505026,0.849679,0.840658,0.785113,0.804515
4,0.0983,0.546602,0.857012,0.824087,0.812487,0.816691
5,0.0403,0.616027,0.872594,0.873127,0.83253,0.848899
6,0.0148,0.694685,0.853346,0.833696,0.797946,0.810549
7,0.0101,0.70507,0.866178,0.879311,0.835909,0.853407
8,0.0019,0.754823,0.869844,0.868023,0.822138,0.839513
9,0.001,0.774818,0.872594,0.836384,0.842551,0.838984
10,0.0003,0.793357,0.869844,0.860594,0.839424,0.848813


[I 2025-03-22 00:44:14,076] Trial 79 finished with value: 0.8489628085567024 and parameters: {'learning_rate': 0.00245906566009336, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 80 with params: {'learning_rate': 0.004967048350796694, 'weight_decay': 0.005, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0102,0.59336,0.796517,0.687068,0.676613,0.677337
2,0.358,0.453506,0.870761,0.876719,0.81369,0.834201
3,0.111,0.590863,0.858845,0.863456,0.801468,0.823581
4,0.0424,0.552702,0.871677,0.858591,0.822627,0.83735
5,0.0095,0.693276,0.868928,0.841537,0.831801,0.836148
6,0.0013,0.760606,0.867094,0.852642,0.821902,0.833697
7,0.0009,0.810249,0.860678,0.863596,0.822467,0.839162
8,0.0001,0.822931,0.868928,0.850323,0.829414,0.838814
9,0.0001,0.837426,0.868928,0.850253,0.829414,0.838781
10,0.0001,0.855731,0.868928,0.858368,0.829678,0.841987


[I 2025-03-22 00:45:49,529] Trial 80 finished with value: 0.8414565231582746 and parameters: {'learning_rate': 0.004967048350796694, 'weight_decay': 0.005, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 81 with params: {'learning_rate': 0.0008699781245994665, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3268,0.88617,0.666361,0.601659,0.561002,0.572837
2,0.6772,0.603593,0.780935,0.663997,0.665346,0.6634
3,0.4041,0.541848,0.816682,0.700696,0.696854,0.694649
4,0.2845,0.533116,0.834097,0.711922,0.709094,0.707312
5,0.1988,0.503839,0.846013,0.857152,0.783103,0.805807
6,0.1186,0.567557,0.846929,0.82025,0.79513,0.804232
7,0.073,0.592064,0.857929,0.868042,0.821266,0.838244
8,0.0391,0.645275,0.853346,0.868616,0.806889,0.829302
9,0.0169,0.742424,0.836847,0.849471,0.794373,0.813588
10,0.0146,0.734852,0.842346,0.839021,0.800661,0.814145


[I 2025-03-22 00:46:47,768] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0021514497084715273, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1014,0.664962,0.769019,0.671548,0.653276,0.656912
2,0.4479,0.547665,0.806599,0.674992,0.692467,0.680104
3,0.2185,0.47126,0.855179,0.810764,0.781671,0.792711
4,0.1128,0.647878,0.842346,0.854079,0.798506,0.817658
5,0.051,0.61114,0.866178,0.864243,0.808685,0.82891


[I 2025-03-22 00:47:12,292] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.004674786107780741, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0541,0.687854,0.76352,0.668693,0.646211,0.64785
2,0.3853,0.455538,0.855179,0.862934,0.79311,0.813478
3,0.1236,0.520492,0.859762,0.851451,0.803432,0.821382
4,0.0424,0.562358,0.863428,0.85939,0.796538,0.817848
5,0.0094,0.770145,0.864345,0.863838,0.808403,0.827878
6,0.0086,0.714465,0.871677,0.876341,0.817303,0.835359
7,0.0019,0.80873,0.864345,0.8643,0.807477,0.828107
8,0.0004,0.822757,0.867094,0.857243,0.818978,0.834552
9,0.0002,0.842503,0.866178,0.856374,0.817966,0.833643
10,0.0001,0.863205,0.864345,0.855426,0.816302,0.832233


[I 2025-03-22 00:48:29,310] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0047371631613001616, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0033,0.627998,0.794684,0.687562,0.674052,0.673854
2,0.3568,0.463956,0.853346,0.812056,0.8004,0.804952
3,0.1122,0.610559,0.84143,0.833884,0.785934,0.803152
4,0.0429,0.632009,0.858845,0.873387,0.803645,0.82739
5,0.0105,0.714481,0.870761,0.869846,0.833591,0.846967
6,0.0015,0.833221,0.874427,0.870181,0.827349,0.843136
7,0.0003,0.834688,0.877177,0.874988,0.837401,0.852466
8,0.0001,0.846829,0.878093,0.875825,0.838063,0.853191
9,0.0001,0.860531,0.877177,0.875176,0.837383,0.852551
10,0.0001,0.872813,0.877177,0.875176,0.837383,0.852551


[I 2025-03-22 00:49:55,718] Trial 84 finished with value: 0.8490774559962104 and parameters: {'learning_rate': 0.0047371631613001616, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 85 with params: {'learning_rate': 0.0035553692358613297, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0225,0.694651,0.772686,0.692963,0.655021,0.661233
2,0.4132,0.466081,0.849679,0.85803,0.788681,0.809098
3,0.1507,0.49432,0.861595,0.854315,0.805023,0.82318
4,0.0607,0.591298,0.860678,0.874042,0.806373,0.828282
5,0.0325,0.629599,0.863428,0.843071,0.83312,0.83722
6,0.0142,0.697887,0.872594,0.862837,0.823368,0.838665
7,0.0029,0.723848,0.878093,0.856197,0.838472,0.846147
8,0.0006,0.78412,0.874427,0.86388,0.835111,0.847438
9,0.0002,0.814186,0.872594,0.853495,0.833667,0.842547
10,0.0001,0.828899,0.872594,0.853495,0.833667,0.842547


[I 2025-03-22 00:51:21,229] Trial 85 finished with value: 0.8424380216299331 and parameters: {'learning_rate': 0.0035553692358613297, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 86 with params: {'learning_rate': 0.0002597113179487162, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5907,1.355349,0.48396,0.397152,0.373378,0.339065
2,1.1629,0.973103,0.640697,0.580713,0.534483,0.546177
3,0.7996,0.748026,0.731439,0.617923,0.624218,0.61753
4,0.6041,0.672784,0.767186,0.667245,0.652513,0.655034
5,0.5177,0.610443,0.793767,0.669785,0.678298,0.673287


[I 2025-03-22 00:51:55,481] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0017271306498404771, 'weight_decay': 0.005, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1522,0.766768,0.729606,0.652504,0.622347,0.621833
2,0.501,0.574186,0.792851,0.667075,0.681411,0.668775
3,0.2709,0.509079,0.837764,0.80156,0.757535,0.77167
4,0.1471,0.540344,0.848763,0.853746,0.805855,0.822491
5,0.0715,0.550659,0.868011,0.830808,0.81822,0.824056
6,0.0442,0.675999,0.853346,0.845897,0.805543,0.822033
7,0.0098,0.764602,0.858845,0.870859,0.812021,0.833045
8,0.0069,0.679097,0.871677,0.872809,0.840479,0.854357
9,0.0022,0.775188,0.861595,0.865026,0.832011,0.846238
10,0.0006,0.814955,0.856095,0.858937,0.818366,0.835041


[I 2025-03-22 00:53:13,814] Trial 87 finished with value: 0.8307799162765273 and parameters: {'learning_rate': 0.0017271306498404771, 'weight_decay': 0.005, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 88 with params: {'learning_rate': 0.003924304174202288, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0206,0.724671,0.762603,0.683626,0.643571,0.651949
2,0.4049,0.456364,0.854262,0.860478,0.772572,0.794713
3,0.1435,0.525371,0.866178,0.884421,0.808081,0.833696
4,0.0493,0.591266,0.869844,0.881755,0.811377,0.835399
5,0.0137,0.828278,0.861595,0.877864,0.803178,0.828993
6,0.0049,0.928823,0.850596,0.864515,0.797086,0.818821
7,0.0037,0.809086,0.860678,0.827063,0.83351,0.829942
8,0.0003,0.874455,0.857929,0.846365,0.812185,0.826003
9,0.0002,0.899059,0.857012,0.856683,0.811173,0.828666
10,0.0001,0.913386,0.857929,0.857438,0.81184,0.829364


[I 2025-03-22 00:54:57,448] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.004855384074094552, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0059,0.620504,0.790101,0.680368,0.670649,0.668083
2,0.3513,0.451672,0.858845,0.836283,0.796187,0.809422
3,0.1072,0.60625,0.847846,0.837716,0.792211,0.809632
4,0.0408,0.594078,0.864345,0.864959,0.816452,0.835307
5,0.0209,0.707317,0.866178,0.86103,0.820701,0.83472
6,0.0099,0.683757,0.872594,0.864595,0.816679,0.832756
7,0.0019,0.744912,0.87626,0.862065,0.826547,0.840832
8,0.0017,0.724964,0.870761,0.830853,0.8245,0.82665
9,0.0006,0.757264,0.873511,0.84843,0.825448,0.835069
10,0.0002,0.778759,0.873511,0.84843,0.825448,0.835069


[I 2025-03-22 00:55:46,792] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0049365596716173426, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0012,0.649265,0.779102,0.676258,0.66157,0.661292
2,0.3636,0.455343,0.860678,0.854337,0.795604,0.815256
3,0.1069,0.576788,0.855179,0.855854,0.797783,0.819107
4,0.0437,0.595182,0.865261,0.878058,0.819109,0.839552
5,0.0137,0.699147,0.861595,0.81932,0.831969,0.82338


[I 2025-03-22 00:56:08,976] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0038896361329533932, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0562,0.776067,0.734189,0.658805,0.623167,0.626125
2,0.4265,0.456371,0.852429,0.712744,0.727821,0.719986
3,0.1455,0.499018,0.863428,0.841776,0.80561,0.8194
4,0.0564,0.623297,0.861595,0.874294,0.807515,0.828706
5,0.0195,0.644935,0.872594,0.850256,0.825352,0.835268
6,0.0027,0.786856,0.88176,0.887273,0.823676,0.844643
7,0.001,0.76895,0.879927,0.874239,0.831984,0.847233
8,0.0018,0.75272,0.880843,0.877287,0.841294,0.855223
9,0.0007,0.791572,0.880843,0.869635,0.841066,0.852641
10,0.0003,0.809222,0.87626,0.855339,0.83739,0.84507


[I 2025-03-22 00:58:23,254] Trial 91 finished with value: 0.8533007638476637 and parameters: {'learning_rate': 0.0038896361329533932, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 92 with params: {'learning_rate': 0.002286346777310063, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0952,0.736738,0.739688,0.690087,0.623405,0.637414
2,0.4694,0.504331,0.830431,0.702971,0.707704,0.703414
3,0.222,0.484237,0.862511,0.837593,0.796777,0.812168
4,0.1067,0.636076,0.851512,0.870416,0.798205,0.821034
5,0.0453,0.609362,0.855179,0.830356,0.80915,0.818211


[I 2025-03-22 00:59:16,040] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.0013335798536248983, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1916,0.783551,0.712191,0.641247,0.606138,0.610506
2,0.5545,0.573879,0.799267,0.675465,0.683079,0.677036
3,0.3223,0.578586,0.813016,0.703713,0.68821,0.691278
4,0.2086,0.506142,0.851512,0.861856,0.800279,0.818869
5,0.1079,0.567064,0.857929,0.867284,0.791965,0.815431
6,0.0725,0.601156,0.836847,0.806107,0.804805,0.80175
7,0.0229,0.666613,0.860678,0.840493,0.830443,0.834426
8,0.0124,0.679006,0.863428,0.862976,0.8249,0.840334
9,0.0059,0.746531,0.862511,0.875118,0.823632,0.843562
10,0.0023,0.770554,0.863428,0.86342,0.823862,0.840062


[I 2025-03-22 01:00:36,063] Trial 93 finished with value: 0.8379019399995523 and parameters: {'learning_rate': 0.0013335798536248983, 'weight_decay': 0.009000000000000001, 'warmup_steps': 1}. Best is trial 66 with value: 0.861541151855867.


Trial 94 with params: {'learning_rate': 0.004620454531033379, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0673,0.683417,0.762603,0.667935,0.645972,0.648221
2,0.3903,0.467875,0.858845,0.866848,0.795724,0.815957
3,0.131,0.578295,0.840513,0.849065,0.778033,0.801726
4,0.0461,0.558869,0.867094,0.867994,0.8188,0.837053
5,0.0112,0.735488,0.863428,0.827449,0.828053,0.827318


[I 2025-03-22 01:01:14,111] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.004703115676231689, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0585,0.665726,0.777269,0.674739,0.660203,0.660517
2,0.3846,0.453034,0.861595,0.860409,0.780041,0.798009
3,0.1211,0.558089,0.852429,0.850737,0.796916,0.816575
4,0.0464,0.582172,0.863428,0.862508,0.806563,0.825993
5,0.0112,0.705659,0.865261,0.854565,0.819459,0.833459
6,0.0029,0.732481,0.867094,0.853754,0.821567,0.833631
7,0.0011,0.736279,0.868011,0.860679,0.830092,0.843088
8,0.0003,0.760272,0.871677,0.862762,0.833704,0.846068
9,0.0001,0.783157,0.870761,0.861761,0.833037,0.845214
10,0.0001,0.798727,0.870761,0.861818,0.83297,0.845234


[I 2025-03-22 01:02:33,422] Trial 95 finished with value: 0.8444843072682073 and parameters: {'learning_rate': 0.004703115676231689, 'weight_decay': 0.003, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 96 with params: {'learning_rate': 0.004135670313382532, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0514,0.679616,0.756187,0.665876,0.640178,0.641943
2,0.4055,0.482112,0.850596,0.711752,0.726735,0.717558
3,0.1392,0.52631,0.857012,0.816262,0.809391,0.811345
4,0.0559,0.584408,0.866178,0.864055,0.807606,0.828098
5,0.0162,0.661359,0.873511,0.845084,0.816832,0.827708


[I 2025-03-22 01:02:57,146] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0015687027204991748, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1643,0.765618,0.724106,0.650465,0.617733,0.618161
2,0.5218,0.575513,0.789184,0.667552,0.677937,0.666162
3,0.2953,0.528125,0.815765,0.751984,0.703233,0.706972
4,0.1698,0.54816,0.846929,0.847713,0.786226,0.805026
5,0.0795,0.602921,0.860678,0.85204,0.795295,0.813611
6,0.0485,0.576047,0.861595,0.816102,0.82352,0.819227
7,0.0199,0.690628,0.860678,0.839038,0.821897,0.82913
8,0.0099,0.673345,0.868011,0.862818,0.83647,0.848179
9,0.0031,0.7427,0.862511,0.865892,0.833768,0.84748
10,0.0011,0.770901,0.862511,0.86736,0.832766,0.847686


[I 2025-03-22 01:04:27,874] Trial 97 finished with value: 0.8443191718367592 and parameters: {'learning_rate': 0.0015687027204991748, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 98 with params: {'learning_rate': 0.004710514480809791, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0051,0.624136,0.794684,0.688396,0.673842,0.674871
2,0.3576,0.448855,0.863428,0.831709,0.80049,0.811713
3,0.1158,0.56497,0.857929,0.8769,0.799576,0.826203
4,0.0431,0.700003,0.866178,0.867495,0.818019,0.837008
5,0.0164,0.677502,0.867094,0.848924,0.830995,0.838463
6,0.0082,0.758005,0.864345,0.819488,0.818527,0.818849
7,0.0068,0.705902,0.871677,0.85816,0.824167,0.837976
8,0.0023,0.789358,0.863428,0.853319,0.818122,0.832438
9,0.0003,0.817946,0.864345,0.835114,0.82695,0.830782
10,0.0001,0.839292,0.863428,0.840307,0.826,0.832621


[I 2025-03-22 01:05:45,476] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0022463988696438566, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0957,0.743942,0.741522,0.69046,0.6248,0.637374
2,0.4786,0.509865,0.822181,0.700523,0.699797,0.698255
3,0.2264,0.476811,0.857012,0.842514,0.791532,0.81006
4,0.111,0.611427,0.847846,0.850289,0.795791,0.813947
5,0.0477,0.604638,0.858845,0.841003,0.801575,0.817411


[I 2025-03-22 01:06:09,266] Trial 99 pruned. 


Trial 100 with params: {'learning_rate': 0.003060948053221695, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0439,0.666351,0.779102,0.676254,0.66347,0.662228
2,0.4179,0.47956,0.843263,0.884547,0.724644,0.733784
3,0.17,0.555063,0.852429,0.834168,0.793559,0.809248
4,0.082,0.613907,0.848763,0.873753,0.789045,0.818005
5,0.0312,0.617779,0.869844,0.87496,0.841768,0.854725
6,0.01,0.718234,0.867094,0.84965,0.829099,0.837981
7,0.0021,0.73006,0.871677,0.833055,0.842555,0.837066
8,0.0009,0.806109,0.864345,0.865549,0.825547,0.84185
9,0.0004,0.821253,0.868011,0.847936,0.829282,0.837492
10,0.0002,0.835139,0.869844,0.84918,0.830642,0.838774


[I 2025-03-22 01:07:08,116] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0019954839033846108, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.131,0.741899,0.743355,0.654689,0.633736,0.632918
2,0.4679,0.549097,0.811182,0.689023,0.695282,0.685788
3,0.2477,0.518157,0.834097,0.800219,0.776335,0.782546
4,0.1277,0.555132,0.849679,0.838601,0.807653,0.81836
5,0.0616,0.55396,0.863428,0.816293,0.834405,0.824002
6,0.026,0.663394,0.863428,0.873882,0.79626,0.820547
7,0.0117,0.668875,0.862511,0.841179,0.814413,0.825766
8,0.0033,0.718193,0.868011,0.867064,0.818035,0.837114
9,0.0009,0.726528,0.871677,0.860709,0.831047,0.843733
10,0.0004,0.758483,0.873511,0.871898,0.832464,0.848575


[I 2025-03-22 01:08:22,791] Trial 101 finished with value: 0.8435153047622047 and parameters: {'learning_rate': 0.0019954839033846108, 'weight_decay': 0.001, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 102 with params: {'learning_rate': 0.0017054296866496192, 'weight_decay': 0.004, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1623,0.772037,0.72044,0.642853,0.615262,0.613902
2,0.5034,0.55531,0.802933,0.675022,0.687915,0.678396
3,0.272,0.551236,0.828598,0.805721,0.729745,0.745922
4,0.1529,0.587896,0.837764,0.847376,0.797433,0.813996
5,0.0701,0.54937,0.869844,0.854505,0.820879,0.834421
6,0.0426,0.633034,0.858845,0.848284,0.802084,0.819877
7,0.0098,0.768844,0.860678,0.858776,0.82484,0.836914
8,0.0051,0.745803,0.862511,0.855893,0.832975,0.842821
9,0.004,0.742085,0.864345,0.864912,0.825299,0.841382
10,0.0013,0.768175,0.864345,0.863845,0.825601,0.841123


[I 2025-03-22 01:09:49,743] Trial 102 finished with value: 0.8411834007361899 and parameters: {'learning_rate': 0.0017054296866496192, 'weight_decay': 0.004, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 103 with params: {'learning_rate': 0.002050832647565712, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1093,0.694244,0.765353,0.672602,0.651394,0.654075
2,0.4638,0.53801,0.809349,0.683058,0.692586,0.685872
3,0.2363,0.471662,0.846929,0.798789,0.765731,0.776572
4,0.1225,0.561581,0.860678,0.869301,0.796988,0.81851
5,0.0536,0.584079,0.858845,0.84535,0.802052,0.818658
6,0.0204,0.707871,0.858845,0.847654,0.812171,0.826175
7,0.0088,0.715052,0.859762,0.841039,0.822475,0.830718
8,0.005,0.714167,0.868011,0.868287,0.828241,0.844584
9,0.0026,0.826975,0.860678,0.876367,0.792458,0.819122
10,0.0017,0.7523,0.863428,0.835631,0.826356,0.830342


[I 2025-03-22 01:10:39,618] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.0029603730697884438, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0576,0.648725,0.772686,0.679863,0.656642,0.660762
2,0.4307,0.473415,0.848763,0.878951,0.732401,0.735567
3,0.1729,0.507297,0.864345,0.864813,0.805119,0.826791
4,0.0733,0.613341,0.857012,0.85972,0.811187,0.828882
5,0.0259,0.651859,0.868011,0.879372,0.819199,0.841025
6,0.0084,0.709083,0.859762,0.867014,0.815244,0.831553
7,0.0117,0.730194,0.849679,0.798397,0.816661,0.801956
8,0.0109,0.702484,0.869844,0.827849,0.830047,0.828364
9,0.0023,0.76876,0.872594,0.859834,0.832205,0.843784
10,0.0007,0.788959,0.870761,0.858909,0.830409,0.842236


[I 2025-03-22 01:11:54,292] Trial 104 finished with value: 0.8504518299133793 and parameters: {'learning_rate': 0.0029603730697884438, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 105 with params: {'learning_rate': 0.003359333912514363, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0506,0.712021,0.745188,0.669364,0.629572,0.637386
2,0.4265,0.462347,0.846013,0.70997,0.721922,0.715695
3,0.1549,0.518135,0.852429,0.849537,0.798384,0.816409
4,0.0662,0.60618,0.856095,0.833013,0.801616,0.812183
5,0.0228,0.667632,0.861595,0.849972,0.804405,0.821828
6,0.0071,0.745676,0.867094,0.842866,0.801016,0.815595
7,0.0017,0.76045,0.872594,0.842258,0.841368,0.841573
8,0.0008,0.855019,0.861595,0.831677,0.803374,0.815499
9,0.0005,0.831599,0.865261,0.838619,0.825561,0.831579
10,0.0002,0.851578,0.863428,0.837437,0.824201,0.830307


[I 2025-03-22 01:13:21,052] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0024358020003067884, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0908,0.721884,0.754354,0.684604,0.640679,0.647623
2,0.4472,0.501691,0.827681,0.698608,0.705407,0.701211
3,0.2062,0.526737,0.852429,0.846084,0.797526,0.81522
4,0.0959,0.703231,0.829514,0.839949,0.780402,0.798607
5,0.0347,0.660148,0.857012,0.814318,0.81718,0.814472


[I 2025-03-22 01:13:47,727] Trial 106 pruned. 


Trial 107 with params: {'learning_rate': 0.004515968649412248, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0582,0.67391,0.773602,0.679145,0.656303,0.659236
2,0.3998,0.486886,0.847846,0.876254,0.750871,0.764329
3,0.1338,0.590787,0.84418,0.856092,0.790041,0.813583
4,0.0511,0.60661,0.860678,0.873928,0.804881,0.827636
5,0.0153,0.718151,0.861595,0.842314,0.832942,0.83714
6,0.0039,0.733678,0.866178,0.876413,0.831133,0.846736
7,0.0032,0.74574,0.867094,0.847698,0.830838,0.837394
8,0.0003,0.786251,0.871677,0.857661,0.824875,0.837854
9,0.0002,0.811017,0.872594,0.859449,0.825344,0.83901
10,0.0001,0.824058,0.872594,0.860144,0.825344,0.839326


[I 2025-03-22 01:14:59,172] Trial 107 finished with value: 0.8401771539381042 and parameters: {'learning_rate': 0.004515968649412248, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 108 with params: {'learning_rate': 0.002279996971277142, 'weight_decay': 0.002, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0815,0.687796,0.766269,0.683602,0.652514,0.655555
2,0.4591,0.485847,0.840513,0.711134,0.714903,0.71189
3,0.2214,0.513034,0.856095,0.83706,0.781341,0.798794
4,0.0987,0.612548,0.846929,0.850486,0.794596,0.813315
5,0.0466,0.595398,0.868011,0.851913,0.827045,0.837922
6,0.0139,0.659194,0.873511,0.858152,0.824716,0.838225
7,0.0039,0.726197,0.870761,0.819115,0.839908,0.827897
8,0.0026,0.761441,0.867094,0.852378,0.837498,0.843419
9,0.0011,0.780257,0.868928,0.858302,0.829898,0.842034
10,0.0004,0.802387,0.867094,0.857518,0.827854,0.840572


[I 2025-03-22 01:16:14,511] Trial 108 finished with value: 0.8399664174580606 and parameters: {'learning_rate': 0.002279996971277142, 'weight_decay': 0.002, 'warmup_steps': 1}. Best is trial 66 with value: 0.861541151855867.


Trial 109 with params: {'learning_rate': 0.004954557837064162, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0584,0.679036,0.767186,0.661992,0.653162,0.650789
2,0.379,0.482629,0.849679,0.844836,0.762248,0.776043
3,0.1266,0.599358,0.846929,0.862262,0.781674,0.808133
4,0.0417,0.689385,0.864345,0.880813,0.823931,0.845637
5,0.0125,0.769014,0.857929,0.839487,0.829534,0.831925
6,0.0039,0.831947,0.860678,0.871777,0.806763,0.826865
7,0.0034,0.778506,0.871677,0.865578,0.841248,0.852045
8,0.0003,0.797104,0.872594,0.872923,0.833471,0.849446
9,0.0001,0.81661,0.870761,0.868802,0.822951,0.840511
10,0.0001,0.830867,0.869844,0.867751,0.822236,0.839659


[I 2025-03-22 01:17:03,672] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0007407215661032523, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3619,0.947761,0.651696,0.569854,0.553528,0.548181
2,0.7189,0.620171,0.777269,0.673091,0.661473,0.663014
3,0.4348,0.548128,0.815765,0.698325,0.694935,0.693414
4,0.3216,0.54841,0.818515,0.70219,0.696612,0.695934
5,0.2423,0.496237,0.840513,0.838614,0.751892,0.769137


[I 2025-03-22 01:17:27,883] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.004817592961405681, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0079,0.609924,0.791017,0.682225,0.671013,0.670436
2,0.3533,0.445735,0.861595,0.826998,0.798315,0.808377
3,0.1138,0.540902,0.847846,0.81873,0.792587,0.803144
4,0.0457,0.620214,0.862511,0.866509,0.814609,0.833762
5,0.0147,0.632668,0.868928,0.822077,0.821118,0.821534
6,0.0013,0.740904,0.868011,0.876963,0.820954,0.840521
7,0.0003,0.767636,0.870761,0.866204,0.822843,0.839154
8,0.0001,0.777349,0.874427,0.869212,0.826598,0.842569
9,0.0001,0.791764,0.873511,0.868495,0.8256,0.841728
10,0.0001,0.802372,0.873511,0.881872,0.825649,0.845614


[I 2025-03-22 01:18:56,960] Trial 111 finished with value: 0.8435097550817777 and parameters: {'learning_rate': 0.004817592961405681, 'weight_decay': 0.001, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 112 with params: {'learning_rate': 0.0028663928341780593, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0757,0.693527,0.762603,0.665649,0.647935,0.648685
2,0.4331,0.511509,0.829514,0.701797,0.708405,0.703334
3,0.1915,0.496641,0.856095,0.808711,0.800305,0.80336
4,0.077,0.62195,0.840513,0.846475,0.780525,0.800419
5,0.0276,0.711188,0.863428,0.879886,0.804478,0.830176
6,0.0154,0.641689,0.862511,0.857796,0.807216,0.82537
7,0.0051,0.756153,0.860678,0.849105,0.8131,0.827811
8,0.0008,0.828032,0.860678,0.849754,0.813983,0.827921
9,0.0003,0.837854,0.861595,0.848612,0.814768,0.828267
10,0.0002,0.85046,0.861595,0.848554,0.814719,0.828276


[I 2025-03-22 01:20:15,463] Trial 112 pruned. 


Trial 113 with params: {'learning_rate': 0.0024133507238817066, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0911,0.717326,0.753437,0.66678,0.634767,0.642584
2,0.4652,0.478498,0.837764,0.70791,0.713583,0.71004
3,0.2136,0.514643,0.846929,0.819454,0.776241,0.790331
4,0.1035,0.573072,0.854262,0.829413,0.799999,0.811069
5,0.0344,0.603867,0.873511,0.87397,0.833292,0.849301
6,0.0188,0.710276,0.861595,0.87048,0.808165,0.827398
7,0.0083,0.727489,0.868928,0.854592,0.837659,0.845303
8,0.0022,0.769847,0.868928,0.847796,0.837691,0.842232
9,0.0008,0.819214,0.867094,0.866991,0.827748,0.843799
10,0.0003,0.827866,0.868928,0.866821,0.829879,0.84481


[I 2025-03-22 01:22:06,753] Trial 113 finished with value: 0.8395967262661043 and parameters: {'learning_rate': 0.0024133507238817066, 'weight_decay': 0.003, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 114 with params: {'learning_rate': 0.0023450534311540857, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0923,0.72848,0.745188,0.692375,0.626818,0.641169
2,0.4747,0.495738,0.831347,0.702924,0.70802,0.704514
3,0.2166,0.478049,0.858845,0.818811,0.792277,0.803561
4,0.0994,0.61688,0.851512,0.87,0.79815,0.820906
5,0.0432,0.654044,0.849679,0.8713,0.793901,0.820356


[I 2025-03-22 01:22:31,162] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0023844132798106865, 'weight_decay': 0.001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.096,0.733159,0.758937,0.680579,0.639223,0.64803
2,0.4723,0.48083,0.839597,0.70958,0.71477,0.711556
3,0.2259,0.489238,0.854262,0.819785,0.770474,0.786143
4,0.1031,0.527717,0.862511,0.837859,0.833607,0.834676
5,0.0354,0.566063,0.868928,0.838262,0.821196,0.828706
6,0.0155,0.725581,0.862511,0.844561,0.815135,0.826114
7,0.0064,0.764833,0.858845,0.835506,0.821194,0.82776
8,0.0045,0.73516,0.869844,0.869669,0.831034,0.846713
9,0.001,0.783659,0.863428,0.855427,0.833663,0.843236
10,0.0004,0.809041,0.865261,0.85826,0.83548,0.845722


[I 2025-03-22 01:24:13,385] Trial 115 finished with value: 0.844711209671145 and parameters: {'learning_rate': 0.0023844132798106865, 'weight_decay': 0.001, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 116 with params: {'learning_rate': 0.00481158629724762, 'weight_decay': 0.001, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0076,0.605568,0.784601,0.661448,0.672654,0.663372
2,0.3526,0.454744,0.863428,0.823308,0.83382,0.827529
3,0.1032,0.604138,0.84418,0.860893,0.797557,0.820963
4,0.0399,0.648315,0.868928,0.88203,0.811009,0.835035
5,0.0116,0.651373,0.860678,0.80844,0.833364,0.818404


[I 2025-03-22 01:24:46,345] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.00489802302511677, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0565,0.637681,0.775435,0.663132,0.660601,0.657268
2,0.377,0.469142,0.858845,0.885739,0.777716,0.798602
3,0.119,0.598823,0.846929,0.850579,0.791092,0.812458
4,0.042,0.678131,0.860678,0.860057,0.792419,0.815274
5,0.0125,0.684126,0.865261,0.858601,0.829016,0.839922
6,0.0045,0.778791,0.873511,0.880771,0.817034,0.837769
7,0.0049,0.683497,0.873511,0.862605,0.834481,0.846127
8,0.0012,0.738578,0.873511,0.864246,0.834181,0.846817
9,0.0003,0.762351,0.873511,0.8724,0.824611,0.842967
10,0.0001,0.776991,0.87626,0.874314,0.827605,0.845426


[I 2025-03-22 01:26:34,691] Trial 117 finished with value: 0.8431139764977885 and parameters: {'learning_rate': 0.00489802302511677, 'weight_decay': 0.0, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 118 with params: {'learning_rate': 0.0005058685099924152, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4633,1.156272,0.56462,0.578143,0.458274,0.466873
2,0.8775,0.722471,0.744271,0.637656,0.636116,0.634736
3,0.5433,0.602601,0.786434,0.663218,0.672517,0.664698
4,0.4061,0.545079,0.813016,0.69189,0.692633,0.690558
5,0.3285,0.521036,0.825848,0.691969,0.705167,0.698252
6,0.2543,0.5299,0.829514,0.80384,0.726916,0.73245
7,0.1846,0.5276,0.83593,0.843324,0.767554,0.786967
8,0.1485,0.525859,0.845096,0.858065,0.80228,0.821464
9,0.1044,0.607388,0.826764,0.845636,0.785316,0.805948
10,0.0711,0.620471,0.832264,0.851874,0.797243,0.818611


[I 2025-03-22 01:27:47,146] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.002062049606106389, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1073,0.689222,0.766269,0.672209,0.650911,0.65379
2,0.4577,0.522864,0.811182,0.683281,0.69384,0.686824
3,0.2339,0.469348,0.850596,0.80257,0.768517,0.780101
4,0.12,0.539295,0.861595,0.854855,0.798003,0.816033
5,0.05,0.584705,0.860678,0.836051,0.803059,0.816179


[I 2025-03-22 01:28:22,891] Trial 119 pruned. 


Trial 120 with params: {'learning_rate': 0.0036855114399215855, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0332,0.639414,0.769019,0.665138,0.654148,0.654171
2,0.4142,0.475038,0.839597,0.776904,0.808198,0.788797
3,0.143,0.506294,0.857929,0.84087,0.800133,0.816436
4,0.0523,0.633238,0.852429,0.875293,0.79517,0.82187
5,0.0254,0.712123,0.854262,0.848539,0.805101,0.822313
6,0.0115,0.672972,0.864345,0.840222,0.827201,0.832594
7,0.002,0.736169,0.874427,0.884305,0.825963,0.847026
8,0.0005,0.787551,0.866178,0.866793,0.828967,0.844079
9,0.0002,0.804666,0.863428,0.864492,0.826667,0.841816
10,0.0002,0.81691,0.863428,0.864431,0.826619,0.841847


[I 2025-03-22 01:29:44,003] Trial 120 finished with value: 0.8433483620079953 and parameters: {'learning_rate': 0.0036855114399215855, 'weight_decay': 0.001, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 121 with params: {'learning_rate': 0.004812222946087088, 'weight_decay': 0.002, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.997,0.628042,0.791017,0.681557,0.672843,0.671108
2,0.3613,0.428771,0.863428,0.811885,0.835557,0.821741
3,0.1118,0.517225,0.863428,0.854576,0.804822,0.823843
4,0.041,0.522191,0.880843,0.886258,0.833396,0.851228
5,0.0118,0.666476,0.868011,0.841675,0.826004,0.832881
6,0.0036,0.815082,0.871677,0.871343,0.812981,0.834227
7,0.0005,0.825272,0.872594,0.846088,0.834025,0.83953
8,0.0005,0.85086,0.868011,0.836082,0.830263,0.832955
9,0.0011,0.814683,0.869844,0.819506,0.832547,0.824852
10,0.0003,0.827293,0.873511,0.868473,0.816501,0.835213


[I 2025-03-22 01:31:33,499] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.004782032426806539, 'weight_decay': 0.003, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0071,0.557694,0.811182,0.686812,0.692049,0.688404
2,0.3504,0.450696,0.858845,0.819076,0.812472,0.815542
3,0.1084,0.524009,0.857012,0.851685,0.810708,0.826822
4,0.0375,0.556463,0.871677,0.884055,0.811538,0.836604
5,0.0105,0.747047,0.868928,0.871562,0.818456,0.839093
6,0.013,0.802643,0.854262,0.85873,0.819633,0.833642
7,0.0044,0.829832,0.859762,0.848664,0.813667,0.827751
8,0.0007,0.873094,0.857012,0.845202,0.81101,0.824556
9,0.0002,0.883421,0.856095,0.845136,0.810321,0.824415
10,0.0001,0.898903,0.855179,0.844604,0.809323,0.82364


[I 2025-03-22 01:32:28,510] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.00472357798721968, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9966,0.646836,0.773602,0.68409,0.659658,0.659845
2,0.3606,0.432772,0.861595,0.827961,0.832497,0.829125
3,0.112,0.526554,0.856095,0.865141,0.798694,0.822788
4,0.0375,0.583037,0.868011,0.877845,0.821753,0.840859
5,0.0252,0.657235,0.872594,0.874803,0.832887,0.849625
6,0.0062,0.823179,0.865261,0.867853,0.808307,0.829543
7,0.004,0.809442,0.864345,0.863447,0.818645,0.835748
8,0.0012,0.830119,0.866178,0.866639,0.81845,0.837153
9,0.0002,0.844793,0.868928,0.867711,0.821372,0.839289
10,0.0001,0.858589,0.869844,0.868127,0.822066,0.839868


[I 2025-03-22 01:34:14,388] Trial 123 finished with value: 0.8386052437766613 and parameters: {'learning_rate': 0.00472357798721968, 'weight_decay': 0.001, 'warmup_steps': 4}. Best is trial 66 with value: 0.861541151855867.


Trial 124 with params: {'learning_rate': 0.0035044571017368705, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.018,0.701334,0.762603,0.664516,0.64997,0.647341
2,0.41,0.458414,0.853346,0.828295,0.809528,0.816881
3,0.1579,0.507172,0.851512,0.848493,0.79329,0.814146
4,0.0541,0.564228,0.861595,0.859353,0.806668,0.825542
5,0.0156,0.761171,0.871677,0.849331,0.82328,0.83431
6,0.0074,0.819328,0.860678,0.852254,0.808206,0.822018
7,0.0043,0.887388,0.858845,0.834582,0.808696,0.819778
8,0.003,0.822099,0.870761,0.855938,0.822813,0.836024
9,0.0004,0.86388,0.872594,0.857353,0.824146,0.837429
10,0.0002,0.885902,0.872594,0.853667,0.814713,0.829528


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--f1/34c46321f42186df33a6260966e34a368f14868d9cc2ba47d142112e2800d233 (last modified on Fri Jan 10 23:14:01 2025) since it couldn't be found locally at evaluate-metric--f1, or remotely on the Hugging Face Hub.
[I 2025-03-22 01:35:35,298] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.004632834951112755, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9991,0.666008,0.781852,0.68006,0.661364,0.664139
2,0.3664,0.471175,0.855179,0.803626,0.803654,0.801724
3,0.1302,0.637687,0.84143,0.861919,0.778111,0.804777
4,0.046,0.617715,0.871677,0.886677,0.83238,0.852827
5,0.0154,0.777554,0.856095,0.865046,0.818853,0.837309
6,0.0068,0.738438,0.87626,0.872619,0.837042,0.851257
7,0.0009,0.815176,0.87901,0.863777,0.839032,0.848904
8,0.0005,0.833313,0.878093,0.862665,0.838317,0.847968
9,0.0002,0.8535,0.87626,0.861292,0.836701,0.84644
10,0.0001,0.866628,0.878093,0.863058,0.838083,0.848119


[I 2025-03-22 01:36:48,151] Trial 125 finished with value: 0.8468090850234677 and parameters: {'learning_rate': 0.004632834951112755, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 126 with params: {'learning_rate': 0.004901441604756807, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0045,0.624012,0.791934,0.686739,0.672055,0.673852
2,0.361,0.467852,0.859762,0.867767,0.785093,0.808165
3,0.1081,0.528211,0.857012,0.819745,0.799924,0.808328
4,0.0406,0.653092,0.865261,0.84794,0.817168,0.829826
5,0.0176,0.712612,0.859762,0.810856,0.82138,0.814989
6,0.006,0.789283,0.868011,0.867528,0.821195,0.837785
7,0.0031,0.809215,0.865261,0.85806,0.828419,0.840615
8,0.0035,0.797443,0.87901,0.874197,0.829815,0.846796
9,0.0007,0.80543,0.880843,0.867874,0.84055,0.852133
10,0.0002,0.829404,0.87901,0.866503,0.839203,0.85079


[I 2025-03-22 01:38:06,894] Trial 126 finished with value: 0.8509646161138728 and parameters: {'learning_rate': 0.004901441604756807, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 127 with params: {'learning_rate': 0.0021295141332994877, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0939,0.691959,0.762603,0.674326,0.644212,0.651613
2,0.4726,0.494645,0.830431,0.702388,0.707547,0.70425
3,0.2379,0.500163,0.846929,0.837679,0.767024,0.785513
4,0.1211,0.647584,0.840513,0.845317,0.790192,0.808189
5,0.0539,0.599628,0.864345,0.841282,0.825071,0.831967
6,0.0313,0.694865,0.863428,0.869003,0.778826,0.801245
7,0.0106,0.731582,0.860678,0.821161,0.83197,0.825377
8,0.0041,0.699873,0.868928,0.860396,0.839387,0.8487
9,0.0015,0.744928,0.866178,0.857647,0.837057,0.845813
10,0.0006,0.781033,0.870761,0.862891,0.839238,0.849843


[I 2025-03-22 01:39:26,097] Trial 127 finished with value: 0.8488073782627499 and parameters: {'learning_rate': 0.0021295141332994877, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 128 with params: {'learning_rate': 0.0017414871103465861, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1342,0.717952,0.747021,0.648633,0.636581,0.635896
2,0.4916,0.567235,0.789184,0.660829,0.679123,0.665611
3,0.2632,0.490179,0.843263,0.810112,0.75243,0.768174
4,0.1447,0.508534,0.859762,0.846115,0.823273,0.831782
5,0.0612,0.587763,0.868928,0.861875,0.827809,0.842088
6,0.0306,0.62085,0.866178,0.882202,0.834151,0.853553
7,0.0119,0.697739,0.854262,0.868218,0.807622,0.829239
8,0.0084,0.652564,0.866178,0.865899,0.836968,0.849002
9,0.0024,0.712464,0.865261,0.859378,0.835053,0.845877
10,0.0008,0.743974,0.864345,0.857645,0.833547,0.844364


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--recall/11f90e583db35601050aed380d48e83202a896976b9608432fba9244fb447f24 (last modified on Fri Jan 10 23:14:00 2025) since it couldn't be found locally at evaluate-metric--recall, or remotely on the Hugging Face Hub.
[I 2025-03-22 01:42:24,146] Trial 128 finished with value: 0.8451087307626914 and parameters: {'learning_rate': 0.0017414871103465861, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 129 with params: {'learning_rate': 0.004923602572126758, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0556,0.643786,0.778185,0.667521,0.662803,0.660957
2,0.3802,0.475752,0.856095,0.885301,0.765876,0.784787
3,0.1246,0.549541,0.857012,0.862666,0.791291,0.814891
4,0.0419,0.61332,0.868928,0.899992,0.808229,0.837328
5,0.0083,0.707054,0.857929,0.808818,0.850305,0.823092
6,0.0016,0.799282,0.864345,0.87744,0.810274,0.830863
7,0.0007,0.759317,0.868011,0.831726,0.821353,0.826032
8,0.0002,0.782207,0.870761,0.8406,0.823763,0.831115
9,0.0001,0.79609,0.870761,0.84833,0.82348,0.833938
10,0.0001,0.809121,0.870761,0.848343,0.82348,0.833958


[I 2025-03-22 01:43:19,248] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0031431061993438753, 'weight_decay': 0.004, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0505,0.669475,0.770852,0.682205,0.654086,0.660057
2,0.4255,0.465706,0.843263,0.706681,0.719653,0.713066
3,0.1661,0.55015,0.850596,0.858749,0.794251,0.816775
4,0.0682,0.579331,0.858845,0.8725,0.813239,0.833831
5,0.0211,0.631175,0.865261,0.829838,0.818746,0.823236
6,0.012,0.722517,0.868011,0.850942,0.822933,0.832951
7,0.0037,0.774031,0.865261,0.833574,0.827022,0.829946
8,0.0022,0.860968,0.868928,0.853207,0.811147,0.827232
9,0.0007,0.825444,0.871677,0.85733,0.822802,0.836803
10,0.0012,0.85844,0.869844,0.842626,0.812178,0.824556


[I 2025-03-22 01:44:07,141] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.0009814194172105438, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2787,0.834135,0.702108,0.633507,0.591982,0.603414
2,0.6272,0.57617,0.80385,0.679549,0.685173,0.681645
3,0.3798,0.568496,0.814849,0.70814,0.691719,0.694068
4,0.262,0.55,0.824015,0.788973,0.711106,0.717528
5,0.166,0.524367,0.849679,0.857516,0.786641,0.80794


[I 2025-03-22 01:44:55,515] Trial 131 pruned. 


Trial 132 with params: {'learning_rate': 0.0020548225217741365, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1017,0.699922,0.758937,0.676403,0.640217,0.649365
2,0.4799,0.496051,0.824015,0.69523,0.702473,0.698168
3,0.2468,0.482865,0.855179,0.8154,0.772236,0.785809
4,0.1198,0.654197,0.836847,0.835443,0.786833,0.802785
5,0.0539,0.563559,0.875344,0.864025,0.83476,0.847219
6,0.0251,0.620796,0.87626,0.8856,0.815615,0.839508
7,0.0076,0.764905,0.860678,0.87206,0.832236,0.847814
8,0.002,0.726947,0.864345,0.87592,0.834155,0.851112
9,0.0007,0.747019,0.868928,0.88013,0.83813,0.855224
10,0.0004,0.788631,0.867094,0.879609,0.836134,0.853943


[I 2025-03-22 01:46:21,777] Trial 132 finished with value: 0.8489456882401408 and parameters: {'learning_rate': 0.0020548225217741365, 'weight_decay': 0.001, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 133 with params: {'learning_rate': 0.0014915052280211243, 'weight_decay': 0.0, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1758,0.774728,0.722273,0.648634,0.616318,0.616306
2,0.5316,0.585831,0.787351,0.668214,0.676114,0.663795
3,0.3118,0.551199,0.813932,0.780362,0.710196,0.720036
4,0.1835,0.56074,0.846013,0.848672,0.786555,0.804399
5,0.0914,0.612407,0.858845,0.865919,0.793783,0.815102


[I 2025-03-22 01:46:47,635] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0026867318500565364, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0573,0.647468,0.773602,0.684346,0.65692,0.661443
2,0.4393,0.481809,0.846013,0.713616,0.720338,0.716562
3,0.1945,0.521112,0.856095,0.840752,0.797929,0.813784
4,0.0761,0.755216,0.832264,0.853025,0.78995,0.811784
5,0.0365,0.61825,0.866178,0.872876,0.80892,0.829699
6,0.0127,0.685805,0.870761,0.833273,0.83858,0.835102
7,0.0096,0.784927,0.865261,0.852067,0.816852,0.830762
8,0.003,0.7605,0.862511,0.848656,0.806247,0.822222
9,0.0006,0.799671,0.863428,0.848086,0.806914,0.822384
10,0.0003,0.815469,0.864345,0.848242,0.807912,0.823026


[I 2025-03-22 01:47:44,832] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.0019397477005464374, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1084,0.702174,0.758937,0.683437,0.644111,0.650841
2,0.4816,0.530181,0.824015,0.699523,0.702758,0.698837
3,0.2454,0.481483,0.856095,0.804852,0.755711,0.766805
4,0.1305,0.557692,0.851512,0.853846,0.798421,0.817129
5,0.0543,0.617112,0.867094,0.842447,0.826974,0.834123
6,0.0263,0.62276,0.864345,0.847472,0.835315,0.840733
7,0.0182,0.706481,0.862511,0.853052,0.823868,0.835295
8,0.0072,0.702694,0.872594,0.883408,0.832331,0.851998
9,0.0013,0.734173,0.865261,0.854882,0.827592,0.839151
10,0.0006,0.764527,0.867094,0.866036,0.828688,0.84374


[I 2025-03-22 01:50:18,771] Trial 135 finished with value: 0.8407972976829196 and parameters: {'learning_rate': 0.0019397477005464374, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 136 with params: {'learning_rate': 0.004955842153235152, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0045,0.626022,0.785518,0.677768,0.66695,0.666778
2,0.3544,0.468208,0.851512,0.819589,0.797869,0.806244
3,0.1025,0.60727,0.846013,0.850579,0.790521,0.812174
4,0.05,0.563841,0.864345,0.855261,0.801038,0.817739
5,0.0159,0.685935,0.870761,0.830189,0.821145,0.82531
6,0.003,0.764468,0.872594,0.839438,0.813219,0.82443
7,0.0007,0.787435,0.869844,0.839123,0.821578,0.829358
8,0.0002,0.818775,0.868011,0.830891,0.820228,0.82515
9,0.0003,0.852176,0.865261,0.84003,0.808199,0.821218
10,0.0001,0.861617,0.867094,0.841425,0.809532,0.822586


[I 2025-03-22 01:51:49,805] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.002329015392059114, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0934,0.706907,0.757104,0.688408,0.636321,0.648831
2,0.4831,0.492387,0.829514,0.702986,0.705853,0.703852
3,0.2172,0.478995,0.857012,0.820898,0.800679,0.809505
4,0.1016,0.663857,0.846013,0.868115,0.793053,0.817014
5,0.0411,0.584821,0.868011,0.852543,0.829587,0.839441
6,0.0188,0.655546,0.868011,0.875383,0.81217,0.832776
7,0.0053,0.743168,0.862511,0.843028,0.81434,0.826406
8,0.0034,0.749159,0.866178,0.868648,0.81785,0.837303
9,0.0031,0.801041,0.868011,0.882872,0.818467,0.84218
10,0.0008,0.78517,0.870761,0.858258,0.832072,0.84306


[I 2025-03-22 01:53:02,341] Trial 137 finished with value: 0.8339820527589916 and parameters: {'learning_rate': 0.002329015392059114, 'weight_decay': 0.002, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 138 with params: {'learning_rate': 0.004502342856417004, 'weight_decay': 0.001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0623,0.673269,0.773602,0.6829,0.653924,0.659768
2,0.398,0.48472,0.851512,0.878961,0.744551,0.752398
3,0.1358,0.576556,0.848763,0.841172,0.79489,0.811838
4,0.0514,0.5964,0.858845,0.832211,0.801834,0.814025
5,0.0149,0.614012,0.862511,0.835719,0.824753,0.829763


[I 2025-03-22 01:53:31,804] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0001772405333439467, 'weight_decay': 0.002, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6371,1.508029,0.394134,0.46507,0.293096,0.252128
2,1.3244,1.153273,0.578368,0.539508,0.473566,0.473408
3,1.0033,0.947529,0.651696,0.549452,0.562917,0.547369
4,0.7987,0.781497,0.732356,0.624281,0.622872,0.622577
5,0.6772,0.710945,0.757104,0.648794,0.644193,0.643536
6,0.5925,0.661824,0.774519,0.657261,0.65908,0.657065
7,0.5204,0.647078,0.779102,0.66182,0.664105,0.661917
8,0.4712,0.619438,0.784601,0.663023,0.671624,0.666339
9,0.437,0.613006,0.789184,0.665643,0.674396,0.668634
10,0.397,0.615559,0.791934,0.66302,0.677551,0.669445


[I 2025-03-22 01:54:27,686] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.003407387530064961, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0141,0.717698,0.75527,0.661118,0.641117,0.640322
2,0.419,0.468941,0.845096,0.815668,0.795066,0.80187
3,0.1576,0.520426,0.849679,0.845317,0.79376,0.813119
4,0.061,0.589122,0.856095,0.859577,0.809987,0.827454
5,0.0219,0.649835,0.860678,0.857601,0.8219,0.83698
6,0.0145,0.714161,0.857012,0.834794,0.819558,0.825954
7,0.0044,0.70651,0.867094,0.836572,0.830013,0.832243
8,0.0018,0.729646,0.868928,0.841784,0.830473,0.835263
9,0.0003,0.766161,0.867094,0.839564,0.82886,0.833592
10,0.0002,0.792965,0.866178,0.838236,0.828109,0.832572


[I 2025-03-22 01:55:17,065] Trial 140 pruned. 


Trial 141 with params: {'learning_rate': 0.0018365599054044415, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.134,0.766117,0.740605,0.663785,0.630511,0.632583
2,0.4852,0.551458,0.809349,0.68031,0.694036,0.685211
3,0.258,0.565073,0.820348,0.786589,0.734422,0.748923
4,0.1366,0.514205,0.857012,0.849044,0.82203,0.832231
5,0.0604,0.553708,0.858845,0.817487,0.821857,0.819482
6,0.0306,0.700154,0.861595,0.864911,0.82182,0.838832
7,0.0126,0.738689,0.868011,0.879994,0.819907,0.841469
8,0.0059,0.728484,0.864345,0.868564,0.833942,0.848557
9,0.0027,0.715129,0.865261,0.858589,0.835746,0.845955
10,0.0008,0.74883,0.863428,0.857777,0.833517,0.844348


[I 2025-03-22 01:56:36,727] Trial 141 finished with value: 0.8464504910012307 and parameters: {'learning_rate': 0.0018365599054044415, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 142 with params: {'learning_rate': 0.002851502219594544, 'weight_decay': 0.002, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.054,0.640211,0.780018,0.68479,0.663233,0.666425
2,0.4283,0.477487,0.850596,0.880522,0.734181,0.73687
3,0.1825,0.519127,0.856095,0.834585,0.808167,0.818295
4,0.0797,0.586906,0.849679,0.843524,0.794345,0.812394
5,0.0245,0.644361,0.862511,0.828754,0.813369,0.820069


[I 2025-03-22 01:57:13,203] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0007529221588486578, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3813,0.955954,0.64253,0.567107,0.542501,0.542348
2,0.7234,0.619792,0.780018,0.673195,0.665301,0.664759
3,0.4369,0.558941,0.808433,0.693421,0.689081,0.68742
4,0.3198,0.548508,0.813016,0.697796,0.691931,0.691267
5,0.2391,0.495575,0.849679,0.858075,0.786826,0.80814
6,0.1627,0.545875,0.846013,0.839694,0.785416,0.802697
7,0.1096,0.540269,0.852429,0.863579,0.796934,0.819296
8,0.0657,0.580872,0.848763,0.859694,0.813431,0.830016
9,0.0397,0.677098,0.847846,0.862127,0.801515,0.823513
10,0.0243,0.737401,0.84418,0.857681,0.810805,0.827287


[I 2025-03-22 01:58:16,404] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.0021310661838051382, 'weight_decay': 0.0, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0909,0.711535,0.75527,0.67344,0.644645,0.645469
2,0.4681,0.497609,0.831347,0.703961,0.708255,0.704496
3,0.2349,0.531558,0.845096,0.829179,0.773647,0.790336
4,0.1109,0.59999,0.851512,0.85097,0.807564,0.823208
5,0.049,0.589412,0.868928,0.857028,0.819675,0.834758
6,0.0227,0.777168,0.853346,0.864257,0.791932,0.812128
7,0.0141,0.744379,0.867094,0.855116,0.818306,0.832922
8,0.005,0.700958,0.872594,0.869992,0.82292,0.841093
9,0.0014,0.727367,0.867094,0.845859,0.819017,0.830485
10,0.0005,0.760114,0.864345,0.852826,0.81661,0.831463


[I 2025-03-22 01:59:06,093] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.004907074505611369, 'weight_decay': 0.0, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.005,0.633843,0.784601,0.680405,0.664998,0.667283
2,0.3642,0.482638,0.857929,0.84062,0.790054,0.808334
3,0.1097,0.523793,0.861595,0.827993,0.804095,0.814374
4,0.0451,0.584028,0.860678,0.858066,0.804928,0.823795
5,0.0099,0.663373,0.877177,0.847268,0.828333,0.836712
6,0.0013,0.812418,0.867094,0.833133,0.830399,0.831346
7,0.0003,0.834441,0.867094,0.827729,0.830057,0.828691
8,0.0001,0.848836,0.866178,0.827014,0.829342,0.828
9,0.0001,0.859471,0.867094,0.833346,0.829946,0.831478
10,0.0001,0.865374,0.866178,0.827088,0.82928,0.828094


[I 2025-03-22 02:00:00,903] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.004283355770338839, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0574,0.682323,0.76352,0.668355,0.648756,0.649446
2,0.4016,0.461503,0.851512,0.878006,0.735639,0.736698
3,0.1347,0.518084,0.854262,0.854767,0.793296,0.812941
4,0.0513,0.563614,0.871677,0.880805,0.813189,0.835797
5,0.0102,0.71848,0.868011,0.851257,0.828924,0.838782
6,0.002,0.858747,0.859762,0.870743,0.803424,0.826028
7,0.0019,0.790456,0.861595,0.860056,0.814186,0.831711
8,0.0023,0.808516,0.873511,0.839442,0.853431,0.845492
9,0.0013,0.870443,0.862511,0.876111,0.813324,0.836446
10,0.0005,0.844263,0.862511,0.876652,0.824109,0.844224


[I 2025-03-22 02:01:18,361] Trial 146 finished with value: 0.8402875714638186 and parameters: {'learning_rate': 0.004283355770338839, 'weight_decay': 0.01, 'warmup_steps': 2}. Best is trial 66 with value: 0.861541151855867.


Trial 147 with params: {'learning_rate': 0.004813025077145098, 'weight_decay': 0.002, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.009,0.615624,0.791934,0.683768,0.671833,0.671867
2,0.3526,0.450266,0.864345,0.831548,0.800169,0.811541
3,0.1069,0.561706,0.847846,0.844693,0.791533,0.811839
4,0.0416,0.729827,0.845096,0.867034,0.791194,0.815902
5,0.0128,0.69849,0.87626,0.885834,0.827543,0.848634
6,0.006,0.767062,0.867094,0.847458,0.813737,0.825177
7,0.002,0.753558,0.873511,0.876232,0.83368,0.851034
8,0.0019,0.75403,0.882676,0.869631,0.842307,0.85383
9,0.0012,0.841455,0.875344,0.864077,0.836341,0.847953
10,0.0008,0.876791,0.868928,0.841959,0.831784,0.836161


[I 2025-03-22 02:02:40,670] Trial 147 finished with value: 0.8433669109485122 and parameters: {'learning_rate': 0.004813025077145098, 'weight_decay': 0.002, 'warmup_steps': 3}. Best is trial 66 with value: 0.861541151855867.


Trial 148 with params: {'learning_rate': 0.00015256559094465783, 'weight_decay': 0.003, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6636,1.55505,0.372136,0.347339,0.274089,0.221895
2,1.401,1.240159,0.538955,0.533411,0.431206,0.417814
3,1.0949,1.010397,0.632447,0.538683,0.542177,0.527703
4,0.8792,0.847275,0.703941,0.595488,0.600024,0.59506
5,0.7427,0.753364,0.742438,0.635664,0.632024,0.630945
6,0.6488,0.697164,0.75527,0.643076,0.642462,0.641522
7,0.5735,0.676837,0.769019,0.65732,0.654,0.653377
8,0.5209,0.645463,0.779102,0.657941,0.666931,0.661627
9,0.4871,0.635496,0.784601,0.662483,0.670332,0.664511
10,0.4453,0.630282,0.776352,0.649701,0.664977,0.656785


[I 2025-03-22 02:03:31,488] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.0020702773186767874, 'weight_decay': 0.003, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0996,0.695343,0.765353,0.679235,0.646223,0.654243
2,0.4779,0.497923,0.828598,0.70132,0.705477,0.702598
3,0.2526,0.488373,0.850596,0.835506,0.787299,0.802977
4,0.1165,0.649239,0.835014,0.832742,0.785917,0.801405
5,0.0564,0.609378,0.861595,0.835926,0.823697,0.828508


[I 2025-03-22 02:03:54,486] Trial 149 pruned. 


In [25]:
print(best_trial_normal)

BestRun(run_id='66', objective=0.861541151855867, hyperparameters={'learning_rate': 0.004814585044790494, 'weight_decay': 0.001, 'warmup_steps': 4}, run_summary=None)
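
The printed `BestRun` carries the winning hyperparameters as a plain dictionary, so they can be copied onto a training configuration before the final run. The sketch below shows one way to do that. `BestRunStandIn`, `ArgsStandIn` and `apply_best_run` are hypothetical names introduced only for this illustration; they imitate the fields visible in the output above and are not part of `transformers` or of the `base` module.

```python
from dataclasses import dataclass


@dataclass
class BestRunStandIn:
    """Hypothetical stand-in with the fields shown in the printed BestRun."""
    run_id: str
    objective: float
    hyperparameters: dict


@dataclass
class ArgsStandIn:
    """Hypothetical stand-in for a training-arguments object."""
    learning_rate: float = 5e-5
    weight_decay: float = 0.0
    warmup_steps: int = 0


def apply_best_run(args, best_run):
    """Copy every searched hyperparameter onto the arguments object."""
    for name, value in best_run.hyperparameters.items():
        if not hasattr(args, name):
            # Fail loudly instead of silently creating an unused attribute.
            raise AttributeError(f"unknown training argument: {name}")
        setattr(args, name, value)
    return args


# Values copied from the output of print(best_trial_normal) above.
best = BestRunStandIn(
    run_id="66",
    objective=0.861541151855867,
    hyperparameters={
        "learning_rate": 0.004814585044790494,
        "weight_decay": 0.001,
        "warmup_steps": 4,
    },
)
final_args = apply_best_run(ArgsStandIn(), best)
```

With a real trainer the same loop would presumably run over its arguments object, assuming every searched name corresponds to an attribute there.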


In [26]:
base.reset_seed()

## Search with distillation on the original dataset
Configuration of the individual training runs.

In [27]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd_coarse_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd_coarse_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters.


In [28]:
def hp_space(trial):
    params = {
        # Optimizer and learning-rate schedule settings.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        # Distillation-specific settings: loss mixing weight and softmax temperature.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
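
To make the roles of `lambda_param` and `temperature` concrete, here is a dependency-free sketch of a typical distillation loss for a single example. It assumes the common convention in which `lambda_param` weights the temperature-scaled soft-target term and `1 - lambda_param` weights the ordinary cross-entropy; the actual `base.DistilTrainer` is not shown in this notebook and may combine the terms differently. The function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed form: (1 - lambda) * CE(student, label) + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy, computed at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL divergence between the softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # The T^2 factor keeps the gradient scale comparable across temperatures.
    soft *= temperature ** 2
    return (1 - lambda_param) * hard + lambda_param * soft


# Hypothetical logits for a six-class problem, with the parameters of trial 0 below.
example = distillation_loss(
    student_logits=[0.2, 1.5, -0.3, 0.0, 0.1, -1.0],
    teacher_logits=[0.0, 3.0, -1.0, 0.2, 0.1, -2.0],
    label=1,
    lambda_param=0.6,
    temperature=2.5,
)
```

Under this convention `lambda_param = 0` reduces to plain supervised training and `lambda_param = 1` trains on the teacher's soft targets alone, which is why both ends of the range are worth searching.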

Optuna configuration.

In [29]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
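
The pruner's behaviour depends on how `min_resource`, `max_resource` and `reduction_factor` translate into brackets and rungs. The sketch below reproduces that arithmetic as documented for Optuna's Hyperband pruner: the number of brackets is floor(log_eta(max_resource / min_resource)) + 1, and bracket i first checks a trial at step min_resource * eta^i. It is an illustration to be checked against the installed Optuna version, not Optuna's own code. `min_r` and `max_r` are defined elsewhere in the notebook, so the values 1 and 10 are assumptions, and `hyperband_brackets` is a hypothetical helper name.

```python
def hyperband_brackets(min_resource, max_resource, reduction_factor):
    """Return, for each bracket, the steps at which a trial can be pruned."""
    # Integer loop equivalent to floor(log_eta(max / min)) + 1, avoiding float logs.
    n_brackets = 1
    while min_resource * reduction_factor ** n_brackets <= max_resource:
        n_brackets += 1

    brackets = []
    for bracket_id in range(n_brackets):
        # Later brackets start pruning later, so they are less aggressive.
        step = min_resource * reduction_factor ** bracket_id
        rungs = []
        while step <= max_resource:
            rungs.append(step)
            step *= reduction_factor
        brackets.append(rungs)
    return brackets


# Assumed values; reduction_factor=2 matches the pruner configured above.
for bracket_id, rungs in enumerate(hyperband_brackets(1, 10, 2)):
    print(f"bracket {bracket_id}: rungs at steps {rungs}")
```

A smaller `reduction_factor` such as 2 promotes a larger share of trials at each rung than the more usual 3 or 4, trading some compute for a lower risk of discarding slow starters.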



Configuration of the distillation trainer for the individual training runs.

In [30]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [31]:
best_trial_distill = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-embedd",
    n_trials=150
)

[I 2025-03-22 02:03:54,858] A new study created in memory with name: Distill-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6988,3.221653,0.460128,0.39125,0.350303,0.302444
2,2.806,2.332087,0.620532,0.571521,0.514177,0.524033
3,2.0124,1.802679,0.734189,0.622515,0.623736,0.619607
4,1.5106,1.587178,0.753437,0.651337,0.641145,0.640953
5,1.2918,1.41723,0.776352,0.655127,0.665547,0.65884


[I 2025-03-22 02:04:26,233] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 4, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8725,3.699157,0.338222,0.202809,0.24766,0.177139
2,3.611,3.370288,0.411549,0.211862,0.311712,0.219806
3,3.1569,2.920703,0.473877,0.559583,0.364131,0.317692
4,2.756,2.585321,0.612282,0.529693,0.513114,0.513335
5,2.4689,2.348526,0.63428,0.534469,0.537457,0.530051
6,2.2377,2.186656,0.657195,0.558837,0.554641,0.552109
7,2.0781,2.079774,0.670944,0.576941,0.565796,0.567819
8,1.9377,1.992825,0.690192,0.580232,0.591461,0.58518
9,1.8351,1.917396,0.702108,0.600448,0.594994,0.591399
10,1.726,1.873099,0.700275,0.581962,0.599864,0.588393


[I 2025-03-22 02:05:18,206] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9378,3.821514,0.338222,0.372012,0.247681,0.178047
2,3.7582,3.668214,0.425298,0.210663,0.321241,0.239854
3,3.6303,3.499478,0.455545,0.238471,0.340921,0.278666
4,3.3846,3.205147,0.429881,0.380577,0.324789,0.255162
5,3.1286,3.004058,0.452796,0.3904,0.345155,0.288602


[I 2025-03-22 02:05:42,898] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8493,3.678138,0.334555,0.20639,0.244925,0.174792
2,3.5387,3.246219,0.393217,0.341877,0.297974,0.202223
3,3.0291,2.781742,0.511457,0.549894,0.401144,0.373895
4,2.5954,2.427698,0.625115,0.525688,0.530027,0.52113
5,2.3048,2.235848,0.637947,0.540336,0.546398,0.535112
6,2.0788,2.046371,0.67736,0.574823,0.573938,0.572099
7,1.9035,1.93716,0.699358,0.599409,0.590373,0.592634
8,1.7578,1.848013,0.716774,0.604373,0.613054,0.60822
9,1.6545,1.775402,0.72594,0.616437,0.617356,0.613221
10,1.5449,1.744441,0.716774,0.596322,0.615539,0.603445


[I 2025-03-22 02:06:31,933] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1661,2.227231,0.628781,0.55778,0.532447,0.520692
2,1.6945,1.378628,0.783685,0.670804,0.668125,0.666576
3,1.0356,1.148007,0.813932,0.691789,0.693212,0.690635
4,0.7492,1.044798,0.826764,0.703216,0.704844,0.701634
5,0.5531,0.932074,0.846929,0.709955,0.720841,0.715033
6,0.3989,0.884499,0.851512,0.711756,0.726013,0.718061
7,0.3045,0.84173,0.858845,0.720977,0.731232,0.726012
8,0.2146,0.870417,0.859762,0.867033,0.795595,0.816788
9,0.1663,0.81464,0.858845,0.867699,0.805158,0.825296
10,0.1261,0.817198,0.859762,0.872098,0.822819,0.841586


[I 2025-03-22 02:07:24,476] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6987,1.673743,0.725023,0.621877,0.622872,0.614005
2,1.1787,1.251119,0.796517,0.66582,0.684938,0.671964
3,0.6401,0.923346,0.851512,0.720899,0.724862,0.721529
4,0.3864,0.834547,0.859762,0.726618,0.732199,0.727031
5,0.2279,0.751656,0.879927,0.878941,0.803176,0.823422
6,0.1387,0.687563,0.87901,0.88737,0.83755,0.856577
7,0.1011,0.688164,0.880843,0.888469,0.829376,0.850774
8,0.0789,0.698182,0.88176,0.881082,0.838381,0.855961
9,0.0687,0.68108,0.883593,0.890671,0.831242,0.852828
10,0.0622,0.68944,0.877177,0.886047,0.825653,0.8477


[I 2025-03-22 02:09:43,889] Trial 5 finished with value: 0.8608388369843657 and parameters: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 5 with value: 0.8608388369843657.


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1524,2.211947,0.64253,0.568258,0.545146,0.539876
2,1.6948,1.406393,0.780935,0.672808,0.664872,0.665036
3,1.033,1.13219,0.814849,0.691155,0.69443,0.691302
4,0.742,1.071543,0.828598,0.707499,0.705835,0.703586
5,0.5627,0.949209,0.84143,0.704149,0.71785,0.71053
6,0.4154,0.936065,0.849679,0.708785,0.726367,0.715395
7,0.3181,0.872467,0.853346,0.718296,0.72591,0.721808
8,0.2387,0.888932,0.863428,0.857154,0.772169,0.78797
9,0.1912,0.860512,0.855179,0.863386,0.792336,0.813201
10,0.1471,0.842615,0.858845,0.870508,0.812204,0.833039


[I 2025-03-22 02:11:17,523] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6223,1.650234,0.714024,0.61016,0.616562,0.59922
2,1.1642,1.247658,0.779102,0.655171,0.670639,0.658608
3,0.6027,0.891474,0.856095,0.721661,0.728499,0.724314
4,0.3438,0.79539,0.865261,0.892444,0.746538,0.74838
5,0.2077,0.748569,0.88176,0.88072,0.804847,0.82437
6,0.13,0.75067,0.87901,0.88702,0.838551,0.85672
7,0.0926,0.70089,0.88176,0.888438,0.84027,0.858486
8,0.0755,0.708049,0.880843,0.890799,0.828976,0.851687
9,0.0688,0.691972,0.88176,0.88949,0.840242,0.858845
10,0.0627,0.710513,0.877177,0.885247,0.826573,0.847795


[I 2025-03-22 02:12:35,965] Trial 7 finished with value: 0.8569084742993941 and parameters: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 5 with value: 0.8608388369843657.


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8712,3.71037,0.340972,0.191092,0.249728,0.177276
2,3.6486,3.473649,0.416132,0.212919,0.315143,0.223636
3,3.2789,3.060123,0.452796,0.387365,0.343457,0.288688
4,2.9229,2.75993,0.529789,0.531623,0.422088,0.409862
5,2.6574,2.5241,0.609533,0.537407,0.511004,0.505298
6,2.4195,2.356755,0.629698,0.542243,0.528323,0.524639
7,2.2705,2.23722,0.648946,0.557221,0.545438,0.546351
8,2.1451,2.152694,0.655362,0.552514,0.558639,0.554511
9,2.051,2.083812,0.665445,0.567305,0.56324,0.559225
10,1.9481,2.036369,0.670027,0.559333,0.57237,0.561795


[I 2025-03-22 02:13:26,173] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0277,1.972834,0.678277,0.587042,0.58017,0.579054
2,1.486,1.288295,0.797434,0.678171,0.679081,0.67711
3,0.8947,1.065546,0.831347,0.703631,0.708948,0.704224
4,0.6053,0.958291,0.846013,0.715216,0.719315,0.715759
5,0.4261,0.843714,0.857012,0.718097,0.729174,0.723516
6,0.2961,0.836195,0.858845,0.805929,0.740099,0.742473
7,0.209,0.796636,0.872594,0.879971,0.805601,0.827536
8,0.1539,0.815008,0.875344,0.873758,0.835863,0.850169
9,0.1242,0.76159,0.866178,0.877346,0.827403,0.846508
10,0.0922,0.769061,0.865261,0.876622,0.826386,0.845603


[I 2025-03-22 02:14:46,141] Trial 9 finished with value: 0.8485700985127792 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 5 with value: 0.8608388369843657.


Trial 10 with params: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3734,1.464367,0.771769,0.673845,0.652413,0.657188
2,0.9061,1.093464,0.823098,0.692733,0.708274,0.692904
3,0.4125,0.789556,0.867094,0.733055,0.738474,0.734398
4,0.1902,0.768212,0.872594,0.884386,0.814946,0.838332
5,0.1171,0.705125,0.87626,0.885398,0.827687,0.848432
6,0.0852,0.725239,0.872594,0.884872,0.823546,0.845623
7,0.074,0.690918,0.88176,0.888612,0.832745,0.852643
8,0.0656,0.666121,0.875344,0.87088,0.826317,0.843341
9,0.0589,0.656366,0.890009,0.897177,0.837863,0.859296
10,0.0526,0.664004,0.888176,0.893786,0.837325,0.857496


[I 2025-03-22 02:16:06,986] Trial 10 finished with value: 0.8555004989289946 and parameters: {'learning_rate': 0.003553256925699131, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 5 with value: 0.8608388369843657.


Trial 11 with params: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5692,1.547465,0.743355,0.638071,0.635175,0.632294
2,1.1238,1.096355,0.817599,0.685816,0.698844,0.690833
3,0.5513,0.820796,0.872594,0.732892,0.742419,0.737235
4,0.2938,0.838949,0.869844,0.894669,0.788216,0.808018
5,0.1839,0.707261,0.887259,0.871007,0.836447,0.849861
6,0.1214,0.66058,0.894592,0.900761,0.841405,0.862854
7,0.0849,0.673311,0.887259,0.870019,0.834924,0.849175
8,0.0773,0.656639,0.888176,0.882739,0.835316,0.853728
9,0.0678,0.660517,0.885426,0.894069,0.842822,0.862527
10,0.062,0.658525,0.890009,0.896615,0.846576,0.865721


[I 2025-03-22 02:17:42,283] Trial 11 finished with value: 0.8625752755372232 and parameters: {'learning_rate': 0.0023774407201803105, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 11 with value: 0.8625752755372232.


Trial 12 with params: {'learning_rate': 0.001636915501474202, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7397,1.717695,0.719523,0.61824,0.617892,0.609372
2,1.2379,1.279347,0.783685,0.658476,0.673243,0.662827
3,0.6836,0.950322,0.850596,0.716302,0.724134,0.71889
4,0.4255,0.837346,0.863428,0.727669,0.734355,0.729292
5,0.2612,0.767496,0.870761,0.871899,0.796242,0.816465
6,0.167,0.713982,0.880843,0.889174,0.829253,0.850999
7,0.1101,0.693964,0.880843,0.88958,0.839512,0.858662
8,0.0844,0.688821,0.88451,0.892833,0.841666,0.861399
9,0.0703,0.690548,0.879927,0.888406,0.838962,0.857641
10,0.0637,0.696922,0.882676,0.890226,0.84095,0.859579


[I 2025-03-22 02:19:23,129] Trial 12 finished with value: 0.860385749846457 and parameters: {'learning_rate': 0.001636915501474202, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 11 with value: 0.8625752755372232.


Trial 13 with params: {'learning_rate': 0.0032103437513091603, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4838,1.426345,0.764436,0.665755,0.652079,0.652989
2,0.9425,0.950001,0.83593,0.705515,0.712861,0.707369
3,0.4149,0.8452,0.863428,0.735075,0.732902,0.73181
4,0.2176,0.831899,0.868928,0.886094,0.811958,0.836676
5,0.1301,0.745155,0.872594,0.88468,0.825095,0.845984
6,0.0951,0.695014,0.88176,0.891798,0.830819,0.85301
7,0.0721,0.669688,0.88176,0.891864,0.830657,0.853018
8,0.0665,0.66394,0.887259,0.892683,0.827498,0.849256
9,0.0604,0.655198,0.886343,0.893937,0.835567,0.856629
10,0.056,0.66705,0.883593,0.890366,0.833415,0.853682


[I 2025-03-22 02:20:41,701] Trial 13 finished with value: 0.8553933612656327 and parameters: {'learning_rate': 0.0032103437513091603, 'weight_decay': 0.007, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}. Best is trial 11 with value: 0.8625752755372232.


Trial 14 with params: {'learning_rate': 0.0015135386774138247, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8277,1.718947,0.725023,0.628026,0.617499,0.617823
2,1.2803,1.224623,0.793767,0.667967,0.67955,0.671453
3,0.7433,0.975713,0.836847,0.704479,0.714574,0.707433
4,0.4602,0.900477,0.851512,0.71882,0.725953,0.720367
5,0.2873,0.804528,0.867094,0.870286,0.792288,0.81335
6,0.1835,0.763251,0.875344,0.859153,0.825778,0.839023
7,0.1187,0.735309,0.874427,0.873197,0.833281,0.849493
8,0.0881,0.749513,0.87901,0.890163,0.837339,0.857532
9,0.0784,0.709299,0.878093,0.874737,0.836989,0.852147
10,0.0697,0.721848,0.878093,0.875297,0.83715,0.852648


[I 2025-03-22 02:21:59,686] Trial 14 finished with value: 0.8488575060879406 and parameters: {'learning_rate': 0.0015135386774138247, 'weight_decay': 0.001, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 11 with value: 0.8625752755372232.


Trial 15 with params: {'learning_rate': 0.0031938729076120406, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.265,1.430944,0.768103,0.664828,0.648611,0.651112
2,0.8869,0.89624,0.856095,0.722438,0.728624,0.72356
3,0.3915,0.792033,0.867094,0.731285,0.737375,0.733532
4,0.2211,0.853793,0.868928,0.867263,0.811637,0.831123
5,0.137,0.672791,0.889093,0.895352,0.82719,0.850327
6,0.0981,0.692257,0.882676,0.890791,0.821365,0.844843
7,0.077,0.673083,0.893676,0.899826,0.831189,0.854551
8,0.0667,0.669465,0.885426,0.891117,0.825689,0.847554
9,0.0632,0.662958,0.88451,0.890665,0.824471,0.846759
10,0.0581,0.670263,0.887259,0.894871,0.835341,0.856801


[I 2025-03-22 02:23:22,595] Trial 15 finished with value: 0.8607446145643812 and parameters: {'learning_rate': 0.0031938729076120406, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 11 with value: 0.8625752755372232.


Trial 16 with params: {'learning_rate': 0.0006411740416939972, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2928,2.49483,0.580202,0.576737,0.473345,0.479248
2,1.9475,1.574939,0.739688,0.631346,0.630784,0.629094
3,1.1936,1.261827,0.797434,0.677271,0.679314,0.675678
4,0.8935,1.153613,0.818515,0.69867,0.697823,0.695687
5,0.719,1.052153,0.831347,0.694198,0.711423,0.701338
6,0.5505,0.971546,0.847846,0.706567,0.725016,0.714811
7,0.4316,0.911292,0.850596,0.713425,0.724406,0.718654
8,0.3392,0.909362,0.854262,0.714633,0.728561,0.720694
9,0.288,0.918346,0.845096,0.708825,0.721644,0.713081
10,0.2319,0.890717,0.835014,0.824584,0.740951,0.751913


[I 2025-03-22 02:24:13,581] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0008797446343538097, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1586,2.216119,0.63703,0.570459,0.537369,0.53392
2,1.6629,1.387352,0.784601,0.676791,0.667871,0.667987
3,1.0039,1.131575,0.814849,0.695888,0.694136,0.69179
4,0.7061,1.043918,0.827681,0.707124,0.705674,0.703166
5,0.5341,0.923385,0.850596,0.712066,0.724571,0.718216


[I 2025-03-22 02:24:38,804] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.004952142492162866, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3412,1.36684,0.7956,0.691102,0.673359,0.677391
2,0.7975,0.874667,0.850596,0.711343,0.72624,0.717103
3,0.2992,0.843035,0.863428,0.855052,0.777637,0.799853
4,0.1553,0.738383,0.875344,0.870592,0.81622,0.83596
5,0.1036,0.691519,0.887259,0.881493,0.826966,0.846831
6,0.0825,0.726478,0.879927,0.876579,0.820864,0.841236
7,0.0703,0.668586,0.886343,0.8784,0.826375,0.845168
8,0.0612,0.666484,0.886343,0.881132,0.826067,0.846229
9,0.0565,0.651195,0.887259,0.877335,0.827718,0.845198
10,0.0532,0.648123,0.890009,0.882225,0.829484,0.848634


[I 2025-03-22 02:25:29,366] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.002167427805517391, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6034,1.639339,0.72044,0.620767,0.621817,0.60546
2,1.1573,1.191325,0.802933,0.671906,0.689208,0.677479
3,0.5923,0.881435,0.859762,0.724709,0.731197,0.727212
4,0.3305,0.806625,0.868928,0.89526,0.757805,0.76735
5,0.1928,0.726057,0.882676,0.889267,0.822703,0.844749
6,0.1275,0.736155,0.88451,0.892339,0.842347,0.861063
7,0.0914,0.708835,0.878093,0.881951,0.819997,0.840099
8,0.0778,0.705752,0.878093,0.888377,0.827027,0.849098
9,0.0671,0.700412,0.88176,0.887603,0.83162,0.851215
10,0.0613,0.69482,0.88176,0.890114,0.840104,0.859331


[I 2025-03-22 02:27:39,368] Trial 19 finished with value: 0.8580838354270967 and parameters: {'learning_rate': 0.002167427805517391, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 11 with value: 0.8625752755372232.


Trial 20 with params: {'learning_rate': 0.0005440056024291415, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3754,2.70499,0.535289,0.58043,0.424565,0.417409
2,2.174,1.708848,0.728689,0.632233,0.623779,0.624861
3,1.3407,1.329837,0.782768,0.657229,0.67046,0.662181
4,0.9962,1.225682,0.80385,0.690869,0.686005,0.685286
5,0.8145,1.126517,0.817599,0.680956,0.700495,0.688939


[I 2025-03-22 02:28:04,482] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.001005011562263371, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0518,2.030556,0.666361,0.566937,0.572044,0.563515
2,1.5273,1.354847,0.783685,0.661801,0.670747,0.663503
3,0.9223,1.081465,0.826764,0.701697,0.704966,0.700649
4,0.6305,0.958481,0.847846,0.715696,0.721871,0.717532
5,0.4522,0.876979,0.852429,0.715062,0.724707,0.719674
6,0.3183,0.830389,0.853346,0.715359,0.726387,0.719898
7,0.2324,0.797061,0.870761,0.872707,0.79582,0.816688
8,0.1677,0.835679,0.867094,0.876635,0.810329,0.832014
9,0.1317,0.79702,0.867094,0.877476,0.819247,0.84
10,0.1017,0.803964,0.868011,0.868117,0.827956,0.844314


[I 2025-03-22 02:28:57,324] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 0.0022733334044757305, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5838,1.579486,0.726856,0.619752,0.625895,0.614234
2,1.1394,1.103269,0.810266,0.680519,0.693584,0.685202
3,0.5718,0.850084,0.863428,0.726336,0.734191,0.729745
4,0.3157,0.798197,0.868928,0.89561,0.76721,0.782487
5,0.1874,0.711522,0.888176,0.880535,0.827753,0.84598
6,0.1153,0.665412,0.891842,0.88377,0.839224,0.856239
7,0.0832,0.69094,0.887259,0.869922,0.835859,0.849684
8,0.0732,0.691659,0.88451,0.893647,0.832175,0.854368
9,0.0658,0.660913,0.883593,0.891112,0.832765,0.853628
10,0.0612,0.668466,0.886343,0.895075,0.842904,0.863128


[I 2025-03-22 02:30:30,092] Trial 22 finished with value: 0.8630326516335852 and parameters: {'learning_rate': 0.0022733334044757305, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 22 with value: 0.8630326516335852.


Trial 23 with params: {'learning_rate': 0.0031508745063568025, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2804,1.468299,0.75802,0.661891,0.642148,0.643801
2,0.9132,0.894234,0.853346,0.71766,0.726297,0.720158
3,0.3996,0.849129,0.857929,0.72899,0.73179,0.728588
4,0.2252,0.903826,0.861595,0.878629,0.804777,0.828628
5,0.1414,0.682037,0.891842,0.899634,0.829909,0.85367
6,0.0957,0.695226,0.886343,0.891875,0.826034,0.848127
7,0.0764,0.67569,0.87901,0.888815,0.838388,0.85763
8,0.0704,0.674686,0.886343,0.894197,0.826025,0.849016
9,0.0645,0.674442,0.880843,0.889071,0.831218,0.85205
10,0.0592,0.669109,0.889093,0.895534,0.837405,0.858391


[I 2025-03-22 02:32:08,537] Trial 23 finished with value: 0.8612019906060957 and parameters: {'learning_rate': 0.0031508745063568025, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 22 with value: 0.8630326516335852.


Trial 24 with params: {'learning_rate': 0.0013510897940405065, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8119,1.76243,0.710357,0.605152,0.608941,0.604291
2,1.3198,1.230661,0.7956,0.67132,0.679943,0.674676
3,0.7629,1.032747,0.835014,0.709104,0.710423,0.707607
4,0.5032,0.870335,0.857012,0.72079,0.729337,0.723896
5,0.3236,0.810369,0.861595,0.805406,0.742027,0.744568
6,0.2104,0.796619,0.865261,0.857296,0.808708,0.825706
7,0.1421,0.743725,0.873511,0.882834,0.833528,0.85204
8,0.1071,0.752863,0.875344,0.885129,0.834584,0.85392
9,0.0863,0.73975,0.87626,0.884133,0.835297,0.853769
10,0.0754,0.738234,0.874427,0.882331,0.834293,0.852479


[I 2025-03-22 02:33:03,882] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.003171390986152328, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4658,1.50082,0.762603,0.679909,0.647566,0.649135
2,0.927,1.022721,0.825848,0.695767,0.707746,0.697799
3,0.4135,0.851353,0.856095,0.726394,0.727032,0.725109
4,0.2306,0.790713,0.872594,0.848172,0.796397,0.813473
5,0.1229,0.712875,0.87626,0.881723,0.809794,0.831783
6,0.0971,0.746879,0.871677,0.883838,0.813728,0.837272
7,0.0881,0.712253,0.87901,0.891079,0.817896,0.843166
8,0.0742,0.650153,0.88451,0.882544,0.833588,0.852418
9,0.0643,0.672525,0.88451,0.894053,0.833476,0.855458
10,0.0604,0.668087,0.879927,0.889742,0.829225,0.851333


[I 2025-03-22 02:34:54,169] Trial 25 finished with value: 0.8547830291422258 and parameters: {'learning_rate': 0.003171390986152328, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 2.0}. Best is trial 22 with value: 0.8630326516335852.


Trial 26 with params: {'learning_rate': 0.002815309914473619, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3782,1.483467,0.753437,0.6461,0.641725,0.638694
2,0.9389,0.948085,0.839597,0.708521,0.715453,0.709922
3,0.4514,0.832146,0.868928,0.736294,0.739067,0.736306
4,0.2472,0.852349,0.866178,0.868552,0.793714,0.812753
5,0.1446,0.731629,0.873511,0.86056,0.824499,0.838922
6,0.1089,0.710298,0.87901,0.875231,0.828998,0.846449
7,0.0807,0.705588,0.875344,0.86899,0.827628,0.842972
8,0.0697,0.674019,0.88176,0.868242,0.830386,0.845961
9,0.0625,0.673585,0.888176,0.894586,0.82714,0.849998
10,0.0591,0.67248,0.887259,0.895084,0.834845,0.85682


[I 2025-03-22 02:37:14,783] Trial 26 finished with value: 0.8574656682355983 and parameters: {'learning_rate': 0.002815309914473619, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 22 with value: 0.8630326516335852.


Trial 27 with params: {'learning_rate': 0.0024764213303794967, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5169,1.555685,0.743355,0.64363,0.635348,0.632882
2,1.0812,1.134208,0.816682,0.688935,0.696574,0.690863
3,0.5218,0.839018,0.862511,0.72822,0.733112,0.729816
4,0.2718,0.846344,0.859762,0.849164,0.787202,0.805197
5,0.1638,0.730455,0.885426,0.860465,0.835273,0.845391
6,0.106,0.699494,0.887259,0.871343,0.834768,0.849776
7,0.0795,0.692982,0.883593,0.869684,0.841326,0.853322
8,0.0718,0.684435,0.889093,0.882825,0.837072,0.854656
9,0.064,0.678728,0.885426,0.891872,0.834204,0.854831
10,0.0594,0.681719,0.885426,0.879199,0.834094,0.851397


[I 2025-03-22 02:38:54,879] Trial 27 finished with value: 0.851601707326064 and parameters: {'learning_rate': 0.0024764213303794967, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 2.0}. Best is trial 22 with value: 0.8630326516335852.


Trial 28 with params: {'learning_rate': 0.0013636035539882107, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8111,1.767078,0.712191,0.607957,0.609411,0.605808
2,1.3194,1.23067,0.794684,0.67005,0.679536,0.673532
3,0.7746,1.030781,0.835014,0.707263,0.712366,0.705773
4,0.5104,0.927425,0.854262,0.71948,0.728224,0.722057
5,0.3408,0.823669,0.857012,0.719702,0.729041,0.724198
6,0.22,0.769396,0.864345,0.853984,0.797886,0.81642
7,0.1484,0.715907,0.879927,0.887334,0.838678,0.857215
8,0.1104,0.717143,0.877177,0.885229,0.836643,0.855114
9,0.0884,0.727933,0.873511,0.88238,0.833291,0.852021
10,0.0764,0.718378,0.875344,0.883844,0.834937,0.853638


[I 2025-03-22 02:40:47,939] Trial 28 finished with value: 0.8519317127051503 and parameters: {'learning_rate': 0.0013636035539882107, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 22 with value: 0.8630326516335852.


Trial 29 with params: {'learning_rate': 0.0049353391972831355, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3957,1.455031,0.758937,0.648983,0.649785,0.640433
2,0.8245,0.900358,0.853346,0.719784,0.727467,0.72134
3,0.3393,0.783283,0.869844,0.869106,0.813939,0.833041
4,0.174,0.68462,0.88451,0.892556,0.825182,0.847978
5,0.1072,0.628007,0.893676,0.88738,0.842412,0.859646
6,0.084,0.664121,0.885426,0.895257,0.834695,0.85678
7,0.0696,0.625676,0.890926,0.898289,0.839698,0.86065
8,0.0607,0.628062,0.890926,0.900665,0.839196,0.861666
9,0.0564,0.637688,0.888176,0.896151,0.837582,0.858782
10,0.0533,0.634595,0.891842,0.89926,0.840659,0.861882


[I 2025-03-22 02:42:38,165] Trial 29 finished with value: 0.8619714912108737 and parameters: {'learning_rate': 0.0049353391972831355, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 2.5}. Best is trial 22 with value: 0.8630326516335852.


Trial 30 with params: {'learning_rate': 0.0030784887015454883, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4726,1.527781,0.758937,0.68068,0.644678,0.648117
2,0.9576,0.996485,0.83868,0.709463,0.71459,0.710583
3,0.4367,0.866826,0.858845,0.730321,0.728904,0.727918
4,0.2199,0.800227,0.871677,0.858539,0.797651,0.815557
5,0.1322,0.697008,0.882676,0.89105,0.822928,0.846094
6,0.0945,0.688083,0.880843,0.888996,0.821919,0.844616
7,0.0786,0.693187,0.877177,0.887419,0.827462,0.849097
8,0.0692,0.667043,0.885426,0.892741,0.834992,0.855766
9,0.0632,0.662943,0.883593,0.880741,0.832668,0.851324
10,0.0599,0.668053,0.88451,0.891578,0.834597,0.854998


[I 2025-03-22 02:44:00,740] Trial 30 finished with value: 0.8560396618563372 and parameters: {'learning_rate': 0.0030784887015454883, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 22 with value: 0.8630326516335852.


Trial 31 with params: {'learning_rate': 0.0043697609962162175, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2984,1.330435,0.791017,0.6686,0.675661,0.670598
2,0.8499,0.87943,0.852429,0.720929,0.7246,0.721547
3,0.3369,0.770562,0.869844,0.879907,0.811969,0.834716
4,0.1677,0.737083,0.880843,0.894795,0.839781,0.860929
5,0.1075,0.682406,0.890926,0.899407,0.838757,0.860734
6,0.088,0.684594,0.885426,0.897284,0.843645,0.864477
7,0.0691,0.67079,0.883593,0.893647,0.842515,0.862247
8,0.0625,0.657054,0.885426,0.89335,0.835393,0.856317
9,0.0571,0.663978,0.88451,0.892739,0.83423,0.855362
10,0.0529,0.661287,0.886343,0.894847,0.845111,0.864184


[I 2025-03-22 02:45:26,429] Trial 31 finished with value: 0.8642290841521022 and parameters: {'learning_rate': 0.0043697609962162175, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 32 with params: {'learning_rate': 0.0036368284624855854, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2174,1.371135,0.775435,0.658509,0.663863,0.656105
2,0.8448,0.857736,0.853346,0.715285,0.727565,0.720711
3,0.3644,0.845456,0.864345,0.877287,0.78848,0.812708
4,0.1868,0.82914,0.865261,0.876055,0.800503,0.823718
5,0.1151,0.688002,0.87901,0.888007,0.829865,0.850836
6,0.0819,0.653967,0.88451,0.890259,0.825383,0.847054
7,0.0705,0.651305,0.882676,0.892283,0.832374,0.85414
8,0.0618,0.651053,0.88451,0.892296,0.834452,0.855304
9,0.0569,0.658005,0.882676,0.890932,0.832076,0.853413
10,0.0531,0.654924,0.883593,0.891015,0.833305,0.85409


[I 2025-03-22 02:46:41,145] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.003973341576228142, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2439,1.389707,0.769936,0.656889,0.659984,0.650999
2,0.8648,0.941831,0.847846,0.716864,0.720465,0.717064
3,0.3693,0.787205,0.861595,0.825446,0.760266,0.773013
4,0.1782,0.742213,0.877177,0.889827,0.826878,0.849858
5,0.1057,0.684976,0.883593,0.893626,0.842258,0.862096
6,0.0912,0.652077,0.88451,0.880219,0.834793,0.852241
7,0.0742,0.667036,0.883593,0.880016,0.832762,0.850934
8,0.0642,0.640203,0.88451,0.892934,0.833982,0.855325
9,0.0573,0.651016,0.885426,0.894332,0.834951,0.856529
10,0.0539,0.654211,0.88451,0.894094,0.833661,0.855728


[I 2025-03-22 02:48:18,179] Trial 33 finished with value: 0.8542590453582405 and parameters: {'learning_rate': 0.003973341576228142, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 31 with value: 0.8642290841521022.


Trial 34 with params: {'learning_rate': 0.004850989927224065, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4124,1.408189,0.778185,0.669314,0.662691,0.660258
2,0.8428,0.906573,0.859762,0.724246,0.732306,0.726725
3,0.3545,0.737613,0.87901,0.887899,0.820594,0.843084
4,0.1721,0.724611,0.87901,0.889744,0.818989,0.843162
5,0.1058,0.67631,0.894592,0.899419,0.834333,0.855898
6,0.0824,0.677778,0.886343,0.897128,0.83632,0.858112
7,0.0691,0.67282,0.885426,0.894086,0.835734,0.85684
8,0.0645,0.650995,0.886343,0.895526,0.836911,0.858156
9,0.0571,0.643669,0.887259,0.897025,0.8373,0.859012
10,0.0527,0.659496,0.882676,0.893086,0.843193,0.862353


[I 2025-03-22 02:49:34,208] Trial 34 finished with value: 0.857751429442411 and parameters: {'learning_rate': 0.004850989927224065, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 35 with params: {'learning_rate': 5.817102176211476e-05, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9255,3.80195,0.342805,0.192424,0.251088,0.178766
2,3.7431,3.653712,0.413382,0.210538,0.313293,0.225674
3,3.6024,3.450076,0.456462,0.23377,0.342558,0.277265
4,3.3211,3.14754,0.441797,0.385289,0.333698,0.267494
5,3.0795,2.95502,0.463795,0.560858,0.355766,0.30626
6,2.8741,2.797954,0.530706,0.557158,0.423724,0.414359
7,2.7381,2.674913,0.555454,0.54482,0.450095,0.446882
8,2.6222,2.570318,0.614115,0.534596,0.514933,0.5107
9,2.5299,2.495337,0.608616,0.532434,0.508028,0.506438
10,2.4477,2.443588,0.617782,0.525836,0.521529,0.512383


[I 2025-03-22 02:50:28,495] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.004799657735232045, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3533,1.422039,0.769019,0.657838,0.659227,0.648224
2,0.8095,0.891188,0.868011,0.731903,0.739755,0.733293
3,0.3256,0.735061,0.880843,0.878383,0.820905,0.841959
4,0.1571,0.670486,0.885426,0.896468,0.835497,0.857559
5,0.1065,0.680127,0.890009,0.899926,0.829398,0.853305
6,0.0847,0.679839,0.885426,0.89442,0.824899,0.848606
7,0.0681,0.708034,0.88176,0.893674,0.820634,0.845794
8,0.0629,0.662783,0.890009,0.899488,0.838583,0.860719
9,0.0587,0.664788,0.887259,0.896634,0.836701,0.858458
10,0.0534,0.654988,0.88451,0.896301,0.833456,0.856579


[I 2025-03-22 02:51:45,950] Trial 36 finished with value: 0.8604163981584323 and parameters: {'learning_rate': 0.004799657735232045, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 31 with value: 0.8642290841521022.


Trial 37 with params: {'learning_rate': 0.001567425272089884, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.73,1.774483,0.711274,0.620947,0.60839,0.605714
2,1.3091,1.19212,0.8011,0.676884,0.684001,0.679515
3,0.7109,0.97484,0.84143,0.710703,0.717425,0.712928
4,0.4491,0.865257,0.857929,0.725052,0.730217,0.725658
5,0.2838,0.78767,0.868011,0.867918,0.783292,0.803939
6,0.1806,0.747224,0.873511,0.884385,0.83203,0.852366
7,0.1237,0.711772,0.880843,0.890944,0.838987,0.858563
8,0.0942,0.702552,0.872594,0.884714,0.831332,0.851964
9,0.0812,0.703798,0.875344,0.884937,0.833919,0.853526
10,0.0724,0.698547,0.880843,0.888125,0.839089,0.857821


[I 2025-03-22 02:53:13,351] Trial 37 finished with value: 0.8529197286707585 and parameters: {'learning_rate': 0.001567425272089884, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 31 with value: 0.8642290841521022.


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8011,3.607273,0.35747,0.363096,0.26273,0.206815
2,3.3384,2.980213,0.472044,0.551067,0.363214,0.31969
3,2.7439,2.483892,0.622365,0.546411,0.525269,0.518642
4,2.2885,2.140129,0.664528,0.558016,0.564403,0.556501
5,1.9818,1.922677,0.699358,0.588729,0.599782,0.591646
6,1.7308,1.744816,0.737855,0.621945,0.629117,0.625041
7,1.5459,1.667096,0.744271,0.638158,0.630772,0.631482
8,1.404,1.582859,0.75802,0.6418,0.650256,0.644716
9,1.3247,1.527933,0.758937,0.644098,0.649937,0.644847
10,1.2254,1.500571,0.771769,0.646001,0.661594,0.653293


[I 2025-03-22 02:54:25,392] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.001395039612162253, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8492,1.768173,0.716774,0.6257,0.610929,0.61272
2,1.3321,1.255963,0.793767,0.670514,0.678781,0.672823
3,0.7839,0.993746,0.836847,0.707405,0.714202,0.70876
4,0.4945,0.876665,0.859762,0.724131,0.732251,0.726621
5,0.3158,0.789668,0.868928,0.863863,0.775247,0.793678


[I 2025-03-22 02:54:56,500] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00012124257132049206, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8311,3.664997,0.332722,0.206254,0.243565,0.173431
2,3.5024,3.19515,0.405133,0.356494,0.306851,0.220123
3,2.968,2.721064,0.532539,0.556878,0.423221,0.412321
4,2.5362,2.378286,0.626948,0.526489,0.531965,0.52197
5,2.2575,2.178259,0.653529,0.548873,0.560146,0.549875
6,2.0245,2.001854,0.683776,0.578998,0.580629,0.577286
7,1.848,1.894622,0.707608,0.606525,0.599551,0.601141
8,1.703,1.803766,0.724106,0.609244,0.619417,0.613951
9,1.6018,1.731654,0.731439,0.620876,0.623606,0.619103
10,1.4927,1.703075,0.727773,0.606555,0.625541,0.614164


[I 2025-03-22 02:56:26,345] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.00301755257183799, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5269,1.514581,0.761687,0.67041,0.646186,0.651192
2,1.0099,0.981158,0.839597,0.710396,0.71389,0.711036
3,0.4542,0.835517,0.868011,0.735674,0.737014,0.735245
4,0.2262,0.829012,0.872594,0.868962,0.80632,0.827133
5,0.1339,0.698656,0.883593,0.887716,0.814486,0.837135


[I 2025-03-22 02:56:52,608] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0025834342779926336, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4809,1.569546,0.75802,0.673937,0.639847,0.648457
2,1.0287,1.080354,0.814849,0.689828,0.696289,0.687433
3,0.4972,0.924702,0.847846,0.724858,0.721068,0.720388
4,0.276,0.824534,0.865261,0.896467,0.77205,0.793651
5,0.1636,0.706011,0.882676,0.875252,0.822905,0.841857
6,0.1053,0.7199,0.880843,0.890977,0.829727,0.852103
7,0.0796,0.70721,0.87901,0.890178,0.838289,0.858053
8,0.0717,0.68135,0.882676,0.894043,0.830682,0.853943
9,0.0648,0.674705,0.889093,0.89534,0.83801,0.858569
10,0.0606,0.679455,0.885426,0.893284,0.834302,0.855728


[I 2025-03-22 02:58:21,211] Trial 42 finished with value: 0.8564475489872047 and parameters: {'learning_rate': 0.0025834342779926336, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 43 with params: {'learning_rate': 0.0005267268280578099, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.4631,2.777256,0.52154,0.582669,0.413087,0.40461
2,2.2161,1.730382,0.736022,0.629445,0.63071,0.628475
3,1.3497,1.36137,0.780018,0.663372,0.664808,0.660656
4,1.0102,1.212948,0.802933,0.686924,0.685298,0.683638
5,0.8209,1.121256,0.819432,0.681974,0.701821,0.690364


[I 2025-03-22 02:58:49,864] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.003931986652895382, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2425,1.403219,0.764436,0.6531,0.655339,0.6459
2,0.8658,0.901509,0.848763,0.71644,0.722369,0.71845
3,0.3624,0.79254,0.866178,0.859934,0.799372,0.819792
4,0.1813,0.791554,0.875344,0.888486,0.816452,0.84081
5,0.1095,0.698849,0.882676,0.890289,0.8243,0.846408
6,0.0816,0.677713,0.890009,0.882993,0.82884,0.848597
7,0.0693,0.675867,0.883593,0.876183,0.82431,0.842997
8,0.0618,0.667503,0.878093,0.87002,0.820561,0.83814
9,0.0576,0.664331,0.88176,0.874919,0.823365,0.841906
10,0.0553,0.687861,0.879927,0.873838,0.821573,0.840357


[I 2025-03-22 02:59:46,950] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0042471477241832626, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2795,1.291391,0.7956,0.670681,0.680431,0.675082
2,0.8317,0.889232,0.856095,0.72054,0.729584,0.723888
3,0.3391,0.794417,0.872594,0.885295,0.815293,0.838637
4,0.1634,0.733813,0.880843,0.891335,0.821584,0.845276
5,0.1033,0.714397,0.879927,0.893638,0.840428,0.86073
6,0.0879,0.700369,0.879927,0.889561,0.840878,0.858918
7,0.0732,0.708288,0.882676,0.869183,0.843428,0.853137
8,0.0661,0.681859,0.879927,0.890445,0.830249,0.852069
9,0.0586,0.66437,0.882676,0.891439,0.832935,0.854143
10,0.0547,0.657974,0.88451,0.891012,0.835656,0.855137


[I 2025-03-22 03:01:09,427] Trial 45 finished with value: 0.8570838496749144 and parameters: {'learning_rate': 0.0042471477241832626, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 46 with params: {'learning_rate': 0.00035209578167894637, 'weight_decay': 0.01, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6106,3.226864,0.430797,0.36407,0.33092,0.289802
2,2.677,2.142368,0.665445,0.580936,0.561993,0.567825
3,1.7856,1.5928,0.751604,0.634448,0.641807,0.635999
4,1.3001,1.420439,0.773602,0.661622,0.661294,0.658747
5,1.0948,1.291316,0.789184,0.658335,0.67677,0.666484
6,0.9272,1.232991,0.79835,0.665743,0.683559,0.67357
7,0.7816,1.144441,0.810266,0.683389,0.690349,0.686571
8,0.6855,1.143966,0.816682,0.686532,0.699083,0.690406
9,0.6208,1.121544,0.817599,0.689274,0.698536,0.690112
10,0.5492,1.112395,0.823098,0.688168,0.703451,0.694905


[I 2025-03-22 03:01:58,596] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 0.0011948185300705968, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8922,1.857296,0.706691,0.596686,0.606329,0.600721
2,1.3953,1.252229,0.797434,0.670492,0.681741,0.674969
3,0.8238,1.033455,0.83868,0.708294,0.714673,0.709552
4,0.5523,0.926991,0.854262,0.718338,0.726995,0.721154
5,0.3828,0.826988,0.858845,0.721596,0.729271,0.725236


[I 2025-03-22 03:02:25,485] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0027511979602444763, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3834,1.487798,0.756187,0.654365,0.641573,0.641979
2,0.955,1.012423,0.827681,0.701896,0.706647,0.70002
3,0.4864,0.861985,0.863428,0.730315,0.73386,0.729571
4,0.2475,0.829209,0.865261,0.857046,0.800756,0.818872
5,0.146,0.722193,0.879927,0.873014,0.821383,0.839786
6,0.1032,0.720306,0.879927,0.875234,0.830008,0.846953
7,0.0754,0.684495,0.879927,0.875696,0.839941,0.854074
8,0.0715,0.661497,0.883593,0.892512,0.83245,0.854341
9,0.0613,0.66396,0.880843,0.888665,0.839806,0.858309
10,0.0588,0.654122,0.883593,0.89131,0.831965,0.853563


[I 2025-03-22 03:03:17,176] Trial 48 pruned. 


Trial 49 with params: {'learning_rate': 0.004089133967924388, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2466,1.331386,0.782768,0.673881,0.663675,0.665011
2,0.8158,0.897797,0.851512,0.720258,0.724023,0.719808
3,0.3427,0.761777,0.87626,0.877581,0.779504,0.801545
4,0.1705,0.700338,0.886343,0.892569,0.826078,0.848349
5,0.0991,0.654001,0.885426,0.894065,0.824324,0.848248
6,0.0806,0.670707,0.883593,0.890164,0.82528,0.846601
7,0.0715,0.659529,0.885426,0.895216,0.842761,0.862914
8,0.0633,0.652583,0.890926,0.896829,0.830951,0.852592
9,0.0589,0.652455,0.890009,0.898667,0.829292,0.852503
10,0.056,0.641413,0.890009,0.895335,0.829659,0.851627


[I 2025-03-22 03:05:01,265] Trial 49 finished with value: 0.8527626654570858 and parameters: {'learning_rate': 0.004089133967924388, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 50 with params: {'learning_rate': 0.0033057165835290716, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2165,1.390904,0.778185,0.668463,0.658776,0.660421
2,0.8604,0.926182,0.850596,0.717669,0.724652,0.718156
3,0.3905,0.906993,0.862511,0.732901,0.734449,0.731162
4,0.2266,0.770258,0.88176,0.863948,0.822951,0.838192
5,0.1283,0.712784,0.880843,0.863557,0.832416,0.844454
6,0.098,0.695532,0.88451,0.893513,0.824109,0.847581
7,0.0787,0.641274,0.897342,0.879861,0.844354,0.858762
8,0.068,0.666299,0.886343,0.892528,0.826935,0.848803
9,0.0611,0.651491,0.890926,0.88476,0.839109,0.856649
10,0.0572,0.658056,0.891842,0.89773,0.839992,0.860749


[I 2025-03-22 03:06:17,848] Trial 50 finished with value: 0.8608085664494093 and parameters: {'learning_rate': 0.0033057165835290716, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 31 with value: 0.8642290841521022.


Trial 51 with params: {'learning_rate': 0.002854757801339913, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5366,1.46117,0.767186,0.67238,0.654925,0.655737
2,0.9881,0.967131,0.836847,0.702755,0.714961,0.707524
3,0.4474,0.909431,0.852429,0.729878,0.724524,0.724302
4,0.2419,0.86718,0.851512,0.841924,0.770735,0.790461
5,0.1353,0.696239,0.88176,0.892151,0.831329,0.853395
6,0.0889,0.701571,0.880843,0.889916,0.830149,0.851756
7,0.0723,0.682778,0.88176,0.892548,0.830117,0.853064
8,0.0671,0.685426,0.877177,0.889559,0.826516,0.849641
9,0.06,0.649493,0.888176,0.895291,0.836086,0.857504
10,0.0562,0.662164,0.882676,0.890762,0.832211,0.853386


[I 2025-03-22 03:07:11,785] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0027158955385139997, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4855,1.572488,0.764436,0.681257,0.648494,0.652116
2,1.0017,0.977747,0.834097,0.702807,0.711037,0.705569
3,0.4696,0.9354,0.855179,0.730694,0.72646,0.72607
4,0.2557,0.821573,0.866178,0.875449,0.791957,0.81522
5,0.1478,0.723489,0.882676,0.893263,0.82222,0.846325
6,0.1145,0.717951,0.877177,0.881646,0.819306,0.839515
7,0.0814,0.714416,0.878093,0.888288,0.827097,0.84939
8,0.0691,0.691021,0.88451,0.891569,0.833841,0.854451
9,0.0607,0.704125,0.88176,0.889531,0.831348,0.852325
10,0.0567,0.690471,0.886343,0.894648,0.834867,0.856591


[I 2025-03-22 03:08:26,562] Trial 52 finished with value: 0.8575482497964942 and parameters: {'learning_rate': 0.0027158955385139997, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.0, 'temperature': 7.0}. Best is trial 31 with value: 0.8642290841521022.


Trial 53 with params: {'learning_rate': 0.00042607074451430765, 'weight_decay': 0.005, 'warmup_steps': 3, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.5315,2.894235,0.494959,0.56199,0.386066,0.361506
2,2.3857,1.936205,0.699358,0.614457,0.594105,0.600945
3,1.5415,1.504341,0.76077,0.641419,0.652386,0.642286
4,1.1671,1.358855,0.783685,0.679643,0.667901,0.669337
5,0.9767,1.232297,0.8011,0.667846,0.68686,0.676088


[I 2025-03-22 03:08:52,003] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0001324011031485879, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8438,3.663796,0.332722,0.206254,0.243565,0.173431
2,3.4744,3.130886,0.417965,0.36799,0.316645,0.236306
3,2.8999,2.630978,0.568286,0.552547,0.461438,0.463447
4,2.4492,2.289906,0.644363,0.545811,0.543684,0.539081
5,2.1582,2.08779,0.661778,0.554196,0.567944,0.558247
6,1.9164,1.915771,0.700275,0.592951,0.595,0.59199
7,1.741,1.818481,0.72044,0.618359,0.60972,0.611775
8,1.6002,1.733948,0.732356,0.617444,0.628538,0.622071
9,1.5037,1.663705,0.735105,0.624635,0.62788,0.622895
10,1.3973,1.635626,0.748854,0.624762,0.643032,0.632266


[I 2025-03-22 03:10:30,059] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0010991268702194139, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0011,1.946484,0.67736,0.595447,0.576782,0.579863
2,1.4692,1.295238,0.7956,0.675302,0.678387,0.674605
3,0.8663,1.029466,0.83593,0.705475,0.712723,0.708021
4,0.5759,0.937243,0.849679,0.717968,0.723825,0.719229
5,0.4047,0.842044,0.857929,0.717777,0.729938,0.723704


[I 2025-03-22 03:10:54,585] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0001413812546509425, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8363,3.651401,0.335472,0.3731,0.24564,0.176271
2,3.4345,3.081356,0.44088,0.376599,0.335853,0.271589
3,2.8465,2.56909,0.604033,0.559976,0.500492,0.505995
4,2.3816,2.227874,0.652612,0.55117,0.551852,0.546151
5,2.0836,2.024293,0.676444,0.568146,0.579663,0.570897
6,1.8374,1.842542,0.711274,0.601685,0.605202,0.601769
7,1.6541,1.751916,0.738772,0.634484,0.624668,0.627113
8,1.5133,1.669381,0.739688,0.624257,0.635611,0.628672
9,1.4235,1.602686,0.746104,0.633822,0.638478,0.633143
10,1.3163,1.575953,0.756187,0.631067,0.648929,0.638805


[I 2025-03-22 03:11:48,635] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00012862788348576466, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.847,3.668639,0.333639,0.206322,0.244245,0.174113
2,3.4931,3.162117,0.414299,0.369557,0.313692,0.230247
3,2.9312,2.672197,0.546288,0.552471,0.436969,0.431201
4,2.4888,2.32444,0.636114,0.538005,0.536917,0.531443
5,2.1955,2.117619,0.660862,0.554425,0.566068,0.557508


[I 2025-03-22 03:12:13,426] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0016152937714768707, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7476,1.722523,0.718607,0.617865,0.617158,0.60882
2,1.2534,1.252948,0.789184,0.666111,0.676075,0.668692
3,0.6958,0.939255,0.849679,0.714999,0.723071,0.717933
4,0.4274,0.845555,0.860678,0.725881,0.73209,0.727375
5,0.2641,0.775145,0.867094,0.868792,0.793117,0.813124


[I 2025-03-22 03:12:38,063] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0018513207473671099, 'weight_decay': 0.004, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6613,1.700155,0.711274,0.615513,0.61352,0.600423
2,1.2134,1.295055,0.782768,0.659624,0.67274,0.662008
3,0.6609,0.947046,0.845096,0.71503,0.719785,0.715588
4,0.3906,0.801316,0.868011,0.732591,0.737244,0.73363
5,0.238,0.752892,0.878093,0.87868,0.801875,0.821899
6,0.1483,0.737403,0.877177,0.883831,0.827398,0.847359
7,0.1071,0.700351,0.885426,0.89309,0.833007,0.854843
8,0.0816,0.698964,0.88176,0.889707,0.830719,0.852157
9,0.0698,0.712416,0.880843,0.889505,0.829302,0.851192
10,0.0646,0.70152,0.880843,0.887716,0.83042,0.851005


[I 2025-03-22 03:13:26,008] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.00017559280388301614, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7795,3.552938,0.375802,0.346423,0.276859,0.224792
2,3.2154,2.834262,0.505958,0.542476,0.398159,0.369316
3,2.5699,2.355961,0.624198,0.532894,0.53323,0.520238
4,2.1199,2.000286,0.690192,0.582526,0.587918,0.581177
5,1.8176,1.784149,0.725023,0.612467,0.619113,0.61383


[I 2025-03-22 03:13:51,894] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00027314446191377634, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.6881,3.219454,0.456462,0.392876,0.345661,0.293286
2,2.8147,2.329207,0.623281,0.5649,0.518553,0.528349
3,2.0124,1.823342,0.71494,0.609052,0.610128,0.603492
4,1.5369,1.603374,0.751604,0.650992,0.639703,0.64066
5,1.3255,1.449692,0.768103,0.647526,0.659115,0.651514
6,1.1448,1.365787,0.782768,0.654768,0.669442,0.661246
7,0.9915,1.284299,0.799267,0.674547,0.681671,0.677488
8,0.8863,1.268008,0.791017,0.662087,0.678709,0.668697
9,0.8172,1.219,0.804766,0.67391,0.687874,0.680055
10,0.7366,1.254056,0.794684,0.665214,0.680021,0.670757


[I 2025-03-22 03:14:52,852] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.002002782108203493, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6525,1.613879,0.730522,0.622296,0.627861,0.616746
2,1.1706,1.270227,0.791934,0.661506,0.681295,0.668012
3,0.6248,0.920418,0.853346,0.723256,0.725285,0.722581
4,0.3607,0.83179,0.861595,0.724358,0.734046,0.727655
5,0.2214,0.775466,0.874427,0.876189,0.797749,0.81896
6,0.1371,0.717351,0.877177,0.887207,0.8254,0.847794
7,0.0961,0.713449,0.87626,0.861625,0.825345,0.840061
8,0.0767,0.708614,0.878093,0.887891,0.8366,0.856467
9,0.0695,0.705695,0.875344,0.884732,0.824958,0.846628
10,0.0638,0.718225,0.872594,0.882569,0.832057,0.851354


[I 2025-03-22 03:15:41,697] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.004273112256652242, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2574,1.440598,0.780935,0.667296,0.662618,0.660826
2,0.8185,0.829216,0.863428,0.724827,0.733765,0.728833
3,0.3235,0.800008,0.872594,0.889054,0.814011,0.839044
4,0.1699,0.702635,0.883593,0.889732,0.825735,0.846609
5,0.1014,0.657516,0.887259,0.896456,0.835603,0.857778
6,0.0818,0.666943,0.885426,0.892592,0.834113,0.855065
7,0.0677,0.647591,0.891842,0.896148,0.840532,0.860232
8,0.0619,0.622732,0.891842,0.896889,0.840394,0.860551
9,0.0554,0.647135,0.885426,0.892648,0.834299,0.855309
10,0.0533,0.656244,0.887259,0.89383,0.836042,0.856713


[I 2025-03-22 03:17:24,058] Trial 63 finished with value: 0.8600685529617548 and parameters: {'learning_rate': 0.004273112256652242, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}. Best is trial 31 with value: 0.8642290841521022.


Trial 64 with params: {'learning_rate': 0.002455355254460489, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5259,1.569988,0.744271,0.642322,0.636877,0.631777
2,1.087,1.138104,0.812099,0.680244,0.695967,0.685412
3,0.5244,0.832882,0.868011,0.730948,0.738503,0.733819
4,0.2764,0.836539,0.867094,0.895491,0.77462,0.794146
5,0.1635,0.742181,0.880843,0.840382,0.812631,0.823175
6,0.1093,0.739163,0.882676,0.890279,0.820873,0.844509
7,0.088,0.685824,0.886343,0.880908,0.834222,0.852283
8,0.0736,0.693759,0.88176,0.867013,0.830095,0.845214
9,0.0633,0.677427,0.885426,0.891527,0.833999,0.854558
10,0.0598,0.678638,0.887259,0.881769,0.835365,0.853081


[I 2025-03-22 03:18:14,538] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.004451165715697028, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3086,1.320468,0.793767,0.669625,0.679393,0.67304
2,0.8429,0.864306,0.856095,0.7224,0.728412,0.724587
3,0.3282,0.738054,0.880843,0.875696,0.820501,0.840456
4,0.1627,0.732408,0.88451,0.895298,0.832866,0.855824
5,0.1053,0.662636,0.887259,0.896739,0.835698,0.857568
6,0.0814,0.681699,0.88176,0.893305,0.840867,0.861215
7,0.0721,0.665807,0.88451,0.881033,0.844233,0.858989
8,0.063,0.667383,0.88176,0.892659,0.841259,0.861127
9,0.057,0.651243,0.886343,0.894176,0.835832,0.85685
10,0.0531,0.652583,0.886343,0.895597,0.845384,0.864649


[I 2025-03-22 03:20:05,948] Trial 65 finished with value: 0.8665942877370917 and parameters: {'learning_rate': 0.004451165715697028, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 65 with value: 0.8665942877370917.


Trial 66 with params: {'learning_rate': 0.003197786449053236, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2691,1.447715,0.761687,0.661133,0.645344,0.646229
2,0.8951,0.886349,0.858845,0.724002,0.73013,0.725485
3,0.3923,0.856753,0.859762,0.733976,0.730429,0.72953
4,0.2368,0.808048,0.870761,0.866759,0.802837,0.824454
5,0.136,0.665313,0.889093,0.89551,0.837625,0.858439
6,0.0956,0.672788,0.886343,0.892066,0.825921,0.847951
7,0.0748,0.649351,0.892759,0.899333,0.839789,0.861485
8,0.0669,0.633777,0.893676,0.900581,0.84084,0.862566
9,0.0617,0.649407,0.889093,0.894339,0.837754,0.85796
10,0.057,0.63847,0.891842,0.897712,0.83892,0.86027


[I 2025-03-22 03:21:28,156] Trial 66 finished with value: 0.8606767578737161 and parameters: {'learning_rate': 0.003197786449053236, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 65 with value: 0.8665942877370917.


Trial 67 with params: {'learning_rate': 0.0034980710998842248, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2039,1.393038,0.779102,0.667308,0.662247,0.660626
2,0.846,0.909227,0.851512,0.721756,0.725846,0.719759
3,0.3602,0.828523,0.863428,0.863947,0.772859,0.791038
4,0.1879,0.813591,0.867094,0.875883,0.792624,0.815563
5,0.1342,0.694824,0.878093,0.883705,0.82019,0.84095
6,0.0977,0.658118,0.882676,0.890691,0.822557,0.845573
7,0.0783,0.665377,0.880843,0.88926,0.819921,0.843621
8,0.0663,0.640994,0.889093,0.894893,0.837259,0.85801
9,0.0617,0.649377,0.886343,0.892209,0.825547,0.848063
10,0.0558,0.649627,0.889093,0.892661,0.828365,0.84972


[I 2025-03-22 03:23:03,144] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0002492428287138547, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7335,3.337713,0.474794,0.419947,0.35806,0.310989
2,2.9228,2.45175,0.613199,0.549724,0.509158,0.515625
3,2.1499,1.945556,0.696609,0.59826,0.590945,0.587483
4,1.663,1.654824,0.751604,0.64605,0.638393,0.638724
5,1.4217,1.51077,0.76077,0.648941,0.650112,0.646568
6,1.2282,1.41346,0.771769,0.646062,0.661785,0.652585
7,1.0666,1.338001,0.788268,0.662591,0.673398,0.667053
8,0.9565,1.302937,0.781852,0.656566,0.670351,0.661677
9,0.8829,1.248623,0.800183,0.670849,0.683939,0.676622
10,0.7983,1.26755,0.791934,0.662595,0.677421,0.668453


[I 2025-03-22 03:24:21,107] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9013,3.744221,0.340055,0.18532,0.249102,0.176716
2,3.6879,3.558096,0.405133,0.22266,0.30761,0.205284
3,3.4012,3.166542,0.422548,0.372017,0.319335,0.246157
4,3.0347,2.875819,0.499542,0.557482,0.388736,0.357563
5,2.7875,2.646543,0.570119,0.540474,0.466967,0.469708


[I 2025-03-22 03:24:45,855] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.004228218944317702, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2652,1.3664,0.778185,0.65847,0.66648,0.658948
2,0.8437,0.90141,0.848763,0.717478,0.722337,0.718881
3,0.3496,0.760002,0.870761,0.845936,0.775788,0.793129
4,0.1675,0.721438,0.87626,0.889104,0.81719,0.841658
5,0.1047,0.650211,0.887259,0.894958,0.835343,0.857062
6,0.0847,0.669351,0.88176,0.890765,0.831711,0.852986
7,0.0733,0.667276,0.879927,0.887615,0.831961,0.851592
8,0.0616,0.671575,0.878093,0.874512,0.829432,0.846724
9,0.0584,0.645447,0.88451,0.880482,0.834429,0.852143
10,0.0532,0.649263,0.882676,0.877104,0.833516,0.849983


[I 2025-03-22 03:25:36,023] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0008351364102847169, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1397,2.19821,0.643446,0.570402,0.545894,0.542008
2,1.6762,1.396827,0.783685,0.67207,0.666854,0.666231
3,1.017,1.138885,0.813932,0.69152,0.693407,0.690359
4,0.7306,1.060237,0.827681,0.706887,0.70477,0.702967
5,0.5527,0.939706,0.84418,0.70724,0.719717,0.713176
6,0.4058,0.936383,0.845096,0.704894,0.722496,0.711571
7,0.3126,0.872336,0.856095,0.721129,0.727495,0.724018
8,0.232,0.877073,0.858845,0.85162,0.768912,0.783887
9,0.1838,0.858839,0.856095,0.862152,0.783354,0.804551
10,0.1394,0.833151,0.860678,0.872703,0.813114,0.834728


[I 2025-03-22 03:26:30,912] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.00283384406712107, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3665,1.45517,0.756187,0.647684,0.643901,0.641046
2,0.943,0.947187,0.839597,0.705825,0.715934,0.708866
3,0.4534,0.806071,0.874427,0.737978,0.743987,0.74003
4,0.2423,0.84006,0.867094,0.858264,0.804264,0.820887
5,0.1462,0.705846,0.878093,0.866362,0.811259,0.829011
6,0.1047,0.699611,0.885426,0.879939,0.834285,0.851804
7,0.0775,0.672533,0.883593,0.877767,0.842626,0.856574
8,0.0702,0.663584,0.886343,0.881199,0.835139,0.852892
9,0.0632,0.645527,0.883593,0.890689,0.832642,0.853565
10,0.0586,0.642261,0.886343,0.894082,0.834801,0.856271


[I 2025-03-22 03:27:50,085] Trial 72 finished with value: 0.8537094776258124 and parameters: {'learning_rate': 0.00283384406712107, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 65 with value: 0.8665942877370917.


Trial 73 with params: {'learning_rate': 0.0022389020380338006, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5848,1.589898,0.731439,0.626416,0.629512,0.618568
2,1.1431,1.146056,0.809349,0.679646,0.692896,0.684178
3,0.5782,0.856009,0.863428,0.725845,0.73442,0.729658
4,0.3196,0.809186,0.863428,0.891892,0.753897,0.763578
5,0.1924,0.701205,0.890009,0.871311,0.811236,0.828599
6,0.1205,0.679635,0.890926,0.882442,0.838601,0.855353
7,0.0856,0.692908,0.88451,0.890694,0.843244,0.861082
8,0.075,0.690284,0.880843,0.889833,0.829704,0.851558
9,0.0661,0.671467,0.88451,0.89123,0.843227,0.861202
10,0.0613,0.682463,0.882676,0.891058,0.840457,0.85972


[I 2025-03-22 03:29:22,984] Trial 73 finished with value: 0.8594004772314153 and parameters: {'learning_rate': 0.0022389020380338006, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 65 with value: 0.8665942877370917.


Trial 74 with params: {'learning_rate': 0.0032050907092305556, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4836,1.420374,0.767186,0.664547,0.654301,0.654535
2,0.9415,0.94877,0.837764,0.70765,0.713916,0.709131
3,0.4143,0.867055,0.858845,0.732154,0.729542,0.728548
4,0.2178,0.816775,0.871677,0.879148,0.797168,0.819696
5,0.1373,0.752875,0.869844,0.880227,0.814656,0.835595
6,0.0962,0.68231,0.87901,0.886731,0.82013,0.842649
7,0.0737,0.683457,0.87901,0.886494,0.820466,0.842678
8,0.064,0.679989,0.880843,0.886107,0.822877,0.843549
9,0.0578,0.666661,0.88451,0.890521,0.825098,0.847017
10,0.0562,0.658688,0.886343,0.891899,0.827742,0.848949


[I 2025-03-22 03:30:19,157] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0012369393753215658, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.872,1.844977,0.708524,0.595756,0.608842,0.601794
2,1.3792,1.247337,0.791934,0.665612,0.677275,0.670156
3,0.8031,1.033403,0.839597,0.709708,0.715135,0.710312
4,0.5398,0.926076,0.852429,0.717332,0.725957,0.720196
5,0.377,0.826832,0.857012,0.719281,0.728663,0.723715
6,0.2468,0.804941,0.862511,0.866171,0.787777,0.80938
7,0.1646,0.780767,0.874427,0.883909,0.832952,0.851757
8,0.119,0.744944,0.875344,0.871796,0.834555,0.849336
9,0.09,0.758114,0.872594,0.882595,0.832911,0.851712
10,0.08,0.77899,0.868011,0.879737,0.827566,0.847533


[I 2025-03-22 03:31:09,582] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.0033466866493613912, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2007,1.362083,0.775435,0.666914,0.658796,0.659773
2,0.8805,0.9419,0.840513,0.709673,0.718076,0.70945
3,0.3831,0.825021,0.864345,0.815307,0.745644,0.750089
4,0.2138,0.80768,0.870761,0.870136,0.821648,0.840251
5,0.1171,0.706623,0.882676,0.891698,0.832708,0.853974
6,0.0887,0.738385,0.88176,0.893607,0.822586,0.846629
7,0.079,0.698256,0.88176,0.879066,0.83085,0.849561
8,0.0673,0.689069,0.882676,0.893049,0.831917,0.854145
9,0.0621,0.670684,0.885426,0.893836,0.834565,0.856097
10,0.0572,0.70542,0.883593,0.890724,0.83344,0.853987


[I 2025-03-22 03:32:01,021] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0033738460860100747, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1948,1.344116,0.782768,0.667467,0.667542,0.663596
2,0.8703,0.909606,0.850596,0.712421,0.726439,0.717279
3,0.3836,0.845881,0.862511,0.732138,0.734739,0.731513
4,0.2079,0.819812,0.874427,0.86361,0.798159,0.817971
5,0.1259,0.711986,0.88451,0.893293,0.83501,0.856033
6,0.0922,0.706038,0.88176,0.888818,0.822445,0.844524
7,0.0816,0.681524,0.883593,0.893598,0.823,0.847201
8,0.0673,0.683251,0.886343,0.893324,0.836256,0.856504
9,0.0616,0.678888,0.885426,0.893639,0.824496,0.848147
10,0.0593,0.692525,0.882676,0.88943,0.833007,0.853132


[I 2025-03-22 03:33:34,075] Trial 77 finished with value: 0.8557612909325338 and parameters: {'learning_rate': 0.0033738460860100747, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 65 with value: 0.8665942877370917.


Trial 78 with params: {'learning_rate': 0.0028282847560266284, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3671,1.452091,0.752521,0.645916,0.639964,0.637868
2,0.946,0.964744,0.837764,0.7066,0.714336,0.708267
3,0.4545,0.794034,0.871677,0.73542,0.742111,0.73775
4,0.2382,0.847907,0.867094,0.875825,0.812415,0.83233
5,0.1393,0.67856,0.88451,0.888749,0.814767,0.837569
6,0.0939,0.703484,0.882676,0.875047,0.823157,0.841658
7,0.0727,0.686159,0.879927,0.886899,0.829802,0.850303
8,0.0696,0.681136,0.88451,0.879525,0.833778,0.851416
9,0.0621,0.665327,0.888176,0.895056,0.836013,0.857456
10,0.0578,0.660557,0.883593,0.890339,0.832645,0.853448


[I 2025-03-22 03:35:08,665] Trial 78 finished with value: 0.8544108320905738 and parameters: {'learning_rate': 0.0028282847560266284, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.0}. Best is trial 65 with value: 0.8665942877370917.


Trial 79 with params: {'learning_rate': 5.902380787515226e-05, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9321,3.807801,0.337305,0.368035,0.247028,0.177448
2,3.7447,3.653099,0.409716,0.216843,0.311,0.212708
3,3.5994,3.442383,0.456462,0.233665,0.342607,0.277101
4,3.3098,3.135475,0.444546,0.38631,0.335914,0.270969
5,3.0671,2.940081,0.472044,0.562925,0.362622,0.314601


[I 2025-03-22 03:35:32,459] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0012132712917754616, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.887,1.849017,0.707608,0.597312,0.606956,0.601695
2,1.3863,1.260252,0.792851,0.66687,0.678519,0.671129
3,0.816,1.029869,0.835014,0.705576,0.711732,0.706428
4,0.5465,0.927552,0.852429,0.717461,0.725361,0.719932
5,0.3771,0.835222,0.857012,0.719461,0.728265,0.723624
6,0.2496,0.814079,0.858845,0.865646,0.793832,0.81564
7,0.1666,0.78392,0.875344,0.884651,0.816067,0.838353
8,0.1179,0.747418,0.878093,0.88684,0.836891,0.855785
9,0.0973,0.745325,0.869844,0.881449,0.82972,0.84966
10,0.0816,0.76274,0.872594,0.882724,0.831878,0.85144


[I 2025-03-22 03:37:24,794] Trial 80 finished with value: 0.8451914262316665 and parameters: {'learning_rate': 0.0012132712917754616, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 65 with value: 0.8665942877370917.


Trial 81 with params: {'learning_rate': 0.004838713174448739, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2796,1.420681,0.774519,0.661287,0.662597,0.654709
2,0.8178,0.869523,0.852429,0.717192,0.727847,0.720134
3,0.3199,0.792418,0.869844,0.885195,0.803115,0.828126
4,0.1684,0.707668,0.886343,0.896986,0.825078,0.849559
5,0.1054,0.658176,0.891842,0.902757,0.848647,0.869689
6,0.0812,0.687943,0.880843,0.891331,0.841421,0.860393
7,0.0708,0.63898,0.891842,0.899408,0.84118,0.861984
8,0.064,0.656973,0.886343,0.896994,0.835512,0.857773
9,0.0579,0.647779,0.890009,0.897464,0.848247,0.86706
10,0.0552,0.649886,0.888176,0.895628,0.836968,0.858061


[I 2025-03-22 03:39:08,876] Trial 81 finished with value: 0.8602003650306572 and parameters: {'learning_rate': 0.004838713174448739, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 65 with value: 0.8665942877370917.


Trial 82 with params: {'learning_rate': 0.0017580436761981916, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6756,1.764753,0.693859,0.612817,0.597396,0.588987
2,1.2574,1.206054,0.79835,0.675691,0.683476,0.677411
3,0.6721,0.962185,0.852429,0.72275,0.725564,0.72257
4,0.4067,0.811055,0.860678,0.726262,0.731808,0.727429
5,0.2545,0.755853,0.870761,0.878749,0.803309,0.826557
6,0.1516,0.737007,0.879927,0.888586,0.828366,0.850049
7,0.1092,0.671345,0.891842,0.898633,0.838629,0.860417
8,0.0837,0.681204,0.883593,0.891099,0.841389,0.860496
9,0.0719,0.697773,0.87901,0.887447,0.827942,0.849588
10,0.0667,0.691075,0.880843,0.888442,0.830374,0.851234


[I 2025-03-22 03:40:31,683] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0018018937382130724, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6658,1.716107,0.703025,0.612449,0.607044,0.593458
2,1.215,1.289133,0.784601,0.661196,0.673935,0.663478
3,0.6565,0.958265,0.848763,0.720654,0.723126,0.719755
4,0.3978,0.807755,0.863428,0.728291,0.734997,0.730011
5,0.2511,0.764233,0.875344,0.878688,0.799158,0.820358


[I 2025-03-22 03:40:57,040] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.0010928985745709764, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9309,1.900989,0.692942,0.586887,0.596001,0.590497
2,1.4476,1.270811,0.794684,0.674887,0.677609,0.675091
3,0.8589,1.084127,0.830431,0.704501,0.707358,0.703873
4,0.5894,0.947935,0.848763,0.717673,0.721992,0.718153
5,0.4167,0.834059,0.855179,0.717627,0.727032,0.722236
6,0.2882,0.840473,0.849679,0.835527,0.751549,0.762973
7,0.2079,0.767807,0.874427,0.879191,0.816759,0.836671
8,0.1468,0.810033,0.867094,0.879401,0.819244,0.840313
9,0.1171,0.773379,0.864345,0.876228,0.82656,0.845254
10,0.0923,0.772467,0.870761,0.881533,0.831028,0.850304


[I 2025-03-22 03:41:47,967] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.003492789868559147, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2045,1.384734,0.779102,0.665606,0.663368,0.660835
2,0.8482,0.897253,0.857012,0.722996,0.729837,0.723843
3,0.3591,0.848197,0.864345,0.866709,0.77261,0.791679
4,0.1924,0.804878,0.872594,0.865351,0.795629,0.817343
5,0.1252,0.679448,0.885426,0.893317,0.824379,0.847854
6,0.101,0.68944,0.885426,0.891848,0.825586,0.847875
7,0.0735,0.67404,0.883593,0.888445,0.82417,0.845476
8,0.0634,0.654809,0.891842,0.8963,0.830465,0.8526
9,0.0595,0.675598,0.880843,0.88578,0.822584,0.843328
10,0.0552,0.660238,0.879927,0.886217,0.820925,0.842793


[I 2025-03-22 03:42:40,947] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0047995795305536375, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2864,1.423926,0.769019,0.658701,0.657534,0.649921
2,0.8257,0.856513,0.858845,0.722077,0.732549,0.725613
3,0.3162,0.773734,0.878093,0.887928,0.800093,0.82516
4,0.163,0.690832,0.879927,0.878954,0.830145,0.848542
5,0.1097,0.630164,0.898258,0.905657,0.854748,0.874362
6,0.0865,0.637177,0.893676,0.902778,0.850441,0.870704
7,0.0726,0.628361,0.889093,0.885661,0.848022,0.863086
8,0.0612,0.613869,0.894592,0.901611,0.843025,0.864192
9,0.0575,0.607625,0.891842,0.899301,0.839887,0.861397
10,0.0544,0.623347,0.887259,0.896189,0.845801,0.865121


[I 2025-03-22 03:44:02,312] Trial 86 finished with value: 0.8693899540096316 and parameters: {'learning_rate': 0.0047995795305536375, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 87 with params: {'learning_rate': 0.004355241471291193, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2567,1.428636,0.776352,0.667057,0.659041,0.658247
2,0.8034,0.807165,0.868011,0.728094,0.738207,0.732588
3,0.3298,0.721703,0.872594,0.883205,0.80555,0.82946
4,0.1634,0.684174,0.890926,0.899727,0.83963,0.861364
5,0.0972,0.633954,0.893676,0.900732,0.832335,0.855611
6,0.0753,0.650007,0.890926,0.897476,0.840211,0.860419
7,0.0671,0.641343,0.892759,0.9003,0.840541,0.86225
8,0.0626,0.627505,0.887259,0.894456,0.83614,0.857045
9,0.0564,0.629051,0.890926,0.898644,0.838706,0.860405
10,0.0526,0.638066,0.888176,0.895925,0.836644,0.858032


[I 2025-03-22 03:45:16,056] Trial 87 finished with value: 0.8603435046558596 and parameters: {'learning_rate': 0.004355241471291193, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 88 with params: {'learning_rate': 0.0016450339735195022, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7594,1.702705,0.72319,0.616315,0.61933,0.611596
2,1.2557,1.245517,0.7956,0.674247,0.680271,0.674977
3,0.6977,0.935318,0.84143,0.711036,0.716793,0.712619
4,0.4276,0.832128,0.864345,0.727893,0.735035,0.729803
5,0.2621,0.765182,0.870761,0.873927,0.794943,0.816562


[I 2025-03-22 03:46:17,522] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0018583984330277988, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6959,1.736062,0.697525,0.61257,0.599903,0.592011
2,1.2238,1.240013,0.791934,0.667375,0.678746,0.670471
3,0.6529,0.903053,0.855179,0.719481,0.728416,0.723164
4,0.3883,0.84794,0.855179,0.719939,0.729319,0.722791
5,0.239,0.739153,0.88176,0.882665,0.804356,0.825437
6,0.1438,0.706168,0.88176,0.890663,0.830005,0.852036
7,0.0978,0.669463,0.886343,0.893295,0.833698,0.855366
8,0.0771,0.688599,0.890926,0.896976,0.83841,0.859601
9,0.0649,0.680702,0.88451,0.890571,0.833629,0.853918
10,0.0621,0.673617,0.888176,0.892742,0.836752,0.856703


[I 2025-03-22 03:47:48,102] Trial 89 finished with value: 0.8625701725345238 and parameters: {'learning_rate': 0.0018583984330277988, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 90 with params: {'learning_rate': 0.0011417628975969986, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9661,1.916871,0.689276,0.595936,0.588674,0.58841
2,1.4455,1.291317,0.792851,0.669661,0.677286,0.671531
3,0.861,1.046585,0.837764,0.709868,0.714027,0.710074
4,0.5691,0.918535,0.851512,0.718662,0.725234,0.720606
5,0.3894,0.828623,0.857929,0.7174,0.730781,0.72375
6,0.2671,0.816274,0.859762,0.855856,0.767036,0.785266
7,0.1761,0.764631,0.877177,0.882632,0.809254,0.831228
8,0.1245,0.75087,0.872594,0.868975,0.822957,0.840598
9,0.0979,0.782649,0.865261,0.877327,0.82668,0.845775
10,0.0855,0.766296,0.869844,0.880839,0.829805,0.849372


[I 2025-03-22 03:48:59,667] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0015870895951564893, 'weight_decay': 0.003, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7796,1.711307,0.716774,0.611463,0.615619,0.606885
2,1.2686,1.26367,0.791934,0.674787,0.676211,0.673181
3,0.735,0.966653,0.845096,0.713695,0.719225,0.714785
4,0.4439,0.858695,0.856095,0.719775,0.729301,0.723017
5,0.27,0.799693,0.866178,0.866137,0.781809,0.80225


[I 2025-03-22 03:49:38,113] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0026880121098462016, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4999,1.552074,0.757104,0.671823,0.644067,0.645735
2,1.0112,0.982597,0.831347,0.699201,0.708972,0.702515
3,0.4657,0.911205,0.852429,0.729026,0.723463,0.723811
4,0.2452,0.782175,0.869844,0.856507,0.796018,0.813774
5,0.1476,0.730914,0.877177,0.888503,0.817457,0.841732
6,0.1004,0.716152,0.888176,0.894873,0.835746,0.857176
7,0.077,0.691707,0.885426,0.89332,0.833886,0.85551
8,0.0666,0.701434,0.882676,0.890622,0.830764,0.852517
9,0.0586,0.694271,0.883593,0.892444,0.832107,0.854039
10,0.0544,0.703102,0.883593,0.891737,0.832068,0.85381


[I 2025-03-22 03:51:08,400] Trial 92 finished with value: 0.8531121231108217 and parameters: {'learning_rate': 0.0026880121098462016, 'weight_decay': 0.006, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 93 with params: {'learning_rate': 0.00460457625010268, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2747,1.388243,0.784601,0.668477,0.666814,0.664068
2,0.8113,0.838758,0.860678,0.72354,0.732701,0.727335
3,0.3119,0.797481,0.873511,0.885639,0.80483,0.83041
4,0.1657,0.714872,0.87901,0.887392,0.820508,0.842976
5,0.1071,0.671356,0.887259,0.900583,0.834539,0.858706
6,0.0916,0.672565,0.891842,0.902786,0.8481,0.869288
7,0.0705,0.635642,0.887259,0.880943,0.836608,0.853456
8,0.0613,0.624156,0.889093,0.884436,0.837811,0.855769
9,0.0566,0.63428,0.885426,0.881171,0.833834,0.85203
10,0.0535,0.62877,0.888176,0.89505,0.8367,0.857807


[I 2025-03-22 03:52:46,247] Trial 93 finished with value: 0.8572261474254376 and parameters: {'learning_rate': 0.00460457625010268, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 94 with params: {'learning_rate': 0.002131765032515868, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6125,1.638037,0.719523,0.618382,0.620789,0.605181
2,1.1598,1.194368,0.796517,0.669201,0.683602,0.673318
3,0.5954,0.881098,0.857012,0.722554,0.729044,0.725132
4,0.3342,0.813351,0.863428,0.891456,0.754514,0.763141
5,0.201,0.740696,0.879927,0.881074,0.802966,0.823848
6,0.1285,0.742156,0.879927,0.864075,0.829216,0.843245
7,0.0901,0.701816,0.882676,0.889111,0.831378,0.852115
8,0.0747,0.736579,0.873511,0.885119,0.813661,0.838029
9,0.0685,0.685249,0.88176,0.88796,0.831575,0.851533
10,0.0623,0.703522,0.87901,0.887698,0.828786,0.849906


[I 2025-03-22 03:53:40,968] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0007756146753848378, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2817,2.379801,0.611366,0.577458,0.507687,0.513671
2,1.8007,1.458015,0.76352,0.65054,0.650746,0.648803
3,1.0837,1.162023,0.809349,0.689337,0.689198,0.687061
4,0.7831,1.081935,0.825848,0.705018,0.703644,0.701709
5,0.5974,0.959828,0.843263,0.70648,0.719548,0.712062


[I 2025-03-22 03:54:18,163] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.00015972356535382792, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7935,3.590064,0.363886,0.355006,0.267624,0.213081
2,3.2975,2.94035,0.486709,0.552281,0.378438,0.343965
3,2.688,2.452827,0.613199,0.53432,0.519307,0.51014
4,2.2418,2.097991,0.667278,0.561746,0.567527,0.560063
5,1.9316,1.879554,0.707608,0.594945,0.606851,0.598745
6,1.682,1.70582,0.736939,0.621216,0.628193,0.624462
7,1.4959,1.626242,0.753437,0.641741,0.640463,0.639698
8,1.355,1.545717,0.765353,0.647513,0.656245,0.650676
9,1.2741,1.494789,0.767186,0.650559,0.656636,0.651257
10,1.1769,1.465584,0.774519,0.64838,0.663201,0.655369


[I 2025-03-22 03:55:12,706] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0038486371590285046, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3351,1.434807,0.780935,0.675163,0.66486,0.664982
2,0.8989,0.901642,0.857929,0.722956,0.729224,0.725062
3,0.3795,0.785793,0.869844,0.739033,0.739186,0.737294
4,0.1892,0.708195,0.888176,0.897277,0.828138,0.851738
5,0.1112,0.654189,0.890009,0.896946,0.828939,0.851997
6,0.0891,0.652768,0.889093,0.880429,0.829211,0.847661
7,0.0762,0.648691,0.892759,0.896936,0.83198,0.853655
8,0.0668,0.644317,0.890926,0.89565,0.830344,0.852104
9,0.0607,0.614301,0.892759,0.898062,0.831671,0.854068
10,0.0552,0.627844,0.895509,0.898167,0.833737,0.855183


[I 2025-03-22 03:56:29,199] Trial 97 finished with value: 0.8649826835568524 and parameters: {'learning_rate': 0.0038486371590285046, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 98 with params: {'learning_rate': 0.003385837175358577, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3904,1.67594,0.742438,0.650534,0.63079,0.628454
2,0.9273,1.051629,0.820348,0.691522,0.700759,0.693131
3,0.4126,0.844134,0.857012,0.728173,0.72789,0.726521
4,0.218,0.800855,0.874427,0.842746,0.770695,0.786256
5,0.1284,0.722321,0.880843,0.868893,0.831599,0.8462
6,0.1002,0.710275,0.880843,0.891049,0.830836,0.852661
7,0.0823,0.710201,0.882676,0.889464,0.833446,0.852776
8,0.0712,0.642087,0.890926,0.885593,0.839408,0.857123
9,0.0623,0.641659,0.890926,0.897926,0.838787,0.860204
10,0.057,0.642205,0.892759,0.898737,0.840925,0.861705


[I 2025-03-22 03:57:55,039] Trial 98 finished with value: 0.8573683172741645 and parameters: {'learning_rate': 0.003385837175358577, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 3.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 99 with params: {'learning_rate': 0.0035086507650057357, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.202,1.351046,0.786434,0.671903,0.669217,0.667325
2,0.8489,0.908252,0.848763,0.720202,0.722654,0.717738
3,0.3623,0.883434,0.860678,0.869881,0.77894,0.800755
4,0.1987,0.846285,0.874427,0.889318,0.816431,0.840579
5,0.1309,0.700476,0.887259,0.897646,0.845035,0.865309
6,0.0865,0.677323,0.88451,0.892873,0.824655,0.847584
7,0.0742,0.693685,0.88176,0.893149,0.830596,0.853476
8,0.0653,0.676869,0.883593,0.891629,0.83291,0.853987
9,0.0593,0.66955,0.88451,0.894355,0.833265,0.855619
10,0.0546,0.680357,0.880843,0.887819,0.831908,0.851786


[I 2025-03-22 03:59:36,103] Trial 99 finished with value: 0.8582025470672402 and parameters: {'learning_rate': 0.0035086507650057357, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 100 with params: {'learning_rate': 0.0030018513458944154, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4983,1.437691,0.772686,0.671955,0.658493,0.659567
2,0.9679,0.968073,0.833181,0.70649,0.711304,0.704995
3,0.4354,0.867558,0.856095,0.730222,0.727087,0.725991
4,0.2305,0.767781,0.871677,0.875744,0.796343,0.818085
5,0.1257,0.660862,0.887259,0.894463,0.826409,0.849501
6,0.0995,0.700454,0.880843,0.881264,0.814038,0.833274
7,0.0782,0.673229,0.87901,0.887346,0.829829,0.850513
8,0.0668,0.661683,0.88451,0.893045,0.8331,0.854912
9,0.0593,0.658444,0.885426,0.892288,0.835122,0.855414
10,0.0573,0.664697,0.88451,0.891523,0.83375,0.85453


[I 2025-03-22 04:00:51,568] Trial 100 finished with value: 0.8530893914562485 and parameters: {'learning_rate': 0.0030018513458944154, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 101 with params: {'learning_rate': 0.0027242047755666717, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5068,1.518945,0.761687,0.677293,0.644409,0.652126
2,1.0315,1.092483,0.820348,0.694847,0.702033,0.692499
3,0.4977,0.884862,0.850596,0.724344,0.721987,0.721369
4,0.2748,0.857623,0.857929,0.845796,0.776578,0.795154
5,0.1577,0.715915,0.880843,0.876759,0.831714,0.848735
6,0.1019,0.714924,0.883593,0.878876,0.833272,0.850666
7,0.0797,0.699159,0.879927,0.874781,0.830077,0.847183
8,0.0719,0.679728,0.88451,0.892753,0.832761,0.854597
9,0.0637,0.678504,0.882676,0.89025,0.833376,0.853518
10,0.0591,0.685418,0.88176,0.891,0.840086,0.859641


[I 2025-03-22 04:02:26,629] Trial 101 finished with value: 0.8578522608420703 and parameters: {'learning_rate': 0.0027242047755666717, 'weight_decay': 0.002, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 102 with params: {'learning_rate': 0.003780562116917206, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2386,1.427868,0.762603,0.647937,0.654634,0.643253
2,0.8441,0.879838,0.854262,0.72015,0.726257,0.721594
3,0.3468,0.782051,0.866178,0.872271,0.792147,0.81414
4,0.1782,0.801368,0.87901,0.892878,0.829425,0.851968
5,0.1145,0.686607,0.885426,0.894132,0.834473,0.856094
6,0.0858,0.685475,0.88176,0.890844,0.821971,0.84538
7,0.0703,0.685218,0.882676,0.892997,0.832027,0.854079
8,0.062,0.665684,0.883593,0.894475,0.84277,0.862752
9,0.0564,0.671145,0.883593,0.893677,0.842492,0.862167
10,0.052,0.654514,0.885426,0.893659,0.835029,0.856243


[I 2025-03-22 04:03:57,175] Trial 102 finished with value: 0.8547983568781189 and parameters: {'learning_rate': 0.003780562116917206, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 103 with params: {'learning_rate': 0.00027009583847554473, 'weight_decay': 0.005, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.7002,3.238661,0.471127,0.401712,0.358151,0.311981
2,2.8293,2.351894,0.615949,0.561381,0.510849,0.519887
3,2.0369,1.831224,0.72319,0.61496,0.615255,0.611
4,1.5462,1.616216,0.747021,0.647406,0.635204,0.635762
5,1.3304,1.452445,0.762603,0.645454,0.653655,0.647209


[I 2025-03-22 04:04:51,366] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.004675550136021236, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.9, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3022,1.472067,0.76352,0.660612,0.654497,0.644212
2,0.8397,0.853537,0.860678,0.721029,0.73482,0.726715
3,0.3279,0.783705,0.872594,0.864248,0.796041,0.816939
4,0.1676,0.707773,0.88176,0.890942,0.822439,0.844996
5,0.102,0.691688,0.890009,0.898645,0.829969,0.852922
6,0.087,0.674621,0.880843,0.889504,0.823245,0.845519
7,0.0705,0.668536,0.889093,0.895101,0.828824,0.850982
8,0.0656,0.663779,0.888176,0.892763,0.828577,0.849867
9,0.0582,0.673937,0.887259,0.892256,0.828202,0.849311
10,0.055,0.660651,0.886343,0.890969,0.827456,0.848398


[I 2025-03-22 04:05:45,041] Trial 104 pruned. 


Trial 105 with params: {'learning_rate': 0.001394113520827695, 'weight_decay': 0.002, 'warmup_steps': 4, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8955,1.753922,0.713107,0.618151,0.607676,0.608845
2,1.3221,1.293549,0.782768,0.660808,0.671041,0.661939
3,0.7749,0.987323,0.837764,0.707673,0.714721,0.708724
4,0.4958,0.913979,0.854262,0.720015,0.728145,0.722235
5,0.3293,0.809574,0.861595,0.835018,0.750868,0.760729


[I 2025-03-22 04:06:35,922] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0038055075251044504, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3423,1.44323,0.765353,0.665059,0.651997,0.652965
2,0.9049,0.88128,0.854262,0.716942,0.728339,0.721323
3,0.3603,0.81635,0.859762,0.860162,0.768293,0.787553
4,0.1825,0.767463,0.871677,0.872099,0.815742,0.835374
5,0.117,0.6858,0.888176,0.897045,0.836475,0.858651
6,0.0856,0.663093,0.892759,0.902669,0.850124,0.870212
7,0.0705,0.646713,0.888176,0.896105,0.836965,0.858408
8,0.0634,0.644886,0.886343,0.893976,0.836276,0.856927
9,0.0548,0.644076,0.891842,0.898872,0.840642,0.86138
10,0.0512,0.63638,0.886343,0.894927,0.835774,0.857294


[I 2025-03-22 04:07:54,062] Trial 106 finished with value: 0.8588025882906501 and parameters: {'learning_rate': 0.0038055075251044504, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 107 with params: {'learning_rate': 0.002227558841449379, 'weight_decay': 0.008, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5883,1.593433,0.729606,0.624328,0.62813,0.616506
2,1.147,1.146495,0.805683,0.676829,0.690352,0.681094
3,0.5823,0.851017,0.862511,0.725316,0.733767,0.729153
4,0.322,0.797459,0.867094,0.895137,0.756607,0.766635
5,0.1958,0.70078,0.888176,0.870093,0.809652,0.827294


[I 2025-03-22 04:08:17,846] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0006599575952222034, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3243,2.494644,0.578368,0.579097,0.469672,0.476535
2,1.9442,1.550274,0.747021,0.636066,0.639554,0.635562
3,1.1764,1.238864,0.802016,0.681779,0.683309,0.680421
4,0.8706,1.119614,0.824015,0.702749,0.702181,0.700028
5,0.6877,1.0247,0.83593,0.698027,0.714713,0.704923


[I 2025-03-22 04:08:42,939] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0014523379566980315, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8154,1.743729,0.721357,0.622921,0.614748,0.613728
2,1.2998,1.251127,0.796517,0.674469,0.680767,0.675984
3,0.7563,1.00089,0.840513,0.710027,0.716752,0.711955
4,0.4811,0.862356,0.857929,0.722886,0.73029,0.72516
5,0.3045,0.792466,0.867094,0.867784,0.782528,0.803415
6,0.1982,0.782186,0.868928,0.853365,0.810263,0.82634
7,0.131,0.753066,0.875344,0.883531,0.825924,0.846047
8,0.097,0.719226,0.87626,0.885353,0.835388,0.854601
9,0.0765,0.735151,0.875344,0.884156,0.835981,0.853953
10,0.0692,0.741039,0.874427,0.883976,0.834692,0.853379


[I 2025-03-22 04:10:03,717] Trial 109 finished with value: 0.8511008518148137 and parameters: {'learning_rate': 0.0014523379566980315, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 5.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 110 with params: {'learning_rate': 0.001860723580698996, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7151,1.629462,0.735105,0.630931,0.627725,0.626209
2,1.1867,1.220238,0.802016,0.673124,0.687903,0.678264
3,0.6383,0.931743,0.846929,0.715704,0.721919,0.71715
4,0.3808,0.841633,0.860678,0.724169,0.73315,0.727287
5,0.2215,0.773147,0.868011,0.8699,0.793429,0.813953
6,0.1378,0.716822,0.882676,0.891309,0.839693,0.859547
7,0.0925,0.709101,0.87901,0.88614,0.828199,0.848885
8,0.0774,0.702337,0.878093,0.864057,0.827215,0.842325
9,0.0675,0.706912,0.879927,0.887326,0.828451,0.849741
10,0.063,0.703938,0.879927,0.886288,0.829554,0.849873


[I 2025-03-22 04:10:50,127] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.004011945706138424, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2676,1.408897,0.758937,0.646906,0.651354,0.64183
2,0.8597,0.920009,0.845096,0.713063,0.718415,0.714615
3,0.3537,0.761611,0.867094,0.850957,0.782716,0.801431
4,0.1687,0.74355,0.887259,0.899692,0.843488,0.865415
5,0.1061,0.665755,0.883593,0.892508,0.832141,0.854196
6,0.0848,0.713761,0.878093,0.886666,0.829286,0.849772
7,0.0739,0.673303,0.885426,0.892925,0.834501,0.855568
8,0.0642,0.662485,0.88451,0.891521,0.824655,0.847284
9,0.0586,0.651249,0.886343,0.894327,0.835092,0.856472
10,0.0553,0.66057,0.887259,0.89394,0.836664,0.85719


[I 2025-03-22 04:13:16,680] Trial 111 finished with value: 0.8571777561267729 and parameters: {'learning_rate': 0.004011945706138424, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 5.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 112 with params: {'learning_rate': 0.004220953417064143, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2653,1.471991,0.776352,0.669996,0.657756,0.658008
2,0.832,0.838948,0.862511,0.723068,0.733632,0.727781
3,0.3341,0.751017,0.877177,0.868635,0.809525,0.829102
4,0.1716,0.74216,0.879927,0.886848,0.812575,0.835351
5,0.1043,0.665841,0.888176,0.891777,0.819306,0.841379
6,0.0814,0.646047,0.890009,0.88661,0.848169,0.863846
7,0.0671,0.665854,0.888176,0.896857,0.837339,0.858842
8,0.0597,0.657534,0.887259,0.894823,0.836306,0.857474
9,0.0546,0.658981,0.886343,0.894712,0.835819,0.857016
10,0.0527,0.657333,0.886343,0.89279,0.836256,0.856459


[I 2025-03-22 04:14:36,470] Trial 112 finished with value: 0.8572675533133699 and parameters: {'learning_rate': 0.004220953417064143, 'weight_decay': 0.01, 'warmup_steps': 1, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 113 with params: {'learning_rate': 0.004606680527144465, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2724,1.384265,0.792851,0.677199,0.671556,0.670659
2,0.8016,0.839047,0.852429,0.718806,0.726717,0.720966
3,0.319,0.737161,0.878093,0.871972,0.809382,0.830672
4,0.1609,0.746187,0.87901,0.887809,0.82132,0.842833
5,0.1085,0.654615,0.892759,0.891174,0.849614,0.866293
6,0.0848,0.631016,0.894592,0.891703,0.840426,0.860455
7,0.0709,0.633752,0.889093,0.883853,0.837821,0.855456
8,0.0635,0.628614,0.890926,0.90111,0.838133,0.861179
9,0.0584,0.633037,0.890009,0.882954,0.839146,0.8557
10,0.0534,0.630502,0.889093,0.882667,0.837548,0.854853


[I 2025-03-22 04:15:23,741] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.004192786214067821, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.248,1.408548,0.780018,0.666347,0.664913,0.660323
2,0.8276,1.014633,0.832264,0.706193,0.71335,0.701618
3,0.3541,0.778782,0.870761,0.877165,0.785706,0.808442
4,0.1693,0.71327,0.880843,0.889186,0.832034,0.852115
5,0.1058,0.662365,0.889093,0.897684,0.828549,0.852124
6,0.0816,0.668457,0.886343,0.891399,0.827939,0.848889
7,0.0701,0.705059,0.878093,0.886135,0.819319,0.84176
8,0.0619,0.678736,0.877177,0.884934,0.819602,0.841481
9,0.0579,0.683118,0.883593,0.890873,0.824158,0.84672
10,0.055,0.674853,0.883593,0.890732,0.824845,0.8469


[I 2025-03-22 04:16:17,052] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0006286622144722097, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3383,2.55431,0.553621,0.577579,0.443554,0.443243
2,2.0088,1.571445,0.759853,0.6428,0.650876,0.646117
3,1.2228,1.24715,0.79835,0.674532,0.681207,0.676223
4,0.8997,1.147833,0.817599,0.698809,0.69639,0.694794
5,0.7191,1.042451,0.830431,0.693122,0.709692,0.699981


[I 2025-03-22 04:16:41,109] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.00011264504731179041, 'weight_decay': 0.007, 'warmup_steps': 3, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.8576,3.685572,0.336389,0.205224,0.246286,0.175899
2,3.5653,3.278272,0.39505,0.182939,0.299315,0.202294
3,3.0682,2.827106,0.498625,0.555181,0.388515,0.354719
4,2.6466,2.476153,0.628781,0.530194,0.531882,0.52464
5,2.3542,2.272037,0.633364,0.536325,0.541653,0.530122
6,2.1267,2.084495,0.676444,0.575455,0.572886,0.571443
7,1.9515,1.973286,0.694775,0.595336,0.58631,0.588387
8,1.8035,1.882043,0.707608,0.595203,0.605276,0.599933
9,1.6965,1.809155,0.71769,0.609834,0.611096,0.606108
10,1.5872,1.773156,0.716774,0.596212,0.615323,0.603376


[I 2025-03-22 04:17:29,068] Trial 116 pruned. 


Trial 117 with params: {'learning_rate': 0.002610642662769728, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4723,1.553053,0.759853,0.678096,0.641687,0.650736
2,1.0019,1.004454,0.83868,0.707024,0.713573,0.708923
3,0.483,0.914992,0.853346,0.726415,0.725905,0.723983
4,0.2693,0.81092,0.865261,0.867691,0.770832,0.791996
5,0.1531,0.701115,0.883593,0.874973,0.824839,0.842718
6,0.1084,0.727671,0.880843,0.876087,0.830164,0.847574
7,0.0807,0.69753,0.88451,0.89122,0.832906,0.853932
8,0.0705,0.665828,0.887259,0.893486,0.835685,0.856441
9,0.0668,0.662683,0.88451,0.893102,0.833274,0.855052
10,0.0596,0.660774,0.886343,0.892929,0.834465,0.855583


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Sat Oct 12 13:56:14 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
[I 2025-03-22 04:19:16,340] Trial 117 finished with value: 0.8588640235030812 and parameters: {'learning_rate': 0.002610642662769728, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 118 with params: {'learning_rate': 0.003892040313292496, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3321,1.48977,0.778185,0.675912,0.660228,0.661577
2,0.9019,0.883127,0.863428,0.725289,0.734606,0.728851
3,0.3728,0.81303,0.864345,0.736161,0.73263,0.732092
4,0.1899,0.7449,0.880843,0.888741,0.812462,0.836388
5,0.11,0.667836,0.885426,0.890231,0.816733,0.839496
6,0.0923,0.670741,0.885426,0.892035,0.825814,0.848041
7,0.0747,0.661793,0.88451,0.877251,0.823541,0.843037
8,0.0649,0.651493,0.889093,0.893444,0.828539,0.850216
9,0.0566,0.653313,0.883593,0.889905,0.825224,0.846714
10,0.0535,0.649407,0.891842,0.896093,0.831597,0.852931


[I 2025-03-22 04:20:07,159] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.004612961144966952, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3234,1.319388,0.783685,0.673569,0.669917,0.667159
2,0.815,0.880964,0.854262,0.71572,0.730263,0.721952
3,0.3233,0.741222,0.878093,0.888018,0.819802,0.842623
4,0.1554,0.697026,0.882676,0.893423,0.841068,0.861384
5,0.1061,0.715529,0.887259,0.898259,0.846943,0.866148
6,0.0962,0.667635,0.887259,0.899046,0.843664,0.865332
7,0.0717,0.655481,0.887259,0.897485,0.845168,0.865393
8,0.064,0.670527,0.88451,0.892887,0.833778,0.855261
9,0.0584,0.648433,0.887259,0.894509,0.836511,0.857303
10,0.0547,0.639932,0.88451,0.894054,0.843335,0.862908


[I 2025-03-22 04:21:42,930] Trial 119 finished with value: 0.8545623035357588 and parameters: {'learning_rate': 0.004612961144966952, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 2.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 120 with params: {'learning_rate': 0.001755475103802743, 'weight_decay': 0.006, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7045,1.688887,0.71769,0.617425,0.617273,0.607744
2,1.2087,1.239907,0.791934,0.665889,0.679273,0.669945
3,0.6609,0.949811,0.846929,0.715909,0.721976,0.717395
4,0.4066,0.811064,0.865261,0.728316,0.736088,0.731088
5,0.2376,0.745703,0.870761,0.87236,0.795619,0.816133


[I 2025-03-22 04:22:12,015] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.004508805212328449, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2685,1.391654,0.796517,0.683778,0.673934,0.674491
2,0.8091,0.83158,0.859762,0.722095,0.732533,0.726102
3,0.3174,0.751699,0.877177,0.884239,0.799397,0.823668
4,0.1617,0.692782,0.890926,0.897493,0.83981,0.860369
5,0.104,0.670534,0.885426,0.897187,0.843714,0.863962
6,0.0867,0.645657,0.890926,0.88842,0.847547,0.864158
7,0.0706,0.661761,0.882676,0.889185,0.833731,0.853188
8,0.0642,0.648249,0.882676,0.892661,0.83312,0.854762
9,0.0566,0.66152,0.88451,0.893145,0.833935,0.855329
10,0.0537,0.663374,0.887259,0.897087,0.845055,0.865174


[I 2025-03-22 04:23:56,682] Trial 121 finished with value: 0.8590004164001864 and parameters: {'learning_rate': 0.004508805212328449, 'weight_decay': 0.008, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 122 with params: {'learning_rate': 0.002673465187980586, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4388,1.505644,0.75802,0.664607,0.64715,0.643845
2,0.9619,1.027483,0.826764,0.692275,0.707594,0.696464
3,0.4607,0.880188,0.857012,0.728448,0.72855,0.726486
4,0.2454,0.908302,0.857929,0.861796,0.787663,0.805683
5,0.1477,0.681901,0.889093,0.896875,0.836432,0.858436
6,0.1132,0.714372,0.875344,0.865695,0.834761,0.84781
7,0.0832,0.682683,0.886343,0.879914,0.834651,0.851987
8,0.0701,0.692419,0.882676,0.892748,0.830459,0.853334
9,0.0614,0.679899,0.88451,0.892192,0.8326,0.854274
10,0.0596,0.673033,0.886343,0.893363,0.834791,0.856028


[I 2025-03-22 04:25:42,925] Trial 122 finished with value: 0.8538645455106383 and parameters: {'learning_rate': 0.002673465187980586, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 123 with params: {'learning_rate': 0.0006953193202165888, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2596,2.408554,0.612282,0.581281,0.507631,0.51304
2,1.8687,1.518361,0.758937,0.650486,0.645475,0.646162
3,1.1448,1.205531,0.8011,0.679471,0.683662,0.67944
4,0.8436,1.115441,0.824931,0.703415,0.702646,0.700408
5,0.6632,1.002129,0.83868,0.702486,0.715105,0.707965
6,0.5011,0.932834,0.847846,0.707901,0.723694,0.715134
7,0.3864,0.875773,0.849679,0.712572,0.723441,0.717877
8,0.2996,0.874266,0.856095,0.714704,0.730676,0.721947
9,0.2529,0.881281,0.855179,0.883842,0.737988,0.739319
10,0.2016,0.862368,0.846929,0.853292,0.776912,0.796537


[I 2025-03-22 04:26:40,845] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.004745855300756521, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.398,1.43585,0.776352,0.671437,0.66127,0.659399
2,0.8412,0.934316,0.851512,0.715224,0.726292,0.718695
3,0.3401,0.787551,0.868011,0.866918,0.812878,0.83212
4,0.1798,0.713006,0.888176,0.901181,0.84514,0.867024
5,0.1039,0.679908,0.894592,0.905983,0.851882,0.872779
6,0.0895,0.687007,0.894592,0.903568,0.852375,0.871389
7,0.0738,0.725837,0.883593,0.896655,0.842212,0.863026
8,0.0617,0.67565,0.88451,0.894547,0.843608,0.863275
9,0.0588,0.67817,0.88451,0.894475,0.834208,0.85609
10,0.0547,0.664825,0.886343,0.895694,0.836106,0.857826


[I 2025-03-22 04:28:22,065] Trial 124 finished with value: 0.8569617083562283 and parameters: {'learning_rate': 0.004745855300756521, 'weight_decay': 0.006, 'warmup_steps': 0, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 125 with params: {'learning_rate': 0.002681208099834632, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4358,1.494781,0.76077,0.664386,0.64923,0.646214
2,0.958,1.014912,0.829514,0.693366,0.71029,0.69868
3,0.4595,0.863279,0.855179,0.726334,0.727085,0.72497
4,0.2432,0.879558,0.859762,0.847006,0.788168,0.804465
5,0.1468,0.689811,0.88451,0.879057,0.832768,0.850689
6,0.1065,0.692226,0.87626,0.862261,0.826278,0.841003
7,0.0818,0.67618,0.885426,0.878851,0.83458,0.85138
8,0.0715,0.672104,0.886343,0.894279,0.824616,0.848442
9,0.0625,0.650691,0.889093,0.895775,0.836323,0.857928
10,0.0589,0.660361,0.890009,0.896831,0.83704,0.858856


[I 2025-03-22 04:29:42,266] Trial 125 finished with value: 0.8548484887423747 and parameters: {'learning_rate': 0.002681208099834632, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 126 with params: {'learning_rate': 0.0009049791490282845, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.1442,2.18505,0.647113,0.57465,0.546095,0.544562
2,1.6413,1.368021,0.793767,0.680998,0.675518,0.674911
3,0.9822,1.151108,0.818515,0.699315,0.696204,0.693913
4,0.6995,1.029079,0.828598,0.704336,0.706883,0.703069
5,0.5274,0.911525,0.852429,0.712872,0.727374,0.719929


[I 2025-03-22 04:30:08,758] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.004684696183889671, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3538,1.440556,0.766269,0.657065,0.656084,0.646032
2,0.8342,0.970398,0.846013,0.720905,0.720499,0.715699
3,0.3536,0.821003,0.866178,0.874951,0.791714,0.81523
4,0.1753,0.70466,0.880843,0.879365,0.842046,0.856972
5,0.1105,0.676166,0.889093,0.899842,0.838261,0.860366
6,0.0871,0.677661,0.88451,0.895872,0.833734,0.856426
7,0.0694,0.683438,0.879927,0.89056,0.831129,0.85276
8,0.0617,0.664758,0.889093,0.898368,0.838782,0.860371
9,0.0578,0.667907,0.883593,0.894501,0.833087,0.855447
10,0.0545,0.662839,0.883593,0.893602,0.83418,0.855776


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Sat Oct 12 13:56:14 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
[I 2025-03-22 04:33:21,926] Trial 127 finished with value: 0.8566519262399518 and parameters: {'learning_rate': 0.004684696183889671, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 128 with params: {'learning_rate': 0.001324131699361648, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8284,1.773404,0.708524,0.599181,0.608734,0.602567
2,1.3276,1.23527,0.797434,0.672894,0.681702,0.675228
3,0.7818,1.033204,0.835014,0.706738,0.710846,0.706061
4,0.5173,0.912888,0.856095,0.719938,0.729291,0.723104
5,0.3451,0.811237,0.860678,0.721306,0.731483,0.726284
6,0.2253,0.776869,0.868011,0.873217,0.800879,0.823065
7,0.1478,0.727026,0.883593,0.889681,0.840779,0.859059
8,0.1065,0.72174,0.875344,0.884426,0.833495,0.85317
9,0.087,0.752091,0.869844,0.879638,0.830474,0.849185
10,0.0757,0.739905,0.873511,0.882221,0.83371,0.85208


[I 2025-03-22 04:35:07,540] Trial 128 finished with value: 0.8515350203976594 and parameters: {'learning_rate': 0.001324131699361648, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 129 with params: {'learning_rate': 0.0018805138689438538, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6972,1.731399,0.696609,0.615542,0.600959,0.591195
2,1.2146,1.236261,0.790101,0.661933,0.678768,0.667816
3,0.6445,0.904893,0.853346,0.719693,0.727114,0.722457
4,0.3818,0.841263,0.864345,0.726328,0.736332,0.729849
5,0.2314,0.73686,0.87626,0.877351,0.800136,0.820972


[I 2025-03-22 04:35:32,832] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0019050693692637798, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6514,1.69376,0.710357,0.613061,0.612602,0.599078
2,1.2055,1.290366,0.783685,0.659911,0.673513,0.662483
3,0.6442,0.929108,0.84418,0.713637,0.719373,0.714882
4,0.3803,0.81785,0.863428,0.729537,0.733441,0.72983
5,0.2327,0.745763,0.880843,0.881563,0.803968,0.824679


[I 2025-03-22 04:36:01,520] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.004380196135441463, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3043,1.303885,0.799267,0.678742,0.681505,0.679036
2,0.8437,0.861271,0.852429,0.718807,0.724907,0.720962
3,0.3347,0.767186,0.872594,0.881411,0.815047,0.836877
4,0.1687,0.757711,0.874427,0.890669,0.825033,0.848467
5,0.1146,0.675138,0.891842,0.901639,0.839881,0.862038
6,0.0846,0.656092,0.886343,0.897159,0.834888,0.857744
7,0.0694,0.645953,0.887259,0.896257,0.836595,0.858317
8,0.0656,0.645269,0.887259,0.896573,0.84592,0.865366
9,0.0594,0.666779,0.882676,0.892362,0.841724,0.86117
10,0.0543,0.654023,0.882676,0.892419,0.832587,0.854221


[I 2025-03-22 04:37:18,570] Trial 131 finished with value: 0.8578331036590466 and parameters: {'learning_rate': 0.004380196135441463, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 132 with params: {'learning_rate': 0.00447715918634477, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3154,1.278355,0.807516,0.689156,0.687173,0.686887
2,0.8251,0.826338,0.870761,0.731464,0.740301,0.735522
3,0.3149,0.761882,0.88451,0.877689,0.824319,0.843344
4,0.1655,0.725731,0.885426,0.884281,0.844347,0.860493
5,0.1078,0.680978,0.887259,0.89745,0.836967,0.858826
6,0.0838,0.689606,0.887259,0.886652,0.845867,0.861806
7,0.0716,0.690504,0.885426,0.89374,0.845578,0.863735
8,0.063,0.663491,0.886343,0.895538,0.83572,0.857427
9,0.059,0.673462,0.883593,0.891205,0.834003,0.854507
10,0.0528,0.641186,0.887259,0.893481,0.837115,0.857179


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--f1/34c46321f42186df33a6260966e34a368f14868d9cc2ba47d142112e2800d233 (last modified on Fri Jan 10 23:14:01 2025) since it couldn't be found locally at evaluate-metric--f1, or remotely on the Hugging Face Hub.
[I 2025-03-22 04:40:20,893] Trial 132 finished with value: 0.8594157079698249 and parameters: {'learning_rate': 0.00447715918634477, 'weight_decay': 0.0, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 133 with params: {'learning_rate': 0.0033288708658587326, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2059,1.406266,0.770852,0.666396,0.655411,0.654644
2,0.8929,0.942665,0.84143,0.708576,0.718604,0.710461
3,0.3926,0.849018,0.854262,0.725034,0.728947,0.72485
4,0.2233,0.787682,0.875344,0.861594,0.816219,0.833641
5,0.1219,0.710557,0.88176,0.891769,0.830991,0.853116
6,0.0956,0.720819,0.874427,0.883651,0.82503,0.845907
7,0.0784,0.679681,0.88451,0.867963,0.833624,0.847264
8,0.0666,0.738296,0.880843,0.893669,0.830653,0.853146
9,0.0697,0.677194,0.889093,0.898602,0.836814,0.859434
10,0.0582,0.670203,0.889093,0.898654,0.836372,0.859119


[I 2025-03-22 04:41:51,582] Trial 133 finished with value: 0.8580511749156953 and parameters: {'learning_rate': 0.0033288708658587326, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 134 with params: {'learning_rate': 6.558978114640059e-05, 'weight_decay': 0.0, 'warmup_steps': 2, 'lambda_param': 0.1, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9168,3.779043,0.340972,0.195143,0.249741,0.178544
2,3.7215,3.622391,0.406966,0.230453,0.309132,0.208102
3,3.5332,3.329093,0.434464,0.219597,0.327733,0.257876
4,3.1934,3.033056,0.453712,0.391297,0.345359,0.292646
5,2.9599,2.828735,0.503208,0.551651,0.395001,0.368783


[I 2025-03-22 04:42:16,698] Trial 134 pruned. 


Trial 135 with params: {'learning_rate': 0.003691578768913055, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4418,1.517217,0.756187,0.652164,0.642329,0.641763
2,0.9258,1.009117,0.829514,0.704222,0.710009,0.701542
3,0.3927,0.816229,0.861595,0.89944,0.740551,0.748852
4,0.1968,0.726835,0.886343,0.892208,0.825532,0.848001
5,0.1099,0.717983,0.878093,0.889168,0.827895,0.850175
6,0.0877,0.702543,0.879927,0.87579,0.831125,0.848043
7,0.0704,0.698111,0.87901,0.889565,0.830032,0.851432
8,0.0623,0.687037,0.883593,0.894264,0.833206,0.855381
9,0.0584,0.682226,0.88451,0.891625,0.834799,0.85512
10,0.0534,0.662909,0.886343,0.893297,0.836001,0.85661


[I 2025-03-22 04:43:46,410] Trial 135 finished with value: 0.8589076476524141 and parameters: {'learning_rate': 0.003691578768913055, 'weight_decay': 0.003, 'warmup_steps': 4, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 136 with params: {'learning_rate': 0.0038760707740597504, 'weight_decay': 0.005, 'warmup_steps': 0, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2214,1.380726,0.790101,0.671489,0.67112,0.667851
2,0.842,0.884273,0.855179,0.72112,0.726484,0.723064
3,0.3574,0.776157,0.878093,0.8825,0.800517,0.823186
4,0.1872,0.795052,0.877177,0.889232,0.818774,0.842362
5,0.1154,0.688178,0.882676,0.890579,0.822496,0.845551
6,0.0812,0.708865,0.885426,0.882544,0.844419,0.859055
7,0.0816,0.66475,0.896425,0.900866,0.834308,0.856656
8,0.063,0.634996,0.892759,0.896728,0.83176,0.853421
9,0.0586,0.652381,0.882676,0.889036,0.823472,0.845472
10,0.054,0.646806,0.890009,0.894096,0.83007,0.851214


[I 2025-03-22 04:44:35,084] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.002472023290700323, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.595,1.588808,0.729606,0.635015,0.627075,0.62168
2,1.0845,1.179985,0.809349,0.674983,0.696294,0.681617
3,0.5504,0.93853,0.84418,0.718545,0.71887,0.716233
4,0.3201,0.811431,0.865261,0.727635,0.737255,0.731853
5,0.1672,0.711174,0.883593,0.876465,0.823517,0.84238
6,0.1096,0.698043,0.885426,0.893264,0.834045,0.855324
7,0.0784,0.682495,0.887259,0.882433,0.845969,0.860618
8,0.0676,0.670518,0.887259,0.89525,0.835678,0.857385
9,0.0621,0.670686,0.888176,0.896204,0.826043,0.850084
10,0.0581,0.67344,0.889093,0.89652,0.83761,0.85899


[I 2025-03-22 04:45:56,853] Trial 137 finished with value: 0.8568687545384169 and parameters: {'learning_rate': 0.002472023290700323, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 138 with params: {'learning_rate': 0.0014724441286017684, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8202,1.739846,0.713107,0.613571,0.610637,0.606208
2,1.2974,1.266054,0.789184,0.670938,0.674142,0.670573
3,0.7622,1.018927,0.83593,0.710593,0.712535,0.708586
4,0.487,0.86931,0.857929,0.723524,0.729816,0.725309
5,0.3148,0.774128,0.870761,0.864693,0.776489,0.794807
6,0.203,0.782079,0.875344,0.884304,0.825625,0.846265
7,0.129,0.733933,0.879927,0.88798,0.838631,0.857282
8,0.0967,0.715651,0.875344,0.885465,0.834784,0.854335
9,0.0791,0.732347,0.87626,0.885278,0.826041,0.847482
10,0.0698,0.731814,0.878093,0.887139,0.837598,0.856359


[I 2025-03-22 04:47:14,689] Trial 138 finished with value: 0.85563996175176 and parameters: {'learning_rate': 0.0014724441286017684, 'weight_decay': 0.002, 'warmup_steps': 2, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 139 with params: {'learning_rate': 0.004698861822706531, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3685,1.493273,0.753437,0.64321,0.647992,0.633848
2,0.8322,0.917689,0.855179,0.723447,0.727084,0.723324
3,0.332,0.71506,0.878093,0.888441,0.819553,0.842919
4,0.1604,0.708178,0.880843,0.882479,0.840584,0.857543
5,0.1098,0.650774,0.885426,0.896158,0.825503,0.849324
6,0.0871,0.689181,0.883593,0.893444,0.824176,0.847607
7,0.0739,0.631128,0.888176,0.898271,0.836785,0.858853
8,0.0667,0.631404,0.893676,0.903134,0.851004,0.871011
9,0.0577,0.643086,0.880843,0.891438,0.821452,0.845379
10,0.0555,0.639077,0.878093,0.888957,0.83887,0.857904


[I 2025-03-22 04:48:59,655] Trial 139 finished with value: 0.8561950864049656 and parameters: {'learning_rate': 0.004698861822706531, 'weight_decay': 0.007, 'warmup_steps': 0, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}. Best is trial 86 with value: 0.8693899540096316.


Trial 140 with params: {'learning_rate': 0.0037310740151815474, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3298,1.471301,0.766269,0.664903,0.648242,0.650614
2,0.8776,0.928734,0.84418,0.709043,0.721007,0.712784
3,0.3694,0.831815,0.863428,0.735808,0.733109,0.732401
4,0.1959,0.777816,0.871677,0.867338,0.803958,0.82539
5,0.1103,0.710307,0.880843,0.876827,0.821616,0.841589
6,0.0896,0.687639,0.87901,0.88858,0.819411,0.843042
7,0.0755,0.671842,0.885426,0.894149,0.833778,0.85562
8,0.0655,0.703381,0.879927,0.888863,0.822661,0.844287
9,0.0604,0.640964,0.889093,0.884632,0.837451,0.855626
10,0.0561,0.655952,0.888176,0.881477,0.837588,0.854338


[I 2025-03-22 04:50:20,624] Trial 140 finished with value: 0.8552312107953898 and parameters: {'learning_rate': 0.0037310740151815474, 'weight_decay': 0.003, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 141 with params: {'learning_rate': 0.0013914522480691573, 'weight_decay': 0.001, 'warmup_steps': 1, 'lambda_param': 0.5, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8408,1.781652,0.702108,0.595152,0.603915,0.593216
2,1.3301,1.247255,0.790101,0.669027,0.674953,0.670464
3,0.7693,1.000467,0.840513,0.709953,0.715977,0.711201
4,0.49,0.861008,0.862511,0.727817,0.733535,0.7291
5,0.3172,0.800954,0.866178,0.859705,0.772803,0.790499


[I 2025-03-22 04:50:46,945] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.0007759266749790148, 'weight_decay': 0.004, 'warmup_steps': 2, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.2584,2.334389,0.635197,0.584665,0.532341,0.534774
2,1.7843,1.452141,0.769019,0.657582,0.654876,0.654105
3,1.0845,1.180298,0.810266,0.690287,0.689865,0.687735
4,0.7928,1.061057,0.827681,0.703787,0.705322,0.702199
5,0.604,0.959079,0.843263,0.705595,0.719958,0.711774
6,0.443,0.899199,0.846929,0.707443,0.72252,0.714376
7,0.3427,0.851542,0.856095,0.718273,0.728741,0.723017
8,0.2516,0.869084,0.861595,0.85464,0.770403,0.785859
9,0.1995,0.824185,0.852429,0.857267,0.780861,0.801295
10,0.1534,0.840109,0.861595,0.871032,0.816344,0.835017


[I 2025-03-22 04:51:37,478] Trial 142 pruned. 


Trial 143 with params: {'learning_rate': 0.0005792165868510817, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.3716,2.651195,0.529789,0.578621,0.417905,0.407938
2,2.1062,1.64701,0.741522,0.634438,0.636381,0.633668
3,1.2941,1.318506,0.780935,0.660284,0.668975,0.660425
4,0.9617,1.180864,0.812099,0.69534,0.692705,0.691327
5,0.7753,1.095403,0.824931,0.687136,0.706589,0.694956


[I 2025-03-22 04:52:05,547] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 6.847251037088202e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.9123,3.769336,0.340972,0.192072,0.249755,0.178209
2,3.7133,3.608927,0.411549,0.234926,0.312492,0.212387
3,3.5038,3.284846,0.428964,0.216846,0.323899,0.25138
4,3.1553,2.997459,0.455545,0.39194,0.347235,0.295796
5,2.9237,2.787759,0.51879,0.545267,0.411018,0.395029
6,2.6949,2.622894,0.601283,0.541632,0.500395,0.496933
7,2.5539,2.496564,0.601283,0.535795,0.497787,0.498172
8,2.4413,2.402965,0.631531,0.534787,0.53408,0.529099
9,2.3525,2.334709,0.635197,0.539937,0.536251,0.531514
10,2.2629,2.288223,0.632447,0.533301,0.536281,0.526195


[I 2025-03-22 04:53:05,181] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0037111908622285863, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3361,1.444196,0.768103,0.661556,0.650516,0.651287
2,0.8702,0.900725,0.856095,0.719925,0.728544,0.722987
3,0.3685,0.817383,0.868928,0.738112,0.737916,0.736179
4,0.2025,0.782426,0.87626,0.886197,0.807521,0.832363
5,0.1168,0.673641,0.888176,0.882495,0.837089,0.854465
6,0.0955,0.702511,0.885426,0.897297,0.825716,0.849943
7,0.0849,0.717664,0.879927,0.860771,0.839853,0.848857
8,0.0678,0.666005,0.890926,0.885302,0.839697,0.857281
9,0.0607,0.672288,0.891842,0.897889,0.831322,0.853771
10,0.0568,0.680252,0.886343,0.893843,0.836198,0.856956


[I 2025-03-22 04:55:23,112] Trial 145 finished with value: 0.8600073927009732 and parameters: {'learning_rate': 0.0037111908622285863, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.8, 'temperature': 2.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 146 with params: {'learning_rate': 0.0009573762623257292, 'weight_decay': 0.0, 'warmup_steps': 1, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0871,2.079238,0.662695,0.565354,0.568348,0.555694
2,1.5614,1.370954,0.778185,0.657721,0.666884,0.659603
3,0.9504,1.08891,0.821265,0.696549,0.698905,0.695746
4,0.6623,1.001513,0.834097,0.711292,0.710525,0.708522
5,0.4859,0.879149,0.851512,0.712713,0.724979,0.718741
6,0.3359,0.85469,0.849679,0.711156,0.724701,0.716549
7,0.2533,0.826112,0.862511,0.847547,0.76201,0.775086
8,0.1811,0.861952,0.864345,0.876864,0.807155,0.83027
9,0.1417,0.82665,0.860678,0.869328,0.80523,0.826016
10,0.1102,0.812932,0.862511,0.863703,0.824262,0.839866


[I 2025-03-22 04:56:15,118] Trial 146 pruned. 


Trial 147 with params: {'learning_rate': 0.0008992976510587238, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,3.0782,2.127085,0.658112,0.566079,0.561455,0.554693
2,1.608,1.385975,0.777269,0.655609,0.665313,0.658312
3,0.9809,1.123462,0.820348,0.694765,0.698279,0.694192
4,0.6936,1.030175,0.833181,0.710772,0.708708,0.707245
5,0.5194,0.915836,0.846013,0.710388,0.720666,0.715246


[I 2025-03-22 04:56:39,479] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.004727837910315894, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3945,1.438869,0.772686,0.669119,0.659525,0.656114
2,0.8418,0.924666,0.856095,0.719158,0.729966,0.722761
3,0.3377,0.800978,0.866178,0.880695,0.811218,0.834303
4,0.1837,0.719594,0.879927,0.893377,0.83726,0.859177
5,0.1031,0.678871,0.888176,0.898002,0.837383,0.859218
6,0.0815,0.700013,0.885426,0.895936,0.844959,0.86418
7,0.0687,0.670557,0.887259,0.898245,0.845592,0.866094
8,0.0626,0.659686,0.890009,0.900127,0.848823,0.868686
9,0.0571,0.661288,0.891842,0.900959,0.849592,0.869424
10,0.0546,0.66507,0.886343,0.895829,0.845337,0.864736


[I 2025-03-22 04:57:53,938] Trial 148 finished with value: 0.8647806323314217 and parameters: {'learning_rate': 0.004727837910315894, 'weight_decay': 0.002, 'warmup_steps': 0, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 86 with value: 0.8693899540096316.


Trial 149 with params: {'learning_rate': 0.0039626784431113825, 'weight_decay': 0.001, 'warmup_steps': 0, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2459,1.391083,0.766269,0.653136,0.657317,0.647293
2,0.8641,0.928021,0.849679,0.718921,0.721688,0.718588
3,0.3657,0.758522,0.870761,0.85191,0.785367,0.803589
4,0.1881,0.79602,0.877177,0.89031,0.826615,0.850078
5,0.1132,0.690984,0.87901,0.886758,0.820769,0.842896
6,0.0872,0.677488,0.88176,0.888292,0.823197,0.844945
7,0.0729,0.690107,0.869844,0.880963,0.813163,0.836099
8,0.0628,0.675188,0.879927,0.887606,0.821116,0.84348
9,0.0575,0.663805,0.882676,0.888987,0.823824,0.845478
10,0.0532,0.672285,0.880843,0.885765,0.823124,0.843545


[I 2025-03-22 04:58:47,951] Trial 149 pruned. 


In [32]:
print(best_trial_distill)

BestRun(run_id='86', objective=0.8693899540096316, hyperparameters={'learning_rate': 0.0047995795305536375, 'weight_decay': 0.004, 'warmup_steps': 1, 'lambda_param': 0.6000000000000001, 'temperature': 4.0}, run_summary=None)
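
The printed `BestRun` mixes optimizer settings with the two distillation-specific values (`lambda_param`, `temperature`). The sketch below separates them, for example before a final training run. `BestRunStub` is a hypothetical stand-in that only mirrors the fields printed above, and the split performed by `split_hyperparameters` is an assumption about how the values might be consumed, not code from this notebook.

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple


class BestRunStub(NamedTuple):
    """Hypothetical stand-in with the same fields as the printed BestRun."""
    run_id: str
    objective: float
    hyperparameters: Dict[str, Any]
    run_summary: Optional[Any] = None


def split_hyperparameters(
    best: BestRunStub,
    distill_keys: Tuple[str, ...] = ("lambda_param", "temperature"),
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (optimizer settings, distillation settings)."""
    distill = {k: v for k, v in best.hyperparameters.items() if k in distill_keys}
    optim = {k: v for k, v in best.hyperparameters.items() if k not in distill_keys}
    return optim, distill


# Values copied from the output above.
best = BestRunStub(
    run_id="86",
    objective=0.8693899540096316,
    hyperparameters={
        "learning_rate": 0.0047995795305536375,
        "weight_decay": 0.004,
        "warmup_steps": 1,
        "lambda_param": 0.6000000000000001,
        "temperature": 4.0,
    },
)
optim_params, distill_params = split_hyperparameters(best)
print(optim_params)
print(distill_params)
```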


Recompute the step counts to account for the changed dataset size.

In [33]:
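data_length = len(all_train_data)
steps_per_epoch = math.ceil(data_length / batch_size)
min_r = steps_per_epoch * 5           # pruner minimum resource: 5 epochs, in steps
max_r = steps_per_epoch * num_epochs  # pruner maximum resource: the full run, in steps
warm_up = math.ceil(data_length / batch_size / 10)  # upper bound for warmup_steps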
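
As a worked example of the arithmetic in the cell above, the sketch below plugs in made-up sizes. `data_length = 10_000`, `batch_size = 32`, and `num_epochs = 10` are illustrative assumptions, not the notebook's actual values.

```python
import math

# Illustrative assumptions only; the real values come from the notebook.
data_length = 10_000
batch_size = 32
num_epochs = 10

steps_per_epoch = math.ceil(data_length / batch_size)  # 313 steps per epoch
min_r = steps_per_epoch * 5                             # 1565: five epochs
max_r = steps_per_epoch * num_epochs                    # 3130: the full run
warm_up = math.ceil(data_length / batch_size / 10)      # 32: about a tenth of an epoch

print(steps_per_epoch, min_r, max_r, warm_up)
```

The resources are expressed in training steps rather than epochs, so they have to be recomputed whenever the number of training examples changes.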

In [34]:
base.reset_seed()

## Search with standard training on the augmented dataset
Configuration of the individual training runs.

In [35]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/bilstm-base-embedd-aug_coarse_hp-search", logging_dir=f"~/logs/{DATASET}/bilstm-base-embedd-aug_coarse_hp-search", epochs=num_epochs, batch_size=batch_size)

Definition of the searched hyperparameters and their ranges.

In [36]:
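def hp_space(trial):
    # Search space sampled by Optuna for each trial.
    params = {
        # Sampled on a log scale between 5e-5 and 5e-3.
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        # Grid from 0 to 0.01 in steps of 0.001.
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        # Integer in [0, warm_up], both ends inclusive.
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params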
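
To illustrate the domain that `hp_space` describes, the sketch below draws values by hand using only the standard library. This is not how Optuna samples (the TPE sampler used in this notebook is model-based); it only shows what `log=True` and `step=1e-3` mean for the ranges. The helper names and the `warm_up=32` bound are assumptions made for the example.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_stepped(rng, low, high, step):
    """Draw one point from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step

def sample_space(rng, warm_up):
    """Draw one configuration from the same ranges as hp_space."""
    return {
        "learning_rate": sample_log_uniform(rng, 5e-5, 5e-3),
        "weight_decay": sample_stepped(rng, 0.0, 1e-2, 1e-3),
        "warmup_steps": rng.randint(0, warm_up),  # both ends inclusive
    }

rng = random.Random(42)
print(sample_space(rng, warm_up=32))  # warm_up=32 is a hypothetical bound
```

Sampling the learning rate on a log scale gives each order of magnitude between 5e-5 and 5e-3 the same share of draws, whereas a plain uniform draw would concentrate almost all samples near the upper end.
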

Optuna configuration.

In [37]:
# Hyperband pruning with resources counted in training steps; seeded multivariate TPE sampling.
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
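
For intuition on the pruner settings: Optuna's documentation gives the number of Hyperband brackets as floor(log_η(max_resource / min_resource)) + 1, where η is `reduction_factor`. The sketch below evaluates that formula with an integer loop. The step counts are hypothetical and assume `num_epochs = 10`, which matches the 10-epoch logs but is not defined in this section; the helper name is also an assumption.

```python
def hyperband_bracket_count(min_resource, max_resource, reduction_factor):
    """Count budgets min_resource * reduction_factor**k that fit within max_resource."""
    count = 0
    budget = min_resource
    while budget <= max_resource:
        count += 1
        budget *= reduction_factor
    return count

# Hypothetical step counts: 313 steps per epoch, pruning possible from epoch 5 of 10.
print(hyperband_bracket_count(min_resource=1565, max_resource=3130, reduction_factor=2))
# prints 2
```

With `min_r` set to 5 epochs and `max_r` to 10 epochs, the resource ratio is 2 regardless of the dataset size, so `reduction_factor=2` would give two brackets under this formula.
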



Trainer configuration for the individual training runs.

In [38]:
trainer = Trainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    # Zero-argument factory so that every trial starts from a freshly initialized model.
    model_init=lambda: get_BiLSTM(),
)
  

Search setup.

In [39]:
best_trial_normal_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Base-aug-embedd",
    n_trials=150
)

[I 2025-03-22 04:58:48,246] A new study created in memory with name: Base-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7589,0.558416,0.813932,0.68408,0.694827,0.688067
2,0.2328,0.500471,0.849679,0.860809,0.797333,0.818076
3,0.1155,0.568789,0.856095,0.844311,0.813137,0.823016
4,0.0665,0.671564,0.849679,0.851058,0.815779,0.82972
5,0.043,0.674286,0.857012,0.854608,0.812949,0.828012


[I 2025-03-22 04:59:40,025] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.0007875660249889869, 'weight_decay': 0.001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4892,0.548112,0.849679,0.840118,0.78026,0.79583
2,0.0899,0.569692,0.868928,0.836454,0.830679,0.832544
3,0.0413,0.773998,0.852429,0.860232,0.818974,0.83163
4,0.0248,0.75125,0.857929,0.849093,0.822483,0.831504
5,0.0154,0.835349,0.855179,0.844508,0.821481,0.829535
6,0.0103,0.813776,0.858845,0.830859,0.823674,0.825869
7,0.0068,0.958609,0.856095,0.853907,0.813154,0.825818
8,0.0046,0.934039,0.871677,0.868389,0.835088,0.847289
9,0.0028,1.017574,0.864345,0.843514,0.828843,0.83382
10,0.0031,1.062351,0.857929,0.84039,0.822615,0.829002


[I 2025-03-22 05:01:34,079] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 6.533369619026643e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2547,0.863567,0.700275,0.603724,0.590887,0.594294
2,0.6512,0.668661,0.768103,0.641219,0.658464,0.648967
3,0.4702,0.625735,0.788268,0.660871,0.675405,0.666202
4,0.3718,0.588151,0.802933,0.678142,0.686361,0.680673
5,0.3053,0.577604,0.813932,0.681735,0.696282,0.688636


[I 2025-03-22 05:02:30,133] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.0013035123791853842, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4468,0.5319,0.857012,0.836923,0.806412,0.815521
2,0.0589,0.618242,0.856095,0.836704,0.833174,0.832528
3,0.0255,0.74307,0.852429,0.853615,0.818978,0.830583
4,0.0143,0.890399,0.861595,0.874614,0.82543,0.842469
5,0.0109,0.845443,0.862511,0.861931,0.827923,0.840343
6,0.0061,0.874016,0.870761,0.872094,0.831874,0.848164
7,0.0049,1.056937,0.866178,0.875828,0.820663,0.839146
8,0.0034,1.073726,0.869844,0.859447,0.832001,0.842505
9,0.0015,1.136776,0.866178,0.855562,0.82874,0.83906
10,0.0018,1.042077,0.868011,0.846717,0.830724,0.837116


[I 2025-03-22 05:04:16,428] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.002311294500510415, 'weight_decay': 0.002, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3388,0.555494,0.860678,0.823112,0.809042,0.812667
2,0.0411,0.604129,0.865261,0.836695,0.828449,0.831752
3,0.0183,0.76403,0.866178,0.852206,0.832471,0.837968
4,0.0124,0.82024,0.869844,0.841847,0.833623,0.836408
5,0.0076,0.883938,0.877177,0.86071,0.828794,0.841124
6,0.006,0.943343,0.862511,0.842591,0.826671,0.833125
7,0.0028,1.056703,0.864345,0.863248,0.828046,0.840468
8,0.0017,1.186272,0.858845,0.8593,0.824164,0.836602
9,0.0017,1.250092,0.852429,0.843992,0.819957,0.827598
10,0.0009,1.206367,0.860678,0.848702,0.826097,0.83419


[I 2025-03-22 05:06:19,356] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0386,0.690145,0.762603,0.646917,0.648067,0.645052
2,0.4612,0.573269,0.806599,0.674665,0.689269,0.681207
3,0.3105,0.572474,0.799267,0.667526,0.68679,0.673787
4,0.222,0.574755,0.829514,0.838405,0.752964,0.771993
5,0.1557,0.566037,0.843263,0.854796,0.790491,0.811717
6,0.116,0.59381,0.84143,0.856486,0.799132,0.819425
7,0.0901,0.615576,0.831347,0.832505,0.7912,0.805886
8,0.0714,0.652444,0.829514,0.847136,0.788559,0.808881
9,0.0572,0.677359,0.839597,0.857436,0.806025,0.824914
10,0.047,0.674529,0.84143,0.855363,0.808178,0.82553


[I 2025-03-22 05:08:18,547] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0003654769917956456, 'weight_decay': 0.003, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6874,0.559946,0.820348,0.689189,0.700452,0.692441
2,0.178,0.52515,0.855179,0.829005,0.801738,0.81241
3,0.0845,0.672426,0.851512,0.812947,0.819658,0.811283
4,0.0477,0.735241,0.854262,0.845471,0.820822,0.828179
5,0.0323,0.715347,0.859762,0.869704,0.825195,0.840271
6,0.0219,0.755779,0.860678,0.848202,0.825304,0.833827
7,0.0145,0.822658,0.857012,0.842125,0.812461,0.823488
8,0.0121,0.856579,0.856095,0.831873,0.812118,0.819568
9,0.0097,0.93188,0.851512,0.832284,0.806985,0.81634
10,0.0056,0.939663,0.859762,0.824481,0.826315,0.8239


[I 2025-03-22 05:10:40,385] Trial 6 pruned. 


Trial 7 with params: {'learning_rate': 9.505122659935192e-05, 'weight_decay': 0.003, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1064,0.744411,0.748854,0.637218,0.634542,0.632984
2,0.5256,0.60556,0.792851,0.662818,0.678101,0.669419
3,0.3641,0.575221,0.810266,0.676598,0.693722,0.683868
4,0.2726,0.574523,0.818515,0.856928,0.708788,0.710564
5,0.2038,0.563223,0.832264,0.83724,0.755157,0.774452
6,0.1581,0.573202,0.832264,0.845203,0.772652,0.794829
7,0.1258,0.592222,0.832264,0.844507,0.783195,0.802873
8,0.1011,0.626823,0.817599,0.834057,0.761207,0.782589
9,0.0847,0.635186,0.834097,0.839545,0.802524,0.816129
10,0.0705,0.634486,0.843263,0.85841,0.809528,0.828136


[I 2025-03-22 05:13:26,960] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.637,0.554892,0.824931,0.863349,0.722192,0.731936
2,0.1649,0.529279,0.854262,0.827832,0.801711,0.811679
3,0.0773,0.681162,0.850596,0.847777,0.819286,0.826802
4,0.0419,0.72108,0.853346,0.867439,0.819272,0.836561
5,0.0291,0.82508,0.852429,0.866055,0.819142,0.833969
6,0.02,0.759666,0.864345,0.852903,0.839058,0.843985
7,0.0149,0.900136,0.851512,0.862639,0.809816,0.826242
8,0.0105,0.905383,0.858845,0.84597,0.825429,0.832354
9,0.0082,0.912484,0.856095,0.870495,0.829609,0.844992
10,0.005,0.937565,0.857012,0.841316,0.831312,0.834776


[I 2025-03-22 05:16:33,715] Trial 8 finished with value: 0.8343148722145988 and parameters: {'learning_rate': 0.00040842279473800845, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 8 with value: 0.8343148722145988.


Trial 9 with params: {'learning_rate': 0.0005338741354740678, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5691,0.585096,0.827681,0.842808,0.762595,0.78084
2,0.1314,0.550189,0.862511,0.835721,0.825656,0.829827
3,0.0589,0.72084,0.847846,0.835118,0.814957,0.820714
4,0.0333,0.748289,0.853346,0.852814,0.820112,0.831679
5,0.0237,0.846381,0.858845,0.859063,0.823833,0.835646


[I 2025-03-22 05:17:57,640] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 6.888788881730778e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2201,0.83577,0.710357,0.608227,0.601514,0.602925
2,0.6283,0.653859,0.771769,0.645462,0.661543,0.652678
3,0.4545,0.616104,0.791017,0.66356,0.678085,0.668853
4,0.3583,0.587599,0.809349,0.682335,0.691963,0.685375
5,0.2928,0.577626,0.811182,0.679263,0.694277,0.68615


[I 2025-03-22 05:18:59,598] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 0.0025419498380802787, 'weight_decay': 0.002, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3602,0.594379,0.856095,0.864937,0.805149,0.822217
2,0.0403,0.695778,0.846013,0.817378,0.816569,0.814262
3,0.0176,0.775218,0.860678,0.837191,0.817244,0.824166
4,0.0107,0.804076,0.858845,0.838989,0.814683,0.823844
5,0.0081,0.875343,0.877177,0.842304,0.830773,0.834741
6,0.0049,1.026851,0.863428,0.843428,0.828483,0.833815
7,0.0031,1.147906,0.857929,0.813353,0.815662,0.812856
8,0.0019,1.16568,0.863428,0.823834,0.822109,0.81971
9,0.0013,1.293275,0.865261,0.828576,0.82181,0.822038
10,0.0012,1.325782,0.862511,0.84366,0.81804,0.826101


[I 2025-03-22 05:21:13,705] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 0.00013274498948873754, 'weight_decay': 0.0, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0085,0.663629,0.771769,0.653249,0.657564,0.652644
2,0.4192,0.551003,0.819432,0.68778,0.700287,0.693574
3,0.2758,0.566783,0.814849,0.764479,0.707315,0.703931
4,0.1827,0.595308,0.829514,0.84948,0.780173,0.800712
5,0.126,0.569825,0.845096,0.85802,0.801846,0.821576


[I 2025-03-22 05:22:18,274] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.002704032693225816, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3331,0.576082,0.861595,0.855166,0.808078,0.823554
2,0.037,0.627986,0.871677,0.829401,0.835871,0.831719
3,0.0187,0.875503,0.852429,0.821214,0.812995,0.812966
4,0.0108,0.956963,0.864345,0.834078,0.822376,0.825332
5,0.0068,0.991476,0.880843,0.885953,0.835061,0.851598
6,0.0049,1.15686,0.864345,0.859379,0.820909,0.833108
7,0.004,1.179159,0.865261,0.864749,0.820372,0.835492
8,0.0017,1.209828,0.867094,0.864656,0.82188,0.837116
9,0.001,1.19329,0.873511,0.881894,0.827234,0.846118
10,0.0005,1.267372,0.875344,0.883476,0.828684,0.847064


[I 2025-03-22 05:26:03,552] Trial 13 finished with value: 0.8400191931315731 and parameters: {'learning_rate': 0.002704032693225816, 'weight_decay': 0.008, 'warmup_steps': 12}. Best is trial 13 with value: 0.8400191931315731.


Trial 14 with params: {'learning_rate': 0.0015427366723152545, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3955,0.613043,0.853346,0.809772,0.81108,0.808284
2,0.0546,0.631341,0.855179,0.8255,0.824198,0.822407
3,0.0224,0.892346,0.842346,0.833664,0.812937,0.816963
4,0.0129,0.900355,0.855179,0.836747,0.820641,0.825428
5,0.0107,0.807378,0.871677,0.850671,0.843851,0.845692
6,0.0065,0.94909,0.862511,0.855561,0.825621,0.837249
7,0.0038,1.10307,0.858845,0.829753,0.825181,0.825082
8,0.0026,1.162845,0.861595,0.847071,0.818257,0.827613
9,0.0021,1.102125,0.862511,0.843411,0.827395,0.832696
10,0.0011,1.177894,0.870761,0.849723,0.833443,0.840047


[I 2025-03-22 05:30:48,026] Trial 14 finished with value: 0.8339714303388773 and parameters: {'learning_rate': 0.0015427366723152545, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}. Best is trial 13 with value: 0.8400191931315731.


Trial 15 with params: {'learning_rate': 0.004854318076150846, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2689,0.589001,0.864345,0.849739,0.819418,0.830187
2,0.0295,0.683963,0.866178,0.854988,0.830179,0.839818
3,0.0151,0.72738,0.867094,0.844696,0.840071,0.840951
4,0.0096,0.925184,0.868928,0.847528,0.833656,0.837746
5,0.0072,1.07367,0.863428,0.838817,0.825834,0.830856
6,0.0061,1.10784,0.866178,0.829613,0.822407,0.824365
7,0.0027,1.200504,0.874427,0.857306,0.828785,0.839417
8,0.0015,1.361136,0.866178,0.835262,0.821582,0.826681
9,0.0011,1.393721,0.861595,0.825113,0.81721,0.819987
10,0.0004,1.489426,0.864345,0.828091,0.819197,0.822678


[I 2025-03-22 05:32:48,503] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0004024698371246658, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6533,0.566039,0.820348,0.693761,0.699751,0.693371
2,0.167,0.524562,0.858845,0.822191,0.813901,0.817557
3,0.0777,0.679393,0.850596,0.825824,0.818773,0.81767
4,0.0434,0.77916,0.848763,0.826926,0.816059,0.81735
5,0.0286,0.753667,0.857929,0.85534,0.814808,0.828227


[I 2025-03-22 05:34:15,386] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 25}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3797,0.593292,0.863428,0.85348,0.811402,0.823564
2,0.045,0.680256,0.858845,0.817069,0.826626,0.820236
3,0.0202,0.736216,0.859762,0.834036,0.825841,0.827744
4,0.0118,0.815871,0.864345,0.843003,0.819818,0.828348
5,0.0082,0.835718,0.874427,0.859223,0.837318,0.845383
6,0.0049,0.926893,0.878093,0.861431,0.830174,0.842032
7,0.0031,1.114699,0.865261,0.86132,0.819939,0.834039
8,0.0018,1.18429,0.871677,0.86669,0.82646,0.839548
9,0.0011,1.282667,0.861595,0.860276,0.817028,0.831319
10,0.0011,1.27497,0.862511,0.860079,0.818473,0.833081


[I 2025-03-22 05:37:14,492] Trial 17 finished with value: 0.8382604817417166 and parameters: {'learning_rate': 0.0020085822314002493, 'weight_decay': 0.008, 'warmup_steps': 25}. Best is trial 13 with value: 0.8400191931315731.


Trial 18 with params: {'learning_rate': 0.004773885469842765, 'weight_decay': 0.008, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3102,0.690326,0.855179,0.843458,0.81328,0.822617
2,0.0331,0.703534,0.874427,0.839957,0.828131,0.831411
3,0.0165,0.826647,0.861595,0.844273,0.819072,0.826899
4,0.0089,0.991602,0.861595,0.822671,0.81873,0.818419
5,0.0072,1.167264,0.862511,0.826102,0.819357,0.82002
6,0.0045,1.267823,0.864345,0.838481,0.820905,0.826538
7,0.0039,1.475891,0.859762,0.854276,0.815916,0.828936
8,0.0019,1.46507,0.867094,0.860782,0.822517,0.83508
9,0.001,1.578082,0.862511,0.846552,0.818717,0.82796
10,0.0004,1.644605,0.860678,0.845525,0.817397,0.826467


[I 2025-03-22 05:39:20,276] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0018110025225277987, 'weight_decay': 0.005, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3814,0.630527,0.849679,0.804549,0.809708,0.804475
2,0.049,0.674557,0.851512,0.821552,0.8207,0.818139
3,0.0204,0.775788,0.857929,0.832594,0.824095,0.825092
4,0.014,0.833416,0.855179,0.828125,0.821684,0.822802
5,0.009,0.967876,0.851512,0.807325,0.819656,0.811677


[I 2025-03-22 05:40:27,054] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.001257512735966554, 'weight_decay': 0.008, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4563,0.53396,0.859762,0.854004,0.818105,0.828912
2,0.0626,0.58247,0.862511,0.833024,0.828958,0.827654
3,0.0268,0.715232,0.852429,0.822604,0.818215,0.817975
4,0.0151,0.817717,0.864345,0.864558,0.826768,0.841483
5,0.0105,0.883382,0.857929,0.828762,0.824141,0.824636
6,0.0071,0.867269,0.865261,0.835946,0.827633,0.830446
7,0.0042,1.10654,0.852429,0.819384,0.819509,0.817599
8,0.0031,1.097357,0.864345,0.845899,0.827797,0.835153
9,0.0028,1.113687,0.866178,0.856616,0.829325,0.840029
10,0.0022,1.05058,0.868011,0.848347,0.830262,0.837508


[I 2025-03-22 05:43:52,023] Trial 20 finished with value: 0.8413750447982159 and parameters: {'learning_rate': 0.001257512735966554, 'weight_decay': 0.008, 'warmup_steps': 31}. Best is trial 20 with value: 0.8413750447982159.


Trial 21 with params: {'learning_rate': 0.0012187596661872556, 'weight_decay': 0.008, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4576,0.538246,0.859762,0.853627,0.817342,0.828422
2,0.0625,0.528791,0.874427,0.826315,0.837377,0.830613
3,0.0279,0.680523,0.856095,0.837401,0.82332,0.826725
4,0.0176,0.784318,0.864345,0.876482,0.827212,0.844686
5,0.0101,0.858861,0.867094,0.863281,0.830159,0.842253
6,0.0063,0.880194,0.859762,0.850906,0.824084,0.834175
7,0.0046,1.106874,0.852429,0.867296,0.818501,0.834346
8,0.0045,1.037842,0.865261,0.86345,0.829564,0.841598
9,0.0017,1.11801,0.866178,0.864895,0.829737,0.842881
10,0.0014,1.094496,0.859762,0.861914,0.823546,0.838323


[I 2025-03-22 05:47:35,800] Trial 21 finished with value: 0.8383371501740736 and parameters: {'learning_rate': 0.0012187596661872556, 'weight_decay': 0.008, 'warmup_steps': 28}. Best is trial 20 with value: 0.8413750447982159.


Trial 22 with params: {'learning_rate': 0.0004170772817705616, 'weight_decay': 0.004, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6649,0.569613,0.822181,0.692741,0.701617,0.694128
2,0.1596,0.531806,0.858845,0.830196,0.814214,0.821144
3,0.0748,0.680246,0.849679,0.810243,0.817692,0.809348
4,0.0419,0.738863,0.853346,0.865111,0.82015,0.834959
5,0.0265,0.78307,0.854262,0.857362,0.819055,0.832555
6,0.019,0.772462,0.865261,0.863158,0.838877,0.848167
7,0.0122,0.891365,0.854262,0.863693,0.812111,0.82807
8,0.0106,0.861461,0.857012,0.855201,0.822591,0.83483
9,0.0071,0.949272,0.851512,0.839848,0.808777,0.820022
10,0.0047,0.950101,0.854262,0.843908,0.820173,0.829336


[I 2025-03-22 05:50:54,900] Trial 22 finished with value: 0.816100448786214 and parameters: {'learning_rate': 0.0004170772817705616, 'weight_decay': 0.004, 'warmup_steps': 31}. Best is trial 20 with value: 0.8413750447982159.


Trial 23 with params: {'learning_rate': 0.0008328182357819603, 'weight_decay': 0.008, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5193,0.532878,0.848763,0.845413,0.788959,0.806008
2,0.0859,0.546636,0.870761,0.838892,0.843192,0.840055
3,0.0394,0.736999,0.851512,0.851531,0.810371,0.822455
4,0.0228,0.757576,0.852429,0.843047,0.80945,0.819637
5,0.0137,0.776468,0.865261,0.848227,0.820782,0.830014
6,0.0095,0.838485,0.858845,0.841474,0.823772,0.830362
7,0.007,0.929778,0.864345,0.855226,0.828587,0.837528
8,0.0045,0.916458,0.866178,0.845875,0.828559,0.835847
9,0.0035,1.019342,0.856095,0.855433,0.811256,0.8272
10,0.0017,1.10509,0.852429,0.830713,0.818043,0.822193


[I 2025-03-22 05:52:50,610] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0010333636594161816, 'weight_decay': 0.01, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4689,0.537536,0.854262,0.837243,0.803764,0.814462
2,0.0728,0.528398,0.875344,0.871332,0.835475,0.849628
3,0.0311,0.67042,0.861595,0.849275,0.816648,0.828188
4,0.0199,0.800452,0.856095,0.833806,0.813505,0.819716
5,0.0125,0.82282,0.872594,0.862247,0.843577,0.851252
6,0.0087,0.873711,0.860678,0.862496,0.825327,0.838304
7,0.0059,0.904468,0.869844,0.866101,0.833449,0.844603
8,0.004,0.949665,0.857012,0.855068,0.812482,0.827526
9,0.0027,1.051067,0.857012,0.857612,0.822651,0.834481
10,0.0014,1.118254,0.859762,0.850531,0.82453,0.833745


[I 2025-03-22 05:56:16,868] Trial 24 finished with value: 0.8329249123230694 and parameters: {'learning_rate': 0.0010333636594161816, 'weight_decay': 0.01, 'warmup_steps': 22}. Best is trial 20 with value: 0.8413750447982159.


Trial 25 with params: {'learning_rate': 0.0017464548708330647, 'weight_decay': 0.005, 'warmup_steps': 28}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4106,0.548244,0.871677,0.876886,0.82641,0.842779
2,0.0498,0.633267,0.849679,0.820618,0.81943,0.817132
3,0.022,0.71904,0.856095,0.827593,0.823951,0.82203
4,0.0129,0.78125,0.867094,0.839751,0.831224,0.833102
5,0.0087,0.790344,0.872594,0.850087,0.835796,0.840734
6,0.0046,0.919007,0.871677,0.869759,0.833822,0.847244
7,0.0033,1.02925,0.868011,0.87709,0.831251,0.847122
8,0.0032,1.133296,0.861595,0.832193,0.827429,0.827156
9,0.0014,1.063618,0.872594,0.842922,0.834758,0.837839
10,0.0006,1.135537,0.873511,0.843981,0.836333,0.838799


[I 2025-03-22 05:59:39,869] Trial 25 finished with value: 0.8376358357204606 and parameters: {'learning_rate': 0.0017464548708330647, 'weight_decay': 0.005, 'warmup_steps': 28}. Best is trial 20 with value: 0.8413750447982159.


Trial 26 with params: {'learning_rate': 0.000733540863652704, 'weight_decay': 0.006, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5315,0.556803,0.84143,0.831901,0.762912,0.780405
2,0.0958,0.528732,0.875344,0.872667,0.836411,0.850221
3,0.043,0.757111,0.852429,0.864236,0.811436,0.826262
4,0.0239,0.783965,0.854262,0.842471,0.81069,0.821037
5,0.0171,0.835112,0.855179,0.853735,0.811852,0.82596
6,0.0103,0.858712,0.861595,0.867693,0.818442,0.83409
7,0.0079,0.909026,0.864345,0.850919,0.819377,0.829957
8,0.0055,0.963618,0.863428,0.840638,0.81912,0.826653
9,0.003,0.962562,0.866178,0.854781,0.829246,0.839343
10,0.0027,1.095198,0.855179,0.842994,0.828195,0.833788


[I 2025-03-22 06:03:12,024] Trial 26 finished with value: 0.8268495119511226 and parameters: {'learning_rate': 0.000733540863652704, 'weight_decay': 0.006, 'warmup_steps': 22}. Best is trial 20 with value: 0.8413750447982159.


Trial 27 with params: {'learning_rate': 0.00033550878568657655, 'weight_decay': 0.009000000000000001, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7241,0.566469,0.812099,0.685682,0.692861,0.685947
2,0.196,0.513254,0.853346,0.828742,0.799882,0.811341
3,0.0918,0.635877,0.849679,0.818389,0.80994,0.808804
4,0.0512,0.722442,0.852429,0.852264,0.819459,0.830854
5,0.0327,0.725752,0.857929,0.855848,0.82492,0.835557
6,0.0231,0.763305,0.860678,0.857687,0.825637,0.837414
7,0.0188,0.827569,0.857012,0.865764,0.813736,0.830021
8,0.0124,0.87914,0.855179,0.852457,0.811909,0.825029
9,0.0099,0.874096,0.853346,0.85275,0.809973,0.824917
10,0.0066,0.870229,0.860678,0.849351,0.825338,0.833959


[I 2025-03-22 06:05:25,096] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.002277250336871309, 'weight_decay': 0.007, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3775,0.613335,0.857012,0.865719,0.805712,0.821618
2,0.042,0.626586,0.851512,0.806661,0.821889,0.812759
3,0.0173,0.76669,0.853346,0.837543,0.814242,0.819846
4,0.011,0.74605,0.87626,0.86117,0.829233,0.841586
5,0.0067,0.895844,0.875344,0.836664,0.828652,0.831961


[I 2025-03-22 06:06:28,965] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 0.004551034354335968, 'weight_decay': 0.01, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3103,0.665074,0.862511,0.846998,0.80954,0.821767
2,0.0329,0.665536,0.872594,0.849266,0.826576,0.835467
3,0.0157,0.714626,0.871677,0.856079,0.82703,0.83774
4,0.0073,1.019249,0.868011,0.82935,0.824017,0.824529
5,0.0079,1.232083,0.854262,0.832339,0.812483,0.818547
6,0.0069,1.147059,0.853346,0.832094,0.811529,0.818546
7,0.0037,1.432812,0.854262,0.85694,0.810658,0.826784
8,0.003,1.263127,0.861595,0.839855,0.817925,0.825899
9,0.0006,1.401584,0.860678,0.840051,0.816658,0.82521
10,0.0003,1.482676,0.862511,0.841325,0.818005,0.826587


[I 2025-03-22 06:08:32,034] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0017963593191533934, 'weight_decay': 0.008, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.382,0.5947,0.856095,0.840338,0.813776,0.82233
2,0.048,0.666746,0.853346,0.818217,0.821639,0.817932
3,0.021,0.800003,0.846013,0.826067,0.81708,0.816483
4,0.0137,0.870109,0.856095,0.855199,0.814236,0.826605
5,0.0095,0.846949,0.865261,0.846271,0.82919,0.835436
6,0.0061,0.922634,0.870761,0.864501,0.824937,0.83931
7,0.0036,1.019072,0.866178,0.861022,0.820809,0.834622
8,0.002,1.036857,0.872594,0.859581,0.835223,0.844761
9,0.0013,1.128655,0.863428,0.851372,0.827112,0.836232
10,0.0009,1.150189,0.868011,0.865287,0.831283,0.844305


[I 2025-03-22 06:12:17,599] Trial 30 finished with value: 0.84376963775936 and parameters: {'learning_rate': 0.0017963593191533934, 'weight_decay': 0.008, 'warmup_steps': 15}. Best is trial 30 with value: 0.84376963775936.


Trial 31 with params: {'learning_rate': 0.004109169197885689, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3009,0.649542,0.867094,0.861298,0.821539,0.835772
2,0.0328,0.714097,0.861595,0.828943,0.81919,0.822046
3,0.016,0.905839,0.851512,0.830406,0.810323,0.815407
4,0.0088,1.011463,0.850596,0.817765,0.809314,0.810782
5,0.007,1.062699,0.868011,0.83769,0.832454,0.833803
6,0.004,1.189144,0.854262,0.836842,0.819814,0.82628
7,0.0028,1.176442,0.857929,0.838673,0.823319,0.828742
8,0.0025,1.263,0.861595,0.834292,0.826922,0.829405
9,0.0011,1.313403,0.864345,0.851289,0.819107,0.830947
10,0.0005,1.374768,0.860678,0.838817,0.816559,0.824736


[I 2025-03-22 06:14:26,345] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.0009165914899667564, 'weight_decay': 0.007, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4817,0.585055,0.845096,0.825074,0.787276,0.796969
2,0.0812,0.540461,0.874427,0.836277,0.845294,0.84019
3,0.0344,0.735836,0.856095,0.844323,0.812572,0.822969
4,0.0193,0.7713,0.860678,0.861799,0.825491,0.838499
5,0.0132,0.859839,0.859762,0.846308,0.814591,0.82676
6,0.0097,0.798672,0.862511,0.847515,0.833949,0.838986
7,0.0053,1.018044,0.857929,0.84401,0.814378,0.82364
8,0.0036,1.017077,0.861595,0.849662,0.825079,0.834794
9,0.0024,1.114207,0.858845,0.846377,0.814354,0.825584
10,0.0018,1.135197,0.860678,0.841667,0.823915,0.831214


[I 2025-03-22 06:17:09,005] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0016731337356021957, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4072,0.582311,0.859762,0.842702,0.817636,0.826054
2,0.0505,0.588419,0.864345,0.858888,0.830422,0.839804
3,0.0232,0.797057,0.857929,0.856012,0.825244,0.83447
4,0.0124,0.817106,0.868011,0.857075,0.831713,0.840331
5,0.0089,0.871338,0.864345,0.850396,0.829366,0.83708
6,0.0053,1.002935,0.862511,0.861631,0.827129,0.83867
7,0.0037,1.132556,0.862511,0.85993,0.817604,0.831458
8,0.0022,1.160284,0.868928,0.867076,0.831079,0.84468
9,0.0012,1.151927,0.866178,0.853647,0.828897,0.838441
10,0.0009,1.200855,0.875344,0.861658,0.836065,0.84655


[I 2025-03-22 06:20:24,138] Trial 33 finished with value: 0.8436376575123733 and parameters: {'learning_rate': 0.0016731337356021957, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30}. Best is trial 30 with value: 0.84376963775936.


Trial 34 with params: {'learning_rate': 0.0019304032879985068, 'weight_decay': 0.008, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3665,0.589813,0.865261,0.859011,0.821652,0.834182
2,0.0446,0.653232,0.865261,0.834535,0.831572,0.83106
3,0.0211,0.743391,0.860678,0.84032,0.828273,0.830373
4,0.0128,0.784682,0.865261,0.846589,0.828765,0.835433
5,0.0084,0.911528,0.868928,0.855723,0.832721,0.84107
6,0.006,0.992034,0.868928,0.85687,0.822672,0.835569
7,0.003,1.132074,0.865261,0.859334,0.820816,0.833228
8,0.0023,1.123779,0.869844,0.854913,0.82323,0.835354
9,0.0014,1.205717,0.871677,0.845599,0.826218,0.833007
10,0.001,1.225755,0.872594,0.850881,0.83447,0.84114


[I 2025-03-22 06:23:28,198] Trial 34 finished with value: 0.8438629349264759 and parameters: {'learning_rate': 0.0019304032879985068, 'weight_decay': 0.008, 'warmup_steps': 12}. Best is trial 34 with value: 0.8438629349264759.


Trial 35 with params: {'learning_rate': 0.0008936016543853431, 'weight_decay': 0.01, 'warmup_steps': 31}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5093,0.534543,0.849679,0.861883,0.798548,0.817461
2,0.0816,0.578402,0.869844,0.858368,0.842981,0.847921
3,0.0366,0.705636,0.861595,0.848789,0.817856,0.82728
4,0.0214,0.74359,0.867094,0.868012,0.839746,0.850349
5,0.0146,0.797711,0.864345,0.861266,0.818527,0.833897
6,0.0073,0.846426,0.873511,0.865758,0.842735,0.852515
7,0.0054,0.966255,0.860678,0.838359,0.817668,0.823452
8,0.0042,0.974956,0.864345,0.846723,0.836339,0.840881
9,0.0028,1.038052,0.860678,0.837518,0.832464,0.834161
10,0.0024,1.038036,0.861595,0.853678,0.824518,0.835733


[I 2025-03-22 06:26:43,029] Trial 35 finished with value: 0.8357522586351589 and parameters: {'learning_rate': 0.0008936016543853431, 'weight_decay': 0.01, 'warmup_steps': 31}. Best is trial 34 with value: 0.8438629349264759.


Trial 36 with params: {'learning_rate': 0.0024140167915813623, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3649,0.588375,0.862511,0.871214,0.809328,0.827717
2,0.041,0.640043,0.859762,0.823944,0.826632,0.824378
3,0.0186,0.767322,0.858845,0.834465,0.817017,0.822314
4,0.0106,0.787013,0.866178,0.842723,0.820861,0.829558
5,0.0086,0.872039,0.865261,0.851054,0.81872,0.831287
6,0.0054,0.940631,0.867094,0.852216,0.821879,0.832943
7,0.0029,1.093122,0.873511,0.85922,0.826553,0.838866
8,0.0021,1.146018,0.871677,0.857435,0.825112,0.837297
9,0.0011,1.245692,0.870761,0.857299,0.823983,0.836743
10,0.0006,1.267843,0.868011,0.855386,0.821767,0.834533


[I 2025-03-22 06:30:18,385] Trial 36 finished with value: 0.8356398765499202 and parameters: {'learning_rate': 0.0024140167915813623, 'weight_decay': 0.009000000000000001, 'warmup_steps': 30}. Best is trial 34 with value: 0.8438629349264759.


Trial 37 with params: {'learning_rate': 0.001233207839763869, 'weight_decay': 0.008, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4297,0.565042,0.854262,0.839563,0.812389,0.821564
2,0.0631,0.608301,0.855179,0.83341,0.823755,0.825168
3,0.0274,0.788288,0.847846,0.859339,0.817482,0.829092
4,0.0152,0.901453,0.857929,0.85806,0.824284,0.83556
5,0.0116,0.806804,0.868011,0.852002,0.822622,0.833496


[I 2025-03-22 06:31:14,947] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.0010674764812278787, 'weight_decay': 0.005, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4435,0.59951,0.845096,0.829228,0.796804,0.80481
2,0.0737,0.576233,0.857929,0.812754,0.834227,0.820925
3,0.0303,0.722451,0.846929,0.852678,0.815033,0.826175
4,0.0174,0.773187,0.856095,0.8469,0.821834,0.831251
5,0.0115,0.859155,0.859762,0.838379,0.825108,0.829297


[I 2025-03-22 06:32:22,513] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00010957402645904347, 'weight_decay': 0.003, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0419,0.704796,0.762603,0.646475,0.647339,0.644413
2,0.4824,0.586459,0.7956,0.666084,0.680589,0.672294
3,0.331,0.574424,0.80385,0.671388,0.690158,0.678026
4,0.2382,0.584566,0.815765,0.80042,0.715909,0.722863
5,0.1716,0.566013,0.840513,0.853405,0.788538,0.810075
6,0.1293,0.584161,0.836847,0.851533,0.786297,0.807831
7,0.101,0.607412,0.83593,0.848678,0.795769,0.813776
8,0.0796,0.650286,0.824015,0.843758,0.783914,0.804658
9,0.0664,0.663791,0.835014,0.853944,0.80335,0.820774
10,0.0545,0.653323,0.842346,0.834678,0.808808,0.819607


[I 2025-03-22 06:34:20,483] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.002306843993572417, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3373,0.545056,0.87626,0.881948,0.829727,0.847243
2,0.04,0.561221,0.871677,0.840877,0.83543,0.83717
3,0.0182,0.7975,0.870761,0.863358,0.826103,0.837803
4,0.0126,0.833089,0.862511,0.847979,0.817569,0.829114
5,0.0086,0.836416,0.879927,0.855971,0.831548,0.841259
6,0.0051,0.923547,0.872594,0.850257,0.825148,0.835491
7,0.0029,1.104183,0.870761,0.857804,0.824092,0.835395
8,0.002,1.332348,0.862511,0.852856,0.828691,0.83434
9,0.0014,1.093272,0.873511,0.858997,0.827403,0.838406
10,0.0007,1.098958,0.878093,0.855922,0.839913,0.845464


[I 2025-03-22 06:37:29,458] Trial 40 finished with value: 0.8501281296030873 and parameters: {'learning_rate': 0.002306843993572417, 'weight_decay': 0.01, 'warmup_steps': 3}. Best is trial 40 with value: 0.8501281296030873.


Trial 41 with params: {'learning_rate': 0.004004583195326367, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2857,0.630678,0.867094,0.862439,0.822453,0.835912
2,0.0339,0.71951,0.856095,0.813756,0.814706,0.812696
3,0.0157,0.795201,0.861595,0.851221,0.828293,0.835507
4,0.009,0.884189,0.862511,0.832547,0.838765,0.833697
5,0.0089,0.899863,0.87626,0.835669,0.839741,0.83732
6,0.0055,1.085226,0.856095,0.81598,0.823291,0.818008
7,0.0029,1.118099,0.875344,0.839063,0.840162,0.837579
8,0.0021,1.132613,0.871677,0.840291,0.82634,0.831476
9,0.001,1.237188,0.873511,0.839139,0.836894,0.837047
10,0.0006,1.342393,0.869844,0.832752,0.824672,0.827186


[I 2025-03-22 06:39:24,916] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 5.3550149515819593e-05, 'weight_decay': 0.005, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3321,0.946102,0.662695,0.574839,0.556733,0.560847
2,0.7423,0.722708,0.751604,0.627364,0.642707,0.633846
3,0.5368,0.654512,0.774519,0.649757,0.663428,0.654674
4,0.4333,0.610199,0.79835,0.675253,0.681597,0.676877
5,0.3644,0.595474,0.799267,0.66983,0.684944,0.676738


[I 2025-03-22 06:40:26,863] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0011331973981284014, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4239,0.583284,0.836847,0.808392,0.798099,0.799878
2,0.0689,0.582263,0.865261,0.82962,0.830798,0.828407
3,0.0282,0.715291,0.846013,0.850777,0.812834,0.825597
4,0.0157,0.781007,0.857929,0.856836,0.822932,0.835419
5,0.012,0.780075,0.864345,0.851621,0.82936,0.838063
6,0.0073,0.795781,0.868011,0.867918,0.830737,0.845693
7,0.0048,1.073476,0.857929,0.858661,0.815334,0.82821
8,0.0044,0.988159,0.863428,0.875845,0.827059,0.844527
9,0.0029,1.030529,0.860678,0.860748,0.825079,0.837903
10,0.0019,1.06347,0.862511,0.851033,0.826836,0.83613


[I 2025-03-22 06:43:21,480] Trial 43 finished with value: 0.8347253038899356 and parameters: {'learning_rate': 0.0011331973981284014, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2}. Best is trial 40 with value: 0.8501281296030873.


Trial 44 with params: {'learning_rate': 0.002582719692696467, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3272,0.607846,0.863428,0.849141,0.829523,0.835572
2,0.0384,0.644429,0.855179,0.836275,0.832463,0.832315
3,0.0178,0.787785,0.872594,0.863619,0.829435,0.839154
4,0.0103,0.789747,0.867094,0.863198,0.82221,0.836287
5,0.0072,0.98019,0.864345,0.863103,0.818709,0.834476
6,0.0043,1.0013,0.870761,0.866588,0.824275,0.839838
7,0.0031,1.197073,0.866178,0.862161,0.821191,0.834847
8,0.0022,1.20446,0.864345,0.863356,0.83094,0.841409
9,0.0015,1.258711,0.873511,0.860907,0.836367,0.845866
10,0.0007,1.307265,0.870761,0.85801,0.833876,0.843371


[I 2025-03-22 06:46:47,154] Trial 44 finished with value: 0.8446711030205689 and parameters: {'learning_rate': 0.002582719692696467, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 40 with value: 0.8501281296030873.


Trial 45 with params: {'learning_rate': 0.0013905143710220842, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3982,0.594797,0.852429,0.859854,0.812295,0.825788
2,0.0575,0.595398,0.864345,0.830207,0.828706,0.827926
3,0.0249,0.846679,0.837764,0.835305,0.809644,0.814637
4,0.0152,0.728632,0.868928,0.856377,0.833441,0.842048
5,0.01,0.856499,0.869844,0.866961,0.831547,0.8455
6,0.0054,0.89244,0.865261,0.853362,0.827168,0.838069
7,0.0038,1.087327,0.857929,0.845046,0.824424,0.831119
8,0.0047,0.989166,0.864345,0.864717,0.82737,0.84151
9,0.0021,1.109954,0.866178,0.863594,0.830425,0.84117
10,0.0012,1.090608,0.873511,0.861203,0.844507,0.850802


[I 2025-03-22 06:50:32,997] Trial 45 finished with value: 0.836081933950136 and parameters: {'learning_rate': 0.0013905143710220842, 'weight_decay': 0.01, 'warmup_steps': 6}. Best is trial 40 with value: 0.8501281296030873.


Trial 46 with params: {'learning_rate': 0.0024335873509441965, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3361,0.60197,0.864345,0.860006,0.830069,0.839963
2,0.0398,0.653844,0.860678,0.818917,0.828777,0.821738
3,0.0184,0.781347,0.859762,0.844182,0.835829,0.836793
4,0.0114,0.739836,0.871677,0.858485,0.834819,0.843927
5,0.008,0.857544,0.875344,0.85513,0.836692,0.844094
6,0.0052,1.02623,0.868011,0.858449,0.840838,0.847565
7,0.0025,1.207192,0.858845,0.839742,0.824267,0.830146
8,0.0019,1.277693,0.858845,0.850657,0.833732,0.839801
9,0.0011,1.212965,0.870761,0.851491,0.843206,0.846169
10,0.0006,1.294896,0.871677,0.852206,0.844376,0.847113


[I 2025-03-22 06:53:49,923] Trial 46 finished with value: 0.836906040019109 and parameters: {'learning_rate': 0.0024335873509441965, 'weight_decay': 0.009000000000000001, 'warmup_steps': 9}. Best is trial 40 with value: 0.8501281296030873.


Trial 47 with params: {'learning_rate': 0.0019937793883598805, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3542,0.599266,0.863428,0.872548,0.818608,0.836229
2,0.0443,0.616572,0.857929,0.828662,0.825169,0.825065
3,0.0194,0.745034,0.862511,0.826525,0.829444,0.826086
4,0.0111,0.787563,0.861595,0.842759,0.827928,0.832273
5,0.0071,0.861046,0.868011,0.843077,0.823842,0.830578
6,0.0066,0.886191,0.864345,0.8647,0.818707,0.835015
7,0.0032,0.982322,0.875344,0.871766,0.827728,0.843081
8,0.0023,0.983833,0.87901,0.8635,0.84181,0.849303
9,0.0013,1.078387,0.869844,0.854625,0.824778,0.835139
10,0.001,1.025228,0.87626,0.847043,0.837672,0.841559


[I 2025-03-22 06:57:02,365] Trial 47 finished with value: 0.8359753550565627 and parameters: {'learning_rate': 0.0019937793883598805, 'weight_decay': 0.01, 'warmup_steps': 2}. Best is trial 40 with value: 0.8501281296030873.


Trial 48 with params: {'learning_rate': 0.0027076943187309722, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3355,0.570095,0.864345,0.847847,0.811392,0.822887
2,0.0379,0.645634,0.871677,0.848732,0.836012,0.840033
3,0.0181,0.803176,0.862511,0.848945,0.830601,0.834301
4,0.0105,0.816997,0.867094,0.847348,0.830474,0.836636
5,0.0068,0.976544,0.857929,0.836326,0.831446,0.832897
6,0.0046,1.007085,0.860678,0.849791,0.824413,0.834739
7,0.0021,1.122609,0.863428,0.86246,0.82629,0.8405
8,0.0017,1.196476,0.864345,0.857125,0.809138,0.825198
9,0.0011,1.187949,0.867094,0.854599,0.830062,0.839635
10,0.0005,1.268175,0.867094,0.845938,0.830089,0.836041


[I 2025-03-22 07:00:20,875] Trial 48 finished with value: 0.8410017928977788 and parameters: {'learning_rate': 0.0027076943187309722, 'weight_decay': 0.01, 'warmup_steps': 13}. Best is trial 40 with value: 0.8501281296030873.


Trial 49 with params: {'learning_rate': 0.003599946958634411, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2979,0.569063,0.867094,0.854108,0.832761,0.839077
2,0.0348,0.674154,0.857929,0.846526,0.824012,0.831785
3,0.0155,0.727963,0.867094,0.836432,0.833102,0.832994
4,0.0083,0.963308,0.857012,0.829242,0.824122,0.825151
5,0.0078,1.000338,0.858845,0.851641,0.833704,0.840528
6,0.0048,1.043571,0.865261,0.85351,0.840459,0.844542
7,0.003,1.167921,0.861595,0.845896,0.835845,0.839028
8,0.0019,1.247748,0.867094,0.848681,0.841488,0.843359
9,0.0006,1.292448,0.869844,0.83714,0.84412,0.839435
10,0.0004,1.37158,0.869844,0.832144,0.844085,0.836747


[I 2025-03-22 07:02:30,371] Trial 49 pruned. 


Trial 50 with params: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3311,0.579214,0.863428,0.846184,0.812087,0.82149
2,0.0371,0.651499,0.864345,0.802956,0.821564,0.810371
3,0.0176,0.83339,0.854262,0.821882,0.814497,0.814361
4,0.0111,0.779564,0.868011,0.835062,0.824333,0.827725
5,0.008,0.96975,0.866178,0.862161,0.822807,0.836595
6,0.0055,0.935415,0.863428,0.835145,0.828919,0.830142
7,0.0031,1.157614,0.865261,0.850229,0.822755,0.830537
8,0.002,1.186338,0.869844,0.846896,0.824142,0.832297
9,0.0012,1.235983,0.867094,0.84276,0.823832,0.8297
10,0.0004,1.336995,0.869844,0.84896,0.834679,0.838748


[I 2025-03-22 07:06:23,449] Trial 50 finished with value: 0.8359920574068056 and parameters: {'learning_rate': 0.0027800474932883233, 'weight_decay': 0.0, 'warmup_steps': 12}. Best is trial 40 with value: 0.8501281296030873.


Trial 51 with params: {'learning_rate': 0.002229638502490497, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3401,0.564625,0.863428,0.851599,0.829123,0.835591
2,0.0412,0.628778,0.862511,0.842027,0.827271,0.832705
3,0.0184,0.801083,0.847846,0.814059,0.818072,0.81268
4,0.0099,0.90428,0.862511,0.85273,0.828897,0.836564
5,0.0087,0.901868,0.869844,0.855233,0.82347,0.835441
6,0.0048,1.025788,0.863428,0.861018,0.818013,0.833751
7,0.0023,1.201137,0.873511,0.868019,0.825935,0.841259
8,0.0019,1.128005,0.864345,0.860152,0.819985,0.833654
9,0.0017,1.198609,0.868928,0.854099,0.822802,0.834338
10,0.0009,1.247238,0.868011,0.843601,0.822726,0.830091


[I 2025-03-22 07:08:33,340] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.004803130612126116, 'weight_decay': 0.0, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.312,0.670235,0.855179,0.840218,0.805783,0.814968
2,0.0339,0.747057,0.854262,0.803587,0.813027,0.807113
3,0.0161,0.906428,0.852429,0.822765,0.812176,0.814711
4,0.0077,0.997805,0.856095,0.818516,0.814651,0.815083
5,0.0087,1.226245,0.857012,0.838321,0.824785,0.829359


[I 2025-03-22 07:09:31,342] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00021967416393079315, 'weight_decay': 0.0, 'warmup_steps': 15}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.821,0.569546,0.808433,0.67977,0.690199,0.684646
2,0.2883,0.497521,0.846013,0.842544,0.757251,0.774145
3,0.1555,0.587307,0.837764,0.850047,0.790579,0.805793
4,0.0899,0.637178,0.847846,0.84375,0.812604,0.824371
5,0.0589,0.633506,0.851512,0.82618,0.825655,0.825457
6,0.0438,0.679549,0.854262,0.869704,0.809239,0.830565
7,0.0332,0.714369,0.855179,0.867564,0.810852,0.829595
8,0.0224,0.853996,0.840513,0.855661,0.800722,0.816673
9,0.0179,0.832265,0.843263,0.86072,0.800284,0.820105
10,0.0146,0.808105,0.852429,0.863241,0.809403,0.827353


[I 2025-03-22 07:11:32,889] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.002151034320172729, 'weight_decay': 0.007, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3552,0.570982,0.868011,0.873241,0.813963,0.832037
2,0.042,0.650207,0.858845,0.82892,0.825986,0.825847
3,0.0188,0.762896,0.854262,0.834996,0.811054,0.819124
4,0.0127,0.807227,0.865261,0.849806,0.82002,0.831126
5,0.0088,0.890979,0.861595,0.845499,0.819184,0.827851


[I 2025-03-22 07:12:47,014] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.001651625505402199, 'weight_decay': 0.008, 'warmup_steps': 21}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4001,0.640263,0.856095,0.864365,0.814849,0.828578
2,0.0524,0.606142,0.859762,0.817829,0.826887,0.821233
3,0.0215,0.863686,0.84143,0.837171,0.8121,0.817695
4,0.0125,0.836159,0.864345,0.850548,0.820537,0.829993
5,0.0072,0.930911,0.87901,0.866557,0.850174,0.856287
6,0.007,0.851191,0.874427,0.87226,0.826163,0.84314
7,0.0042,1.085957,0.867094,0.866513,0.821596,0.836747
8,0.0026,1.194548,0.861595,0.857941,0.818462,0.83094
9,0.0013,1.119939,0.871677,0.866721,0.8253,0.840278
10,0.001,1.109962,0.873511,0.863031,0.845006,0.852252


[I 2025-03-22 07:16:15,284] Trial 55 finished with value: 0.8424597738031032 and parameters: {'learning_rate': 0.001651625505402199, 'weight_decay': 0.008, 'warmup_steps': 21}. Best is trial 40 with value: 0.8501281296030873.


Trial 56 with params: {'learning_rate': 0.0027068059719358964, 'weight_decay': 0.008, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.337,0.60735,0.862511,0.869229,0.810431,0.826438
2,0.0364,0.659011,0.859762,0.829286,0.826575,0.826246
3,0.0161,0.959615,0.83868,0.829475,0.811499,0.814025
4,0.0097,0.834147,0.860678,0.847929,0.816854,0.827972
5,0.0082,0.879149,0.866178,0.862348,0.820793,0.835825
6,0.0042,0.966324,0.866178,0.862103,0.820972,0.835971
7,0.0022,1.12307,0.864345,0.861668,0.818862,0.834309
8,0.0013,1.19656,0.868011,0.861874,0.822886,0.836368
9,0.002,1.057301,0.87626,0.859792,0.829307,0.840834
10,0.0007,1.159636,0.868011,0.843505,0.822384,0.830465


[I 2025-03-22 07:18:33,909] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.0014836950525544204, 'weight_decay': 0.009000000000000001, 'warmup_steps': 18}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4074,0.607785,0.848763,0.854765,0.800288,0.814184
2,0.0568,0.589884,0.859762,0.825408,0.825356,0.824542
3,0.0244,0.740758,0.848763,0.819813,0.817216,0.814739
4,0.0129,0.791379,0.868011,0.854675,0.832496,0.839955
5,0.0095,0.840773,0.862511,0.836575,0.825955,0.830117
6,0.0063,0.894006,0.859762,0.84806,0.825069,0.833556
7,0.0036,0.996564,0.861595,0.841641,0.825493,0.831281
8,0.0028,1.103825,0.857012,0.856559,0.822517,0.834612
9,0.0027,1.064607,0.859762,0.846408,0.814796,0.826194
10,0.0009,1.184253,0.859762,0.834008,0.824513,0.827347


[I 2025-03-22 07:20:35,558] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0008494419495937195, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4829,0.557263,0.851512,0.834982,0.801285,0.810648
2,0.0864,0.540971,0.873511,0.852201,0.835821,0.842347
3,0.0397,0.721429,0.851512,0.819114,0.809032,0.810113
4,0.0219,0.807007,0.853346,0.843448,0.820321,0.826907
5,0.0147,0.900466,0.850596,0.851831,0.817859,0.828829
6,0.0099,0.862935,0.858845,0.856825,0.813955,0.82921
7,0.0057,0.965397,0.857929,0.844202,0.814797,0.823395
8,0.0048,1.005037,0.859762,0.836871,0.825551,0.828798
9,0.0026,1.066749,0.856095,0.837753,0.821382,0.826839
10,0.0024,1.077777,0.860678,0.841991,0.823959,0.831076


[I 2025-03-22 07:22:44,506] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.002168556477654359, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.353,0.615558,0.857012,0.865491,0.814682,0.829261
2,0.0428,0.578127,0.875344,0.843072,0.839883,0.839991
3,0.0176,0.781087,0.857012,0.818301,0.814754,0.814411
4,0.012,0.909565,0.858845,0.870849,0.815183,0.833278
5,0.0087,0.871326,0.868928,0.85865,0.841929,0.848548
6,0.0048,0.937538,0.87626,0.874748,0.837934,0.852715
7,0.0033,1.082725,0.860678,0.846398,0.817622,0.826732
8,0.0019,1.070584,0.872594,0.860896,0.835056,0.845667
9,0.0013,1.1198,0.875344,0.863715,0.83736,0.84842
10,0.0007,1.225664,0.870761,0.86009,0.832856,0.843971


[I 2025-03-22 07:25:56,929] Trial 59 finished with value: 0.8456978426052394 and parameters: {'learning_rate': 0.002168556477654359, 'weight_decay': 0.01, 'warmup_steps': 9}. Best is trial 40 with value: 0.8501281296030873.


Trial 60 with params: {'learning_rate': 0.00438809022727635, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2936,0.666389,0.856095,0.838053,0.806537,0.814922
2,0.034,0.748066,0.852429,0.839954,0.819928,0.825653
3,0.0167,0.777662,0.862511,0.856131,0.821172,0.830846
4,0.01,0.932745,0.855179,0.821343,0.823201,0.819385
5,0.0057,1.188244,0.860678,0.848351,0.827968,0.834409
6,0.003,1.190199,0.874427,0.861857,0.837385,0.847059
7,0.0024,1.315002,0.853346,0.836467,0.820788,0.826659
8,0.0015,1.349765,0.856095,0.835123,0.813001,0.822031
9,0.0009,1.54838,0.853346,0.829826,0.81384,0.81718
10,0.0004,1.601763,0.861595,0.83742,0.819633,0.824508


[I 2025-03-22 07:28:03,951] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.001760103184927571, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.372,0.595624,0.865261,0.831981,0.820151,0.824542
2,0.0481,0.58274,0.872594,0.824413,0.837644,0.829729
3,0.0215,0.889769,0.846929,0.83855,0.816507,0.822062
4,0.0129,0.815794,0.858845,0.845869,0.825936,0.831994
5,0.0104,0.837512,0.870761,0.857963,0.844127,0.848972
6,0.0058,0.878772,0.873511,0.861757,0.834956,0.845837
7,0.0034,1.045913,0.867094,0.857355,0.829614,0.840076
8,0.0023,1.086965,0.872594,0.859831,0.8349,0.844864
9,0.0017,1.117087,0.869844,0.857498,0.832736,0.842478
10,0.0009,1.154003,0.873511,0.862741,0.844355,0.851983


[I 2025-03-22 07:31:45,267] Trial 61 finished with value: 0.8400784262799097 and parameters: {'learning_rate': 0.001760103184927571, 'weight_decay': 0.01, 'warmup_steps': 10}. Best is trial 40 with value: 0.8501281296030873.


Trial 62 with params: {'learning_rate': 0.0012824673020133126, 'weight_decay': 0.007, 'warmup_steps': 19}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4335,0.547448,0.864345,0.858275,0.820662,0.833271
2,0.0604,0.621511,0.859762,0.825004,0.828076,0.822902
3,0.0257,0.7451,0.854262,0.83882,0.831055,0.832101
4,0.0158,0.873989,0.861595,0.858365,0.828047,0.837238
5,0.0114,0.876166,0.868928,0.858323,0.840991,0.847638
6,0.0066,0.945007,0.867094,0.879181,0.829468,0.847894
7,0.0047,1.082155,0.853346,0.845333,0.83033,0.834626
8,0.0035,1.075961,0.857929,0.856986,0.824074,0.835689
9,0.0017,1.174842,0.861595,0.85082,0.825636,0.835524
10,0.0013,1.200175,0.866178,0.854045,0.829757,0.838867


[I 2025-03-22 07:35:05,389] Trial 62 finished with value: 0.837666349528359 and parameters: {'learning_rate': 0.0012824673020133126, 'weight_decay': 0.007, 'warmup_steps': 19}. Best is trial 40 with value: 0.8501281296030873.


Trial 63 with params: {'learning_rate': 0.001448044690622023, 'weight_decay': 0.008, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3934,0.59069,0.855179,0.838573,0.813418,0.821268
2,0.0587,0.556845,0.873511,0.851353,0.836731,0.842246
3,0.0238,0.741489,0.855179,0.859383,0.820428,0.834031
4,0.0129,0.820396,0.863428,0.849146,0.829759,0.836141
5,0.0122,0.796951,0.867094,0.847921,0.840687,0.842973
6,0.0063,0.900582,0.867094,0.857493,0.829983,0.840528
7,0.0037,1.01179,0.855179,0.844683,0.821705,0.829432
8,0.0025,1.073326,0.857929,0.844366,0.815087,0.825339
9,0.0016,1.140201,0.856095,0.83269,0.813431,0.820142
10,0.0016,1.090063,0.863428,0.832058,0.826023,0.828347


[I 2025-03-22 07:37:11,139] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.00011912397327149118, 'weight_decay': 0.006, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0271,0.68664,0.759853,0.644714,0.645883,0.642307
2,0.4536,0.568659,0.809349,0.677109,0.692251,0.684157
3,0.3072,0.568893,0.807516,0.673561,0.692679,0.680986
4,0.2156,0.587531,0.819432,0.81636,0.727328,0.739555
5,0.1522,0.576083,0.836847,0.848635,0.776261,0.798336
6,0.1131,0.59805,0.83593,0.849359,0.784718,0.806084
7,0.0886,0.629571,0.837764,0.852207,0.806661,0.822676
8,0.0688,0.661757,0.832264,0.848571,0.791106,0.811028
9,0.056,0.67952,0.83868,0.855401,0.805129,0.823586
10,0.0456,0.677912,0.843263,0.845159,0.809586,0.823673


[I 2025-03-22 07:39:10,087] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.003851284481476066, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2749,0.627451,0.866178,0.860455,0.812968,0.826548
2,0.0339,0.722296,0.851512,0.821865,0.808782,0.812949
3,0.0152,0.848949,0.856095,0.840733,0.815125,0.823142
4,0.0094,0.971563,0.860678,0.84687,0.817144,0.828009
5,0.0072,1.207033,0.859762,0.848345,0.827652,0.832597


[I 2025-03-22 07:40:13,901] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0027320043379929757, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3243,0.572385,0.866178,0.859442,0.821611,0.834559
2,0.0372,0.598279,0.872594,0.867938,0.837061,0.848243
3,0.0191,0.806507,0.858845,0.85413,0.81756,0.828465
4,0.0098,0.864749,0.864345,0.858869,0.822091,0.833289
5,0.0059,0.895846,0.873511,0.88313,0.825909,0.846223
6,0.004,1.010357,0.860678,0.856475,0.81903,0.831959
7,0.0032,0.994371,0.872594,0.869062,0.836311,0.848405
8,0.0017,1.165792,0.870761,0.869566,0.844395,0.853038
9,0.0016,1.195121,0.862511,0.850072,0.82848,0.835949
10,0.001,1.128617,0.868928,0.865293,0.83449,0.845337


[I 2025-03-22 07:43:40,182] Trial 66 finished with value: 0.8451991812802983 and parameters: {'learning_rate': 0.0027320043379929757, 'weight_decay': 0.01, 'warmup_steps': 5}. Best is trial 40 with value: 0.8501281296030873.


Trial 67 with params: {'learning_rate': 0.002064289086569495, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3453,0.563446,0.862511,0.85724,0.818647,0.830699
2,0.0418,0.611804,0.861595,0.831213,0.829013,0.828297
3,0.0205,0.740623,0.853346,0.851741,0.823507,0.830269
4,0.0113,0.84688,0.864345,0.85341,0.821511,0.830772
5,0.0073,0.884473,0.874427,0.849603,0.83873,0.841765
6,0.0043,1.007826,0.869844,0.870855,0.832424,0.847088
7,0.0033,1.024652,0.875344,0.868657,0.829502,0.842675
8,0.0025,1.072242,0.868011,0.860202,0.821256,0.83556
9,0.0012,1.10672,0.872594,0.867089,0.827192,0.84096
10,0.0008,1.130397,0.872594,0.857203,0.82661,0.837778


[I 2025-03-22 07:46:06,605] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 0.0018950244195786625, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3576,0.571613,0.862511,0.856242,0.818863,0.830189
2,0.0461,0.616675,0.846929,0.82708,0.817452,0.819491
3,0.0191,0.838566,0.845096,0.835592,0.81606,0.819853
4,0.0133,0.751556,0.867094,0.840273,0.830027,0.833725
5,0.0082,0.837149,0.875344,0.853788,0.836759,0.843841
6,0.0058,0.954574,0.870761,0.84612,0.823945,0.832931
7,0.0038,1.148364,0.862511,0.849266,0.818325,0.828413
8,0.0026,1.174231,0.865261,0.841713,0.820912,0.828022
9,0.0013,1.230204,0.867094,0.842479,0.822648,0.828552
10,0.0011,1.242472,0.866178,0.833863,0.821163,0.825082


[I 2025-03-22 07:48:22,029] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.004205416081449773, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2848,0.663881,0.853346,0.843988,0.820722,0.828605
2,0.0309,0.694491,0.860678,0.845821,0.817682,0.827572
3,0.016,0.946776,0.863428,0.830521,0.83,0.827329
4,0.0086,1.091759,0.857929,0.827525,0.823032,0.824053
5,0.0073,1.040631,0.865261,0.839867,0.821971,0.827943
6,0.0036,1.101748,0.866178,0.837269,0.831822,0.832997
7,0.0023,1.166673,0.866178,0.853114,0.822217,0.831905
8,0.0019,1.20306,0.868928,0.824286,0.825782,0.822517
9,0.0027,1.241126,0.862511,0.836097,0.821508,0.824053
10,0.0016,1.269603,0.868928,0.838561,0.824936,0.828985


[I 2025-03-22 07:50:22,016] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0016097461614093362, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3806,0.597617,0.861595,0.823566,0.817083,0.819025
2,0.0512,0.60775,0.866178,0.836191,0.831826,0.832888
3,0.0241,0.722424,0.854262,0.834022,0.821488,0.823976
4,0.0129,0.804465,0.866178,0.844311,0.830866,0.835022
5,0.01,0.811262,0.870761,0.858913,0.843942,0.848888
6,0.0059,0.978796,0.863428,0.863505,0.827386,0.841221
7,0.005,1.055021,0.855179,0.855476,0.821583,0.832554
8,0.0036,0.977131,0.869844,0.848579,0.833009,0.839103
9,0.0014,1.163177,0.859762,0.844639,0.83413,0.837646
10,0.001,1.132152,0.870761,0.858835,0.833371,0.843365


[I 2025-03-22 07:53:40,289] Trial 70 finished with value: 0.8418000066019796 and parameters: {'learning_rate': 0.0016097461614093362, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 40 with value: 0.8501281296030873.


Trial 71 with params: {'learning_rate': 0.001152808017690577, 'weight_decay': 0.01, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.433,0.557223,0.850596,0.837386,0.809071,0.818432
2,0.0667,0.570337,0.864345,0.830522,0.829695,0.828247
3,0.0289,0.73101,0.855179,0.834132,0.823205,0.825012
4,0.0166,0.789612,0.855179,0.843657,0.811012,0.821812
5,0.0113,0.786407,0.868928,0.855068,0.822033,0.834931


[I 2025-03-22 07:54:48,100] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.002165564001945086, 'weight_decay': 0.007, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3734,0.612473,0.84418,0.854113,0.796504,0.811441
2,0.0424,0.631667,0.865261,0.835827,0.83089,0.832011
3,0.0208,0.811496,0.856095,0.839778,0.815624,0.821294
4,0.012,0.938481,0.856095,0.845815,0.822697,0.829511
5,0.0094,1.04933,0.859762,0.847129,0.827751,0.832264
6,0.0057,1.075989,0.857012,0.842697,0.814742,0.823917
7,0.0029,1.162938,0.857929,0.869211,0.813852,0.831829
8,0.0028,1.115941,0.861595,0.859133,0.817286,0.83166
9,0.0014,1.190463,0.866178,0.850442,0.821249,0.83152
10,0.0008,1.199317,0.864345,0.839561,0.819064,0.826826


[I 2025-03-22 07:56:58,532] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 5.953168512495511e-05, 'weight_decay': 0.01, 'warmup_steps': 27}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2942,0.903597,0.67736,0.58787,0.569769,0.57407
2,0.6954,0.691131,0.758937,0.634354,0.649465,0.640941
3,0.5013,0.636589,0.782768,0.656314,0.670809,0.661239
4,0.4003,0.594652,0.800183,0.676374,0.683724,0.678707
5,0.3321,0.58364,0.808433,0.677388,0.691624,0.684147


[I 2025-03-22 07:58:04,282] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.002895909710931081, 'weight_decay': 0.007, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3258,0.577083,0.865261,0.870802,0.812266,0.829733
2,0.036,0.593521,0.861595,0.820494,0.829088,0.823401
3,0.0175,0.84711,0.853346,0.83257,0.822916,0.824215
4,0.0101,0.821694,0.867094,0.822485,0.823244,0.821585
5,0.0068,1.020174,0.867094,0.842948,0.823977,0.829668
6,0.0058,0.971653,0.864345,0.849135,0.8207,0.831065
7,0.0046,1.001388,0.865261,0.826772,0.821458,0.822558
8,0.0026,1.057011,0.861595,0.839177,0.819017,0.826285
9,0.0012,1.170172,0.866178,0.842418,0.821267,0.829364
10,0.0005,1.218944,0.866178,0.851395,0.821364,0.832458


[I 2025-03-22 08:00:49,626] Trial 74 finished with value: 0.8339676343120375 and parameters: {'learning_rate': 0.002895909710931081, 'weight_decay': 0.007, 'warmup_steps': 12}. Best is trial 40 with value: 0.8501281296030873.


Trial 75 with params: {'learning_rate': 0.00037030632959673737, 'weight_decay': 0.008, 'warmup_steps': 16}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6801,0.569728,0.815765,0.689291,0.695584,0.689337
2,0.1782,0.521254,0.855179,0.838677,0.802199,0.815671
3,0.0837,0.67235,0.848763,0.811092,0.818154,0.809624
4,0.0478,0.710788,0.856095,0.866558,0.822366,0.836785
5,0.0303,0.696704,0.852429,0.82722,0.820155,0.822538


[I 2025-03-22 08:01:45,336] Trial 75 pruned. 


Trial 76 with params: {'learning_rate': 0.002334767879974332, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3677,0.595082,0.857012,0.863693,0.806161,0.821477
2,0.0404,0.647784,0.861595,0.831474,0.829585,0.828488
3,0.0183,0.722654,0.858845,0.843005,0.817662,0.825506
4,0.0123,0.834842,0.862511,0.858372,0.818006,0.832165
5,0.0075,0.906805,0.862511,0.84673,0.820123,0.828591
6,0.0055,0.987179,0.865261,0.848644,0.821331,0.830334
7,0.0025,1.043225,0.866178,0.860813,0.821454,0.833946
8,0.0016,1.171337,0.862511,0.85763,0.818599,0.831304
9,0.0016,1.175368,0.863428,0.859587,0.819328,0.831927
10,0.001,1.185036,0.866178,0.864199,0.820785,0.835897


[I 2025-03-22 08:04:57,096] Trial 76 finished with value: 0.8364604536329785 and parameters: {'learning_rate': 0.002334767879974332, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23}. Best is trial 40 with value: 0.8501281296030873.


Trial 77 with params: {'learning_rate': 0.0019398761451169626, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3564,0.632604,0.859762,0.853468,0.817383,0.828541
2,0.0456,0.60254,0.871677,0.82857,0.835971,0.831216
3,0.0205,0.814206,0.862511,0.832569,0.829604,0.828067
4,0.0139,0.804732,0.858845,0.838964,0.823489,0.829171
5,0.0096,0.850726,0.871677,0.858431,0.83484,0.843993
6,0.005,0.940309,0.864345,0.863174,0.828866,0.841857
7,0.0035,1.155645,0.853346,0.862828,0.813747,0.825388
8,0.0028,1.089047,0.864345,0.873094,0.820513,0.837373
9,0.0012,1.132006,0.870761,0.880272,0.834985,0.850125
10,0.0006,1.165802,0.867094,0.855448,0.830715,0.839732


[I 2025-03-22 08:08:22,883] Trial 77 finished with value: 0.8481143418637026 and parameters: {'learning_rate': 0.0019398761451169626, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 40 with value: 0.8501281296030873.


Trial 78 with params: {'learning_rate': 0.001977672110331516, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3613,0.625651,0.861595,0.869505,0.817935,0.833989
2,0.045,0.64218,0.861595,0.846916,0.829023,0.834483
3,0.0206,0.783543,0.868011,0.840801,0.832888,0.834326
4,0.0119,0.739499,0.877177,0.85305,0.840865,0.84462
5,0.0083,0.859249,0.875344,0.860887,0.837983,0.846775
6,0.0061,0.908665,0.863428,0.861844,0.828814,0.840534
7,0.0031,1.040106,0.866178,0.853115,0.831535,0.838685
8,0.0018,1.082822,0.867094,0.852415,0.821853,0.833426
9,0.0012,1.15906,0.863428,0.849419,0.819197,0.830203
10,0.0009,1.143029,0.871677,0.858799,0.834383,0.844046


[I 2025-03-22 08:12:04,111] Trial 78 finished with value: 0.8392240535754979 and parameters: {'learning_rate': 0.001977672110331516, 'weight_decay': 0.009000000000000001, 'warmup_steps': 10}. Best is trial 40 with value: 0.8501281296030873.


Trial 79 with params: {'learning_rate': 0.004113013113461955, 'weight_decay': 0.01, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2866,0.596143,0.864345,0.848271,0.820857,0.830958
2,0.0328,0.75879,0.857012,0.854113,0.814105,0.827169
3,0.0173,0.887195,0.855179,0.851012,0.814045,0.824657
4,0.0082,1.052438,0.850596,0.843988,0.817068,0.827103
5,0.0068,1.039296,0.872594,0.854619,0.8343,0.843125
6,0.0029,1.08855,0.874427,0.859402,0.838136,0.845898
7,0.0036,1.260618,0.861595,0.84726,0.81771,0.827598
8,0.0018,1.438299,0.864345,0.835262,0.819888,0.824053
9,0.0007,1.420533,0.869844,0.836643,0.825191,0.828384
10,0.0003,1.455051,0.867094,0.834958,0.822516,0.82666


[I 2025-03-22 08:14:10,616] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.001454873595932167, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3911,0.605697,0.848763,0.812132,0.808464,0.807178
2,0.0576,0.574011,0.866178,0.824484,0.831087,0.827026
3,0.0234,0.814938,0.845096,0.84891,0.813552,0.823993
4,0.0148,0.816058,0.860678,0.859156,0.826219,0.837257
5,0.0106,0.826109,0.870761,0.851455,0.842932,0.846254
6,0.0046,0.954381,0.864345,0.863355,0.828423,0.841539
7,0.0032,1.136357,0.857012,0.844559,0.815059,0.823017
8,0.0032,1.115166,0.856095,0.844242,0.823658,0.83059
9,0.0019,1.242351,0.849679,0.831394,0.817515,0.82185
10,0.0011,1.232534,0.858845,0.842565,0.833565,0.836465


[I 2025-03-22 08:16:17,802] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 5.507514353796265e-05, 'weight_decay': 0.0, 'warmup_steps': 24}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3166,0.932122,0.661778,0.575245,0.555527,0.560386
2,0.727,0.7128,0.75527,0.630526,0.645687,0.637027
3,0.5272,0.647184,0.778185,0.652864,0.666289,0.657592
4,0.4246,0.606005,0.796517,0.673242,0.680485,0.675463
5,0.3562,0.594571,0.8011,0.671367,0.686242,0.678223
6,0.3045,0.580626,0.812099,0.681795,0.693891,0.687633
7,0.2651,0.574059,0.816682,0.685675,0.697484,0.69149
8,0.2318,0.585134,0.811182,0.848939,0.711146,0.721318
9,0.2063,0.58496,0.809349,0.845761,0.702669,0.703199
10,0.1852,0.587143,0.819432,0.821624,0.735715,0.752743


[I 2025-03-22 08:18:32,629] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.001902793830575616, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3547,0.585171,0.867094,0.861195,0.822952,0.83521
2,0.0469,0.617942,0.859762,0.836061,0.827546,0.829027
3,0.0219,0.753136,0.851512,0.830874,0.820248,0.822188
4,0.0134,0.762947,0.865261,0.855256,0.828513,0.839038
5,0.0085,0.787415,0.875344,0.857391,0.845935,0.850948
6,0.0055,0.893739,0.863428,0.844806,0.826208,0.833766
7,0.0031,0.987077,0.865261,0.843006,0.830446,0.834108
8,0.0024,0.978951,0.874427,0.872565,0.837233,0.850585
9,0.0013,1.097872,0.863428,0.862206,0.828081,0.840916
10,0.0007,1.128563,0.873511,0.860431,0.835444,0.84563


[I 2025-03-22 08:21:34,201] Trial 82 finished with value: 0.8428793727487038 and parameters: {'learning_rate': 0.001902793830575616, 'weight_decay': 0.01, 'warmup_steps': 8}. Best is trial 40 with value: 0.8501281296030873.


Trial 83 with params: {'learning_rate': 0.0014273398759477199, 'weight_decay': 0.01, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3983,0.57175,0.857929,0.851331,0.805595,0.820606
2,0.0566,0.616155,0.860678,0.826032,0.826216,0.825403
3,0.0246,0.717284,0.853346,0.827702,0.819733,0.821559
4,0.0145,0.754481,0.870761,0.850475,0.84372,0.845231
5,0.0096,0.810357,0.870761,0.851057,0.84347,0.845903
6,0.0057,0.951407,0.865261,0.857191,0.837697,0.845231
7,0.0032,1.057817,0.864345,0.848235,0.820839,0.829428
8,0.003,1.07856,0.858845,0.848301,0.814359,0.827565
9,0.0021,1.084054,0.865261,0.856149,0.826869,0.83894
10,0.0011,1.092505,0.868928,0.867102,0.831877,0.845828


[I 2025-03-22 08:25:15,718] Trial 83 finished with value: 0.8393062297285739 and parameters: {'learning_rate': 0.0014273398759477199, 'weight_decay': 0.01, 'warmup_steps': 7}. Best is trial 40 with value: 0.8501281296030873.


Trial 84 with params: {'learning_rate': 0.0026623355006501043, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.329,0.621436,0.860678,0.842232,0.808435,0.819362
2,0.038,0.618629,0.872594,0.829928,0.828787,0.827643
3,0.0177,0.890901,0.857929,0.826239,0.816392,0.81821
4,0.0103,0.824144,0.879927,0.861852,0.833941,0.843969
5,0.0075,1.026079,0.868928,0.842304,0.825272,0.830095


[I 2025-03-22 08:26:12,220] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0017564211264440269, 'weight_decay': 0.008, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3743,0.604343,0.866178,0.860176,0.821966,0.834605
2,0.0474,0.609972,0.866178,0.830728,0.831013,0.830151
3,0.0221,0.786206,0.849679,0.82507,0.816583,0.818303
4,0.0126,0.830033,0.855179,0.838512,0.819851,0.826908
5,0.0094,0.868735,0.870761,0.859173,0.833371,0.843675
6,0.0047,0.964347,0.865261,0.857348,0.838132,0.845776
7,0.0042,1.070605,0.863428,0.873064,0.827065,0.842604
8,0.0027,1.148473,0.860678,0.838321,0.827185,0.829645
9,0.002,1.104699,0.870761,0.839177,0.841881,0.839829
10,0.0015,1.1203,0.861595,0.832808,0.825872,0.82831


[I 2025-03-22 08:28:46,959] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.002354369079617836, 'weight_decay': 0.01, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3351,0.575958,0.860678,0.842204,0.808514,0.81861
2,0.0412,0.651191,0.866178,0.843397,0.831768,0.835542
3,0.0193,0.832452,0.854262,0.834406,0.82316,0.824568
4,0.0102,0.755149,0.866178,0.843929,0.831427,0.836011
5,0.0067,0.897479,0.866178,0.860663,0.821448,0.835094


[I 2025-03-22 08:29:52,348] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0016742632592731134, 'weight_decay': 0.01, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3771,0.579669,0.859762,0.833918,0.817742,0.822532
2,0.0511,0.593491,0.859762,0.83719,0.82658,0.829649
3,0.0219,0.842564,0.839597,0.810978,0.810687,0.805784
4,0.0116,0.84914,0.867094,0.867253,0.828483,0.843442
5,0.0097,0.888538,0.865261,0.865021,0.828234,0.841738
6,0.0063,0.925075,0.867094,0.852025,0.822003,0.832389
7,0.0042,1.157137,0.859762,0.870487,0.816975,0.832292
8,0.003,1.174551,0.852429,0.836607,0.819419,0.824198
9,0.0017,1.175652,0.854262,0.835239,0.821502,0.825049
10,0.001,1.244924,0.857929,0.847264,0.824118,0.832041


[I 2025-03-22 08:33:24,657] Trial 87 finished with value: 0.8290164705497712 and parameters: {'learning_rate': 0.0016742632592731134, 'weight_decay': 0.01, 'warmup_steps': 11}. Best is trial 40 with value: 0.8501281296030873.


Trial 88 with params: {'learning_rate': 0.000989963314574002, 'weight_decay': 0.01, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4499,0.562203,0.849679,0.817375,0.799707,0.804686
2,0.077,0.560138,0.869844,0.843447,0.84304,0.841356
3,0.0333,0.742284,0.851512,0.852269,0.809378,0.823723
4,0.0171,0.812091,0.859762,0.869821,0.815374,0.832941
5,0.0121,0.84954,0.859762,0.857429,0.814396,0.82948
6,0.0078,0.896443,0.861595,0.860491,0.825945,0.838197
7,0.0051,1.074242,0.849679,0.846578,0.808227,0.819809
8,0.0045,1.029817,0.857012,0.854307,0.81321,0.827137
9,0.0024,1.114736,0.854262,0.853172,0.810135,0.825249
10,0.0018,1.040691,0.857929,0.840282,0.821596,0.829516


[I 2025-03-22 08:35:39,287] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0026472720322389194, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3229,0.593471,0.864345,0.843828,0.828516,0.834149
2,0.0379,0.655986,0.850596,0.805569,0.819582,0.811049
3,0.0179,0.82262,0.857929,0.831626,0.825017,0.826311
4,0.0106,0.845823,0.873511,0.850854,0.838072,0.842039
5,0.0083,0.945587,0.868928,0.838308,0.840863,0.839017
6,0.0054,0.971759,0.871677,0.860623,0.832034,0.844188
7,0.0027,1.077082,0.868928,0.848262,0.832786,0.838119
8,0.0023,1.103217,0.872594,0.858244,0.835323,0.844009
9,0.001,1.24407,0.867094,0.853956,0.831129,0.839649
10,0.0005,1.282502,0.868011,0.855802,0.831361,0.840826


[I 2025-03-22 08:39:03,043] Trial 89 finished with value: 0.8399596942705562 and parameters: {'learning_rate': 0.0026472720322389194, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 40 with value: 0.8501281296030873.


Trial 90 with params: {'learning_rate': 0.0032738594616780903, 'weight_decay': 0.01, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2986,0.590153,0.867094,0.864701,0.820921,0.836883
2,0.0339,0.695406,0.853346,0.809523,0.81285,0.809635
3,0.0176,0.831599,0.855179,0.852526,0.812718,0.825012
4,0.0106,0.916464,0.858845,0.830268,0.81424,0.820237
5,0.0066,1.113237,0.868011,0.830537,0.822342,0.82462


[I 2025-03-22 08:39:57,308] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0007321203163635797, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5319,0.546905,0.847846,0.827844,0.777354,0.792757
2,0.0955,0.554164,0.87626,0.873309,0.83731,0.850897
3,0.044,0.745068,0.851512,0.867522,0.819277,0.833796
4,0.0253,0.765086,0.855179,0.846263,0.822366,0.829261
5,0.0172,0.833284,0.850596,0.813654,0.826591,0.818036
6,0.0111,0.86454,0.858845,0.85538,0.825631,0.835669
7,0.0095,0.940587,0.862511,0.871492,0.817017,0.835527
8,0.0061,0.93584,0.860678,0.843003,0.834402,0.837585
9,0.0037,1.026897,0.859762,0.859439,0.823515,0.836788
10,0.003,1.09288,0.856095,0.842336,0.830118,0.833417


[I 2025-03-22 08:42:57,028] Trial 91 finished with value: 0.8412874137225418 and parameters: {'learning_rate': 0.0007321203163635797, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22}. Best is trial 40 with value: 0.8501281296030873.


Trial 92 with params: {'learning_rate': 0.0013553024971708217, 'weight_decay': 0.009000000000000001, 'warmup_steps': 29}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4404,0.528957,0.868928,0.861941,0.825413,0.837169
2,0.0584,0.570886,0.873511,0.849238,0.837233,0.841016
3,0.0264,0.776553,0.849679,0.840354,0.819896,0.824438
4,0.0144,0.77296,0.871677,0.870121,0.823094,0.840668
5,0.0096,0.866677,0.863428,0.82673,0.835645,0.830519


[I 2025-03-22 08:43:59,822] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 0.001102597971175093, 'weight_decay': 0.008, 'warmup_steps': 13}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.445,0.556784,0.854262,0.83672,0.802925,0.813828
2,0.0696,0.55311,0.866178,0.824082,0.830745,0.826235
3,0.0283,0.821191,0.846013,0.822935,0.814097,0.815237
4,0.0163,0.844172,0.860678,0.859391,0.814758,0.830056
5,0.0119,0.80089,0.869844,0.848926,0.83204,0.838513
6,0.0079,0.841446,0.863428,0.844901,0.827138,0.834383
7,0.0043,1.121002,0.853346,0.832374,0.811307,0.816854
8,0.0033,1.052046,0.858845,0.820335,0.823633,0.820869
9,0.0023,1.063531,0.860678,0.857567,0.815597,0.830969
10,0.0024,1.047228,0.863428,0.844993,0.825537,0.834015


[I 2025-03-22 08:47:15,026] Trial 93 finished with value: 0.8293312006067 and parameters: {'learning_rate': 0.001102597971175093, 'weight_decay': 0.008, 'warmup_steps': 13}. Best is trial 40 with value: 0.8501281296030873.


Trial 94 with params: {'learning_rate': 0.0033819498289936888, 'weight_decay': 0.01, 'warmup_steps': 17}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3213,0.589082,0.869844,0.880211,0.825391,0.842835
2,0.0355,0.720549,0.848763,0.820305,0.81887,0.817003
3,0.0169,0.944329,0.851512,0.840405,0.822263,0.824878
4,0.0106,0.85179,0.868928,0.857792,0.832824,0.842792
5,0.0073,1.187468,0.852429,0.834488,0.820112,0.823626


[I 2025-03-22 08:48:41,662] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 6.674239709387763e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 14}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2427,0.851379,0.703025,0.605096,0.59411,0.596919
2,0.6404,0.662401,0.770852,0.644074,0.660871,0.651626
3,0.4624,0.622699,0.788268,0.661034,0.675723,0.66643
4,0.3656,0.584639,0.80385,0.678743,0.686987,0.681656
5,0.2992,0.575355,0.813016,0.680522,0.695629,0.687549
6,0.2494,0.572732,0.824015,0.85818,0.712327,0.715445
7,0.2123,0.570309,0.822181,0.800788,0.720228,0.7284
8,0.1796,0.580207,0.823098,0.831595,0.746261,0.766978
9,0.1562,0.590279,0.823098,0.830451,0.748479,0.766298
10,0.1362,0.591225,0.832264,0.842986,0.772643,0.793855


[I 2025-03-22 08:50:45,448] Trial 95 pruned. 


Trial 96 with params: {'learning_rate': 0.0020530970584109317, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3529,0.596192,0.856095,0.863863,0.814768,0.828689
2,0.0427,0.695857,0.852429,0.838901,0.82183,0.825735
3,0.02,0.776214,0.856095,0.845522,0.823599,0.829574
4,0.0121,0.87577,0.861595,0.865223,0.826106,0.8401
5,0.0071,0.925222,0.867094,0.854817,0.832445,0.839928
6,0.0054,0.961144,0.87626,0.858083,0.836471,0.845803
7,0.0029,1.08824,0.869844,0.86083,0.833721,0.842909
8,0.0023,1.124693,0.871677,0.871531,0.834162,0.848242
9,0.0018,1.06995,0.871677,0.851766,0.83517,0.840907
10,0.0025,1.02898,0.87901,0.858828,0.84121,0.847689


[I 2025-03-22 08:53:56,204] Trial 96 finished with value: 0.844195828444018 and parameters: {'learning_rate': 0.0020530970584109317, 'weight_decay': 0.01, 'warmup_steps': 10}. Best is trial 40 with value: 0.8501281296030873.


Trial 97 with params: {'learning_rate': 0.0015110049449961044, 'weight_decay': 0.009000000000000001, 'warmup_steps': 12}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3995,0.60498,0.853346,0.848866,0.81272,0.822822
2,0.054,0.605823,0.858845,0.822052,0.827239,0.822724
3,0.0232,0.729957,0.851512,0.826498,0.819357,0.820103
4,0.0137,0.820906,0.855179,0.847008,0.823506,0.829567
5,0.0104,0.797928,0.870761,0.834744,0.833984,0.833791
6,0.0062,0.918653,0.859762,0.83645,0.816064,0.823486
7,0.0033,1.059456,0.851512,0.828243,0.808945,0.815521
8,0.0022,1.137532,0.860678,0.837946,0.815869,0.823579
9,0.0013,1.18556,0.863428,0.839523,0.817497,0.825933
10,0.0008,1.225955,0.860678,0.836991,0.815712,0.823599


[I 2025-03-22 08:56:02,505] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.002890512843431425, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3156,0.638077,0.865261,0.84395,0.81282,0.822212
2,0.0359,0.680695,0.868011,0.831498,0.833045,0.831299
3,0.0176,0.84839,0.862511,0.82865,0.821783,0.821996
4,0.0104,0.978598,0.858845,0.845425,0.816195,0.826756
5,0.0074,1.114984,0.865261,0.842546,0.821262,0.827641
6,0.0066,1.009694,0.861595,0.851,0.817932,0.828979
7,0.0039,1.162485,0.868928,0.845556,0.824274,0.83138
8,0.0019,1.164146,0.868928,0.863791,0.824463,0.837755
9,0.001,1.208482,0.867094,0.862624,0.821859,0.836466
10,0.0004,1.325371,0.868011,0.844127,0.82295,0.830801


[I 2025-03-22 08:58:10,429] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0016210031832433046, 'weight_decay': 0.01, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3836,0.589694,0.861595,0.830004,0.825762,0.826277
2,0.0499,0.644659,0.861595,0.831297,0.828704,0.82813
3,0.0225,0.839012,0.851512,0.828418,0.819136,0.820907
4,0.0145,0.756864,0.870761,0.856941,0.834768,0.842709
5,0.0104,0.818464,0.870761,0.859151,0.844372,0.849673
6,0.0049,0.865562,0.873511,0.862025,0.836753,0.846723
7,0.003,1.06955,0.869844,0.857795,0.833586,0.842173
8,0.0032,1.080722,0.864345,0.846928,0.828925,0.834637
9,0.0027,1.04194,0.871677,0.861536,0.833927,0.845202
10,0.0018,1.042113,0.869844,0.858864,0.833876,0.843114


[I 2025-03-22 09:01:19,673] Trial 99 finished with value: 0.8427717315610344 and parameters: {'learning_rate': 0.0016210031832433046, 'weight_decay': 0.01, 'warmup_steps': 8}. Best is trial 40 with value: 0.8501281296030873.


Trial 100 with params: {'learning_rate': 0.0022383952267447343, 'weight_decay': 0.01, 'warmup_steps': 10}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3445,0.602919,0.862511,0.871142,0.81884,0.834743
2,0.0408,0.598198,0.866178,0.831031,0.831266,0.830275
3,0.0184,0.880023,0.848763,0.838777,0.818396,0.823918
4,0.0114,0.833546,0.868011,0.845912,0.821998,0.830761
5,0.0085,0.930404,0.873511,0.861392,0.826008,0.840038
6,0.006,0.982595,0.864345,0.864796,0.818263,0.835466
7,0.0031,1.097079,0.866178,0.854658,0.83032,0.839083
8,0.0021,1.117033,0.870761,0.866622,0.825238,0.839829
9,0.0017,1.109054,0.870761,0.867723,0.83547,0.8469
10,0.0006,1.191565,0.870761,0.868345,0.834766,0.846752


[I 2025-03-22 09:04:33,605] Trial 100 finished with value: 0.8394751891066007 and parameters: {'learning_rate': 0.0022383952267447343, 'weight_decay': 0.01, 'warmup_steps': 10}. Best is trial 40 with value: 0.8501281296030873.


Trial 101 with params: {'learning_rate': 0.000832356614740368, 'weight_decay': 0.009000000000000001, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4865,0.540619,0.856095,0.862357,0.79534,0.812756
2,0.0867,0.557316,0.864345,0.82893,0.837258,0.832225
3,0.0387,0.822787,0.837764,0.810961,0.807929,0.802762
4,0.0216,0.845414,0.847846,0.845874,0.816473,0.82309
5,0.0154,0.829392,0.863428,0.859609,0.818031,0.832294


[I 2025-03-22 09:05:41,188] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.003329308543356044, 'weight_decay': 0.004, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2868,0.642707,0.856095,0.849861,0.8039,0.818528
2,0.0362,0.673266,0.858845,0.810088,0.815802,0.812281
3,0.0161,0.893574,0.857929,0.842012,0.818305,0.823751
4,0.0086,0.94935,0.857929,0.835707,0.815108,0.822549
5,0.0062,1.142785,0.870761,0.839601,0.82535,0.831024
6,0.006,1.123366,0.868011,0.834236,0.824631,0.826237
7,0.0034,1.288576,0.856095,0.834141,0.815408,0.818354
8,0.0018,1.215016,0.868011,0.830465,0.823307,0.824984
9,0.0008,1.276707,0.868928,0.831003,0.823987,0.825488
10,0.0006,1.296879,0.868011,0.824554,0.823307,0.822172


[I 2025-03-22 09:07:50,498] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0017856004545142741, 'weight_decay': 0.01, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3681,0.605939,0.867094,0.859534,0.822912,0.834163
2,0.0466,0.724757,0.842346,0.819445,0.822211,0.817386
3,0.0222,0.802378,0.843263,0.834009,0.813789,0.818188
4,0.0125,0.745393,0.865261,0.854344,0.830081,0.838807
5,0.009,0.787425,0.871677,0.860244,0.834427,0.844556
6,0.0044,0.929748,0.861595,0.852449,0.825852,0.836543
7,0.0037,1.03991,0.860678,0.862237,0.826574,0.838097
8,0.0027,1.042684,0.860678,0.850189,0.8255,0.834686
9,0.0027,1.137439,0.858845,0.845834,0.825499,0.831821
10,0.0012,1.097226,0.865261,0.849796,0.847504,0.847879


[I 2025-03-22 09:11:40,460] Trial 103 finished with value: 0.8370995643916723 and parameters: {'learning_rate': 0.0017856004545142741, 'weight_decay': 0.01, 'warmup_steps': 9}. Best is trial 40 with value: 0.8501281296030873.


Trial 104 with params: {'learning_rate': 0.0019857624255920285, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3496,0.625897,0.855179,0.848855,0.814416,0.825139
2,0.0445,0.685765,0.856095,0.820499,0.825035,0.820606
3,0.0203,0.722012,0.864345,0.836432,0.829929,0.830919
4,0.0116,0.81173,0.866178,0.854895,0.830482,0.83994
5,0.0082,0.901034,0.865261,0.874389,0.830727,0.845258
6,0.0046,0.9596,0.869844,0.86796,0.833325,0.846239
7,0.0032,0.994054,0.870761,0.868222,0.833283,0.84636
8,0.0024,1.108136,0.864345,0.860926,0.830746,0.840405
9,0.0012,1.168237,0.864345,0.861419,0.82967,0.840924
10,0.0008,1.215694,0.866178,0.85275,0.831524,0.838386


[I 2025-03-22 09:14:51,805] Trial 104 finished with value: 0.8453235650456677 and parameters: {'learning_rate': 0.0019857624255920285, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}. Best is trial 40 with value: 0.8501281296030873.


Trial 105 with params: {'learning_rate': 0.001354338449921022, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.389,0.559824,0.860678,0.84935,0.816994,0.82715
2,0.0587,0.619091,0.864345,0.840614,0.830717,0.832462
3,0.0273,0.783048,0.846929,0.832876,0.815942,0.818566
4,0.0135,0.795747,0.865261,0.852007,0.820118,0.832057
5,0.0097,0.813794,0.867094,0.863129,0.822596,0.836481
6,0.0065,0.926629,0.868928,0.857375,0.832462,0.841235
7,0.0046,1.07113,0.848763,0.830056,0.817401,0.819818
8,0.0033,1.018983,0.865261,0.862043,0.820023,0.835051
9,0.0022,1.080729,0.854262,0.827489,0.821288,0.821949
10,0.0017,1.099499,0.863428,0.853436,0.826759,0.837552


[I 2025-03-22 09:17:24,463] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.0015233176246774777, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3838,0.587972,0.862511,0.869458,0.819647,0.835009
2,0.0552,0.618457,0.857929,0.820883,0.833189,0.825976
3,0.0237,0.79979,0.845096,0.818663,0.823289,0.81784
4,0.0148,0.840604,0.858845,0.870816,0.823788,0.840249
5,0.0097,0.817836,0.872594,0.86846,0.836121,0.848021
6,0.006,0.983954,0.861595,0.859058,0.827677,0.8377
7,0.0036,1.188242,0.859762,0.870928,0.81777,0.832014
8,0.0033,1.092789,0.856095,0.853396,0.823595,0.833499
9,0.0025,1.198405,0.854262,0.826288,0.820929,0.822118
10,0.0013,1.09012,0.863428,0.836151,0.828644,0.831261


[I 2025-03-22 09:20:19,038] Trial 106 finished with value: 0.8453877758077807 and parameters: {'learning_rate': 0.0015233176246774777, 'weight_decay': 0.008, 'warmup_steps': 5}. Best is trial 40 with value: 0.8501281296030873.


Trial 107 with params: {'learning_rate': 0.0011069490856093639, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4306,0.550365,0.856095,0.819138,0.8119,0.812424
2,0.0688,0.579876,0.864345,0.859925,0.828223,0.839777
3,0.0305,0.7436,0.846013,0.861351,0.813249,0.828438
4,0.0169,0.781966,0.858845,0.871251,0.833148,0.847365
5,0.0122,0.907929,0.851512,0.851399,0.818139,0.829358


[I 2025-03-22 09:21:23,013] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.001811860418717086, 'weight_decay': 0.007, 'warmup_steps': 9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3685,0.578588,0.866178,0.859312,0.822354,0.83425
2,0.0466,0.610026,0.868928,0.83782,0.834301,0.834676
3,0.0197,0.843977,0.847846,0.822213,0.817692,0.815862
4,0.013,0.841177,0.860678,0.840086,0.825017,0.830213
5,0.011,0.834827,0.868011,0.853871,0.822438,0.83415


[I 2025-03-22 09:22:29,345] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.003984648457618905, 'weight_decay': 0.008, 'warmup_steps': 11}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2979,0.633947,0.873511,0.856532,0.827983,0.837365
2,0.0321,0.685603,0.867094,0.844885,0.831606,0.835803
3,0.0162,0.868482,0.859762,0.844849,0.818722,0.825974
4,0.0089,0.969397,0.863428,0.843011,0.829731,0.833699
5,0.0069,1.073862,0.871677,0.869328,0.83519,0.847234
6,0.0055,1.239348,0.867094,0.848976,0.831267,0.835997
7,0.0035,1.274948,0.853346,0.852153,0.821372,0.830804
8,0.0017,1.320699,0.857929,0.855937,0.814401,0.827892
9,0.0008,1.370112,0.864345,0.861761,0.829901,0.840834
10,0.0003,1.447053,0.864345,0.852404,0.829432,0.837299


[I 2025-03-22 09:25:34,133] Trial 109 finished with value: 0.8365378974445224 and parameters: {'learning_rate': 0.003984648457618905, 'weight_decay': 0.008, 'warmup_steps': 11}. Best is trial 40 with value: 0.8501281296030873.


Trial 110 with params: {'learning_rate': 0.00037970811347696283, 'weight_decay': 0.003, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6633,0.562241,0.818515,0.691142,0.698566,0.692039
2,0.1747,0.517279,0.856095,0.83972,0.802561,0.816431
3,0.0823,0.674658,0.84418,0.808235,0.814543,0.806396
4,0.0462,0.718771,0.854262,0.846388,0.819953,0.829354
5,0.03,0.728059,0.854262,0.830033,0.820365,0.822908
6,0.0221,0.751447,0.855179,0.854796,0.820965,0.833999
7,0.0148,0.809329,0.856095,0.866504,0.812129,0.830475
8,0.0114,0.868822,0.853346,0.831823,0.828605,0.829028
9,0.0083,0.935416,0.855179,0.843609,0.811307,0.822952
10,0.0056,0.95582,0.857012,0.849763,0.831528,0.838211


[I 2025-03-22 09:28:53,149] Trial 110 finished with value: 0.8298240293640116 and parameters: {'learning_rate': 0.00037970811347696283, 'weight_decay': 0.003, 'warmup_steps': 8}. Best is trial 40 with value: 0.8501281296030873.


Trial 111 with params: {'learning_rate': 0.0026846688168655516, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3231,0.60366,0.867094,0.877845,0.831249,0.847233
2,0.0377,0.620133,0.869844,0.855233,0.834088,0.841944
3,0.016,0.846529,0.863428,0.858233,0.820177,0.832469
4,0.0103,0.845847,0.863428,0.837284,0.827403,0.8315
5,0.008,0.937359,0.865261,0.841099,0.821922,0.82833


[I 2025-03-22 09:29:48,642] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.0019499036368634648, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3502,0.564559,0.865261,0.836647,0.829537,0.831105
2,0.0463,0.607117,0.866178,0.850809,0.831589,0.838228
3,0.0204,0.735143,0.866178,0.837676,0.832342,0.831961
4,0.0133,0.700267,0.859762,0.848449,0.815584,0.828118
5,0.008,0.749921,0.877177,0.856406,0.838538,0.84595
6,0.0043,0.834214,0.875344,0.87547,0.83594,0.851532
7,0.0035,0.948653,0.877177,0.861415,0.829484,0.841466
8,0.0024,0.944313,0.873511,0.838835,0.836789,0.836467
9,0.001,1.032342,0.872594,0.867432,0.827605,0.841515
10,0.0005,1.145752,0.872594,0.860746,0.836121,0.845071


[I 2025-03-22 09:33:11,035] Trial 112 finished with value: 0.849569071402409 and parameters: {'learning_rate': 0.0019499036368634648, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}. Best is trial 40 with value: 0.8501281296030873.


Trial 113 with params: {'learning_rate': 0.0026003756990329147, 'weight_decay': 0.01, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3224,0.579446,0.864345,0.858086,0.821229,0.832742
2,0.0382,0.619168,0.862511,0.825191,0.837232,0.830467
3,0.0172,0.799397,0.851512,0.839756,0.82049,0.826347
4,0.0091,0.838632,0.863428,0.84883,0.820355,0.829309
5,0.0075,1.007604,0.861595,0.837569,0.819673,0.824103


[I 2025-03-22 09:34:16,211] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.000403070066570238, 'weight_decay': 0.009000000000000001, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.631,0.553436,0.824015,0.694135,0.702321,0.695829
2,0.1647,0.54163,0.851512,0.826335,0.799765,0.810126
3,0.0766,0.674272,0.854262,0.83372,0.821701,0.823347
4,0.0445,0.749052,0.848763,0.851901,0.816596,0.827779
5,0.0285,0.721337,0.859762,0.85234,0.833692,0.841011
6,0.0185,0.740882,0.857012,0.857111,0.822854,0.836214
7,0.0126,0.813065,0.857012,0.854856,0.823868,0.834378
8,0.01,0.851432,0.853346,0.827768,0.819351,0.822929
9,0.0081,0.893144,0.857012,0.848455,0.821634,0.832415
10,0.005,0.948421,0.857012,0.832607,0.822413,0.826187


[I 2025-03-22 09:36:28,734] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0021788274151528764, 'weight_decay': 0.007, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.339,0.587163,0.868928,0.85493,0.833656,0.840346
2,0.0421,0.64097,0.862511,0.839384,0.830183,0.831991
3,0.0189,0.909465,0.84143,0.830326,0.815076,0.815069
4,0.0107,0.856363,0.861595,0.84167,0.82729,0.83179
5,0.0087,0.910267,0.868928,0.865517,0.834404,0.844763
6,0.0056,1.013242,0.867094,0.865522,0.821081,0.837141
7,0.0032,1.120105,0.867094,0.876597,0.821675,0.839646
8,0.0021,1.168228,0.871677,0.865037,0.825844,0.839421
9,0.0012,1.181731,0.870761,0.867131,0.833751,0.846106
10,0.0005,1.267247,0.874427,0.871645,0.836612,0.8498


[I 2025-03-22 09:39:34,942] Trial 115 finished with value: 0.8510243730102385 and parameters: {'learning_rate': 0.0021788274151528764, 'weight_decay': 0.007, 'warmup_steps': 5}. Best is trial 115 with value: 0.8510243730102385.


Trial 116 with params: {'learning_rate': 0.001009569037991261, 'weight_decay': 0.006, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4429,0.604243,0.834097,0.792147,0.78696,0.785063
2,0.0773,0.554151,0.868011,0.863774,0.832467,0.843198
3,0.0319,0.750113,0.848763,0.851143,0.805114,0.820139
4,0.019,0.741985,0.859762,0.873164,0.814894,0.834452
5,0.0117,0.881842,0.863428,0.87274,0.818556,0.836037
6,0.0083,0.937386,0.851512,0.832407,0.819429,0.822403
7,0.005,0.941849,0.862511,0.874168,0.816685,0.836086
8,0.0039,0.905104,0.868928,0.866185,0.832044,0.84541
9,0.0027,0.990528,0.859762,0.858995,0.815064,0.83061
10,0.002,1.061586,0.865261,0.838845,0.827631,0.83262


[I 2025-03-22 09:42:55,335] Trial 116 finished with value: 0.8344461484263831 and parameters: {'learning_rate': 0.001009569037991261, 'weight_decay': 0.006, 'warmup_steps': 2}. Best is trial 115 with value: 0.8510243730102385.


Trial 117 with params: {'learning_rate': 0.0016353561862385662, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3754,0.589193,0.865261,0.859558,0.821882,0.834434
2,0.0524,0.652695,0.852429,0.822616,0.82164,0.819676
3,0.0217,0.777754,0.852429,0.820663,0.820954,0.817975
4,0.012,0.789045,0.865261,0.854765,0.829654,0.839041
5,0.0092,0.851993,0.88176,0.869358,0.852513,0.859016
6,0.0056,0.823974,0.877177,0.863122,0.839079,0.848442
7,0.0053,0.944079,0.87626,0.859591,0.83071,0.840162
8,0.0028,1.082284,0.862511,0.849956,0.828751,0.835543
9,0.0016,1.086827,0.874427,0.859031,0.838356,0.845481
10,0.0012,1.052867,0.875344,0.860902,0.838077,0.846623


[I 2025-03-22 09:46:19,267] Trial 117 finished with value: 0.8449204468929551 and parameters: {'learning_rate': 0.0016353561862385662, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 115 with value: 0.8510243730102385.


Trial 118 with params: {'learning_rate': 0.0018204132396431225, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.361,0.601021,0.861595,0.856807,0.818449,0.830736
2,0.0485,0.612069,0.861595,0.839278,0.828151,0.831955
3,0.0216,0.729754,0.854262,0.822829,0.820633,0.819898
4,0.0124,0.751846,0.866178,0.862761,0.822067,0.83621
5,0.0091,0.894379,0.867094,0.856926,0.830932,0.840932
6,0.0057,0.905429,0.867094,0.866964,0.830895,0.844673
7,0.0029,1.201618,0.860678,0.861201,0.825884,0.837972
8,0.0018,1.241001,0.864345,0.862006,0.820195,0.834538
9,0.0016,1.22656,0.861595,0.85154,0.827354,0.835956
10,0.0009,1.268948,0.869844,0.857436,0.833402,0.842525


[I 2025-03-22 09:49:09,325] Trial 118 finished with value: 0.8471066463665929 and parameters: {'learning_rate': 0.0018204132396431225, 'weight_decay': 0.008, 'warmup_steps': 1}. Best is trial 115 with value: 0.8510243730102385.


Trial 119 with params: {'learning_rate': 0.0018158721391141983, 'weight_decay': 0.007, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3528,0.597534,0.862511,0.871864,0.827855,0.842002
2,0.0488,0.681849,0.857012,0.828405,0.8253,0.824645
3,0.0211,0.781974,0.863428,0.851177,0.829977,0.835975
4,0.0136,0.795775,0.852429,0.857053,0.819757,0.831838
5,0.0071,0.880973,0.862511,0.852601,0.82839,0.837275
6,0.0051,0.909723,0.860678,0.859832,0.82603,0.838482
7,0.0032,1.074378,0.863428,0.862078,0.82921,0.840527
8,0.0023,1.174969,0.861595,0.861358,0.82732,0.839297
9,0.0012,1.20585,0.864345,0.864296,0.828695,0.842088
10,0.0008,1.224417,0.869844,0.869292,0.832212,0.846555


[I 2025-03-22 09:52:24,997] Trial 119 finished with value: 0.8450759304025635 and parameters: {'learning_rate': 0.0018158721391141983, 'weight_decay': 0.007, 'warmup_steps': 0}. Best is trial 115 with value: 0.8510243730102385.


Trial 120 with params: {'learning_rate': 0.0035447744175234083, 'weight_decay': 0.007, 'warmup_steps': 2}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2997,0.590562,0.871677,0.878885,0.827591,0.842802
2,0.0336,0.658854,0.868928,0.852152,0.82567,0.833713
3,0.0159,0.830529,0.863428,0.838306,0.821624,0.825748
4,0.0089,0.959227,0.857929,0.828922,0.822855,0.823745
5,0.0081,1.101505,0.861595,0.829795,0.835706,0.831263


[I 2025-03-22 09:53:27,856] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 0.0018052991775962368, 'weight_decay': 0.007, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3617,0.593914,0.865261,0.859413,0.821517,0.83412
2,0.0483,0.571526,0.862511,0.826199,0.829235,0.826623
3,0.0213,0.763614,0.862511,0.843695,0.828751,0.832521
4,0.0145,0.772145,0.864345,0.83381,0.820596,0.825003
5,0.0097,0.846831,0.864345,0.846091,0.827429,0.834925
6,0.0059,0.887762,0.868011,0.859191,0.831207,0.841671
7,0.0035,1.08772,0.862511,0.851266,0.828252,0.835131
8,0.0031,1.131912,0.864345,0.861937,0.830378,0.84113
9,0.0018,1.186979,0.860678,0.846878,0.81697,0.826353
10,0.0012,1.165141,0.865261,0.841264,0.820241,0.827513


[I 2025-03-22 09:55:32,031] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.0015896514760774907, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3766,0.60233,0.858845,0.867722,0.815819,0.832531
2,0.0538,0.656947,0.854262,0.808576,0.822688,0.813844
3,0.0219,0.814476,0.84418,0.828522,0.813974,0.815882
4,0.0155,0.778704,0.858845,0.848422,0.825381,0.833393
5,0.0094,0.810071,0.869844,0.849505,0.833226,0.83961
6,0.0071,0.854492,0.866178,0.855696,0.830186,0.840378
7,0.0035,1.03673,0.858845,0.837453,0.815523,0.823125
8,0.0028,1.051332,0.866178,0.85149,0.821773,0.832501
9,0.0018,1.176647,0.852429,0.83452,0.811389,0.818117
10,0.0014,1.114014,0.865261,0.853347,0.829206,0.838719


[I 2025-03-22 09:57:48,591] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.0018583749644439543, 'weight_decay': 0.007, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3624,0.585011,0.859762,0.844766,0.788912,0.803558
2,0.0501,0.655415,0.862511,0.840311,0.828922,0.832086
3,0.0219,0.735871,0.862511,0.835125,0.827622,0.829565
4,0.0115,0.848178,0.863428,0.835048,0.828203,0.829826
5,0.009,0.968133,0.862511,0.848433,0.818432,0.829326
6,0.0051,0.979978,0.866178,0.865801,0.830245,0.843893
7,0.0035,1.157342,0.860678,0.857582,0.816569,0.830812
8,0.002,1.146953,0.869844,0.854111,0.825152,0.835236
9,0.0008,1.246239,0.867094,0.855677,0.831972,0.840261
10,0.0013,1.126476,0.868011,0.846468,0.833184,0.838037


[I 2025-03-22 10:01:33,035] Trial 123 finished with value: 0.8421531401030865 and parameters: {'learning_rate': 0.0018583749644439543, 'weight_decay': 0.007, 'warmup_steps': 4}. Best is trial 115 with value: 0.8510243730102385.


Trial 124 with params: {'learning_rate': 0.0010558249711992795, 'weight_decay': 0.0, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4664,0.565568,0.846013,0.839626,0.788376,0.802333
2,0.0716,0.556968,0.865261,0.843588,0.829651,0.833861
3,0.0311,0.748977,0.854262,0.856979,0.819755,0.832257
4,0.0181,0.749574,0.869844,0.871611,0.830658,0.846939
5,0.0113,0.825195,0.868928,0.849765,0.840382,0.84385
6,0.0078,0.885804,0.863428,0.851188,0.836938,0.841153
7,0.0053,1.013163,0.858845,0.853056,0.832828,0.838989
8,0.0046,1.007008,0.869844,0.868246,0.831995,0.846472
9,0.0028,1.074411,0.857929,0.855243,0.814403,0.82861
10,0.0017,1.132591,0.862511,0.836409,0.826022,0.829886


[I 2025-03-22 10:03:50,067] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0019045657590369487, 'weight_decay': 0.006, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.357,0.57114,0.866178,0.873323,0.821384,0.837628
2,0.0458,0.564946,0.862511,0.832663,0.829103,0.82954
3,0.0201,0.792317,0.851512,0.853163,0.820451,0.828642
4,0.0116,0.763743,0.868928,0.856809,0.832713,0.841923
5,0.0085,0.881364,0.872594,0.847587,0.82635,0.834339
6,0.0055,0.920479,0.872594,0.85878,0.824396,0.837933
7,0.0037,1.122403,0.858845,0.831554,0.813249,0.819089
8,0.0024,1.092652,0.867094,0.840992,0.821798,0.828522
9,0.0015,1.108721,0.868928,0.845048,0.822276,0.830384
10,0.0008,1.074047,0.87626,0.841459,0.828905,0.833894


[I 2025-03-22 10:06:55,448] Trial 125 finished with value: 0.8289453412018827 and parameters: {'learning_rate': 0.0019045657590369487, 'weight_decay': 0.006, 'warmup_steps': 1}. Best is trial 115 with value: 0.8510243730102385.


Trial 126 with params: {'learning_rate': 0.00120436400976896, 'weight_decay': 0.008, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4164,0.576416,0.850596,0.827377,0.810283,0.81444
2,0.0664,0.543384,0.866178,0.845514,0.830422,0.835651
3,0.0282,0.770373,0.846013,0.86479,0.813673,0.829419
4,0.0156,0.766282,0.862511,0.841989,0.828683,0.832571
5,0.0114,0.749018,0.872594,0.868374,0.835927,0.84804
6,0.0072,0.848709,0.862511,0.860775,0.827536,0.839488
7,0.0046,1.055819,0.860678,0.869706,0.81629,0.833323
8,0.0039,1.047397,0.857929,0.860008,0.821801,0.836899
9,0.0021,1.118806,0.864345,0.860115,0.819098,0.833064
10,0.0016,1.145304,0.868928,0.857513,0.831047,0.841511


[I 2025-03-22 10:10:06,031] Trial 126 finished with value: 0.8294902656273555 and parameters: {'learning_rate': 0.00120436400976896, 'weight_decay': 0.008, 'warmup_steps': 4}. Best is trial 115 with value: 0.8510243730102385.


Trial 127 with params: {'learning_rate': 0.0009928118155017986, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4421,0.548427,0.853346,0.834103,0.809339,0.818183
2,0.0755,0.587909,0.868928,0.877527,0.842566,0.854253
3,0.0322,0.734994,0.854262,0.84154,0.829607,0.832095
4,0.0199,0.704697,0.866178,0.864931,0.830474,0.842883
5,0.0115,0.789298,0.864345,0.851514,0.829188,0.837584
6,0.0079,0.89715,0.868928,0.868,0.830857,0.845372
7,0.0077,1.048607,0.860678,0.862998,0.824747,0.837488
8,0.005,0.99402,0.863428,0.86217,0.826725,0.84023
9,0.0028,0.971661,0.865261,0.864816,0.828168,0.841877
10,0.0021,1.037241,0.858845,0.859963,0.822515,0.836731


[I 2025-03-22 10:13:20,781] Trial 127 finished with value: 0.8336247132850554 and parameters: {'learning_rate': 0.0009928118155017986, 'weight_decay': 0.008, 'warmup_steps': 0}. Best is trial 115 with value: 0.8510243730102385.


Trial 128 with params: {'learning_rate': 0.002421167929408165, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3302,0.592042,0.857012,0.869398,0.813681,0.830253
2,0.0396,0.613079,0.863428,0.858922,0.830016,0.839669
3,0.0175,0.8006,0.864345,0.858695,0.821765,0.833442
4,0.0107,0.89257,0.851512,0.814215,0.808439,0.809207
5,0.0084,1.036826,0.855179,0.835221,0.821281,0.825607
6,0.0055,0.935801,0.871677,0.854225,0.827106,0.836316
7,0.0034,1.055708,0.858845,0.838347,0.815183,0.822853
8,0.0018,1.15825,0.868011,0.862305,0.824104,0.837014
9,0.0015,1.083327,0.869844,0.852838,0.825256,0.835043
10,0.0015,1.136862,0.863428,0.839125,0.820261,0.826249


[I 2025-03-22 10:15:11,446] Trial 128 pruned. 


Trial 129 with params: {'learning_rate': 0.0031803756685725936, 'weight_decay': 0.006, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3128,0.626014,0.872594,0.859531,0.836595,0.84316
2,0.0349,0.610229,0.865261,0.852589,0.829358,0.83845
3,0.016,0.80161,0.861595,0.835997,0.819259,0.824454
4,0.0088,0.927573,0.856095,0.817649,0.823924,0.818844
5,0.0083,1.017789,0.879927,0.869069,0.849855,0.858013
6,0.0047,1.18094,0.868928,0.847139,0.833701,0.838379
7,0.0038,1.439568,0.846013,0.84775,0.80548,0.819336
8,0.0023,1.223627,0.864345,0.84985,0.821939,0.831743
9,0.0014,1.283453,0.859762,0.856356,0.818233,0.830511
10,0.0006,1.408644,0.860678,0.856839,0.819208,0.831002


[I 2025-03-22 10:17:05,555] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.002467219732883298, 'weight_decay': 0.008, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3273,0.615176,0.855179,0.864246,0.805062,0.821124
2,0.0396,0.625682,0.868011,0.849875,0.825662,0.833582
3,0.0184,0.888273,0.850596,0.845667,0.809958,0.820595
4,0.0114,0.896684,0.854262,0.852137,0.811372,0.824747
5,0.0088,0.992209,0.868011,0.864863,0.822332,0.837324
6,0.0053,1.043162,0.859762,0.842848,0.823271,0.831486
7,0.0019,1.299444,0.860678,0.858998,0.816315,0.830878
8,0.002,1.280192,0.857012,0.846965,0.821944,0.83163
9,0.0014,1.307826,0.860678,0.858121,0.815586,0.831017
10,0.0004,1.376408,0.857929,0.857642,0.822608,0.835877


[I 2025-03-22 10:19:40,138] Trial 130 pruned. 


Trial 131 with params: {'learning_rate': 0.001582078869536683, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3808,0.605082,0.857929,0.825747,0.815103,0.818355
2,0.0529,0.605006,0.871677,0.84809,0.835664,0.840193
3,0.0239,0.696088,0.857012,0.846869,0.822652,0.831544
4,0.0128,0.801616,0.863428,0.862643,0.827898,0.840689
5,0.0107,0.792126,0.868928,0.845931,0.833316,0.837756
6,0.0057,0.954329,0.858845,0.860337,0.822942,0.836812
7,0.0034,1.025084,0.867094,0.843198,0.838997,0.840081
8,0.0027,1.048753,0.87901,0.863804,0.841515,0.84898
9,0.0018,1.159413,0.866178,0.847865,0.839554,0.841598
10,0.001,1.182459,0.869844,0.847895,0.840931,0.843418


[I 2025-03-22 10:23:20,797] Trial 131 finished with value: 0.8412305512244287 and parameters: {'learning_rate': 0.001582078869536683, 'weight_decay': 0.007, 'warmup_steps': 7}. Best is trial 115 with value: 0.8510243730102385.


Trial 132 with params: {'learning_rate': 0.0018465861586044931, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3622,0.630675,0.855179,0.849565,0.813667,0.82438
2,0.0477,0.613981,0.864345,0.817765,0.831521,0.822794
3,0.0206,0.773956,0.858845,0.830063,0.825644,0.824776
4,0.0109,0.884109,0.860678,0.841765,0.827313,0.829859
5,0.0104,0.836414,0.866178,0.855605,0.828739,0.839502
6,0.0052,0.939967,0.866178,0.866261,0.82864,0.842889
7,0.0027,1.19008,0.857012,0.84739,0.823494,0.830823
8,0.0019,1.15681,0.867094,0.863518,0.831832,0.842639
9,0.0013,1.140385,0.860678,0.849815,0.825398,0.834921
10,0.001,1.18903,0.865261,0.84435,0.829175,0.834525


[I 2025-03-22 10:25:46,785] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.002304824937137377, 'weight_decay': 0.009000000000000001, 'warmup_steps': 4}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3332,0.585453,0.868011,0.847284,0.832575,0.836564
2,0.0406,0.622153,0.862511,0.83255,0.828071,0.828984
3,0.0191,0.82081,0.856095,0.844337,0.825384,0.828467
4,0.0104,0.812626,0.868011,0.841,0.831292,0.835089
5,0.0073,0.85836,0.875344,0.857568,0.829367,0.839593
6,0.0056,1.015008,0.857012,0.827185,0.813207,0.818804
7,0.0038,1.100968,0.861595,0.857194,0.817468,0.831115
8,0.0018,1.197266,0.866178,0.861829,0.820598,0.835764
9,0.0009,1.224277,0.860678,0.846192,0.817062,0.827791
10,0.0008,1.216398,0.865261,0.849616,0.820678,0.831554


[I 2025-03-22 10:27:54,549] Trial 133 pruned. 


Trial 134 with params: {'learning_rate': 0.0016478851620443501, 'weight_decay': 0.01, 'warmup_steps': 1}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3706,0.584612,0.861595,0.869578,0.818105,0.834761
2,0.0517,0.658694,0.855179,0.808749,0.822497,0.814093
3,0.0225,0.742746,0.855179,0.823495,0.823351,0.820568
4,0.0142,0.771702,0.867094,0.855115,0.831083,0.840357
5,0.01,0.864281,0.868011,0.847118,0.832095,0.837893
6,0.0055,0.952926,0.865261,0.848396,0.827956,0.835745
7,0.0033,1.086611,0.865261,0.853661,0.829085,0.838077
8,0.0032,1.096586,0.866178,0.855096,0.829754,0.839478
9,0.0015,1.162238,0.861595,0.839792,0.816669,0.825475
10,0.0015,1.144817,0.866178,0.856462,0.829055,0.840447


[I 2025-03-22 10:30:52,051] Trial 134 finished with value: 0.8429700491775459 and parameters: {'learning_rate': 0.0016478851620443501, 'weight_decay': 0.01, 'warmup_steps': 1}. Best is trial 115 with value: 0.8510243730102385.


Trial 135 with params: {'learning_rate': 0.0025295211083243255, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3261,0.5791,0.865261,0.859487,0.821578,0.833761
2,0.0407,0.635907,0.861595,0.821027,0.827506,0.823141
3,0.0179,0.86205,0.852429,0.830738,0.821453,0.820698
4,0.011,0.804225,0.865261,0.84205,0.819942,0.828424
5,0.0069,0.921518,0.872594,0.851447,0.837185,0.84221
6,0.005,1.005541,0.871677,0.880306,0.825783,0.843924
7,0.0023,1.126953,0.869844,0.846253,0.82498,0.832282
8,0.0012,1.203615,0.872594,0.86734,0.828487,0.841048
9,0.0012,1.108548,0.871677,0.850327,0.835998,0.840547
10,0.0006,1.219449,0.868011,0.847368,0.833332,0.837502


[I 2025-03-22 10:34:22,305] Trial 135 finished with value: 0.8450270374619931 and parameters: {'learning_rate': 0.0025295211083243255, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 115 with value: 0.8510243730102385.


Trial 136 with params: {'learning_rate': 0.0022578782027112172, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3454,0.601478,0.861595,0.866303,0.810421,0.82542
2,0.0422,0.608055,0.857929,0.821533,0.826584,0.822267
3,0.02,0.755578,0.862511,0.823862,0.820062,0.819398
4,0.0115,0.796366,0.864345,0.843482,0.828648,0.834116
5,0.0095,0.873901,0.870761,0.865986,0.825344,0.840252
6,0.005,0.904558,0.869844,0.866065,0.824929,0.839605
7,0.0028,1.113566,0.870761,0.865344,0.825123,0.839079
8,0.0018,1.173967,0.877177,0.871313,0.831597,0.845353
9,0.0008,1.228565,0.880843,0.874333,0.83456,0.848563
10,0.0005,1.297388,0.874427,0.869004,0.829243,0.843341


[I 2025-03-22 10:37:27,830] Trial 136 finished with value: 0.8460881373273282 and parameters: {'learning_rate': 0.0022578782027112172, 'weight_decay': 0.008, 'warmup_steps': 7}. Best is trial 115 with value: 0.8510243730102385.


Trial 137 with params: {'learning_rate': 0.00465200612797881, 'weight_decay': 0.007, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2797,0.643966,0.858845,0.848117,0.826846,0.832879
2,0.0305,0.763604,0.860678,0.85346,0.835886,0.841716
3,0.0159,0.792319,0.860678,0.843767,0.835996,0.837609
4,0.0088,0.894987,0.859762,0.84489,0.817196,0.82644
5,0.0065,0.911063,0.868011,0.839326,0.832663,0.834379
6,0.0048,1.099034,0.865261,0.846758,0.829051,0.836176
7,0.0028,1.207383,0.862511,0.843491,0.811346,0.82169
8,0.0014,1.261001,0.867094,0.854879,0.833044,0.84061
9,0.0007,1.318091,0.870761,0.857768,0.835476,0.843388
10,0.0003,1.388284,0.869844,0.857092,0.835088,0.842629


[I 2025-03-22 10:40:33,056] Trial 137 finished with value: 0.8433573874706243 and parameters: {'learning_rate': 0.00465200612797881, 'weight_decay': 0.007, 'warmup_steps': 7}. Best is trial 115 with value: 0.8510243730102385.


Trial 138 with params: {'learning_rate': 0.0016678525969961084, 'weight_decay': 0.008, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3728,0.581849,0.860678,0.836224,0.817599,0.824121
2,0.0512,0.608001,0.865261,0.834673,0.830718,0.831441
3,0.0222,0.77003,0.851512,0.837011,0.828771,0.829381
4,0.0134,0.817176,0.861595,0.862012,0.836872,0.845233
5,0.0085,0.825805,0.871677,0.85353,0.84404,0.847279
6,0.0055,0.949696,0.865261,0.861677,0.819938,0.83472
7,0.0038,1.11568,0.863428,0.861934,0.828477,0.838344
8,0.0035,1.101686,0.870761,0.865244,0.824801,0.838492
9,0.0017,1.14647,0.858845,0.845427,0.826047,0.831318
10,0.0017,1.142639,0.865261,0.836363,0.830492,0.831949


[I 2025-03-22 10:42:26,440] Trial 138 pruned. 


Trial 139 with params: {'learning_rate': 0.0031464704542505917, 'weight_decay': 0.008, 'warmup_steps': 3}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2971,0.557198,0.868011,0.875669,0.823069,0.840385
2,0.0352,0.650103,0.862511,0.825929,0.829759,0.826362
3,0.0164,0.783552,0.866178,0.844142,0.82089,0.829664
4,0.0096,0.867002,0.863428,0.839956,0.819419,0.827537
5,0.0084,1.135369,0.853346,0.831973,0.811382,0.817762


[I 2025-03-22 10:43:25,943] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0007926669366535624, 'weight_decay': 0.0, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.493,0.549651,0.847846,0.832613,0.798434,0.807864
2,0.0896,0.533665,0.868928,0.824689,0.831723,0.827497
3,0.0408,0.823716,0.84418,0.832677,0.813622,0.816605
4,0.0235,0.76705,0.864345,0.848102,0.838448,0.840826
5,0.0161,0.812552,0.864345,0.863824,0.82852,0.841228
6,0.01,0.828753,0.860678,0.842386,0.834222,0.836721
7,0.0067,0.903726,0.867094,0.831827,0.829908,0.828536
8,0.005,0.920499,0.866178,0.845917,0.829336,0.835831
9,0.0032,1.079567,0.857929,0.8411,0.823684,0.828894
10,0.0018,1.040841,0.868928,0.842104,0.830738,0.834991


[I 2025-03-22 10:46:36,496] Trial 140 finished with value: 0.8397686535818228 and parameters: {'learning_rate': 0.0007926669366535624, 'weight_decay': 0.0, 'warmup_steps': 7}. Best is trial 115 with value: 0.8510243730102385.


Trial 141 with params: {'learning_rate': 0.00017226344414688613, 'weight_decay': 0.003, 'warmup_steps': 22}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9096,0.608719,0.790101,0.665717,0.674887,0.669477
2,0.35,0.522995,0.830431,0.696818,0.709435,0.702467
3,0.21,0.587393,0.824931,0.833414,0.762547,0.775923
4,0.1265,0.603801,0.851512,0.866772,0.8066,0.827515
5,0.0832,0.615448,0.854262,0.838305,0.818688,0.82699


Using the latest cached version of the module from /home/jovyan/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--f1/34c46321f42186df33a6260966e34a368f14868d9cc2ba47d142112e2800d233 (last modified on Fri Jan 10 23:14:01 2025) since it couldn't be found locally at evaluate-metric--f1, or remotely on the Hugging Face Hub.
[I 2025-03-22 10:47:46,257] Trial 141 pruned. 


Trial 142 with params: {'learning_rate': 0.001737231345136153, 'weight_decay': 0.008, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3673,0.594711,0.857012,0.841127,0.814927,0.823507
2,0.0501,0.626188,0.858845,0.835787,0.82726,0.828718
3,0.0222,0.730412,0.856095,0.822617,0.824481,0.820529
4,0.0117,0.821437,0.863428,0.861163,0.819116,0.832112
5,0.0091,0.794026,0.880843,0.845265,0.833721,0.837942
6,0.0064,0.921178,0.867094,0.861905,0.822414,0.836264
7,0.0033,1.117153,0.860678,0.858728,0.816973,0.830576
8,0.0028,1.040343,0.872594,0.86623,0.825947,0.840386
9,0.002,1.096519,0.871677,0.879043,0.826404,0.843628
10,0.001,1.150377,0.868011,0.852535,0.821983,0.833332


[I 2025-03-22 10:51:23,656] Trial 142 finished with value: 0.8444952938891773 and parameters: {'learning_rate': 0.001737231345136153, 'weight_decay': 0.008, 'warmup_steps': 6}. Best is trial 115 with value: 0.8510243730102385.


Trial 143 with params: {'learning_rate': 0.0014469525056601748, 'weight_decay': 0.007, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3992,0.567483,0.855179,0.844449,0.812855,0.822668
2,0.0546,0.629044,0.857012,0.837973,0.834799,0.832336
3,0.0255,0.781046,0.846929,0.830104,0.816463,0.818687
4,0.0128,0.831321,0.857929,0.846899,0.813762,0.825725
5,0.0103,0.816852,0.870761,0.849837,0.843664,0.845455
6,0.006,0.885812,0.868928,0.857267,0.832368,0.841709
7,0.0035,1.043097,0.860678,0.844602,0.817823,0.825964
8,0.003,1.042061,0.864345,0.860344,0.830095,0.840158
9,0.0012,1.179595,0.864345,0.848109,0.820371,0.82943
10,0.0011,1.122861,0.868928,0.848048,0.832355,0.838691


[I 2025-03-22 10:53:32,812] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.002030116684071346, 'weight_decay': 0.008, 'warmup_steps': 7}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3541,0.61545,0.863428,0.857278,0.820463,0.83238
2,0.0443,0.668684,0.857929,0.8215,0.827197,0.821714
3,0.02,0.714513,0.864345,0.834031,0.830699,0.830358
4,0.0118,0.815341,0.866178,0.832611,0.830538,0.830266
5,0.0088,0.848579,0.862511,0.834019,0.830329,0.828296
6,0.0054,0.906895,0.87626,0.873064,0.839302,0.852209
7,0.0056,0.98584,0.87901,0.866899,0.850596,0.856367
8,0.0034,1.039554,0.867094,0.864423,0.832156,0.843253
9,0.0014,1.102139,0.871677,0.868621,0.834674,0.847043
10,0.0009,1.101656,0.872594,0.85975,0.835397,0.844825


[I 2025-03-22 10:57:22,070] Trial 144 finished with value: 0.8490601324577125 and parameters: {'learning_rate': 0.002030116684071346, 'weight_decay': 0.008, 'warmup_steps': 7}. Best is trial 115 with value: 0.8510243730102385.


Trial 145 with params: {'learning_rate': 0.0027437474538416183, 'weight_decay': 0.008, 'warmup_steps': 8}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3245,0.584027,0.867094,0.872653,0.81344,0.831617
2,0.0372,0.697884,0.858845,0.822917,0.825823,0.822905
3,0.0186,0.789658,0.866178,0.843148,0.833862,0.834922
4,0.0109,0.824444,0.861595,0.830639,0.816741,0.822211
5,0.0076,0.962692,0.872594,0.850199,0.835942,0.84136
6,0.004,1.035056,0.871677,0.845898,0.833494,0.838524
7,0.0038,1.022954,0.871677,0.857981,0.824306,0.836412
8,0.0028,1.074287,0.870761,0.846044,0.825146,0.832876
9,0.0009,1.137212,0.871677,0.850635,0.834448,0.840424
10,0.0005,1.187857,0.871677,0.84698,0.843572,0.844086


[I 2025-03-22 11:00:38,420] Trial 145 finished with value: 0.8332651617907921 and parameters: {'learning_rate': 0.0027437474538416183, 'weight_decay': 0.008, 'warmup_steps': 8}. Best is trial 115 with value: 0.8510243730102385.


Trial 146 with params: {'learning_rate': 0.0016355194588595287, 'weight_decay': 0.005, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3762,0.591222,0.861595,0.846008,0.818938,0.828212
2,0.0534,0.672129,0.852429,0.811173,0.821681,0.814171
3,0.0216,0.78175,0.853346,0.842043,0.822811,0.827172
4,0.0106,0.767541,0.873511,0.855176,0.845216,0.849054
5,0.0094,0.810533,0.874427,0.864734,0.847075,0.853467
6,0.0061,1.039705,0.858845,0.857251,0.825974,0.836551
7,0.0053,1.048215,0.860678,0.852651,0.827585,0.834707
8,0.003,1.154255,0.866178,0.855679,0.83154,0.83965
9,0.0022,1.230531,0.859762,0.839518,0.834791,0.833784
10,0.0012,1.227187,0.868928,0.859511,0.842852,0.84898


[I 2025-03-22 11:03:38,642] Trial 146 finished with value: 0.8347451136694756 and parameters: {'learning_rate': 0.0016355194588595287, 'weight_decay': 0.005, 'warmup_steps': 6}. Best is trial 115 with value: 0.8510243730102385.


Trial 147 with params: {'learning_rate': 0.004827098251177437, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.2743,0.652146,0.865261,0.85001,0.819787,0.830847
2,0.0306,0.763532,0.865261,0.820681,0.820257,0.8196
3,0.0152,0.890005,0.861595,0.837233,0.819953,0.824574
4,0.0097,0.924986,0.866178,0.837022,0.832464,0.832841
5,0.0063,1.34432,0.848763,0.825076,0.810039,0.811191


[I 2025-03-22 11:05:17,819] Trial 147 pruned. 


Trial 148 with params: {'learning_rate': 0.0011848616107773089, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4232,0.552985,0.850596,0.828859,0.810382,0.814391
2,0.0656,0.56444,0.860678,0.835434,0.836028,0.833924
3,0.0287,0.745886,0.84418,0.822149,0.813028,0.813689
4,0.0167,0.799053,0.857012,0.840254,0.820763,0.828346
5,0.0122,0.791594,0.861595,0.852755,0.825124,0.836063


[I 2025-03-22 11:06:19,213] Trial 148 pruned. 


Trial 149 with params: {'learning_rate': 0.002264170022337403, 'weight_decay': 0.008, 'warmup_steps': 0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3309,0.599462,0.860678,0.841352,0.807662,0.819356
2,0.0421,0.655613,0.858845,0.824775,0.826364,0.82339
3,0.0174,0.780731,0.861595,0.836541,0.826604,0.82976
4,0.0097,0.862734,0.866178,0.831878,0.831101,0.83055
5,0.0103,0.956444,0.861595,0.85861,0.818912,0.832072
6,0.0052,0.96952,0.863428,0.853245,0.816688,0.831213
7,0.0026,1.16674,0.856095,0.814313,0.812725,0.812054
8,0.0013,1.215224,0.862511,0.841682,0.818234,0.826662
9,0.0008,1.296668,0.861595,0.840899,0.817912,0.825787
10,0.0006,1.261931,0.866178,0.819576,0.82132,0.8196


[I 2025-03-22 11:08:47,076] Trial 149 pruned. 


In [40]:
print(best_trial_normal_aug)

BestRun(run_id='115', objective=0.8510243730102385, hyperparameters={'learning_rate': 0.0021788274151528764, 'weight_decay': 0.007, 'warmup_steps': 5}, run_summary=None)


In [41]:
base.reset_seed()

## Search with distillation on the augmented dataset
Configuration of the individual training runs.

In [42]:
training_args = base.get_training_args(
    output_dir=f"~/results/{DATASET}/bilstm-distill-embedd-aug_coarse_hp-search",
    logging_dir=f"~/logs/{DATASET}/bilstm-distill-embedd-aug_coarse_hp-search",
    remove_unused_columns=False,
    epochs=num_epochs,
    batch_size=batch_size,
)

Definition of the searched hyperparameters and their ranges, extended with the distillation hyperparameters (`lambda_param` and `temperature`).

In [43]:
def hp_space(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 5e-5, 5e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, warm_up),
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
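
The search space above mixes a log-uniform range (`learning_rate`) with stepped grids (`weight_decay`, `lambda_param`, `temperature`). The following sketch illustrates both kinds of sampling in plain Python. It is not Optuna's implementation: `sample_log_uniform` and `stepped_grid` are hypothetical helpers, and generating grid points as `low + k * step` is an assumption made here only to show why values such as `0.6000000000000001` can appear in the trial logs as ordinary floating-point artifacts.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def stepped_grid(low, high, step):
    """Enumerate the grid points low, low + step, ..., high (inclusive)."""
    n_points = int(round((high - low) / step)) + 1
    return [low + k * step for k in range(n_points)]


rng = random.Random(42)  # fixed seed so the draw is repeatable

# Same bounds as in hp_space; the helpers themselves are illustrative only.
learning_rate = sample_log_uniform(rng, 5e-5, 5e-3)
lambda_grid = stepped_grid(0.0, 1.0, 0.1)
temperature_grid = stepped_grid(2.0, 7.0, 0.5)

print(learning_rate)
print(lambda_grid[6])  # 6 * 0.1 is not exactly 0.6 in binary floating point
print(temperature_grid)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is why small values such as `5.5e-05` and large ones such as `2e-03` both show up among the trials.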

Optuna configuration.

In [44]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
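
To make the pruner settings more concrete, the sketch below computes how many Hyperband brackets a given resource range produces and at which steps each bracket compares trials. It follows the rule described in the Optuna documentation as understood here: the bracket count is floor(log_eta(max_resource / min_resource)) + 1, and rungs sit at min_resource * eta^(bracket + rung). The function names and the example limits (1 and 10) are hypothetical; the actual `min_r` and `max_r` are defined earlier in the notebook.

```python
def hyperband_bracket_count(min_resource, max_resource, reduction_factor):
    """Return floor(log_eta(max_resource / min_resource)) + 1 via integer arithmetic."""
    count = 1
    resource = min_resource
    while resource * reduction_factor <= max_resource:
        resource *= reduction_factor
        count += 1
    return count


def rung_steps(min_resource, max_resource, reduction_factor, bracket_id):
    """List the steps at which a trial in the given bracket is compared with its peers."""
    steps = []
    rung = 0
    while True:
        step = min_resource * reduction_factor ** (bracket_id + rung)
        if step > max_resource:
            return steps
        steps.append(step)
        rung += 1


# Hypothetical resource limits, chosen only for illustration.
example_min, example_max, eta = 1, 10, 2
n_brackets = hyperband_bracket_count(example_min, example_max, eta)
for bracket_id in range(n_brackets):
    print(bracket_id, rung_steps(example_min, example_max, eta, bracket_id))
```

Brackets with a higher index start checking later, so some trials are allowed to run longer before they can be pruned. This would be consistent with the logs, where some trials are pruned midway and others only after the final epoch.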



Configuration of the distillation trainer for the individual training runs.

In [45]:
trainer = base.DistilTrainer(
    args=training_args,
    train_dataset=all_train_data,
    eval_dataset=eval_data,
    compute_metrics=base.compute_metrics,
    model_init=lambda: get_BiLSTM(),
)
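
The implementation of `base.DistilTrainer` is not shown in this notebook, so the following is only a sketch of how `lambda_param` and `temperature` are commonly used in knowledge distillation. It assumes the widespread formulation in which `lambda_param` weights a temperature-softened KL term against the ordinary cross-entropy on hard labels; the actual trainer may weight the terms differently. The code is written in plain Python for a single example, and `softmax` and `distillation_loss` are hypothetical names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Combine hard-label cross-entropy with a softened KL term (assumed convention)."""
    # Hard-label term: cross-entropy of the student against the true class.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Scaling by T^2 keeps the soft term's gradients comparable across temperatures.
    soft *= temperature ** 2
    # Assumption: lambda_param weights the distillation term, 1 - lambda_param the hard term.
    return (1.0 - lambda_param) * hard + lambda_param * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.0, -2.0]
print(distillation_loss(student, teacher, label=0, lambda_param=0.4, temperature=4.5))
```

Under this convention, `lambda_param=0` reduces to ordinary supervised training, while `lambda_param=1` trains the student on the teacher's softened outputs only.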
  

Search setup.

In [46]:
best_trial_distill_aug = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Distill-aug-embedd",
    n_trials=150
)

[I 2025-03-22 11:08:47,381] A new study created in memory with name: Distill-aug-embedd


Trial 0 with params: {'learning_rate': 0.0002805758207667253, 'weight_decay': 0.01, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6974,1.142992,0.809349,0.676968,0.693155,0.683362
2,0.5344,0.857628,0.852429,0.715264,0.726921,0.719829
3,0.3124,0.759493,0.874427,0.882113,0.828034,0.845399
4,0.2071,0.705778,0.882676,0.890862,0.842184,0.860378
5,0.1584,0.688659,0.880843,0.891687,0.84008,0.859889
6,0.1329,0.673177,0.886343,0.895722,0.844874,0.86442
7,0.1158,0.666344,0.88451,0.891531,0.844876,0.861877
8,0.1031,0.661213,0.888176,0.893987,0.847952,0.865027
9,0.0957,0.646031,0.892759,0.89874,0.85089,0.8688
10,0.0884,0.63767,0.889093,0.89658,0.847936,0.866425


[I 2025-03-22 11:11:02,251] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 0.00010255552094216992, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5208,1.767084,0.722273,0.614635,0.611127,0.610945
2,1.1685,1.298851,0.796517,0.667832,0.680825,0.673383
3,0.7914,1.128467,0.822181,0.690018,0.701016,0.695294
4,0.5944,1.057351,0.823098,0.690936,0.704668,0.695957
5,0.4728,0.962527,0.84418,0.706086,0.721745,0.713546


[I 2025-03-22 11:12:04,553] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 5.497167787383099e-05, 'weight_decay': 0.01, 'warmup_steps': 26, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9849,2.352055,0.612282,0.554768,0.506367,0.514437
2,1.7649,1.718317,0.735105,0.616989,0.628568,0.621928
3,1.2462,1.474024,0.769936,0.645962,0.658222,0.651502
4,0.9909,1.325576,0.79835,0.673258,0.681991,0.677094
5,0.8319,1.243336,0.812099,0.680587,0.694941,0.687161
6,0.7218,1.16896,0.820348,0.688864,0.700329,0.694449
7,0.6412,1.131119,0.819432,0.687426,0.700056,0.693571
8,0.5772,1.106739,0.825848,0.694035,0.705468,0.699356
9,0.5304,1.073822,0.825848,0.693215,0.70612,0.699221
10,0.4948,1.04837,0.828598,0.694759,0.707898,0.701161


[I 2025-03-22 11:13:58,609] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 0.00011635338541918901, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3928,1.650117,0.749771,0.633904,0.637222,0.634307
2,1.0701,1.233893,0.810266,0.680269,0.691573,0.685194
3,0.7201,1.07308,0.831347,0.698976,0.709075,0.70364
4,0.5334,0.987338,0.835014,0.700294,0.71313,0.705782
5,0.419,0.916132,0.849679,0.711097,0.725347,0.718013


[I 2025-03-22 11:15:03,539] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.065,0.809382,0.860678,0.888255,0.744942,0.745176
2,0.2201,0.651418,0.887259,0.894808,0.838319,0.857627
3,0.1332,0.661795,0.880843,0.887143,0.843015,0.858087
4,0.1058,0.613105,0.889093,0.895641,0.848572,0.866054
5,0.0893,0.623715,0.891842,0.898606,0.850348,0.868529
6,0.082,0.57795,0.890009,0.896759,0.849234,0.866965
7,0.0742,0.583455,0.892759,0.899503,0.851905,0.869839
8,0.0689,0.586768,0.886343,0.881573,0.847098,0.860257
9,0.0649,0.587295,0.889093,0.895088,0.848883,0.86577
10,0.0614,0.571972,0.897342,0.902303,0.854987,0.872849


[I 2025-03-22 11:17:54,775] Trial 4 finished with value: 0.8693088405027042 and parameters: {'learning_rate': 0.0008369042894376068, 'weight_decay': 0.001, 'warmup_steps': 9, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 4 with value: 0.8693088405027042.


Trial 5 with params: {'learning_rate': 0.0018591820902866042, 'weight_decay': 0.002, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8198,0.675008,0.879927,0.869442,0.823128,0.838905
2,0.1514,0.608727,0.893676,0.899199,0.853515,0.86957
3,0.1036,0.624597,0.890009,0.894819,0.850718,0.865987
4,0.0855,0.597458,0.895509,0.889535,0.854626,0.867606
5,0.0753,0.561435,0.897342,0.890302,0.85577,0.869124
6,0.068,0.56585,0.895509,0.90081,0.853763,0.871332
7,0.0622,0.559529,0.901008,0.893497,0.857828,0.871453
8,0.0579,0.566164,0.897342,0.890118,0.855905,0.86913
9,0.0542,0.574464,0.894592,0.8881,0.853015,0.86671
10,0.0509,0.560316,0.893676,0.885078,0.843594,0.858948


[I 2025-03-22 11:20:01,560] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0533,0.83024,0.853346,0.798667,0.739115,0.738054
2,0.2221,0.676236,0.887259,0.897044,0.847249,0.864864
3,0.1328,0.670196,0.879927,0.876083,0.842263,0.854731
4,0.1062,0.634292,0.88451,0.891704,0.845309,0.862372
5,0.0908,0.633341,0.888176,0.872489,0.848809,0.858226
6,0.0818,0.576204,0.892759,0.898768,0.851234,0.869064
7,0.0738,0.585358,0.896425,0.902056,0.854254,0.872334
8,0.0691,0.589713,0.898258,0.902529,0.855999,0.87335
9,0.0651,0.589903,0.894592,0.900602,0.853301,0.871111
10,0.0615,0.576042,0.892759,0.900547,0.860521,0.876539


[I 2025-03-22 11:23:53,000] Trial 6 finished with value: 0.8722091103951631 and parameters: {'learning_rate': 0.0008204643365323959, 'weight_decay': 0.001, 'warmup_steps': 2, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 6 with value: 0.8722091103951631.


Trial 7 with params: {'learning_rate': 0.0020690200562805084, 'weight_decay': 0.003, 'warmup_steps': 3, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7346,0.711097,0.871677,0.853764,0.799305,0.813096
2,0.1465,0.608887,0.889093,0.882861,0.848957,0.861503
3,0.1032,0.596008,0.890926,0.883195,0.850652,0.862659
4,0.0845,0.559763,0.897342,0.893522,0.864836,0.876098
5,0.0756,0.553393,0.903758,0.895695,0.859603,0.873791
6,0.0686,0.535851,0.904675,0.891571,0.842315,0.859732
7,0.0623,0.53806,0.909258,0.898762,0.854641,0.871278
8,0.0577,0.543955,0.905591,0.891843,0.843252,0.860015
9,0.0543,0.545305,0.909258,0.897769,0.855118,0.870908
10,0.0515,0.536282,0.909258,0.898169,0.855007,0.871246


[I 2025-03-22 11:25:58,065] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 8.770946743725407e-05, 'weight_decay': 0.005, 'warmup_steps': 1, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.5875,1.9089,0.700275,0.604828,0.588891,0.59281
2,1.3125,1.395849,0.775435,0.65082,0.662867,0.655505
3,0.9011,1.215116,0.808433,0.681366,0.690425,0.685524
4,0.6953,1.137381,0.811182,0.68123,0.694452,0.685703
5,0.5617,1.032366,0.826764,0.692631,0.706994,0.69958


[I 2025-03-22 11:27:03,892] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9968,0.782696,0.866178,0.85015,0.7682,0.776903
2,0.1908,0.646196,0.890009,0.898513,0.849549,0.866874
3,0.1213,0.66495,0.879927,0.887271,0.842229,0.857832
4,0.0976,0.63108,0.887259,0.882344,0.85684,0.866755
5,0.0842,0.622184,0.890009,0.887001,0.858503,0.870385
6,0.0759,0.563623,0.898258,0.905189,0.86422,0.880813
7,0.0698,0.567172,0.898258,0.89308,0.864487,0.876402
8,0.0643,0.571043,0.897342,0.890365,0.855053,0.869082
9,0.0604,0.575409,0.898258,0.903589,0.855692,0.873879
10,0.057,0.560543,0.897342,0.905192,0.863704,0.880562


[I 2025-03-22 11:30:04,504] Trial 9 finished with value: 0.8799843984149786 and parameters: {'learning_rate': 0.0010568529720322872, 'weight_decay': 0.003, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}. Best is trial 9 with value: 0.8799843984149786.


Trial 10 with params: {'learning_rate': 0.0019688396221773483, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8268,0.712099,0.878093,0.881722,0.821944,0.84067
2,0.1464,0.582014,0.895509,0.888504,0.854284,0.86743
3,0.1049,0.576395,0.890926,0.885877,0.859864,0.87004
4,0.0854,0.585894,0.892759,0.888313,0.860729,0.871763
5,0.0753,0.583728,0.901008,0.894351,0.867396,0.878362
6,0.0681,0.531428,0.907424,0.899003,0.862652,0.877202
7,0.0628,0.559616,0.896425,0.89121,0.863872,0.874899
8,0.0581,0.549959,0.898258,0.892944,0.865753,0.876636
9,0.0535,0.545334,0.903758,0.897908,0.869607,0.881387
10,0.0505,0.537886,0.901925,0.896741,0.868212,0.880166


[I 2025-03-22 11:33:05,391] Trial 10 finished with value: 0.8806069467650639 and parameters: {'learning_rate': 0.0019688396221773483, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 10 with value: 0.8806069467650639.


Trial 11 with params: {'learning_rate': 0.0009675914336245249, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0795,0.764235,0.870761,0.847148,0.789413,0.802308
2,0.1966,0.634839,0.889093,0.897834,0.849142,0.866943
3,0.1229,0.621856,0.890926,0.89677,0.850419,0.866765
4,0.0984,0.603006,0.890009,0.896088,0.849795,0.866618
5,0.0853,0.609064,0.888176,0.896568,0.847824,0.866116
6,0.0763,0.554793,0.901008,0.893689,0.857326,0.871969
7,0.0705,0.565291,0.897342,0.903591,0.854381,0.873244
8,0.0653,0.565931,0.893676,0.899418,0.85231,0.870054
9,0.062,0.57076,0.896425,0.901806,0.854359,0.872294
10,0.0579,0.556715,0.901925,0.906941,0.858262,0.876823


[I 2025-03-22 11:36:10,988] Trial 11 finished with value: 0.8776223938014377 and parameters: {'learning_rate': 0.0009675914336245249, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.5, 'temperature': 5.0}. Best is trial 10 with value: 0.8806069467650639.


Trial 12 with params: {'learning_rate': 0.004183597238132349, 'weight_decay': 0.002, 'warmup_steps': 22, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6791,0.696427,0.87626,0.881072,0.821083,0.838959
2,0.1273,0.552157,0.899175,0.889478,0.846908,0.862755
3,0.0937,0.577769,0.894592,0.889193,0.852603,0.866911
4,0.0783,0.584783,0.896425,0.889032,0.845329,0.861133
5,0.0698,0.54731,0.903758,0.893991,0.851016,0.866875
6,0.0626,0.543414,0.900092,0.88734,0.839922,0.855988
7,0.0569,0.550917,0.897342,0.885869,0.837248,0.85393
8,0.0536,0.53759,0.901925,0.888208,0.841295,0.857315
9,0.0499,0.549078,0.898258,0.886214,0.838594,0.854777
10,0.0477,0.540647,0.903758,0.890325,0.842987,0.859108


[I 2025-03-22 11:38:43,421] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 0.0005493509373133941, 'weight_decay': 0.006, 'warmup_steps': 16, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.278,0.860454,0.862511,0.722194,0.736025,0.728397
2,0.3073,0.69505,0.88176,0.89317,0.842487,0.860808
3,0.1702,0.670945,0.882676,0.892449,0.843723,0.860679
4,0.1239,0.670389,0.887259,0.89524,0.846196,0.864745
5,0.1044,0.635833,0.886343,0.892923,0.845495,0.863184


[I 2025-03-22 11:39:51,897] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 0.0029303028816080995, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.707,0.646011,0.888176,0.890952,0.830324,0.849399
2,0.1358,0.588499,0.893676,0.897315,0.852878,0.868704
3,0.0961,0.55068,0.899175,0.89104,0.857118,0.870043
4,0.0807,0.570889,0.901008,0.891844,0.849708,0.865016
5,0.0705,0.536141,0.903758,0.893689,0.850717,0.866735
6,0.0647,0.52653,0.905591,0.892168,0.842991,0.860175
7,0.0598,0.536193,0.904675,0.89595,0.861813,0.874857
8,0.0556,0.546714,0.903758,0.906376,0.860386,0.877231
9,0.0514,0.541916,0.904675,0.908461,0.861099,0.878656
10,0.0494,0.533817,0.904675,0.90965,0.860144,0.87901


[I 2025-03-22 11:43:00,613] Trial 14 finished with value: 0.8802561724170551 and parameters: {'learning_rate': 0.0029303028816080995, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 2.5}. Best is trial 10 with value: 0.8806069467650639.


Trial 15 with params: {'learning_rate': 0.0019903572352408887, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7719,0.721235,0.878093,0.867409,0.822052,0.8369
2,0.146,0.611159,0.888176,0.881782,0.848945,0.86112
3,0.1037,0.604374,0.893676,0.878431,0.861449,0.868272
4,0.0864,0.582892,0.891842,0.885231,0.851194,0.864126
5,0.0774,0.572458,0.897342,0.900305,0.836854,0.857604


[I 2025-03-22 11:44:27,171] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0038019350332573216, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6677,0.665847,0.892759,0.883726,0.84265,0.857275
2,0.129,0.58861,0.890926,0.895189,0.850552,0.866504
3,0.0944,0.570209,0.891842,0.895569,0.841955,0.860078
4,0.0784,0.580628,0.899175,0.887287,0.838822,0.855333
5,0.0693,0.545678,0.908341,0.894715,0.846014,0.86312
6,0.0626,0.536077,0.906508,0.906628,0.844466,0.864483
7,0.0578,0.539536,0.901925,0.903134,0.84183,0.86126
8,0.0544,0.554959,0.900092,0.901626,0.839229,0.85937
9,0.0508,0.56236,0.900092,0.901433,0.839211,0.859255
10,0.0487,0.547966,0.904675,0.906763,0.85211,0.871149


[I 2025-03-22 11:46:39,899] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 0.0012003117500593078, 'weight_decay': 0.007, 'warmup_steps': 18, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9552,0.760593,0.868928,0.853354,0.82456,0.833251
2,0.176,0.619131,0.892759,0.899858,0.851357,0.869276
3,0.1146,0.63888,0.88176,0.888273,0.844202,0.858992
4,0.0943,0.601491,0.896425,0.904112,0.862947,0.879306
5,0.0823,0.597987,0.890926,0.897627,0.849948,0.867788
6,0.0738,0.555292,0.901925,0.908357,0.867157,0.88387
7,0.0678,0.558563,0.898258,0.893164,0.86494,0.876764
8,0.0631,0.55233,0.899175,0.892005,0.856033,0.870421
9,0.0591,0.562995,0.898258,0.893803,0.864293,0.876792
10,0.0563,0.545876,0.902841,0.898304,0.868106,0.880957


[I 2025-03-22 11:49:52,590] Trial 17 finished with value: 0.8797508819225542 and parameters: {'learning_rate': 0.0012003117500593078, 'weight_decay': 0.007, 'warmup_steps': 18, 'lambda_param': 0.4, 'temperature': 7.0}. Best is trial 10 with value: 0.8806069467650639.


Trial 18 with params: {'learning_rate': 0.004501401120816689, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6343,0.640207,0.891842,0.88228,0.842932,0.856538
2,0.1269,0.587511,0.895509,0.89743,0.845637,0.862954
3,0.0959,0.560988,0.897342,0.899775,0.846385,0.864712
4,0.0798,0.577037,0.898258,0.90275,0.84612,0.866086
5,0.0695,0.545608,0.908341,0.911172,0.854715,0.874835
6,0.0629,0.547138,0.907424,0.910297,0.85304,0.873232
7,0.0572,0.548388,0.902841,0.906751,0.850047,0.869489
8,0.0543,0.534293,0.909258,0.912384,0.864371,0.882206
9,0.0503,0.547529,0.905591,0.9095,0.861351,0.879055
10,0.0481,0.535097,0.912007,0.915008,0.866185,0.884506


[I 2025-03-22 11:52:58,154] Trial 18 finished with value: 0.8829925835070056 and parameters: {'learning_rate': 0.004501401120816689, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 19 with params: {'learning_rate': 0.0025467059918482496, 'weight_decay': 0.005, 'warmup_steps': 18, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.744,0.698988,0.88451,0.888125,0.827203,0.846628
2,0.1386,0.599062,0.890926,0.881683,0.841501,0.855832
3,0.0986,0.590795,0.899175,0.893295,0.865166,0.876747
4,0.0804,0.565553,0.892759,0.885236,0.842187,0.858091
5,0.0725,0.562524,0.903758,0.892179,0.851443,0.866212
6,0.0651,0.542003,0.904675,0.894384,0.851183,0.867471
7,0.0603,0.544973,0.900092,0.902994,0.847815,0.86694
8,0.0557,0.555886,0.897342,0.900249,0.84585,0.864702
9,0.0516,0.553583,0.901008,0.903114,0.848314,0.867377
10,0.0488,0.540606,0.904675,0.906426,0.85125,0.870688


[I 2025-03-22 11:54:42,814] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.0032165803841561367, 'weight_decay': 0.007, 'warmup_steps': 24, 'lambda_param': 0.9, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7101,0.700509,0.885426,0.886859,0.829484,0.846311
2,0.1342,0.571422,0.902841,0.890386,0.851354,0.865051
3,0.0962,0.549946,0.898258,0.880225,0.855936,0.865617
4,0.0784,0.551206,0.901925,0.895905,0.858419,0.872748
5,0.0706,0.526383,0.902841,0.89013,0.840693,0.857982


[I 2025-03-22 11:55:41,132] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 0.003079484154085926, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6946,0.695624,0.878093,0.881503,0.823309,0.840354
2,0.1343,0.584806,0.897342,0.887736,0.846511,0.861326
3,0.0959,0.57869,0.897342,0.900567,0.846687,0.865136
4,0.0798,0.563749,0.902841,0.891054,0.841392,0.858749
5,0.0715,0.545414,0.904675,0.893894,0.851787,0.867536
6,0.064,0.532835,0.902841,0.893001,0.850052,0.866156
7,0.0584,0.537364,0.910174,0.913941,0.87409,0.889874
8,0.0547,0.523769,0.908341,0.909449,0.8552,0.87412
9,0.0511,0.532852,0.907424,0.910718,0.862851,0.880771
10,0.0484,0.526782,0.909258,0.910787,0.854963,0.874669


[I 2025-03-22 11:58:48,803] Trial 21 finished with value: 0.8662016831220614 and parameters: {'learning_rate': 0.003079484154085926, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 18 with value: 0.8829925835070056.


Trial 22 with params: {'learning_rate': 0.0026513145118241873, 'weight_decay': 0.008, 'warmup_steps': 11, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.713,0.640742,0.888176,0.889385,0.829949,0.848524
2,0.1372,0.611212,0.891842,0.895388,0.850607,0.866468
3,0.0977,0.5687,0.894592,0.898927,0.852724,0.869917
4,0.0798,0.566671,0.892759,0.899674,0.851019,0.869267
5,0.0727,0.535773,0.900092,0.89342,0.856784,0.871422
6,0.0644,0.53396,0.901008,0.904498,0.848047,0.86817
7,0.0593,0.525746,0.901008,0.906844,0.866791,0.882507
8,0.0547,0.532078,0.899175,0.902063,0.848144,0.866907
9,0.0513,0.54813,0.899175,0.903367,0.856691,0.873882
10,0.0486,0.530836,0.899175,0.902402,0.847227,0.866688


[I 2025-03-22 12:01:08,231] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 0.003062639579876736, 'weight_decay': 0.009000000000000001, 'warmup_steps': 2, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.674,0.650025,0.885426,0.873674,0.829056,0.843057
2,0.1332,0.589626,0.898258,0.890714,0.856648,0.869873
3,0.0974,0.571142,0.892759,0.88736,0.851596,0.865544
4,0.0809,0.564066,0.901008,0.894388,0.858585,0.872614
5,0.0713,0.543917,0.902841,0.894927,0.859434,0.873353
6,0.065,0.552018,0.901008,0.894433,0.857968,0.872338
7,0.0599,0.554798,0.904675,0.899117,0.860837,0.876126
8,0.0549,0.556437,0.901008,0.891731,0.848895,0.865046
9,0.0515,0.551682,0.904675,0.895103,0.851679,0.867896
10,0.0488,0.539823,0.901008,0.892516,0.848817,0.865282


[I 2025-03-22 12:03:04,616] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.004767719656506243, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6609,0.646601,0.891842,0.881039,0.833162,0.849215
2,0.1268,0.572163,0.893676,0.881366,0.834383,0.850361
3,0.0935,0.577888,0.896425,0.887508,0.845864,0.860847
4,0.0774,0.57573,0.897342,0.886886,0.837138,0.854043
5,0.0692,0.560599,0.899175,0.887009,0.837876,0.854906


[I 2025-03-22 12:04:01,724] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.004258653994358668, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6482,0.638157,0.892759,0.880924,0.833752,0.849954
2,0.1301,0.595371,0.897342,0.898813,0.847458,0.864606
3,0.0966,0.582732,0.899175,0.888261,0.847516,0.86253
4,0.08,0.571883,0.901925,0.903527,0.840368,0.860998
5,0.0694,0.551402,0.904675,0.904001,0.842365,0.862164
6,0.063,0.554747,0.903758,0.903794,0.841195,0.861536
7,0.0576,0.550055,0.902841,0.903541,0.840852,0.86068
8,0.0536,0.539806,0.904675,0.904197,0.84173,0.861946
9,0.0503,0.547734,0.902841,0.904314,0.840939,0.861397
10,0.0474,0.544955,0.903758,0.904505,0.841606,0.861924


[I 2025-03-22 12:06:26,714] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 0.000528437437423988, 'weight_decay': 0.007, 'warmup_steps': 1, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.263,0.890998,0.853346,0.714068,0.728259,0.720266
2,0.3147,0.707505,0.877177,0.886316,0.829425,0.849377
3,0.1753,0.700187,0.879927,0.889502,0.842851,0.858475
4,0.1276,0.670877,0.886343,0.894681,0.846638,0.864389
5,0.1071,0.636142,0.885426,0.893359,0.844968,0.863355
6,0.0931,0.617283,0.890009,0.89652,0.848501,0.866727
7,0.0851,0.603079,0.883593,0.892699,0.843325,0.862162
8,0.0781,0.626818,0.880843,0.887827,0.843106,0.858945
9,0.0742,0.60232,0.886343,0.894286,0.847088,0.864584
10,0.0696,0.588561,0.894592,0.90154,0.852893,0.871458


[I 2025-03-22 12:08:20,513] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00167727556425868, 'weight_decay': 0.005, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8979,0.68842,0.875344,0.867915,0.82958,0.842801
2,0.1533,0.598553,0.890009,0.895944,0.849847,0.866775
3,0.1041,0.606228,0.890926,0.896708,0.850295,0.86729
4,0.0869,0.599018,0.891842,0.886724,0.85046,0.864542
5,0.0771,0.585554,0.893676,0.877833,0.851378,0.862133
6,0.0698,0.562899,0.896425,0.883928,0.835921,0.85273
7,0.064,0.557894,0.900092,0.892578,0.856771,0.870897
8,0.0589,0.552736,0.897342,0.888177,0.845725,0.861682
9,0.0548,0.55937,0.897342,0.890251,0.854692,0.868728
10,0.0523,0.563591,0.896425,0.888668,0.845028,0.86162


[I 2025-03-22 12:10:28,784] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.004602758215875008, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6546,0.623646,0.892759,0.895785,0.843011,0.860814
2,0.126,0.572001,0.897342,0.887494,0.845962,0.86136
3,0.0936,0.617366,0.892759,0.884712,0.842884,0.857691
4,0.0777,0.577182,0.897342,0.890133,0.845304,0.8622
5,0.0681,0.570871,0.905591,0.894776,0.852541,0.868071
6,0.0621,0.55746,0.896425,0.886978,0.844858,0.860453
7,0.0569,0.549297,0.905591,0.907893,0.852938,0.871937
8,0.053,0.542424,0.907424,0.907385,0.845099,0.865175
9,0.0496,0.562734,0.899175,0.90219,0.847707,0.866217
10,0.0474,0.541934,0.910174,0.912648,0.865241,0.882886


[I 2025-03-22 12:13:35,299] Trial 28 finished with value: 0.8606913707529816 and parameters: {'learning_rate': 0.004602758215875008, 'weight_decay': 0.009000000000000001, 'warmup_steps': 23, 'lambda_param': 0.4, 'temperature': 3.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 29 with params: {'learning_rate': 0.0034467549214159217, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.1, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6786,0.662583,0.887259,0.874417,0.830156,0.844361
2,0.1311,0.577731,0.893676,0.897077,0.843372,0.861795
3,0.0954,0.576741,0.898258,0.900707,0.847853,0.865836
4,0.0796,0.570984,0.897342,0.901753,0.846529,0.86554
5,0.0705,0.552232,0.908341,0.909614,0.844717,0.86636
6,0.0643,0.535392,0.904675,0.905596,0.842482,0.8632
7,0.0583,0.536295,0.907424,0.9076,0.845234,0.865437
8,0.0545,0.53623,0.904675,0.90516,0.843035,0.86312
9,0.0503,0.535988,0.907424,0.909654,0.853908,0.873567
10,0.0482,0.533004,0.901925,0.903389,0.840771,0.861054


[I 2025-03-22 12:15:33,279] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.0007243732057988554, 'weight_decay': 0.0, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1987,0.831958,0.859762,0.720359,0.734543,0.726027
2,0.2411,0.651864,0.88451,0.895464,0.843859,0.863136
3,0.1417,0.658045,0.883593,0.890842,0.845156,0.861014
4,0.1097,0.612631,0.891842,0.897413,0.850647,0.86795
5,0.0925,0.616923,0.893676,0.890193,0.86144,0.873317
6,0.083,0.591067,0.890926,0.898062,0.858688,0.874444
7,0.0758,0.59047,0.892759,0.900317,0.860881,0.87666
8,0.0704,0.589008,0.890009,0.895454,0.849455,0.866424
9,0.0668,0.578454,0.889093,0.895963,0.848381,0.86635
10,0.0627,0.576637,0.895509,0.901114,0.852951,0.871267


[I 2025-03-22 12:17:35,261] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 0.0008620377804791269, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.064,0.814213,0.858845,0.885363,0.743407,0.742669
2,0.2143,0.668052,0.883593,0.894136,0.845262,0.861831
3,0.1319,0.642092,0.888176,0.893924,0.849064,0.864377
4,0.1047,0.621051,0.890926,0.895868,0.850599,0.86699
5,0.089,0.624965,0.887259,0.892948,0.847365,0.864176
6,0.0814,0.571806,0.892759,0.898104,0.851269,0.868758
7,0.0739,0.578206,0.892759,0.898662,0.851954,0.869383
8,0.068,0.580495,0.892759,0.898695,0.852113,0.869271
9,0.0643,0.577488,0.893676,0.898758,0.852435,0.869616
10,0.0606,0.567394,0.896425,0.901178,0.854453,0.871951


[I 2025-03-22 12:20:54,613] Trial 31 finished with value: 0.8685513864650815 and parameters: {'learning_rate': 0.0008620377804791269, 'weight_decay': 0.005, 'warmup_steps': 11, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}. Best is trial 18 with value: 0.8829925835070056.


Trial 32 with params: {'learning_rate': 0.0003890582208628503, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4884,0.986953,0.840513,0.701857,0.718343,0.709272
2,0.4067,0.783778,0.866178,0.848932,0.784426,0.79975
3,0.2283,0.698116,0.885426,0.880749,0.846363,0.85802
4,0.1553,0.707611,0.879927,0.889842,0.84098,0.858025
5,0.1236,0.643356,0.893676,0.900398,0.851719,0.870247
6,0.1058,0.651925,0.892759,0.900512,0.851135,0.870062
7,0.0959,0.629694,0.890009,0.896776,0.848973,0.866961
8,0.086,0.630726,0.889093,0.89472,0.848864,0.865561
9,0.0816,0.616253,0.895509,0.901943,0.853552,0.871798
10,0.076,0.610239,0.896425,0.903146,0.854192,0.87289


[I 2025-03-22 12:23:59,549] Trial 32 finished with value: 0.8706939965956098 and parameters: {'learning_rate': 0.0003890582208628503, 'weight_decay': 0.003, 'warmup_steps': 21, 'lambda_param': 0.7000000000000001, 'temperature': 3.5}. Best is trial 18 with value: 0.8829925835070056.


Trial 33 with params: {'learning_rate': 5.8367877335939255e-05, 'weight_decay': 0.01, 'warmup_steps': 18, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9328,2.299304,0.625115,0.558322,0.519167,0.526374
2,1.7053,1.672215,0.739688,0.619362,0.632605,0.625107
3,1.1981,1.438456,0.776352,0.652155,0.66321,0.657324
4,0.9489,1.300655,0.799267,0.673639,0.683135,0.677509
5,0.7957,1.217555,0.812099,0.680631,0.694746,0.68733
6,0.6885,1.148508,0.821265,0.689584,0.701044,0.695133
7,0.6098,1.110471,0.822181,0.689454,0.702996,0.696011
8,0.5474,1.085083,0.829514,0.696431,0.707981,0.701826
9,0.5015,1.05455,0.824015,0.691294,0.704358,0.697432
10,0.4656,1.030533,0.832264,0.697125,0.710975,0.703872


[I 2025-03-22 12:26:07,667] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 0.0032801859704527357, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6827,0.662188,0.890009,0.891111,0.832243,0.850384
2,0.1315,0.586388,0.898258,0.888059,0.846512,0.861616
3,0.0962,0.598816,0.890926,0.895236,0.850799,0.866766
4,0.0805,0.581619,0.894592,0.889241,0.853323,0.867251
5,0.0716,0.561624,0.904675,0.897934,0.860013,0.875235
6,0.0648,0.525891,0.907424,0.900016,0.862554,0.877603
7,0.0582,0.547186,0.904675,0.909388,0.861036,0.879174
8,0.0541,0.531154,0.903758,0.891828,0.842387,0.859647
9,0.0505,0.53645,0.903758,0.908064,0.860341,0.878291
10,0.0478,0.52189,0.908341,0.912313,0.86379,0.88226


[I 2025-03-22 12:29:32,849] Trial 34 finished with value: 0.8815024102069801 and parameters: {'learning_rate': 0.0032801859704527357, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.9, 'temperature': 4.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 35 with params: {'learning_rate': 0.0006215015800448184, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.219,0.822585,0.862511,0.721642,0.735956,0.728021
2,0.2688,0.686143,0.87901,0.89288,0.840257,0.859113
3,0.154,0.684044,0.877177,0.887082,0.839114,0.85565
4,0.1156,0.649271,0.887259,0.896206,0.846689,0.865621
5,0.0982,0.641377,0.886343,0.89255,0.846395,0.863499
6,0.087,0.600952,0.890009,0.897316,0.848563,0.867122
7,0.0796,0.59337,0.892759,0.900839,0.851201,0.870283
8,0.0733,0.607667,0.883593,0.890382,0.84467,0.861215
9,0.0702,0.590976,0.890009,0.896569,0.849441,0.867123
10,0.0658,0.581422,0.895509,0.902051,0.853344,0.87194


[I 2025-03-22 12:33:06,043] Trial 35 finished with value: 0.8710652122856238 and parameters: {'learning_rate': 0.0006215015800448184, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 4.5}. Best is trial 18 with value: 0.8829925835070056.


Trial 36 with params: {'learning_rate': 0.0034221038949770522, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6698,0.677086,0.882676,0.888119,0.835848,0.85313
2,0.1318,0.60759,0.886343,0.876908,0.83831,0.851719
3,0.0966,0.567666,0.899175,0.903614,0.85669,0.874082
4,0.0796,0.567286,0.898258,0.892823,0.85567,0.870272
5,0.0697,0.555388,0.909258,0.912648,0.865348,0.88305
6,0.0631,0.561925,0.905591,0.908613,0.851691,0.871696
7,0.0583,0.554069,0.902841,0.907897,0.858739,0.87731
8,0.0548,0.550762,0.905591,0.909376,0.862041,0.879861
9,0.0511,0.560659,0.901008,0.906148,0.858023,0.876083
10,0.0482,0.559543,0.902841,0.907394,0.859423,0.877493


[I 2025-03-22 12:35:55,008] Trial 36 finished with value: 0.8706013127743392 and parameters: {'learning_rate': 0.0034221038949770522, 'weight_decay': 0.009000000000000001, 'warmup_steps': 13, 'lambda_param': 1.0, 'temperature': 4.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 37 with params: {'learning_rate': 0.0006604269458020496, 'weight_decay': 0.007, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1674,0.827145,0.861595,0.721734,0.735932,0.727746
2,0.2584,0.671028,0.87626,0.88965,0.827192,0.849499
3,0.1505,0.66969,0.87901,0.887264,0.840785,0.857134
4,0.1154,0.637486,0.892759,0.89941,0.851229,0.869513
5,0.0976,0.621671,0.889093,0.883837,0.849197,0.862814
6,0.0867,0.589812,0.889093,0.897523,0.857373,0.873527
7,0.0785,0.58542,0.893676,0.900377,0.851927,0.870358
8,0.0727,0.595814,0.890926,0.896966,0.850453,0.867698
9,0.0692,0.588156,0.888176,0.89512,0.848391,0.865801
10,0.065,0.577982,0.895509,0.901859,0.853662,0.871993


[I 2025-03-22 12:39:25,397] Trial 37 finished with value: 0.8703016896603218 and parameters: {'learning_rate': 0.0006604269458020496, 'weight_decay': 0.007, 'warmup_steps': 10, 'lambda_param': 0.4, 'temperature': 2.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 38 with params: {'learning_rate': 0.00015181932061058664, 'weight_decay': 0.003, 'warmup_steps': 0, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1137,1.466117,0.762603,0.64041,0.653193,0.644633
2,0.891,1.110475,0.825848,0.694004,0.704014,0.69822
3,0.5788,0.967793,0.836847,0.702006,0.714313,0.707564
4,0.4192,0.900421,0.850596,0.713281,0.725806,0.718821
5,0.3207,0.83135,0.862511,0.846646,0.763592,0.775292
6,0.2485,0.778532,0.870761,0.879961,0.823359,0.843566
7,0.209,0.779816,0.872594,0.880638,0.836008,0.852263
8,0.1801,0.769513,0.875344,0.884879,0.837245,0.855238
9,0.1617,0.752525,0.873511,0.884157,0.835423,0.853478
10,0.1461,0.742852,0.88176,0.89123,0.84181,0.860672


[I 2025-03-22 12:41:31,976] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.003128221129412637, 'weight_decay': 0.006, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6754,0.689966,0.888176,0.874543,0.831251,0.844474
2,0.1349,0.575235,0.898258,0.886543,0.847115,0.860966
3,0.0969,0.580264,0.897342,0.888623,0.856439,0.868236
4,0.0803,0.586182,0.894592,0.884658,0.834345,0.851447
5,0.0721,0.551615,0.908341,0.894658,0.845755,0.862829


[I 2025-03-22 12:43:14,867] Trial 39 pruned. 


Trial 40 with params: {'learning_rate': 0.00464848195038278, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6147,0.633902,0.894592,0.884905,0.843912,0.858779
2,0.1274,0.589691,0.896425,0.874925,0.844371,0.855959
3,0.0963,0.581384,0.900092,0.901768,0.848918,0.866895
4,0.0802,0.562843,0.903758,0.894617,0.851179,0.867419
5,0.0699,0.555379,0.909258,0.897661,0.855524,0.871162
6,0.0631,0.546164,0.907424,0.895387,0.854393,0.869465
7,0.0572,0.557007,0.906508,0.893966,0.853837,0.868137
8,0.0536,0.548446,0.905591,0.906652,0.852888,0.871273
9,0.0503,0.548107,0.907424,0.908198,0.854504,0.872892
10,0.0473,0.531322,0.908341,0.909414,0.854791,0.873865


[I 2025-03-22 12:46:27,136] Trial 40 finished with value: 0.8736231820943862 and parameters: {'learning_rate': 0.00464848195038278, 'weight_decay': 0.007, 'warmup_steps': 4, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 41 with params: {'learning_rate': 0.0034276163748855164, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6827,0.639445,0.898258,0.883576,0.839149,0.853763
2,0.1304,0.547899,0.901925,0.879733,0.849778,0.861445
3,0.0943,0.534182,0.895509,0.874147,0.845279,0.855987
4,0.0784,0.528639,0.908341,0.896836,0.854424,0.870246
5,0.0693,0.521801,0.908341,0.894136,0.844674,0.862089


[I 2025-03-22 12:47:25,791] Trial 41 pruned. 


Trial 42 with params: {'learning_rate': 0.0007937992245075283, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0975,0.820003,0.861595,0.888318,0.745769,0.745487
2,0.2285,0.646504,0.890926,0.898995,0.850354,0.867994
3,0.1372,0.650677,0.883593,0.88899,0.845271,0.85991
4,0.1075,0.605275,0.896425,0.901261,0.854386,0.871905
5,0.0912,0.61821,0.890009,0.896448,0.849119,0.866868
6,0.0815,0.571669,0.893676,0.899802,0.851985,0.870001
7,0.0746,0.57239,0.895509,0.900835,0.853668,0.871362
8,0.069,0.579839,0.892759,0.897862,0.8518,0.868788
9,0.0651,0.573696,0.896425,0.903094,0.854391,0.872982
10,0.0616,0.565677,0.898258,0.903594,0.855613,0.873818


[I 2025-03-22 12:50:30,143] Trial 42 finished with value: 0.8683058191724583 and parameters: {'learning_rate': 0.0007937992245075283, 'weight_decay': 0.0, 'warmup_steps': 15, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 43 with params: {'learning_rate': 0.0017369720305714822, 'weight_decay': 0.007, 'warmup_steps': 19, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8312,0.703177,0.873511,0.874452,0.810622,0.827297
2,0.1547,0.603134,0.892759,0.898543,0.852152,0.868809
3,0.1035,0.615002,0.890009,0.873542,0.850184,0.858884
4,0.0853,0.588503,0.897342,0.891464,0.855566,0.869464
5,0.0771,0.597499,0.890009,0.883891,0.849085,0.862502


[I 2025-03-22 12:51:27,578] Trial 43 pruned. 


Trial 44 with params: {'learning_rate': 0.0004014407821893915, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4989,0.992403,0.83868,0.700933,0.71709,0.707762
2,0.3911,0.78088,0.866178,0.872345,0.791871,0.813289
3,0.2155,0.700562,0.879927,0.887666,0.842503,0.857199
4,0.1498,0.696915,0.882676,0.893103,0.84295,0.861029
5,0.1197,0.644966,0.888176,0.896765,0.847319,0.866224
6,0.1031,0.636514,0.892759,0.899853,0.851218,0.869757
7,0.0937,0.616335,0.889093,0.897798,0.848059,0.867007
8,0.0845,0.623833,0.888176,0.894817,0.8484,0.865427
9,0.0801,0.612375,0.892759,0.899648,0.851256,0.869459
10,0.0747,0.603058,0.893676,0.901012,0.851962,0.870724


[I 2025-03-22 12:53:25,977] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0016142490515864152, 'weight_decay': 0.0, 'warmup_steps': 26, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9054,0.675575,0.87901,0.868548,0.823158,0.838071
2,0.1551,0.591792,0.897342,0.890526,0.855229,0.869088
3,0.1068,0.598286,0.894592,0.899003,0.853407,0.870004
4,0.0862,0.595604,0.892759,0.88926,0.860232,0.872005
5,0.0767,0.57765,0.896425,0.890346,0.853358,0.86812
6,0.07,0.555313,0.896425,0.887382,0.84423,0.86057
7,0.0645,0.551869,0.899175,0.891625,0.856206,0.87017
8,0.0589,0.551733,0.896425,0.888181,0.845267,0.861464
9,0.0552,0.551316,0.893676,0.887645,0.85217,0.86625
10,0.0522,0.539209,0.898258,0.887569,0.836925,0.855055


[I 2025-03-22 12:55:25,144] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00044789989803155166, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4045,0.996866,0.839597,0.701602,0.719677,0.708351
2,0.362,0.758292,0.862511,0.876475,0.807608,0.828976
3,0.1968,0.698555,0.88176,0.89287,0.842944,0.85939
4,0.1406,0.697257,0.87901,0.889973,0.839593,0.857986
5,0.1129,0.643578,0.888176,0.896127,0.847441,0.865753
6,0.0979,0.626725,0.894592,0.901227,0.852384,0.870983
7,0.0884,0.610926,0.890009,0.897123,0.849378,0.867482
8,0.0811,0.630775,0.889093,0.895026,0.848582,0.865563
9,0.0767,0.611429,0.892759,0.899291,0.851172,0.869317
10,0.0712,0.598728,0.898258,0.904543,0.85534,0.874193


[I 2025-03-22 12:58:40,182] Trial 46 finished with value: 0.8712978743913795 and parameters: {'learning_rate': 0.00044789989803155166, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.0, 'temperature': 6.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 47 with params: {'learning_rate': 5.232252858049981e-05, 'weight_decay': 0.002, 'warmup_steps': 3, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9784,2.382341,0.606783,0.553797,0.500634,0.508146
2,1.8132,1.757214,0.732356,0.617848,0.62382,0.619333
3,1.2913,1.512958,0.759853,0.637724,0.651102,0.64233
4,1.0304,1.355912,0.788268,0.663328,0.672778,0.6676
5,0.8689,1.272858,0.8011,0.671891,0.685639,0.677988


[I 2025-03-22 12:59:41,853] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0011982404501382205, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.001,0.727661,0.875344,0.834571,0.8019,0.811496
2,0.1772,0.615395,0.891842,0.886431,0.850879,0.863955
3,0.1146,0.642463,0.88176,0.889366,0.843885,0.859446
4,0.0933,0.618201,0.889093,0.895388,0.849042,0.865749
5,0.0825,0.586211,0.893676,0.88863,0.852094,0.866706
6,0.0731,0.542673,0.903758,0.908452,0.860125,0.878486
7,0.0677,0.552344,0.901008,0.896235,0.866733,0.879187
8,0.0626,0.549076,0.901008,0.905117,0.858804,0.876
9,0.0587,0.55428,0.899175,0.893654,0.857006,0.871731
10,0.0559,0.545392,0.901925,0.897696,0.867844,0.880522


[I 2025-03-22 13:03:15,896] Trial 48 finished with value: 0.8823675907887741 and parameters: {'learning_rate': 0.0011982404501382205, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 49 with params: {'learning_rate': 0.0038564400576857347, 'weight_decay': 0.003, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6975,0.660696,0.885426,0.876856,0.837882,0.850699
2,0.13,0.555676,0.904675,0.89429,0.851797,0.867671
3,0.0946,0.569865,0.902841,0.894997,0.859274,0.873367
4,0.0793,0.562855,0.902841,0.891507,0.841173,0.858829
5,0.0688,0.553241,0.901925,0.904447,0.840379,0.861323
6,0.0627,0.537353,0.904675,0.906284,0.842113,0.863257
7,0.0577,0.52356,0.912007,0.913483,0.858149,0.877279
8,0.0533,0.520181,0.915674,0.918006,0.870185,0.888072
9,0.0505,0.530365,0.907424,0.911673,0.862314,0.880795
10,0.0477,0.538071,0.910174,0.913406,0.864813,0.883107


[I 2025-03-22 13:06:27,782] Trial 49 finished with value: 0.8828281452932502 and parameters: {'learning_rate': 0.0038564400576857347, 'weight_decay': 0.003, 'warmup_steps': 27, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 50 with params: {'learning_rate': 0.0008906343565525657, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1035,0.771139,0.872594,0.841053,0.763467,0.768565
2,0.2111,0.647247,0.883593,0.893886,0.844571,0.861663
3,0.1277,0.644197,0.888176,0.882751,0.848182,0.86064
4,0.1029,0.614962,0.892759,0.899056,0.851172,0.869096
5,0.0876,0.623334,0.894592,0.89022,0.86223,0.873798
6,0.0792,0.570443,0.898258,0.905248,0.864428,0.880918
7,0.0731,0.576668,0.892759,0.900816,0.860586,0.876776
8,0.0668,0.582522,0.896425,0.902328,0.863771,0.878985
9,0.0633,0.578772,0.895509,0.903449,0.86283,0.879208
10,0.0596,0.57122,0.898258,0.906117,0.864516,0.881402


[I 2025-03-22 13:10:27,995] Trial 50 finished with value: 0.880532233272645 and parameters: {'learning_rate': 0.0008906343565525657, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 6.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 51 with params: {'learning_rate': 0.004401974552010397, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6629,0.677226,0.887259,0.876388,0.82956,0.844769
2,0.1267,0.599059,0.895509,0.876844,0.855418,0.863172
3,0.0929,0.606472,0.898258,0.889338,0.846644,0.862104
4,0.0761,0.58642,0.902841,0.89295,0.850529,0.866077
5,0.0679,0.535046,0.908341,0.896372,0.854292,0.869974
6,0.0613,0.537171,0.912007,0.897116,0.848461,0.865382
7,0.0566,0.544635,0.905591,0.891365,0.843702,0.86009
8,0.0534,0.538469,0.904675,0.890606,0.84297,0.859202
9,0.0497,0.536901,0.901925,0.888811,0.84021,0.857064
10,0.0471,0.531309,0.909258,0.898341,0.855506,0.87152


[I 2025-03-22 13:12:34,736] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0006200904127673509, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2642,0.897819,0.855179,0.716714,0.731724,0.721806
2,0.273,0.690907,0.879927,0.892315,0.840857,0.859538
3,0.1547,0.698635,0.87901,0.888382,0.84102,0.856865
4,0.1176,0.659089,0.887259,0.893964,0.846677,0.864303
5,0.0985,0.627134,0.890926,0.886573,0.859108,0.87033
6,0.0869,0.599216,0.887259,0.895662,0.855694,0.871766
7,0.0791,0.591463,0.890926,0.898069,0.84955,0.868061
8,0.0732,0.600855,0.893676,0.900744,0.861321,0.876893
9,0.0694,0.590252,0.891842,0.897732,0.850492,0.868257
10,0.065,0.58259,0.899175,0.906639,0.864928,0.881859


[I 2025-03-22 13:16:07,154] Trial 52 finished with value: 0.873275022171374 and parameters: {'learning_rate': 0.0006200904127673509, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.2, 'temperature': 6.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 53 with params: {'learning_rate': 0.0029957507655179847, 'weight_decay': 0.001, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7457,0.669571,0.878093,0.882612,0.822733,0.841229
2,0.1351,0.621608,0.88451,0.88865,0.837626,0.854189
3,0.0971,0.569082,0.901008,0.904663,0.857663,0.875155
4,0.0794,0.609737,0.894592,0.900685,0.853037,0.870166
5,0.0712,0.594989,0.900092,0.901784,0.839497,0.858691


[I 2025-03-22 13:17:23,586] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0012950344145219084, 'weight_decay': 0.0, 'warmup_steps': 30, 'lambda_param': 0.1, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.976,0.72621,0.880843,0.861764,0.833409,0.842524
2,0.1707,0.584645,0.894592,0.899855,0.852688,0.870062
3,0.1111,0.619204,0.88451,0.891252,0.845523,0.861456
4,0.0908,0.59147,0.891842,0.898923,0.85135,0.868488
5,0.0806,0.584617,0.894592,0.900358,0.852913,0.870488
6,0.0725,0.548451,0.903758,0.908486,0.860073,0.878405
7,0.0659,0.546233,0.901008,0.893663,0.857857,0.872078
8,0.0617,0.546495,0.904675,0.908225,0.8607,0.878536
9,0.0577,0.559221,0.898258,0.903623,0.855408,0.873477
10,0.0542,0.556477,0.898258,0.892328,0.85561,0.870279


[I 2025-03-22 13:19:52,054] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.0006705354779334944, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2303,0.823059,0.865261,0.72417,0.738578,0.730371
2,0.2546,0.674159,0.885426,0.896705,0.844738,0.864154
3,0.1475,0.669869,0.879927,0.88895,0.841991,0.857979
4,0.1127,0.635638,0.894592,0.901053,0.852504,0.870884
5,0.095,0.613367,0.893676,0.887771,0.851985,0.86616
6,0.0858,0.594139,0.889093,0.895145,0.84789,0.865734
7,0.0779,0.590391,0.888176,0.895416,0.847251,0.865534
8,0.0716,0.593434,0.890009,0.884302,0.849124,0.862935
9,0.0681,0.589508,0.888176,0.89635,0.847467,0.866086
10,0.0638,0.577007,0.891842,0.898395,0.850305,0.86859


[I 2025-03-22 13:22:18,139] Trial 55 pruned. 


Trial 56 with params: {'learning_rate': 0.0014949468406953378, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9171,0.739834,0.880843,0.862814,0.833941,0.843448
2,0.1637,0.598616,0.888176,0.895363,0.847754,0.865433
3,0.1095,0.624158,0.885426,0.89198,0.846656,0.862497
4,0.0868,0.593117,0.894592,0.901831,0.862773,0.877886
5,0.0786,0.566978,0.899175,0.903757,0.847165,0.867254
6,0.0706,0.547122,0.899175,0.903811,0.847028,0.867318
7,0.0647,0.533033,0.906508,0.897744,0.86226,0.87617
8,0.0596,0.531183,0.901008,0.891097,0.849131,0.864744
9,0.056,0.521782,0.903758,0.896129,0.860497,0.874633
10,0.053,0.530663,0.901925,0.895987,0.858796,0.87384


[I 2025-03-22 13:26:29,612] Trial 56 finished with value: 0.8682523986519878 and parameters: {'learning_rate': 0.0014949468406953378, 'weight_decay': 0.007, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 57 with params: {'learning_rate': 0.0014537196875734418, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9399,0.691722,0.879927,0.885586,0.832174,0.850157
2,0.1654,0.577331,0.896425,0.901827,0.854603,0.872004
3,0.1089,0.635807,0.880843,0.88733,0.843323,0.857939
4,0.0889,0.603799,0.891842,0.886252,0.850609,0.864404
5,0.0791,0.573273,0.894592,0.888932,0.852397,0.866832
6,0.0705,0.53861,0.901008,0.894607,0.857297,0.872365
7,0.0649,0.548733,0.899175,0.892349,0.856797,0.870568
8,0.0605,0.555622,0.900092,0.903671,0.858057,0.8748
9,0.0564,0.555231,0.897342,0.890271,0.855256,0.86879
10,0.0531,0.550199,0.900092,0.892955,0.857463,0.871561


[I 2025-03-22 13:30:21,066] Trial 57 finished with value: 0.8714110788639622 and parameters: {'learning_rate': 0.0014537196875734418, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 58 with params: {'learning_rate': 0.0015280637168032915, 'weight_decay': 0.003, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9146,0.737276,0.87626,0.877872,0.811508,0.829293
2,0.1616,0.589955,0.887259,0.895306,0.846625,0.865087
3,0.1067,0.630212,0.886343,0.892322,0.846692,0.862798
4,0.088,0.601205,0.890009,0.887687,0.858213,0.870043
5,0.0791,0.590169,0.889093,0.88148,0.839316,0.85491


[I 2025-03-22 13:31:23,255] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.00445567512197694, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6742,0.676861,0.87901,0.861385,0.832185,0.842508
2,0.13,0.598596,0.888176,0.880949,0.839892,0.854549
3,0.0948,0.624369,0.891842,0.884244,0.852002,0.86355
4,0.0787,0.585717,0.900092,0.891361,0.848866,0.864006
5,0.0692,0.566589,0.901008,0.891547,0.849467,0.865028
6,0.0627,0.563525,0.904675,0.891441,0.843072,0.859776
7,0.0576,0.575244,0.898258,0.888891,0.847157,0.862618
8,0.0539,0.568369,0.902841,0.889334,0.842183,0.858375
9,0.0504,0.560372,0.901925,0.891542,0.850384,0.865465
10,0.0472,0.567551,0.898258,0.89002,0.847135,0.863046


[I 2025-03-22 13:33:59,280] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 0.004518525351339877, 'weight_decay': 0.006, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6515,0.638754,0.891842,0.886369,0.851615,0.86443
2,0.1275,0.593629,0.901925,0.892382,0.860225,0.872325
3,0.0948,0.587199,0.896425,0.889574,0.855105,0.868315
4,0.0788,0.59397,0.896425,0.888191,0.846519,0.861748
5,0.0691,0.567162,0.900092,0.894382,0.857186,0.871779
6,0.0633,0.558811,0.901008,0.89445,0.858365,0.872321
7,0.0587,0.57449,0.901925,0.907459,0.859942,0.877249
8,0.0546,0.563598,0.901925,0.906301,0.850279,0.869917
9,0.0511,0.56548,0.904675,0.894924,0.852902,0.868198
10,0.049,0.557442,0.904675,0.894465,0.852853,0.868048


[I 2025-03-22 13:36:39,365] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00027314446191377634, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6822,1.164133,0.80385,0.671416,0.689612,0.678946
2,0.5522,0.878907,0.847846,0.713419,0.72319,0.717033
3,0.3274,0.775976,0.873511,0.882716,0.827722,0.844599
4,0.2163,0.723268,0.882676,0.89229,0.842551,0.860587
5,0.1666,0.709244,0.879927,0.890494,0.839786,0.858996
6,0.1394,0.695472,0.88451,0.89729,0.84286,0.863441
7,0.1231,0.671744,0.883593,0.890386,0.844001,0.861181
8,0.1068,0.665454,0.889093,0.896131,0.848626,0.866501
9,0.0989,0.655188,0.887259,0.894719,0.8471,0.864877
10,0.0915,0.645258,0.889093,0.896755,0.847658,0.866366


[I 2025-03-22 13:39:37,396] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.0005177627782238657, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3232,0.905869,0.857929,0.717363,0.733635,0.724017
2,0.3173,0.72932,0.872594,0.887114,0.834866,0.853558
3,0.1765,0.682865,0.88176,0.890903,0.842819,0.859072
4,0.1288,0.671044,0.883593,0.891292,0.843317,0.861217
5,0.1077,0.623505,0.892759,0.899657,0.851168,0.869457
6,0.0932,0.609576,0.891842,0.898318,0.850368,0.86853
7,0.0843,0.600804,0.890009,0.898195,0.848862,0.867627
8,0.0776,0.61563,0.891842,0.898561,0.860403,0.875138
9,0.0737,0.603891,0.892759,0.899577,0.851207,0.869512
10,0.0687,0.588799,0.898258,0.906016,0.864247,0.881229


[I 2025-03-22 13:43:53,471] Trial 62 finished with value: 0.8697776769692411 and parameters: {'learning_rate': 0.0005177627782238657, 'weight_decay': 0.008, 'warmup_steps': 21, 'lambda_param': 0.5, 'temperature': 4.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 63 with params: {'learning_rate': 0.004940185973715077, 'weight_decay': 0.003, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6543,0.598046,0.895509,0.898737,0.835168,0.855912
2,0.1278,0.581571,0.897342,0.886933,0.846683,0.861074
3,0.0941,0.610157,0.894592,0.897906,0.853375,0.869215
4,0.0775,0.579578,0.905591,0.908235,0.852693,0.871999
5,0.0693,0.557081,0.908341,0.912223,0.863641,0.882061
6,0.0621,0.550029,0.902841,0.905793,0.850154,0.869682
7,0.0582,0.566709,0.901008,0.888331,0.839794,0.85656
8,0.0547,0.55147,0.904675,0.890696,0.842772,0.859325
9,0.0505,0.560976,0.902841,0.889658,0.841636,0.858024
10,0.0477,0.551447,0.903758,0.904316,0.841958,0.862147


[I 2025-03-22 13:46:18,281] Trial 63 pruned. 


Trial 64 with params: {'learning_rate': 0.0019103514492135156, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8673,0.723232,0.871677,0.85756,0.807942,0.82255
2,0.1473,0.636333,0.882676,0.877109,0.844872,0.856023
3,0.1017,0.608857,0.889093,0.882779,0.848882,0.861757
4,0.0844,0.598466,0.885426,0.88158,0.845884,0.859615
5,0.0757,0.577208,0.891842,0.894119,0.831729,0.851923


[I 2025-03-22 13:47:31,844] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.00453434226551975, 'weight_decay': 0.006, 'warmup_steps': 19, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6442,0.627423,0.891842,0.89606,0.842681,0.860258
2,0.1282,0.594968,0.897342,0.887509,0.846442,0.861014
3,0.0928,0.611272,0.891842,0.896927,0.851388,0.867753
4,0.0777,0.600765,0.898258,0.890197,0.846956,0.862391
5,0.0684,0.563458,0.898258,0.903087,0.84522,0.865596
6,0.0625,0.562081,0.903758,0.907738,0.860179,0.877919
7,0.058,0.557369,0.904675,0.906333,0.841849,0.862742
8,0.054,0.561882,0.903758,0.90448,0.841438,0.861802
9,0.0498,0.576599,0.899175,0.886877,0.83777,0.854504
10,0.0471,0.562376,0.900092,0.902193,0.83889,0.859451


[I 2025-03-22 13:50:00,043] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.0021261512658516876, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7645,0.666122,0.880843,0.885839,0.833449,0.850783
2,0.1427,0.589948,0.898258,0.891634,0.856228,0.869806
3,0.1023,0.61315,0.890009,0.874603,0.858556,0.864958
4,0.0842,0.587844,0.892759,0.879399,0.851401,0.862261
5,0.0739,0.551503,0.899175,0.892153,0.855995,0.870295
6,0.0669,0.538068,0.902841,0.895505,0.858987,0.873424
7,0.0611,0.543029,0.901925,0.894029,0.858811,0.872349
8,0.0561,0.535618,0.903758,0.892897,0.851527,0.866761
9,0.0532,0.545417,0.901925,0.893357,0.858868,0.872229
10,0.0505,0.543614,0.903758,0.895168,0.860316,0.873923


[I 2025-03-22 13:53:35,865] Trial 66 finished with value: 0.8746694168220889 and parameters: {'learning_rate': 0.0021261512658516876, 'weight_decay': 0.01, 'warmup_steps': 14, 'lambda_param': 0.8, 'temperature': 3.5}. Best is trial 18 with value: 0.8829925835070056.


Trial 67 with params: {'learning_rate': 0.00311001349416844, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6919,0.671979,0.888176,0.889245,0.830791,0.848629
2,0.1343,0.589904,0.891842,0.88224,0.842311,0.856605
3,0.0972,0.603094,0.892759,0.897087,0.85223,0.868528
4,0.0788,0.586436,0.899175,0.894078,0.856334,0.871186
5,0.0708,0.572936,0.904675,0.90777,0.851609,0.871268
6,0.0637,0.536554,0.907424,0.908155,0.844227,0.86535
7,0.0592,0.556016,0.901008,0.905383,0.848076,0.868491
8,0.0548,0.549061,0.904675,0.904936,0.84224,0.862567
9,0.0513,0.551782,0.900092,0.902377,0.838602,0.859617
10,0.0484,0.544743,0.905591,0.908022,0.852512,0.87204


[I 2025-03-22 13:57:14,953] Trial 67 finished with value: 0.8723118959383918 and parameters: {'learning_rate': 0.00311001349416844, 'weight_decay': 0.008, 'warmup_steps': 14, 'lambda_param': 0.5, 'temperature': 2.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 68 with params: {'learning_rate': 0.00025810659403101256, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7696,1.215231,0.79835,0.667227,0.684431,0.673301
2,0.5891,0.894698,0.851512,0.716706,0.725675,0.720166
3,0.3463,0.781264,0.865261,0.858022,0.775223,0.789589
4,0.229,0.733761,0.87626,0.886806,0.836832,0.855765
5,0.1726,0.710869,0.885426,0.894955,0.844458,0.863651
6,0.1448,0.683588,0.888176,0.897956,0.846256,0.866008
7,0.1253,0.673907,0.885426,0.891445,0.845362,0.862368
8,0.1108,0.671594,0.887259,0.892977,0.846986,0.864061
9,0.1023,0.656474,0.885426,0.893713,0.845149,0.86331
10,0.0945,0.64709,0.892759,0.900089,0.850522,0.869438


[I 2025-03-22 14:00:03,461] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 7.808255793137976e-05, 'weight_decay': 0.009000000000000001, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.7348,2.023373,0.673694,0.587895,0.567679,0.573291
2,1.3963,1.450269,0.777269,0.649877,0.665907,0.657006
3,0.9655,1.267029,0.807516,0.680884,0.68972,0.685032
4,0.751,1.172943,0.807516,0.679183,0.69055,0.682983
5,0.6149,1.072563,0.826764,0.692614,0.70736,0.699637


[I 2025-03-22 14:01:22,230] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.0044102254606172625, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6191,0.639175,0.890009,0.893323,0.832278,0.850571
2,0.1269,0.596603,0.899175,0.889063,0.847722,0.862455
3,0.0943,0.615816,0.888176,0.880471,0.839607,0.854175
4,0.0785,0.587488,0.894592,0.885174,0.834153,0.852144
5,0.0701,0.571325,0.900092,0.888075,0.838216,0.855776


[I 2025-03-22 14:02:36,016] Trial 70 pruned. 


Trial 71 with params: {'learning_rate': 0.0010465388314913, 'weight_decay': 0.003, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0137,0.77872,0.868011,0.87295,0.805382,0.82356
2,0.1887,0.629855,0.890009,0.897082,0.849584,0.8663
3,0.1203,0.648754,0.883593,0.890755,0.854191,0.867146
4,0.0978,0.620229,0.889093,0.896792,0.858319,0.87313
5,0.0848,0.610666,0.888176,0.897178,0.865955,0.878792
6,0.075,0.552477,0.901008,0.907379,0.866539,0.883013
7,0.0692,0.561892,0.895509,0.902401,0.862765,0.878424
8,0.0644,0.561434,0.897342,0.889805,0.855031,0.868727
9,0.0604,0.569773,0.893676,0.900977,0.861461,0.877255
10,0.0568,0.555875,0.901925,0.907468,0.868356,0.883977


[I 2025-03-22 14:06:14,457] Trial 71 finished with value: 0.8802993707742991 and parameters: {'learning_rate': 0.0010465388314913, 'weight_decay': 0.003, 'warmup_steps': 20, 'lambda_param': 0.8, 'temperature': 3.0}. Best is trial 18 with value: 0.8829925835070056.


Trial 72 with params: {'learning_rate': 0.0006363912129891186, 'weight_decay': 0.004, 'warmup_steps': 23, 'lambda_param': 0.8, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2267,0.84356,0.855179,0.718263,0.730226,0.722841
2,0.2702,0.679268,0.883593,0.895096,0.843313,0.862454
3,0.1525,0.663194,0.880843,0.889873,0.842622,0.85883
4,0.1169,0.638598,0.890926,0.897579,0.850222,0.86794
5,0.0983,0.626324,0.890926,0.884926,0.849807,0.863604


[I 2025-03-22 14:07:26,405] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.003519798614882059, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6853,0.662603,0.891842,0.89392,0.833233,0.852605
2,0.1293,0.593594,0.893676,0.896348,0.843557,0.861443
3,0.0945,0.605659,0.894592,0.899763,0.85311,0.870189
4,0.0785,0.572117,0.897342,0.903016,0.845634,0.865679
5,0.0689,0.555436,0.904675,0.906236,0.842322,0.863093
6,0.0628,0.546466,0.901925,0.906103,0.849271,0.869469
7,0.0577,0.557734,0.902841,0.906597,0.850889,0.870113
8,0.0537,0.563168,0.895509,0.89787,0.836319,0.856102
9,0.0505,0.559708,0.901008,0.904728,0.849343,0.868459
10,0.0478,0.552457,0.902841,0.906041,0.85069,0.86978


[I 2025-03-22 14:09:49,477] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0002952710041203322, 'weight_decay': 0.01, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6782,1.112945,0.817599,0.68386,0.699963,0.69041
2,0.5071,0.835913,0.857012,0.719939,0.730924,0.723977
3,0.2892,0.736383,0.877177,0.885933,0.83907,0.855322
4,0.1922,0.711092,0.88176,0.892044,0.841329,0.860219
5,0.1496,0.67169,0.886343,0.895854,0.844298,0.864237


[I 2025-03-22 14:11:03,130] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0012324506314641429, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9648,0.724048,0.879927,0.884855,0.832461,0.84947
2,0.1732,0.602308,0.895509,0.901293,0.853465,0.870749
3,0.1128,0.616158,0.889093,0.894849,0.848245,0.865014
4,0.0917,0.59637,0.889093,0.894681,0.849062,0.865491
5,0.0812,0.579679,0.891842,0.898908,0.859466,0.875103
6,0.0726,0.552619,0.908341,0.911196,0.863407,0.881531
7,0.0678,0.556433,0.901008,0.892782,0.858155,0.871212
8,0.0616,0.543482,0.901925,0.905505,0.858621,0.876176
9,0.0581,0.55039,0.904675,0.908265,0.860664,0.87869
10,0.0557,0.542638,0.907424,0.912948,0.871551,0.888368


[I 2025-03-22 14:15:05,787] Trial 75 finished with value: 0.8842639810054393 and parameters: {'learning_rate': 0.0012324506314641429, 'weight_decay': 0.001, 'warmup_steps': 24, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 76 with params: {'learning_rate': 0.0013507202559664845, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9491,0.71937,0.882676,0.874122,0.834815,0.847917
2,0.1698,0.591366,0.895509,0.900849,0.853901,0.870985
3,0.1101,0.637962,0.88451,0.891317,0.845903,0.861385
4,0.0899,0.611756,0.890009,0.883802,0.850165,0.863039
5,0.0806,0.579729,0.897342,0.903705,0.85453,0.873156
6,0.0718,0.538881,0.902841,0.907777,0.859116,0.877415
7,0.0661,0.55113,0.903758,0.896512,0.859623,0.874453
8,0.0612,0.542184,0.902841,0.907119,0.859793,0.8776
9,0.0571,0.563616,0.898258,0.904024,0.85637,0.874402
10,0.0548,0.547133,0.904675,0.897433,0.860597,0.875467


[I 2025-03-22 14:18:39,739] Trial 76 finished with value: 0.8769068213166374 and parameters: {'learning_rate': 0.0013507202559664845, 'weight_decay': 0.004, 'warmup_steps': 28, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 77 with params: {'learning_rate': 0.0007775785524400827, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1306,0.843615,0.861595,0.889153,0.745521,0.745357
2,0.2291,0.634917,0.890009,0.897873,0.849227,0.867056
3,0.1361,0.647665,0.879927,0.888847,0.850714,0.864392
4,0.1074,0.609567,0.895509,0.902786,0.862393,0.878446
5,0.0909,0.618881,0.894592,0.900803,0.852413,0.870673
6,0.0821,0.57163,0.893676,0.901581,0.860812,0.877251
7,0.074,0.577808,0.890926,0.899338,0.858688,0.875122
8,0.0692,0.581568,0.894592,0.901201,0.861953,0.877448
9,0.0655,0.568809,0.891842,0.898639,0.850471,0.868678
10,0.0615,0.567215,0.899175,0.905394,0.864865,0.881214


[I 2025-03-22 14:22:52,086] Trial 77 finished with value: 0.8785651803567155 and parameters: {'learning_rate': 0.0007775785524400827, 'weight_decay': 0.0, 'warmup_steps': 20, 'lambda_param': 0.9, 'temperature': 2.0}. Best is trial 75 with value: 0.8842639810054393.


Trial 78 with params: {'learning_rate': 0.0014144487021507152, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9322,0.720905,0.878093,0.870784,0.831301,0.84454
2,0.1632,0.605834,0.886343,0.893012,0.846512,0.863635
3,0.1095,0.645412,0.88176,0.887818,0.843213,0.858629
4,0.0904,0.595942,0.892759,0.886873,0.852225,0.865567
5,0.0796,0.581173,0.894592,0.899843,0.852556,0.870182
6,0.0714,0.548202,0.904675,0.897079,0.860594,0.875257
7,0.0654,0.533237,0.901925,0.894926,0.858295,0.872971
8,0.0607,0.535852,0.902841,0.895064,0.860156,0.873947
9,0.0561,0.543155,0.898258,0.892421,0.855636,0.870472
10,0.0534,0.535965,0.902841,0.898176,0.868106,0.880911


[I 2025-03-22 14:26:32,110] Trial 78 finished with value: 0.878904500955275 and parameters: {'learning_rate': 0.0014144487021507152, 'weight_decay': 0.001, 'warmup_steps': 26, 'lambda_param': 0.7000000000000001, 'temperature': 2.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 79 with params: {'learning_rate': 0.0022443707209050664, 'weight_decay': 0.003, 'warmup_steps': 19, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7763,0.665486,0.880843,0.884281,0.824129,0.843208
2,0.1412,0.62835,0.883593,0.878211,0.846173,0.8573
3,0.1008,0.599362,0.892759,0.898833,0.862521,0.875731
4,0.0821,0.569445,0.892759,0.889129,0.861218,0.872558
5,0.0727,0.590794,0.895509,0.901129,0.854563,0.871727
6,0.0666,0.553823,0.896425,0.901276,0.854179,0.871756
7,0.0609,0.570295,0.891842,0.897923,0.850868,0.867726
8,0.0562,0.551456,0.896425,0.900836,0.855494,0.872066
9,0.0528,0.568663,0.896425,0.901421,0.854982,0.872185
10,0.0507,0.545655,0.892759,0.898422,0.851803,0.869124


[I 2025-03-22 14:29:00,548] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.002201917786708791, 'weight_decay': 0.002, 'warmup_steps': 27, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8146,0.717536,0.875344,0.875476,0.811831,0.828689
2,0.1437,0.656975,0.878093,0.86169,0.840981,0.84776
3,0.1021,0.613099,0.888176,0.883308,0.856499,0.867403
4,0.0826,0.605628,0.895509,0.889566,0.863139,0.873256
5,0.0736,0.58005,0.895509,0.888008,0.853315,0.866797
6,0.067,0.552171,0.896425,0.890432,0.853357,0.868278
7,0.0615,0.560091,0.895509,0.887885,0.852952,0.866755
8,0.0575,0.55231,0.894592,0.887792,0.852337,0.866362
9,0.053,0.556243,0.894592,0.887837,0.85228,0.86623
10,0.0498,0.553343,0.896425,0.889121,0.854381,0.868051


[I 2025-03-22 14:31:25,820] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.0027330407239476552, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.9, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7527,0.702359,0.875344,0.880672,0.82073,0.838721
2,0.1363,0.64715,0.88176,0.885808,0.835862,0.850808
3,0.1001,0.585966,0.895509,0.887853,0.853599,0.866906
4,0.0817,0.59706,0.893676,0.888218,0.852986,0.866415
5,0.0727,0.589513,0.895509,0.900384,0.84348,0.863728


[I 2025-03-22 14:32:36,198] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0008543060515153923, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1192,0.813556,0.864345,0.890411,0.748863,0.747626
2,0.2153,0.642229,0.892759,0.900671,0.851467,0.869202
3,0.13,0.629737,0.891842,0.897119,0.851932,0.867839
4,0.1025,0.619701,0.890926,0.898651,0.859426,0.874866
5,0.0881,0.627227,0.895509,0.903599,0.862164,0.878911
6,0.0795,0.587837,0.890926,0.898787,0.858653,0.874765
7,0.0721,0.58902,0.891842,0.900266,0.859985,0.876178
8,0.0669,0.58041,0.895509,0.902582,0.862857,0.878717
9,0.0634,0.580329,0.892759,0.90051,0.860512,0.876527
10,0.0598,0.570109,0.896425,0.903687,0.862472,0.879187


[I 2025-03-22 14:36:27,520] Trial 82 finished with value: 0.8774975986838122 and parameters: {'learning_rate': 0.0008543060515153923, 'weight_decay': 0.001, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 83 with params: {'learning_rate': 0.0037338449223035712, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7182,0.650213,0.887259,0.889626,0.83044,0.848594
2,0.1314,0.597627,0.894592,0.897447,0.844582,0.862108
3,0.0942,0.607801,0.892759,0.897602,0.852143,0.868534
4,0.0782,0.576579,0.899175,0.904491,0.856913,0.874366
5,0.0693,0.581329,0.896425,0.902081,0.854507,0.871966
6,0.0622,0.556647,0.896425,0.900548,0.845109,0.864604
7,0.057,0.554339,0.902841,0.905465,0.850477,0.869588
8,0.053,0.565654,0.898258,0.902274,0.856661,0.872934
9,0.0493,0.572536,0.903758,0.907426,0.860446,0.877726
10,0.047,0.565338,0.903758,0.907418,0.860181,0.877651


[I 2025-03-22 14:40:03,594] Trial 83 finished with value: 0.8600219055497725 and parameters: {'learning_rate': 0.0037338449223035712, 'weight_decay': 0.006, 'warmup_steps': 30, 'lambda_param': 0.4, 'temperature': 4.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 84 with params: {'learning_rate': 0.0017120069408213664, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8633,0.687003,0.873511,0.878228,0.818955,0.837078
2,0.1528,0.620417,0.890009,0.895953,0.850559,0.866005
3,0.1066,0.605702,0.891842,0.88625,0.851382,0.864071
4,0.0862,0.598706,0.894592,0.887647,0.853665,0.866294
5,0.0771,0.590981,0.894592,0.887685,0.853069,0.866513
6,0.0696,0.552979,0.900092,0.89496,0.866124,0.87804
7,0.0638,0.557732,0.900092,0.895036,0.866959,0.878422
8,0.0582,0.559808,0.900092,0.892655,0.85794,0.871476
9,0.0545,0.563138,0.901008,0.893135,0.858836,0.872098
10,0.0513,0.55988,0.900092,0.895217,0.866746,0.878531


[I 2025-03-22 14:43:48,305] Trial 84 finished with value: 0.8792789501599606 and parameters: {'learning_rate': 0.0017120069408213664, 'weight_decay': 0.001, 'warmup_steps': 23, 'lambda_param': 0.5, 'temperature': 2.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 85 with params: {'learning_rate': 0.0004261486710117755, 'weight_decay': 0.003, 'warmup_steps': 7, 'lambda_param': 0.8, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3904,0.982704,0.839597,0.700738,0.718582,0.708053
2,0.3799,0.771597,0.861595,0.859847,0.807798,0.82397
3,0.2084,0.693804,0.885426,0.882179,0.846694,0.858711
4,0.1458,0.701048,0.88176,0.892123,0.842795,0.860331
5,0.1177,0.652352,0.889093,0.895792,0.848644,0.866237
6,0.1018,0.637395,0.883593,0.892836,0.843793,0.862575
7,0.091,0.61685,0.891842,0.89886,0.850443,0.868808
8,0.083,0.622625,0.890009,0.897028,0.849104,0.866875
9,0.0788,0.612593,0.891842,0.898858,0.850541,0.868787
10,0.0732,0.604817,0.894592,0.901463,0.852713,0.871339


[I 2025-03-22 14:46:12,691] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0002681159956916346, 'weight_decay': 0.003, 'warmup_steps': 22, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.728,1.167062,0.80385,0.671389,0.688264,0.677933
2,0.5587,0.883916,0.851512,0.716216,0.726028,0.720066
3,0.3286,0.771337,0.872594,0.880682,0.826666,0.843755
4,0.2177,0.737974,0.87626,0.888253,0.837116,0.855784
5,0.1641,0.706707,0.880843,0.892493,0.8409,0.860665
6,0.1386,0.671058,0.885426,0.895134,0.844207,0.863838
7,0.1204,0.66635,0.890009,0.89613,0.849207,0.86652
8,0.1072,0.66545,0.887259,0.893594,0.846708,0.86427
9,0.099,0.655544,0.888176,0.894779,0.848139,0.865232
10,0.0917,0.644393,0.887259,0.895417,0.846756,0.865246


[I 2025-03-22 14:48:52,117] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0011926901736563048, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9973,0.743966,0.875344,0.850908,0.81082,0.822568
2,0.1753,0.614223,0.890009,0.897255,0.8495,0.866461
3,0.1137,0.621319,0.888176,0.895134,0.848489,0.865075
4,0.0956,0.603761,0.891842,0.89984,0.860045,0.875423
5,0.0831,0.566875,0.904675,0.910307,0.869473,0.885785
6,0.0736,0.5361,0.901925,0.908697,0.867206,0.88404
7,0.0675,0.541931,0.906508,0.900658,0.871726,0.883907
8,0.063,0.547787,0.900092,0.904496,0.858097,0.875455
9,0.0584,0.550902,0.901925,0.907527,0.858443,0.877244
10,0.0558,0.547047,0.901008,0.908762,0.866781,0.883866


[I 2025-03-22 14:52:36,831] Trial 87 finished with value: 0.8828734155365686 and parameters: {'learning_rate': 0.0011926901736563048, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 88 with params: {'learning_rate': 0.001036130319734984, 'weight_decay': 0.006, 'warmup_steps': 24, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0321,0.769281,0.867094,0.839119,0.795479,0.807002
2,0.1898,0.637584,0.890009,0.885741,0.849867,0.863061
3,0.1209,0.637146,0.88451,0.891023,0.845906,0.861607
4,0.0995,0.613778,0.891842,0.896333,0.850957,0.867158
5,0.0872,0.610932,0.891842,0.88751,0.850775,0.865456


[I 2025-03-22 14:53:50,450] Trial 88 pruned. 


Trial 89 with params: {'learning_rate': 0.0011837859921504366, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.2, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9945,0.731815,0.87626,0.858763,0.830141,0.839206
2,0.1782,0.611102,0.894592,0.900883,0.853691,0.870369
3,0.1157,0.625719,0.883593,0.889552,0.8458,0.860375
4,0.0933,0.595443,0.894592,0.900031,0.853619,0.870398
5,0.0826,0.593872,0.897342,0.903124,0.85485,0.873114
6,0.0747,0.557419,0.901925,0.9066,0.858651,0.876863
7,0.0686,0.563573,0.901008,0.893444,0.858235,0.872249
8,0.0634,0.552161,0.901008,0.894012,0.857698,0.872269
9,0.0591,0.555027,0.899175,0.904791,0.856413,0.874773
10,0.0559,0.554195,0.900092,0.89357,0.856718,0.87161


[I 2025-03-22 14:56:26,707] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 8.506725454664425e-05, 'weight_decay': 0.0, 'warmup_steps': 11, 'lambda_param': 0.9, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6412,1.946716,0.689276,0.595635,0.580037,0.583999
2,1.3303,1.409309,0.777269,0.650567,0.665201,0.65652
3,0.9179,1.231397,0.812099,0.684492,0.693688,0.688675
4,0.7127,1.143367,0.815765,0.684779,0.697472,0.689611
5,0.5779,1.042549,0.826764,0.692631,0.706662,0.699414
6,0.4889,0.988706,0.836847,0.700709,0.714985,0.707515
7,0.4256,0.958918,0.843263,0.705947,0.720442,0.712841
8,0.3722,0.939433,0.842346,0.705767,0.720006,0.712042
9,0.3329,0.911021,0.846013,0.71004,0.721349,0.715552
10,0.305,0.897746,0.857012,0.882848,0.742184,0.742028


[I 2025-03-22 14:59:06,825] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0038708383146785754, 'weight_decay': 0.007, 'warmup_steps': 12, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6541,0.666338,0.88176,0.871867,0.826033,0.840372
2,0.1312,0.570977,0.894592,0.895653,0.845105,0.861706
3,0.0952,0.540677,0.910174,0.909605,0.847479,0.867305
4,0.0796,0.556882,0.904675,0.9083,0.851201,0.871516
5,0.0696,0.54686,0.904675,0.906027,0.842253,0.863331


[I 2025-03-22 15:00:24,707] Trial 91 pruned. 


Trial 92 with params: {'learning_rate': 0.0016009185554428393, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9012,0.725166,0.870761,0.853097,0.825693,0.83455
2,0.1563,0.621807,0.887259,0.894002,0.847868,0.864741
3,0.1067,0.665069,0.880843,0.876994,0.852161,0.860742
4,0.0875,0.602029,0.890009,0.886871,0.858262,0.869669
5,0.0767,0.562014,0.896425,0.901064,0.853494,0.871185
6,0.0688,0.536187,0.897342,0.891005,0.85485,0.869342
7,0.0632,0.535624,0.901925,0.896665,0.867525,0.879628
8,0.0587,0.544396,0.899175,0.892244,0.856915,0.870941
9,0.0547,0.548584,0.897342,0.893702,0.864837,0.87699
10,0.052,0.53477,0.901008,0.896068,0.867086,0.879323


[I 2025-03-22 15:04:22,784] Trial 92 finished with value: 0.8792361851260911 and parameters: {'learning_rate': 0.0016009185554428393, 'weight_decay': 0.005, 'warmup_steps': 29, 'lambda_param': 0.5, 'temperature': 6.0}. Best is trial 75 with value: 0.8842639810054393.


Trial 93 with params: {'learning_rate': 0.00039897021548598437, 'weight_decay': 0.006, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4843,0.985216,0.843263,0.703874,0.720771,0.711465
2,0.3944,0.782329,0.861595,0.873267,0.79844,0.818973
3,0.2185,0.688724,0.887259,0.894147,0.847418,0.863535
4,0.1521,0.715202,0.874427,0.886809,0.836278,0.854515
5,0.1218,0.645174,0.889093,0.897744,0.847272,0.866722
6,0.1045,0.645341,0.890926,0.898369,0.849753,0.868275
7,0.094,0.622328,0.892759,0.899598,0.850373,0.86914
8,0.0856,0.633641,0.890926,0.896377,0.850184,0.867195
9,0.0805,0.619055,0.894592,0.901228,0.852519,0.87096
10,0.0749,0.609228,0.897342,0.903614,0.854969,0.87351


[I 2025-03-22 15:07:58,638] Trial 93 finished with value: 0.8691376557627554 and parameters: {'learning_rate': 0.00039897021548598437, 'weight_decay': 0.006, 'warmup_steps': 25, 'lambda_param': 0.4, 'temperature': 6.0}. Best is trial 75 with value: 0.8842639810054393.


Trial 94 with params: {'learning_rate': 0.001595659394753807, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9026,0.729375,0.874427,0.856884,0.828603,0.838208
2,0.1584,0.607631,0.887259,0.873906,0.846531,0.858021
3,0.1061,0.627623,0.887259,0.893052,0.848052,0.863619
4,0.0877,0.599137,0.885426,0.884285,0.855003,0.866828
5,0.0778,0.587055,0.890926,0.896573,0.84976,0.867142
6,0.0698,0.547459,0.906508,0.909773,0.862069,0.880008
7,0.0638,0.540348,0.900092,0.893203,0.85734,0.871562
8,0.0589,0.548307,0.900092,0.904007,0.857701,0.874805
9,0.0552,0.544923,0.898258,0.891969,0.856025,0.870286
10,0.0522,0.535721,0.901008,0.894237,0.85813,0.87262


[I 2025-03-22 15:11:45,600] Trial 94 finished with value: 0.8664223259032443 and parameters: {'learning_rate': 0.001595659394753807, 'weight_decay': 0.005, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 6.5}. Best is trial 75 with value: 0.8842639810054393.


Trial 95 with params: {'learning_rate': 0.0011090973581539035, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0298,0.732587,0.879927,0.871129,0.823368,0.838929
2,0.1816,0.617881,0.894592,0.901923,0.853567,0.870372
3,0.1185,0.643269,0.886343,0.892749,0.846932,0.862659
4,0.0951,0.584878,0.898258,0.905126,0.864863,0.880735
5,0.0846,0.600011,0.897342,0.903622,0.854559,0.873204
6,0.0754,0.533639,0.901925,0.906438,0.858346,0.876551
7,0.0685,0.544965,0.903758,0.898902,0.868636,0.88151
8,0.0635,0.549965,0.901008,0.893439,0.857649,0.871942
9,0.0601,0.548617,0.903758,0.908514,0.86006,0.878387
10,0.057,0.53157,0.905591,0.911197,0.870217,0.886824


[I 2025-03-22 15:15:38,143] Trial 95 finished with value: 0.8854974599379749 and parameters: {'learning_rate': 0.0011090973581539035, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 5.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 96 with params: {'learning_rate': 5.399635979922363e-05, 'weight_decay': 0.0, 'warmup_steps': 25, 'lambda_param': 0.30000000000000004, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9953,2.367922,0.608616,0.551718,0.502375,0.510126
2,1.7858,1.737939,0.728689,0.612516,0.622935,0.616673
3,1.2651,1.487235,0.768103,0.644541,0.656938,0.649993
4,1.0068,1.337153,0.796517,0.670828,0.680676,0.675225
5,0.8455,1.252474,0.809349,0.678804,0.692399,0.684977
6,0.7329,1.177163,0.817599,0.686512,0.698183,0.69215
7,0.6508,1.138057,0.819432,0.687232,0.700387,0.693626
8,0.5861,1.112678,0.824015,0.692228,0.703372,0.697276
9,0.5383,1.079847,0.825848,0.692706,0.706473,0.699066
10,0.502,1.055336,0.824931,0.692098,0.704949,0.69835


[I 2025-03-22 15:18:14,490] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.0029227713813997857, 'weight_decay': 0.003, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7423,0.661075,0.885426,0.888575,0.828241,0.847099
2,0.134,0.648,0.87901,0.860452,0.83367,0.842278
3,0.0961,0.607663,0.893676,0.888884,0.862163,0.872634
4,0.0804,0.582519,0.894592,0.887458,0.853919,0.866262
5,0.0712,0.570374,0.898258,0.87516,0.837861,0.851505


[I 2025-03-22 15:19:31,525] Trial 97 pruned. 


Trial 98 with params: {'learning_rate': 0.0011149523881269648, 'weight_decay': 0.006, 'warmup_steps': 29, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0202,0.740155,0.875344,0.857201,0.802537,0.815814
2,0.1814,0.609734,0.888176,0.896862,0.848474,0.865861
3,0.1192,0.657675,0.87901,0.886402,0.842151,0.856389
4,0.0959,0.616566,0.886343,0.89192,0.846846,0.862939
5,0.0832,0.590889,0.895509,0.901442,0.853371,0.871421
6,0.0752,0.543118,0.902841,0.907731,0.858943,0.877543
7,0.0687,0.54786,0.899175,0.892632,0.856316,0.87086
8,0.0638,0.557914,0.899175,0.904373,0.856462,0.874518
9,0.06,0.554569,0.904675,0.909279,0.860435,0.879088
10,0.0563,0.553374,0.899175,0.905457,0.856095,0.87503


[I 2025-03-22 15:23:22,877] Trial 98 finished with value: 0.8759590261211012 and parameters: {'learning_rate': 0.0011149523881269648, 'weight_decay': 0.006, 'warmup_steps': 29, 'lambda_param': 0.2, 'temperature': 5.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 99 with params: {'learning_rate': 0.0017422533204379319, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7792,0.739177,0.869844,0.872742,0.815933,0.831827
2,0.1558,0.595711,0.897342,0.889429,0.854893,0.8682
3,0.1051,0.626753,0.885426,0.872108,0.855538,0.861006
4,0.0876,0.59217,0.894592,0.889857,0.863076,0.873545
5,0.0769,0.580827,0.896425,0.893256,0.862882,0.875522
6,0.07,0.560474,0.898258,0.894405,0.864525,0.877218
7,0.0645,0.565352,0.901008,0.897775,0.866674,0.879601
8,0.0598,0.564205,0.899175,0.895825,0.865551,0.878335
9,0.0559,0.576387,0.899175,0.895692,0.865586,0.878248
10,0.0529,0.560449,0.898258,0.89545,0.864138,0.87743


[I 2025-03-22 15:27:18,658] Trial 99 finished with value: 0.8750844266949 and parameters: {'learning_rate': 0.0017422533204379319, 'weight_decay': 0.0, 'warmup_steps': 3, 'lambda_param': 0.6000000000000001, 'temperature': 6.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 100 with params: {'learning_rate': 0.00026885910198952694, 'weight_decay': 0.008, 'warmup_steps': 30, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7371,1.17811,0.802933,0.671138,0.68814,0.677083
2,0.5602,0.868792,0.850596,0.714367,0.725141,0.718773
3,0.3281,0.770171,0.872594,0.880787,0.827046,0.843791
4,0.216,0.720713,0.88176,0.89107,0.841027,0.859865
5,0.1649,0.703228,0.882676,0.892464,0.84188,0.861084
6,0.1384,0.679996,0.887259,0.89772,0.84543,0.865519
7,0.1196,0.665357,0.886343,0.892771,0.846479,0.863499
8,0.1065,0.658812,0.890009,0.896209,0.849056,0.866774
9,0.098,0.64566,0.890009,0.896678,0.848827,0.866762
10,0.0906,0.638637,0.890926,0.898913,0.849436,0.868301


[I 2025-03-22 15:29:51,323] Trial 100 pruned. 


Trial 101 with params: {'learning_rate': 0.0002807761197078069, 'weight_decay': 0.005, 'warmup_steps': 31, 'lambda_param': 0.1, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7107,1.148555,0.811182,0.677934,0.694582,0.684258
2,0.5351,0.864085,0.854262,0.718507,0.728365,0.722197
3,0.3105,0.766402,0.877177,0.874094,0.840073,0.851257
4,0.2066,0.718259,0.878093,0.888173,0.839279,0.857046
5,0.1571,0.692556,0.882676,0.893414,0.842431,0.861912


[I 2025-03-22 15:31:07,207] Trial 101 pruned. 


Trial 102 with params: {'learning_rate': 0.004651210562327747, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.8, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.636,0.650555,0.888176,0.877278,0.830257,0.846132
2,0.1295,0.5936,0.892759,0.883243,0.842763,0.857386
3,0.095,0.576027,0.898258,0.890496,0.855914,0.869442
4,0.0785,0.597914,0.890009,0.895092,0.839944,0.85883
5,0.0703,0.574958,0.897342,0.902536,0.844977,0.865395
6,0.0635,0.569089,0.899175,0.903196,0.846935,0.866482
7,0.0575,0.555684,0.902841,0.906349,0.850015,0.869716
8,0.0545,0.564413,0.904675,0.906162,0.842479,0.863272
9,0.0509,0.570494,0.900092,0.892042,0.847935,0.864294
10,0.048,0.564644,0.901008,0.889522,0.839527,0.856962


[I 2025-03-22 15:33:38,475] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.004915308044593102, 'weight_decay': 0.01, 'warmup_steps': 12, 'lambda_param': 0.4, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6244,0.668684,0.890926,0.885066,0.852039,0.86296
2,0.1297,0.599292,0.895509,0.884753,0.845371,0.859103
3,0.0954,0.624145,0.892759,0.895553,0.843615,0.860669
4,0.0794,0.561432,0.903758,0.891213,0.841791,0.858946
5,0.07,0.556421,0.904675,0.891764,0.842528,0.859728


[I 2025-03-22 15:34:54,945] Trial 103 pruned. 


Trial 104 with params: {'learning_rate': 0.00112299093282574, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0094,0.79066,0.864345,0.850523,0.793487,0.807503
2,0.1823,0.618729,0.890926,0.897922,0.850369,0.867255
3,0.1174,0.638052,0.880843,0.888384,0.843504,0.858691
4,0.0953,0.593401,0.890926,0.897325,0.851266,0.867668
5,0.0845,0.578424,0.901925,0.906878,0.858219,0.87673
6,0.0747,0.563216,0.898258,0.903514,0.855456,0.873457
7,0.0684,0.560139,0.897342,0.903053,0.854594,0.873068
8,0.0635,0.552791,0.900092,0.904462,0.857472,0.874951
9,0.0594,0.558559,0.896425,0.902623,0.854521,0.872756
10,0.0561,0.557133,0.899175,0.904928,0.856225,0.874796


[I 2025-03-22 15:38:46,063] Trial 104 finished with value: 0.8808897020929392 and parameters: {'learning_rate': 0.00112299093282574, 'weight_decay': 0.004, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 105 with params: {'learning_rate': 0.000830378388476444, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1274,0.827165,0.865261,0.83689,0.75761,0.762856
2,0.2223,0.656174,0.88451,0.895219,0.845977,0.862778
3,0.1329,0.652924,0.880843,0.877607,0.852032,0.861252
4,0.1054,0.619054,0.888176,0.898512,0.856896,0.873501
5,0.0903,0.624652,0.888176,0.886198,0.856261,0.868859
6,0.08,0.5713,0.896425,0.902768,0.862561,0.878677
7,0.0732,0.587761,0.891842,0.899578,0.860069,0.875835
8,0.0687,0.581207,0.896425,0.902873,0.863753,0.879248
9,0.0648,0.577523,0.893676,0.901799,0.860838,0.87742
10,0.0607,0.564399,0.898258,0.904946,0.863881,0.880516


[I 2025-03-22 15:42:36,459] Trial 105 finished with value: 0.8791784531947736 and parameters: {'learning_rate': 0.000830378388476444, 'weight_decay': 0.007, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 4.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 106 with params: {'learning_rate': 0.000658881079298782, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1983,0.824719,0.861595,0.72179,0.735309,0.727569
2,0.2582,0.671461,0.885426,0.897037,0.844994,0.864388
3,0.1494,0.660288,0.882676,0.890754,0.84412,0.860177
4,0.1147,0.633673,0.891842,0.900009,0.85988,0.875928
5,0.0963,0.626351,0.892759,0.898318,0.851692,0.869051
6,0.0859,0.589903,0.891842,0.898417,0.850203,0.868555
7,0.0774,0.594005,0.890926,0.899861,0.858736,0.875395
8,0.0717,0.598777,0.889093,0.895391,0.849018,0.86613
9,0.0685,0.5805,0.892759,0.899712,0.851692,0.869919
10,0.0641,0.578931,0.897342,0.9035,0.854646,0.873314


[I 2025-03-22 15:46:32,937] Trial 106 finished with value: 0.8682852183600717 and parameters: {'learning_rate': 0.000658881079298782, 'weight_decay': 0.004, 'warmup_steps': 20, 'lambda_param': 0.4, 'temperature': 3.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 107 with params: {'learning_rate': 0.0010205154594901584, 'weight_decay': 0.002, 'warmup_steps': 26, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0407,0.737115,0.874427,0.855357,0.819321,0.831228
2,0.1931,0.650229,0.883593,0.893512,0.844882,0.861344
3,0.1225,0.637578,0.883593,0.89,0.845253,0.860553
4,0.0986,0.618195,0.889093,0.895984,0.849239,0.865937
5,0.0863,0.585544,0.890009,0.885423,0.84915,0.863623


[I 2025-03-22 15:48:22,410] Trial 107 pruned. 


Trial 108 with params: {'learning_rate': 0.0017253741675577194, 'weight_decay': 0.005, 'warmup_steps': 22, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8448,0.697456,0.87901,0.88402,0.832403,0.84918
2,0.1536,0.595532,0.891842,0.886178,0.851137,0.864394
3,0.1041,0.599171,0.885426,0.881638,0.856133,0.865739
4,0.0853,0.598274,0.890926,0.888665,0.859306,0.870776
5,0.076,0.57629,0.897342,0.880204,0.855247,0.865516


[I 2025-03-22 15:49:42,724] Trial 108 pruned. 


Trial 109 with params: {'learning_rate': 0.0006658225052399958, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2165,0.836049,0.857012,0.71972,0.731918,0.724289
2,0.26,0.660589,0.886343,0.896726,0.845642,0.864917
3,0.1491,0.646912,0.88176,0.890538,0.843275,0.859763
4,0.1162,0.634368,0.893676,0.899101,0.852303,0.869712
5,0.0962,0.62611,0.888176,0.882333,0.847682,0.861139
6,0.0855,0.583844,0.892759,0.899143,0.850911,0.86924
7,0.0781,0.579539,0.894592,0.902953,0.861699,0.878447
8,0.0718,0.584294,0.889093,0.883346,0.849036,0.862357
9,0.0686,0.581522,0.891842,0.89872,0.850346,0.868763
10,0.0642,0.575967,0.898258,0.904809,0.854999,0.87415


[I 2025-03-22 15:53:39,412] Trial 109 finished with value: 0.8720147098185301 and parameters: {'learning_rate': 0.0006658225052399958, 'weight_decay': 0.005, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 110 with params: {'learning_rate': 0.0003558918296795099, 'weight_decay': 0.0, 'warmup_steps': 27, 'lambda_param': 0.6000000000000001, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5628,1.042806,0.829514,0.692959,0.709382,0.699739
2,0.4349,0.803313,0.860678,0.889902,0.74196,0.745337
3,0.242,0.702275,0.882676,0.891079,0.843988,0.860028
4,0.1638,0.697854,0.886343,0.894744,0.845681,0.863564
5,0.1294,0.651444,0.890009,0.898701,0.847555,0.867311
6,0.1107,0.661445,0.885426,0.894557,0.844709,0.863747
7,0.0988,0.627556,0.892759,0.899829,0.850815,0.869298
8,0.0886,0.635438,0.893676,0.898406,0.853079,0.86967
9,0.0841,0.615879,0.894592,0.901271,0.852582,0.87088
10,0.0778,0.609161,0.893676,0.900832,0.851649,0.870425


[I 2025-03-22 15:56:27,177] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.000982939309117536, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0829,0.763394,0.870761,0.848431,0.789704,0.802781
2,0.1943,0.6366,0.88451,0.892242,0.846011,0.862458
3,0.1214,0.633171,0.888176,0.892726,0.849204,0.864184
4,0.0983,0.60327,0.893676,0.901366,0.852621,0.870709
5,0.0847,0.593885,0.893676,0.900768,0.851865,0.870434
6,0.0757,0.55168,0.901008,0.904883,0.858011,0.875568
7,0.0699,0.555726,0.898258,0.904551,0.865108,0.880874
8,0.0651,0.567088,0.895509,0.900928,0.853727,0.87153
9,0.0612,0.564896,0.901008,0.905515,0.85827,0.876077
10,0.0576,0.555509,0.904675,0.910542,0.869165,0.885937


[I 2025-03-22 16:01:04,676] Trial 111 finished with value: 0.883826566153206 and parameters: {'learning_rate': 0.000982939309117536, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 112 with params: {'learning_rate': 0.00048661605948329805, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3847,0.935688,0.850596,0.711214,0.727837,0.717879
2,0.3325,0.725564,0.870761,0.882974,0.823318,0.844132
3,0.1845,0.69809,0.879927,0.890374,0.84157,0.857826
4,0.1331,0.688874,0.882676,0.890949,0.843185,0.860981
5,0.1084,0.62754,0.890009,0.897782,0.848952,0.867194
6,0.0938,0.624363,0.889093,0.898134,0.857756,0.874003
7,0.0863,0.606832,0.892759,0.900973,0.860609,0.876687
8,0.0785,0.622743,0.890009,0.885573,0.858986,0.869494
9,0.0748,0.601703,0.889093,0.89771,0.857414,0.873413
10,0.0694,0.59137,0.898258,0.905641,0.864477,0.881166


[I 2025-03-22 16:04:59,087] Trial 112 finished with value: 0.8787075529099727 and parameters: {'learning_rate': 0.00048661605948329805, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 4.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 113 with params: {'learning_rate': 0.0009571791053373046, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.078,0.761581,0.867094,0.870375,0.795215,0.813849
2,0.1994,0.641228,0.893676,0.901842,0.862253,0.877136
3,0.1264,0.646609,0.88451,0.892478,0.854339,0.868022
4,0.0993,0.610627,0.889093,0.897231,0.857807,0.873168
5,0.0867,0.606526,0.890926,0.898082,0.859795,0.874635
6,0.0766,0.552279,0.898258,0.893387,0.86479,0.876812
7,0.0711,0.57641,0.894592,0.890325,0.862559,0.874121
8,0.0652,0.566655,0.892759,0.888952,0.860808,0.87248
9,0.0616,0.571488,0.892759,0.900483,0.860816,0.876696
10,0.0583,0.554489,0.898258,0.905086,0.864401,0.880833


[I 2025-03-22 16:09:20,108] Trial 113 finished with value: 0.8748927402792198 and parameters: {'learning_rate': 0.0009571791053373046, 'weight_decay': 0.005, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 5.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 114 with params: {'learning_rate': 0.0010712005053589236, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0288,0.762601,0.868928,0.841907,0.797131,0.808816
2,0.1866,0.636801,0.885426,0.89578,0.845968,0.863445
3,0.1199,0.660715,0.882676,0.870173,0.853442,0.859389
4,0.0975,0.62069,0.888176,0.884698,0.857926,0.868165
5,0.0846,0.596345,0.894592,0.9011,0.861647,0.877349
6,0.0763,0.554667,0.896425,0.901628,0.854278,0.872134
7,0.07,0.555465,0.899175,0.89268,0.856775,0.871006
8,0.0645,0.560885,0.898258,0.904002,0.855733,0.874035
9,0.0602,0.562887,0.897342,0.902214,0.855066,0.872772
10,0.0568,0.549095,0.901925,0.907957,0.867317,0.883686


[I 2025-03-22 16:12:59,915] Trial 114 finished with value: 0.8808265077803158 and parameters: {'learning_rate': 0.0010712005053589236, 'weight_decay': 0.003, 'warmup_steps': 28, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 115 with params: {'learning_rate': 0.0020704852494442343, 'weight_decay': 0.002, 'warmup_steps': 18, 'lambda_param': 0.30000000000000004, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7898,0.690523,0.88176,0.859316,0.825498,0.837148
2,0.1458,0.637615,0.888176,0.892467,0.839466,0.856537
3,0.1021,0.606516,0.899175,0.892997,0.866667,0.876592
4,0.0841,0.607124,0.895509,0.890047,0.863144,0.873852
5,0.074,0.584906,0.895509,0.888884,0.853227,0.86711
6,0.0665,0.543493,0.904675,0.896677,0.860617,0.874925
7,0.0613,0.555224,0.900092,0.891841,0.857172,0.870725
8,0.0568,0.557639,0.895509,0.899038,0.844445,0.863521
9,0.0533,0.568009,0.894592,0.899806,0.852784,0.870322
10,0.0509,0.555575,0.899175,0.902033,0.847153,0.866325


[I 2025-03-22 16:15:37,064] Trial 115 pruned. 


Trial 116 with params: {'learning_rate': 0.001465622479114234, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.86,0.740908,0.864345,0.843946,0.811506,0.821277
2,0.1615,0.598312,0.891842,0.899089,0.851103,0.868434
3,0.1091,0.641776,0.887259,0.893252,0.848069,0.863613
4,0.0891,0.595736,0.889093,0.885083,0.858375,0.868673
5,0.0785,0.572064,0.897342,0.893596,0.863643,0.876036
6,0.072,0.528255,0.905591,0.900494,0.870299,0.883146
7,0.0654,0.532154,0.906508,0.901924,0.871373,0.884322
8,0.0603,0.538878,0.904675,0.899238,0.87052,0.882469
9,0.057,0.542987,0.902841,0.895784,0.859696,0.874157
10,0.0543,0.533906,0.903758,0.899777,0.869645,0.882408


[I 2025-03-22 16:19:11,923] Trial 116 finished with value: 0.8817172257644907 and parameters: {'learning_rate': 0.001465622479114234, 'weight_decay': 0.007, 'warmup_steps': 9, 'lambda_param': 0.8, 'temperature': 4.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 117 with params: {'learning_rate': 0.0009634255038625986, 'weight_decay': 0.005, 'warmup_steps': 14, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0219,0.782004,0.862511,0.832727,0.756551,0.760733
2,0.1996,0.665372,0.887259,0.895971,0.847562,0.864329
3,0.1252,0.652485,0.885426,0.891866,0.846667,0.862149
4,0.1001,0.637061,0.882676,0.890353,0.844623,0.861319
5,0.0857,0.63426,0.88451,0.891601,0.845432,0.862425


[I 2025-03-22 16:20:20,875] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.0008174935368438945, 'weight_decay': 0.007, 'warmup_steps': 6, 'lambda_param': 0.7000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0609,0.793906,0.857929,0.717867,0.733279,0.724335
2,0.2236,0.665293,0.887259,0.896436,0.847272,0.864762
3,0.1337,0.647095,0.886343,0.893185,0.847153,0.86298
4,0.1056,0.626137,0.888176,0.894954,0.848237,0.865412
5,0.0908,0.630177,0.883593,0.890343,0.845149,0.861612


[I 2025-03-22 16:21:34,535] Trial 118 pruned. 


Trial 119 with params: {'learning_rate': 0.0013493819220644597, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8638,0.751011,0.872594,0.87611,0.818954,0.834423
2,0.17,0.614021,0.889093,0.89639,0.848288,0.866262
3,0.1108,0.64029,0.885426,0.879656,0.845974,0.858161
4,0.0908,0.594355,0.898258,0.893769,0.86544,0.876939
5,0.0795,0.601357,0.894592,0.889347,0.852705,0.867267
6,0.0731,0.568428,0.900092,0.892096,0.846985,0.864335
7,0.0677,0.555531,0.900092,0.895569,0.866134,0.878469
8,0.0626,0.558984,0.899175,0.89331,0.856198,0.871021
9,0.058,0.571271,0.899175,0.895388,0.865718,0.878179
10,0.0552,0.554863,0.901008,0.897939,0.866689,0.880024


[I 2025-03-22 16:25:29,381] Trial 119 finished with value: 0.8658055889074839 and parameters: {'learning_rate': 0.0013493819220644597, 'weight_decay': 0.008, 'warmup_steps': 5, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 120 with params: {'learning_rate': 0.00016104904333464902, 'weight_decay': 0.009000000000000001, 'warmup_steps': 15, 'lambda_param': 0.2, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.102,1.415812,0.772686,0.646534,0.662514,0.652784
2,0.8506,1.071582,0.828598,0.696839,0.706649,0.7012
3,0.5527,0.943414,0.846013,0.708827,0.721293,0.71456
4,0.3963,0.876187,0.849679,0.713729,0.72477,0.71844
5,0.2979,0.811627,0.862511,0.847341,0.790772,0.806776


[I 2025-03-22 16:26:40,201] Trial 120 pruned. 


Trial 121 with params: {'learning_rate': 8.532115701682182e-05, 'weight_decay': 0.003, 'warmup_steps': 20, 'lambda_param': 1.0, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.6571,1.942293,0.690192,0.593524,0.581191,0.58387
2,1.3215,1.400689,0.780018,0.652158,0.667883,0.658869
3,0.9081,1.224579,0.811182,0.683207,0.692749,0.68769
4,0.7038,1.138363,0.812099,0.683654,0.694258,0.687002
5,0.571,1.037617,0.826764,0.692356,0.70736,0.699509
6,0.4811,0.985408,0.836847,0.700155,0.715508,0.707312
7,0.4182,0.952847,0.846013,0.707402,0.723413,0.71498
8,0.3658,0.933159,0.839597,0.703466,0.717425,0.709166
9,0.3268,0.905702,0.846013,0.710333,0.721098,0.715563
10,0.2995,0.895588,0.852429,0.879064,0.737941,0.737938


[I 2025-03-22 16:29:08,234] Trial 121 pruned. 


Trial 122 with params: {'learning_rate': 0.001514633292887471, 'weight_decay': 0.003, 'warmup_steps': 26, 'lambda_param': 0.30000000000000004, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9114,0.71308,0.879927,0.884253,0.83292,0.849399
2,0.1584,0.598815,0.893676,0.899648,0.852484,0.869705
3,0.1082,0.636579,0.888176,0.872084,0.84882,0.857545
4,0.0874,0.594599,0.895509,0.890692,0.863586,0.874321
5,0.0774,0.57046,0.893676,0.899893,0.852016,0.869909
6,0.0713,0.555933,0.901925,0.894815,0.857872,0.872563
7,0.0649,0.550667,0.894592,0.888594,0.852989,0.866825
8,0.06,0.550075,0.892759,0.887433,0.851803,0.865966
9,0.0556,0.559971,0.897342,0.890217,0.855566,0.868951
10,0.0528,0.555886,0.896425,0.902413,0.854667,0.872709


[I 2025-03-22 16:31:40,725] Trial 122 pruned. 


Trial 123 with params: {'learning_rate': 0.004365749484521776, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.645,0.65517,0.887259,0.874489,0.830559,0.844231
2,0.1309,0.586621,0.893676,0.885941,0.85412,0.86565
3,0.0958,0.585855,0.895509,0.888641,0.85469,0.867614
4,0.0801,0.565828,0.903758,0.897371,0.860909,0.874943
5,0.072,0.536364,0.908341,0.900018,0.865353,0.878622
6,0.064,0.534019,0.901925,0.892637,0.850525,0.865965
7,0.0585,0.545725,0.903758,0.893454,0.852489,0.867307
8,0.0553,0.540199,0.901925,0.891999,0.851042,0.865968
9,0.0516,0.540367,0.901008,0.891814,0.849977,0.865289
10,0.0485,0.533948,0.900092,0.890826,0.849073,0.864256


[I 2025-03-22 16:34:19,240] Trial 123 pruned. 


Trial 124 with params: {'learning_rate': 0.0007230566903421039, 'weight_decay': 0.003, 'warmup_steps': 31, 'lambda_param': 0.5, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2045,0.824071,0.859762,0.887759,0.742876,0.743932
2,0.24,0.65753,0.890926,0.899691,0.849969,0.867964
3,0.1425,0.665207,0.87901,0.887199,0.841259,0.856908
4,0.1105,0.62544,0.892759,0.898212,0.85152,0.868943
5,0.0925,0.607907,0.894592,0.889228,0.85243,0.867232
6,0.083,0.58232,0.896425,0.902461,0.853793,0.872338
7,0.0758,0.58038,0.891842,0.898338,0.850557,0.868577
8,0.0704,0.592686,0.888176,0.894627,0.848143,0.865391
9,0.067,0.579668,0.890926,0.897598,0.849687,0.867881
10,0.0628,0.577344,0.894592,0.901225,0.852243,0.870917


[I 2025-03-22 16:36:44,047] Trial 124 pruned. 


Trial 125 with params: {'learning_rate': 0.0009473479492386732, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0889,0.761315,0.875344,0.843076,0.765758,0.770958
2,0.2009,0.662202,0.882676,0.892931,0.844805,0.86145
3,0.1259,0.643562,0.887259,0.884739,0.856379,0.867109
4,0.1004,0.623373,0.888176,0.895067,0.857107,0.871892
5,0.0867,0.621698,0.890926,0.899267,0.858923,0.874943
6,0.0773,0.558741,0.901925,0.907083,0.867268,0.8832
7,0.0706,0.576445,0.892759,0.900574,0.860401,0.876561
8,0.0653,0.57689,0.896425,0.902613,0.864025,0.879329
9,0.0619,0.575432,0.898258,0.903526,0.856235,0.874013
10,0.0587,0.567193,0.898258,0.905457,0.864781,0.881222


[I 2025-03-22 16:40:28,607] Trial 125 finished with value: 0.8825417736515265 and parameters: {'learning_rate': 0.0009473479492386732, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 3.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 126 with params: {'learning_rate': 0.0013005580514197468, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.2, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9765,0.722701,0.886343,0.866792,0.837947,0.84749
2,0.1713,0.582225,0.894592,0.899947,0.853208,0.870393
3,0.1102,0.612606,0.886343,0.880338,0.847385,0.859186
4,0.0903,0.581583,0.894592,0.888766,0.853552,0.86706
5,0.0795,0.579216,0.896425,0.891029,0.85324,0.868482
6,0.0722,0.543878,0.902841,0.906968,0.859208,0.877249
7,0.0661,0.552273,0.903758,0.886256,0.859906,0.870923
8,0.0612,0.550095,0.903758,0.907816,0.860271,0.878095
9,0.0574,0.551004,0.897342,0.891279,0.854913,0.869403
10,0.0548,0.555265,0.898258,0.892049,0.855119,0.86999


[I 2025-03-22 16:42:56,720] Trial 126 pruned. 


Trial 127 with params: {'learning_rate': 0.00010672432719553498, 'weight_decay': 0.01, 'warmup_steps': 11, 'lambda_param': 0.4, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.448,1.71912,0.734189,0.624423,0.621637,0.620973
2,1.1307,1.274702,0.80385,0.674641,0.686234,0.679646
3,0.7673,1.111709,0.824931,0.692539,0.703598,0.697823
4,0.5763,1.035985,0.834097,0.699643,0.712913,0.705085
5,0.456,0.946011,0.849679,0.711465,0.725759,0.718366
6,0.3772,0.901463,0.850596,0.712188,0.726226,0.718953
7,0.3271,0.877589,0.856095,0.799482,0.740295,0.740523
8,0.2784,0.865705,0.849679,0.844344,0.761969,0.776743
9,0.2445,0.826739,0.858845,0.869528,0.804563,0.826119
10,0.2208,0.816299,0.872594,0.882325,0.834412,0.852458


[I 2025-03-22 16:45:31,316] Trial 127 pruned. 


Trial 128 with params: {'learning_rate': 0.0005316617024146923, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3363,0.865025,0.857929,0.718066,0.733543,0.725073
2,0.3028,0.703204,0.882676,0.893931,0.843752,0.861763
3,0.1704,0.690928,0.879927,0.889405,0.841419,0.857408
4,0.1249,0.675046,0.889093,0.896032,0.848481,0.866212
5,0.1043,0.621971,0.887259,0.894832,0.84673,0.864915
6,0.0905,0.598029,0.892759,0.898782,0.850571,0.868903
7,0.0824,0.584617,0.894592,0.901004,0.852659,0.871068
8,0.0764,0.619254,0.886343,0.882961,0.856271,0.866779
9,0.0723,0.591767,0.891842,0.898221,0.85025,0.868266
10,0.0676,0.582706,0.901008,0.907027,0.866655,0.882921


[I 2025-03-22 16:49:21,791] Trial 128 finished with value: 0.8703182389688622 and parameters: {'learning_rate': 0.0005316617024146923, 'weight_decay': 0.004, 'warmup_steps': 30, 'lambda_param': 0.30000000000000004, 'temperature': 2.5}. Best is trial 95 with value: 0.8854974599379749.


Trial 129 with params: {'learning_rate': 0.0005899560790187384, 'weight_decay': 0.007, 'warmup_steps': 8, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2239,0.845826,0.864345,0.723857,0.737393,0.729912
2,0.2844,0.695188,0.877177,0.888739,0.828437,0.849517
3,0.1606,0.695962,0.87626,0.885226,0.839175,0.854617
4,0.1198,0.670724,0.883593,0.893101,0.843366,0.862216
5,0.1008,0.640123,0.882676,0.891277,0.843746,0.861498


[I 2025-03-22 16:50:37,599] Trial 129 pruned. 


Trial 130 with params: {'learning_rate': 0.0008543638019222787, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.123,0.805932,0.867094,0.891857,0.759653,0.765546
2,0.2148,0.671322,0.877177,0.889853,0.83865,0.856738
3,0.1303,0.647682,0.883593,0.889987,0.844058,0.85985
4,0.1026,0.62571,0.890926,0.896177,0.850219,0.866974
5,0.0873,0.638398,0.886343,0.882599,0.855175,0.866261
6,0.0781,0.57642,0.893676,0.900544,0.860812,0.8767
7,0.0716,0.583278,0.892759,0.900927,0.8606,0.87684
8,0.0668,0.589202,0.892759,0.898355,0.85194,0.869267
9,0.0628,0.582349,0.889093,0.895621,0.848977,0.866394
10,0.0592,0.571695,0.896425,0.9034,0.863121,0.879335


[I 2025-03-22 16:54:22,691] Trial 130 finished with value: 0.8786915972335735 and parameters: {'learning_rate': 0.0008543638019222787, 'weight_decay': 0.004, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 4.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 131 with params: {'learning_rate': 0.0008489248196425639, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1202,0.812415,0.860678,0.805781,0.744255,0.744077
2,0.2175,0.671682,0.87901,0.890636,0.841511,0.857476
3,0.13,0.641565,0.885426,0.882652,0.855014,0.865454
4,0.1034,0.637793,0.886343,0.89393,0.855918,0.870243
5,0.0892,0.604057,0.892759,0.888978,0.85974,0.872034
6,0.0805,0.573764,0.896425,0.891306,0.86283,0.874792
7,0.0729,0.584578,0.891842,0.899869,0.860017,0.87597
8,0.0672,0.578612,0.895509,0.902295,0.862857,0.878544
9,0.0635,0.578025,0.895509,0.902087,0.862614,0.878379
10,0.0598,0.564705,0.894592,0.901583,0.861187,0.877485


[I 2025-03-22 16:58:22,739] Trial 131 finished with value: 0.8766217621037792 and parameters: {'learning_rate': 0.0008489248196425639, 'weight_decay': 0.004, 'warmup_steps': 29, 'lambda_param': 0.1, 'temperature': 6.0}. Best is trial 95 with value: 0.8854974599379749.


Trial 132 with params: {'learning_rate': 0.0005853983837128073, 'weight_decay': 0.002, 'warmup_steps': 31, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2945,0.850229,0.861595,0.723509,0.735656,0.728265
2,0.2808,0.698868,0.880843,0.892653,0.842617,0.860557
3,0.1591,0.688838,0.87901,0.888362,0.841321,0.857065
4,0.1199,0.648132,0.890926,0.898111,0.849144,0.867633
5,0.1007,0.6194,0.895509,0.888688,0.853596,0.867367
6,0.0882,0.590142,0.889093,0.896603,0.848079,0.866557
7,0.0801,0.596998,0.891842,0.898715,0.85036,0.868648
8,0.0741,0.60462,0.890926,0.897035,0.850578,0.867479
9,0.0701,0.590292,0.891842,0.898341,0.850568,0.86849
10,0.0657,0.577244,0.896425,0.903255,0.853997,0.872875


[I 2025-03-22 17:00:40,845] Trial 132 pruned. 


Trial 133 with params: {'learning_rate': 0.0011072530594019546, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9762,0.727843,0.878093,0.864829,0.813353,0.828461
2,0.1816,0.629357,0.88176,0.87998,0.844049,0.85719
3,0.1156,0.65978,0.87901,0.855867,0.841752,0.846067
4,0.0962,0.60795,0.885426,0.881163,0.8465,0.859704
5,0.0818,0.592511,0.898258,0.893616,0.865404,0.877111
6,0.0754,0.545986,0.901925,0.906744,0.858514,0.876848
7,0.0696,0.545615,0.901925,0.896695,0.868058,0.880006
8,0.064,0.554244,0.896425,0.901306,0.855023,0.872172
9,0.0601,0.538423,0.901008,0.906098,0.858227,0.876345
10,0.0566,0.529915,0.903758,0.898617,0.869364,0.881747


[I 2025-03-22 17:04:44,007] Trial 133 finished with value: 0.8881671330460716 and parameters: {'learning_rate': 0.0011072530594019546, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 133 with value: 0.8881671330460716.


Trial 134 with params: {'learning_rate': 0.0015438541667967373, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8534,0.710364,0.877177,0.87096,0.830048,0.844035
2,0.1602,0.613214,0.890009,0.895923,0.850125,0.866703
3,0.109,0.658956,0.883593,0.866956,0.845949,0.852657
4,0.0888,0.61874,0.894592,0.889441,0.853483,0.866949
5,0.0782,0.587579,0.897342,0.904071,0.854416,0.873263
6,0.0706,0.557157,0.901008,0.905492,0.858041,0.875743
7,0.0655,0.55524,0.901008,0.894715,0.857672,0.872365
8,0.0601,0.567285,0.894592,0.888976,0.853247,0.867248
9,0.0562,0.57285,0.895509,0.890029,0.853927,0.86809
10,0.053,0.555451,0.899175,0.895778,0.865885,0.878527


[I 2025-03-22 17:08:46,473] Trial 134 finished with value: 0.8804536739309388 and parameters: {'learning_rate': 0.0015438541667967373, 'weight_decay': 0.008, 'warmup_steps': 13, 'lambda_param': 0.8, 'temperature': 5.5}. Best is trial 133 with value: 0.8881671330460716.


Trial 135 with params: {'learning_rate': 0.001464425374761532, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8511,0.715653,0.871677,0.873585,0.808954,0.825624
2,0.1651,0.616402,0.887259,0.894601,0.846744,0.864653
3,0.1087,0.642922,0.88176,0.876125,0.844348,0.854843
4,0.09,0.611584,0.889093,0.883246,0.849329,0.861494
5,0.0792,0.578666,0.895509,0.889356,0.85312,0.867476
6,0.0708,0.554928,0.904675,0.900042,0.869999,0.882744
7,0.0648,0.545461,0.899175,0.894487,0.865964,0.877799
8,0.0607,0.558769,0.901925,0.895183,0.859458,0.873659
9,0.0569,0.567964,0.896425,0.89251,0.86392,0.875817
10,0.0535,0.563602,0.899175,0.896089,0.866025,0.87878


[I 2025-03-22 17:12:36,828] Trial 135 finished with value: 0.8845655993754321 and parameters: {'learning_rate': 0.001464425374761532, 'weight_decay': 0.01, 'warmup_steps': 4, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 136 with params: {'learning_rate': 0.001303111229534305, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8832,0.781938,0.859762,0.837097,0.799129,0.809385
2,0.1728,0.616884,0.894592,0.901468,0.853125,0.870442
3,0.1121,0.652579,0.883593,0.876834,0.845288,0.856102
4,0.0934,0.619661,0.889093,0.894952,0.849344,0.865772
5,0.0813,0.600376,0.892759,0.887307,0.851093,0.865485


[I 2025-03-22 17:13:55,420] Trial 136 pruned. 


Trial 137 with params: {'learning_rate': 0.0030396951213185687, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6487,0.675878,0.888176,0.890207,0.83091,0.849236
2,0.1351,0.576623,0.901008,0.888569,0.849424,0.863312
3,0.0964,0.563731,0.906508,0.897073,0.862817,0.876308
4,0.0795,0.562374,0.901008,0.893439,0.857654,0.871407
5,0.0743,0.55075,0.908341,0.894933,0.844916,0.86273
6,0.0655,0.545669,0.904675,0.890832,0.841811,0.859167
7,0.0595,0.549759,0.907424,0.897405,0.863416,0.876591
8,0.0554,0.557596,0.909258,0.900143,0.864691,0.87878
9,0.0519,0.551355,0.902841,0.894588,0.859129,0.873152
10,0.0491,0.534789,0.909258,0.899919,0.864198,0.878312


[I 2025-03-22 17:17:45,736] Trial 137 finished with value: 0.8805111918682496 and parameters: {'learning_rate': 0.0030396951213185687, 'weight_decay': 0.01, 'warmup_steps': 0, 'lambda_param': 0.7000000000000001, 'temperature': 7.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 138 with params: {'learning_rate': 0.0025805223244731966, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6869,0.674926,0.885426,0.888355,0.83755,0.854149
2,0.1364,0.59785,0.888176,0.880875,0.848738,0.860242
3,0.1007,0.591403,0.898258,0.890705,0.866025,0.875313
4,0.0829,0.54745,0.898258,0.891598,0.855784,0.869626
5,0.0733,0.535953,0.903758,0.8961,0.859373,0.874001
6,0.0656,0.527889,0.903758,0.895479,0.859287,0.873585
7,0.0606,0.513449,0.910174,0.901267,0.865279,0.879532
8,0.0563,0.520492,0.902841,0.892437,0.84999,0.86594
9,0.0524,0.525571,0.907424,0.898558,0.862916,0.876988
10,0.0491,0.519898,0.908341,0.911185,0.863866,0.88162


[I 2025-03-22 17:22:21,756] Trial 138 finished with value: 0.8780115221376649 and parameters: {'learning_rate': 0.0025805223244731966, 'weight_decay': 0.009000000000000001, 'warmup_steps': 6, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 139 with params: {'learning_rate': 0.004422690647185826, 'weight_decay': 0.007, 'warmup_steps': 16, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6306,0.654854,0.888176,0.89257,0.829818,0.849567
2,0.1285,0.586553,0.898258,0.884261,0.838509,0.853547
3,0.093,0.596564,0.893676,0.881829,0.835333,0.850933
4,0.0789,0.608599,0.898258,0.888208,0.838276,0.855502
5,0.0703,0.585903,0.901008,0.879488,0.840303,0.854678


[I 2025-03-22 17:23:54,038] Trial 139 pruned. 


Trial 140 with params: {'learning_rate': 0.0006947067987458605, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1332,0.896136,0.848763,0.713314,0.72593,0.716906
2,0.2511,0.660591,0.887259,0.896722,0.847406,0.865921
3,0.1448,0.666927,0.883593,0.892597,0.844914,0.861262
4,0.113,0.640844,0.890926,0.897338,0.850213,0.867942
5,0.0949,0.624677,0.887259,0.894343,0.847162,0.86467
6,0.0861,0.582607,0.892759,0.89781,0.85097,0.868509
7,0.077,0.594219,0.894592,0.902087,0.861842,0.877969
8,0.0724,0.608107,0.892759,0.888402,0.860638,0.871973
9,0.0691,0.594219,0.892759,0.900375,0.86069,0.876355
10,0.0649,0.578851,0.900092,0.907002,0.866243,0.882722


[I 2025-03-22 17:28:12,470] Trial 140 finished with value: 0.8726399279276659 and parameters: {'learning_rate': 0.0006947067987458605, 'weight_decay': 0.009000000000000001, 'warmup_steps': 5, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 141 with params: {'learning_rate': 0.00109374694953428, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9903,0.77295,0.865261,0.869985,0.803909,0.82085
2,0.1866,0.646526,0.889093,0.896304,0.848489,0.865624
3,0.1196,0.671223,0.879927,0.887376,0.842829,0.857561
4,0.0968,0.630296,0.887259,0.885264,0.86591,0.873663
5,0.0841,0.620759,0.892759,0.901614,0.869623,0.882984
6,0.0759,0.55099,0.901925,0.908427,0.8763,0.889767
7,0.0691,0.572273,0.897342,0.892665,0.863834,0.875872
8,0.0639,0.563254,0.896425,0.901601,0.854289,0.872159
9,0.0605,0.565426,0.902841,0.908416,0.868948,0.884695
10,0.0569,0.558654,0.898258,0.905461,0.864363,0.881023


[I 2025-03-22 17:32:25,204] Trial 141 finished with value: 0.8835551847827067 and parameters: {'learning_rate': 0.00109374694953428, 'weight_decay': 0.009000000000000001, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 5.5}. Best is trial 133 with value: 0.8881671330460716.


Trial 142 with params: {'learning_rate': 0.0009869097157898473, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0508,0.783937,0.875344,0.844569,0.783973,0.794415
2,0.1942,0.652335,0.887259,0.896641,0.848476,0.865058
3,0.1221,0.64783,0.880843,0.887594,0.842277,0.857897
4,0.0981,0.609352,0.896425,0.901925,0.86311,0.878021
5,0.0855,0.60424,0.895509,0.903412,0.862808,0.879075
6,0.0768,0.556152,0.898258,0.891085,0.855618,0.869726
7,0.0699,0.555208,0.901008,0.895932,0.866814,0.879022
8,0.0648,0.561782,0.900092,0.892442,0.857059,0.871085
9,0.0613,0.558038,0.899175,0.90447,0.856435,0.874641
10,0.0574,0.55318,0.901925,0.897001,0.867391,0.879964


[I 2025-03-22 17:36:26,713] Trial 142 finished with value: 0.883210770054867 and parameters: {'learning_rate': 0.0009869097157898473, 'weight_decay': 0.009000000000000001, 'warmup_steps': 22, 'lambda_param': 0.8, 'temperature': 5.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 143 with params: {'learning_rate': 0.0009059666378956125, 'weight_decay': 0.01, 'warmup_steps': 17, 'lambda_param': 0.9, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0571,0.774629,0.870761,0.871686,0.798391,0.816078
2,0.2072,0.655122,0.883593,0.892967,0.844803,0.86161
3,0.128,0.657328,0.88451,0.890747,0.845787,0.86139
4,0.1022,0.616763,0.890926,0.897524,0.849901,0.867723
5,0.0872,0.632589,0.886343,0.893063,0.846396,0.863605


[I 2025-03-22 17:37:47,485] Trial 143 pruned. 


Trial 144 with params: {'learning_rate': 0.001500550707783043, 'weight_decay': 0.008, 'warmup_steps': 25, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9173,0.694802,0.877177,0.870363,0.830767,0.844157
2,0.1594,0.602237,0.888176,0.882326,0.848243,0.861442
3,0.1076,0.61422,0.888176,0.8722,0.848489,0.857392
4,0.0874,0.593533,0.892759,0.887117,0.851646,0.865476
5,0.0782,0.565768,0.893676,0.887615,0.851895,0.866092
6,0.0713,0.54351,0.894592,0.88904,0.852276,0.867086
7,0.0647,0.541066,0.901008,0.893615,0.857435,0.871769
8,0.0599,0.559493,0.896425,0.889579,0.854343,0.868145
9,0.0557,0.561058,0.899175,0.892064,0.856343,0.870467
10,0.0529,0.54617,0.897342,0.891015,0.855053,0.869401


[I 2025-03-22 17:40:25,280] Trial 144 pruned. 


Trial 145 with params: {'learning_rate': 0.0008303691716594152, 'weight_decay': 0.008, 'warmup_steps': 16, 'lambda_param': 0.6000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0841,0.799144,0.863428,0.806648,0.746727,0.746095
2,0.2227,0.641066,0.888176,0.897578,0.848115,0.865563
3,0.135,0.64872,0.88451,0.891105,0.845143,0.861386
4,0.1054,0.603416,0.890926,0.897193,0.85047,0.867747
5,0.0898,0.624403,0.890926,0.883978,0.850126,0.863218


[I 2025-03-22 17:41:46,048] Trial 145 pruned. 


Trial 146 with params: {'learning_rate': 0.00048105620598179265, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.342,0.921143,0.848763,0.709435,0.725628,0.716555
2,0.3406,0.733959,0.869844,0.880159,0.813475,0.834941
3,0.1875,0.691114,0.880843,0.890134,0.84209,0.858436
4,0.1344,0.676144,0.880843,0.889162,0.841503,0.859156
5,0.1106,0.639428,0.888176,0.895374,0.847079,0.865318
6,0.0955,0.623252,0.890009,0.89787,0.857775,0.873821
7,0.0861,0.60868,0.890009,0.898625,0.857881,0.874239
8,0.0798,0.615872,0.895509,0.902143,0.862752,0.878275
9,0.075,0.604763,0.897342,0.905349,0.863713,0.880558
10,0.0702,0.593408,0.896425,0.905129,0.863054,0.880192


[I 2025-03-22 17:45:41,011] Trial 146 finished with value: 0.8758190568359326 and parameters: {'learning_rate': 0.00048105620598179265, 'weight_decay': 0.01, 'warmup_steps': 15, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 133 with value: 0.8881671330460716.


Trial 147 with params: {'learning_rate': 0.0019529594801860164, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7979,0.689592,0.878093,0.882451,0.832011,0.84856
2,0.1486,0.608319,0.898258,0.890229,0.856819,0.868984
3,0.1029,0.60033,0.894592,0.886514,0.854135,0.865777
4,0.0849,0.596044,0.893676,0.887435,0.853167,0.865876
5,0.0748,0.57302,0.897342,0.902607,0.855027,0.872861
6,0.0679,0.565727,0.901925,0.905019,0.858938,0.876008
7,0.063,0.55283,0.903758,0.894843,0.859964,0.873548
8,0.0578,0.554829,0.895509,0.889391,0.854394,0.86814
9,0.0541,0.568625,0.900092,0.892701,0.85727,0.871183
10,0.0512,0.552738,0.900092,0.904641,0.857411,0.875135


[I 2025-03-22 17:50:00,578] Trial 147 finished with value: 0.8734705039919833 and parameters: {'learning_rate': 0.0019529594801860164, 'weight_decay': 0.008, 'warmup_steps': 17, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 133 with value: 0.8881671330460716.


Trial 148 with params: {'learning_rate': 0.0006899326854992335, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1507,0.891442,0.856095,0.718583,0.732011,0.72271
2,0.2531,0.649203,0.886343,0.897488,0.845307,0.865281
3,0.1466,0.657862,0.882676,0.89119,0.843599,0.860588
4,0.1139,0.632898,0.890009,0.896295,0.849118,0.866886
5,0.0961,0.625441,0.890926,0.896675,0.85017,0.867478
6,0.085,0.583917,0.889093,0.895929,0.848192,0.86621
7,0.0773,0.594493,0.893676,0.900246,0.852418,0.870397
8,0.0715,0.595268,0.890926,0.8969,0.850529,0.867562
9,0.0682,0.590795,0.892759,0.899224,0.851821,0.869646
10,0.0639,0.583767,0.899175,0.904703,0.856391,0.87478


[I 2025-03-22 17:53:39,316] Trial 148 finished with value: 0.8723543968632019 and parameters: {'learning_rate': 0.0006899326854992335, 'weight_decay': 0.006, 'warmup_steps': 10, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 133 with value: 0.8881671330460716.


Trial 149 with params: {'learning_rate': 0.0013388240690482276, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8751,0.752546,0.870761,0.848678,0.81675,0.826369
2,0.1684,0.615927,0.897342,0.9023,0.855729,0.872544
3,0.1105,0.658747,0.87901,0.863572,0.841693,0.849248
4,0.0899,0.61017,0.888176,0.883692,0.848775,0.861841
5,0.0788,0.5897,0.893676,0.89079,0.86121,0.873618
6,0.0727,0.571215,0.901008,0.894695,0.858551,0.873052
7,0.0664,0.561674,0.898258,0.893865,0.864922,0.877096
8,0.0615,0.558819,0.900092,0.893816,0.857124,0.87167
9,0.0572,0.567727,0.901008,0.897566,0.867284,0.880123
10,0.0542,0.548837,0.900092,0.896639,0.866485,0.879287


[I 2025-03-22 17:58:10,111] Trial 149 finished with value: 0.8805127551471325 and parameters: {'learning_rate': 0.0013388240690482276, 'weight_decay': 0.007, 'warmup_steps': 7, 'lambda_param': 1.0, 'temperature': 5.0}. Best is trial 133 with value: 0.8881671330460716.
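
The trial log above is long, so it can help to summarize it programmatically. The following is a minimal sketch, not part of the original notebook: it assumes the log lines keep the two shapes seen above (`Trial N finished with value: V and parameters: ...` and `Trial N pruned.`). The function name `summarize_optuna_log` and the `sample` list are hypothetical; the sample lines are copied from the output above with the parameter dictionaries shortened to `{...}`.

```python
import re

# Patterns for the two Optuna log line shapes that appear in the output above.
FINISHED = re.compile(r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters")
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_optuna_log(lines):
    """Return (finished, pruned, best) parsed from Optuna log lines.

    finished: dict mapping trial number -> objective value
    pruned:   list of pruned trial numbers, in log order
    best:     trial number with the highest objective, or None
    """
    finished = {}
    pruned = []
    for line in lines:
        match = FINISHED.search(line)
        if match:
            finished[int(match.group(1))] = float(match.group(2))
            continue
        match = PRUNED.search(line)
        if match:
            pruned.append(int(match.group(1)))
    best = max(finished, key=finished.get) if finished else None
    return finished, pruned, best


# Hypothetical sample: three lines from the log above, parameter dicts shortened.
sample = [
    "[I 2025-03-22 17:04:44,007] Trial 133 finished with value: 0.8881671330460716 "
    "and parameters: {...}. Best is trial 133 with value: 0.8881671330460716.",
    "[I 2025-03-22 17:12:36,828] Trial 135 finished with value: 0.8845655993754321 "
    "and parameters: {...}. Best is trial 133 with value: 0.8881671330460716.",
    "[I 2025-03-22 17:13:55,420] Trial 136 pruned. ",
]

finished, pruned, best = summarize_optuna_log(sample)
print(f"finished={len(finished)} pruned={len(pruned)} best={best}")
```

Applied to the full captured output, the same function would give the ratio of pruned to finished trials and confirm which trial Optuna reports as best.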


In [47]:
print(best_trial_distill_aug)

BestRun(run_id='133', objective=0.8881671330460716, hyperparameters={'learning_rate': 0.0011072530594019546, 'weight_decay': 0.008, 'warmup_steps': 12, 'lambda_param': 0.9, 'temperature': 5.5}, run_summary=None)
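
For orientation, the sketch below shows one common way that `lambda_param` and `temperature` enter a knowledge-distillation loss. It is an assumption, not the notebook's implementation: the actual loss lives in the `base` module, which is not shown here, and it may weight the terms differently. The sketch assumes `lambda_param` weights the soft (teacher) term, `1 - lambda_param` weights the hard-label cross-entropy, and the soft term is scaled by `temperature ** 2`. The function names and the example logits are hypothetical, and plain Python is used instead of PyTorch to keep the example dependency-free.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param, temperature):
    """Assumed convention: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return (1.0 - lambda_param) * ce + lambda_param * temperature ** 2 * kl


# Hypothetical logits for a 6-class problem (TREC coarse has 6 classes),
# evaluated with the hyperparameters of the best run printed above.
student = [2.0, 0.5, -1.0, 0.1, 0.0, -0.5]
teacher = [3.0, 0.2, -1.5, 0.0, 0.1, -0.7]
loss = distillation_loss(student, teacher, label=0, lambda_param=0.9, temperature=5.5)
print(loss)
```

Under this convention, `lambda_param = 0.9` would mean the student is trained mostly on the teacher's softened outputs, and `lambda_param = 1.0` would ignore the hard labels entirely.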
