**SETUP**

In [4]:
!pip install -q transformers datasets torch scikit-learn optuna evaluate

import pandas as pd
import torch, numpy as np, random
from sklearn.model_selection import train_test_split
from transformers import DistilBertTokenizerFast, AutoModelForSequenceClassification, Trainer, TrainingArguments
import optuna
import evaluate
from datasets import Dataset


In this part of the code, we are setting up the tools and libraries we need for our experiment. First, we install the necessary Python packages like Transformers for working with the DistilBERT model, Datasets for handling the dataset, Torch for running the deep learning model on CPU or GPU, scikit-learn for data splitting, Optuna for automated hyperparameter search, and Evaluate for measuring the model’s performance.

Next, we import these packages into our notebook. Pandas is used to load and handle our CSV dataset, NumPy and random help with calculations and ensuring reproducibility, and train_test_split from scikit-learn is used to divide the data into training and evaluation sets.

Finally, we import DistilBertTokenizerFast to convert text into tokens that the model can understand, AutoModelForSequenceClassification to load the pre-trained DistilBERT model for classification tasks, Trainer and TrainingArguments to handle model training and evaluation, and Dataset from Hugging Face to manage our data in a format suitable for the model.



---



In [5]:
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

Device: cuda


Here, we set a seed value so that our results are as reproducible as possible. We set this seed for Python’s random, NumPy, and PyTorch. Some GPU operations are not fully deterministic, so small run-to-run differences can remain. Then, we check if a GPU is available and store the result in device. Using a GPU makes training much faster, but if it’s not available, the code will use the CPU instead. In the rest of this notebook, device is only printed; the Trainer selects the device on its own.



---



**LOAD DATASET**

In [6]:
from google.colab import files
uploaded = files.upload()

Saving amazon_eco-friendly_products.csv to amazon_eco-friendly_products.csv


Here, we upload the dataset file from the local machine into the Colab session using files.upload(). The output confirms that amazon_eco-friendly_products.csv was saved, so the next cell can read it by name.



---



In [7]:
df = pd.read_csv("amazon_eco-friendly_products.csv")
df["text"] = df["title"].astype(str) + " " + df["material"].astype(str) + " " + df["description"].astype(str)
eco_terms = ["eco", "recycl", "sustain", "biodegrad", "organic", "green"]
df["label"] = df["text"].str.contains("|".join(eco_terms), case=False, regex=True).astype(int)
train_texts, eval_texts, train_labels, eval_labels = train_test_split(df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=SEED)

We load our dataset using Pandas. Since the dataset has different text columns (title, material, description), we combine them into one column called text so the model can read it as a single input. Then, we create a simple label: if the text contains any of the stems in eco_terms (“eco”, “recycl”, “sustain”, “biodegrad”, “organic”, “green”), it is labeled as 1 (eco-friendly), otherwise 0. The match is case-insensitive and works on substrings, so unrelated words that happen to contain a stem are also labeled 1. Finally, we split the dataset into training and evaluation sets, keeping 80% for training and 20% for evaluation; stratify keeps the proportion of each label the same in both sets.
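To make the labeling rule concrete, here is a minimal sketch using only Python’s re module. It applies the same stems as eco_terms to a few made-up product strings (the strings are illustrative assumptions, not rows from the dataset, and keyword_label is a hypothetical helper name).

```python
import re

# Same stems as eco_terms in the cell above
eco_terms = ["eco", "recycl", "sustain", "biodegrad", "organic", "green"]
pattern = re.compile("|".join(eco_terms), flags=re.IGNORECASE)

def keyword_label(text: str) -> int:
    """Return 1 if any eco stem occurs anywhere in the text, else 0."""
    return int(pattern.search(text) is not None)

# Hypothetical product strings, for illustration only
samples = [
    "Bamboo toothbrush, biodegradable handle",  # contains "biodegrad"
    "Stainless steel water bottle",             # no stem present
    "Evergreen scented candle",                 # "green" inside "Evergreen"
    "Recycled paper notebook",                  # "recycl", case-insensitive
    "Home decor vase",                          # "eco" inside "decor"
]
labels = [keyword_label(s) for s in samples]
print(labels)
```

The third and fifth strings show the substring effect: they are labeled 1 even though the matching stem is part of an unrelated word.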



---



**TOKENIZATION**

In [8]:
MODEL_NAME = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding=True, max_length=128)

train_ds = Dataset.from_pandas(pd.DataFrame({"text": train_texts, "labels": train_labels}))
eval_ds  = Dataset.from_pandas(pd.DataFrame({"text": eval_texts, "labels": eval_labels}))
tokenized_train = train_ds.map(tokenize, batched=True)
tokenized_eval  = eval_ds.map(tokenize, batched=True)

Map:   0%|          | 0/2869 [00:00<?, ? examples/s]

Map:   0%|          | 0/718 [00:00<?, ? examples/s]

Since BERT models can’t read plain text directly, we use a tokenizer to convert the text into numbers (token IDs) that the model can understand. Here, we use the pre-trained DistilBERT tokenizer. We then create datasets in the Hugging Face format and apply the tokenizer to every piece of text in batches. Sequences longer than 128 tokens are cut off (truncation=True, max_length=128), and shorter ones are padded to the longest sequence in their batch (padding=True).
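The combined effect of truncation and padding can be illustrated without the real tokenizer. The sketch below is a simplified stand-in that works on lists of made-up token IDs: pad_and_truncate and the pad_id=0 default are assumptions chosen for illustration, and the real tokenizer additionally adds special tokens and splits words into sub-word pieces.

```python
from typing import Dict, List

def pad_and_truncate(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Truncate each sequence to max_length, then pad to the longest one left."""
    truncated = [ids[:max_length] for ids in batch]
    target = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token IDs; the second sequence is longer than max_length
encoded = pad_and_truncate([[5, 6, 7], [8, 9, 10, 11, 12, 13], [14]], max_length=4)
for ids, mask in zip(encoded["input_ids"], encoded["attention_mask"]):
    print(ids, mask)
```

Every row in the result has the same length, and the attention mask records which positions are padding.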



---



**METRICS**

In [9]:
f1_metric = evaluate.load("f1")
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    f1 = f1_metric.compute(predictions=preds, references=labels, average="binary")["f1"]
    acc = accuracy_metric.compute(predictions=preds, references=labels)["accuracy"]
    return {"accuracy": acc, "f1": f1}


We define the metrics to measure how well the model is doing. We use accuracy to see how many predictions are correct, and F1-score to balance precision and recall, which is useful when labels are not perfectly balanced. The compute_metrics function takes the model’s predictions and calculates these values for evaluation.
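As a minimal sketch of what compute_metrics calculates, the same quantities can be worked out by hand in plain Python. The logits and labels below are invented for illustration, and argmax_rows, accuracy, and binary_f1 are hypothetical helper names rather than functions from the evaluate library.

```python
from typing import List, Sequence

def argmax_rows(logits: Sequence[Sequence[float]]) -> List[int]:
    """Index of the largest logit in each row (the predicted class)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def binary_f1(preds: Sequence[int], labels: Sequence[int], positive: int = 1) -> float:
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented two-class logits and reference labels
logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4], [-0.5, 0.9]]
labels = [1, 0, 0, 1, 1]

preds = argmax_rows(logits)
acc = accuracy(preds, labels)
f1 = binary_f1(preds, labels)
print(preds, acc, round(f1, 4))
```

Here three of five predictions are correct, and with two true positives, one false positive, and one false negative, precision and recall are both 2/3, which gives an F1-score of 2/3 as well.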



---



**MODEL INITIALIZER**

In [10]:
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

Here, we create a function that loads a fresh model for every trial in the hyperparameter search. We load the pre-trained DistilBERT weights and attach a classification head for 2 labels (eco-friendly or not). This head is newly initialized rather than pre-trained, which is why the library later warns that some weights are “newly initialized”. Using a fresh model ensures that each trial starts from the same base without inheriting weights from previous runs.



---



**TRAINING ARGUMENTS**

In [11]:
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,  # placeholder, will be overridden
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    report_to="none",
)

In this part, we set up the training configuration for the model using TrainingArguments. We specify where to save the results with output_dir, and set a default number of training epochs to 3 (this is just a placeholder because the hyperparameter search will override it later).

We also define how the model will be evaluated: per_device_eval_batch_size=16 sets the batch size for evaluation, and eval_strategy="epoch" means the model will be evaluated at the end of each training epoch. save_strategy="epoch" tells the Trainer to save the model at the end of each epoch.

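The argument load_best_model_at_end=True ensures that after training, the Trainer reloads the checkpoint with the best value of the chosen metric, which in this case is F1-score (metric_for_best_model="f1"). For this to work, checkpoints must be saved at the same points where the model is evaluated, which is why both strategies are set to "epoch". Finally, report_to="none" disables logging to external tools like Weights & Biases, keeping the training output simple.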
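The selection that metric_for_best_model implies can be sketched in a few lines. This is a simplified illustration, not the Trainer’s internal code: pick_best_epoch is a hypothetical helper, and the per-epoch values are copied from one of the recorded grid-search runs further down (batch size 32, learning rate 5e-05, weight decay 0.05).

```python
from typing import Dict, List

def pick_best_epoch(
    history: List[Dict[str, float]], metric: str = "f1", greater_is_better: bool = True
) -> Dict[str, float]:
    """Return the evaluation record with the best value of the given metric."""
    chooser = max if greater_is_better else min
    return chooser(history, key=lambda row: row[metric])

# Per-epoch evaluation results of one recorded run
history = [
    {"epoch": 1, "eval_loss": 0.169011, "accuracy": 0.922006, "f1": 0.950877},
    {"epoch": 2, "eval_loss": 0.147254, "accuracy": 0.931755, "f1": 0.957055},
    {"epoch": 3, "eval_loss": 0.183161, "accuracy": 0.928969, "f1": 0.956072},
]

best_by_f1 = pick_best_epoch(history, metric="f1")
best_by_loss = pick_best_epoch(history, metric="eval_loss", greater_is_better=False)
print(best_by_f1["epoch"], best_by_loss["epoch"])
```

In this run the second epoch has both the highest F1-score and the lowest validation loss, so the checkpoint from epoch 2 would be the one to restore rather than the final one.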



---



**HYPERPARAMETER SPACE**

In [16]:
grid_space = {
    "num_train_epochs": [3],
    "per_device_train_batch_size": [16,32],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01, 0.05, 0.1]
}

def hp_space_grid(trial):
    return {
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", grid_space["num_train_epochs"]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", grid_space["per_device_train_batch_size"]),
        "learning_rate": trial.suggest_categorical("learning_rate", grid_space["learning_rate"]),
        "weight_decay": trial.suggest_categorical("weight_decay", grid_space["weight_decay"]),
    }

def hp_space_random(trial):
    return {
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", grid_space["num_train_epochs"]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", grid_space["per_device_train_batch_size"]),
        "learning_rate": trial.suggest_float("learning_rate", 2e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
    }

In this block, we define the hyperparameter search space for both Grid Search and Random Search.

The variable grid_space lists the possible values for each hyperparameter we want to test:

* num_train_epochs: the number of times the model will go through the full training data (here fixed at 3).
* per_device_train_batch_size: the batch sizes we want to try (16 or 32).
* learning_rate: the learning rates to test (2e-5, 3e-5, 5e-5).
* weight_decay: regularization values to test (0.0, 0.01, 0.05, 0.1).

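The function hp_space_grid(trial) is used for Grid Search. On its own it only declares each hyperparameter as a categorical choice among the listed values; it is the GridSampler passed in the Grid Search cell that makes Optuna visit each combination once, so we can systematically see which combination gives the best performance.

The function hp_space_random(trial) is used for Random Search. Instead of choosing from a fixed list, the learning rate is sampled as a float between 2e-5 and 5e-5 on a logarithmic scale, and weight decay is sampled uniformly between 0.0 and 0.1. Random Search can be cheaper than Grid Search because the number of trials is chosen independently of the grid size. Note that the sampling is only purely random if a random sampler (optuna.samplers.RandomSampler) is passed to the search; without a sampler argument, Optuna uses its default TPE sampler, which samples randomly for the first trials and then favors regions that looked promising.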
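The two search spaces can be mimicked in plain Python to see what each one covers. The sketch below is not how Optuna is implemented; it only enumerates the grid with itertools.product and draws a few random points from the same ranges as hp_space_random. The names grid_points, sample_random_point, and rng are chosen for illustration.

```python
import itertools
import math
import random

grid_space = {
    "num_train_epochs": [3],
    "per_device_train_batch_size": [16, 32],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01, 0.05, 0.1],
}

# Grid search: every combination of the listed values
names = list(grid_space)
grid_points = [dict(zip(names, combo)) for combo in itertools.product(*grid_space.values())]
print(len(grid_points))  # 1 * 2 * 3 * 4 combinations

def sample_random_point(rng: random.Random) -> dict:
    """Draw one configuration from the continuous ranges."""
    # Log-uniform: sample uniformly in log space, then exponentiate
    log_lr = rng.uniform(math.log(2e-5), math.log(5e-5))
    return {
        "num_train_epochs": rng.choice(grid_space["num_train_epochs"]),
        "per_device_train_batch_size": rng.choice(grid_space["per_device_train_batch_size"]),
        "learning_rate": math.exp(log_lr),
        "weight_decay": rng.uniform(0.0, 0.1),
    }

rng = random.Random(42)
random_points = [sample_random_point(rng) for _ in range(5)]
for point in random_points:
    print(point)
```

The grid always contains 24 points, while the random draws can land anywhere inside the ranges, including values between the grid entries.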





---



**CREATE TRAINER**

In [17]:
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In this part of the code, we create the Trainer, which is the main tool that handles training and evaluating our model. We tell it to use a fresh DistilBERT model for each trial by passing model_init=model_init, so every hyperparameter combination starts from the same base. The training settings we defined earlier, like the number of epochs, evaluation strategy, and which metric to use, are provided through args=training_args.

The tokenized datasets for training and evaluation are given to train_dataset and eval_dataset. We also pass the tokenizer, so the Trainer can pad each batch to a common length and save the tokenizer together with the model checkpoints. Finally, compute_metrics=compute_metrics tells the Trainer how to calculate performance measures like accuracy and F1-score. Together, this setup allows the Trainer to train the model, evaluate it, and report the results for each hyperparameter trial.



---



**LOG ALL TRIALS**

In [18]:
trial_results = []

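def logging_callback(trial):
    # Evaluate the Trainer's current model on the evaluation set and store the metrics.
    # This function is not registered with hyperparameter_search, so it must be called manually.
    result = trainer.evaluate()
    trial_results.append(result)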

In this part, we prepare a way to record the results of each trial during hyperparameter search. We start with an empty list called trial_results that will store the evaluation metrics. Then, we define a function called logging_callback: when called, it evaluates the Trainer’s current model on the evaluation set and appends the results (loss, accuracy, F1-score, runtime, and throughput) to trial_results. Note that the function is not passed to hyperparameter_search in the cells shown here, so it does not run automatically after each trial; trial_results stays empty unless logging_callback is called explicitly.



---



**GRID SEARCH**

In [19]:
print("--- Starting Grid Search ---")
best_grid = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space_grid,
    n_trials=len(grid_space["num_train_epochs"])*
             len(grid_space["per_device_train_batch_size"])*
             len(grid_space["learning_rate"])*
             len(grid_space["weight_decay"]),
    sampler=optuna.samplers.GridSampler(grid_space)
)

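print("\nBest Grid Search Trial:")
print(best_grid.hyperparameters)
# No compute_objective is passed above, so .objective is the Trainer's default
# objective (sum of the eval metrics, here accuracy + F1), not F1 alone.
print(f"F1 Score: {best_grid.objective:.4f}")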

[I 2025-11-08 09:56:38,195] A new study created in memory with name: no-name-df5e7f5f-9636-4a5f-b1c1-509a2143b87b
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


--- Starting Grid Search ---


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.220644,0.892758,0.936311
2,No log,0.15177,0.927577,0.954225
3,0.185000,0.172474,0.931755,0.95698


[I 2025-11-08 09:59:52,948] Trial 0 finished with value: 1.888734681499923 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 3e-05, 'weight_decay': 0.1}. Best is trial 0 with value: 1.888734681499923.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.220332,0.892758,0.936311
2,No log,0.151897,0.927577,0.954145
3,0.184900,0.17416,0.931755,0.95698


[I 2025-11-08 10:03:51,924] Trial 1 finished with value: 1.888734681499923 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 3e-05, 'weight_decay': 0.05}. Best is trial 0 with value: 1.888734681499923.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.169011,0.922006,0.950877
2,No log,0.147254,0.931755,0.957055
3,No log,0.183161,0.928969,0.956072


[I 2025-11-08 10:07:51,889] Trial 2 finished with value: 1.885041710752665 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 5e-05, 'weight_decay': 0.05}. Best is trial 0 with value: 1.888734681499923.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.179783,0.915042,0.946161
2,No log,0.169789,0.930362,0.956672
3,No log,0.166697,0.928969,0.955536


[I 2025-11-08 10:11:20,987] Trial 3 finished with value: 1.8845055406741094 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 3e-05, 'weight_decay': 0.05}. Best is trial 0 with value: 1.888734681499923.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.173404,0.927577,0.954386
2,No log,0.151853,0.928969,0.955459
3,No log,0.189526,0.926184,0.954271


[I 2025-11-08 10:14:56,177] Trial 4 finished with value: 1.8804547672208056 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 5e-05, 'weight_decay': 0.0}. Best is trial 0 with value: 1.888734681499923.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.256622,0.877437,0.927869


[I 2025-11-08 10:15:32,875] Trial 5 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.17943,0.916435,0.94709


[I 2025-11-08 10:16:06,846] Trial 6 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.169936,0.922006,0.950877
2,No log,0.148411,0.928969,0.955224


[I 2025-11-08 10:17:41,063] Trial 7 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.184099,0.895543,0.937343


[I 2025-11-08 10:18:17,205] Trial 8 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.219458,0.891365,0.935537


[I 2025-11-08 10:18:52,975] Trial 9 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.244927,0.891365,0.934454


[I 2025-11-08 10:19:26,959] Trial 10 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.17209,0.913649,0.947547


[I 2025-11-08 10:20:02,892] Trial 11 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.24545,0.891365,0.934564


[I 2025-11-08 10:20:36,807] Trial 12 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.259104,0.877437,0.927987


[I 2025-11-08 10:21:12,960] Trial 13 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.245669,0.891365,0.934454


[I 2025-11-08 10:21:46,838] Trial 14 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.189844,0.89415,0.936772


[I 2025-11-08 10:22:22,853] Trial 15 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.219764,0.892758,0.936311


[I 2025-11-08 10:22:58,900] Trial 16 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.245335,0.892758,0.935348


[I 2025-11-08 10:23:32,676] Trial 17 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.203568,0.873259,0.925349


[I 2025-11-08 10:24:08,738] Trial 18 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.257227,0.876045,0.927109


[I 2025-11-08 10:24:44,704] Trial 19 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.259027,0.876045,0.927109


[I 2025-11-08 10:25:20,743] Trial 20 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.173048,0.927577,0.954306
2,No log,0.150956,0.927577,0.954545


[I 2025-11-08 10:26:39,141] Trial 21 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.179704,0.915042,0.946161


[I 2025-11-08 10:27:13,254] Trial 22 pruned. 


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.179504,0.916435,0.94709


[I 2025-11-08 10:27:46,831] Trial 23 pruned. 



Best Grid Search Trial:
{'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 3e-05, 'weight_decay': 0.1}
F1 Score: 1.8887


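In this part, we perform the Grid Search to find the best combination of hyperparameters. We call trainer.hyperparameter_search() with direction="maximize" and Optuna as the backend, provide the function hp_space_grid to define which hyperparameter values to test, and set n_trials to the total number of possible combinations (1 × 2 × 3 × 4 = 24). GridSampler makes Optuna draw each combination once. After the search finishes, we print the hyperparameters of the best trial and its objective value.

Two details of the output matter when reading the result. First, the value printed as “F1 Score” is not an F1-score, which cannot exceed 1. Because no compute_objective function is passed, the Trainer optimizes its default objective, the sum of the evaluation metrics returned by compute_metrics. For the best trial this is 0.931755 (accuracy) + 0.956980 (F1) ≈ 1.8887, taken from the final epoch. Second, only trials 0 to 4 ran for all three epochs. The other 19 trials were stopped early (“pruned”) by Optuna’s default median pruner, so most combinations were compared after one or two epochs only. Trials 0 and 1 (weight decay 0.1 and 0.05) reached the same objective value, and the first of them is reported as the best.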
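The best value reported above, 1.8887, matches the sum of that trial’s final-epoch accuracy and F1-score. The following sketch shows one way the search could be pointed at F1 alone. summed_objective is only an approximation of the Trainer’s default behaviour written for illustration, f1_objective is a hypothetical name, and the metric values are copied from the final epoch of the best recorded trial.

```python
from typing import Dict

def summed_objective(metrics: Dict[str, float]) -> float:
    """Approximate the default objective: sum of eval metrics, ignoring
    the loss, the epoch counter, and timing statistics."""
    skip_suffixes = ("_runtime", "_per_second")
    kept = {
        key: value
        for key, value in metrics.items()
        if key not in ("eval_loss", "epoch") and not key.endswith(skip_suffixes)
    }
    return sum(kept.values())

def f1_objective(metrics: Dict[str, float]) -> float:
    """Use only the F1-score as the value to maximize."""
    return metrics["eval_f1"]

# Final-epoch metrics of the best grid-search trial, as shown in the log
final_metrics = {
    "eval_loss": 0.172474,
    "eval_accuracy": 0.931755,
    "eval_f1": 0.95698,
    "epoch": 3.0,
}

print(round(summed_objective(final_metrics), 4))  # accuracy + F1
print(f1_objective(final_metrics))                # F1 only
```

Passing compute_objective=f1_objective to trainer.hyperparameter_search(...) would make the search rank trials by F1-score only, and the printed objective would then stay between 0 and 1.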



---



**RANDOM SEARCH**

In [21]:
trial_results_random = []

trainer_random = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

print("\n--- Starting Random Search ---")
best_random = trainer_random.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space_random,
    n_trials=20
)

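print("\nBest Random Search Trial:")
print(best_random.hyperparameters)
# No compute_objective is passed above, so .objective is the Trainer's default
# objective (sum of the eval metrics, here accuracy + F1), not F1 alone.
print(f"F1 Score: {best_random.objective:.4f}")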

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[I 2025-11-08 10:50:44,953] A new study created in memory with name: no-name-8815c41b-3d25-4ac5-b823-8a0981a99fca



--- Starting Random Search ---


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.171165,0.924791,0.95288
2,No log,0.146467,0.928969,0.955536
3,No log,0.182124,0.924791,0.953448


[I 2025-11-08 10:52:51,306] Trial 0 finished with value: 1.8782393622130438 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 4.5433573501200675e-05, 'weight_decay': 0.08840262589694615}. Best is trial 0 with value: 1.8782393622130438.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.254052,0.876045,0.927109
2,No log,0.16018,0.927577,0.954386
3,0.195800,0.173729,0.928969,0.955536


[I 2025-11-08 10:55:28,035] Trial 1 finished with value: 1.8845055406741094 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 2.5953470783493583e-05, 'weight_decay': 0.0154682134658093}. Best is trial 1 with value: 1.8845055406741094.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.246204,0.87883,0.92863
2,No log,0.155416,0.927577,0.954145
3,0.202900,0.162812,0.930362,0.95637


[I 2025-11-08 10:57:59,220] Trial 2 finished with value: 1.8867320995396364 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 2.1556486865179342e-05, 'weight_decay': 0.08079542877716589}. Best is trial 2 with value: 1.8867320995396364.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.202115,0.906685,0.943839
2,No log,0.156515,0.931755,0.956752
3,0.178900,0.187323,0.930362,0.95614


[I 2025-11-08 11:00:16,766] Trial 3 finished with value: 1.8865024678688365 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 3.326759504571375e-05, 'weight_decay': 0.05474361764114044}. Best is trial 2 with value: 1.8867320995396364.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.199021,0.898329,0.939116
2,No log,0.15159,0.931755,0.956828
3,0.177400,0.179581,0.933148,0.957968


[I 2025-11-08 11:02:37,576] Trial 4 finished with value: 1.8911161086692458 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 4.141297438099738e-05, 'weight_decay': 0.04981690004987138}. Best is trial 4 with value: 1.8911161086692458.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.208367,0.910864,0.945205


[I 2025-11-08 11:03:09,892] Trial 5 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.215001,0.89415,0.936982


[I 2025-11-08 11:03:44,921] Trial 6 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.254617,0.876045,0.927109


[I 2025-11-08 11:04:20,454] Trial 7 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.178202,0.927577,0.954861
2,No log,0.159235,0.93454,0.959378
3,No log,0.168238,0.938719,0.961739


[I 2025-11-08 11:06:38,561] Trial 8 finished with value: 1.900457793387429 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 3.866291655245945e-05, 'weight_decay': 0.08237160058049127}. Best is trial 8 with value: 1.900457793387429.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.248378,0.881616,0.930271


[I 2025-11-08 11:07:13,510] Trial 9 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.178687,0.920613,0.950904


[I 2025-11-08 11:07:46,482] Trial 10 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.168937,0.922006,0.950442


[I 2025-11-08 11:08:19,797] Trial 11 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.180766,0.917827,0.949269


[I 2025-11-08 11:08:53,457] Trial 12 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.183943,0.920613,0.950989


[I 2025-11-08 11:09:27,194] Trial 13 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.205523,0.905292,0.942953
2,No log,0.150012,0.933148,0.957522
3,0.179300,0.179287,0.933148,0.957821


[I 2025-11-08 11:12:05,475] Trial 14 finished with value: 1.8909683704490603 and parameters: {'num_train_epochs': 3, 'per_device_train_batch_size': 16, 'learning_rate': 3.34642873631847e-05, 'weight_decay': 0.06468499081801753}. Best is trial 8 with value: 1.900457793387429.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.175288,0.924791,0.952962


[I 2025-11-08 11:12:37,622] Trial 15 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.179586,0.923398,0.952463


[I 2025-11-08 11:13:10,407] Trial 16 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.217418,0.896936,0.938538


[I 2025-11-08 11:13:45,703] Trial 17 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.174222,0.926184,0.953793


[I 2025-11-08 11:14:18,781] Trial 18 pruned. 
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,0.202927,0.885794,0.93178


[I 2025-11-08 11:14:54,210] Trial 19 pruned. 



Best Random Search Trial:
{'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'learning_rate': 3.866291655245945e-05, 'weight_decay': 0.08237160058049127}
F1 Score: 1.9005


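In this part, we run the Random Search, which explores hyperparameters by sampling values at random from predefined ranges. We first create an empty list, trial_results_random, to store the results of each trial. Then we set up a new Trainer, similar to before, with a fresh model, the training and evaluation datasets, the tokenizer, and the metric function, and we print a message to indicate that the Random Search is starting.

Using trainer_random.hyperparameter_search(), we tell the Trainer to maximize the objective, and we pass the hp_space_random function so that hyperparameter values are chosen randomly within the specified ranges. We run 20 trials. In the log above, 13 of these trials were pruned after their first epoch, so only the more promising combinations were trained for all 3 epochs.

After the search finishes, we print the hyperparameters of the best trial (trial 8: batch size 32, learning rate of about 3.87e-05, weight decay of about 0.082) and its objective value. Note that the value printed as "F1 Score" (1.9005) is greater than 1, so it cannot be an F1-score on its own. It equals the sum of trial 8's final accuracy and F1 (0.9387 + 0.9617), and the values reported for the other finished trials match the same sum. The best trial's actual F1-score on the validation set is 0.9617.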
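The objective values in the log can be checked by hand. The sketch below uses trial 8's final-epoch numbers from the output above and compares two possible objectives: the sum of all non-loss metrics, which reproduces the logged value, and F1 alone. The key names eval_accuracy and eval_f1 are assumptions about how the metric function names its outputs, and f1_objective is a hypothetical helper that is not part of the original notebook.

```python
# Final-epoch validation metrics of trial 8, copied from the log above.
# The "eval_" key names are assumed, not taken from the notebook's code.
trial_8_metrics = {
    "eval_loss": 0.168238,
    "eval_accuracy": 0.938719,
    "eval_f1": 0.961739,
}


def summed_objective(metrics):
    """Sum every metric except the loss (reproduces the logged trial value)."""
    return sum(value for key, value in metrics.items() if key != "eval_loss")


def f1_objective(metrics):
    """Hypothetical helper: return only F1, so a search would rank trials by F1."""
    return metrics["eval_f1"]


print(f"accuracy + F1: {summed_objective(trial_8_metrics):.4f}")  # 1.9005
print(f"F1 only:       {f1_objective(trial_8_metrics):.4f}")      # 0.9617

# Possible usage with the Trainer from this notebook (not executed here):
# best = trainer_random.hyperparameter_search(
#     direction="maximize",
#     backend="optuna",
#     hp_space=hp_space_random,
#     n_trials=20,
#     compute_objective=f1_objective,
# )
```

Passing a function like this through the compute_objective argument of hyperparameter_search() should make the search rank trials by F1 only. Since accuracy and F1 moved together in these runs, the best trial would probably be the same here, but the reported value would then be a true F1-score.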



---

