**SETUP**

In [1]:
!pip install -q transformers datasets torch scikit-learn optuna evaluate

import pandas as pd
import torch, numpy as np, random
from sklearn.model_selection import train_test_split
from transformers import DistilBertTokenizerFast, AutoModelForSequenceClassification, Trainer, TrainingArguments
import optuna
import evaluate
from datasets import Dataset

In this part of the code, we are setting up the tools and libraries we need for our experiment. First, we install the necessary Python packages like Transformers for working with the DistilBERT model, Datasets for handling the dataset, Torch for running the deep learning model on CPU or GPU, scikit-learn for data splitting, Optuna for automated hyperparameter search, and Evaluate for measuring the model’s performance.

Next, we import these packages into our notebook. Pandas is used to load and handle our CSV dataset, NumPy and random help with calculations and ensuring reproducibility, and train_test_split from scikit-learn is used to divide the data into training and evaluation sets.

Finally, we import DistilBertTokenizerFast to convert text into tokens that the model can understand, AutoModelForSequenceClassification to load the pre-trained DistilBERT model for classification tasks, Trainer and TrainingArguments to handle model training and evaluation, and Dataset from Hugging Face to manage our data in a format suitable for the model.



---



In [2]:
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

Device: cuda


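Here, we set a seed value to make our results reproducible, so that repeated runs start from the same random state. We set this seed for Python’s random, NumPy, and PyTorch. Exact run-to-run equality is still not guaranteed on a GPU, because some CUDA operations are nondeterministic. Then, we check if a GPU is available and store the result in device; the printed output shows that a GPU (cuda) was found for this run. Using a GPU makes training much faster, but if it’s not available, the code will use the CPU instead. The device variable is only used for this printout, since the Trainer selects the device on its own.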
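The seeding steps above can be collected into one helper. The sketch below is a minimal version under assumptions: seed_everything is a hypothetical name that does not appear in the notebook, NumPy and PyTorch are seeded only if they are installed, and the two cuDNN flags are an optional extra that trades speed for more deterministic GPU kernels.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, so there is nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU generator and the CUDA generators
        # Optional: prefer deterministic cuDNN kernels (can slow training down)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch is not installed


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed gives the same draws
```

Calling the helper once at the top of the notebook, before any data split or model creation, keeps all later random choices tied to the same seed.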



---



**LOAD DATASET**

In [3]:
from google.colab import files
uploaded = files.upload()

Saving amazon_eco-friendly_products.csv to amazon_eco-friendly_products.csv


This cell uses files.upload() from google.colab to upload amazon_eco-friendly_products.csv from the local machine into the Colab session, so that the next cell can read it with Pandas.



---



In [4]:
df = pd.read_csv("amazon_eco-friendly_products.csv")
df["text"] = df["title"].astype(str) + " " + df["material"].astype(str) + " " + df["description"].astype(str)
eco_terms = ["eco", "recycl", "sustain", "biodegrad", "organic", "green"]
df["label"] = df["text"].str.contains("|".join(eco_terms), case=False, regex=True).astype(int)
train_texts, eval_texts, train_labels, eval_labels = train_test_split(df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=SEED)

We load our dataset using Pandas. Since the dataset has different text columns (title, material, description), we combine them into one column called text so the model can read it as a single input. Then, we create a simple label: if the text contains any of the terms “eco”, “recycl”, “sustain”, “biodegrad”, “organic”, or “green”, it is labeled as 1 (eco-friendly), otherwise 0. Matching is case-insensitive and works on substrings, so a term also fires when it appears inside a longer word. Finally, we split the dataset into training and evaluation sets, keeping 80% for training and 20% for evaluation. Passing stratify=df["label"] keeps the proportion of each label the same in both splits.
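Because the keyword rule matches substrings, it can label products as eco-friendly for unrelated words. The sketch below reproduces the notebook's pattern with Python's re module and compares it with a stricter variant. The product titles are made up for illustration, and prefix_pattern (a term must start at a word boundary) is an assumption, not part of the original notebook.

```python
import re

eco_terms = ["eco", "recycl", "sustain", "biodegrad", "organic", "green"]

# Same pattern as the notebook: a term may match anywhere inside a word
substring_pattern = re.compile("|".join(eco_terms), re.IGNORECASE)
# Hypothetical stricter variant: a term must start at a word boundary
prefix_pattern = re.compile(r"\b(?:" + "|".join(eco_terms) + ")", re.IGNORECASE)


def label(text, pattern):
    """Return 1 if the pattern is found in the text, otherwise 0."""
    return int(pattern.search(text) is not None)


samples = [
    "Recycled paper notebook",       # intended positive
    "Home decor ceramic vase",       # "decor" contains "eco"
    "Evergreen scented candle",      # "Evergreen" contains "green"
    "Stainless steel water bottle",  # no eco term at all
]

for text in samples:
    print(text, label(text, substring_pattern), label(text, prefix_pattern))
```

The stricter pattern still accepts words such as "economy", so it reduces false positives rather than removing them.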



---



**TOKENIZATION**

In [5]:
MODEL_NAME = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding=True, max_length=128)

train_ds = Dataset.from_pandas(pd.DataFrame({"text": train_texts, "labels": train_labels}))
eval_ds  = Dataset.from_pandas(pd.DataFrame({"text": eval_texts, "labels": eval_labels}))
tokenized_train = train_ds.map(tokenize, batched=True)
tokenized_eval  = eval_ds.map(tokenize, batched=True)


Map:   0%|          | 0/2869 [00:00<?, ? examples/s]

Map:   0%|          | 0/718 [00:00<?, ? examples/s]

Since BERT models can’t read plain text directly, we use a tokenizer to convert the text into integer token IDs that the model can understand. Here, we use the pre-trained DistilBERT tokenizer that matches the model. We then wrap the training and evaluation splits as Hugging Face Dataset objects and apply the tokenizer to them in batches. Sequences longer than 128 tokens are cut off (truncation=True, max_length=128), and padding=True pads shorter sequences to the longest sequence in each batch that map passes to the tokenizer, so lengths are equal within a batch but can differ between batches.
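To make truncation and pad-to-longest concrete, here is a toy version in plain Python. It is a sketch under assumptions, not the tokenizer's implementation: pad_and_truncate is a hypothetical function, the token IDs are made up, and the real tokenizer additionally keeps the closing special token when it truncates.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Toy truncation plus pad-to-longest for a single batch of token ID lists."""
    # Cut every sequence down to at most max_length tokens
    truncated = [ids[:max_length] for ids in token_ids]
    # Pad to the longest sequence in this batch, not to max_length
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [[101, 7, 8, 102], [101, 7, 8, 9, 10, 11, 102], [101, 102]]
out = pad_and_truncate(batch, max_length=5)
for ids, mask in zip(out["input_ids"], out["attention_mask"]):
    print(ids, mask)
```

The attention mask is what lets the model ignore the padded positions, which is why the tokenizer returns it alongside the input IDs.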



---



**METRICS**

In [6]:
f1_metric = evaluate.load("f1")
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    f1 = f1_metric.compute(predictions=preds, references=labels, average="binary")["f1"]
    acc = accuracy_metric.compute(predictions=preds, references=labels)["accuracy"]
    return {"accuracy": acc, "f1": f1}


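We define the metrics to measure how well the model is doing. We use accuracy, the fraction of predictions that are correct, and F1-score, which balances precision and recall and is useful when labels are not perfectly balanced. With average="binary", F1 is computed for the positive class (label 1, eco-friendly). The compute_metrics function receives the model’s logits and the true labels, turns the logits into predicted classes with argmax, and calculates both values for evaluation.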
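The same two metrics can be worked out by hand, which helps when checking what compute_metrics returns. The following is a minimal pure-Python sketch; binary_metrics is a hypothetical helper and the logits and labels are made-up toy values, not outputs of the model.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def binary_metrics(logits, labels):
    """Accuracy and F1 for the positive class (label 1)."""
    preds = [argmax(row) for row in logits]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy logits: one row per example, columns are the scores for class 0 and class 1
logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4], [-0.5, 0.9], [0.8, 0.1]]
labels = [1, 0, 0, 1, 1, 0]

result = binary_metrics(logits, labels)
print(result)  # accuracy = 4/6, f1 = 2/3 (2 true positives, 1 false positive, 1 false negative)
```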



---



**MODEL INITIALIZER**

In [7]:
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

Here, we create a function that loads a fresh model for every trial in the hyperparameter search. It loads the pre-trained DistilBERT weights and adds a new, randomly initialized classification head with 2 labels (eco-friendly or not). Using a fresh model ensures that each trial starts from the same pre-trained base without inheriting fine-tuned weights from previous runs.



---



**TRAINING ARGUMENTS**

In [8]:
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,  # placeholder, will be overridden
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    report_to="none",
)

In this part, we set up the training configuration for the model using TrainingArguments. We specify where to save the results with output_dir, and set a default number of training epochs to 3 (this is just a placeholder because the hyperparameter search will override it later).

We also define how the model will be evaluated: per_device_eval_batch_size=16 sets the batch size for evaluation, and eval_strategy="epoch" means the model will be evaluated at the end of each training epoch. save_strategy="epoch" tells the Trainer to save the model at the end of each epoch.

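The argument load_best_model_at_end=True ensures that after training, the Trainer reloads the checkpoint that scored best on the chosen metric, which in this case is F1-score (metric_for_best_model="f1"). This is why the evaluation and save strategies are both set to "epoch": a checkpoint can only be selected if it was saved at the same point it was evaluated. Finally, report_to="none" disables logging to external tools like Weights & Biases, keeping the training output simple.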
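The selection rule behind load_best_model_at_end can be sketched in a few lines. The per-epoch numbers below are copied from the training log of the fixed-hyperparameter run later in this notebook; best_epoch is a hypothetical helper that only mimics the comparison, not how the Trainer tracks checkpoints internally.

```python
# Validation metrics per epoch, copied from the training log later in this notebook
epoch_metrics = [
    {"epoch": 1, "eval_loss": 0.194806, "eval_f1": 0.947183},
    {"epoch": 2, "eval_loss": 0.153916, "eval_f1": 0.953953},
    {"epoch": 3, "eval_loss": 0.164226, "eval_f1": 0.955357},
    {"epoch": 4, "eval_loss": 0.187607, "eval_f1": 0.958517},
    {"epoch": 5, "eval_loss": 0.224945, "eval_f1": 0.954939},
    {"epoch": 6, "eval_loss": 0.226860, "eval_f1": 0.958042},
]


def best_epoch(history, metric, greater_is_better):
    """Pick the epoch whose metric is highest (or lowest) in the history."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[metric])["epoch"]


by_f1 = best_epoch(epoch_metrics, "eval_f1", greater_is_better=True)
by_loss = best_epoch(epoch_metrics, "eval_loss", greater_is_better=False)
print(by_f1, by_loss)  # 4 2
```

The two criteria disagree here: F1 favours epoch 4 while validation loss favours epoch 2, so the choice of metric_for_best_model decides which checkpoint is kept.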



---



**HYPERPARAMETER SPACE**

In [9]:
grid_space = {
    "num_train_epochs": [3],
    "per_device_train_batch_size": [16,32],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01, 0.05, 0.1]
}

def hp_space_grid(trial):
    return {
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", grid_space["num_train_epochs"]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", grid_space["per_device_train_batch_size"]),
        "learning_rate": trial.suggest_categorical("learning_rate", grid_space["learning_rate"]),
        "weight_decay": trial.suggest_categorical("weight_decay", grid_space["weight_decay"]),
    }

def hp_space_random(trial):
    return {
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", grid_space["num_train_epochs"]),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", grid_space["per_device_train_batch_size"]),
        "learning_rate": trial.suggest_float("learning_rate", 2e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
    }

In this block, we define the hyperparameter search space for both Grid Search and Random Search.

The variable grid_space lists the possible values for each hyperparameter we want to test:

* num_train_epochs: the number of times the model will go through the full training data (here fixed at 3).
* per_device_train_batch_size: the batch sizes we want to try (16 or 32).
* learning_rate: the learning rates to test (2e-5, 3e-5, 5e-5).
* weight_decay: regularization values to test (0.0, 0.01, 0.05, 0.1).

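The function hp_space_grid(trial) is meant for Grid Search. It restricts every hyperparameter to the discrete values in grid_space through trial.suggest_categorical. The function only defines the choices; which combinations actually get tried depends on the Optuna sampler used for the study. To visit all 24 combinations (1 × 2 × 3 × 4) systematically, the search would need a grid sampler such as optuna.samplers.GridSampler. The search call itself is not shown in this notebook.

The function hp_space_random(trial) is meant for Random Search. The number of epochs and the batch size are still picked from grid_space, but the learning rate is sampled as a float between 2e-5 and 5e-5 on a log scale (log=True), and weight decay is sampled uniformly between 0.0 and 0.1. Random Search is cheaper than Grid Search only when it runs fewer trials than the grid has combinations, because its cost is set by the number of trials rather than the size of the space. Purely random sampling also depends on the sampler (for example optuna.samplers.RandomSampler), since Optuna’s default sampler is TPE.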
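The difference between the two strategies can be shown without Optuna. The sketch below is plain Python under assumptions: it enumerates the grid with itertools.product and draws random configurations with the standard random module, and sample_random is a hypothetical stand-in for what a random sampler would do with hp_space_random.

```python
import itertools
import math
import random

grid_space = {
    "num_train_epochs": [3],
    "per_device_train_batch_size": [16, 32],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01, 0.05, 0.1],
}

# Grid search: every combination of the listed values
names = list(grid_space)
grid = [dict(zip(names, values)) for values in itertools.product(*grid_space.values())]
print(len(grid))  # 24 = 1 * 2 * 3 * 4


def sample_random(rng):
    """Draw one configuration; the learning rate is uniform on a log scale."""
    log_lr = rng.uniform(math.log(2e-5), math.log(5e-5))
    return {
        "num_train_epochs": rng.choice(grid_space["num_train_epochs"]),
        "per_device_train_batch_size": rng.choice(grid_space["per_device_train_batch_size"]),
        "learning_rate": math.exp(log_lr),
        "weight_decay": rng.uniform(0.0, 0.1),
    }


# Random search: the budget is the number of trials, chosen freely (8 here)
rng = random.Random(42)
random_trials = [sample_random(rng) for _ in range(8)]
for config in random_trials[:3]:
    print(config)
```

With 8 random trials the search costs a third of the 24-trial grid, and it can land on learning rates and weight decays that lie between the grid points.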





---



**CREATE TRAINER**

In [10]:
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In this part of the code, we create the Trainer, which is the main tool that handles training and evaluating our model. We tell it to use a fresh DistilBERT model for each trial by passing model_init=model_init, so every hyperparameter combination starts from the same base. The training settings we defined earlier, like the number of epochs, evaluation strategy, and which metric to use, are provided through args=training_args.

The tokenized datasets for training and evaluation are given to train_dataset and eval_dataset. We also pass the tokenizer, which lets the Trainer pad each batch to a common length and save the tokenizer together with the model. Finally, compute_metrics=compute_metrics tells the Trainer how to calculate performance measures like accuracy and F1-score. Together, this setup allows the Trainer to train the model, evaluate it, and report the results for each hyperparameter trial.



---



**LOG ALL TRIALS**

In [11]:
trial_results = []

def logging_callback(trial):
    # Evaluate the model currently held by the Trainer on the evaluation set
    # and store the returned metrics dict (its keys are prefixed with "eval_")
    result = trainer.evaluate()
    trial_results.append(result)

In this part, we create a way to record the results of each trial during hyperparameter search. We start by making an empty list called trial_results that will store the evaluation metrics. Then, we define a function called logging_callback that is intended to run after each trial. Inside this function, trainer.evaluate() scores the model that the Trainer currently holds on the evaluation set, and the results (loss, accuracy, F1-score, runtime, and speed) are added to the trial_results list. This allows us to keep track of all the trial performances so we can later analyze and compare the different hyperparameter combinations.
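Once trial_results has been filled, comparing trials is a matter of sorting the list. The sketch below shows one way to do that. The metric values are placeholders invented for illustration and are not results from this notebook; only the "eval_" key prefix reflects what trainer.evaluate() returns.

```python
# Placeholder values for illustration only; these are NOT results from this notebook
trial_results = [
    {"eval_loss": 0.21, "eval_accuracy": 0.91, "eval_f1": 0.94},
    {"eval_loss": 0.17, "eval_accuracy": 0.93, "eval_f1": 0.95},
    {"eval_loss": 0.19, "eval_accuracy": 0.92, "eval_f1": 0.96},
]

# Rank trials by F1, highest first, while remembering each trial's position
ranked = sorted(enumerate(trial_results), key=lambda item: item[1]["eval_f1"], reverse=True)

for trial_index, metrics in ranked:
    print(f"trial {trial_index}: f1={metrics['eval_f1']:.2f} loss={metrics['eval_loss']:.2f}")

best_index, best_metrics = ranked[0]
```

Storing the hyperparameters next to each metrics dict would make the ranking easier to interpret, since the metrics alone do not say which configuration produced them.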



---



In [12]:
from transformers import AutoModelForSequenceClassification

MODELNAME = "distilbert-base-uncased"

model = AutoModelForSequenceClassification.from_pretrained(MODELNAME, num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


This code loads a pre-trained model from Hugging Face and prepares it for our text classification project. First, it imports AutoModelForSequenceClassification, the class for models that classify sequences such as sentences or paragraphs. Then, we set MODELNAME to “distilbert-base-uncased”, the same checkpoint as MODEL_NAME defined earlier, which we chose because it’s lightweight, fast, and commonly used in NLP tasks. After that, we create the actual model and set num_labels=2 because our project only needs two output classes. In simple terms, this code takes a ready-made DistilBERT model and adds a two-class classification head; as the warning in the output says, that head is newly initialized and has to be trained before the model can make useful predictions.



---



In [13]:
from transformers import TrainingArguments, Trainer
from google.colab import files

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=6,
    per_device_train_batch_size=32,
    learning_rate=3e-5,
    weight_decay=0.0,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_dir='./logs',
    load_best_model_at_end=True,

    push_to_hub=False,
    report_to=[],
    hub_model_id=None,
    hub_token=None
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

# Train the model
trainer.train()

# Save locally only
trainer.save_model("./results")

# Zip and download
import shutil
shutil.make_archive("distilbert_finetuned_model", 'zip', "./results")
files.download("distilbert_finetuned_model.zip")



| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | No log | 0.194806 | 0.916435 | 0.947183 |
| 2 | No log | 0.153916 | 0.926184 | 0.953953 |
| 3 | No log | 0.164226 | 0.930362 | 0.955357 |
| 4 | No log | 0.187607 | 0.93454 | 0.958517 |
| 5 | No log | 0.224945 | 0.927577 | 0.954939 |
| 6 | 0.123200 | 0.22686 | 0.933148 | 0.958042 |

This part of the code is responsible for training our machine learning model and saving it locally, without pushing anything to the Hugging Face Hub and therefore without needing an API key. First, we import the tools needed for training, TrainingArguments and Trainer, which control how the training process works. We also import files so we can download the final output.

Next, we set up the training arguments, which are the settings for how the model will be trained. We specify where the results will be saved, how many epochs the model should train for (6), the batch size (32), the learning rate (3e-5), and the weight decay (0.0). We also tell it to evaluate and save the model every epoch, and to load the best-performing version when training finishes. Because metric_for_best_model is not set in this cell, “best” means the lowest validation loss, which in the log above is epoch 2, not the epoch with the highest F1 (epoch 4). We also turn off the Hugging Face Hub features with push_to_hub=False so the code won’t ask for API keys.

After that, we create the Trainer, which connects our model, the training settings, our training dataset, evaluation dataset, the tokenizer, and the function that calculates accuracy and other metrics. Once everything is set up, we call trainer.train() to start the actual training process. When the training is done, we save the model locally in the results folder. Finally, we create a ZIP file of that folder using shutil.make_archive, and files.download lets us download the ZIP file to our computer. Since the results folder is also the output_dir, the archive contains the per-epoch checkpoints as well as the final model. Overall, this code handles the entire training process and prepares the model for export, making it easy for us to use the fine-tuned model in our project.
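The “No log” entries in the Training Loss column for epochs 1 to 5 can be explained with a little arithmetic. The sketch below assumes the training split has 2869 examples (the count shown by the Map progress line in the tokenization output), that logging_steps was left at its default of 500, and that training ran on a single GPU; the variable names are illustrative.

```python
import math

train_examples = 2869  # training split size from the tokenization output
batch_size = 32        # per_device_train_batch_size in the cell above
epochs = 6             # num_train_epochs in the cell above
logging_steps = 500    # TrainingArguments default, assumed unchanged
num_devices = 1        # assumption: a single GPU

# The last, smaller batch still counts as a step, hence the ceiling
steps_per_epoch = math.ceil(train_examples / (batch_size * num_devices))
total_steps = steps_per_epoch * epochs
# Epoch during which the first training-loss value is logged
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)

print(steps_per_epoch, total_steps, first_logged_epoch)  # 90 540 6
```

Under these assumptions the first training loss is logged at step 500, which falls in epoch 6. Setting a smaller logging_steps, or logging_strategy="epoch", would give a training loss for every row of the table.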



---

