# Fashion Image Classification Using CNN (EfficientNet-B0)

In this notebook, we build a deep learning model to classify fashion images into five categories: casual, formal, streetwear, sporty, and vintage. The dataset consists of images scraped from Reddit and manually labeled using Label Studio.

The workflow includes the following steps:

📦 Installation

Installs required libraries: wandb for experiment tracking and optuna for hyperparameter tuning.


📁 Load and Preprocess Dataset

Reads a CSV file containing image filenames and style labels.

Cleans data by removing missing labels.

Encodes textual labels into numeric format using LabelEncoder.

🧺 Define Custom PyTorch Dataset

Creates a FashionDataset class to load image-label pairs with optional image transformations (here, resizing and conversion to tensors scaled to [0, 1]).


🔁 Apply Stratified K-Fold Cross-Validation

Uses StratifiedKFold to split the dataset into 3 folds that each preserve the overall class proportions.

🧪 Hyperparameter Tuning with Optuna

Defines an objective function for Optuna that:

- Suggests values for batch_size and lr (the learning rate).
- Initializes a wandb run to track experiment metrics.
- Trains and evaluates the model on each fold.
- Returns the weighted F1-score averaged across folds as the value to maximize.

🧠 Fine-Tune EfficientNet

Loads a pre-trained EfficientNet model from torchvision.models.

Replaces the classification head to match the number of classes in the fashion dataset.

Fine-tunes the whole network: the optimizer receives all model parameters, so the pretrained feature extractor is updated together with the new classifier head.

📊 Track Experiments with Weights & Biases

Logs hyperparameters and performance metrics for each Optuna trial using wandb, enabling easy comparison and experiment tracking.

## Installing and Loading the Libraries
These cells install wandb and optuna and import the libraries used throughout the notebook: pandas and NumPy for data handling, scikit-learn for splitting and metrics, and PyTorch with torchvision for the model.

In [1]:
pip install wandb



In [2]:
pip install optuna

Collecting optuna
  Downloading optuna-4.4.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.16.4-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.4.0-py3-none-any.whl (395 kB)
Downloading alembic-1.16.4-py3-none-any.whl (247 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.16.4 colorlog-6.9.0 optuna-4.4.0


In [11]:
import os
import pandas as pd
import numpy as np
from PIL import Image
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import torchvision.models as models

import optuna
import wandb

## Data Loading, Cleaning, and Label Encoding

This cell loads the CSV file containing image paths and style labels, drops rows whose choice label is missing, and reduces each image path to its file name so it can be joined with image_dir later. Labels are encoded as integers using LabelEncoder for compatibility with PyTorch and scikit-learn workflows.

- Purpose: Ensures only valid, labeled images are used for training and evaluation.
- Result: Cleaned lists of image paths and encoded labels ready for dataset creation.

In [4]:
label_file = pd.read_csv('Dataset/labels_file.csv')
label_file_clean = label_file.dropna(subset=['choice'])

image_paths = label_file_clean['image'].apply(os.path.basename).tolist()
labels = label_file_clean['choice'].tolist()

label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)

image_dir = 'Dataset/images/'  # update if different
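
To make the encoding step concrete, here is a small sketch with made-up labels (the `sample_labels` list is illustrative and not taken from the dataset). LabelEncoder stores the class names in sorted order, so the integer codes follow alphabetical order, and `inverse_transform` maps predictions back to style names.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real ones come from the 'choice' column.
sample_labels = ["formal", "casual", "vintage", "casual", "sporty", "streetwear"]

encoder = LabelEncoder()
sample_encoded = encoder.fit_transform(sample_labels)

# Classes are stored in sorted order, so codes follow alphabetical order.
class_names = list(encoder.classes_)
print(class_names)
print(sample_encoded.tolist())

# Map integer predictions back to readable style names.
decoded = encoder.inverse_transform([0, 3]).tolist()
print(decoded)
```

The same `inverse_transform` call can be applied to model predictions when reporting results by style name.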


## FashionDataset Class – Custom PyTorch Dataset
This class is a custom PyTorch Dataset that loads fashion images and their style labels. Each item is read from the global image_dir, converted to RGB, passed through the optional transform, and returned as an (image, label) pair.

In [5]:
# ------------------ Dataset Class ------------------ #
class FashionDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image_path = os.path.join(image_dir, self.image_paths[idx])
        image = Image.open(image_path).convert('RGB')
        label = self.labels[idx]

        if self.transform:
            image = self.transform(image)
        return image, label

## Training and Evaluation Loop
This function performs a single training epoch on the provided dataset and evaluates the model on the validation set. It calculates and returns two key performance metrics: accuracy and weighted F1-score.

- Training Phase:
Optimizes the model using the specified loss function and optimizer.

- Evaluation Phase:
Runs inference on the validation set without updating weights and computes predictions.

- Output:
Returns a tuple: (accuracy, f1_score) representing model performance on the validation set.

In [6]:
# ------------------ Training and Evaluation ------------------ #
def train_model(model, criterion, optimizer, train_loader, val_loader, device):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

    model.eval()
    val_preds, val_labels = [], []
    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device)
            output = model(images)
            preds = torch.argmax(output, 1).cpu().numpy()
            val_preds.extend(preds)
            val_labels.extend(labels.numpy())

    acc = accuracy_score(val_labels, val_preds)
    f1 = f1_score(val_labels, val_preds, average='weighted')
    return acc, f1

## objective – Optuna Objective Function for Hyperparameter Tuning

This function defines the optimization objective for Optuna, guiding the search for the best hyperparameters for fine-tuning an EfficientNet-based image classifier.

- Purpose:
Trains and evaluates a model across multiple Stratified K-Folds using parameters suggested by Optuna (e.g., batch_size, learning_rate).

- Key Features:

  - Performs 3-fold stratified cross-validation to ensure stable performance estimation.

  - Fine-tunes EfficientNet-B0 by replacing its classifier head to fit the number of classes.

  - Uses CrossEntropyLoss and Adam optimizer.

  - Tracks experiments and metrics via Weights & Biases (wandb).

  - Computes and returns the mean F1-score across folds as the objective metric for optimization.
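
As a quick illustration of what stratification does, the sketch below runs the same StratifiedKFold configuration on made-up, imbalanced labels and counts the classes in each validation fold. The toy arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: 60 of class 0, 30 of class 1, 9 of class 2.
toy_labels = np.array([0] * 60 + [1] * 30 + [2] * 9)
toy_items = np.arange(len(toy_labels))  # stand-in for image paths

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
fold_counts = []
for train_idx, val_idx in skf.split(toy_items, toy_labels):
    # Count how many validation samples fall into each class.
    counts = np.bincount(toy_labels[val_idx], minlength=3).tolist()
    fold_counts.append(counts)
    print(len(train_idx), len(val_idx), counts)
```

Each validation fold keeps the 60:30:9 ratio of the full label set, which is the property the cross-validation in this notebook relies on for its small classes.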



In [7]:
# ------------------ Objective Function ------------------ #
def objective(trial):
    wandb.init(project='efficientnet_fashion', reinit=True)

    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
    lr = trial.suggest_float('lr', 1e-5, 1e-3, log=True)
    epochs = 5  # small for testing

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    all_acc, all_f1 = [], []
    for train_idx, val_idx in skf.split(image_paths, encoded_labels):
        train_dataset = FashionDataset([image_paths[i] for i in train_idx], [encoded_labels[i] for i in train_idx], transform)
        val_dataset = FashionDataset([image_paths[i] for i in val_idx], [encoded_labels[i] for i in val_idx], transform)

        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size)

        model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)  # replaces deprecated pretrained=True
        model.classifier[1] = nn.Linear(model.classifier[1].in_features, len(label_encoder.classes_))
        model = model.to(device)

        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for _ in range(epochs):
            acc, f1 = train_model(model, criterion, optimizer, train_loader, val_loader, device)

        all_acc.append(acc)
        all_f1.append(f1)

    mean_acc = np.mean(all_acc)
    mean_f1 = np.mean(all_f1)

    wandb.log({"mean_acc": mean_acc, "mean_f1": mean_f1})
    wandb.finish()

    return mean_f1  # or mean_acc if preferred

## study.optimize – Running Hyperparameter Search with Optuna

This code creates an Optuna study that maximizes the value returned by objective (the mean weighted F1-score across folds) and runs it for 10 trials.

In [8]:
# ------------------ Run Optuna ------------------ #
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)

[I 2025-07-17 18:55:03,996] A new study created in memory with name: no-name-bed9e36b-e60d-4fdb-aaa8-6354c879f2c8

wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: shadiifarzankia (human-value-detection) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 181MB/s]


Run summary: mean_acc 0.37801, mean_f1 0.32279

[I 2025-07-17 19:07:20,390] Trial 0 finished with value: 0.3227910530979076 and parameters: {'batch_size': 64, 'lr': 9.519291997549366e-05}. Best is trial 0 with value: 0.3227910530979076.

Run summary: mean_acc 0.30412, mean_f1 0.27231

[I 2025-07-17 19:18:42,936] Trial 1 finished with value: 0.2723067437377078 and parameters: {'batch_size': 32, 'lr': 1.8783155738797448e-05}. Best is trial 0 with value: 0.3227910530979076.

Run summary: mean_acc 0.25601, mean_f1 0.24116

[I 2025-07-17 19:30:17,966] Trial 2 finished with value: 0.2411634985483565 and parameters: {'batch_size': 32, 'lr': 1.526866638343813e-05}. Best is trial 0 with value: 0.3227910530979076.

Run summary: mean_acc 0.26976, mean_f1 0.25152

[I 2025-07-17 19:42:26,606] Trial 3 finished with value: 0.2515198531068288 and parameters: {'batch_size': 16, 'lr': 1.3150353781876527e-05}. Best is trial 0 with value: 0.3227910530979076.

Run summary: mean_acc 0.23711, mean_f1 0.21364

[I 2025-07-17 19:54:52,651] Trial 4 finished with value: 0.21364327209452857 and parameters: {'batch_size': 64, 'lr': 1.6191304882425097e-05}. Best is trial 0 with value: 0.3227910530979076.

Run summary: mean_acc 0.39175, mean_f1 0.33691

[I 2025-07-17 20:06:34,935] Trial 5 finished with value: 0.33691474202311206 and parameters: {'batch_size': 32, 'lr': 6.589963089628296e-05}. Best is trial 5 with value: 0.33691474202311206.

Run summary: mean_acc 0.41581, mean_f1 0.39628

[I 2025-07-17 20:18:48,603] Trial 6 finished with value: 0.39627961629871455 and parameters: {'batch_size': 16, 'lr': 0.000558917566745144}. Best is trial 6 with value: 0.39627961629871455.

Run summary: mean_acc 0.34192, mean_f1 0.31023

[I 2025-07-17 20:31:16,103] Trial 7 finished with value: 0.31022809277132174 and parameters: {'batch_size': 32, 'lr': 3.0665647676016096e-05}. Best is trial 6 with value: 0.39627961629871455.

Run summary: mean_acc 0.44158, mean_f1 0.40297

[I 2025-07-17 20:43:16,505] Trial 8 finished with value: 0.4029735545985324 and parameters: {'batch_size': 32, 'lr': 0.0002045611035213279}. Best is trial 8 with value: 0.4029735545985324.

Run summary: mean_acc 0.23196, mean_f1 0.22665


[I 2025-07-17 20:55:31,406] Trial 9 finished with value: 0.22665020618385792 and parameters: {'batch_size': 64, 'lr': 1.4181985176484361e-05}. Best is trial 8 with value: 0.4029735545985324.


## Final Model Training and Evaluation on Test Set

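This block retrains the model with the best hyperparameters from the Optuna study and evaluates it on a 20% stratified test split. The split is drawn after the search, and the search cross-validated over the full dataset, so the test images also influenced the choice of hyperparameters; the reported test scores are therefore likely optimistic. The split is also unseeded, so the numbers vary between runs.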

In [9]:
# ------------------ Final Model Evaluation on Test Set ------------------ #
best_params = study.best_params
print("Best Hyperparameters:", best_params)

# Optional: Train on full training set and evaluate on held-out test set
train_img, test_img, train_lbl, test_lbl = train_test_split(image_paths, encoded_labels, test_size=0.2, stratify=encoded_labels)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_dataset = FashionDataset(train_img, train_lbl, transform)
test_dataset = FashionDataset(test_img, test_lbl, transform)

train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=best_params['batch_size'])

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 'pretrained=True' is deprecated in recent torchvision; request the ImageNet weights explicitly
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, len(label_encoder.classes_))
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=best_params['lr'])

# Train for 5 epochs; the metrics train_model returns after each epoch are not used here
for _ in range(5):
    train_model(model, criterion, optimizer, train_loader, test_loader, device)

# Final test evaluation
model.eval()
preds, trues = [], []
with torch.no_grad():
    for imgs, lbls in test_loader:
        imgs = imgs.to(device)
        outputs = model(imgs)
        predictions = torch.argmax(outputs, 1).cpu().numpy()
        preds.extend(predictions)
        trues.extend(lbls.numpy())

test_acc = accuracy_score(trues, preds)
test_f1 = f1_score(trues, preds, average='weighted')

print(f"Test Accuracy: {test_acc:.4f}")
print(f"Test F1 Score: {test_f1:.4f}")

Best Hyperparameters: {'batch_size': 32, 'lr': 0.0002045611035213279}




Test Accuracy: 0.4444
Test F1 Score: 0.4071
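
In the cell above, the test split is created after the Optuna search has already cross-validated over every image. A stricter protocol reserves the test set first and runs the search on the remainder only. The sketch below illustrates that ordering on made-up data; the array names and sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Illustrative data: 100 items with three imbalanced classes.
items = np.arange(100)
targets = np.array([0] * 50 + [1] * 30 + [2] * 20)

# 1) Reserve the test set first, with a fixed seed so the split is repeatable.
dev_items, test_items, dev_targets, test_targets = train_test_split(
    items, targets, test_size=0.2, stratify=targets, random_state=42
)

# 2) Run cross-validation (and the hyperparameter search) on the rest only.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(dev_items, dev_targets):
    fold_val_items = dev_items[val_idx]
    # No test item may appear in any validation (or training) fold.
    assert np.intersect1d(fold_val_items, test_items).size == 0

print(len(dev_items), len(test_items))
```

Applied to this notebook, that would mean calling train_test_split before `study.optimize` and passing only the development portion to the objective.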


In [12]:
target_names = label_encoder.classes_

print(classification_report(trues, preds, target_names=target_names))

              precision    recall  f1-score   support

      casual       0.37      0.60      0.46        30
      formal       0.62      0.42      0.50        24
      sporty       0.00      0.00      0.00        10
  streetwear       0.50      0.62      0.55        37
     vintage       0.17      0.06      0.09        16

    accuracy                           0.44       117
   macro avg       0.33      0.34      0.32       117
weighted avg       0.40      0.44      0.41       117
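
The two average rows can be reproduced by hand from the per-class rows of the report. The sketch below copies the rounded F1 values and supports from the table above; the macro average treats all classes equally, while the weighted average (the metric Optuna optimized) weights each class by its number of test samples.

```python
# Per-class F1 and support copied from the classification report above.
f1_per_class = {"casual": 0.46, "formal": 0.50, "sporty": 0.00,
                "streetwear": 0.55, "vintage": 0.09}
support = {"casual": 30, "formal": 24, "sporty": 10,
           "streetwear": 37, "vintage": 16}

total = sum(support.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# Weighted average: each class is weighted by its number of test samples.
weighted_f1 = sum(f1_per_class[c] * support[c] for c in support) / total

print(total)                  # 117
print(round(macro_f1, 2))     # 0.32
print(round(weighted_f1, 2))  # 0.41
```

The gap between the two averages comes from the small classes: sporty and vintage score near zero but together account for only 26 of the 117 test images.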





In [10]:
torch.save(model, "efficientnet_fashion_full_model_v1.pth")