<a href="https://colab.research.google.com/github/Danjari/ML/blob/main/Model_Architecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install torch torchvision pandas kagglehub




In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("alistairking/recyclable-and-household-waste-classification")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/alistairking/recyclable-and-household-waste-classification?dataset_version_number=1...


100%|██████████| 920M/920M [00:09<00:00, 102MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/alistairking/recyclable-and-household-waste-classification/versions/1


In [None]:
import os

DATA_ROOT = "/root/.cache/kagglehub/datasets/alistairking/recyclable-and-household-waste-classification/versions/1/images/images"

os.listdir(DATA_ROOT)[:10]

['plastic_trash_bags',
 'clothing',
 'steel_food_cans',
 'paper_cups',
 'office_paper',
 'styrofoam_cups',
 'plastic_detergent_bottles',
 'aluminum_food_cans',
 'aerosol_cans',
 'shoes']

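Here we define a mapping from each fine-grained class (the dataset's folder names) to a coarse material category, then index every image in the dataset into a DataFrame that is used for training. Each row records the image's relative path together with three labels: "fine_label", "coarse_label", and "domain" ("default" or "real_world").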

In [None]:
material_map = {
    "aerosol_cans": "metal",
    "aluminum_food_cans": "metal",
    "aluminum_soda_cans": "metal",
    "steel_food_cans": "metal",

    "glass_beverage_bottles": "glass",
    "glass_cosmetic_containers": "glass",
    "glass_food_jars": "glass",

    "plastic_soda_bottles": "plastic",
    "plastic_water_bottles": "plastic",
    "plastic_shopping_bags": "plastic",
    "plastic_food_containers": "plastic",
    "plastic_trash_bags": "plastic",
    "plastic_cup_lids": "plastic",
    "plastic_detergent_bottles": "plastic",
    "plastic_straws": "plastic",
    "disposable_plastic_cutlery": "plastic",

    "cardboard_boxes": "paper",
    "cardboard_packaging": "paper",
    "magazines": "paper",
    "newspaper": "paper",
    "office_paper": "paper",
    "paper_cups": "paper",

    "food_waste": "organic",
    "coffee_grounds": "organic",
    "eggshells": "organic",
    "tea_bags": "organic",

    "clothing": "other",
    "shoes": "other",
    "styrofoam_cups": "styrofoam",
    "styrofoam_food_containers": "styrofoam",
}
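
Because the indexing step looks up every class folder in `material_map`, a folder missing from the map would raise a `KeyError`. A minimal sketch of a coverage check follows; the helper name `check_mapping` and the toy inputs are assumptions made for illustration and are not part of the notebook.

```python
def check_mapping(class_dirs, mapping):
    """Return (unmapped, unused): folders without a material, and mapped names without a folder."""
    dirs = set(class_dirs)
    mapped = set(mapping)
    unmapped = sorted(dirs - mapped)
    unused = sorted(mapped - dirs)
    return unmapped, unused

# Toy inputs (hypothetical); in the notebook one could pass
# os.listdir(DATA_ROOT) and material_map instead.
toy_map = {"aerosol_cans": "metal", "shoes": "other", "newspaper": "paper"}
toy_dirs = ["aerosol_cans", "shoes", "glass_food_jars"]

unmapped, unused = check_mapping(toy_dirs, toy_map)
print(unmapped)  # ['glass_food_jars']
print(unused)    # ['newspaper']
```

Both lists being empty would indicate that the map and the folder names agree.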


In [None]:
import os
import pandas as pd

rows = []

for fine_label in os.listdir(DATA_ROOT):
    fine_path = os.path.join(DATA_ROOT, fine_label)
    if not os.path.isdir(fine_path):
        continue

    coarse_label = material_map[fine_label]

    for domain in ["default", "real_world"]:
        domain_path = os.path.join(fine_path, domain)
        if not os.path.isdir(domain_path):
            continue

        for fname in os.listdir(domain_path):
            # Match on the full extension, including the dot, so names that merely end in "png" are skipped
            if fname.lower().endswith((".jpg", ".jpeg", ".png")):
                rows.append({
                    "file": os.path.join(fine_label, domain, fname),
                    "fine_label": fine_label,
                    "coarse_label": coarse_label,
                    "domain": domain
                })

df = pd.DataFrame(rows)
df.to_csv("labels.csv", index=False)
print("Created labels.csv with", len(df), "samples")
df.head()


Created labels.csv with 15000 samples


   file                                      fine_label          coarse_label  domain
0  plastic_trash_bags/default/Image_219.png  plastic_trash_bags  plastic       default
1  plastic_trash_bags/default/Image_220.png  plastic_trash_bags  plastic       default
2  plastic_trash_bags/default/Image_152.png  plastic_trash_bags  plastic       default
3  plastic_trash_bags/default/Image_21.png   plastic_trash_bags  plastic       default
4  plastic_trash_bags/default/Image_146.png  plastic_trash_bags  plastic       default
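
Since the coarse categories group different numbers of fine classes, it can be useful to count samples per category and domain before training. The sketch below uses `collections.Counter` on a few hypothetical rows shaped like the ones collected above; on the real DataFrame, `df.groupby(["coarse_label", "domain"]).size()` would give the same kind of summary.

```python
from collections import Counter

# Hypothetical rows with the same keys as the notebook's `rows` list.
toy_rows = [
    {"fine_label": "shoes", "coarse_label": "other", "domain": "default"},
    {"fine_label": "shoes", "coarse_label": "other", "domain": "real_world"},
    {"fine_label": "newspaper", "coarse_label": "paper", "domain": "default"},
    {"fine_label": "magazines", "coarse_label": "paper", "domain": "default"},
]

# Samples per (coarse category, domain) pair.
per_pair = Counter((r["coarse_label"], r["domain"]) for r in toy_rows)
# Samples per coarse category alone.
per_coarse = Counter(r["coarse_label"] for r in toy_rows)

for key, n in sorted(per_pair.items()):
    print(key, n)
```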


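Label encoders and dataset class:
The label encoders map each label string, such as "plastic_straws", to an integer index, which is the target format the cross-entropy loss expects.

The dataset class resolves each image path against the image root, loads the image on demand, and returns it together with its fine and coarse label indices.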

In [None]:
fine_classes = sorted(df["fine_label"].unique())
coarse_classes = sorted(df["coarse_label"].unique())

fine_to_idx = {c: i for i, c in enumerate(fine_classes)}
coarse_to_idx = {c: i for i, c in enumerate(coarse_classes)}

print("Fine classes:", len(fine_classes))
print("Coarse classes:", len(coarse_classes))


Fine classes: 30
Coarse classes: 7
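
To turn predicted indices back into readable labels, the encoders can be inverted. The following is a small sketch on a toy label list; the names `idx_to` and `toy_labels` are introduced here for illustration.

```python
toy_labels = ["plastic_straws", "shoes", "aerosol_cans", "shoes"]

# Same construction as in the notebook: sorted unique names, then enumerate.
classes = sorted(set(toy_labels))
to_idx = {c: i for i, c in enumerate(classes)}
# Inverse map, for decoding model predictions.
idx_to = {i: c for c, i in to_idx.items()}

encoded = [to_idx[name] for name in toy_labels]
decoded = [idx_to[i] for i in encoded]

print(encoded)  # [1, 2, 0, 2]
print(decoded == toy_labels)  # True
```

Sorting the class names first keeps the index assignment stable across runs, which matters when a saved model is reloaded later.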


In [None]:
import torch
from torch.utils.data import Dataset
from PIL import Image

class WasteDataset(Dataset):
    def __init__(self, csv_file, img_root, fine_to_idx, coarse_to_idx, transform=None):
        self.data = pd.read_csv(csv_file)
        self.img_root = img_root
        self.fine_to_idx = fine_to_idx
        self.coarse_to_idx = coarse_to_idx
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]

        img_path = os.path.join(self.img_root, row["file"])
        image = Image.open(img_path).convert("RGB")

        if self.transform:
            image = self.transform(image)

        fine = self.fine_to_idx[row["fine_label"]]
        coarse = self.coarse_to_idx[row["coarse_label"]]

        return image, fine, coarse


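Transforms & DataLoaders:

Each image is resized to 224x224, the input size the pretrained torchvision ResNet weights were trained on, then lightly augmented and converted to a tensor. The dataset is split 80/20 into training and validation sets, and the DataLoaders serve them in batches of 32. No `num_workers` is set, so images are loaded in the main process rather than in parallel workers.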

In [None]:
from torchvision import transforms
from torch.utils.data import DataLoader, random_split

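# One transform is shared by both splits, so validation images are also randomly flipped.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(),  # default arguments are all 0, so this is a no-op as written
    transforms.ToTensor(),
])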

dataset = WasteDataset("labels.csv", DATA_ROOT, fine_to_idx, coarse_to_idx, transform)

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size

train_ds, val_ds = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
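
Because `WasteDataset` applies a single `transform`, both splits produced by `random_split` share the same augmentation. One possible way to give each split its own transform is a thin wrapper around the split. The sketch below is an assumption about how that could look, not the notebook's method; `TransformedSubset` is a hypothetical name, and plain strings stand in for images.

```python
class TransformedSubset:
    """Map-style dataset wrapper that applies its own transform to each image."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, idx):
        image, fine, coarse = self.subset[idx]
        if self.transform is not None:
            image = self.transform(image)
        return image, fine, coarse


# Toy demonstration: a list of (image, fine, coarse) tuples stands in for a split.
base = [("img0", 3, 1), ("img1", 7, 2)]
train_view = TransformedSubset(base, transform=lambda im: im + "+aug")
val_view = TransformedSubset(base, transform=lambda im: im + "+plain")

print(train_view[0])  # ('img0+aug', 3, 1)
print(val_view[1])    # ('img1+plain', 7, 2)
```

In the notebook, this would mean building `WasteDataset` with `transform=None`, splitting it, and wrapping the training split with the augmenting pipeline and the validation split with only `Resize` and `ToTensor`. `DataLoader` accepts any object that implements `__len__` and `__getitem__`, so the wrapper can be passed to it directly.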


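The CNN itself: a ResNet-18 backbone pretrained on ImageNet, whose final classification layer is replaced by two linear heads, one for the fine labels and one for the coarse labels.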

In [None]:
import torch.nn as nn
import torchvision.models as models

class MultiTaskModel(nn.Module):
    def __init__(self, num_fine, num_coarse):
        super().__init__()

        self.backbone = models.resnet18(weights="IMAGENET1K_V1")
        in_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()  # remove last layer

        self.fine_head = nn.Linear(in_features, num_fine)
        self.coarse_head = nn.Linear(in_features, num_coarse)

    def forward(self, x):
        feats = self.backbone(x)
        fine_logits = self.fine_head(feats)
        coarse_logits = self.coarse_head(feats)
        return fine_logits, coarse_logits
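
The shared-backbone, two-head layout can be checked without downloading pretrained weights. The sketch below swaps in a tiny stand-in backbone (an assumption made to keep the example light, not the ResNet-18 used above) and confirms the two output shapes; `TwoHeadNet` is a hypothetical name.

```python
import torch
import torch.nn as nn


class TwoHeadNet(nn.Module):
    def __init__(self, feat_dim, num_fine, num_coarse):
        super().__init__()
        # Stand-in backbone (assumption): any module mapping images to feat_dim features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        # Two independent linear heads reading the same feature vector.
        self.fine_head = nn.Linear(feat_dim, num_fine)
        self.coarse_head = nn.Linear(feat_dim, num_coarse)

    def forward(self, x):
        feats = self.backbone(x)
        return self.fine_head(feats), self.coarse_head(feats)


# 512 matches the feature width of ResNet-18; 30 and 7 are the class counts above.
net = TwoHeadNet(feat_dim=512, num_fine=30, num_coarse=7)
with torch.no_grad():
    fine_logits, coarse_logits = net(torch.randn(4, 3, 224, 224))

print(fine_logits.shape, coarse_logits.shape)
```

Both heads receive gradients through the same backbone, so the features are shaped by both classification tasks at once.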


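Training loop: each step sums the cross-entropy losses of the two heads and updates all weights with Adam (learning rate 1e-4). Validation reports accuracy separately for the fine and coarse heads.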

In [None]:
import torch.optim as optim
import torch.nn.functional as F
from tqdm.notebook import tqdm # Import tqdm for progress bar

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

model = MultiTaskModel(len(fine_classes), len(coarse_classes)).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-4)

def train_epoch():
    model.train()
    total_loss = 0

    for x, fine, coarse in tqdm(train_loader, desc="Training"): # Add tqdm to train_loader
        x = x.to(device)
        fine = fine.to(device)
        coarse = coarse.to(device)

        optimizer.zero_grad()
        fine_logits, coarse_logits = model(x)

        loss_fine = F.cross_entropy(fine_logits, fine)
        loss_coarse = F.cross_entropy(coarse_logits, coarse)

        loss = loss_fine + loss_coarse
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(train_loader)

def validate():
    model.eval()
    correct_fine = 0
    correct_coarse = 0
    total = 0

    with torch.no_grad():
        for x, fine, coarse in tqdm(val_loader, desc="Validation"): # Add tqdm to val_loader
            x = x.to(device)
            fine = fine.to(device)
            coarse = coarse.to(device)

            fine_logits, coarse_logits = model(x)

            fine_pred = fine_logits.argmax(dim=1)
            coarse_pred = coarse_logits.argmax(dim=1)

            correct_fine += (fine_pred == fine).sum().item()
            correct_coarse += (coarse_pred == coarse).sum().item()
            total += fine.size(0)

    return correct_fine / total, correct_coarse / total

Using device: cuda
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


100%|██████████| 44.7M/44.7M [00:00<00:00, 192MB/s]
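
The objective that `train_epoch` minimizes on each batch can be written out as follows. The symbols are notation introduced here: $B$ is the batch size, $z_i$ are the logits for sample $i$, and $y_i$ are its target indices. `F.cross_entropy` averages over the batch by default, and the two terms are added with equal weight.

```latex
\mathcal{L}
  = \frac{1}{B}\sum_{i=1}^{B}
    \Big[
      -\log \operatorname{softmax}\big(z_i^{\text{fine}}\big)_{y_i^{\text{fine}}}
      \;-\;
      \log \operatorname{softmax}\big(z_i^{\text{coarse}}\big)_{y_i^{\text{coarse}}}
    \Big]
```

As a reference point, a model that predicts uniformly over 30 fine and 7 coarse classes would score $\ln 30 + \ln 7 \approx 5.35$ on this loss.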


Train the model:

In [None]:
for epoch in range(10):
    loss = train_epoch()
    acc_fine, acc_coarse = validate()
    print(f"Epoch {epoch}: loss={loss:.4f} fine_acc={acc_fine:.3f} coarse_acc={acc_coarse:.3f}")


Epoch 0: loss=1.7245 fine_acc=0.813 coarse_acc=0.907
Epoch 1: loss=0.6818 fine_acc=0.846 coarse_acc=0.923
Epoch 2: loss=0.4108 fine_acc=0.860 coarse_acc=0.920
Epoch 3: loss=0.2854 fine_acc=0.854 coarse_acc=0.925
Epoch 4: loss=0.2210 fine_acc=0.866 coarse_acc=0.931
Epoch 5: loss=0.1901 fine_acc=0.863 coarse_acc=0.934
Epoch 6: loss=0.1484 fine_acc=0.865 coarse_acc=0.933
Epoch 7: loss=0.1328 fine_acc=0.872 coarse_acc=0.941
Epoch 8: loss=0.1582 fine_acc=0.859 coarse_acc=0.927
Epoch 9: loss=0.1577 fine_acc=0.857 coarse_acc=0.925
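
Validation accuracy in this run peaks before the final epoch, so keeping the best epoch's weights rather than the last ones may be worthwhile. The sketch below replays the logged values to show the bookkeeping; the checkpoint filename in the comment is hypothetical, and the loop is an illustration rather than part of the notebook.

```python
# (epoch, train_loss, fine_acc, coarse_acc), copied from the log above.
history = [
    (0, 1.7245, 0.813, 0.907),
    (1, 0.6818, 0.846, 0.923),
    (2, 0.4108, 0.860, 0.920),
    (3, 0.2854, 0.854, 0.925),
    (4, 0.2210, 0.866, 0.931),
    (5, 0.1901, 0.863, 0.934),
    (6, 0.1484, 0.865, 0.933),
    (7, 0.1328, 0.872, 0.941),
    (8, 0.1582, 0.859, 0.927),
    (9, 0.1577, 0.857, 0.925),
]

best_epoch, best_fine = -1, 0.0
for epoch, loss, fine_acc, coarse_acc in history:
    if fine_acc > best_fine:
        best_epoch, best_fine = epoch, fine_acc
        # In the notebook, this is where one could save a checkpoint, e.g.
        # torch.save(model.state_dict(), "best_model.pt")  # hypothetical filename

print(best_epoch, best_fine)  # 7 0.872
```

Selecting on fine accuracy is one choice among several; the coarse accuracy happens to peak at the same epoch in this log.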
