# Food Classification with CNN - Building a Restaurant Recommendation System

This assignment focuses on developing a deep learning-based food classification system using Convolutional Neural Networks (CNNs). You will build a model that can recognize different food categories and use it to return the food preferences of a user.

## Learning Objectives
- Implement CNNs for image classification
- Work with real-world food image datasets
- Build a preference-detector system

## Background: AI-Powered Food Preference Discovery

The system's core idea is simple:

1. Users upload 10 photos of dishes they enjoy
2. Your CNN classifies these images into the 91 categories
3. Based on these categories, the system returns the user's taste profile

Your task is to develop the core computer vision component that will power this detection engine.
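
The three steps above can be sketched in a few lines, assuming the classifier has already produced one label per uploaded photo. The function name `taste_profile` and the example labels are illustrative only and not part of the assignment.

```python
from collections import Counter

def taste_profile(predicted_labels):
    """Return (label, share) pairs sorted by how often each label was predicted."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    # most_common() sorts by count; ties keep first-seen order
    return [(label, n / total) for label, n in counts.most_common()]

# Hypothetical predictions for 10 uploaded photos
preds = ["pizza"] * 4 + ["risotto"] * 3 + ["samosa"] * 2 + ["paella"]
for label, share in taste_profile(preds):
    print(f"{label:<10s} {share:.0%}")
```

The shares sum to 1, so the result can be read directly as a simple preference distribution over the predicted categories.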

You are given a training ("train" folder) and a test ("test" folder) dataset which have ~45k and ~22k samples respectively. For each one of the 91 classes there is a subdirectory containing the images of the respective class.
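
Before training, it can help to check that the folder layout matches this description. The sketch below counts the files in each class sub-directory; `count_images_per_class` is a hypothetical helper, and the check only runs if a `train` folder exists in the working directory.

```python
import os

def count_images_per_class(root_dir):
    """Map each class sub-directory of root_dir to the number of files it holds."""
    counts = {}
    # sort by name so the result does not depend on the file system's listing order
    for entry in sorted(os.scandir(root_dir), key=lambda e: e.name):
        if entry.is_dir():
            counts[entry.name] = sum(1 for f in os.scandir(entry.path) if f.is_file())
    return counts

if os.path.isdir("train"):
    counts = count_images_per_class("train")
    print(len(counts), "classes,", sum(counts.values()), "images")
```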

## Assignment Requirements

### Technical Requirements
- Implement your own PyTorch CNN architecture for food image classification
- Use only the provided training dataset split for training
- Train the network from scratch; no pretrained weights may be used
- Report the test accuracy after every epoch
- Report all hyperparameters of the final model
- Use a fixed seed and do not use any CUDA features that break reproducibility
- Use PyTorch 2.6

### Deliverables
1. Jupyter Notebook with CNN implementation, training code etc.
2. README file
3. Report (max 3 pages)

Submit your report, README and all code files as a single zip file named GROUP_[number]_NC2425_PA. The names and IDs of all group members must be mentioned in the README.
Do not include the dataset in your submission.

### Grading

1. Correct CNN implementation, training runs on the uni computers (computer rooms DM.0.07, DM.0.13, DM.0.17, DM.0.21) according to the README.MD instructions without ANY exceptions: 3pt
2. Perfect 1:1 reproducibility on those machines: 1pt
3. Very clear github-repo-style README.MD with instructions for running the code: 1pt
4. Report: 1pt
5. Model test performance on test-set: interpolated from 30-80% test-accuracy: 0-3pt
6. Pick 10 random pictures of the test set to simulate a user uploading images and report which categories occur how often in these: 1pt
7. Bonus point: use an LLM (API) to generate short description / profile of preferences of the simulated user
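
As a worked example for item 5, the sketch below assumes the interpolation is linear and clamped at both ends. The assignment text does not spell out the exact rule, so treat `performance_points` as an illustration rather than the official grading formula.

```python
def performance_points(test_acc, lo=30.0, hi=80.0, max_pts=3.0):
    """Linearly map test accuracy in [lo, hi] percent to [0, max_pts] points."""
    frac = (test_acc - lo) / (hi - lo)
    # clamp so that accuracies outside the range give 0 or max_pts
    return max_pts * min(1.0, max(0.0, frac))

for acc in (25.0, 30.0, 55.0, 80.0, 90.0):
    print(f"{acc:5.1f}% -> {performance_points(acc):.2f} pt")
```

Under this assumption, a test accuracy of 55% sits halfway through the range and would earn 1.5 of the 3 points.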

**We mention the machines in the uni computer rooms mainly to make clear how we will test your submissions, and to make you aware that you can use them for the assignment if you do not have access to a computer with a sufficiently powerful GPU. You do not have to use these machines. Using the specified PyTorch version, not enabling torch.backends.cuda.matmul.allow_tf32, setting fixed seeds and making sure your code does not use more than 8 GB of VRAM should allow you to fulfil the reproducibility criteria.**
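
The settings named in this paragraph can be gathered in one helper that is called at the top of the notebook and again right before training. This is a minimal sketch: `seed_everything` is a hypothetical name, and turning the cuDNN autotuner off is an extra precaution added here, not something the assignment text requires.

```python
import random

import numpy as np
import torch

def seed_everything(seed=42):
    """Seed the Python, NumPy and PyTorch RNGs and switch off non-deterministic GPU options."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)               # silently ignored without CUDA
    torch.backends.cuda.matmul.allow_tf32 = False  # keep full float32 matmuls
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False         # autotuner may pick different kernels per run

seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
print(torch.equal(a, b))
```

Re-seeding before each draw yields identical tensors, which is the behaviour needed for a repeatable weight initialisation.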

**(If there is anything unclear about this assignment please post your question in the Brightspace discussions forum or send an email)**


In [1]:
import torch
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageOps
import os
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision.transforms import ToTensor
from torchvision import datasets
from torch.utils.data import TensorDataset, DataLoader, Dataset


In [2]:

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

# Ensure deterministic behaviour on GPU
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.deterministic = True
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Keep the cuDNN autotuner off: benchmark mode may select different convolution
# algorithms from run to run, which would break reproducibility
torch.backends.cudnn.benchmark = False


# Loading the datasets
The dataset is already split into a train and test set in the directories "train" and "test".

In [3]:
# FoodDataset reads the folder structure and returns (tensor, label)
class FoodDataset(Dataset):
    def __init__(self, root_dir, encoder, transform=None):
        self.images, self.labels = [], []
        self.transform, self.encoder = transform, encoder

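        # collect all image paths + string labels
        # (sorted, because os.scandir yields entries in arbitrary order)
        str_labels = []
        for folder in sorted(os.scandir(root_dir), key=lambda e: e.name):
            if not folder.is_dir():
                continue
            for img in sorted(os.scandir(folder), key=lambda e: e.name):
                str_labels.append(folder.name)
                self.images.append(str(img.path))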

        # encode strings → ints
        self.labels = self.encoder.transform(str_labels)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = Image.open(self.images[idx]).convert('RGB') # use RGB
        if self.transform:
            img = self.transform(img)
        return img, self.labels[idx]

    def get_norm_stats(self, size=(150, 150)):
        """Compute per‑channel mean & std over the whole dataset."""
        s1, s2, n = np.zeros(3), np.zeros(3), 0
        for p in self.images:
            arr = np.array(Image.open(p).convert('RGB').resize(size), np.float32) / 255
            s1 += arr.sum((0, 1))
            s2 += (arr ** 2).sum((0, 1))
            n += arr.shape[0] * arr.shape[1]
        mean = s1 / n
        std = np.sqrt(s2 / n - mean ** 2)
        return mean.tolist(), std.tolist()

    def set_transform(self, tf):
        self.transform = tf

# get list of all class names
class_names = [d.name for d in os.scandir("train") if d.is_dir()]
encoder = LabelEncoder().fit(class_names)

train_ds = FoodDataset("train", encoder)
test_ds  = FoodDataset("test",  encoder)

train_mean, train_std = train_ds.get_norm_stats()
test_mean , test_std  = test_ds.get_norm_stats()

# training augmentations
train_tf = torchvision.transforms.Compose([
    torchvision.transforms.Resize((160, 160)), # resize to 160x160
    torchvision.transforms.RandomCrop(150), # crop back to 150x150
    torchvision.transforms.RandomHorizontalFlip(), # random horizontal flip
    torchvision.transforms.RandomRotation(15),  # random rotation
    torchvision.transforms.ColorJitter(0.2, 0.2, 0.2), # brightness, contrast, saturation
    torchvision.transforms.ToTensor(),
    torchvision.transforms.RandomErasing(p=0.5), # random erasing
    torchvision.transforms.Normalize(train_mean, train_std) # normalize to mean=0, std=1
])
# test transforms
test_tf = torchvision.transforms.Compose([
    torchvision.transforms.Resize((150, 150)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(test_mean, test_std)
])

train_ds.set_transform(train_tf)
test_ds.set_transform(test_tf)

# num_workers=0 avoids hanging in a Windows notebook (was needed in PyTorch 1.7.0)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True,  num_workers=0)
test_loader  = DataLoader(test_ds,  batch_size=32, shuffle=False, num_workers=0)
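
`get_norm_stats` never holds the whole dataset in memory. It accumulates a per-channel sum and sum of squares and then uses std = sqrt(E[x²] − E[x]²). The sketch below checks that identity on small synthetic arrays, which stand in for the real photos, against NumPy's own mean and (population) standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic "images" in [0, 1] with shape (H, W, 3); not real dataset samples
images = [rng.random((8, 8, 3)) for _ in range(3)]

s1, s2, n = np.zeros(3), np.zeros(3), 0
for arr in images:
    s1 += arr.sum(axis=(0, 1))          # per-channel sum
    s2 += (arr ** 2).sum(axis=(0, 1))   # per-channel sum of squares
    n += arr.shape[0] * arr.shape[1]    # number of pixels seen so far

mean = s1 / n
std = np.sqrt(s2 / n - mean ** 2)

# Reference: stack all pixels and let NumPy compute the statistics directly
pixels = np.concatenate([a.reshape(-1, 3) for a in images])
print(np.allclose(mean, pixels.mean(axis=0)), np.allclose(std, pixels.std(axis=0)))
```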



# CNN Implementation

In [4]:
class FoodCNN(nn.Module):
    """Uses 3 convolutional blocks with 2 conv layers each, followed by a linear classifier."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            # block 1: 3→128
            nn.Conv2d(3, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2),                       # 128×150×150 → 128×75×75

            # block 2: 128→256
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(2),                       # 256×75×75 → 256×37×37

            # block 3: 256→512
            nn.Conv2d(256, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(),
            nn.MaxPool2d(2)                        # 512×37×37 → 512×18×18
        )
        # flatten size computed with dummy tensor
        dummy = torch.zeros(1, 3, 150, 150)
        flat = self.features(dummy).numel()

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
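
The flatten size that the dummy tensor yields can also be worked out by hand. The 3×3 convolutions with `padding=1` keep the spatial size, and each `MaxPool2d(2)` halves it with floor division. A small sketch of that arithmetic, with `pooled_size` as an illustrative helper:

```python
def pooled_size(size, num_pools, kernel=2):
    """Spatial size after num_pools MaxPool2d(kernel) layers (floor division)."""
    for _ in range(num_pools):
        size //= kernel
    return size

side = pooled_size(150, 3)          # 150 -> 75 -> 37 -> 18
flat = 512 * side * side            # channels of the last block times height times width
fc_params = flat * 512 + 512        # weights plus biases of the first Linear layer
print(side, flat, fc_params)
```

With 165,888 flattened features, the first `Linear` layer alone holds roughly 85 million parameters, so it accounts for most of the model's size.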



# Training the model
Implement your training process below. Report the test-accuracy after every epoch for the training run of the final model.

Hint: before training your model make sure to reset the seed in the training cell, as otherwise the seed may have changed due to previous training runs in the notebook

Note: If you implement automatic hyperparameter tuning, split the train set into train and validation subsets for the objective function.
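
A seeded split for such a validation subset could look as follows. This is only a sketch: `split_indices` is a hypothetical helper and the 10% validation share is an arbitrary choice. The resulting index lists could then be passed to `torch.utils.data.Subset`.

```python
import random

def split_indices(n_samples, val_fraction=0.1, seed=42):
    """Shuffle indices 0..n_samples-1 with a fixed seed and split off a validation part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # local RNG, leaves the global state untouched
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

Tuning against such a validation subset keeps the test set untouched until the final evaluation.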

In [5]:
# check if GPU is available and use it if possible, run on CPU otherwise

print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))


2.7.0+cu128 12.8
True
NVIDIA GeForce RTX 4070 Laptop GPU


In [None]:
from tqdm.auto import tqdm  # nice progress bar

# reset the seed in this cell (important for reruns)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

def accuracy(logits, labels):
    """Return percentage of correct predictions in a batch."""
    return (logits.argmax(1) == labels).float().mean().item() * 100

@torch.no_grad()            # no gradients during evaluation, saves memory
def evaluate(model, loader):
    model.eval()
    acc_sum, n = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        acc_sum += accuracy(model(x), y) * x.size(0)
        n += x.size(0)
    return acc_sum / n

def train(num_epochs, learning_rate, weight_decay, p_patience, p_threshold):
    # hyper‑parameters
    # batch_size    = 32
    # learning_rate = 0.0005 # was 0.001
    max_lr = learning_rate * 10
    # num_epochs    = 100
    num_classes   = len(class_names)

    # build model
    model = FoodCNN(num_classes).to(device)

    criterion  = nn.CrossEntropyLoss()
    optimizer  = optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    # optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-4)
    scheduler  = optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr,
        total_steps=num_epochs * len(train_loader)
    )


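    # NOTE: OneCycleLR recomputes the learning rate from its own schedule on every
    # batch step, so a reduction applied by this plateau scheduler is overwritten
    # at the next scheduler.step() call.
    plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer,
        mode="max",      # we monitor accuracy (higher is better)
        factor=0.5,      # halve the LR
        patience=p_patience,      # if no +p_threshold% acc for p_patience epochs
        threshold=p_threshold,   # minimum improvement to reset patience
        # verbose=True
    )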

    best = 0.0
    ineffective_epoch_count = 0
    try:
        for epoch in range(1, num_epochs + 1):
            model.train()
            epoch_loss = 0.0

            # tqdm shows live batch progress
            for i, (x, y) in enumerate(tqdm(train_loader, desc=f"Epoch {epoch}/{num_epochs}")):
                x, y = x.to(device), y.to(device)

                optimizer.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()
                optimizer.step()
                scheduler.step()

                # sample the loss only every 20 mini‑batches: .item() forces a GPU→CPU sync
                if i % 20 == 0:
                    epoch_loss += loss.item()

            val_acc = evaluate(model, test_loader)
            # average over the batches that were actually sampled (i = 0, 20, 40, ...)
            n_sampled = (len(train_loader) + 19) // 20
            print(f"Epoch {epoch:02d}/{num_epochs} | "
                  f"loss {epoch_loss/n_sampled:.4f} | acc {val_acc:.2f}%")

            plateau.step(val_acc)   # feed this epoch's test accuracy to the plateau scheduler

            if val_acc > best:
                torch.save(model.state_dict(), "best_model.pth")
                best = val_acc
                ineffective_epoch_count = 0
            else:
                ineffective_epoch_count += 1
            if ineffective_epoch_count >= 5:
                print(f"training ineffective, stopping at epoch {epoch} with accuracy {val_acc:.2f}%")
                return

    except KeyboardInterrupt:
        # graceful stop – save current weights so you can resume later
        torch.save(model.state_dict(), "checkpoint_interrupt.pth")
        print("Training interrupted – checkpoint saved.")

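def try_params():
    num_epochs = 30
    learning_rate = [0.005, 0.001, 0.0005]
    weight_decay = [0.0005, 0.0001, 0.00005]
    p_threshold = [0.1, 0.2, 0.3]
    p_patience = [2, 3, 4]
    # range(1, 3) runs only the second and third configuration; index 0 is skipped.
    # Every run overwrites best_model.pth, so the file ends up holding the last run's best epoch.
    for i in range(1, 3):
        # reseed so that every configuration starts from the same RNG state
        torch.manual_seed(SEED)
        torch.cuda.manual_seed(SEED)
        print(f"Trying parameters: learning rate: {learning_rate[i]}, weight decay: {weight_decay[i]}, plateau patience: {p_patience[i]}, plateau threshold: {p_threshold[i]}")
        train(num_epochs, learning_rate[i], weight_decay[i], p_patience[i], p_threshold[i])

try_params()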




Trying parameters: learning rate: 0.001, weight decay: 0.0001, plateau patience: 3, plateau threshold: 0.2


Epoch 1/30:   2%|▏         | 34/1429 [00:23<11:45,  1.98it/s] 
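
The stopping rule used in the training loop (stop after five epochs without a new best test accuracy) can be isolated and checked without a GPU. The sketch below uses a hypothetical `EarlyStopper` class and a made-up accuracy curve. Unlike the loop above, it starts from a best score of negative infinity rather than 0.0.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without a new best score."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best, self.bad_epochs = score, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=5)
# Made-up accuracy curve, for illustration only
for epoch, acc in enumerate([30.0, 35.0, 34.0, 34.5, 33.0, 34.9, 34.0, 36.0], start=1):
    if stopper.step(acc):
        print(f"stop at epoch {epoch}, best {stopper.best:.1f}%")
        break
```

On this curve the counter reaches five at epoch 7, so the later improvement to 36.0 is never seen. That is the usual trade-off of a short patience.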

# Calculating model performance
Load the best version of your model (which should be produced and saved by previous cells), then calculate and report the test accuracy.

In [15]:
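# train() keeps its model local, so rebuild the architecture before loading the weights
model = FoodCNN(len(class_names)).to(device)
model.load_state_dict(torch.load("best_model.pth", map_location=device))
print("Final test accuracy:", evaluate(model, test_loader))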

Final test accuracy: 46.290458642541424


# Summary of hyperparameters
Report the hyperparameters (learning rate etc.) that you used in your final model for reproducibility.
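
One way to produce this summary is to keep all settings in a single dict and print it as a table. In the sketch below the fixed values are copied from the cells above. The last four entries are simply the first configuration printed by `try_params()` and are not confirmed as the final choice. `hyperparameter_table` is a hypothetical helper.

```python
def hyperparameter_table(hparams):
    """Format a dict of hyperparameters as an aligned two-column text table."""
    width = max(len(name) for name in hparams)
    return "\n".join(f"{name:<{width}}  {value}" for name, value in hparams.items())

hparams = {
    "seed": 42,
    "input size": "150x150 RGB",
    "batch size": 32,
    "optimizer": "AdamW",
    "LR schedule": "OneCycleLR (max_lr = 10 x learning rate)",
    "max epochs": 30,
    "early-stop patience": 5,
    "dropout": 0.1,
    # Assumption: values of the first printed trial, replace with the final ones
    "learning rate": 0.001,
    "weight decay": 0.0001,
    "plateau patience": 3,
    "plateau threshold": 0.2,
}
print(hyperparameter_table(hparams))
```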

# Simulation of random user
Pick 10 random pictures of the test set to simulate a user uploading images and report which categories occur how often in these: 1pt

In [44]:
# ---------------------------------------------------------------------
#  Helper: convert ONE image file to a normalised tensor (RGB, 150×150)
# ---------------------------------------------------------------------
def img_to_tensor(img_path, mean, std):
    """
    Parameters
    ----------
    img_path : str
        Full path to the image file.
    mean, std : list[float]
        Per‑channel normalisation stats (use test_mean / test_std).

    Returns
    -------
    torch.Tensor  # shape [3,150,150]
    """
    tf = torchvision.transforms.Compose([
        torchvision.transforms.Resize((150, 150)),   # match training size
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean, std)
    ])
    return tf(Image.open(img_path).convert('RGB'))


# ---------------------------------------------------------------------
#  Helper: pick N random test images ➜ predict with trained model
#  Returns true class names, predicted class names and the image tensors
# ---------------------------------------------------------------------
def get_random_images(folder, encoder, mean, std, device, n=10):
    """
    Parameters
    ----------
    folder   : str          # e.g. "test"
    encoder  : LabelEncoder # to decode the model's int outputs
    mean,std : list[float]  # normalisation values
    device   : torch.device
    n        : int          # how many random images

    Returns
    -------
    true_cls : list[str]           # ground‑truth folder names
    pred_cls : np.ndarray          # model predictions (decoded to strings)
    imgs     : list[torch.Tensor]  # the normalised image tensors
    """
    imgs, true_cls = [], []

    # same resize / normalise pipeline as img_to_tensor
    tf = torchvision.transforms.Compose([
        torchvision.transforms.Resize((150, 150)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean, std)
    ])

    # --- randomly sample n images ------------------------------------
    # sorted() makes the draw depend only on the NumPy seed, not on listing order
    class_dirs = sorted(d for d in os.listdir(folder)
                        if os.path.isdir(os.path.join(folder, d)))
    for _ in range(n):
        # choose a random class sub‑folder
        cls_dir = np.random.choice(class_dirs)
        # choose a random file inside that folder
        file    = np.random.choice(sorted(os.listdir(os.path.join(folder, cls_dir))))
        path    = os.path.join(folder, cls_dir, file)

        imgs.append(tf(Image.open(path).convert('RGB')))
        true_cls.append(cls_dir)

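    # stack to mini‑batch, run through the trained model (taken from the global scope)
    batch = torch.stack(imgs).to(device)
    model.eval()                      # use BatchNorm running stats, disable Dropout
    with torch.no_grad():             # inference only, no gradients needed
        preds = model(batch).argmax(1).cpu().numpy()

    # decode integer predictions back to string labels
    pred_cls = encoder.inverse_transform(preds)
    return true_cls, pred_cls, imgs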


# ---------------------------------------------------------------------
#  Use the helpers: compare true vs predicted frequencies
# ---------------------------------------------------------------------
true_cls, pred_cls, img = get_random_images(
    folder='test',
    encoder=encoder,
    mean=test_mean,
    std=test_std,
    device=device,
    n=10
)

from collections import Counter
print("True class counts :", Counter(true_cls))
print("Pred class counts :", Counter(pred_cls))


True class counts : Counter({np.str_('beet_salad'): 1, np.str_('red_velvet_cake'): 1, np.str_('paella'): 1, np.str_('pork_chop'): 1, np.str_('risotto'): 1, np.str_('breakfast_burrito'): 1, np.str_('fried_calamari'): 1, np.str_('lobster_bisque'): 1, np.str_('samosa'): 1, np.str_('pizza'): 1})
Pred class counts : Counter({np.str_('scallops'): 1, np.str_('red_velvet_cake'): 1, np.str_('chocolate_cake'): 1, np.str_('pork_chop'): 1, np.str_('risotto'): 1, np.str_('gnocchi'): 1, np.str_('lobster_roll_sandwich'): 1, np.str_('lobster_bisque'): 1, np.str_('onion_rings'): 1, np.str_('pizza'): 1})
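
The two printouts can also be compared sample by sample. The sketch below assumes that both lists appear in sampling order, which holds here because a `Counter` keeps insertion order and every label occurs exactly once. The labels are copied from the output above.

```python
true_cls = ["beet_salad", "red_velvet_cake", "paella", "pork_chop", "risotto",
            "breakfast_burrito", "fried_calamari", "lobster_bisque", "samosa", "pizza"]
pred_cls = ["scallops", "red_velvet_cake", "chocolate_cake", "pork_chop", "risotto",
            "gnocchi", "lobster_roll_sandwich", "lobster_bisque", "onion_rings", "pizza"]

# keep the labels where the prediction matches the ground truth
hits = [t for t, p in zip(true_cls, pred_cls) if t == p]
print(f"{len(hits)}/{len(true_cls)} correct:", ", ".join(hits))
```

Under that assumption, five of the ten simulated uploads are classified correctly, which is in line with the overall test accuracy of about 46%.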


# Bonus point
Use an LLM (API) to generate a description of the food preference of a user based on 10 images that a potential user could provide. 
Please include an example of the output of your code, especially if you used an API other than the OpenAI API.

This should work well even with differing test images by setting different random seeds for the image selector.
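
A seed-controlled image selector could be sketched as below. Note that this variant draws uniformly over all test images instead of first picking a class and then a file, and that `sample_image_paths` is a hypothetical helper. The demo only runs if a `test` folder exists.

```python
import os
import random

def sample_image_paths(folder, n=10, seed=0):
    """Pick n distinct image paths uniformly from all class sub-folders of `folder`."""
    paths = []
    for cls in sorted(os.listdir(folder)):            # sorted: stable across file systems
        cls_dir = os.path.join(folder, cls)
        if os.path.isdir(cls_dir):
            paths += [os.path.join(cls_dir, f) for f in sorted(os.listdir(cls_dir))]
    return random.Random(seed).sample(paths, n)       # local RNG, global state untouched

if os.path.isdir("test"):
    for path in sample_image_paths("test", n=10, seed=1):
        print(path)
```

Changing `seed` gives a different simulated user, while the same seed always reproduces the same ten images.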

In [46]:
""""Use openai API, give him frequency dict and ask to generate a text profile of the user based on the frequencies"""
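import json
import os

import matplotlib.pyplot as plt
import torchvision
from openai import OpenAI

# read the key from the environment instead of hard-coding it in the notebook
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY")
)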


# -------------------------------------------------
freq = Counter(pred_cls)           

system_msg = "You are a food critic who writes short taste profiles."
user_msg   = (
    "Based on these dish counts, describe the user's food preferences "
    "in 3–4 lines.\n"
    f"Dish counts (label : times liked):\n{json.dumps(freq, indent=2)}"
)

# -------------------------------------------------

response = client.chat.completions.create(
    model="gpt-3.5-turbo",          # or "gpt-4o" if you have access
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user",   "content": user_msg}
    ],
    temperature=0.7                 # a bit of creativity
)

profile_text = response.choices[0].message.content.strip()
print(profile_text)
batch = torch.stack(img).cpu()  # move to CPU for display
grid = torchvision.utils.make_grid(
    batch, nrow=5,                     # 2 rows × 5 columns
    normalize=True,                    # rescale values to [0, 1] for display
    value_range=(-3, 3)                # normalised pixels are clamped to this range first
)
plt.figure(figsize=(10,4))
plt.imshow(grid.permute(1, 2, 0))
plt.axis("off")
plt.show()

APIConnectionError: Connection error.
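
Since the request above failed with a connection error, the prompt construction can at least be checked offline. The sketch below rebuilds the same two chat messages from a list of predicted labels without contacting any API. `build_messages` is a hypothetical helper and the labels in the demo are made up.

```python
import json
from collections import Counter

def build_messages(predicted_labels):
    """Build the chat messages for the taste-profile request from predicted labels."""
    # cast keys to plain str so that NumPy string labels serialise cleanly
    freq = {str(label): count for label, count in Counter(predicted_labels).items()}
    user_msg = (
        "Based on these dish counts, describe the user's food preferences "
        "in 3-4 lines.\n"
        f"Dish counts (label : times liked):\n{json.dumps(freq, indent=2)}"
    )
    return [
        {"role": "system", "content": "You are a food critic who writes short taste profiles."},
        {"role": "user", "content": user_msg},
    ]

messages = build_messages(["pizza", "pizza", "risotto"])
print(messages[1]["content"])
```

The returned list has the shape expected by the `messages` argument of `client.chat.completions.create`, so it could be passed on unchanged once a connection is available.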