# Binary Classification 
### Introduction
In this project, we decide on the taste similarity of food dishes based on images and human judgments. We are provided with a dataset of images of 10,000 dishes. 
<br>
<img src="image.png" alt="Food Dish" width="600" height="500"> <br>
<br>
Furthermore, we are provided with a set of triples $(A, B, C)$ representing human annotations, meaning that dish $A$ is more similar to the taste of dish $B$ than to the taste of dish $C$. 

Our task is to predict for unseen triples $(A, B, C)$ whether dish $A$ is more similar in taste to $B$ or to $C$. More precisely, for each triple in `test_triplets.txt` we predict $0$ or $1$ as follows: 
- $1$ if the dish in image `A.png` is closer in taste to the dish in image `B.png` than to the dish in `C.png`
- $0$ if the dish in image `A.png` is closer in taste to the dish in image `C.png` than to the dish in `B.png`

### Imports 
First, we import necessary libraries. 

In [1]:
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, TensorDataset
from torchvision import transforms

In [2]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

### Extracting the Embeddings Using a Pretrained Model
First, we have to resize the images, convert them to tensors, and normalize them. Then we use a pretrained model to extract the embeddings. 
References: https://pytorch.org/vision/stable/models.html#using-the-pre-trained-models
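
As a small illustration of the normalization step, the sketch below reproduces the per-channel arithmetic `(value - mean) / std` that `transforms.Normalize` applies, using the ImageNet statistics from the transform in this notebook. The helper `normalize_pixel` and the sample pixels are hypothetical and only serve the example.

```python
# ImageNet channel statistics, as used in the transform in this notebook.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


# Hypothetical pixels: one equal to the channel means, one pure white.
mean_pixel = normalize_pixel(MEAN)
white_pixel = normalize_pixel([1.0, 1.0, 1.0])

print(mean_pixel)                          # every channel is centred at zero
print([round(v, 3) for v in white_pixel])  # bright pixels map to positive values
```

A pixel that equals the channel means is mapped to zero in every channel, which is the input range the pretrained weights were trained on.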

Comments: 
- since we run the model on a GPU, we set `pin_memory=True` to enable faster data transfer from CPU to GPU
- the machine has 128 cores; the data loader below uses `num_workers=64` worker processes
- embeddings $=$ compact, numerical representations of images. The purpose of an embedding is to capture the most important features of the image. Embeddings typically represent things like edges, textures, and colors.
- extracting embeddings $=$ feeding the images through the pretrained model and collecting the output of the layer right before the final classification layer
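
To make the last point more concrete: in ResNet-50, the layer right before the final classification layer is a global average pooling layer, which reduces each channel's feature map to a single number. The following pure-Python sketch shows that operation on toy data; the function name and the toy values are assumptions for illustration, not part of the notebook.

```python
def global_average_pool(feature_maps):
    """Collapse each channel's 2-D feature map to its mean value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 3 channels of 2x2 activations.
# ResNet-50 produces 2048 channels of 7x7 activations for a 224x224 image.
toy_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]

embedding = global_average_pool(toy_maps)
print(embedding)  # one value per channel
```

With 2048 channels this yields the 2048-dimensional embedding per image that the extraction cell stores.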

In [4]:
from torchvision.models import resnet50, ResNet50_Weights


transform = transforms.Compose([
    transforms.Resize((224, 224)),   
    transforms.ToTensor(),           
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  
])

training_data = datasets.ImageFolder(
    root="dataset/", 
    transform=transform
)

# shuffle must be off: embeddings are later matched to filenames by position
train_dataloader = DataLoader(
    training_data, 
    batch_size=64, 
    shuffle=False, 
    pin_memory=True, 
    num_workers=64, 
)

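# `pretrained=True` is deprecated in recent torchvision; these are the same weights
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model = model.to(device)
# drop the final classification layer so the model outputs the embeddings
model = torch.nn.Sequential(*list(model.children())[:-1])
model.eval()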

batch_size = 64
embedding_size = 2048 
num_images = len(training_data)
embeddings = np.zeros((num_images, embedding_size))

idx = 0
with torch.no_grad():
    for (X, y) in train_dataloader:
        X = X.to(device)
        batch_embeddings = model(X)
        # flatten (N, 2048, 1, 1) to (N, 2048) and convert to a NumPy array
        batch_embeddings = batch_embeddings.view(batch_embeddings.size(0), -1).cpu().numpy()
        n = batch_embeddings.shape[0]  # the last batch may be smaller than batch_size
        embeddings[idx : idx + n] = batch_embeddings
        idx += n

np.save('dataset/embeddings.npy', embeddings)

### Use Embeddings as Training Data
After generating the embeddings for the images, we now need to create the appropriate training data. More precisely, for each line in `train_triplets.txt`, e.g., `02461 03450 02678`, we look up the three corresponding embeddings via a mapping from filenames to embeddings and concatenate them into one feature vector. By construction, every listed triplet has label 1 (dish $A$ is closer to $B$ than to $C$). To obtain examples of both classes, we apply data augmentation and add the swapped triplet `02461 02678 03450` with label 0. In summary, the function `get_data` transforms the training data (and test data) into matrices $X$ and $y$, storing the concatenated embeddings and labels. For the test data, no swapped copies are added and the returned labels are only placeholders.
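
The labeling and swap augmentation described above can be sketched on toy data as follows. The helper `build_samples` and the 2-dimensional embeddings are hypothetical stand-ins; the real embeddings have 2048 entries each and are handled with NumPy in `get_data`.

```python
def build_samples(triplet_line, file_to_embedding, train=True):
    """Return (features, label) pairs for one 'A B C' line."""
    a, b, c = (file_to_embedding[name] for name in triplet_line.split())
    samples = [(a + b + c, 1)]          # original order: A is closer to B
    if train:
        samples.append((a + c + b, 0))  # B and C swapped: A is closer to the last dish
    return samples


# Hypothetical 2-dimensional embeddings keyed by file name.
toy_embeddings = {
    "02461": [0.1, 0.2],
    "03450": [0.3, 0.4],
    "02678": [0.5, 0.6],
}

samples = build_samples("02461 03450 02678", toy_embeddings)
for features, label in samples:
    print(label, features)
```

Each training triplet therefore contributes two rows, one per class, so the resulting training set is balanced.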

In [5]:
def l2_normalize(embeddings):
    norm = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norm 

# X contains embeddings for training data 
# y contains normal labels, i.e. 0 or 1 
def get_data(file, train=True):
    # read triplets into a list, skipping blank lines
    triplets = []
    with open(file) as f:
        for line in f:
            if line.strip():
                triplets.append(line)

    # generate training data from triplets
    training_data = datasets.ImageFolder(root="dataset/", transform=None)
    # contains 00000, 00001, ... (file names without directory and extension)
    filenames = [os.path.splitext(os.path.basename(s[0]))[0] for s in training_data.samples]
    embeddings = np.load('dataset/embeddings.npy')
    embeddings = l2_normalize(embeddings)

    # maps filenames to their corresponding embedding, e.g. 00000 -> [x,y,z,...]
    file_to_embedding = {}
    for i in range(len(filenames)):
        file_to_embedding[filenames[i]] = embeddings[i]
    X = []
    y = []
    # use the individual embeddings to generate the features and labels for triplets
    for t in triplets:
        emb = [file_to_embedding[a] for a in t.split()]
        X.append(np.hstack([emb[0], emb[1], emb[2]]))
        y.append(1) # since dish A is closer to B than to C 
        # generate the swapped sample with label 0 (data augmentation)
        if train:
            X.append(np.hstack([emb[0], emb[2], emb[1]]))
            y.append(0)  
    X = np.vstack(X)
    y = np.hstack(y)
    return X, y
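
The `l2_normalize` helper above divides each row by its Euclidean norm, so every embedding ends up with length 1. The sketch below shows the same computation for a single row in pure Python. The small `eps` guard against an all-zero row is an addition for this example and is not part of the notebook's function.

```python
import math


def l2_normalize_row(row, eps=1e-12):
    """Scale a vector to unit Euclidean length; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / max(norm, eps) for v in row]


unit = l2_normalize_row([3.0, 4.0])   # norm is 5
zero = l2_normalize_row([0.0, 0.0])   # stays all zeros instead of producing NaN

print(unit, math.sqrt(sum(v * v for v in unit)))
print(zero)
```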

The function `create_loader_from_np` creates an instance of the iterable class `DataLoader` for our training and test sets. 
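
To get a feeling for what such a loader yields, the sketch below computes the batch sizes for the training set, assuming the default behavior of keeping the last, smaller batch. The number 119030 is the training-set size reported in the training log further below; the helper `batch_sizes` is hypothetical.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a loader yields when the last partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


# 119030 samples with batch_size=64, as in the training run of this notebook.
sizes = batch_sizes(119030, 64)
print(len(sizes), sizes[-1])

# Batches at which a log line is printed when logging every 500 batches.
logged = [i for i in range(len(sizes)) if i % 500 == 0]
print(logged)
```

This matches the four log lines per epoch in the training output.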

In [6]:
def create_loader_from_np(X, y = None, train = True, batch_size=64, shuffle=True, num_workers = 4):
    if train:
        dataset = TensorDataset(torch.from_numpy(X).type(torch.float), 
                                torch.from_numpy(y).type(torch.long))
    else:
        dataset = TensorDataset(torch.from_numpy(X).type(torch.float))
    
    loader = DataLoader(dataset=dataset,
                        batch_size=batch_size,
                        shuffle=shuffle,
                        pin_memory=True, num_workers=num_workers)
    return loader

In [7]:
TRAIN_TRIPLETS = 'train_triplets.txt'
TEST_TRIPLETS = 'test_triplets.txt'

X, y = get_data(TRAIN_TRIPLETS)
train_dataloader = create_loader_from_np(X, y, train = True, batch_size=64)
del X
del y

X_test, y_test = get_data(TEST_TRIPLETS, train=False)
test_dataloader = create_loader_from_np(X_test, train = False, batch_size=2048, shuffle=False)
del X_test
del y_test

### Building my Neural Network 
We can use a simple neural network for classification since we have already extracted the necessary information about the images and stored it in `embeddings.npy`. My approach is a fully connected network with one hidden layer of 128 units, followed by batch normalization and a ReLU activation, and a single output unit with a Sigmoid activation.
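
As a rough size check, the sketch below counts the learnable parameters for the layer sizes used in the class below (6144 inputs, 128 hidden units, one output). It assumes the default settings, i.e., linear layers with bias and batch normalization with a learnable scale and shift; the helper functions are hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * n_features


embedding_size = 2048
input_size = 3 * embedding_size   # embeddings of A, B and C concatenated
hidden_size = 128

total = (
    linear_params(input_size, hidden_size)
    + batchnorm_params(hidden_size)
    + linear_params(hidden_size, 1)
)
print(input_size, total)
```

Almost all parameters sit in the first linear layer, which maps the three concatenated embeddings to the hidden units.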

In [8]:
class CovNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(6144, 128)
        self.bn1 = nn.BatchNorm1d(128)
        self.act1 = nn.ReLU()
        self.fc2 = nn.Linear(128, 1)
        self.act2 = nn.Sigmoid()
           

    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.fc2(x)
        x = self.act2(x)
        return x.squeeze(1)

In [9]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()

    for batch_idx, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device).float()
        pred = model(X)
        loss = loss_fn(pred, y)
    
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch_idx % 500 == 0:
            loss = loss.item()
            # use the loader's batch size instead of relying on a global variable
            current = batch_idx * dataloader.batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [10]:
def test_loop(dataloader, model, loss_fn=None):
    predictions = []
    model.eval()

    with torch.no_grad():
        for (X,) in dataloader:  # the test loader yields inputs only
            X = X.to(device)
            pred = model(X)
            pred = torch.round(pred)  # convert probabilities to binary predictions
            pred = pred.cpu().numpy()
            predictions.append(pred)

    predictions = np.concatenate(predictions, axis=0)  # ensure consistent shape
    np.savetxt("results.txt", predictions, fmt='%i')
    print("Results saved to results.txt")


### Training & Making Predictions using Neural Network
After defining the neural network, we can finally choose our hyperparameters, such as learning rate, number of epochs, and batch size. I selected Binary Cross Entropy loss as the loss function and the Adam optimization algorithm.

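Over the full run, the training loss decreases to less than 0.05; the log shown below is cut off in epoch 7. Note that a low training loss only shows that the model fits the training triplets and does not by itself tell us how well it generalizes to unseen triplets. 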
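
To help interpret the loss values, here is a minimal sketch of the binary cross entropy for a single prediction, $-\left[y \log p + (1 - y) \log(1 - p)\right]$. The function below is a hypothetical pure-Python version of the formula, not the `nn.BCELoss` implementation, and the probabilities are made up for the example.

```python
import math


def binary_cross_entropy(p, y):
    """Loss for one predicted probability p and one true label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


chance_loss = binary_cross_entropy(0.5, 1)      # undecided prediction
confident_loss = binary_cross_entropy(0.95, 1)  # confident and correct
wrong_loss = binary_cross_entropy(0.95, 0)      # confident and wrong

print(round(chance_loss, 4), round(confident_loss, 4), round(wrong_loss, 4))
```

An undecided model has a loss of $\ln 2 \approx 0.693$, which is consistent with the values around 0.69 to 0.71 at the start of the training log, while a loss near 0.05 corresponds to assigning about 0.95 probability to the correct label.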

In [11]:
model = CovNet()
model = model.to(device)

learning_rate = 1E-3
batch_size = 64
epochs = 50

loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
print("Training Completed!")

Epoch 1
-------------------------------
loss: 0.710329  [   64/119030]
loss: 0.716275  [32064/119030]
loss: 0.678448  [64064/119030]
loss: 0.686482  [96064/119030]
Epoch 2
-------------------------------
loss: 0.637854  [   64/119030]
loss: 0.673233  [32064/119030]
loss: 0.652536  [64064/119030]
loss: 0.620008  [96064/119030]
Epoch 3
-------------------------------
loss: 0.612144  [   64/119030]
loss: 0.573645  [32064/119030]
loss: 0.548633  [64064/119030]
loss: 0.606827  [96064/119030]
Epoch 4
-------------------------------
loss: 0.628059  [   64/119030]
loss: 0.575090  [32064/119030]
loss: 0.541187  [64064/119030]
loss: 0.430381  [96064/119030]
Epoch 5
-------------------------------
loss: 0.400872  [   64/119030]
loss: 0.440567  [32064/119030]
loss: 0.406173  [64064/119030]
loss: 0.404736  [96064/119030]
Epoch 6
-------------------------------
loss: 0.360248  [   64/119030]
loss: 0.298176  [32064/119030]
loss: 0.407631  [64064/119030]
loss: 0.244681  [96064/119030]
Epoch 7
--------

After training is completed, we can now make our predictions for the test data and save them in `results.txt`. 
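
The conversion from predicted probabilities to the required $0$/$1$ labels can be sketched as follows. The helper `to_labels` and the probabilities are hypothetical. The strict `>` comparison is chosen so that a value of exactly 0.5 maps to 0, which is also what `torch.round` returns because it rounds halves to the nearest even number.

```python
def to_labels(probabilities, threshold=0.5):
    """Map predicted probabilities to binary labels."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical model outputs for four test triplets.
labels = to_labels([0.91, 0.08, 0.5, 0.73])

# One label per line, in the order of the test triplets.
lines = "\n".join(str(label) for label in labels)
print(lines)
```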

In [12]:
test_loop(test_dataloader, model)

Results saved to results.txt
