<img src="img/dsci572_header.png" width="600">

# Appendix C: Computing Derivatives

In [1]:
import numpy as np
import sklearn
import sklearn.linear_model
import scipy.optimize
import autograd # pip install autograd
from autograd import grad
import autograd.numpy as anp
from scipy.optimize import minimize
import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 14, 'axes.labelweight': 'bold', 'figure.figsize': (6, 4)})
%config InlineBackend.figure_formats = ['svg']

## Few-Shot Learning

- We have so far discussed how CNNs can be used to classify images

- The assumption here is that we have access to relatively large datasets that contain hundreds or thousands of images per class
- However, this is not always the case. More data is often hard, or even impossible, to obtain
- Take the example of a **face recognition** system for a company's employees:

  - Does it make sense to ask each employee to provide, say, 1000 images of their face so as to register them in the system?

  - Even if we have 1000 images of each employee, we'd have to retrain the model every time a new employee is added to the system

<br><br><br>

- For cases like these, we can use a technique called **few-shot** learning. Instead of training the model on a new class of images, we use the similarities and differences of an image with other images to decide the class of that image.

- Let me explain this with some examples (credit: most images in this section are adapted from this [website](https://zzaebok.github.io/machine_learning/FSL/) and [YouTube video](https://www.youtube.com/watch?v=hE7eGew4eeg)):

For training a conventional CNN, we use a dataset like this:

<img src="https://imgur.com/6MAisQL.png" width="600">

<br><br><br>

When we want to predict the class of a new image, that class should already exist in the training set of the CNN:

<img src="https://imgur.com/H1rP0Aw.png" width="600">

- But what if the class of the test image does not exist among the classes on which the CNN was trained?

- This is the kind of problem we're interested in solving with few-shot learning

- **Few-shot learning is about learning to classify a new test image with only a few examples of that new class**

<br><br><br>

- Here, for example, the rabbit class does not exist in the training set. It would be desirable to have a model that can learn to classify this **query** image based on a few images of rabbits

<img src="https://imgur.com/StTYXay.png" width="600">

<br><br><br>

- The dataset of images on which the model wasn't trained is called the **support set**

- This problem is called **k-way n-shot** learning, when we have **k classes** and **n samples per class** in the **support set**

<img src="https://imgur.com/UsmR4Ow.png" width="600">
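
As a rough illustration of the **k-way n-shot** setup, the sketch below draws a support set from a labeled collection: it picks k classes at random and then n sample indices from each. The function `sample_support_set` and the toy labels are hypothetical and only meant to make the definition concrete.

```python
import numpy as np

def sample_support_set(labels, k, n, rng):
    """Pick k classes at random, then n sample indices from each class."""
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=k, replace=False)
    support = {}
    for c in classes:
        candidates = np.flatnonzero(labels == c)  # indices of samples in class c
        support[int(c)] = rng.choice(candidates, size=n, replace=False)
    return support

rng = np.random.default_rng(0)

# Hypothetical labels: 5 classes with 10 images each
labels = np.repeat(np.arange(5), 10)

# A 3-way 2-shot support set
support = sample_support_set(labels, k=3, n=2, rng=rng)
for c, idx in support.items():
    print(f"class {c}: sample indices {idx.tolist()}")
```

With `k=3` and `n=2`, the support set contains 6 images in total.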


<br><br><br>

- **Now comes the key idea of few-shot learning:** instead of directly learning how to classify, learn how to find similarities between samples belonging to the same class, and differences between samples belonging to different classes

- In other words, instead of **learning the classification itself**, we'd like to **learn how to learn the classification**!

- This is why few-shot learning is said to be an example of **meta learning**


<br><br><br>

### Siamese Networks for Few-Shot Learning

- As mentioned above, the goal is to learn **similarities** and **differences** between images in the same and different classes, rather than the corresponding classes

- Therefore, it seems natural to think that a CNN could be used for feature extraction

- An interesting model for few-shot learning is a Siamese (or twin) network ([image source](https://people.kth.se/~rosun/deep-learning/figures/siamese-arch.svg)):

<img src="https://people.kth.se/~rosun/deep-learning/figures/siamese-arch.svg" width="700">

<br><br><br>

- The Siamese network is supposed to learn similarities and differences

- This is why the first step is to **construct a new dataset from an existing one** as follows:

<img src="img/pos_neg.png" width="700">

- Positive samples are pairs of images that belong to the same class

- Negative samples are pairs of images that don't belong to the same class

- The Siamese network is a parallel neural network architecture with two streams of images, one for each image in a pair that forms either a positive or a negative sample

- The two streams share exactly the same architecture and model parameters. This is why this particular architecture is called a **Siamese** or **twin** network
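
To make the pair construction concrete, here is a minimal sketch that enumerates all pairs from a toy list of labels. The labels are made up, and the pair label follows the convention used for the contrastive loss in this section (0 for a positive sample, 1 for a negative sample).

```python
from itertools import combinations

# Hypothetical class labels for five images
labels = ["cat", "cat", "dog", "dog", "rabbit"]

# Every unordered pair (i, j) together with its pair label:
# 0 = same class (positive sample), 1 = different classes (negative sample)
pairs = [
    (i, j, int(labels[i] != labels[j]))
    for i, j in combinations(range(len(labels)), 2)
]

positive = [(i, j) for i, j, y in pairs if y == 0]
negative = [(i, j) for i, j, y in pairs if y == 1]

print("positive pairs:", positive)
print("number of negative pairs:", len(negative))
```

Negative pairs outnumber positive ones in this toy example. The `SiameseNetworkDataset` class later in this section instead draws each kind of pair with 50% probability.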

<br><br><br>

**How does a Siamese network learn?**

A Siamese network uses a particular type of loss function called **contrastive loss**, with the following form:

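$$
\mathcal{L}(Y, D) = (1-Y)\, D^2 + Y \left\{\max \left(0, m-D\right)\right\}^2
$$

where $Y$ is the label of the generated sample (0 for similar images, i.e. a positive sample, and 1 for dissimilar images, i.e. a negative sample), $D$ is the Euclidean distance between the embeddings of the two images, and $m$ is the margin beyond which a dissimilar pair no longer contributes to the loss.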

- The contrastive loss tries to decrease the distance between embeddings (feature vectors) of similar images, and to increase the distance between embeddings of dissimilar images.

- In other words, through the contrastive loss, a Siamese network tries to pull similar images together, and push dissimilar images away, in the embedding space
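
The following NumPy sketch evaluates the contrastive loss for a few hand-picked distances, to show how the two terms behave. The function name `contrastive_loss`, the distances, and the margin `m = 2` are assumptions chosen for illustration.

```python
import numpy as np

def contrastive_loss(d, y, margin=2.0):
    """Per-pair contrastive loss: y = 0 for similar pairs, y = 1 for dissimilar pairs."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    return (1 - y) * d**2 + y * np.maximum(0.0, margin - d) ** 2

# Similar pair (y = 0): the loss grows with the distance
print(contrastive_loss(0.5, 0))  # 0.25
print(contrastive_loss(2.5, 0))  # 6.25

# Dissimilar pair (y = 1): penalized only while closer than the margin
print(contrastive_loss(0.5, 1))  # 2.25
print(contrastive_loss(2.5, 1))  # 0.0
```

A dissimilar pair that is already farther apart than the margin gives zero loss, so the network is not pushed to separate such pairs any further.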

<br><br><br>

- I'll explain the implementation in the lecture (the code below is adapted from [here](https://github.com/harveyslash/Facial-Similarity-with-Siamese-Networks-in-Pytorch/blob/master/Siamese-networks-medium.ipynb) with some minor changes).

In [None]:
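# Imports needed by the cells in this section (numpy and matplotlib are imported above)
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms, utils
from torchsummary import summary  # assumption: `summary` comes from the torchsummary package
from PIL import Image


class SiameseNetworkDataset(Dataset):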
    def __init__(self, imageFolderDataset, transform=None):
        self.imageFolderDataset = imageFolderDataset
        self.transform = transform

    def __getitem__(self, index):
        idx = np.random.choice(len(self.imageFolderDataset.imgs))
        img0_tuple = self.imageFolderDataset.imgs[idx]

        # 50% chance of images being in the same class
        should_get_same_class = np.random.randint(0, 2)
        if should_get_same_class:
            while True:
                # loop until the same class is found
                idx = np.random.choice(len(self.imageFolderDataset.imgs))
                img1_tuple = self.imageFolderDataset.imgs[idx]
                if img0_tuple[1] == img1_tuple[1]:
                    break
        else:
            while True:
                # loop until a different class is found
                idx = np.random.choice(len(self.imageFolderDataset.imgs))
                img1_tuple = self.imageFolderDataset.imgs[idx]
                if img0_tuple[1] != img1_tuple[1]:
                    break

        img0 = Image.open(img0_tuple[0])
        img1 = Image.open(img1_tuple[0])

        # convert to gray-scale
        img0 = img0.convert("L")
        img1 = img1.convert("L")

        if self.transform is not None:
            img0 = self.transform(img0)
            img1 = self.transform(img1)

        return (
            img0,
            img1,
            torch.from_numpy(
                np.array([int(img1_tuple[1] != img0_tuple[1])], dtype=np.float32)
            ),
        )

    def __len__(self):
        return 500

In [None]:
class SiameseNetwork(nn.Module):

    def __init__(self):
        super().__init__()

        self.cnn = nn.Sequential(

            nn.Conv2d(1, 96, kernel_size=5,stride=2),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            
            nn.Conv2d(96, 64, kernel_size=3, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, stride=2),

            nn.Flatten()
        )

        self.fc = nn.Sequential(

            nn.Linear(3136, 256),
            nn.ReLU(inplace=True),

            nn.Linear(256, 128),
            nn.ReLU(inplace=True),

            nn.Linear(128, 32),
            nn.ReLU(inplace=True),
            
            nn.Linear(32, 2)
        )
        
    def forward_each(self, x):
        output = self.cnn(x)
        output = self.fc(output)
        
        return output

    def forward(self, input1, input2):
        output1 = self.forward_each(input1)
        output2 = self.forward_each(input2)

        return output1, output2


model = SiameseNetwork()
summary(model, [(1, 128, 128,), (1, 128, 128,)]);
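
The input size `3136` of the first fully connected layer follows from the layer settings above, assuming 128×128 inputs (as in the `summary` call) and PyTorch's defaults of no padding and rounding down. A minimal sketch of the arithmetic, using a hypothetical helper `out_size`:

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (rounding down)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 128
size = out_size(size, kernel=5, stride=2)  # Conv2d(1, 96, kernel_size=5, stride=2) -> 62
size = out_size(size, kernel=3, stride=2)  # MaxPool2d(3, stride=2)                 -> 30
size = out_size(size, kernel=3, stride=2)  # Conv2d(96, 64, kernel_size=3, stride=2) -> 14
size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2, stride=2)                 -> 7

channels = 64  # output channels of the last convolution
flattened = channels * size * size
print(flattened)  # 3136
```

If the input resolution in `transforms.Resize` is changed, the first `nn.Linear` layer has to be adjusted accordingly.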

In [None]:
class ContrastiveLoss(torch.nn.Module):
    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        # Compute the Euclidean distance between the two embeddings, then the contrastive loss
        euclidean_distance = F.pairwise_distance(output1, output2, keepdim=True)

        loss_contrastive = torch.mean(
            (1 - label) * torch.pow(euclidean_distance, 2)
            + label * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2)
        )
        
        return loss_contrastive

In [None]:
def gridshow(img, text=None):
    npimg = img.numpy()
    plt.axis("off")
    if text:
        plt.text(75, 8, text, style='italic',fontweight='bold',
            bbox={'facecolor':'white', 'alpha':0.8, 'pad':10})
        
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show() 

In [None]:
folder_dataset = datasets.ImageFolder(root="data/faces/training")

transformation = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

siamese_dataset = SiameseNetworkDataset(
    imageFolderDataset=folder_dataset, transform=transformation
)

train_loader = DataLoader(siamese_dataset, shuffle=True, batch_size=8)

example_batch = next(iter(train_loader))
concatenated = torch.cat((example_batch[0], example_batch[1]), dim=0)

gridshow(utils.make_grid(concatenated, nrow=8))
print(example_batch[2].numpy().reshape(-1))

In [None]:
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')

model = SiameseNetwork().to(device)
criterion = ContrastiveLoss(margin=3.0)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

siamese_dataset = SiameseNetworkDataset(imageFolderDataset=folder_dataset, transform=transformation)
# siamese_dataset = Subset(siamese_dataset, range(500))

train_loader = DataLoader(siamese_dataset, shuffle=True, batch_size=64)

loss_history = [] 

for epoch in range(200):

    for i, (img0, img1, label) in enumerate(train_loader, 0):

        if device.type in ['cuda', 'mps']:
            img0, img1, label = img0.to(device), img1.to(device), label.to(device)

        optimizer.zero_grad()
        output1, output2 = model(img0, img1)
        loss_contrastive = criterion(output1, output2, label)
        loss_contrastive.backward()
        optimizer.step()

        if i % 10 == 0:
            print(f"Epoch {epoch}: Training batch loss = {loss_contrastive.item():g}")
            loss_history.append(loss_contrastive.item())

In [None]:
plt.loglog(loss_history)

In [None]:
model.eval()

folder_dataset_test = datasets.ImageFolder(root="data/faces/testing/")
siamese_dataset = SiameseNetworkDataset(imageFolderDataset=folder_dataset_test, transform=transformation)
                                        
# siamese_dataset = Subset(siamese_dataset, range(2000))

test_dataloader = DataLoader(siamese_dataset, batch_size=1, shuffle=True)

# Take one image to test on
dataiter = iter(test_dataloader)
x0, _, _ = next(dataiter)

if device.type in ['cuda', 'mps']: x0 = x0.to(device)

for i in range(10):
    _, x1, label2 = next(dataiter)
    if device.type in ['cuda', 'mps']:
        x1, label2 = x1.to(device), label2.to(device)

    concatenated = torch.cat((x0, x1), 0)
    
    output1, output2 = model(x0, x1)
    euclidean_distance = F.pairwise_distance(output1, output2)
    gridshow(utils.make_grid(concatenated.cpu()), f'Distance: {euclidean_distance.item():.4g}')
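
Once the embeddings are trained, one possible way to use the distances for few-shot classification is to give a query image the label of its nearest support embedding. The sketch below uses made-up 2-dimensional embeddings (matching the 2 outputs of the network above) and hypothetical names `classify_query`, `support_embs`, and `support_labels`. It is an illustration under these assumptions, not the procedure used in the lecture code.

```python
import numpy as np

def classify_query(query_emb, support_embs, support_labels):
    """Return the label of the support embedding closest to the query."""
    dists = np.linalg.norm(support_embs - query_emb, axis=1)  # Euclidean distances
    nearest = int(np.argmin(dists))
    return support_labels[nearest], dists

# Hypothetical embeddings of a 2-way 2-shot support set
support_embs = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.8]])
support_labels = ["alice", "alice", "bob", "bob"]

# Hypothetical embedding of the query image
query_emb = np.array([2.9, 3.2])

predicted, dists = classify_query(query_emb, support_embs, support_labels)
print(predicted)  # bob
print(np.round(dists, 3))
```

In practice the embeddings would come from `model.forward_each`, and adding a new person would only require adding their images to the support set, without retraining the network.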