<a href="https://colab.research.google.com/github/AehLane/BoneTrade-Project-OSLNN/blob/master/Human_Remains_OSLNN_Triplet_Loss.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visual Dissimilarity 

Original code framework adapted by Alex Lane from Harshvardhan Gupta, 2017 [One Shot Learning With Siamese Networks](https://github.com/harveyslash/Facial-Similarity-with-Siamese-Networks-in-Pytorch/blob/master/Siamese-networks-medium.ipynb) with code snippets from Tim Sherrat

---

## Foundational Resources


For **novice users** new to data science or Google Colab, below are several links to resources hosted by 3rd parties that may help provide additional information or how-to instructions relevant to this Colab Notebook.*

Google Colaboratory:
*   Google's ["Welcome to Colaboratory"](https://colab.research.google.com/notebooks/welcome.ipynb)
*   [Google Colaboratory FAQ](https://research.google.com/colaboratory/faq.html)
*   Anne Bonner, 2019 [Getting Started With Google Colab, A Simple Tutorial for the Frustrated and Confused](https://towardsdatascience.com/getting-started-with-google-colab-f2fff97f594c)
*   Jason Richards, 2019 [Getting Local with Google Colab](https://medium.com/@jasonrichards911/getting-local-with-google-colab-a4d69f373364)

Tensorflow & PyTorch:
*   Jake VanderPlas, 2019 [Get started with Google Colaboratory (Coding TensorFlow)](https://www.youtube.com/watch?v=inN8seMm7UI&ab_channel=TensorFlow)
*   Dr. Joanne Kitson, 2019 [Installing Tensorflow with CUDA, cuDNN and GPU support on Windows 10](https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781)
*   Olivier Moindrot, 2018 [Triplet Loss and Online Triplet Mining in Tensorflow](https://omoindrot.github.io/triplet-loss)
*   Joakim Rishaug, 2020 [PyTorch Conversion of Triplet Loss and Online Triplet Mining](https://github.com/NegatioN/OnlineMiningTripletLoss)

Neural Networks & Data Science:
*   Harshvardhan Gupta, 2017 [One Shot Learning With Siamese Networks](https://github.com/harveyslash/Facial-Similarity-with-Siamese-Networks-in-Pytorch/blob/master/Siamese-networks-medium.ipynb)
*   Will Koehrsen, 2018 [Neural Networks Embeddings Explained](https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526)
*   Sagar Sharma, 2017 [Epoch vs Batch Size vs Iterations](https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9)
*   Raúl Gómez, 2019 [Understanding Ranking Loss, Contrastive Loss, Margin Loss, Triplet Loss, Hinge Loss and all those confusing names](https://gombru.github.io/2019/04/03/ranking_loss/)







*We do not guarantee the quality of the content hosted by these 3rd parties.

---
## Running this Notebook in Hosted vs. Local Runtime Environments

We are assuming you have loaded this notebook with Google Colab and that you are signed into a Google account. 

For the first time through, you will run each block of code in sequential order. If after having trained a network you wish to resume testing at a later date, we will show you how to save your model and reload it, at the appropriate blocks.

By **default**, we are assuming **you will be using Colab's hosted runtime environment**, but if you have access to a server/cluster that you would prefer to use instead, or would prefer to run it locally on the machine that has loaded up this Colab notebook, read on:


*   If you want to run this notebook in a **local environment on the machine that has loaded up this Colab notebook** or if you are using **Google's Compute Engine**, you can find instructions [here](https://research.google.com/colaboratory/local-runtimes.html)

*   If you are looking to **run this notebook on another machine** and are **not** using **Google Compute Engine**, the following instructional guidelines may be of use to you.*

*This is just how we have been running our local server instance. No guarantee it will work exactly as described for you.

### Our Server Settings and Specifications

Server OS: Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-74-generic x86_64)

GPU: Tesla V100-PCIE-16GB

Compute capability: 7.0

CUDA Toolkit: 10.1

CuDnn: v7.6.5 for CUDA 10.1

Python: 3.6.9

Tensorflow: 2.1.0

PyTorch: 1.4.0

Jupyter Notebook: 6.0.3

Additional libraries needed: imgaug, matplotlib, [online_triplet_loss](https://github.com/NegatioN/OnlineMiningTripletLoss) (can be installed through pip)

### Steps We Use to Connect to Our Server for Local Runtime

1.   Open cmd prompt on the machine with the open Colab notebook.
2. Create a port tunnel using ssh. Choose an open port on both the client side and the server side: `local_port` and `server_port`.
3.   Choose one:

    3a. If the server asks for a password on ssh connection: `ssh -L localhost:local_port:localhost:server_port username:password@server_IP`.

    3b. If the server asks for a password after ssh connection: `ssh -L localhost:local_port:localhost:server_port username@server_IP`.
4. Start the Jupyter notebook on the server_port chosen `jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --NotebookApp.port_retries=0 --notebook-dir="" --no-browser --allow-root --NotebookApp.token='' --NotebookApp.disable_check_xsrf=True --port=server_port`.
5. On another tab of the client side browser, check `localhost:local_port` to make sure the notebook has been correctly port forwarded.
6. Select the drop-down menu for Colab runtimes in the top right of the notebook UI, 'Connect'.
7. Select 'Connect to local runtime'.
8. Enter `http://localhost:local_port/` into the pop-up window and select the 'Connect' button.
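
The check in step 5 can also be done from Python. The following is a minimal sketch using only the standard library; the function name `port_is_open` and the port value `8888` are illustrative assumptions, not part of the original setup. It only tests whether something accepts TCP connections on the forwarded port, not whether that something is Jupyter.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical value: replace 8888 with the local_port chosen in step 2.
LOCAL_PORT = 8888
print("tunnel reachable:", port_is_open("localhost", LOCAL_PORT))
```

If this prints `False` while the ssh session is open, the tunnel or the Jupyter server on `server_port` is the likely place to look.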

---
## Imports


These imported libraries are required to build the neural network, define our loss function, and display the results in the 'Testing' section.

In [0]:
import torchvision
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import torchvision.utils
import numpy as np
import random
from PIL import Image
import torch
from torch.autograd import Variable
import PIL.ImageOps
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import time

In [None]:
# We use the online mining triplet loss method outlined below
# https://www.jrishaug.com/OnlineMiningTripletLoss/
!pip install online_triplet_loss
from online_triplet_loss.losses import *

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

try:
    %tensorflow_version 2.x
except Exception:
    pass

import tensorflow as tf

---
## Hardware Acceleration (GPU)

Under 'Edit' -> 'Notebook Settings', make sure to select 'GPU' for Hardware Acceleration. This block of code confirms that a GPU is available for TensorFlow and PyTorch operations, and raises an error if TensorFlow cannot find one.

In [0]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)


---
## Loading Data for Training & Testing

**By default, we expect the data to be stored as a compressed .zip directory with the following structure:**

    data.zip/
    |----testing/
    |    |----test_img_1
    |    |    |----test_img_1.png
    |    |----test_img_2
    |    |    |----test_img_2.png
    |    |----test_img_3
    |    |    |----test_img_3.png
    |    |----...
    |----training/
    |    |----train_img_1/
    |    |    |----train_img_1_augmented_variation_1.png
    |    |    |----train_img_1_augmented_variation_2.png
    |    |    |----train_img_1_augmented_variation_3.png
    |    |    |----...
    |    |----train_img_2/
    |    |    |----train_img_2_augmented_variation_1.png
    |    |    |----train_img_2_augmented_variation_2.png
    |    |    |----train_img_2_augmented_variation_3.png
    |    |    |----...
    |    |----train_img_3/
    |    |    |----train_img_3_augmented_variation_1.png
    |    |    |----train_img_3_augmented_variation_2.png
    |    |    |----train_img_3_augmented_variation_3.png
    |    |    |----...

The methods to upload the data into this notebook depend upon the runtime environment type chosen; hosted runtime or local runtime. Both hosted and local runtimes have their own instructions and associated code outlined below in their respective following sections.
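
Once the archive has been unzipped by either method, the layout shown above can be checked before training. The sketch below uses only the standard library; the helper names `summarise_layout` and `empty_class_folders` and the set of accepted file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: adjust this set to the image formats present in your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def summarise_layout(root):
    """Map each split ('training', 'testing') to {class_folder: image_count}."""
    summary = {}
    for split in ("training", "testing"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        summary[split] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES)
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


def empty_class_folders(summary):
    """List 'split/class' entries that contain no images."""
    return [f"{split}/{name}"
            for split, classes in summary.items()
            for name, count in classes.items() if count == 0]
```

For example, `empty_class_folders(summarise_layout("data"))` should return an empty list when every subfolder of `data/training` and `data/testing` holds at least one image.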

### Load Data (Hosted Runtime)

Data for a hosted runtime may be loaded either by uploading the zipped data folder directly or by mounting a Google Drive account that contains 'data.zip'.

**To upload directly**, open the tray at left by clicking on the `>` button. Select 'Files' and then 'Upload'. Then, select a zip file in the file explorer from the local machine. If the tray does not refresh automatically, hit 'refresh' to update the tray to show changes.

Unzip the data by running the next block.

In [0]:
!mkdir data
!unzip data.zip -d data/


 **To mount Google Drive**, run the next block of code. The results block will display a URL. Click on this URL, and a new window will open asking for confirmation to connect Google Drive by allowing the listed permissions. Once confirmed, an authorization code will be displayed. Copy this code, and paste into the results block below. If all goes well, the results block will shortly display the text, 'Mounted at /content/drive'. Refresh the files pane in the tray at left.

If the Colab Notebook successfully connected to the Google Drive account, it will appear as a folder within the tray. To find the path of a file or folder inside the mounted Drive, right-click on it in the tray, copy the path, and then paste that into the code below. **Note**: The leading `/content/` may need to be deleted from the path.

In [0]:
from google.colab import drive
drive.mount('/content/drive')


The code cell below shows how to copy a file from Google Drive to this space and then unzip it. The path specified below may need to be modified so that it points to where `data.zip` is stored in your Google Drive. Optionally, uncommenting `!mkdir data && unzip data.zip -d data/` will create a new directory to contain the unzipped data.

In [0]:
# If you have data already on google drive
!cp "drive/My Drive/one-shot-test/data.zip" data.zip

# !mkdir data && unzip data.zip -d data/
!unzip data.zip

### Load Data (Local Runtime)

Loading data for a local runtime may be performed with the following code cell, which unzips the data folder. Optionally, specify the path to 'data.zip' in place of the default below, i.e. `path/to/your/data.zip`.

In [0]:
from zipfile import ZipFile

# https://thispointer.com/python-how-to-unzip-a-file-extract-single-multiple-or-all-files-from-a-zip-archive/
# Create a ZipFile object and load data.zip in it
with ZipFile('data.zip', 'r') as zipObj:
    # Extract all the contents of the zip file into the 'data' directory
    zipObj.extractall('data')


---
## Helper Functions

Three helper functions, `imshow`, `show_plot`, and `worker_init_fn`, are defined here to assist in displaying results or parallelizing data loading in later sections.

In [0]:
# Is called when we want to show our compared images in the testing output
def imshow(img, text=None, text2=None, should_save=False):
    npimg = img.numpy()
    plt.axis("off")
    if text:
        plt.text(
            10, 10, text, style='italic', fontweight='bold',
            bbox={'facecolor': 'white', 'alpha': 0.8, 'pad': 10})
    if text2:
        plt.text(
            120, 10, text2, style='italic', fontweight='bold',
            bbox={'facecolor': 'white', 'alpha': 0.8, 'pad': 10})
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# Simple display function used to show loss during training
def show_plot(iteration, loss):
    plt.plot(iteration, loss)
    plt.show()


### Defining Dataloader Workers' init()

This function, `worker_init_fn`, allows the task of generating image pairs to be split among a group of workers. It is passed as an argument to `DataLoader`, which calls it once in each worker process at start-up.
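
The split works by striding: each worker starts at an index equal to its own id, and the testing dataset classes advance their index by `number_of_workers` on every step. The following is a small sketch of the resulting partition in plain Python; the helper `indices_for_worker` and the example sizes are illustrative assumptions.

```python
def indices_for_worker(worker_id, num_workers, dataset_len):
    """Indices visited when starting at worker_id and striding by num_workers."""
    return list(range(worker_id, dataset_len, num_workers))


# Illustrative values only: 3 workers sharing a 10-image dataset.
for worker_id in range(3):
    print(worker_id, indices_for_worker(worker_id, 3, 10))
```

Every index is visited by exactly one worker, so no image is compared twice and none is skipped.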

In [0]:
def worker_init_fn(worker_id):
    worker_info = torch.utils.data.get_worker_info()
    dataset = worker_info.dataset

    dataset.worker_unique_id = worker_info.id
    dataset.number_of_workers = worker_info.num_workers
    dataset.reference_iter_index = torch.tensor(worker_info.id)
    dataset.unknown_iter_index = torch.tensor(worker_info.id)


---
## Custom Classes

Custom class definitions including the `Config` class, `SiameseNetworkDataset` classes, and our neural network's `SiameseNetwork` class are defined here.

### Configuration Class

The 'Config' class defines variables that will be reused throughout later sections.

You may want to change some of these variables to work for your purposes.

Refer to the **'Loading Data' section for guidelines** for directory structure for the loaded data.

`training_dir = path/to/your/data/training`

`testing_dir = path/to/your/data/testing`

`num_generator_workers = desired_number_of_parallel_workers`*

*Currently, Python's multiprocessing, Dataloaders, and our selected method of training are not fully compatible with each other, which is why `num_generator_workers` equals 0 by default: this avoids unhandled runtime errors due to workers unexpectedly exiting. As this behaviour may change in the future, we are keeping the setting here.

In [0]:
class Config():
    training_dir = "data/training"
    testing_dir = "data/testing"
    train_batch_size = 32
    train_number_epochs = 200
    num_generator_workers = 0


### Custom Dataset Classes

Under a superclass of `SiameseNetworkDataset`, three subclasses exist for different purposes; one for training and two for testing.

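**For training**, `BatchTripletSiameseNetworkDataset`'s `__getitem__` method returns a random training image together with its (file path, class index) tuple. Valid triplets, each made of an anchor image, a positive image of the same class as the anchor, and a negative image of a different class, are then selected from each batch by online triplet mining.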


**For testing**, `TestingSiameseNetworkDatasetReferences`' `__getitem__` method returns a valid reference image while `TestingSiameseNetworkDatasetUnknowns`' `__getitem__` method returns a valid unknown together forming a valid reference-unknown pair of images. Since the data is kept as a generator due to possibly working with large sets of data, the runtime should be __O(n^2)__ in the 'Testing' section due to iteratively comparing each element of the dataset to each potential pair member in the same dataset.

In [0]:
# Our base class for all training & testing dataset classes
class SiameseNetworkDataset(Dataset):

    def __init__(self, imageFolderDataset, transform=None, should_invert=True):
        self.imageFolderDataset = imageFolderDataset
        self.transform = transform
        self.should_invert = should_invert

    def __len__(self):
        return len(self.imageFolderDataset.imgs)


# This simple class retrieves a random image. During training, using
# online_triplet_mining, we'll select valid triplets for training from the set
# of random images this generates.
class BatchTripletSiameseNetworkDataset(SiameseNetworkDataset):

    # Returns a random image from our training data
    def __getitem__(self, index):

        randIndex = random.randrange(0, len(self.imageFolderDataset.imgs))
        img0_tuple = self.imageFolderDataset.imgs[randIndex]

        img0 = Image.open(img0_tuple[0])
        img0 = img0.convert("L")

        if self.should_invert:
            img0 = PIL.ImageOps.invert(img0)

        if self.transform is not None:
            img0 = self.transform(img0)

        return (img0,
                img0_tuple)


# During testing, we'll use this class to find reference images in the data.
# These references are found using the 'reference_file_identifier', checking
# the data directory for file names containing 'reference_file_identifier'.
class TestingSiameseNetworkDatasetReferences(SiameseNetworkDataset):

    def __init__(
            self,
            imageFolderDataset,
            transform, should_invert,
            reference_iter_index,
            unknown_iter_index=0,
            reference_file_identifier='',
            worker_unique_id=0,
            number_of_workers=1):
        super().__init__(imageFolderDataset, transform, should_invert)
        self.reference_iter_index = reference_iter_index
        self.unknown_iter_index = unknown_iter_index
        self.reference_file_identifier = reference_file_identifier
        self.worker_unique_id = worker_unique_id
        self.number_of_workers = number_of_workers

    def __getitem__(self, index):
        while self.reference_iter_index < len(self):
            if (
                    self.reference_file_identifier
                    in
                    self.imageFolderDataset.imgs[self.reference_iter_index][0]
                    ):
                reference_tuple = (
                    self.imageFolderDataset.imgs[self.reference_iter_index])
                break
            else:
                self.reference_iter_index += self.number_of_workers

        if self.reference_iter_index >= len(self):
            return -1, -1, -1, -1, -1, -1
        else:
            self.reference_iter_index += self.number_of_workers
            reference_image = Image.open(reference_tuple[0])
            reference_image = reference_image.convert("L")

            if self.should_invert:
                reference_image = PIL.ImageOps.invert(reference_image)

            if self.transform is not None:
                reference_image = self.transform(reference_image)

            return (self.reference_iter_index-self.number_of_workers,
                    self.worker_unique_id,
                    reference_image,
                    -1,
                    reference_tuple[0],
                    -1)


# During testing, we'll use this class to retrieve specified "unknowns" for our
# reference-unknown pairs.
class TestingSiameseNetworkDatasetUnknowns(SiameseNetworkDataset):

    def __init__(
            self,
            imageFolderDataset, transform, should_invert,
            reference_iter_index, unknown_iter_index=0,
            reference_file_identifier='',
            worker_unique_id=0,
            number_of_workers=1):
        super().__init__(imageFolderDataset, transform, should_invert)
        self.reference_iter_index = reference_iter_index
        self.unknown_iter_index = unknown_iter_index
        self.reference_file_identifier = reference_file_identifier
        self.worker_unique_id = worker_unique_id
        self.number_of_workers = number_of_workers

    def __getitem__(self, index):
        reference_tuple = (
            self.imageFolderDataset.imgs[self.reference_iter_index])

        while self.unknown_iter_index < len(self):
            if (
                    self.reference_file_identifier
                    in self.imageFolderDataset.imgs[self.unknown_iter_index][0]
                    ):
                self.unknown_iter_index += self.number_of_workers
            else:
                unknown_tuple = (
                    self.imageFolderDataset.imgs[self.unknown_iter_index])
                break

        if self.unknown_iter_index >= len(self):
            return -1, self.worker_unique_id, -1, -1, -1, -1
        else:
            self.unknown_iter_index += self.number_of_workers

            reference_image = Image.open(reference_tuple[0])
            unknown_image = Image.open(unknown_tuple[0])
            reference_image = reference_image.convert("L")
            unknown_image = unknown_image.convert("L")

            if self.should_invert:
                reference_image = PIL.ImageOps.invert(reference_image)
                unknown_image = PIL.ImageOps.invert(unknown_image)

            if self.transform is not None:
                reference_image = self.transform(reference_image)
                unknown_image = self.transform(unknown_image)

            return (self.unknown_iter_index,
                    self.worker_unique_id,
                    reference_image,
                    unknown_image,
                    reference_tuple[0],
                    unknown_tuple[0])


### Neural Network Class

Defined below is a standard convolutional neural network. Each convolutional layer is followed by a ReLU activation and then batch normalisation. As Gupta says, 'There is nothing special about this network. It accepts an input of 100px by 100px and has 3 full connected layers after the convolution layers'. The final fully connected layer outputs an embedding whose length is set by `Config.train_batch_size`. To experiment further, layers may be added or modified.
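
The input size of the first fully connected layer, `nn.Linear(8*100*100, 500)`, follows from the convolution stages keeping the 100 x 100 spatial size. This can be checked with the standard convolution output-size formula, as in the short sketch below. The helper `conv_output_size` is an assumption for illustration, and `ReflectionPad2d(1)` is modelled simply as one pixel of padding per side.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 100  # input images are resized to 100 x 100
for _ in range(3):  # three pad + conv stages in `cnn1`
    # One pixel of reflection padding per side, then a 3 x 3 convolution.
    size = conv_output_size(size, kernel_size=3, padding=1)

channels = 8  # output channels of the last Conv2d
flattened = channels * size * size
print(size, flattened)  # 100 80000
```

If the resize target or the convolution settings are changed, the first `nn.Linear` input size has to be recomputed in the same way.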

In [0]:
class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()

        self.cnn1 = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(1, 4, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),

            nn.ReflectionPad2d(1),
            nn.Conv2d(4, 8, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),


            nn.ReflectionPad2d(1),
            nn.Conv2d(8, 8, kernel_size=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
        )

        self.fc1 = nn.Sequential(
            nn.Linear(8*100*100, 500),
            nn.ReLU(inplace=True),

            nn.Linear(500, 500),
            nn.ReLU(inplace=True),

            nn.Linear(500, Config.train_batch_size)
        )

    def forward_once(self, x):
        output = self.cnn1(x)
        output = output.view(output.size()[0], -1)
        output = self.fc1(output)
        return output

    def forward(self, input1, input2, training=True):
        if training:
            output1 = self.forward_once(input1)
            return output1
        else:
            output1 = self.forward_once(input1)
            output2 = self.forward_once(input2)
            return output1, output2


---
## Setting Up the Training Dataset and Associated Image Folder

The location of the training data is set below using the `training_dir` variable defined in `Config`. In the second block, the images are provided as a parameter to the custom dataset class, `BatchTripletSiameseNetworkDataset`, which resizes them to 100 x 100 pixels and transforms them into tensors.

In [0]:
folder_dataset = dset.ImageFolder(root=Config.training_dir)
# print(folder_dataset.imgs)


In [0]:
training_siamese_dataset = BatchTripletSiameseNetworkDataset(
    imageFolderDataset=folder_dataset,
    transform=transforms.Compose([transforms.Resize((100,100)),
                                  transforms.ToTensor()]),
    should_invert=False)


---
## Visualising the Training Data


This displays a couple of batches of data from which the loss function may create triplets through online mining.

In [0]:
vis_dataloader = DataLoader(
    training_siamese_dataset, shuffle=True, num_workers=1, batch_size=16)
dataiter = iter(vis_dataloader)

example_batch_img, example_batch_names = next(dataiter)
example_batch_img2, example_batch_names2 = next(dataiter)
concatenated = torch.cat((example_batch_img, example_batch_img2), 0)
concatenated_names = torch.cat((
    example_batch_names[1], example_batch_names2[1]), 0)
imshow(torchvision.utils.make_grid(concatenated))
print("Class labels:")
print(concatenated_names.numpy())


---
## Create or Load a Neural Network Model

Here, a new neural network model can be trained from scratch. Alternatively, skip ahead to 'Loading a Neural Network Model & State Dictionary' to load a pre-existing model.

### Training a New Neural Network Model



#### Setting Up the Network for Training

The next three blocks configure all of the variables and settings for training the neural network. Optionally, the values of the Adam optimizer may be tweaked for further experimentation.

In [0]:
train_dataloader = DataLoader(
    training_siamese_dataset,
    shuffle=False,
    num_workers=Config.num_generator_workers,
    batch_size=Config.train_batch_size)


In [0]:
# Build the network on the device chosen earlier
net = SiameseNetwork().to(device)
criterion = batch_hard_triplet_loss
optimizer = optim.Adam(net.parameters(), lr=0.00006)


In [0]:
counter = []
loss_history = []
iteration_number = 0


#### Training the Neural Network

This next block will start the training for the number of epochs set at the start of the notebook in the configuration block. The code is slightly modified so that filenames get stored for the images. The loss function used for training is [online_triplet_loss](https://github.com/NegatioN/OnlineMiningTripletLoss).
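
To make the 'batch hard' idea concrete: for each anchor in a batch, take the farthest embedding with the same label and the nearest embedding with a different label, then penalise the anchor when the first distance is not smaller than the second by at least the margin. The sketch below illustrates that idea in plain Python. It is not the code of the `online_triplet_loss` package, and it makes one simplifying assumption of its own: anchors that have no positive or no negative in the batch are skipped.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_loss(labels, embeddings, margin):
    """Mean over anchors of max(hardest_pos - hardest_neg + margin, 0)."""
    losses = []
    for i, anchor in enumerate(embeddings):
        # Distances to other embeddings that share the anchor's label.
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        # Distances to embeddings with a different label.
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # assumption of this sketch: skip incomplete anchors
        losses.append(max(max(pos) - min(neg) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two classes, two-dimensional embeddings (illustrative values).
labels = [0, 0, 1, 1]
embeddings = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
print(batch_hard_loss(labels, embeddings, margin=2.0))
```

A larger margin asks for a wider gap between same-class and different-class distances, so it yields a loss that is never smaller on the same batch.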

In [0]:
batch_generator = iter(train_dataloader)
for epoch in range(0, Config.train_number_epochs):
    batch_tensor = torch.Tensor(0, Config.train_batch_size).to(device)
    batch_labels = torch.Tensor(0,).to(device)
    selected_images = []
    while(True):
        # https://github.com/amdegroot/ssd.pytorch/issues/214
        try:
            potential_batch_sample, potential_name = next(batch_generator)
        except StopIteration:
            batch_generator = iter(train_dataloader)
            potential_batch_sample, potential_name = next(batch_generator)
        if (potential_name[0] in selected_images):
            continue
        else:
            selected_images.append(potential_name[0])

            batch_sample = potential_batch_sample
            batch_sample = batch_sample.cuda()

            potential_name = potential_name[1].cuda()
            optimizer.zero_grad()

            # Only the first input is used while training, so the second
            # argument is a placeholder.
            batch_sample_embed = net(batch_sample, None)

            batch_tensor = torch.cat((batch_tensor, batch_sample_embed), 0)
            batch_labels = torch.cat((
                batch_labels, potential_name.float()), 0)
            break

    batch_tensor = batch_tensor.cuda()
    batch_labels = batch_labels.to(device)
    hard_triplet_loss = criterion(
        batch_labels, batch_tensor, margin=2.0, device=device)
    hard_triplet_loss.backward()
    optimizer.step()
    print("Epoch number {}\n Current loss {}\n".format(
        epoch,
        hard_triplet_loss.item()))
    iteration_number += 1
    counter.append(iteration_number)
    loss_history.append(hard_triplet_loss.item())
show_plot(counter, loss_history)


### Saving the Neural Network Model & State Dictionary

The first block below saves the state dictionary and the full model, so that it is possible to return to them if the notebook connection to Colab is broken or if the project is set aside for a time. The second block copies the saved file to a location on Google Drive using `cp`. It is also possible to download the file to the local machine directly by right-clicking the filename in the tray at left (hit 'refresh' to update changes if the model does not at first appear).

In [0]:
# Save the model!
torch.save(net.state_dict(), 'net_params.pkl')
torch.save(net, 'net.h5')


In [0]:
!cp net_params.pkl "drive/My Drive/one-shot-test"


### Loading a Neural Network Model & State Dictionary

The first time through this notebook, this section is not important; skip down to 'Testing'. Otherwise, upon returning to the project, make sure that **the sections from 'Imports' through 'Setting Up the Training Dataset and Associated Image Folder' are run**.

The code cell below assumes Google Drive is mounted and connected. Alternatively, upload a model directly (or keep one on the local machine) and load it using the second code cell.

In [0]:
# Copy the model back from your drive
!cp "drive/My Drive/one-shot-test/net_params.pkl" net_params.pkl


...then tell the machine to load the model:

In [0]:
# Load the model
net = SiameseNetwork()
net.load_state_dict(torch.load('net_params.pkl'))
# Move the model to the GPU, where the 'Testing' section sends its inputs
net = net.to(device)
dp = nn.DataParallel(net)  # https://github.com/pytorch/pytorch/issues/3805

# The incompatiblekeys message might not be an issue - see
# https://gpytorch.readthedocs.io/en/latest/examples/00_Basic_Usage/Saving_and_Loading_Models.html
# which replicates that incompatiblekeys message without any kind of comment,
# seems to be hunkydory


---
## Testing

This block iteratively loads pairs of images, one of known provenance (a reference) and one of unknown provenance, from different subfolders in the testing folder. Reference images are identified by the string `ref-` in their file path. The block then compares the embeddings of each reference-unknown pair using Euclidean distance. It prints out the images with the dissimilarity (Euclidean distance), as well as the filenames for each pair.

Since this code currently outputs all possible reference-unknown pairs in the testing folder specified in `Config`, this results in the runtime being **O(n^2)**.
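
The pairing and the distance metric can be illustrated without the network. In the sketch below, the file names are hypothetical, and the helpers `reference_unknown_pairs` and `euclidean_distance` are assumptions written for this example rather than functions used by the notebook.

```python
import math
from itertools import product


def reference_unknown_pairs(paths, identifier="ref-"):
    """Pair every path containing `identifier` with every path that does not."""
    references = [p for p in paths if identifier in p]
    unknowns = [p for p in paths if identifier not in p]
    return list(product(references, unknowns))


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical file names, for illustration only.
paths = ["ref-a.png", "ref-b.png", "x.png", "y.png", "z.png"]
pairs = reference_unknown_pairs(paths)
print(len(pairs))  # 2 references x 3 unknowns = 6
print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
```

With r references among n images there are r * (n - r) pairs, which is at most n^2 / 4, hence the quadratic bound.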

In [0]:
# Time code snippet
# https://stackoverflow.com/questions/1557571/how-do-i-get-time-of-a-python-programs-execution
start_time = time.time()

# Init variable representing testing data's directory, see Config section to
# specify path.
folder_dataset_test = dset.ImageFolder(root=Config.testing_dir)

# Create the dataset, siamese_dataset_references, with the testing data
# Then, create a generator with siamese_dataset_references
siamese_dataset_references = TestingSiameseNetworkDatasetReferences(
    imageFolderDataset=folder_dataset_test,
    transform=transforms.Compose(
        [transforms.Resize((100, 100)), transforms.ToTensor()]),
    should_invert=False,
    reference_iter_index=0,
    reference_file_identifier='ref-')

test_dataloader = DataLoader(
    siamese_dataset_references,
    num_workers=Config.num_generator_workers,
    batch_size=1,
    shuffle=False,
    worker_init_fn=worker_init_fn)

generator_reference_images = iter(test_dataloader)
workers_terminated_outer = np.zeros(Config.num_generator_workers)

# Outer for loop searches for references
for i in range(len(generator_reference_images)):

    # Returns a found reference's index, image, and filepath
    (reference_index,
        worker_id_outer,
        reference_image,
        _,
        reference_filepath,
        _) = next(generator_reference_images)

    '''
    # Stop outer loop if index is out of bounds, no more potential references
    if not(0 in workers_terminated_outer):
        break
    '''

    if reference_index.item() < 0:
        '''
        np.put(workers_terminated_outer, worker_id_outer, 1)
        if not(0 in workers_terminated_outer):
            break
        '''
        break
    else:

        # Create new dataset with a given known reference's index
        # Then, create a generator with siamese_dataset_unknowns
        siamese_dataset_unknowns = TestingSiameseNetworkDatasetUnknowns(
            imageFolderDataset=folder_dataset_test,
            transform=transforms.Compose(
                [transforms.Resize((100, 100)), transforms.ToTensor()]),
            should_invert=False,
            reference_iter_index=reference_index,
            reference_file_identifier='ref-')

        compare_dataloader = DataLoader(
            siamese_dataset_unknowns,
            num_workers=Config.num_generator_workers,
            batch_size=1, shuffle=False,
            worker_init_fn=worker_init_fn)

        generator_unknown_prov_images = iter(compare_dataloader)
        workers_terminated_inner = np.zeros(Config.num_generator_workers)

        # Inner loop pairs reference with all images of unknown provenance
        for k in range(len(generator_unknown_prov_images)):

            # no_more_unknowns will return -1 if all pairs have been found
            no_more_unknowns, worker_id_inner, _, unknown_prov_image, _, (
                unknown_prov_filepath) = (next(generator_unknown_prov_images))

            '''
            # Stop this inner loop if all reference-unknown pairs for the
            # current reference image are found
            if not(0 in workers_terminated_inner):
                break
            '''

            if no_more_unknowns.item() < 0:
                '''
                np.put(workers_terminated_inner, worker_id_inner, 1)
                if not(0 in workers_terminated_inner):
                    break
                '''
                break
            else:
                concatenated = torch.cat(
                    (reference_image, unknown_prov_image), 0)

                # Feed the reference-unknown pair into the neural network to
                # return the pair's embeddings.
                reference_embedding, unknown_embedding = net(
                    Variable(reference_image).cuda(),
                    Variable(unknown_prov_image).cuda(), training=False)

                # Evaluate the embeddings using Euclidean distance as the
                # metric.
                euclidean_distance = F.pairwise_distance(
                    reference_embedding, unknown_embedding)

                # Show the images:
                imshow(
                    torchvision.utils.make_grid(concatenated),
                    'Dissimilarity: {:.2f}'.format(euclidean_distance.item()))

                # Show the paths for the two images
                print('Image 1: {}'.format(reference_filepath[0]))
                print('Image 2: {}'.format(unknown_prov_filepath[0]))
                print('Dissimilarity: {:.2f}'.format(
                    euclidean_distance.item()))
                # If you wish to write directly to file, comment out the
                # 'Show the images' code and then modify the three print
                # statements above along these lines:
                # print('Image 1: {}'.format(reference_filepath[0]), file=open('output.txt', 'a'))

print('Finished all reference-unknown pair comparisons')
print("--- %s seconds ---" % (time.time() - start_time))


---
## The End

The two code blocks below print out the structure of the neural network, and the version info of all of the loaded packages in this environment. This information is useful for replicating this notebook in the future.

In [0]:
from torchvision import models
model = net
print(model)


In [0]:
# watermark is not installed by default.
# the first time through, uncomment the two lines below
# then run the block.
#
#
# !pip install watermark
# %load_ext watermark
%watermark -v -m -p numpy,scipy,torchvision,PIL,tensorflow,torch -g
