# Module 5 - Deep neural networks and feature extraction
Neural networks (NNs) belong to an entirely different family of learning algorithms called representation learning. Such classifiers learn what is most salient about a class directly from the data. Until recently, computers were not powerful enough for NNs to operate directly on dense data such as images. Instead, researchers fed NNs hand-engineered features and used them as *de facto* feature selectors.

The past ten years have seen a confluence of advances in computer processing and labeled datasets. Together, these have led to renewed interest and rapid development in NN algorithms. The underlying mechanism remains similar: feed the network labeled examples, see how it does, adjust parameters, and repeat. This process is repeated many millions of times to tune the weights in a network.

There are many specific neural network architectures. For most of this tutorial we will focus on *deep residual networks*, or ResNets. ResNets are a refinement of general Convolutional Neural Networks (CNNs) that allows for more efficient training. In essence, a ResNet has two distinct phases: feature extraction and classification. The earliest layers of the network are filters that find local regions of the image fitting some pattern.

It turns out that these early filters are quite general. That is, they do not change very much regardless of the dataset used for training. We can exploit that fact to use a pre-trained network as a feature extractor. So rather than spending lots of time engineering our own features, we will use a ResNet to pull out the information for us.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
from torch.autograd import Variable
from sklearn import ensemble
from sklearn import preprocessing
from PIL import Image
import matplotlib.pyplot as plt
import os
import sys
import copy
import glob
import random
from tqdm.notebook import tqdm

sys.path.insert(1, os.path.join(os.getcwd(),'computer-vision-workshop/utilities'))
from internet_utils import get_json_url
from custom_torch_utils import ImageFolderWithPaths
from display_utils import make_confmat

DATASET_PATH = "/groups/cv-workshop/SPC_manual_labels"

We have imported a lot of new stuff here. Most of it is related to PyTorch, the library we will be using to run the ResNet.

## Loading a model and preprocessing data

PyTorch, through its torchvision package, comes with many pretrained models. Here we will call up ResNet18 and use it with predefined weights.

In [None]:
# load a pretrained network and put it in evaluation mode,
# so batch normalization uses its stored statistics
resnet18 = models.resnet18(pretrained=True).eval()

This version of ResNet has 18 layers and has already been trained with ImageNet. [ImageNet](http://www.image-net.org/) is a huge labeled image dataset maintained by the Stanford Vision Lab. Each image is associated with a noun describing what is in the picture. Each noun has thousands of labeled images associated with it. ImageNet is ubiquitous in computer vision research and is used to train many standard models.

When ResNet18 is trained with ImageNet, it is tuned for *generic object classification* -- sorting cats from dogs, trucks from cars, etc. This particular version was trained with 1000 generic object classes.

Now let us take our diatom chain image and put it through the net and see what we get.

In [None]:
# load the image
ptf = glob.glob(os.path.join(os.getcwd(), 'computer-vision-workshop/assets', 'SPC*'))

# we will load images with Python Image Library (PIL). 
img = Image.open(ptf[0])

# print the image dimensions
print("image size:", img.size)

img

Note that the colors look different. PIL loads the color channels in the familiar RGB order, not the BGR order of OpenCV.

Also notice that the image dimensions are not square. For margin and ensemble classifiers, this is not an issue. But NNs require that every input be the same size when it is put into the system. This is a consequence of the underlying math that governs these systems: for the filters to work, they must be applied to data of the same dimension.

To handle this, PyTorch includes an image transform class to put together the *tensors* needed to run through the network. A tensor is a multidimensional matrix with the pre-defined dimensions needed to optimize the speed of training and execution of a NN.

ResNet18 expects input tensors to have the shape [batch_dimension, channel, height, width]. 

* *batch_dimension* is the number of images to be processed at once. This size is usually a power of 2. The maximum size is limited by the available hardware.
* *channel* is the number of color channels, usually 3. If using gray scale images with a pretrained network, each image must be replicated into 3 channels. 
* *height* and *width* are the image height and width. These dimensions are also standardized according to the network architecture and are usually multiples of 2. Images that are not of this shape need to be resized accordingly.

For now, we need to load a tensor of a single image with the dimensions [1, 3, 224, 224].

In [None]:
# define the preprocessing transform
# this first part normalizes the color channels for ImageNet. If this is not done,
# the classifier will get confused.
normalize = transforms.Normalize(
   mean=[0.485, 0.456, 0.406],
   std=[0.229, 0.224, 0.225]
)

# this is where the image is reshaped into the appropriate tensor dimensions
preprocess = transforms.Compose([
   transforms.Resize((224, 224)),
   transforms.ToTensor(),
   normalize
])
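
As a rough illustration of what `transforms.Normalize` computes, the sketch below applies the same per-channel arithmetic, `(x - mean) / std`, with NumPy. The random array is a hypothetical stand-in for the output of `ToTensor()`, which scales pixel values to [0, 1] and orders the axes as [channel, height, width].

```python
import numpy as np

# ImageNet channel statistics, as used in the preprocessing cell
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# hypothetical stand-in for a ToTensor() output: [channel, height, width] in [0, 1]
rng = np.random.default_rng(0)
pixels = rng.random((3, 224, 224))

# reshape the statistics to [3, 1, 1] so they broadcast over height and width
normalized = (pixels - mean[:, None, None]) / std[:, None, None]

print(normalized.shape)
```

After this step each channel is expressed in units of the ImageNet standard deviation around the ImageNet mean, which is the input range the pretrained weights were tuned for.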

Now that the preprocessing is defined, we can make our image fit into the classifier.

In [None]:
img_tensor = preprocess(img)

# print out the size
img_tensor.size()

The image is now a torch tensor. But it is not quite ready to go into the network. It is still missing the batch dimension. Since only this one image is going through, use *unsqueeze_*, a PyTorch method, to add a dummy dimension.

In [None]:
# add the dummy dimension at position 0
# note that unsqueeze_ (with the trailing underscore) works in place, so we do not need to reassign the tensor
img_tensor.unsqueeze_(0)

img_tensor.size()
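
To make the in-place behavior concrete, here is a minimal sketch contrasting `unsqueeze` with `unsqueeze_`. The zero tensor is a hypothetical stand-in for the preprocessed image.

```python
import torch

# hypothetical stand-in for a preprocessed image tensor
x = torch.zeros(3, 224, 224)

# unsqueeze returns a new view with the extra dimension; x itself is unchanged
y = x.unsqueeze(0)
print(x.shape, y.shape)

# unsqueeze_ (trailing underscore) modifies x in place
x.unsqueeze_(0)
print(x.shape)
```

One practical consequence: re-running a notebook cell that calls `unsqueeze_` adds another dimension each time, so the out-of-place form can be the safer choice when experimenting.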

Now the tensor has the right size to put through the network.

## Run an image through ResNet

Putting an image or set of images through a trained network is known as a *forward pass*. Since the network is already trained, we are not concerned with tuning the network via *back propagation*. We will get there later.

For now, the image can be passed through the network and we can see what the label is. 

In [None]:
# pass it through the ResNet18 and record the output
out = resnet18(img_tensor)

# print out the size of the resulting tensor
out.size()

The resulting tensor has 1000 entries, one for each class the ResNet was trained with. These are raw scores (logits) rather than probabilities: a larger score means the network considers that class more likely. To print a few of them, check out the data array of the tensor.

In [None]:
# print out a few of the outputs from the last fully connected layer.
out.data.numpy()[0][0:10]
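
If probabilities are wanted rather than raw scores, the usual route is a softmax. The following is a minimal NumPy sketch; the four logits are made up for illustration and are not taken from the network above.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# hypothetical logits for a 4-class problem
logits = np.array([2.0, -1.0, 0.5, 4.0])
probs = softmax(logits)

print(probs.round(3), probs.sum())
```

Softmax preserves the ordering of the scores, so the predicted class is the same either way. On the tensor from the forward pass, `torch.nn.functional.softmax(out, dim=1)` performs the same conversion.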

The predicted class is the entry with the maximum value in the array. But taking the argmax only gives us its position. We need the ImageNet key of classes. Here we grab the ImageNet class index as a JSON document.

In [None]:
# unpack json from the following url
label_url = "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"

# read the dictionary
label_dict = get_json_url(label_url)

# make a list of the first few labels and print them
first_labs = [(item, label_dict[item][1]) for item in list(label_dict.keys())[0:10]]
first_labs

These are the first 10 labels in the ImageNet data set. Those look pretty familiar! Now let's see what ResNet called the diatom chain.

In [None]:
# get the label
print("ResNet18 sez: ", label_dict[str(out.data.numpy().argmax())][1])
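
Looking only at the single best label can hide near misses. A common alternative is to list the top few candidates, sketched below. The scores are invented, and `toy_labels` is a hypothetical five-entry excerpt written in the same layout as the class index file (class index as a string, mapped to a WordNet id and a name).

```python
import numpy as np

# hypothetical excerpt in the same layout as the ImageNet class index:
# {class index as a string: [WordNet id, human-readable name]}
toy_labels = {
    "0": ["n01440764", "tench"],
    "1": ["n01443537", "goldfish"],
    "2": ["n01484850", "great_white_shark"],
    "3": ["n01491361", "tiger_shark"],
    "4": ["n01494475", "hammerhead"],
}

# hypothetical scores, one per class
scores = np.array([0.1, 2.3, -0.7, 1.5, 0.9])

# argsort is ascending, so reverse it and keep the first k indices
k = 3
top_k = np.argsort(scores)[::-1][:k]

for idx in top_k:
    print(toy_labels[str(idx)][1], scores[idx])
```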

So our great and powerful NN thinks this diatom chain is a bucket. 

The point here is that the network is not tuned to think about plankton data. In the next module, we will retrain the network to understand plankton data. 

### A warning about image dimensions

Let's see what the diatom chain looks like when it is resized to [224 x 224].

In [None]:
# resize the chain to 224x224
img_res = img.resize((224,224))

img_res

Notice that the image looks squished. By resizing the images, we lose some of the scale information that we humans rely on to identify these organisms. In principle, the computer does not care about the scale.

Think carefully when training and testing NNs about how the preprocessing will affect the input data. If you think it is important, the aspect ratio can be preserved. There are also methods to re-insert that information into the network.
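
One simple way to preserve the aspect ratio is to pad the shorter side to a square before resizing. The sketch below does this on a NumPy array; `pad_to_square` is a hypothetical helper, and the black fill value is an assumption that should be matched to the background of the actual images.

```python
import numpy as np

def pad_to_square(image, fill=0):
    """Pad an [height, width, channel] array with a constant so height == width."""
    height, width = image.shape[:2]
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    padding = ((top, side - height - top), (left, side - width - left), (0, 0))
    return np.pad(image, padding, mode="constant", constant_values=fill)

# hypothetical wide image, e.g. a long diatom chain: 60 rows by 300 columns
chain = np.full((60, 300, 3), 255, dtype=np.uint8)
square = pad_to_square(chain)

print(chain.shape, "->", square.shape)
```

The padded array can be turned back into a PIL image with `Image.fromarray(square)` and then passed through `preprocess` as before. The object keeps its proportions, at the cost of occupying fewer pixels in the final 224 x 224 input.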


## Extracting features from the ResNet

Instead of asking the pretrained model to classify the image, we can use it to pull out features. Here, we will read out the activations (outputs) of the final hidden layer of the network.

In [None]:
# first, build a truncated version of the network without the last (fully connected) layer
feat_extractor = nn.Sequential(*list(resnet18.children())[:-1])

# Activate evaluation mode
feat_extractor.eval()

# pump the preprocessed image through the network
feats = feat_extractor(img_tensor)

# get the tensor from the end of the truncated network
feats = feats.data

# print the dimensions
feats.shape

This tensor has 512 entries corresponding to the activations of the final hidden layer of the network. To collapse it to a single dimension and use it as an array, simply flatten it.

In [None]:
# convert the torch tensor to np array
feats = np.ndarray.flatten(feats.numpy())

feats.shape
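
The 512 values come out of the average pooling layer that sits just before the removed classifier. As a rough sketch of that step, assuming a 224 x 224 input (for which the last convolutional block of ResNet18 yields 7 x 7 response maps), the pooling reduces each filter's map to a single number. The random array is a hypothetical stand-in for real activations.

```python
import numpy as np

# hypothetical activations from the last convolutional block for one image:
# 512 filters, each producing a 7 x 7 response map
rng = np.random.default_rng(0)
feature_maps = rng.random((512, 7, 7))

# global average pooling: one number per filter
pooled = feature_maps.mean(axis=(1, 2))

print(pooled.shape)
```

Each entry of the feature vector therefore summarizes how strongly one learned filter responded across the whole image.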

We have taken the image, run it through the ResNet trained on ImageNet, and produced a vector of 512 features. These features can now be used to train a second stage classifier. Basically, we have reduced all the hand-engineered feature extraction to just a few lines of code.

## Running all the data through

Now we need to repeat the process for all the images we want to play with. Our expected output will be a matrix with dimensions [number of images x number of features].

In [None]:
# Torch has a bunch of dataloading utilities built in. This custom loader adds to their ImageFolder utility to have it 
# return file paths so we can observe the output.
# It assumes that the images are loaded as "{DATASET_PATH}/{class_name}/{image_id}.ext"
dataset = ImageFolderWithPaths(DATASET_PATH, preprocess)

# The DataLoader sets up a block (batch) of transformed images for the GPU to process
loader = torch.utils.data.DataLoader(dataset, batch_size=4)

# This gets a single batch of images
images, labels, paths = next(iter(loader))

# print the shape of the tensors
print("images:", images.shape)
print("labels:", labels.shape)

These inputs were transformed the same way as above, but now the first dimension is 4. This means there are 4 images stacked on top of each other. This is the *batch_size* and dictates how many images are passed to the network at once.

*paths* is a tuple with the path to each file in it. This will be used to view the images later. 

Now let's see what the output looks like for these 4 images.

In [None]:
# pass the images through the network and retrieve the labels
feats_small = feat_extractor(images)

# just pull out the data and convert to a numpy
feats_small = feats_small.data.numpy()

# make it into an array and remove extra dimensions
feats_small = np.asarray(feats_small)[:, :, 0, 0]

# check the dimensions
feats_small.shape

This is the shape we expect: 4 rows, each with 512 features corresponding to the activation of each filter. Extracting the features from every image requires a for-loop. If we attempted to define a batch that was the entire dataset, the computer would likely run out of memory.

In [None]:
# define the new loader with the bigger batchsize
loader_all = torch.utils.data.DataLoader(dataset, batch_size=128)

# initialize an empty dictionary to store the features by image path
feat_dict = {}

# put the network on the GPU
feat_extractor = feat_extractor.cuda()

# Activate evaluation mode
feat_extractor.eval()

# tell the network not to compute gradients since we aren't training
with torch.no_grad():
    
    # use the tqdm module to monitor the progress of the extractor
    with tqdm(loader_all, desc="Evaluating") as t:
        
        # iterate over each batch of 128 in the loader
        for inputs, labels, paths in t:
            
            # put the images onto the GPU
            inputs = inputs.cuda()
            
            # extract the features
            feats_temp = feat_extractor(inputs)
            
            # bring output tensor back onto CPU and collapse extra dimensions
            feats_temp = feats_temp.cpu().data.numpy()[:, :, 0, 0]
            
            # put into a temp dictionary
            temp_dict = {paths[ii]: feats_temp[ii, :] for ii in range(len(paths))}
            
            # update the output
            feat_dict.update(temp_dict)
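
The loop above calls `.cuda()` directly, which fails on a machine without a GPU. A device-agnostic variant might look like the sketch below. To keep it self-contained, `toy_extractor`, the random batch, and the file names are all hypothetical stand-ins for `feat_extractor` and the real loader output.

```python
import torch
import torch.nn as nn

# pick the GPU when one is present, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# hypothetical stand-in for feat_extractor: maps [N, 3, H, W] to [N, 8, 1, 1]
toy_extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
).to(device).eval()

# hypothetical batch of 4 images with made-up file paths
inputs = torch.rand(4, 3, 32, 32)
paths = ["img_%d.png" % ii for ii in range(4)]

toy_feats = {}
with torch.no_grad():
    # move the batch to the chosen device, extract, and bring the result back
    batch_feats = toy_extractor(inputs.to(device))
    batch_feats = batch_feats.cpu().numpy()[:, :, 0, 0]
    toy_feats.update({paths[ii]: batch_feats[ii] for ii in range(len(paths))})

print(len(toy_feats), toy_feats[paths[0]].shape)
```

The same pattern carries over to the real loop by replacing `.cuda()` with `.to(device)` on both the network and the inputs.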



*feat_dict* is organized by the file path of each image. Each image has 512 features associated with it. Again, these features are in many ways akin to those we hand-engineered earlier. But these are defined by a computer trained for generic object classification.

## Train a RF model

With the ResNet18 features in hand, we can go ahead and train another Random Forest (RF) using the new features. First, we need to associate a numeric label with each of the images.



In [None]:
# get a list of the unique class names from the dictionary keys
ptfs = list(feat_dict.keys())
cls_names = [line.split('/')[-2] for line in ptfs]
cls_names = list(set(cls_names))
cls_names.sort()

cls_names = [(ii, cls_names[ii]) for ii in range(len(cls_names))]
cls_names

Now shuffle the feature dictionary and split it into training and test data like in the previous module. 

In [None]:
# shuffle the file paths (i.e. the keys)
random.shuffle(ptfs)

# compute the index for splitting the data
idx = 0.8*len(ptfs)

train_ids = ptfs[0:int(idx)]
test_ids = ptfs[int(idx)::]

# double check
print("cut off for 80-20 split:", str(int(idx)))
print("number of training images:", str(len(train_ids)))
print("number of test images:", str(len(test_ids)))
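
Because `random.shuffle` draws from the global random state, the split changes on every run. When a repeatable split is useful, for example to compare classifiers on the same test images, a seeded generator can be used instead. This is a minimal sketch; `split_paths`, the seed value, and the file names are hypothetical.

```python
import random

# hypothetical list of 10 file paths
toy_paths = ["img_%02d.png" % ii for ii in range(10)]

def split_paths(paths, train_fraction=0.8, seed=0):
    """Shuffle a copy of paths with a fixed seed and split it into train and test lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train_a, test_a = split_paths(toy_paths)
train_b, test_b = split_paths(toy_paths)

# the same seed reproduces the same split
print(train_a == train_b, len(train_a), len(test_a))
```

Note that a plain random split does not guarantee every class appears in both sets. With rare classes, a per-class (stratified) split may be worth considering.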

Great, now that we have the train-test split we can train and evaluate a classifier.

In [None]:
# get the training features
train_features = [feat_dict[line] for line in train_ids]
train_features = np.asarray(train_features)

# get the training labels with the look up
train_labels = [[line[0] for line in cls_names if item.split('/')[-2] == line[1]][0] for item in train_ids]
train_labels = np.asarray(train_labels)

# check to make sure these numbers are right. We expect the training features to be a matrix with 
# dimensions [n_images x n_features] and the training labels to be a vector of length n_images
print("train features dim:", train_features.shape)
print("train labels dim:", train_labels.shape)
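
The label lookup above scans the whole class list for every image. An equivalent approach builds a name-to-index dictionary once and looks each path up directly. The sketch below uses made-up paths and class names that follow the same directory layout.

```python
# hypothetical file paths following "{DATASET_PATH}/{class_name}/{image_id}.ext"
toy_paths = [
    "/data/copepod/img_001.jpg",
    "/data/diatom_chain/img_002.jpg",
    "/data/copepod/img_003.jpg",
]

# the class name is the parent directory of each file
toy_classes = sorted({path.split("/")[-2] for path in toy_paths})

# build the name -> index dictionary once, then look each path up directly
class_to_idx = {name: ii for ii, name in enumerate(toy_classes)}
toy_label_ids = [class_to_idx[path.split("/")[-2]] for path in toy_paths]

print(class_to_idx)
print(toy_label_ids)
```

Both versions give the same labels because the class names are sorted before being numbered. The dictionary version is just easier to read and faster on large datasets.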

Now train a Random Forest with 100 trees.

In [None]:
# invoke an instance of the standardizer class and fit it to the training features
scale_transform = preprocessing.StandardScaler().fit(train_features)

# instantiate the RF
rf_clf = ensemble.RandomForestClassifier(n_estimators=100, n_jobs=8, verbose=1)

# train it. this will take a little longer because the feature space is bigger
rf_clf.fit(scale_transform.transform(train_features), train_labels)

Now that the RF is trained, the independent test data can be run through it for evaluation.

In [None]:
# first get the features and labels for the test set
# get the test features
test_features = [feat_dict[line] for line in test_ids]
test_features = np.asarray(test_features)

# get the test labels with the look up
test_labels = [[line[0] for line in cls_names if item.split('/')[-2] == line[1]][0] for item in test_ids]
test_labels = np.asarray(test_labels)

# check to make sure these numbers are right. We expect the test features to be a matrix with 
# dimensions [n_images x n_features] and the test labels to be a vector of length n_images
print("test features dim:", test_features.shape)
print("test labels dim:", test_labels.shape)

# get the mean accuracy over all the test images
acc = rf_clf.score(scale_transform.transform(test_features), test_labels)

# get the labels for the test set from the classifier
preds = rf_clf.predict(scale_transform.transform(test_features))

# make a confusion matrix
make_confmat(test_labels, preds, acc)
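
`make_confmat` is a workshop utility whose internals are not shown here. As a generic sketch of what a confusion matrix contains, the block below counts true versus predicted labels with NumPy and derives the overall accuracy and per-class recall. The labels are invented for a 3-class problem, and `confusion_matrix` is a hypothetical helper.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count how often each true class (rows) is assigned each predicted class (columns)."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# hypothetical labels for a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, n_classes=3)

# overall accuracy is the trace divided by the total count
accuracy = np.trace(cm) / cm.sum()
# per-class recall: correct predictions divided by the number of true examples
recall = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(accuracy, recall)
```

Per-class recall is worth checking alongside the overall accuracy, since a classifier can score well overall while doing poorly on rare classes.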

## Compare result to hand-engineered features

The performance of the classifier with the ResNet features is comparable to that with the hand-engineered features. We could probably improve the second stage classifier or attempt to pull more generic features from an earlier layer in the ResNet -- here we took the activations from quite near the end of the network.

The relative ease of generating these features is substantial. Rather than spending time figuring out what to measure from the image, we crank the images through the ResNet and use the activations.

As noted above, we sacrifice some scale information that could be important. Depending on the application, it may be worth combining the deep features with some generated by hand.

Given sufficient data, fine tuning a network (module 6) could be a better option. Likewise, if there is a network trained for a specific task close to yours (such as plankton imaging), those weights might be more informative than those generated by a generic object classifier.

## Exercises

To practice the above techniques, extract features and train a RF from the ZooScan data.

In [None]:
# reset the dataset path
DATASET_PATH = 

# reset the dataset with the defined preprocessing
dataset = ImageFolderWithPaths(DATASET_PATH, preprocess)

# instantiate a data loader with a batch size of 128
loader = torch.utils.data.DataLoader()

# Check that the size and shape are right
images, labels, paths = 

# print the shape of the tensors
print("images:", images.shape)
print("labels:", labels.shape)

Once the DataLoader seems to be working, extract the features.

In [None]:
# write a loop that saves out all the features for the ZooScan data

# initialize an empty dictionary to store the features by image path
feat_dict = {}

# put the network on the GPU
feat_extractor = feat_extractor.cuda()

# tell the network not to compute gradients since we aren't training
with torch.no_grad():
    
    # use the tqdm module to monitor the progress of the extractor
    with tqdm(loader, desc="Evaluating") as t:
        
        # for-loop goes here

After extracting the features, generate a class name key.

In [None]:
# get a list of the unique class names from the dictionary keys


Once the keys are in order, split the data for training and testing.

In [None]:
# shuffle the file paths (i.e. the keys)
random.shuffle(ptfs)

# compute the index for splitting the data
idx = 

train_ids = 
test_ids =

# double check
print("cut off for 80-20 split:", str(int(idx)))
print("number of training images:", str(len(train_ids)))
print("number of test images:", str(len(test_ids)))

Get the train and test matrices.

In [None]:
# get the training features
train_features = 
train_features = np.asarray(train_features)

# get the training labels with the look up
train_labels = [[line[0] for line in cls_names if item.split('/')[-2] == line[1]][0] for item in train_ids]
train_labels = np.asarray(train_labels)

# check to make sure these numbers are right. We expect the training features to be a matrix with 
# dimensions [n_images x n_features] and the training labels to be a vector of length n_images
print("train features dim:", train_features.shape)
print("train labels dim:", train_labels.shape)

# first get the features and labels for the test set
# get the test features
test_features = 
test_features = np.asarray(test_features)

# get the test labels with the look up
test_labels = 
test_labels = np.asarray(test_labels)

# check to make sure these numbers are right. We expect the test features to be a matrix with 
# dimensions [n_images x n_features] and the test labels to be a vector of length n_images
print("test features dim:", test_features.shape)
print("test labels dim:", test_labels.shape)

With the information in hand, train and evaluate a Random Forest with 100 trees.

In [None]:
# invoke an instance of the standardizer class and fit it to the training features

# instantiate the RF

# train it. 

# get the mean accuracy across all the classes

# get the labels for the test set from the classifier

# make a confusion matrix
