
In [0]:
pip install fastai2

In [0]:
def cnn_dim(in_shape, ker_shape, stride, padding):
    """
    Calculate output dimension of the convolutional layer
    Args:
       in_shape: input shape - height or width
       ker_shape: kernel shape - height or width
       stride: stride
       padding: padding
    
    Returns:
        output dimension of convolutional layer
    """
    out_shape = (in_shape - ker_shape + 2 * padding) / stride + 1
    out_shape = int(out_shape)
    
    return out_shape

out_dim = cnn_dim(224, 3, 1, 1)
print(f'out_dim = {out_dim}')
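The formula can be checked against an actual layer. The sketch below is only an illustration: `conv_out_dim` is a hypothetical helper that restates the formula with floor division, and the three configurations are arbitrary examples chosen for this check.

```python
import torch
from torch import nn


def conv_out_dim(in_size, kernel, stride, padding):
    # Same formula as cnn_dim, written with floor division (hypothetical helper)
    return (in_size - kernel + 2 * padding) // stride + 1


# (input size, kernel size, stride, padding): arbitrary example configurations
checks = [(224, 3, 1, 1), (224, 7, 2, 3), (32, 5, 1, 0)]
results = []
for in_size, kernel, stride, padding in checks:
    conv = nn.Conv2d(3, 8, kernel_size=kernel, stride=stride, padding=padding)
    with torch.no_grad():
        out = conv(torch.randn(1, 3, in_size, in_size))
    # Pair the predicted size with the width PyTorch actually produces
    results.append((conv_out_dim(in_size, kernel, stride, padding), out.shape[-1]))

print(results)
```

Each pair should contain two equal numbers if the formula matches the layer.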

In [0]:
from torchvision.models import vgg16
model = vgg16(pretrained=True)

In [0]:
from torchvision.models import resnet50
model = resnet50(pretrained=True)

In [0]:
from torchvision.models import resnext101_32x8d
model = resnext101_32x8d(pretrained=False)

There are different ResNet models:
- ResNet18
- ResNet34
- ResNet50
- ResNet101
- ResNet1001

Inception-ResNet
- Inception-ResNet A Block
- Inception-ResNet B Block
- Inception-ResNet C Block
- Reduction A Block
- Reduction B Block

ResNeXt architectures:
- ResNeXt50_32x4d
- ResNeXt50_64x4d
- ResNeXt101_32x4d
- ResNeXt101_64x4d

Other architectures
- DenseNet
- TraceNet
- EfficientNet

See details in **lecture_16_convolutional_neural_networks.ipynb**.

**Global Average Pooling Layer:** For each feature map, takes the average of all its values and outputs that single value, so every $H \times W$ map becomes $1 \times 1$. The number of output channels equals the number of input channels.

**Adaptive Average Pooling Layer:**
Applies 2D average pooling to each feature map, choosing the pooling windows so that the output has the requested size $H \times W$ **for any input size**. The number of output channels equals the number of input channels. Global average pooling is the special case with output size $1 \times 1$.
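A minimal sketch of both layers with `nn.AdaptiveAvgPool2d`. The channel count (512) and the two spatial sizes are arbitrary values picked for illustration.

```python
import torch
from torch import nn

gap = nn.AdaptiveAvgPool2d(1)          # global average pooling: one value per channel
pool7 = nn.AdaptiveAvgPool2d((7, 7))   # always produces a 7 x 7 map per channel

shapes = {}
for size in (7, 12):  # two different spatial sizes of the incoming feature maps
    fmap = torch.randn(1, 512, size, size)
    shapes[size] = (tuple(gap(fmap).shape), tuple(pool7(fmap).shape))

# Global average pooling is just the mean over the spatial dimensions
fmap = torch.randn(1, 512, 12, 12)
same_as_mean = torch.allclose(gap(fmap).flatten(1), fmap.mean(dim=(2, 3)), atol=1e-6)

print(shapes, same_as_mean)
```

The output shape depends only on the requested size, not on the input size, which is what lets the later fully connected layers accept images of different resolutions.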

## Feature extraction / embedding
Let's take one of the models pre-trained on ImageNet (VGG, Inception, ResNet, etc.) and remove the final layers that follow the convolutional part:

- For VGG16, remove the last two fully connected layers
- For Inception and ResNet, remove all the layers after adaptive (global) average pooling

The truncated model then maps an image to a feature vector.

In [0]:
import torch
from torch import nn
from torchvision.models import resnet50, resnet34, vgg16

In [0]:
net = vgg16(pretrained=True)
net

In [0]:
model = nn.Sequential(*list(net.children())[:-1])
model

In [0]:
x = torch.randn(1, 3, 399, 399)
with torch.no_grad():
    y1 = net(x)
    y2 = model(x)

In [0]:
y1.size()

In [0]:
y2.size()

In [0]:
y2 = torch.flatten(y2, 1)
y2.size()

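In the VGG16 example above we dropped the whole `classifier` block, which leaves a $512 \times 7 \times 7 = 25088$ dimensional vector. A ResNet50 cut after global average pooling gives $2048$ dimensions, and VGG16 cut after its first fully connected layer gives $4096$. In every case we can run the model on our dataset of images and generate one fixed-size vector per image.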
$$
f: \mathbb{R}^{3 \times H \times W} \mapsto \mathbb{R}^d
$$
<br>
Our model maps each $C \times H \times W$ image (with adaptive average pooling, $H$ and $W$ may differ between images) to a fixed $d$-dimensional vector.

Vectors have a "distance" property: we can measure how far apart two of them are.
<br>
If we store these vectors and run a **K-nearest neighbor** search, we can observe that similarity search works even on a dataset that was not used during training.
<br>
Note: search results depend on the model and on how close the domain of its training set is to the domain of our dataset.
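A minimal sketch of the nearest-neighbor search with scikit-learn. The gallery here is random placeholder data standing in for stored feature vectors, and the dimension (64), gallery size, and cosine metric are assumptions made for this illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Placeholder "embeddings": in practice these would come from the truncated model
gallery = rng.normal(size=(100, 64)).astype(np.float32)
# A query that is a slightly perturbed copy of gallery item 7
query = gallery[7] + 0.01 * rng.normal(size=64).astype(np.float32)

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(gallery)
distances, indices = index.kneighbors(query[None, :])

print(indices[0])    # indices of the 5 closest gallery vectors
print(distances[0])  # their cosine distances, smallest first
```

Item 7 should come back as the closest match, since the query is a near copy of it.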
<br>

Last layer **Dimensionality Reduction**

**Visualize the space of feature vectors** by reducing their dimensionality, for example from 4096 to 2 dimensions.
- Simple algorithm: **Principal Component Analysis (PCA)**
- More complex: **t-SNE**
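Both reductions can be sketched with scikit-learn. The feature matrix below is random placeholder data, and its size (200 vectors of 256 dimensions) and the t-SNE settings are assumptions chosen only to keep the example fast.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder for extracted feature vectors (one row per image)
features = rng.normal(size=(200, 256))

# Linear projection onto the two directions of largest variance
coords_pca = PCA(n_components=2).fit_transform(features)

# Non-linear embedding that tries to preserve local neighborhoods
coords_tsne = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(features)

print(coords_pca.shape, coords_tsne.shape)
```

Either `coords_pca` or `coords_tsne` can then be passed to a scatter plot, colored by class label where labels are available.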

## Transfer-learning

We can see that the first layers extract basic features that are quite similar across all images. The middle layers extract more complex features, and the last layers extract more domain-specific ones.
Can we use this for a different task? Would a model pre-trained on one dataset provide enough features for a different dataset?

With the following approach:
- We extract features from the images with the pre-trained model
- Train a different model on these features

It turns out that this approach works, and it is called transfer learning. For transfer learning we should consider the following:
- Was the model trained on a similar domain?
- Was the model trained on enough data?

State-of-the-art results have been achieved with models trained on the ImageNet classification task:
- It has diverse and well-distributed images
- It is labeled more precisely
- It has enough images to extract "all possible" features

There are several approaches:
- Use the extracted features to train a different model
- Freeze the weights and train only the classifier
- Fine-tune the whole model with discriminative learning rates

The first approach requires extracting the feature vectors in advance and training a different model on them:

- Extract features
- Train a different classifier (SVM, random forest, gradient boosting) on them
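A minimal sketch of the second step with an SVM. Real feature vectors are replaced here by a synthetic dataset from `make_classification`, so the sizes, the number of classes, and the resulting accuracy are illustrative assumptions and say nothing about real images.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for pre-extracted feature vectors and their labels
X_feat, y = make_classification(
    n_samples=300, n_features=64, n_informative=20, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_feat, y, test_size=0.25, random_state=0
)

# Any classical classifier could be used here; an RBF SVM is one option
clf = SVC(kernel="rbf").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"test accuracy on the synthetic features: {accuracy:.2f}")
```

Because the features are computed once and cached, trying several classifiers this way is cheap compared with retraining the network.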

For the second approach, we put our own layers on top of the model and train them:
- Put custom layers on top of the model
- Freeze the weights of the feature extraction layers
- Train the custom layers

In [0]:
# Flatten (N, 512, 7, 7) -> (N, 25088) before the new fully connected head
head = [nn.Flatten(), nn.Linear(25088, 500), nn.Dropout(p=0.3), nn.Linear(500, 20)]
model_fn = nn.Sequential(*model.children(), *head)
model_fn
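The freezing step itself can be sketched as follows. To stay small and self-contained, this uses a tiny hypothetical `backbone` in place of the pre-trained VGG16 layers; the names `backbone`, `head`, and `frozen_model`, the layer sizes, and the learning rate are all assumptions for illustration.

```python
import torch
from torch import nn

# Hypothetical stand-in for the pre-trained feature extraction layers
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((7, 7)),
)
head = nn.Sequential(nn.Flatten(), nn.Linear(8 * 7 * 7, 20))

# Freeze the backbone: its parameters receive no gradients
for p in backbone.parameters():
    p.requires_grad = False

frozen_model = nn.Sequential(backbone, head)
# Give the optimizer only the parameters that are still trainable
trainable = [p for p in frozen_model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2)

# One training step on random placeholder data
before = backbone[0].weight.clone()
x, target = torch.randn(4, 3, 32, 32), torch.randint(0, 20, (4,))
loss = nn.functional.cross_entropy(frozen_model(x), target)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(before, backbone[0].weight)
print(backbone_unchanged, head[1].weight.grad is not None)
```

With the real model, the same loop over `parameters()` would be applied to the pre-trained layers before building the optimizer.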

For the third approach, we put our own layers on top of the model and train it with different learning rates:

- Put custom layers on top of the model
- Train the full model using the largest learning rate for the last layers, a smaller one (maybe $\frac{1}{100}$ of it) for the middle layers, and $\frac{1}{1000}$ of it for the first layers.
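One way to sketch discriminative learning rates in PyTorch is with optimizer parameter groups. The three-part split (`early`, `middle`, `head`), the tiny layers, and the base learning rate are hypothetical choices for this illustration; with a real pre-trained model the groups would be built from its own blocks.

```python
import math

import torch
from torch import nn

# Hypothetical three-stage network: early layers, middle layers, new head
early = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
middle = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
head = nn.Sequential(nn.Flatten(), nn.Linear(16, 20))
net = nn.Sequential(early, middle, head)

base_lr = 1e-2
optimizer = torch.optim.SGD(
    [
        {"params": early.parameters(), "lr": base_lr / 1000},  # first layers
        {"params": middle.parameters(), "lr": base_lr / 100},  # middle layers
        {"params": head.parameters(), "lr": base_lr},          # last layers
    ],
    lr=base_lr,
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

Each group keeps its own learning rate during training, so the pre-trained early layers change slowly while the new head adapts quickly.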

Pre-trained classifiers are also used for other tasks:
- Segmentation
- Detection
- Image search / metric learning
- Auto-encoders
- GAN
- etc

In [0]:
import matplotlib.pyplot as plt  # needed for plt.imshow below
from sklearn.datasets import fetch_openml

In [0]:
# as_frame=False returns NumPy arrays, so X[1] below selects a single image row
X, y_str = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

In [0]:
img1 = X[1].reshape(28, 28)
plt.imshow(img1)