# Chapter 15 - Classifying Images with Deep Convolutional Neural Networks

Local modelling below. 

TensorBoard tutorial: https://www.tensorflow.org/tensorboard/get_started

[Colab notebook](https://drive.google.com/file/d/1VGzUW849GlF208pduiQ6UNBmNV4e8y9h/view?usp=sharing)

## Loading and preprocessing the data

In [1]:
## unzips mnist

import sys
import gzip
import shutil
import os


if (sys.version_info > (3, 0)):
    writemode = 'wb'
else:
    writemode = 'w'

zipped_mnist = [f for f in os.listdir('./')
                if f.endswith('ubyte.gz')]
for z in zipped_mnist:
    with gzip.GzipFile(z, mode='rb') as decompressed, open(z[:-3], writemode) as outfile:
        outfile.write(decompressed.read())

In [2]:
import struct
import numpy as np


def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte'
                                % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte'
                               % kind)

    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
        labels = np.fromfile(lbpath,
                             dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII",
                                               imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)

    return images, labels


X_data, y_data = load_mnist('./data', kind='train')
print('Rows: %d,  Columns: %d' % (X_data.shape[0], X_data.shape[1]))
X_test, y_test = load_mnist('./data', kind='t10k')
print('Rows: %d,  Columns: %d' % (X_test.shape[0], X_test.shape[1]))

X_train, y_train = X_data[:50000,:], y_data[:50000]
X_valid, y_valid = X_data[50000:,:], y_data[50000:]

print('Training:   ', X_train.shape, y_train.shape)
print('Validation: ', X_valid.shape, y_valid.shape)
print('Test Set:   ', X_test.shape, y_test.shape)

Rows: 60000,  Columns: 784
Rows: 10000,  Columns: 784
Training:    (50000, 784) (50000,)
Validation:  (10000, 784) (10000,)
Test Set:    (10000, 784) (10000,)


### Standardize each collection with respect to training
- Pixel-wise means
- Set-wise standard deviations (avoiding 0 division for constant pixels)

In [3]:
# Standardize data based on training data
mean_vals = np.mean(X_train, axis=0)
std_val = np.std(X_train)

X_train_centered = (X_train - mean_vals)/std_val
X_valid_centered = (X_valid - mean_vals)/std_val
X_test_centered = (X_test - mean_vals)/std_val

del X_data, y_data, X_train, X_valid, X_test

### Encoding and reshaping to 4D tensors

In [4]:
from sklearn.preprocessing import OneHotEncoder #, LabelEncoder
#label_encoder = LabelEncoder()
#y_encoded = label_encoder.fit_transform(y_train)
# Note: scikit-learn >= 1.2 renames `sparse` to `sparse_output` (`sparse` is removed in 1.4)
one_hot_encoder = OneHotEncoder(sparse=False, categories='auto')
Y_train = one_hot_encoder.fit_transform(y_train.reshape(-1,1))
Y_valid = one_hot_encoder.transform(y_valid.reshape(-1,1))
Y_test  = one_hot_encoder.transform(y_test.reshape(-1,1))

X_train_centered = X_train_centered.reshape((X_train_centered.shape[0], 28, 28, 1)) # Grayscale = 1
X_valid_centered = X_valid_centered.reshape((X_valid_centered.shape[0], 28, 28, 1))
X_test_centered  = X_test_centered.reshape((X_test_centered.shape[0], 28, 28, 1))
print(X_train_centered.shape)
print(Y_train.shape)

(50000, 28, 28, 1)
(50000, 10)


## Architectures
- Non-sequential network topology
    - Connections between non-neighbour layers (residual networks, skip-connections)
        - Possibly to all later layers (DenseNet)
    - Parallel filter groups, e.g. series of convolutions in parallel (e.g. Inception cells)
    - Extra input and/or output layers in the network
- Can be modular - blocks of convolutions used several times
- Keras Applications: pre-built, pre-trained
- Implementation requires `keras.Model` (Functional API)

```python
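# Sequential API: layers are stacked one after another
# model = Sequential()
# model.add(...)
#
# Functional API: each layer is called on the output(s) of earlier layers
# x2 = ...(x1)
# x3 = ...(x2)
# x4 = ...([x2, x3])   # a layer may combine several earlier outputs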
```

### Keras' Functional API

In [5]:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,)) # , dtype='int32', name='main_input')

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(data, labels)  # starts training

In [6]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 784)]             0         
                                                                 
 dense (Dense)               (None, 64)                50240     
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 10)                650       
                                                                 
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________


### Residual networks
Add activation from a layer to the pre-activation of a later layer.
<img src="./images/ResNet1.png" alt="Feature map" style="width: 400px;"/>

### Residual networks
- Imagine you are trying to model or approximate a function $H(x)$ with your neural network. 
- The network learns the difference, or "residual", between the desired output and the input. This residual is denoted $F(x)$, where:

$F(x) = H(x) - x$

So the network effectively learns $F(x)$ such that adding it to the input $x$ approximates $H(x)$.

### Residual networks
- How it works in ResNet:
    - Each "residual block" (a couple of layers in the network) aims to learn the residual function $F(x)$. The output of the block is then $F(x) + x$.
    - This addition is carried out by the skip (shortcut) connection, which carries $x$ past the block and adds it to the block's output.
- Simple analogy: think of teaching someone a new topic. Instead of explaining everything from scratch, you first identify what they already know (the input $x$) and then focus on teaching what they are missing or what is different (the residual $F(x)$). This often makes the learning process more efficient.
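
The relation "block output = $F(x) + x$" can be checked numerically. The sketch below uses plain NumPy instead of Keras; `residual_block`, the weight shapes and the omission of biases are assumptions made for illustration only. It shows that when the residual branch outputs zero, the block simply passes its (non-negative) input through.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Return relu(F(x) + x) with F(x) = relu(x @ W1) @ W2 (biases omitted)."""
    f = relu(x @ W1) @ W2      # residual branch F(x)
    return relu(f + x)         # the shortcut adds the input back

rng = np.random.default_rng(0)
x = relu(rng.normal(size=(4, 64)))          # non-negative input, e.g. after a ReLU
W1 = rng.normal(scale=0.1, size=(64, 32))
W2_zero = np.zeros((32, 64))                # residual branch switched off

out_identity = residual_block(x, W1, W2_zero)
print(np.allclose(out_identity, x))         # the block acts as the identity

W2 = rng.normal(scale=0.1, size=(32, 64))   # a non-zero residual branch
out = residual_block(x, W1, W2)
print(out.shape)
```

Because the identity is the starting point, the layers inside the block only have to model the correction on top of it.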

In [7]:
# Add a shortcut/residual to a network (ResNet)
from tensorflow.keras.layers import ReLU, Add

inputs = Input(shape=(784,))

x = Dense(64, activation='relu')(inputs)
shortcut = x           # Branch out

x = Dense(32, activation='relu')(x)
x = Dense(64)(x)       # No activation

x = Add()([shortcut, x]) # Add outputs (Make sure sizes match up)
x = ReLU()(x)
predictions = Dense(10, activation='softmax')(x)

Implementation in Applications:
<img src="./images/ResNet.png" alt="Feature map" style="width: 800px;"/>

## Batch normalization
- Rescale activations to maintain mean and standard deviation close to 0 and 1.
- Especially useful for networks with many layers.
- Not fully understood why it has a positive effect.
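
As a rough illustration of the rescaling step, here is a NumPy sketch of the training-time forward pass. The function name, the `eps` value and the toy data are assumptions; at inference time a real layer would use moving averages of the mean and variance instead of batch statistics.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # mean ~0, standard deviation ~1
    return gamma * x_hat + beta               # learnable scale and offset

rng = np.random.default_rng(1)
activations = rng.normal(loc=5.0, scale=3.0, size=(256, 8))
normalized = batch_norm_forward(activations, gamma=np.ones(8), beta=np.zeros(8))

print(normalized.mean(axis=0).round(3))
print(normalized.std(axis=0).round(3))
```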

## Batch normalization
<img src="./images/batch.png" alt="Feature map" style="width: 800px;"/>

**Example code of what a model definition could look like.**
```python
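from tensorflow.keras.layers import BatchNormalization, Dense, Flatten, Input
from tensorflow.keras.models import Model

# Assumes inception_cell() is defined (see the Inception networks section)
inputs = Input(shape=(28,28,1))

x = inception_cell(inputs)
x = BatchNormalization()(x)
x = inception_cell(x)
x = BatchNormalization()(x)
x = inception_cell(x)
x = BatchNormalization()(x)
x = inception_cell(x)
x = BatchNormalization()(x)

x = Flatten()(x)  # Collapse the feature maps to a vector before the classifier
predictions = Dense(10, activation='softmax')(x)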

model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
```

### License disclaimer (sections below)
- Figures/images shown are from a mix of articles, blogs, wikis, teaching materials and Python outputs.
- Many figures are reused indiscriminately in blogs
- https://towardsdatascience.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47
- https://arxiv.org/abs/1505.04597
- https://www.depends-on-the-definition.com/unet-keras-segmenting-images/
- https://towardsdatascience.com/up-sampling-with-transposed-convolution-9ae4f2df52d0
- ...

## Building on a pretrained network
- Reuse existing networks for new/tuned purposes
    - Called applications in Keras
- Main tasks of neural networks on images:
    - Generate meaningful features
    - Combine into objects
    - Distinguish between types of objects
- Strategy:
    - Strip final dense layer(s) (softmax)
    - Freeze network parameters
    - Train new dense layers for specific purpose

### Available networks in Keras (TensorFlow 2.6)
- Xception
- VGG16
- VGG19
- ResNet (50, 101, 152; v1, v2)
- Inception V3
- Inception ResNet V2
- MobileNet (v1, v2, v3Large, v3Small)
- DenseNet (121, 169, 201)
- NASNet (Large, Mobile)
- EfficientNet (B0, ..., B7)

### Let's test one before theorizing

In [8]:
from tensorflow.keras.applications import InceptionV3

conv_base = InceptionV3(weights='imagenet', # Pre-trained on ImageNet data
                  include_top=False,        # Remove classification layer
                  input_shape=(28*3, 28*3, 1*3))  # InceptionV3 requires at least 75x75 RGB
for layer in conv_base.layers:
    layer.trainable = False
conv_base.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_3 (InputLayer)           [(None, 84, 84, 3)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 41, 41, 32)   864         ['input_3[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 41, 41, 32)  96          ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 41, 41, 32)   0           ['batch_normalization[
 [... output truncated ...]
                                                                                                  
 batch_normalization_11 (BatchN  (None, 8, 8, 32)    96          ['conv2d_11[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_5 (Activation)      (None, 8, 8, 64)     0           ['batch_normalization_5[0][0]']  
                                                                                                  
 activation_7 (Activation)      (None, 8, 8, 64)     0           ['batch_normalization_7[0][0]']  
                                                                                                  
 activation_10 (Activation)     (None, 8, 8, 96)     0           ['batch_normalization_10[0][0]'] 
 [... output truncated ...]
 conv2d_23 (Conv2D)             (None, 8, 8, 96)     55296       ['activation_22[0][0]']          
                                                                                                  
 batch_normalization_20 (BatchN  (None, 8, 8, 48)    144         ['conv2d_20[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 batch_normalization_23 (BatchN  (None, 8, 8, 96)    288         ['conv2d_23[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_20 (Activation)     (None, 8, 8, 48)     0           ['batch_normalization_20[0][0]'] 
 [... output truncated ...]
 conv2d_34 (Conv2D)             (None, 3, 3, 128)    98304       ['mixed3[0][0]']                 
                                                                                                  
 batch_normalization_34 (BatchN  (None, 3, 3, 128)   384         ['conv2d_34[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_34 (Activation)     (None, 3, 3, 128)    0           ['batch_normalization_34[0][0]'] 
                                                                                                  
 conv2d_35 (Conv2D)             (None, 3, 3, 128)    114688      ['activation_34[0][0]']          
                                                                                                  
 batch_normalization_35 (BatchN  (None, 3, 3, 128)   384         ['conv2d_35[0][0]']              
 ormalizat
 [... output truncated ...]
 activation_44 (Activation)     (None, 3, 3, 160)    0           ['batch_normalization_44[0][0]'] 
                                                                                                  
 conv2d_45 (Conv2D)             (None, 3, 3, 160)    179200      ['activation_44[0][0]']          
                                                                                                  
 batch_normalization_45 (BatchN  (None, 3, 3, 160)   480         ['conv2d_45[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_45 (Activation)     (None, 3, 3, 160)    0           ['batch_normalization_45[0][0]'] 
                                                                                                  
 conv2d_41 (Conv2D)             (None, 3, 3, 160)    122880      ['mixed4[0][0]']                 
 [... output truncated ...]
                                                                                                  
 activation_55 (Activation)     (None, 3, 3, 160)    0           ['batch_normalization_55[0][0]'] 
                                                                                                  
 conv2d_51 (Conv2D)             (None, 3, 3, 160)    122880      ['mixed5[0][0]']                 
                                                                                                  
 conv2d_56 (Conv2D)             (None, 3, 3, 160)    179200      ['activation_55[0][0]']          
                                                                                                  
 batch_normalization_51 (BatchN  (None, 3, 3, 160)   480         ['conv2d_51[0][0]']              
 ormalization)                                                                                    
 [... output truncated ...]
 conv2d_66 (Conv2D)             (None, 3, 3, 192)    258048      ['activation_65[0][0]']          
                                                                                                  
 batch_normalization_61 (BatchN  (None, 3, 3, 192)   576         ['conv2d_61[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 batch_normalization_66 (BatchN  (None, 3, 3, 192)   576         ['conv2d_66[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_61 (Activation)     (None, 3, 3, 192)    0           ['batch_normalization_61[0][0]'] 
 [... output truncated ...]
 batch_normalization_74 (BatchN  (None, 3, 3, 192)   576         ['conv2d_74[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_70 (Activation)     (None, 3, 3, 192)    0           ['batch_normalization_70[0][0]'] 
                                                                                                  
 activation_74 (Activation)     (None, 3, 3, 192)    0           ['batch_normalization_74[0][0]'] 
                                                                                                  
 conv2d_71 (Conv2D)             (None, 1, 1, 320)    552960      ['activation_70[0][0]']          
                                                                                                  
 conv2d_75
 [... output truncated ...]
 activation_82 (Activation)     (None, 1, 1, 384)    0           ['batch_normalization_82[0][0]'] 
                                                                                                  
 activation_83 (Activation)     (None, 1, 1, 384)    0           ['batch_normalization_83[0][0]'] 
                                                                                                  
 batch_normalization_84 (BatchN  (None, 1, 1, 192)   576         ['conv2d_84[0][0]']              
 ormalization)                                                                                    
                                                                                                  
 activation_76 (Activation)     (None, 1, 1, 320)    0           ['batch_normalization_76[0][0]'] 
                                                                                                  
 mixed9_0 (Concatenate)         (None, 1, 1, 768)    0           ['activation_78[0][0]',          
 [... output truncated ...]
 activation_85 (Activation)     (None, 1, 1, 320)    0           ['batch_normalization_85[0][0]'] 
                                                                                                  
 mixed9_1 (Concatenate)         (None, 1, 1, 768)    0           ['activation_87[0][0]',          
                                                                  'activation_88[0][0]']          
                                                                                                  
 concatenate_1 (Concatenate)    (None, 1, 1, 768)    0           ['activation_91[0][0]',          
                                                                  'activation_92[0][0]']          
                                                                                                  
 activation_93 (Activation)     (None, 1, 1, 192)    0           ['batch_normalization_93[0][0]'] 
 [... output truncated ...]

### Expand network

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras import Model

base_out = conv_base.output
base_out = Flatten()(base_out)
base_out = Dense(1024, activation='relu')(base_out)
base_out = Dropout(.5)(base_out)
base_out = Dense(10, activation='softmax')(base_out)
InceptionV3_model = Model(conv_base.input, base_out)

InceptionV3_model.compile(optimizer='adam',
          loss='categorical_crossentropy', 
          metrics=['accuracy'])

### Update data generator with new size 


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(width_shift_range=0.2,
                            height_shift_range=0.2,
                            zoom_range=0.2,
                            rotation_range=30,
                            vertical_flip=False,
                            horizontal_flip=False)
datagen.fit(X_train_centered)
train_generator = datagen.flow(np.array(X_train_centered.repeat(3,1).repeat(3,2).repeat(3,3)), np.array(Y_train), # <-- Update
                               batch_size=64)

### Train InceptionV3 on MNIST
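Approx. 7 min/epoch on the teacher's laptop.  
InceptionV3 needs RGB input of at least 75x75 pixels, i.e. larger than MNIST's 28x28x1 images. The `.repeat(3, axis)` calls therefore triple the height, width and channel dimensions (28x28x1 -> 84x84x3).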
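
The effect of the chained `.repeat` calls can be verified on a small dummy batch. This is only a shape check on made-up data, not part of the training pipeline.

```python
import numpy as np

# Two dummy grayscale "images" in the same layout as X_train_centered
batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28, 1)

# Repeat along height (axis 1), width (axis 2) and channels (axis 3)
upscaled = batch.repeat(3, 1).repeat(3, 2).repeat(3, 3)
print(upscaled.shape)  # (2, 84, 84, 3)

# Each original pixel becomes a 3x3 block that is copied to all three channels
top_left_block = upscaled[0, :3, :3, :]
print(np.all(top_left_block == batch[0, 0, 0, 0]))
```

This is nearest-neighbour upscaling: no new information is added, the images are only made large enough for the pre-trained network.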

In [None]:
historyFlowInceptionV3 = InceptionV3_model.fit(
    train_generator,
    epochs=10, steps_per_epoch=len(X_train_centered) // 64,  # Whole batches per epoch
    validation_data=(np.array(X_valid_centered.repeat(3,1).repeat(3,2).repeat(3,3)), np.array(Y_valid)), 
    validation_steps=len(X_valid_centered) // 64)

### Inception networks
- Inception cells / modules
    - Extract features on different scales, concatenate output
    - Use padding="same" to preserve sizes for concatenation
- Usually combined with bottlenecks to reduce channel depth (number of parameters)
  - Convolutional layers sum over input channels after convolution
<img src="./images/Inception_cell.png" alt="Inception cell" style="width: 600px;"/>

### Inception networks - 1x1 Convolution Trick:

One might wonder: won't using multiple filters of varying sizes dramatically increase computational costs? The answer is "yes" – but here's where the Inception architecture introduces a clever trick. Before applying larger filters like 5x5, the network first reduces the depth (number of channels) of the input using 1x1 convolutions. This process, often called "bottleneck layers", compresses the information without losing too much of it, and thus reduces computational costs.
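
A quick parameter count makes the saving concrete. The channel numbers below (192 input channels, a bottleneck of 16, 32 output filters) are hypothetical values chosen for illustration, and `conv2d_params` is a helper defined here, not a Keras function.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a standard 2D convolution."""
    return kernel_size * kernel_size * in_channels * filters + filters

in_channels, bottleneck, filters = 192, 16, 32   # hypothetical sizes

# 5x5 convolution applied directly to all input channels
direct = conv2d_params(5, in_channels, filters)

# 1x1 bottleneck first, then the 5x5 convolution on the reduced depth
reduced = conv2d_params(1, in_channels, bottleneck) + conv2d_params(5, bottleneck, filters)

print(direct, reduced)  # 153632 15920
```

The saving grows with the depth of the input, which is why bottlenecks matter most in the deeper parts of a network.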

In [None]:
from tensorflow.keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Concatenate

def inception_cell(x):
    # 1x1 convolution
    x1 = Conv2D(filters=32, kernel_size=(1,1), strides=(1,1), padding="same", activation="relu")(x)

    # 1x1 + 3x3 convolution
    x3 = Conv2D(filters=32, kernel_size=(1,1), strides=(1,1), padding="same", activation="relu")(x)
    x3 = Conv2D(filters=32, kernel_size=(3,3), strides=(1,1), padding="same", activation="relu")(x3)

    # 1x1 + 5x5 convolution
    x5 = Conv2D(filters=32, kernel_size=(1,1), strides=(1,1), padding="same", activation="relu")(x)
    x5 = Conv2D(filters=32, kernel_size=(5,5), strides=(1,1), padding="same", activation="relu")(x5)

    # MaxPool + 1x1 convolution
    xp = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding="same")(x)
    xp = Conv2D(filters=32, kernel_size=(1,1), strides=(1,1), padding="same", activation="relu")(xp)

    # Concatenate the four branches along the channel axis
    x = Concatenate()([x1, x3, x5, xp])
    return x

inputs = Input(shape=(28,28,1))

x = inception_cell(inputs)
x = inception_cell(x)

x = Flatten()(x)  # Flatten the feature maps before the dense classifier
predictions = Dense(10, activation='softmax')(x)

### Inception V1
<img src="./images/InceptionV1.png" alt="Inception V1" style="width: 900px;"/>
~5 million parameters

### Kernel stacking
- Larger spatial filters are more expressive and able to extract features at a larger scale (larger receptive field).
- Stacking two 3x3 filters covers the same space as one 5x5 filter.
    - Per filter with c input channels: 5x5xc = 25c weights vs 2 x 3x3xc = 18c weights
- Filter stacking increases focus in the centre of the larger filters.
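
The two claims in the list (same coverage, fewer weights) can be checked with a few lines of arithmetic. Both helper functions are illustrative and assume stride 1 and the same number of channels `c` in every layer.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1          # each layer adds (k - 1) pixels of context
    return field

def weights_per_filter(kernel_sizes, channels):
    """Weights of one filter per layer, summed over the stack (no biases)."""
    return sum(k * k * channels for k in kernel_sizes)

c = 32
print(stacked_receptive_field([3, 3]), stacked_receptive_field([5]))  # 5 5
print(weights_per_filter([5], c), weights_per_filter([3, 3], c))      # 800 576
```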
<img src="./images/Conv_5x5_3x3.png" alt="Inception cell" style="width: 500px;"/>

### Kernel stacking
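- With linear (no) activation between the stacked layers, the stack most accurately represents a single larger filter.
- In practice, performance is best when a non-linear activation (e.g. ReLU) is also applied between the stacked layers.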

In [15]:
def Conv2D_stack_n_1(x, n):
    # Stack an n x 1 convolution and a 1 x n convolution to cover an n x n area
    xn = Conv2D(filters=32, kernel_size=(n,1), strides=(1,1), padding="same", activation="relu")(x)
    xn = Conv2D(filters=32, kernel_size=(1,n), strides=(1,1), padding="same", activation="relu")(xn)
    return xn
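
With linear activation and a single channel, an n x 1 kernel followed by a 1 x n kernel is equivalent to one n x n kernel given by their outer product. The NumPy sketch below checks this; `correlate_valid` is a small helper written for this example (stride 1, no padding), not a library function.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Stride-1 cross-correlation without padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
column = rng.normal(size=(3, 1))   # n x 1 kernel
row = rng.normal(size=(1, 3))      # 1 x n kernel

stacked = correlate_valid(correlate_valid(image, column), row)
full = correlate_valid(image, column @ row)   # equivalent rank-1 3x3 kernel

print(np.allclose(stacked, full))
print(column.size + row.size, (column @ row).size)  # 6 9
```

The stacked pair uses 2n weights instead of n x n, at the price of only representing rank-1 kernels; the ReLU between the layers in `Conv2D_stack_n_1` breaks the exact equivalence but adds non-linearity.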

### Atrous / dilated convolutions
- Increased receptive field by spreading kernel
- Example with 3x3 kernel over 5x5 area used on 7x7 image (no padding, stride 1)
- In Keras, when using a 3×3 filter with a dilation_rate of 2×2, the kernel effectively spans a 5×5 receptive field on the input. However, the kernel retains the original 3×3 weights, so the number of parameters doesn't change, but the spatial context it captures expands.
<img src="./images/Dilated_convolution.png" alt="Dilated" style="width: 800px;"/>
(From "A guide to convolution arithmetic for deep learning", Dumoulin et al. 2018)
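
The 3x3-over-5x5 example follows from the usual size arithmetic. The two helper functions below are illustrative; in Keras the corresponding layer argument is `dilation_rate`.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Span of a dilated kernel measured in input pixels."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

def valid_output_size(input_size, kernel_size, dilation_rate=1, stride=1):
    """Output size along one axis without padding."""
    k_eff = effective_kernel_size(kernel_size, dilation_rate)
    return (input_size - k_eff) // stride + 1

print(effective_kernel_size(3, 2))   # 5: a 3x3 kernel spans a 5x5 area
print(valid_output_size(7, 3, 2))    # 3: 7x7 image -> 3x3 output
print(valid_output_size(7, 3, 1))    # 5: same kernel without dilation
```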


### Max pooling revisited
- In addition to reducing dimensions:
  - Increases receptive field by size of filter
  - Multiple max pools multiply this effect
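
How pooling multiplies the receptive field can be seen by tracking how far apart neighbouring outputs are in input pixels (the "jump"). The function below is a sketch of this standard bookkeeping; the two layer stacks are made-up examples.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, applied in order."""
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump   # growth measured in input pixels
        jump *= stride                      # strided layers widen all later steps
    return field

convs_only = [(3, 1), (3, 1), (3, 1)]
with_pooling = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]   # 2x2 max pools, stride 2

print(receptive_field(convs_only))    # 7
print(receptive_field(with_pooling))  # 18
```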
<img src="./images/15_06.png" alt="Sub-sampling" style="width: 700px;"/>

<img src="./images/Architectures.jpg" alt="Architectures" style="width: 900px;"/>  
[Comparison with EfficientNet](https://github.com/qubvel/efficientnet#about-efficientnet-models) - notice placement of Xception

### Early stopping
- Regardless of optimisation strategy, loss may plateau
- Stop iterations based on:
    - Minimum loss change over epochs
    - Reached certain threshold
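
Both stopping criteria can be expressed in a few lines. The function below is a plain-Python sketch of the logic with hypothetical loss values; Keras ships an `EarlyStopping` callback with `monitor`, `min_delta` and `patience` arguments for the same purpose.

```python
def early_stopping_epoch(losses, min_delta=1e-3, patience=2, threshold=None):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if threshold is not None and loss <= threshold:
            return epoch                 # reached the target loss
        if best - loss > min_delta:      # meaningful improvement
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:         # no improvement for `patience` epochs
                return epoch
    return len(losses) - 1               # never triggered: ran all epochs

val_losses = [0.90, 0.60, 0.45, 0.4498, 0.4497, 0.44, 0.43]
print(early_stopping_epoch(val_losses))                        # 4
print(early_stopping_epoch([0.9, 0.5, 0.2, 0.1], threshold=0.25))  # 2
```

Note that in the first example the loss would have improved again after the stop; a larger `patience` trades extra epochs for robustness against such plateaus.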
  
### Continued optimisation
- Save weights in a model
- Continue fitting
    - Reset decay/momentum
    - Learning rate scheduling, e.g. cosine based
    - Different optimiser
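
A cosine-based schedule, for instance, lowers the learning rate smoothly from a maximum to a minimum value. The sketch below assumes hypothetical values for `lr_max`, `lr_min` and the number of steps; a function like this could be wrapped for Keras' `LearningRateScheduler` callback.

```python
import math

def cosine_learning_rate(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing from lr_max (step 0) to lr_min (step total_steps)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

for step in (0, 25, 50, 75, 100):
    print(step, cosine_learning_rate(step, total_steps=100))
```

Restarting the schedule from `lr_max` when fitting is continued gives the "warm restart" behaviour often used together with saved weights.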

## Initialization of weights
- Can be important for convergence.
- Variations of truncated normal distributions are often used (values beyond +/-2 stddev are redrawn).
- Dense and Conv2D use the Glorot uniform initializer by default:
    - Uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
    - The Glorot normal variant instead uses a truncated normal distribution with stddev = sqrt(2 / (fan_in + fan_out)).
- Biases are usually initialized as 0s.
- Feature scales typically need to match weight initializations (and possible weight restrictions)
    - uint8 usually rescaled by 1/255.0
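
To make the scales concrete, the NumPy sketch below draws weights for a 784 -> 64 dense layer with both Glorot variants. The function names are made up for this example, and library implementations may differ in detail (e.g. in how the truncation is compensated).

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_truncated_normal(fan_in, fan_out, rng):
    """Normal with stddev = sqrt(2 / (fan_in + fan_out)), redrawn beyond 2 stddev."""
    stddev = np.sqrt(2.0 / (fan_in + fan_out))
    w = rng.normal(0.0, stddev, size=(fan_in, fan_out))
    outside = np.abs(w) > 2 * stddev
    while outside.any():                      # redraw the out-of-range values
        w[outside] = rng.normal(0.0, stddev, size=outside.sum())
        outside = np.abs(w) > 2 * stddev
    return w

rng = np.random.default_rng(0)
w_uniform = glorot_uniform(784, 64, rng)
w_normal = glorot_truncated_normal(784, 64, rng)

print(np.abs(w_uniform).max() <= np.sqrt(6.0 / 848))
print(np.abs(w_normal).max() <= 2 * np.sqrt(2.0 / 848))
```

Both draws stay within a few hundredths of zero, which is why inputs on a 0-255 scale are usually rescaled before training.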

## More regularization
- We have regularized the network through dropout.
- L1 and/or L2 norm regularization can be added to kernels, biases and activations separately for each layer.

### Gradient Clipping
- Gradient clipping is a technique that tackles exploding gradients. The idea of gradient clipping is: If the gradient gets too large (> threshold), we rescale it to keep it small. 
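
One common variant rescales the whole gradient vector when its norm exceeds the threshold. The NumPy sketch below illustrates that idea with a made-up gradient; Keras optimizers expose comparable behaviour through their `clipnorm` and `clipvalue` arguments.

```python
import numpy as np

def clip_by_norm(gradient, threshold):
    """Rescale the gradient so that its L2 norm does not exceed threshold."""
    norm = np.linalg.norm(gradient)
    if norm > threshold:
        return gradient * (threshold / norm)   # same direction, shorter step
    return gradient

g = np.array([3.0, 4.0])            # L2 norm 5
clipped = clip_by_norm(g, 1.0)      # rescaled to norm 1
unchanged = clip_by_norm(g, 10.0)   # below the threshold, left as-is

print(clipped, np.linalg.norm(clipped))
print(unchanged)
```

Clipping by norm keeps the direction of the update, whereas clipping each value separately can change it.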

## Back propagation in CNN
- Sliding the filter: when we perform a convolution, we slide the filter across the input feature map. For a given position of the filter, we get a single value in the output feature map.
- Backpropagation through the convolution: when computing the gradients, consider each position of the filter separately. For each position, you compute how the output would change with a small change to the filter value at that position.
- Gradient accumulation: because the filter slides across the input and is involved in the computation of many output values, the gradient for a specific weight in the filter is accumulated over all the positions where that weight was used.

### Forward pass
<img src="./images/Conv_forward.gif" alt="Forward convolution" style="width: 800px;"/>
<img src="./images/Conv_forward.png" alt="Forward convolution" style="width: 350px;"/>

- Illustrations from: [Back Propagation in Convolutional Neural Networks — Intuition and Code](https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199)

### Filter gradient
- Sums up local contributions (patch * output gradient)
- Each weight in the filter contributes to every pixel in the output map, so a change in one weight affects all the output pixels.
- All these changes add up in the final loss, so the derivatives can be calculated as follows.
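
The accumulation can also be written out in NumPy for a single-channel image. This is a sketch under simplifying assumptions (stride 1, no padding, a toy loss equal to the sum of the outputs); the function names are made up for this example. A finite-difference check confirms the analytic gradient for one weight.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Forward pass: stride-1 cross-correlation without padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def filter_gradient(image, kernel_shape, grad_output):
    """dL/dkernel: sum over positions of (input patch * output gradient)."""
    kh, kw = kernel_shape
    grad_kernel = np.zeros(kernel_shape)
    for i in range(grad_output.shape[0]):
        for j in range(grad_output.shape[1]):
            grad_kernel += image[i:i + kh, j:j + kw] * grad_output[i, j]
    return grad_kernel

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5))
kernel = rng.normal(size=(3, 3))

# Toy loss L = sum(output), so dL/doutput is all ones
grad_output = np.ones((3, 3))
analytic = filter_gradient(image, kernel.shape, grad_output)

# Finite-difference check for one weight
eps = 1e-6
bumped = kernel.copy()
bumped[1, 2] += eps
numeric = (conv2d_valid(image, bumped).sum() - conv2d_valid(image, kernel).sum()) / eps

print(np.isclose(analytic[1, 2], numeric, atol=1e-5))
```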

<img src="./images/Conv_backward_filter.gif" alt="Backward convolution (filter)" style="width: 800px;"/>
<img src="./images/Conv_backward_filter.png" alt="Backward convolution (filter)" style="width: 350px;"/>

### Automatic tuning
- [Keras Tuner](https://keras-team.github.io/keras-tuner)  
- Random search over a grid/interval of hyperparameters
    - E.g. number of units, learning rate, number of layers, etc.
- Hyperband optimisation
- Bayesian optimization
- Use with care, can run for days if run too wide

## Test time augmentation
- Augmentation of images during test data prediction
- May seem counterintuitive
    - Why give the model something difficult?
- Rationale: Give the model several chances to get it correct, then average
- Simplest way: Re-use ImageDataGenerator from training data on test data
    - No re-fitting needed  
[Test Time Augmentation](https://towardsdatascience.com/test-time-augmentation-tta-and-how-to-perform-it-with-keras-4ac19b67fb4d)
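
The averaging step can be sketched independently of Keras. In the example below, `dummy_predict` is a hypothetical stand-in for `model.predict` and `random_shift` is a hand-written stand-in for the augmentation an `ImageDataGenerator` would apply; only the structure (augment, predict, average) is the point.

```python
import numpy as np

def predict_with_tta(predict_fn, images, augment_fn, n_rounds=5, seed=0):
    """Average class probabilities over several augmented copies of the images."""
    rng = np.random.default_rng(seed)
    probs = [predict_fn(augment_fn(images, rng)) for _ in range(n_rounds)]
    return np.mean(probs, axis=0)

def random_shift(images, rng, max_shift=2):
    """Shift the whole batch by a random number of pixels (with wrap-around)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(images, shift=(int(dy), int(dx)), axis=(1, 2))

def dummy_predict(images):
    """Hypothetical model: softmax over the pixel sums of the left/right halves."""
    left = images[:, :, :14].sum(axis=(1, 2, 3))
    right = images[:, :, 14:].sum(axis=(1, 2, 3))
    scores = np.stack([left, right], axis=1)
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

images = np.random.default_rng(1).random((4, 28, 28, 1))
averaged = predict_with_tta(dummy_predict, images, random_shift)

print(averaged.shape)        # (4, 2)
print(averaged.sum(axis=1))  # each row is still a probability distribution
predicted_class = averaged.argmax(axis=1)
```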

## Fashion MNIST example
https://colab.research.google.com/drive/1hLWoP9P9alkso2ih5cpUi-EGX4I6teFJ?usp=sharing

[Dog Breed Identification on Kaggle](https://www.kaggle.com/khliland/dog-breed-identification)