<img src="./images/cads-logo.png" style="height: 100px;" align=left> 
<img src="./images/keras-logo.png" style="height: 85px;" align=right>

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

import os
# Temporary workaround for the duplicate OpenMP runtime error that MKL builds can trigger.
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

## CNN

Convolutional Neural Networks are very similar to ordinary Neural Networks. They are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. They still have a loss function on the last (fully-connected) layer, and all the tips and tricks we developed for training regular Neural Networks still apply.

So what changes? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These properties make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

Recall: Regular Neural Nets. As we saw in the previous chapter, Neural Networks receive an input (a single vector), and transform it through a series of hidden layers. Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons in a single layer function completely independently and do not share any connections. The last fully-connected layer is called the “output layer” and in classification settings it represents the class scores.

Regular Neural Nets don’t scale well to full images. In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32 × 32 × 3 = 3,072 weights. This number still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200 × 200 × 3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
<img src='images/WeightSharing.png'>
<img src='images/no_padding_strides.gif'>
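
The arithmetic above can be checked in a few lines of plain Python. This is only a sketch: the helper names `dense_weights` and `conv_params`, and the choice of 32 filters of size 3x3 for the convolutional layer, are illustrative assumptions rather than part of the original text.

```python
def dense_weights(width, height, channels):
    """Weights of ONE fully-connected neuron that sees the whole image."""
    return width * height * channels


def conv_params(kernel, in_channels, filters):
    """Parameters of a conv layer: each filter shares one small set of
    weights across all image positions, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


cifar_neuron = dense_weights(32, 32, 3)      # 3072
large_neuron = dense_weights(200, 200, 3)    # 120000
# Assumed example layer: 32 filters of size 3x3 over a 3-channel image.
conv_layer = conv_params(kernel=3, in_channels=3, filters=32)  # 896

print('one dense neuron, 32x32x3 image:  ', cifar_neuron)
print('one dense neuron, 200x200x3 image:', large_neuron)
print('whole 3x3 conv layer, 32 filters: ', conv_layer)
```

Note that the convolutional layer's parameter count does not depend on the image size at all, which is the practical payoff of weight sharing.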

## Maxpooling

It is common to periodically insert a pooling layer between successive convolutional layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and hence also helps control overfitting. The pooling layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation in this case takes the max over 4 numbers (a 2x2 region in some depth slice). The depth dimension remains unchanged. More generally, a pooling layer with filter size F and stride S maps an input of width W to an output of width (W - F)/S + 1, and likewise for the height.
<img src='images/Maxpool.png'>
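
As a rough illustration of the operation, here is a minimal pure-Python sketch of max pooling on a single depth slice. The function name `max_pool_2d` and the 4x4 example values are made up for this sketch; in the models below Keras's `MaxPooling2D` layer does the actual work.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max-pool one depth slice given as a list of rows."""
    height, width = len(feature_map), len(feature_map[0])
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the size x size window whose top-left corner is
            # at (i * stride, j * stride) and keep only its maximum.
            window = [feature_map[i * stride + di][j * stride + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


depth_slice = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool_2d(depth_slice)
print(pooled)  # [[6, 8], [3, 4]]
```

The 16 input activations are reduced to 4, which is the 75% reduction mentioned above.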

## Adding dropout

Dropout is one of the most effective and most commonly used regularization techniques for neural networks. Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. Let’s say a given layer would normally return a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training. After applying dropout, this vector will have a few zero entries distributed at random: for example, [0, 0.5, 1.3, 0, 1.1]. The dropout rate is the fraction of the features that are zeroed out; it’s usually set between 0.2 and 0.5. At test time, no units are dropped out; instead, the layer’s output values are scaled down by a factor equal to the keep probability (1 minus the dropout rate), to balance for the fact that more units are active than at training time. In practice, Keras applies the equivalent "inverted" form: it scales the surviving outputs up by 1 / (1 - rate) during training and leaves the layer unchanged at test time. The core idea is that introducing noise in the output values of a layer can break up happenstance patterns that aren’t significant.
<img src='images/dropout.png'>
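
The following is a minimal sketch of dropout on a plain Python list, using the "inverted" variant that rescales the surviving activations during training, so that nothing has to change at test time. The function name `dropout`, the fixed seed, and the rate of 0.4 are assumptions made for this illustration only.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training:
        return list(values)  # test time: nothing dropped, nothing rescaled
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


activations = [0.2, 0.5, 1.3, 0.8, 1.1]
rng = random.Random(0)  # seeded only to make the demo repeatable

train_out = dropout(activations, rate=0.4, training=True, rng=rng)
test_out = dropout(activations, rate=0.4, training=False, rng=rng)

print('training: ', train_out)
print('inference:', test_out)
```

Because the survivors are scaled up during training, the expected value of each output is the same in both modes.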

In [None]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Dropout(rate=0.2))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
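
To see how the spatial size shrinks through this stack, the sketch below traces the shapes and parameter counts by hand, assuming the Keras defaults used above (no padding and stride 1 for `Conv2D`, stride equal to the window size for `MaxPooling2D`). The helper names are made up for this illustration; `model.summary()` reports the same information directly.

```python
def conv_out(size, kernel=3):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output side length of max pooling with stride equal to the window."""
    return size // window


def conv_params(kernel, in_channels, filters):
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    return (inputs + 1) * units


size = 28
size = conv_out(size)   # Conv2D(32)  -> 26
size = pool_out(size)   # MaxPooling  -> 13
size = conv_out(size)   # Conv2D(64)  -> 11 (Dropout keeps the shape)
size = pool_out(size)   # MaxPooling  -> 5
size = conv_out(size)   # Conv2D(64)  -> 3
flat = size * size * 64  # Flatten    -> 576

total_params = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
                + conv_params(3, 64, 64)
                + dense_params(flat, 64) + dense_params(64, 10))

print('final feature map:', (size, size, 64))
print('flattened length: ', flat)
print('total parameters: ', total_params)
```

Dropout and pooling layers contribute no parameters, so only the three convolutions and the two dense layers appear in the total.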

**Exercise: Train the network using the MNIST dataset**

In [None]:
#MC



from keras.datasets import mnist
from keras.utils import to_categorical
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images[:2000]
train_labels = train_labels[:2000]

train_images = train_images.reshape((-1, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((-1, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, batch_size=64)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

**Exercise: Now design your own network and train it using Fashion MNIST**

In [None]:
from keras.datasets import fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images[:2000]
train_labels = train_labels[:2000]
print(f'train_images shape {train_images.shape}')
print(f'test_images shape {test_images.shape}')
print(f'train_labels shape {train_labels.shape}')

In [None]:
image = train_images[5]
plt.imshow(image, cmap=plt.cm.binary)

In [None]:
#MC





model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Dropout(rate=0.2))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

train_images = train_images.reshape((-1, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((-1, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, batch_size=64)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

## Data Augmentation

Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, your model will never see the exact same picture twice. This helps expose the model to more aspects of the data and generalize better.
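
To make the idea concrete, here is a hand-written sketch of two simple random transformations (a horizontal flip and a small horizontal shift) applied to a tiny image stored as a list of rows. It only illustrates the principle and is not how Keras implements augmentation; the function name `random_augment` and its parameters are assumptions made for this example.

```python
import random


def random_augment(image, rng, max_shift=1, flip_prob=0.5):
    """Return a randomly flipped and horizontally shifted copy of a 2D image."""
    rows = [list(row) for row in image]
    if rng.random() < flip_prob:
        rows = [row[::-1] for row in rows]      # mirror left-right
    shift = rng.randint(-max_shift, max_shift)  # pixels to move right
    width = len(rows[0])
    shifted = []
    for row in rows:
        new_row = [0] * width                   # pad vacated pixels with 0
        for x, value in enumerate(row):
            if 0 <= x + shift < width:
                new_row[x + shift] = value
        shifted.append(new_row)
    return shifted


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
rng = random.Random(42)  # seeded only to make the demo repeatable

# Each call yields a (possibly) different variant of the same sample.
variants = [random_augment(tiny_image, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Whether a given transformation is appropriate depends on the data: a flip that keeps a shirt recognizable can turn one digit into a different or meaningless shape.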

In [None]:
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    # Uncomment any of these options to enable further transformations.
    # width_shift_range=0.2,
    # height_shift_range=0.2,
    # horizontal_flip=True,
    # zoom_range=0.2,
    rotation_range=20,
)

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Dropout(rate=0.2))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images[:2000]
train_labels = train_labels[:2000]

train_images = train_images.reshape((-1, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((-1, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
batch_size = 64
test_labels = to_categorical(test_labels)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
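# steps_per_epoch must be an integer number of batches.
# Note: fit_generator is deprecated in newer Keras releases, where model.fit accepts the generator directly.
model.fit_generator(datagen.flow(train_images, train_labels, batch_size=batch_size),
                    steps_per_epoch=len(train_images) // batch_size, epochs=20)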
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

**Exercise: Now apply data augmentation for Fashion MNIST**

In [None]:
#MC



from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    # Uncomment any of these options to enable further transformations.
    # width_shift_range=0.2,
    # height_shift_range=0.2,
    horizontal_flip=True,
    # zoom_range=0.2,
    # rotation_range=20,
)


model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Dropout(rate=0.2))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images[:2000]
train_labels = train_labels[:2000]

train_images = train_images.reshape((-1, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((-1, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
batch_size = 64
test_labels = to_categorical(test_labels)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
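# steps_per_epoch must be an integer number of batches.
# Note: fit_generator is deprecated in newer Keras releases, where model.fit accepts the generator directly.
model.fit_generator(datagen.flow(train_images, train_labels, batch_size=batch_size),
                    steps_per_epoch=len(train_images) // batch_size, epochs=20)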
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

## Transfer Learning

A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world, and hence its features can prove useful for many different computer vision problems, even though these new problems may involve completely different classes than those of the original task. For instance, you might train a network on ImageNet (where classes are mostly animals and everyday objects) and then repurpose this trained network for something as remote as identifying furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older, shallow-learning approaches, and it makes deep learning very effective for small-data problems.

In [None]:
batch_size = 64
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
    'datasets/fruits/training',
    target_size=(100, 100),
    batch_size=batch_size,
    class_mode='categorical',
    classes = ['Banana', 'Apricot']
)
test_generator = train_datagen.flow_from_directory(
    'datasets/fruits/testing',
    target_size=(100, 100),
    batch_size=batch_size,
    class_mode='categorical',
    classes = ['Banana', 'Apricot']
)

In [None]:
class_indices = train_generator.class_indices
labels = [0]*len(class_indices)
for label, ind in class_indices.items():
    labels[ind] = label
    
f = plt.figure(figsize=(12,6))
for batch_x, batch_y in train_generator:
    for i in range(10):
        x = batch_x[i]
        y = batch_y[i].argmax()  # one-hot label -> class index
        label = labels[y]
        plt.subplot(2, 5, i + 1, title=label)
        plt.imshow(x)
    break  # only plot images from the first batch

In [None]:
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(100, 100, 3))
conv_base.summary()

In [None]:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(len(class_indices), activation='softmax'))
conv_base.trainable = False
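
To get a feel for what freezing the convolutional base means, the sketch below counts the parameters by hand. It assumes the standard VGG16 configuration (five blocks of 3x3 "same"-padded convolutions, each block ending in 2x2 max pooling) and the two fruit classes used above; the helper names are made up for this illustration, and `model.summary()` gives the authoritative numbers.

```python
# (output channels, number of 3x3 conv layers) for the five VGG16 blocks
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def vgg16_base_params(in_channels=3):
    """Weights and biases of all conv layers in the VGG16 base."""
    total = 0
    for out_channels, n_convs in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += (3 * 3 * in_channels + 1) * out_channels
            in_channels = out_channels
    return total


def vgg16_base_output_side(size):
    """'same'-padded convs keep the size; each block's pooling halves it."""
    for _ in VGG16_BLOCKS:
        size //= 2
    return size


frozen = vgg16_base_params()             # held fixed while conv_base.trainable is False
side = vgg16_base_output_side(100)       # 100 -> 50 -> 25 -> 12 -> 6 -> 3
flat = side * side * 512                 # length after Flatten
n_classes = 2                            # Banana and Apricot
trainable = (flat + 1) * 256 + (256 + 1) * n_classes  # the two Dense layers

print('frozen parameters:   ', frozen)
print('trainable parameters:', trainable)
```

With the base frozen, only the two new dense layers are updated during training, which is a small fraction of the total parameter count.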

In [None]:
from keras import optimizers
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(1e-5),
              metrics=['acc'])

history = model.fit_generator(
      train_generator,
      steps_per_epoch=len(train_generator),  # number of batches in one pass over the training set
      epochs=1,
      )
loss, accuracy = model.evaluate_generator(test_generator, steps=len(test_generator))
print('Accuracy', accuracy)

In [None]:
class_indices = test_generator.class_indices
labels = [0]*len(class_indices)
for label, ind in class_indices.items():
    labels[ind] = label
    
f = plt.figure(figsize=(12,6))
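for batch_x, batch_y in test_generator:
    predictions = model.predict(batch_x)
    for i in range(10):
        x = batch_x[i]
        y = predictions[i].argmax()  # index of the most probable class
        label = labels[y]
        plt.subplot(2, 5, i + 1, title=label)  # title shows the predicted class
        plt.imshow(x)
    break  # only plot images from the first batch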

**Exercise: Try training the model with different fruits, a different network, a different learning rate, or other parameters**

**Try setting `conv_base.trainable` to `True`**