As in the previous week we start by mounting our drive folder:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

And we import tensorflow and keras to our workspace:

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras

# Convolutional Neural Networks
We will now introduce convolutional neural networks, also known as CNNs, a type of deep-learning model almost universally used in computer vision applications. We will see how to apply convnets to image-classification problems.

Standard CNNs are basically a stack of **Convolutional** layers (`keras.layers.Conv2D`) followed by **Pooling** layers (`keras.layers.MaxPooling2D`) with interleaved non-linear activation functions. Let us understand this better with some examples:

## Convolutional filters and pooling over batches of images
In TensorFlow, each input image is typically represented as a 3D tensor of shape `[height, width, channels]`. A mini-batch is represented as a 4D tensor of shape `[mini-batch size, height, width, channels]`. The weights of a convolutional layer are represented as a 4D tensor of shape `[f_h, f_w, f_m, f_n]`: filter height, filter width, number of input channels, and number of filters (output feature maps).
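
To make these shapes concrete, here is a small sketch in plain Python. The helper `conv_output_size` is a hypothetical name introduced for illustration, and the batch shape is just an example. It reads `f_h, f_w` as the filter height and width, `f_m` as the number of input channels, and `f_n` as the number of filters, and applies TensorFlow's output-size rules for `"SAME"` and `"VALID"` padding.

```python
import math

def conv_output_size(size, kernel, stride=1, padding="SAME"):
    """Output size of a convolution along one spatial axis (TensorFlow convention)."""
    if padding == "SAME":
        # Input is zero-padded so that only the stride shrinks the output
        return math.ceil(size / stride)
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")

# Example shapes (assumed for illustration)
batch_shape = (2, 427, 640, 3)   # [mini-batch size, height, width, channels]
filter_shape = (7, 7, 3, 2)      # [f_h, f_w, f_m, f_n]

batch_size, height, width, channels = batch_shape
f_h, f_w, f_m, f_n = filter_shape
assert f_m == channels, "filter depth must match the number of input channels"

output_shape = (
    batch_size,
    conv_output_size(height, f_h),
    conv_output_size(width, f_w),
    f_n,  # one output feature map per filter
)
print(output_shape)
```

With `"SAME"` padding and a stride of 1, the spatial size is preserved and only the channel dimension changes, from `f_m` to `f_n`.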

Let’s look at a simple example. The following code loads two sample color images with scikit-learn's `load_sample_image()` (one of a Chinese temple, the other of a flower), scales their pixel values to the [0, 1] interval, and stacks them into a single NumPy array:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_sample_image

# Load sample images
china = load_sample_image("china.jpg") / 255
flower = load_sample_image("flower.jpg") / 255
images = np.array([china, flower])
batch_size, height, width, channels = images.shape

In [None]:
images.shape

In [None]:
f, ax = plt.subplots(figsize=(10,10), nrows=1, ncols=2)
ax[0].imshow(images[0,:,:,:])
ax[0].axis('off')
ax[1].imshow(images[1,:,:,:])
ax[1].axis('off')
plt.show();

Let us create two very simple filters and apply them to both images, displaying the result:

In [None]:
# Create 2 filters
filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)
filters[:, 3, :, 0] = 1 # vertical line
filters[3, :, :, 1] = 1 # horizontal line

In [None]:
f, ax = plt.subplots(nrows=1, ncols=2)
ax[0].imshow(filters[:,:,:,0])
ax[1].imshow(filters[:,:,:,1])
plt.show();

In [None]:
outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

In [None]:
outputs.shape

In [None]:
f, ax = plt.subplots(figsize=(15,15), nrows=1, ncols=2)
ax[0].imshow(images[0,:,:])
ax[0].axis('off')
ax[1].imshow(outputs[0,:,:,0], cmap='gray')
ax[1].axis('off')
plt.show();

In a CNN we use lots of these filters, but we don't want to have to specify them by hand as above. Instead, we want to learn them from our training data. The idea is to keep filtering an image and downsampling the result until we end up with a long one-dimensional array of numbers (features) that we can feed to one or more fully connected layers, which map that array to a prediction.

As an example of this, have a look at this CNN architecture, known as VGG16:

![](https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png)

In Keras/TensorFlow we perform downsampling with max-pooling layers:

In [None]:
# code here
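
One possible way to fill in this step is sketched below. It is self-contained and uses a small random batch as a stand-in for `images`; in the notebook you would presumably apply the layer to `images` (or to the filtered result) and store the result in `outputs`. The 2x2 pool size is an assumption, not something the text prescribes.

```python
import numpy as np
from tensorflow import keras

# Stand-in batch: 2 random "images" of 8x8 pixels with 3 channels
batch = np.random.rand(2, 8, 8, 3).astype(np.float32)

# 2x2 max pooling; the stride defaults to the pool size,
# so height and width are both halved
max_pool = keras.layers.MaxPooling2D(pool_size=2)
pooled = max_pool(batch)

print(batch.shape, "->", pooled.shape)
```

Each output pixel is the maximum of a 2x2 window of the input, computed independently for every channel, so the number of channels is unchanged.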

In [None]:
images.shape, outputs.shape

In [None]:
f, ax = plt.subplots(figsize=(15,15), nrows=1, ncols=2)
ax[0].imshow(images[0,:,:,0], cmap='gray')
ax[1].imshow(outputs[0,:,:,0], cmap='gray')
plt.show();

## Building a CNN in Keras
We will implement a simple CNN to solve the same problem as last week, namely classifying clothes in the Fashion-MNIST dataset. This is quite similar in spirit to the above picture. Let us first build the convolutional part of our model:

In [None]:
from tensorflow.keras import layers
from tensorflow.keras import models

In [None]:
model = models.Sequential()
# code here
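
A possible convolutional part is sketched below. The exact layer stack is an assumption: it is simply one architecture consistent with the shapes quoted in this section (inputs of shape `(28, 28, 1)`, 32 or 64 filters, and a final output of shape `(3, 3, 64)`), using 3x3 filters and 2x2 max pooling.

```python
from tensorflow.keras import layers, models

model = models.Sequential()
# First layer: 32 filters of size 3x3; input_shape excludes the batch dimension
model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))   # 26x26 -> 13x13
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))   # 11x11 -> 5x5
model.add(layers.Conv2D(64, (3, 3), activation="relu"))

print(model.output_shape)
```

Calling `model.summary()` on this sketch lists the output shape of every layer, which is a convenient way to follow how the spatial dimensions shrink.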

Note that a CNN takes as input tensors of shape `(batch_size, image_height, image_width, image_channels)`, so we don't need to flatten the input as we did last week. In this case, we configure the CNN to process inputs of size `(28, 28, 1)`, the dimensions of the Fashion-MNIST images. We do this by passing the argument `input_shape=(28, 28, 1)` to the first layer. Let's display the architecture of the CNN so far:

In [None]:
# code here

We see that the output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`. The width and height dimensions tend to shrink as we go deeper in the network. The number of channels is controlled by the first argument passed to the `Conv2D` layers (32 or 64).

The next step is to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier network like those we saw last week: a stack of `Dense` layers. This module will process vectors, which are 1D, whereas the current output is a 3D tensor. Therefore, we first have to flatten the 3D outputs to 1D, and then add a few Dense layers on top.

We do 10-category classification, using a final layer with 10 outputs and a softmax activation. Here's what the network looks like now:

In [None]:
# code here

As you can see, the (3, 3, 64) outputs are flattened into vectors of shape `(576,)` before going through two Dense layers.
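
The number 576 can be checked by hand. The sketch below traces the spatial size through an assumed stack of three 3x3 convolutions without padding, interleaved with two 2x2 max-pooling layers; this layer sequence is an assumption chosen to match the `(3, 3, 64)` shape quoted above.

```python
def conv_valid(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def max_pool(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window

size = 28                      # Fashion-MNIST images are 28x28
trace = [size]
for op in ("conv", "pool", "conv", "pool", "conv"):
    size = conv_valid(size) if op == "conv" else max_pool(size)
    trace.append(size)

channels = 64                  # number of filters in the last Conv2D layer
flattened = trace[-1] * trace[-1] * channels

print(trace)       # spatial size after each layer
print(flattened)   # length of the flattened feature vector
```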

Let us now train this CNN on our data. The piece of code below is basically the same as last week:

In [None]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

In [None]:
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] /255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [None]:
X_train.shape, X_valid.shape

In [None]:
X_train=X_train.reshape((55000, 28, 28,1))
X_valid=X_valid.reshape((5000, 28, 28,1))

In [None]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

In [None]:
# code here

# Transfer Learning
You will probably sometimes hear that deep learning only works when lots of data is available. This is valid in part: one fundamental characteristic of deep learning is that it can find interesting features in the training data on its own, without any need for manual feature engineering, and this can only be achieved when lots of training examples are available. This is particularly true for problems where the input samples are very high-dimensional, like images.

Fortunately, deep learning models are highly repurposable: you can take, say, an image-classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes. Specifically in the case of computer vision, many pretrained models (usually trained on the ImageNet dataset) are now publicly available for download and make it possible to train highly accurate vision models from very little data.

We will do so now using a dataset intended for dogs-vs-cats classification, taken from a popular Kaggle competition.


This dataset contains 25,000 images of dogs and cats (12,500 from each class). I have created a simplified version of it that already contains three subsets: a training set with 1,000 samples of each class, a validation set with 500 samples of each class, and a test set with 500 samples of each class. The piece of code below should unzip it into your Drive folder:

In [None]:
!unzip /content/gdrive/My\ Drive/LAB7/cats_and_dogs_small.zip -d /content/gdrive/My\ Drive/LAB7/

As a sanity check, let’s count how many pictures are in each split (train/validation/test):

In [None]:
import os
import os.path as osp

base_dir = '/content/gdrive/My Drive/LAB7/cats_and_dogs_small'

train_dir = osp.join(base_dir, 'train')
validation_dir = osp.join(base_dir, 'validation')
test_dir = osp.join(base_dir, 'test')

train_cats_dir = osp.join(train_dir, 'cats')
train_dogs_dir = osp.join(train_dir, 'dogs')

validation_cats_dir = osp.join(validation_dir, 'cats')
validation_dogs_dir = osp.join(validation_dir, 'dogs')

test_cats_dir = osp.join(test_dir, 'cats')
test_dogs_dir = osp.join(test_dir, 'dogs')

In [None]:
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))

### Reading data and data augmentation
As we know, data should be formatted into appropriately preprocessed floating-point tensors before being fed into the network. Currently, the data is on a drive as JPEG files, so the steps for getting it into the network are roughly as follows:
1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Rescale the pixel values (between 0 and 255) to the [0, 1] interval.

Keras has utilities that take care of these steps automatically. Its image-processing helper module, `keras.preprocessing.image`, contains the class `ImageDataGenerator`, which lets you quickly set up Python generators that turn image files on disk into batches of preprocessed tensors:

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# code here
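
For intuition, the four steps listed above amount to roughly the following for a single image. This is a manual sketch using Pillow and NumPy rather than the Keras generator itself; the random in-memory JPEG is a stand-in for a file on disk, and the 150x150 target size is taken from the batches described in this section.

```python
import io

import numpy as np
from PIL import Image

# Stand-in for a JPEG file on disk: encode a random RGB image in memory
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8)
buffer = io.BytesIO()
Image.fromarray(fake_pixels).save(buffer, format="JPEG")
buffer.seek(0)

# Steps 1 and 2: read the file and decode the JPEG into an RGB pixel grid
img = Image.open(buffer).convert("RGB")
img = img.resize((150, 150))             # resize to the size the network expects

# Step 3: convert to a floating-point tensor
x = np.asarray(img, dtype=np.float32)

# Step 4: rescale pixel values from [0, 255] to [0, 1]
x = x / 255.0

print(x.shape, x.dtype)
```

A generator does the same work lazily, one batch at a time, and also derives the labels from the directory names.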

Let’s look at the output of one of these generators: it yields batches of 150x150 RGB images (shape `(20, 150, 150, 3)`) and binary labels (shape `(20,)`). There are 20 samples in each batch (the batch size). Note that the generator yields batches indefinitely, so we need to `break` out of the loop:

In [None]:
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break

### Data Augmentation
Overfitting is typically caused by having too few samples to learn from, which makes it very hard to train a model that can generalize to new data. For this reason, it is useful to artificially generate more training data from existing training samples, by augmenting the samples via a number of random transformations.

In Keras, this can be done by configuring a number of random transformations to be performed on the images read by the `ImageDataGenerator` instance:

In [None]:
# code here
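
One possible augmentation configuration is sketched below. The specific ranges are illustrative assumptions rather than values prescribed by this lab; the name `train_datagen` matches the one used in the cells that follow.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,         # map pixel values from [0, 255] to [0, 1]
    rotation_range=40,        # random rotations of up to 40 degrees
    width_shift_range=0.2,    # random horizontal shifts (fraction of width)
    height_shift_range=0.2,   # random vertical shifts (fraction of height)
    shear_range=0.2,          # random shearing transformations
    zoom_range=0.2,           # random zoom in or out
    horizontal_flip=True,     # randomly mirror images left to right
    fill_mode="nearest",      # how to fill pixels created by a transformation
)
```

Horizontal flips make sense here because a mirrored cat is still a plausible cat; for data where orientation matters (text, for example) this option would be a poor choice.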

In [None]:
from tensorflow.keras.preprocessing import image
fnames = [os.path.join(train_cats_dir, fname) for fname in os.listdir(train_cats_dir)]

img_path = fnames[3] # choose one image to augment
img = image.load_img(img_path, target_size=(150, 150)) # load it and resize to 150x150
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape) # add a fake batch dimension
plt.imshow(image.array_to_img(x[0]))

In [None]:
i = 0
for batch in train_datagen.flow(x, batch_size=1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
plt.show()

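If we train a new network using this data-augmentation configuration, the network will practically never see exactly the same input twice, which makes it harder to overfit to our training data. The augmented inputs are still heavily correlated, though, because they all come from a small number of original images.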

### Using a Pre-Trained CNN

To load a pretrained convolutional base from `keras.applications`, you pass three arguments to the model constructor:
* `weights` specifies the weight checkpoint from which to initialize the model.
* `include_top` refers to including (or not) the densely connected classifier on top of the network. By default, this densely connected classifier corresponds to the 1,000 classes from ImageNet. Because you intend to use your own densely connected classifier (with only two classes: cat and dog), you don’t need to include it.
* `input_shape` is the shape of the image tensors that you’ll feed to the network. This argument is purely optional: if you don’t pass it, the network will be able to process inputs of any size.
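
The three arguments above can be sketched as follows. The choice of VGG16 is an assumption: it is the architecture shown earlier and it is consistent with the `(4, 4, 512)` feature map mentioned in this section, but any model from `keras.applications` takes the same arguments. The wrapper `build_conv_base` is a hypothetical helper introduced so that nothing is downloaded until it is called.

```python
from tensorflow.keras.applications import VGG16

def build_conv_base(weights="imagenet"):
    """Return a VGG16 convolutional base without its ImageNet classifier."""
    return VGG16(
        weights=weights,             # checkpoint to initialize from
        include_top=False,           # drop the 1,000-class dense classifier
        input_shape=(150, 150, 3),   # 150x150 RGB images
    )
```

Calling `conv_base = build_conv_base()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for checking shapes offline.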

In [None]:
# code here

In [None]:
conv_base.summary()

The final feature map has shape `(4, 4, 512)`. That’s the feature map on top of which we will stick a densely connected classifier. We will now fine-tune our CNN, which means taking the following two steps:

1. Add our custom network on top of an already-trained base network.
2. Jointly train the base network and the part we added.

#### Add a network on top of the pretrained CNN

In [None]:
# code here

#### Train the entire CNN

In [None]:
val_datagen = ImageDataGenerator(rescale=1./255) # note, we do not augment validation images
validation_generator = val_datagen.flow_from_directory(validation_dir, target_size=(150, 150),
                                                        batch_size=20, class_mode='binary')

# code here

In [None]:
# code here

## 1-D CNNs

In Keras, you use a 1D convnet via the `Conv1D` layer, which has an interface similar to `Conv2D`. It takes as input 3D tensors with shape `(samples, time, features)` and returns similarly shaped 3D tensors. The convolution window is a 1D window on the temporal axis: axis 1 in the input tensor.

Let’s build a simple two-layer 1D convnet and apply it to a sentiment classification task from movie reviews. For that, we work with the IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie Database. They’re split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews. The IMDB dataset comes packaged with Keras. It has already been preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each integer stands for a specific word in a dictionary. The following code will download and prepare the dataset:

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

max_features = 10000
max_len = 500
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

The argument `num_words=max_features` (here 10,000) means we only keep the 10,000 most frequently occurring words in the training data; rarer words are dropped from the vocabulary. This allows us to work with vector data of manageable size. We also fix the length of every review to `max_len=500`: `pad_sequences` pads shorter reviews with zeros and truncates longer ones.
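
To see what padding and truncation do to a single review, here is a minimal pure-Python sketch. The function `pad_or_truncate` is a hypothetical helper that mimics the default behavior of Keras's `pad_sequences` (zeros added at the front, extra items removed from the front); it is not the library implementation.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic the default `pad_sequences` behavior for one sequence."""
    seq = list(seq)[-maxlen:]                    # keep only the last `maxlen` items
    return [value] * (maxlen - len(seq)) + seq   # left-pad with `value`

short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_or_truncate(short_review, maxlen=5))   # [0, 0, 11, 22, 33]
print(pad_or_truncate(long_review, maxlen=5))    # [3, 4, 5, 6, 7]
```

After this step every review has the same length, so the whole dataset fits in one 2D integer array.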

As loaded, `x_train` and `x_test` are lists of reviews, each review being a list of word indices (encoding a sequence of words); after padding they are 2D integer arrays of shape `(25000, 500)`. `y_train` and `y_test` are arrays of 0s and 1s, where 0 stands for negative and 1 stands for positive:

In [None]:
print(y_train[0])

1D convnets are structured in the same way as their 2D counterparts, which we used above: they consist of a stack of `Conv1D` and `MaxPooling1D` layers, ending in either a global pooling layer or a `Flatten` layer, which turns the `batch_size x 2D` outputs into `batch_size x 1D` outputs, allowing us to add one or more `Dense` layers to the model for classification or regression.

One difference, though, is the fact that we can afford to use larger convolution windows with 1D convnets. With a 2D convolution layer, a `3x3` convolution window contains `3x3 = 9` learnable weights; but with a 1D convolution layer, a convolution window of size 3 contains only 3 weights. We can thus easily afford 1D convolution windows of size 7 or 9.
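
The weight counts above are per pair of input and output channels. A short sketch of the arithmetic, using a hypothetical helper `conv_params`, also gives the full parameter count for a `Conv1D` layer with 32 filters of size 7 applied to 128-dimensional embeddings (the sizes used in the model below):

```python
def conv_params(window_size, in_channels, filters, bias=True):
    """Number of learnable parameters in a convolutional layer."""
    weights = window_size * in_channels * filters
    return weights + (filters if bias else 0)

# Weights in a single window, for one input channel and one filter
conv2d_3x3 = conv_params(3 * 3, 1, 1, bias=False)
conv1d_3 = conv_params(3, 1, 1, bias=False)

# Full layer: window of 7, 128 input features, 32 filters, plus biases
conv1d_layer = conv_params(7, 128, 32)

print(conv2d_3x3, conv1d_3, conv1d_layer)   # 9 3 28704
```

The cost of a 1D window grows linearly with its size, whereas a 2D window grows quadratically, which is why windows of size 7 or 9 remain cheap in the 1D case.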

In [None]:
model = models.Sequential()  # `models` was imported from tensorflow.keras above
model.add(layers.Embedding(max_features, 128, input_length=max_len))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1, activation='sigmoid'))  # sigmoid output, as expected by binary_crossentropy

In [None]:
model.summary()

In [None]:
from tensorflow.keras.optimizers import RMSprop

In [None]:
model.compile(optimizer=RMSprop(learning_rate=1e-4), loss='binary_crossentropy', metrics=['acc'])
history = model.fit(x_train, y_train, epochs=10,batch_size=128,validation_split=0.2)