# Deep Learning for Computer Vision using Convolutional Neural Networks

With dense layers applied to images, we learnt global patterns that can be exploited to make predictions. The main difference with **convolutional** layers is that we will now learn local patterns.

These local patterns have two important features:
* They are translation invariant. It does not matter where in the image the pattern appears: the layer will be able to capture it and exploit it. In contrast, if the same local pattern appears in two different locations in the image, a dense layer would interpret them as two different patterns.
* Convolutional layers can learn a spatial hierarchy of patterns. Imagine the problem of recognizing a *face*. One layer would learn something about the *nose*, other layers about the *eyes*, and so on. The aggregation of these patterns lets deeper layers recognize the whole face.

![](./imgs/09_cnn_hierarchy.png)

The input to a convolutional layer is a 3-D tensor: height, width and depth (the channels of the image). For RGB images, the depth is 3, for the *R*ed, *G*reen and *B*lue colors. For a grayscale (black and white) image, the depth is just 1.

Each filter in the layer recognizes a specific pattern in a patch (a small subset) of the image. When applied to the original input, the filter suppresses the rest of the image and highlights the places where the pattern it has learnt appears. That is, the output of each filter is a **feature map**.
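As a rough illustration of how a learnt pattern becomes a map of responses, the sketch below slides a hand-written 3x3 filter over a toy image in plain Python. The filter (`vertical_bar`), the image and the helper name `correlate2d_valid` are all made up for this example; a real layer learns its filter weights during training.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j).
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical filter that responds to a vertical bar.
vertical_bar = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Toy 5x8 image with the same vertical bar at two different locations.
image = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

feature_map = correlate2d_valid(image, vertical_bar)
for row in feature_map:
    print(row)
# [3, 0, 0, 0, 0, 3]
# [2, 0, 0, 0, 0, 2]
# [1, 0, 0, 0, 0, 1]
```

The same filter gives the same maximum response (3) at both locations of the bar, which is the translation invariance described above.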

## Convolution operation

For a convolutional layer, we need to decide the size of the patches (commonly 3x3) and the depth of the output feature map (it is no longer 3; it is usually a larger number such as 16, 32 or 64). The output is another tensor, which is the input to the next layer. These tensors are no longer images: they are still 3D tensors, but their depth is the number of learnt filters instead of 3 color channels. To keep the height and width of the output equal to those of the input, we can use padding (adding extra rows and columns).

![](./imgs/10_convolution.png)
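To make the two choices (patch size and output depth) concrete, here is a small sketch of the resulting output shape and number of trainable weights. It assumes no padding and a stride of 1; the helper names `conv2d_output_shape` and `conv2d_n_params` are hypothetical and not part of Keras.

```python
def conv2d_output_shape(input_shape, n_filters, patch_size):
    """Output shape of a convolution with no padding and a stride of 1."""
    height, width, _ = input_shape
    return (height - patch_size + 1, width - patch_size + 1, n_filters)


def conv2d_n_params(input_depth, n_filters, patch_size):
    """Each filter has one weight per patch cell and input channel, plus a bias."""
    return (patch_size * patch_size * input_depth + 1) * n_filters


input_shape = (100, 100, 3)          # a 100x100 RGB image
output_shape = conv2d_output_shape(input_shape, n_filters=32, patch_size=3)
n_params = conv2d_n_params(input_depth=3, n_filters=32, patch_size=3)

print(output_shape)  # (98, 98, 32)
print(n_params)      # 896
```

The output depth is set by the number of filters, not by the number of color channels of the input.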

### Padding

The convolution operation slides over the image, covering different zones to extract common patterns found in different locations. At the edges of the image, the layer cannot center a patch on the border pixels, because part of the patch would fall outside the image. With padding, the layer can extract patches at the edges too, so that part of the image also contributes to identifying common patterns.
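A minimal sketch of zero padding, assuming a single-channel image stored as a list of lists. The helpers `zero_pad` and `count_patches` are made-up names for illustration.

```python
def zero_pad(image, pad):
    """Surround a 2D image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def count_patches(image, patch_size):
    """Number of patch positions when sliding with a stride of 1."""
    return (len(image) - patch_size + 1) * (len(image[0]) - patch_size + 1)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
padded = zero_pad(image, pad=1)

print(count_patches(image, 3))   # 4: the output would be 2x2
print(count_patches(padded, 3))  # 16: one patch per original pixel, output is 4x4
```

With one row and column of padding on each side, a 3x3 patch can be centered on every pixel of the original image, including those on the border.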

### Stride

Another parameter that we must take into account is the *stride*. Patches can overlap with other patches. The distance between two consecutive windows used to extract patches is called the **stride**. For instance, with a patch size of 3x3 and a stride of 3, patches will not overlap. The model built later in this notebook keeps the Keras default stride of 1, so its patches do overlap; the stride is one more parameter that we can change during training and validation to obtain a better model.
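The effect of the stride can be sketched along a single dimension. The helper names `window_starts` and `conv_output_size` are hypothetical, and no padding is assumed.

```python
def window_starts(size, patch_size, stride):
    """Positions where a patch starts along one dimension (no padding)."""
    return list(range(0, size - patch_size + 1, stride))


def conv_output_size(size, patch_size, stride=1):
    """Number of patches that fit along one dimension (no padding)."""
    return (size - patch_size) // stride + 1


# A row of 9 pixels and patches of width 3.
print(window_starts(9, 3, stride=1))  # [0, 1, 2, 3, 4, 5, 6]: consecutive patches share 2 pixels
print(window_starts(9, 3, stride=3))  # [0, 3, 6]: patches do not overlap
print(conv_output_size(9, 3, stride=1))  # 7
print(conv_output_size(9, 3, stride=3))  # 3
```

A larger stride produces fewer patches and therefore a smaller output.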

### Pooling

In a network, we need to gradually reduce the dimension of the data, layer after layer, so that the output layer can produce a number, a vector of a certain size, etc. The convolution operation, however, usually increases the depth of the output. How do we reduce the size of the output learnt in each layer? By **pooling**.

The most typical pooling operation is *max pooling*. Over each feature map produced by the layer, we slide a small window (typically 2x2) and keep only the maximum value inside each window. This way, we reduce the height and width of the feature maps, and therefore the size of the output of the layer. This reduction also helps the network build a hierarchy of patterns, because each subsequent layer covers a larger region of the original image.
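A minimal sketch of 2x2 max pooling on a single feature map, assuming non-overlapping windows (a stride of 2). The function name `max_pool_2x2` and the values are made up for illustration.

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    pooled = []
    # A trailing row or column that does not fill a whole window is dropped.
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = (
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 feature map becomes 2x2: height and width are halved, while the strongest response in each window is kept.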

Other pooling operations are also possible, such as averaging in different ways. For full details of the pooling operations available in Keras, see https://keras.io/layers/pooling/

## Additional readings/videos

* How CNNs work: https://www.youtube.com/watch?v=FmpDIaiMIeA
* Deep Learning demystified: http://brohrer.github.io/deep_learning_demystified.html
* Hot-dog? No hot-dog? http://mateos.io/blog/getting-some-hotdogs/

# Common functions and download data



In [39]:
%pylab inline
plt.style.use('seaborn-talk')

Populating the interactive namespace from numpy and matplotlib


In [0]:
def plot_metric(history, metric):
    """Plot a training metric per epoch, plus its validation counterpart if present."""
    history_dict = history.history
    values = history_dict[metric]
    val_key = 'val_' + metric
    has_validation = val_key in history_dict

    epochs = range(1, len(values) + 1)

    # semilogy sets a log scale on the y axis, which applies to both curves
    if has_validation:
        plt.plot(epochs, history_dict[val_key], label='Validation')
    plt.semilogy(epochs, values, label='Training')

    if has_validation:
        plt.title('Training and validation %s' % metric)
    else:
        plt.title('Training %s' % metric)
    plt.xlabel('Epochs')
    plt.ylabel(metric.capitalize())
    plt.legend()
    plt.grid()

    plt.show()

In [41]:
!pip install keras



In [0]:
# The following fetches the data from Drive, so that we can run the notebook on it in Colab

In [0]:
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz

In [44]:
!ls

adc.json  dogs_cats  dogs_cats_small  sample_data  test1.zip  train.zip


In [0]:
# Authentication worked. We can see there are two files. Using each file ID, we download the tagged (labeled) images

In [0]:
file_id = '1nL7cgXGkNGS79FORsrCfrfcpzrBtoX8K'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile("train.zip")

file_id = '1edO-psKzj7gpYgf5PcDKyPgbFFJN09D3'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile("test1.zip")

In [47]:
!ls

adc.json  dogs_cats  dogs_cats_small  sample_data  test1.zip  train.zip


In [48]:
!ls -hl

total 815M
-rw-r--r-- 1 root root 2.5K Sep 21 18:06 adc.json
drwxr-xr-x 4 root root 4.0K Sep 21 18:08 dogs_cats
drwxr-xr-x 5 root root 4.0K Sep 21 18:12 dogs_cats_small
drwxr-xr-x 2 root root 4.0K Sep 20 00:09 sample_data
-rw-r--r-- 1 root root 272M Sep 21 18:51 test1.zip
-rw-r--r-- 1 root root 544M Sep 21 18:51 train.zip


In [49]:
!mkdir dogs_cats
!cd dogs_cats && unzip -q ../train.zip
!cd dogs_cats && unzip -q ../test1.zip
!ls -hl dogs_cats
# The data is organized as files in separate directories.

mkdir: cannot create directory ‘dogs_cats’: File exists
replace train/cat.0.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: N
replace test1/1.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: N
total 1.1M
drwxr-xr-x 2 root root 288K Sep 20  2013 test1
drwxr-xr-x 2 root root 756K Sep 20  2013 train


In [0]:
# Code to create directories and copy the files into separate folders (train, test, validation). We do this because we cannot load all the images at once: there is too much data.
# WE WILL USE PYTHON GENERATORS. A generator takes a batch, reads it from disk (slow), runs the operations, stores the results and discards the input data before reading the next batch from disk.
# We saw something similar before: chunks.
# THIS IS THE USUAL APPROACH WHEN WORKING WITH IMAGES IN DEEP LEARNING.

original_dataset_dir="/content/dogs_cats/train/"

import os, shutil

base_dir = "/content/dogs_cats_small"


train_dir = os.path.join(base_dir, "train")
validation_dir = os.path.join(base_dir, "validation")
test_dir = os.path.join(base_dir, "test")


train_cats_dir = os.path.join(train_dir, "cats")
train_dogs_dir = os.path.join(train_dir, "dogs")

validation_dogs_dir = os.path.join(validation_dir, "dogs")
validation_cats_dir = os.path.join(validation_dir, "cats")

test_dogs_dir = os.path.join(test_dir, "dogs")
test_cats_dir = os.path.join(test_dir, "cats")

In [0]:
# Create the directories (we will take a subset of the data, hence the "small" folders)
!rm -rf dogs_cats_small/
os.mkdir(base_dir)
os.mkdir(train_dir)
os.mkdir(validation_dir)
os.mkdir(test_dir)
os.mkdir(train_cats_dir)
os.mkdir(train_dogs_dir)
os.mkdir(validation_dogs_dir)
os.mkdir(validation_cats_dir)
os.mkdir(test_dogs_dir)
os.mkdir(test_cats_dir)

In [67]:
!find dogs_cats_small

dogs_cats_small
dogs_cats_small/test
dogs_cats_small/test/cats
dogs_cats_small/test/dogs
dogs_cats_small/train
dogs_cats_small/train/cats
dogs_cats_small/train/dogs
dogs_cats_small/validation
dogs_cats_small/validation/cats
dogs_cats_small/validation/dogs


In [0]:
# COPY THE FILES FROM THEIR ORIGINAL LOCATION INTO THE DIRECTORIES CREATED ABOVE
# PER CLASS: 1000 IMAGES FOR TRAINING, 500 FOR VALIDATION AND 500 FOR TESTING. That is few images for deep learning; there should be about 10 times more

fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(train_cats_dir, fname)                              
    shutil.copyfile(src, dst)                                              

fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]               
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(validation_cats_dir, fname)                         
    shutil.copyfile(src, dst)                                              

fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]               
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(test_cats_dir, fname)                               
    shutil.copyfile(src, dst)                                              

fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]                     
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(train_dogs_dir, fname)                              
    shutil.copyfile(src, dst)                                              
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]               
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(validation_dogs_dir, fname)                         
    shutil.copyfile(src, dst)                                              

fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]               
for fname in fnames:                                                       
    src = os.path.join(original_dataset_dir, fname)                        
    dst = os.path.join(test_dogs_dir, fname)                               
    shutil.copyfile(src, dst) 

In [69]:
!find dogs_cats_small/ | head

dogs_cats_small/
dogs_cats_small/test
dogs_cats_small/test/cats
dogs_cats_small/test/cats/cat.1831.jpg
dogs_cats_small/test/cats/cat.1989.jpg
dogs_cats_small/test/cats/cat.1574.jpg
dogs_cats_small/test/cats/cat.1564.jpg
dogs_cats_small/test/cats/cat.1784.jpg
dogs_cats_small/test/cats/cat.1966.jpg
dogs_cats_small/test/cats/cat.1516.jpg


In [70]:
# Check if we have a GPU
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 471243910195302528, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 230686720
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 4458045193438441487
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]

# Prepare data

Because we are dealing with many images, we cannot just read them all into a NumPy array. We will use generators to consume the images as they are needed by the network.
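The idea of a generator can be sketched in plain Python. This is only an illustration with made-up file names and a hypothetical helper `batch_generator`; it does not read any image, and the Keras generators used below handle loading and resizing for us.

```python
def batch_generator(items, batch_size):
    """Yield successive batches from `items`, one at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names standing in for images stored on disk.
file_names = ['cat.{}.jpg'.format(i) for i in range(10)]

batches = batch_generator(file_names, batch_size=4)
first_batch = next(batches)   # only now is the first batch produced
remaining = list(batches)     # consume the rest of the batches

print(first_batch)                  # ['cat.0.jpg', 'cat.1.jpg', 'cat.2.jpg', 'cat.3.jpg']
print([len(b) for b in remaining])  # [4, 2]
```

Only one batch needs to be held in memory at a time, which is what makes it possible to train on more images than fit in RAM.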

In [0]:
# Generators for train, test and validation.
# Keras expects the training directory to contain one subdirectory per label, each holding the images of that class: as many subdirectories as there are classes.

# Import the image generator (a Python generator that yields images, not something that creates new images)
from keras.preprocessing import image

# When there are not enough images, we can apply data augmentation (rotations, mirroring and other transformations). This helps neural networks learn the patterns beyond the original images and improves classification.

In [0]:
train_datagen=image.ImageDataGenerator(rescale=1./255.0)# normalize so that pixel values range from 0 to 1
test_datagen=image.ImageDataGenerator(rescale=1./255.0)# normalize so that pixel values range from 0 to 1
validation_datagen=image.ImageDataGenerator(rescale=1./255.0)# normalize so that pixel values range from 0 to 1

In [73]:
train_gen=train_datagen.flow_from_directory(train_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)
test_gen=test_datagen.flow_from_directory(test_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)
validation_gen=validation_datagen.flow_from_directory(validation_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
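The labels come from the subdirectory names. As a sketch of that mapping, the snippet below mimics the alphabetical ordering of class folders on a temporary directory; `class_indices_from_directory` is a hypothetical helper, not a Keras function. On the real generators, the mapping can be inspected with `train_gen.class_indices`.

```python
import os
import tempfile


def class_indices_from_directory(directory):
    """Map each subdirectory name to an integer label, in alphabetical order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway directory with one folder per class.
with tempfile.TemporaryDirectory() as tmp_dir:
    for class_name in ('dogs', 'cats'):
        os.mkdir(os.path.join(tmp_dir, class_name))
    class_indices = class_indices_from_directory(tmp_dir)

print(class_indices)  # {'cats': 0, 'dogs': 1}
```

With `class_mode='binary'`, this ordering means cats get label 0 and dogs get label 1.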


In [0]:
# It reports what it found in each directory: exactly what we put there, so everything is fine.

# We no longer need to distinguish between X and y, because the generator already carries that information. We will only use the generator.

# No further image processing is needed. The generator takes care of everything.

# Next, we build the model.

# Build model

In [0]:
from keras import models
from keras import layers

In [0]:
m=models.Sequential()
# 2D convolution (for data with an extra dimension, such as video, we would use 3D). It takes the number of filters (32) and the size of the patches (3x3)
m.add(layers.Conv2D(32,(3,3),input_shape=(100,100,3),activation='relu'))
# alternating convolution and pooling has been shown to give the best results
m.add(layers.MaxPooling2D((2,2)))
m.add(layers.Conv2D(64,(3,3),activation='relu'))# the number of filters grows through the convolutional layers, whereas the dense layers shrink
m.add(layers.MaxPooling2D((2,2)))
# before the dense layers, flatten
m.add(layers.Flatten())
m.add(layers.Dense(256,activation='relu'))
m.add(layers.Dense(64,activation='relu'))
m.add(layers.Dense(1,activation='sigmoid'))
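To see how the tensors shrink through this stack, the following sketch traces the height and width by hand. It assumes the defaults used in the model above (no padding, convolution stride of 1, non-overlapping 2x2 pooling); the helpers `conv_size` and `pool_size` are hypothetical. `m.summary()` reports the actual shapes and parameter counts.

```python
def conv_size(size, patch_size=3):
    """Side length after a convolution with no padding and a stride of 1."""
    return size - patch_size + 1


def pool_size(size, window=2):
    """Side length after non-overlapping pooling (incomplete windows are dropped)."""
    return size // window


size = 100
size = conv_size(size)  # Conv2D(32, (3,3))  -> 98
size = pool_size(size)  # MaxPooling2D((2,2)) -> 49
size = conv_size(size)  # Conv2D(64, (3,3))  -> 47
size = pool_size(size)  # MaxPooling2D((2,2)) -> 23

flattened = size * size * 64          # values reaching the first dense layer
dense_params = (flattened + 1) * 256  # weights plus biases of Dense(256)

print(size, flattened, dense_params)  # 23 33856 8667392
```

Most of the weights of this model sit in the first dense layer, right after `Flatten`.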


In [0]:
from keras import optimizers
from keras import losses
from keras import metrics

In [0]:
m.compile(optimizer=optimizers.RMSprop(),
         loss=losses.binary_crossentropy,
         metrics=[metrics.binary_accuracy]
         )


In [78]:
h = m.fit_generator(train_gen,epochs=10,steps_per_epoch=10,
                   validation_data=validation_gen,validation_steps=5)
# We must state how many batches to request from the generator in each epoch, via steps_per_epoch. Together with the generator's batch_size, this controls how many images are used per epoch.
# It will use 500 images for validation in each epoch.

Epoch 1/10


ResourceExhaustedError: ignored
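The relation between steps and images can be checked with simple arithmetic. This sketch uses the batch size of 100 and the directory sizes reported above; the helper names are made up.

```python
import math


def images_per_epoch(steps, batch_size):
    """Images consumed in one epoch when requesting `steps` batches."""
    return steps * batch_size


def steps_for_full_pass(n_images, batch_size):
    """Batches needed to see every image once."""
    return math.ceil(n_images / batch_size)


print(images_per_epoch(10, 100))       # 1000 training images per epoch
print(images_per_epoch(5, 100))        # 500 validation images per epoch
print(steps_for_full_pass(2000, 100))  # 20 steps to cover the 2000 training images
print(steps_for_full_pass(1000, 100))  # 10 steps to cover the 1000 validation images
```

With `steps_per_epoch=10`, each epoch sees only half of the training images.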

In [0]:
# We retrain the model and regenerate the data

In [79]:
train_gen=train_datagen.flow_from_directory(train_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)
test_gen=test_datagen.flow_from_directory(test_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)
validation_gen=validation_datagen.flow_from_directory(validation_dir,
                                            target_size=(100,100),# resize the images: every image fed to the network must have the same size
                                            batch_size=100,# batch size (when training we will not pass a batch size; the generator supplies the batches)
                                            class_mode='binary'
)

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.


In [0]:
m.compile(optimizer=optimizers.RMSprop(),
         loss=losses.binary_crossentropy,
         metrics=[metrics.binary_accuracy]
         )



In [81]:
h = m.fit_generator(train_gen,epochs=30,steps_per_epoch=20,
                   validation_data=validation_gen,validation_steps=10)

Epoch 1/30


ResourceExhaustedError: ignored

In [0]:
# Plot

# Evaluate model

Our model is a binary classifier. We can evaluate it as any other classifier.

**EXERCISE 1**. Obtain the confusion matrix and associated metrics (precision, recall, F-score) for this classifier.

**EXERCISE 2**. Plot the ROC curve and calculate the AUC score.

**EXERCISE 3**. What is the best model you can obtain using the above evaluation parameters?
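As a starting point for these exercises, here is a sketch of how the confusion matrix and the associated metrics relate, in plain Python. The labels and scores are made up (1 = dog, 0 = cat) and the helper names are hypothetical; with the real model, the scores would come from `m.predict`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up true labels and sigmoid outputs, thresholded at 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)         # 3 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Changing the 0.5 threshold trades precision against recall, which is what the ROC curve of Exercise 2 summarizes.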


In addition to this evaluation, for convolutional layers we can plot the output of each layer when applied to an image, to see which elements the model uses to decide the class an item belongs to.

In [82]:
N=18
test_gen[0][0].shape
# 100 images of 100x100 pixels with RGB colors (3)

(100, 100, 100, 3)

In [0]:
# Prediction
m.predict(test_gen[0][0][N:N+1])

In [85]:
test_gen[0][0][N]

array([[[0.28235295, 0.25490198, 0.18431373],
        [0.43921572, 0.41176474, 0.34117648],
        [0.4156863 , 0.38823533, 0.31764707],
        ...,
        [0.        , 0.2509804 , 0.6117647 ],
        [0.01960784, 0.26666668, 0.6313726 ],
        [0.01568628, 0.25882354, 0.6       ]],

       [[0.30980393, 0.28235295, 0.21176472],
        [0.42352945, 0.39607847, 0.3254902 ],
        [0.35686275, 0.32941177, 0.25882354],
        ...,
        [0.        , 0.25490198, 0.6156863 ],
        [0.01960784, 0.26666668, 0.6313726 ],
        [0.00392157, 0.24313727, 0.5921569 ]],

       [[0.35686275, 0.32941177, 0.25882354],
        [0.41960788, 0.3921569 , 0.32156864],
        [0.3372549 , 0.30980393, 0.2392157 ],
        ...,
        [0.        , 0.25882354, 0.627451  ],
        [0.01568628, 0.27058825, 0.6392157 ],
        [0.00784314, 0.25882354, 0.6039216 ]],

       ...,

       [[0.56078434, 0.427451  , 0.32156864],
        [0.49803925, 0.3647059 , 0.26666668],
        [0.54901963, 0

# Data augmentation

If we don't have enough images to train our model, we can manipulate our images to produce modifications, and to *augment* the training data.

Because the modified images are slightly different from the originals, this can help the network learn some of the patterns better.

See https://keras.io/preprocessing/image/
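One of the simplest modifications is a horizontal flip. The sketch below applies it to a tiny made-up image stored as a list of rows; `horizontal_flip` and `augment` are hypothetical helpers, and in practice the Keras image generator can apply this kind of transformation for us.

```python
def horizontal_flip(image):
    """Mirror an image (a list of rows) from left to right."""
    return [list(reversed(row)) for row in image]


def augment(images):
    """Yield each original image followed by its mirrored copy."""
    for image in images:
        yield image
        yield horizontal_flip(image)


image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)
augmented = list(augment([image]))

print(flipped)         # [[3, 2, 1], [6, 5, 4]]
print(len(augmented))  # 2: the original plus its mirrored copy
```

A mirrored cat is still a cat, so the label of the new image is the same as that of the original.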


# Reusing a pre-trained convnet

Training a convolutional network is slow and tedious. And if we think of everyday objects, some patterns will probably be useful for images of different types.

As with word2vec, GloVe and other pre-trained word embeddings, we can use pre-trained convolutional networks to improve our models.

# Exercise

In our classification problem, dogs are the *positive* case, and cats the *negative* (it could not be otherwise...).

Obtain the ROC curve for the following classifiers:
 
* Dense network
* Convolutional layers
* Convolutional layers with training data augmentation
* Convolutional layers, using a pre-trained network, letting your network modify the weights
* Convolutional layers, using a pre-trained network, with all the weights frozen

Which one is the best classifier? Why?
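To compare classifiers with a single number, the area under the ROC curve (AUC) can be computed as the fraction of (dog, cat) pairs in which the dog receives the higher score. The sketch below uses made-up labels and scores and a hypothetical helper `roc_auc`; a library such as scikit-learn would normally be used on the real predictions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = dog, 0 = cat) and scores from a hypothetical classifier.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(y_true, scores)
print(auc)  # 0.8125
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so the classifier with the highest AUC on the test set ranks dogs above cats most reliably.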