## Transfer Learning (TensorFlow + VGG16 + CIFAR10)


This tutorial is a complete guide to applying deep learning to image recognition. It works through an example of transfer learning from the VGG16 model (trained on the ImageNet dataset) to the CIFAR10 problem. After following the steps below, it should be easy to adapt the code and apply VGG16 to other image classification problems.


### Requirements:
- Personal:
  - Python (basic) <br>
  - TensorFlow (basic) <br>
  - Convolutional Neural Networks - CNN (basic) <br>

- Software:
  - Python 3.6 or later <br>
  - TensorFlow 1.6 or a later 1.x release (the code uses the TensorFlow 1 graph API, for example `tf.GraphDef`)<br>

- Hardware:
  - This code was developed to run on <a href="https://colab.research.google.com/notebooks/welcome.ipynb">Google Colab</a>. With the Google Colab dependencies removed, it can also run on a computer with more than 8 GB of RAM. A good processor and/or GPU will make training faster, but is not mandatory.

## A little theory
- **What is Transfer Learning?**<br>

  One of the defining characteristics of deep learning is its training time. Many models take hours, days, or even weeks to train, so powerful machines are often used to speed the process up. Not all researchers have access to that much processing power, which keeps small groups from using these models on their own problems. However, large companies and research groups, such as Google and Facebook, have been training models and releasing them to the community.

  Many works in the deep learning field point out that weight initialization is one of the most important steps when training a deep model. Reusing the weights of a model trained on a similar task is therefore common practice. This reuse is called transfer learning, because the knowledge acquired during the original training is transferred to another problem. As an analogy, imagine two people who want to learn to play the piano. The first has never played any instrument, so he or she has to start from scratch, which can be hard, since the piano is a difficult instrument. The second already plays the guitar, so his or her knowledge of music, rhythm, and chords can be reused to make learning the piano easier. Which of them will learn faster? Most likely the second, who can transfer what he or she already knows from playing the guitar.

  Thus, to reduce training time, it is common to reuse a trained model to solve a different task. In most cases transfer learning reaches good results, so it is important to know how it works and how best to use it.

  It is worth mentioning that transfer learning is not limited to reusing weights: the whole model can be reused, which can make it easier to apply deep models to new problems.




- **When to use transfer learning?**<br>

  This is a good question, and many people who use the technique cannot answer it, because they follow tutorials without learning the theory behind them.
As mentioned by <a href="http://cs231n.github.io/transfer-learning/">Fei-Fei Li</a>, transfer learning can occur in 4 different contexts, but here we summarize them in only 2 cases:

  1. When the new task is similar to the one the model was trained on and the dataset is large enough, plain transfer learning should be used. In this case, only the fully connected layers need to be trained/retrained. The CNN is used only as a feature extractor and is not trained.

  2. When the task is different from the original one or the dataset is small, fine-tuning is necessary. This technique consists of adjusting the CNN weights while the fully connected layers are being trained.

  Do not worry if you do not know how to use these techniques yet: this tutorial covers both transfer learning and fine-tuning. For a complete guide to these and other related topics, see the Stanford University course <a href="http://cs231n.github.io/">Convolutional Neural Networks for Visual Recognition</a>.
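
  The difference between the two cases can be sketched with a toy two-layer linear model in NumPy. This is only an illustration under simplifying assumptions: `W_base` stands in for the pretrained CNN, `W_head` for the new fully connected layer, and `train_step` is a hypothetical helper that is not part of this tutorial's code.

```python
import numpy as np

def train_step(x, y, W_base, W_head, lr=0.01, fine_tune=False):
    """One gradient-descent step on 0.5 * mean squared error.

    W_base plays the role of the pretrained CNN, W_head the new classifier.
    """
    features = x @ W_base              # output of the "CNN"
    pred = features @ W_head           # output of the "fully connected" layer
    err = (pred - y) / len(x)          # gradient of the loss w.r.t. pred

    grad_head = features.T @ err
    grad_base = x.T @ (err @ W_head.T)

    new_head = W_head - lr * grad_head
    # Transfer learning: the base stays frozen. Fine-tuning: it is updated too.
    new_base = W_base - lr * grad_base if fine_tune else W_base
    return new_base, new_head

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 2))
W_base = rng.normal(size=(4, 3))
W_head = rng.normal(size=(3, 2))

frozen_base, _ = train_step(x, y, W_base, W_head, fine_tune=False)
tuned_base, _ = train_step(x, y, W_base, W_head, fine_tune=True)

print(np.array_equal(frozen_base, W_base))  # the frozen base is left untouched
print(np.array_equal(tuned_base, W_base))   # the fine-tuned base has moved
```

  In both modes the head is trained; the only difference is whether the gradient is also applied to the base.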

 
- **What is a Bottleneck Approach?**

  In the first case, the CNN layers stay unchanged during the whole training. This wastes processing, because an image that passes through the frozen CNN many times always produces the same output. To speed up training, a technique called "bottleneck" is applied: the whole training dataset is passed through the CNN once and the results are stored on disk. In the following steps, the first fully connected layer receives the already processed data, which can save considerable time depending on the machine and the depth of the CNN. Tutorials rarely address this subject, but because of its importance for reducing training time, it is covered here.
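
  The caching idea can be sketched in a few lines. The example below is a simplified stand-in, not the implementation used later in this tutorial: `frozen_cnn` is a made-up deterministic function that plays the role of the frozen network, and `get_or_compute_features` and the cache file name are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np

def frozen_cnn(images):
    """Stand-in for the frozen CNN: any deterministic function of the input."""
    return images.reshape(len(images), -1).mean(axis=1, keepdims=True)

def get_or_compute_features(images, cache_path):
    """Return cached features if present; otherwise compute and store them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True       # True: loaded from the cache
    features = frozen_cnn(images)
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features, False                    # False: freshly computed

images = np.random.rand(16, 32, 32, 3)
cache_path = os.path.join(tempfile.mkdtemp(), "bottleneck_train.pkl")

first, cached_first = get_or_compute_features(images, cache_path)
second, cached_second = get_or_compute_features(images, cache_path)
print(cached_first, cached_second)
```

  The first call pays the cost of the forward pass; every later call only reads the file, which is what makes repeated epochs over a frozen network cheap.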


- **VGG16 Architecture**

 VGG16 is one of the top-performing models of the ImageNet challenge in 2014, achieving 92.7% top-5 test accuracy. The challenge consists of classifying 100,000 test images into 1,000 different classes, with more than 1.2 million images for training and 50,000 for validation.
 
 VGG16 is one of the most widely used models for transfer learning in image recognition problems. It has a traditional deep architecture built from convolutional and pooling layers followed by fully connected layers. Figure 1 shows a representation of it. Notice that it is composed of 16 weight layers: 13 convolutional and 3 fully connected, the last of which feeds a softmax classifier. There are 5 more layers corresponding to the max pooling operations, but these are not counted in the model name.

 <center>
 <img src="http://www.cs.toronto.edu/~frossard/post/vgg16/vgg16.png" width="480" height="240" align="center"/><br>
  <b> Figure 1: The VGG16 architecture</b></center> <br><br>
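
 The layer count and the size of the last pooling output can be checked with a little arithmetic. The sketch below encodes the standard VGG16 configuration as a list and assumes 3x3 convolutions with "same" padding (spatial size preserved) and 2x2 max pooling with stride 2 (spatial size halved); `count_layers` and `pool5_shape` are names chosen for this example.

```python
# Standard VGG16 configuration: numbers are conv output channels, "M" is max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # two hidden fully connected layers plus the 1,000-way output

def count_layers(cfg):
    """Count convolutional and pooling layers in the configuration."""
    convs = sum(1 for item in cfg if item != "M")
    pools = sum(1 for item in cfg if item == "M")
    return convs, pools

def pool5_shape(input_size, cfg):
    """Height, width and depth of the output of the last pooling layer."""
    size, channels = input_size, 3
    for item in cfg:
        if item == "M":
            size //= 2          # 2x2 max pooling halves the spatial size
        else:
            channels = item     # a convolution only changes the depth
    return size, size, channels

convs, pools = count_layers(VGG16_CFG)
print(convs + NUM_FC_LAYERS, pools)   # 16 5
print(pool5_shape(224, VGG16_CFG))    # (7, 7, 512)
print(pool5_shape(32, VGG16_CFG))     # (1, 1, 512)
```

 With a 32 x 32 input, the convolutional part therefore yields a single 512-dimensional vector per image.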
                                   
  





- **The CIFAR10 dataset**

  CIFAR10 is a dataset of 60,000 tiny images (50,000 for training and 10,000 for testing). Each image has a shape of $32 \times 32 \times 3$. The images are distributed equally among 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck), as shown in Figure 2.
 
<table>
    <tbody><tr>
        <td class="cifar-class-name">airplane</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">automobile</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/automobile10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">bird</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">cat</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/cat10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">deer</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/deer10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">dog</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/dog10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">frog</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/frog10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">horse</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/horse10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">ship</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/ship10.png" class="cifar-sample"></td>
    </tr>
    <tr>
        <td class="cifar-class-name">truck</td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck1.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck2.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck3.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck4.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck5.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck6.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck7.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck8.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck9.png" class="cifar-sample"></td>
        <td><img src="https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck10.png" class="cifar-sample"></td>
    </tr>
       
</tbody>

</table>
<center><b>Figure 2: A sample of CIFAR 10 dataset</b></center><br><br>
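
In the Python version of the dataset, each image is stored as a flat row of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, each channel in row-major order. The sketch below uses a synthetic row (not real CIFAR10 data) to show how such a row can be turned into a height x width x channels array.

```python
import numpy as np

# One synthetic "image" in CIFAR10 row layout: all red values, then green, then blue.
row = np.concatenate([
    np.full(1024, 10),   # red plane
    np.full(1024, 20),   # green plane
    np.full(1024, 30),   # blue plane
])
batch = row[np.newaxis, :]                 # shape (1, 3072)

images = batch.reshape(-1, 3, 32, 32)      # (N, channels, height, width)
images = images.transpose(0, 2, 3, 1)      # (N, height, width, channels)

print(images.shape)      # (1, 32, 32, 3)
print(images[0, 0, 0])   # [10 20 30]
```

Reshaping directly to `(-1, 32, 32, 3)` would run without error but would mix the colour planes, which is why the intermediate channels-first shape is needed.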


Notice that VGG16 was trained on input images of $224 \times 224 \times 3$, while CIFAR10 images are $32 \times 32 \times 3$. VGG16 was also trained to recognize 1,000 different classes, but CIFAR10 has only 10. The new dataset is therefore different from the original one, which suggests that fine-tuning is the best solution for this problem.
 
 Despite this guess, we will show how to do transfer learning without fine-tuning, and how to use the bottleneck when appropriate.


## Let's get started with the code
The code below performs a complete transfer learning task. It was written to make the subject easy to learn and to be easy to modify for other kinds of tasks.


### All the necessary imports
Note that this code was written to run on Google Colab, so using it outside this platform requires some adaptations: remove all the Google Colab dependencies, download the VGG16 model manually, and put it into the folder "./models". The model can be downloaded <a href="https://github.com/ry/tensorflow-vgg16/blob/master/vgg16-20160129.tfmodel.torrent">here</a>:

In [0]:
%matplotlib inline  
import pickle
import numpy as np
import os
from urllib.request import urlretrieve
import tarfile
import zipfile
import sys
import tensorflow as tf
from time import time
import skimage as sk
from skimage import transform
from skimage import util
import random
import math
import os.path
from random import shuffle
import logging
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
from google.colab import files
from itertools import product
!pip install googledrivedownloader
logging.getLogger("tensorflow").setLevel(logging.ERROR)
   

---
### Class that defines the main hyperparameters used by the model


In [0]:
class Hyperparameters:
  def __init__(self):
    self.image_size = 32
    self.image_channels = 3
    self.num_classes = 10
    self.initial_learning_rate = 1e-4
    self.decay_steps = 1e3
    self.decay_rate = 0.98
    self.cut_layer = "pool5"
    self.hidden_layers = [512]
    self.batch_size = 128
    self.num_epochs = 200
    self.check_points_path= "./tensorboard/cifar10_vgg16"
    self.keep = 1.0
    self.fine_tunning = False
    self.bottleneck = True
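
To get a feel for the three learning-rate hyperparameters, the sketch below evaluates a continuous exponential decay, `initial_learning_rate * decay_rate ** (step / decay_steps)`. This is the formula implemented by schedules such as `tf.train.exponential_decay`; that the model in this tutorial uses exactly this schedule is an assumption made here, and `decayed_learning_rate` is a hypothetical helper.

```python
def decayed_learning_rate(step, initial=1e-4, decay_rate=0.98, decay_steps=1e3):
    """Exponentially decayed learning rate at a given training step."""
    return initial * decay_rate ** (step / decay_steps)

# Learning rate at a few points during training.
for step in (0, 1000, 10000, 50000):
    print(step, decayed_learning_rate(step))
```

With these values the learning rate shrinks by 2% every 1,000 steps, so it decays slowly over a long run.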
  

### Class that provides some utilities for the model: it downloads files, loads the dataset, performs data augmentation, generates bottleneck files, and plots a confusion matrix from the model's predictions.

In [0]:
class utils:
      def get_or_generate_bottleneck( sess, model, file_name, dataset, labels, batch_size = 128):

          path_file = os.path.join("./data_set",file_name+".pkl")
          if(os.path.exists(path_file)):
                print("Loading bottleneck from \"{}\" ".format(path_file))
                with open(path_file, 'rb') as f:
                   return pickle.load(f)

          bottleneck_data = []
          original_labels = []

          print("Generating Bottleneck \"{}.pkl\" ".format(file_name) )
          # Size of one flattened bottleneck vector, used to feed a zero-filled bottleneck input.
          input_size = np.prod(model["bottleneck_tensor"].shape.as_list()[1:])
          # Walk through the dataset in consecutive batches; the last one may be smaller.
          # Iterating by start index avoids running an empty batch when the dataset
          # size is an exact multiple of batch_size.
          for start in range(0, len(labels), batch_size):
                data = dataset[start:start + batch_size]
                label = labels[start:start + batch_size]
                current_batch_size = len(label)
                tensor = sess.run(model["bottleneck_tensor"], feed_dict={model["images"]:data, model["bottleneck_input"]:np.zeros((current_batch_size,input_size)), model["labels"]:label,model["keep"]:1.0})
                for t in range(current_batch_size):
                  bottleneck_data.append(np.squeeze(tensor[t]))
                  original_labels.append(np.squeeze(label[t]))
          
          bottleneck = {
              "data":np.array(bottleneck_data),
              "labels":np.array(original_labels)
          } 
          
          with open(path_file, 'wb') as f:
            pickle.dump(bottleneck, f)


          print("Done")   

          return bottleneck



      def get_data_set(name="train"):
          x = None
          y = None
          folder_name = 'cifar_10'
          main_directory = "./data_set"
          url = "http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

          utils.maybe_download_and_extract(url, main_directory,folder_name, "cifar-10-batches-py")


          f = open(os.path.join(main_directory,folder_name,"batches.meta"), 'rb')
          f.close()

          if name == "train":
              for i in range(5):
                  f = open('./data_set/'+folder_name+'/data_batch_' + str(i + 1), 'rb')
                  datadict = pickle.load(f, encoding='latin1')
                  f.close()

                  _X = datadict["data"]
                  _Y = datadict['labels']

                  _X = np.array(_X, dtype=float) / 255.0
                  _X = _X.reshape([-1, 3, 32, 32])
                  _X = _X.transpose([0, 2, 3, 1])
#                   _X = _X.reshape(-1, 32*32*3)

                  if x is None:
                      x = _X
                      y = _Y
                  else:
                      x = np.concatenate((x, _X), axis=0)
                      y = np.concatenate((y, _Y), axis=0)

          elif name == "test":
              f = open('./data_set/'+folder_name+'/test_batch', 'rb')
              datadict = pickle.load(f, encoding='latin1')
              f.close()

              x = datadict["data"]
              y = np.array(datadict['labels'])

              x = np.array(x, dtype=float) / 255.0
              x = x.reshape([-1, 3, 32, 32])
              x = x.transpose([0, 2, 3, 1])
#               x = x.reshape(-1, 32*32*3)

          return x, utils._dense_to_one_hot(y)


      def _dense_to_one_hot( labels_dense, num_classes=10):
          num_labels = labels_dense.shape[0]
          index_offset = np.arange(num_labels) * num_classes
          labels_one_hot = np.zeros((num_labels, num_classes))
          labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1

          return labels_one_hot


      


      def maybe_download_and_extract( url, main_directory,filename, original_name):
          def _print_download_progress( count, block_size, total_size):
            pct_complete = float(count * block_size) / total_size
            msg = "\r --> progress: {0:.1%}".format(pct_complete)
            sys.stdout.write(msg)
            sys.stdout.flush()
          
          if not os.path.exists(main_directory):
              os.makedirs(main_directory)
              url_file_name = url.split('/')[-1]
              zip_file = os.path.join(main_directory,url_file_name)
              print("Downloading ",url_file_name)

              try:
                file_path, _ = urlretrieve(url=url, filename= zip_file, reporthook=_print_download_progress)
              except:
                os.system("rm -r "+main_directory)
                print("An error occurred while downloading: ",url)

                if(original_name == 'vgg16-20160129.tfmodel'):
                  print("This could be for a problem with github. We will try downloading from the Google Drive")
                  from google_drive_downloader import GoogleDriveDownloader as gdd

                  gdd.download_file_from_google_drive(file_id='1xJZDLu_TK_SyQz-SaetAL_VOFY7xdAt5',
                                                      dest_path='./models/vgg16-20160129.tfmodel',
                                                      unzip=False)
                else: print("This could be for a problem with the storage site. Try again later")
                return

              print("\nDownload finished.")
              if file_path.endswith(".zip"):
                  print( "Extracting files.")

                  zipfile.ZipFile(file=file_path, mode="r").extractall(main_directory)
              elif file_path.endswith((".tar.gz", ".tgz")):
                  print( "Extracting files.")
                  tarfile.open(name=file_path, mode="r:gz").extractall(main_directory)
                  os.remove(file_path)

              os.rename(os.path.join(main_directory,original_name), os.path.join(main_directory,filename))
              print("Done.")
     
      def data_augmentation(images, labels):
        
          def random_rotation(image_array):
              # pick a random rotation angle between -15 and 15 degrees
              random_degree = random.uniform(-15, 15)
              return sk.transform.rotate(image_array, random_degree)

          def random_noise(image_array):
              # add random noise to the image
              return sk.util.random_noise(image_array)

          def horizontal_flip(image_array):
              # horizontal flip doesn't need skimage, it's easy as flipping the image array of pixels !
              return image_array[:, ::-1]
          print("Augmenting data...")
          aug_images = []
          aug_labels = []

          aug_images.extend( list(map(random_rotation, images)) )
          aug_labels.extend(labels)
          aug_images.extend( list(map(random_noise,    images)) )
          aug_labels.extend(labels)
          aug_images.extend( list(map(horizontal_flip, images)) )
          aug_labels.extend(labels)


          return np.array(aug_images), np.array(aug_labels)
        
        
        
      
      def generate_confusion_matrix( predictions, class_names):
        
        def plot_confusion_matrix(cm, classes,
                                    normalize=False,
                                    title='Confusion matrix',
                                    cmap=plt.cm.Blues):
                """
                This function prints and plots the confusion matrix.
                Normalization can be applied by setting `normalize=True`.
                """
                if normalize:
                    cm = 100 * cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
                    print("Normalized confusion matrix")
                else:
                    print('Confusion matrix, without normalization')

                print(cm.shape)

                plt.imshow(cm, interpolation='nearest', cmap=cmap)
                plt.title(title)
                plt.colorbar()
                
                tick_marks = np.arange(len(classes))
               
          
                plt.xticks(tick_marks, classes, rotation=45)
                plt.yticks(tick_marks, classes)

                fmt = '.2f' if normalize else 'd'
                thresh = cm.max() / 2.
                symbol = "%" if normalize else ""
                for i, j in product(range(cm.shape[0]), range(cm.shape[1])):
                    plt.text(j, i, format(cm[i, j], fmt)+symbol,
                            horizontalalignment="center",
                            color="white" if cm[i, j] > thresh else "black")

                plt.tight_layout()
                plt.ylabel('Real')
                plt.xlabel('Predicted')
        # Compute confusion matrix
        cnf_matrix = confusion_matrix(predictions["labels"],predictions["classes"])
        np.set_printoptions(precision=2)
        

#         # Plot non-normalized confusion matrix
#         plt.figure(figsize=(10,7))
#         plot_confusion_matrix(cnf_matrix, classes=class_names,
#                             title='Confusion matrix, without normalization')
#         plt.grid('off')

        # # Plot normalized confusion matrix
        plt.figure(figsize=(10,7))
        plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True,
                            title='Normalized confusion matrix')
        plt.grid(False)

        #plt.savefig("./confusion_matrix.png") #Save the confusion matrix as a .png figure.
        plt.show()
#         
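
The `_dense_to_one_hot` helper above relies on flat indexing, which can be hard to read at first. The standalone sketch below reproduces the same idea on three labels; the function name `one_hot` is made up for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Flat-index one-hot encoding: row i starts at flat position i * num_classes."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes))
    row_starts = np.arange(len(labels)) * num_classes
    # Adding the label to the start of each row gives the flat index of the 1.
    encoded.flat[row_starts + labels] = 1
    return encoded

labels = [2, 0, 1]
encoded = one_hot(labels, 3)
print(encoded)
```

For these labels the result has a single 1 per row, in columns 2, 0 and 1 respectively. A commonly used equivalent is `np.eye(num_classes)[labels]`.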

### The function "get_vgg16" returns a pretrained VGG16 model.

Loading the model and restoring its weights is handled by TensorFlow. We just need to choose the layer at which to cut the network and pass it as a parameter to the function "get_vgg16". In transfer learning it is common to discard the fully connected layers and reuse only the convolutional ones. This is because the new problem/dataset is usually different from the original one (the one used to train the model), and the number of classes is often different as well.<br>

In a CNN, the first layers are responsible for detecting edges, the middle layers for detecting patterns built from combinations of those edges, and the last ones for composing patterns with a high level of representation, also known as semantic layers. Therefore, when the new dataset is very different from the original, using the last layers is not recommended, since they likely represent specific patterns that will not help with the new dataset. It is thus common to reuse the first layers in transfer learning or fine-tuning and to add new fully connected layers that are trained from scratch.

In [0]:


def get_vgg16(input_images, cut_layer = "pool5", scope_name = "vgg16", fine_tunning = False):  
 
  file_name = 'vgg16-20160129.tfmodel'
  main_directory = "./models/"
#   !rm -r ./models/
  vgg_path = os.path.join(main_directory,file_name)
  if not os.path.exists(vgg_path):
      vgg16_url = "https://media.githubusercontent.com/media/pavelgonchar/colornet/master/vgg/tensorflow-vgg16/vgg16-20160129.tfmodel"
      utils.maybe_download_and_extract(vgg16_url, main_directory, file_name, file_name)


  with open(vgg_path, mode='rb') as f:
      content = f.read()
      graph_def = tf.GraphDef()
      graph_def.ParseFromString(content)
      graph_def = tf.graph_util.extract_sub_graph(graph_def, ["images", cut_layer])
      tf.import_graph_def(graph_def, input_map={"images": input_images})
  del content

  graph = tf.get_default_graph()
  vgg_node = "import/{}:0".format(cut_layer) #It is possible to cut the graph at another node. 
                                             #To do so, list the names of all layers with the method 
                                             #"get_operations()": "print(graph.get_operations())" 

  
  # The "scope_name" prefix must match the tf.name_scope under which this function is called.
  vgg_trained_model = graph.get_tensor_by_name("{}/{}".format(scope_name, vgg_node) )
  
  if not fine_tunning:
    print("Stopping gradient")
    vgg_trained_model = tf.stop_gradient(vgg_trained_model) #Use it only for transfer learning without fine-tuning
  
  
#   print(graph.get_operations())
  return vgg_trained_model, graph


###Creating the model
The function "transfer_learning_model" is responsible for creating the model that will be used to recognize the CIFAR10 images.

The first scope ("placeholders_variables") defines:
* **input images** - the images that will feed the model.
* **labels** - each image fed to the input placeholder needs a corresponding label, which feeds this placeholder so that the loss can be calculated.
* **dropout_keep** - the probability of keeping each neuron active in the fully connected layers during training; the remaining neurons are dropped. The value fed must be greater than 0 and at most 1 (feeding 1.0, as the test function does, disables dropout).
* **global_step** - as the training process runs, this variable stores the value of the current step. This value can be used to save a checkpoint at a specific step and, when the checkpoint is restored, the whole model continues training from that point/step.
* **learning rate** - the learning rate used by the optimizer. It starts from an initial learning rate and decays by a specific rate every given number of steps. Because the decay depends on the global step, the schedule stays consistent whether or not the training is restarted.

These parameters directly influence the success of the training, so they are defined as hyperparameters of the model (class "Hyperparameters") and must be chosen carefully.
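
As an illustration of the learning rate schedule, the sketch below reproduces in plain Python the formula documented for "tf.train.exponential_decay" with "staircase=True". The initial learning rate 1e-4 is the one set at the end of this tutorial; the values chosen for "decay_steps" and "decay_rate" are assumptions made only for the example, since the defaults of the "Hyperparameters" class are not shown here.

```python
def staircase_exponential_decay(initial_learning_rate, global_step,
                                decay_steps, decay_rate):
    """Learning rate at `global_step` when staircase decay is enabled."""
    # The integer division makes the rate drop in discrete jumps,
    # once every `decay_steps` steps, instead of decaying continuously.
    return initial_learning_rate * decay_rate ** (global_step // decay_steps)


# Hypothetical schedule: halve the learning rate every 1000 steps.
for step in (0, 999, 1000, 2500):
    lr = staircase_exponential_decay(1e-4, step, decay_steps=1000, decay_rate=0.5)
    print("step {:>4}: learning rate {:.2e}".format(step, lr))
```

Since the current step is read from the global step, a training run restored from a checkpoint resumes with the learning rate it had when it stopped.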


In [0]:
def transfer_learning_model(params = None, fine_tunning = False, bottleneck = False):
   
    if params is None:
      params = Hyperparameters()
      
    with tf.name_scope('placeholders_variables'):
        input_images = tf.placeholder(tf.float32, shape=[None,params.image_size, params.image_size, params.image_channels], name='input')
        labels = tf.placeholder(tf.float32, shape=[None, params.num_classes], name='labels')
#         reshaped_images = tf.reshape(input_images, [-1, params.image_size, params.image_size, params.image_channels], name='images')
        dropout_keep  =  tf.placeholder(tf.float32, name='dropout_keep')
        global_step = tf.train.get_or_create_global_step()
        learning_rate = tf.train.exponential_decay(params.initial_learning_rate, global_step, 
                                               params.decay_steps,params.decay_rate, staircase=True)       
    

    with tf.name_scope('vgg16'):
     # Create a VGG16 model and reuse its weights.
      vgg16_out,_ = get_vgg16(input_images=input_images,cut_layer = params.cut_layer, fine_tunning = fine_tunning)
      
    with tf.name_scope("flatten"):
      flatten = tf.layers.flatten(vgg16_out, name="flatten")
    
    if (not fine_tunning) and bottleneck:
        out_list = flatten.shape.as_list()
        BOTTLENECK_TENSOR_SIZE = np.prod(out_list[1:]) # Flattened feature size (every dimension except the batch one)
        with tf.name_scope('bottleneck'):
            bottleneck_tensor = flatten
            bottleneck_input = tf.placeholder(tf.float32,
            shape=[None, BOTTLENECK_TENSOR_SIZE],
            name='InputPlaceholder')

        with tf.name_scope('fully_conn'):
          # Fully connected model fed by the cached bottleneck features; dropout uses the "dropout_keep" placeholder
          logits = fc_model(bottleneck_input, params.hidden_layers, dropout_keep)
    
    else:
      with tf.name_scope('fully_conn'):
          # Fully connected model fed directly by the VGG16 output; dropout uses the "dropout_keep" placeholder
          logits = fc_model(flatten, params.hidden_layers, dropout_keep)

        

    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))
#         loss = regularize(loss)
        tf.summary.scalar("loss", loss)


    with tf.name_scope('sgd'):
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

    with tf.name_scope('train_accuracy'):
        acc = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
        acc = tf.reduce_mean(tf.cast(acc, tf.float32))
        tf.summary.scalar("accuracy", acc)
   
    
    predictions = {
                   "classes": tf.argmax(logits, 1),
                   "probs" :  tf.nn.softmax(logits), 
                   "labels": tf.argmax(labels, 1)
                   }
    model = {
              "global_step": global_step,
              "images": input_images,
              "labels": labels,    
              "loss" : loss,
              "optimizer": optimizer,
              "accuracy": acc,
              "predictions":predictions,
              "keep": dropout_keep
          }

    
    if (not fine_tunning) and bottleneck:
        model.update({"bottleneck_tensor":bottleneck_tensor})
        model.update({"bottleneck_input":bottleneck_input})
     
 
        
    return model
        
def get_fc_weights(w_inputs, w_output, id=0):
        weight= tf.Variable(tf.truncated_normal([w_inputs, w_output]), name="{}/weight".format(id))
        bias =  tf.Variable(tf.truncated_normal([w_output]), name="{}/bias".format(id))
        return weight, bias  

def logits_layer(fc_layer, n_classes):
        out_shape = fc_layer.shape.as_list()
        w, b = get_fc_weights(np.prod(out_shape[1:]), n_classes, "logits/weight")
        logits = tf.add(tf.matmul(fc_layer, w), b, name="logits")
        return logits
      
def fc_layer(input_layer, number_of_units, keep = None, layer_id = "fc"):
        # Fully connected layer: matmul + bias, optional dropout, then ReLU.
        pl_list = input_layer.shape.as_list()
        input_size = np.prod(pl_list[1:])
        
        w, b = get_fc_weights(input_size, number_of_units, layer_id)  
        layer = tf.matmul(input_layer, w, name="{}/matmul".format(layer_id))
        layer = tf.nn.bias_add(layer, b, name="{}/bias-add".format(layer_id))
       
        if keep is not None:
          layer = tf.nn.dropout(layer, keep, name="{}/dropout".format(layer_id))
        else:
          print("Dropout was disabled.")
        
        layer = tf.nn.relu(layer, name="{}/relu".format(layer_id))
        return layer
      
def regularize(loss, type = 1, scale = 0.005, scope = None):
        if type == 1:
            regularizer = tf.contrib.layers.l1_regularizer( scale=scale, scope=scope)
        else:
            regularizer = tf.contrib.layers.l2_regularizer( scale=scale, scope=scope)
                
        weights = tf.trainable_variables() # all vars of your graph
        regularization_penalty = tf.contrib.layers.apply_regularization(regularizer, weights)
        regularized_loss = loss + regularization_penalty
        return regularized_loss

def fc_model(flatten, hidden_layers = [512], keep = None, num_classes = 10):
        # num_classes defaults to the 10 classes of CIFAR10; pass another value for other datasets.
        fc = flatten
        id = 1
        for num_neurons in hidden_layers:
          fc = fc_layer(fc, num_neurons, keep,  "fc{}".format(id) )
          id = id+1
          
        logits = logits_layer(fc, num_classes)
        return logits
    

### Creating a session
The function "create_monitored_session" creates a TensorFlow session that is able to restore and/or save weights. The parameter "checkpoint_dir" is the directory where the weights were saved or where they should be saved. The whole save/restore process is handled automatically by TensorFlow.

By default, TensorFlow reserves almost all of the GPU memory up front. Setting "gpu_options.allow_growth" to "True" in "tf.ConfigProto()" makes the allocation grow gradually instead. In other words, the GPU memory is allocated on demand. This is especially important when more than one training or prediction process is running on the same GPU.


In [0]:
def create_monitored_session(model,iter_per_epoch, checkpoint_dir):
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True


  sess = tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                      save_checkpoint_secs=120,
                                      log_step_count_steps=iter_per_epoch,
                                      save_summaries_steps=iter_per_epoch,
                                      config=config) 
  return sess

### Testing the model
The function "test" runs the test dataset through the trained model, which makes it possible to monitor the progress of the model. This function could be changed to perform validation, using a validation dataset rather than the test dataset. This would be helpful for problems that do not release a labeled test dataset.
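
One possible way to obtain a validation dataset is to hold out part of the training examples. The sketch below only splits the indices; the helper "split_train_validation" and its default values are hypothetical and are not part of the tutorial code.

```python
import random


def split_train_validation(num_examples, validation_fraction=0.1, seed=0):
    """Return (train_indices, validation_indices) as two disjoint lists."""
    indices = list(range(num_examples))
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(indices)
    num_validation = int(round(num_examples * validation_fraction))
    return indices[num_validation:], indices[:num_validation]


train_idx, val_idx = split_train_validation(50000)
print(len(train_idx), len(val_idx))
```

Assuming the data and labels are NumPy arrays, as in the rest of this tutorial, "train_data[val_idx]" and "train_labels[val_idx]" could then be passed to "test" in place of the test dataset.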

In [0]:
def test(sess, model,input_data_placeholder, data, labels, batch_size = 128):
            global_accuracy = 0
            predictions = {
                           "classes":[],
                           "probs":[],
                           "labels":[]
                          }
           
            # Walk through the data in consecutive batches; the last batch may be smaller.
            # Stepping by batch_size avoids an empty batch when len(data) is a multiple of it.
            for begin in range(0, len(data), batch_size):
                end = min(begin + batch_size, len(data))
                
                batch_xs = data[begin:end]
                batch_ys = labels[begin:end]
                
                pred = sess.run(model["predictions"],
                    feed_dict={input_data_placeholder: batch_xs, model["labels"]: batch_ys, model["keep"]:1.0})
                
                predictions["classes"].extend(pred["classes"])
                predictions["probs"].extend(pred["probs"])
                predictions["labels"].extend(pred["labels"])
            
            
            correct = list (map(lambda x,y: 1 if x==y else 0, predictions["labels"] , predictions["classes"]))
            acc = np.mean(correct ) *100
            
            mes = "--> Test accuracy: {:.2f}% ({}/{})"
            print(mes.format( acc, sum(correct), len(data)))
            
            return predictions
                      

###Training the model: the main function
The "train" function is responsible for training the model. It starts by checking the hyperparameters and resetting the default graph. Next, the dataset is loaded by using the "utils" helpers. The next step is to create the model, which builds the TensorFlow graph. Then a monitored session is created. This kind of session saves and restores the model automatically, which is very important when an unexpected event stops the training (such as a power outage or Google Colab ending the session during the training).

With the model and the session created, you can optionally generate or load the bottleneck files, which is what the next lines do. One of the most important results of these lines is the tensor "input_data_placeholder". It matters because, when the bottleneck option is chosen, the "feed_dict" must feed the placeholder of the "bottleneck" rather than the one that feeds the input of the VGG16. Thus, if the bottleneck is chosen, the input placeholder will be model["bottleneck_input"]; otherwise, it will be the input tensor of the VGG16, model["images"].
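
The idea behind the bottleneck files is to compute the output of the frozen VGG16 layers only once and reuse it in every epoch. The sketch below illustrates this caching pattern with the standard library. It is not the implementation of "utils.get_or_generate_bottleneck", which is not shown in this section; the helper "get_or_compute_features" and the dummy extractor are hypothetical names used only for the illustration.

```python
import os
import pickle
import tempfile


def get_or_compute_features(cache_path, compute_fn):
    """Load cached features from `cache_path`, or compute and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = compute_fn()  # the expensive pass through the frozen layers
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


calls = []


def slow_feature_extractor():
    # Stand-in for running the images through the pretrained network.
    calls.append(1)
    return {"data": [[0.1, 0.2], [0.3, 0.4]], "labels": [0, 1]}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bottleneck_demo.pkl")
    first = get_or_compute_features(path, slow_feature_extractor)
    second = get_or_compute_features(path, slow_feature_extractor)

# The extractor runs only on the first call; the second call reads the file.
print(len(calls), first == second)
```

This only works when the reused layers are frozen: with fine-tuning their outputs change at every step, so cached features would become stale, which is consistent with the bottleneck option being used only when "fine_tunning" is False.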

At the beginning of each epoch, a list containing the dataset indices is shuffled in order to ensure the randomness of the batches. Then, at every step, a new range of indices is taken from this list and the corresponding batch feeds the placeholder.

The session can then run the optimizer to train the model. Finally, the last step is to call the test function to check the result of the training for the current epoch.
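
The batching scheme described above can be sketched in isolation as follows. This is a minimal illustration with a hypothetical generator "shuffled_batches"; the training function of this tutorial slices the shuffled list inline instead.

```python
import random


def shuffled_batches(num_examples, batch_size, rng=random):
    """Yield lists of indices that cover the dataset once, in random order."""
    indices = list(range(num_examples))
    rng.shuffle(indices)  # a new random order for each epoch
    for start in range(0, num_examples, batch_size):
        # Slicing past the end is safe, so the last batch may be smaller.
        yield indices[start:start + batch_size]


# One "epoch" over 10 examples with batches of 4 indices.
for batch in shuffled_batches(10, 4):
    print(batch)
```

Each index appears exactly once per epoch, so every example is used once before the list is reshuffled.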


In [0]:
def train(params = None):
    if params is None:
      params = Hyperparameters()
      
    tf.reset_default_graph()

     
    train_data, train_labels = utils.get_data_set("train")
    train_data, train_labels = utils.data_augmentation(train_data, train_labels)
    
    test_data, test_labels = utils.get_data_set("test")  
    
    
    model = transfer_learning_model(params, params.fine_tunning, params.bottleneck)
   
    steps_per_epoch = int(math.ceil(len(train_data) /  params.batch_size))
    sess = create_monitored_session(model,steps_per_epoch, params.check_points_path)
    
    
    if (not params.fine_tunning) and params.bottleneck:
        indices = list( range(len(train_data)) )
        shuffle(indices)
        
        shuffled_data = train_data[indices]
        shuffled_labels = train_labels[indices]
        
        bottleneck_train = utils.get_or_generate_bottleneck(sess, model, "bottleneck_vgg16_{}_train".format(params.cut_layer), shuffled_data, shuffled_labels)
        bottleneck_test = utils.get_or_generate_bottleneck(sess, model, "bottleneck_vgg16_{}_test".format(params.cut_layer), test_data, test_labels)
        
        
        train_data, train_labels  = bottleneck_train["data"], bottleneck_train["labels"]
        test_data, test_labels = bottleneck_test["data"], bottleneck_test["labels"]
        del bottleneck_train, bottleneck_test
        
        
        input_data_placeholder = model["bottleneck_input"]
        
        
    else:
        input_data_placeholder = model["images"]
        
        
    
    indices = list( range(len(train_data)) )
    msg = "--> Global step: {:>5} - Last batch acc: {:.2f}% - Batch_loss: {:.4f} - ({:.2f}, {:.2f}) (steps,images)/sec"
    
    for epoch in range(params.num_epochs):
        start_time = time()
        
        print("\n*************************************************************")
        print("Epoch {}/{}".format(epoch+1,params.num_epochs))
        
        shuffle(indices)  
        for s in range(steps_per_epoch):
          
            indices_next_batch = indices[s *  params.batch_size : (s+1) * params.batch_size]
            batch_data = train_data[indices_next_batch]
            batch_labels = train_labels[indices_next_batch]
            
            _, batch_loss, batch_acc,step = sess.run(
                [model["optimizer"], model["loss"], model["accuracy"], model["global_step"],],
                feed_dict={input_data_placeholder: batch_data, model["labels"]: batch_labels, model["keep"]:params.keep})
        
        duration = time() - start_time

        print(msg.format(step,  batch_acc*100, batch_loss, (steps_per_epoch / duration), (steps_per_epoch*params.batch_size / duration) ))

        
        _ = test(sess, model, input_data_placeholder, test_data, test_labels )
    
    predictions = test(sess, model, input_data_placeholder, test_data, test_labels )

    sess.close()
    
    class_names = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"] 
    utils.generate_confusion_matrix(predictions, class_names)

This part of the code instantiates the "Hyperparameters" class, changes some of its values and passes it as a parameter to the train function, which starts the training.

In [0]:
if __name__ == "__main__":
  params = Hyperparameters()
  params.num_epochs = 200
  params.hidden_layers = [512]
  params.initial_learning_rate = 1e-4
  params.cut_layer = "pool3"

  train(params)
