# Modelo de CNN para CIFAR10
Um modelo de **Convolutional Neural Network** construído no *TensorFlow* treinado para classificação de imagens usando a base de dados **CIFAR10**. 

## Carregando Bibliotecas e dados

In [1]:
import tensorflow as tf
import numpy as np
import math
import timeit
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)


## Definindo a arquitetura
A arquitetura escolhida para o modelo foi:

* **Convolutional Neural Network** - 32 layers, 3x3, 1 stride, pad "SAME"
* **BatchNorm** - Spatial BN
* **Convolutional Neural Network** - 64 layers, 3x3, 1 stride, pad "SAME"
* **BatchNorm** - Spatial BN
* **2x2 MaxPool** - 2 stride, 2 ksize
* **Convolutional Neural Network** - 128 layers, 3x3, 1 stride, pad "SAME"
* **BatchNorm** - Spatial BN
* **FullyConnected** - FC Layer, 4096 neurons
* **BatchNorm** - Vanilla BN
* **Dropout** - Dropout com keep_prob: 50%
* **FullyConnectedLayer** - FC Layer, 10 neurons
* **Softmax** - Cálculo das probabilidades

In [3]:
tf.reset_default_graph() 

# Construindo o modelo
def model(X, y, kprob,is_training):
    
    with tf.name_scope("ConvLayer1"):
        W_conv1 = tf.get_variable("W_conv1", shape=[3, 3, 3, 32])
        b_conv1 = tf.get_variable("b_conv1", shape=[32])
        
        a1 = tf.nn.conv2d(X, W_conv1, strides=[1, 1, 1, 1], padding="SAME") + b_conv1
        h_conv1 = tf.nn.relu(a1)
        
        tf.summary.histogram("weights", W_conv1)
        tf.summary.histogram("biases", b_conv1)
        tf.summary.histogram("activations", h_conv1)
        
    with tf.name_scope("BatchNorm1"):
        h_bn1 = tf.layers.batch_normalization(h_conv1, axis=1, training=is_training)
        tf.summary.histogram("normalized", h_bn1)
    
    with tf.name_scope("ConvLayer2"):
        W_conv2 = tf.get_variable("W_conv2", shape=[3, 3, 32, 64])
        b_conv2 = tf.get_variable("b_conv2", shape=[64])
        
        a2 = tf.nn.conv2d(h_bn1, W_conv2, strides=[1, 1, 1, 1], padding="SAME") + b_conv2
        h_conv2 = tf.nn.relu(a2)
        
        tf.summary.histogram("weights", W_conv2)
        tf.summary.histogram("biases", b_conv2)
        tf.summary.histogram("activations", h_conv2)
        
    with tf.name_scope("BatchNorm2"):
        h_bn2 = tf.layers.batch_normalization(h_conv2, axis=1, training=is_training)
        tf.summary.histogram("normalized", h_bn2)
        
    with tf.name_scope("max-pool"):
        h_pool = tf.nn.max_pool(h_bn2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
        
    with tf.name_scope("ConvLayer3"):
        W_conv3 = tf.get_variable("W_conv3", shape=[3, 3, 64, 128])
        b_conv3 = tf.get_variable("b_conv3", shape=[128])
        
        a3 = tf.nn.conv2d(h_pool, W_conv3, strides=[1, 1, 1, 1], padding="SAME") + b_conv3
        h_conv3 = tf.nn.relu(a3)
        
        tf.summary.histogram("weights", W_conv3)
        tf.summary.histogram("biases", b_conv3)
        tf.summary.histogram("activations", h_conv3)
        
    with tf.name_scope("BatchNorm3"):
        h_bn3 = tf.layers.batch_normalization(h_conv3, axis=1, training=is_training)
        tf.summary.histogram("normalized", h_bn3)
        
    with tf.name_scope("FullyConnected"):
        h_reshaped = tf.reshape(h_bn3, [-1, 16*16*128])
        
        W1 = tf.get_variable("W1", shape=[16*16*128, 4096])
        b1 = tf.get_variable("b1", shape=[4096])
        
        a4 = tf.matmul(h_reshaped, W1) + b1
        h_fc1 = tf.nn.relu(a4)
        
        tf.summary.histogram("weights", W1)
        tf.summary.histogram("biases", b1)
        tf.summary.histogram("activations", h_fc1)
    
    with tf.name_scope("VanillaBatchNorm"):
        h_bn4 = tf.layers.batch_normalization(h_fc1, axis=1, training=is_training)
        tf.summary.histogram("normalized", h_bn4)
        
    with tf.name_scope("Dropout"):
        h_drop = tf.nn.dropout(h_bn4, keep_prob=kprob)
    
    with tf.name_scope("FullyConnectedOut"):
        W2 = tf.get_variable("W2", shape=[4096, 10])
        b2 = tf.get_variable("b2", shape=[10])
        
        scores = tf.matmul(h_drop, W2) + b2
        
        tf.summary.histogram("weights", W2)
        tf.summary.histogram("biases", b2)
        tf.summary.histogram("scores", scores)
        
    return scores

X = tf.placeholder(tf.float32, [None, 32, 32, 3], name="X-input")
y = tf.placeholder(tf.int64, [None], name="y-input")
is_training = tf.placeholder(tf.bool)
kprob = tf.placeholder(tf.float32)

# Decay Learning rate com step, utilizando os intervalos de iterações 7000 e 12500
global_step = tf.Variable(0, trainable=False, name="global_step")
initial_lr = tf.placeholder(tf.float32)
values = [initial_lr, initial_lr*0.5, initial_lr*0.25]
boundaries = [7000, 12500]
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)

y_out = model(X, y, kprob, is_training)
with tf.name_scope("cross-entropy"):
    mean_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf.one_hot(y, 10), logits=y_out))
    tf.summary.scalar("cross-entropy", mean_loss)

with tf.name_scope("train"):    
    optimizer = tf.train.AdamOptimizer(learning_rate)

    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(extra_update_ops):
        train_step = optimizer.minimize(mean_loss)

Verificando se **y_out** tem o shape esperado

In [4]:
x = np.random.randn(64, 32, 32,3)
with tf.Session() as sess:
    with tf.device("/cpu:0"): #"/cpu:0" or "/gpu:0"
        tf.global_variables_initializer().run()

        ans = sess.run(y_out,feed_dict={X:x,is_training:True,kprob:0.5,initial_lr: 5e-4})
        %timeit sess.run(y_out,feed_dict={X:x,is_training:True,kprob:0.5,initial_lr:5e-4})
        print(ans.shape)
        print(np.array_equal(ans.shape, np.array([64, 10])))

1 loops, best of 3: 779 ms per loop
(64, 10)
True


## Definindo hiper-parâmetros
Definição de hiper-parâmetros, primeiro a maneira *coarse*, ou seja, bruta, testando um maior *range* nos valores a serem otimizados. 

Os hiper-parâmetros testados serão o **learning rate** inicial e a **probabilidade do dropout**. 

### Coarse 

In [42]:
batch_size = 64

for i in range(10):
    # Definindo os parâmetros
    learning_rate = 10**np.random.uniform(-6, -3)
    keep_p = np.random.uniform(0.25, 0.85)
    
    with tf.Session() as sess:
        # Iniciando as variáveis
        sess.run(tf.global_variables_initializer())
        
        # Fazendo a predição
        correct_prediction = tf.equal(tf.argmax(y_out,1), y)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        
        # Embaralhando os índices
        train_indices = np.arange(X_train.shape[0])
        np.random.shuffle(train_indices)
        
        # Sempre estará treinando
        training_now = True
        
        # Configurando as variáveis que desejamos avaliar
        variables = c
        
        for k in range(401):
            # Criando os batches
            start_idx = (k * batch_size) % X_train.shape[0]
            idx = train_indices[start_idx:start_idx+batch_size]

            # Criando o dicionário a ser passado
            feed_dict = {X: X_train[idx, :],
                        y: y_train[idx],
                        is_training: training_now,
                        kprob: keep_p,
                        initial_lr: learning_rate}

            # Chamando o método
            train_loss, train_acc, corr, _ = sess.run(variables, feed_dict=feed_dict)
            
            if (k % 100 == 0):
                print("%d" % k, end=" - ")
        # Verificando o validation e training acc
        val_dict = {X: X_val,
                   y: y_val,
                   is_training: False,
                   kprob: 1.0,
                   initial_lr: learning_rate}
        
        val_loss, val_acc, corr, _ = sess.run(variables, feed_dict=val_dict)
        
        print("\nIteration %d:\n\tLearning Rate: %.2e - Dropout Parameter: %.2f" % (i, learning_rate, keep_p))
        print("\tTraining Accuracy: %.2f%%" % (100*train_acc))
        print("\tValidation Accuracy: %.2f%%" % (100*val_acc))

0 - 100 - 200 - 300 - 400 - 
Iteration 0:
	Learning Rate: 1.09e-04 - Dropout Parameter: 0.45
	Training Accuracy: 48.44%
	Validation Accuracy: 52.70%
0 - 100 - 200 - 300 - 400 - 
Iteration 1:
	Learning Rate: 1.97e-04 - Dropout Parameter: 0.76
	Training Accuracy: 51.56%
	Validation Accuracy: 51.80%
0 - 100 - 200 - 300 - 400 - 
Iteration 2:
	Learning Rate: 1.47e-06 - Dropout Parameter: 0.57
	Training Accuracy: 40.62%
	Validation Accuracy: 41.70%
0 - 100 - 200 - 300 - 400 - 
Iteration 3:
	Learning Rate: 3.12e-05 - Dropout Parameter: 0.56
	Training Accuracy: 53.12%
	Validation Accuracy: 56.50%
0 - 100 - 200 - 300 - 400 - 
Iteration 4:
	Learning Rate: 1.64e-04 - Dropout Parameter: 0.54
	Training Accuracy: 35.94%
	Validation Accuracy: 56.40%
0 - 100 - 200 - 300 - 400 - 
Iteration 5:
	Learning Rate: 7.36e-04 - Dropout Parameter: 0.72
	Training Accuracy: 48.44%
	Validation Accuracy: 48.60%
0 - 100 - 200 - 300 - 400 - 
Iteration 6:
	Learning Rate: 1.01e-04 - Dropout Parameter: 0.82
	Training Acc

### Fine
Aplicando 1 epoch em cada um dos ranges que tiveram melhor desempenho na **coarse**.

* **Learning Rate**: -3 a -5
* **Dropout Parameter**: 0.45 a 0.7

In [7]:
batch_size = 64
epochs = 2
for i in range(10): 
    ## Definindo os parâmetros
    learning_rate = 10**np.random.uniform(-3, -5)
    dropout_param = np.random.uniform(0.45, 0.7)
    
    with tf.Session() as sess:
        # Iniciando as variáveis
        sess.run(tf.global_variables_initializer())
        
        # Fazendo a predição
        correct_prediction = tf.equal(tf.argmax(y_out, 1), y)
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, np.float32))
        
        # Embaralhando os índices
        train_indices = np.arange(X_train.shape[0])
        np.random.shuffle(train_indices)
        
        # Sempre estará treinando
        training = True
        
        # Configurando as variáveis que desejamos avaliar
        variables = [mean_loss, accuracy, correct_prediction, train_step]
        
        for e in range(epochs):
            for k in range(int(math.ceil(X_train.shape[0]/batch_size))):
                # Criando os batches
                start_idx = (k * batch_size) % X_train.shape[0]
                idx = train_indices[start_idx:start_idx+batch_size]
                
                # Criando o dicionário para alimentar o modelo
                feed_dict = {X: X_train[idx, :],
                            y: y_train[idx],
                            is_training: training,
                            kprob: dropout_param,
                            initial_lr: learning_rate }
                
                # Chamando o método
                train_loss, train_acc, corr, _ = sess.run(variables, feed_dict=feed_dict)
                
                if (k % 100 == 0):
                    print("%d" % k, end=" - ")
                    
        # Verificando o validation acc
        val_dict = { X: X_val,
                    y: y_val,
                    is_training: False,
                    kprob: 1.0,
                    initial_lr: learning_rate }
        val_loss, val_acc, corr, _ = sess.run(variables, feed_dict=val_dict)
        
        print("\nIteration %d:\n\tLearning Rate: %.2e - Dropout Parameter: %.2f" % (i, learning_rate, dropout_param))
        print("\tTraining Accuracy: %.2f%%" % (100*train_acc))
        print("\tValidation Accuracy: %.2f%%" % (100*val_acc))

0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 
Iteration 0:
	Learning Rate: 8.20e-05 - Dropout Parameter: 0.56
	Training Accuracy: 75.00%
	Validation Accuracy: 66.00%
0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 
Iteration 1:
	Learning Rate: 5.43e-05 - Dropout Parameter: 0.61
	Training Accuracy: 60.00%
	Validation Accuracy: 59.70%
0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 
Iteration 2:
	Learning Rate: 2.88e-04 - Dropout Parameter: 0.64
	Training Accuracy: 67.50%
	Validation Accuracy: 66.90%
0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 
Iteration 3:
	Learning Rate: 8.61e-04 - Dropout Parameter: 0.62
	Training Accuracy: 62.50%
	Validation Accuracy: 59.80%
0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 
Iteration 4:
	Learning Rate: 2.47e-05 - Dropout Paramet

### Modelo final

O modelo final pode ser definido a partir da busca por hiperparâmetros (*coarse to fine*). Pelas visão da busca *fine*, temos que o **learning rate** de 2.88e-4 e o **dropout parameter** de 0.64, trouxeram um desempenho no conjunto de validação de 66.90% e no conjunto de testes de 67.50%, o que mostra um bom desempenho (o segundo melhor entre os 10 testados no conjunto de validação), e uma aproximação grande entre desempenho de testes e de validação, o que mostra que o modelo pode ter sua capacidade aumentada antes de atingir *overfitting*. 

O melhor desempenho foi atingido pelo **learning rate** de 9.6e-5 e um **dropout parameter** de 0.46, que teve um desempenho de 67.80% no conjunto de validação e 72.50% no conjunto de testes. O *gap* entre os dois continua pequeno, mostrando que o conjunto ainda pode ter sua capacidade aumentado antes de ating *overfitting*. 

Tendo esses dois modelos em mãos, vamos fazer o teste do segundo modelo. Como temos um **dropout parameter** razoavelmente alto (próximo de 0.5), podemos aumentar o número de *epochs*. Também será feito as devidas modificações ao fim de fazer a análise do modelo utilizando a ferramenta **Tensor Board** disponibilizada pelo framework *TensorFlow*.

* **Learning Rate**: 3e-4
* **Dropout Parameter**: 0.6
* **Epochs**: 20
* **Batch Size**: 64

In [None]:
lr = 3e-4
drop_param = 0.6
epochs = 20
batch_size = 64

with tf.Session() as sess1:
    # TensorBoard
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter('/tmp/tensorflow/cifar10/3')
    train_writer.add_graph(sess1.graph)
    
    # Inicializando todas as variáveis
    sess1.run(tf.global_variables_initializer())
    
    with tf.name_scope("accuracy"):
        with tf.name_scope("correct-prediction"):
            correct_prediction = tf.equal(tf.argmax(y_out, 1), y)
        with tf.name_scope("accuracy"):
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, np.float32))
    tf.summary.scalar("accuracy", accuracy)

    # Embaralhando os índices
    train_idx = np.arange(X_train.shape[0])
    np.random.shuffle(train_idx)

    cnt = 0

    for e in range(epochs):
        print("Epoch %d" % (e+1))
        # Calculando quantas iterações existem em uma época
        it_for_epoch = int(math.ceil(X_train.shape[0]/batch_size))
        # Rodando por uma época
        for i in range(it_for_epoch):
            # Gerando índices para o minibatch
            start_idx = (i*batch_size) % X_train.shape[0]
            idx = train_idx[start_idx:start_idx+batch_size]

            # Gerando o feed_dict
            feed_dict = {X: X_train[idx, :],
                         y: y_train[idx],
                         is_training: True,
                         kprob: drop_param,
                         initial_lr: lr }

            # Tamanho real do batch size
            actual_batch_size = y_train[idx].shape[0]

            # Calculando
            if (i % 100 == 0):
                summary_train, train_acc, _ = sess1.run([merged, accuracy, train_step], feed_dict=feed_dict)
                train_writer.add_summary(summary_train, i)
                
                val_dict = {X: X_val,
                            y: y_val,
                            is_training:False,
                            kprob: 1.0,
                            initial_lr: 0 }
                summary_val, val_acc = sess1.run([merged, accuracy], feed_dict = val_dict)
                train_writer.add_summary(summary_val, i)
                print("\tStep %d - Train Acc: %.2f%% -- Val Acc: %.2f%%" % (i, 100*train_acc, 100*val_acc))
            else:
                _ = sess1.run(train_step, feed_dict=feed_dict)
            
        
    train_writer.close()