# Residual Neural Network (ResNet)

## Authors

1. Alvaro Mauricio Montenegro Díaz, ammontenegrod@unal.edu.co
2. Daniel Mauricio Montenegro Reyes, dextronomo@gmail.com 


## References

1. [Keras documentation](https://keras.io/getting-started/sequential-model-guide/)
2. [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf)
3. ResNet v1: [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf)
4. ResNet v2: [Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf)
5. [Convolutional Neural Networks at Constrained Time Cost](https://arxiv.org/pdf/1412.1710.pdf)

# Introduction

The problem with deep neural networks is that, as the model gets deeper, the gradient tends to vanish (approach zero).

Several solutions have been proposed to remedy this. This lesson introduces residual networks, which have been a very successful type of deep network.

The central idea is that some inner layers are connected directly to deeper layers through shortcut connections.

# Batch normalization

To speed up network training and to help avoid vanishing gradients, Sergey Ioffe and Christian Szegedy, in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf), introduced the technique of normalizing the activations $x$ of each layer in a network over mini-batches.

Before that, the ReLU activation function had become popular as a way to prevent the vanishing gradient caused in large part by the use of the sigmoid and the hyperbolic tangent.
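The effect of the activation choice on the gradient can be illustrated numerically. The following is a minimal sketch, not part of the original notebook: it looks only at the derivative of the activation (it ignores the weights, which also enter the chain rule), and the helper names `sigmoid_grad` and `relu_grad` as well as the depth of 20 are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

z = np.linspace(-5.0, 5.0, 1001)
max_sigmoid_grad = sigmoid_grad(z).max()   # the maximum is reached at z = 0

depth = 20  # hypothetical number of stacked layers
# Best case for the sigmoid: every layer contributes its largest derivative.
sigmoid_bound = max_sigmoid_grad ** depth
# For ReLU, an active unit contributes a factor of exactly 1 per layer.
relu_factor = relu_grad(np.array([1.0]))[0] ** depth

print(max_sigmoid_grad, sigmoid_bound, relu_factor)
```

Even in the most favorable case, the product of sigmoid derivatives across 20 layers is far below one, while active ReLU units leave the factor unchanged.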

According to the authors, the fact that the distribution of the inputs to each layer changes at every step of the algorithm makes convergence harder and slower. The remedy they propose, and show to work, is a statistical normalization.

The normalization is computed over the training mini-batches that enter each gradient update step.


Consider a mini-batch $\mathcal{B}$ of size $m$. Since the normalization is applied to each activation independently, we focus on a particular activation $x^{(k)}$ and omit $k$ for clarity.


There are $m$ values of this activation in the mini-batch, $\mathcal{B} = \{x_1,\ldots,x_m\}$. Denote the normalized values by $\hat{x}_1,\ldots,\hat{x}_m$ and their linear transformations by $y_1,\ldots,y_m$.

The authors refer to the transformation

$$
BN_{\gamma,\beta}: x_{1,\ldots,m} \to y_{1,\ldots,m},
$$


as the batch normalization transform. The values $\gamma,\beta$ are parameters to be learned.

The $BN$ transform is defined as follows. In the algorithm, $\epsilon$ is a constant used for numerical stability.

## Algorithm 1

- **Input**: Values of $x$ over a mini-batch: $\mathcal{B} = \{x_{1,\ldots,m}\}$. The parameters $\gamma,\beta$ are to be learned.
- **Output**: $y_i = BN_{\gamma,\beta}(x_i)$

$$
\begin{align}
\mu_{\mathcal{B}} &= \frac{1}{m} \sum_{i=1}^{m} x_i\\
\sigma^2_{\mathcal{B}} &= \frac{1}{m} \sum_{i=1}^{m}(x_i- \mu_{\mathcal{B}} )^2\\
\hat{x}_i &= \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma^2_{\mathcal{B}}+\epsilon}}\\
y_i &= \gamma \hat{x}_i + \beta \equiv BN_{\gamma,\beta}(x_i)
\end{align}
$$
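The four steps of Algorithm 1 can be sketched in NumPy as follows. This is only an illustration of the training-time transform, under the assumption that each column of `x` is one activation and each row is one sample of the mini-batch; the function name `batch_norm` and the value `eps=1e-5` are choices made here, not taken from the paper or from Keras. At inference time, libraries normally replace the mini-batch statistics with running averages, which this sketch does not cover.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Apply BN_{gamma,beta} to a mini-batch x of shape (m, features)."""
    mu = x.mean(axis=0)                     # mini-batch mean
    var = x.var(axis=0)                     # mini-batch variance (1/m, biased)
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
    return gamma * x_hat + beta             # scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))   # m = 64, four activations
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))

# With gamma = 1 and beta = 0 each column has mean close to 0 and
# standard deviation close to 1.
print(y.mean(axis=0), y.std(axis=0))
```

Because $\gamma$ and $\beta$ are learned, the network can undo the normalization if that turns out to be useful.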



# Deep residual learning

This network architecture was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) for image processing. The proposal is to take the input to a convolutional layer and combine it, by addition, with the output of the convolutional layers that follow. For the sum to be possible, the output of those layers must have the same dimensions as the input, which is why `padding='same'` is used in the corresponding convolutional layers.

If the output of the convolutional layers is smaller, a projection of $x$ to the required dimension is used.

The figure illustrates the difference between a classic convolutional network and a residual learning network.

Following the previous section, we assume Conv2D convolutional layers with mini-batch normalization (BN) and ReLU activation. This is denoted *Conv2D-BN-ReLU*.



<figure>
<center>
<img src="./Imagenes/residual_NN.png" width="800" height="600" align="center"/>
</center>
<figcaption>
<p style="text-align:center">Comparison of a classic convolutional network and a residual learning network</p>
</figcaption>
</figure>


Suppose $\mathcal{F}(W_l,x)$ is the output of the last convolutional layer in the figure, where $x$ denotes the input and $W_l$ the set of convolution weights.

In the figure, the input to the first convolutional layer is denoted $x_{l-2}$, and the input to the next convolutional layer is $x_{l-1}$. The output of the second *Conv2D-BN-ReLU* layer is denoted $x_l$.

In the residual network, the output of the second layer before the activation is $\mathcal{F}(W_l,x)$. This output is combined with $x_{l-2}$ as follows:

$$
y_l = \mathcal{F}(W_l,x) + W_s x_{l-2},
$$

where $W_s$ is the projection (if needed). If $\mathcal{F}(W_l,x)$ and $x_{l-2}$ have the same dimension, $W_s$ is the identity matrix.

Finally,

$$
x_l = \mathrm{ReLU}(y_l).
$$

The function $\mathcal{F}$ is called the residual function. Its form is quite flexible. In the example it consists of two *Conv2D-BN-ReLU* layers, where the ReLU of the second layer is applied only after the sum. Experiments show that two or three layers work well. A single layer is possible, but in that case the residual function reduces to a simple linear transformation.
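The two equations above can be sketched with plain matrix products. The block below is a simplified illustration, not the convolutional implementation used later in this notebook: it replaces the convolutions with dense layers, omits batch normalization, and the names `residual_block`, `w1`, `w2` and `w_s` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2, w_s=None):
    """Compute ReLU(F(x) + W_s x) with a two-layer residual function F."""
    f = relu(x @ w1) @ w2                      # F: no activation after the 2nd layer
    shortcut = x if w_s is None else x @ w_s   # identity or projection shortcut
    return relu(f + shortcut)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 samples, 8 features

# Same input and output dimension: identity shortcut (W_s = I).
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
out_same = residual_block(x, w1, w2)

# If the residual function outputs zero, the block reduces to ReLU(x).
out_zero = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))

# Dimension changes from 8 to 16: a projection W_s is required.
w1p = rng.normal(size=(8, 16)) * 0.1
w2p = rng.normal(size=(16, 16)) * 0.1
w_s = rng.normal(size=(8, 16)) * 0.1
out_proj = residual_block(x, w1p, w2p, w_s)

print(out_same.shape, out_proj.shape)
```

The zero-weight case shows why the shortcut helps: the block only has to learn a correction to the identity, not the full mapping.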


# CIFAR-10 data

The CIFAR-10 dataset consists of 60000 color images of 32x32 pixels in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The original dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than from another. Between them, the training batches contain exactly 5000 images from each class.

The following code draws a sample of the data and displays it on screen.

In [None]:
'''Demonstrates how to sample and plot CIFAR10 images
using Keras API
'''

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# numpy package
import numpy as np
import math

# keras cifar10 dataset module
from tensorflow.keras.datasets import cifar10

# for plotting
import matplotlib.pyplot as plt


# load dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

class_id = 0
class_count = 0
images = None
for i in range(100):
    while True:
        index = np.random.randint(0, x_train.shape[0], size=1)
        image = x_train[index]
        if y_train[index] == class_id:
            break

    if images is None:
        images = image
    else:
        images = np.concatenate([images, image], axis=0)
    class_count += 1
    if class_count == 10:
        class_id += 1
        class_count = 0
      
print(images.shape)

plt.figure(figsize=(10, 10))
num_images = images.shape[0]
image_size = images.shape[1]
rows = int(math.sqrt(num_images))
row_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
             'dog', 'frog', 'horse', 'ship', 'truck']
index = 0
for i in range(num_images):
    ax = plt.subplot(rows, rows, i + 1)
    image = images[i, :, :, :]
    image = np.reshape(image, [image_size, image_size, 3])
    plt.imshow(image)
    # plt.axis('off')
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.grid(False)
    ax.xaxis.set_ticks_position('none') 
    ax.yaxis.set_ticks_position('none') 
    if (i % rows) == 0:
        ax.set_ylabel(row_names[index], rotation=45, size='large')
        ax.yaxis.labelpad = 20
        print(row_names[index])
        index += 1

# plt.tight_layout()
plt.savefig("cifar10-samples.png")
plt.show()
plt.close('all')

# ResNet v1 and v2

# Imports

In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.keras.layers import Dense, Conv2D
from tensorflow.keras.layers import BatchNormalization, Activation
from tensorflow.keras.layers import AveragePooling2D, Input
from tensorflow.keras.layers import Flatten, add
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import plot_model
from tensorflow.keras.utils import to_categorical
import numpy as np
import os

# Training parameters

In [3]:
# training parameters
batch_size = 32 # orig paper trained all networks with batch_size=128
#epochs = 200
epochs = 10
data_augmentation = True
num_classes = 10
# subtracting pixel mean improves accuracy
subtract_pixel_mean = True

# Model parameter
# ----------------------------------------------------------------------------
#           |      | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model     |  n   | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
#           |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20  | 3 (2)| 92.16     | 91.25     | -----     | -----     | 35 (---)
# ResNet32  | 5(NA)| 92.46     | 92.49     | NA        | NA        | 50 ( NA)
# ResNet44  | 7(NA)| 92.50     | 92.83     | NA        | NA        | 70 ( NA)
# ResNet56  | 9 (6)| 92.71     | 93.03     | 93.01     | NA        | 90 (100)
# ResNet110 |18(12)| 92.65     | 93.39+-.16| 93.15     | 93.63     | 165(180)
# ResNet164 |27(18)| -----     | 94.07     | -----     | 94.54     | ---(---)
# ResNet1001| (111)| -----     | 92.39     | -----     | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 3

# model version
# orig paper: version = 1 (ResNet v1), 
# improved ResNet: version = 2 (ResNet v2)
version = 1

# computed depth from supplied model parameter n
if version == 1:
    depth = n * 6 + 2
elif version == 2:
    depth = n * 9 + 2

# model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)
model_type

'ResNet20v1'
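The depth formulas in the cell above count weighted layers. In v1 there are 3 stages with `n` residual blocks of 2 convolutions each, plus the first convolution and the final dense layer, which gives `6n + 2`. In v2 each block has 3 convolutions, which gives `9n + 2`. The following sketch reproduces the model names from the table; the helper `resnet_depth` is a hypothetical name introduced here for illustration.

```python
def resnet_depth(n, version):
    """Depth of the CIFAR-10 ResNet for n residual blocks per stage."""
    if version == 1:
        return 6 * n + 2   # 3 stages * n blocks * 2 convs + first conv + dense
    if version == 2:
        return 9 * n + 2   # 3 stages * n blocks * 3 convs + first conv + dense
    raise ValueError('version must be 1 or 2')

# (n, version) pairs taken from the table in the training parameters cell
for n, version in [(3, 1), (9, 1), (18, 1), (6, 2), (12, 2), (111, 2)]:
    print('ResNet%dv%d' % (resnet_depth(n, version), version))
```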

# Load and preprocess the data

In [4]:
# load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# input image dimensions.
input_shape = x_train.shape[1:]
# (32, 32, 3)

# normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# if subtract pixel mean is enabled 
# center the column-data around zero
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

# convert class vectors to binary class matrices.
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)


print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)


x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
y_train shape: (50000, 10)
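The two preprocessing steps, per-pixel mean subtraction and one-hot encoding, can be checked on a toy array. This is a sketch with made-up data: the array names, the 4x4 image size and the three classes are assumptions chosen to keep the example small, and the one-hot encoding is written with `np.eye` as a stand-in for `to_categorical`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "images": 6 training and 2 test samples of size 4x4 with 3 channels.
x_train_toy = rng.integers(0, 256, size=(6, 4, 4, 3)).astype('float32') / 255
x_test_toy = rng.integers(0, 256, size=(2, 4, 4, 3)).astype('float32') / 255
labels = np.array([[0], [2], [1], [2], [0], [1]])   # same layout as y_train

# The mean is taken over the sample axis only, so there is one mean
# per pixel position and channel.
pixel_mean = x_train_toy.mean(axis=0)
x_train_centered = x_train_toy - pixel_mean
# The test set is centered with the training mean, not its own mean.
x_test_centered = x_test_toy - pixel_mean

# One-hot encoding: row i of the identity matrix encodes class i.
one_hot = np.eye(3, dtype='float32')[labels.ravel()]

print(pixel_mean.shape, one_hot.shape)
```

Using the training mean for the test set keeps the two sets in the same coordinate system and avoids leaking test statistics into the preprocessing.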


# Learning rate schedule

In [5]:
def lr_schedule(epoch):
    """Learning Rate Schedule

    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.

    # Arguments
        epoch (int): The number of epochs

    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
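To see what the schedule does at the boundaries, the same rule can be written as a small lookup table and evaluated at a few epochs. This is an equivalent sketch under the thresholds and factors shown above; the function name `lr_at` is hypothetical.

```python
def lr_at(epoch, base_lr=1e-3):
    """Piecewise-constant learning rate with the thresholds of lr_schedule."""
    # (threshold, factor) pairs, checked from the latest threshold down
    steps = [(180, 0.5e-3), (160, 1e-3), (120, 1e-2), (80, 1e-1)]
    for threshold, factor in steps:
        if epoch > threshold:
            return base_lr * factor
    return base_lr

for epoch in (0, 80, 81, 121, 161, 181):
    print(epoch, lr_at(epoch))
```

The comparisons are strict, so epoch 80 still uses the base rate. With `epochs = 10`, as in this notebook, the rate stays at its initial value for the whole run.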

# Define the ResNet layer

In [6]:
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    Arguments:
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    Returns:
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

# Define ResNet v1

In [7]:
def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder 

    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved
    (downsampled) by a convolutional layer with strides=2, while 
    the number of filters is doubled. Within each stage,
    the layers have the same number of filters and the
    same feature map sizes.
    Feature map sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx 
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M

    Arguments:
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    Returns:
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32)')
    # start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            # first layer but not first stack
            if stack > 0 and res_block == 0:  
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            # first layer but not first stack
            if stack > 0 and res_block == 0:
                # linear projection residual shortcut
                # connection to match changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
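The parameter counts quoted in the docstring can be checked by hand without building the model. The sketch below mirrors the loop structure of `resnet_v1` in plain Python. It assumes that each convolution has a bias and that batch normalization is counted with four values per channel (scale, shift, moving mean and moving variance), which agrees with the 448 and 64 parameters reported by `model.summary()` for the first two layers. The function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def bn_params(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels

def resnet_v1_param_count(depth, num_classes=10, in_channels=3):
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32)')
    num_res_blocks = (depth - 2) // 6
    num_filters = 16
    # first Conv2D-BN-ReLU layer
    total = conv_params(in_channels, num_filters, 3) + bn_params(num_filters)
    c_in = num_filters
    for stack in range(3):
        for res_block in range(num_res_blocks):
            # two 3x3 Conv2D-BN layers of the residual function
            total += conv_params(c_in, num_filters, 3) + bn_params(num_filters)
            total += conv_params(num_filters, num_filters, 3) + bn_params(num_filters)
            if stack > 0 and res_block == 0:
                # 1x1 projection shortcut without batch normalization
                total += conv_params(c_in, num_filters, 1)
            c_in = num_filters
        num_filters *= 2
    # dense softmax classifier after global average pooling
    total += c_in * num_classes + num_classes
    return total

print(resnet_v1_param_count(20))   # 274442, about 0.27M
```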

# Define ResNet v2

In [8]:
def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or 
    also known as bottleneck layer.
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, 
    the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, 
    while the number of filters is
    doubled. Within each stage, the layers have
    the same number of filters and the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    Arguments:
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    Returns:
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110)')
    # start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU
    # on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                # first layer and first stage
                if res_block == 0:  
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                # first layer but not first stage
                if res_block == 0:
                    # downsample
                    strides = 2 

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection
                # to match changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = add([x, y])

        num_filters_in = num_filters_out

    # add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
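The bookkeeping of `num_filters_in` and `num_filters_out` in `resnet_v2` is easier to follow when traced without any layers. The following sketch repeats only the loop logic of the function above and records the filter counts and stride of each residual block; `v2_filter_plan` is a hypothetical helper written for this illustration.

```python
def v2_filter_plan(num_res_blocks):
    """Return (stage, block, filters_in, filters_out, strides) per residual block."""
    num_filters_in = 16
    plan = []
    for stage in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stage == 0:
                # the first stage expands 16 filters to 64
                num_filters_out = num_filters_in * 4
            else:
                # later stages double the number of output filters
                num_filters_out = num_filters_in * 2
                if res_block == 0:
                    strides = 2   # downsample at the start of the stage
            plan.append((stage, res_block, num_filters_in,
                         num_filters_out, strides))
        num_filters_in = num_filters_out
    return plan

for row in v2_filter_plan(2):
    print(row)
```

The output widths 64, 128 and 256 match the feature map table in the docstring, and the bottleneck width (`filters_in`) is 16, 64 and 128 in the three stages.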

# Define the model for this training run (1)

In [9]:
if version == 2:
    model = resnet_v2(input_shape=input_shape, depth=depth)
else:
    model = resnet_v1(input_shape=input_shape, depth=depth)

# Compile the model

In [10]:
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
plot_model(model, to_file="%s.png" % model_type, show_shapes=True)
print(model_type)

Learning rate:  0.001
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 32, 32, 16)   448         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 32, 32, 16)   64          conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (None, 32, 32, 16)   0           batch_normalization[0][0]        
________________________________________________________________________

# Training with data augmentation

In [14]:
# prepare the model saving directory.
save_dir = os.path.join(os.getcwd(), './saved_models')
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_accuracy',
                             verbose=1,
                             save_best_only=True)

lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]

# run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True,
              callbacks=callbacks)
else:
    print('Using real-time data augmentation.')
    # this will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # randomly flip images horizontally
        horizontal_flip=True,
        # do not flip images vertically
        vertical_flip=False)

    # compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # fit the model on the batches generated by datagen.flow().
    model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
              validation_data=(x_test, y_test),
              epochs=epochs,
              verbose=1,
              steps_per_epoch=len(x_train) // batch_size,
              callbacks=callbacks)




Using real-time data augmentation.
Train for 1562 steps, validate on 10000 samples
Learning rate:  0.001
Epoch 1/10
Epoch 00001: val_accuracy improved from -inf to 0.69380, saving model to /home/alvaro/Edu_Colombia/Cuadernos/./saved_models/cifar10_ResNet20v1_model.001.h5
Learning rate:  0.001
Epoch 2/10
Epoch 00002: val_accuracy did not improve from 0.69380
Learning rate:  0.001
Epoch 3/10
Epoch 00003: val_accuracy improved from 0.69380 to 0.71710, saving model to /home/alvaro/Edu_Colombia/Cuadernos/./saved_models/cifar10_ResNet20v1_model.003.h5
Learning rate:  0.001
Epoch 4/10
Epoch 00004: val_accuracy improved from 0.71710 to 0.72950, saving model to /home/alvaro/Edu_Colombia/Cuadernos/./saved_models/cifar10_ResNet20v1_model.004.h5
Learning rate:  0.001
Epoch 5/10
Epoch 00005: val_accuracy improved from 0.72950 to 0.74770, saving model to /home/alvaro/Edu_Colombia/Cuadernos/./saved_models/cifar10_ResNet20v1_model.005.h5
Learning rate:  0.001
Epoch 6/10
Epoch 
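The augmentation configured above (random horizontal flips and shifts of up to 10% of the image size) can be illustrated with a simplified NumPy version. This sketch is not how `ImageDataGenerator` is implemented: it uses whole-pixel shifts and fills the uncovered border with zeros, and the function name `random_flip_and_shift` is hypothetical. It also reproduces the number of steps per epoch shown in the training log.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift_frac=0.1):
    """Randomly flip an (H, W, C) image horizontally and shift it by whole pixels."""
    h, w, _ = image.shape
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                 # horizontal flip
    max_dx = int(round(w * max_shift_frac))
    max_dy = int(round(h * max_shift_frac))
    dx = int(rng.integers(-max_dx, max_dx + 1))   # horizontal shift in pixels
    dy = int(rng.integers(-max_dy, max_dy + 1))   # vertical shift in pixels
    shifted = np.zeros_like(image)                # zero fill (a simplification)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x, :] = image[src_y, src_x, :]
    return shifted

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype('float32')
augmented = random_flip_and_shift(image, rng)

# 50000 training images with batch_size = 32
steps_per_epoch = 50000 // 32
print(augmented.shape, steps_per_epoch)
```

Because the transformation is drawn anew for every batch, the network rarely sees exactly the same image twice, which acts as a regularizer.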

In [15]:
# score trained model
scores = model.evaluate(x_test,
                        y_test,
                        batch_size=batch_size,
                        verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Test loss: 0.8800467841148376
Test accuracy: 0.7746
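The two numbers reported by `model.evaluate` can be understood from a small hand-made example. The sketch below computes the categorical cross-entropy and the accuracy from one-hot labels and predicted probabilities; the data and the function names are made up for illustration. The loss reported by Keras for this model likely also includes the L2 penalties of the convolution kernels, which this sketch leaves out.

```python
import numpy as np

def categorical_accuracy(y_true_one_hot, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true_one_hot, axis=1)
                         == np.argmax(y_prob, axis=1)))

def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    p = np.clip(y_prob, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true_one_hot * np.log(p), axis=1)))

# Four samples, three classes; the true classes are 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.5, 0.3]])

accuracy = categorical_accuracy(y_true, y_prob)
loss = categorical_crossentropy(y_true, y_prob)
print(loss, accuracy)
```

Three of the four samples are classified correctly, so the accuracy is 0.75; the third sample, which is misclassified, contributes the largest term to the loss.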
