## Getting started: Introduction to Keras

The core data structure of Keras is a __model__, a way to organize layers. The simplest type of model is the [`Sequential`](https://keras.io/getting-started/sequential-model-guide) model, a linear stack of layers. For more complex architectures, you should use the [Keras functional API](https://keras.io/getting-started/functional-api-guide), which lets you build arbitrary graphs of layers.


Here is the `Sequential` model:

```python
from tensorflow.keras.models import Sequential

model = Sequential()
```

Stacking layers is as easy as `.add()`:

```python
from tensorflow.keras.layers import Dense

model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
```

Once your model looks good, configure its learning process with `.compile()`:

```python
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
```

If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to be (the ultimate control being the easy extensibility of the source code).

```python
from tensorflow import keras

# Newer Keras releases spell the `lr` argument `learning_rate`.
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))
```

You can now iterate on your training data in batches:

```python
# x_train and y_train are NumPy arrays, just like in the scikit-learn API.
model.fit(x_train, y_train, epochs=5, batch_size=32)
```

Evaluate your performance in one line:

```python
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)
```

Or generate predictions on new data:

```python
classes = model.predict(x_test, batch_size=128)
```
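
Despite the variable name, a model whose last layer is a softmax returns one row of class probabilities per sample from `predict`, not class labels. A minimal NumPy sketch of turning such output into class indices is shown here; the `probs` array is made up for illustration and stands in for real model output.

```python
import numpy as np

# Hypothetical output of model.predict for 3 samples and 4 classes
# (each row sums to 1, as a softmax output would).
probs = np.array([[0.10, 0.70, 0.15, 0.05],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])

# The predicted class is the index of the largest probability in each row.
predicted_classes = np.argmax(probs, axis=1)
print(predicted_classes)
```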

Building a question answering system, an image classification model, a Neural Turing Machine, or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?


## Import Necessary Libraries

In [0]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout

import tensorflow.keras.backend as K
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.python.client import device_lib

plt.style.use('seaborn-white')

print(tf.__version__)


In [0]:
def get_available_gpus():
    # List every local device, then keep only the GPUs.
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

get_available_gpus()

In [0]:
np.random.seed(2017)
tf.set_random_seed(2017)

## Download Dataset

In [0]:
fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

## Data Exploration

In [0]:
train_labels

![alt text](https://storage.googleapis.com/allianz-course/data/fashion_mnist_label.jpg =200x400)

In [0]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

In [0]:
# preview some images in each class

plt.figure(figsize=(10,10))
for i in range(4):
    plt.subplot(2,2,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(str(train_labels[i])+': '+class_names[train_labels[i]], size=25)


In [0]:
print("Train image shape:{0}".format(train_images.shape))
print("Test image shape:{0}".format(test_images.shape))
print("Train class: {0}".format(np.unique(train_labels)))
print("Test class: {0}".format(np.unique(test_labels)))

## Data Preprocessing




### Normalizing (feature scaling)


In machine learning, we want the model to learn the real structure of the data rather than differences in scale between features. Therefore, we normalize the data before feeding it into the model.

![Normalizing(feature_scaling)](https://storage.googleapis.com/allianz-course/data/feature_scaling.jpg =300x150)

In [0]:
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
print('Max value in this image: {}'.format(np.amax(train_images[0])))
print('Min value in this image: {}'.format(np.amin(train_images[0])))

In [0]:
# Normalize Data
train_images = train_images / 255.0
test_images = test_images / 255.0
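
Dividing by 255 is a special case of min-max scaling with a known minimum of 0 and maximum of 255. The sketch below checks this on a randomly generated stand-in image rather than the real dataset; `min_max_scale` is a hypothetical helper, not part of Keras or NumPy.

```python
import numpy as np

# Stand-in for one 28x28 grayscale image with uint8 pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
image[0, 0], image[0, 1] = 0, 255  # make sure both extremes are present

def min_max_scale(x, x_min, x_max):
    """Rescale x linearly so that x_min maps to 0 and x_max maps to 1."""
    return (x.astype(np.float64) - x_min) / (x_max - x_min)

scaled = min_max_scale(image, 0, 255)  # same result as image / 255.0
print(scaled.min(), scaled.max())
```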

### **One-hot encoding**

The labels are currently the integers 0, 1, 2, ..., 9, which are merely 'symbols' for the classes. However, if we feed these integers to the model directly, they would imply an ordinal relationship between the classes. Therefore, we apply one-hot encoding to the labels before training.

![alt text](https://storage.googleapis.com/allianz-course/data/one-hot.jpg =600x400)

In [0]:
train_labels = keras.utils.to_categorical(train_labels, num_classes=10)
test_labels = keras.utils.to_categorical(test_labels, num_classes=10)

In [0]:
train_labels
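
To make the encoding concrete, here is a minimal NumPy sketch of one-hot encoding on three example labels. `one_hot` is a hypothetical helper written for illustration, not the implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([9, 0, 3])          # e.g. Ankle boot, T-shirt/top, Dress
encoded = one_hot(labels, num_classes=10)
print(encoded)

# argmax along the class axis recovers the original integer labels.
recovered = encoded.argmax(axis=1)
```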

## Train Model

### **(Lab 2-a) Basic Model: model_sig_sgd_001**
* Hidden Layer: [ 128 , 64 ]
* Activation function in Hidden Layers: Sigmoid
* Optimizer: SGD
* Learning Rate: 0.001
* Training Epoch: 20


### **(Lab 2-b) Change activation function to ReLU: model_relu_sgd_001**

Now, change the activation function in the hidden layers to ReLU with the following details:

* Hidden Layer: [ 128 , 64 ]
* Activation function in Hidden Layers: **ReLU**
* Optimizer: SGD
* Learning Rate: 0.001
* Training Epoch: 20
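
One commonly cited reason for this swap is that the sigmoid's derivative never exceeds 0.25 and shrinks toward 0 for inputs far from zero, while the ReLU derivative is 1 for any positive input. The plain-Python sketch below compares the two at a few hand-picked points; it illustrates the activation functions only and says nothing about the accuracy you will observe in the lab.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative of ReLU; the value at exactly 0 is a convention (0 here).
    return 1.0 if x > 0 else 0.0

for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.1f}")
```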

### (Lab 2-c) Change optimizer to Adam: model_relu_adam_001

Now, change the optimizer to Adam with the following details:

* Hidden Layer: [ 128 , 64 ]
* Activation function in Hidden Layers: ReLU
* Optimizer: **Adam**
* Learning Rate: 0.001
* Training Epoch: 20

***Hint***: 

To use the Adam optimizer:
```python
# learning rate is lr
opt = tf.train.AdamOptimizer(learning_rate = lr)
```
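
For intuition about what the optimizer does with the learning rate, here is a sketch of the textbook Adam update rule on a one-dimensional toy problem. It is not TensorFlow's implementation; `adam_minimize`, the toy function, and the larger learning rate of 0.1 are all assumptions chosen so the example converges in a few hundred steps.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy problem (an assumption for illustration): f(x) = (x - 3)^2, so f'(x) = 2 (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```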

In [0]:
# Clear the session first
K.clear_session()

# Start building the model 
model_relu_adam_001 = Sequential()
model_relu_adam_001.add(Flatten(input_shape=(28, 28)))
model_relu_adam_001.add(Dense(128, activation = 'relu'))
model_relu_adam_001.add(Dense(64, activation = 'relu'))
model_relu_adam_001.add(Dense(10, activation='softmax'))

############# START CODING HERE #############

# Create an Adam optimizer with learning rate 0.001 (~ 1 line)(hint: tf.train.AdamOptimizer(learning_rate = n))
opt = tf.train.AdamOptimizer(learning_rate = 0.001)

############# END CODING HERE ###############

model_relu_adam_001.compile(loss='categorical_crossentropy',
                           optimizer = opt,
                           metrics = ['accuracy'])

# Use .summary() to see model details
model_relu_adam_001.summary()
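
The parameter counts reported by `.summary()` can be reproduced by hand: a `Dense` layer has one weight per input-output pair plus one bias per output unit. The sketch below does the arithmetic for this [128, 64] architecture; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [28 * 28, 128, 64, 10]   # Flatten output, two hidden layers, output layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```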

In [0]:
train_relu_adam_001 = model_relu_adam_001.fit(train_images, 
                                              train_labels, 
                                              epochs = 20, 
                                              batch_size = 128, 
                                              validation_split = 0.05, 
                                              shuffle = False)


In [0]:
res = model_relu_adam_001.evaluate(test_images, test_labels)
print(f'Testing Accuracy of model_relu_adam_001: {res[1]}')

### (Lab 2-d) Change Learning Rate to 0.01: model_relu_adam_01

Now, change the learning rate to 0.01 with the following details:

* Hidden Layer: [ 128 , 64 ]
* Activation function in Hidden Layers: ReLU
* Optimizer: Adam
* Learning Rate: **0.01**
* Training Epoch: 20



### (Lab 2-e) Add more neurons in hidden layers (overfit): model_large_relu_adam_001

Now, change hidden layer neurons to [2048, 1024] with the following details:

* Hidden Layer: **[ 2048 , 1024 ]**
* Activation function in Hidden Layers: ReLU
* Optimizer: Adam
* Learning Rate: 0.001
* Training Epoch: 20

***Hint***: 

To use the Adam optimizer with learning rate `lr`:
```python
# learning rate is lr
opt = tf.train.AdamOptimizer(learning_rate = lr)
```


In [0]:
# Clear the session first
K.clear_session()

# Start building the model 
model_large_relu_adam_001 = Sequential()
model_large_relu_adam_001.add(Flatten(input_shape=(28, 28)))

############# START CODING HERE #############

# Create hidden layer with 2048 neurons and relu activation function (~ 1 line)
model_large_relu_adam_001.add(Dense(2048, activation = 'relu'))


# Create hidden layer with 1024 neurons and relu activation function (~ 1 line)
model_large_relu_adam_001.add(Dense(1024, activation = 'relu'))

# Create an output layer with 10 neurons and softmax activation function (~ 1 line)
model_large_relu_adam_001.add(Dense(10, activation = 'softmax'))


# Create an Adam optimizer with learning rate = 0.001 ( ~ 1 line)(hint: tf.train.AdamOptimizer(learning_rate = n))
opt = tf.train.AdamOptimizer(learning_rate = 0.001)


############# END CODING HERE ###############



model_large_relu_adam_001.compile(loss='categorical_crossentropy',
                                  optimizer = opt,
                                  metrics = ['accuracy'])

# Use .summary() to see model details
model_large_relu_adam_001.summary()


In [0]:
train_large_relu_adam_001 = model_large_relu_adam_001.fit(train_images, 
                                                          train_labels, 
                                                          epochs = 20, 
                                                          batch_size = 128, 
                                                          validation_split = 0.05, 
                                                          shuffle = False)


In [0]:
plt.figure(figsize=(10,5))
epoch = len(train_large_relu_adam_001.history["loss"])
plt.plot(np.arange(1, epoch+1), train_large_relu_adam_001.history['loss'], label='Train', lw=3)
plt.plot(np.arange(1, epoch+1), train_large_relu_adam_001.history['val_loss'], label='Validation', lw=3)
plt.ylabel('loss', family='serif', size=14)
plt.xlabel('Epoch #', family='serif', size=14)
plt.xticks(np.arange(1, epoch+1))
plt.xlim([1, epoch])
plt.legend(prop={'size':14, 'family':'serif'})
plt.show()

## **Solutions to Overfitting**



### (Lab 2-f) Early Stopping

Stop training when a monitored quantity has stopped improving.

```python

keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, mode='auto')

```
* monitor: quantity to be monitored.

* patience: number of epochs with no improvement after which training will be stopped.

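* mode: one of {auto, min, max}. In `min` mode, training stops when the monitored quantity has stopped decreasing; in `max` mode, when it has stopped increasing; in `auto` mode, the direction is inferred from the name of the monitored quantity.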
<br/>
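
The role of `patience` can be sketched in plain Python. This is a simplified illustration of the stopping rule in `min` mode, not the Keras callback itself; `epochs_run` and the validation losses are made up for the example.

```python
def epochs_run(val_losses, patience=0):
    """Return how many epochs run before stopping on a list of validation losses."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop after this epoch
    return len(val_losses)  # never triggered: all epochs run

val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]  # made-up values
print(epochs_run(val_losses, patience=0))
print(epochs_run(val_losses, patience=2))
```

With a small `patience`, training stops at the first bump in validation loss and never reaches the later, lower value; a larger `patience` tolerates such bumps at the cost of extra epochs.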


### (Lab 2-g) Dropout



![alt text](https://storage.googleapis.com/allianz-course/data/dropout.jpg =400x200)

<br/>
Dropout randomly sets a fraction `rate` of the input units to 0 at each update during training, which helps prevent overfitting.
```python
keras.layers.Dropout(rate)
```
rate: float between 0 and 1. Fraction of the input units to drop.
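
As a rough illustration of the mechanism, the NumPy sketch below implements "inverted" dropout: units are zeroed at random during training and the survivors are scaled by `1 / (1 - rate)`, while the input passes through unchanged at inference time. The `dropout` function here is a hypothetical stand-in, not the Keras layer.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True for units that are kept
    return x * keep_mask / (1.0 - rate)       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones((100, 100))
train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False, rng=rng)
print(np.unique(train_out), train_out.mean())
```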


### (Lab 2-h) Regularization

In [0]:
from tensorflow.keras import regularizers

In [0]:
# Clear the session first
K.clear_session()

# Start building the model 
model_large_relu_adam_001_re = Sequential()

model_large_relu_adam_001_re.add(Flatten(input_shape=(28, 28)))

model_large_relu_adam_001_re.add(Dense(2048, 
                                       activation = 'relu', 
                                       kernel_regularizer=regularizers.l2(0.001)))

############# START CODING HERE ############

# Add a Dense layer with 1024 neurons, activation function = 'relu' and
# an l2 regularizer with lambda = 0.001




############# END CODING HERE #############


model_large_relu_adam_001_re.add(Dropout(0.5))

model_large_relu_adam_001_re.add(Dense(10, 
                                       activation='softmax'))

opt = tf.train.AdamOptimizer(learning_rate = 0.001)

model_large_relu_adam_001_re.compile(loss='categorical_crossentropy',
                                     optimizer = opt,
                                     metrics = ['accuracy'])

# Use .summary() to see model details
model_large_relu_adam_001_re.summary()

#### Solution

In [0]:
# model_large_relu_adam_001_re.add(Dense(1024, 
#                                        activation = 'relu', 
#                                        kernel_regularizer=regularizers.l2(0.001)))
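
What `kernel_regularizer=regularizers.l2(0.001)` adds to the loss can be sketched in a few lines: a penalty equal to lambda times the sum of the squared kernel weights of that layer. The weights and the data loss below are made-up numbers, and `l2_penalty` is a hypothetical helper.

```python
def l2_penalty(weights, lam):
    """lam times the sum of squared entries of a 2-D weight matrix (list of rows)."""
    return lam * sum(w * w for row in weights for w in row)

weights = [[1.0, -2.0],
           [3.0, 0.0]]          # made-up kernel values
data_loss = 0.5                 # made-up cross-entropy value
total_loss = data_loss + l2_penalty(weights, lam=0.001)
print(total_loss)
```

Because the penalty grows with the squared magnitude of the weights, the optimizer is pushed toward smaller weights, which is why the training loss of a regularized model is usually higher than that of the unregularized one.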

In [0]:
train_model_large_relu_adam_001_re = model_large_relu_adam_001_re.fit(train_images,
                                                                      train_labels, 
                                                                      epochs=20, 
                                                                      validation_split = 0.05,
                                                                      batch_size = 128,
                                                                      shuffle = False)

In [0]:
plt.figure(figsize=(10,5))
epoch = len(train_model_large_relu_adam_001_re.history["loss"])
plt.plot(np.arange(1, epoch+1), train_model_large_relu_adam_001_re.history['loss'], label='Train', lw=3)
plt.plot(np.arange(1, epoch+1), train_model_large_relu_adam_001_re.history['val_loss'], label='Validation', lw=3)
plt.ylabel('loss', family='serif', size=14)
plt.xlabel('Epoch #', family='serif', size=14)
plt.xticks(np.arange(1, epoch+1))
plt.xlim([1, epoch])
plt.legend(prop={'size':14, 'family':'serif'})
plt.show()

## Convolutional Neural Network

In [0]:
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, BatchNormalization, Activation

### (Lab 2-i) Create a basic CNN model

In [0]:
K.clear_session()

model_basic_cnn = keras.Sequential()

model_basic_cnn.add(Conv2D(filters=32,
                           kernel_size=(3,3),
                           input_shape=(28,28,1),
                           padding='same',
                           activation='relu'))

model_basic_cnn.add(MaxPool2D(pool_size=(2,2),
                              strides=(2,2)))

model_basic_cnn.add(Conv2D(filters=64,
                           kernel_size=(3,3),
                           padding='same',
                           activation='relu'))

model_basic_cnn.add(MaxPool2D(pool_size=(2,2),
                              strides=(2,2)))

############# START CODING HERE #############

# Create a Convolution 2D layer with 128 filters, 
# kernel size =(3,3), padding = 'same', activation = 'relu'(~ 1 line)




# Create a 2D Max pooling layer with pooling size = (2,2) and strides = (2,2) (~ 1 line)



############# END CODING HERE #############


model_basic_cnn.add(Flatten())

model_basic_cnn.add(Dense(10, activation='softmax'))

model_basic_cnn.summary()
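
The output shapes in the summary follow from simple arithmetic: a stride-1 convolution with `padding='same'` keeps the spatial size, and each 2x2 pooling with stride 2 halves it (rounding down). The sketch below traces the shapes assuming the 128-filter block from the exercise has been filled in; `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel, padding):
    """Output width/height of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool, stride):
    """Output width/height of max pooling without padding."""
    return (size - pool) // stride + 1

size = 28
trace = []
for filters in (32, 64, 128):          # 128 is the block added in the exercise
    size = conv_out(size, kernel=3, padding="same")
    size = pool_out(size, pool=2, stride=2)
    trace.append((size, size, filters))

flattened = trace[-1][0] * trace[-1][1] * trace[-1][2]
print(trace, flattened)
```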


#### Solution

In [0]:
# model_basic_cnn.add(Conv2D(filters=128,
#                            kernel_size=(3,3),
#                            padding='same',
#                            activation='relu'))



# model_basic_cnn.add(MaxPool2D(pool_size=(2,2),
#                               strides=(2,2)))

In [0]:
opfunc = tf.train.AdamOptimizer(learning_rate = 0.001) 

model_basic_cnn.compile(optimizer = opfunc, 
                        loss = 'categorical_crossentropy',
                        metrics = ['accuracy'])

train_basic_cnn = model_basic_cnn.fit(np.expand_dims(train_images, -1), 
                                      train_labels, 
                                      batch_size=256,
                                      epochs=20, 
                                      validation_split = 0.05,
                                      shuffle = False,
                                      verbose=1)

In [0]:
test_model_basic_cnn = model_basic_cnn.evaluate(np.expand_dims(test_images, -1) , test_labels)
print(f'Testing Accuracy of the Basic CNN: {test_model_basic_cnn[1]}')


### (Lab 2-j) Create an advanced CNN model

In [0]:
K.clear_session()

model_cnn = keras.Sequential()

model_cnn.add(Conv2D(filters=32,
                     kernel_size=(3,3),
                     input_shape=(28,28,1),
                     padding='valid',
                     use_bias=False,
                     activation=None))

model_cnn.add(BatchNormalization())

model_cnn.add(Activation('relu'))

model_cnn.add(Conv2D(filters=64,
                     kernel_size=(3,3),
                     padding='valid',
                     use_bias=False,
                     activation=None))

model_cnn.add(BatchNormalization())

model_cnn.add(Activation('relu'))

############# START CODING HERE #############

# Create a Convolution 2D layer with 128 filters, 
# kernel size =(3,3), padding = 'same', not using bias and no activation(~ 1 line)



# Add Batch Normalization (~ 1 line)


# Add activation = relu  (~ 1 line)


# Create a 2D Max pooling layer with pooling size = (2,2) and strides = (2,2) (~ 1 line)


############# END CODING HERE #############

model_cnn.add(Conv2D(filters=256,
                     kernel_size=(3,3),
                     padding='valid',
                     use_bias=False,
                     activation=None))

model_cnn.add(BatchNormalization())  

model_cnn.add(Activation('relu'))

model_cnn.add(Conv2D(filters=512,
                     kernel_size=(3,3),
                     padding='valid',
                     use_bias=False,
                     activation=None))

model_cnn.add(BatchNormalization())

model_cnn.add(Activation('relu'))

model_cnn.add(MaxPool2D(pool_size=(8,8),
                        strides=(1,1)))  

model_cnn.add(Flatten()) 

model_cnn.add(Dense(10, activation='softmax'))

model_cnn.summary()
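
The convolutions above use `use_bias=False` because each one is followed by batch normalization, which subtracts the per-channel mean and therefore cancels any constant per-channel bias. The NumPy sketch below demonstrates this with training-mode batch statistics on random data; it omits the learned scale and offset of the real `BatchNormalization` layer, and `batch_norm` is a hypothetical helper.

```python
import numpy as np

def batch_norm(x, eps=1e-3):
    """Normalize each channel (last axis) with the batch mean and variance."""
    axes = tuple(range(x.ndim - 1))                 # all axes except channels
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
conv_output = rng.normal(size=(8, 26, 26, 32))      # made-up pre-activation values
bias = rng.normal(size=(32,))                       # a hypothetical per-channel bias

without_bias = batch_norm(conv_output)
with_bias = batch_norm(conv_output + bias)
print(np.allclose(without_bias, with_bias))
```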


#### Solution

In [0]:
# model_cnn.add(Conv2D(filters=128,
#                      kernel_size=(3,3),
#                      padding='same',
#                      use_bias=False,
#                      activation=None))

# model_cnn.add(BatchNormalization())  

# model_cnn.add(Activation('relu'))

# model_cnn.add(MaxPool2D(pool_size=(2,2),
#                         strides=(2,2)))

In [0]:
opfunc = tf.train.AdamOptimizer(learning_rate = 0.001)

model_cnn.compile(optimizer = opfunc, 
                  loss = 'categorical_crossentropy',
                  metrics = ['accuracy'])

train_model_cnn = model_cnn.fit(np.expand_dims(train_images, -1),
                                train_labels, 
                                batch_size=256,
                                epochs=20, 
                                validation_split = 0.05,
                                verbose=1)

In [0]:
plt.figure()
epoch = len(train_relu_adam_001.history["loss"])
plt.plot(np.arange(1, epoch+1), train_relu_adam_001.history['val_acc'], label='DNN', lw=3)
plt.plot(np.arange(1, epoch+1), train_basic_cnn.history['val_acc'], label='Basic CNN', lw=3)
plt.plot(np.arange(1, epoch+1), train_model_cnn.history['val_acc'], label='Advanced CNN', lw=3)
plt.ylabel('Accuracy', family='serif', size=14)
plt.xlabel('Epoch #', family='serif', size=14)
plt.xticks(np.arange(1, epoch+1))
plt.xlim([1, epoch])
plt.legend(prop={'size':14, 'family':'serif'})
plt.show()

In [0]:
test_result=model_cnn.evaluate(np.expand_dims(test_images, -1)  , test_labels)
print(f'Testing Accuracy of the Advanced CNN: {test_result[1]}')


### (Lab 2-k) MobileNet

In [0]:
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.layers import Lambda, Input
from tensorflow.keras.models import Model
import cv2

In [0]:
K.clear_session()

# The minimum input size accepted by MobileNet is 32x32, and our images are 28x28,
# so the images are resized to 2x their original size (56x56).

height,width = 56, 56

input_image = Input(shape=(height,width))

# MobileNet was trained on 3-channel (RGB) images. Here, we expand the input to 3 channels
# by repeating the grayscale values in every channel.

input_image_ = Lambda(lambda x: K.repeat_elements(K.expand_dims(x,-1),3,3))(input_image)

model_mobilenet_base = MobileNet(input_tensor=input_image_,
                                 weights='imagenet',
                                 include_top=False, 
                                 pooling='avg')

x = Dropout(0.5)(model_mobilenet_base.output)

predict = Dense(10, activation='softmax')(x)

model_mobilenet = Model(inputs=input_image, outputs=predict)

opfunc = tf.train.AdamOptimizer(learning_rate = 0.001)

model_mobilenet.compile(optimizer=opfunc, loss='categorical_crossentropy', metrics=['accuracy'])

model_mobilenet.summary()
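
The `Lambda` layer in the cell above turns a batch of grayscale images into a 3-channel batch by copying the single channel three times. A NumPy equivalent of that transformation, applied to a made-up batch, looks like this:

```python
import numpy as np

batch = np.random.default_rng(0).random((2, 56, 56))   # two made-up grayscale images

# Add a trailing channel axis, then copy it three times along that axis.
rgb_like = np.repeat(np.expand_dims(batch, -1), 3, axis=-1)
print(rgb_like.shape)
```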

In [0]:
# cv2.resize expects the target size as (width, height)
resized_train_images = np.array([cv2.resize(x, (width, height)).astype(float) for x in train_images])

train_mobilenet = model_mobilenet.fit(resized_train_images, 
                                          train_labels,
                                          batch_size=256,
                                          epochs=20, 
                                          validation_split=0.05,                           
                                          verbose=1)

In [0]:
# cv2.resize expects the target size as (width, height)
resized_test_images = np.array([cv2.resize(x, (width, height)).astype(float) for x in test_images])
test_result = model_mobilenet.evaluate(resized_test_images , test_labels)


In [0]:
plt.figure()
epoch = len(train_relu_adam_001.history["loss"])

plt.plot(np.arange(1, epoch+1), train_relu_adam_001.history['val_acc'], label='DNN', lw=3)
plt.plot(np.arange(1, epoch+1), train_basic_cnn.history['val_acc'], label='Basic CNN', lw=3)
plt.plot(np.arange(1, epoch+1), train_model_cnn.history['val_acc'], label='Advanced CNN', lw=3)
plt.plot(np.arange(1, epoch+1), train_mobilenet.history['val_acc'], label='MobileNet', lw=3)

plt.ylabel('Accuracy', family='serif', size=14)
plt.xlabel('Epoch #', family='serif', size=14)
plt.xticks(np.arange(1, epoch+1))
plt.xlim([1, epoch])
plt.legend(prop={'size':14, 'family':'serif'})
plt.show()

In [0]:
print(f'Testing Accuracy of MobileNet: {test_result[1]}')
