# Keras Tutorial

Keras is a high-level neural network API, written in Python and capable of running on top of lower-level frameworks such as TensorFlow. It sits one level above TensorFlow and aims to make building deep learning pipelines quick and easy.

For our final project, we will use Keras to build the entire pipeline from scratch: data pre-processing, feature extraction (with CNNs) and classification (with fully connected nets). The goal of this tutorial is to give you enough knowledge to prepare for the final project.

Advantages of Keras compared with other deep learning frameworks:
* It's high-level, which means you can implement complex models with a few lines of simple code
* It works directly with NumPy arrays, so you don't need to write a custom dataset class as you typically would in PyTorch

Remember that in class we talked about the pipeline of a real computer vision system, in which we:

1. First clean the data into the format used by later steps. This includes data loading, data pre-processing, dataset splitting (we'll talk about this on Friday), data augmentation (which we won't cover), etc.;

2. Then build the model for feature extraction as well as for the final regression / classification. Remember we have many choices, such as linear models, fully connected neural nets and convolutional neural nets, and Keras lets us implement each of them in just a few lines of code;

3. Once we have the data and the model, code up the optimization part (for which we'll use gradient descent).

In this tutorial, we'll go over these parts sequentially.

## Data Loading and Pre-processing

In Keras we don't need any special data structures: we simply represent our data as NumPy arrays. Let's create some fake data to use later.

```python
# Import necessary packages
import numpy as np
```

```python
# Create random NumPy arrays (a stand-in for real data loading)
rand_data = np.random.random((1000, 32, 32, 3))  # 1000 fake RGB images with spatial size 32 * 32
rand_label = np.array([0]*500 + [1]*500)        # fake binary labels: 500 zeros followed by 500 ones

print(rand_data.shape)
```

```
(1000, 32, 32, 3)
```
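The fake images above are already floats in [0, 1]. Real images are usually stored as 8-bit integers, so a common pre-processing step is to rescale them and, optionally, standardize each color channel. A minimal NumPy sketch, assuming uint8 RGB input; `raw_images` is a hypothetical stand-in for loaded data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for real images: 8-bit RGB values in [0, 255]
raw_images = rng.integers(0, 256, size=(10, 32, 32, 3)).astype(np.uint8)

# Rescale to floats in [0, 1] (the same range np.random.random produces)
images = raw_images.astype(np.float32) / 255.0

# Optionally standardize each color channel to zero mean and unit variance
mean = images.mean(axis=(0, 1, 2), keepdims=True)  # shape (1, 1, 1, 3)
std = images.std(axis=(0, 1, 2), keepdims=True)
images_std = (images - mean) / (std + 1e-7)  # small constant guards against division by zero

print(images.min(), images.max())
print(images_std.shape)  # (10, 32, 32, 3)
```

In practice the mean and standard deviation are computed on the training set only and then reused for the validation and test sets.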


```python
# Split data into train, validation and test sets (we'll talk more about this on Friday)
train_ratio, val_ratio = 0.9, 0.05
n = rand_data.shape[0]
train_end = int(n * train_ratio)              # 900
val_end = int(n * (train_ratio + val_ratio))  # 950

# Note: these are contiguous slices of unshuffled data. Because rand_label is sorted,
# y_val and y_test contain only class 1 here
X_train = rand_data[:train_end, ...]  # ... means all the other axes
y_train = rand_label[:train_end]

X_val = rand_data[train_end:val_end, ...]
y_val = rand_label[train_end:val_end]

X_test = rand_data[val_end:, ...]
y_test = rand_label[val_end:]

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
```

```
(900, 32, 32, 3)
(50, 32, 32, 3)
(50, 32, 32, 3)
```
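Because `rand_label` is sorted (500 zeros followed by 500 ones), taking contiguous slices without shuffling puts only class 1 into the validation and test sets. A minimal sketch of how to check each split with `np.bincount`, and of shuffling once before slicing; the seed value is an arbitrary assumption:

```python
import numpy as np

labels = np.array([0] * 500 + [1] * 500)  # same sorted layout as rand_label
n = labels.shape[0]
train_end, val_end = int(n * 0.9), int(n * 0.95)

# Without shuffling, the tail of the sorted array (all ones) ends up in val/test
print(np.bincount(labels[val_end:], minlength=2))  # 0 samples of class 0, 50 of class 1

# Shuffle once with a fixed seed, then slice: every split now contains both classes
rng = np.random.default_rng(42)
perm = rng.permutation(n)
shuffled_labels = labels[perm]  # apply the same `perm` to the images, e.g. rand_data[perm]
for name, part in [("train", shuffled_labels[:train_end]),
                   ("val", shuffled_labels[train_end:val_end]),
                   ("test", shuffled_labels[val_end:])]:
    print(name, np.bincount(part, minlength=2))
```

Indexing the data and the labels with the same permutation keeps each image paired with its label.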


## Model Construction

Now that we have the data, we'll build our model for feature extraction as well as classification. Keras makes it easy to build many kinds of models, as shown below.

```python
import keras
from keras.models import Sequential  # Sequential is one of the main model types in Keras: a sequentially stacked series of layers

model = Sequential()  # Initialize a Sequential model instance
```

```
Using TensorFlow backend.
```



```python
# First we'll use fully connected neural nets
from keras.layers import Dense  # Dense is Keras's name for fully connected layers

# We can stack layers like Lego blocks by simply calling `add()`
# `units` is the number of neurons
# `activation` is the nonlinear function applied at each layer
# Only the first layer needs `input_dim` (the input dimension); each later layer takes the previous layer's output as its input
# Once again, the number of neurons in the hidden layers (64 and 16 here) is a design choice

model.add(Dense(units=64, activation='sigmoid', input_dim=32*32*3))
model.add(Dense(units=16, activation='sigmoid'))
model.add(Dense(units=1, activation='sigmoid'))  # a single sigmoid output gives P(class = 1)
```
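To make the stacked `Dense` layers concrete, here is a NumPy sketch of the forward pass they compute, where each layer applies `activation(x @ kernel + bias)` and `Sequential` feeds each layer's output into the next. The random weights are hypothetical (Keras initializes and then learns its own); the parameter count should agree with what `model.summary()` reports for this 3072 -> 64 -> 16 -> 1 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical random weights for the 3072 -> 64 -> 16 -> 1 stack defined above
sizes = [32 * 32 * 3, 64, 16, 1]
layers = [(rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    # Each Dense layer computes sigmoid(x @ kernel + bias); the layers are applied in order
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

x = rng.random((5, 32 * 32 * 3))  # 5 flattened fake images
out = forward(x)
print(out.shape)  # (5, 1)

# Parameters per layer: n_in * n_out weights plus n_out biases
n_params = sum(W.size + b.size for W, b in layers)
print(n_params)  # 197729
```

Almost all of the parameters sit in the first layer (3072 * 64 + 64 = 196672), which is typical when a fully connected net is applied directly to raw pixels.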




```python
# Once the model is built, we configure the learning process with `compile()`
# We need to specify the loss function, the optimizer and the metrics used to evaluate the model
# For the loss we use binary cross-entropy, which is designed for binary classification
# For the optimizer we use stochastic gradient descent, written as 'sgd' in Keras
# Since we're doing classification, we usually evaluate the model by its classification accuracy

model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
```
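For intuition, here is a NumPy sketch of what binary cross-entropy and thresholded accuracy compute for a batch. It is a simplified version, not Keras's exact implementation; the clipping constant `eps` is an assumption chosen to keep `log()` finite:

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite
    p = np.clip(y_prob, eps, 1 - eps)
    # Average of -[y * log(p) + (1 - y) * log(1 - p)] over the batch
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label
    return np.mean((y_prob > threshold).astype(int) == y_true)

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.1])
print(binary_crossentropy(y_true, y_prob))  # about 0.3375
print(binary_accuracy(y_true, y_prob))      # 0.75
```

The loss rewards confident correct predictions and heavily penalizes confident wrong ones (here the 0.4 assigned to a true 1 contributes most of the loss), while accuracy only counts which side of the threshold each prediction falls on.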



```python
# The string shortcuts above are a convenience. If you want more control over the learning process (e.g., the learning rate), pass objects instead.
# Note: no `metrics` are passed here, so this compile() replaces the one above and evaluate() will return only the loss
# (newer Keras versions name the `lr` argument `learning_rate`)

model.compile(loss=keras.losses.binary_crossentropy, optimizer=keras.optimizers.SGD(lr=0.001))  # specify the learning rate
```
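With no momentum (the default for Keras's `SGD`), each update moves the parameters against the gradient, scaled by the learning rate. A minimal sketch on a toy one-parameter objective, assuming f(w) = (w - 3)^2, shows why the learning rate matters:

```python
def sgd_step(w, grad, lr):
    # Plain SGD update: w <- w - lr * grad
    return w - lr * grad

def minimize(lr, steps=100, w=0.0):
    # Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
    for _ in range(steps):
        w = sgd_step(w, 2.0 * (w - 3.0), lr)
    return w

print(minimize(lr=0.1))    # very close to the minimum at 3.0
print(minimize(lr=0.001))  # still far from 3.0 after the same number of steps
```

In a real network, `grad` is the gradient of the loss with respect to all the weights, estimated on a mini-batch, but the update rule has the same form.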

```python
# Up to this point we've only been configuring things. Now everything is set up, so let's have the model do some real work!

# Since we're using a fully connected net, we first need to flatten each image into a single long vector
X_train_flat = X_train.reshape((-1, 32*32*3))  # -1 lets NumPy infer the size of this axis automatically
X_val_flat = X_val.reshape((-1, 32*32*3))
X_test_flat = X_test.reshape((-1, 32*32*3))

print(X_train_flat.shape)
print(X_val_flat.shape)
print(X_test_flat.shape)

# Then use fit() to actually train the model
# `epochs` is the number of full passes over the training set; the model needs time to approach a good solution
# `batch_size` is how many images are used to estimate the gradient for each update. Larger batches give less noisy
# gradient estimates, but each update costs more computation

model.fit(X_train_flat, y_train, epochs=5, batch_size=32, validation_data=(X_val_flat, y_val))
```

```
(900, 3072)
(50, 3072)
(50, 3072)
```





```
Train on 900 samples, validate on 50 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

<keras.callbacks.History at 0xb36289198>
```
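`fit()` splits the training set into mini-batches of `batch_size` samples and performs one gradient update per batch; by default it also shuffles the training data each epoch. A sketch of that bookkeeping in NumPy, where `iterate_minibatches` is a hypothetical helper, not a Keras function:

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # One epoch: shuffle the sample indices, then yield consecutive batches
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = np.zeros((900, 3072), dtype=np.float32)  # same shape as X_train_flat
y = np.zeros(900, dtype=int)

epochs, batch_size = 5, 32
batch_sizes = [len(xb) for xb, _ in iterate_minibatches(X, y, batch_size, rng)]
print(len(batch_sizes))           # 29 updates per epoch
print(batch_sizes[-1])            # the last batch holds the remaining 4 samples
print(epochs * len(batch_sizes))  # 145 updates over 5 epochs
```

So with 900 training images and `batch_size=32`, one epoch is ceil(900 / 32) = 29 gradient updates, not one.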

```python
# Now let's see how our model does
# Because the last compile() had no `metrics`, evaluate() returns only the loss (binary cross-entropy), not the accuracy
loss = model.evaluate(X_test_flat, y_test)
print('The test loss is: {}'.format(loss))

# And make predictions
prob = model.predict(X_test_flat)  # These are probabilities, and we want to convert them to class labels
label = np.array(prob > 0.5, dtype=int)

print('The predicted probabilities are: {}'.format(prob))
print('The predicted class labels are: {}'.format(label))
```

```
The test loss is: 1.0107408666610718
The predicted probabilities are: [[0.36616594]
 [0.35237712]
 [0.36673626]
 [0.35590237]
 [0.3516262 ]
 [0.3755673 ]
 [0.34726173]
 [0.38540763]
 [0.36488223]
 [0.35786822]
 [0.3634643 ]
 [0.36378264]
 [0.3635221 ]
 [0.36671293]
 [0.37281144]
 [0.37322086]
 [0.35881683]
 [0.3760087 ]
 [0.36336446]
 [0.36022934]
 [0.3595308 ]
 [0.36114463]
 [0.36704606]
 [0.36410132]
 [0.36501738]
 [0.36035633]
 [0.3787157 ]
 [0.3612965 ]
 [0.36609855]
 [0.360236  ]
 [0.36341494]
 [0.36851645]
 [0.3683704 ]
 [0.36051065]
 [0.3696788 ]
 [0.35962012]
 [0.35956424]
 [0.3586552 ]
 [0.36346418]
 [0.35310268]
 [0.36983705]
 [0.35319704]
 [0.3713618 ]
 [0.35150304]
 [0.36883247]
 [0.36370438]
 [0.36377287]
 [0.3889732 ]
 [0.34508684]
 [0.37197754]]
The predicted class labels are: [[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0
```
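Since the model was compiled without `metrics`, one way to get the test accuracy is to compute it directly from the thresholded predictions (alternatively, compile with `metrics=['accuracy']`, in which case `evaluate()` returns `[loss, accuracy]`). Note that `predict()` returns an `(n, 1)` array here, so flatten it before comparing with the 1-D labels; otherwise NumPy broadcasts the comparison to an `(n, n)` array. A sketch on hypothetical values:

```python
import numpy as np

# Hypothetical predicted probabilities with shape (n, 1), like the output of model.predict()
prob = np.array([[0.36], [0.62], [0.48], [0.91]])
y_true = np.array([0, 1, 1, 1])

label = (prob > 0.5).astype(int).ravel()  # flatten (n, 1) to (n,) before comparing
accuracy = np.mean(label == y_true)
print(label)     # [0 1 0 1]
print(accuracy)  # 0.75
```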

```python
# As expected, the model hasn't learned anything meaningful: the images are random noise, so there is nothing to learn
# You can also try other models, e.g., convnets
# So we go through the same procedure once more

model = Sequential()  # Re-initialize the model

# Feature extractor
# Architecture: conv -> maxpool -> conv -> maxpool
# 'same' padding zero-pads the input so that the output size is ceil(input size / stride);
# with stride 1 the output has the same spatial size as the input
model.add(keras.layers.Conv2D(filters=16, kernel_size=3, strides=(2, 2), padding='same', input_shape=(32, 32, 3)))
model.add(keras.layers.Activation('sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))  # By default the stride equals the pooling size

model.add(keras.layers.Conv2D(filters=32, kernel_size=2, strides=(1, 1), padding='same'))
model.add(keras.layers.Activation('relu'))  # ReLU is another kind of nonlinear function
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

# Classifier
# A 2-layer fully connected net for classification
model.add(keras.layers.Flatten())  # flatten the feature maps before the fully connected part

model.add(keras.layers.Dense(32))
model.add(keras.layers.Activation('relu'))

model.add(keras.layers.Dense(1))
model.add(keras.layers.Activation('sigmoid'))

# Compilation (again without `metrics`, so evaluate() returns only the loss)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=keras.optimizers.SGD(lr=0.001))

# Training
model.fit(X_train, y_train, epochs=30, batch_size=32, validation_data=(X_val, y_val))

# Evaluation
loss = model.evaluate(X_test, y_test)
print('The test loss is: {}'.format(loss))

# And make predictions
prob = model.predict(X_test)  # These are probabilities, and we want to convert them to class labels
label = np.array(prob > 0.5, dtype=int)

print('The predicted probabilities are: {}'.format(prob))
print('The predicted class labels are: {}'.format(label))
```

```
Train on 900 samples, validate on 50 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
The test loss is: 0.8035309720039367
The predicted probabilities are: [[0.45778173]
 [0.44722468]
 [0.4344414 ]
 [0.450739  ]
 [0.45624125]
 [0.44665116]
 [0.4468334 ]
 [0.45389217]
 [0.44794866]
 [0.43932387]
 [0.44235206]
 [0.4422893 ]
 [0.4503607 ]
 [0.44908655]
 [0.44776154]
 [0.4482193 ]
 [0.44707814]
 [0.44870153]
 [0.4453558 ]
 [0.44067398]
 [0.4392417 ]
 [0.44654492]
 [0.4539744 ]
 [0.4470567 ]
 [0.44811758]
 [0.44737962]
 [0.4486566 ]
 [0.4454322 ]
 [0.45271772]
 [0.44514996]
 [0.44809157]
 [0.44471633]
 [0.45039332]
 [0.44262478]
 [0.45571995]
 [0.44439033]
 [0.44184628]
 [0.44318813
```
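To see how the feature maps shrink through this convnet, here is a plain-Python sketch of the shape and parameter arithmetic. It assumes Keras's usual conventions: 'same' convolutions produce `ceil(input / stride)` outputs, `MaxPooling2D` uses 'valid' padding with stride equal to the pool size, and every layer has a bias:

```python
import math

def conv_same_out(size, stride):
    # 'same' padding: output size = ceil(input size / stride)
    return math.ceil(size / stride)

def pool_valid_out(size, pool):
    # 'valid' max pooling with stride equal to the pool size
    return (size - pool) // pool + 1

h = 32
h = conv_same_out(h, 2)   # Conv2D(16, 3, strides=2) -> 16 x 16 x 16
h = pool_valid_out(h, 2)  # MaxPooling2D(2)           -> 8 x 8 x 16
h = conv_same_out(h, 1)   # Conv2D(32, 2, strides=1) -> 8 x 8 x 32
h = pool_valid_out(h, 2)  # MaxPooling2D(2)           -> 4 x 4 x 32
flat = h * h * 32         # Flatten                   -> 512

params = {
    "conv1": 3 * 3 * 3 * 16 + 16,  # kernel_h * kernel_w * in_channels * filters + one bias per filter
    "conv2": 2 * 2 * 16 * 32 + 32,
    "dense1": flat * 32 + 32,
    "dense2": 32 * 1 + 1,
}
print(h, flat)                        # 4 512
print(params, sum(params.values()))  # 18977 parameters in total
```

Compared with the fully connected net (about 197,000 parameters), the convnet has roughly a tenth as many, because convolution kernels are shared across spatial positions and the strided convolution and pooling shrink the input before the dense layers.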