# Keras

## Tensorflow 1 versus Tensorflow 2

Although TF2 has replaced TF1, there is still lots of TF1 code around. It is useful to know what you are looking at.

If you ever see `with tf.Session() as sess`, `tf.placeholder` or `sess.run`, you are looking at TF1 code.

Some example TF1 code:
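```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

#  build a graph: a placeholder for the input and an op that adds 2
features = tf.placeholder(tf.float32, shape=(32,))
output = tf.add(features, 2)

#  nothing is computed until the graph is run in a session
with tf.Session() as sess:
    out = sess.run(output, {features: np.zeros(32)})
```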

And the same thing in TF2:
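```python
import tensorflow as tf  # TensorFlow 2.x

#  eager execution: the result is computed immediately, no session needed
features = tf.zeros(32)
output = features + 2
```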
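If you need to run TF1-style code under TF2, one option is the `tf.compat.v1` module. Below is a minimal sketch, assuming a TF2 install: the placeholder & session pattern from above runs inside an explicit `tf.Graph`, and `tf.function` is the TF2 way of getting a graph back from ordinary Python. The function name `add_two` is only for illustration.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 2.x

#  TF1 style under TF2: build a graph explicitly, then run it in a session
graph = tf.Graph()
with graph.as_default():
    features = tf.compat.v1.placeholder(tf.float32, shape=(32,))
    output = tf.add(features, 2)

with tf.compat.v1.Session(graph=graph) as sess:
    out = sess.run(output, {features: np.zeros(32)})

#  TF2 style: tf.function traces the Python function into a graph
@tf.function
def add_two(x):
    return x + 2

out_tf2 = add_two(tf.zeros(32)).numpy()
```

Both produce an array of 32 twos; the compat route is mainly useful while porting old code.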

## Keras

Keras is a library that originally wrapped around other deep learning frameworks (such as Tensorflow & Theano). In TF2, Keras ships as part of Tensorflow (`tf.keras`), so the integration is very tight.

In TF2, Keras offers two main APIs for building models: the higher level **Sequential API** and the lower level, more flexible **Functional API**.

Benefits of Sequential
- easier & quicker to develop models

Drawbacks of Sequential
- less flexible: only a plain stack of layers, where each layer has one input & one output

Benefits of Functional
- can handle models with non-linear topology
- weight sharing
- multiple inputs or outputs

Start with Sequential, then move to the Functional API if required.
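A minimal sketch of the Functional API features listed above, with hypothetical input names and layer sizes: two inputs pass through one shared `Dense` layer (weight sharing), the branches are merged (non-linear topology), and the model has two outputs.

```python
import numpy as np
import tensorflow as tf

#  two inputs with illustrative names & sizes
left = tf.keras.Input(shape=(8,), name='left')
right = tf.keras.Input(shape=(8,), name='right')

#  one layer object called on both inputs -> one set of weights
shared = tf.keras.layers.Dense(4, activation='relu', name='shared')
merged = tf.keras.layers.concatenate([shared(left), shared(right)])

#  two outputs: a regression score and a 3-class label
score = tf.keras.layers.Dense(1, name='score')(merged)
label = tf.keras.layers.Dense(3, activation='softmax', name='label')(merged)

model = tf.keras.Model(inputs=[left, right], outputs=[score, label])

x = np.zeros((5, 8), dtype=np.float32)
score_pred, label_pred = model.predict([x, x], verbose=0)
```

Because `shared` is called twice, its kernel & bias are only counted once: 72 parameters in total, rather than 108 if each branch had its own layer. A model like this cannot be written with the Sequential API.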

## Keras Sequential API

Let's make a simple feedforward neural network:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

#  three fully connected layers with 16, 8 & 4 units
layers = [tf.keras.layers.Dense(n, activation='relu') for n in [16, 8, 4]]

model = tf.keras.Sequential(layers)

#  note the use of gradient clipping here!
#  (`learning_rate` replaces the deprecated `lr` argument)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)

model.compile(loss='mse', optimizer=opt)
```

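Our output dimension (4) is defined by the last layer of the network. The input dimension is not specified anywhere, so Keras infers it the first time the model sees data (24 features below).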
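A small sketch of that lazy build, reusing the layer sizes above (the names `lazy` and `explicit` are illustrative): the model has no weights until it is first called, while starting the layer list with a `tf.keras.Input` fixes the input size up front, so `summary()` works straight away.

```python
import numpy as np
import tensorflow as tf

sizes = [16, 8, 4]

#  no input shape given: weights are created on the first call
lazy = tf.keras.Sequential([tf.keras.layers.Dense(n, activation='relu') for n in sizes])
built_before = lazy.built
out = lazy(np.zeros([40, 24], dtype=np.float32))  # input dimension is now fixed at 24
built_after = lazy.built

#  input shape declared up front: the model is built immediately
explicit = tf.keras.Sequential(
    [tf.keras.Input(shape=(24,))]
    + [tf.keras.layers.Dense(n, activation='relu') for n in sizes]
)
explicit.summary()
```

Both models end up with the same 572 parameters: (24 + 1) * 16 + (16 + 1) * 8 + (8 + 1) * 4.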

We can make predictions:

```python
#  40 samples with 24 features each -> predictions of shape (40, 4)
pred = model.predict(np.zeros([40, 24]))
```

We can also train the model:

```python
#  fit returns a History object with the loss for each epoch
hist = model.fit(x=np.zeros([50, 24]), y=np.ones([50, 4]))
```

```python
#  IPython / Jupyter: show the docstring for fit
model.fit?
```
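Roughly what `fit` does on each batch, written as a custom training step. This is a simplified sketch, not Keras's actual implementation: it treats the whole dataset as one batch, uses random inputs (an assumption, so that the gradients are not all zero), and applies `tf.clip_by_norm` to each gradient tensor separately, which is how the Keras docs describe `clipnorm` (`global_clipnorm` clips all gradients jointly instead).

```python
import numpy as np
import tensorflow as tf

#  tf.clip_by_norm rescales a tensor whose L2 norm exceeds clip_norm
g = tf.constant([3.0, 4.0])        # L2 norm is 5
clipped = tf.clip_by_norm(g, 1.0)  # rescaled to [0.6, 0.8], norm 1

model = tf.keras.Sequential([tf.keras.layers.Dense(n, activation='relu') for n in [16, 8, 4]])
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.MeanSquaredError()

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 24)).astype(np.float32)
y = np.ones([50, 4], dtype=np.float32)

def train_step(x, y, clipnorm=1.0):
    #  forward pass, recording operations for automatic differentiation
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    #  clip each gradient tensor individually, like `clipnorm`
    grads = [tf.clip_by_norm(grad, clipnorm) for grad in grads]
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)

losses = [train_step(x, y) for _ in range(3)]
```

Writing the loop by hand is rarely needed for models like these, but it is useful when `fit` is not flexible enough.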

## MNIST

MNIST is a classic machine learning dataset of handwritten digits
- a contribution from [LeCun, Cortes, Burges](http://yann.lecun.com/exdb/mnist/) in 1998
- datasets & benchmarks drive ML progress: curating & contributing datasets is important work

```python
#  60,000 training & 10,000 test images of 28x28 pixels, with integer labels 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
```

```python
import matplotlib.pyplot as plt
%matplotlib inline

#  show one training image along with its label
idx = 7777
print(y_train[idx])
plt.imshow(x_train[idx], cmap='Greys')
```

We need to normalize the dataset, scaling the pixel values from integers in [0, 255] to floats in [0, 1]:

```python
x_train = x_train.astype(np.float32) / 255
x_test = x_test.astype(np.float32) / 255
```
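Normalization can also live inside the model, so raw `uint8` images can be fed in directly. A sketch using the `tf.keras.layers.Rescaling` layer (available in recent TF2 releases), applied to a few made-up pixel values:

```python
import numpy as np
import tensorflow as tf

#  multiplies every input by 1/255 and casts to float32
rescale = tf.keras.layers.Rescaling(1.0 / 255)

pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = np.asarray(rescale(pixels))
```

Keeping the scaling in the model means the same preprocessing is applied automatically at prediction time.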

```python
def print_dataset(x_train, y_train, x_test, y_test):
    print(x_train.shape, y_train.shape)
    print(x_test.shape, y_test.shape)

    #  sanity check that the normalization worked
    assert np.max(x_train) == np.max(x_test) == 1.0

print_dataset(x_train, y_train, x_test, y_test)
```

Let's try a dense network. First we need the shapes of the input & output layers:

```python
#  flattening each 28x28 image gives 784 input features
input_shape = x_train.reshape(x_train.shape[0], -1).shape[1]

#  one node for each class in the output layer
num_classes = len(np.unique(y_train))

print(input_shape, num_classes)
```
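Instead of reshaping the arrays by hand, a `Flatten` layer can do the same reshape inside the model. A minimal sketch on dummy zero images (the shapes match MNIST, the data does not):

```python
import numpy as np
import tensorflow as tf

images = np.zeros((3, 28, 28), dtype=np.float32)

#  manual flattening, as in the cell above
flat_np = images.reshape(images.shape[0], -1)

#  the same reshape done by a layer
flat_tf = np.asarray(tf.keras.layers.Flatten()(images))

#  a model that accepts 28x28 images directly
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
probs = np.asarray(model(images))
```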

```python
#  a scale hyperparameter makes it easy to increase model capacity
scale = 6
nodes = [n * scale for n in [8, 4, 2]]

#  the first layer needs the size of the flattened input
layers = [tf.keras.layers.Dense(nodes[0], input_shape=(input_shape,), activation='relu')]

layers += [tf.keras.layers.Dense(n, activation='relu') for n in nodes[1:]]

#  softmax outputs a probability for each class
layers += [tf.keras.layers.Dense(num_classes, activation='softmax')]

dense = tf.keras.Sequential(layers)

opt = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)

#  the sparse loss works directly with integer labels (no one-hot encoding)
dense.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

dense.summary()
```
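The parameter counts in `dense.summary()` can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A sketch with a hypothetical helper `dense_param_count`, using the layer sizes from the cell above (784 inputs, `scale = 6`, 10 classes):

```python
import tensorflow as tf

def dense_param_count(sizes):
    """Hypothetical helper: total parameters of a stack of Dense layers.

    sizes = [n_inputs, units_1, units_2, ...]
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

sizes = [784, 48, 24, 12, 10]
total = dense_param_count(sizes)  # 37680 + 1176 + 300 + 130 = 39286

#  cross-check against Keras
check = tf.keras.Sequential(
    [tf.keras.Input(shape=(784,))]
    + [tf.keras.layers.Dense(n, activation='relu') for n in sizes[1:-1]]
    + [tf.keras.layers.Dense(sizes[-1], activation='softmax')]
)
```

Almost all of the parameters sit in the first layer, so increasing `scale` grows the model roughly linearly.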

```python
hist = dense.fit(
    x_train.reshape(x_train.shape[0], -1),
    y_train,
    epochs=15,
    validation_data=(x_test.reshape(x_test.shape[0], -1), y_test),
    batch_size=128,
    #  log training curves to ./logs for TensorBoard (TF1-only arguments removed)
    callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=False, embeddings_freq=0, embeddings_metadata=None, update_freq='epoch')]
)
```

```python
pd.DataFrame(hist.history).loc[:, ['accuracy', 'val_accuracy']].plot()
```

```python
pd.DataFrame(hist.history).loc[:, ['loss', 'val_loss']].plot()
```

## Early stopping

A no-brainer :) Early stopping ends training once a monitored metric (by default `val_loss`) stops improving.

```python
#  IPython / Jupyter: show the docstring for EarlyStopping
tf.keras.callbacks.EarlyStopping?
```
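The core logic of early stopping, as a simplified pure-Python sketch (the function `early_stopping` is hypothetical, not the Keras implementation). It follows the documented meaning of `patience`, the number of epochs with no improvement after which training stops; `restore_best_weights=True` corresponds to rolling back to `best_epoch`.

```python
def early_stopping(val_losses, patience=0, min_delta=0.0):
    """Return (stopped_epoch, best_epoch) for a sequence of validation losses."""
    best = float('inf')
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            #  improvement: remember this epoch and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    #  never triggered: training ran for every epoch
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.9, 0.85, 0.7]
print(early_stopping(losses, patience=0))  # (2, 1): stops at the first bad epoch
print(early_stopping(losses, patience=3))  # (4, 4): waits long enough to see 0.7
```

With the default `patience=0`, a single noisy epoch is enough to stop training, so a small positive `patience` is often worth trying.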

```python
scale = 5
nodes = [n * scale for n in [8, 4, 2]]

layers = [tf.keras.layers.Dense(nodes[0], input_shape=(input_shape,), activation='relu')]

layers += [tf.keras.layers.Dense(n, activation='relu') for n in nodes[1:]]

#  randomly drop roughly 20% of the activations during training to reduce overfitting
layers += [tf.keras.layers.Dropout(0.2)]

layers += [tf.keras.layers.Dense(num_classes, activation='softmax')]

dense = tf.keras.Sequential(layers)

opt = tf.keras.optimizers.Adam(learning_rate=0.001)

dense.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

dense.summary()
```
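The model above also adds a `Dropout(0.2)` layer. A quick sketch of what it does: during training it zeroes roughly 20% of its inputs at random and scales the rest by 1 / (1 - 0.2) = 1.25, while at inference it passes inputs through unchanged. Keras sets the `training` flag automatically inside `fit` and `predict`; here it is passed explicitly.

```python
import numpy as np
import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 1000), dtype=np.float32)

#  training: random units set to 0, survivors scaled up by 1.25
train_out = np.asarray(dropout(x, training=True))

#  inference: identity
infer_out = np.asarray(dropout(x, training=False))

zero_fraction = np.mean(np.isclose(train_out, 0.0))
```

The scaling keeps the expected value of each activation the same in training and inference.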

```python
hist = dense.fit(
    x_train.reshape(x_train.shape[0], -1),
    y_train,
    epochs=20,
    validation_data=(x_test.reshape(x_test.shape[0], -1), y_test),
    batch_size=128,
    #  monitors val_loss by default; restore_best_weights keeps the weights from the best epoch
    callbacks=[tf.keras.callbacks.EarlyStopping(restore_best_weights=True)]
)
```

```python
pd.DataFrame(hist.history).loc[:, ['loss', 'val_loss']].plot()
```

## Keras Functional API

A simple feedforward neural network, set up for MNIST:

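```python
#  each layer is called on the output of the previous one
features = tf.keras.Input(shape=(784,))

h1 = tf.keras.layers.Dense(64, activation='relu')(features)
h2 = tf.keras.layers.Dense(32, activation='relu')(h1)
classes = tf.keras.layers.Dense(10, activation='softmax')(h2)

model = tf.keras.Model(inputs=features, outputs=classes)

#  a fresh optimizer: the previous `opt` has already been used to train `dense`
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
```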

```python
hist = model.fit(x_train.reshape(x_train.shape[0], -1), y_train)
```
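The softmax layer outputs one probability per class, and `sparse_categorical_crossentropy` is the negative log of the probability assigned to the true (integer) label. A sketch on made-up probabilities for three classes, checking that it matches the one-hot `categorical_crossentropy` and a manual computation, and using `argmax` to turn probabilities into predicted labels:

```python
import numpy as np
import tensorflow as tf

#  illustrative values, not model outputs
labels = np.array([2, 0, 1])
probs = np.array([[0.1, 0.2, 0.7],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.4, 0.3]], dtype=np.float32)

#  integer labels with the sparse loss
sparse = np.asarray(tf.keras.losses.sparse_categorical_crossentropy(labels, probs))

#  the same loss with one-hot labels
dense_ce = np.asarray(tf.keras.losses.categorical_crossentropy(tf.one_hot(labels, depth=3), probs))

#  by hand: -log(probability of the true class)
manual = -np.log(probs[np.arange(len(labels)), labels])

#  predicted class = index of the largest probability
predicted = np.argmax(probs, axis=1)
```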