Welcome to the Deep Learning Lab :)  
Before starting this journey, here are a couple of ways to **load data into Colab** (in case you haven't done it before).  
Colab can generate these scripts for you: click on the code icon (<>) on the left bar and select the code snippet you want.

**Loading files from your local drive**

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

You can access uploaded files via the folder icon on the left bar. You can also manipulate files by code, of course :)

In [None]:
with open('plain_text_file.txt', 'r') as f:
  for line in f:
    print(line, end='')  # each line already ends with its own newline

Or you can access your Google Drive directly!

In [None]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive

In [None]:
!ls


# Introduction to Keras



Keras is a high-level, flexible library for deep learning experiments.  
It is tightly integrated with Tensorflow, which provides low-level support.  

**Warning:** Unless you are confident with what you are doing, in the beginning it is better if you stick with Keras as much as possible.  
Use Tensorflow only when there is no alternative.

## Where do I start if I want to learn Keras?

Getting started *guide*: https://keras.io/getting_started/  
You may want to choose *Introduction to Keras for Engineers*.

Developer guide: https://keras.io/guides/  

API documentation: https://keras.io/api/

In [None]:
from tensorflow import keras as K
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
%matplotlib inline

## Data manipulation

In [None]:
help(make_classification)

In [None]:
N_CLASSES = 3
N_PATTERNS_PER_CLASS = 5000

N_PATTERNS = N_CLASSES * N_PATTERNS_PER_CLASS
X, y = make_classification(n_samples=N_PATTERNS, n_classes=N_CLASSES, n_informative=5)

In [None]:
X.shape, y.shape

In [None]:
X.dtype, y.dtype

In [None]:
test_size = int(0.25 * y.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=True, stratify=y)

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

### Enter Keras

Well, actually you can use numpy arrays directly in Keras, you don't have to do much...  
You can also use Python generators.

But believe me, you will need something more advanced sooner or later.   
Let's build a real Keras (actually, a Tensorflow) `Dataset` from those arrays!


In [None]:
# built from the training split: this dataset is used for training later on
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))

Ideally, you would like to use your dataset in a **training loop**

In [None]:
for x, y in dataset:
    print(x.shape)
    break

mmmm... no batch size?

In [None]:
dataset = dataset.shuffle(buffer_size=1024).batch(32)
for x, y in dataset:
    print(x.shape)
    break

You can create a dataset from different sources (e.g. files on your hard disk).
For example, try to build a dataset [from a `csv` file](https://www.tensorflow.org/api_docs/python/tf/data/experimental/make_csv_dataset).
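
As one more possible source, here is a minimal sketch that wraps a Python generator in a `Dataset`. The generator `pattern_generator` and its random data are hypothetical, made up only for illustration; `output_signature` assumes a reasonably recent TensorFlow 2 release.

```python
import numpy as np
import tensorflow as tf

def pattern_generator():
    """Hypothetical generator yielding 100 (features, label) pairs."""
    rng = np.random.default_rng(0)
    for _ in range(100):
        features = rng.normal(size=(20,)).astype(np.float32)
        label = int(rng.integers(0, 3))
        yield features, label

# output_signature tells TensorFlow the shape and dtype of each yielded element
gen_dataset = tf.data.Dataset.from_generator(
    pattern_generator,
    output_signature=(
        tf.TensorSpec(shape=(20,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

# take the first batch of 32 patterns and inspect its shapes
gen_features, gen_labels = next(iter(gen_dataset.batch(32)))
print(gen_features.shape, gen_labels.shape)
```

Each iteration yields two elements here (features and labels); a generator can yield more, as long as the signature matches.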

**Exercise**: try to get acquainted with Tensorflow datasets. Try to build data from different sources (tensors, csv files, plain text file, folder structure...). Try to build datasets with one or more elements per iteration. 

### Data Preprocessing

In [None]:
# in older TF versions these layers live in tensorflow.keras.layers.experimental.preprocessing
from tensorflow.keras.layers import Rescaling, Normalization

These classes take care of preprocessing your dataset. They can also hold a state (e.g. the mean and variance of your data), which is updated with the `adapt` call (the counterpart of the `fit` method in scikit-learn).  
**N.B.** the call to `adapt` changes the internal state of the normalizer, not the data!

In [None]:
norm_layer = Normalization(axis=-1)
norm_layer.adapt(X_train)
print(norm_layer.mean, norm_layer.variance)  # a bare expression in the middle of a cell is not displayed
normalized_X_train = norm_layer(X_train)
print(np.mean(normalized_X_train))
print(np.var(normalized_X_train))

`Rescaling` and `Normalization` operate very similarly. However, `Rescaling` does not require the call to `adapt` since it has no internal state.
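
A small sketch of `Rescaling` at work, assuming a TensorFlow version that exposes the layer as `tf.keras.layers.Rescaling`. The pixel values are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# made-up "pixel" values in the range [0, 255]
pixel_values = np.array([[0.0, 127.5, 255.0]], dtype=np.float32)

# the layer multiplies its input by `scale`; there is nothing to adapt
rescale_layer = tf.keras.layers.Rescaling(scale=1.0 / 255)
rescaled_values = np.asarray(rescale_layer(pixel_values))
print(rescaled_values)
```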

In [None]:
help(Rescaling)

## Functional API 

This is one of the powerful features of Keras. **Easily build complex models!**

A model is composed of **layers**.

A layer takes an input and returns an output (usually by using adaptive parameters).  
A model is built by composing many layers; it also exposes a richer interface, with methods for training, inference, etc.
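
To make the "layer as a function with parameters" idea concrete, here is a minimal sketch that calls a single `Dense` layer on a dummy input. The names `demo_layer` and `demo_inputs` are only illustrative.

```python
import tensorflow as tf

# a layer with 4 output units; its parameters are created on the first call
demo_layer = tf.keras.layers.Dense(units=4, activation='relu')

demo_inputs = tf.ones([2, 3])           # batch of 2 patterns with 3 features each
demo_outputs = demo_layer(demo_inputs)  # forward pass through the layer

print(demo_outputs.shape)
print([tuple(weight.shape) for weight in demo_layer.weights])  # kernel and bias
```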

In [None]:
model = K.Sequential()  # Sequential: the simplest container, a linear stack of layers
model.add(K.layers.Dense(units=64, activation='relu'))
model.add(K.layers.Dense(units=N_CLASSES, activation='softmax'))
#model.summary() # will fail! what is the input of this model?

Keras does not require you to specify the `Input` of a model.  
Instead, it tries to dynamically infer the model input layer when you call it with data. However, you can always specify it explicitly.
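
A sketch of the explicit variant, assuming 20 input features and 3 classes as in the data generated earlier. With the input declared up front, the weights exist immediately and `summary()` works before any training. The name `explicit_model` is only illustrative.

```python
import tensorflow as tf

explicit_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                      # 20 features per pattern
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=3, activation='softmax'),
])
explicit_model.summary()  # the model is already built, so this does not fail
```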

We need **sparse_categorical_crossentropy** because we are *not* dealing with one-hot targets but with numerical targets.  
You can use **categorical_crossentropy** if the targets are one-hot encoded.  
Keras can convert to one-hot: `K.utils.to_categorical`.  
[You can also use scikit-learn to encode your targets in one-hot form](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html).
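
The following sketch checks, on three made-up predictions, that the two losses agree once the integer targets are converted to one-hot form. All values are illustrative.

```python
import numpy as np
import tensorflow as tf

int_targets = np.array([0, 2, 1])  # numerical (sparse) targets
one_hot_targets = tf.keras.utils.to_categorical(int_targets, num_classes=3)

# made-up predicted class probabilities, one row per pattern
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.2, 0.6, 0.2]], dtype=np.float32)

sparse_loss = float(tf.keras.losses.SparseCategoricalCrossentropy()(int_targets, predicted))
dense_loss = float(tf.keras.losses.CategoricalCrossentropy()(one_hot_targets, predicted))
print(one_hot_targets)
print(sparse_loss, dense_loss)
```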

### Training

Keras metrics description: https://keras.io/api/metrics/

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=64)
model.summary()

You can also pass a Keras dataset to `fit`. Try it out!

Since I did not specify the `Input`, what will happen if I change the input size?

In [None]:
# model.fit(X_train[:, :3], y_train, epochs=10, batch_size=64)

What's happening under the hood of `fit`?  
Basically, it loops over the training set and, for each batch, computes the model predictions, the loss and the gradients, then updates the model parameters (and much more). We will see how to implement a basic fit from scratch later on. We will need Tensorflow for that.

### Evaluation

`evaluate` returns the loss and all the metrics you previously specified in `compile`.

In [None]:
metrics = model.evaluate(X_test, y_test)
metrics # loss, accuracy

In [None]:
predictions = model.predict(X_test[:20])
print(predictions.shape)
predictions.argmax(axis=1)

You can also use metrics standalone: instantiate one, then call `update_state`, `result` and `reset_state` (named `reset_states` in older versions).
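
A minimal sketch of standalone usage, on made-up labels and predictions. It assumes a version where the reset method is called `reset_state`.

```python
import numpy as np
import tensorflow as tf

accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()

true_labels = np.array([0, 1, 2])
predicted_probs = np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.7, 0.1],
                            [0.6, 0.3, 0.1]], dtype=np.float32)

# accumulate statistics (can be called once per batch)
accuracy_metric.update_state(true_labels, predicted_probs)
accuracy_before_reset = float(accuracy_metric.result())
print(accuracy_before_reset)

# clear the accumulated statistics, e.g. at the start of a new epoch
accuracy_metric.reset_state()
accuracy_after_reset = float(accuracy_metric.result())
print(accuracy_after_reset)
```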

### Save your model and load it again

This is fundamental for managing long training processes and for using your trained model for inference.

`Model serialization` also helps when training on Colab (the runtime can disconnect after a while).

In [None]:
# save weights, optimizer state, model topology
model.save('my_model.h5') # common file format to save models
del model
# the loaded model is compiled only if the saved model was compiled
loaded_model = K.models.load_model('my_model.h5')

Alternatively, you can only save and load the weights. Try the `save_weights` and `load_weights`.  
Model saving guide: https://keras.io/guides/serialization_and_saving/

### Functional API v2

***Alternative (but similar) way to build the same model: declare the `Input` explicitly and call each layer on the output of the previous one.***

`None` is used in a tensor shape for a dimension whose size is not known in advance. In the functional API, the batch dimension is added by default with size `None`.

In [None]:
# input layer
inputs = K.Input(shape=(20,)) # here the size is (None, 20)
x = norm_layer(inputs)
x = K.layers.Dense(units=64, activation='relu')(x)
outputs = K.layers.Dense(units=N_CLASSES, activation='softmax')(x)
model = K.Model(inputs=inputs, outputs=outputs)
model.summary()

In [None]:
model.compile(optimizer=K.optimizers.SGD(learning_rate=1e-2), loss=K.losses.SparseCategoricalCrossentropy())
model.fit(X_train, y_train, batch_size=64, epochs=10)

In [None]:
metrics = model.evaluate(X_test, y_test)
metrics

## Use validation dataset and plot learning curves

In [None]:
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, 
                                        test_size=int(0.25*y_train.shape[0]), shuffle=True, stratify=y_train)
X_train.shape, y_train.shape, X_valid.shape, y_valid.shape

We should re-adapt the `Normalization` layer, because its statistics were also computed on data that now belongs to the validation set! Anyway...
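
For reference, a sketch of what re-adapting would look like, shown on synthetic arrays rather than on the notebook data. The names `toy_train`, `toy_valid` and `fresh_norm_layer` are hypothetical, and the layer is assumed to be available as `tf.keras.layers.Normalization`.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(300, 4)).astype(np.float32)
toy_valid = rng.normal(loc=5.0, scale=2.0, size=(100, 4)).astype(np.float32)

# statistics are estimated on the training split only
fresh_norm_layer = tf.keras.layers.Normalization(axis=-1)
fresh_norm_layer.adapt(toy_train)

# the same (training) statistics are then applied to both splits
toy_train_normalized = np.asarray(fresh_norm_layer(toy_train))
toy_valid_normalized = np.asarray(fresh_norm_layer(toy_valid))
print(toy_train_normalized.mean(axis=0), toy_train_normalized.var(axis=0))
```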

In [None]:
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_valid, y_valid))
vars(history)

In [None]:
help(K.Model.fit)

In [None]:
help(K.callbacks.History)

Mmm... callbacks? Seems interesting!

In [None]:
plt.plot(history.history['loss'], 'b-', label='train_loss')
plt.plot(history.history['val_loss'], 'r--', label='validation_loss')
plt.legend(loc='best')

**Exercise**: at this point you have more or less all you need to perform basic DL experiments. Try to train your first model on a Keras dataset and see what happens. Do not focus on performance, but rather on setting up your code to be reused later.

## Checkpointing the model

Callbacks are objects whose methods Keras calls automatically at particular moments of training (e.g. at the end of each epoch). You can pass callbacks to the `fit` function.
https://keras.io/api/callbacks/
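
To see when a callback is invoked, here is a sketch of a custom one. `EpochLogger`, the toy model and the random data are hypothetical and exist only for illustration.

```python
import numpy as np
import tensorflow as tf

class EpochLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback that records and prints the loss after each epoch."""

    def __init__(self):
        super().__init__()
        self.epoch_losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epoch_losses.append(logs.get('loss'))
        print(f"epoch {epoch} finished, loss = {logs.get('loss')}")

rng = np.random.default_rng(0)
toy_X = rng.normal(size=(64, 4)).astype(np.float32)
toy_y = rng.integers(0, 3, size=(64,))

toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
toy_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd')

epoch_logger = EpochLogger()
toy_model.fit(toy_X, toy_y, epochs=3, batch_size=16, verbose=0,
              callbacks=[epoch_logger])
```

Keras calls `on_epoch_end` by itself, once per epoch; the training code never invokes it directly.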

In [None]:
help(K.callbacks.ModelCheckpoint)

In [None]:
callback_list = [K.callbacks.ModelCheckpoint(
                    filepath='model_{epoch}.h5',  # explicit extension: recent Keras versions require one
                    save_freq='epoch')]

In [None]:
history = model.fit(dataset, epochs=10, callbacks=callback_list)
metrics = model.evaluate(X_test, y_test)
print(metrics)

## Eager execution vs. compiled execution!

Compiled means that each line of code adds a component to the **computational graph** instead of executing what the line states.  
After PyTorch came in, the advantages of eager execution became evident, especially when it comes to rapid prototyping and debugging. In eager execution, each line of code is immediately executed and the results are returned to the user (imperative programming interface). You can use `print` and a debugger to see the results of your operations.  
For deployment in real-world applications, compiled execution is far more efficient (PyTorch now supports this mode as well).

Keras runs the model in compiled mode by default, while TF itself now uses eager execution by default. That means you cannot debug a Keras model line by line or through prints. To enable eager execution, set `run_eagerly=True` in the `compile` call.  
In TF you can use `tf.function` to switch to compiled (graph) execution: https://www.tensorflow.org/guide/function
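
A sketch of `run_eagerly=True` in practice. The loss `debug_mse`, the list `eager_flags` and the random data are hypothetical; the loss simply records whether it is being executed eagerly, which is what makes prints and debuggers usable inside it.

```python
import numpy as np
import tensorflow as tf

eager_flags = []  # filled by the loss below, for illustration only

def debug_mse(y_true, y_pred):
    """Hypothetical loss that records whether it runs eagerly."""
    eager_flags.append(tf.executing_eagerly())
    return tf.reduce_mean(tf.square(y_true - y_pred))

rng = np.random.default_rng(0)
debug_X = rng.normal(size=(32, 4)).astype(np.float32)
debug_y = rng.normal(size=(32, 1)).astype(np.float32)

debug_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
# run_eagerly=True makes the training step run line by line, without graph compilation
debug_model.compile(optimizer='sgd', loss=debug_mse, run_eagerly=True)
debug_model.fit(debug_X, debug_y, epochs=1, batch_size=16, verbose=0)
print(eager_flags[-1])
```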

## GPU or CPU

TF and Keras automatically use GPU, when available.
You can specify where to send each tensor explicitly, if you prefer. 

In [None]:
import tensorflow as tf
with tf.device('/CPU:0'):
  a = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

In [None]:
a.device

Check whether a GPU is available

In [None]:
len(tf.config.list_physical_devices('GPU'))

In [None]:
b = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b.device

# Low Level API - Tensorflow

Tensorflow is the "backend" of Keras: https://www.tensorflow.org/overview

In [None]:
print(tf.executing_eagerly())

In [None]:
x = tf.constant([1,2,3,4,5,6], dtype=tf.float32)
print(x, x.shape)
print(x.numpy())

In [None]:
print(tf.cast(x, tf.float64))
print(tf.cast(x, tf.int32))
#print(tf.cast(x, tf.string)) # error!
print(tf.strings.as_string(x))


In [None]:
for el in x:
  print(el)
  print(float(el))
  break

In [None]:
new_x = tf.reshape(x, [2,3])
print(new_x, new_x.shape)

In [None]:
# indexing
print(new_x[:, 0])
print(new_x[:, :])
print(new_x[0, 2])
print(new_x[-1, :])

In [None]:
tf.transpose(new_x, [1, 0])

In [None]:
tf.random.uniform(minval=0, maxval=1, shape=(3,4), dtype=tf.float32)

Check out also `tf.zeros`, `tf.ones`.

A `Variable` is a tensor with a state you can update

In [None]:
w = tf.Variable(x) # set x as initial value for w

In [None]:
print(w.assign(x + 1)) # + operator does broadcast
print(w.assign_add(x))
# print(w.assign_add(1)) # assign_add does not broadcast

## Compute Gradients

In [None]:
print(w)
with tf.GradientTape() as tape:
    tape.watch(w)
    w_squared =  tf.square(w)
    grad = tape.gradient(w_squared, w)
print(grad)

Does `w` need to be a `Variable`?

In [None]:
print(x)
with tf.GradientTape() as tape:
    tape.watch(x) # what happens if you remove this line?
    x_squared =  tf.square(x)
    grad = tape.gradient(x_squared, x)
print(grad)

A `Variable` is watched automatically (Tensorflow assumes that you will be interested in that gradient).

In [None]:
print(w)
with tf.GradientTape() as tape:
    w_squared =  tf.square(w)
    grad = tape.gradient(w_squared, w)
print(grad)

Second-order derivatives??

In [None]:
# second derivatives
print(w)
with tf.GradientTape() as tape:
    with tf.GradientTape() as tape_inner:
        w_squared =  tf.square(w)
        grad = tape_inner.gradient(w_squared, w)
    print(grad)
    grad2 = tape.gradient(grad, w)
print(grad2)

### From eager to compiled

In [None]:
@tf.function  # python decorator
def compiled_function(x):
  y = x * 3
  print("Compiled tensor: ", y)
  return y

In [None]:
out = compiled_function(tf.Variable(tf.ones([2, 5], tf.int32)))
print("Eager result: ", out)

Compilation prevents the values of the tensor inside the compiled function from being printed.  
What is actually printed is the symbolic tensor, i.e. the node of the computational graph, and only while the function is being traced.
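
If you do need the values at run time, `tf.print` is one option. The sketch below contrasts it with Python's `print`; the function `traced_triple` and the list `trace_log` are made up for illustration.

```python
import tensorflow as tf

trace_log = []  # Python side effects happen only while the function is traced

@tf.function
def traced_triple(t):
    trace_log.append("traced")
    print("Python print (tracing only):", t)     # shows a symbolic tensor
    tf.print("tf.print (every call):", t * 3)    # shows the actual values
    return t * 3

first_result = traced_triple(tf.ones([2], tf.int32))
second_result = traced_triple(tf.ones([2], tf.int32))  # same signature: no new tracing
print(second_result.numpy())
```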

# Combining Keras and TF

Despite being a high-level library for DL, Keras lets you customize different parts of your DL pipeline,  
especially if you are willing to deal with TF.

In [None]:
class CustomDense(K.layers.Layer):
    def __init__(self, input_dim, units=64):
        super(CustomDense, self).__init__() # this call is needed to set up the Layer

        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype="float32"),
            trainable=True) # this means that w will be updated

        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype="float32"), 
            trainable=True)

    def call(self, inputs):
        """
        This method is automatically called when you call an instance of CustomDense
        e.g. out = custom_dense(x)
        """
        preact = tf.matmul(inputs, self.w) + self.b
        postact = tf.nn.relu(preact)
        return postact

or equivalently

In [None]:
class CustomDense2(K.layers.Layer):

    def __init__(self, units=64):
      super(CustomDense2, self).__init__()
      self.units = units
      # we are not specifying input size here...
      # ring a bell?

    def build(self, input_shape):
      """
      Lazily called when an input is provided to the model
      """
      self.w = self.add_weight(
        shape=(input_shape[-1], self.units),
        initializer="random_normal",
        trainable=True)
        
      self.b = self.add_weight(
        shape=(self.units,), initializer="zeros", trainable=True)  # zeros, as in CustomDense

    def call(self, inputs):
      preact = tf.matmul(inputs, self.w) + self.b
      postact = tf.nn.relu(preact)
      return postact

**Many** Keras predefined `Layers`: https://keras.io/api/layers/  
**Many** TF activation functions: https://www.tensorflow.org/api_docs/python/tf/nn/

In [None]:
l = CustomDense(4)
x = tf.linspace(0.0, 20.0, 20)  # float endpoints: tf.linspace expects floating-point values
out = l(tf.reshape(x, [5,4]))
print(out.shape)

In [None]:
l.weights

Ok... now what?

In [None]:
# Instantiate an optimizer and a loss.
optimizer = K.optimizers.SGD(learning_rate=1e-3)
criterion = K.losses.SparseCategoricalCrossentropy(from_logits=True)
model = CustomDense2(units=N_CLASSES)  # one output per class
for step, (x, y) in enumerate(dataset):
    # record the forward pass so that gradients can be computed
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = criterion(y, logits)
    gradients = tape.gradient(loss, model.trainable_weights)
    # you can modify gradients before updating the model
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))


A `weight` created with `trainable=False` will not appear in the `trainable_weights` list. It is listed in `non_trainable_weights` instead.
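
A quick sketch to check this on a toy layer. `ScaleAndShift` is hypothetical and serves only to show the two lists.

```python
import tensorflow as tf

class ScaleAndShift(tf.keras.layers.Layer):
    """Hypothetical layer with a trainable scale and a frozen shift."""

    def build(self, input_shape):
        self.scale = self.add_weight(
            shape=(input_shape[-1],), initializer="ones", trainable=True)
        self.shift = self.add_weight(
            shape=(input_shape[-1],), initializer="zeros", trainable=False)

    def call(self, inputs):
        return inputs * self.scale + self.shift

frozen_demo = ScaleAndShift()
_ = frozen_demo(tf.ones([2, 3]))  # the first call triggers build()

n_trainable = len(frozen_demo.trainable_weights)
n_frozen = len(frozen_demo.non_trainable_weights)
print(n_trainable, n_frozen)
```

Only the weights in `trainable_weights` would receive updates in a training loop that zips gradients with that list.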

**Exercise**: build a custom Multi Layer Perceptron (i.e. a feedforward neural network) by leveraging the modules we already created. Try and experiment with this model by training it on a dataset (either a Keras one or a fake one).

**Exercise**: try out an Autoencoder. Encoder and Decoder are both feedforward networks. Try to encode patterns of a dataset, decode them and see how much reconstruction error you get!