# Lecture 2 - TensorFlow and Keras 

## TensorFlow
*TensorFlow* is a Python-based, free, open source machine learning platform, first released in November 2015. It is much more than a single library: it is a platform, home to a vast ecosystem of components, some developed by Google and some by third parties. Much like *NumPy*, the primary purpose of TensorFlow is to let us manipulate mathematical expressions over numerical tensors. But TensorFlow goes beyond the scope of NumPy in the following ways:
- It can automatically compute the gradient of any differentiable expression.
- It can run not only on CPUs but also on GPUs and TPUs.
- Computation defined in TensorFlow can be easily distributed across many machines.
- TensorFlow programs can be exported to other runtimes, such as C++, JavaScript, or TensorFlow Lite.

### TensorFlow APIs
TensorFlow handles low-level tensor manipulation. Its core APIs cover:
- Tensors, including special tensors that store the network’s state (variables)
- Tensor operations such as addition, relu, matmul
- Backpropagation, a way to compute the gradient of mathematical expressions (handled in TensorFlow via the GradientTape object).
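
As a small illustration of the `GradientTape` object mentioned in the list, the sketch below differentiates a scalar expression with respect to a variable. The expression and the starting value are chosen only for illustration.

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations executed inside the tape's scope are recorded for differentiation.
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x  # y = x^2 + 2x

# dy/dx = 2x + 2, evaluated at x = 3
grad = tape.gradient(y, x)
print(grad.numpy())  # 8.0
```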

### Constant tensors and variables

In [None]:
import tensorflow as tf
import numpy as np

#### All-ones or all-zeros tensors

In [None]:
x = tf.ones(shape=(2, 1))
y = np.ones((2,1))
print("x=",x)
print("y=",y)

In [None]:
x = tf.zeros(shape=(2, 4, 3))
y = np.zeros(shape=(2, 4, 3))
x,y

#### Random tensors

In [None]:
x = tf.random.normal(shape=(3, 4), mean=0., stddev=1.)
x

In [None]:
x = tf.random.uniform(shape=(4, 6), minval=-10, maxval=10.)
x

#### TensorFlow tensors aren’t assignable: they’re constant. To manage modifiable state in TensorFlow, we need the `tf.Variable` class.

In [None]:
x = np.ones(shape=(2, 2))
x[0, 0] = 0

In [None]:
x = tf.ones(shape=(2, 2))
x[0, 0] = 0.  # Fails with a TypeError: TensorFlow tensors do not support item assignment

In [None]:
v = tf.Variable(initial_value=tf.random.normal(shape=(2, 3)))
v

In [None]:
v.assign(tf.ones((2, 3)))
v

In [None]:
v[0, 0].assign(0)
v

In [None]:
v.assign_add(tf.ones((2, 3)))
v
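
To see the effect of each update method on concrete numbers, here is a compact sketch that chains `assign()`, sliced `assign()`, `assign_add()`, and `assign_sub()` on one small variable. The values are arbitrary and chosen so the result is easy to follow by hand.

```python
import tensorflow as tf

v = tf.Variable(tf.zeros((2, 2)))

v.assign(tf.ones((2, 2)))             # [[1, 1], [1, 1]]
v[0, 0].assign(5.0)                   # [[5, 1], [1, 1]]
v.assign_add(tf.ones((2, 2)))         # [[6, 2], [2, 2]]  (equivalent of +=)
v.assign_sub(2.0 * tf.ones((2, 2)))   # [[4, 0], [0, 0]]  (equivalent of -=)

print(v.numpy())
```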

### Tensor operations

In [None]:
a = tf.Variable(initial_value=[[1.,2.],[3.,4.]])
a

In [None]:
b = tf.square(a)
b

In [None]:
c = tf.sqrt(a)
c

In [None]:
d = b + c
d

In [None]:
e = tf.matmul(b,c)
e

In [None]:
e *= d
e
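
Note that `*` and `+` act element-wise, whereas `tf.matmul` computes a matrix product. The following sketch contrasts the two on a small constant matrix (the values are illustrative).

```python
import tensorflow as tf

a = tf.constant([[1., 2.], [3., 4.]])

elementwise = a * a               # multiplies matching entries
matrix_product = tf.matmul(a, a)  # rows of a times columns of a

print(elementwise.numpy())     # [[ 1.  4.] [ 9. 16.]]
print(matrix_product.numpy())  # [[ 7. 10.] [15. 22.]]
```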

### Using TensorFlow on the MNIST dataset

In [None]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255  
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255 
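
The preprocessing above flattens each 28 × 28 image into a vector of 784 values and rescales pixel intensities from [0, 255] to [0, 1]. The sketch below shows the same two steps on a small synthetic array (random pixels standing in for MNIST, so no download is needed).

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale images (not real MNIST data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((len(fake_images), 28 * 28))  # (5, 784)
scaled = flat.astype("float32") / 255                    # values in [0, 1]

print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```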

#### A DENSE CLASS

In [None]:
import tensorflow as tf
  
class NaiveDense:
    '''creates two TensorFlow variables, W and b, and exposes a __call__() method that applies the preceding transformation'''
    def __init__(self, input_size, output_size, activation):
        self.activation = activation
        #Create a matrix, W, of shape (input_size, output_size), initialized with random values
        w_shape = (input_size, output_size)                                
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
        #Create a vector, b, of shape (output_size,), initialized with zeros
        b_shape = (output_size,)                                          
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)
  
    #Apply the forward pass:
    def __call__(self, inputs):                                          
        return self.activation(tf.matmul(inputs, self.W) + self.b)
  
    #Convenience method for retrieving the layer’s weights:
    @property
    def weights(self):                                                     
        return [self.W, self.b]
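
The forward pass of such a layer boils down to `activation(matmul(inputs, W) + b)`. As a shape check, the sketch below applies that expression directly to a dummy batch, using the same shapes as the first layer of the MNIST model (the batch size of 4 is an arbitrary choice).

```python
import tensorflow as tf

batch = tf.ones((4, 784))                                  # 4 flattened images
W = tf.random.uniform((784, 512), minval=0, maxval=1e-1)   # (input_size, output_size)
b = tf.zeros((512,))                                       # (output_size,)

# b is broadcast across the 4 rows of the matmul result.
output = tf.nn.relu(tf.matmul(batch, W) + b)
print(output.shape)  # (4, 512)
```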

#### A SEQUENTIAL CLASS

In [None]:
class NaiveSequential:
    def __init__(self, layers):
        '''wraps a list of layers'''
        self.layers = layers

    def __call__(self, inputs):
        '''calls the underlying layers on the inputs, in order'''
        x = inputs
        for layer in self.layers:
           x = layer(x)
        return x

    @property 
    def weights(self):
        '''keeps track of the layers’ parameters'''
        weights = []
        for layer in self.layers:
           weights += layer.weights
        return weights

Using this `NaiveDense` class and this `NaiveSequential` class, we can create a mock Keras model:

In [None]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
]) 
assert len(model.weights) == 4 

#### A BATCH GENERATOR

In [None]:
import math
  
class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)
 
    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
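
Because `num_batches` is rounded up with `math.ceil`, the last batch may be smaller than `batch_size`. The sketch below reproduces the slicing logic on a toy array of 10 items. For MNIST, the same computation gives `ceil(60000 / 128) = 469` batches, the last one holding 96 samples.

```python
import math
import numpy as np

data = np.arange(10)
batch_size = 4
num_batches = math.ceil(len(data) / batch_size)  # 3

# Same slicing pattern as BatchGenerator.next(), written as a list comprehension.
batches = [data[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]
sizes = [len(batch) for batch in batches]
print(num_batches, sizes)  # 3 [4, 4, 2]
```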

#### Running one training step

In [None]:
def one_training_step(model, images_batch, labels_batch):
    #Run the “forward pass” (compute the model’s predictions under a GradientTape scope):
    with tf.GradientTape() as tape:                                         
        predictions = model(images_batch)                                   
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(labels_batch, predictions)                                      
        average_loss = tf.reduce_mean(per_sample_losses)                    
    #Compute the gradient of the loss with regard to the weights:
    gradients = tape.gradient(average_loss, model.weights)
    #Update the weights using the gradients:
    update_weights(gradients, model.weights)
    return average_loss

In [None]:
learning_rate = 1e-3 
  
def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate) #assign_sub is the equivalent of -= for TensorFlow variables

Instead of writing the update rule by hand, we could use an `Optimizer` instance from Keras.
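
A minimal sketch of that alternative, assuming plain SGD with the same learning rate as the manual version: `update_weights` delegates to the optimizer's `apply_gradients()` method. The small variable `w` and its gradient are made up for the demonstration.

```python
import tensorflow as tf
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    # The optimizer applies w <- w - learning_rate * g for each pair.
    optimizer.apply_gradients(zip(gradients, weights))

# Toy check on a single variable with a hand-picked gradient.
w = tf.Variable([1.0, 2.0])
update_weights([tf.constant([10.0, 10.0])], [w])
print(w.numpy())  # approximately [0.99 1.99]
```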

#### The full training loop:

In [None]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")

In [None]:
fit(model, train_images, train_labels, epochs=10, batch_size=128)

#### Evaluating the model:

In [None]:
predictions = model(test_images)
predictions = predictions.numpy()  #converts the TensorFlow tensor to a NumPy array
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

## Keras
*Keras* was released in March 2015. It is a deep learning API for Python that provides a convenient way to define and train any kind of deep learning model. Keras was initially developed for research, with the aim of enabling fast deep learning experimentation. 

Keras was originally built on top of *Theano*, another tensor-manipulation library that provided automatic differentiation and GPU support—the earliest of its kind. Theano, developed at the Montréal Institute for Learning Algorithms (MILA) at the Université de Montréal, was a precursor of TensorFlow. It pioneered the idea of using static computation graphs for automatic differentiation and for compiling code to both CPU and GPU. In late 2015, after the release of TensorFlow, Keras was refactored to a multibackend architecture: it became possible to use Keras with either Theano or TensorFlow. By September 2016, TensorFlow had reached a level of technical maturity where it became possible to make it the default backend option for Keras.

### Properties of Keras:
- Through TensorFlow, Keras can run on different types of hardware (GPU, TPU, or CPU) and can be seamlessly scaled to thousands of machines.
- Keras offers consistent and simple workflows, minimizes the number of actions required for common use cases, and provides clear and actionable feedback upon user error. This makes Keras easy to learn for a beginner and highly productive for an expert.
- Keras is used by academic researchers, engineers, and data scientists at Google, Netflix, Uber, CERN, NASA, Yelp, Instacart, Square, and hundreds of startups working on a wide range of problems across every industry.
- Keras enables a wide range of different workflows, from the very high level to the very low level, corresponding to different user profiles.

### Keras APIs
Keras APIs are used for high-level deep learning concepts:
- Layers, which are combined into a model
- A loss function, which defines the feedback signal used for learning
- An optimizer, which determines how learning proceeds
- Metrics to evaluate model performance, such as accuracy
- A training loop that performs mini-batch stochastic gradient descent

## Practical issues concerning workspace
It’s highly recommended to run deep learning code on a modern NVIDIA GPU rather than a computer’s CPU. Some applications—in particular, image processing with convolutional networks—will be excruciatingly slow on CPU, even a fast multicore CPU. There are three options to do deep learning on a GPU:

- Use the free GPU runtime from Colaboratory https://colab.research.google.com.
- Use GPU instances on Google Cloud or AWS EC2.
- Buy and install a physical NVIDIA GPU on your workstation.

Colaboratory is the easiest way to get started, as it requires no hardware purchase and no software installation. However, the free version of Colaboratory is only suitable for small workloads. Running deep learning experiments in the cloud is a simple, low-cost way to move to larger workloads without having to buy any additional hardware. Nevertheless, this setup isn’t sustainable in the long term, or even for more than a few months. For heavy users of deep learning, setting up a local workstation with one or more GPUs is the best solution.

Moreover, it’s better to use a Unix workstation. Although it’s technically possible to run Keras on Windows directly, it is not recommended. To do deep learning on a Windows workstation, the simplest solution is to set up an Ubuntu dual boot or to leverage Windows Subsystem for Linux (WSL).



## Layer
A layer is a fundamental data structure in neural networks. It is a data processing module that takes as input one or more tensors and outputs one or more tensors. Some layers are stateless, but more frequently layers have a state: the layer’s weights, one or several tensors learned with stochastic gradient descent, which together contain the network’s knowledge.

Different types of layers are appropriate for different tensor formats and different types of data processing:
- simple vector data, stored in rank-2 tensors of shape (samples, features), is often processed by densely connected layers (the Dense class in Keras);
- sequence data, stored in rank-3 tensors of shape (samples, timesteps, features), is typically processed by recurrent layers, such as an LSTM layer, or by 1D convolution layers (Conv1D);
- image data, stored in rank-4 tensors, is usually processed by 2D convolution layers (Conv2D).

A Layer is an object that encapsulates some state (weights) and some computation. The weights are typically defined in a `build()` method (although they could also be created in the constructor, `__init__()`), and the computation is defined in the `call()` method.

In [None]:
from tensorflow import keras

class SimpleDense(keras.layers.Layer):#All Keras layers inherit from the base Layer class

    def __init__(self, units, activation=None):
        '''constructor'''
        super().__init__()
        self.units = units
        self.activation = activation

    def build(self, in_shape):
        '''Weight creation'''
        input_dim = in_shape[-1]
        self.W = self.add_weight(shape=(input_dim, self.units), initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        '''the forward pass computation '''
        y = tf.matmul(inputs, self.W) + self.b
        if self.activation is not None:
            y = self.activation(y)
        return y

Once instantiated, a layer like this can be used just like a function, taking as input a TensorFlow tensor:

In [None]:
my_dense = SimpleDense(units=32, activation=tf.nn.relu)
input_tensor = tf.ones(shape=(2, 784))
output_tensor = my_dense(input_tensor)
print(my_dense.units)
print(my_dense.W)
print(my_dense.b)
print(output_tensor)

When using Keras, we don’t have to worry about size compatibility most of the time, because the layers we add to models are dynamically built to match the shape of the incoming layer. Thus, the following model

```python
model = NaiveSequential([
    NaiveDense(input_size=784, output_size=32, activation=tf.nn.relu),
    NaiveDense(input_size=32, output_size=64, activation=tf.nn.relu),
    NaiveDense(input_size=64, output_size=32, activation=tf.nn.relu),
    NaiveDense(input_size=32, output_size=10, activation=tf.nn.softmax)
])
```

is equivalent to

```python
model = keras.Sequential([
    SimpleDense(32, activation=tf.nn.relu),
    SimpleDense(64, activation=tf.nn.relu),
    SimpleDense(32, activation=tf.nn.relu),
    SimpleDense(10, activation=tf.nn.softmax)
])
```
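
The automatic shape inference can be observed directly. In this sketch, which uses the built-in `keras.layers.Dense` rather than `SimpleDense`, the layer has no weights until it is first called. At that point `build()` runs and the kernel shape is derived from the input.

```python
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(32, activation="relu")
built_before = layer.built           # no weights exist yet

output = layer(tf.ones((2, 784)))    # first call triggers build()
built_after = layer.built

print(built_before, built_after)     # False True
print(layer.kernel.shape)            # (784, 32): input dimension inferred from the data
print(output.shape)                  # (2, 32)
```

Unlike the `NaiveDense` and `SimpleDense` classes defined in this lecture, the built-in `Dense` layer also accepts activation names given as strings.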

## Model
A model is a graph of layers; in Keras, it is represented by the `Model` class. The following are the most common network topologies:
- Sequential models
- Two-branch networks
- Multihead networks
- Residual connections

The topology of a model defines a hypothesis space. By choosing a network topology, we constrain our space of possibilities (hypothesis space) to a specific series of tensor operations, mapping input data to output data. Thus, the architecture of our model is extremely important. However, picking the right network architecture is more an art than a science, and although there are some best practices and principles we can rely on, only practice can help us become proper neural-network architects.
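
To make a non-sequential topology concrete, here is a hedged sketch of a two-branch network built with the Keras functional API, which this lecture has not introduced. The input sizes, layer widths, and names are arbitrary assumptions for illustration.

```python
from tensorflow import keras

# Two separate inputs, each processed by its own branch.
input_a = keras.Input(shape=(16,), name="branch_a")
input_b = keras.Input(shape=(8,), name="branch_b")
features_a = keras.layers.Dense(32, activation="relu")(input_a)
features_b = keras.layers.Dense(32, activation="relu")(input_b)

# The branches are merged, then mapped to a single output.
merged = keras.layers.Concatenate()([features_a, features_b])
output = keras.layers.Dense(1, activation="sigmoid")(merged)

two_branch_model = keras.Model(inputs=[input_a, input_b], outputs=output)
print(two_branch_model.output_shape)  # (None, 1)
```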

## Configuration of the learning process
Once the model architecture is defined, we still have to choose three more things:
- *Loss function* (objective function) — The quantity that will be minimized during training. It represents a measure of success for the task at hand.
- *Optimizer* — Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).
- *Metrics* — The measures of success we want to monitor during training and validation, such as classification accuracy. Unlike the loss, training will not optimize directly for these metrics. As such, metrics don’t need to be differentiable.

Once the loss, optimizer, and metrics are picked, we can use the built-in `compile()` and `fit()` methods to start training our model. The `compile()` method configures the training process and takes the arguments `optimizer`, `loss`, and `metrics`. For instance, we can configure the learning process as follows:

In [None]:
model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer="rmsprop",
              loss="mean_squared_error",
              metrics=["accuracy"])

or equivalently

In [None]:
model.compile(optimizer=keras.optimizers.RMSprop(),
              loss=keras.losses.MeanSquaredError(),
              metrics=[keras.metrics.BinaryAccuracy()])

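Passing object instances instead of strings is useful when we want to configure them, for example by passing a `learning_rate` argument to the optimizer, or when we want to use custom losses and metrics (here `my_custom_loss`, `my_custom_metric_1`, and `my_custom_metric_2` stand for user-defined functions):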

```python
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
              loss=my_custom_loss,
              metrics=[my_custom_metric_1, my_custom_metric_2])
```
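
The names `my_custom_loss` and `my_custom_metric_1` are placeholders. One possible way to fill them in, shown here purely as an assumption, is with plain functions taking `(y_true, y_pred)` and returning one value per sample: a mean absolute error as the loss and a maximum absolute error as the metric.

```python
import tensorflow as tf
from tensorflow import keras

def my_custom_loss(y_true, y_pred):
    # Hypothetical loss: mean absolute error per sample.
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

def my_custom_metric_1(y_true, y_pred):
    # Hypothetical metric: largest absolute error per sample.
    return tf.reduce_max(tf.abs(y_true - y_pred), axis=-1)

demo_model = keras.Sequential([keras.layers.Dense(1)])
demo_model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
                   loss=my_custom_loss,
                   metrics=[my_custom_metric_1])

# Quick check of the loss on hand-picked values.
print(my_custom_loss(tf.constant([[1.0], [0.0]]),
                     tf.constant([[0.5], [0.5]])).numpy())  # [0.5 0.5]
```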

Generally, we won’t have to create our own losses, metrics, or optimizers from scratch, because Keras offers a wide range of built-in options:
- Optimizers:
    - SGD (with or without momentum)
    - RMSprop
    - Adam
    - Adagrad
- Losses:
    - CategoricalCrossentropy
    - SparseCategoricalCrossentropy
    - BinaryCrossentropy
    - MeanSquaredError
    - KLDivergence
    - CosineSimilarity
- Metrics:
    - CategoricalAccuracy
    - SparseCategoricalAccuracy
    - BinaryAccuracy
    - AUC
    - Precision
    - Recall

Choosing the right loss function for the right problem is extremely important: our network will take any shortcut it can to minimize the loss, so if the objective doesn’t fully correlate with success for the task at hand, our network will end up doing things we may not have wanted. Fortunately, when it comes to common problems such as classification, regression, and sequence prediction, there are simple guidelines we can follow to choose the correct loss. For instance, we’ll use binary crossentropy for a two-class classification problem and categorical crossentropy for a many-class classification problem.
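
To make binary crossentropy concrete, the sketch below computes it by hand for two samples, using $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$, and compares the result with `keras.losses.BinaryCrossentropy`. The labels and predicted probabilities are made up for the example.

```python
import numpy as np
from tensorflow import keras

y_true = np.array([[1.0], [0.0]], dtype="float32")
y_pred = np.array([[0.9], [0.2]], dtype="float32")  # predicted probabilities

# Manual computation: -(log(0.9) + log(0.8)) / 2
manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

keras_value = float(keras.losses.BinaryCrossentropy()(y_true, y_pred))
print(manual, keras_value)  # both approximately 0.164
```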

After `compile()` comes `fit()`. The `fit()` method implements the training loop itself. These are its key arguments:
- The data (inputs and targets) to train on. It will typically be passed either in the form of NumPy arrays or a TensorFlow Dataset object. 
- The number of epochs to train for: how many times the training loop should iterate over the data passed.
- The batch size to use within each epoch of mini-batch gradient descent: the number of training examples considered to compute the gradients for one weight update step. 
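
The epochs and batch size together determine how many weight updates are performed. A quick calculation, assuming 2,000 training samples, a batch size of 128, and 100 epochs, with the last, smaller batch of each epoch still counted as one step:

```python
import math

num_samples = 2000
batch_size = 128
epochs = 100

steps_per_epoch = math.ceil(num_samples / batch_size)  # 15.625 rounds up to 16
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 16 1600
```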

In [None]:
negative_samples = np.random.multivariate_normal(   
    mean=[0, 3],                                    
    cov=[[1, 0.5],[0.5, 1]],                        
    size=1000) 
positive_samples = np.random.multivariate_normal(   
    mean=[3, 0],                                    
    cov=[[1, 0.5],[0.5, 1]],                        
    size=1000) 
inputs = np.vstack((negative_samples, positive_samples)).astype(np.float32)
targets = np.vstack((np.zeros((1000, 1), dtype="float32"),
                     np.ones((1000, 1), dtype="float32")))

In [None]:
history = model.fit(
    inputs,          
    targets,         
    epochs=100,        
    batch_size=128
)

In [None]:
history.history
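
The `history.history` attribute is a dictionary mapping each tracked quantity (such as `"loss"`) to a list with one value per epoch. The following self-contained sketch shows this on a small random dataset. The data, model, and variable names are illustrative only.

```python
import numpy as np
from tensorflow import keras

# Tiny synthetic dataset: 64 samples, 2 features, binary targets.
x = np.random.rand(64, 2).astype("float32")
y = (x[:, :1] > 0.5).astype("float32")

demo_model = keras.Sequential([keras.layers.Dense(1)])
demo_model.compile(optimizer="rmsprop", loss="mean_squared_error")
demo_history = demo_model.fit(x, y, epochs=3, batch_size=16, verbose=0)

print(list(demo_history.history.keys()))   # includes "loss"
print(len(demo_history.history["loss"]))   # 3, one entry per epoch
```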

## Validation data
The goal of machine learning is to obtain models that perform well in general, and particularly on data points that the model has never encountered before. Therefore it’s standard practice to reserve a subset of the training data as validation data. It is essential to keep the training data and validation data strictly separate.

In [None]:
model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.1),
              loss=keras.losses.MeanSquaredError(),
              metrics=[keras.metrics.BinaryAccuracy()])

indices_permutation = np.random.permutation(len(inputs))
shuffled_inputs = inputs[indices_permutation]
shuffled_targets = targets[indices_permutation]

num_validation_samples = int(0.3 * len(inputs))
val_inputs = shuffled_inputs[:num_validation_samples]
val_targets = shuffled_targets[:num_validation_samples]
training_inputs = shuffled_inputs[num_validation_samples:]
training_targets = shuffled_targets[num_validation_samples:]
model.fit(
    training_inputs,
    training_targets,
    epochs=5,
    batch_size=16,
    validation_data=(val_inputs, val_targets)
)
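
The shuffle-then-slice pattern used above guarantees that no sample ends up in both sets. A small sketch on indices only (10 samples, 30% held out for validation) makes this easy to verify:

```python
import numpy as np

num_samples = 10
indices = np.random.permutation(num_samples)

num_val = int(0.3 * num_samples)   # 3 validation samples
val_idx = indices[:num_val]
train_idx = indices[num_val:]

# The two index sets never overlap and together cover every sample.
print(set(val_idx).isdisjoint(train_idx))                             # True
print(sorted(np.concatenate([val_idx, train_idx]).tolist()) == list(range(num_samples)))  # True
```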

To compute the validation loss and metrics after the training is complete, we can call the `evaluate()` method:

In [None]:
loss_and_metrics = model.evaluate(val_inputs, val_targets, batch_size=128)
loss_and_metrics

## Inference
Inference is the use of a model after training, in particular to make predictions on new data. A naive approach would simply be to `__call__()` the model: `predictions = model(new_inputs)`. However, a better way to do inference is to use the `predict()` method, since:
- it will iterate over the data in small batches and return a `NumPy` array of predictions;
- unlike `__call__()`, it can also process TensorFlow Dataset objects.
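
The difference between the two approaches can be sketched on an untrained toy model (the model and inputs here are illustrative): calling the model processes all inputs at once and returns a TensorFlow tensor, while `predict()` works batch by batch and returns a NumPy array with the same values.

```python
import numpy as np
from tensorflow import keras

demo_model = keras.Sequential([keras.layers.Dense(1)])
new_inputs = np.random.rand(10, 2).astype("float32")

direct = demo_model(new_inputs)                                     # TensorFlow tensor
batched = demo_model.predict(new_inputs, batch_size=4, verbose=0)   # NumPy array

print(type(batched).__name__, batched.shape)        # ndarray (10, 1)
print(np.allclose(direct.numpy(), batched, atol=1e-5))
```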

In [None]:
import matplotlib.pyplot as plt
predictions = model.predict(val_inputs, batch_size=128)
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(val_inputs[:, 0], val_inputs[:, 1], c=val_targets)
ax2.scatter(val_inputs[:, 0], val_inputs[:, 1], c=predictions[:, 0] > 0.5)