We will use TensorFlow, a widely used deep learning library (now at version 2 and later), and in particular one of its subpackages, Keras, together with some utility libraries:

- Matplotlib (for visualization);
- NumPy (to work with arrays).

## GPU Runtime

Training a neural network relies on highly parallel computation, which is why it benefits from running on a GPU.
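
Whether a GPU is actually visible to TensorFlow can be checked before training. The snippet below is a minimal sketch, assuming TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the variable name `gpus` is only illustrative.

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see.
# An empty list means that training will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
for gpu in gpus:
    print(gpu.name)
```

If the list is empty, the rest of the notebook should still run, only more slowly.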


In [1]:
# Utilities
import numpy as np
import matplotlib.pyplot as plt

# Deep Learning
import tensorflow as tf
from tensorflow import keras

# Check the TensorFlow version (must be 2.0 or later)
print(tf.__version__)

2024-08-11 10:56:46.899616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1


2.4.1


## Read a dataset

To train a neural network model, we need to load a dataset into memory. There are many ways to do this, depending on the type of data you need.

We will use one of the datasets built into Keras. In particular, we are interested in:

- Fashion-MNIST: a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

We will download the dataset locally and experiment with it.
You can visualize the data [here](https://knowyourdata-tfds.withgoogle.com/#tab=STATS&dataset=fashion_mnist).

In [2]:
# Import keras dataset Fashion Mnist
from tensorflow.keras.datasets import fashion_mnist

# Load the data
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Check data dimensionality
print(f"Training set dimension: Input {x_train.shape}, Output {y_train.shape}")
print(f"Test set dimension: Input {x_test.shape}, Output {y_test.shape}")

Training set dimension: Input (60000, 28, 28), Output (60000,)
Test set dimension: Input (10000, 28, 28), Output (10000,)


We would like our input data to lie in the interval $[0, 1]$. If it does not, we can transform it as:

$$
x' = \frac{x - x_{min}}{x_{max}-x_{min}}
$$

where $x_{min} = \min(x)$ and $x_{max} = \max(x)$. Note that $x'$ always lies in the interval $[0, 1]$, provided that $x_{max} \neq x_{min}$.

In [3]:
# Normalization function (min-max scaling to [0, 1])
def normalize_data(X):
    return (X - X.min()) / (X.max() - X.min())

This operation is called normalization (min-max feature scaling). It makes the data easier to work with, speeds up the computation, and in some cases even improves the results of training.
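
To make the formula concrete, here is a small worked example on a toy array. It is only a sketch: the helper name `min_max_scale` and the pixel values are hypothetical, and the function assumes that the input is not constant (otherwise the denominator would be zero).

```python
import numpy as np

def min_max_scale(x):
    """Rescale x to [0, 1]; assumes x.max() > x.min()."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min())

toy = np.array([0, 51, 102, 255])   # hypothetical pixel values
scaled = min_max_scale(toy)
print(scaled)                        # the values 0, 0.2, 0.4 and 1
```

The smallest value is mapped to 0, the largest to 1, and everything in between is placed proportionally.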

In [4]:
print(f"Input (train) data lies in the interval [{x_train.min()}, {x_train.max()}]")
print(f"Input (test) data lies in the interval [{x_test.min()}, {x_test.max()}]")
    
x_train = normalize_data(x_train)
x_test = normalize_data(x_test)

# Check the interval after normalization
print("\n")
print(f"Input (train) data lies in the interval [{x_train.min()}, {x_train.max()}]")
print(f"Input (test) data lies in the interval [{x_test.min()}, {x_test.max()}]")

Input (train) data lies in the interval [0, 255]
Input (test) data lies in the interval [0, 255]


Input (train) data lies in the interval [0.0, 1.0]
Input (test) data lies in the interval [0.0, 1.0]


In [5]:
print(f"y[0]: {y_train[0]}")

# One-hot encode the labels (needed to compare them with the network output)
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

n_classes = len(y_train[0])

print(f"y[0] after the one-hot encoding: {y_train[0]}")

y[0]: 9
y[0] after the one-hot encoding: [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
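
The one-hot encoding shown above can be reproduced by hand with NumPy. This is a minimal sketch of the idea, not the Keras implementation; the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a (len(labels), n_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector of class i
    return np.eye(n_classes, dtype=np.float32)[labels]

encoded = one_hot([9, 0, 3], n_classes=10)
print(encoded[0])                  # a 1 in the last position, as for label 9
print(encoded.argmax(axis=1))      # argmax recovers the integer labels
```

Taking the `argmax` along the class axis inverts the encoding, which is also how a predicted class is read off a softmax output.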


## Build the first model!

Now that our data is ready, we can build our first model. As you have studied, a neural network model is defined as a sequence of *layers*, each of which is obtained by composing an affine transformation with a non-linear activation function.

The simplest possible layer is the **Dense** layer, which is the fully-connected layer describing the operation $\sigma(Ax + b)$, where $A, b$ are learnable parameters, $A$ is a full matrix, and $\sigma$ is the activation function. Since **Dense** layers apply to vectors (not images), we first need to flatten our data. This can be done either via the **Flatten** layer or via the **Reshape** layer. Moreover, every model must begin with an **Input** layer, which describes the shape of the data the model expects as input (the Sequential API can also infer this shape the first time the model is called on data).

### Summary
- Input: First Layer of the Network. It declares the shape of the data the model expects.
- Flatten: Utility Layer. It is used to flatten 3-dimensional data of the form $(d_1, d_2, c)$ to a 1-dimensional array of length $d_1 \cdot d_2 \cdot c$.
- Reshape: Utility Layer. It reshapes the input to any shape you want, as long as the total number of elements matches.
- Dense: Basic Layer. It computes an affine transformation followed by a non-linear activation function.
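
To see what Flatten and Dense compute, the forward pass of one image can be sketched in plain NumPy. This follows the notation $\sigma(Ax + b)$ used above with $\sigma$ set to ReLU; the random image and weights are placeholders, and Keras stores its weight matrix transposed with respect to this convention.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# One hypothetical 28x28 grayscale image with values in [0, 1]
image = rng.random((28, 28))

# Flatten: (28, 28) -> (784,)
x = image.reshape(-1)

# Dense with 64 units: A has shape (64, 784), b has shape (64,)
A = rng.normal(scale=0.01, size=(64, x.size))
b = np.zeros(64)
h = relu(A @ x + b)

print(x.shape, h.shape)
```

In a trained network `A` and `b` are learned from data; here they only illustrate the shapes involved.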


In [6]:
# Import Layers from Keras
from tensorflow.keras.layers import Input, Dense, Flatten, Reshape
from tensorflow.keras.models import Model

In [7]:
# Sequential API

# Define the model
from tensorflow.keras.models import Sequential

model = Sequential([Flatten(),
                    Dense(units=64, activation='relu'),
                    Dense(units=10, activation='softmax')])

2024-08-11 11:26:09.798152: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2024-08-11 11:26:09.799796: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2024-08-11 11:26:09.838575: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-08-11 11:26:09.838792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Ti Laptop GPU computeCapability: 8.6
coreClock: 1.485GHz coreCount: 20 deviceMemorySize: 3.80GiB deviceMemoryBandwidth: 178.84GiB/s
2024-08-11 11:26:09.838829: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2024-08-11 11:26:09.848970: I tensorflow/stream_execut

In [8]:
# Functional API 

"""
This implementation is equivalent to the one above written with the Sequential
API, but it uses the Functional API.
"""

def get_model(input_shape, output_shape):
    d1, d2 = input_shape
    # Define the model by chaining layers
    x = Input(shape=(d1, d2))

    # Flatten the (d1, d2) image into a vector; Flatten()(x) would be equivalent
    h = Reshape((d1*d2,))(x)
    h = Dense(units=64, activation='relu')(h)

    y = Dense(units=output_shape, activation='softmax')(h)

    # Get the model
    model = Model(x, y)

    # Visualize summary of the newly created Model
    model.summary()
    return model
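
As a sanity check on the summary printed by `model.summary()`, the number of trainable parameters can be worked out by hand. The sketch below assumes the 28x28 input and the 64-unit and 10-unit Dense layers used above; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_flat = 28 * 28                      # length of the flattened image
hidden = dense_params(n_flat, 64)     # first Dense layer
output = dense_params(64, 10)         # softmax layer
total = hidden + output
print(hidden, output, total)          # 50240 650 50890
```

The Flatten and Reshape layers contribute no parameters, so the total comes only from the two Dense layers.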

### Why should you use the Functional API?

For this small use case, the Sequential and Functional implementations are both correct and equivalent. The former is easier to write, since you only need to list the layers in order and TensorFlow chains them to build the neural network. The Functional API is more verbose, since it requires you to define not only the layers but also the connections between them. On the other hand, the Functional API lets you define architectures with complex relationships between layers (e.g. skip connections), which is impossible with the Sequential API.
One example of the use of skip connections is the Residual Network (ResNet). ResNets were proposed by [He et al.](https://arxiv.org/pdf/1512.03385.pdf) in 2015 for image classification. In a ResNet, the information from earlier layers is passed to deeper layers by element-wise addition. This operation does not add any parameters, since the output of an earlier layer is simply added to the output of a later one. A single residual block with a skip connection looks like this:

![Resnet](https://www.researchgate.net/profile/Olarik-Surinta/publication/330750910/figure/fig1/AS:720960386781185@1548901759330/Illustration-of-the-residual-learning-between-a-plain-network-and-b-a-residual.ppm)
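
A skip connection of this kind can be sketched with the Functional API as follows. This is a toy version built from Dense layers, not the convolutional block of He et al.; the helper name `residual_block` and the width of 64 units are assumptions made for illustration. The input width must equal `units` so that the two tensors can be added.

```python
from tensorflow.keras.layers import Input, Dense, Add, Activation
from tensorflow.keras.models import Model

def residual_block(x, units):
    """Two Dense layers whose output is added back to the block input x."""
    h = Dense(units, activation="relu")(x)
    h = Dense(units)(h)            # no activation before the addition
    h = Add()([x, h])              # skip connection: parameter-free element-wise sum
    return Activation("relu")(h)

inputs = Input(shape=(64,))
outputs = residual_block(inputs, units=64)
block_model = Model(inputs, outputs)
print(block_model.output_shape)
```

The tensor `x` is used twice, once as the input of the first Dense layer and once as an argument of `Add`, which is exactly the kind of branching that a plain list of layers cannot express.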