![ACM SIGCHI Summer School on Computational Interaction  
Inference, optimization and modeling for the engineering of interactive systems  
13th June - 18th June 2022  
Saarland University in Saarbrücken, Germany](imgs/header.png)

# <font face="gotham" color="Brown">  Getting started with  </font>


![Keras](https://keras.io/img/logo.png)




Official resources:
* [Getting started](https://keras.io/getting_started/)
* [Introduction to Keras for engineers](https://keras.io/getting_started/intro_to_keras_for_engineers/)
* [Introduction to Keras for researchers](https://keras.io/getting_started/intro_to_keras_for_researchers/)


Keras is an open-source software library that provides a high-level Python interface for building and training artificial neural networks. Here it acts as an interface to the TensorFlow library.
![keras](imgs/tf.keras.jpeg)


## 1. Basic concepts

Tensors
--------------------------------------------

Tensors are a specialized data structure, very similar to arrays
and matrices. We use tensors to encode the inputs and
outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on
GPUs or other specialized hardware to accelerate computing. 



In [2]:
import tensorflow as tf
from tensorflow import keras
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

Let's take a look at the object that is at the core of TensorFlow: the Tensor.

Here's a constant tensor:

In [3]:
x = tf.constant([[5, 2], [1, 3]])
print(x)

tf.Tensor(
[[5 2]
 [1 3]], shape=(2, 2), dtype=int32)


You can get its value as a NumPy array by calling .numpy():

In [4]:
x.numpy()

array([[5, 2],
       [1, 3]], dtype=int32)

Much like a NumPy array, it features the attributes dtype and shape:

In [5]:
print("dtype:", x.dtype)
print("shape:", x.shape)

dtype: <dtype: 'int32'>
shape: (2, 2)


A common way to create constant tensors is via tf.ones and tf.zeros (just like np.ones and np.zeros):

In [6]:
print(tf.ones(shape=(2, 1)))
print(tf.zeros(shape=(2, 1)))

tf.Tensor(
[[1.]
 [1.]], shape=(2, 1), dtype=float32)
tf.Tensor(
[[0.]
 [0.]], shape=(2, 1), dtype=float32)


You can also create random constant tensors:

In [7]:
x = tf.random.normal(shape=(2, 2), mean=0.0, stddev=1.0)

x = tf.random.uniform(shape=(2, 2), minval=0, maxval=10, dtype="int32")

## Variables

Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a Variable using some initial value:

In [8]:
initial_value = tf.random.normal(shape=(2, 2))
a = tf.Variable(initial_value)
print(a)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[ 1.6822869 ,  0.5285346 ],
       [-0.19078438,  0.78927237]], dtype=float32)>


You update the value of a Variable by using the methods ```.assign(value)```, ```.assign_add(increment)```, or ```.assign_sub(decrement)```:

In [9]:
new_value = tf.random.normal(shape=(2, 2))
a.assign(new_value)
for i in range(2):
    for j in range(2):
        assert a[i, j] == new_value[i, j]

added_value = tf.random.normal(shape=(2, 2))
a.assign_add(added_value)
for i in range(2):
    for j in range(2):
        assert a[i, j] == new_value[i, j] + added_value[i, j]

## Doing math 

If you've used NumPy, doing math in TensorFlow will look very familiar. The main difference is that your TensorFlow code can run on GPU and TPU.

In [10]:
a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

c = a + b
d = tf.square(c)
e = tf.exp(d)
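
To make the NumPy comparison concrete, here is a minimal sketch that runs the same arithmetic in both libraries and checks that the results agree. The input values are arbitrary and chosen only so the result is easy to verify by hand.

```python
import numpy as np
import tensorflow as tf

# Fixed example inputs (arbitrary values) so the comparison is reproducible.
a_np = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b_np = np.array([[0.5, -1.0], [0.0, 2.0]], dtype=np.float32)

# NumPy arrays can be turned into constant tensors with tf.constant.
a = tf.constant(a_np)
b = tf.constant(b_np)

# The same element-wise arithmetic, once in TensorFlow and once in NumPy.
d_tf = tf.square(a + b)
d_np = np.square(a_np + b_np)

print(d_tf.numpy())
print(np.allclose(d_tf.numpy(), d_np))
```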

# 2. Building a Neural Network

## 2.1. Data


In this tutorial we will see how to use the Keras framework to implement a multiclass classifier on a popular dataset called MNIST.




The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset contains 70,000 small grayscale images of handwritten digits (60,000 for training and 10,000 for testing).

![MNIST](imgs/mnist.png)

## Load the data

In [15]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

## 2.2. Build the model

To build our model in Keras, we stack layers in a [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model.



In [18]:
from keras.models import Sequential

Define the model: Choose network architecture.

In [19]:

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),   # Flatten each 28x28 px image into 784 values
  tf.keras.layers.Dense(64, activation='relu'),    # First hidden layer
  tf.keras.layers.Dense(64, activation='relu'),    # Second hidden layer
  tf.keras.layers.Dense(10, activation='softmax')  # Output layer: there are 10 classes
])

The first layer takes 28x28 inputs, because our images are 28x28 pictures of hand-drawn digits. A dense layer expects a flat vector rather than a 2D grid, so the `Flatten` layer reshapes each 28x28 image into a vector of 784 values.
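
As a rough illustration of what flattening does to the shape of the data, the following NumPy sketch reshapes a small batch of blank placeholder images. The array `images` is made up for this example and is not the MNIST data.

```python
import numpy as np

# A hypothetical batch of 3 blank "images", shaped like MNIST samples.
images = np.zeros((3, 28, 28), dtype=np.uint8)

# Keep the batch axis and merge the two pixel axes into one.
flat = images.reshape(len(images), -1)

print(images.shape)  # (3, 28, 28)
print(flat.shape)    # (3, 784)
```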

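We used ReLU (rectified linear unit) as the activation function of the hidden layers.

ReLU sets negative values to 0 and leaves positive values unchanged, so its outputs are non-negative but not bounded above. This non-linearity is what lets the stacked layers model more than a linear function.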

Finally, for the output layer, we used softmax. Softmax is a natural choice for a multi-class problem in which each example belongs to exactly one class: the outputs are non-negative and sum to 1, so they can be read as a confidence score for each class.
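
The two activation functions can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the code Keras runs internally, and the example scores are arbitrary.

```python
import numpy as np

def relu(z):
    """Replace negative entries with 0 and keep positive entries."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)

scores = np.array([-2.0, 0.5, 3.0])  # arbitrary example scores
hidden = relu(scores)
probs = softmax(scores)

print(hidden)              # the negative score becomes 0
print(probs, probs.sum())  # the largest score gets the largest probability
```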

There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:
- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.

Moreover, the number of neurons and the number of hidden layers required also depend on the number of training cases, the complexity of the data to be learned, and the type of activation functions used.
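
Applied to the MNIST network, the three rules of thumb give the numbers below. This is plain arithmetic under the assumption of 784 inputs and 10 outputs; the variable names are chosen for this sketch only.

```python
n_inputs = 28 * 28   # 784 pixels per flattened MNIST image
n_outputs = 10       # one output unit per digit class

# Rule 1: between the output size and the input size.
rule_1_range = (n_outputs, n_inputs)

# Rule 2: two thirds of the input size, plus the output size.
rule_2_size = round(2 * n_inputs / 3 + n_outputs)

# Rule 3: less than twice the input size.
rule_3_limit = 2 * n_inputs

print(rule_1_range, rule_2_size, rule_3_limit)
```

The 64 units per hidden layer used in this tutorial fall inside the range from rule 1 and under the limit from rule 3, but well below the suggestion from rule 2, which shows that these rules are starting points rather than requirements.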


## 2.3. Compile model: 

Choose optimizer, loss function, and optionally a monitoring metric.

In [20]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate= 0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

* The loss function (the ```loss``` argument) calculates "how far off" our classifications are from reality.
It is a measurement of how far the neural network's output is from the targeted output.

* The ```optimizer``` adjusts the model's trainable parameters, such as the weights, to gradually fit our data.

* The learning rate dictates the magnitude of the changes that the optimizer can make at a time. The larger the learning rate, the quicker the model can learn, but if the steps are too big the optimizer can end up bouncing around a good solution, or even diverge, instead of improving. If the learning rate is too small, training is slow: the model needs a very large number of epochs to converge and can get stuck along the way.


![LR](https://deeplearningmath.org/images/learning_rate_choice.png)
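
The effect of the learning rate can be reproduced on a toy problem. The sketch below assumes plain gradient descent on f(w) = w², not the Adam optimizer used in this tutorial, and the three learning rates are picked only to show the three regimes.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent."""
    for _ in range(steps):
        gradient = 2.0 * w  # derivative of w**2
        w = w - learning_rate * gradient
    return w

w_small = gradient_descent(0.01)  # too small: still far from the minimum at 0
w_good = gradient_descent(0.4)    # reasonable: ends very close to 0
w_large = gradient_descent(1.1)   # too large: every step overshoots and |w| grows

print(w_small, w_good, w_large)
```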

Moreover, a constant learning rate is often not the best choice. A large value can help the algorithm reach a good solution quickly, but if it is maintained the algorithm may oscillate around that solution for a long time, or diverge. A solution is to let the learning rate decay over time.

> For simpler tasks, a learning rate of 0.001 is usually more than fine. For more complex tasks, you will see a learning rate with what's called a decay. Basically, you start the learning rate at something like 0.001 or 0.01, and then over time that learning rate gets smaller and smaller. The idea is that you can train fast initially and then slowly take smaller steps, hopefully getting the best of both worlds.

A common approach is to halve the learning rate every 5 epochs, or to multiply it by 0.1 every 20 epochs. Another heuristic is to track the validation error while training with a fixed learning rate and, when the validation error stops improving, to multiply the learning rate by a constant factor (e.g. 0.5).
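
A step decay of this kind can be written as a small function of the epoch index. The function name `step_decay` and its default values are assumptions made for this sketch (an initial rate of 0.001, halved every 5 epochs).

```python
def step_decay(epoch, initial_lr=0.001, factor=0.5, every=5):
    """Multiply the learning rate by `factor` once every `every` epochs."""
    return initial_lr * factor ** (epoch // every)

# Epochs are counted from 0, so the first drop happens at epoch 5.
for epoch in (0, 4, 5, 10):
    print(epoch, step_decay(epoch))
```

In Keras, schedules like this are typically attached to training through callbacks such as `tf.keras.callbacks.LearningRateScheduler`, and the validation-based heuristic corresponds to `tf.keras.callbacks.ReduceLROnPlateau`. Wiring them into `model.fit` is not shown here.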

## 2.4. Training the network

### Epoch

![Epoch](https://miro.medium.com/max/1024/1*cDhZ56QNC5mrl6kjE0C2JA.png)

In deep learning, the number of epochs is a [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) which is defined before training a model. One epoch is when the entire dataset is passed both forward and backward through the neural network exactly once.

Feeding the entire dataset to the computer at once is usually too much, so we divide it into several smaller batches and update the weights after each batch.

We use more than one epoch because passing the entire dataset through a neural network once is not enough: training is an iterative process, so we pass the full dataset through the same network multiple times. The batch size is the number of training examples in a single batch, and the number of iterations is the number of batches needed to complete one epoch.

**Example**: 

If we divide a dataset of 2000 training examples into batches of 500, then 4 iterations will complete 1 epoch.
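
The relationship between dataset size, batch size and iterations is simple arithmetic, sketched below. The helper `iterations_per_epoch` is a name made up for this example. The second call uses the 60,000 MNIST training images with a batch size of 32, which is the default of `model.fit` when no `batch_size` is given.

```python
import math

def iterations_per_epoch(n_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(n_examples / batch_size)

print(iterations_per_epoch(2000, 500))  # 2000 examples in batches of 500
print(iterations_per_epoch(60000, 32))  # MNIST training set, default Keras batch size
```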

* Too few epochs, and your model won't learn everything it could have.

* Too many epochs, and your model will overfit to your in-sample data (basically memorize the in-sample data, and perform poorly on out-of-sample data).

Let's go with 5 epochs for now. Keras will loop over the epochs, and in each epoch it will loop over the batches of our data.

In [21]:
model.fit(X_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x172988490>

## 2.5. Test the network on the test data


In [22]:
# Evaluate the model on the test data using `evaluate`
print("Evaluate on test data")
results = model.evaluate(X_test, y_test, batch_size=128)
print("test loss, test acc:", results)

Evaluate on test data
test loss, test acc: [0.27070385217666626, 0.932699978351593]


The image classifier is now trained to ~93% accuracy on the test set.

## 2.6.  Saving the model

Model training usually takes a lot of time, so once the model is trained it is smart to save it.

In [None]:
# `.pth` is a PyTorch convention; Keras models are usually saved as `.keras` (or `.h5`)
model.save('Models/mnist.keras')
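
A saved model can later be read back with `tf.keras.models.load_model`. The sketch below round-trips a tiny stand-in model through a temporary directory and checks that the restored copy gives the same predictions. The model, the input `x` and the file name `demo.keras` are placeholders for illustration, not the MNIST classifier trained above.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in model (hypothetical, not the MNIST classifier).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])

x = np.ones((2, 4), dtype=np.float32)
before = model.predict(x, verbose=0)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.keras')  # hypothetical file name
    model.save(path)
    restored = tf.keras.models.load_model(path)
    after = restored.predict(x, verbose=0)

# The restored model carries the same weights, so predictions match.
print(np.allclose(before, after))
```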

## 🏁 3. Conclusion

Now, you know:

1. a popular deep learning framework: Keras,
2. the basic building blocks of deep learning,
3. how to load data and define a Neural Network, 
4. how to train and test a Neural Network using Keras.