<h1 class="intro_title" style="text-align:center; font-size: 45px;">Python course 2021</h1>
<h2 class="intro_subtitle" style="text-align:center; font-size: 30px;">Introduction to machine learning<br/> with Keras</h2>

<img class="intro_logo" style="width:400px" src="https://static.poul.org/assets/logo/logo_text_g.svg" alt="POuL logo"/>

<p class="intro_author" style="text-align: center; font-size: 18px;">Roberto Bochet &lt;avrdudo@poul.org&gt;</p>

## What is machine learning?
##### (as basic as possible)
<small style="font-size: 0.5em;">Engineers, mathematicians and scientists, have mercy on me!</small>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import math
np.random.seed(0)

In [None]:
f = lambda x: np.real(-0.5j*(math.e**(1j*x) - math.e**(-1j*x)) + \
              0.15*(math.e**(10j*x) + math.e**(-10j*x))) + \
              1e-4*np.e**x + np.random.normal(0,0.1,len(x))
x = np.random.uniform(-5,5, 200)
y = f(x) 
plt.title("What is this?")
plt.plot(x, y, "o");

In [None]:
import scipy as sp
from scipy import signal
x_fin = np.arange(-5, 5, 0.01)
f_tri = lambda x: 2*np.abs(sp.signal.sawtooth(x - np.pi/2)) - 1
plt.title("Is it a triangle wave?")
plt.plot(x, y.real, "o")
plt.plot(x_fin, f_tri(x_fin), "g");

In [None]:
f_sin = lambda x: np.sin(x)
plt.title("...or is it a sine?")
plt.plot(x, y.real, "o")
plt.plot(x_fin, f_sin(x_fin), "r");

In [None]:
f_poly = lambda x: x - x**3/6 + x**5/120 - x**7/5040 + x**9/362880 - x**11/39916800
plt.title("...maybe a polynomial?")
plt.plot(x, y, "o")
plt.plot(x_fin, f_poly(x_fin), "m");

#### What are `triangle wave`, `sine` or `polynomial`?
##### (Recap)

We had some data in the form of `(x, y)` tuples

We noticed that there is some kind of relation between `x` and `y`: they are (mostly) not random

So, we asked ourselves what value `y` takes for a generic value of `x` (one not present in the original dataset)

We answered with some mathematical functions which seem to approximate the data quite well

#### So, which of the suggested functions was the dataset generated from?

### Short answer: none of them

In a real scenario it is unrealistic to completely identify the "real process" behind a dataset

A **mathematical system can provide nothing more than an approximation of a real system**  
and this is true for every real system

### The mathematical functions  
### we considered are called **mathematical models**
An alternative to mathematical models could be **physical models**

So, a reasonable question to ask is  
**"Which mathematical model best approximates the behaviour of our real system?"**
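
One way to make "approximates best" concrete is to measure the average squared distance between each model's predictions and the data. The sketch below is only illustrative: the noisy sine stands in for the dataset, and the helper `mse` and the list of candidates are assumptions, not part of the original notebook.

```python
import numpy as np

# Assumption: a noisy sine stands in for the dataset (not the notebook's exact generator).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-5, 5, 200)
y_demo = np.sin(x_demo) + rng.normal(0, 0.1, x_demo.size)

def mse(model, x, y):
    """Mean squared error between the model's predictions and the data."""
    return float(np.mean((model(x) - y) ** 2))

# Candidate models; the polynomial is a truncated Taylor series of the sine.
candidates = {
    "sine": np.sin,
    "polynomial": lambda x: x - x**3 / 6 + x**5 / 120,
    "constant zero": lambda x: np.zeros_like(x),
}

scores = {name: mse(model, x_demo, y_demo) for name, model in candidates.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {score:.3f}")
```

The lower the score, the closer the model follows the data; other error measures would be equally legitimate choices.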

# Machine learning

>is the study of **computer algorithms** that  
    improve **automatically** through **experience**  
    and by the use of data.  
>
>    &#91;...&#93;  
>
>Machine learning algorithms build a model based  
    on sample data, &#91;...&#93; in order to make **prediction**  
    or **decisions** without being  
    **explicitly programmed to do so**.  
>
>[from wikipedia](https://en.wikipedia.org/wiki/Machine_learning)

## ML branches

ML is split into three macro areas

*(incredibly simplified summary)*

### Reinforcement learning
The model is trained like you would train a pet:  
when it does a good job it is rewarded,  
when it makes a mistake it is punished.

The model should try to maximize the rewards and avoid the punishments,  
consequently it learns to do a good job without making mistakes.

Some applications:  
[song suggestion](https://medium.com/analytics-vidhya/emotion-based-music-recommendation-system-using-a-deep-reinforcement-learning-approach-6d23a24d3044),
[autonomous driving](https://towardsdatascience.com/do-you-want-to-train-a-simplified-self-driving-car-with-reinforcement-learning-be1263622e9e)

### Supervised learning
The model is given the input data together with the result we expect from it.

The model should learn and generalize the relation between input and output,  
so that, given an input it has never seen, it can provide a reasonable output.

Some applications:  
[text translation](https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571),
[image classification](https://developers.google.com/machine-learning/practica/image-classification)
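
As a minimal sketch of the supervised idea, the example below fits a straight line to a handful of labelled pairs and then queries it on an input that was not in the data. The numbers are made up for illustration, and a least-squares line (`np.polyfit`) stands in for the model; it is not a neural network.

```python
import numpy as np

# Labelled examples: each input comes with the output we expect (here y = 2x + 1).
inputs = np.array([0.0, 1.0, 2.0, 3.0])
targets = np.array([1.0, 3.0, 5.0, 7.0])

# "Training": least-squares fit of a straight line to the labelled pairs.
slope, intercept = np.polyfit(inputs, targets, deg=1)

# "Generalizing": predict the output for an input that was never seen.
unseen_input = 4.0
prediction = slope * unseen_input + intercept
print(prediction)
```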

### Unsupervised learning
The model is given only the input data, without the output we expect from it;  
it is the model itself that identifies patterns and recurrences in the data.

Some applications:  
[painting style transfer](https://github.com/jcjohnson/neural-style),
[word embeddings](https://nlp.stanford.edu/projects/glove/)
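
To give a feeling of "finding patterns without labels", here is a tiny clustering sketch in the spirit of k-means. Clustering is just one example of an unsupervised technique, picked here for illustration; the data values and the choice of two groups are assumptions.

```python
import numpy as np

# Unlabelled 1-D data with two visible groups (values chosen for illustration).
data = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])

# Start from two arbitrary samples as group centres.
centres = np.array([data[0], data[-1]])

for _ in range(10):
    # Assign each sample to its nearest centre.
    labels = np.argmin(np.abs(data[:, None] - centres[None, :]), axis=1)
    # Move each centre to the mean of the samples assigned to it.
    centres = np.array([data[labels == k].mean() for k in range(2)])

print(labels, centres)
```

Nobody told the algorithm which sample belongs to which group: the grouping emerges from the data alone.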

For complex problems, these approaches may be used together.

#### However, today we will talk only about **supervised learning**!

# Feed-forward Neural Network
is a really simple model inspired by the functioning of the brain

## Let us see how to compose it

### Neuron
is the basic unit that composes the FNN
![a neuron](./images/neuron.svg)

Let us start with a really simple model, a linear combination of the input

$ x = w_0 + w_1 u_1 + w_2 u_2 + \dots + w_k u_k $

*where $w_0$ is a parameter called bias, $u_i$ is the i-th input and $w_i$ is an arbitrary multiplication factor*

So, the input data are linearly combined to get the value $x$ 

> Setting $x=0$, the function becomes the equation of a [hyperplane](https://en.wikipedia.org/wiki/Hyperplane)  
    in the $k$-dimensional input space, with parameters $w_0, w_1, \dots, w_k$

Then we can transform the value $x$ into an output using an arbitrary function

$y = g(x)$

Where $g(\cdot)$ is called the [**activation function**](https://en.wikipedia.org/wiki/Activation_function) (and it is generally a non-linear one)

In [None]:
import tensorflow.keras as kr
x_act = np.arange(-6,6,0.01)
fig, axs = plt.subplots(2,2)
fig.suptitle("Some examples of activation functions")
axs[0,0].set_title("Linear")
axs[0,0].plot(x_act, kr.activations.linear(x_act))
axs[0,1].set_title("tanh")
axs[0,1].plot(x_act, kr.activations.tanh(x_act))
axs[1,0].set_title("Sigmoid")
axs[1,0].plot(x_act, kr.activations.sigmoid(x_act))
axs[1,1].set_title("ReLU")
axs[1,1].plot(x_act, kr.activations.relu(x_act));

$ x = w_0 + w_1 u_1 + w_2 u_2 + \dots + w_k u_k $  
$ y = g(x) $

This pair of equations entirely defines the concept of **Neuron** for the **FNN**
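
The two equations translate almost literally into code. The sketch below is a plain NumPy illustration; the function name `neuron` and all the numeric values are hypothetical, and `tanh` is just one possible activation.

```python
import numpy as np

def neuron(u, w, w0, g=np.tanh):
    """Single neuron: linear combination of the inputs followed by an activation."""
    x = w0 + np.dot(w, u)   # x = w_0 + w_1 u_1 + ... + w_k u_k
    return g(x)             # y = g(x)

# Hypothetical values, chosen only for illustration.
u = np.array([1.0, 2.0, 3.0])      # inputs
w = np.array([0.5, -1.0, 0.25])    # weights
w0 = 1.75                          # bias

y_neuron = neuron(u, w, w0)
print(y_neuron)
```

With these values the linear combination gives $x = 1$, so the output is $\tanh(1)$.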

A single **neuron** is the whole model of the simplest possible **FNN**, which is at the base of the [**Perceptron**](https://en.wikipedia.org/wiki/Perceptron), a supervised algorithm invented in 1958 by [*Frank Rosenblatt*](https://en.wikipedia.org/wiki/Frank_Rosenblatt).

It implemented a binary classifier:  
given an input, it decided whether this belonged to a first class or to a second one  
(you are in or you are out)
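
A minimal sketch of such a binary classifier, assuming a step activation and the classic perceptron update rule. The toy dataset (a logical AND) and the variable names are illustrative choices, not taken from the original work.

```python
import numpy as np

# Toy dataset (logical AND), chosen for illustration: class 1 only for input (1, 1).
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0

def predict(u):
    """Step activation: class 1 if the linear combination is positive, else class 0."""
    return int(bias + weights @ u > 0)

for epoch in range(100):
    mistakes = 0
    for u, target in zip(inputs, labels):
        error = target - predict(u)      # -1, 0 or +1
        if error != 0:
            # Perceptron rule: nudge the hyperplane towards the misclassified sample.
            weights += error * u
            bias += error
            mistakes += 1
    if mistakes == 0:                    # every sample is classified correctly
        break

predictions = [predict(u) for u in inputs]
print(predictions)
```

The loop stops as soon as one full pass over the data produces no mistakes, which is possible here because the two classes can be separated by a straight line.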

### Layer
A lone Neuron, however, is of limited use, so neurons are arranged in structures called layers.
An arbitrary number of neurons can be placed side by side to create a layer with $m$ outputs, where $m$ is the number of neurons in the layer.

This kind of layer is called **Dense layer** or **Fully-connected layer**

![a single layer FNN](./images/layer.svg)

Layers, in turn, can be stacked in order to increase the complexity of the final model.

A single **Neuron** of a **Dense layer** takes as inputs all the outputs of the previous layer  
(hence the name **Fully-connected layer**)

![a multi layers FNN](./images/multi_layers.svg)

n.b. signals propagate only from the input to the output, **there are no loops**!  
Hence the name **Feed-forward Neural Network**!
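
Since every neuron of a dense layer sees all the inputs, a whole layer can be written as one matrix product, and stacking layers becomes a simple loop. The sketch below assumes random weights, a sigmoid activation and made-up layer sizes; the helper `dense` is hypothetical and is not the Keras layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense(u, W, b, g):
    """Dense layer: each row of W holds the weights of one neuron, so y = g(W u + b)."""
    return g(W @ u + b)

# Hypothetical layer sizes: 3 inputs -> 4 neurons -> 2 neurons, random weights.
sizes = [3, 4, 2]
layers = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

out = np.array([0.5, -1.0, 2.0])   # network input
for W, b in layers:
    # The output of one layer is the input of the next one: no loops.
    out = dense(out, W, b, sigmoid)

print(out.shape)
```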

A **FNN** model is defined by the **number of inputs**, the **number of layers**, the **number of neurons per layer** (each layer can have a different number of them) and the **neurons' activation functions**; we call these parameters [**Hyperparameters**](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)).

And by the values **$w_{i,j}$** which are called **weights**.
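
The hyperparameters fix how many weights the model has. A small sketch of the count, assuming fully connected layers where each neuron has one weight per input plus a bias; the function name and the example sizes are illustrative.

```python
def count_weights(n_inputs, neurons_per_layer):
    """Number of weights (biases included) of a fully connected FNN."""
    total = 0
    for n_neurons in neurons_per_layer:
        # Each neuron has one weight per input plus one bias.
        total += n_neurons * (n_inputs + 1)
        n_inputs = n_neurons   # this layer's outputs feed the next layer
    return total

# Example hyperparameters: 1 input, then layers of 30, 10 and 1 neurons.
n_weights = count_weights(1, [30, 10, 1])
print(n_weights)
```

For this example the count is 60 + 310 + 11 = 381 weights.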

Once the **Hyperparameters** are defined we have a working **FNN** model, but it is a dummy!  
##### Hardware without software!

## How to "train" a FNN?

Up to now we have only seen a **mathematical model**, built by arranging an arbitrary number of **Neurons**, but we still have no idea how to use the data to **teach** a **behavior** to our model.

The **FNN** training is treated (as always) as an optimization problem.

So, once the **Hyperparameters** are defined, we should ask ourselves which values of the **weights** best represent the behavior of our "real system".
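
To see what "optimization problem" means in practice, the sketch below trains a single linear neuron with plain gradient descent on the mean squared error. This is a deliberately simple stand-in: real libraries typically use more refined optimizers, and the data, learning rate and number of steps here are assumptions chosen for illustration.

```python
import numpy as np

# Noise-free data from a known line, so the best weights are known: w = 2, w0 = 1.
u = np.linspace(-1, 1, 50)
y_true = 2.0 * u + 1.0

w, w0 = 0.0, 0.0          # initial weights
learning_rate = 0.1

for step in range(500):
    error = (w0 + w * u) - y_true
    # Gradient of the mean squared error with respect to each weight.
    grad_w = 2.0 * np.mean(error * u)
    grad_w0 = 2.0 * np.mean(error)
    # Move the weights a small step against the gradient.
    w -= learning_rate * grad_w
    w0 -= learning_rate * grad_w0

print(w, w0)
```

Each step lowers the error a little, and the weights drift towards the values that generated the data.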

# Our first FNN
As a first experiment, let us go back to the dataset we saw at the start of this talk  
and try to build, train and validate a FNN on it.

### First step, split dataset in 
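
The notebook splits the data with scikit-learn's `train_test_split` in a later cell. As a rough sketch of what such a split does, the NumPy version below shuffles the sample indices and cuts them at 70%. The placeholder dataset and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dataset of 200 samples (values are illustrative).
x_all = np.linspace(-5, 5, 200)
y_all = np.sin(x_all)

# Shuffle the sample indices, then cut them at 70%.
indices = rng.permutation(x_all.size)
n_train = round(0.7 * x_all.size)
train_idx, val_idx = indices[:n_train], indices[n_train:]

x_train, y_train = x_all[train_idx], y_all[train_idx]
x_val, y_val = x_all[val_idx], y_all[val_idx]
print(x_train.size, x_val.size)
```

The validation samples are never used to adjust the weights, so they tell us how the model behaves on data it has not seen.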

In [None]:
import tensorflow.keras as kr

model = kr.Sequential()

model.add(kr.layers.InputLayer(input_shape=(1,)))
model.add(kr.layers.Dense(30, activation=kr.activations.sigmoid))
model.add(kr.layers.Dense(10, activation=kr.activations.sigmoid))
model.add(kr.layers.Dense(1, activation=kr.activations.linear))

model.summary()

In [None]:
model.compile(
    optimizer=kr.optimizers.Adam(learning_rate=1e-3),
    loss=kr.losses.mean_squared_error,
    #metrics=[kr.losses.mean_squared_error]
)

In [None]:
from sklearn.model_selection import train_test_split
# Keep 70% of the samples for training and the remaining 30% for validation
train_x, val_x, train_y, val_y = train_test_split(x, y,
                                                  train_size=0.7,
                                                  random_state=0)

In [None]:
plt.title("split training and validation")
plt.plot(train_x, train_y, "o")
plt.plot(val_x, val_y, "om")
plt.show()

In [None]:
model.fit(train_x, train_y, #batch_size=200,
          validation_data=(val_x, val_y), epochs=750)

In [None]:
x_pred = np.arange(-5, 5, 0.01)
y_pred = model.predict(x_pred)

plt.title("FNN prediction vs dataset")
plt.plot(x, y, "o")
plt.plot(x_pred, y_pred, "r")
plt.show()

<h1 class="outro_title" style="text-align:center; font-size: 35px;">Thank you!</h1>

<img class="outro_logo" style="width: 20%;" src="https://static.poul.org/assets/logo/logo_g.svg" alt="POuL logo">

<a class="outro_license" style="display: block; margin: 20px auto;" rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a>
<p class="outro_license_text" style="font-size: 15px; text-align: center;">Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International</p>

<p class="outro_author" style="text-align: center; font-size: 18px;">Roberto Bochet &lt;avrdudo@poul.org&gt;</p>