In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

### __Deep Learning workflow:__
<font size=3>
    
1. Import and data pre-processing;   
2. Neural network modeling;
3. Model compilation;
4. Train and validation;
5. Final training;
6. Test evaluation;
7. Saving the model.

### __1. Import and data pre-processing:__
<font size=3>
    
1.1 Import data;\
1.2 Data visualization;\
1.3 Feature engineering;\
1.4 Data shuffling;\
1.5 Train, validation, and test tensor division.

Our next problem is a supervised classification task using the classic [MNIST](https://en.wikipedia.org/wiki/MNIST_database) handwritten digits. The data is available among the [Keras datasets](https://keras.io/api/datasets/mnist/).

In [None]:
# import MNIST data:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data(path="mnist.npz")

print(f"x-train:{x_train.shape}, y-train:{y_train.shape}")
print(f"x-test:{x_test.shape}, y-test:{y_test.shape}")

In [None]:
# visualizing handwritten digits:
i = 0

plt.figure(figsize=(5,3))
plt.title("Number "+str(y_train[i]))
plt.imshow(x_train[i], cmap='gray')
plt.xticks([])
plt.yticks([])
plt.show()

In [None]:
# shuffling train data:

# shuffling test data:
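One possible way to shuffle is sketched below. It uses small random stand-in arrays with MNIST-like shapes (`x_demo` and `y_demo` are hypothetical names, not the arrays loaded above) and applies a single index permutation to images and labels so that each image stays paired with its label.

```python
import numpy as np

# Stand-in arrays with MNIST-like shapes (assumption: in the notebook these
# would be x_train / y_train or x_test / y_test).
rng = np.random.default_rng(seed=0)
x_demo = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_demo = rng.integers(0, 10, size=100, dtype=np.uint8)

# One permutation of the sample indices, applied to images and labels alike,
# keeps every image paired with its own label.
perm = rng.permutation(len(x_demo))
x_shuffled = x_demo[perm]
y_shuffled = y_demo[perm]
```

The same pattern would apply to the test arrays, using a separate permutation of their own length.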


In [None]:
# normalization:
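A minimal sketch of one common normalization, assuming the pixels are stored as 8-bit integers in $[0, 255]$ (as is the case for MNIST). The array `x_demo` is a random stand-in, not the real data.

```python
import numpy as np

# Stand-in for the image tensor (assumption: uint8 pixels in [0, 255]).
rng = np.random.default_rng(seed=0)
x_demo = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Cast to float32 first, then divide by the largest possible pixel value
# so that every entry lands in the interval [0, 1].
x_norm = x_demo.astype("float32") / 255.0
```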


#### __One-hot encoding:__
<font size=3>
    
One-hot encoding is a technique for numerically encoding multiclass labels, such as the digit labels $(0,1,2,3,4,5,6,7,8,9)$. Each label is represented by a vector whose length equals the number of classes: the position corresponding to the label holds a probability of 1, while all other components hold a probability of 0.

$0:(1,0,0,0,0,0,0,0,0,0);\; 1: (0,1,0,0,0,0,0,0,0,0);\; \cdots;\; 9: (0,0,0,0,0,0,0,0,0,1)$

For digit $\mathbf 3$, the model can output raw scores such as $(2.4,6.2,1.2,\mathbf{9.6},0.8,4.7,3.1,1.7,5.3,4.3)$, whose largest entry sits at the position of the label.

We can use TensorFlow's [one_hot](https://www.tensorflow.org/api_docs/python/tf/one_hot) function to do the encoding, or we can do it by "hand" as follows.

In [None]:
# one-hot encoding for label data:
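A sketch of the "by hand" encoding with NumPy. The helper name `one_hot` and the sample labels are illustrative assumptions, and 10 classes are assumed for the digits.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (N, num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row n gets a 1 in the column given by labels[n].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y_demo = np.array([0, 3, 9])   # hypothetical labels
y_encoded = one_hot(y_demo)
```

Taking `y_encoded.argmax(axis=1)` recovers the original integer labels, which is a convenient sanity check.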


In [None]:
# flatten x data (the number arrays) as dense layers' input vectors: (N, 28, 28) -> (N, 28*28)
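The flattening can be done with a single reshape, sketched here on a stand-in array (`x_demo` is hypothetical). Passing `-1` lets NumPy infer the second dimension, $28 \times 28 = 784$.

```python
import numpy as np

# Stand-in for the normalized image tensor, shape (N, 28, 28).
x_demo = np.zeros((100, 28, 28), dtype="float32")

# Keep the sample axis and merge the two pixel axes into one vector.
x_flat = x_demo.reshape(len(x_demo), -1)
```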


In [None]:
# splitting the train data into train and validation:
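A minimal sketch of the split by slicing, assuming the data has already been shuffled. The 20% validation fraction and all variable names are illustrative assumptions, not values fixed by this notebook.

```python
import numpy as np

# Stand-in for the shuffled, flattened inputs and one-hot labels.
rng = np.random.default_rng(seed=0)
x_demo = rng.random((100, 784)).astype("float32")
y_demo = np.eye(10, dtype="float32")[rng.integers(0, 10, size=100)]

# Hypothetical choice: hold out 20% of the samples for validation.
val_fraction = 0.2
n_val = int(len(x_demo) * val_fraction)

x_val, y_val = x_demo[:n_val], y_demo[:n_val]
x_tr, y_tr = x_demo[n_val:], y_demo[n_val:]
```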


### __2. Neural network modeling:__
<font size=3>
    
2.1 Define initial layer's shape;\
2.2 Define output layer's shape and its [activation function](https://keras.io/api/layers/activations/);\
2.3 Define hidden layers.

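When the model must output a probability distribution, we use the [softmax activation function](https://en.wikipedia.org/wiki/Softmax_function) to map the values into the interval $[0,\,1]$ so that they sum to $1$. For the $i$-th component of the input vector $\vec a_l$, it is given by
$$
    \sigma_l^i(\vec a_l) = \frac{e^{a_l^i}}{\sum_{j=1}^{K} e^{a_l^j}} \, ,
$$
where $K$ is the number of components of $\vec a_l$ (here, the number of classes).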

In [None]:
# plotting softmax:
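Before plotting, it may help to see softmax computed numerically. The sketch below applies a NumPy implementation to the raw-score example for digit 3 given earlier; subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result. The function name `softmax` is our own helper, not a Keras call.

```python
import numpy as np

def softmax(a):
    """Softmax of a 1-D array of raw scores."""
    a = np.asarray(a, dtype="float64")
    # Shifting by the maximum avoids overflow in exp and leaves the ratio unchanged.
    exps = np.exp(a - a.max())
    return exps / exps.sum()

# Raw scores from the digit-3 example in the one-hot section.
scores = np.array([2.4, 6.2, 1.2, 9.6, 0.8, 4.7, 3.1, 1.7, 5.3, 4.3])
probs = softmax(scores)
predicted_digit = int(probs.argmax())
```

A bar chart such as `plt.bar(range(10), probs)` would then show most of the probability mass concentrated on the index with the largest score.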

In [None]:
# model:
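One possible architecture is sketched below: a flattened $784$-pixel input, a single hidden layer, and a 10-unit softmax output. The hidden width of 128 and the ReLU activation are illustrative assumptions, not choices prescribed by this notebook.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),                # 2.1 input: flattened 28*28 image
    layers.Dense(128, activation="relu"),     # 2.3 hidden layer (hypothetical size)
    layers.Dense(10, activation="softmax"),   # 2.2 output: one probability per digit
])

model.summary()
```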

### __3. Model compilation:__
<font size=3>

3.1 Define [optimizer](https://keras.io/api/optimizers/);\
3.2 Define [loss function](https://keras.io/api/losses/);\
3.3 Define [validation metric](https://keras.io/api/metrics/).

As a loss function for probability distributions, the [Cross-Entropy function](https://en.wikipedia.org/wiki/Cross-entropy) can be used. It measures the mismatch between the predicted distribution $q$ and the true distribution $p$ (the lower the value, the better the match),
$$
    H(p,\,q) = -\sum_i p_i\,\ln q_i \, .
$$
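To make the formula concrete, here is a small NumPy sketch that evaluates $H(p,\,q)$ for a one-hot label and two hypothetical predictions. The helper `cross_entropy` and the clipping constant `eps` (which guards against $\ln 0$) are our own assumptions.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * ln(q_i), with q clipped away from zero."""
    p = np.asarray(p, dtype="float64")
    q = np.clip(np.asarray(q, dtype="float64"), eps, 1.0)
    return float(-np.sum(p * np.log(q)))

# True distribution: one-hot vector for digit 3.
p_true = np.zeros(10)
p_true[3] = 1.0

# Hypothetical predictions: one confident and correct, one uniform.
q_good = np.full(10, 0.01)
q_good[3] = 0.91
q_uniform = np.full(10, 0.1)

loss_good = cross_entropy(p_true, q_good)        # equals -ln(0.91)
loss_uniform = cross_entropy(p_true, q_uniform)  # equals ln(10)
```

With a one-hot $p$, only the term of the true class survives, so the loss reduces to $-\ln q_{\text{true}}$.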

In Keras, we have a [list of probabilistic losses](https://keras.io/api/losses/), among which the [categorical cross-entropy loss](https://keras.io/api/losses/probabilistic_losses/#categoricalcrossentropy-class) is the one suited to multiclass one-hot labels.
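The three compilation choices could look like the sketch below. The Adam optimizer, its learning rate, and the small stand-in model are illustrative assumptions; only the categorical cross-entropy loss follows directly from the text above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small stand-in model (assumption) so that the block runs on its own.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # 3.1 optimizer
    loss=keras.losses.CategoricalCrossentropy(),          # 3.2 loss for one-hot labels
    metrics=[keras.metrics.CategoricalAccuracy()],        # 3.3 validation metric
)
```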

### __4. Train and validation__
<font size=3>
    
Here, using the training data, the optimizer updates the values of the model's inner parameters (_i.e._, weights, biases, etc.) over the epochs while minimizing the loss function. Meanwhile, the model's performance is measured at each epoch using the validation data. At this workflow stage, we adjust the neural network architecture and the training settings to avoid [overfitting and underfitting](https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/).
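A minimal sketch of this stage on random stand-in data (so no meaningful accuracy should be expected). The model, the number of epochs, and the batch size are hypothetical; the point is the `validation_data` argument of `fit` and the per-epoch curves stored in `history.history`.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data (assumption): flattened "images" with one-hot labels.
rng = np.random.default_rng(seed=0)
x_tr = rng.random((200, 784)).astype("float32")
y_tr = np.eye(10, dtype="float32")[rng.integers(0, 10, size=200)]
x_val = rng.random((50, 784)).astype("float32")
y_val = np.eye(10, dtype="float32")[rng.integers(0, 10, size=50)]

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# The training data drives the parameter updates; the validation data is only
# evaluated at the end of each epoch.
history = model.fit(
    x_tr, y_tr,
    epochs=2, batch_size=32,
    validation_data=(x_val, y_val),
    verbose=0,
)

# One value per epoch for each tracked quantity.
train_loss = history.history["loss"]
val_loss = history.history["val_loss"]
```

Plotting `train_loss` against `val_loss` is one way to spot overfitting: the training loss keeps falling while the validation loss starts to rise.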
   

### __5. Final training__
<font size=3>

Once the modeling is completed, we concatenate the train and validation data and fit the model again.

__Note:__ use the same number of __epochs__ and __batch-size__ from the previous step.
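The concatenation step can be sketched with NumPy as follows; the arrays are random stand-ins and their names are assumptions. A freshly built and compiled model would then be fit on `x_full` and `y_full`.

```python
import numpy as np

# Stand-ins for the train and validation splits (assumption).
rng = np.random.default_rng(seed=0)
x_tr = rng.random((200, 784)).astype("float32")
y_tr = np.eye(10, dtype="float32")[rng.integers(0, 10, size=200)]
x_val = rng.random((50, 784)).astype("float32")
y_val = np.eye(10, dtype="float32")[rng.integers(0, 10, size=50)]

# Stack the two splits along the sample axis.
x_full = np.concatenate([x_tr, x_val], axis=0)
y_full = np.concatenate([y_tr, y_val], axis=0)
```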
    

### __6. Test evaluation__:

6.1 Evaluate the model using the test data;\
6.2 Make some predictions to visualize the results.
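Both steps could be sketched as below. To keep the block short, the model is a compiled but untrained stand-in and the test data is random, so the reported numbers are meaningless; what matters is the shape of the calls to `evaluate` and `predict`.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in test set (assumption).
rng = np.random.default_rng(seed=0)
x_te = rng.random((40, 784)).astype("float32")
y_te = np.eye(10, dtype="float32")[rng.integers(0, 10, size=40)]

# Untrained stand-in for the final model (assumption).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Evaluation: returns the loss followed by the compiled metrics.
test_loss, test_acc = model.evaluate(x_te, y_te, verbose=0)

# Prediction: one probability vector per image; argmax gives the predicted digit.
probs = model.predict(x_te[:5], verbose=0)
predicted_digits = probs.argmax(axis=1)
```

Each prediction can then be compared against the true digit, for example by showing the reshaped image with `plt.imshow` and putting the predicted digit in the title.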
   

### __7. Saving the model__:

<font size=3>
    
For model __loading__, see [2.2-notebook](2.2-notebook.ipynb).
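A minimal sketch of saving, assuming a recent TensorFlow/Keras version that supports the native `.keras` file format. The file name `mnist_dense.keras` and the stand-in model are hypothetical, and a temporary directory is used only to keep the example from leaving files behind.

```python
import os
import tempfile

from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for the final trained model (assumption).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Save architecture, weights, and compile settings in one file.
save_dir = tempfile.mkdtemp()
model_path = os.path.join(save_dir, "mnist_dense.keras")  # hypothetical file name
model.save(model_path)
```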