# Training of a Neural Network with Keras

### Level: Basic

In this notebook, we show how to use the Keras library to create a basic pipeline for training an artificial neural network (ANN).

First, we create a basic neural network. Then, we address the pipeline for training it. Finally, we save the model and use it to make some predictions.

At the end of this notebook there are some useful references on this topic.

#### Install dependencies

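First, we have to install the libraries we are going to use from the terminal:

- pip install pandas
- pip install scikit-learn
- pip install keras
- pip install tensorflow (Keras 3 needs a backend such as TensorFlow, which is the one used here)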

**TIP**

For a cleaner setup, we can create what is called an `environment`, in which we install all the needed packages [2].

In `VSCode`, just go to `Terminal`>`New Terminal` and:

- Install conda, replacing the URL in the second line with the installer for your OS [1].
```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
```
- Create and activate your environment.
```
conda create -n NN_training python
conda activate NN_training
```

- Install your libraries as previously indicated.

- Activate the environment in your notebook, if needed. In VSCode, click on `Select Kernel`>`Python Environments...`>`NN_training`.
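
Before moving on, it can help to confirm that the notebook kernel actually sees the packages installed above. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys


def check_packages(names):
    """Return {import_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# interpreter used by the current kernel (should point inside your environment)
print("Python executable:", sys.executable)

# import names of the libraries installed above
status = check_packages(["pandas", "sklearn", "keras"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, the kernel is probably not using the `NN_training` environment.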

#### Import dependencies

Once installed, we import the functions we need from those dependencies.

In [1]:
import keras
import pandas as pd
from sklearn.model_selection import train_test_split


#### Set some parameters

In [2]:
# model optimizer
loss = keras.losses.MeanAbsoluteError()  # or "mae"
lr = 1e-3
optimizer = keras.optimizers.Nadam(learning_rate=lr)  # or "nadam"
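# note: "accuracy" is a classification metric; for this regression task, "mse" is the informative one
metrics = ["accuracy", "mse"]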

# training
epochs = 200
batch_size = 64

# early stopping
monitor = "val_loss"
patience = int(0.1 * epochs)



#### Data

This will be different for each project. Once you have enough valuable data (which we all know is a difficult task!), it has to be loaded and processed according to its format. Normalizing or standardizing it is a highly recommended practice [7].

A personal recommendation is to use `pandas` for this task, as shown in the commented lines of the code.
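
Standardization rescales each feature to zero mean and unit variance. As a minimal sketch, the arithmetic is written out below in plain Python with made-up values; `standardize` and the sample numbers are illustrative assumptions, not part of the original pipeline. In practice, `sklearn.preprocessing.StandardScaler` does the same job. The statistics are computed on the training data only and then reused for the test data, so that no information from the test set leaks into training.

```python
from statistics import mean, pstdev


def standardize(values, mu, sigma):
    """Rescale values with a given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


# hypothetical raw feature, already split into train and test
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 10.0]

# statistics come from the training data only
mu = mean(train_feature)       # 5.0
sigma = pstdev(train_feature)  # population standard deviation

train_scaled = standardize(train_feature, mu, sigma)
test_scaled = standardize(test_feature, mu, sigma)

print("train:", train_scaled)
print("test: ", test_scaled)
```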

For didactic purposes, we define a dataset whose inputs are the two-bit binary representations of the numbers 0 to 3 and whose outputs are the corresponding decimal values.

Once we have meaningful and clean data, we divide it into `train`/`test` splits: the former is used to train our neural network and the latter to evaluate its performance [3].

A common split is 90/10, so we use it as a starting point [9].

In [3]:
# our clean, normalized data is stored in the features/targets parquet files
# X = pd.read_parquet("features.parquet")
# y = pd.read_parquet("targets.parquet")

# synthetic example data
X = pd.DataFrame(
    [
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 0],
        [0, 1],
        [0, 1],
        [1, 1],
        [1, 0],
        [1, 1],
        [0, 0],
        [0, 0],
    ]
)
y = pd.Series([0, 1, 2, 2, 1, 1, 3, 2, 3, 0, 0])

# we set a fixed random state for reproducibility and teaching purposes,
# but our results have to be consistent across multiple seeds to be relevant
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.9, test_size=0.1, random_state=0
)

# also fix keras random seed
keras.utils.set_random_seed(0)
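
With only 11 samples, the 90/10 proportions do not divide evenly, so it is worth checking how many samples end up in each split. The sketch below reproduces the rounding by hand. It assumes that `train_test_split` rounds the test size up and the training size down, and that the `validation_split` argument of `model.fit` (used later in this notebook) rounds the training part down and keeps the remainder for validation. Both rounding rules are assumptions worth confirming against the libraries' documentation.

```python
import math

n_samples = 11          # rows in the synthetic dataset
test_fraction = 0.1     # test_size passed to train_test_split
train_fraction = 0.9    # train_size passed to train_test_split
validation_split = 0.1  # fraction passed to model.fit

# assumed rounding: test size rounded up, train size rounded down
n_test = math.ceil(test_fraction * n_samples)
n_train = math.floor(train_fraction * n_samples)

# assumed rounding: the fitting part is rounded down, the rest is validation
n_fit = math.floor(n_train * (1.0 - validation_split))
n_val = n_train - n_fit

print(f"train/test split: {n_train}/{n_test}")
print(f"fit/validation split: {n_fit}/{n_val}")
```

These counts are consistent with the outputs in this notebook: two test samples are printed in the prediction section, and a training accuracy of 0.3750 matches 3 out of 8 samples. With so few samples, a single validation sample makes `val_loss` very noisy.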

#### Neural Network definition

Although we can define multi-layer perceptrons with the scikit-learn library [4], we use Keras for its higher flexibility.

With Keras, we can define the model layers in several ways. Here we use what Keras calls the `Functional API` [5, 6].

In addition, we define the optimizer, that is, the method used to find the parameters (weights and biases) of the neural network that minimize the loss, i.e., the difference between the target and the predicted output.

In [4]:
# shape of input features and number of output units
# (do not worry about this! it could also be set by hand)
features_shape = X_train.iloc[0].shape
target_shape = y_train.iloc[0].shape[0] if y_train.iloc[0].shape else 1

# input layer, which receives X_train
input_layer = keras.Input(shape=features_shape)
# inner layers, with SELU activation function as recommended in [7]
inner_layer_1 = keras.layers.Dense(64, activation="selu")(input_layer)
inner_layer_2 = keras.layers.Dense(32, activation="selu")(inner_layer_1)
inner_layer_3 = keras.layers.Dense(16, activation="selu")(inner_layer_2)
# output layer, with no activation because this is a regression problem.
# Notice that in other cases (e.g., classification) an activation function
# in the output layer is required
output_layer = keras.layers.Dense(target_shape)(inner_layer_3)

# once the layers are defined, create the model
model = keras.Model(inputs=input_layer, outputs=output_layer, name="NN_model")

# specify optimizer parameters, again following the recommendations in [7]
model.compile(
    loss=loss,  # function to evaluate the difference between target and predicted values
    optimizer=optimizer,  # optimization method
    metrics=metrics,  # additional metrics to track
)

# see our created model
model.summary()

In addition, regularization techniques such as dropout layers can be added, but we leave them for future lessons.
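
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. Here is a minimal sketch for the architecture above, assuming 2 input features and 1 output as in the synthetic data (the helper `dense_params` is hypothetical):

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters of a dense layer: weights + biases."""
    return n_in * n_out + n_out


# layer widths: input features, three hidden layers, output
widths = [2, 64, 32, 16, 1]

total = 0
for n_in, n_out in zip(widths[:-1], widths[1:]):
    params = dense_params(n_in, n_out)
    total += params
    print(f"Dense({n_out}) with {n_in} inputs: {params} parameters")

print("Total trainable parameters:", total)
```

Under these assumptions the four layers hold 192, 2080, 528 and 17 parameters, for a total of 2817.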

#### Hyperparameter optimization and training

Once we have our model, we train it using early stopping with a `train`/`validation` split of 90/10 [9]: training stops when `val_loss` has not improved for `patience` consecutive epochs.

Finally, we evaluate the model and save it.
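
The early stopping rule can be illustrated without Keras. The sketch below is a simplified reading of the patience logic, not the library's implementation: training stops once the monitored value has failed to improve on its best value for `patience` consecutive epochs. It uses the first five `val_loss` values from the training log of this notebook and a shortened, hypothetical patience of 2 (the notebook itself uses `int(0.1 * epochs)`, i.e. 20). The function name `stopping_epoch` is an illustrative assumption.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# first five val_loss values printed in the training log
val_losses = [0.2090, 0.2163, 0.2456, 0.2735, 0.3041]

print(stopping_epoch(val_losses, patience=2))   # stops at epoch 3
print(stopping_epoch(val_losses, patience=20))  # None: not enough epochs yet
```

With the notebook's patience of 20, a run whose best `val_loss` occurs at epoch 1 would continue until epoch 21 unless a later epoch improves on it. Note also that, by default, `EarlyStopping` does not restore the weights of the best epoch; passing `restore_best_weights=True` changes that.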

In [5]:
# set an early stopping for training
callbacks = [keras.callbacks.EarlyStopping(monitor=monitor, patience=patience)]

# train the model while monitoring it
model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1,
    callbacks=callbacks,
)
score = model.evaluate(X_test, y_test, verbose=0)
print("\n[test loss, test accuracy, test mse]:", score)

# save the final model
model.save("../models/model__Training_NN_Keras.keras")

Epoch 1/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step - accuracy: 0.3750 - loss: 0.9731 - mse: 1.4732 - val_accuracy: 1.0000 - val_loss: 0.2090 - val_mse: 0.0437
Epoch 2/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step - accuracy: 0.3750 - loss: 0.8486 - mse: 1.0655 - val_accuracy: 1.0000 - val_loss: 0.2163 - val_mse: 0.0468
Epoch 3/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 96ms/step - accuracy: 0.3750 - loss: 0.7517 - mse: 0.8315 - val_accuracy: 1.0000 - val_loss: 0.2456 - val_mse: 0.0603
Epoch 4/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 103ms/step - accuracy: 0.3750 - loss: 0.6627 - mse: 0.6314 - val_accuracy: 1.0000 - val_loss: 0.2735 - val_mse: 0.0748
Epoch 5/200
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 90ms/step - accuracy: 0.3750 - loss: 0.5739 - mse: 0.4600 - val_accuracy: 1.0000 - val_loss: 0.3041 - val_mse: 0.0924
Epoch 6/200
1/1 ━━━━━━━━━━━━

#### Use your model!

Load the saved model and use it for predictions.

In [6]:
# load saved model
model = keras.saving.load_model("../models/model__Training_NN_Keras.keras")

# make some predictions
predictions = model.predict(X_test)

# print test data, predicted value and true value
for idx, test_idx in enumerate(X_test.index):
    sample = X_test.loc[test_idx].values
    prediction = predictions[idx]
    true = y_test.loc[test_idx]
    print(f"\nSample {sample} \nPrediction {prediction} \nTrue {true}\n")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step

Sample [0 1] 
Prediction [1.3244736] 
True 1


Sample [0 0] 
Prediction [-0.01905827] 
True 0



#### References 
.. [1] https://docs.anaconda.com/miniconda/ 

.. [2] https://code.visualstudio.com/docs/python/environments#_work-with-python-interpreters 

.. [3] https://scikit-learn.org/1.5/modules/cross_validation.html#cross-validation 

.. [4] https://scikit-learn.org/1.5/modules/neural_networks_supervised.html

.. [5] https://keras.io/getting_started/intro_to_keras_for_engineers/ 

.. [6] https://keras.io/guides/functional_api/

.. [7] A. Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, 2019.

.. [8] https://keras.io/about/  

.. [9] J.J. García-Esteban. Deep Learning and Radiative Heat Transfer. Universidad Autónoma de Madrid, 2024.