### Training a neural network

Training in supervised learning means finding a set of parameters (weights and biases) for the neural network such that it reproduces the labels of a labeled dataset as closely as possible.

With the network set up and a dataset provided for *supervised learning*, we are now in a position to train our network.

The training method is controlled by the option key `optimizer`. By default it's set to *GradientDescent*.

Moreover, the following options control the training:

- `batch_size` : number of dataset items used per step; if smaller than the dataset size, we obtain Stochastic Gradient Descent (SGD)
- `learning_rate` : learning rate to use for training, i.e., the factor by which the gradient is scaled in each step
- `loss` : type of loss function to use
- `max_steps` : fixed number of training steps to execute
- `seed` : seed of the random number generator, which influences the random choice of starting parameters
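
To make the role of these options concrete, the following is a minimal NumPy sketch of mini-batch gradient descent for a single linear unit with a mean squared loss. It is not TATi's implementation: the function `sgd_fit` and the toy data are hypothetical, and only the argument names mirror the options above.

```python
import numpy as np

def sgd_fit(X, y, batch_size, learning_rate, max_steps, seed):
    """Mini-batch gradient descent on a linear model (illustrative only)."""
    rng = np.random.default_rng(seed)      # seed fixes starting weights and batch order
    w = rng.normal(size=X.shape[1])        # random starting weights
    b = 0.0
    losses = []
    for _ in range(max_steps):             # fixed number of update steps
        # Draw a mini-batch; a batch smaller than the dataset makes this SGD.
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        residual = X[idx] @ w + b - y[idx]
        losses.append(np.mean(residual ** 2))       # mean squared loss on the batch
        grad_w = 2.0 * X[idx].T @ residual / len(idx)
        grad_b = 2.0 * np.mean(residual)
        w -= learning_rate * grad_w        # learning_rate scales the gradient
        b -= learning_rate * grad_b
    return w, b, losses

# Toy data: labels depend linearly on two features.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.5]) + 0.3

w, b, losses = sgd_fit(X, y, batch_size=20, learning_rate=0.1, max_steps=200, seed=426)
print(w, b, losses[-1])
```

Calling `sgd_fit` twice with the same `seed` gives identical results, which is the purpose of fixing the seed in the options above.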

In [None]:
import TATi.simulation as tati

#### Available loss functions

Loss functions available in TensorFlow are also available in `tati`. We can get a list as follows.

In [None]:
print(tati.get_losses())

#### Performing the fit

Let us use the simple *mean_squared* loss for the moment to train a network on the provided dataset.
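
Assuming that *mean_squared* follows the usual definition of the mean squared error (the exact normalization used by TATi may differ), the loss over $N$ dataset items with features $x_i$, labels $y_i$, and network output $f_\theta(x_i)$ for parameters $\theta$ reads:

$$
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( f_\theta(x_i) - y_i \right)^2
$$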

In [None]:
nn = tati(batch_data_files=["dataset-twoclusters.csv"],
          output_activation="linear",
          learning_rate=0.1,
          loss="mean_squared",
          max_steps=100,
          seed=426,
          trajectory_file="training.csv")
training_data = nn.fit()
print(nn.loss())

The `fit()` function returned an object `training_data`; we will come to that in a moment.

Let us first take a look at the parameters at the minimum found: two weights and one bias.

In [None]:
print(nn.parameters)

The `training_data` object contains three dataframes: `run_info`, `trajectory`, and `averages`.

- *run_info* contains information per step such as loss, accuracy, time spent, norm of gradient, ...
- *trajectory* contains the parameters in each step
- *averages* contains running averages for kinetic energy, for virials, for the ensemble loss, ...

#### Run information

First, we inspect how the training actually proceeded by looking at the loss value per step.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

run = training_data.run_info[['step','loss']].values

plt.plot(run[:,0], run[:,1])
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.show()

The loss has decreased well and is close to 0.
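
To back up this visual impression with numbers, one can compare the first and last loss values. The sketch below uses a small stand-in `DataFrame` with the same `step` and `loss` columns; its values are made up for illustration. With a real run, `training_data.run_info` would take its place.

```python
import pandas as pd

# Stand-in for training_data.run_info; the numbers are illustrative only.
run_info = pd.DataFrame({
    "step": [0, 1, 2, 3, 4],
    "loss": [1.20, 0.45, 0.12, 0.03, 0.01],
})

first_loss = run_info["loss"].iloc[0]     # loss at the first recorded step
final_loss = run_info["loss"].iloc[-1]    # loss at the last recorded step
best_step = run_info.loc[run_info["loss"].idxmin(), "step"]  # step with the lowest loss

print(f"loss went from {first_loss:.2f} to {final_loss:.2f}")
print(f"lowest loss was reached at step {best_step}")
```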

We can also look at other values. For a pandas `DataFrame`, the attribute `columns` contains the name of each column.

In [None]:
print(training_data.run_info.columns)

> Different optimizer and sampler methods produce different columns in *run_info* depending on their respective properties.

Let us inspect the norm of the gradient, stored in the column *scaled_gradient*. It is scaled by the `learning_rate`.

Generally, all columns in *run_info* are scaled such that they are comparable with one another, based on their effect on the parameters during the update step.

In [None]:
run = training_data.run_info[['step','scaled_gradient']].values

plt.semilogy(run[:,0], run[:,1])
plt.xlabel("Training step")
plt.ylabel("Scale of gradient")
plt.show()
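
As a rough illustration of what such a scaled quantity measures, the sketch below assumes that the scaled gradient is the Euclidean norm of the gradient multiplied by the learning rate; TATi's exact definition may differ. The gradient values are hypothetical.

```python
import numpy as np

learning_rate = 0.1
# Hypothetical gradient with respect to two weights and one bias.
gradient = np.array([0.3, -0.4, 0.0])

gradient_norm = np.linalg.norm(gradient)           # Euclidean norm of the gradient
scaled_gradient = learning_rate * gradient_norm    # norm scaled by the learning rate

# For plain gradient descent the update is -learning_rate * gradient,
# so the length of the update equals the scaled gradient.
update = -learning_rate * gradient
print(scaled_gradient, np.linalg.norm(update))
```

Under this assumption, the scaled value directly states how far the parameters move in one step, which is what makes it comparable to other step-related quantities.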

#### Trajectory

Let us also look at the training trajectory.

In [None]:
print(training_data.trajectory.columns)
trajectory = training_data.trajectory[['weight0','weight1']].values

plt.plot(trajectory[:,0], trajectory[:,1])
plt.xlabel("weight0")
plt.ylabel("weight1")
plt.show()

#### Predict labels of unknown data

Having trained a network, we would like to make predictions for new data that the network has not yet seen.

To this end, we use the `predict()` function, which needs to be supplied with the unknown features.

In [None]:
unknown_data = np.array([[0,0], [1,1], [-1,1], [1,-1], [-1,-1]])
print(np.sign(nn.predict(unknown_data)))

We get a list of labels in return, one per item. We have used `numpy.sign` to turn the raw network outputs into a list with entries in {-1,1}.
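
One detail worth keeping in mind: `numpy.sign` returns 0 for an input of exactly 0, so an output lying exactly on the decision boundary would receive neither label. The following sketch shows an alternative that always yields -1 or 1. It uses made-up raw outputs in place of `nn.predict(...)`, and the tie-breaking rule (0 is mapped to 1) is an arbitrary choice made for illustration.

```python
import numpy as np

# Hypothetical raw network outputs, one row per item.
raw_outputs = np.array([[0.8], [-1.3], [0.0], [2.1], [-0.2]])

labels_sign = np.sign(raw_outputs)                   # an output of 0.0 stays 0
labels_strict = np.where(raw_outputs >= 0, 1, -1)    # ties are assigned to class 1

print(labels_sign.ravel())
print(labels_strict.ravel())
```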

### Summary

- how to train a network
- how to visualize the run information and trajectory
- how to predict labels for unknown data using the trained network