In [None]:
from numpynn import layers, networks, preprocessing, utils, optimizers, losses, metrics, activations, inits
import pandas as pd

In [None]:
utils.set_numpy_format()

### Prepare data
Download the dataset from https://www.kaggle.com/datasets/uciml/iris and place it in the *data* directory as `iris.csv`, which is the path read in the next cell.

In [None]:
data_orig = pd.read_csv('data/iris.csv')
data = data_orig.copy()
data.drop(columns=['Id'], inplace=True)
data.head()

In [None]:
data.describe()

In [None]:
data.info()

The labels are categorical values, but the model requires all data to be numerical. The function `categorical_to_numeric()` one-hot-encodes all categorical columns of a Pandas DataFrame.

In [None]:
data_enc = preprocessing.categorical_to_numeric(data)
data_enc.sample(10)
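
For intuition, one-hot encoding can be sketched with plain pandas. This is not the `numpynn` implementation; the toy DataFrame and its values are made up for illustration, and the column name `Species` is assumed from the Kaggle dataset.

```python
import pandas as pd

# Toy frame standing in for the iris data (values are illustrative only).
toy = pd.DataFrame({
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Replace the categorical column with one 0/1 indicator column per category.
toy_enc = pd.get_dummies(toy, columns=["Species"], dtype=float)

print(toy_enc.columns.tolist())
```

Each row ends up with exactly one indicator column set to 1, which is the form the softmax output layer is compared against later.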

Next, the data is split into training, validation, and testing datasets using `split_train_val_test()`, so that the model can be evaluated later on. The data is also shuffled before splitting, since raw data is sometimes sorted in some way.

In [None]:
tensor = data_enc.to_numpy()
t_train, t_val, t_test = preprocessing.split_train_val_test(tensor)
t_train[:5]
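
The shuffle-then-split idea can be sketched in plain NumPy as follows. The helper name `shuffle_and_split`, the split fractions, and the seed are assumptions for illustration; the defaults used by `split_train_val_test()` are not shown in this notebook.

```python
import numpy as np

def shuffle_and_split(tensor, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows, then cut the result into train/val/test parts."""
    rng = np.random.default_rng(seed)
    # A random permutation of row indices removes any ordering in the raw data.
    shuffled = tensor[rng.permutation(len(tensor))]
    n_val = int(len(tensor) * val_frac)
    n_test = int(len(tensor) * test_frac)
    n_train = len(tensor) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

demo = np.arange(40).reshape(20, 2)
train, val, test = shuffle_and_split(demo)
print(train.shape, val.shape, test.shape)
```

Every row lands in exactly one of the three parts, so no sample is shared between training and evaluation.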

Features and labels are now separated.

In [None]:
x_train, y_train = preprocessing.split_features_labels(t_train, 4)
x_val, y_val = preprocessing.split_features_labels(t_val, 4)
x_test, y_test = preprocessing.split_features_labels(t_test, 4)
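
A minimal sketch of what this separation amounts to, assuming the second argument `4` is the number of feature columns and the one-hot label columns follow them. The helper `split_columns` and the sample row are hypothetical.

```python
import numpy as np

def split_columns(tensor, n_features):
    # The first n_features columns are inputs; the remaining columns are labels.
    return tensor[:, :n_features], tensor[:, n_features:]

# One illustrative row: four measurements followed by a one-hot label.
row = np.array([[5.1, 3.5, 1.4, 0.2, 1.0, 0.0, 0.0]])
x, y = split_columns(row, 4)
print(x.shape, y.shape)  # (1, 4) (1, 3)
```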

Neural networks tend to run into problems if input values are very large, so it is common to normalize the data. This can be done using the `normalize()` function, which applies min-max feature scaling to a tensor:<br><br>
$ X'=a+\frac{(X-X_{min})\cdot(b-a)}{X_{max}-X_{min}} $<br><br>where<br><br>$ a $ ... lower bound<br>$ b $ ... upper bound<br>$ X_{min} $, $ X_{max} $ ... minimum and maximum of $ X $

In [None]:
x_train = preprocessing.normalize(x_train)
x_val = preprocessing.normalize(x_val)
x_test = preprocessing.normalize(x_test)
x_train[:5]
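
The min-max formula above translates almost directly into NumPy. The sketch below assumes scaling is done per feature column with default bounds $ a=0 $ and $ b=1 $; whether `normalize()` works per column or over the whole tensor is not shown here, and the helper `min_max_scale` is hypothetical.

```python
import numpy as np

def min_max_scale(x, x_min=None, x_max=None, a=0.0, b=1.0):
    """Scale each feature column of x to [a, b] using the min-max formula."""
    # By default the statistics come from x itself; they can also be passed in.
    x_min = x.min(axis=0) if x_min is None else x_min
    x_max = x.max(axis=0) if x_max is None else x_max
    # Note: a constant column (x_max == x_min) would divide by zero here.
    return a + (x - x_min) * (b - a) / (x_max - x_min)

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
x_scaled = min_max_scale(x)
print(x_scaled)
```

One design point to be aware of: when each split is scaled with its own minimum and maximum, the same raw value can map to different scaled values in different splits. A common alternative is to compute `x_min` and `x_max` on the training data and pass them in when scaling the validation and testing data.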

### Build the neural network structure
Here the individual layers of the neural network model are defined. For linear layers, an activation function and a weight initialization method can be specified.

In [None]:
model = networks.Sequential(input_shape=(4,), layers=[
    layers.Linear(out_channels=8, act_fn=activations.Tanh(), init_fn=inits.kaiming),
    layers.Linear(out_channels=8, act_fn=activations.Tanh(), init_fn=inits.kaiming),
    layers.Linear(out_channels=8, act_fn=activations.Tanh(), init_fn=inits.kaiming),
    layers.Linear(out_channels=3, act_fn=activations.Softmax(), init_fn=inits.kaiming)
])
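
To illustrate what this 4-8-8-8-3 structure computes, here is a plain NumPy sketch of a forward pass. It assumes Kaiming (He) normal initialization with standard deviation $ \sqrt{2/\text{fan\_in}} $ and zero biases; the exact variant used by `inits.kaiming` is not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_normal(fan_in, fan_out):
    # He/Kaiming normal initialization: zero mean, std = sqrt(2 / fan_in).
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

sizes = [4, 8, 8, 8, 3]
weights = [kaiming_normal(i, o) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

def forward(x):
    # Three hidden layers with tanh, then a softmax output layer.
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ w + b)
    return softmax(x @ weights[-1] + biases[-1])

probs = forward(rng.random((5, 4)))
print(probs.shape)
```

Each output row is a probability distribution over the three classes, which is what the softmax activation on the last layer provides.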

The network is compiled to internally connect its layers and initialize the model. The SGD optimizer provides an optional momentum term and Nesterov momentum.

In [None]:
model.compile(
    optimizer=optimizers.SGD(l_r=1e-2, momentum=0.9, nesterov=True),
    loss_fn=losses.Crossentropy(),
    metric=metrics.accuracy
)
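
For reference, one common formulation of SGD with momentum and the Nesterov variant can be sketched as below. This is an assumption about the general technique, not a copy of `optimizers.SGD`; the helper `sgd_step` is hypothetical and only reuses the parameter names from the cell above.

```python
import numpy as np

def sgd_step(param, grad, velocity, l_r=1e-2, momentum=0.9, nesterov=True):
    """One SGD update with momentum; returns the new parameter and velocity."""
    # The velocity accumulates an exponentially decaying sum of past gradients.
    velocity = momentum * velocity + grad
    # Nesterov applies the momentum term once more to the fresh velocity.
    step = grad + momentum * velocity if nesterov else velocity
    return param - l_r * step, velocity

# Minimize f(p) = 0.5 * ||p||^2, whose gradient is simply p.
p = np.array([1.0, -2.0])
v = np.zeros_like(p)
for _ in range(500):
    p, v = sgd_step(p, grad=p, velocity=v)
print(np.abs(p).max())
```

On this simple quadratic the parameters shrink toward the minimum at zero; setting `nesterov=False` gives classical momentum with the same interface.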

In [None]:
model.summary()

### Train the model

In [None]:
hist = model.train(x_train, y_train, epochs=100, val_data=(x_val, y_val), verbose=False)

After training, the loss history, the layer activations, and the gradients can be plotted to analyze the model.

In [None]:
model.plot_training_loss(hist)

In [None]:
model.plot_activations()

In [None]:
model.plot_gradients()

### Evaluate the model
The model's performance can be evaluated on the testing or validation data using the metric defined at compile time.

In [None]:
model.evaluate(x_test, y_test)
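
As a rough illustration of an accuracy metric for one-hot labels, the sketch below compares the most probable predicted class with the labeled class. It is an assumption about how such a metric is typically computed, not the source of `metrics.accuracy`; the sample predictions are made up.

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of rows where the most probable class matches the one-hot label."""
    return float(np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([
    [0.8, 0.1, 0.1],  # correct: class 0
    [0.2, 0.7, 0.1],  # correct: class 1
    [0.3, 0.4, 0.3],  # wrong: predicts class 1, label is class 2
    [0.1, 0.6, 0.3],  # correct: class 1
])
print(accuracy(y_pred, y_true))  # 0.75
```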