In [None]:
import pandas as pd
import walnut

# Example 2

### Deep neural network using multiple linear layers

### Step 1: Prepare data
You will need to download the dataset from https://www.kaggle.com/datasets/uciml/iris and place it into the *data* directory, so that it can be read from `data/iris.csv` (rename the downloaded file if its name differs).

In [None]:
data_orig = pd.read_csv('data/iris.csv')
data = data_orig.copy()
data.drop(columns=['Id'], inplace=True)
data.head()

In [None]:
data.describe()

In [None]:
data.info()

The labels (`Species`) are categorical values, but to be used in the model, all data needs to be numerical. The function `pd_one_hot_encode()` can be used to one-hot-encode the given categorical columns of a Pandas DataFrame object, creating one 0/1 column per category.
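For comparison, pandas itself provides `pd.get_dummies()`, which performs the same kind of one-hot encoding. A minimal sketch on a few made-up rows; the column names mirror the Iris data, but it is not assumed that `walnut`'s encoder produces identical column names or ordering:

```python
import pandas as pd

# A few illustrative rows shaped like the Iris data (values are made up)
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# 'Species' is replaced by one 0/1 column per category, appended at the end
encoded = pd.get_dummies(df, columns=["Species"], dtype=float)
print(encoded.columns.tolist())
# ['SepalLengthCm', 'Species_Iris-setosa', 'Species_Iris-versicolor', 'Species_Iris-virginica']
```

Each row contains exactly one `1.0` among the new columns, which is what a softmax output layer with three units is later compared against.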

In [None]:
data_enc = walnut.preprocessing.pd_one_hot_encode(data, columns=['Species'])
data_enc.sample(10)

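Next, the DataFrame is converted to a tensor with `pd_to_tensor()` and split into a training, a validation and a test dataset using `split_train_val_test()`, so that the model can later be evaluated on data it has not seen during training. Before splitting, the data is also shuffled, since raw data is often sorted in some way (the Iris data, for example, is ordered by species).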
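The shuffle-then-split idea can be sketched in NumPy. The 60/20/20 ratios and the helper name `shuffle_split` are assumptions for illustration; the defaults of `split_train_val_test()` are not shown in this notebook.

```python
import numpy as np

def shuffle_split(data, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Hypothetical helper: shuffle the rows, then cut them into train/val/test blocks."""
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(len(data))]  # break any ordering in the raw file
    n_test = int(len(data) * test_ratio)
    n_val = int(len(data) * val_ratio)
    n_train = len(data) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 150 rows x 7 columns, like the encoded Iris data (4 features + 3 one-hot labels)
rows = np.arange(150 * 7, dtype=float).reshape(150, 7)
train, val, test = shuffle_split(rows)
print(train.shape, val.shape, test.shape)  # (90, 7) (30, 7) (30, 7)
```

Shuffling before slicing matters here: without it, the last 30 rows of sorted Iris data would all belong to a single species.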

In [None]:
tensor = walnut.pd_to_tensor(data_enc)
t_train, t_val, t_test = walnut.preprocessing.split_train_val_test(tensor)
t_train[:5]

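Features and labels are now separated: the first four columns (`num_x_cols=4`) hold the measurements, the remaining columns hold the one-hot-encoded species.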
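Conceptually, this is plain column slicing. A NumPy sketch, assuming `split_features_labels()` treats the first `num_x_cols` columns as features and all remaining columns as labels:

```python
import numpy as np

num_x_cols = 4  # four Iris measurements; the remaining columns are the one-hot labels
t = np.arange(2 * 7, dtype=float).reshape(2, 7)  # two illustrative rows

x = t[:, :num_x_cols]  # features: columns 0..3
y = t[:, num_x_cols:]  # labels: columns 4..6
print(x.shape, y.shape)  # (2, 4) (2, 3)
```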

In [None]:
x_train, y_train = walnut.preprocessing.split_features_labels(t_train, num_x_cols=4)
x_val, y_val = walnut.preprocessing.split_features_labels(t_val, num_x_cols=4)
x_test, y_test = walnut.preprocessing.split_features_labels(t_test, num_x_cols=4)

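Neural networks tend to train poorly when input values are large or differ widely in scale, for example because activations such as tanh saturate. Therefore it is common to normalize the data. This can be done using the `normalize()` function, which applies min-max feature scaling to a tensor:

$$ X' = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}} $$

where $a$ is the lower bound and $b$ the upper bound of the target range, and $X_{min}$ and $X_{max}$ are the minimum and maximum of each feature (computed along `axis=0`, i.e. per column).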
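The formula translates directly into NumPy. This is a sketch, not `walnut`'s implementation; the default range $[0, 1]$ is an assumption. The optional `x_min`/`x_max` arguments of this hypothetical `min_max_scale` helper allow reusing the training set's statistics for the validation and test sets, which is common practice so that all splits are scaled identically.

```python
import numpy as np

def min_max_scale(x, a=0.0, b=1.0, x_min=None, x_max=None):
    """Column-wise min-max scaling of x into the range [a, b]."""
    if x_min is None:
        x_min = x.min(axis=0)  # per-feature minimum
    if x_max is None:
        x_max = x.max(axis=0)  # per-feature maximum
    return a + (x - x_min) * (b - a) / (x_max - x_min)

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])
scaled = min_max_scale(x)  # column 1 -> [0, 0.5, 1], column 2 -> [0, 1/3, 1]

# Scaling new data with the statistics of x (values may fall outside [a, b])
x_new = np.array([[2.5, 25.0]])
scaled_new = min_max_scale(x_new, x_min=x.min(axis=0), x_max=x.max(axis=0))
```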

In [None]:
x_train = walnut.preprocessing.normalize(x_train, axis=0)
x_val = walnut.preprocessing.normalize(x_val, axis=0)
x_test = walnut.preprocessing.normalize(x_test, axis=0)
x_train[:5]

### Step 2: Build the neural network structure
Here the individual layers of the neural network model are defined. For each linear layer, an activation function and a weight initialization method can be specified.
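What such a stack computes can be sketched in plain NumPy: each linear layer multiplies its input by a weight matrix, adds a bias and applies its activation function. This illustrates the 4 → 16 → 16 → 16 → 3 architecture used below, not how `walnut` implements it; the standard normal weights and zero biases are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 16, 3]  # input features -> three hidden layers -> 3 classes

# One (weights, bias) pair per linear layer
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)  # hidden layers: linear map followed by tanh
    w, b = params[-1]
    return softmax(x @ w + b)   # output layer: class probabilities

probs = forward(rng.uniform(size=(5, 4)))
print(probs.shape)  # (5, 3); each row sums to 1
```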

In [None]:
import walnut.nn as nn
from walnut.nn import layers

# using weight initialization following a normal distribution
model = nn.Sequential(layers=[
    layers.Linear(16, input_shape=(4,), act="tanh", init="normal"),
    layers.Linear(16, act="tanh", init="normal"),
    layers.Linear(16, act="tanh", init="normal"),
    layers.Linear(3, act="softmax", init="normal")
])

# alternative: using the Kaiming He initialization method (uncomment to use)
# model = nn.Sequential(layers=[
#     layers.Linear(16, input_shape=(4,), act="tanh", init="kaiming_he"),
#     layers.Linear(16, act="tanh", init="kaiming_he"),
#     layers.Linear(16, act="tanh", init="kaiming_he"),
#     layers.Linear(3, act="softmax", init="kaiming_he")
# ])

The network is compiled to internally connect its layers and initialize the model. The SGD optimizer provides an optional momentum term as well as Nesterov momentum.
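How momentum and Nesterov momentum change a plain gradient step can be sketched on a toy quadratic. This uses one common formulation, in which Nesterov evaluates the gradient at a "look-ahead" position; whether `walnut`'s `SGD` uses exactly this variant is an assumption, and `sgd` is a hypothetical helper name.

```python
import numpy as np

def sgd(grad_fn, w, lr=1e-2, momentum=0.9, nesterov=True, steps=500):
    """Minimize a function given its gradient using SGD with (Nesterov) momentum."""
    v = np.zeros_like(w)  # velocity: exponentially decaying sum of past steps
    for _ in range(steps):
        if nesterov:
            g = grad_fn(w + momentum * v)  # gradient at the look-ahead position
        else:
            g = grad_fn(w)                 # classic momentum: gradient at current position
        v = momentum * v - lr * g
        w = w + v
    return w

def grad(w):
    # Toy problem: f(w) = sum((w - 3)^2) with gradient 2 * (w - 3), minimum at w = 3
    return 2.0 * (w - 3.0)

w_nesterov = sgd(grad, np.zeros(2))
w_classic = sgd(grad, np.zeros(2), nesterov=False)
print(w_nesterov, w_classic)  # both close to [3. 3.]
```

The parameters `lr=1e-2` and `momentum=0.9` match the values passed to `nn.optimizers.SGD` in the cell below.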

In [None]:
model.compile(
    optimizer=nn.optimizers.SGD(l_r=1e-2, momentum=0.9, nesterov=True),
    loss_fn=nn.losses.Crossentropy(),
    metric=nn.metrics.Accuracy()
)

model

### Step 3: Train the model

In [None]:
hist = model.train(x_train, y_train, epochs=100, val_data=(x_val, y_val))

### Step 4: Evaluate the model
Using the defined loss function and metric, the model's performance can be evaluated on the test data (or, during development, on the validation data).
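The two reported numbers can be sketched as follows, assuming `Crossentropy` is the usual categorical cross-entropy on one-hot labels and `Accuracy` compares the arg-max of predictions and targets; details of `walnut`'s implementation (such as the clipping constant) may differ.

```python
import numpy as np

def crossentropy(probs, targets, eps=1e-7):
    """Mean categorical cross-entropy between predicted probabilities and one-hot targets."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(targets * np.log(probs), axis=1)))

def accuracy(probs, targets):
    """Fraction of samples whose most likely class matches the target class."""
    return float(np.mean(probs.argmax(axis=1) == targets.argmax(axis=1)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
print(crossentropy(probs, targets))  # (-ln 0.7 - ln 0.3) / 2 ≈ 0.78
print(accuracy(probs, targets))      # 0.5
```

Only the probability assigned to the true class enters the loss, so the second sample is penalized more heavily than the first even though both contribute equally to accuracy.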

In [None]:
loss, accuracy = model.evaluate(x_test, y_test)
print('loss', loss)
print('accuracy', accuracy)

### Step 5: Analyze the model
Using different plots, the model's performance and training behaviour can be analyzed.

In [None]:
nn.analysis.plot_curve(hist)

If the `normal` weight initialization method is used, the tanh activations saturate very quickly (most outputs end up close to -1 or 1, where the derivative of tanh is almost zero), so the gradients "die out". Using `kaiming_he` counteracts this behaviour, because it scales the initial weights according to the number of inputs of each layer. Furthermore, the initial loss is lower, so the model does not waste the first epochs correcting unnecessarily large weight values.
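The saturation effect can be reproduced without training. The sketch below pushes random inputs in $[0, 1]$ through three 16-unit tanh layers and reports, per layer, the share of outputs with $|a| > 0.99$ and the mean derivative $1 - \tanh^2(z)$. Treating `normal` as unscaled standard normal weights and using zero biases are assumptions; Kaiming He uses a standard deviation of $\sqrt{2 / \text{fan\_in}}$.

```python
import numpy as np

def tanh_stats(init, n_layers=3, width=16, n_in=4, n_samples=1000, seed=0):
    """Per hidden layer: share of saturated tanh outputs and mean tanh derivative."""
    rng = np.random.default_rng(seed)
    h = rng.uniform(0.0, 1.0, size=(n_samples, n_in))  # inputs scaled to [0, 1]
    saturated, mean_grad = [], []
    fan_in = n_in
    for _ in range(n_layers):
        if init == "normal":
            std = 1.0                    # assumption: unscaled standard normal weights
        elif init == "kaiming_he":
            std = np.sqrt(2.0 / fan_in)  # Kaiming He: std = sqrt(2 / fan_in)
        else:
            raise ValueError(f"unknown init: {init}")
        w = rng.normal(0.0, std, size=(fan_in, width))
        h = np.tanh(h @ w)               # biases assumed to start at zero
        saturated.append(float(np.mean(np.abs(h) > 0.99)))
        mean_grad.append(float(np.mean(1.0 - h ** 2)))  # tanh'(z) = 1 - tanh(z)^2
        fan_in = width
    return saturated, mean_grad

for init in ("normal", "kaiming_he"):
    sat, grad = tanh_stats(init)
    print(init, "saturated:", np.round(sat, 3), "mean tanh':", np.round(grad, 3))
```

With unscaled weights the pre-activations grow with every layer, so deeper layers show more saturated units and smaller local gradients, which is the pattern the activation and gradient distributions below are meant to reveal.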

In [None]:
activations = {f"{i + 1} {l.__class__.__name__}" : l.y.data.copy() for i, l in enumerate(model.layers) if l.__class__.__name__ == "Tanh"}
nn.analysis.plot_distrbution(activations, title="activation distribution") 

In [None]:
gradients = {f"{i + 1} {l.__class__.__name__}" : l.y.grad.copy() for i, l in enumerate(model.layers) if l.__class__.__name__ == "Linear"}
nn.analysis.plot_distrbution(gradients, title="gradient distribution")