# Feed-forward Neural Network

In this notebook we look at neural networks.
In practice, image data is usually handled with Convolutional Neural Networks (CNNs), but we did not cover those in detail in the theory part.
We therefore use feed-forward neural networks with a single hidden layer here.

The neural network is implemented in `tensorflow`.

In [None]:
!pip install tensorflow

In [None]:
def plot_history(history):
    """Plot training and validation accuracy per epoch from a Keras History object."""
    plt.plot(history.history['accuracy'], label='train_accuracy')
    plt.plot(history.history['val_accuracy'], label='val_accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import layers, activations, Sequential, losses
from tensorflow.keras.regularizers import L2

import pickle

import pandas as pd

from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

In [None]:
def plot_confusion_matrix(y_true, y_pred):
    """Plot a confusion matrix whose entries are fractions of all samples (normalize='all')."""
    labels = np.unique(y_true)
    fig = plt.figure(figsize=(len(labels), len(labels)))
    ConfusionMatrixDisplay(
      confusion_matrix=confusion_matrix(y_true=y_true, y_pred=y_pred, labels=labels, normalize='all'),
      display_labels=labels
    ).plot(ax=fig.gca(), cmap="BuPu", xticks_rotation='vertical', include_values=True)
    plt.show()
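
To make the effect of `normalize='all'` concrete, here is a minimal NumPy sketch of the same computation. The helper `normalized_confusion_matrix` and the demo labels are hypothetical and only serve as an illustration; rows correspond to true labels and columns to predicted labels, following the scikit-learn convention.

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs and divide by the total number of samples."""
    index = {label: i for i, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for true_label, pred_label in zip(y_true, y_pred):
        counts[index[true_label], index[pred_label]] += 1
    return counts / counts.sum()

# Hypothetical toy labels, not taken from the dataset.
demo_true = ["cat", "cat", "frog", "frog"]
demo_pred = ["cat", "frog", "frog", "frog"]
demo_cm = normalized_confusion_matrix(demo_true, demo_pred, labels=["cat", "frog"])
print(demo_cm)
print(np.trace(demo_cm))  # 0.75
```

With this normalization all entries sum to 1, so the sum of the diagonal equals the accuracy.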

# Prepare data

In [None]:
# Load the data and split into features and labels
with open('../data/train.pkl', 'rb') as f:
    data_train = pickle.load(f)
X_data = data_train["images"]
y_data = data_train["labels"]

In [None]:
# Hold out part of the training data for validation (train_test_split defaults to 25%).
X_train, X_val, y_train, y_val = train_test_split(X_data, y_data, random_state=42)

##### Preprocessing

For tensorflow we need to convert the text labels (such as `frog`) into numbers; we use the `LabelEncoder` for this.

The data is standardized here with `tf.image.per_image_standardization`.

In [None]:
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_val_enc = le.transform(y_val)
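
As a rough illustration of what the encoder does, the following NumPy sketch reproduces the same kind of mapping: the sorted unique labels become the classes, and each label is replaced by its index in that list. The demo labels are hypothetical (only `frog` appears in the text above), and this is a sketch of the idea, not the scikit-learn implementation.

```python
import numpy as np

# Hypothetical labels for illustration.
demo_labels = np.array(["frog", "cat", "ship", "frog", "cat"])

# Sorted unique classes plus the index of each label in that sorted array.
classes, encoded = np.unique(demo_labels, return_inverse=True)
print(classes)  # ['cat' 'frog' 'ship']
print(encoded)  # [1 0 2 1 0]

# Indexing into the class array reverses the mapping, like inverse_transform.
decoded = classes[encoded]
print(decoded)
```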

In [None]:
# tf.image.per_image_standardization is a common preprocessing step for image data.
X_train_std = tf.image.per_image_standardization(X_train).numpy()
X_val_std = tf.image.per_image_standardization(X_val).numpy()
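
The standardization above works per image: each image is shifted to zero mean and scaled to unit variance, with a lower bound on the divisor so that uniform images do not cause a division by zero. A minimal NumPy sketch of that computation follows. The function name `per_image_standardization_np` and the random demo batch are hypothetical, and the sketch assumes a batch of shape `(N, H, W, C)`.

```python
import numpy as np

def per_image_standardization_np(images: np.ndarray) -> np.ndarray:
    """Standardize every image in a batch of shape (N, H, W, C) separately."""
    images = images.astype(np.float32)
    # Statistics are computed per image, over all pixels and channels.
    axes = tuple(range(1, images.ndim))
    mean = images.mean(axis=axes, keepdims=True)
    std = images.std(axis=axes, keepdims=True)
    # Bound the divisor from below by 1/sqrt(number of values per image).
    num_values = np.prod(images.shape[1:])
    adjusted_std = np.maximum(std, 1.0 / np.sqrt(num_values))
    return (images - mean) / adjusted_std

# Hypothetical random batch standing in for the real images.
rng = np.random.default_rng(0)
demo_batch = rng.integers(0, 256, size=(4, 32, 32, 3))
demo_std = per_image_standardization_np(demo_batch)
print(demo_std.mean(axis=(1, 2, 3)))  # close to 0 for every image
print(demo_std.std(axis=(1, 2, 3)))   # close to 1 for every image
```

Because the statistics come from each image itself, nothing is learned from the training set, and the same call can be applied unchanged to validation and test images.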

#### NN (0 hidden layer -> Logistic Regression)

First we rebuild `Logistic Regression` as a neural network.

![Logistic regression as a neural network](./img/logistic_regression_as_nn.png)

The performance should be similar to our `Logistic Regression` baseline.

In [None]:
lr = Sequential([
    layers.InputLayer(input_shape=(32 * 32 * 3,), name='input_layer'),  # trailing comma makes this a 1-tuple
    layers.Dense(10, activation=activations.linear, kernel_regularizer=L2(), name='output_layer'),
])
lr.compile(
    optimizer='sgd',
    loss=losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

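lr.summary()  # summary() prints the table itself and returns None

# Flatten each 32x32x3 image into a vector of 3072 features.
history = lr.fit(
    X_train_std.reshape(-1, 32 * 32 * 3), y_train_enc,
    batch_size=128, epochs=40,
    validation_data=(X_val_std.reshape(-1, 32 * 32 * 3), y_val_enc),
)
plot_history(history)

# The model outputs logits (from_logits=True), not probabilities; argmax gives the same class for both.
y_val_hat_prob = lr.predict(X_val_std.reshape(-1, 32 * 32 * 3))
y_val_hat = np.argmax(y_val_hat_prob, axis=1)

print(accuracy_score(y_val_enc, y_val_hat))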
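
Since the output layer uses a linear activation and the loss is configured with `from_logits=True`, `predict` returns logits and not probabilities. The following NumPy sketch shows how a softmax would turn logits into probabilities and why the `argmax` is the same either way. The `softmax` helper and the demo logits are hypothetical values for illustration.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for two samples and three classes.
demo_logits = np.array([[2.0, -1.0, 0.5],
                        [0.1, 0.2, 3.0]])
demo_probs = softmax(demo_logits)

print(demo_probs.sum(axis=1))          # each row sums to 1
print(np.argmax(demo_logits, axis=1))  # [0 2]
print(np.argmax(demo_probs, axis=1))   # [0 2]
```

Softmax is monotonic within each row, so the predicted class does not change; the conversion is only needed when actual probabilities are required.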

#### NN (1 hidden layer) no Regularization

Now we add a `Hidden Layer`.

![Neural network with one hidden layer](./img/one_hidden_nn.png)

In [None]:
nn = Sequential([
    layers.InputLayer(input_shape=(32 * 32 * 3,), name='input_layer'),  # trailing comma makes this a 1-tuple
    layers.Dense(1024, activation=activations.relu, name='hidden_layer'),
    layers.Dense(10, activation=activations.linear, name='output_layer'),
])
nn.compile(
    optimizer='sgd',
    loss=losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

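nn.summary()  # summary() prints the table itself and returns None

# Flatten each 32x32x3 image into a vector of 3072 features.
history = nn.fit(
    X_train_std.reshape(-1, 32 * 32 * 3), y_train_enc,
    batch_size=128, epochs=40,
    validation_data=(X_val_std.reshape(-1, 32 * 32 * 3), y_val_enc),
)
plot_history(history)

# The model outputs logits (from_logits=True), not probabilities; argmax gives the same class for both.
y_val_hat_prob = nn.predict(X_val_std.reshape(-1, 32 * 32 * 3))
y_val_hat = np.argmax(y_val_hat_prob, axis=1)

print(accuracy_score(y_val_enc, y_val_hat))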
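
To get a feeling for how much capacity the hidden layer adds, one can count the trainable parameters by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this sketch; the layer sizes are the ones from the models above.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 32 * 32 * 3  # flattened image size

# Logistic regression model: input -> 10 outputs.
lr_params = dense_params(n_features, 10)
# One hidden layer: input -> 1024 hidden units -> 10 outputs.
nn_params = dense_params(n_features, 1024) + dense_params(1024, 10)

print(lr_params)  # 30730
print(nn_params)  # 3157002
```

The hidden-layer network has roughly a hundred times more parameters than the logistic regression model, which is one reason it can overfit the training data more easily.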

#### NN (1 hidden layer) with L2 Regularization

Now we add L2 regularization to counteract overfitting.

In [None]:
nn_l2 = Sequential([
    layers.InputLayer(input_shape=(32 * 32 * 3,), name='input_layer'),  # trailing comma makes this a 1-tuple
    layers.Dense(1024, activation=activations.relu, kernel_regularizer=L2(0.01), name='hidden_layer'),
    layers.Dense(10, activation=activations.linear, kernel_regularizer=L2(0.01), name='output_layer'),
])
nn_l2.compile(
    optimizer='sgd',
    loss=losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

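nn_l2.summary()  # summary() prints the table itself and returns None

# Flatten each 32x32x3 image into a vector of 3072 features.
history = nn_l2.fit(
    X_train_std.reshape(-1, 32 * 32 * 3), y_train_enc,
    batch_size=128, epochs=40,
    validation_data=(X_val_std.reshape(-1, 32 * 32 * 3), y_val_enc),
)
plot_history(history)

# The model outputs logits (from_logits=True), not probabilities; argmax gives the same class for both.
y_val_hat_prob = nn_l2.predict(X_val_std.reshape(-1, 32 * 32 * 3))
y_val_hat = np.argmax(y_val_hat_prob, axis=1)

print(accuracy_score(y_val_enc, y_val_hat))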
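
The `kernel_regularizer=L2(0.01)` argument adds a penalty to the training loss for each regularized layer: the factor times the sum of the squared kernel weights (biases are not affected by `kernel_regularizer`). A minimal NumPy sketch of that penalty, with a hypothetical helper `l2_penalty` and a made-up toy kernel:

```python
import numpy as np

def l2_penalty(weights: np.ndarray, factor: float = 0.01) -> float:
    """L2 penalty: factor times the sum of squared weights."""
    return factor * float(np.sum(np.square(weights)))

# Hypothetical 2x2 kernel, not taken from the trained model.
demo_kernel = np.array([[1.0, -2.0],
                        [0.5, 0.0]])
penalty = l2_penalty(demo_kernel)
print(penalty)  # 0.01 * (1 + 4 + 0.25)
```

Large weights make the penalty grow quadratically, so the optimizer is pushed toward smaller weights, which tends to reduce the gap between training and validation accuracy.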

In [None]:
plot_confusion_matrix(
    y_true=y_val,
    y_pred=le.inverse_transform(y_val_hat)
)

# Predict classes for test set

If we are happy with the performance of our model on the validation set, we can apply it to the test set.

In [None]:
with open('../data/test.pkl', 'rb') as f:
    X_test = pickle.load(f)

In [None]:
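# Apply the same preprocessing as for training; the model was fit on standardized images.
X_test_std = tf.image.per_image_standardization(X_test.reshape(-1, 32, 32, 3)).numpy()
y_test_pred_logits = nn_l2.predict(X_test_std.reshape(-1, 32 * 32 * 3))
y_test_pred = np.argmax(y_test_pred_logits, axis=1)
# Map the integer class indices back to the original text labels.
y_test_pred_labels = le.inverse_transform(y_test_pred)
y_test_pred_df = pd.DataFrame(y_test_pred_labels, columns=['label'])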

To submit the predictions to Kaggle we write them into a .csv file, which you can manually submit.

In [None]:
y_test_pred_df.to_csv('../out/neural_network.csv', header=True, index_label='id')