### Multiclass data classification

The second problem considered is multi-class classification. For this purpose, we use the well-known Reuters dataset, published in $1986$, which contains short press articles on $46$ different topics. Each article is assigned to exactly one topic, and each topic has at least $10$ examples in the training set. The Reuters dataset is part of the Keras package. The code below loads it into the appropriate data tensors.

In [None]:
from tensorflow.keras.datasets import reuters

L=100  # number of most frequent keywords to keep
(train_data,train_labels),(test_data,test_labels)=reuters.load_data(num_words=L+4)  # +4: indices 0-3 are reserved

This way we load both the training and the test sets. The **num_words** parameter limits the data to the most frequently occurring words; in the analyzed case we keep $L$ keywords. We add $4$ because the indices $0$ to $3$ do not describe keywords, similarly to the IMDb database. An example article for $L=100$ might look like this:

In [None]:
print(train_data[0])

Here the individual numbers are word indices in the keyword dictionary. Based on such a vector, we can reconstruct the article text using the code below:

In [None]:
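def decode_data(data):
    # get_word_index() maps word -> rank; invert it to rank -> word
    dictionary=reuters.get_word_index()
    my_dictionary=dict([(k,v) for (v,k) in dictionary.items()])
    # stored indices are shifted by 3; anything without a dictionary entry becomes '?'
    s=' '.join([my_dictionary.get(d-3,'?') for d in data])
    return s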

print(decode_data(train_data[0]))
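
The index shift used in `decode_data` can be illustrated without downloading the dataset. The sketch below is only an illustration: `toy_word_index` and `encoded` are hypothetical, and it assumes the Keras defaults in which index $1$ marks the start of a sequence, index $2$ replaces out-of-vocabulary words, and real words are stored as their rank plus $3$.

```python
# Hypothetical toy vocabulary in the same format as reuters.get_word_index():
# word -> rank, where rank 1 is the most frequent word.
toy_word_index = {"the": 1, "of": 2, "said": 3}

# Invert it to rank -> word, as decode_data does.
toy_index_word = {rank: word for word, rank in toy_word_index.items()}

# Hypothetical encoded sample: 1 = start marker, 2 = out-of-vocabulary word,
# every other value is a word rank shifted by 3.
encoded = [1, 4, 6, 2, 5]

decoded = " ".join(toy_index_word.get(i - 3, "?") for i in encoded)
print(decoded)  # ? the said ? of
```

The reserved indices have no dictionary entry, which is why they are printed as `?`.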

As in the previously discussed example, the input vectors read straight from the dataset cannot be used directly in the training process, because their length differs from article to article. They must therefore be transformed into vectors of fixed length. For this purpose, we use a function analogous to the one in the previous example.

In [None]:
import numpy as np

def prepare_data(name,data,labels):
    # bag-of-words counts for the L keywords and one-hot labels for the 46 classes
    x=np.zeros((len(data),L),float)
    y=np.zeros((len(data),46),float)
    for i in range(len(data)):
        for j in data[i]:
            if j>=4:              # skip the reserved indices 0-3
                x[i][j-4]+=1.0
        norm=np.linalg.norm(x[i])
        if norm>0:                # scale each non-empty vector to unit length
            x[i]/=norm
        y[i][labels[i]]=1.0
    np.save(name+'_data.npy',x)
    np.save(name+'_labels.npy',y)

prepare_data('train',train_data,train_labels)
prepare_data('test',test_data,test_labels)

After processing, the vectors are saved in files with the extension "\*.npy". Note that the Reuters dataset gives us $8982$ vectors in the training set and $2246$ in the test set. The expected output vectors have $46$ elements: a single element equal to $1$ indicates the class the article belongs to, and all remaining elements are $0$.
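
To see what a single pair of vectors looks like, the transformation can be traced by hand on one sample. This is a minimal sketch, assuming a hypothetical dictionary of only $5$ keywords (`L_toy`) and a made-up encoded article; it repeats the steps of `prepare_data` without saving anything to disk.

```python
import numpy as np

L_toy = 5                     # hypothetical dictionary size
sample = [1, 4, 4, 8, 2, 6]   # hypothetical encoded article
label = 3                     # hypothetical topic index
num_classes = 46

# Count keyword occurrences, skipping the reserved indices 0-3.
x = np.zeros(L_toy)
for j in sample:
    if j >= 4:
        x[j - 4] += 1.0

# Scale the count vector to unit Euclidean length.
norm = np.linalg.norm(x)
if norm > 0:
    x /= norm

# One-hot encode the label.
y = np.zeros(num_classes)
y[label] = 1.0

print(x)
print(y.argmax(), y.sum())
```

Before normalization the counts are `[2, 0, 1, 0, 1]`, so the input vector is this array divided by $\sqrt{6}$.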

The process of training a network model begins with loading training vectors and dividing them into a training and validation set. We use the following code for this:

In [None]:
train_x=np.load('train_data.npy')
train_y=np.load('train_labels.npy')
test_x=np.load('test_data.npy')
test_y=np.load('test_labels.npy')
N=len(train_x)
N2=N//2
(train2_x,validate_x)=(train_x[0:N2],train_x[N2:N])
(train2_y,validate_y)=(train_y[0:N2],train_y[N2:N])

The training setup and the structure of the neural network are similar to those considered in the previous task. We therefore use the same number of layers, i.e. three (two hidden layers and an output layer), but the size of the input data and, in particular, of the output data changes. In the case under consideration, we have $46$ different classes, which means that the last layer of the network must have $46$ neurons. The following code creates and compiles the model.

In [None]:
import tensorflow as tf

model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(2,activation='relu'))
model.add(tf.keras.layers.Dense(2,activation='relu'))
model.add(tf.keras.layers.Dense(46,activation='softmax'))
model.build(input_shape=(None,L))  # None leaves the batch size unspecified
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])

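In our example model, we use $2$ neurons in each of the two hidden layers, with ReLU as the activation function. Such narrow layers form a bottleneck: all the information needed to separate $46$ classes has to pass through just $2$ values, so these sizes are most likely too small.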
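
A quick way to get a feeling for how small this network is, is to count its trainable parameters. The helper below is a hypothetical illustration (it is not a Keras function); it assumes fully connected layers with biases, and the second call uses an arbitrary width of $64$ purely for comparison.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Input of L=100 keywords, two hidden layers of 2 neurons, 46 outputs.
print(dense_param_count([100, 2, 2, 46]))    # 346

# Hypothetical wider variant with 64 neurons per hidden layer.
print(dense_param_count([100, 64, 64, 46]))  # 13614
```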

In the last layer, i.e. the output layer, which has $46$ neurons, we use the "softmax" activation function, because its outputs are non-negative and sum to $1$. As a result, the network output for a given input vector can be read as a probability distribution over the $46$ possible classes. The model is compiled with the Adam optimizer, and the loss function is categorical cross-entropy (see the previous section). During training we also monitor the classification accuracy metric "accuracy". The training process itself is started using the following code:

In [None]:
history=model.fit(train2_x,train2_y,epochs=50,validation_data=(validate_x,validate_y),verbose=True)
tf.keras.models.save_model(model,'model_multiclass.h5')

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['loss','validation loss'])
plt.show()

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['accuracy','validation accuracy'])
plt.show()
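
The loss values plotted above come from the softmax output combined with the categorical cross-entropy loss. The NumPy sketch below shows both on a hypothetical $3$-class example; the functions `softmax` and `categorical_crossentropy` are written here only for illustration and are not the Keras implementations.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum keeps the exponentials numerically stable.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # With a one-hot target this is minus the log-probability of the true class.
    return -np.sum(y_true * np.log(y_pred + eps))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw network outputs
target = np.array([1.0, 0.0, 0.0])   # hypothetical one-hot label

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)

print(np.round(probs, 3), probs.sum())
print(loss)
```

The loss shrinks towards $0$ as the probability assigned to the correct class approaches $1$.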

Once the model is trained and saved to disk in a file, we can read the file and use the saved model multiple times to classify data. For example:

In [None]:
model=tf.keras.models.load_model('model_multiclass.h5')
print(np.round(model.predict(test_x[0:1]),1)[0])
print(test_y[0])

We can perform the model evaluation process on the entire test set as follows:

In [None]:
model=tf.keras.models.load_model('model_multiclass.h5')
model.evaluate(test_x,test_y)
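
The accuracy reported by the evaluation is the fraction of samples for which the most probable predicted class matches the true class. A minimal sketch on hypothetical predictions (`pred`) and one-hot labels (`true`) for $3$ classes:

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes.
pred = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])

# Hypothetical one-hot labels: true classes 1, 0, 2, 1.
true = np.eye(3)[[1, 0, 2, 1]]

# Compare the index of the largest probability with the index of the 1.
accuracy = np.mean(pred.argmax(axis=1) == true.argmax(axis=1))
print(accuracy)  # 0.75
```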

**Task**

Further experimental research into this problem may include:
- increasing the number $L$ of keywords,
- increasing the number of hidden layers of the network and checking how this affects the classification results obtained for all three sets (training, validation, test),
- modifying (increasing, decreasing) the number of neurons in the hidden layers and checking the impact of these changes on the classification results,
- replacing the categorical_crossentropy loss function with the mse function,
- replacing the ReLU or "softmax" activation function with sigmoid functions and checking the impact of such a change on the learning process itself, as well as on the final results obtained.
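
As a starting point for the last item, it may help to compare what the two output activations return for the same raw values. The sketch below uses hypothetical logits; it only illustrates that element-wise sigmoid outputs, unlike softmax outputs, do not have to sum to $1$.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs of the last layer

# Softmax: normalizes across all classes.
softmax_out = np.exp(logits - logits.max())
softmax_out /= softmax_out.sum()

# Sigmoid: squashes each output independently of the others.
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

print(np.round(softmax_out, 3), softmax_out.sum())
print(np.round(sigmoid_out, 3), sigmoid_out.sum())
```

Both activations preserve the ordering of the classes here, but only the softmax output can be read directly as a probability distribution.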

---

### <center>Experiments</center>

- keywords: 200, 500, 1000
- text representations: Normalized
- number of hidden layers: 2, 3, 4
- number of neurons per hidden layer: minimum 32 or 64, maximum 128 or 256
- loss: Categorical Cross Entropy, MSE
- hidden activation functions: ReLU, sigmoid
- output activation functions: linear (MSE), softmax (Categorical Cross Entropy)

In [None]:
import os

import numpy as np
import optuna
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import reuters
from tensorflow.keras.losses import CategoricalCrossentropy, MeanSquaredError
from tensorflow.keras.metrics import CategoricalAccuracy, MeanAbsoluteError, RootMeanSquaredError


def prepare_data(name, data, labels, keyword, data_size):
    """Build normalized bag-of-words vectors and one-hot labels, then save them under data/."""
    x = np.zeros((data_size, keyword), float)
    y = np.zeros((data_size, 46), float)
    for i in range(data_size):
        for j in data[i]:
            if j >= 4:  # indices 0-3 are reserved and carry no keyword
                x[i][j-4] += 1.0
        norm = np.linalg.norm(x[i])
        if norm > 0:
            x[i] /= norm
        y[i][labels[i]] = 1.0

    os.makedirs("data", exist_ok=True)

    file_name_data = f"{name}_{keyword}_data.npy"
    file_name_labels = f"{name}_{keyword}_labels.npy"

    np.save(os.path.join("data", file_name_data), x)
    np.save(os.path.join("data", file_name_labels), y)

def load_data(name: str, keyword: int) -> tuple:
    x = np.load(os.path.join("data", f"{name}_{keyword}_data.npy"))
    y = np.load(os.path.join("data", f"{name}_{keyword}_labels.npy"))
    return x, y


search_space = {
    "keywords": [200, 500, 1000],
    "number_of_layers": [2, 3, 4],
    "min_neurons": [32, 64],
    "max_neurons": [128, 256],
    "activation": ["relu", "sigmoid"],
    "is_increasing": [True, False],
    "layer_pattern": ["linear", "random"]
}

loss_functions = ["categorical_crossentropy", "mse"]

# Vectorize the data once per vocabulary size; skipped when the data directory already exists
if "data" not in os.listdir(os.getcwd()):
    for keywords in search_space["keywords"]:
        (train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=keywords + 4)
        prepare_data("train", train_data, train_labels, keywords, len(train_data))
        prepare_data("test", test_data, test_labels, keywords, len(test_data))
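
Since every entry of the search space is later explored as a grid, it is worth checking how many configurations that implies. The sketch below copies the `search_space` dictionary from the cell above and counts the combinations with `itertools.product`; it is only a size check and does not train anything.

```python
from itertools import product

# Copy of the search space defined above.
search_space = {
    "keywords": [200, 500, 1000],
    "number_of_layers": [2, 3, 4],
    "min_neurons": [32, 64],
    "max_neurons": [128, 256],
    "activation": ["relu", "sigmoid"],
    "is_increasing": [True, False],
    "layer_pattern": ["linear", "random"],
}

# Every combination of one value per hyperparameter.
grid = list(product(*search_space.values()))
print(len(grid))  # 288
```

Each of these configurations corresponds to one full training run per study, which is worth keeping in mind when choosing the number of epochs.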

#### Defining model and experiments

In [None]:
def define_nn(params: dict, input_size: int, loss_function: str, classes: int):
    output_activation = {
        "mse": "linear",
        "categorical_crossentropy": "softmax"
    }[loss_function]

    number_of_layers = params["number_of_layers"]
    min_neurons = params["min_neurons"]
    max_neurons = params["max_neurons"]
    activation = params["activation"]
    is_increasing = params["is_increasing"]
    layer_pattern = params["layer_pattern"]

    if layer_pattern == "linear":
        neurons_in_layers = np.linspace(min_neurons, max_neurons, number_of_layers, dtype=int)

    else:
        neurons_in_layers = np.random.randint(min_neurons, max_neurons + 1, size=number_of_layers)
        if is_increasing:
            neurons_in_layers = np.sort(neurons_in_layers)
        else:
            neurons_in_layers = np.sort(neurons_in_layers)[::-1]

    if not is_increasing and layer_pattern == "linear":
        neurons_in_layers = neurons_in_layers[::-1]

    neurons_in_layers = np.maximum(neurons_in_layers, 8)

    model_layers = [layers.Input(shape=(input_size,))]

    for i, num_neurons in enumerate(neurons_in_layers):
        model_layers.append(
            layers.Dense(
                int(num_neurons),
                activation=activation,
                name=f"hidden_{i+1}_{int(num_neurons)}neurons"
            )
        )

    model_layers.append(
        layers.Dense(classes, activation=output_activation, name="output")
    )

    return models.Sequential(model_layers)


def objective_accuracy(trial: optuna.Trial):
    params = {
        "keywords": trial.suggest_categorical("keywords", search_space["keywords"]),
        "number_of_layers": trial.suggest_categorical("number_of_layers", search_space["number_of_layers"]),
        "min_neurons": trial.suggest_categorical("min_neurons", search_space["min_neurons"]),
        "max_neurons": trial.suggest_categorical("max_neurons", search_space["max_neurons"]),
        "activation": trial.suggest_categorical("activation", search_space["activation"]),
        "is_increasing": trial.suggest_categorical("is_increasing", search_space["is_increasing"]),
        "layer_pattern": trial.suggest_categorical("layer_pattern", search_space["layer_pattern"])
    }

    if params["min_neurons"] >= params["max_neurons"]:
        raise optuna.exceptions.TrialPruned()

    x_train, y_train = load_data(name="train", keyword=params["keywords"])
    x_test, y_test = load_data(name="test", keyword=params["keywords"])

    my_model = define_nn(params, x_train.shape[-1], "categorical_crossentropy", y_train.shape[-1])
    my_model.compile(
        optimizer="adam",
        loss=CategoricalCrossentropy(),
        metrics=[CategoricalAccuracy(name="accuracy")]
    )

    trial_history = my_model.fit(
        x_train, y_train,
        epochs=20,
        batch_size=128,
        validation_split=0.2,
        verbose=0
    )

    test_loss, test_accuracy = my_model.evaluate(x_test, y_test, verbose=0)
    test_results = {
        "test_loss": test_loss,
        "test_accuracy": test_accuracy,
        **params
    }
    history_of_trials_accuracy.append(test_results)

    return np.max(trial_history.history["val_accuracy"])


def objective_mse(trial: optuna.Trial):
    params = {
        "keywords": trial.suggest_categorical("keywords", search_space["keywords"]),
        "number_of_layers": trial.suggest_categorical("number_of_layers", search_space["number_of_layers"]),
        "min_neurons": trial.suggest_categorical("min_neurons", search_space["min_neurons"]),
        "max_neurons": trial.suggest_categorical("max_neurons", search_space["max_neurons"]),
        "activation": trial.suggest_categorical("activation", search_space["activation"]),
        "is_increasing": trial.suggest_categorical("is_increasing", search_space["is_increasing"]),
        "layer_pattern": trial.suggest_categorical("layer_pattern", search_space["layer_pattern"])
    }

    if params["min_neurons"] >= params["max_neurons"]:
        raise optuna.exceptions.TrialPruned()

    x_train, y_train = load_data(name="train", keyword=params["keywords"])
    x_test, y_test = load_data(name="test", keyword=params["keywords"])

    my_model = define_nn(params, x_train.shape[-1], "mse", y_train.shape[-1])
    my_model.compile(
        optimizer="adam",
        loss=MeanSquaredError(),
        metrics=[RootMeanSquaredError(name="rmse"), MeanAbsoluteError(name="mae")]
    )

    trial_history = my_model.fit(
        x_train, y_train,
        epochs=20,
        batch_size=512,
        validation_split=0.2,
        verbose=0
    )

    test_loss, test_rmse, test_mae = my_model.evaluate(x_test, y_test, verbose=0)
    test_results = {
        "test_loss": test_loss,
        "test_rmse": test_rmse,
        "test_mae": test_mae,
        **params
    }
    history_of_trials_mse.append(test_results)

    return np.min(trial_history.history["val_rmse"])
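
The "linear" layer pattern in `define_nn` spreads the layer widths evenly between `min_neurons` and `max_neurons`. The sketch below isolates that step in a hypothetical helper, `linear_widths`, to show which widths result for two of the grid settings; it assumes the same `np.linspace` call with `dtype=int` as in `define_nn`.

```python
import numpy as np

def linear_widths(min_neurons, max_neurons, number_of_layers, is_increasing):
    # Evenly spaced integer widths, reversed for a narrowing network.
    widths = np.linspace(min_neurons, max_neurons, number_of_layers, dtype=int)
    return widths if is_increasing else widths[::-1]

print(linear_widths(32, 128, 3, True).tolist())    # [32, 80, 128]
print(linear_widths(64, 256, 4, False).tolist())   # [256, 192, 128, 64]
```

With `is_increasing=False` the widest layer comes first and the network narrows towards the output.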

#### Accuracy as primary metric

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

In [None]:
history_of_trials_accuracy = []
study_accuracy = optuna.create_study(
    study_name="Task 4 Accuracy",
    direction="maximize",
    sampler=optuna.samplers.GridSampler(search_space=search_space, seed=42)
)

optuna.logging.set_verbosity(optuna.logging.INFO)
study_accuracy.optimize(objective_accuracy)

In [None]:
os.makedirs("results", exist_ok=True)

history_of_trials_accuracy = pd.DataFrame(history_of_trials_accuracy)
history_of_trials_accuracy.to_csv(os.path.join("results", "history_of_trials_accuracy.csv"))

#### MSE as primary metric

In [None]:
history_of_trials_mse = []
study_mse = optuna.create_study(
    study_name="Task 4 MSE",
    direction="minimize",
    sampler=optuna.samplers.GridSampler(search_space=search_space, seed=42)
)

optuna.logging.set_verbosity(optuna.logging.INFO)
study_mse.optimize(objective_mse)

In [None]:
os.makedirs("results", exist_ok=True)

history_of_trials_mse = pd.DataFrame(history_of_trials_mse)
history_of_trials_mse.to_csv(os.path.join("results", "history_of_trials_mse.csv"))

#### Results

In [None]:
history_of_trials_accuracy.sort_values(by=["test_accuracy"], ascending=False).head()

In [None]:
history_of_trials_mse.sort_values(by=["test_rmse", "test_mae"], ascending=True).head()

In [None]:
import seaborn as sns


analyze_accuracy = pd.get_dummies(history_of_trials_accuracy, columns=["activation", "layer_pattern"]).corr()
sns.heatmap(
    analyze_accuracy[["test_loss", "test_accuracy"]],
    annot=True,
    fmt='.2f'
)

In [None]:
analyze_rmse_mae = pd.get_dummies(history_of_trials_mse, columns=["activation", "layer_pattern"]).corr()
sns.heatmap(
    analyze_rmse_mae[["test_loss", "test_rmse", "test_mae"]],
    annot=True,
    fmt='.2f'
)