## Neural Network training and evaluation

In this notebook, you will explore and train a neural network classifier, using the dataset provided and the features extracted previously.
We build the neural network with PyTorch.

To get started, we have to import some tools to train, test, save and load the model.

In [None]:
import pickle
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### Loading the training data and creating a train/test split

Using the pickle library, we can retrieve the features we extracted in the feature extraction stage.
The data is stored as a tuple (X, y), where:
- X is the list of feature vectors.
- y is the list of labels.
- The label y[i] corresponds to the feature vector X[i].
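
To make the (X, y) layout concrete, here is a minimal sketch that round-trips a toy dataset through pickle in memory and runs two sanity checks. The names `X_demo`, `y_demo` and the values in them are made up for illustration; the same checks could be run on the real data after loading it.

```python
import io
import pickle

# Hypothetical toy data standing in for the real extracted features.
X_demo = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
y_demo = [0, 1, 0]

# Round-trip through pickle in memory, mirroring how a .pkl file is written and read.
buffer = io.BytesIO()
pickle.dump((X_demo, y_demo), buffer)
buffer.seek(0)
X_loaded, y_loaded = pickle.load(buffer)

# Sanity checks: one label per feature vector, and all vectors of equal length.
assert len(X_loaded) == len(y_loaded), "each feature vector needs exactly one label"
n_features = len(X_loaded[0])
assert all(len(vector) == n_features for vector in X_loaded), "vectors differ in length"

print(f"{len(X_loaded)} samples with {n_features} features each")
print(f"labels seen: {sorted(set(y_loaded))}")
```

The feature count printed here is the value the network's input layer has to match.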

After loading, we create a train/test split. The split is random, but we set random_state so that it is deterministic and you get the same split every time you run this cell. 20% of the data is held out for testing and the remaining 80% is used for training.
We use the training set to train the model and the held-out set (called the validation set in the code) to evaluate the trained models.

PyTorch trains on mini-batches of tensors, so we wrap the data in a specific structure, DataLoader, which serves it in batches of batch_size samples.
- train_loader is the one we use for training.
- val_loader is the one we use for evaluation and validation.
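
The following sketch illustrates the split and the batching on synthetic data, under the assumption of 100 samples with 8 features each (both numbers are arbitrary). It shows that repeating the split with the same random_state gives the same result, and how many samples each batch contains.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 100 samples, 8 features, 2 classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8)).astype(np.float32)
y_demo = rng.integers(0, 2, size=100)

# 80/20 split; repeating it with the same random_state gives the same result.
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
X_tr_again, _, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
same_split = np.array_equal(X_tr, X_tr_again)

# Wrap the training part in a DataLoader that serves shuffled batches of 32.
demo_dataset = TensorDataset(torch.from_numpy(X_tr), torch.from_numpy(y_tr).long())
demo_loader = DataLoader(demo_dataset, batch_size=32, shuffle=True)

batch_sizes = [len(labels) for _, labels in demo_loader]
print(f"train: {len(X_tr)}, held out: {len(X_va)}, same split twice: {same_split}")
print(f"batch sizes in one epoch: {batch_sizes}")
```

With 80 training samples and a batch size of 32, the last batch is smaller than the others because the DataLoader keeps incomplete batches by default.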

In [None]:
training_vectors = "training_vectors.pkl"  # Path for the .pkl containing the extracted training data
batch_size = 32  # We recommend 32 or 16

with open(training_vectors, "rb") as f:
    data = pickle.load(f)

# Assuming the pickle contains a tuple: (X, y)
X, y = data

# Convert to PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)

### Building the Neural Network

Neural networks can have wildly different architectures, so this is your playground for improving the model.
The architecture itself is defined inside \_\_init\_\_, where you choose the number of layers, their size (hidden_dim), and the activation functions.
We then have a forward function. It defines the feed-forward pass of the neural network: it takes a feature vector and returns the output used to classify it.

The example we have below has the following architecture:
- An input layer, receiving the features vector.
- 2 hidden layers with hidden_dim size (defined by you).
- An output layer that produces one score per class; the class with the highest score is the predicted class of that vector.
- A Tanh (hyperbolic tangent) activation function is applied after each hidden layer.
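
The architecture listed above can be traced shape by shape with a small sketch. The sizes used here (8 input features, 16 hidden units, 2 classes, a batch of 4 vectors) are assumptions picked only to keep the numbers readable.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
input_dim, hidden_dim, num_classes = 8, 16, 2

demo_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), nn.Tanh(),    # input -> hidden layer 1
    nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),   # hidden layer 1 -> hidden layer 2
    nn.Linear(hidden_dim, num_classes),             # hidden layer 2 -> output scores
)

batch = torch.randn(4, input_dim)            # 4 feature vectors
logits = demo_model(batch)                   # shape (4, num_classes): one raw score per class
probs = torch.softmax(logits, dim=1)         # scores turned into probabilities per row
predicted = torch.argmax(logits, dim=1)      # index of the highest score = predicted class

# Each Linear layer holds in_features * out_features weights plus out_features biases.
n_params = sum(p.numel() for p in demo_model.parameters())

print(f"logits shape: {tuple(logits.shape)}, predicted classes: {predicted.tolist()}")
print(f"trainable parameters: {n_params}")
```

The parameter count grows quickly with hidden_dim, which is one reason larger networks take longer to train.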

You can shape this model as much as you want! Some ideas are:
- The number and size of the layers.
- The activation functions: ReLU(), LeakyReLU(), GELU(), Sigmoid().
- Adding dropout to the network (at the end, for example).

Keep in mind that neural networks take time to train, so don't overdo the number of architectures you try.
Explore, search the internet, and ask around to understand which changes would be interesting. Chatbots can also be a good source of ideas.


In [None]:
# Architecture of the neural network.
# input_dim must match the size of the feature vectors and num_classes the number of classes in the labels.
class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_classes):
        super(SimpleNN, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),           # Go from input to hidden layer 1
            nn.Tanh(),                                  # Activation Function
            nn.Linear(hidden_dim, hidden_dim),          # Go from hidden layer 1 to 2
            nn.Tanh(),                                  # Activation Function
            nn.Linear(hidden_dim, num_classes),         # Go from hidden layer 2 to output layer
        )

    def forward(self, x):
        return self.model(x)
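
As one way to try the ideas above without rewriting the class each time, here is a sketch of a configurable variant. `FlexibleNN` and its `hidden_dims`, `activation` and `dropout` parameters are hypothetical names introduced for illustration, not part of this notebook; the dropout placement (after every hidden activation) is just one possible choice.

```python
import torch
import torch.nn as nn

class FlexibleNN(nn.Module):
    """Hypothetical variant of SimpleNN with configurable depth, activation and dropout."""

    def __init__(self, input_dim, hidden_dims, num_classes, activation=nn.ReLU, dropout=0.0):
        super().__init__()
        layers = []
        previous_dim = input_dim
        for hidden_dim in hidden_dims:
            layers.append(nn.Linear(previous_dim, hidden_dim))
            layers.append(activation())               # e.g. nn.ReLU, nn.LeakyReLU, nn.GELU
            if dropout > 0:
                layers.append(nn.Dropout(dropout))    # randomly zeroes activations in training
            previous_dim = hidden_dim
        layers.append(nn.Linear(previous_dim, num_classes))  # output layer: raw scores
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

# Two hidden layers of different sizes, LeakyReLU activations and 20% dropout.
variant = FlexibleNN(input_dim=8, hidden_dims=[64, 32], num_classes=2,
                     activation=nn.LeakyReLU, dropout=0.2)
variant.eval()  # eval mode disables dropout, so the forward pass is deterministic
with torch.no_grad():
    out = variant(torch.randn(5, 8))
print(tuple(out.shape))
```

Passing the activation as a class rather than an instance lets each layer get its own module, and a list of hidden sizes makes it easy to compare shallow and deep variants.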

### Neural Network Training

Now we create the neural network with set hyperparameters, and then train it with the training loop below. After each epoch, the loop reports the training loss, the validation loss and the validation accuracy.
By adjusting the hyperparameters, you can improve the quality of your model, so experiment with them.
For each model you train, make sure to change the file name passed to torch.save(), so you save all of them in different files.

In [None]:
input_size = 1      # Size of the feature vector; set it to the length of your feature vectors (X_train.shape[1])
num_classes = 2     # Number of classes in the output
hidden_dim = 200    # Size of the hidden layers (you can vary it per layer inside the architecture)
num_epochs = 1000   # Number of training epochs

model = SimpleNN(input_size, hidden_dim, num_classes)

criterion = nn.CrossEntropyLoss()  # Raw logits expected
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train settings
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Training loop
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    avg_train_loss = train_loss / len(train_loader)

    # Validation step
    model.eval()
    val_loss = 0.0
    all_preds = []
    all_targets = []
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            preds = torch.argmax(outputs, dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_targets.extend(labels.cpu().numpy())

    avg_val_loss = val_loss / len(val_loader)
    val_acc = accuracy_score(all_targets, all_preds)

    print(f"Epoch {epoch+1}/{num_epochs} - "
          f"Train Loss: {avg_train_loss:.4f} - "
          f"Val Loss: {avg_val_loss:.4f} - "
          f"Val Acc: {val_acc:.4f}")

# Save model
torch.save(model.state_dict(), "simple_nn_model.pt")
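
To keep every trained model in its own file, one option is to encode the hyperparameters in the file name. The sketch below assumes a hypothetical helper `model_filename` and a tiny stand-in network; it saves the weights to a temporary directory, loads them back into a fresh network of the same architecture, and checks that both give identical outputs.

```python
import os
import tempfile
import torch
import torch.nn as nn

def model_filename(hidden_dim, lr, num_epochs):
    """Hypothetical helper: encode the hyperparameters in the file name."""
    return f"simple_nn_h{hidden_dim}_lr{lr}_e{num_epochs}.pt"

def build_demo_model(hidden_dim):
    """Small stand-in network; the saved and restored models must share the architecture."""
    return nn.Sequential(nn.Linear(8, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 2))

hidden_dim, lr, num_epochs = 16, 0.001, 5
demo_model = build_demo_model(hidden_dim)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, model_filename(hidden_dim, lr, num_epochs))
    torch.save(demo_model.state_dict(), path)          # save only the weights

    restored = build_demo_model(hidden_dim)            # rebuild the architecture first
    restored.load_state_dict(torch.load(path, map_location="cpu"))

saved_name = os.path.basename(path)
x = torch.randn(3, 8)
with torch.no_grad():
    same_outputs = torch.equal(demo_model(x), restored(x))

print(saved_name, same_outputs)
```

Because only the state_dict is saved, the hyperparameters that define the architecture (here hidden_dim) have to be known again at load time, which is another reason to keep them in the file name.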

### Neural Network Evaluation

Now that you have trained some models, you can evaluate them and compare their metrics.
We use accuracy as an example, but we encourage you to explore other metrics too, such as precision, recall and F1-score, to inspect the confusion matrix, or to calculate the AUC.

To do so, we load the model and test it by:
- Using the model to predict the labels of the validation set (val_loader).
- Comparing the actual labels (all_targets) with the predicted labels (all_preds) through the metrics.

In [None]:
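# Use the GPU when available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Rebuild the architecture, then load the saved weights onto that device
# (map_location lets a model trained on a GPU be loaded on a CPU-only machine)
model = SimpleNN(input_size, hidden_dim, num_classes)
model.load_state_dict(torch.load("simple_nn_model.pt", map_location=device))
model.to(device)
model.eval()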

# Run predictions on the validation set
all_preds = []
all_targets = []

with torch.no_grad():
    for inputs, labels in val_loader:
        inputs = inputs.to(device)
        outputs = model(inputs)
        preds = torch.argmax(outputs, dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_targets.extend(labels.numpy())

# Accuracy
accuracy = accuracy_score(all_targets, all_preds)
print(f"Validation Accuracy: {accuracy:.2%}")
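
The other metrics mentioned above are available in scikit-learn. The sketch below uses made-up labels, predictions and scores for a binary problem (the `demo_*` values are not results from this notebook). For the AUC it assumes you have a score for the positive class, for example the class-1 column of `torch.softmax(outputs, dim=1)`.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Hypothetical true labels and predicted labels for 8 samples.
demo_targets = [0, 0, 1, 1, 1, 0, 1, 0]
demo_preds = [0, 1, 1, 1, 0, 0, 1, 0]
# Hypothetical predicted probability of class 1 for the same samples.
demo_scores = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]

demo_accuracy = accuracy_score(demo_targets, demo_preds)
demo_precision = precision_score(demo_targets, demo_preds)   # TP / (TP + FP)
demo_recall = recall_score(demo_targets, demo_preds)         # TP / (TP + FN)
demo_f1 = f1_score(demo_targets, demo_preds)                 # harmonic mean of the two
demo_cm = confusion_matrix(demo_targets, demo_preds)         # rows: actual, columns: predicted
demo_auc = roc_auc_score(demo_targets, demo_scores)          # needs scores, not hard labels

print(f"accuracy={demo_accuracy:.2f} precision={demo_precision:.2f} "
      f"recall={demo_recall:.2f} f1={demo_f1:.2f} auc={demo_auc:.4f}")
print(demo_cm)
```

For problems with more than two classes, precision_score, recall_score and f1_score need an `average` argument (for example `average="macro"`), and roc_auc_score needs the full probability matrix together with `multi_class`.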