# Training a CNN model for tree classification based on images of their bark
In this notebook, a CNN model is developed to classify images of bark by tree species. The CNN is trained on the original images. After training, it is evaluated with post-hoc model analysis methods such as LIME and SHAP.

## Organizing the data structure (only done once, after downloading the dataset)

First, download the dataset (https://www.kaggle.com/datasets/saurabhshahane/barkvn50) to the directory "./data/BarkVN-50/" and unzip it. You should then have the following structure:
- data
    - BarkVN-50
        - BarkVN-50_mendeley
            - Acacia
            - Adenanthera microsperma
            - Adenieum species
            - Anacardium occidentale
            - ...

Since this structure contains all 50 species and no separate training and test sets, a subset of five species is selected and split into Train and Test directories using the code in the next cell:

In [None]:
# import helpers.split
# helpers.split.train_test_split()

Note: this cell only needs to be executed once (this is why it is commented out by default).

After execution, the new data structure looks like this:
- data
    - BarkVN-50
        - BarkVN-50_mendeley
            - Acacia
            - Adenanthera microsperma
            - Adenieum species
            - Anacardium occidentale
            - ...
        - Test
            - Adenanthera microsperma
            - Cananga odorata
            - Cedrus
            - Cocos nucifera
            - Dalvergia oliveri
        - Train
            - Adenanthera microsperma
            - Cananga odorata
            - Cedrus
            - Cocos nucifera
            - Dalvergia oliveri

Note: the directory "./data/BarkVN-50/BarkVN-50_mendeley" may be deleted after this step.
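The code of `helpers.split.train_test_split` is not part of this notebook. To give an idea of what such a step can involve, here is a minimal standard-library sketch that copies a random share of each selected class into `Train` and `Test` directories. The function name `split_dataset`, the 20 % test ratio and the fixed seed are assumptions for illustration; the actual helper may select and split the data differently.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, classes, test_ratio=0.2, seed=0):
    """Hypothetical helper: copy the images of the selected classes from
    src_dir/<class>/ into dst_dir/Train/<class>/ and dst_dir/Test/<class>/.

    Returns {class: (n_train, n_test)}.
    """
    rng = random.Random(seed)  # dedicated RNG so the split is reproducible
    counts = {}
    for cls in classes:
        # sort first so the shuffle depends only on the seed, not on file system order
        files = sorted(p for p in (Path(src_dir) / cls).iterdir() if p.is_file())
        rng.shuffle(files)
        n_test = int(len(files) * test_ratio)
        splits = {"Test": files[:n_test], "Train": files[n_test:]}
        for split_name, split_files in splits.items():
            target = Path(dst_dir) / split_name / cls
            target.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, target / f.name)
        counts[cls] = (len(splits["Train"]), len(splits["Test"]))
    return counts


# Possible usage for the layout above:
# split_dataset("data/BarkVN-50/BarkVN-50_mendeley", "data/BarkVN-50",
#               ["Adenanthera microsperma", "Cananga odorata", "Cedrus",
#                "Cocos nucifera", "Dalvergia oliveri"])
```

Splitting per class keeps the class proportions roughly equal in both sets.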

## Loading the Dataset and creating DataLoaders
Since BarkVN-50 is not one of PyTorch's built-in datasets, we first need to create a custom Dataset that loads, transforms and delivers the data points.

In [None]:
from helpers.dataset import BarkVN50Dataset
from torch.utils.data import DataLoader
from torch import device, manual_seed
from torch.cuda import is_available

# setting random seed
manual_seed(0)

# use the GPU if one is available, otherwise fall back to the CPU
DEVICE = device("cuda" if is_available() else "cpu")

# load the train and test datasets and wrap them in DataLoaders that create minibatches (the training data is shuffled)
train_dataset = BarkVN50Dataset(train=True, device=DEVICE)
test_dataset = BarkVN50Dataset(train=False, device=DEVICE)

train_dataloader = DataLoader(train_dataset, batch_size=20, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=20, shuffle=False)
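`BarkVN50Dataset` is defined in `helpers.dataset`, which is not shown here. A `DataLoader` accepts any map-style dataset, i.e. any object implementing `__len__` and `__getitem__`. The following sketch shows only the indexing part of such a class for the Train/Test layout above. The class name `BarkFolderIndex` and the `loader` callable (which would load and transform an image, e.g. with PIL and torchvision transforms) are hypothetical; the real class, which also takes a `device` and exposes `images` and `labels`, may work quite differently.

```python
from pathlib import Path


class BarkFolderIndex:
    """Hypothetical map-style dataset over <root>/<Train|Test>/<class>/<image>."""

    def __init__(self, root, train=True, loader=None):
        split_dir = Path(root) / ("Train" if train else "Test")
        # sorted class names give a stable class -> integer label mapping
        self.classes = sorted(d.name for d in split_dir.iterdir() if d.is_dir())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        # one (path, label) pair per image file
        self.samples = [
            (path, self.class_to_idx[cls])
            for cls in self.classes
            for path in sorted((split_dir / cls).iterdir())
            if path.is_file()
        ]
        # loader maps a path to model input; the identity default returns the path,
        # so pass a loader that returns tensors before wrapping this in a DataLoader
        self.loader = loader or (lambda p: p)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        return self.loader(path), label
```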

## Training the CNN model
To train the model, we can use either a simple Train/Test split or K-Fold Cross Validation (KF CV). In my case, I first split the dataset into a Train and a Test subset and then mostly train the model with KF CV on the Train subset. This has two advantages: the Test subset is always available to assess every trained model separately on the same data, and comparing the folds shows how much the model's performance depends on a lucky or unlucky split.

### Training the CNN model with Train/Test Split...
This section shows how to train a single model (either a new one or one loaded from an earlier checkpoint) using the Train/Test split.

In [None]:
from helpers.cnn import ConvolutionalNeuralNetwork
from torch.optim import Adam

# model and optimizer
model = ConvolutionalNeuralNetwork().to(device=DEVICE)
optimizer = Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)
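For reference, PyTorch's `Adam` (with `amsgrad=False` and the default `betas=(0.9, 0.999)`, `eps=1e-8`) applies `weight_decay` as an L2 term added to the gradient, unlike `AdamW`, which decouples it. For a parameter tensor $\theta$ at step $t$, with learning rate $\eta$ (`lr`, here $3\times10^{-4}$) and weight decay $\lambda$ (here $10^{-4}$), the update is:

```latex
\begin{aligned}
g_t &= \nabla_\theta \mathcal{L}(\theta_{t-1}) + \lambda\,\theta_{t-1} \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

The moment estimates $m_t$ and $v_t$ have the same shape as $\theta$ and are part of `optimizer.state_dict()`.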

If an already existing model should be trained further, it can be loaded from disk:

In [None]:
# from torch import load

# checkpoint = load("models/checkpoint-2024-11-04-18-14-59.tar", weights_only=True, map_location=DEVICE)
# model.load_state_dict(checkpoint["model_state_dict"])
# optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
# epoch = checkpoint["epoch"]
# loss = checkpoint["loss"]

In [None]:
from helpers.train import train_cnn
from torch.nn import CrossEntropyLoss

model.train()
num_epochs = 15
loss = train_cnn(
    num_epochs=num_epochs,
    model=model,
    criterion=CrossEntropyLoss(),
    dataloader=train_dataloader,
    optimizer=optimizer,
)
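`CrossEntropyLoss` expects raw scores (logits) and, with its default `reduction="mean"`, returns the minibatch mean of $-\log \operatorname{softmax}(z)_y$, where $z$ are the logits of a sample and $y$ its true class. A pure-Python sketch of that computation, for illustration only (the function name is hypothetical; the training cell above uses the PyTorch implementation):

```python
import math


def cross_entropy(logits, targets):
    """Mean cross-entropy over a minibatch.

    logits:  list of per-sample score lists (one score per class)
    targets: list of true class indices
    """
    total = 0.0
    for scores, y in zip(logits, targets):
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[y]  # equals -log(softmax(scores)[y])
    return total / len(targets)


# two samples, three classes
print(cross_entropy([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]], [0, 2]))
```

A uniform prediction over $C$ classes gives a loss of $\log C$, which is a useful sanity check for the first epoch (about 1.61 for five classes).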

If the model should be trained further later on, it can be saved as a checkpoint with the .tar extension (the PyTorch convention for checkpoints that bundle model and optimizer state):

In [None]:
from torch import save
from datetime import datetime
from os import makedirs

makedirs("models", exist_ok=True)  # torch.save does not create missing directories
timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

# Checkpoint with everything needed to resume training later
save(
    {
        "epoch": num_epochs,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "loss": loss,
    },
    f"models/checkpoint-{num_epochs}ep-{timestamp}.tar",
)

If the model does not need further training but should still be evaluated, it is enough to save only the model's state_dict with the .pt extension (the PyTorch convention for saved models):

In [None]:
from torch import save
from datetime import datetime
from os import makedirs

# If the model will only be used for inference (needs about 3 times less storage than the full checkpoint, since Adam stores two extra tensors per parameter)
makedirs("models", exist_ok=True)
timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
save(model.state_dict(), f"models/eval-model-{num_epochs}ep-{timestamp}.pt")

### ...or training it using K-Fold Cross Validation
Another option is to train the model with K-Fold Cross Validation: the Train subset is divided into K folds, and K models are trained, each one validated on a different fold. This evaluates the model more thoroughly and thus reduces the "lucky/unlucky split" problem.

In [None]:
from helpers.kfold import train_cnn_kfold
from torch.nn import CrossEntropyLoss

train_cnn_kfold(
    epoch_per_kfold=10,
    num_kfold=10,
    train_dataset=train_dataset,
    test_loader=test_dataloader,
    criterion=CrossEntropyLoss(),
    learning_rate=3e-4,
    weight_decay=1e-4,
    device=DEVICE,
)
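`helpers.kfold.train_cnn_kfold` is not shown here. The core of K-Fold Cross Validation is the index split: the training indices are shuffled once and cut into K folds, and fold k serves as validation data for the k-th model while the other K-1 folds are used for training. A minimal sketch with a hypothetical function name; the real helper may split differently:

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Possible usage with PyTorch:
# from torch.utils.data import Subset
# for fold, (train_idx, val_idx) in enumerate(kfold_indices(len(train_dataset), k=10)):
#     fold_train = Subset(train_dataset, train_idx)
#     fold_val = Subset(train_dataset, val_idx)
```

Every sample is used for validation exactly once, so averaging the K validation scores uses all of the training data.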

## Evaluating the model's accuracy on test data
Now that a CNN model has been trained, it is time to evaluate it on the original (unaltered) training and test data as well as on data altered with noise and/or overlapping pixels. LIME and SHAP are then used to examine how the CNN's classifications and the resulting heatmaps change.

To load a saved model for evaluation, the following commands can be used:

In [None]:
from torch import load
from helpers.cnn import ConvolutionalNeuralNetwork

model = ConvolutionalNeuralNetwork().to(device=DEVICE)
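# map_location allows a model saved on a GPU to be loaded on a CPU-only machine (and vice versa)
# model.load_state_dict(load("models/ignoring-0/eval-model-20ep-2024-11-15-20-38-07.pt", weights_only=True, map_location=DEVICE))
model.load_state_dict(load("models/kfold-2024-11-19-15-15-12-1.tar", weights_only=True, map_location=DEVICE)["model_state_dict"])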

Before evaluation with more complex algorithms, it is helpful to visualize the model's performance with a confusion matrix:

In [None]:
from helpers.evaluate import evaluate_cnn
from torch.nn import CrossEntropyLoss

model.eval()
evaluate_cnn(
    criterion=CrossEntropyLoss(),
    test_dataloader=test_dataloader,
    train_dataloader=train_dataloader,
    model=model,
)
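`evaluate_cnn` is not shown here. As a reference for reading its output: a confusion matrix counts how many samples of each true class were assigned to each predicted class. The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the number of samples, and each row's diagonal share is that class's recall. A minimal pure-Python sketch with hypothetical function names (in practice the predictions would be the `argmax` of the model outputs over the test set):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows = true class, columns = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    # share of each true class that was predicted correctly (None if the class has no samples)
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]
```

Off-diagonal entries show which species the model confuses with each other, which is exactly what the explanation methods below help to investigate.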

### Model evaluation with SHAP and LIME
Now that we have seen how the trained CNN performs, it is time to find out *why* it performs the way it does.

In [None]:
from helpers.shap import shap_evaluate_cnn

shap_evaluate_cnn(model=model, test_dataset=test_dataset, train_dataset=train_dataset)
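`shap_evaluate_cnn` is not shown here. SHAP's explainers for deep models (such as `shap.DeepExplainer` and `shap.GradientExplainer`) estimate Shapley values approximately. The Shapley value of feature $i$ is its marginal contribution $f(S \cup \{i\}) - f(S)$, averaged over all coalitions $S$ of the other features with weight $|S|!\,(n-|S|-1)!/n!$. For a handful of features (for example a few image regions) it can be computed exactly by enumeration. A sketch with a hypothetical value function `value(coalition)`, which for an image could be the model's class score with all regions outside the coalition masked out:

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating all coalitions (O(2^n) calls to value).

    value: function mapping a frozenset of feature indices to a number
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n_features):
            # weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The values always add up to `value(all features) - value(no features)`, which is the property that lets a SHAP heatmap be read as a decomposition of the model's score.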

In [None]:
from helpers.lime import lime_evaluate_cnn

lime_evaluate_cnn(model=model, test_dataset=test_dataset, device=DEVICE)

### Model evaluation with activation hooks
To evaluate the model's activations, we first need to set up forward hooks on every layer of the model:

In [None]:
from helpers.activations import setup_hooks

activations = setup_hooks(model, {})
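`setup_hooks` is not shown here. In PyTorch, forward hooks are registered with `torch.nn.Module.register_forward_hook`; the callback receives `(module, inputs, output)` after every forward pass of that module. A minimal sketch of how such a helper could work, with a hypothetical function name and activations stored under the names from `named_modules()` (e.g. `"cnn.0"`, whereas the notebook's helper apparently uses names like `"cnn0"`):

```python
import torch
from torch import nn


def register_activation_hooks(model: nn.Module, activations: dict) -> list:
    """Store every submodule's output in activations[name] on each forward pass.

    Returns the hook handles; call handle.remove() on each to detach the hooks.
    """
    handles = []
    for name, module in model.named_modules():
        if name == "":  # named_modules() also yields the root module itself
            continue

        # bind `name` as a default argument so each hook keeps its own layer name
        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                activations[name] = output.detach()

        handles.append(module.register_forward_hook(hook))
    return handles


# Example on a small stand-in model (not the notebook's CNN)
net = nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU(), nn.Flatten(), nn.Linear(4 * 6 * 6, 5))
acts = {}
handles = register_activation_hooks(net, acts)
with torch.no_grad():
    net(torch.zeros(1, 1, 8, 8))
print({name: tuple(t.shape) for name, t in acts.items()})
for h in handles:
    h.remove()
```

Detaching the stored outputs keeps the recorded activations from holding on to the autograd graph.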

With the forward hooks in place, we can pick an image (a random one or a specific one) and pass it through the model:

In [None]:
import matplotlib.pyplot as plt
from torch import no_grad
# from numpy.random import randint

# Select a random OR a specific image
# index = randint(0, len(test_dataset.images))
index = 0
image = test_dataset.images[index : index + 1]  # slicing keeps the batch dimension
label = test_dataset.labels[index]

# Display the input image (moved to the CPU, since matplotlib cannot plot GPU tensors)
plt.imshow(image.squeeze().cpu(), cmap="gray")
plt.title(f"Input Image - Label: {label}")
plt.axis("off")
plt.show()

# Run the model on the sample image; the forward hooks record the activations
with no_grad():
    output = model(image)

The implemented hooks have recorded the data from this forward pass. We can now visualize it.

The first cell plots the activations of the final fully connected (FC) layer. They reflect how likely the model considers it that the image belongs to each of the classes.

The second and third cells will plot the activations of the first and second convolutional layers.

In [None]:
from helpers.activations import plot_fc_activations

# Get activations from FC layer
plot_fc_activations(activations, "linear")
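Since the model is trained with `CrossEntropyLoss`, its final linear layer outputs unnormalized scores (logits) rather than probabilities. A softmax turns them into a probability distribution over the classes (in PyTorch: `torch.softmax(output, dim=1)`). A pure-Python sketch with hypothetical logits:

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow and does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])  # five classes, hypothetical logits
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # same index as the largest logit
```

The softmax is monotonic, so the predicted class is the same whether it is read from the logits or from the probabilities.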

In [None]:
from helpers.activations import plot_conv_activations

# Get activations from the first convolutional layer
plot_conv_activations(activations, "cnn0")

In [None]:
# Get activations from the second convolutional layer
plot_conv_activations(activations, "cnn4")

To further analyze the layer activations of the model, we can generate a random image and run the model on it. Using the Adam optimizer, the image (not the model, whose weights stay fixed) is then adjusted in each iteration so that the selected layer's activation is maximized. This function plots images that maximize the activation of different filters of the selected layer:

In [None]:
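import helpers.activations
from importlib import reload

# reload the module so that changes to helpers/activations.py take effect without restarting the kernel
reload(helpers.activations)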

# Maximize activations for the first ReLU layer
helpers.activations.filter_activation_maximization(
    cnn_layer_num=1,
    model=model,
    input_size=(1, 1, 404, 303),
    lr=0.1,
    iterations=100,
)

In [None]:
# Maximize activations for the second ReLU layer
helpers.activations.filter_activation_maximization(
    cnn_layer_num=5,
    model=model,
    input_size=(1, 1, 404, 303),
    lr=0.1,
    iterations=100,
)
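`filter_activation_maximization` is not shown here. The technique keeps the trained weights fixed and runs gradient ascent on the input: the loss is the negative mean activation of one filter, and Adam updates only the image. A minimal sketch, assuming the target layer is passed as a module whose output is captured with a forward hook; the function name is hypothetical, and many implementations additionally regularize the image (e.g. with an L2 penalty or blurring), which is omitted here:

```python
import torch
from torch import nn


def maximize_filter(model, layer, filter_idx, input_size, lr=0.1, iterations=100, seed=0):
    """Gradient ascent on a random input image to maximize one filter's mean activation.

    Only the image is optimized; the model's weights and their .grad stay untouched.
    """
    torch.manual_seed(seed)
    model.eval()  # note: switches the model (dropout/batch norm) to inference mode
    device = next(model.parameters()).device
    image = torch.rand(input_size, device=device, requires_grad=True)
    captured = {}
    handle = layer.register_forward_hook(lambda mod, inp, out: captured.update(out=out))
    optimizer = torch.optim.Adam([image], lr=lr)
    try:
        for _ in range(iterations):
            model(image)
            loss = -captured["out"][0, filter_idx].mean()  # minimizing the negative maximizes
            # gradient w.r.t. the image only, so no parameter gradients are accumulated
            (grad,) = torch.autograd.grad(loss, image)
            image.grad = grad
            optimizer.step()
    finally:
        handle.remove()
    return image.detach()


# Possible usage, assuming the layers live in a hypothetical `model.cnn` Sequential:
# pattern = maximize_filter(model, model.cnn[1], filter_idx=0, input_size=(1, 1, 404, 303))
```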