# MonkeyMadness

## Start by enabling the GPU

Deep learning requires many computations, and training is much faster on a GPU.

1. Open the "Runtime" menu and select "Change runtime type." A dialog box will appear where you can choose the runtime type and hardware accelerator.
2. Select "GPU" as the hardware accelerator and click "Save."

In [None]:
# Check if GPU is available
!nvidia-smi

import torch

if torch.cuda.is_available():
    print(f"GPU is available: {torch.cuda.get_device_name(0)}")
else:
    print("GPU is not available.")

# Set up the environment

In [None]:
!git clone https://github.com/LiU-AI-Society/MonkeyMadness.git
import os
HOME = os.getcwd()
print(HOME)
%cd {HOME}/Classification-Game

# Install the packages listed in the requirements file (-r reads a requirements file)
%pip install -r req_colab.txt

# Supervised Learning

In this task, you will use supervised learning to classify images of monkeys. Supervised learning is one of the three main areas of machine learning.  

Supervised learning is like teaching the computer to recognize monkeys by using flashcards.  

1. **Labeled Data**: You show the computer images of monkeys and tell it which species each one is (e.g., "This is a chimpanzee").  
2. **Training**: The computer studies these examples to learn patterns.  
3. **Comparison**: The model makes a prediction for each image and compares it with the label. It is penalized according to how "wrong" the prediction is.  
4. **Prediction**: Once trained, it can look at a new image and guess the species of the monkey based on what it learned.  

It’s called “supervised” because the model learns under guidance (the labeled data).  
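
The four steps above can be sketched on a toy problem. This is a minimal illustration and not the notebook's training code: the data, the one-weight logistic model, and the names `features`, `labels`, `predict`, and `learning_rate` are all made up for the example, and it uses plain Python instead of PyTorch.

```python
import math

# Labeled data (hypothetical): one number per example and a 0/1 label.
features = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]

weight, bias = 0.0, 0.0  # the adjustable parts of the "model"
learning_rate = 0.5


def predict(x):
    """Return the model's probability that x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


for epoch in range(2000):
    grad_w, grad_b = 0.0, 0.0
    for x, y in zip(features, labels):
        error = predict(x) - y  # comparison: prediction vs. label
        grad_w += error * x
        grad_b += error
    # Training: nudge the parameters in the direction that reduces the error
    weight -= learning_rate * grad_w / len(features)
    bias -= learning_rate * grad_b / len(features)

# Prediction: ask the trained model about inputs it has not seen
low, high = predict(1.2), predict(3.8)
print(f"P(class 1 | 1.2) = {low:.2f}, P(class 1 | 3.8) = {high:.2f}")
```

After training, the model should give a low probability for the small input and a high probability for the large one, even though neither value appeared in the labeled data.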

## There are some main ingredients:

1. **Data**  
    - The data has to be labeled, i.e., someone has to manually note down what monkey is present in the image.  

2. **Model**  
    - The model makes the predictions: it looks at the input and guesses which monkey is present. It consists of mathematical operations and adjustable weights. In this task we will use a so-called Convolutional Neural Network (CNN), a type of model that is very good at handling image data.

3. **Loss Function**  
    - Somehow we need to tell the computer how wrong the guess is.  
    - The model will output probabilities for each monkey class. Let’s say it sees an image of an orangutan, then it could perhaps output the following:  

        ```markdown
        Chimpanzee: 0.70 (70%)  
        Orangutan: 0.20 (20%)  
        Other monkeys: 0.10 (10%)  
        ```

        The correct label is orangutan (100% probability), but the model assigned only 20% to this class.

        A loss function such as Cross-Entropy Loss turns this into a score that shows how wrong the prediction is. The closer the predicted probability for the correct class is to 1 (100%), the smaller the loss. The goal is to adjust the model so that it assigns a higher probability to the correct class.
4. **Optimizer**
    - Somehow we need to adjust the model to perform better. The optimizer does this by calculating, based on the loss, how the model's weights should be changed.
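
To make the loss concrete, here is a small worked example using the orangutan probabilities from the list above. It is a sketch in plain Python; the helper `cross_entropy` and the second set of probabilities (`better`) are made up for illustration. For a single image, cross-entropy is the negative logarithm of the probability given to the correct class.

```python
import math

# Probabilities from the example above; the true class is "Orangutan".
predicted = {"Chimpanzee": 0.70, "Orangutan": 0.20, "Other monkeys": 0.10}
true_class = "Orangutan"


def cross_entropy(probabilities, correct_label):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[correct_label])


loss = cross_entropy(predicted, true_class)
print(f"loss = {loss:.3f}")  # -ln(0.20), about 1.609

# A hypothetical, more confident prediction for the correct class
better = {"Chimpanzee": 0.05, "Orangutan": 0.90, "Other monkeys": 0.05}
better_loss = cross_entropy(better, true_class)
print(f"loss = {better_loss:.3f}")  # -ln(0.90), about 0.105
```

Note that PyTorch's `nn.CrossEntropyLoss` takes raw scores (logits) and applies the softmax step internally, so the probabilities above correspond to what the loss computes after that step.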

# What is the goal of the hackathon?

You will be provided with a dataset, which is divided into a training set and a validation set. The model learns from the training set, and you use the validation set to estimate how it would perform on unseen data. The goal is to tune and experiment with different parameters and architectures to get a model that performs as well as possible on a test set (which you do not have access to). In the end you will send us your best model and we will test it live. Because time is short, and to mimic a real-life scenario, we have limited your resources, so you will have to think about which improvements to implement. The improvements are available in the shop!

# What is a convolution? 


A convolution is like sliding a small "window" (called a kernel or filter) over an image to look for patterns.

Here's how it works step-by-step:

1. **Kernel**: Think of this as a small grid of numbers (e.g., 3x3).
2. **Slide and Multiply**: Place the kernel on part of the image. Multiply the numbers in the kernel with the corresponding numbers in the image under it.
3. **Sum Up**: Add the results of the multiplication together. This gives one number for that position.
4. **Move the Kernel**: Slide the kernel to the next part of the image and repeat.

The result is a new image (called a feature map) that highlights certain patterns like edges or textures.
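
The steps above can be written out by hand. The sketch below uses plain Python lists and a made-up 4x5 grayscale image; the function name `convolve2d` is hypothetical and is not the `perform_convolution` helper from this repository. Like the convolution layers in deep learning libraries, it slides the kernel without flipping it, uses no padding, and moves one pixel at a time.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for row in range(out_h):          # move the kernel down
        out_row = []
        for col in range(out_w):      # move the kernel right
            total = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    # multiply kernel value with the pixel under it, then sum up
                    total += kernel[i][j] * image[row + i][col + j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Hypothetical image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_kernel = [[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]]

feature_map = convolve2d(image, vertical_kernel)
print(feature_map)  # [[3.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
```

The large values appear where the window covers the dark-to-bright edge, and the value drops to zero where the window sees only bright pixels. That is what "highlighting a pattern" means here.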



In [None]:
import torch
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import torch.nn.functional as F
from convolution import perform_convolution, plot_images

# Load the image
image_path = 'Monkey/training/training/n7/n7023.jpg'
image = Image.open(image_path).convert("RGB")

# Transform the image to a tensor
transform = transforms.ToTensor()
image_tensor = transform(image).unsqueeze(0)  # Add batch dimension

# Define vertical and horizontal line detection kernels
vertical_line_kernel = torch.tensor([[-1.0, 0.0, 1.0],
                                     [-1.0, 0.0, 1.0],
                                     [-1.0, 0.0, 1.0]], dtype=torch.float32)

horizontal_line_kernel = torch.tensor([[-1.0, -1.0, -1.0],
                                       [0.0,  0.0,  0.0],
                                       [1.0,  1.0,  1.0]], dtype=torch.float32)

# Perform the convolutions
horizontal_lines_image = perform_convolution(horizontal_line_kernel, image)
vertical_line_image = perform_convolution(vertical_line_kernel, image)

# plot the result

plot_images(image, vertical_line_image, horizontal_lines_image)

# Dataset
### You will use a monkey dataset consisting of about 1,000 images of monkeys. The goal is to classify them.
### There are ten monkey classes.


In [None]:
from torchvision import transforms, utils
from Dataset import CustomImageDataset, MonkeyImageDataset

NUM_OF_CLASSES = 10
IMAGE_SIZE = (64, 64) # DO NOT ALTER THIS PARAMETER

DATA_PERCENTAGE = 0.7
transform = transforms.Compose([
    # Optional data augmentation: uncomment any of the lines below to enable it
    #transforms.RandomVerticalFlip(p=0.2),  # Randomly flip the image vertically with 20% probability
    #transforms.RandomHorizontalFlip(p=0.2),  # Randomly flip the image horizontally with 20% probability
    #transforms.RandomRotation(degrees=15),  # Rotate the image randomly within a 15-degree range
    #transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),  # Randomly change brightness, contrast, etc.
    transforms.ToTensor(),
    transforms.Resize((IMAGE_SIZE[0], IMAGE_SIZE[1])), 

    # Normalization with the ImageNet channel means and standard deviations
    #transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
dataset = MonkeyImageDataset('Monkey/training/training', transform, NUM_OF_CLASSES, data_percentage = DATA_PERCENTAGE )
dataset.visualize(5)
#dataset.visualize_all_classes()


# Let's define our model! 
### Let's build a Convolutional Neural Network (CNN). It uses convolutions (which can be thought of as filters) to learn to tell the classes apart.

In [75]:
#define model

import torch.nn as nn
import torch.nn.functional as F
import torch

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Average and Max pooling
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        # Concatenate pooled outputs
        combined = torch.cat([avg_out, max_out], dim=1)
        # Convolve and apply sigmoid
        attention_map = self.sigmoid(self.conv(combined))
        return x * attention_map


class MonkeyNET(nn.Module):
    def __init__(self, num_classes=10, input_size=(500, 500)):
        super(MonkeyNET, self).__init__()
        
        # Two convolutional layers: 3 input channels (RGB) -> 64 -> 32 channels; kernel size 3 with padding 1 preserves the spatial size
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, padding=1)
        #self.attention = SpatialAttention()
        # Calculate the size of the fully connected layer dynamically
        self.fc_input_size = self._get_fc_input_size(input_size)
        self.fc = nn.Linear(self.fc_input_size, num_classes)  # Adjusted for the final size after pooling
        
        # Prediction layer
        #self.prediction = nn.Linear(16, num_classes)
        
    def _get_fc_input_size(self, input_size):
        x = torch.zeros(1, 3, *input_size)  # Create a dummy input tensor
        #x = self.attention(x)
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=(4, 4), stride=4)

        x = F.relu(self.conv2(x))

        x = F.max_pool2d(x, kernel_size=(8, 8), stride=8)


        return x.numel()  # Total number of elements after conv layers
    
    def forward(self, x):
        # First conv -> ReLU -> Max Pooling
        x = F.relu(self.conv1(x))

        #x = self.attention(x)

        x = F.max_pool2d(x, kernel_size=(4, 4), stride=4)

        x = F.relu(self.conv2(x))

        x = F.max_pool2d(x, kernel_size=(8, 8), stride=8)




        # Flatten the tensor for fully connected layer
        x = x.view(x.size(0), -1)  # Output: (batch_size, 32 * 2 * 2) = (batch_size, 128) for 64x64 input

        # Fully connected layer -> ReLU (the ReLU clamps negative class scores to zero)
        x = F.relu(self.fc(x))

        # Optional extra output layer (disabled; see self.prediction in __init__)
        #x = self.prediction(x)
        # No softmax here: nn.CrossEntropyLoss expects raw scores and applies log-softmax itself
        return x
    



# See the summary of the model in a compact way:

In [76]:
from torchsummary import summary



model = MonkeyNET(num_classes=NUM_OF_CLASSES, input_size=IMAGE_SIZE)

summary(model, (3, IMAGE_SIZE[0], IMAGE_SIZE[1]))

# ---- Uncomment the lines below to get a visual of the model ------
#import torchlens as tl
#import graphviz
#graphviz.set_jupyter_format('png')
#x = torch.rand(1, 3, IMAGE_SIZE[0], IMAGE_SIZE[1])
#model_hist = tl.log_forward_pass(model, x, vis_opt='unrolled')
#model_hist.visual_graph

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 64, 64]           1,792
            Conv2d-2           [-1, 32, 16, 16]          18,464
            Linear-3                   [-1, 10]           1,290
Total params: 21,546
Trainable params: 21,546
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 2.06
Params size (MB): 0.08
Estimated Total Size (MB): 2.19
----------------------------------------------------------------
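
The numbers in the summary can be checked by hand. The sketch below reproduces the parameter counts from the layer sizes defined in `MonkeyNET` for a 64x64 input; the helper names `conv2d_params` and `linear_params` are made up for this illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """One k x k filter per (input, output) channel pair, plus one bias per output channel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """One weight per (input, output) pair, plus one bias per output."""
    return in_features * out_features + out_features


# Spatial size: the padded 3x3 convolutions keep it, the max pools shrink it.
side = 64
side //= 4  # 4x4 max pool after conv1 -> 16
side //= 8  # 8x8 max pool after conv2 -> 2
fc_inputs = 32 * side * side  # 32 channels * 2 * 2 = 128 features

conv1 = conv2d_params(3, 64, 3)    # 1,792
conv2 = conv2d_params(64, 32, 3)   # 18,464
fc = linear_params(fc_inputs, 10)  # 1,290
total = conv1 + conv2 + fc         # 21,546

print(conv1, conv2, fc, total)
```

Doing this arithmetic before changing the architecture is a quick way to see which layer dominates the parameter count (here the second convolution).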


In [82]:
from train import train, training_info
from test_model import test
import torch.optim
from torch.utils.data import DataLoader, random_split
import torch.nn as nn
import torchvision.models as models
import datetime


# Check if CUDA (GPU) is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA (GPU)")

# Check if MPS (Metal Performance Shaders for macOS) is available (for Apple Silicon Macs)
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using MPS (Apple GPU)")

# Fallback to CPU if neither CUDA nor MPS is available
else:
    device = torch.device("cpu")
    print("Using CPU")

START_EPOCH = 0
EPOCHS = 30  # number of passes over the training set
MODEL_NAME = "Baseline"

current_datetime = datetime.datetime.now()
ID = f"{current_datetime.hour:02}:{current_datetime.minute:02}"
print(ID)
LR = 0.01  # learning rate: how large a step the optimizer takes at each update
BATCH_SIZE = 8  # how many images the model sees before each weight update

model_info = {
            'epochs' : EPOCHS,
            'batch_size' : BATCH_SIZE,
            'lr' : LR,
            'ID' : ID,
            'model_name' : MODEL_NAME

}

dataset_size = len(dataset)
train_size = int(0.8 * dataset_size)  # 80% for training
val_size = dataset_size - train_size   # 20% for validation
if NUM_OF_CLASSES > 2:
    criterion = nn.CrossEntropyLoss()
else:
    criterion = nn.BCELoss()
#criterion = DistillationLoss()


# Split the dataset
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

# Create DataLoaders for training and validation sets
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=1, shuffle=False)

# Initialize the model
model = MonkeyNET(num_classes=NUM_OF_CLASSES, input_size=IMAGE_SIZE)


# Create the optimizer. It adjusts the model's parameters step by step to reduce the loss.
optimizer = torch.optim.SGD(model.parameters(), lr=LR)



# Move the model to the selected device (GPU if available) and train
model.to(device)
model, t_loss, t_acc, v_loss, v_acc = train(model, train_loader, val_loader, optimizer, criterion, device, start_epoch=START_EPOCH, num_epochs=EPOCHS, model_name=MODEL_NAME, unique_id=ID)


model_info['t_loss'] = t_loss
model_info['t_acc'] = t_acc

model_info['v_loss'] = v_loss
model_info['v_acc'] = v_acc

training_info(model_info=model_info)

# see metrics on the validation set

acc = test(model=model, testloader=val_loader, device=device, model_name=MODEL_NAME, unique_id=ID)

# If you are happy with the results, you can save the model. Give it a good name, such as your group name and a brief description, e.g., 'GroupBananasBest'

In [None]:
import os  # Assumes `model` is defined and trained
save_model_name = "Pretrained_ResNET"

# Define the directory to save the model
model_dir = 'saved_models'

# Ensure the directory exists
os.makedirs(model_dir, exist_ok=True)

# Set the model to evaluation mode
model.eval()

# Define a dummy input tensor matching the input shape expected by the model:
# batch size 1, 3 channels, IMAGE_SIZE[0] x IMAGE_SIZE[1] pixels (64x64 here).
# Adjust the shape if your model expects a different input.
dummy_input = torch.randn(1, 3, IMAGE_SIZE[0], IMAGE_SIZE[1]).to(device)

# Export the model to ONNX format
onnx_path = f"{model_dir}/{save_model_name}.onnx"
torch.onnx.export(model, dummy_input, onnx_path, verbose=True)


print(f"Model saved to {onnx_path}")

# See all performed experiments to compare models

In [None]:
import matplotlib.pyplot as plt
from train import plot_experiments

plot_experiments('training_metrics')

# Send your best model to us for us to run it on a test set!