# UCLAIS Tutorial Series Challenge 2

We are proud to present the second challenge of the 2022-23 UCLAIS tutorial series: the CIFAR-10 image classification problem. You will be introduced to a variety of core concepts in Computer Vision, and specifically to implementing a Convolutional Neural Network architecture using `TensorFlow`.

This Jupyter notebook will guide you through the various general stages involved in end-to-end machine learning projects, including data visualisation, data preprocessing, model selection, model training and model evaluation. Finally, you will get the chance to submit your results to [DOXA](https://doxaai.com/).

If you do not already have a DOXA account, [sign up](https://doxaai.com/sign-up) before proceeding.


## Background & Motivation

**CIFAR-10**

![Example images from the CIFAR-10 dataset](https://www.researchgate.net/profile/Sanjiv-Kumar-7/publication/221830068/figure/fig1/AS:339906418233347@1458051413482/A-few-example-images-from-the-CIFAR10-dataset-From-top-row-to-bottom-row-the-image.png)

**Background**: Image classification is one of the fundamental tasks in Computer Vision. It has driven technological advances in many prominent fields, including the automobile industry, healthcare and manufacturing. For this challenge, our problem is to predict (or classify) the class of a given image from the well-known CIFAR-10 dataset. The images in the dataset belong to 10 different classes.

**Objective**: Our objective is to predict the class that each image belongs to.

**Dataset**: The dataset is based on the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). We have divided it into a **'small dataset'** and a **'large dataset'**. The small dataset contains 15,000 images, with 1,500 images per class. The large dataset contains 50,000 images in total, with 5,000 images per class. The partitioned dataset can be accessed via this [Google Drive](https://drive.google.com/drive/folders/11M8y08hEDTmMpVq3tZCU9ajX7Gui_0nN).

## Installing and Importing Useful Packages

To get started, we will install a number of common machine learning packages.

In [None]:
%pip install numpy pandas matplotlib seaborn scikit-learn tensorflow opencv-python doxa-cli gdown

In [None]:
# Import relevant libraries
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import cv2

# Import relevant sklearn classes/functions related to data preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# Import relevant TensorFlow classes
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Dropout
from tensorflow.keras.optimizers import Adam
      
%matplotlib inline

## Data Loading
The first step is to gather the data that we will be using. The data can be downloaded directly from [Google Drive](https://drive.google.com/drive/folders/11M8y08hEDTmMpVq3tZCU9ajX7Gui_0nN) or by running the cell below.

In [None]:
# Let's download the dataset if we don't already have it!
if not os.path.exists("data"):
    os.makedirs("data", exist_ok=True)

    !gdown https://drive.google.com/drive/folders/11M8y08hEDTmMpVq3tZCU9ajX7Gui_0nN -O ./data --folder

We will be using the small dataset for this tutorial. Feel free to switch to the large dataset: more training data usually improves a model, but at the cost of more computing power and longer training times.

In [None]:
# Load the saved .npz file
data_original = np.load('./data/train_small.npz')
data_original = data_original['data']


In [None]:
# Create a deep copy of the dataset that we can manipulate
# and process while leaving the original intact
# HINT: Use np.load()

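As a minimal sketch of the copy step above: `np.copy` (or the array's `.copy()` method) returns an independent array, so edits to the copy leave the original untouched. Loading the file again with `np.load()`, as the hint suggests, would also give an independent array. The small random array below is only a stand-in for `data_original`, and its shape (N images of 32x32 RGB pixels) is an assumption about how the file stores the images.

```python
import numpy as np

# Stand-in for data_original (assumed layout: N images of 32x32 RGB pixels)
rng = np.random.default_rng(0)
data_original = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# np.copy allocates new memory, so edits to `data` leave the original untouched
data = np.copy(data_original)
data[0, 0, 0, 0] = 255 - data[0, 0, 0, 0]  # modify the copy only

print(np.shares_memory(data, data_original))  # False
```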

In [None]:
# Load the saved label
# HINT: You can use np.genfromtxt()


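The label-loading step could look roughly like the sketch below. The name and layout of the label file are not stated here, so the example reads from an in-memory stand-in that assumes one integer class per line; in practice you would pass the path of the label file in the `data` folder instead.

```python
import io
import numpy as np

# Stand-in for a label file with one integer class per line (hypothetical content)
label_file = io.StringIO("6\n9\n9\n4\n1\n")

# dtype=int keeps the labels as integers instead of the default float
labels = np.genfromtxt(label_file, dtype=int)

print(labels)        # [6 9 9 4 1]
print(labels.shape)  # (5,)
```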

## Data Understanding & Visualisation
Before we start training our Machine Learning model, it is important to look at and understand the dataset that we will be using. This will provide some insight into which model, hyperparameters and loss function are suitable for the problem we are dealing with.

In [None]:
# Let's have a look at the shape of our training and testing set




In [None]:
# Print the label that we will be predicting




In [None]:
# Print the label name



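The three inspection cells above might be filled in along these lines. This is a sketch on random stand-in arrays: the image shape is an assumption, and the class names follow the standard CIFAR-10 ordering, which we assume this dataset's integer labels also use.

```python
import numpy as np

# Stand-ins for the arrays loaded earlier (shapes are assumptions)
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=20)

# Class names in the standard CIFAR-10 order (index 0 to 9)
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

print("Images:", data.shape)    # (20, 32, 32, 3)
print("Labels:", labels.shape)  # (20,)

# Count how many images fall into each class
values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print(f"{value}: {class_names[value]} ({count} images)")
```

A roughly equal count per class tells you the dataset is balanced, in which case plain accuracy is a reasonable metric.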

Next, let's have a look at a subset of the images we have.

In [None]:
# Plot a number of images using matplotlib



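One possible way to plot a grid of images with matplotlib is sketched below. `plot_image_grid` is a hypothetical helper written for this example, and the random images stand in for the real data, which we assume has shape `(N, 32, 32, 3)`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_image_grid(images, titles, n_rows=2, n_cols=5):
    """Show the first n_rows * n_cols images in a grid with their titles."""
    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                             figsize=(2 * n_cols, 2 * n_rows))
    for ax, image, title in zip(axes.ravel(), images, titles):
        ax.imshow(image)
        ax.set_title(title)
        ax.axis("off")
    fig.tight_layout()
    return fig

# Random stand-in images; replace with the real data and label names
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
titles = [f"class {i}" for i in range(10)]
fig = plot_image_grid(images, titles)
```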

## Data Preprocessing 

For this step, there are two basic things we can do before we start building our Neural Network model:

**1. Label Encoding**

As shown in the previous section, each label is an integer in the range 0 to 9. Fed directly to a neural network, these integers would imply an ordering between classes that does not exist, so we convert them to one-hot vectors instead.

**2. Splitting the Training and Validation Set**

The next preprocessing step that needs to be done before we can proceed to training is to split our dataset into a training set and a validation set. The training set will be used to train our model, while the validation set will be used to compare the performance of different Machine Learning (or Neural Network) models.


In [None]:
# Do One-hot encoding on the label
# HINT: Use OneHotEncoder class provided by scikit-learn



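A minimal sketch of one-hot encoding with scikit-learn's `OneHotEncoder`, using a handful of stand-in labels. Fixing the categories to 0 through 9 is a choice made for this example so that every class gets a column even if it is missing from the sample.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = np.array([0, 3, 9, 3, 1])  # stand-in integer labels

# Fix the categories to 0..9 so every class gets a column even if absent here
encoder = OneHotEncoder(categories=[list(range(10))])

# The encoder expects a 2D input, hence the reshape to a single column;
# it returns a sparse matrix by default, which toarray() makes dense
labels_onehot = encoder.fit_transform(labels.reshape(-1, 1)).toarray()

print(labels_onehot.shape)  # (5, 10)
print(labels_onehot[1])     # 1.0 at index 3, zeros elsewhere
```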

In [None]:
# Split our features and output into a training set and a validation set
# HINT: Use train_test_split function from scikit-learn



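The split could be sketched as follows on stand-in arrays. The 20% validation fraction, the `random_state` value and the use of `stratify` are illustrative choices, not requirements of the challenge.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 32, 32, 3))      # stand-in images
y_int = np.repeat(np.arange(10), 10)  # 10 examples per class
y = np.eye(10)[y_int]                 # one-hot labels

# Hold out 20% for validation; stratify keeps the class balance in both sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y_int
)

print(X_train.shape, X_val.shape)  # (80, 32, 32, 3) (20, 32, 32, 3)
```

Fixing `random_state` makes the split reproducible, which helps when comparing different models on the same validation set.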

## Constructing CNN

Now that we have done all of the required preprocessing steps, we can proceed to the most exciting stage: constructing the neural network. For this, we will build a Convolutional Neural Network (CNN), a neural network architecture that is widely used in Computer Vision.

To construct the Neural Network, we will use the functionality provided by TensorFlow, which greatly simplifies the task. This [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers) lists all the building blocks we can use to construct a Neural Network model.

In [None]:
model = Sequential([
    #Feel free to add multiple convolutional layers
    
    


    #Flatten
    

    
    
    #Write code for the last output layer

])
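
For reference, one possible way to fill in the template above is sketched here. The layer counts, filter sizes and dropout rate are illustrative assumptions rather than a recommended architecture, and the input shape assumes 32x32 RGB images.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense, Dropout

model = Sequential([
    Input(shape=(32, 32, 3)),
    # Two convolution + pooling stages extract increasingly abstract features
    Conv2D(32, (3, 3), activation="relu", padding="same"),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPool2D((2, 2)),
    # Flatten the feature maps into a vector for the dense layers
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    # One output neuron per class; softmax turns the scores into probabilities
    Dense(10, activation="softmax"),
])

print(model.output_shape)  # (None, 10)
```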

In [None]:
# Let's have a look at the summary of the model we've created 
# HINT: Call .summary() on our model



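When reading the output of `.summary()`, it can help to check a few parameter counts by hand. The sketch below uses two hypothetical helper functions and hypothetical layer sizes: a convolutional layer has one weight per kernel position and input channel plus one bias for each filter, and a dense layer has one weight per input plus one bias for each unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (in_features + 1) * units

# Hypothetical layers: 32 filters of 3x3 on an RGB image, then a
# 128-unit dense layer on a flattened 8x8x64 feature map
print(conv2d_params(3, 3, 3, 32))     # 896
print(dense_params(8 * 8 * 64, 128))  # 524416
```

Counts like these show why the first dense layer after `Flatten` often dominates the total number of parameters.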

## Training the Model

This is where the magic happens. We will now train the neural network architecture that we created earlier on our training set.



In [None]:
# Set all the required hyperparameters before starting the training process 
# HINT: Call .compile()


# Start the training and save its progress in a variable called 'history'
# HINT: Call .fit() 


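A minimal sketch of the compile and fit steps, run on tiny random stand-in data with a deliberately small model so that it finishes quickly. The learning rate, batch size and epoch count are placeholder values, and `categorical_crossentropy` assumes one-hot labels with a softmax output layer.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# Tiny stand-in data and model so the sketch runs quickly
rng = np.random.default_rng(0)
X_train = rng.random((64, 32, 32, 3)).astype("float32")
y_train = np.eye(10)[rng.integers(0, 10, size=64)]
X_val = rng.random((16, 32, 32, 3)).astype("float32")
y_val = np.eye(10)[rng.integers(0, 10, size=16)]

model = Sequential([
    Input(shape=(32, 32, 3)),
    Conv2D(8, (3, 3), activation="relu"),
    Flatten(),
    Dense(10, activation="softmax"),
])

# categorical_crossentropy matches one-hot labels and a softmax output
model.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# history.history maps each metric name to a list with one value per epoch
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=2,
    batch_size=32,
    verbose=0,
)

print(sorted(history.history))
```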

In [None]:
# Now that we have trained our model, let's plot how it performed
# on both the training and validation sets as the number of epochs increases









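The training curves could be plotted along these lines. `plot_history` is a hypothetical helper, and the numbers in `example_history` are made up purely to make the sketch runnable; in practice you would pass `history.history` from your own training run, assuming accuracy was tracked as a metric.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot training and validation loss and accuracy against the epoch number."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((ax_loss, "loss"), (ax_acc, "accuracy")):
        ax.plot(epochs, history_dict[metric], label="training")
        ax.plot(epochs, history_dict[f"val_{metric}"], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig

# Made-up numbers for illustration only; pass history.history in practice
example_history = {
    "loss": [1.9, 1.5, 1.2],
    "val_loss": [1.8, 1.6, 1.4],
    "accuracy": [0.30, 0.45, 0.57],
    "val_accuracy": [0.33, 0.42, 0.50],
}
fig = plot_history(example_history)
```

A validation loss that starts rising while the training loss keeps falling is the usual sign of overfitting.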

If the validation loss in the plot above keeps decreasing in tandem with the training loss, our Neural Network architecture is doing a good job and is not yet overfitting.

## Analyse the Model

Let's proceed to analyse our model further. The hope is that we can gain some insight that helps us design a better CNN architecture.

In [None]:
# Let's do the prediction on validation set and set the 'neuron' that has the largest value as our prediction
# (remember that we have 10 neurons at the end of our CNN architecture)



In [None]:
# Do the same thing for our true label


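Turning the network outputs and the one-hot labels back into class indices can be sketched with `np.argmax`. The arrays below are small stand-ins with three classes for brevity; in the notebook, the probabilities would come from `model.predict` on the validation images and would have 10 columns.

```python
import numpy as np

# Stand-in for model.predict(X_val): one row of class probabilities per image
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
])
# Stand-in one-hot validation labels
y_val = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
])

# argmax along axis 1 picks the index of the largest value in each row
y_pred = np.argmax(probabilities, axis=1)
y_true = np.argmax(y_val, axis=1)

print(y_pred)  # [1 0 2]
print(y_true)  # [1 0 1]
print(np.mean(y_pred == y_true))  # accuracy: 2 of 3 correct
```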
In [None]:
# Plot a confusion matrix of the true label and predicted label

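A confusion matrix can be computed and drawn with scikit-learn as sketched below. The label arrays and the display labels `"a"`, `"b"`, `"c"` are stand-ins for illustration; with the real data you would pass the true and predicted class indices and the 10 class names.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

fig, ax = plt.subplots(figsize=(4, 4))
ConfusionMatrixDisplay(cm, display_labels=["a", "b", "c"]).plot(ax=ax)
```

Large off-diagonal entries point to pairs of classes that the model confuses, which can guide changes to the architecture or the training data.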

## Preparing our DOXA Submission

Once we are confident in the performance of our model, we can start deploying it to DOXA!

In [None]:
# Create a submission folder and download its files from GitHub using curl
if not os.path.exists("submission"):
  os.makedirs("submission")
  !curl https://raw.githubusercontent.com/UCLAIS/doxa-challenges/main/Challenge-2/submission/doxa.yaml --output submission/doxa.yaml
  !curl https://raw.githubusercontent.com/UCLAIS/doxa-challenges/main/Challenge-2/submission/run.py --output submission/run.py

In [None]:
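# Save the CNN model in the submission folder
# NOTE: a path without a file extension assumes the TensorFlow SavedModel format (Keras 2);
# Keras 3 only accepts paths ending in .keras or .h5
model.save("submission/model")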

## Submitting to DOXA

Before you can submit to DOXA, you must first ensure that you are enrolled for the challenge on the DOXA website. Visit [the challenge page](https://doxaai.com/competition/uclais-2) and click "Enrol" in the top-right corner.

You can then log in using the DOXA CLI by running the following command:

In [None]:
!doxa login

You can then submit your results to DOXA by running the following command:

In [None]:
!doxa upload submission

Yay! You have (probably) just uploaded your model to DOXA! Give DOXA some time to evaluate the performance of your model. You will then be able to see how your model performs on the [scoreboard](https://doxaai.com/competition/uclais-2)!