# AIML421 Project

Train a model to classify pictures into one of three classes ('cherry', 'strawberry', 'tomato').  The model is built with the PyTorch library, trains on data in one folder, and writes its parameters to a file.  A second Python script loads the saved model and classifies unseen data in a folder 'testdata', so it is important to test this part of the code as well.  

Model design and training can be done in Colab using three files: a script for preprocessing and training (train.py), a file holding the trained model parameters (model.pth), and a script holding the test procedure (test.py).  The work proceeds in six steps:
1. conduct Exploratory Data Analysis
2. determine pre-processing
3. establish baseline model
4. build CNN
5. tune the CNN model
6. report
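
The hand-off between train.py and test.py through model.pth can be sketched as follows. This is a minimal illustration that assumes the parameters are saved as a `state_dict`; the `nn.Linear` stand-in model and the temporary directory are hypothetical and are not taken from the project scripts.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model (assumption); the project's real network lives in train.py.
model = nn.Linear(4, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")

    # train.py side: write only the learned parameters to disk.
    torch.save(model.state_dict(), path)

    # test.py side: rebuild the same architecture, then load the parameters.
    restored = nn.Linear(4, 3)
    restored.load_state_dict(torch.load(path))
    restored.eval()  # inference mode before classifying unseen data

same_weights = torch.equal(model.weight, restored.weight)
```

Because only parameters are stored, the test script has to define the same network class as the training script before loading the file.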

## Exploratory Data Analysis
### Data preparation for analysis and testing


In [9]:
# import libraries
import shutil
import torch
import torchvision
import torchvision.io as tvio
from torchvision import transforms, datasets
from torch.utils import data
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
import os

Data consists of around 4500 JPG images split into three classes:
 - 'cherry', 'tomato', 'strawberry'

Classes are well balanced, with around 1500 instances of each class.  
Most of the JPG images are 300x300 pixel colour images, taken in a range of contexts (i.e. against different backgrounds, from different angles and with different lighting).  
The JPG images can be converted to tensors with dimensions [3, 300, 300] representing:
  - three channels, one for each of the Red, Green and Blue colour components
  - 300 pixels in height
  - 300 pixels in width

Each element is an integer between 0 and 255 giving the intensity of one colour channel at one pixel.

## Preprocessing requirements  
### Transformations  
#### image size
Not all images are 300 x 300 pixels. Mixed sizes cause errors in the convolutional neural network: images of different sizes cannot be stacked into one batch tensor, and the first linear layer expects a fixed number of input features (i.e. matrices with incompatible dimensions will raise an error).  Treatment:  enforce image size at 300x300.  This will introduce some distortion, because resizing interpolates pixel values: smaller images are stretched up to 300x300, larger images are shrunk down to 300x300, and non-square images change aspect ratio.  

#### normalisation
We will normalise the pixel values of each channel to a common range centred on zero, which reduces the effect of extreme values on training.  

#### colour, shape, texture changes
The classes share some characteristics, in particular:
  - all are fruits
  - all are red, with green stalk
  - cherries and tomatoes are round and have smooth skin

So we may also try converting the images to greyscale (black and white), in order to reduce the potentially confusing effect of the colour similarity.  
We may also try shape-distorting transformations, in order to reduce the potentially confusing effect of the shape and texture similarities.

### Baseline settings 
#### Transformations  
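For all models we apply, at a minimum, the following functional transformations, in this order:
  - resize image to 300x300
  - transform to a pytorch tensor (pixel values scaled to the range 0 to 1)
  - normalise each channel with mean: 0.5, standard deviation: 0.5 (this step operates on tensors, so it comes last, and maps the values to the range -1 to 1)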

#### Initial model - multilayer CNN  
Training settings for the initial model:  
Batch size: 4  
Epochs: 10  
Training set: approx 2500 images  
Test set:  approx 1800 images  
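
The sketch below shows one possible way to produce a split in these proportions and to batch it four images at a time. It uses a tiny synthetic dataset scaled down to 25 training and 18 test items; how the project actually splits its data is not stated, so `random_split` and the seed are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Tiny synthetic stand-in: 43 "images" of shape 3x8x8 with labels 0..2.
images = torch.zeros(43, 3, 8, 8)
labels = torch.arange(43) % 3
dataset = TensorDataset(images, labels)

# Roughly the 2500 : 1800 proportion, scaled down to 25 : 18.
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(dataset, [25, 18], generator=generator)

# Batch size 4, shuffling only the training data.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
test_loader = DataLoader(test_set, batch_size=4, shuffle=False)

first_images, first_labels = next(iter(train_loader))
```

With a batch size of 4, a training set of about 2500 images gives roughly 625 batches per epoch.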

Model configuration:  
2D Convolutional layer (channels in = 3, channels out = 6, kernel size = 5)  
2D Max pooling layer (size = 2, stride = 2)  
2D Convolutional layer (channels in = 6, channels out = 16, kernel size = 5)  
2D Max pooling layer (size = 2, stride = 2)  
Linear layer (input size = 82944 features per image, i.e. 16 x 72 x 72, giving a (4, 82944) input for a batch of 4; output size = 120)  
Linear layer (input size = 120, output size = 84)  
Linear layer (input size = 84, output size = 3)  

All convolutional and linear layers except the final layer use ReLU activation.
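
A sketch of a network with this shape is given below. It assumes the second convolution takes 6 input channels and produces 16 output channels, since that is the width consistent with the stated 82944 input features (16 x 72 x 72) of the first linear layer. The helper `conv_pool_size` is hypothetical, and the actual class in train.py may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_pool_size(size, kernel=5, pool=2):
    """Side length after an unpadded stride-1 convolution and a max pool."""
    return (size - kernel + 1) // pool


side = conv_pool_size(conv_pool_size(300))  # 300 -> 148 -> 72
flat_features = 16 * side * side            # 16 * 72 * 72


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)  # assumed widths, see lead-in
        self.fc1 = nn.Linear(flat_features, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)       # raw class scores, no activation


net = Net()
with torch.no_grad():
    scores = net(torch.zeros(4, 3, 300, 300))  # one batch of 4 images
```

Each 5x5 convolution without padding removes 4 pixels per side length and each pooling layer halves it, which is where the 72 x 72 feature maps come from.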


In [11]:
!python3 train.py

classes:   ('cherry', 'strawberry', 'tomato')
Net() class defined
net instantiated
loss and optimizer functions defined

epoch:  0
[1,    20] loss: 0.011
[1,    40] loss: 0.011
[1,    60] loss: 0.011
[1,    80] loss: 0.011
[1,   100] loss: 0.011
[1,   120] loss: 0.011
[1,   140] loss: 0.011
[1,   160] loss: 0.011
[1,   180] loss: 0.011
[1,   200] loss: 0.011
[1,   220] loss: 0.011
[1,   240] loss: 0.011
[1,   260] loss: 0.011
[1,   280] loss: 0.010
[1,   300] loss: 0.011
[1,   320] loss: 0.011
[1,   340] loss: 0.011
[1,   360] loss: 0.011
[1,   380] loss: 0.011
[1,   400] loss: 0.011
[1,   420] loss: 0.011
[1,   440] loss: 0.011
[1,   460] loss: 0.011
[1,   480] loss: 0.010
[1,   500] loss: 0.010
[1,   520] loss: 0.011
[1,   540] loss: 0.011
[1,   560] loss: 0.010
[1,   580] loss: 0.010
[1,   600] loss: 0.010

epoch:  1
[2,    20] loss: 0.010
[2,    40] loss: 0.010
[2,    60] loss: 0.010
[2,    80] loss: 0.011
[2,   100] loss: 0.010
[2,   120] loss: 0.010
[2,   140] loss: 0.010
[2,   1

In [12]:
!python3 test.py

classes:   ('cherry', 'strawberry', 'tomato')
load PATH =  /home/chad/222 AIML421/Assignments/Project/code/model.pth
Network model loaded
Accuracy of the network on the test images: 50 %
Accuracy for class: cherry is 66.3 %
Accuracy for class: strawberry is 34.9 %
Accuracy for class: tomato is 50.4 %
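
Overall and per-class accuracy figures of the kind printed above can be computed from true and predicted labels along the following lines. The six labels and predictions here are made up for illustration and are not results from the project; the computation in test.py may be organised differently.

```python
import torch

classes = ("cherry", "strawberry", "tomato")

# Hypothetical true labels and predictions for six test images.
labels = torch.tensor([0, 0, 1, 1, 2, 2])
predictions = torch.tensor([0, 0, 1, 2, 2, 0])

correct = predictions == labels
overall_accuracy = 100 * correct.sum().item() / labels.numel()

per_class_accuracy = {}
for index, name in enumerate(classes):
    in_class = labels == index  # mask of images whose true class is `name`
    per_class_accuracy[name] = (
        100 * correct[in_class].sum().item() / in_class.sum().item()
    )
```

Per-class accuracy divides by the number of images that truly belong to each class, so it shows which classes the model confuses even when the overall figure looks acceptable.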
