<a href="https://colab.research.google.com/github/YixiangShan/Team3.Workspace/blob/main/FlowerDataMLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Downloading and importing the data

The first step is to download the data archive from the URL in the code cell below and extract it.

After you run the cell below, a folder called 'FlowerData' should appear in the file browser on the left of the screen. Click the folder icon to show your files, and click the refresh icon if the folder does not appear straight away. The images themselves are in a folder called 'jpg' inside 'FlowerData'.

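```python
import os
import tarfile

import requests

# Create the data folder (no error if it already exists)
os.makedirs('FlowerData', exist_ok=True)

# Download the dataset archive
url = 'https://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # stop early if the download failed
with open('FlowerData/FlowerData.tgz', 'wb') as f:
    f.write(r.content)

# Extract the archive; the images end up in FlowerData/jpg
with tarfile.open('FlowerData/FlowerData.tgz') as file:
    file.extractall('FlowerData')
```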
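To check the download from code rather than through the file browser, a small sketch like the one below counts the extracted images. It assumes the archive unpacks into 'FlowerData/jpg' as described above, and that the full set has 17 species with 80 images each (1360 files); 'count_images' is a hypothetical helper, not part of any library.

```python
import glob
import os


def count_images(folder='FlowerData/jpg', pattern='*.jpg'):
    """Return how many files in `folder` match `pattern` (hypothetical helper)."""
    return len(glob.glob(os.path.join(folder, pattern)))


# Assumption: 17 species with 80 images each
n_expected = 17 * 80
n_found = count_images()
print(f'found {n_found} images, expected {n_expected}')
```

If the two numbers differ, the download or extraction probably did not complete and the cell above is worth re-running.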

# Extract data from images

The next step is to extract the raw data we want from the image files. The full dataset contains 17 species (categories) with 80 images each, stored as consecutive blocks of 80 files in sorted filename order. Here we work with a selected subset of 4 species.
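Because the sorted filenames are grouped into blocks of 80, species number k (counting from 0 in the dataset's species list) occupies positions 80k to 80k + 79. The sketch below uses a hypothetical helper 'select_blocks' to build a subset from species names; with Daffodil, Tigerlily, Tulip and Daisy it reproduces the hard-coded slices '[:80]', '[480:640]' and '[800:880]' used in the next code cell. It assumes the species list is in the same order as the file blocks, and uses made-up filenames in place of the real ones.

```python
def select_blocks(filenames, species_indices, block_size=80):
    """Concatenate the blocks of sorted filenames for the chosen species.

    Assumes species k occupies filenames[k*block_size:(k+1)*block_size].
    """
    selected = []
    for k in species_indices:
        selected += filenames[k * block_size:(k + 1) * block_size]
    return selected


full_species_list = ['Daffodil', 'Snowdrop', 'LilyValley', 'Bluebell', 'Crocus',
                     'Iris', 'Tigerlily', 'Tulip', 'Fritillary', 'Sunflower',
                     'Daisy', 'ColtsFoot', 'Dandelion', 'Cowslip', 'Buttercup',
                     'Windflower', 'Pansy']
species_list = ['Daffodil', 'Tigerlily', 'Tulip', 'Daisy']

# Position of each chosen species in the full list
species_indices = [full_species_list.index(s) for s in species_list]
print(species_indices)  # [0, 6, 7, 10]

# Stand-in filenames; in the notebook this would be img_list_full
fake_files = [f'image_{i:04d}.jpg' for i in range(1, 17 * 80 + 1)]
subset = select_blocks(fake_files, species_indices)
print(len(subset))  # 320
```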

The code cell below performs the following steps:

1. Get a sorted list of all the filenames in the 'jpg' folder, and pick out the files for the selected species.

2. Set an image size, use it to calculate the number of features, and create a features matrix $X$ (initially full of zeros).

3. Loop through the selected images; resize and flatten each one, and store it as a row of $X$.

4. Make the target vector $y$ from the names of the selected species, in the same order as the rows of $X$.

5. Print the shape of $X$ and the length of $y$.
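Steps 2 and 3 turn each image into one row of $X$. A colour image of 'nrow' × 'ncol' pixels has 'nrow * ncol * 3' features (128 × 128 × 3 = 49152 here). NumPy's 'flatten()' works in row-major order, so the value for pixel row 'r', column 'c' and channel 'ch' lands at position '(r * ncol + c) * 3 + ch'. The sketch below checks this on a small made-up array; 'feature_index' is a hypothetical helper. Note that 'cv2.imread' returns the channels in BGR order, so channel 0 is blue.

```python
import numpy as np

# Small stand-in image of shape (height, width, channels); the notebook uses 128 x 128
nrow, ncol = 4, 5
img = np.arange(nrow * ncol * 3).reshape(nrow, ncol, 3)

flat = img.flatten()  # row-major (C order): row, then column, then channel
print(flat.size)  # 60


def feature_index(r, c, ch, ncol):
    """Position of pixel (r, c), channel ch in the flattened feature vector."""
    return (r * ncol + c) * 3 + ch


print(flat[feature_index(2, 3, 1, ncol)] == img[2, 3, 1])  # True
print(128 * 128 * 3)  # 49152
```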

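```python
import glob

import cv2
import numpy as np

# Sort the filenames so the images keep the dataset's block order (80 per species)
img_list_full = sorted(glob.glob('FlowerData/jpg/*.jpg'))

full_species_list = ['Daffodil','Snowdrop','LilyValley','Bluebell','Crocus',
                     'Iris','Tigerlily','Tulip','Fritillary','Sunflower',
                     'Daisy','ColtsFoot','Dandelion','Cowslip','Buttercup',
                     'Windflower','Pansy']

n_species = 4

# Take the blocks for Daffodil (0-79), Tigerlily and Tulip (480-639) and Daisy (800-879)
img_list = img_list_full[:80]+img_list_full[480:640]+img_list_full[800:880]
# Alternative: the first n_species species in the list
#img_list = img_list_full[:n_species*80]

nrow = 128
ncol = 128

n_features = nrow * ncol * 3   # 3 colour channels per pixel
n_samples = len(img_list)
X = np.zeros((n_samples,n_features))

for i,im_name in enumerate(img_list):

    # Load the image: an array of shape (height, width, 3) in BGR channel order
    img = cv2.imread(im_name)
    if img is None:
        raise FileNotFoundError(f'could not read {im_name}')

    # Resize; note that cv2.resize takes the target size as (width, height)
    img_resized = cv2.resize(img,(ncol,nrow))

    # Flatten into a 1D array of length n_features and store it as row i of X
    X[i,:] = img_resized.flatten()

# Species names in the same order as the blocks selected above
species_list = ['Daffodil','Tigerlily','Tulip','Daisy']
#species_list = full_species_list[:n_species]

# 80 labels per species, matching the order of the rows in X
y = []
for species in species_list:
    y = y + [species]*80

print('features matrix has shape',X.shape)
print('target vector has length',len(y))
```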

features matrix has shape (320, 49152)
target vector has length 320
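Since 'np.zeros' creates a float64 array by default, $X$ takes 320 × 49152 × 8 bytes, roughly 126 MB. The sketch below compares this with smaller data types. Whether a smaller type suits the later steps (for example if you scale the pixels before training) is an assumption to check; the notebook itself keeps float64.

```python
import numpy as np

n_samples, n_features = 320, 128 * 128 * 3

# Default dtype of np.zeros is float64 (8 bytes per value)
X64 = np.zeros((n_samples, n_features))
# float32 halves the memory; uint8 (1 byte) holds raw pixel values 0-255 exactly
X32 = np.zeros((n_samples, n_features), dtype=np.float32)
X8 = np.zeros((n_samples, n_features), dtype=np.uint8)

for name, arr in [('float64', X64), ('float32', X32), ('uint8', X8)]:
    print(name, arr.nbytes, 'bytes')
# float64 125829120 bytes
# float32 62914560 bytes
# uint8 15728640 bytes
```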


# Train a multilayer perceptron

Using Scikit-Learn, split the data into training and test sets (by default 'train_test_split' holds out 25% of the samples for testing), then set up and train a multilayer perceptron with two hidden layers of 256 and 64 neurons.

Training might take a minute or so, so please be patient! While the cell is running, Colab shows its status at the bottom of the screen.
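Part of the reason training is slow is the size of the network. A fully connected layer from 'n_in' to 'n_out' neurons has 'n_in * n_out' weights plus 'n_out' biases. Assuming the output layer has one neuron per species (4 here), the 49152 → 256 → 64 → 4 network has about 12.6 million parameters, almost all of them in the first layer. 'count_mlp_params' is a hypothetical helper for this arithmetic.

```python
def count_mlp_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# Input features, two hidden layers, and one output per species (assumed)
sizes = [128 * 128 * 3, 256, 64, 4]
print(count_mlp_params(sizes))      # 12599876
print(count_mlp_params(sizes[:2]))  # 12583168 (first layer alone)
```

Shrinking the images (smaller 'nrow' and 'ncol') or the first hidden layer reduces the parameter count, and usually the training time, roughly in proportion.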

```python
# Split into training / test sets (25% of samples held out for testing by default)
from sklearn.model_selection import train_test_split

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=13)

# Choose / set up the model: two hidden layers with 256 and 64 neurons
from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=1000)

model.fit(Xtrain, ytrain)
```
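Two optional changes are worth trying at this step. Passing 'stratify=y' to 'train_test_split' keeps the same proportion of each species in the training and test sets, and multilayer perceptrons usually train better on standardised inputs than on raw pixel values from 0 to 255. The sketch below combines both using scikit-learn's 'make_pipeline' and 'StandardScaler'. It is an alternative to the cell above rather than what this notebook does, 'split_and_fit' is a hypothetical helper, and fixing 'random_state' in 'MLPClassifier' as well is an added assumption that makes runs repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def split_and_fit(X, y, hidden=(256, 64), seed=13):
    """Stratified split, then standardise the features and train an MLP (a sketch)."""
    # stratify=y keeps the species proportions equal in the train and test sets
    Xtrain, Xtest, ytrain, ytest = train_test_split(
        X, y, random_state=seed, stratify=y)
    # Inside the pipeline, the scaler is fitted on the training data only
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000, random_state=seed))
    model.fit(Xtrain, ytrain)
    return model, Xtest, ytest
```

With the notebook's data this would be called as 'model2, Xtest2, ytest2 = split_and_fit(X, y)', followed by 'model2.score(Xtest2, ytest2)' to get the test accuracy.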

# Evaluate and visualise the results

Use the trained model to make a set of predictions on the test set, evaluate the accuracy score, and create and display the confusion matrix.

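```python
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
%matplotlib inline

# Predict the species of the test images
y_model = model.predict(Xtest)

# Fraction of test images classified correctly
score = accuracy_score(ytest, y_model)
print('accuracy:', score)

# Pass labels=species_list so the rows/columns follow the same order as the
# display labels (by default scikit-learn sorts the labels alphabetically)
mat = confusion_matrix(ytest, y_model, labels=species_list)

disp = ConfusionMatrixDisplay(confusion_matrix=mat, display_labels=species_list)
disp.plot()
plt.show()
```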
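Each row of the confusion matrix is a true species and each column a predicted species, so correct predictions sit on the diagonal and the accuracy equals the diagonal sum divided by the total. The hand-rolled sketch below (a hypothetical 'confusion_counts' helper with made-up predictions) builds the matrix in an explicit label order, which is what passing 'labels=species_list' to scikit-learn's 'confusion_matrix' does. Without 'labels', scikit-learn sorts the labels alphabetically (Daffodil, Daisy, Tigerlily, Tulip), which would not match 'display_labels=species_list'.

```python
import numpy as np


def confusion_counts(y_true, y_pred, labels):
    """Rows = true label, columns = predicted label, both in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    mat = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        mat[index[t], index[p]] += 1
    return mat


labels = ['Daffodil', 'Tigerlily', 'Tulip', 'Daisy']
y_true = ['Daffodil', 'Daisy', 'Tulip', 'Tulip']
y_pred = ['Daffodil', 'Tulip', 'Tulip', 'Tigerlily']

mat = confusion_counts(y_true, y_pred, labels)
accuracy = np.trace(mat) / mat.sum()  # correct predictions lie on the diagonal
print(mat)
print(accuracy)  # 0.5
```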

# Things to try

Hopefully this exercise has given you a good idea of how you might use Colab, in particular for machine learning. If you want to, and have time, you can save your own copy of the notebook (click the "Copy to Drive" button near the top of this window) and try adapting the code to do any of the following:

* Train and evaluate the model using a different random split in the data (change 'random_state' in the function call to 'train_test_split')

* Change the number and size of hidden layers in the multilayer perceptron

* Look at a different subset of the full dataset (remember the files are organised in blocks of 80, one block per species, and update 'species_list' to match)
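For the first two suggestions, a single split can give a noisy picture, so it can help to repeat the split with several 'random_state' values and look at the spread of accuracies. The sketch below is one way to do that; 'accuracy_over_splits' is a hypothetical helper, and 'make_model' is assumed to be any function returning a fresh, untrained scikit-learn classifier, so you can compare MLPs with different 'hidden_layer_sizes'.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def accuracy_over_splits(X, y, make_model, seeds=(0, 1, 2, 3, 4)):
    """Test accuracy of a freshly trained model for each random split."""
    scores = []
    for seed in seeds:
        Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=seed)
        model = make_model()  # new, untrained model for every split
        model.fit(Xtrain, ytrain)
        scores.append(model.score(Xtest, ytest))  # mean accuracy on the test set
    return np.array(scores)


# Example with the notebook's X and y (slow: trains one MLP per seed):
# from sklearn.neural_network import MLPClassifier
# scores = accuracy_over_splits(
#     X, y, lambda: MLPClassifier(hidden_layer_sizes=(128,), max_iter=1000))
# print(scores.mean(), scores.std())
```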