## Classification of Benthic Megafauna using a CNN

Before using this notebook, download the basml.py (Python) file and put it in your working directory. Also make sure that all of the required libraries are installed; they can be installed with:


pip install -r requirements.txt

Alternatively, you can git clone the whole repository: https://github.com/brett-hosking/BASML_DataChallenge.git

See https://github.com/brett-hosking/BASML_DataChallenge for more details.

Download and extract the data:
https://github.com/brett-hosking/BASML_DataChallenge/blob/master/data256.zip?raw=true

Download the basml.py file:
https://github.com/brett-hosking/BASML_DataChallenge/blob/master/basml.py?raw=true

Download the requirements.txt file:
https://github.com/brett-hosking/BASML_DataChallenge/blob/master/requirements.txt?raw=true
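
A minimal sketch of the download steps in Python, using only the standard library (`urllib.request` and `zipfile`). It assumes the archive unpacks into the `data256` folder that `basml.loaddata` reads below; that layout is not checked here. The helper names `fetch` and `fetch_and_extract` are illustrative and are not part of basml.

```python
import urllib.request
import zipfile
from pathlib import Path

# Links listed above; GitHub redirects "?raw=true" URLs to the raw file contents.
BASE = "https://github.com/brett-hosking/BASML_DataChallenge/blob/master/"
DATA_URL = BASE + "data256.zip?raw=true"
BASML_URL = BASE + "basml.py?raw=true"
REQS_URL = BASE + "requirements.txt?raw=true"


def fetch(url, dest):
    """Hypothetical helper: download `url` to the local path `dest`."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, dest)
    return dest


def fetch_and_extract(url, zip_path, extract_to="."):
    """Hypothetical helper: download a zip archive and unpack it into `extract_to`."""
    zip_path = fetch(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_to)  # creates `extract_to` and any sub-folders as needed
    return Path(extract_to)
```

For example, `fetch(BASML_URL, "basml.py")`, `fetch(REQS_URL, "requirements.txt")` and `fetch_and_extract(DATA_URL, "data256.zip")` would place all three files in the current directory.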

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten
import basml # some pre-made functions for this data challenge
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
%matplotlib inline

Load the data using a pre-made function from the basml mini library:

In [2]:
X,Y,classlist = basml.loaddata('data256',labelstr=True) # the folder containing the downloaded data for this challenge
print('data:', np.shape(X))
print('labels:', np.shape(Y))

data: (2094, 256, 256, 3)
labels: (2094, 4)


In [3]:
print(classlist)

['cnidaria', 'amperima', 'tunicate', 'polychaete']
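
The (2094, 4) label shape suggests that each row of `Y` is a one-hot vector whose column order follows `classlist`; that is an assumption about `basml.loaddata`, not something verified here. Under that assumption, labels can be decoded back to class names and counted per class as in this sketch on made-up rows:

```python
import numpy as np

# Class order as printed by print(classlist) above.
classlist = ['cnidaria', 'amperima', 'tunicate', 'polychaete']

# Hypothetical one-hot labels for three samples (stand-ins for rows of Y).
Y_example = np.array([[1, 0, 0, 0],
                      [0, 0, 0, 1],
                      [0, 1, 0, 0]])

# The column holding the 1 is the class index; look its name up in classlist.
names = [classlist[i] for i in np.argmax(Y_example, axis=1)]
print(names)  # ['cnidaria', 'polychaete', 'amperima']

# Summing one-hot rows column-wise counts the samples of each class.
counts = dict(zip(classlist, Y_example.sum(axis=0).tolist()))
print(counts)  # {'cnidaria': 1, 'amperima': 1, 'tunicate': 0, 'polychaete': 1}
```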


Normalise the pixel values to the range [0, 1] by dividing by 255, the maximum value of an 8-bit colour channel:

In [4]:
X /= 255.0 
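
`X /= 255.0` works in place only if `X` already has a floating-point dtype; the cell running without error suggests `basml.loaddata` returns floats, but that is an assumption. If images arrive as 8-bit integers, a sketch like the following converts them first. Storing all 2094 × 256 × 256 × 3 values takes about 1.65 GB as float32 versus about 3.3 GB as float64.

```python
import numpy as np

# Hypothetical 2x2 RGB image stored as 8-bit integers, as image decoders commonly return.
img = np.array([[[0, 128, 255], [64, 32, 16]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# In-place true division cannot store float results in a uint8 array.
inplace_failed = False
try:
    tmp = img.copy()
    tmp /= 255.0
except TypeError:  # numpy raises a TypeError subclass (UFuncTypeError)
    inplace_failed = True
print("in-place division on uint8 failed:", inplace_failed)  # True

# Convert to float32 first (half the memory of float64), then scale into [0, 1].
scaled = img.astype(np.float32) / 255.0
print(scaled.dtype, scaled.min(), scaled.max())  # float32 0.0 1.0
```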


Randomly shuffle the data and split it into training and test sets, using 80% of the samples for training and 20% for testing (per=20, which yields the 1675/419 split printed below):

In [5]:
X,Y = basml.randomiseXY(X,Y)
Xtrain,Ytrain,Xtest,Ytest = basml.ttsplit(X,Y,per=20)
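
The internals of `basml.randomiseXY` and `basml.ttsplit` are not shown here. A hypothetical equivalent, assuming the split simply holds out the last `per` percent of a jointly shuffled dataset, could look like this; the essential point is that a single permutation is applied to both arrays, so every image stays paired with its label.

```python
import numpy as np


def shuffle_and_split(X, Y, per=20, seed=None):
    """Shuffle X and Y together, then hold out `per` percent of samples for testing.

    Hypothetical stand-in for basml.randomiseXY + basml.ttsplit; the real
    functions may round or order the split differently.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))   # one permutation shared by both arrays
    X, Y = X[perm], Y[perm]          # keeps every image paired with its label
    n_test = int(round(len(X) * per / 100))
    n_train = len(X) - n_test
    return X[:n_train], Y[:n_train], X[n_train:], Y[n_train:]
```

With 2094 samples and `per=20`, rounding 418.8 gives 419 test and 1675 training samples, which matches the shapes printed above. scikit-learn's `train_test_split(X, Y, test_size=0.2)` is an off-the-shelf alternative.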

In [6]:
print(np.shape(Xtrain),np.shape(Ytrain), np.shape(Xtest), np.shape(Ytest))

(1675, 256, 256, 3) (1675, 4) (419, 256, 256, 3) (419, 4)


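Apply zero-mean normalisation: subtract the mean pixel value of the training set (a single scalar taken over all images, pixels and channels) from both sets. The test set is shifted by the training mean so that no information from the test data leaks into preprocessing.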

In [7]:
zeromean = np.mean(Xtrain)
Xtrain -=zeromean
Xtest -=zeromean
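
The notebook subtracts one scalar mean. A common alternative, shown here only for comparison and not what this notebook does, is to subtract a separate mean for each colour channel. The sketch uses small random stand-in arrays rather than the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for Xtrain / Xtest: small batches of RGB images in [0, 1].
Xtrain_demo = rng.random((8, 4, 4, 3)).astype(np.float32)
Xtest_demo = rng.random((2, 4, 4, 3)).astype(np.float32)

# As in the notebook: one scalar mean over every pixel and channel of the training set.
scalar_mean = np.mean(Xtrain_demo)

# Alternative: one mean per colour channel, shape (3,), broadcast over the last axis.
channel_mean = Xtrain_demo.mean(axis=(0, 1, 2))

Xtrain_c = Xtrain_demo - channel_mean
Xtest_c = Xtest_demo - channel_mean  # test data reuses the training statistics

print(scalar_mean, channel_mean)
print(Xtrain_c.mean(axis=(0, 1, 2)))  # each channel of the training set is now ~0
```

Because every channel has the same number of values, the scalar mean equals the average of the three channel means.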

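Build the CNN model: two 3×3 convolutional layers with ReLU activations, followed by a flatten layer and a fully connected softmax layer with one output per class.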

In [8]:
model = keras.Sequential()

#add model layers
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(256,256,3)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(4, activation='softmax'))
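
The layer shapes and parameter counts follow from simple arithmetic, given Keras' `Conv2D` defaults of stride 1 and `padding='valid'` (neither is overridden in the model above). The figures below should agree with what `model.summary()` reports.

```python
def conv2d_valid(h, w, k):
    """Spatial output size of a stride-1 convolution with 'valid' padding."""
    return h - k + 1, w - k + 1


def conv2d_params(k, c_in, filters):
    """Weights (k * k * c_in per filter) plus one bias per filter."""
    return k * k * c_in * filters + filters


h, w = conv2d_valid(256, 256, 3)   # after Conv2D(64): 254 x 254 x 64
p1 = conv2d_params(3, 3, 64)
h, w = conv2d_valid(h, w, 3)       # after Conv2D(32): 252 x 252 x 32
p2 = conv2d_params(3, 64, 32)
flat = h * w * 32                  # length of the Flatten() output
p3 = flat * 4 + 4                  # Dense(4): one weight per input per class, plus 4 biases

print(p1, p2, flat, p3, p1 + p2 + p3)  # 1792 18464 2032128 8128516 8148772
```

About 99.75% of the 8,148,772 parameters sit in the final Dense layer, because Flatten passes all 2,032,128 activations of the second convolution straight to it.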




In [9]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
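
The categorical_crossentropy loss compares the softmax output with the one-hot labels in `Y`. For one sample it is the negative log of the probability assigned to the true class, and Keras averages it over each batch. A NumPy sketch of both pieces, written for illustration rather than copied from Keras' own implementation:

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum_k y_k * log(p_k) for one-hot y_true."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


y_true = np.array([[0, 1, 0, 0]])            # a sample of class 1 ('amperima')
y_prob = np.array([[0.1, 0.7, 0.1, 0.1]])    # hypothetical softmax output
print(categorical_crossentropy(y_true, y_prob))  # -ln(0.7) ≈ 0.3567
```

Only the probability of the true class contributes, so the loss falls towards 0 as the model grows more confident in the correct class.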


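Train the CNN on the training data for 5 epochs. Passing the test set as the validation_data argument is optional: it only reports the loss and accuracy on the test set after each epoch and does not affect training.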

In [10]:
model.fit(Xtrain, Ytrain, epochs=5,validation_data=(Xtest, Ytest))


Train on 1675 samples, validate on 419 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1f184d940>
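
`model.fit` was called without `batch_size`, so Keras uses its default of 32 samples per gradient update. The arithmetic below shows how that divides the 1675 training samples:

```python
import math

n_train = 1675   # from "Train on 1675 samples"
batch_size = 32  # Keras' default when model.fit() is called without batch_size
epochs = 5

steps = math.ceil(n_train / batch_size)          # gradient updates per epoch
last_batch = n_train - (steps - 1) * batch_size  # size of the final, partial batch
updates = steps * epochs

print(steps, last_batch, updates)  # 53 11 265
```

So each epoch runs 53 updates, the last on a partial batch of 11 samples, for 265 updates over the 5 epochs.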

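Use the model to predict the class of each sample in the test set, then convert each prediction to a one-hot vector (e.g. [0,1,0,0] for the second class) by placing a 1 at the position of the highest predicted probability.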

In [11]:
testlen = len(Ytest)
predictions = model.predict(Xtest)  # softmax probabilities, shape (testlen, 4)
Ypred = np.zeros((testlen, 4))
for i in range(testlen):
    Ypred[i][np.argmax(predictions[i])] = 1  # 1 at the most probable class
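
The loop above can also be written without Python-level iteration by indexing an identity matrix: row `i` of `np.eye(4)` is the one-hot vector for class `i`. A sketch on made-up probabilities:

```python
import numpy as np

# Hypothetical softmax outputs for three test images (stand-in for model.predict(Xtest)).
probs_demo = np.array([[0.70, 0.10, 0.10, 0.10],
                       [0.05, 0.15, 0.20, 0.60],
                       [0.25, 0.40, 0.30, 0.05]])

num_classes = probs_demo.shape[1]
# Indexing np.eye with the argmax indices builds all one-hot rows at once.
Ypred_onehot = np.eye(num_classes)[np.argmax(probs_demo, axis=1)]
print(Ypred_onehot)
```

Applied to the notebook's variables, the same idea would be `Ypred = np.eye(4)[np.argmax(predictions, axis=1)]`.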

Use scikit-learn to generate a report of per-class precision, recall, F1-score and support (the number of test samples in each class):

In [13]:
report = classification_report(Ytest, Ypred, target_names=classlist,digits=3,labels=range(len(classlist)))
print(report)

              precision    recall  f1-score   support

    cnidaria      0.899     0.970     0.933       101
    amperima      0.876     0.885     0.880       104
    tunicate      0.790     0.867     0.827       113
  polychaete      0.877     0.703     0.780       101

   micro avg      0.857     0.857     0.857       419
   macro avg      0.861     0.856     0.855       419
weighted avg      0.859     0.857     0.855       419
 samples avg      0.857     0.857     0.857       419
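
Each row of the report can be reproduced from true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean (for cnidaria, 2 × 0.899 × 0.970 / (0.899 + 0.970) ≈ 0.933). The micro average equals the accuracy (0.857) because every sample has exactly one true and one predicted class, so each false positive for one class is a false negative for another. A NumPy sketch on a small made-up example:

```python
import numpy as np


def per_class_prf(Ytrue, Ypred):
    """Precision, recall and F1 per class from one-hot arrays of shape (n, k)."""
    tp = np.sum((Ytrue == 1) & (Ypred == 1), axis=0)  # predicted k and truly k
    fp = np.sum((Ytrue == 0) & (Ypred == 1), axis=0)  # predicted k, truly another class
    fn = np.sum((Ytrue == 1) & (Ypred == 0), axis=0)  # truly k, predicted another class
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    denom = precision + recall
    f1 = np.where(denom > 0, 2 * precision * recall / np.maximum(denom, 1e-12), 0.0)
    return precision, recall, f1


# Hypothetical labels for 6 samples and 3 classes.
Ytrue_demo = np.eye(3)[[0, 0, 1, 2, 2, 2]]
Ypred_demo = np.eye(3)[[0, 1, 1, 2, 2, 0]]

p, r, f = per_class_prf(Ytrue_demo, Ypred_demo)
print(p, r, f)

# Micro precision pools TP and FP over all classes; here it equals plain accuracy.
micro_p = np.sum((Ytrue_demo == 1) & (Ypred_demo == 1)) / np.sum(Ypred_demo == 1)
accuracy = np.mean(np.argmax(Ytrue_demo, axis=1) == np.argmax(Ypred_demo, axis=1))
print(micro_p, accuracy)  # both 4/6
```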



In [14]:
print('accuracy: ',(np.sum(np.argmax(Ytest,axis=1) == np.argmax(Ypred,axis=1))/ len(Ytest)))


accuracy:  0.8568019093078759
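
The accuracy of 0.8568 corresponds to 359 correct predictions out of 419. Working back from the recall and support columns of the report, the correct counts per class are 98 (cnidaria), 92 (amperima), 98 (tunicate) and 71 (polychaete), which sum to 359. A confusion matrix shows where the remaining errors go, for example which classes absorb the misclassified polychaetes. A NumPy sketch on made-up labels:

```python
import numpy as np


def confusion_matrix_onehot(Ytrue, Ypred):
    """Rows = true class, columns = predicted class, from one-hot arrays."""
    k = Ytrue.shape[1]
    cm = np.zeros((k, k), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(cm, (np.argmax(Ytrue, axis=1), np.argmax(Ypred, axis=1)), 1)
    return cm


# Hypothetical labels for 6 samples and 3 classes.
Ytrue_demo = np.eye(3)[[0, 0, 1, 2, 2, 2]]
Ypred_demo = np.eye(3)[[0, 1, 1, 2, 2, 0]]

cm = confusion_matrix_onehot(Ytrue_demo, Ypred_demo)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())  # 4 correct out of 6
```

scikit-learn's `sklearn.metrics.confusion_matrix` gives the same table from class indices, e.g. `confusion_matrix(np.argmax(Ytest, axis=1), np.argmax(Ypred, axis=1))`.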
