# 3-class classification

Discussion and Analysis (Pulled from Report):

Simplicity of the problem

    The original plan for this experiment was to train these models with basic parameters, compare their scores, and then select the top 5 models for hyper-parameter tuning. However, as described in the results, the models came out near perfect anyway, even though the number of training images differed between models. Hyper-parameter tuning could still have been done to push these models toward 100% accuracy, but it was left out because it could lead to overfitting, making the models scale poorly as new data comes in. The problem lies within the data rather than the models, for two reasons.
    The first reason is the images themselves. Looking at each class of images, it is easy to distinguish between the classes with the naked eye. Demospongiae can be viewed as “sticks”, Hexactinellida as “branches”, and Negative images as “blobs”. In our data, no class has difficult-to-distinguish features that would require a complex model to classify. This makes the experiment scale poorly as new classes and data are introduced, and it cannot offer hyper-parameter tuning tips that are not too specific to our limited dataset.

    The second reason is the lack of available data, specifically for Demospongiae and Hexactinellida. There is an obvious imbalance between Negative and classifiable images, at a ratio of approximately 40:1. It was also noted in the results that nearly all the misclassifications confused Demospongiae and Hexactinellida images with each other. The data needs more diverse images, with closer similarities across classes, for these models to be “challenged” and trained properly. Alternatively, new sets of classes could be introduced for the same reason.
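
To make the imbalance concrete, the sketch below computes per-class ratios against the Negative class and a set of inverse-frequency class weights. The counts are the ones printed or noted in the notebook cells further down (152817 total Negative images, 3481 Demospongiae, 1378 Hexactinellida). The weighting formula `total / (n_classes * count)` is a common heuristic and an assumption here; it was not part of the original experiment.

```python
# Image counts taken from the notebook; label indices follow
# load_training_images (0 = Demospongiae, 1 = Hexactinellida, 2 = Negative).
counts = {0: 3481, 1: 1378, 2: 152817}

def ratios_to_majority(counts, majority=2):
    """How many majority-class images exist per image of each other class."""
    return {label: counts[majority] / n
            for label, n in counts.items() if label != majority}

def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count) (assumed heuristic)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

ratios = ratios_to_majority(counts)
weights = inverse_frequency_weights(counts)
for label, r in ratios.items():
    print(f"Negative images per class-{label} image: {r:.1f}")
for label, w in weights.items():
    print(f"class {label}: weight {w:.3f}")
```

A dictionary of this shape (class index to weight) is what Keras's `model.fit` accepts through its `class_weight` argument, which would be one way to compensate for the imbalance without collecting new images.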

Improvements for the Future

    As mentioned in the previous section, possible improvements include introducing harder-to-distinguish data, adding more images to better balance against the Negative images, or introducing new class sets. The input data needs to be adjusted before it is worthwhile to pick out the best models for hyper-parameter tuning.
    On the computation side, the models are restricted to 224x224x3 input images, so other images require rescaling or padding. Keeping the images accurate through that adjustment takes time and effort, which makes the process not very user-friendly. FlowCam’s images come out in a mix of different shapes. An implementation that restructures raw FlowCam images to match the format would save the user from having to restructure images outside of the program. Alternatively, research could be done on TensorFlow/Keras to adjust the models to accept a less strict input.
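
One way such a restructuring step could look is sketched below: the image is padded to a square first, so the particle's aspect ratio is preserved, and then resampled to 224x224. The function name `letterbox`, the fill value of 0, and the example image shape are all assumptions for illustration; the nearest-neighbour resampling is a dependency-free stand-in for `cv2.resize`.

```python
import numpy as np

def letterbox(image, size=224, fill=0):
    """Pad an H x W x C image to a square, then resample it to size x size."""
    h, w = image.shape[:2]
    side = max(h, w)
    # Centre the original image on a square canvas filled with `fill`.
    canvas = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = image
    # Nearest-neighbour resample: choose one source row/column per target pixel.
    idx = np.arange(size) * side // size
    return canvas[idx][:, idx]

# Made-up image shape standing in for a raw FlowCam crop.
raw = np.random.randint(0, 256, size=(120, 310, 3), dtype=np.uint8)
print(letterbox(raw).shape)  # (224, 224, 3)
```

The fill value should probably match the background of the real images; 0 (black) is only a placeholder.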
    
    A helpful improvement would be to use TensorFlow/Keras’s image_dataset_from_directory function instead of manually importing each individual image as our current method does. This function is space and time efficient: the computer used was able to load all the images quickly and without memory issues. It also requires no special conversion before training the models, and it splits the data, making it easy to use. It was used in the early phase of the experiment, but a misunderstanding of its output caused the switch to the current method. The only concern is that, just like the TensorFlow models themselves, the image shape has to be exact; otherwise the images are not loaded, and no warning is given.
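
A minimal sketch of that approach follows, assuming a recent TensorFlow 2.x in which the function is available as `tf.keras.utils.image_dataset_from_directory` (older releases expose it under `tf.keras.preprocessing`). The helper names `split_kwargs` and `load_split`, the seed, and the batch size are assumptions for illustration. The training and validation halves must share the same `seed` and `validation_split`, which is why the arguments are built in one place.

```python
CLASS_NAMES = ["Demospongiae", "Hexactinellida", "Negative"]  # label indices 0, 1, 2

def split_kwargs(subset, image_size=(224, 224), seed=42):
    """Keyword arguments for one half of a reproducible 80/20 split."""
    return dict(
        labels="inferred",           # class = name of the sub-directory
        label_mode="categorical",    # one-hot labels, as to_categorical produces
        class_names=CLASS_NAMES,     # fixes the order of the label indices
        image_size=image_size,
        batch_size=32,
        validation_split=0.2,
        subset=subset,               # "training" or "validation"
        seed=seed,                   # must be identical for both subsets
    )

def load_split(data_dir, subset):
    """Load one subset as a tf.data.Dataset (TensorFlow is imported lazily)."""
    import tensorflow as tf
    return tf.keras.utils.image_dataset_from_directory(
        data_dir, **split_kwargs(subset))

# Intended use, assuming the directory layout from this notebook:
# train_ds = load_split("Data/Images", "training")
# val_ds = load_split("Data/Images", "validation")
# model.fit(train_ds, validation_data=val_ds)
```

The TensorFlow documentation describes `image_size` as the size images are resized to after being read from disk, so it may be worth re-checking whether the exact-shape concern still applies when this argument is set.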


### Prepare data

In [1]:
import numpy as np                                       #NumPy
import pandas as pd                                      #PANDAS
import tensorflow as tf                                  #Tensor-Flow
from tensorflow import keras                             #Keras
import random
import cv2
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt                          #Image display configurations
# for keras
from classification_models.keras import Classifiers
import os, sys

In [4]:
def load_training_images(neg_limit=-1):
    X = []
    y = []
        
    i=0
    for subdir in os.listdir("Data/Images/Negative"):  #'subdir' avoids shadowing the built-in dir()
        for folder in os.listdir("Data/Images/Negative"+"/"+subdir):
            x=0
            for image in os.listdir("Data/Images/Negative"+"/"+subdir+"/"+folder):
                X.append(cv2.imread("Data/Images/Negative"+"/"+subdir+"/"+folder+"/"+image))
                y.append(2)
                i+=1
                x+=1
                if x==4:break #Sample method: keep at most 4 images per folder (S4; presumably 6 for S6). Comment out if using Limit method
        
    #Randomly remove Negative images to reach neg_limit - Uncomment if using Limit method
    #random.shuffle(X);
    #if(neg_limit!=-1):
    #    for x in range(i-neg_limit):
    #        X.pop()
    #        y.pop()
    print(str(len(X))+" Negative images loaded")
    
    i=0
    for image in os.listdir("Data/Images/Demospongiae"):
        #image_arr = cv2.imread("Data/Images/Demospongiae/"+image)
        #X.append(image_arr.reshape(-1, 224, 224, 1))
        X.append(cv2.imread("Data/Images/Demospongiae/"+image))
        y.append(0)
        i+=1
    print(str(i)+" Demospongiae images loaded")
    
    i=0
    for image in os.listdir("Data/Images/Hexactinellida"):
        X.append(cv2.imread("Data/Images/Hexactinellida/"+image))
        y.append(1)
        i+=1
    print(str(i)+" Hexactinellida images loaded")
    
    return X, y

In [7]:
#152817 total negative images
#S4 = 7204 negative images
#S6 = 10806 negative images
X, y = load_training_images()

10806 Negative images loaded
3481 Demospongiae images loaded
1378 Hexactinellida images loaded


In [None]:
print(len(X))
len(y)

104859


104859

In [23]:
X = np.asarray(X)

In [24]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
y_train = keras.utils.to_categorical(y_train,3)
y_test = keras.utils.to_categorical(y_test,3)

### Source 1 models: https://pypi.org/project/image-classifiers/

In [26]:
key_output = ""
i=0
for key in Classifiers.models.keys():
    i+=1
    key_output += key+"\t"
    if(i==3): 
        key_output += "\n"
        i=0
print(key_output)

resnet18	resnet34	resnet50	
resnet101	resnet152	seresnet18	
seresnet34	seresnet50	seresnet101	
seresnet152	seresnext50	seresnext101	
senet154	resnet50v2	resnet101v2	
resnet152v2	resnext50	resnext101	
vgg16	vgg19	densenet121	
densenet169	densenet201	inceptionresnetv2	
inceptionv3	xception	nasnetlarge	
nasnetmobile	mobilenet	mobilenetv2	



#### ResNet18 COMPLETED

In [27]:
resnet18, preprocess = Classifiers.get('resnet18')

In [28]:
model = resnet18(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [29]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train) #2hrs
model.save("models/resnet18")

  13/2622 [..............................] - ETA: 1:36:48 - loss: 0.3157 - accuracy: 0.8990


KeyboardInterrupt



#### MobileNet COMPLETED

In [59]:
mobilenet, preprocess = Classifiers.get('mobilenet')

In [60]:
model = mobilenet(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [61]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/mobilenet_S6")  #2hrs

INFO:tensorflow:Assets written to: models/mobilenet_S6\assets


#### SE-ResNet18 COMPLETED

In [62]:
seresnet18, preprocess = Classifiers.get('seresnet18')

In [63]:
model = seresnet18(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [65]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/seresnet18_S6") #4hrs

INFO:tensorflow:Assets written to: models/seresnet18_S6\assets


#### InceptionV3 COMPLETED

In [21]:
inceptionv3, preprocess = Classifiers.get('inceptionv3')

In [22]:
model = inceptionv3(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [23]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/inceptionv3") #5.5hrs

INFO:tensorflow:Assets written to: models/inceptionv3\assets


#### Xception COMPLETED

In [32]:
xception, preprocess = Classifiers.get('xception')

In [33]:
model = xception(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [34]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/xception") #6.5 hrs

INFO:tensorflow:Assets written to: models/xception\assets


#### DenseNet121 COMPLETED

In [29]:
densenet121, preprocess = Classifiers.get('densenet121')

In [32]:
model = densenet121(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [33]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/densenet121_S4") #9hrs

INFO:tensorflow:Assets written to: models/densenet121_S4\assets


#### Inception ResNet V2

In [34]:
inceptionresnetv2, preprocess = Classifiers.get('inceptionresnetv2')

In [35]:
model = inceptionresnetv2(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [None]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/inceptionresnetv2_S4") #12hrs

 16/302 [>.............................] - ETA: 4:27:04 - loss: 0.5052 - accuracy: 0.8281

#### ResNext50

In [None]:
resnext50, preprocess = Classifiers.get('resnext50')

In [None]:
model = resnext50(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [None]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/resnext50_S4") #16hrs

#### SE-ResNeXt50

In [None]:
seresnext50, preprocess = Classifiers.get('seresnext50')

In [None]:
model = seresnext50(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [None]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/seresnext50_S4") #20hrs

#### ~~SE-Net154~~ (Crashes CPU after 1 batch (tested on 100k negatives))!

#### VGG16

In [None]:
vgg16, preprocess = Classifiers.get('vgg16')

In [None]:
model = vgg16(input_shape=(224,224,3), weights='imagenet', include_top=False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [None]:
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train,y_train)
model.save("models/vgg16") #22hrs

### Source 2 models: https://keras.io/api/applications/

#### VGG16 COMPLETED

In [30]:
model = tf.keras.applications.VGG16(input_shape = (224, 224,3), include_top = False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [31]:
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),  #'lr' is a deprecated alias
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

In [33]:
model.fit(X_train,y_train)
model.save("models/VGG16_S4_dispose") #5hrs

  15/2622 [..............................] - ETA: 4:21:12 - loss: 433.5692 - accuracy: 0.7750

KeyboardInterrupt: 

#### NASNetMobile COMPLETED

In [34]:
model = tf.keras.applications.NASNetMobile(input_shape = (224, 224,3), include_top = False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [35]:
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),  #'lr' is a deprecated alias
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

In [36]:
model.fit(X_train,y_train)
model.save("models/NASNetMobile_S4_dispose") #3.5hrs

  10/2622 [..............................] - ETA: 2:55:20 - loss: 0.1747 - accuracy: 0.9125


KeyboardInterrupt



#### EfficientNetB0 COMPLETED

In [37]:
model = tf.keras.applications.EfficientNetB0(input_shape = (224, 224,3), include_top = False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [38]:
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),  #'lr' is a deprecated alias
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

In [39]:
model.fit(X_train,y_train)
model.save("models/EfficientNetB0_S4_dispose")

  14/2622 [..............................] - ETA: 2:52:04 - loss: 0.2020 - accuracy: 0.9107


KeyboardInterrupt



#### ~~EfficientNetV2B0~~ Unavailable in the installed TensorFlow version (see AttributeError below)

In [40]:
model = tf.keras.applications.EfficientNetV2B0(input_shape = (224, 224,3), include_top = False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

AttributeError: module 'tensorflow.keras.applications' has no attribute 'EfficientNetV2B0'

In [41]:
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),  #'lr' is a deprecated alias
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

In [None]:
model.fit(X_train,y_train)
model.save("models/EfficientNetV2B0_S4")

#### ResNet50 COMPLETED

In [None]:
model = tf.keras.applications.ResNet50(input_shape = (224, 224,3), include_top = False)
model = tf.keras.Sequential([model,
                                 tf.keras.layers.GlobalAveragePooling2D(),
                                 tf.keras.layers.Dense(3, activation="softmax")                                     
                                ])

In [None]:
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),  #'lr' is a deprecated alias
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

In [None]:
model.fit(X_train,y_train)
model.save("models/ResNet50_S4_dispose")

   6/2622 [..............................] - ETA: 6:12:40 - loss: 0.3111 - accuracy: 0.8906


KeyboardInterrupt



### Score

In [2]:
score = {}
def get_predicted_class(prediction):
    x = 0
    for i in range(1,3):
        if prediction[i] > prediction[x]:
            x = i
    return ["Demospongiae", "Hexactinellida", "Negative"][x]
    
def get_label_index(label):
    i = 0
    while label[i] != 1:
        i += 1
    return ["Demospongiae", "Hexactinellida", "Negative"][i]

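def get_score(predictions):
    #Each entry is [predicted class, true class]
    total_correct = 0
    for x in predictions:
        if x[0] == x[1]:
            total_correct += 1
        else:
            print("labeled "+x[1]+" as "+x[0])  #true class first, then predicted class
    accuracy = total_correct/len(predictions)*100  #local name chosen so it does not shadow the global score dict
    print("Accuracy: "+str(round(accuracy,2))+"%")
    return accuracy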

def score_model(model_name):
    predictions = []
    predict = keras.models.load_model("models/"+model_name).predict(X_test)
    for i, prediction in enumerate(predict):
        entry = []
        entry.append(get_predicted_class(prediction))
        entry.append(get_label_index(y_test[i]))

        predictions.append(entry)
    score[model_name] = get_score(predictions)
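
The per-image printout from `get_score` becomes very long when a model does poorly. A confusion matrix carries the same information in a 3x3 table, which also makes it easy to see whether errors concentrate between Demospongiae and Hexactinellida. The sketch below uses NumPy only; `confusion_matrix` is a hypothetical helper rather than part of this notebook, and the example arrays are made up.

```python
import numpy as np

CLASS_NAMES = ["Demospongiae", "Hexactinellida", "Negative"]

def confusion_matrix(y_true_onehot, y_prob, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    true_idx = np.argmax(y_true_onehot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell appears repeatedly.
    np.add.at(matrix, (true_idx, pred_idx), 1)
    return matrix

# Made-up data: four images, one Hexactinellida predicted as Demospongiae.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.90, 0.05, 0.05],
                   [0.60, 0.30, 0.10],
                   [0.10, 0.80, 0.10],
                   [0.00, 0.10, 0.90]])
cm = confusion_matrix(y_true, y_prob)
accuracy = np.trace(cm) / cm.sum() * 100
print(cm)
print(f"Accuracy: {accuracy:.2f}%")  # Accuracy: 75.00%
```

With the notebook's variables, the same helper could be called as `confusion_matrix(y_test, model.predict(X_test))`, since `y_test` is already one-hot encoded.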

In [3]:
score_model("resnet18_S4") #99.94%

NameError: name 'keras' is not defined

In [15]:
score_model("mobilenet_S6") #99.96%

labeled Demospongiae as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
Accuracy: 99.96%


In [16]:
score_model("seresnet18_S6") #99.95%

labeled Hexactinellida as Negative
labeled Hexactinellida as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
Accuracy: 99.95%


In [17]:
score_model("inceptionv3") #99.97%

labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
Accuracy: 99.97%


In [18]:
score_model("xception") #99.94%

labeled Hexactinellida as Demospongiae
labeled Demospongiae as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Demospongiae as Negative
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Negative
labeled Demospongiae as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Demospongiae as Negative
Accuracy: 99.94%


In [19]:
score_model("densenet121_S4") #99.96% (long runtime)

labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
Accuracy: 99.96%


In [13]:
score_model("VGG16_S4") #98.91% (long runtime)

labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Negative
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
la

In [14]:
score_model("NASNetMobile_S4") #1.34%

labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as Hexactinellida
labeled Negative as 

In [15]:
score_model("EfficientNetB0_S4") #99.92% (multiple warnings)

labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Demospongiae as Hexactinellida
labeled Demospongiae as Hexactinellida
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Demospongiae as Hexactinellida
labeled Hexactinellida as Demospongiae
labeled Negative as Demospongiae
labeled Hexactinellida as Demospongiae
labeled Hexactinellida as Demospongiae
Accuracy: 99.92%


In [16]:
score_model("ResNet50_S4") #95.87%

labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Hexactinellida as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Demospongiae as Negative
labeled Demospongiae as Negative
labeled Hexactinellida as Negative
labeled Hexacti

In [17]:
score

{'VGG16_S4': 98.91283616250239,
 'NASNetMobile_S4': 1.3446500095365248,
 'EfficientNetB0_S4': 99.92370780087737,
 'ResNet50_S4': 95.87068472248713}