Project Data Description - Image Classification 

You are provided with a dataset of images of plant seedlings at various stages of growth. Each image has a filename that is its
unique id. The dataset comprises 12 plant species. The goal of the project is to create a classifier capable of determining a plant's
species from a photo.

Context:
* Can you differentiate a weed from a crop seedling?
* The ability to do so effectively can mean better crop yields and better stewardship of the environment.
* The Aarhus University Signal Processing group, in collaboration with the University of Southern Denmark, has recently released a dataset containing images of unique plants belonging to 12 species at several growth stages.

Objective:
* To implement the techniques learnt as a part of the course.

Learning Outcomes:
* Pre-processing of image data.
* Visualization of images.
* Building a CNN.
* Evaluating the model.
* The aim of the project is to make learners capable of handling image classification problems. In the process, you should also become able to work with real image files, not just NumPy arrays of image pixels.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
trainFiles = '/content/drive/MyDrive/plant-seedlings-classification/train'
testFiles = '/content/drive/MyDrive/plant-seedlings-classification/test'

In [None]:
import itertools
import os
import shutil
import warnings
from glob import glob

import cv2
import imageio
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import skimage
import skimage.io
import skimage.transform
import tensorflow as tf
%matplotlib inline

from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

warnings.filterwarnings('ignore')

# Fix the NumPy and TensorFlow seeds for reproducibility
np.random.seed(101)
tf.random.set_seed(101)

Sample & Image Size

In [None]:
# Sample size
SAMPLE_SIZE = 250

# Image size
IMAGE_SIZE = 96

In [None]:
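trainImg = []
trainLabel = []
# Number of training images, counted across the class sub-folders
num_train = len(glob(os.path.join(trainFiles, '*', '*.png')))

testImg = []
testFileName = []
# Number of test images (assumes the test folder is flat)
num_test = len(glob(os.path.join(testFiles, '*.png')))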

In [None]:
data_path = '/content/drive/MyDrive/plant-seedlings-classification/train'
os.listdir(data_path)

['Loose Silky-bent',
 'Small-flowered Cranesbill',
 'Common wheat',
 'Shepherds Purse',
 'Scentless Mayweed',
 'Sugar beet',
 'Fat Hen',
 'Maize',
 'Common Chickweed',
 'Black-grass',
 'Charlock',
 'Cleavers']

In [None]:
# Create a new directory that will hold all images, each with its class name prefixed to the filename
Label_dir = 'Label_dir'
os.makedirs(Label_dir, exist_ok=True)

Combine Labels within a folder

In [None]:
# Copy every image into Label_dir, prefixing the filename with its class (folder) name
folder_list = os.listdir(data_path)

for folder in folder_list:

    # create a path to the folder
    path = os.path.join(data_path, folder)

    # create a list of all files in the folder
    file_list = os.listdir(path)

    for fname in file_list:
        src = os.path.join(path, fname)
        new_fname = folder + '_' + fname
        dst = os.path.join(Label_dir, new_fname)
        shutil.copyfile(src, dst)

In [None]:
len(os.listdir('Label_dir'))

3794

In [None]:
# Get a list of all images in the Label_dir folder.
image_list = os.listdir('Label_dir')
df_data = pd.DataFrame(image_list, columns=['Label_id'])

df_data.head()

   Label_id
0  Cleavers_1a4fe0d36.png
1  Maize_b62a6a471.png
2  Small-flowered Cranesbill_869c32954.png
3  Loose Silky-bent_0f9d5c657.png
4  Scentless Mayweed_f8c96bd65.png


In [None]:
#  Extracting the class name from the file name of each image
def extract_target(x):
    a = x.split('_')
    target = a[0]
    
    return target


# Create a new column called 'target'
df_data['target'] = df_data['Label_id'].apply(extract_target)

df_data.head()

   Label_id                                 target
0  Cleavers_1a4fe0d36.png                   Cleavers
1  Maize_b62a6a471.png                      Maize
2  Small-flowered Cranesbill_869c32954.png  Small-flowered Cranesbill
3  Loose Silky-bent_0f9d5c657.png           Loose Silky-bent
4  Scentless Mayweed_f8c96bd65.png          Scentless Mayweed
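
Splitting at the first underscore works here because none of the 12 folder names listed earlier contains an underscore. A minimal sketch of a more defensive variant splits at the last underscore instead, assuming (as the sample rows suggest) that the file id never contains an underscore. The name `extract_target_safe` and the third example filename are illustrative only.

```python
def extract_target_safe(label_id):
    """Return the class name from '<class>_<file id>.png'.

    Assumption: the file id (the part after the last underscore)
    never contains an underscore itself.
    """
    class_name, _file_id = label_id.rsplit('_', 1)
    return class_name


examples = [
    'Cleavers_1a4fe0d36.png',
    'Small-flowered Cranesbill_869c32954.png',
    'Shepherd_Purse_0a1b2c3d4.png',  # hypothetical: class name containing '_'
]
for name in examples:
    print(name, '->', extract_target_safe(name))
```

With a class name such as `Shepherd_Purse`, splitting at the first underscore would return only `Shepherd`, whereas the variant above keeps the full name.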


In [None]:
df_data.shape

(3794, 2)

In [None]:
# Determine the class distribution

df_data['target'].value_counts()

Loose Silky-bent             523
Common Chickweed             488
Scentless Mayweed            412
Small-flowered Cranesbill    396
Fat Hen                      380
Charlock                     312
Sugar beet                   308
Cleavers                     229
Black-grass                  210
Shepherds Purse              184
Common wheat                 176
Maize                        176
Name: target, dtype: int64

In [None]:
# Train test split
y = df_data['target']

df_train, df_val = train_test_split(df_data, test_size=0.10, random_state=101, stratify=y)

print(df_train.shape)
print(df_val.shape)

(3414, 2)
(380, 2)
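
`stratify=y` asks `train_test_split` to keep each class's share roughly the same in both parts. The idea can be sketched in plain Python with a per-class shuffle and cut. This is an illustration under simplifying assumptions, not scikit-learn's actual algorithm, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=101):
    """Split indices so each class contributes about test_fraction to the test part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels with an imbalanced class mix
labels = ['Maize'] * 20 + ['Charlock'] * 30 + ['Fat Hen'] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.10)

print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```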


In [None]:
# Train set class distribution

df_train['target'].value_counts()

Loose Silky-bent             471
Common Chickweed             439
Scentless Mayweed            371
Small-flowered Cranesbill    356
Fat Hen                      342
Charlock                     281
Sugar beet                   277
Cleavers                     206
Black-grass                  189
Shepherds Purse              166
Common wheat                 158
Maize                        158
Name: target, dtype: int64

In [None]:
# Val set class distribution

df_val['target'].value_counts()

Loose Silky-bent             52
Common Chickweed             49
Scentless Mayweed            41
Small-flowered Cranesbill    40
Fat Hen                      38
Charlock                     31
Sugar beet                   31
Cleavers                     23
Black-grass                  21
Shepherds Purse              18
Maize                        18
Common wheat                 18
Name: target, dtype: int64
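
The effect of stratification can be checked directly from the counts printed above: each class should contribute close to 10% of its images to the validation set. A short sketch using only the recorded numbers:

```python
# Class counts copied from the value_counts() outputs above
full_counts = {
    'Loose Silky-bent': 523, 'Common Chickweed': 488, 'Scentless Mayweed': 412,
    'Small-flowered Cranesbill': 396, 'Fat Hen': 380, 'Charlock': 312,
    'Sugar beet': 308, 'Cleavers': 229, 'Black-grass': 210,
    'Shepherds Purse': 184, 'Common wheat': 176, 'Maize': 176,
}
val_counts = {
    'Loose Silky-bent': 52, 'Common Chickweed': 49, 'Scentless Mayweed': 41,
    'Small-flowered Cranesbill': 40, 'Fat Hen': 38, 'Charlock': 31,
    'Sugar beet': 31, 'Cleavers': 23, 'Black-grass': 21,
    'Shepherds Purse': 18, 'Common wheat': 18, 'Maize': 18,
}

# Share of each class that ended up in the validation set
shares = {name: val_counts[name] / total for name, total in full_counts.items()}
for name, share in shares.items():
    print(f'{name:<27}{share:.3f}')

print('total images:', sum(full_counts.values()))
print('validation images:', sum(val_counts.values()))
```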

In [None]:
# folder_list = os.listdir(data_path)

# The 12 class names, matching the folder names in data_path
classes = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize',
           'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']

In [None]:
# Create the train and validation folder trees, with one sub-folder per class
for x in folder_list:
    os.makedirs('base_dir/train_dir/' + x, exist_ok=True)
    os.makedirs('base_dir/test_dir/' + x, exist_ok=True)

In [None]:
# Set the id as the index in the dataframe
df_data.set_index('Label_id', inplace=True)

In [None]:
src = 'Label_dir/'
dst = 'base_dir/train_dir/'

for x in df_train['Label_id']:
    # Split '<class>_<file id>.png' at the last underscore, so that a class
    # name containing '_' still maps to the right folder
    class_name, fname = x.rsplit('_', 1)
    shutil.copyfile(src + x, dst + class_name + '/' + fname)

In [None]:
src = 'Label_dir/'
dst = 'base_dir/test_dir/'

for x in df_val['Label_id']:
    # Split '<class>_<file id>.png' at the last underscore, so that a class
    # name containing '_' still maps to the right folder
    class_name, fname = x.rsplit('_', 1)
    shutil.copyfile(src + x, dst + class_name + '/' + fname)

In [None]:
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

groups_folder_path = './base_dir/train_dir/'
# Names must match the folders under train_dir exactly: os.walk yields nothing
# for a missing folder, so a misspelled class would be dropped silently
categories = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize',
              'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']

num_classes = len(categories)

X = []
Y = []

for idex, categorie in enumerate(categories):
    # One-hot label for this class
    label = [0 for i in range(num_classes)]
    label[idex] = 1
    class_dir = groups_folder_path + categorie + '/'

    for top, dirs, files in os.walk(class_dir):
        for filename in files:
            print(class_dir + filename)
            img = cv2.imread(class_dir + filename)
            img = cv2.resize(img, dsize=(IMAGE_SIZE, IMAGE_SIZE), interpolation=cv2.INTER_AREA)
            # Scale pixel values to the range [0, 1)
            X.append(img / 256)
            Y.append(label)

X = np.array(X)
Y = np.array(Y)

# Hold out part of the loaded images (25% by default) as X_test / Y_test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
xy = (X_train, X_test, Y_train, Y_test)

./base_dir/train_dir/Black-grass/7b71d3e65.png
./base_dir/train_dir/Black-grass/7b72b398d.png
./base_dir/train_dir/Black-grass/f7f671785.png
./base_dir/train_dir/Black-grass/6a19547c5.png
./base_dir/train_dir/Black-grass/ed0bc2794.png
./base_dir/train_dir/Black-grass/f0a7c51a2.png
./base_dir/train_dir/Black-grass/34a672a63.png
./base_dir/train_dir/Black-grass/d3ff1a639.png
./base_dir/train_dir/Black-grass/6104de96e.png
./base_dir/train_dir/Black-grass/75ef53b3b.png
./base_dir/train_dir/Black-grass/c1a625098.png
./base_dir/train_dir/Black-grass/1e49633e0.png
./base_dir/train_dir/Black-grass/fab809601.png
./base_dir/train_dir/Black-grass/2aa60045d.png
./base_dir/train_dir/Black-grass/afaade548.png
./base_dir/train_dir/Black-grass/0ace21089.png
./base_dir/train_dir/Black-grass/0d1a9985f.png
./base_dir/train_dir/Black-grass/d0ad9c78b.png
./base_dir/train_dir/Black-grass/86dfe670c.png
./base_dir/train_dir/Black-grass/e7d7e6351.png
./base_dir/train_dir/Black-grass/a8ab1ff26.png
./base_dir/tr
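
Because `os.walk` yields nothing for a path that does not exist, a category name that does not match a folder on disk produces no images and raises no error. A minimal sketch of a guard that could run before the loading loop is shown below; `missing_class_folders` is a hypothetical helper, and the demonstration uses a temporary directory rather than `base_dir`.

```python
import os
import tempfile


def missing_class_folders(root, categories):
    """Return the category names that have no folder under root."""
    return [name for name in categories
            if not os.path.isdir(os.path.join(root, name))]


# Demonstration in a temporary directory (hypothetical layout)
with tempfile.TemporaryDirectory() as root:
    for name in ['Maize', 'Shepherds Purse']:
        os.makedirs(os.path.join(root, name))

    # 'Shepherd_Purse' does not match the folder 'Shepherds Purse'
    missing = missing_class_folders(root, ['Maize', 'Shepherd_Purse'])
    # os.walk on the missing folder simply produces no entries
    walked = list(os.walk(os.path.join(root, 'Shepherd_Purse')))

print('missing folders:', missing)
print('os.walk result:', walked)
```

Checking that the returned list is empty before loading would surface a naming mismatch immediately instead of leaving one class without training images.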

In [None]:
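# X_train.shape
# Save the four arrays in one archive; their shapes differ, so they cannot be stacked into a single array
np.savez("./img_data.npz", X_train=X_train, X_test=X_test, Y_train=Y_train, Y_test=Y_test)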

MaxPooling 

In [None]:
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# adapt this if using `channels_first` image data format
input_img = Input(shape=(IMAGE_SIZE, IMAGE_SIZE,3))

x = Conv2D(32, (3, 3), activation='relu', padding='same',kernel_initializer = 'he_uniform',name='encode_1')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same',kernel_initializer = 'he_uniform',name='encode_2')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (24, 24, 32)

x = Conv2D(32, (3, 3), activation='relu', padding='same',kernel_initializer = 'he_uniform',name='decode_1')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same',kernel_initializer = 'he_uniform',name='decode_2')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)

autoencoder.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 96, 96, 3)]       0         
_________________________________________________________________
encode_1 (Conv2D)            (None, 96, 96, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 48, 48, 32)        0         
_________________________________________________________________
encode_2 (Conv2D)            (None, 48, 48, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 24, 24, 32)        0         
_________________________________________________________________
decode_1 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 48, 48, 32)        0     
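
The output shapes in the summary follow from two rules: a stride-1 convolution with `padding='same'` keeps the spatial size, and 2x2 pooling with `padding='same'` halves it, rounding up. A short sketch of that arithmetic for the 96x96 input (the helper names are illustrative):

```python
import math


def same_conv(size):
    """Stride-1 convolution with 'same' padding keeps the spatial size."""
    return size


def same_pool(size, pool=2):
    """Pooling with 'same' padding divides the size by the pool, rounding up."""
    return math.ceil(size / pool)


def upsample(size, factor=2):
    """UpSampling2D repeats rows and columns `factor` times."""
    return size * factor


size = 96
encoder_sizes = [size]
for layer in (same_conv, same_pool, same_conv, same_pool):
    size = layer(size)
    encoder_sizes.append(size)
encoded_size = size

decoder_sizes = []
for layer in (same_conv, upsample, same_conv, upsample, same_conv):
    size = layer(size)
    decoder_sizes.append(size)
decoded_size = size

print('encoder:', encoder_sizes)
print('decoder:', decoder_sizes)
```

The encoder ends at 24x24 and the decoder returns to 96x96, so the reconstruction has the same spatial size as the input.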

Compile and Train

In [None]:
# compile
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# train the autoencoder to reconstruct its own input
history = autoencoder.fit(X_train, X_train,
                          epochs=30,
                          batch_size=32,
                          shuffle=True,
                          validation_data=(X_test, X_test))

Epoch 1/30
...
Epoch 30/30
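
The autoencoder is trained with `binary_crossentropy` between each input pixel (scaled to between 0 and 1) and its reconstruction. A plain-Python sketch of that per-pixel loss, averaged over a handful of values, is given below; the clipping constant `eps` is an assumption made here to avoid `log(0)`, and the function is an illustration rather than Keras's implementation.

```python
import math


def binary_crossentropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy over paired target/prediction values in [0, 1]."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)


close = binary_crossentropy([1.0, 0.0], [0.9, 0.1])
far = binary_crossentropy([1.0, 0.0], [0.6, 0.4])
grey = binary_crossentropy([0.5], [0.5])

print('close reconstruction:', close)
print('poor reconstruction:', far)
print('mid-grey pixel reconstructed exactly:', grey)
```

For targets that are not exactly 0 or 1, the loss stays above zero even for a perfect reconstruction, so the absolute value of the reported loss is less informative than its trend across epochs.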


In [None]:
train_X = autoencoder.predict(X_train)
train_X.shape

(2436, 96, 96, 3)
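
The first dimension, 2436, can be cross-checked against the counts recorded earlier. The sketch below assumes `train_test_split`'s default `test_size` of 0.25 with the test part rounded up. Under that assumption, 2436 corresponds to 3248 loaded images, which equals the 3414 training files minus the 166 'Shepherds Purse' files, so that class may not have been loaded in the recorded run.

```python
import math

train_files = 3414      # rows in df_train
shepherds_purse = 166   # 'Shepherds Purse' rows in df_train
test_size = 0.25        # train_test_split default when no size is given


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), rounding the test part up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


all_classes = split_sizes(train_files, test_size)
without_class = split_sizes(train_files - shepherds_purse, test_size)

print('all 12 classes loaded:', all_classes)
print('without Shepherds Purse:', without_class)
```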

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam

kernel_size = (3,3)
pool_size= (2,2)
first_filters = 32
second_filters = 64
third_filters = 128

dropout_conv = 0.3
dropout_dense = 0.3


model = Sequential()
model.add(Conv2D(first_filters, kernel_size, activation = 'relu', 
                 input_shape = train_X.shape[1:]))
model.add(Conv2D(first_filters, kernel_size, activation = 'relu'))
model.add(Conv2D(first_filters, kernel_size, activation = 'relu'))
model.add(MaxPooling2D(pool_size = pool_size)) 
model.add(Dropout(dropout_conv))

model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(MaxPooling2D(pool_size = pool_size))
model.add(Dropout(dropout_conv))

model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(MaxPooling2D(pool_size = pool_size))
model.add(Dropout(dropout_conv))

model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(dropout_dense))
model.add(Dense(12, activation = "softmax"))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 94, 94, 32)        896       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 92, 92, 32)        9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 90, 90, 32)        9248      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 45, 45, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 43, 43, 64)        18496     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 41, 41, 64)        3
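
The shapes and parameter counts in this summary can be reproduced by hand. Without `padding='same'`, each 3x3 convolution trims two pixels from each spatial dimension, each 2x2 pooling halves the size (rounding down), and a `Conv2D` layer has `(kernel_height * kernel_width * input_channels + 1) * filters` parameters. A short sketch of that arithmetic, with illustrative helper names:

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    kernel_h, kernel_w = kernel
    return (kernel_h * kernel_w * in_channels + 1) * filters


def valid_conv(size, kernel=3):
    """Stride-1 convolution without padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Pooling without padding drops any remainder."""
    return size // pool


# Spatial size through the three blocks of conv, conv, conv, pool
size = 96
sizes = []
for _block in range(3):
    for _conv in range(3):
        size = valid_conv(size)
        sizes.append(size)
    size = max_pool(size)
    sizes.append(size)

# Parameter counts for the first four convolutions in the summary
params = [
    conv2d_params((3, 3), 3, 32),    # conv2d_1
    conv2d_params((3, 3), 32, 32),   # conv2d_2
    conv2d_params((3, 3), 32, 32),   # conv2d_3
    conv2d_params((3, 3), 32, 64),   # conv2d_4
]

print('spatial sizes:', sizes)
print('parameters:', params)
print('flattened features:', size * size * 128)
```

The sizes and counts up to `conv2d_5` agree with the rows shown above; the later values are computed from the model definition rather than read from the output.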

In [None]:
model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
X_train.shape

(2436, 96, 96, 3)

In [None]:
filepath = "model.h5"
# With metrics=['accuracy'], the validation metric is logged as 'val_accuracy'
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1,
                             save_best_only=True, mode='max')

reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=3,
                              verbose=1, mode='max', min_lr=0.00001)

callbacks_list = [checkpoint, reduce_lr]
epochs = 100

model.fit(train_X, Y_train, batch_size=10,
          epochs=epochs,
          callbacks=callbacks_list,
          validation_split=0.05)

Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7f89cda2ee10>

In [None]:
# List the metric names reported when the model is evaluated
model.metrics_names

['loss', 'accuracy']
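
The `ReduceLROnPlateau` callback used during training halves the learning rate whenever the monitored validation metric has not improved for 3 consecutive epochs, down to a floor of 0.00001. The rule can be sketched in plain Python as below. This is a simplified illustration that ignores the callback's `min_delta` and `cooldown` options, and the validation values are made up.

```python
def simulate_reduce_lr(values, lr, factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (mode='max')."""
    best = float('-inf')
    wait = 0
    history = []
    for value in values:
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation accuracy that stalls after the second epoch
stalled = [0.50, 0.60] + [0.60] * 6
lr_history = simulate_reduce_lr(stalled, lr=1e-4)
print(lr_history)

# A long plateau ends at the min_lr floor
long_plateau = simulate_reduce_lr([0.60] * 31, lr=1e-4)
print(long_plateau[-1])
```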

Validation Loss and Accuracy 

In [None]:
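model.load_weights('model.h5')

# test_gen is assumed to be an unshuffled generator over base_dir/test_dir with
# batch_size=1, so that steps=len(df_val) covers every validation image once
val_loss, val_acc = model.evaluate(test_gen, steps=len(df_val))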

print('val_loss:', val_loss)
print('val_acc:', val_acc)

Model Prediction

In [None]:
predictions = model.predict(test_gen, steps=len(df_val), verbose=1)
predictions.shape

Move Predictions into a dataframe

In [None]:
# Name each prediction column after its class, using the generator's class -> index mapping
class_dict = test_gen.class_indices

cols = list(class_dict.keys())

df_preds = pd.DataFrame(predictions, columns=cols)

df_preds.head()
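
Each row of the prediction table holds one probability per class, and the predicted class is the column with the largest value. A minimal sketch of turning such rows into class names follows, using a hypothetical three-class mapping in place of the generator's `class_indices` and made-up probabilities:

```python
# Hypothetical class -> index mapping (the real one comes from the generator)
class_indices = {'Black-grass': 0, 'Charlock': 1, 'Cleavers': 2}
index_to_class = {index: name for name, index in class_indices.items()}

# Made-up softmax outputs, one row per image
predictions = [
    [0.10, 0.70, 0.20],
    [0.80, 0.10, 0.10],
    [0.25, 0.25, 0.50],
]


def predicted_labels(rows, index_to_class):
    """Return the class name with the highest probability for each row."""
    labels = []
    for row in rows:
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(index_to_class[best_index])
    return labels


labels = predicted_labels(predictions, index_to_class)
print(labels)
```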

In [None]:
test_labels = test_gen.classes

Confusion Matrix for model predictions 

In [None]:
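cm = confusion_matrix(test_labels, predictions.argmax(axis=1))
cm_plot_labels = list(cols)

# Draw the confusion matrix as an annotated seaborn heatmap
ax = sns.heatmap(cm, annot=True, fmt='d', xticklabels=cm_plot_labels, yticklabels=cm_plot_labels)
ax.set(title='Confusion Matrix', xlabel='Predicted label', ylabel='True label')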
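
In the confusion matrix, rows are true classes and columns are predicted classes, so the diagonal holds the correct predictions. A plain-Python sketch of how the counts are built, and how per-class recall can be read off each row, is given below with made-up labels for three classes (the helper names are illustrative):

```python
def confusion_counts(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for index, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[index] / total if total else None)
    return recalls


# Made-up integer labels for three classes
true_labels = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

matrix = confusion_counts(true_labels, predicted, num_classes=3)
recalls = per_class_recall(matrix)

for row in matrix:
    print(row)
print('recall per class:', recalls)
```

Per-class recall is useful here because the classes are imbalanced, so overall accuracy can hide weak performance on the smaller classes.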

Conclusion

After several rounds of training, the model's accuracy improved to 98%, with a validation loss of 2.49.