# Plant Seedlings Classification

## Objective
Can we differentiate a weed from a crop seedling? Given an image, how do we differentiate between different plant
types?

This dataset gives us an opportunity to experiment with different image recognition techniques, as well as to
provide a place to cross-pollinate ideas. The ability to tell weeds and crops apart effectively can mean better crop yields and
better stewardship of the environment.

## Dataset

The Aarhus University Signal Processing group, in collaboration with the University of Southern Denmark, has
recently released a dataset containing images of approximately 960 unique plants belonging to 12 species at
several growth stages.

We are provided with a training set and a test set of images of plant seedlings at various stages of growth.
Each image's filename is its unique ID. The dataset comprises 12 plant species, and the objective is to create a classifier capable of determining a plant's species from a photo. The species are:

- Black-grass
- Charlock
- Cleavers
- Common Chickweed
- Common wheat
- Fat Hen
- Loose Silky-bent
- Maize
- Scentless Mayweed
- Shepherds Purse
- Small-flowered Cranesbill
- Sugar beet

Link: https://www.kaggle.com/c/plant-seedlings-classification

## Acknowledgments
We extend our appreciation to the Aarhus University Department of Engineering Signal Processing Group for hosting the original data: https://vision.eng.au.dk/plant-seedlings-dataset/

## Citation
A Public Image Database for Benchmark of Plant Seedling Classification Algorithms: https://arxiv.org/abs/1711.05458

## Approach
We approached this problem as follows:

1. Read the images and generate the training dataset
    a. Note: do not use the test folder, because its labels are not available
2. Split the dataset into training and validation sets
3. Initialize and build the model
4. Compile and fit the model
5. Evaluate the model's accuracy on the training and validation data

### Package Version
- tensorflow==2.2.0
- pandas==1.0.4
- numpy==1.18.5
- google==2.0.3

# Plant Species Classification

### Let's mount Google Drive first

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Extract the contents of the Zip file
- Filename: plant-seedlings-classification.zip
- Extract train and test folders

In [None]:
from zipfile import ZipFile
with ZipFile('/content/drive/My Drive/plant-seedlings-classification.zip', 'r') as z:
  z.extractall()
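Before reading the images, it can help to confirm the extracted layout and see how many images each species folder holds, since a classifier trained on uneven classes can end up favouring the larger ones. The sketch below uses only the standard library and assumes the `train/<species>/<image>` layout that the loading step relies on; the helper name `count_images_per_class` is hypothetical.

```python
import os


def count_images_per_class(root):
    """Return {class folder name: number of files inside it} for root/<class>/<file>.

    Hypothetical helper: stray files directly under `root` are ignored,
    and folders are visited in alphabetical order.
    """
    counts = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # not a class folder
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, name))
        )
    return counts
```

In the notebook this would be called as `count_images_per_class('train')` right after extraction.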

### Generate training data
- Read the images
- Resize each image to 128 x 128 pixels
- Use the name of each image's folder as its label

In [None]:
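import os
import cv2

X_train = []
y_train = []

# each sub-folder of 'train' is named after a species and holds that species' images
for species in os.listdir('train'):
    print(species)
    species_dir = os.path.join('train', species)
    for filename in os.listdir(species_dir):
        # read each image (OpenCV returns a uint8 array in BGR channel order)
        image = cv2.imread(os.path.join(species_dir, filename))
        if image is None:
            continue  # skip files OpenCV cannot decode
        image = cv2.resize(image, (128, 128))
        X_train.append(image)
        y_train.append(species)  # the folder name is the label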

Cleavers
Fat Hen
Sugar beet
Charlock
Maize
Common wheat
Small-flowered Cranesbill
Black-grass
Common Chickweed
Loose Silky-bent
Shepherds Purse
Scentless Mayweed


### Encode the labels
- Convert the categorical species labels into one-hot encoded vectors

In [None]:
import pandas as pd

y_train = pd.get_dummies(y_train).values
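`pd.get_dummies` orders its columns by sorting the label strings, so column 0 corresponds to the alphabetically first species. Keeping that mapping is what lets a softmax output be turned back into a species name. The NumPy sketch below reproduces the same alphabetical encoding with `np.unique` and decodes rows with `argmax`; the helper names `one_hot_encode` and `decode` are hypothetical.

```python
import numpy as np


def one_hot_encode(labels):
    """Return (class_names, one_hot) with class_names sorted alphabetically,
    matching the column order pd.get_dummies uses for a list of strings."""
    class_names, indices = np.unique(np.asarray(labels), return_inverse=True)
    one_hot = np.eye(len(class_names), dtype=np.uint8)[indices]
    return class_names, one_hot


def decode(rows, class_names):
    """Map each row (one-hot vector or softmax probabilities) to a class name."""
    return class_names[np.argmax(rows, axis=1)]


labels = ["Maize", "Charlock", "Maize", "Sugar beet"]
names, encoded = one_hot_encode(labels)
print(names.tolist())                    # ['Charlock', 'Maize', 'Sugar beet']
print(encoded)
print(decode(encoded, names).tolist())   # ['Maize', 'Charlock', 'Maize', 'Sugar beet']
```

In the notebook itself, the same mapping is available as `pd.get_dummies(y_train).columns` if it is saved before `.values` is taken.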

Convert the feature list into a NumPy array

In [None]:
import numpy as np

X_train = np.array(X_train)
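Stacking the resized images gives one array of shape `(N, 128, 128, 3)`. Because `cv2.imread` returns 8-bit (`uint8`) pixels, each image takes 128 × 128 × 3 = 49,152 bytes. The short calculation below scales that to 4,750 images, a count assumed from the 3,800/950 split reported in this notebook rather than measured here, and shows the fourfold growth if the array were cast to `float32`.

```python
import numpy as np

# Assumption: 4,750 images in total (3,800 training + 950 validation in this notebook)
n_images = 4750
image_shape = (128, 128, 3)

pixels_per_image = int(np.prod(image_shape))  # 49,152 values per image
uint8_bytes = n_images * pixels_per_image * np.dtype(np.uint8).itemsize
float32_bytes = n_images * pixels_per_image * np.dtype(np.float32).itemsize

# cross-check the per-image figure against a real uint8 array of that shape
assert np.zeros(image_shape, dtype=np.uint8).nbytes == pixels_per_image

print(f"uint8:   {uint8_bytes / 1e6:.1f} MB")    # uint8:   233.5 MB
print(f"float32: {float32_bytes / 1e6:.1f} MB")  # float32: 933.9 MB
```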

### Split the data into training and validation sets

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=2)

print(len(X_train))
print(len(X_val))

3800
950
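`train_test_split` as called here draws the 950 validation images purely at random, so the share of each species in the validation set can drift from its share overall. scikit-learn's `stratify` argument (e.g. `stratify=y_train.argmax(axis=1)` here, since `y_train` is one-hot at this point) keeps the proportions aligned. The NumPy sketch below illustrates the idea with a per-class shuffle; `stratified_split` is a hypothetical helper, not scikit-learn's exact algorithm, so its split would differ from the one shown in this notebook.

```python
import numpy as np


def stratified_split(labels, test_size=0.2, seed=2):
    """Return sorted (train_idx, val_idx) so that every class contributes
    roughly `test_size` of its samples to the validation part."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)
        n_val = int(round(len(idx) * test_size))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(np.array(train_idx, dtype=int)), np.sort(np.array(val_idx, dtype=int))


# Illustration with made-up labels: 100 of class "a", 50 of class "b"
labels = np.array(["a"] * 100 + ["b"] * 50)
train_idx, val_idx = stratified_split(labels, test_size=0.2, seed=0)
print(len(train_idx), len(val_idx))    # 120 30
print((labels[val_idx] == "b").sum())  # 10
```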


### Shape of the data

In [None]:
X_train.shape

(3800, 128, 128, 3)

In [None]:
X_val.shape

(950, 128, 128, 3)

In [None]:
y_train.shape

(3800, 12)

In [None]:
y_val.shape

(950, 12)

### Define the model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, Dense, Flatten, BatchNormalization, MaxPooling2D

# model architecture building
model = Sequential()

# normalize the input pixels per channel; input_shape only needs to be set on the first layer
model.add(BatchNormalization(input_shape = (128, 128, 3)))

# convolution + max-pooling blocks
model.add(Convolution2D(filters = 32, kernel_size = 3, activation = 'relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Convolution2D(filters = 64, kernel_size = 4, padding = 'same', activation = 'relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Convolution2D(filters = 128, kernel_size = 3, padding = 'same', activation = 'relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Convolution2D(filters = 128, kernel_size = 2, padding = 'same', activation = 'relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Flatten())

# fully connected layers
model.add(Dense(units = 128, activation = 'relu'))
model.add(Dense(units = 64, activation = 'relu'))
model.add(Dense(units = 32, activation = 'relu'))

# output layer: one softmax probability per species (12 classes)
model.add(Dense(units = 12, activation = 'softmax'))

### Compile the model

In [None]:
from tensorflow.keras.optimizers import Adam

# Adam with an explicit learning rate of 0.001 (also Keras's default)
optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])
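With one-hot labels $y$ and softmax outputs $\hat{y}$, categorical cross-entropy for one image is $-\sum_{c=1}^{12} y_c \log \hat{y}_c$, i.e. the negative log of the probability assigned to the true species, and Keras reports its mean over the samples. Paired with this loss, the `'accuracy'` metric compares the argmax of the prediction with the argmax of the label. The NumPy sketch below writes both out; it illustrates the formulas and is not Keras's implementation, which adds its own numerical safeguards. The probabilities in the example are made up.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum_c y_c * log(p_c).

    Probabilities are clipped to [eps, 1 - eps] so log(0) cannot occur.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


def categorical_accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class is the true class."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1)))


# Two made-up samples over three classes; the second one is misclassified
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])

print(round(categorical_crossentropy(y_true, y_prob), 4))  # 0.6365
print(categorical_accuracy(y_true, y_prob))                 # 0.5
```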

### Summarize the model

In [None]:
model.summary()

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_7 (Batch (None, 128, 128, 3)       12        
_________________________________________________________________
conv2d_36 (Conv2D)           (None, 126, 126, 32)      896       
_________________________________________________________________
max_pooling2d_36 (MaxPooling (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_37 (Conv2D)           (None, 63, 63, 64)        32832     
_________________________________________________________________
max_pooling2d_37 (MaxPooling (None, 31, 31, 64)        0         
_________________________________________________________________
conv2d_38 (Conv2D)           (None, 31, 31, 128)       73856     
_________________________________________________________________
max_pooling2d_38 (MaxPooling (None, 15, 15, 128)     
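Each row of the summary follows from simple per-layer formulas: a stride-1 `Conv2D` with bias has $k \cdot k \cdot C_{in} \cdot F + F$ parameters, `'valid'` padding shrinks height and width by $k - 1$ while `'same'` keeps them, 2 × 2 max pooling floor-halves height and width, a `Dense` layer has $n_{in} \cdot n_{out} + n_{out}$ parameters, and `BatchNormalization` keeps four values per channel. The plain-Python sketch below, which does not use Keras, replays the architecture defined in this notebook with these formulas; it reproduces the 12, 896, 32,832 and 73,856 shown in the summary and extends the calculation to the remaining layers. The function names are hypothetical.

```python
def conv2d(shape, filters, kernel, padding):
    """Stride-1 Conv2D with bias: returns (output shape, parameter count)."""
    h, w, c_in = shape
    if padding == "valid":
        h, w = h - kernel + 1, w - kernel + 1  # 'same' keeps h and w at stride 1
    return (h, w, filters), kernel * kernel * c_in * filters + filters


def max_pool(shape, pool=2):
    """Max pooling with stride = pool and 'valid' padding: floor division."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


def replay_architecture(input_shape=(128, 128, 3)):
    """List of (layer type, output shape, params) for the model in this notebook."""
    # BatchNormalization: gamma, beta, moving mean and moving variance per channel
    rows = [("BatchNormalization", input_shape, 4 * input_shape[2])]
    shape = input_shape
    for filters, kernel, padding in [(32, 3, "valid"), (64, 4, "same"),
                                     (128, 3, "same"), (128, 2, "same")]:
        shape, params = conv2d(shape, filters, kernel, padding)
        rows.append(("Conv2D", shape, params))
        shape, params = max_pool(shape)
        rows.append(("MaxPooling2D", shape, params))
    units = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (units,), 0))
    for out in [128, 64, 32, 12]:
        rows.append(("Dense", (out,), units * out + out))
        units = out
    return rows


rows = replay_architecture()
for layer, out_shape, params in rows:
    print(f"{layer:20s} {str(out_shape):18s} {params}")
print("Total params:", sum(params for _, _, params in rows))
```

Under these formulas the flattened feature vector has 7 × 7 × 128 = 6,272 values, and the first `Dense` layer alone holds 802,944 of the 986,936 parameters, which makes it a natural place to consider regularization such as `Dropout`.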

### Fit the model

In [None]:
model.fit(X_train, y_train, epochs = 20, validation_data = (X_val, y_val), initial_epoch=0)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fad106ae6a0>
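`model.fit` returns a `History` object (the repr printed above) whose `history` attribute maps metric names such as `'val_loss'` to per-epoch lists. Rather than training for a fixed 20 epochs, Keras can stop once validation loss stops improving via `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)`, passed as `callbacks=[...]` to `model.fit`. The plain-Python sketch below mimics that patience rule on a list of losses so the logic is visible; it is an illustration rather than the callback's source, `early_stopping_epoch` is a hypothetical name, and the losses are made up.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the loss has failed to improve on the best value by
    more than `min_delta` for `patience` consecutive epochs; if that never
    happens, the last epoch is returned as the stop epoch.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses for seven epochs
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```

With `restore_best_weights=True`, the callback would roll the model back to the weights from the best epoch (epoch 3 in this made-up sequence).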

### Evaluate the model

In [None]:
scores = model.evaluate(X_val, y_val)
print('Loss: {}, Accuracy: {}'.format(scores[0], scores[1]))

Loss: 1.0749837160110474, Accuracy: 0.7831578850746155
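A single validation accuracy of about 0.78 does not show which species are mistaken for one another. A confusion matrix built from `model.predict(X_val)` and `y_val` does, and its trace divided by its total should match the accuracy reported by `evaluate`. The NumPy sketch below builds one from one-hot labels and predicted probabilities; `confusion_matrix` and `per_class_recall` are hypothetical helper names (scikit-learn offers `sklearn.metrics.confusion_matrix` for integer labels), and the example arrays are made up.

```python
import numpy as np


def confusion_matrix(y_true_onehot, y_prob):
    """Counts with rows = true class and columns = predicted class (argmax)."""
    n_classes = y_true_onehot.shape[1]
    true_idx = np.argmax(y_true_onehot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (true_idx, pred_idx), 1)  # unbuffered add handles repeated pairs
    return cm


def per_class_recall(cm):
    """Fraction of each true class predicted correctly; 0 for empty classes."""
    row_sums = cm.sum(axis=1)
    return np.divide(np.diag(cm), row_sums,
                     out=np.zeros(len(cm)), where=row_sums > 0)


# Made-up example: 4 samples, 3 classes; the second sample is misclassified
y_true = np.eye(3)[[0, 0, 1, 2]]
y_prob = np.eye(3)[[0, 1, 1, 2]]
cm = confusion_matrix(y_true, y_prob)
print(cm)
print(per_class_recall(cm))
print(np.trace(cm) / cm.sum())  # 0.75
```

In the notebook this would be `cm = confusion_matrix(y_val, model.predict(X_val))`, with rows and columns in the alphabetical species order produced by `pd.get_dummies`.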
