# leaf-classification
https://www.kaggle.com/c/leaf-classification

## Description

There are estimated to be nearly half a million species of plant in the world. Classification of species has been historically problematic and often results in duplicate identifications. Automating plant recognition might have many applications, including:

- Species population tracking and preservation
- Plant-based medicinal research
- Crop and food supply management
- Leaf Classification

The objective of this playground competition is to use binary leaf images and extracted features, including shape, margin & texture, to accurately identify 99 species of plants. Leaves, due to their volume, prevalence, and unique characteristics, are an effective means of differentiating plant species. They also provide a fun introduction to applying techniques that involve image-based features.

As a first step, try building a classifier that uses the provided pre-extracted features. Next, try creating a set of your own features. Finally, examine the errors you're making and see what you can do to improve.

### Acknowledgments
Kaggle is hosting this competition for the data science community to use for fun and education. This dataset originates from leaf images collected by James Cope, Thibaut Beghin, Paolo Remagnino, & Sarah Barman of the Royal Botanic Gardens, Kew, UK.

Charles Mallah, James Cope, James Orwell. Plant Leaf Classification Using Probabilistic Integration of Shape, Texture and Margin Features. Signal Processing, Pattern Recognition and Applications, in press. 2013.

We thank the UCI machine learning repository for hosting the dataset.

In [128]:
import numpy as np
import pandas as pd
import tensorflow as tf  # TensorFlow 1.x: the code below uses tf.contrib, placeholders and sessions

In [254]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

In [162]:
train.head()

Unnamed: 0,id,species,margin1,margin2,margin3,margin4,margin5,margin6,margin7,margin8,...,texture55,texture56,texture57,texture58,texture59,texture60,texture61,texture62,texture63,texture64
0,1,Acer_Opalus,0.007812,0.023438,0.023438,0.003906,0.011719,0.009766,0.027344,0.0,...,0.007812,0.0,0.00293,0.00293,0.035156,0.0,0.0,0.004883,0.0,0.025391
1,2,Pterocarya_Stenoptera,0.005859,0.0,0.03125,0.015625,0.025391,0.001953,0.019531,0.0,...,0.000977,0.0,0.0,0.000977,0.023438,0.0,0.0,0.000977,0.039062,0.022461
2,3,Quercus_Hartwissiana,0.005859,0.009766,0.019531,0.007812,0.003906,0.005859,0.068359,0.0,...,0.1543,0.0,0.005859,0.000977,0.007812,0.0,0.0,0.0,0.020508,0.00293
3,5,Tilia_Tomentosa,0.0,0.003906,0.023438,0.005859,0.021484,0.019531,0.023438,0.0,...,0.0,0.000977,0.0,0.0,0.020508,0.0,0.0,0.017578,0.0,0.047852
4,6,Quercus_Variabilis,0.005859,0.003906,0.048828,0.009766,0.013672,0.015625,0.005859,0.0,...,0.09668,0.0,0.021484,0.0,0.0,0.0,0.0,0.0,0.0,0.03125


In [163]:
train.isna().sum()

id           0
species      0
margin1      0
margin2      0
margin3      0
margin4      0
margin5      0
margin6      0
margin7      0
margin8      0
margin9      0
margin10     0
margin11     0
margin12     0
margin13     0
margin14     0
margin15     0
margin16     0
margin17     0
margin18     0
margin19     0
margin20     0
margin21     0
margin22     0
margin23     0
margin24     0
margin25     0
margin26     0
margin27     0
margin28     0
            ..
texture35    0
texture36    0
texture37    0
texture38    0
texture39    0
texture40    0
texture41    0
texture42    0
texture43    0
texture44    0
texture45    0
texture46    0
texture47    0
texture48    0
texture49    0
texture50    0
texture51    0
texture52    0
texture53    0
texture54    0
texture55    0
texture56    0
texture57    0
texture58    0
texture59    0
texture60    0
texture61    0
texture62    0
texture63    0
texture64    0
Length: 194, dtype: int64

In [164]:
train['species'].value_counts()

Acer_Opalus                     10
Olea_Europaea                   10
Viburnum_x_Rhytidophylloides    10
Lithocarpus_Edulis              10
Prunus_X_Shmittii               10
Quercus_Pubescens               10
Quercus_Infectoria_sub          10
Quercus_Rubra                   10
Magnolia_Salicifolia            10
Cotinus_Coggygria               10
Quercus_Ilex                    10
Acer_Pictum                     10
Quercus_Crassifolia             10
Quercus_Phellos                 10
Quercus_Phillyraeoides          10
Betula_Pendula                  10
Quercus_Greggii                 10
Quercus_Brantii                 10
Quercus_Coccinea                10
Cytisus_Battandieri             10
Quercus_x_Turneri               10
Phildelphus                     10
Quercus_Cerris                  10
Betula_Austrosinensis           10
Quercus_x_Hispanica             10
Populus_Adenopoda               10
Quercus_Dolicholepis            10
Quercus_Coccifera               10
Callicarpa_Bodinieri

In [165]:
def get_map(data, name):
    # Map each distinct label in column `name` to an integer id, in order of first appearance
    mapp = {}
    for label in data[name]:
        if label not in mapp:
            mapp[label] = len(mapp)
    return mapp
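pandas can build the same kind of mapping in one call. This is a small sketch that uses toy labels in place of the real `train['species']` column. `pd.factorize` assigns codes in order of first appearance, so the ids should line up with what `get_map` produces.

```python
import pandas as pd

# Hypothetical toy labels standing in for train['species']
species = pd.Series(["Acer_Opalus", "Quercus_Rubra", "Acer_Opalus", "Tilia_Tomentosa"])

# pd.factorize numbers labels in order of first appearance, as get_map does
codes, uniques = pd.factorize(species)
mapping = {label: i for i, label in enumerate(uniques)}

print(codes.tolist())  # [0, 1, 0, 2]
print(mapping)         # {'Acer_Opalus': 0, 'Quercus_Rubra': 1, 'Tilia_Tomentosa': 2}
```

`codes` could stand in for the `.map(mapp)` step. `uniques[k]` recovers the species name for class `k`, which is the role `mapp_2` plays later on.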

In [166]:
mapp = get_map(train,'species')

train['species'] = train['species'].map(mapp)

## Modeling
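I'm going to use a fully connected neural network for classification. It has eight dense layers, each followed by batch normalization and a ReLU, and then a linear output layer trained with a softmax cross-entropy loss.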

In [167]:
drop = ['id', 'species']
trainX = train.drop(drop,axis=1)
trainY = train.species

In [168]:
trainX.shape, trainY.shape

((990, 192), (990,))

In [169]:
trainX = np.array(trainX)
trainY = np.array(trainY).reshape(-1,1)

In [170]:
trainX.shape, trainY.shape

((990, 192), (990, 1))

### Implement custom Batch Generator

For more details on this batch generator, see
https://github.com/che9992/BatchGenerator

In [171]:
class BatchGenerator():
    """Serve (x, y) mini-batches in a fixed order, optionally one-hot encoding y."""

    def __init__(self, x, y, batch_size, one_hot = False, nb_classes = 0):
        self.nb_classes = nb_classes
        self.one_hot = one_hot
        self.x_ = x
        self.y_ = y
        self.batch_size = batch_size
        
        # Number of full batches per epoch (a trailing partial batch is dropped)
        self.total_batch = int(len(x) / batch_size)
        # Load the first batch; `where` is the start index of the next batch
        self.x = self.x_[:batch_size,:]
        self.y = self.y_[:batch_size,:]
        self.where = batch_size
        
        if self.one_hot :
            self.set_one_hot()

    def next_batch(self):
        # Wrap around to the start when the next full batch would run past the end
        if self.where + self.batch_size > len(self.x_) :
            self.where = 0
            
        self.x = self.x_[self.where:self.where+self.batch_size,:]
        self.y = self.y_[self.where:self.where+self.batch_size,:]
        self.where += self.batch_size
        
        if self.one_hot:
            self.set_one_hot()
        
    def set_one_hot(self):
        # Turn integer labels of shape (n, 1) into one-hot rows of shape (n, nb_classes)
        self.y = np.int32(self.y)
        one_hot = np.array(self.y).reshape(-1)
        self.y = np.eye(self.nb_classes)[one_hot]
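`BatchGenerator` visits the rows in the same order every epoch. A common alternative is to reshuffle at the start of each epoch. The sketch below is a hypothetical helper (`iterate_minibatches` is not part of this notebook or the linked repo), tried on random data shaped like `trainX`/`trainY`. Like `BatchGenerator`, it drops the last partial batch and builds one-hot labels with `np.eye`.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, nb_classes, rng):
    """Yield shuffled (x, one-hot y) mini-batches covering one epoch.

    Hypothetical helper; like BatchGenerator, a trailing partial batch is dropped.
    """
    order = rng.permutation(len(x))                  # new sample order each epoch
    n_batches = len(x) // batch_size
    for b in range(n_batches):
        idx = order[b * batch_size:(b + 1) * batch_size]
        labels = y[idx].reshape(-1).astype(np.int64)
        yield x[idx], np.eye(nb_classes)[labels]     # one-hot rows

# Usage with small synthetic data (shapes mirror trainX / trainY)
rng = np.random.default_rng(0)
x = rng.random((25, 192))
y = rng.integers(0, 99, size=(25, 1))
batches = list(iterate_minibatches(x, y, batch_size=10, nb_classes=99, rng=rng))
print(len(batches), batches[0][0].shape, batches[0][1].shape)  # 2 (10, 192) (10, 99)
```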

Batch normalization is applied before the ReLU activation.

For more about batch normalization, see

https://github.com/che9992/Batch_Normalization
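In training mode, batch normalization standardizes each feature using the current batch's mean and variance. It then applies a learned scale `gamma` and shift `beta`, and the ReLU comes after that. Here is a NumPy sketch of that forward pass. It assumes `eps=1e-3`, which is the default `epsilon` of `tf.contrib.layers.batch_norm`, and the function name is illustrative only.

```python
import numpy as np

def batch_norm_relu(z, gamma, beta, eps=1e-3):
    """Training-mode batch norm over the batch axis, followed by ReLU.

    z: (batch, features) pre-activations from a dense layer.
    """
    mean = z.mean(axis=0)                    # per-feature batch mean
    var = z.var(axis=0)                      # per-feature (biased) batch variance
    z_hat = (z - mean) / np.sqrt(var + eps)  # standardize
    y = gamma * z_hat + beta                 # learned scale (gamma) and shift (beta)
    return np.maximum(y, 0.0), z_hat         # ReLU applied after BN

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out, z_hat = batch_norm_relu(z, gamma=np.ones(4), beta=np.zeros(4))
print(z_hat.mean(axis=0), z_hat.std(axis=0))  # close to 0 and close to 1 per feature
```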

In [142]:
def TO_FC_BN(X, size, phase, scope):
    # Dense layer -> batch norm -> ReLU; `phase` is True while training, False at inference
    with tf.variable_scope(scope):
        fc1 = tf.contrib.layers.fully_connected(X, size, activation_fn=None, scope='fully_connected', reuse=tf.AUTO_REUSE)
        fc2 = tf.contrib.layers.batch_norm(fc1, center = True, scale = True, is_training= phase, scope='bn', reuse=tf.AUTO_REUSE)
        return tf.nn.relu(fc2, name='relu')

def FC_2_NC(X, nb_classes, scope):
    # Final linear layer: one unnormalized logit per class (softmax is applied inside the loss)
    return tf.contrib.layers.fully_connected(X, nb_classes, activation_fn=None, scope=scope, reuse=tf.AUTO_REUSE)

In [180]:
tf.reset_default_graph() 
nb_classes = 99

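phase = tf.placeholder(dtype=tf.bool, name = 'phase')  # True while training, False at inference
X = tf.placeholder(shape=[None, 192], dtype=tf.float32, name='X')  # 64 margin + 64 shape + 64 texture features
Y = tf.placeholder(shape=[None, nb_classes], dtype=tf.float32, name='Y')  # one-hot species labels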

FC1 = TO_FC_BN(X, 128, phase=phase, scope='layer1')
FC2 = TO_FC_BN(FC1, 256, phase=phase, scope='layer2')
FC3 = TO_FC_BN(FC2, 512, phase=phase, scope='layer3')
FC4 = TO_FC_BN(FC3, 1024, phase=phase, scope='layer4')
FC5 = TO_FC_BN(FC4, 1024, phase=phase, scope='layer5')
FC6 = TO_FC_BN(FC5, 512, phase=phase, scope='layer6')
FC7 = TO_FC_BN(FC6, 512, phase=phase, scope='layer7')
FC8 = TO_FC_BN(FC7, 256, phase=phase, scope='layer8')
logits = FC_2_NC(FC8, nb_classes, scope='logits')

with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.name_scope('cost'):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = logits, labels = Y))
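For reference, here is roughly what the `cost` and `accuracy` ops compute, written in NumPy. This is a sketch and not TensorFlow's implementation. Subtracting the row maximum before `exp` (the log-sum-exp shift) is the usual way to keep the exponentials from overflowing.

```python
import numpy as np

def softmax_cross_entropy(logits, labels_one_hot):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels_one_hot * log_probs).sum(axis=1).mean()

def accuracy(logits, labels_one_hot):
    """Fraction of rows whose arg-max logit matches the one-hot label."""
    return (logits.argmax(axis=1) == labels_one_hot.argmax(axis=1)).mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.eye(3)[[0, 1]]          # true classes: 0 and 1
print(accuracy(logits, labels))     # 0.5
print(softmax_cross_entropy(logits, labels))
```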

In [181]:
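# batch_norm registers its moving mean/variance updates in UPDATE_OPS;
# making them dependencies of the optimizer keeps those statistics current
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(1e-4).minimize(cost)


sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Note: this function name shadows the `train` DataFrame loaded earlier
def train(epochs = 20, batch_size = 200):
    batch = BatchGenerator(trainX,trainY,batch_size=batch_size,nb_classes=nb_classes, one_hot=True)
    # A single "batch" holding the whole training set, used to measure training accuracy
    check = BatchGenerator(trainX,trainY,batch_size=len(trainX),nb_classes=nb_classes, one_hot=True)
    
    for epoch in range(epochs):
        avg_cost = 0
        
        for i in range(batch.total_batch):
            c, _ = sess.run([cost, optimizer], feed_dict= {'X:0': batch.x, 'Y:0': batch.y, 'phase:0': True})
            avg_cost += c / batch.total_batch
            batch.next_batch()
            
        # Report every 100 epochs
        if epoch % 100 == 0:
            print("Epoch:", '%04d,' % (epoch), 'cost = ', '{:.9f}'.format(avg_cost))
            # phase=True normalizes with this batch's own statistics; phase=False would use
            # the moving averages, which is how the test set is scored below
            train_acc = sess.run(accuracy,feed_dict={'X:0': check.x, 'Y:0': check.y, 'phase:0': True})
            print('Train Accuracy: {:.2f}% '.format(train_acc * 100))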
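By default, `tf.contrib.layers.batch_norm` places its moving-average updates in `tf.GraphKeys.UPDATE_OPS`. That is why the optimizer is built inside `tf.control_dependencies(update_ops)`. The layer uses those moving averages when `phase` is `False`. Below is a NumPy sketch of the idea, assuming the layer's default `decay=0.999`. Both helper names are hypothetical.

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch, decay=0.999):
    """One exponential-moving-average update of per-feature batch statistics."""
    moving_mean = decay * moving_mean + (1.0 - decay) * batch.mean(axis=0)
    moving_var = decay * moving_var + (1.0 - decay) * batch.var(axis=0)
    return moving_mean, moving_var

def batch_norm_inference(z, moving_mean, moving_var, gamma, beta, eps=1e-3):
    """Inference-mode batch norm: normalize with the stored moving statistics."""
    return gamma * (z - moving_mean) / np.sqrt(moving_var + eps) + beta

rng = np.random.default_rng(1)
mean, var = np.zeros(3), np.ones(3)   # initial moving mean / variance
for _ in range(5000):                 # one update per training step
    batch = rng.normal(loc=2.0, scale=0.5, size=(10, 3))
    mean, var = update_moving_stats(mean, var, batch)

z_test = rng.normal(loc=2.0, scale=0.5, size=(4, 3))
print(batch_norm_inference(z_test, mean, var, gamma=np.ones(3), beta=np.zeros(3)))
```

With `decay=0.999`, the averages need a few thousand steps to forget their starting values (0.999^5000 is about 0.007). At `batch_size=10` there are only 99 steps per epoch, so in early epochs the inference-time statistics may lag behind the batch statistics.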


In [184]:
train(500,10)

Epoch: 0000, cost =  0.000004951
Train Accuracy: 87.37% 
Epoch: 0100, cost =  0.000072314
Train Accuracy: 96.46% 
Epoch: 0200, cost =  0.000000690
Train Accuracy: 96.77% 
Epoch: 0300, cost =  0.000234411
Train Accuracy: 96.57% 
Epoch: 0400, cost =  0.000002950
Train Accuracy: 97.37% 


In [191]:
test.head()

Unnamed: 0,id,margin1,margin2,margin3,margin4,margin5,margin6,margin7,margin8,margin9,...,texture55,texture56,texture57,texture58,texture59,texture60,texture61,texture62,texture63,texture64
0,4,0.019531,0.009766,0.078125,0.011719,0.003906,0.015625,0.005859,0.0,0.005859,...,0.006836,0.0,0.015625,0.000977,0.015625,0.0,0.0,0.0,0.003906,0.053711
1,7,0.007812,0.005859,0.064453,0.009766,0.003906,0.013672,0.007812,0.0,0.033203,...,0.0,0.0,0.006836,0.001953,0.013672,0.0,0.0,0.000977,0.037109,0.044922
2,9,0.0,0.0,0.001953,0.021484,0.041016,0.0,0.023438,0.0,0.011719,...,0.12891,0.0,0.000977,0.0,0.0,0.0,0.0,0.015625,0.0,0.0
3,12,0.0,0.0,0.009766,0.011719,0.017578,0.0,0.003906,0.0,0.003906,...,0.012695,0.015625,0.00293,0.036133,0.013672,0.0,0.0,0.089844,0.0,0.008789
4,13,0.001953,0.0,0.015625,0.009766,0.039062,0.0,0.009766,0.0,0.005859,...,0.0,0.042969,0.016602,0.010742,0.041016,0.0,0.0,0.007812,0.009766,0.007812


In [255]:
test_ = test.drop('id',axis=1)

In [338]:
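# Build the prediction op once instead of adding a new argmax node on every call
prediction = tf.argmax(logits, 1)

def testing():
    # phase=False: batch norm uses its moving averages (inference mode)
    return sess.run(prediction, feed_dict={'X:0': test_, 'phase:0': False})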

In [348]:
result = testing()

In [349]:
result = pd.DataFrame(result)

In [350]:
mapp_2 = {v: k for k, v in mapp.items()}

In [351]:
result[0] = result[0].map(mapp_2)

In [352]:
result.head()

Unnamed: 0,0
0,Quercus_Agrifolia
1,Quercus_Afares
2,Acer_Circinatum
3,Castanea_Sativa
4,Alnus_Viridis


In [353]:
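# Kaggle expects an `id` column plus one probability column per species
probs = sess.run(tf.nn.softmax(logits), feed_dict={'X:0': test_, 'phase:0': False})
r = pd.DataFrame(probs, columns=[mapp_2[i] for i in range(nb_classes)])
r.insert(0, 'id', test['id'])
r.to_csv('result.csv', index=False)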
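This competition was scored with multi-class log loss, which punishes a confident wrong answer very heavily. That is the reason to submit softmax probabilities rather than a one-hot argmax. Here is a sketch of the metric. It assumes Kaggle's usual convention of rescaling each row to sum to 1 and then clipping to `[1e-15, 1 - 1e-15]`.

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability of the true class.

    Rows are rescaled to sum to 1, then clipped to [eps, 1 - eps].
    """
    p = probs / probs.sum(axis=1, keepdims=True)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(np.log(p[np.arange(len(y_true)), y_true]))

y_true = np.array([0, 1])                   # true class indices
soft = np.array([[0.7, 0.2, 0.1],
                 [0.3, 0.6, 0.1]])          # probabilities, e.g. from softmax
hard = np.eye(3)[[0, 2]]                    # one-hot argmax, second row wrong
print(multiclass_log_loss(y_true, soft))
print(multiclass_log_loss(y_true, hard))    # about 17.27: the single miss costs -log(1e-15)
```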