# Basic Classifier for differentiating image objects based on confidence score

## Introduction

Our goal for this project is to build a system that can detect rotten fruit. In our first week of work, we built a classifier and trained it on 100+ images of mangoes. It distinguishes rotten, green, and ripe mangoes with an accuracy of 80%, which we expect to improve as we train it on more data.

## Methodology

For this classifier we use Google's Inception V3 model, which is supplied with their open-source machine learning framework TensorFlow. Inception V3 is a convolutional neural network (CNN).

Using this model as a base, we train our classifier on a training dataset that we collected from Google Image Search.

## Training Detail

### First Phase

The first phase analyzes the images in the dataset and computes **bottlenecks** for each of them. The bottleneck is the layer of the network just before the final output layer, which performs the actual classification. In other words, the bottleneck values are a compact summary of an image that the final layer uses to classify it.

We ran the training procedure over the images for 500 iterations. Calculating the layers behind the bottleneck for each image takes a significant amount of time. Since these lower layers of the network are not being modified, their outputs can be cached and reused. The lower layers are responsible for detecting edges and shapes in the images.
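
The caching idea can be sketched in a few lines. This is only an illustration, not the code the retraining script uses: `compute_bottleneck` is a hypothetical stand-in for running the frozen lower layers, and the 8-value vector and the `.npy` cache files are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def compute_bottleneck(image_path):
    """Hypothetical stand-in for running the frozen lower layers on one image."""
    # A real run would feed the image through the network. Here we derive a
    # deterministic fake vector from the path so the sketch is runnable.
    rng = np.random.default_rng(sum(image_path.encode("utf-8")))
    return rng.random(8)


def get_or_create_bottleneck(image_path, cache_dir, compute=compute_bottleneck):
    """Return the cached bottleneck for image_path, computing it only once."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, os.path.basename(image_path) + ".npy")
    if os.path.exists(cache_file):
        return np.load(cache_file)  # reuse the stored values
    values = compute(image_path)    # expensive step, done on a cache miss
    np.save(cache_file, values)
    return values


# Count how often the expensive step actually runs.
calls = []


def counting_compute(path):
    calls.append(path)
    return compute_bottleneck(path)


cache_dir = tempfile.mkdtemp()
first = get_or_create_bottleneck("img/ripe_mango_1.jpg", cache_dir, counting_compute)
second = get_or_create_bottleneck("img/ripe_mango_1.jpg", cache_dir, counting_compute)
print("expensive computations:", len(calls))
```

The second call reads the stored vector from disk, which is why repeated training iterations over the same images stay cheap.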



### Training Graph

![graph](img/graph.png)

### Details on the graph

The first layer detects edges in the images. The second layer is responsible for shape detection. These are considered lower-level layers. Their results are cached for later use to speed up computation.

The third stage, conv_1 to conv_4, takes the output of the previous two layers, applies further convolutions to extract features, and produces the bottleneck for the image.

The bottleneck is then fed into the output layer, and the training graph is updated. When an image from outside the dataset is given as input, the classifier computes its bottleneck and the output layer assigns a score to each class.
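
As a rough illustration of that last step, the sketch below applies a softmax output layer to a bottleneck vector. It is a toy under stated assumptions: the 8-value bottleneck, the random weights, and the names `softmax` and `final_layer` are all made up for the example, while the real graph learns its weights during training.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def final_layer(bottleneck, weights, biases):
    """Output layer: one linear score per class, followed by softmax."""
    return softmax(bottleneck @ weights + biases)


rng = np.random.default_rng(0)
bottleneck = rng.random(8)         # toy bottleneck vector (assumed size)
weights = rng.normal(size=(8, 3))  # one column per class, random for illustration
biases = np.zeros(3)

scores = final_layer(bottleneck, weights, biases)
print(scores)
```

Only `weights` and `biases` would change during retraining. The bottleneck itself comes from the frozen lower layers.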

Our initial training gave us a validation accuracy of 75% and a training accuracy of 85% for mango images.

### Detailed Training Data

We collected all the plots below from our trained graph using TensorBoard.

#### Metrics

The training graphs report the following metrics.

- **Training accuracy:** the percentage of images in the training data that were labeled with the correct class. We used 3 classes for our dataset: mango_ripe, mango_green, mango_rotten.

- **Validation accuracy:** the precision (percentage of correctly labelled images) on a randomly selected group of images from a different set. In other words, validation accuracy estimates how the classifier performs on images it was not trained on, so we compare it against the training accuracy.

- **Cross entropy:** a loss function that measures how far the predicted class probabilities are from the correct labels. Lower values mean the training is working better.
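
To make these metrics concrete, here is a minimal NumPy sketch that computes accuracy and cross entropy for a handful of predictions. The probabilities, the labels, and the class order are invented for illustration and are not taken from our training run.

```python
import numpy as np


def accuracy(probabilities, true_labels):
    """Fraction of rows whose highest-scoring class matches the true label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == true_labels))


def cross_entropy(probabilities, true_labels, eps=1e-12):
    """Mean negative log-probability assigned to the correct class."""
    picked = probabilities[np.arange(len(true_labels)), true_labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


# Assumed class order: 0 = mango_green, 1 = mango_ripe, 2 = mango_rotten
probs = np.array([
    [0.8, 0.1, 0.1],  # correct: green
    [0.2, 0.7, 0.1],  # correct: ripe
    [0.1, 0.5, 0.4],  # wrong: true class is rotten, predicted ripe
    [0.3, 0.3, 0.4],  # correct: rotten, but with low confidence
])
labels = np.array([0, 1, 2, 2])

print("accuracy:", accuracy(probs, labels))
print("cross entropy:", cross_entropy(probs, labels))
```

Accuracy only counts hits and misses, while cross entropy also penalizes correct answers that were given with low confidence, such as the last row.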

#### Training Accuracy
 - Blue line: validation (expected)
 - Orange line: obtained from training
 
 ![accuracy](img/accuracy.png)

#### Cross Entropy

 - Blue line: validation (expected)
 - Orange line: obtained from training

![cross](img/cross_entropy.png)

#### Final Layer representation 

 - Blue line: validation (expected)
 - Orange line: obtained from training
 
 ![final_ops](img/final_ops.png)

### Measuring Performance 

Whether training is working well is indicated by the validation and training accuracy. Validation accuracy is measured on images that are not part of the training set. If the difference between training and validation accuracy is very large, we can assume the network is overfitting and the training is not going as expected.

For our training, the difference is acceptable but not as good as we hoped, since we trained with only about 100 images. Adding more images with varied and distinct features to the dataset should improve the network. Our target is to reach 85% or higher for both validation and training accuracy.
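
The check described above can be written down as a small helper. The 85% and 75% figures are the ones from our initial training, and the 85% target is the one stated above. The `max_gap` threshold of 0.15 is an arbitrary assumption chosen for illustration, not a value we tuned.

```python
def accuracy_gap(training_accuracy, validation_accuracy):
    """Difference between training and validation accuracy."""
    return training_accuracy - validation_accuracy


def looks_overfit(training_accuracy, validation_accuracy, max_gap=0.15):
    """Flag a run whose gap exceeds max_gap (hypothetical threshold)."""
    return accuracy_gap(training_accuracy, validation_accuracy) > max_gap


def meets_target(training_accuracy, validation_accuracy, target=0.85):
    """True when both accuracies reach the target."""
    return training_accuracy >= target and validation_accuracy >= target


train_acc, val_acc = 0.85, 0.75  # figures from our initial training

print("gap:", round(accuracy_gap(train_acc, val_acc), 2))
print("overfitting suspected:", looks_overfit(train_acc, val_acc))
print("target reached:", meets_target(train_acc, val_acc))
```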

### Demo 

```python
import os
import tensorflow as tf

images = [
    'img/green_mangoes.jpg',
    'img/green_mango2.jpg',
    'img/overripe_ataulfo_mango.png',
    'img/ripe_mango_1.jpg',
    'img/ripe_mango_2.jpg',
    'img/rotten_mango.jpg'
]

print('Number of images : {}'.format(len(images)))
```

```
Number of images : 6
```


```python
# Function to get output from the retrained graph (TensorFlow 1.x API)

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide TensorFlow info and warning logs


def classify(image_path):
    # Read the raw image bytes
    with tf.gfile.FastGFile(image_path, 'rb') as f:
        image_data = f.read()

    # Load the label file, stripping the trailing newline from each line
    label_lines = [line.rstrip() for line
                   in tf.gfile.GFile("../retrained_labels.txt")]

    # Load the retrained graph from file into the default graph
    with tf.gfile.FastGFile("../retrained_graph.pb", 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    with tf.Session() as sess:
        # Feed the image data into the graph and fetch the softmax output
        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})

        # Sort label indices by confidence, highest first
        top_k = predictions[0].argsort()[::-1]

        print('\nClassified input image : Results :\n')
        for node_id in top_k:
            human_string = label_lines[node_id]
            score = predictions[0][node_id]
            print('Input is : {} \twith score = {} %'.format(human_string, score * 100))

        print('\nDONE===========================\n\n')
```
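
The ranking step inside `classify` can be tried on its own without the trained graph. The sketch below is a standalone illustration: the label order and the rounded scores are assumptions for the example, and `rank_labels` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def rank_labels(scores, labels):
    """Pair each label with its score, ordered from most to least confident."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return [(labels[i], float(scores[i])) for i in order]


labels = ["ripe mango", "green mango", "rotten mango"]  # assumed label order
scores = np.array([0.06, 0.81, 0.13])                   # illustrative softmax output

ranked = rank_labels(scores, labels)
for label, score in ranked:
    print('Input is : {} \twith score = {:.1f} %'.format(label, score * 100))

best_label, best_score = ranked[0]
```

The first entry of the ranked list is the classifier's answer, and its score is the confidence used to tell the classes apart.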

#### Result for green_mangoes.jpg 

![img](img/green_mangoes.jpg)

```python
classify(images[0])
```

```
Classified input image : Results :

Input is : green mango 	with score = 81.32983446121216 %
Input is : rotten mango 	with score = 12.711438536643982 %
Input is : ripe mango 	with score = 5.958719924092293 %
```

#### Result for ripe_mango_1.jpg

![img](img/ripe_mango_1.jpg)

```python
classify(images[3])
```

```
Classified input image : Results :

Input is : ripe mango 	with score = 67.17912554740906 %
Input is : green mango 	with score = 17.06155836582184 %
Input is : rotten mango 	with score = 15.759308636188507 %
```

#### Result for rotten_mango.jpg

![img](img/rotten_mango.jpg)

```python
classify(images[5])
```

```
Classified input image : Results :

Input is : rotten mango 	with score = 67.93732047080994 %
Input is : ripe mango 	with score = 31.19925856590271 %
Input is : green mango 	with score = 0.8634287863969803 %
```

## Conclusion 

Our first week of work has produced a prototype that can detect rotten, ripe, and green mangoes, with a few misses. We plan to improve its accuracy by training it on more images. Once this works well for mangoes, we can extend the same procedure to other fruits.

## Related research work

- *Going Deeper with Convolutions* (Szegedy et al. 2015, http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf)

## Helper Materials

- TensorFlow Official Documentation
- TensorFlow for Poets Codelab

 ##### This document was prepared with Jupyter Notebook with `python3` backend.