# ML1 - Project - Smart Grocery Shopping - ML Model

## It's common to see a long line of people at the cashier when you go grocery shopping. Long queues have always been a challenge to manage. However, Amazon has found a way to eliminate the wait: smart technology that charges you as you walk out of the shop.
<img style="float: center; /"  width = "1000"  src="Notebook Images/ML-M1-Project-Model-Image1.png"/>

## Amazon Go grocery shops use machine learning to track what customers pick from the shelves, add those items to a virtual cart, and charge customers as they walk out of the store. The ML algorithm needs to recognize each picked item using computer vision. Telling items apart can be challenging, especially for fruits and vegetables.
<img style="float: center; /"  width = "1000"  src="Notebook Images/ML-M1-Project-Model-Image2.png"/>

## To discover how this works, your task is to train a KNN model to classify three classes of fruit using three types of features, and then use the best-performing model to build an ML application that classifies the fruit correctly.
<img style="float: center; /"  width = "1000"  src="Notebook Images/ML-M1-Project-Model-Image3.png"/>

## What you need to do:
1. Load & show the fruits data
2. Train and test a KNN model using three values for k = [3, 5, 10] and using three types of features:<br>
    2.1. Raw pixels<br>
    2.2. GreyScale pixels<br>
    2.3. Color histogram
3. Choose the best-performing one and save it
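
To make the classifier concrete, here is a minimal pure-Python sketch of the k-nearest-neighbours idea: measure the distance from a new sample to every training sample, keep the `k` closest, and let them vote. It is not the implementation behind `create_model("knn", k)`; the function name `knn_predict` and the toy two-number feature vectors are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training vector
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest neighbours and let them vote
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features (not the real dataset)
toy_features = [(0.9, 0.8), (0.85, 0.75), (0.2, 0.9), (0.25, 0.85), (0.95, 0.5), (0.9, 0.55)]
toy_labels = ["Banana", "Banana", "Lime", "Lime", "Orange", "Orange"]

prediction = knn_predict(toy_features, toy_labels, query=(0.22, 0.88), k=3)
print(prediction)  # Lime
```

A larger `k` smooths the vote over more neighbours, which can help with noisy data but can also pull in points from other classes.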

### 1. Load & show the fruits data

1.1. Load the fruits dataset. filename = "fruits" 

In [44]:
data_df = load("fruits")

1.2. Show the data in a table

In [45]:
show(data_df)

Showing the first 10 rows


Unnamed: 0,Image,ClassName
0,,Orange
1,,Orange
2,,Orange
3,,Orange
4,,Orange
5,,Orange
6,,Orange
7,,Orange
8,,Orange
9,,Orange


### 2. Train and test a KNN model using three values for k = [3, 5, 10]

### 2.1 Using raw image pixels

2.1.1 Split the data

In [46]:
train_df, test_df = split_table(data_df)
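
`split_table` is a course helper whose internals are not shown here. As a rough illustration of what a train/test split does, the sketch below shuffles the rows and holds out a fraction for testing. The function name `split_rows`, the 80/20 ratio and the fixed seed are assumptions, not the helper's actual behaviour.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    shuffled = list(rows)                    # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)    # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows: (image_id, class_name)
rows = [(i, "Orange" if i % 2 else "Lime") for i in range(10)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Keeping the test rows out of training is what makes the reported accuracy an estimate of performance on unseen images.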

2.1.2 Create, train & test the KNN models

In [47]:
neighbors = [3, 5, 10]  # values of k to compare
for k in neighbors:
    model = create_model("knn", k)
    fit_model(model, train_df, "ClassName", "Image")
    check_accuracy(model, test_df)
    show_classification_report(model, test_df)


For k = 3
Accuracy score of the model is: 95.85585585585585 %


For k = 3


| Class | precision | recall |
|---|---|---|
| Banana | 1.0 | 0.914692 |
| Lime | 0.908367 | 1.0 |
| Orange | 1.0 | 0.956897 |




For k = 5
Accuracy score of the model is: 93.87387387387388 %


For k = 5


| Class | precision | recall |
|---|---|---|
| Banana | 1.0 | 0.890995 |
| Lime | 0.876923 | 1.0 |
| Orange | 0.981308 | 0.905172 |




For k = 10
Accuracy score of the model is: 92.61261261261261 %


For k = 10


| Class | precision | recall |
|---|---|---|
| Banana | 0.994595 | 0.872038 |
| Lime | 0.859848 | 0.995614 |
| Orange | 0.971698 | 0.887931 |
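
The accuracy, precision and recall figures above follow the standard definitions. A minimal sketch, assuming plain lists of true and predicted labels (the function names and the toy labels are illustrative and not part of the course helpers):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for illustration only
y_true = ["Banana", "Banana", "Lime", "Lime", "Orange", "Orange"]
y_pred = ["Banana", "Lime", "Lime", "Lime", "Orange", "Lime"]

print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, "Lime"))    # (0.5, 1.0)
print(precision_recall(y_true, y_pred, "Banana"))  # (1.0, 0.5)
```

The toy example mirrors the pattern in the raw-pixel results: Lime has recall at or near 1.0 but lower precision, which means some bananas and oranges were labelled as Lime.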






### 2.2 Using greyscale pixels

2.2.1 Extract & apply greyscale features

In [48]:
def extract_feature(row):
    im = get_value(row, "Image")
    im = convert_gs(im)
    return im

In [49]:
apply_feature(extract_feature, data_df, "GreyScale")
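
`convert_gs` is a course helper, and how it weights the colour channels is not shown. One common convention is the ITU-R BT.601 luma weighting (0.299 R + 0.587 G + 0.114 B). The sketch below assumes an image is a nested list of `(R, G, B)` tuples with values from 0 to 255, which may differ from the helper's actual format.

```python
def to_greyscale(image):
    """Convert rows of (R, G, B) pixels to rows of single intensity values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


# Hypothetical 2x2 image: red, green / blue, white
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
grey = to_greyscale(tiny_image)
print(grey)  # [[76, 150], [29, 255]]
```

Collapsing three channels into one discards hue, which is a plausible reason for greyscale features to separate fruits less well than colour features.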

2.2.2 Show the data in a table

In [50]:
show(data_df)

Showing the first 10 rows


Unnamed: 0,Image,GreyScale,ClassName
0,,,Orange
1,,,Orange
2,,,Orange
3,,,Orange
4,,,Orange
5,,,Orange
6,,,Orange
7,,,Orange
8,,,Orange
9,,,Orange


2.2.3 Split the data

In [51]:
train_df, test_df = split_table(data_df)

2.2.4 Create, train & test the KNN models

In [52]:
neighbors = [3, 5, 10]
for k in neighbors:
    model = create_model("knn", k)
    fit_model(model, train_df, "ClassName", "GreyScale")
    check_accuracy(model, test_df)
    show_classification_report(model, test_df)

For k = 3
Accuracy score of the model is: 93.51351351351352 %


For k = 3


| Class | precision | recall |
|---|---|---|
| Banana | 0.985222 | 0.900901 |
| Lime | 0.888446 | 0.991111 |
| Orange | 0.950495 | 0.888889 |




For k = 5
Accuracy score of the model is: 92.25225225225225 %


For k = 5


| Class | precision | recall |
|---|---|---|
| Banana | 0.989899 | 0.882883 |
| Lime | 0.871595 | 0.995556 |
| Orange | 0.92 | 0.851852 |




For k = 10
Accuracy score of the model is: 88.46846846846846 %


For k = 10


| Class | precision | recall |
|---|---|---|
| Banana | 0.988889 | 0.801802 |
| Lime | 0.808664 | 0.995556 |
| Orange | 0.908163 | 0.824074 |






### 2.3 Using color histogram features

2.3.1 Extract the red histogram features. Define the number of bins to extract.

In [53]:
def extract_feature_hist_red(row):
    im = get_value(row, "Image")
    hist = get_hist(im, color = "red",nbins = 10)
    return hist

2.3.2 Apply the feature extraction

In [54]:
apply_feature(extract_feature_hist_red, data_df, "RedHist_10")
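
`get_hist` is a course helper, so its exact output is not shown. The sketch below illustrates the general idea of a per-channel histogram with `nbins = 10`: each pixel's channel value is assigned to one of ten equal-width bins, and the bin counts become the feature vector. The 0 to 255 value range, the normalisation to fractions, and the name `channel_histogram` are assumptions.

```python
def channel_histogram(image, channel, nbins=10):
    """Normalised histogram of one colour channel (0 = red, 1 = green, 2 = blue)."""
    counts = [0] * nbins
    total = 0
    for row in image:
        for pixel in row:
            # Map a 0-255 value to a bin index in [0, nbins - 1]
            bin_index = min(pixel[channel] * nbins // 256, nbins - 1)
            counts[bin_index] += 1
            total += 1
    return [count / total for count in counts]


# Hypothetical 2x2 image of mostly orange-ish pixels
tiny_image = [
    [(250, 140, 10), (240, 130, 20)],
    [(245, 150, 0), (30, 30, 30)],
]
red_hist = channel_histogram(tiny_image, channel=0)
print(red_hist)  # [0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.75]
```

A histogram ignores where each pixel sits in the image, so the feature does not change if the fruit appears in a different position in the frame.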

2.3.3 Show the data in a table

In [55]:
show(data_df)

Showing the first 10 rows


Unnamed: 0,Image,RedHist_10,GreyScale,ClassName
0,,,,Orange
1,,,,Orange
2,,,,Orange
3,,,,Orange
4,,,,Orange
5,,,,Orange
6,,,,Orange
7,,,,Orange
8,,,,Orange
9,,,,Orange


2.3.4 Extract the green histogram features

In [56]:
def extract_feature_hist_green(row):
    im = get_value(row, "Image")
    hist = get_hist(im, color = "green",nbins = 10)
    return hist

2.3.5 Apply the feature extraction

In [57]:
apply_feature(extract_feature_hist_green, data_df, "GreenHist_10")

2.3.6 Show the data in a table

In [58]:
show(data_df)

Showing the first 10 rows


Unnamed: 0,Image,GreenHist_10,RedHist_10,GreyScale,ClassName
0,,,,,Orange
1,,,,,Orange
2,,,,,Orange
3,,,,,Orange
4,,,,,Orange
5,,,,,Orange
6,,,,,Orange
7,,,,,Orange
8,,,,,Orange
9,,,,,Orange


2.3.7 Extract the blue histogram features

In [59]:
def extract_feature_hist_blue(row):
    im = get_value(row, "Image")
    hist = get_hist(im, color = "blue",nbins = 10)
    return hist

2.3.8 Apply the feature extraction

In [60]:
apply_feature(extract_feature_hist_blue, data_df, "BlueHist_10")

2.3.9 Show the data in a table

In [61]:
show(data_df)

Showing the first 10 rows


Unnamed: 0,Image,BlueHist_10,GreenHist_10,RedHist_10,GreyScale,ClassName
0,,,,,,Orange
1,,,,,,Orange
2,,,,,,Orange
3,,,,,,Orange
4,,,,,,Orange
5,,,,,,Orange
6,,,,,,Orange
7,,,,,,Orange
8,,,,,,Orange
9,,,,,,Orange


2.3.10 Split the data

In [62]:
train_df, test_df = split_table(data_df)

2.3.11 Create, train & test the KNN models
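
The next cell passes three histogram columns to `fit_model`. How the helper merges them is not shown; a common approach is to concatenate them into a single vector, so three 10-bin histograms would give 30 features per image. A minimal sketch under that assumption (the name `combine_features` and the 3-bin values are hypothetical):

```python
from itertools import chain


def combine_features(*feature_vectors):
    """Concatenate several feature vectors into one flat list."""
    return list(chain.from_iterable(feature_vectors))


# Hypothetical 3-bin histograms (the notebook uses 10 bins per channel)
red_hist = [0.1, 0.2, 0.7]
green_hist = [0.3, 0.5, 0.2]
blue_hist = [0.8, 0.1, 0.1]

features = combine_features(red_hist, green_hist, blue_hist)
print(len(features))  # 9
```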

In [63]:
neighbors = [3, 5, 10]
for k in neighbors:
    model = create_model("knn", k)
    fit_model(model, train_df, "ClassName", "RedHist_10", "GreenHist_10","BlueHist_10")
    check_accuracy(model, test_df)
    show_classification_report(model, test_df)


For k = 3
Accuracy score of the model is: 98.37837837837839 %


For k = 3


| Class | precision | recall |
|---|---|---|
| Banana | 0.970874 | 0.995025 |
| Lime | 0.991597 | 0.995781 |
| Orange | 0.990991 | 0.940171 |




For k = 5
Accuracy score of the model is: 98.01801801801801 %


For k = 5


| Class | precision | recall |
|---|---|---|
| Banana | 0.966184 | 0.995025 |
| Lime | 0.991561 | 0.991561 |
| Orange | 0.981982 | 0.931624 |




For k = 10
Accuracy score of the model is: 96.57657657657658 %


For k = 10


| Class | precision | recall |
|---|---|---|
| Banana | 0.948113 | 1.0 |
| Lime | 0.978992 | 0.983122 |
| Orange | 0.971429 | 0.871795 |






2.3.12 Plot the color histogram of random images using the __*plot\_histogram(data)*__ function. For example: plot_histogram(data_df)

In [None]:
plot_histogram(data_df)

### 3. Reflect on the results and save the best performing model
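
One simple way to reflect on the results is to put the nine test accuracies side by side and pick the highest. The values below are copied from the outputs in section 2 and rounded to two decimals; the dictionary layout is just one possible way to organise them.

```python
# Test accuracies (%) copied from the outputs in section 2, rounded to two decimals
results = {
    ("raw pixels", 3): 95.86,
    ("raw pixels", 5): 93.87,
    ("raw pixels", 10): 92.61,
    ("greyscale", 3): 93.51,
    ("greyscale", 5): 92.25,
    ("greyscale", 10): 88.47,
    ("color histogram", 3): 98.38,
    ("color histogram", 5): 98.02,
    ("color histogram", 10): 96.58,
}

best_config = max(results, key=results.get)
print(best_config, results[best_config])  # ('color histogram', 3) 98.38
```

Accuracy drops as `k` grows for every feature type, and the color histogram scores highest at each `k`. One caveat: `split_table` was called separately for each feature type, and the per-class recall values suggest the test sets were not identical, so small differences between feature types should be read with some caution.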

In [41]:
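# Color histograms with k = 3 gave the highest test accuracy above (98.38 %),
# so retrain that configuration and save it.
model = create_model("knn", 3)
fit_model(model, train_df, "ClassName", "RedHist_10", "GreenHist_10", "BlueHist_10")
check_accuracy(model, test_df)
show_classification_report(model, test_df)
save_model(model, "fruits_model")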
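
`save_model` is a course helper, and the file format it writes is not shown. In plain Python, one common way to persist a trained object is the standard `pickle` module. The sketch below uses a dictionary as a stand-in for a model and a temporary file path; both are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable Python object works the same way
model_stub = {
    "algorithm": "knn",
    "k": 3,
    "features": ["RedHist_10", "GreenHist_10", "BlueHist_10"],
}

path = os.path.join(tempfile.gettempdir(), "fruits_model_example.pkl")

# Write the object to disk
with open(path, "wb") as f:
    pickle.dump(model_stub, f)

# Read it back, as an application would before classifying new images
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_stub)  # True
os.remove(path)  # clean up the example file
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.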