<a href="https://colab.research.google.com/github/RobDeutsche/CISC499/blob/main/Copy_of_Module14_07_ComputerVisionAndMachineLearning_Coach.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# From Traditional Computer Vision to Artificial Intelligence
*Version 1.1*

To navigate up and down, you can use the up and down arrow keys on your keyboard.<br />
To execute code in this workbook, select the code block and press **Shift+Enter**. <br />
To edit the code block, press **Enter**. 

The code in this workbook is cumulative (variables you define remain available until the notebook is closed). <br />
So do start from the top and work your way down to avoid unexpected results!


For more help on using Jupyter Notebook, you can click on Help > User Interface Tour in the menu above, <br />
or visit https://jupyter-notebook.readthedocs.io/en/stable/ui_components.html

Experiment and test out your ideas, for that is one of the fastest ways to learn!

## 1. What if we did not need to manually define rules to solve a classification problem?

In the last workshop, you experimented with various basic image processing techniques, and explored how computers could “see”. You then attempted to make the system recognize you by using objects that you held up in front of the camera. You explored a variety of methods, and many of your creative methods likely involved defining rules or “if-else” logic. For example, rules for what were considered “authorized” colors, position, or a combination of conditions. 

But what if you did not need to define those rules manually?

**Machine Learning** is a subset of Artificial Intelligence that focuses on the ability of machines to learn based on training data. Applied to the field of computer vision, what if we could get the machine to learn what was an “authorized” or “unauthorized” image, instead of having to define rules for the exact color codes?

In today's workshop, we will explore how basic computer vision techniques can be combined with machine learning to solve a variety of challenges.
1. First, we will jump right into building a simple model to illustrate machine learning
1. Then we will take a step back to see the steps involved in building a classification model
1. Next, we use classification models to make inferences and explore the accuracy.
1. Along the way, do look out for and take note of the limitations and motivations for the different methods and techniques used.


## Classifying a card into 1 out of 3 possible categories

Let us take a quick look at the "access cards" challenge again. <br />
Below there are 3 cards (red, green and black cards), and a background scene when no cards are placed in front of the camera.
The top row shows the cards held further away, while the bottom row shows the cards held very close to the web camera.


<img src="https://raw.githubusercontent.com/ethaneldridge/flc-cisc499/main/Module14-07-ComputerVisionAndMachineLearning/cards.png" style="float:left;"/>
<div style="clear:both;"></div>


Let us narrow the problem by assuming that cards must be held close to the web camera for validation. In that case, it could simply be a matter of comparing the colors of each card image to determine which of the 3 cards it is.

## 1.1 Feature Extraction - Selecting what feature(s) to use to help us infer

For this experiment, we have decided to use color to help us to distinguish the cards. But how will we select our color features? Should we select a particular point (e.g. center of the image), or the average color of the image? Should we use a particular channel of the BGR image, or should we convert it to greyscale or any of the other color spaces? 

The selection of our features will impact the robustness of our solution, and selecting irrelevant "features" would not be useful.

For example, if we try to use the camera image size to determine whether or not it was an authorized card, it would NOT be relevant since the camera image size will not change regardless of what card is placed in front of the camera. 
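
To make the choice of feature more concrete, here is a small sketch that compares three candidate features on a synthetic image. The image itself and the helper names center_pixel, average_color and mean_gray are assumptions made up for this illustration; they are not part of the workshop data.

```python
import numpy as np

# Synthetic 480x640 BGR image (an assumption for illustration):
# a mostly red "card" with a small white glare patch over the center.
image = np.zeros((480, 640, 3), dtype=np.uint8)
image[:, :] = (30, 5, 150)                 # B, G, R
image[230:250, 310:330] = (255, 255, 255)  # white glare patch covering the center

def center_pixel(img):
    """Color of the single pixel at the center of the image."""
    h, w = img.shape[:2]
    return img[h // 2, w // 2].astype(float)

def average_color(img):
    """Mean of each channel over all pixels."""
    return img.mean(axis=(0, 1))

def mean_gray(img):
    """A single brightness value: the mean over all pixels and channels."""
    return float(img.mean())

print("center pixel :", center_pixel(image))
print("average color:", average_color(image))
print("mean gray    :", mean_gray(image))
```

In this made-up image, the center pixel is thrown off completely by the glare patch, while the average color barely moves. The single gray value is stable too, but it discards the color information that we would need to tell a red card from a green one.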

You can try to experiment with different features. <br />
But in the meantime, let us do a quick experiment using the average color as the feature:

In [1]:
from skimage import io #  io.imread loads image as RGB


In [2]:
def bgr_from_rgb(img):
  return img[:,:,::-1] # see https://scikit-image.org/docs/stable/user_guide/data_types.html

In [3]:
# v2.0 Added
def rgb_from_bgr(img):
  return img[:,:,::-1] # see https://scikit-image.org/docs/stable/user_guide/data_types.html

In [4]:
def load_image(url):
  return bgr_from_rgb(io.imread(url))

In [8]:
import cv2 # BGR is the default OpenCV colour format
import numpy as np
from google.colab.patches import cv2_imshow
import skimage as skk


# Let's read the images into memory
red_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardred_close.png")
green_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardgreen_close.png")
black_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardblack_close.png")
background = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardnone.png")



**Preprocessing and Feature Extraction**

Sometimes, we might need to do preprocessing on our input data to ensure that they are of a consistent format that the model accepts. 

What are some ways that we can preprocess our data?
1. Resizing to a standard size.
2. Changing image orientation.
3. Converting to a particular color space. 

In this particular example, our loaded input images are already in a consistent landscape (640x480) format in the default BGR color space. But our simple model will not be using all the pixels of the image as features for prediction. Instead, we will be using the average color as a feature for the model to infer the class that it belongs to. Hence, we will next be defining a method to extract the average color from each image.
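
The three preprocessing ideas listed above can be sketched with NumPy alone, as shown here. The helper names resize_nearest, to_landscape and bgr_to_gray and the portrait test image are assumptions for illustration. In practice you would more likely use OpenCV functions such as cv2.resize and cv2.cvtColor, which offer better interpolation.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Resize by nearest-neighbour sampling (picks existing pixels, no blending)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return img[rows[:, None], cols]

def to_landscape(img):
    """Rotate a portrait image by 90 degrees so that width >= height."""
    h, w = img.shape[:2]
    return np.rot90(img) if h > w else img

def bgr_to_gray(img):
    """Weighted sum of the channels (BGR order), using the common 0.114/0.587/0.299 weights."""
    weights = np.array([0.114, 0.587, 0.299])  # B, G, R
    return img.astype(float) @ weights

# A hypothetical portrait image, 800 rows by 600 columns, filled with one BGR color
portrait = np.full((800, 600, 3), (30, 5, 150), dtype=np.uint8)

standard = resize_nearest(to_landscape(portrait), 480, 640)
print(standard.shape)               # (480, 640, 3)
print(bgr_to_gray(standard).shape)  # (480, 640)
```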

In [9]:
# Define a function to extract our feature (average color)
def averagecolor(image):
    return np.mean(image, axis=(0, 1))

We passed axis=(0, 1) to np.mean so that the average is taken over the rows and columns of the image only, leaving one value per channel (3 values in total, not a single number). To understand how np.mean works, you can refer to the documentation at 
https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean
See also: https://pythonexamples.org/numpy-mean/#3
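
To see what the axis argument does, here is a small sketch on a tiny 2x2 "image". The pixel values are made up for illustration.

```python
import numpy as np

# A tiny 2x2 "image" with 3 channels per pixel (values chosen for illustration)
tiny = np.array([
    [[10, 20, 30], [30, 40, 50]],
    [[50, 60, 70], [70, 80, 90]],
], dtype=float)

print(tiny.shape)                   # (2, 2, 3): rows, columns, channels
print(np.mean(tiny))                # 50.0 -> one number: the mean over everything
print(np.mean(tiny, axis=(0, 1)))   # [40. 50. 60.] -> one number per channel
```

Without the axis argument, the three channels are blended into a single number, which would throw away the color information we want to use as our feature.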

**Let's explore: what are the extracted features (average color) for our red and green cards?**

In [10]:
print (averagecolor(red_card))

[ 27.35627604   4.48305664 154.21746094]


In [11]:
print (averagecolor(green_card))

[119.53976563 133.40338216  61.1089388 ]


In [12]:
print(averagecolor(black_card))

[70.36474609 61.85563477 67.1775651 ]


Notice that the values generated are different? In fact, they are very far from each other. This is good! It means that average color is a good feature for this simple problem.

**Now what if we had chosen to use image size as our feature?**

In [13]:
print (red_card.shape)

(480, 640, 3)


In [14]:
print (green_card.shape)

(480, 640, 3)


Would you be able to tell the red card and the green card apart if you only knew their shapes? 
No! Their shapes are identical. 

How about if you knew their average colors?

As we can see above, the average colors of the red card and the green card are quite different. But the image size of both cards is exactly the same! Since we want to use the features to tell the cards apart, we will go ahead and use average color to help us infer the type of card.

We will now create variables to store the average color value and the label of each image file. We will use these later for model training. Do you remember how this training is done?

In [15]:
# Store the features (average color) and corresponding label (red/green/black/none) for classification
trainX = []
trainY = []

# loop through the cards and print the average color
for (card,label) in zip((red_card,green_card,black_card,background),("red","green","black","none")):
    print((label, averagecolor(card)))
    trainX.append(averagecolor(card))
    trainY.append(label)

('red', array([ 27.35627604,   4.48305664, 154.21746094]))
('green', array([119.53976563, 133.40338216,  61.1089388 ]))
('black', array([70.36474609, 61.85563477, 67.1775651 ]))
('none', array([247.9326888 , 241.13666016, 241.89832357]))


Recall from the previous workshop how the array representation defaults to the order [Blue, Green, Red] (not [Red, Green, Blue]).

Notice how the red card has a much higher value of red than the rest. For the green card, we see that it has higher values of blue and green, and not just green.
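
As a quick reminder of the channel order, this short sketch unpacks the red card's average color (copied from the printout above) into named channels. The variable names are assumptions for illustration.

```python
import numpy as np

# Average color of the red card as printed above (BGR order)
red_card_bgr = np.array([27.35627604, 4.48305664, 154.21746094])

blue, green, red = red_card_bgr          # unpack in B, G, R order
print(f"blue={blue:.1f} green={green:.1f} red={red:.1f}")

red_card_rgb = red_card_bgr[::-1]        # reversing the array gives R, G, B order
print(red_card_rgb)
```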

trainX now stores the feature vectors (features), and trainY stores the corresponding labels.

If you are wondering what is stored inside trainX and what is stored inside trainY, do print out the arrays and see for yourself (comparing against the printouts above). It is helpful to understand how the data is being stored at this point.

In [16]:
print(trainX)
print(np.array(trainX).shape)      #Note how the 3 channels are stored in the array

[array([ 27.35627604,   4.48305664, 154.21746094]), array([119.53976563, 133.40338216,  61.1089388 ]), array([70.36474609, 61.85563477, 67.1775651 ]), array([247.9326888 , 241.13666016, 241.89832357])]
(4, 3)


In [17]:
print(trainY)
print(np.array(trainY).shape)

['red', 'green', 'black', 'none']
(4,)


If you take more images, you may find that the average color is not always exactly the same value, and it will likely fluctuate due to lighting conditions and camera settings. Hence, training a model usually involves more than just a few images. But we will use only these few images to illustrate the concept.

## 1.2 Introducing the K-Nearest Neighbour (kNN) Algorithm

When we hold a new card in front of the camera, we want to determine which of these cards it is most similar to. Instead of defining the exact color codes, we might approach it from the angle of "**Which of our known existing cards is the new card most similar to?**"

The concept of k-Nearest Neighbours is to search the set of labelled images for the k images most similar to the new image and, based on the labels of those similar images, predict the label for the new image. 

We will run an experiment below for k=1. That is, we find the 1 image with the most similar average color to the new image, and use the label of that image to predict the label for the new image.

Let's break down how this is done!

### First we read the new image into memory

In [19]:
new_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/16.png")
new_card_features = averagecolor(new_card)

### Calculate the distances between the features (average color) of that new image against the features of the images we know
Read about linalg.norm [here](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html)

In [20]:
calculated_distances = []
for card in (trainX):
    calculated_distances.append(np.linalg.norm(new_card_features-card))
    
print (calculated_distances)

[117.79791641023513, 113.43645699355922, 33.497714831624535, 340.3000785919897]


### And here is the card it is most similar to:
Can you guess just by looking at calculated_distances above?

In [21]:
print(trainY[np.argmin(calculated_distances)])

black


Do open the images/test subfolder and check the actual colors of the respective images.

Note that the distance measure we used was "np.linalg.norm()". You can read more about it at https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html or search the Internet for "Euclidean Distance". In simple terms, you can take it as a measure of how similar the array values of (new_card_features) and (card) are: the smaller the distance, the more similar the two average colors.
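
To show what the Euclidean distance computes, here is a small sketch that works it out by hand and compares it against np.linalg.norm. It uses the average colors of the red and black training cards as printed in section 1.1. The variable names are assumptions for illustration.

```python
import numpy as np

# Average colors of the red and black training cards, as printed in section 1.1
red_features = np.array([27.35627604, 4.48305664, 154.21746094])
black_features = np.array([70.36474609, 61.85563477, 67.1775651])

# Euclidean distance written out: square the per-channel differences, sum, take the root
difference = red_features - black_features
manual_distance = np.sqrt(np.sum(difference ** 2))

# The same quantity computed by np.linalg.norm
norm_distance = np.linalg.norm(red_features - black_features)

print(manual_distance, norm_distance)
print(np.isclose(manual_distance, norm_distance))
```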

Do take some time to also understand what the last line does. Recall what is stored inside trainY in section 1.1.

Check what is stored inside calculated_distances. 
What does np.argmin do? 

Hint: Lookup the documentation for numpy.argmin if necessary.

In [22]:
print(calculated_distances)

[117.79791641023513, 113.43645699355922, 33.497714831624535, 340.3000785919897]


In [23]:
print(np.argmin(calculated_distances))

2


In [24]:
print(trainY)

['red', 'green', 'black', 'none']


In [25]:
print(trainY[np.argmin(calculated_distances)])

black


### Let's try testing another card
Remember to check your folder to ensure that the model can indeed predict what we want!

In [26]:
# First we read the new image into memory
new_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/36.png")
new_card_features = averagecolor(new_card)

# Calculate the distances between the features (average color) of that new image against the features of the images we know
calculated_distances = []
for card in (trainX):
    calculated_distances.append(np.linalg.norm(new_card_features-card))

# And here is the card it is most similar to:
print(trainY[np.argmin(calculated_distances)])

red


### How about another card?
Remember to check your folder to ensure that the model can indeed predict what we want!

In [27]:
# First we read the new image into memory
new_card = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/56.png")
new_card_features = averagecolor(new_card)

# Calculate the distances between the features (average color) of that new image against the features of the images we know
calculated_distances = []
for card in (trainX):
    calculated_distances.append(np.linalg.norm(new_card_features-card))

# And here is the card it is most similar to:
print(trainY[np.argmin(calculated_distances)])

green
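
The cells above repeat the same steps for each new card. One possible way to tidy this up is to wrap the steps in a function, sketched below. The name predict_label is an assumption for illustration, and the query vector is made up rather than taken from a test image. The training features are the values printed in section 1.1.

```python
import numpy as np

def predict_label(features, train_features, train_labels):
    """Return the label of the training example nearest to `features` (1-NN)."""
    train_features = np.asarray(train_features, dtype=float)
    # Subtract the new feature vector from every row at once, then take one norm per row
    distances = np.linalg.norm(train_features - features, axis=1)
    return train_labels[int(np.argmin(distances))]

# Training features and labels as printed in section 1.1
train_features = [
    [27.35627604, 4.48305664, 154.21746094],
    [119.53976563, 133.40338216, 61.1089388],
    [70.36474609, 61.85563477, 67.1775651],
    [247.9326888, 241.13666016, 241.89832357],
]
train_labels = ["red", "green", "black", "none"]

# A made-up dark feature vector (not taken from any test image)
print(predict_label(np.array([60.0, 55.0, 60.0]), train_features, train_labels))  # black
```

In the notebook itself, a call such as predict_label(averagecolor(new_card), trainX, trainY) should give the same answers as the loop-based cells, since it computes the same distances and takes the same argmin.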


### Let's try to classify all the test cards

Not bad! It seems that our simplistic model has correctly classified the cards so far. 

Let us try looping over and classifying all the cards in the test subfolder.

In [28]:
from sklearn.metrics import classification_report
# Ground truth for the test images. Open the folder on your computer to see the images.
realtestY = np.array(["black","black","black","black","black",
                     "red","red","red","red","red",
                     "green","green","green","green","green",
                     "none","none","none","none","none"])
def evaluateaccuracy(filenames,predictedY):
    predictedY = np.array(predictedY)
    if (np.sum(realtestY!=predictedY)>0):
        print ("Wrong Predictions: (filename, labelled, predicted) ")
        print (np.dstack([filenames,realtestY,predictedY]).squeeze()[(realtestY!=predictedY)])
    # Calculate those predictions that match (correct), as a percentage of total predictions
    return "Correct :"+ str(np.sum(realtestY==predictedY)) + ". Wrong: "+str(np.sum(realtestY!=predictedY)) + ". Correctly Classified: " + str(np.sum(realtestY==predictedY)*100/len(predictedY))+"%"

Were you surprised that there was no output for the block of code above? That is because we only defined the function to do the accuracy evaluation. To learn more about functions in Python, you can visit [this link](https://www.datacamp.com/community/tutorials/functions-python-tutorial)

Let us run the code block below to see the outputs.

In [29]:
data_test = [
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/16.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/17.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/18.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/19.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/20.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/36.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/37.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/38.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/39.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/40.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/56.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/57.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/58.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/59.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/60.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/76.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/77.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/78.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/79.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/80.png"
]

In [30]:
predictedY = []
filenames = []

for filename in data_test:
  img = load_image(filename)
  img_features = averagecolor(img)
  calculated_distances = []
  for card in (trainX):
    calculated_distances.append(np.linalg.norm(img_features-card))
  prediction = trainY[np.argmin(calculated_distances)]
  
  print (filename + ": " + prediction) #Print out the inferences
  filenames.append(filename)
  predictedY.append(prediction)

# Evaluate Accuracy (the sklearn package provides a useful report)
print ()
print(classification_report(realtestY, predictedY))

# Evaluate Accuracy (our own custom method to output the filenames of the misclassified entries)
print ()
print (evaluateaccuracy(filenames,predictedY))


https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/16.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/17.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/18.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/19.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/20.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/36.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/37.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/38.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/39.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMac

**What do precision and recall mean?**
Do you remember going through these during the Acquire stage?

Remember the concepts of true positives, false positives, true negatives, and false negatives.

For example, if we are evaluating the red class:
- If you classify a red image correctly as red, that is a true positive.
- If you classify a red image wrongly as black, that is a false negative.
- If you classify another non-red image as red, that is a false positive.
- If you classify a non-red image correctly as non-red, that is a true negative.

Precision is the number of True Positives divided by (True Positives + False Positives), i.e. out of all the images that were classified as red, how many were actually red.

Recall is the number of True Positives divided by (True Positives + False Negatives), i.e. out of all the images that are actually red, how many were correctly classified as red.

To read more about precision and recall, you can search the Internet as usual, or visit https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall
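
The two formulas can be sketched in plain Python as follows. The function name precision_recall and the label lists are made up for illustration (one red image is predicted as black). They are not the results of the test run above.

```python
def precision_recall(true_labels, predicted_labels, positive_class):
    """Precision and recall for one class, treating it as 'positive'."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive_class and p == positive_class)
    fp = sum(1 for t, p in pairs if t != positive_class and p == positive_class)
    fn = sum(1 for t, p in pairs if t == positive_class and p != positive_class)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative labels (made up for this sketch): one red image is predicted as black
true_labels = ["red", "red", "red", "red", "black", "black", "black", "black"]
predicted   = ["red", "red", "red", "black", "black", "black", "black", "black"]

print(precision_recall(true_labels, predicted, "red"))    # (1.0, 0.75)
print(precision_recall(true_labels, predicted, "black"))  # (0.8, 1.0)
```

Notice how a single mistake affects two classes at once: it lowers the recall of red (a red image was missed) and the precision of black (something that was not black was labelled black).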

**Let's Investigate the misclassified image**

Open up that folder and check the images. 
It seems that 58.png was classified wrongly. Why?

58.png

<img src="https://raw.githubusercontent.com/ethaneldridge/flc-cisc499/main/Module14-07-ComputerVisionAndMachineLearning/test/58.png" style="width:400px; float:left;" />
<div style="clear:both;"></div>

Recall our initial set of training images. <br />
58.png looks much brighter than the training image for "green", which may suggest why it was mistaken as "none" (which was the "brightest" among the 4 training images)

<img src="https://raw.githubusercontent.com/ethaneldridge/flc-cisc499/main/Module14-07-ComputerVisionAndMachineLearning/cards.png" style="float:left;"/>
<div style="clear:both;"></div>



As humans, it is easy for us to tell that 58.png should be classified as green. 

However, remember that the feature we used to "train" the system was "average color" and we only supplied one training image per class. 

It seems that the average color of 58.png is closer to the average color of the background (cardnone.png) than to that of the training image (cardgreen_close.png). 
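
To see how brightness alone could produce such a result, here is a sketch using an invented average color for a brightly lit green card. The bright_green values are assumptions made up for illustration and are not measured from 58.png. The two training vectors are the ones printed in section 1.1.

```python
import numpy as np

# Training features for "green" and "none", as printed in section 1.1 (BGR order)
green_train = np.array([119.53976563, 133.40338216, 61.1089388])
none_train = np.array([247.9326888, 241.13666016, 241.89832357])

# A hypothetical average color for a green card in much brighter light.
# These numbers are invented for illustration; they are not measured from 58.png.
bright_green = np.array([200.0, 220.0, 170.0])

distance_to_green = np.linalg.norm(bright_green - green_train)
distance_to_none = np.linalg.norm(bright_green - none_train)

print(f"to green: {distance_to_green:.1f}")
print(f"to none : {distance_to_none:.1f}")
print("nearest:", "green" if distance_to_green < distance_to_none else "none")
```

Even though green is still the strongest channel in the invented vector, all three channels have moved so far towards white that the nearest training example is "none".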

It will be left as an exercise for you to calculate the average color of the images respectively and uncover why it was misclassified. That will be your Challenge 1 later in this notebook.

Meanwhile, can you think of a way to improve the model?

### Open the folder of test images!
You can open the folder of test images. Do they appear to be under different lighting conditions? We only trained our system using a single example for each colored card so far. Do you think having more training images might help?

## 1.3 Training with more samples
 
How about training it with more samples? <br />
Recall what we did in section 1.1 to get trainX and trainY. If you have forgotten, do revisit section 1.1 to understand the code better.

In [31]:

data_red = [
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/21.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/22.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/23.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/24.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/25.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/26.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/27.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/28.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/29.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/30.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/31.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/32.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/33.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/34.png",
        "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/red/35.png"
]

data_green = [
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/41.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/42.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/43.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/44.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/45.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/46.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/47.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/48.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/49.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/50.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/51.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/52.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/53.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/54.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/green/55.png"
]

data_black = [
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/1.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/2.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/3.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/4.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/5.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/6.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/7.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/8.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/9.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/10.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/11.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/12.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/13.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/14.png",
            "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/black/15.png"
]

data_none = [
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/61.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/62.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/63.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/64.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/65.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/66.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/67.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/68.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/69.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/70.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/71.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/72.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/73.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/74.png",
           "https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/none/75.png"
]


In [32]:
trainX2 = []
trainY2 = []

for red in data_red:
  img = load_image(red)
  img_features = averagecolor(img)
  trainX2.append(img_features)
  trainY2.append('red')

for green in data_green:
  img = load_image(green)
  img_features = averagecolor(img)
  trainX2.append(img_features)
  trainY2.append('green')

for black in data_black:
  img = load_image(black)
  img_features = averagecolor(img)
  trainX2.append(img_features)
  trainY2.append('black')

for none in data_none:
  img = load_image(none)
  img_features = averagecolor(img)
  trainX2.append(img_features)
  trainY2.append('none')
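
The four loops above differ only in the list of URLs and the label. One possible way to avoid the repetition is sketched below. The names build_training_set, fake_load and FAKE_COLORS, and the shortened paths, are assumptions for illustration. The fake loader stands in for load_image so that the sketch runs without network access.

```python
import numpy as np

def build_training_set(urls_by_label, load, extract):
    """Load every image, extract its feature vector, and record the matching label."""
    features, labels = [], []
    for label, urls in urls_by_label.items():
        for url in urls:
            features.append(extract(load(url)))
            labels.append(label)
    return features, labels

# Stand-ins so the sketch runs without network access (assumptions for illustration):
# fake_load returns a flat-colored image chosen by the folder name in the path.
FAKE_COLORS = {"red": (30, 5, 150), "green": (120, 130, 60)}  # BGR

def fake_load(path):
    folder = path.split("/")[0]
    return np.full((4, 4, 3), FAKE_COLORS[folder], dtype=np.uint8)

def average_color(image):
    return np.mean(image, axis=(0, 1))

urls_by_label = {
    "red": ["red/21.png", "red/22.png"],
    "green": ["green/41.png"],
}
features, labels = build_training_set(urls_by_label, fake_load, average_color)
print(labels)          # ['red', 'red', 'green']
print(len(features))   # 3
```

In the notebook, you could pass a dictionary such as {"red": data_red, "green": data_green, "black": data_black, "none": data_none} together with load_image and averagecolor to build the same kind of lists as trainX2 and trainY2.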

### Task: How many images do we use to train our model now?

In [34]:
print (len(trainX2))
print (len(trainY2))

60
60


In [35]:
print(trainX2)

[array([ 45.60039063,   5.57547852, 189.96488281]), array([ 42.21973958,   3.52173503, 190.63109375]), array([ 36.30580404,   4.96489258, 156.84991536]), array([ 24.98639974,   3.25621094, 120.94683919]), array([ 27.74974284,   3.75301432, 130.40387695]), array([ 36.44341471,   6.08316081, 141.46752279]), array([ 38.20094727,   2.64868164, 167.89153646]), array([ 38.45661784,   3.06408529, 164.47763021]), array([ 49.18943034,   6.1669043 , 202.84317708]), array([ 38.27839518,  14.36999023, 134.29185547]), array([ 74.11512695,  15.43100586, 235.59394857]), array([ 41.44109049,   7.50405599, 166.28410807]), array([ 34.59366211,   1.62420573, 154.82158203]), array([ 34.56989909,   4.35575195, 139.18686198]), array([ 51.46043945,   4.72528646, 196.45832031]), array([ 96.39947917, 132.38474284,  54.98873047]), array([169.68820313, 189.00037435,  91.21655599]), array([124.32165039, 139.04360352,  67.04812174]), array([113.29489258, 127.49212891,  39.29086263]), array([87.03258789, 95.2636263

### Task: Check with the subfolders!
Open the red, green, black and none subfolders in the images directory on your computer. How many images are we loading in from each folder?

### After having loaded more training images, let us re-run the test

In [36]:
import os
filenames = []
predictedY = []
for filename in data_test:
    img = load_image(filename)
    img_features = averagecolor(img)
    calculated_distances = []
    for card in (trainX2):
        calculated_distances.append(np.linalg.norm(img_features-card))
    prediction =  trainY2[np.argmin(calculated_distances)]
    
    print (filename + ": " + prediction)
    filenames.append(filename)
    predictedY.append(prediction)

# Evaluate Accuracy (the sklearn package provides a useful report)
print ()
print(classification_report(realtestY, predictedY))

# Evaluate Accuracy
print (evaluateaccuracy(filenames,predictedY))

https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/16.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/17.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/18.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/19.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/20.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/36.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/37.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/38.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/39.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMac

**What have we just done?**

We have just seen how we "trained" the model for the kNN in section 1.1, and then used the model to predict which class the new card belonged to in section 1.2. We then went further in section 1.3 to explore how increasing the training data could help to improve the accuracy, eliminating the earlier error of misclassifying "58.png" as none when it was actually green.

We used a very simplified example of the kNN algorithm which finds the k Nearest Neighbours to predict the class of the new image based on its closest neighbours. In the example above, the value of k was 1. Hence, we only searched for the nearest neighbour (the neighbour with the smallest calculated distance), and predicted the value of the test image based on the class of the nearest neighbour.
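
The same idea extends to values of k greater than 1 by letting the k closest neighbours vote on the class. The sketch below is only an illustration: the `knn_predict` helper and the small feature arrays are made up for this example and are not part of the workshop code.

```python
import numpy as np
from collections import Counter

def knn_predict(features, train_x, train_y, k=1):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the new sample to every training sample
    distances = np.linalg.norm(np.asarray(train_x) - np.asarray(features), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical average-color features (three numbers per image)
toy_x = np.array([[40.0, 5.0, 190.0], [42.0, 4.0, 185.0], [38.0, 6.0, 170.0],
                  [100.0, 130.0, 55.0], [120.0, 140.0, 65.0], [110.0, 128.0, 40.0]])
toy_y = ["red", "red", "red", "green", "green", "green"]

sample = np.array([45.0, 8.0, 180.0])
print(knn_predict(sample, toy_x, toy_y, k=1))
print(knn_predict(sample, toy_x, toy_y, k=3))
```

With k=1 this behaves like the loop used in the cells above. A larger k can make the prediction less sensitive to a single unusual training image, at the cost of needing enough examples per class.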


### Take a moment to reflect

How does this method compare to the methods used in the previous workshop? 

Did you require more or fewer lines of code? Do you prefer defining the rules or letting the machine learn by itself? Most of you would probably find it easier to provide a set of training images than to define the rules manually. If you found it easier to define the rules and still had a rather robust system, what techniques did you use?

How can we improve the system further? Would a mix of approaches do even better? Will this work with all types of images? Why or why not? Do write your notes in the Student Activity Guide.

<br />
<video controls src="images/black_red_green.mp4" style="width:400px;" />

## 2. Basic steps for building a classification model

In section 1, we jumped straight into implementing a very simple classification model based on the kNN algorithm.
In practice, training of computer vision models is typically done using frameworks like Keras, TensorFlow, Caffe, and MXNet, or libraries such as scikit-learn for Python. These frameworks and libraries contain various tools and make it easier to work with larger data sets and algorithms without having to code everything from scratch. 

Training can take hours, days or even weeks, often requiring machines with GPUs and more powerful compute capabilities. The model we built for kNN was a simplistic one using numpy arrays, for the sake of illustrating the concepts.

Let us now explore the steps typically required for building a classification model (some of which were already done for you in this exercise):
1. Gathering data
1. Data Preparation (cleaning, labelling, etc.)
1. Splitting the data into a training set and a test set
1. Selecting an algorithm and training a model
1. Evaluating the performance

Selecting the algorithm to use was just one out of the 5 steps. For machine learning algorithms, data preparation is very important. If you feed in wrong information, the model will naturally turn out wrong. The data needs to be representative and the features used need to be relevant to your purpose. Otherwise, you may get very unreliable results.

Similarly, any prior preprocessing and the features that you use for the model are important. Imagine trying to train a model that recognizes flowers of different colors but only using greyscale images (leaving out the important color features). In contrast, for optical character recognition (OCR), color may not be very useful and might not be included in the selected features for the model.
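
The five steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the workshop's own pipeline: the data is synthetic (random points standing in for average-color features), and the variable names and the choice of `KNeighborsClassifier` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Steps 1-2: gather and label data (synthetic stand-in for real image features)
rng = np.random.default_rng(0)
red_like = rng.normal(loc=[40, 5, 180], scale=5, size=(30, 3))
green_like = rng.normal(loc=[110, 130, 55], scale=5, size=(30, 3))
X = np.vstack([red_like, green_like])
y = np.array(["red"] * 30 + ["green"] * 30)

# Step 3: split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 4: select an algorithm and train a model
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)

# Step 5: evaluate the performance on data the model has not seen
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
```

Because the two synthetic clusters are far apart, this toy problem is easy. Real images are rarely this clean, which is why steps 1 and 2 usually take the most effort.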

## 3. That was kNN, how about Support Vector Machines?

If you think about it, the k-Nearest-Neighbour algorithm did not really learn much. It basically stored the training data and did a lookup every time an inference on a new image was required. 

In your math class, do you remember learning about deriving the equation of a line **y = mx + c?**

What if we could also derive an equation or formula that could be used to predict the different classes?


## What are Support Vectors?

Imagine you needed to classify O from X. Could you draw a single line that best separates all the X from the O?

<img src="https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/svm1.jpg" style="width: 300px; float:left;" />
<div style="clear:both;"></div>

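Perhaps we could draw a line (blue line below). This line is the decision boundary that an SVM tries to find. The training points that lie closest to it are called the support vectors, because they determine where the line is placed. Anything to the left/top of the line could be classified as X, and anything to the right/bottom of the line could be classified as O. 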

<img src="https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/svm2.jpg" style="width: 300px; float:left;" />
<div style="clear:both;"></div>
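
To connect this picture with y = mx + c, here is a small sketch that fits a linear SVM to a handful of made-up 2D points and prints the boundary it found. The points, labels and variable names are invented for this illustration. For a linear kernel, scikit-learn exposes the boundary as `coef_` and `intercept_`, and the closest training points as `support_vectors_`.

```python
import numpy as np
from sklearn import svm

# Made-up 2D points: X's towards the top left, O's towards the bottom right
points = np.array([[1.0, 4.0], [2.0, 5.0], [1.5, 3.5],
                   [4.0, 1.0], [5.0, 2.0], [4.5, 0.5]])
labels = np.array(["X", "X", "X", "O", "O", "O"])

linear_model = svm.SVC(kernel="linear")
linear_model.fit(points, labels)

# For a linear kernel the boundary is w[0]*x + w[1]*y + b = 0,
# which can be rearranged into the familiar y = m*x + c form
w = linear_model.coef_[0]
b = linear_model.intercept_[0]
print("slope m =", -w[0] / w[1], "intercept c =", -b / w[1])

# The training points closest to the boundary
print(linear_model.support_vectors_)

# Use the "equation" to classify new points
print(linear_model.predict([[1.0, 5.0], [5.0, 1.0]]))
```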


Note: The math behind SVM will be outside the scope of this workshop, but you are encouraged to read more. https://www.svm-tutorial.com/2014/11/svm-understanding-math-part-1/ (In the link, it illustrates with diagrams how a single linear vector can separate 2 distinct classes)

Let us go on to explore how Support Vector Machines (SVM) work in practice, making use of the Python scikit-learn library. First, we "derive the equation" of the boundary by training the model, then we "use the equation" to run the predictions.

### First train the model

In [37]:
# SVM works on numerical values, so we first encode our text labels as integers
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()                        # maps each distinct label to an integer
encodedtrainY2 = encoder.fit_transform(trainY2) # learn the mapping and apply it to the training labels

from sklearn import svm
model = svm.SVC(gamma="scale", decision_function_shape='ovr')
model.fit(trainX2, encodedtrainY2)

SVC()

What does LabelEncoder do? Let's look at the function result. 
You can read more about LabelEncoder [here](https://medium.com/@contactsunny/label-encoder-vs-one-hot-encoder-in-machine-learning-3fc273365621)

In [38]:
print (encodedtrainY2)

[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
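
To see where those numbers come from, the short sketch below runs `LabelEncoder` on a hand-written list of labels. The `demo_encoder` and `demo_codes` names are made up for this example. `LabelEncoder` sorts the distinct labels and numbers them from 0, which is why red becomes 3 and black becomes 0.

```python
from sklearn.preprocessing import LabelEncoder

demo_encoder = LabelEncoder()
demo_codes = demo_encoder.fit_transform(["red", "green", "black", "none", "red"])

# The distinct labels, in sorted order; the position is the code
print(demo_encoder.classes_.tolist())   # ['black', 'green', 'none', 'red']

# The encoded version of the input list
print(demo_codes.tolist())              # [3, 1, 0, 2, 3]

# Going back from codes to labels
print(demo_encoder.inverse_transform([0, 3]).tolist())  # ['black', 'red']
```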


In-depth understanding of SVM is beyond the scope of this module, but feel free to learn more [here](https://medium.com/all-things-ai/in-depth-parameter-tuning-for-svc-758215394769)

Now, we have obtained our SVM model.  

### Let's run the predictions!

In [40]:
import os
filenames = []
predictedY = []
for filename in data_test:
    img = load_image(filename)
    img_features = averagecolor(img)
    prediction = model.predict([img_features])[0]
    
    #decode the prediction
    prediction = encoder.inverse_transform([prediction])[0]
    
    print (filename + ": " + prediction)
    filenames.append(filename)
    predictedY.append(prediction)

# Evaluate Accuracy (the sklearn package provides a useful report)
print ()
print(classification_report(realtestY, predictedY))

# Evaluate Accuracy
print (evaluateaccuracy(filenames,predictedY))

https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/16.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/17.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/18.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/19.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/20.png: black
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/36.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/37.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/38.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/test/39.png: red
https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMac

### Which one is more accurate?

Do you think SVM is more effective at getting more correct classifications than kNN or vice versa? 

It depends on the problem. For SVM, there are also other parameters that need to be tuned, which are outside the scope of this workshop. These parameters guide the model generation process. For example, the model needs to know what kind of boundary to generate (in scikit-learn this is set by the `kernel` parameter). A "straight line" might work for some datasets, but for others, we might need a curve or a more complex boundary.

For illustration, imagine trying to fit a straight line to classify the Os and Xs below. Perhaps you might need an equation for a circle instead.

<img src="https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/svm3.jpg" style="width:400px; float:left;" />
<div style="clear:both;"></div>

You can refer to the links at the end of this section if you wish to find out more.
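
As a rough illustration of the circle example, the sketch below compares a straight-line (linear) boundary with a curved one (the RBF kernel) on synthetic ring-shaped data generated by scikit-learn's `make_circles`. The dataset and variable names are assumptions made for this example, and the scores are measured on the training data itself, so they only show how well each boundary can fit the shape.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Synthetic data: one class forms a ring around the other
X_ring, y_ring = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same algorithm, two different kinds of boundary
linear_svm = svm.SVC(kernel="linear").fit(X_ring, y_ring)
rbf_svm = svm.SVC(kernel="rbf", gamma="scale").fit(X_ring, y_ring)

linear_acc = linear_svm.score(X_ring, y_ring)
rbf_acc = rbf_svm.score(X_ring, y_ring)
print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

A straight line cannot separate an inner cluster from the ring around it, so the linear kernel scores much lower here than the RBF kernel.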

Up to this point, we have trained our model using trainX2 and trainY2, then tested it against a separate set of images, and it seemed to perform well. However, working well on a small test set does not mean that it will always work well. Let us test again on another image that has not been tested before. The human eye can easily tell which color it is. But will the model that seems to be working perfectly so far be able to classify it correctly?

<img src="https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardtestagain.png" style="width:400px; float:left;" />
<div style="clear:both;"></div>


In [41]:
imagenew = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardtestagain.png")
imagenew_features = averagecolor(imagenew)
prediction = (model.predict([imagenew_features])[0])

#decode the prediction from numerical to labels
print(encoder.inverse_transform([prediction])[0])

red


### What went wrong?

When this section was written, the SVM wrongly classified this image as green instead of red. The output recorded above shows red, so your own result may differ from the one discussed here. <br />
It would be hard to explain why an SVM model misclassifies a particular image without digging deep into the math, which is outside the scope of this workshop. A simple analogy: it is difficult to fit a curve using the equation for a straight line, because y = mx + c is the wrong equation for a curve.

**Side Tip:** When designing solutions using machine learning, aim to train the most accurate model, but also take some time to plan for contingencies when the model does not give the correct result. Consider the impact of wrong results on your application, and take steps to mitigate the risks. For example, if a piece of machinery is being guided by computer vision, are there other sensors that can trigger an emergency stop before it crashes into something?
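
One simple contingency, sketched below, is to let the classifier answer "uncertain" when the new image is not close to anything it was trained on, so that the application can fall back to another check. This is only a sketch: the `predict_or_reject` helper, the toy features and the `max_distance` threshold are all hypothetical, and a suitable threshold would have to be found by experiment.

```python
import numpy as np

def predict_or_reject(features, train_x, train_y, max_distance):
    """Nearest-neighbour prediction that refuses to answer when the match is poor."""
    distances = np.linalg.norm(np.asarray(train_x) - np.asarray(features), axis=1)
    nearest = int(np.argmin(distances))
    # If even the closest training sample is far away, do not trust the label
    if distances[nearest] > max_distance:
        return "uncertain"
    return train_y[nearest]

# Hypothetical average-color features for two classes
toy_train_x = np.array([[40.0, 5.0, 190.0], [110.0, 130.0, 55.0]])
toy_train_y = ["red", "green"]

print(predict_or_reject([42.0, 6.0, 188.0], toy_train_x, toy_train_y, max_distance=30.0))
print(predict_or_reject([80.0, 70.0, 120.0], toy_train_x, toy_train_y, max_distance=30.0))
```

The first sample is close to the red example and is labelled red. The second is far from both examples, so the helper returns "uncertain" instead of guessing.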

Meanwhile, what does our kNN algorithm think about the same image?

In [42]:
calculated_distances = []
for card in (trainX2):
    calculated_distances.append(np.linalg.norm(imagenew_features-card))
print(trainY2[np.argmin(calculated_distances)])

red


### Does that mean kNN is always more reliable?

Let's try one more image:

<img src="https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardtestagain2.png" style="width:400px; float:left;" />
<div style="clear:both;"></div>

In [43]:
imagenew = load_image("https://ethaneldridge.github.io/cisc499/Module14-07-ComputerVisionAndMachineLearning/cardtestagain2.png")
imagenew_features = averagecolor(imagenew)
calculated_distances = []
for card in (trainX2):
    calculated_distances.append(np.linalg.norm(imagenew_features-card))
    
print("SVM: "+str(encoder.inverse_transform([ model.predict([imagenew_features])[0] ])[0]))
print("kNN: "+str(trainY2[np.argmin(calculated_distances)]))


SVM: none
kNN: none


Can you guess why kNN wrongly classified the image above as none instead of green? (In the output above, SVM made the same mistake.)

You can calculate the average color of the image to find out why.
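
The sketch below shows one situation in which an averaged feature can mislead a classifier: a small colored card on a large plain background. It is an illustration under stated assumptions, not a diagnosis of the image above. The `mean_color` helper is a stand-in that simply averages each channel over all pixels, which may differ from the workshop's `averagecolor`, and the image is generated rather than loaded.

```python
import numpy as np

def mean_color(image):
    """Average each channel over all pixels (assumed stand-in for averagecolor)."""
    return image.reshape(-1, image.shape[-1]).mean(axis=0)

# Hypothetical 100x100 image: grey background with a small green card
image = np.full((100, 100, 3), 128, dtype=np.uint8)
image[40:60, 40:60] = (0, 200, 0)   # the card covers only 4% of the pixels

whole = mean_color(image)
card_only = mean_color(image[40:60, 40:60])
print("whole image:", whole)      # stays close to the grey background
print("card only:  ", card_only)  # clearly green
```

Here the average of the whole image barely moves away from grey, even though the card itself is strongly green. Comparing the average color of a misclassified image with the training features of each class is a quick way to check whether something similar is happening.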

And yes, you can train the model with more images to mitigate these issues.


_Note: The math behind SVM will be outside the scope of this workshop, but you are encouraged to read more at <https://www.svm-tutorial.com/2014/11/svm-understanding-math-part-1/>._ (In the link, it illustrates with diagrams how a single linear vector can separate 2 distinct classes.)

In our experiment, however, we used it to separate more than 2 classes. You can learn more about multi-class classification using SVM and view code samples in the documentation at <https://scikit-learn.org/stable/modules/svm.html#multi-class-classification>. Do remember to search the Internet if you need more help.
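
The `decision_function_shape='ovr'` argument used when we created the model relates to this multi-class setting. As a small sketch on synthetic data (the four clusters and all variable names are made up for this example), the code below trains the same `SVC` with the "ovr" and "ovo" settings and prints the shape of the scores each one returns for a single sample.

```python
import numpy as np
from sklearn import svm

# Synthetic data: four well-separated clusters, ten points each
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0], [50, 0, 0], [0, 50, 0], [0, 0, 50]], dtype=float)
X_multi = np.vstack([c + rng.normal(scale=3, size=(10, 3)) for c in centres])
y_multi = np.repeat(np.arange(4), 10)

ovr_model = svm.SVC(gamma="scale", decision_function_shape="ovr").fit(X_multi, y_multi)
ovo_model = svm.SVC(gamma="scale", decision_function_shape="ovo").fit(X_multi, y_multi)

one_sample = centres[:1]
ovr_shape = ovr_model.decision_function(one_sample).shape
ovo_shape = ovo_model.decision_function(one_sample).shape
print(ovr_shape)  # (1, 4): one score per class
print(ovo_shape)  # (1, 6): one score per pair of classes
```

With 4 classes there are 6 possible pairs, which is why the "ovo" (one-vs-one) output has 6 columns while the "ovr" (one-vs-rest) output has 4.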