# Welcome to this tutorial on image processing!

This tutorial will help you get started with your webcam or camera through a very simple but very powerful example: color detection.

## Introduction to images

**Run the following cell to start!**

In [None]:
import cv2; import matplotlib.pyplot as plt
cap = cv2.VideoCapture(0); title = "A picture of the happiest day of my engineering studies"; _ = cv2.waitKey(500)
plt.imshow(cv2.cvtColor(cap.read()[1], cv2.COLOR_BGR2RGB)); plt.axis('off'); plt.title(title); plt.show(); cap.release()

A lot of things happened to get this picture, so many things that multiple engineering branches are needed to explain it:
1. *PHOTONICS*: Photons bounced off you and your environment, travelled through the air and got picked up by the camera sensor.
2. *ELECTRONICS*: The camera sensor transformed all the photons into an array of electrical charges. Those were then converted from an analog to a digital representation.
3. *INFORMATICS*: The digital representation was then stored in the memory of your computer. It could then be manipulated, analyzed or simply displayed. This is what we call image processing!

In this tutorial, we will focus on the last part: **how can we process images to measure, analyze and react to our environment**? If you read something that sounded interesting before the image processing step, feel free to ask any questions... All the steps are fascinating!

## Overview of the objectives of this Notebook:

1. OpenCV
    - Use OpenCV to load and manipulate images
    - Set up a video capture with OpenCV

2. Colors
    - How does a computer perceive color?
    - Detect colors in an image
    - Color ok, so what?

## An engineer never starts from scratch

It would be a massive waste of time if we had to manually write scripts to define advanced mathematical operations, define the exact color of every pixel of our screen to display an image, or identify usable USB connections that give us video input. All these issues have been solved and optimized by very talented engineers before us!

Let's start with the following libraries that we will use throughout this notebook.
- Numpy allows us to easily apply mathematical operations
- Matplotlib allows us to display images
- OpenCV allows us to fetch images from a camera, but also much more!

Run the following cell to import the libraries!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import cv2

Additionally, the assistants are there to make your life a little easier. Here is a helper function that will allow us to see the colors that we detect. The exact code is not important for the moment, but feel free to come back to it once you feel familiar with OpenCV.

Run the following cell to define the function!

In [None]:
def draw_contours(frame, mask, color):

    # RGB to BGR: our colors are defined as RGB, but OpenCV draws in BGR
    color = list(reversed(color))
    frame = frame.copy()

    # Morphological operations
    # You will get a chance to dive deeper into this later!
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.dilate(mask, kernel, iterations=1)

    # Find contours in the masked image, these are the outlines of connected regions in the image
    cnts, _ = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # If we find any contours
    if len(cnts) > 0:
        # Sort the contours using area
        cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
        # Find the largest contour
        cnt = cnts[0]

        # Calculate the center of the largest contour
        M = cv2.moments(cnt)
        # Skip degenerate contours with zero area to avoid dividing by zero
        if M['m00'] != 0:
            center_point = (int(M['m10'] / M['m00']), int(M['m01'] / M['m00']))
            # Display text on the frame
            cv2.putText(frame, str(center_point), center_point, cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2, cv2.LINE_AA)

        # Draw contours on the frame
        cv2.drawContours(frame, [cnt], 0, color, 3)

    return frame

## A video is a series of frames

The first step of every image processing pipeline is to get at least one image. OpenCV allows us to easily load images through the `imread(...)` function, or to set up a camera stream using `VideoCapture(...)`. Since the first option is trivial, let's check out the camera stream!

Run the following cell to set up a video capture using your webcam. `index=0` should be your webcam, feel free to check if other cameras are available by changing the value of the `index` parameter. OpenCV usually prints a warning if it cannot open the camera at that index, and you can always check with `cap.isOpened()`!

In [None]:
cap = cv2.VideoCapture(index=0)

Now that we found a working webcam, we can check the expected quality of the frames we will receive from it.

Run the following cell to check the resolution and frames per second of your webcam!

In [None]:
cap_width  = cap.get(cv2.CAP_PROP_FRAME_WIDTH)   # same as cap.get(3)
cap_height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # same as cap.get(4)
cap_fps = cap.get(cv2.CAP_PROP_FPS)

print(f"Resolution: {cap_width} x {cap_height}")
print(f"Frames per second: {cap_fps}")

As you may know, a video is nothing more than a series of still frames that are shown just fast enough to trick your eyes and brain. When you watch a movie, you often only get to see 24 frames per second, and that is already enough!
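
To make these numbers concrete, here is a small sketch of how long a single frame stays on screen at a few common frame rates. The helper name `frame_interval_ms` is made up for this illustration and is not part of OpenCV.

```python
def frame_interval_ms(fps):
    """Return how long one frame is shown, in milliseconds."""
    return 1000.0 / fps

# Compare a movie (24 fps) with two frame rates that webcams often report
for fps in (24, 30, 60):
    print(f"{fps} fps -> {frame_interval_ms(fps):.1f} ms per frame")
```

At 24 frames per second, each frame is shown for roughly 42 ms. That is also the time budget a real-time pipeline would have to process one frame.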

So, let's start with just one frame.

*Note: If your camera seems to be obstructed and you are using a MacBook, check your iPhone!*

In [None]:
# get a frame
ok, frame = cap.read()

# quick check to make sure we got a frame
if ok:
    # let's check what the frame actually is
    frame_type = type(frame)
    print(f"1. Our frame is of the following type: {frame_type}")

    # let's check if the frame has the expected size
    frame_shape = frame.shape
    print(f"2. Our frame has the following shape: {frame_shape}")

    # let's see this frame
    plt.figure()
    frame_plt = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # this will be explained later!
    plt.imshow(frame_plt)
    plt.axis('off') # remove the horizontal and vertical axes
    plt.title("3. Our frame looks like this:")
    plt.show()

Let's check the results:

1. The type of the frame that we got is `<class 'numpy.ndarray'>`. This means that we can easily manipulate our frame using our knowledge of Numpy. Can you complete the first line of the next cell, `top_half = ...`, to display only the top half of the frame? Can you complete the second line, `flipped = ...`, to display the flipped version of the frame?

In [None]:
top_half = ...     # Hint: array slicing
flipped = ...      # Hint: check around which axis you want to flip the image

plt.figure()
top_half_plt = cv2.cvtColor(top_half, cv2.COLOR_BGR2RGB)
plt.imshow(top_half_plt)
plt.axis('off') # remove the horizontal and vertical axes
plt.title("Top half of our frame")
plt.show()

plt.figure()
flipped_plt = cv2.cvtColor(flipped, cv2.COLOR_BGR2RGB)
plt.imshow(flipped_plt)
plt.axis('off') # remove the horizontal and vertical axes
plt.title("Flipped frame")
plt.show()

2. Our frame has the expected resolution, but it is stored as `(height, width, 3)`. Why is it a 3D array? Why is the size of the third dimension 3?
3. Since our frame is actually a numpy array, we can use matplotlib.pyplot to display the image. But we did something weird before plotting it with imshow(): `frame_plt = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`. Do you have an idea what this means? Let's come back to it later!

First, as promised, a video is a series of frames. Run the following cell to check the real-time webcam feed! Think about the structure of this code, does it make sense to use a `while` loop?

*Note 1: Press the **escape** key to end the real-time feed and close the video capture.*

*Note 2: If the window does not close, just ignore it for the moment. This is an issue caused by Jupyter Notebooks.*

In [None]:
while True:

    ok, frame = cap.read()
    if ok:
        # cv2.imshow is more efficient than plt.imshow for video content
        cv2.imshow("Real-time webcam", frame)

    # cv2.waitKey is used to wait to check keyboard or mouse inputs, 1 means 1 millisecond
    # The number 27 corresponds to the escape key
    if cv2.waitKey(1) == 27:
        break

# Destroy all OpenCV windows (can fail in Jupyter Notebook environment)
cv2.destroyAllWindows()
# Close the video capture, deactivating the camera
cap.release()

## Processing color in images

First of all, we need to understand how a computer perceives color. The answer is that it is surprisingly similar to humans... and that is not a coincidence! Digital colors are defined based on how our eyes interpret colors.

You may remember that the center of your retina is covered in little conical photoreceptors which we call *cones*. There are three kinds of cones that are sensitive to different wavelengths (long, medium and short), which roughly correspond to red, green and blue light. All the colors of the rainbow are actually not directly seen by our eyes, but, as is often the case, created by our brain.

Now, since digital screens are most often displaying images for humans, it makes sense to optimize them to stimulate human eyes. That's why screens are a vast array of very small light sources producing red, green and blue light, often referred to as an *RGB pixel array*. This means that every single pixel needs 3 values to properly display its color.

Run the following code to check this fact out. Up close, you clearly see alternating red and blue squares, similar to pixels in which the red and blue light would be on. By standing several meters away from your screen (or by reducing `square_size`), the squares vanish and you only see purple.

You can also adapt the code to test other combinations of colors for yourself! Green and red is an interesting one. What color do you expect? What do you actually see?

In [None]:
# Image dimensions
width, height = 1980, 1080
square_size = 20

# Define colors using RGB standards
red = [255, 0, 0]
green = [0, 255, 0]
blue = [0, 0, 255]

# Empty image
img = np.zeros((height, width, 3), dtype=np.uint8)

# Loop over rows and columns
for row in range(0, height, square_size):
    for column in range(0, width, square_size):
        # alternate color between red and blue
        if (column + row) // square_size % 2 == 0:
            color = red
        else:
            color = blue
        # fill the color in the square
        img[row:row+square_size, column:column+square_size] = color

# Display the image
plt.figure()
plt.imshow(img)
plt.axis('off')
plt.show()

There is still one detail to discuss: where does the number 255 come from? Is this a random number or does it correspond to something? What happens if we plug in a smaller number than 255, or a larger one?

*Hint: If you plug in a larger number than 255, Numpy will help you figure it out with a nice red text!*

## Color detection

Now we know everything to start doing interesting stuff! Let's try to detect red, green and blue in an image.

Start a new video capture using your webcam and capture one frame of some colored object.

In [None]:
# start video capture
cap = ...
# since the webcam needs some time to adjust to the light, it's best to wait for a fraction of a second
_ = cv2.waitKey(100)

In [None]:
# get a frame with a colored object


# show the frame


# release your video capture!


We already know how to define red, green and blue. Could you define the colors yellow and black using RGB standards?

In [None]:
red = [255, 0, 0]
green = [0, 255, 0]
blue = [0, 0, 255]

yellow = []
black = []

OK, let's try to find the colors in our frame. The most obvious way to proceed is to go through every pixel in our array and check if its exact value corresponds to one of the colors that we defined. We will first try with red.

Run the following code to find the number of red pixels in the frame.

In [None]:
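n_red_pixels = 0
n_pixels = 0
# frames from OpenCV store their channels in reverse order (more on this later!)
red_bgr = red[::-1]
# reshape to one row per pixel; flatten() would loop over single channel values instead
for pixel in frame.reshape(-1, 3):
    n_pixels += 1
    if np.array_equal(pixel, red_bgr):
        n_red_pixels += 1

print(f"There are {n_pixels} pixels in the frame, of which {n_red_pixels} are red.")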

This was not only slow, but you probably did not find a single red pixel in the frame. This is normal!

In reality, all colors around you give combinations of red, green, and blue when picked up by a camera. The best example is light and dark shades: a human would call them light and dark red, while a computer may refer to them as `[255, 70, 70]` and `[150, 20, 20]`, respectively. Every pixel in the frame will have some combination of `[R, G, B]`, so finding out the color of a pixel is always a three-number problem.

The easiest way to proceed is to transform our RGB color space into one that is more adapted to color detection for computers: **the HSV color space**!
HSV stands for Hue, Saturation, and Value. Very simply put, the hue maps the actual colors of the rainbow to a range of numbers from 0 to 180 (OpenCV halves the usual 0 to 360 degree hue angle so that it fits in 8 bits), while the saturation and value together encode what we referred to as shade.
Check out the following figure for a visual representation of the RGB and the HSV color spaces.

<p float="left">
  <img src="https://upload.wikimedia.org/wikipedia/commons/8/83/RGB_Cube_Show_lowgamma_cutout_b.png" style="width:47%;">
  <img src="https://upload.wikimedia.org/wikipedia/commons/3/33/HSV_color_solid_cylinder_saturation_gray.png" style="width:47%;">
</p>
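
To see why the hue is so handy, here is a rough sketch that converts a few RGB colors to HSV. It uses Python's built-in `colorsys` module instead of OpenCV and rescales the result to a hue from 0 to 180 and a saturation and value from 0 to 255. The helper `rgb_to_hsv_scaled` is a name invented for this example, and its rounding may differ slightly from what OpenCV returns.

```python
import colorsys

def rgb_to_hsv_scaled(r, g, b):
    """Convert 8-bit RGB values to (hue 0-180, saturation 0-255, value 0-255)."""
    # colorsys works with floats between 0 and 1
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180), round(s * 255), round(v * 255)

colors = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "light red": (255, 70, 70),
    "dark red": (150, 20, 20),
}

for name, rgb in colors.items():
    print(f"{name:>10}: RGB {rgb} -> HSV {rgb_to_hsv_scaled(*rgb)}")
```

The light and dark red from the text above end up with the same hue as pure red. Only their saturation and value differ.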

So, to conclude:

- A video is a series of frames shown in rapid succession.
- A frame is a collection of pixels, each represented by an amount of red, green and blue.
- To identify colors, we transform our frame so that the color is represented by one single number: the hue.

## Color detection in Python using OpenCV

Enough theory, now we know everything needed to start color detection!

Run the following cell to transform your image from the RGB color space to the HSV color space.

*Note: You may see that the conversion is actually done from BGR to HSV. The reason is simply that OpenCV stores the blue first and the red last, hence BGR. Does that remind you of one of the weird lines above?*

In [None]:
hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
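
As a small illustration of the channel order, the sketch below builds a tiny 1 x 2 image by hand in BGR order and reverses the last axis with NumPy to get RGB. For a 3-channel image this should give the same result as `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)`. The variable names are just for this example.

```python
import numpy as np

# A 1 x 2 image written in OpenCV's BGR order
bgr = np.zeros((1, 2, 3), dtype=np.uint8)
bgr[0, 0] = [255, 0, 0]   # pure blue: the blue channel comes first
bgr[0, 1] = [0, 0, 255]   # pure red: the red channel comes last

# Reversing the last axis swaps the blue and red channels
rgb = bgr[..., ::-1]

print("BGR pixels:", bgr[0].tolist())
print("RGB pixels:", rgb[0].tolist())
```

This is why a frame shown with `plt.imshow` without conversion has its reds and blues swapped.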

You will probably need to take a lot of new captures from your webcam to check the color detection on different objects. Here is a very small function that you can call to get a frame in BGR and HSV color space from your capture.

*Note: Be careful, the video capture does not get released.*

In [None]:
cap = cv2.VideoCapture(0)

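def get_frame(cap):
    # reopen the same capture object if it was released
    if not cap.isOpened():
        cap.open(0)

    _ = cv2.waitKey(100)

    ok, bgr_frame = cap.read()
    # stop with a clear message if the camera did not deliver a frame
    if not ok:
        raise RuntimeError("Could not read a frame from the camera")
    hsv_frame = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    return bgr_frame, hsv_frame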

In [None]:
# You can now use this line of code to get a new frame
frame, hsv_frame = get_frame(cap)

Let's start with blue.

In the HSV color space, bluish colors sit a little above the middle of the range of hues, let's say from 100 to 125. The saturation and the value range from 0 to 255, and since we already defined the exact color by its hue, we can set a broad range for both these numbers to find as many shades of blue as possible. Just remember that a saturation close to 0 corresponds to very gray shades and a value close to 0 corresponds to very dark shades. You can check the figure above for visual confirmation. The output of `inRange()` is a mask: a single-channel array with the same height and width as the input, containing 0 where nothing was detected and 255 where something was detected.
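
To get a feeling for what such a mask looks like, here is a minimal NumPy sketch of the same idea on a hand-made 2 x 2 HSV image. The function `in_range_sketch` is a stand-in written for this example, not OpenCV's implementation. It assumes that both bounds are inclusive, as they are for `cv2.inRange`.

```python
import numpy as np

def in_range_sketch(hsv, low, high):
    """Return 255 where all three channels lie within [low, high], else 0."""
    inside = np.all((hsv >= low) & (hsv <= high), axis=-1)
    return inside.astype(np.uint8) * 255

# A tiny hand-made HSV image with four pixels
hsv = np.array([[[110, 200, 200], [60, 200, 200]],
                [[110, 20, 200], [120, 255, 50]]], dtype=np.uint8)

low_blue = np.array([100, 50, 50])
high_blue = np.array([125, 255, 255])

mask = in_range_sketch(hsv, low_blue, high_blue)
print(mask.shape)  # two dimensions only: the channel axis is gone
print(mask)
```

The top-right pixel is rejected because of its hue and the bottom-left one because its saturation is too low, so only two of the four pixels end up in the mask.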

Run the following cell to identify blue regions in the frame!

In [None]:
# lower bound for blue
low_blue = np.array([100, 50, 50])
# upper bound for blue
high_blue = np.array([125, 255, 255])
# find any pixel that sits between low blue and high blue
blue_mask = cv2.inRange(hsv_frame, low_blue, high_blue)
# draw mask, this is the function that we defined at the start of this notebook
blue_frame = draw_contours(frame, blue_mask, blue)

# plot
plt.figure()
rgb_blue_frame = cv2.cvtColor(blue_frame, cv2.COLOR_BGR2RGB)
plt.imshow(rgb_blue_frame)
plt.axis('off')
plt.show()

And let's do the same for green.

In the HSV color space, greenish colors sit a little below the middle of the range of hues, let's say from 40 to 90. Don't forget to use a broad range of saturation and value.

In [None]:
# range
low_green = np.array([])
high_green = np.array([])

# mask
green_mask = cv2.inRange(hsv_frame, low_green, high_green)
# draw mask
green_frame = draw_contours(frame, green_mask, green)

# plot
plt.figure()
rgb_green_frame = cv2.cvtColor(green_frame, cv2.COLOR_BGR2RGB)
plt.imshow(rgb_green_frame)
plt.axis('off')
plt.show()

For red, there is one extra step to take. In the HSV color space of OpenCV, reddish colors are defined both from 0 to 20 and from 160 to 180. So we need to check two ranges. Luckily, both inputs and outputs of OpenCV are numpy arrays, so you can simply add the red masks together!

Another complication with red is that light skin tones tend to overlap with reddish colors in HSV color space. To avoid these pale reddish colors, use a stricter range on the saturation to get only vibrant reds!

Fill in the following cell to detect red!

In [None]:
# range and mask 1

# range and mask 2

# generating the final mask to detect red color

# draw mask

# plot


As a last exercise, can you detect black using this method? Or what about white or gray? Can you use the exact same logic for these colors or not? Why do you think that is?

Please release any video capture before going to the final part of this tutorial.

In [None]:
cap.release()

## Color detection in a real-time video

Let's put everything together and detect all the colors in real time using your webcam! Make sure to draw the contours on the same frame to see all the detected colors at once!

You will need to find the hue corresponding to yellow yourself. You can use the internet for that, or maybe we can find it with a couple of lines of OpenCV code?

In [None]:
# video capture setup
cap = cv2.VideoCapture(0)

# infinite loop
while True:

    # get a frame
    ok, frame = cap.read()


    # get HSV frame


    # blue

    # green

    # red

    # yellow


    # show frame
    cv2.imshow("Color detection", frame)

    # to quit
    if cv2.waitKey(1) & 0xFF == 27:
        break

# release the capture and close windows


## Use color as an input to our system

Let's try to follow an object of a specific color. The tighter the range can be around the colored object, the better it will work!

In [None]:
# color of object to track in HSV color space
low = np.array([100, 50, 50])
high = np.array([125, 255, 255])

In [None]:
# video capture setup
cap = ...

# Define a trail mask with the same size as our frame to store the trail of the object
cap_width  = ...
cap_height = ...
trail_mask = np.zeros((int(cap_height), int(cap_width), 3), dtype=np.uint8)  # cap.get() returns floats

# infinite loop
while True:

    # get a frame
    _, frame = ...

    # get HSV frame
    hsv_frame = ...

    # get mask of the color
    mask = ...

    # Find contours in the masked image, these are the outlines of connected regions in the image
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # If we find any contours
    if len(cnts) > 0:
        # Sort the contours using area
        cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
        # Find the largest contour
        cnt = cnts[0]

        # Calculate center of mass of contoured area
        M = cv2.moments(cnt)
        # Skip degenerate contours with zero area to avoid dividing by zero
        if M['m00'] != 0:
            COM = (int(M['m10'] / M['m00']), int(M['m01'] / M['m00']))

            # Draw a circle at the location of the center of mass on our trail mask
            cv2.circle(trail_mask, COM, 5, [0, 0, 255], -1)

    # Add the trail onto the most recent frame
    frame = cv2.addWeighted(frame, 1, trail_mask, 1, 0)

    # Show frame
    ...

    # to quit
    ...

# release the capture and close windows
...
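
If you are curious about the center of mass step, the sketch below reproduces the idea with plain NumPy on a small hand-made mask: `m00` counts the detected pixels, while `m10` and `m01` sum their x and y coordinates. This is a pixel-based approximation written for illustration. `cv2.moments` applied to a contour works on the contour polygon instead, so its result can differ slightly.

```python
import numpy as np

# A small mask with a 3 x 3 block of detected pixels
mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:4, 2:5] = 255

# Row (y) and column (x) coordinates of all detected pixels
ys, xs = np.nonzero(mask)

m00 = len(xs)   # number of detected pixels
m10 = xs.sum()  # sum of the x coordinates
m01 = ys.sum()  # sum of the y coordinates

# Only divide when something was detected
if m00 > 0:
    center = (int(m10 / m00), int(m01 / m00))
    print(f"Center of mass (x, y): {center}")
```

Note that the result is given as `(x, y)`, which is the order that drawing functions such as `cv2.circle` expect, while NumPy indexing uses `[row, column]`, so `[y, x]`.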