# Python for Computer Vision 
Adapted from the MIT course 6.8300 / 6.8301.

# Contents

- [Matplotlib (Plotting and Visualization)](#Matplotlib)
    - Basic plots
    - Figures
      
    
- [OpenCV (Computer Vision)](#OpenCV)
    - Reading images
    - Channels, Image Formats, and using images as arrays
    - Showing images
    - Basic image operations - Resize, Color, and more
    - Working with Video

# Initialization

In [None]:
import os
import cv2
import numpy as np  # used throughout the notebook as `np`
import matplotlib.pyplot as plt

In [None]:
expected_name = "2023_PythonTutorial_6869.ipynb"

# Colab specific setup
try:
  from google.colab import drive

except Exception:
  # Local setup
  rootpath = "."

else:
  drive.mount('/content/gdrive')
  rootpath = '/content/'



**NOTE:** Matrices in numpy MUST be rectangular. Nested Python lists can be ragged (for example, the first list may contain 1 element and the second 3 elements), but in a numpy matrix all rows must have the same length. In other words, the matrix cannot be "jagged".
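As a minimal sketch of the difference (the names `rectangular`, `jagged`, and `is_matrix` are illustrative, not part of the original notebook): depending on the NumPy version, building an array from jagged lists either raises a `ValueError` or produces a 1-D array of Python objects. Either way, the result is not a 2-D numeric matrix.

```python
import numpy as np

rectangular = [[1, 2, 3], [4, 5, 6]]
jagged = [[1], [2, 3, 4]]  # rows of different lengths


def is_matrix(rows):
    """Return True if `rows` converts to a 2-D, non-object ndarray."""
    try:
        arr = np.array(rows)
    except ValueError:
        # Recent NumPy versions refuse to build an array from jagged lists.
        return False
    # Older versions fall back to a 1-D array of list objects instead.
    return arr.ndim == 2 and arr.dtype != object


print(is_matrix(rectangular))  # True
print(is_matrix(jagged))       # False
```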

Unless you explicitly use `np.copy`, reshapes and slices create *views* of your data: they all reference the same underlying data. (A reshape returns a view whenever the memory layout allows it, and a copy otherwise.) Since the variables are all aliases to the same data, a change made through one is reflected in all the others! This is a double-edged sword: it can boost your performance, but it might catch you off guard.
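The aliasing described above can be checked directly. This is a small sketch with made-up variable names; `np.shares_memory` is used only to confirm which arrays point at the same buffer.

```python
import numpy as np

original = np.arange(6)            # [0 1 2 3 4 5]
reshaped = original.reshape(2, 3)  # a view on the same buffer
first_half = original[:3]          # a basic slice is also a view
independent = original.copy()      # an explicit copy owns its own data

reshaped[0, 0] = 100               # write through one of the aliases

# The write shows up in every view, but not in the copy.
print(original[0], first_half[0], independent[0])  # 100 100 0
print(np.shares_memory(original, reshaped))        # True
print(np.shares_memory(original, independent))     # False
```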

The basic mathematical operators (+, -, /, \*, %) are treated as "elementwise" operators: they are applied to each element. How the operands are paired up depends on a concept called "broadcasting". In practice, if you have two ndarrays of the same shape, the operands are the corresponding elements of each ndarray. Otherwise, where possible (i.e., a dimension has length 1), the smaller ndarray or scalar is repeated to match the size of the larger array.
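A short sketch of these cases, using small hand-made arrays (the variable names are illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

same_shape = grid + grid     # corresponding elements are added
with_scalar = grid * 2       # the scalar is applied to every element
with_row = grid + row        # the row is repeated for each of the 2 rows
with_column = grid + column  # the column is repeated across the 3 columns

print(with_row)     # [[11 22 33] [14 25 36]]
print(with_column)  # [[101 102 103] [204 205 206]]
```

Shapes that cannot be stretched this way, such as `(2, 3)` with `(2,)`, raise a `ValueError` instead.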

# Matplotlib

Matplotlib is a plotting library. `matplotlib.pyplot` exposes a stateful, easy-to-use plotting system.

Documentation: https://matplotlib.org/stable/index.html

In [None]:
import matplotlib
import matplotlib.pyplot as plt

### Plotting

Let's make a simple 2D plot.

In [None]:
# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()

With just a little bit of extra work we can easily plot multiple lines at once, and add a title, legend, and axis labels:

In [None]:
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()

### Subplots

You can plot different things in the same figure using the `subplot` function. Here is an example:

In [None]:
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)

# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.scatter(x, y_cos)
plt.title('Cosine')

# Show the figure.
plt.show()
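The same two-panel figure can also be built with Matplotlib's object-oriented interface, where `plt.subplots` returns the figure and its axes explicitly instead of relying on an "active" subplot. This is offered as an alternative style rather than the tutorial's own approach; the names `ax_top` and `ax_bottom` are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)

# One figure holding a grid of axes with 2 rows and 1 column.
fig, (ax_top, ax_bottom) = plt.subplots(2, 1)

# Each axes object is drawn on directly, so no subplot needs to be "active".
ax_top.plot(x, np.sin(x))
ax_top.set_title('Sine')

ax_bottom.scatter(x, np.cos(x))
ax_bottom.set_title('Cosine')

# Adjust spacing so the lower title does not collide with the upper plot.
fig.tight_layout()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() here.
```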

# OpenCV

OpenCV is an extremely popular computer vision library written in C++, with many powerful tools for CV. It lets you read, write, and show images and videos, read from webcam streams, find matching keypoints between two images, and more.

Although OpenCV itself is written in C++, its Python library calls into those optimized C++ routines and exposes an API built on numpy arrays!

We're going to work with images and videos here. We'll be downloading them automatically, but if you have any issues, you can get the files from this folder and manually upload them to colab with drag and drop:  https://drive.google.com/drive/folders/1wP7BLo6gKC13696GVjVdIK-aXcXpr4Ye?usp=share_link

Let's import OpenCV

In [None]:
import cv2

Let's start by fetching a test image and video that we'll be using in this section. For this, we use the `requests` package: a fairly straightforward way of fetching content from URLs.

In [None]:
# Let's download the image and video we'll be using!
import requests

img = requests.get("https://www.dropbox.com/s/8n5v2zp7cuwb0gx/phoenix.jpg?dl=1").content
with open('phoenix.jpg', 'wb') as handler:
    handler.write(img)

vid = requests.get("https://www.dropbox.com/s/f194zeyqbr00cjm/sample_video.mp4?dl=1").content
with open('sample_video.mp4', 'wb') as handler:
    handler.write(vid)

The image and video should show up in your file explorer. If you're on Colab and they don't appear, click the refresh files button.

## Reading, Writing, and Showing Images

### Reading

You can use the `imread` function to read in an image from a filepath.

In [None]:
phoenix_image = cv2.imread("phoenix.jpg")

# Careful, if it can't find your image, cv2.imread silently fails and returns None!
if phoenix_image is None:
  raise Exception("The image was not found! Check that you can see it on colab's file explorer by clicking the files icon.")

Images in OpenCV are represented as numpy arrays!

In [None]:
type(phoenix_image), phoenix_image.shape, phoenix_image.dtype

### Channels, Image Formats, and using images as arrays
The shape of a color image is `(height, width, channels)`, with the channels in BGR order. \
It may seem strange that the height comes first, but OpenCV treats an image as rows and columns of pixels, and the "height" of an image is its number of rows!

In [None]:
phoenix_image.shape

You can see each pixel is represented by 3 values (uint8 means they are between 0 and 255)

In [None]:
phoenix_image[0,0] # Get the pixel located at (0,0) from the top left
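To make the indexing convention concrete without depending on the downloaded file, here is a sketch on a tiny synthetic image (the array `image` and its size are made up for illustration). Indexing is `[row, column, channel]`, and the channel order follows OpenCV's BGR convention.

```python
import numpy as np

height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)  # all-black image

# Paint the pixel in row 1, column 5 pure red: in BGR order that is (0, 0, 255).
image[1, 5] = (0, 0, 255)

print(image.shape)           # (4, 6, 3): rows, columns, channels
print(image[1, 5])           # [  0   0 255]: the three values of one pixel
print(image[1, 5, 2])        # 255: the red channel sits at index 2 in BGR
print(image[:, :, 0].shape)  # (4, 6): one whole channel is a 2-D array
```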

Color images consist of "channels" - each color we can render is some combination of red, green, and blue (OR, in the case of a grayscale image, gray).

There are other sets of channels - you'll learn about these in the Color lecture!

By default, color images are opened by OpenCV as BGR, meaning the values for a given pixel are ordered "blue, green, red".

We can use the `cv2.cvtColor` function to change which color system our image is in. This will appear shortly.

In [None]:
phoenix_image_rgb = cv2.cvtColor(phoenix_image, cv2.COLOR_BGR2RGB)
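Because the image is just an array, swapping BGR to RGB amounts to reversing the last axis. The sketch below shows that equivalent NumPy operation on a synthetic image; it is an illustration of what the conversion does to the channel order, not a claim about how `cv2.cvtColor` is implemented.

```python
import numpy as np

# A 2x2 image that is pure blue in BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

# Reverse the channel axis: B,G,R becomes R,G,B.
rgb = bgr[..., ::-1]

print(bgr[0, 0])  # [255   0   0]: blue comes first in BGR
print(rgb[0, 0])  # [  0   0 255]: blue comes last in RGB

# The reversed slice is a view with a negative stride. If a later function
# needs contiguous memory, take a contiguous copy first.
rgb_contiguous = np.ascontiguousarray(rgb)
```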

### Showing the image

If you're running Python as a script (not in a Jupyter notebook), the `cv2.imshow` command will display an image. However, this doesn't work in a Jupyter notebook, so we'll use Matplotlib's `imshow` instead.

In [None]:
# This line only works if you're running locally
# cv2.imshow('test', phoenix_image)

Matplotlib assumes images are in the **RGB** format. OpenCV assumes that images are in the **BGR** format. So, we'll convert colors before showing the image. Let's make a function to do this.

In [None]:
def imshow(image, *args, **kwargs):
    if len(image.shape) == 3:
      # Height, width, channels
      # Assume BGR, do a conversion since matplotlib expects RGB
      image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    else:
      # Height, width - must be grayscale
      # convert to RGB, since matplotlib will plot in a weird colormap (instead of black = 0, white = 1)
      image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
    # Draw the image
    plt.imshow(image, *args, **kwargs)
    # We'll also disable drawing the axes and tick marks in the plot, since it's actually an image
    plt.axis('off')
    # Make sure it outputs
    plt.show()

Let's show the image!

In [None]:
imshow(phoenix_image)

### Manipulating images

#### Changing color spaces

OpenCV exposes several functions to work with images. Let's use the `cvtColor` function to convert the color image to gray. Grayscale images do not have a third dimension. Instead, each pixel has a single luminosity ("whiteness") value between 0 and 255.

In [None]:
phoenix_gray = cv2.cvtColor(phoenix_image, cv2.COLOR_BGR2GRAY)

print("Created BW image of shape",phoenix_gray.shape)
imshow(phoenix_gray)
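For intuition, the gray value is a weighted sum of the three color channels. The sketch below uses the weights given in the OpenCV documentation for RGB-to-gray conversion (0.299 R + 0.587 G + 0.114 B); `bgr_to_gray` is a hypothetical helper written for illustration, and its rounding may differ by a count from what `cv2.cvtColor` returns.

```python
import numpy as np


def bgr_to_gray(image_bgr):
    """Weighted sum of the B, G, R channels, rounded back to uint8."""
    blue = image_bgr[..., 0].astype(np.float32)
    green = image_bgr[..., 1].astype(np.float32)
    red = image_bgr[..., 2].astype(np.float32)
    gray = 0.299 * red + 0.587 * green + 0.114 * blue
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# One row of three pixels, in BGR order.
pixels = np.array([[[255, 255, 255],   # white
                    [0, 0, 255],       # pure red
                    [0, 0, 0]]],       # black
                  dtype=np.uint8)

gray = bgr_to_gray(pixels)
print(gray.shape)  # (1, 3): the channel axis is gone
print(gray)        # [[255  76   0]]
```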

We can also manipulate it by doing anything we would do to a normal array. Let's make an image that uses the gray phoenix as both the blue and red channels, with nothing in the green channel. (This is NOT the same as excluding the green channel from the original image.)

In [None]:
empty_arr = np.zeros(phoenix_gray.shape, dtype=np.uint8)

# Stack them, making the 3rd axis
magenta_phoenix = np.stack([ phoenix_gray, empty_arr, phoenix_gray, ], axis=2)
print("Created image of shape",magenta_phoenix.shape)
imshow(magenta_phoenix)

#### Resizing images

We can also resize images using `resize`. This needs the output size. Note that these are image sizes, which are expressed as (width, height), NOT to be confused with their shape.

In [None]:
image_height, image_width, image_num_channels = magenta_phoenix.shape
new_height = image_height * 2
new_width = image_width * 3

# Resize it to 3x the width, and 2x the height, so we expect some distortion.
# (To display it in the browser, the image is being scaled down anyway, so resizing it 2 x 2 will not be obvious)

bigger_magenta_phoenix = cv2.resize(magenta_phoenix, (new_width, new_height))
print("Resized to image of shape",bigger_magenta_phoenix.shape)
imshow(bigger_magenta_phoenix)

### Writing an Image

The `imwrite` function can write out an image. Let's write out the image we just made, so we can use it later!

In [None]:
output_path = "./output_pinkphoenix.png"
cv2.imwrite(output_path, bigger_magenta_phoenix)

We should be able to read that image directly from the file. Let's try!

In [None]:
test_read_output = cv2.imread(output_path)
print("Read file of shape:",test_read_output.shape, "type",test_read_output.dtype)
imshow(test_read_output)

Everything works as expected!

### Working with Video

A video is nothing more than a series of images. We can use the `VideoCapture` object to read videos from webcams, IP cameras, and files. Since we're working in the cloud, we'll use files.

We can use the `VideoWriter` object to write videos to a file. (If you were working locally, you could use `cv2.imshow` to display it in real time)

Let's use what we've learned so far to crop the video!

In [None]:
# function to crop a given frame
def crop_frame(frame, crop_size):
  # We're given a frame, either gray or BGR, and a crop-size (w,h)
  crop_w, crop_h = crop_size
  # This is an array! We can slice it
  # Take the first pixels along the height, and along the width
  cropped = frame[:crop_h, :crop_w]
  return cropped

capture = cv2.VideoCapture('sample_video.mp4')

crop_size = (600,400) # w,h
output_path = 'output_cropped.mp4'
# Use the MP4V (MPEG-4) codec
output_format = cv2.VideoWriter_fourcc('M','P','4','V')
output_fps = 30
cropped_output = cv2.VideoWriter(output_path, output_format, output_fps, crop_size)
n = 0

while True:
  successful, next_frame = capture.read()
  if not successful:
    # No more frames to read
    print("Processed %d frames" % n)
    break
  # We have an input frame. Use our function to crop it.
  output_frame = crop_frame(next_frame, crop_size)
  # Write the output frame to the output video
  cropped_output.write(output_frame)
  n += 1
  # Now we have an image! We can process that as we would.

# We have to give up the file at the end.
capture.release()
cropped_output.release()
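One thing worth checking with a slicing-based crop: NumPy slices stop at the array boundary, so asking for a crop larger than the frame quietly yields a smaller frame, which may not match the size the `VideoWriter` was created with. The sketch below demonstrates this on a synthetic frame. `center_crop` is a hypothetical helper, not part of the original notebook, that fails loudly instead and takes the crop from the middle of the frame.

```python
import numpy as np


def center_crop(frame, crop_size):
    """Crop a (w, h) window from the middle of a frame."""
    crop_w, crop_h = crop_size
    frame_h, frame_w = frame.shape[:2]
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop size is larger than the frame")
    top = (frame_h - crop_h) // 2
    left = (frame_w - crop_w) // 2
    return frame[top:top + crop_h, left:left + crop_w]


frame = np.zeros((300, 500, 3), dtype=np.uint8)  # height 300, width 500

# Slicing past the edge does not raise; it just returns what exists.
oversized = frame[:400, :600]
print(oversized.shape)                        # (300, 500, 3)

print(center_crop(frame, (200, 100)).shape)   # (100, 200, 3)
```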


### Display the Video

Unfortunately, it's rather difficult to display videos in Jupyter, so check your file explorer!

# Exercises

All of these exercises are doable with the information you've been presented thus far.


## Exercise 1
**Grading students**
The class 6.869 has `num_students` students. Each student has `num_grades` grades, one for each assignment.
The staff store our grades in a numpy ndarray, of shape `(num_students, num_grades)`. (Each row is a student, each column is an assignment)

**(a)** Create an ndarray of the proper shape to hold the grades table, and fill it with the values `[0, num_students * num_grades)`, going left-to-right, then top-to-bottom.

**(b)** We have a meeting with Julie, whose student index is `2`, and want to see how she's doing in the class. Use ndarray slicing to get an array containing all of her grades (index 2).

**(c)** Phillip wants to know if PSet 4 (assignment index 4) is too hard. Use ndarray slicing to extract the whole class's grades for PSet 4.

In [None]:
num_students = 4
num_grades = 5

# Write your solution below.


## Exercise 2
**Pose Estimation** The 6.869 staff have developed a ground-breaking pose estimation network. The output of the network is a matrix of shape `(num_keypoints, 3)` (each row is a key point on the body, the columns are X,Y,Z). A "joint" is a connection between two keypoints, expressed as a matrix of shape `(num_joints, 2)`, (each row is a joint, the columns are START_KEYPOINT_INDEX and END_KEYPOINT_INDEX).

**(a)** Create a matrix of joint starts, and another matrix of joint ends, each of shape `(num_joints, 3)`. The starts table should contain the position of the start of each joint (according to `keypoint_positions`), and the ends table the position of the end of each joint.

**(b)** Create a matrix of joint-displacements, of shape `(num_joints, 3)`. Each row represents a joint. The columns should be the difference in X, Y, and Z between the start of the joint, and the end of the joint, respectively `(endX - startX, endY - startY, endZ-startZ)`.

**(c)** Find the magnitude (length) of each of these displacement vectors, and output the results in an array of length `num_joints`. Remember the power operator is `**`.


In [None]:
num_keypoints = 7
num_joints = 6


# All Z's in one plane, but makes it easier to see XYZ vs Start/end
keypoint_positions = np.array(
    [
        [0, 1, 0], #Head
        [0, 0, 0], #Torso
        [1, 0, 0], #Right Arm
        [-1, 0, 0], #Left Arm
        [0, -1, 0], #Lower Torso
        [1, -2, 0], #Right Leg
        [-1, -2, 0] #Left Leg
    ]
)

#   O
#  _|_
#   |
#  /\
joints = np.array([
    # Head to torso
    [0, 1],
    # Torso to Right arm
    [1, 2],
    # Torso to Left Arm
    [1, 3],
    # Torso to Lower Torso
    [1, 4],
    # Lower Torso to Right Leg
    [4, 5],
    # Lower Torso to Left Leg
    [4, 6]
])


# Write your solution below.


## Exercise 3

**Edge detection** Our phoenix is lovely and bright, but what if we wanted to draw it? It might help to have the edges.

**(a)** Load the image `phoenix.jpg`. Convert it to grayscale. \
**(b)** Use `scipy.signal.convolve2d` (aliased as `conv2d`) to compute the convolution of the phoenix with a `kernel`, such as one of the arrays defined below (make sure to cast the image to float32 with values between 0 and 1 first). Use `imshow` to show the results. Use `prep_to_draw` to convert a [0,1] BW image to a drawable image.

In [None]:
import scipy
import scipy.signal

conv2d = scipy.signal.convolve2d # assigning a shorter name for this function.

# looks for edges along the horizontal direction (intensity changes from left to right)
horizontal_edge_detector = np.array(
  [
      [-1, 0, 1]
  ]
)

box_blur_size = 15
box_blur = np.ones((box_blur_size, box_blur_size)) / (box_blur_size ** 2)
sharpen_kernel = np.array(
    [
        [0, -1, 0],
        [-1, 5, -1],
        [0,  -1, 0]
    ]
)

all_edge_detector = np.array(
    [
        [0, -1, 0],
        [-1, 4, -1],
        [0,  -1, 0]
    ]
)

def prep_to_draw(img):
  """ Function which takes in an image and processes it to display it.
  """
  # Scale to 0,255
  prepped = img * 255
  # Clamp to [0, 255]
  prepped = np.clip(prepped, 0, 255) # clips values < 0 to 0 and > 255 to 255.
  prepped = prepped.astype(np.uint8)
  return prepped

# Write your solution below.