In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

### Read this First

#### Remember that `tab` is useful for autocompletion.

#### Remember that `shift + tab` is useful for rapidly obtaining usage + documentation.



### Basic Image Manipulations with Numpy and OpenCV

For this exercise, you'll need to install PyTorch and torchvision if you are running on your local computer:

pip3 install torch torchvision


**Run the following code to download and load the MNIST training set.** (The data will be downloaded if it is not already present.)

In [2]:
from pathlib import Path
HOME = Path.home()
MNIST_PATH = HOME / 'data' / 'mnist'
import torchvision
official_mnist_train = torchvision.datasets.MNIST(str(MNIST_PATH), train=True, download=True)
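# `np.float` / `np.int` were removed in NumPy 1.24; `.data` / `.targets` replace the deprecated `train_data` / `train_labels`.
train_images = official_mnist_train.data.numpy().astype(np.float64)
train_labels = official_mnist_train.targets.numpy().astype(np.int64)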

ModuleNotFoundError: No module named 'torchvision'

**Print the shape of `train_images` and `train_labels`.**

In [None]:
print(train_images.shape," ",train_labels.shape)

**In the following Markdown Cell, answer:**

**Based on these shapes, how many training images are there? And what is the height and width of each image?**

60000 images, height=28 , width=28

**Form `first_image` as a 2-D array with shape `[28, 28]`, containing the 0-th image of `train_images`, and visualize `first_image` using `plt.imshow`.** Also feel free to run `plt.set_cmap('gray')` after plotting the image if you'd like to see it using a grayscale colormap.

In [None]:
first_image = train_images[0]
plt.imshow(first_image)
plt.set_cmap('gray')

**Print the label of the 0-th image.**

In [None]:
print(train_labels[0])

**Create a 2-D array `first_image_flipped` that consists of the first training image *but flipped horizontally*, and visualize the result using `plt.imshow`.** Note that `first_image` has a shape of `[H, W]`, where `H` is the height of the image and `W` is the width of the image.

In [None]:
import cv2
first_image_flipped = cv2.flip(first_image,1)
plt.imshow(first_image_flipped)
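
A horizontal flip can also be written with NumPy slicing alone, without OpenCV. The sketch below uses a tiny made-up array (`toy_image` is a hypothetical stand-in, not MNIST data) to show that reversing the column axis mirrors the image left to right.

```python
import numpy as np

# A tiny stand-in for an image of shape [H, W] (hypothetical data, not MNIST).
toy_image = np.arange(6).reshape(2, 3)

# Reversing the column axis (axis 1) flips the image horizontally.
toy_flipped = toy_image[:, ::-1]

# np.fliplr performs the same operation.
assert np.array_equal(toy_flipped, np.fliplr(toy_image))
print(toy_flipped)
```

Slicing with `[::-1, :]` instead would reverse the rows, which is a vertical flip.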

**Create a 2-D array `first_image_down_2` that consists of the first training image but downsampled by a factor of 2, and plot the result using `plt.imshow`.** (The resulting image should have shape `[14, 14]`.)

In [None]:
first_image_down_2 = train_images[0][::2 , ::2]
print(first_image_down_2.shape)
plt.imshow(first_image_down_2)

**Create a 2-D array `first_image_down_4` that consists of the first training image but downsampled by a factor of 4, and plot the result using `plt.imshow`.** (The resulting image should have shape `[7, 7]`.)

In [None]:
first_image_down_4 = train_images[0][::4 , ::4]
print(first_image_down_4.shape)
plt.imshow(first_image_down_4)
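
Strided slicing downsamples by keeping every k-th pixel and discarding the rest. One common alternative, shown here only as a sketch, averages each non-overlapping block instead. The helper `downsample_mean` and the array `toy_image` are hypothetical names introduced for illustration, and the helper assumes both image dimensions are divisible by the factor.

```python
import numpy as np

def downsample_mean(image, factor):
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks.

    Assumes both image dimensions are divisible by `factor`.
    """
    h, w = image.shape
    # Split each axis into (number of blocks, block size), then average per block.
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

toy_image = np.arange(16, dtype=np.float64).reshape(4, 4)

strided = toy_image[::2, ::2]            # keeps the top-left pixel of each 2x2 block
averaged = downsample_mean(toy_image, 2)  # keeps the mean of each 2x2 block

print(strided)
print(averaged)
```

Both results have shape `[2, 2]`; averaging uses every input pixel, so it tends to produce smoother small images than plain striding.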

**Print the minimum and maximum values of `first_image`.**

In [None]:
print(np.min(first_image)," ",np.max(first_image))

**Create a copy of `first_image`, `first_image_copy`, using `first_image_copy = first_image.copy()`.**

In [None]:
first_image_copy = first_image.copy()

**Create a 2-D boolean mask named `mask` with the same shape as `first_image_copy`, with elements that are `True` whenever a pixel's value exceeds 50 and which is `False` otherwise. Print `mask`'s `dtype` and also print how many values are `True`.**

In [None]:
mask = first_image_copy>50
print('dtype:',mask.dtype)
print('number of True:', np.count_nonzero(mask))

**Visualize `mask` using `plt.imshow`.**

In [None]:
plt.imshow(mask)

**Create `mask_upper_half` by keeping only the upper half of `mask`, and visualize `mask_upper_half` using `plt.imshow`.** (`mask_upper_half` should have shape `[14, 28]`.)

In [None]:
mask_upper_half = mask[:14, :]  # rows 0..13 are the upper half of the 28 rows
plt.imshow(mask_upper_half)
print(mask_upper_half.shape)

**Halve all pixels in `first_image_copy` that exceed a value of 50, in place, using `mask`, and then print the minimum and maximum values of `first_image_copy`.**

In [None]:
# Boolean indexing selects exactly the pixels where mask is True and updates them in place.
first_image_copy[mask] /= 2
print(np.min(first_image_copy), np.max(first_image_copy))
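
As a small illustration of the difference between modifying an array in place through a boolean mask and building a new array, the sketch below uses a made-up 2x2 array (`toy`, `toy_mask`, and `halved_copy` are hypothetical names). `np.where` returns a new array and leaves its input untouched, whereas mask assignment changes the array itself.

```python
import numpy as np

toy = np.array([[10.0, 60.0], [200.0, 50.0]])
toy_mask = toy > 50  # True only where the value strictly exceeds 50

# Out-of-place: np.where builds a new array; `toy` is unchanged here.
halved_copy = np.where(toy_mask, toy / 2, toy)

# In-place: only the masked elements of `toy` are overwritten.
toy[toy_mask] /= 2

print(toy)
```

Note that the mask is computed once, before the update. Recomputing `toy > 50` afterwards would give a different mask, because some halved values may still exceed 50.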

**Form `first_ten_images_as_one` by concatenating the first 10 training MNIST images horizontally, and visualize the result using `plt.imshow`.**

In [None]:
first_ten_images_as_one = np.concatenate(train_images[0:10],axis=1)
plt.imshow(first_ten_images_as_one)

**We are also going to make use of the OpenCV library. To install OpenCV you will need to run**

**pip3 install opencv-python**

In [None]:
import cv2

In [None]:
cap = cv2.VideoCapture(0)
if not (cap.isOpened()):
    print('Could not open video device')

**If this doesn't work, or you are running on Colab, you can instead place an image named `testim.jpg` in your working directory and skip the next cell.**

In [None]:
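ret, frame = cap.read()
cap.release()  # free the camera once the frame has been grabbed
if ret:  # only save when a frame was actually captured
    cv2.imwrite('testim.jpg', frame)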

In [None]:
frame = cv2.imread('testim.jpg')
plt.imshow(frame)

**Does something look wrong here? Explain what is going on in the box below:**

The picture looks too blue. `cv2.imread()` returns the color channels in BGR order, not RGB, while `plt.imshow` expects RGB, so the red and blue channels are swapped.

**Can you fix this on your own?  Add some code below that addresses the problem you've found by directly manipulating the image**

In [None]:
B , G , R = cv2.split(frame) 
correct_frame = cv2.merge([R,G,B]) 
plt.imshow(correct_frame)
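
Another way to do the same manipulation directly is to reverse the last (channel) axis with slicing. The sketch below uses a made-up two-pixel image (`toy_bgr` is hypothetical data) so the result is easy to check by eye.

```python
import numpy as np

# Hypothetical 1x2 BGR image: one pure-blue pixel and one pure-red pixel.
toy_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns (B, G, R) into (R, G, B).
toy_rgb = toy_bgr[..., ::-1]

print(toy_rgb)
```

The slice is a view with a negative stride rather than a copy. If a later function needs a contiguous array, `toy_bgr[..., ::-1].copy()` is a safe choice.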

**This is common enough that we have tools for this.  Please do the same using the built-in function `cv2.cvtColor`.**




In [None]:
correct_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
plt.imshow(correct_frame)

**Now convert the image to a gray-scale image using cvtColor**

In [None]:
gray_scale_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
plt.imshow(gray_scale_image , cmap = 'gray')
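
For intuition, grayscale conversion is a weighted sum of the color channels. OpenCV documents the weights as Y = 0.299 R + 0.587 G + 0.114 B. The sketch below applies them with NumPy to a made-up RGB image; `rgb_to_gray` and `toy_rgb` are hypothetical names, and the result is left as floating point, whereas OpenCV rounds to 8-bit integers.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of the R, G, B channels (input must be in RGB order)."""
    weights = np.array([0.299, 0.587, 0.114])
    # The matrix product contracts the last (channel) axis with the weights.
    return rgb.astype(np.float64) @ weights

# Hypothetical 1x3 image: a white, a pure-red, and a black pixel.
toy_rgb = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
toy_gray = rgb_to_gray(toy_rgb)

print(toy_gray)
```

For an image in BGR order, as returned by `cv2.imread`, the weights would need to be reversed.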

**Now, write some code to do the following: Pad the image with a single row and column of zeros on each side (top, bottom, left, and right). Create a version of the image that is shifted by one pixel to the right, and one that is shifted by one pixel to the left. Display the absolute value of the difference between the original image and the shifted images. What does it seem to be doing?**



In [None]:
m, n, channel = correct_frame.shape
m = m + 2
n = n + 2
# Pad with a one-pixel border of zeros; a signed dtype lets differences go negative.
origin_img = np.zeros((m, n, channel), dtype=np.int64)
origin_img[1:m-1, 1:n-1, ...] = correct_frame

# Shift one pixel to the left: same rows, columns offset by -1.
left_shifted_img = np.zeros((m, n, channel), dtype=np.int64)
left_shifted_img[1:m-1, 0:n-2, ...] = correct_frame
diff_of_OL = np.abs(origin_img - left_shifted_img)
plt.imshow(diff_of_OL)
plt.show()

# Shift one pixel to the right: same rows, columns offset by +1.
right_shifted_img = np.zeros((m, n, channel), dtype=np.int64)
right_shifted_img[1:m-1, 2:n, ...] = correct_frame
diff_of_OR = np.abs(origin_img - right_shifted_img)
plt.imshow(diff_of_OR)
plt.show()

In [None]:
It seems to highlight edges: the boundaries between areas that have different colors or brightness.
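
A one-row toy example may make the edge behavior easier to see. This is only a sketch with made-up values (`toy`, `padded`, `shifted_right`, and `edge_strength` are hypothetical names). It uses `np.pad` and `np.roll` for brevity; because the last column is zero padding, the wrap-around of `np.roll` only brings in a zero.

```python
import numpy as np

# Hypothetical 1-row grayscale image with a single step edge (10 -> 200).
toy = np.array([[10, 10, 10, 200, 200, 200]], dtype=np.int64)

# Pad one column of zeros on each side, then shift one pixel to the right.
padded = np.pad(toy, ((0, 0), (1, 1)))
shifted_right = np.roll(padded, 1, axis=1)

# The absolute difference is zero in flat regions and large where values jump.
edge_strength = np.abs(padded - shifted_right)

print(edge_strength)
```

The largest interior response sits exactly at the step, while flat regions give zero. The nonzero values at the two ends come from the zero padding. A signed integer dtype is used because subtracting `uint8` arrays would wrap around instead of going negative.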

**Now, compute the average of the shifted images.  What does this seem to be doing?**

In [None]:
avg_img = (left_shifted_img + right_shifted_img) // 2
plt.imshow(avg_img)
plt.show()
# Show the absolute difference so negative values are not clipped by imshow.
plt.imshow(np.abs(avg_img - origin_img))
plt.show()

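The average of the shifted images looks very close to the original image, but slightly blurred: each pixel is replaced by the mean of its neighbors, which smooths out fine detail.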
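
The smoothing effect can be checked on a one-row toy image containing a single bright pixel. This is a sketch with made-up values (`toy`, `padded`, `left`, `right`, and `average` are hypothetical names), again using `np.pad` and `np.roll` for brevity.

```python
import numpy as np

# Hypothetical 1-row image: a single bright spike on a dark background.
toy = np.array([[0, 0, 0, 90, 0, 0, 0]], dtype=np.float64)
padded = np.pad(toy, ((0, 0), (1, 1)))

left = np.roll(padded, -1, axis=1)   # shifted one pixel to the left
right = np.roll(padded, 1, axis=1)   # shifted one pixel to the right

# Averaging the two shifted copies spreads the spike over its two neighbors.
average = (left + right) / 2

print(average)
```

The total intensity is preserved, but it is spread over neighboring pixels, which is the behavior of a simple blur filter.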

**To gain more experience, you may want to play with other ways you can transform images -- changing contrast, brightness, or performing histogram equalization are all common ways to enhance or normalize images before processing them.**
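
As a starting point for such experiments, here is a minimal NumPy sketch of two of these transforms. The function names, parameters, and test array are all hypothetical, and the code assumes 8-bit grayscale input. OpenCV offers `cv2.convertScaleAbs` and `cv2.equalizeHist` for similar purposes, though their exact rounding may differ from this sketch.

```python
import numpy as np

def adjust_brightness_contrast(image, alpha=1.0, beta=0.0):
    """Compute alpha * image + beta, clipped to [0, 255].

    alpha scales contrast; beta shifts brightness.
    """
    out = alpha * image.astype(np.float64) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def equalize_histogram(gray):
    """Histogram-equalize an 8-bit grayscale image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # cumulative count at the darkest used level
    denom = max(cdf[-1] - cdf_min, 1)    # guard against a constant image
    # Lookup table mapping each input level to its equalized output level.
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[gray]

toy_gray = np.array([[50, 50], [100, 150]], dtype=np.uint8)
toy_brighter = adjust_brightness_contrast(toy_gray, alpha=1.5, beta=20)
toy_equalized = equalize_histogram(toy_gray)

print(toy_brighter)
print(toy_equalized)
```

After equalization the darkest level used in the toy image maps to 0 and the brightest to 255, so the image spans the full intensity range.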