# **Introduction to Computer Vision - Tutorial**




---

Welcome to the computer vision tutorial session of the Fundamentals of Artificial Intelligence Programme. This notebook introduces the libraries and code that you need for the computer vision assignment.

##  **Image Processing** 
---

### **Image Array**
An image can be represented as a two-dimensional array of pixel values, so we use *NumPy*, the most popular array library in Python, to store and process it. The custom function below generates an example array.

In [1]:
# Uncomment to install packages for running locally
# !pip install opencv-python matplotlib numpy torch torchvision
# !pip install tensorboard

In [2]:
import numpy as np

In [None]:
# This is a custom function to generate a checkered pattern in a 2D array
# Returns an LxL array, with C checkered blocks per direction
def np_checkered(L, C):
    assert L % C == 0
    r = np.arange(C, dtype=np.uint8) & 1
    r = np.kron(r, np.ones(L // C, dtype=np.uint8))
    # `^ 1` forces the upper-left cell to consist of 1s
    return (r ^ 1) ^ r[:, None]
 
print(np_checkered(8, 4))
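
To make the idea of an image as an array concrete, here is a minimal sketch that builds a tiny grayscale image by hand and reads individual pixels. The name `tiny_img` and its values are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# A hypothetical 3x4 grayscale image: 3 rows (height) and 4 columns (width)
tiny_img = np.array(
    [[0, 64, 128, 255],
     [255, 128, 64, 0],
     [10, 20, 30, 40]],
    dtype=np.uint8,
)

# shape is (rows, columns), i.e. (height, width)
print(tiny_img.shape)

# Pixels are indexed as [row, column]: here row 1, column 2
print(tiny_img[1, 2])

# A whole row or column can be selected with ':'
print(tiny_img[0, :])
print(tiny_img[:, 3])
```

Note that the row index comes first, so `tiny_img[y, x]` is the pixel at horizontal position `x` and vertical position `y`.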

###  **Read & Display Images** 

To interpret an image array, we usually visualize it in a form that the human eye can perceive. Several Python libraries can read and display common image formats:


*   Matplotlib: `plt.imread()`, `plt.imshow()`
*   OpenCV: `cv2.imread()`, `cv2.imshow()`
*   Pillow: `Image.open()`
*   scikit-image: `io.imread()`
 

In [4]:
# we use matplotlib to visualize our generated checkered pattern
from matplotlib import pyplot as plt

In [None]:
# the image can be visualized using one line of code 
plt.imshow(np_checkered(8, 4), cmap='Greys')

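Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Natively, Matplotlib reads only PNG images and relies on Pillow for other formats. `plt.imread()` returns PNG images as float arrays with values from 0 to 1; other formats are returned as integer arrays.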

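A photo is commonly captured by a camera sensor and stored as an array with one of the following shapes, where M is the height and N is the width in pixels:

- (M, N) for grayscale images.
- (M, N, 3) for RGB images.
- (M, N, 4) for RGBA images (RGB plus an alpha channel for transparency).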

Computer-generated images (graphics) can be saved in vector and 3D formats. 
In this tutorial, we focus on 2D images. 
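
The array shapes and value ranges described above can be sketched with plain NumPy. The names `gray_demo`, `rgb_demo`, and `rgba_demo` and the 4 x 6 size are assumptions chosen for illustration.

```python
import numpy as np

rows, cols = 4, 6  # hypothetical image size (M, N)

gray_demo = np.zeros((rows, cols), dtype=np.uint8)     # (M, N)
rgb_demo = np.zeros((rows, cols, 3), dtype=np.uint8)   # (M, N, 3)
rgba_demo = np.zeros((rows, cols, 4), dtype=np.uint8)  # (M, N, 4)

# Paint the RGB image pure red: channel 0 is red, 1 is green, 2 is blue
rgb_demo[:, :, 0] = 255

# Copy the colours into the RGBA image and make every pixel fully opaque
rgba_demo[:, :, :3] = rgb_demo
rgba_demo[:, :, 3] = 255

for name, arr in [("gray", gray_demo), ("rgb", rgb_demo), ("rgba", rgba_demo)]:
    print(name, arr.shape, arr.dtype)

# Convert 8-bit integers (0 to 255) to floats (0 to 1) and back again
rgb_float = rgb_demo.astype(np.float32) / 255.0
rgb_back = np.round(rgb_float * 255.0).astype(np.uint8)
print(rgb_float.min(), rgb_float.max())
```

Keeping track of whether an array holds integers in 0 to 255 or floats in 0 to 1 matters later, because display and processing functions interpret the two ranges differently.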


In [None]:
# Use the checkered pattern generated above as an example image
img = np_checkered(8, 4)

# The image is just a NumPy array
print(f"type(img): {type(img)}, img.shape: {img.shape}\n\n")

# The image can be printed as an array
print(img)

In [None]:
# You can read a JPEG image from a URL and visualize it
import urllib.request

req = urllib.request.urlopen("https://i.pinimg.com/originals/32/ae/1f/32ae1ff77bc598200309bb9eddd11622.jpg")
# Pass the format explicitly so that the JPEG decoder is used
img = plt.imread(req, format="jpg")

# JPEGs are returned as integers between 0 and 255
print(img.min(), img.max())

plt.imshow(img)

In [None]:
# What is the size of my image?
print(f"shape: {img.shape}")

height, width, channels = img.shape

print(f"Image has height={height}, width={width}, and {channels} channels")

In [None]:
# An image can be cropped by cutting pixels from its width and height using NumPy array slicing

# Use ':' in the channel dimension to select all channels
imgresize = img[50:410, 90:600, :]

height, width, channels = imgresize.shape
print(f"Image has height={height}, width={width}, and {channels} channels")

plt.imshow(imgresize) 
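
Cropping by slicing can be tried on a small synthetic array first. This is a minimal sketch; the array `demo` and the crop bounds are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 6x8 RGB image filled with increasing values
demo = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)

# Crop rows 1..3 and columns 2..6 (the stop index is exclusive)
crop_view = demo[1:4, 2:7, :]
print(crop_view.shape)

# Basic slicing returns a view that shares memory with the original array
print(np.shares_memory(demo, crop_view))

# Use .copy() when the crop should be independent of the original
crop_copy = demo[1:4, 2:7, :].copy()
crop_copy[:] = 0
print(demo[1, 2, 0])  # the original is unchanged by edits to the copy
```

Because a slice is a view, writing into `crop_view` would also change `demo`; copying avoids that side effect at the cost of extra memory.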

The OpenCV library supports Photoshop-like processing, such as downsampling and filtering an image.


In [10]:
# import OpenCV library 
import cv2

In [None]:
# Resizing an image is easy with OpenCV functions

# Scaling both axes by the same factor preserves the aspect ratio of the image
img_resized = cv2.resize(imgresize, (0, 0), fx=0.8, fy=0.8)

# Alternatively, force the output dimensions, given as (width, height)
# img_resized = cv2.resize(imgresize, (256, 256))

height, width, channels = img_resized.shape
print(height,width,channels)
plt.imshow(img_resized)

In [None]:
# The image can be converted to grayscale.
# plt.imread returns channels in RGB order, so use COLOR_RGB2GRAY rather than COLOR_BGR2GRAY
img_gray = cv2.cvtColor(img_resized, cv2.COLOR_RGB2GRAY)
plt.imshow(img_gray, cmap="gray")
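
Grayscale conversion is a weighted sum of the colour channels. The sketch below uses the weights 0.299, 0.587, and 0.114 that OpenCV documents for its RGB-to-gray conversion; the helper `rgb_to_gray` is a hypothetical plain-NumPy illustration, and its rounding may differ slightly from OpenCV's own implementation.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of the R, G and B channels of an (M, N, 3) array."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.round(gray).astype(np.uint8)

# Hypothetical 1x3 image holding one pure red, one green and one blue pixel
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(rgb_to_gray(pixels))

# Treating the same data as BGR swaps the red and blue weights
print(rgb_to_gray(pixels[..., ::-1]))
```

The two printed rows differ in the red and blue positions, which shows why the conversion code has to match the channel order of the array.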

In [None]:
# image can be smoothed using Gaussian filter
img_blur = cv2.GaussianBlur(img_gray, (13,13), 0)
plt.imshow(img_blur, cmap="gray")
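
To see what Gaussian smoothing does, here is a rough NumPy sketch of a separable Gaussian blur. The helpers `gaussian_kernel_1d`, `blur_rows`, and `gaussian_blur`, the explicit `sigma`, and the edge-replication border are assumptions made for this illustration; OpenCV's own border handling and sigma selection may differ.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Return a normalised 1D Gaussian kernel with an odd number of taps."""
    assert size % 2 == 1, "kernel size must be odd"
    x = np.arange(size) - size // 2
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def blur_rows(gray, kernel):
    """Blur each row of a 2D array, replicating the edge pixels at the borders."""
    pad = len(kernel) // 2
    padded = np.pad(gray.astype(np.float64), ((0, 0), (pad, pad)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])

def gaussian_blur(gray, size=5, sigma=1.0):
    """Separable blur: filter the rows first, then the columns."""
    k = gaussian_kernel_1d(size, sigma)
    return blur_rows(blur_rows(gray, k).T, k).T

# A single bright pixel is spread into a smooth blob around its position
spike = np.zeros((7, 7))
spike[3, 3] = 1.0
print(np.round(gaussian_blur(spike), 3))
```

Because the kernel sums to 1, the blur redistributes brightness to neighbouring pixels without changing the overall intensity of the spike.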

In [None]:
# Detect edges with the Canny edge detector
img_edges = cv2.Canny(image=img_blur, threshold1=60, threshold2=60) 
plt.imshow(img_edges, cmap="gray")

In [None]:
# Invert image colours
img_not= cv2.bitwise_not(img_edges)
plt.imshow(img_not, cmap="gray")

In [None]:
# Check the pixel values
print(img_gray)

# Normalize the image to the range [0, 1]; make sure the output data type is float
norm_image = cv2.normalize(img_gray, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)

print(f"type(norm_image): {type(norm_image)}, norm_image.shape: {norm_image.shape}, norm_image.dtype: {norm_image.dtype}\n\n")

# Double-check the pixel values
print(norm_image)
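
Min-max normalization maps the smallest pixel value to 0 and the largest to 1. The following is a minimal NumPy sketch of that arithmetic; `minmax_normalize` and the sample array are hypothetical and only illustrate the formula, not OpenCV's implementation.

```python
import numpy as np

def minmax_normalize(arr, new_min=0.0, new_max=1.0):
    """Linearly rescale arr so its minimum maps to new_min and maximum to new_max."""
    arr = arr.astype(np.float32)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # A constant image has no range to stretch
        return np.full_like(arr, new_min)
    return (arr - lo) / (hi - lo) * (new_max - new_min) + new_min

demo_gray = np.array([[50, 100], [150, 250]], dtype=np.uint8)
demo_norm = minmax_normalize(demo_gray)
print(demo_norm)
print(demo_norm.dtype, demo_norm.min(), demo_norm.max())
```

Converting to float before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise cause.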

## **Convolutional Neural Network**
---

*PyTorch* is one of the most popular machine learning frameworks for implementing convolutional neural networks (CNNs) in Python. *PyTorch* is based on the *Torch* library and performs its operations on tensors. Tensors are multi-dimensional arrays; in CNN models they hold the input images and the stacked visual features of the intermediate layers. See the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/introyt/introyt1_tutorial.html) if you are looking for more details.


In [None]:
import torch
from torchvision import transforms

# example tensor of shape 2x3x2
z = torch.zeros(2, 3, 2)

print(z)
print(z.dtype)

# You can convert a (numpy) image to a tensor
img_tensor = transforms.functional.to_tensor(img_gray)

print(img_tensor.shape)

# to_tensor also rescales images to 0-1
print(f"image range: {img_gray.min(), img_gray.max()}")
print(f"img_tensor range: {img_tensor.min(), img_tensor.max()}")
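
The layout and scaling change performed when an 8-bit image becomes a tensor can be sketched in NumPy. The helper `to_chw_float` is a hypothetical illustration for `uint8` input only, not the torchvision implementation: it moves the channel axis to the front and divides by 255.

```python
import numpy as np

def to_chw_float(image):
    """Convert an (H, W) or (H, W, C) uint8 array to a (C, H, W) float array in [0, 1]."""
    assert image.dtype == np.uint8, "this sketch only handles 8-bit images"
    if image.ndim == 2:
        image = image[:, :, None]         # (H, W) -> (H, W, 1)
    chw = np.transpose(image, (2, 0, 1))  # (H, W, C) -> (C, H, W)
    return chw.astype(np.float32) / 255.0

demo_gray_img = np.array([[0, 128, 255], [255, 0, 64]], dtype=np.uint8)
demo_rgb_img = np.zeros((2, 3, 3), dtype=np.uint8)

print(to_chw_float(demo_gray_img).shape)
print(to_chw_float(demo_rgb_img).shape)
print(to_chw_float(demo_gray_img).max())
```

The channel-first layout is what the convolutional layers later in this notebook expect, with an extra leading batch dimension.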

In [18]:
import torchvision
import torch.nn.functional as F
import torch.nn as nn

# Define a neural network model.
def create_model():
  # nn.Sequential takes multiple neural network modules and sequentially passes 
  # the input through the modules.
  model = nn.Sequential(
              # MNIST image has dimension 28 x 28 (height x width)
              nn.Conv2d(1, 28, kernel_size=3, padding=1), # output: 28 x 28 x 28
              nn.ReLU(),
              
              nn.Conv2d(28, 32, kernel_size=3, stride=1, padding=1), # output: 32 x 28 x 28
              nn.ReLU(),
              nn.MaxPool2d(2, 2), # output: 32 x 14 x 14
              nn.BatchNorm2d(32),
              
              nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1), # output: 64 x 14 x 14
              nn.ReLU(),
              nn.MaxPool2d(2, 2), # output: 64 x 7 x 7
              nn.BatchNorm2d(64),

              nn.Flatten(), # Flattens the 64 x 7 x 7 tensor into a one dimensional array of length 64*7*7

              nn.Linear(64*7*7, 10) # output: one dimensional array of size 10
          )
  
  return model

In [None]:
# Check the model
print(create_model())
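
The output sizes written in the comments of `create_model` can be checked with the standard formula for convolution and pooling layers, `(size + 2 * padding - kernel) // stride + 1`. The helper `conv_out` below is a hypothetical name used only for this arithmetic sketch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

sizes = [28]                                                      # MNIST input: 28 x 28
sizes.append(conv_out(sizes[-1], kernel=3, stride=1, padding=1))  # Conv2d(1, 28)
sizes.append(conv_out(sizes[-1], kernel=3, stride=1, padding=1))  # Conv2d(28, 32)
sizes.append(conv_out(sizes[-1], kernel=2, stride=2))             # MaxPool2d(2, 2)
sizes.append(conv_out(sizes[-1], kernel=3, stride=1, padding=1))  # Conv2d(32, 64)
sizes.append(conv_out(sizes[-1], kernel=2, stride=2))             # MaxPool2d(2, 2)
print(sizes)  # [28, 28, 28, 14, 14, 7]

# 64 channels of 7 x 7 features feed the final linear layer
flat_features = 64 * sizes[-1] ** 2
print(flat_features)  # 3136
```

If you change the kernel size, padding, or number of pooling layers, this number changes too, and the input size of `nn.Linear` must be updated to match.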

### **MNIST dataset**

In [None]:
# Import MNIST
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

# Download and Save MNIST 
data_train = MNIST('~/mnist_data', train=True, download=True, transform=ToTensor())
data_test = MNIST('~/mnist_data', train=False, download=True, transform=ToTensor())

# Print Data
print(data_train)

In [None]:
np.array(data_train.data[0]).shape

In [None]:
import matplotlib.pyplot as plt

# Show the first image in the training set
first_image = data_train.data[0]
first_image_label = data_train.targets[0]

# Display the image using Matplotlib
plt.imshow(first_image, cmap='gray')
print("The label of the image is:", first_image_label.item())

print(first_image.shape)

In [None]:
from torch.utils.data import DataLoader

# DataLoaders allow you to iterate through the data in randomized batches.
loaders = {
    'train' : torch.utils.data.DataLoader(data_train, 
                                          batch_size=100, 
                                          shuffle=True, 
                                          num_workers=1),
    
    'test'  : torch.utils.data.DataLoader(data_test, 
                                          batch_size=100, 
                                          shuffle=True, 
                                          num_workers=1),
}

loaders

In [None]:
example_batch = None

# A batch from the dataloader consists of a tensor of 100 images, and 
# a tensor of 100 labels
for images, labels in loaders["train"]:
  print(images.shape, labels.shape)

  example_batch = images, labels

  break
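
What a data loader does with `batch_size` and `shuffle` can be sketched in pure Python. The generator `iterate_batches` is a hypothetical stand-in that yields index lists only; it is not how `DataLoader` is implemented.

```python
import math
import random

def iterate_batches(num_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per mini-batch."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

demo_batches = list(iterate_batches(num_samples=250, batch_size=100))
print([len(b) for b in demo_batches])  # the last batch is smaller

# MNIST has 60,000 training images, so batches of 100 give 600 steps per epoch
print(math.ceil(60000 / 100))
```

Every sample appears exactly once per epoch; shuffling only changes which samples end up together in a batch.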

### **Training the CNN model on MNIST data**

Now that the CNN model is defined and the dataset is loaded, it's time to train the model on the training set!

You may change the CNN architecture, such as the number of convolutional layers, and/or training hyperparameters, such as the number of epochs, to improve the accuracy of the model. Note that the model needs to be retrained if you change any of these.

In [24]:
# Initialize the model
model = create_model()

In [25]:
import torch.optim as optim

# Define the loss function
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
# model.parameters() are the weights that should be updated during training
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
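
The loss and the optimizer step can be illustrated with a small NumPy sketch. The functions `cross_entropy` and `sgd_momentum_step` are hypothetical and simplified: they assume raw scores of shape (N, C), integer class labels, and momentum with no dampening or weight decay.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy of raw scores (N, C) against class indices (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: the velocity accumulates gradients and the parameter follows it."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

demo_logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]])
demo_targets = np.array([0, 2])
print(cross_entropy(demo_logits, demo_targets))

# Two steps with a constant gradient: the second step is larger than the first
w, v = np.array([1.0]), np.array([0.0])
for _ in range(2):
    w, v = sgd_momentum_step(w, grad=np.array([10.0]), velocity=v)
print(w, v)
```

With a constant gradient the velocity grows from step to step, which is why momentum speeds up progress along directions where the gradient keeps pointing the same way.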

In [26]:
# Setup tensorboard to log training results
from torch.utils.tensorboard import SummaryWriter

# Change the comment to something else to identify the run in tensorboard
writer = SummaryWriter(comment="mnist_run")

# Adds a graph of the model to tensorboard.
# In the graph tab you can see the computational graph of the model.
writer.add_graph(model, example_batch[0])

writer.flush()

In [None]:
import torch
device=torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# device = "cpu"

device

In [None]:
# Move the model to the GPU to train faster
# If you get an error, choose GPU as the hardware accelerator under Runtime -> Change runtime type
# model.cuda()
model.to(device)

In [None]:
# The number of epochs to train for.
# During each epoch, the model is trained on all of the training data
num_epoch = 5

for epoch in range(1, num_epoch+1):
  train_loss=0.0
  valid_loss=0.0

  train_correct = 0
  valid_correct = 0

  # Set the model to train mode
  model.train()

  # Go through all the mini-batches in the train loader
  for img, lbl in loaders['train']:

    # Put the data on the gpu
    img=img.to(device)
    lbl=lbl.to(device)

    # Set the accumulated gradients to zero
    optimizer.zero_grad()

    # Get the output of the model
    predict = model(img)
    
    # Compute the loss
    loss=loss_fn(predict,lbl)
    
    # Compute the gradients and update the weights
    loss.backward()
    optimizer.step()
    
    # Update the cumulative train loss of this epoch
    train_loss += loss.item() * img.size(0)

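    # Compute accuracy: the predicted class is the index of the largest output
    predicted_labels = predict.argmax(dim=1)

    # .item() converts the count to a plain Python number
    train_correct += torch.sum(predicted_labels == lbl).item()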


  # Evaluate the model on the test data
  model.eval()

  # Gradients are not needed for evaluation, so disable them to save memory and time
  with torch.no_grad():
    for img, lbl in loaders['test']:
      img = img.to(device)
      lbl = lbl.to(device)

      predict = model(img)
      loss = loss_fn(predict, lbl)

      # Update the test loss for this epoch
      valid_loss += loss.item() * img.size(0)

      # Compute accuracy
      predicted_labels = predict.argmax(dim=1)

      valid_correct += torch.sum(predicted_labels == lbl).item()
  
  # Compute the loss and accuracy metrics for this epoch
  train_loss=train_loss/len(loaders['train'].sampler) 
  valid_loss=valid_loss/len(loaders['test'].sampler)

  train_acc = train_correct / len(loaders['train'].sampler) 
  valid_acc = valid_correct / len(loaders['test'].sampler)

  print('Epoch:{} Train Loss:{:.4f} valid Loss:{:.4f}; train Acc:{:.4f} valid Acc:{:.4f}'.format(epoch,train_loss,valid_loss, train_acc, valid_acc))

  # Log results to tensorboard
  writer.add_scalar('Loss/train', train_loss, epoch)
  writer.add_scalar('Loss/test', valid_loss, epoch)

  writer.add_scalar('Accuracy/train', train_acc, epoch)
  writer.add_scalar('Accuracy/test', valid_acc, epoch)

  writer.flush()
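
The training loop multiplies each batch loss by the batch size before summing, because the loss of a batch is already an average over that batch. A small pure-Python sketch shows why this matters when the last batch is smaller; `epoch_average` and the numbers are made up for illustration.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average computed from per-batch means and batch sizes."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Hypothetical losses for two full batches and a smaller final batch
demo_losses = [0.50, 0.30, 0.10]
demo_sizes = [100, 100, 50]

weighted = epoch_average(demo_losses, demo_sizes)  # weighted by batch size
naive = sum(demo_losses) / len(demo_losses)        # plain mean over batches
print(weighted, naive)
```

The two values differ because the naive mean gives the small final batch the same weight as the full ones. With MNIST and a batch size of 100 all batches are equal in size, so both approaches agree, but the weighted form stays correct for any batch size.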

## Tensorboard

We also use TensorBoard to visualize the training results.
Use the Scalars and/or Time Series tab to inspect the learning curves of the model. The Graphs tab shows a diagram of the model architecture, where you can double-click on the nodes to see more details.

If you get a 403 error in the TensorBoard view, enable third-party cookies or turn off tracking protection. See [this Stack Overflow question](https://stackoverflow.com/questions/64218755/getting-error-403-in-google-colab-with-tensorboard-with-firefox).

To refresh TensorBoard, either click the refresh button in the top right or rerun the cell below.

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

%tensorboard --logdir runs

## Visualizing predictions

The following code shows the predicted and actual labels of 10 random images.



In [None]:
img

In [None]:
model.eval()

import random

# Randomly select 10 indices from the test data
random_indices = random.sample(range(len(data_test)), k=10)

for i in random_indices:
  # Each test sample is a pair: an image tensor of shape 1 x 28 x 28 and an integer label
  img, label = data_test[i]

  # Add a batch dimension: 1 x 1 x 28 x 28
  batch = img.reshape(-1, 1, 28, 28)

  # Gradients are not needed for inference
  with torch.no_grad():
    prediction = model(batch.to(device))

  # Convert the predicted class index to a plain Python int
  predicted_label = prediction.argmax().item()

  is_correct = label == predicted_label

  plt.figure()
  plt.title(f"{is_correct}, actual label: {label}, predicted label: {predicted_label}", color="black" if is_correct else "red")
  plt.imshow(img.reshape(28, 28), cmap="gray")

## Incorrectly predicted digits

In [32]:
# Store all incorrectly predicted images from the test set

model.eval()

incorrect_imgs = []
incorrect_labels = []
incorrect_predictions = []

# Gradients are not needed for inference
with torch.no_grad():
  for imgs, labels in loaders["test"]:

    imgs, labels = imgs.to(device), labels.to(device)

    # The predicted class is the index of the largest output score
    predictions = model(imgs).argmax(dim=1)

    # Boolean mask that is True for every misclassified image
    incorrect_indices = predictions != labels

    incorrect_imgs.append(imgs[incorrect_indices].cpu())
    incorrect_labels.append(labels[incorrect_indices].cpu())
    incorrect_predictions.append(predictions[incorrect_indices].cpu())

incorrect_imgs = torch.cat(incorrect_imgs, dim=0)
incorrect_labels = torch.cat(incorrect_labels, dim=0)
incorrect_predictions = torch.cat(incorrect_predictions, dim=0)
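
Selecting the misclassified samples relies on boolean mask indexing, which works the same way for NumPy arrays and PyTorch tensors. Here is a minimal NumPy sketch with made-up labels and predictions.

```python
import numpy as np

demo_labels = np.array([7, 2, 1, 0, 4])
demo_preds = np.array([7, 2, 7, 0, 9])

correct_mask = demo_preds == demo_labels  # True where the guess is right
wrong_mask = ~correct_mask                # '~' flips every boolean

print(wrong_mask)
print(demo_labels[wrong_mask], demo_preds[wrong_mask])
print(np.flatnonzero(wrong_mask))         # positions of the mistakes
print(correct_mask.mean())                # fraction correct, i.e. accuracy
```

Indexing several arrays with the same mask keeps them aligned, so the i-th kept label still belongs to the i-th kept prediction.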

In [None]:
# Show up to 10 of the misclassified images; all of them are incorrect by construction
for i in range(min(10, len(incorrect_labels))):
  plt.figure()
  plt.title(f"False, actual label: {incorrect_labels[i].item()}, predicted label: {incorrect_predictions[i].item()}", color="red")
  plt.imshow(incorrect_imgs[i].reshape(28, 28), cmap="gray")

## Access Google Drive on Google Colab

If you uploaded your hand-drawn digits to Google Drive, you can access them by mounting your Google Drive as a folder.

To see what's in your drive, run `!ls /content/gdrive` to list the files. You can also find the folder where you stored your images by using the file explorer on the left side of Colab. After mounting, there should be a folder called gdrive.

Note that to load and save data directly in your Google Drive space, you need to grant Google Colab access to your drive. Otherwise, use publicly accessible storage to read and write data from the notebook.



<!-- This notebook is running remotely on the Google Colab platform, therefore to save and access this notebook in your personal (local or remote) Google space you need to give access to Google to control your Google drive (gdrive).  To have local access, a virtual drive is used on your computer to save and access this notebook. You can make a directory on your gdrive to also save images that are read or written, by colab, in your drive. The following code snippet will mount the gdrive.  -->

In [None]:
# You need to give access to your google drive to mount the virtual drive here
import os
from google.colab import drive
drive.mount('/content/gdrive')
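
Once the drive is mounted, the uploaded images can be located with the standard library. The helper `list_image_files` and the folder path `/content/gdrive/MyDrive/digits` are assumptions for illustration; replace the path with the folder you actually used.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def list_image_files(folder):
    """Return the image files directly inside `folder`, sorted by name."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Hypothetical location of the hand-drawn digits on the mounted drive
digits_dir = "/content/gdrive/MyDrive/digits"
for path in list_image_files(digits_dir):
    print(path.name)
```

The function returns an empty list when the folder does not exist, so the cell runs without error even before the drive is mounted.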