# CSCI 3343 Lab 5: PyTorch for Image Prediction and Object Segmentation

**Posted:** Monday, October 18, 2021

**Due:** N/A

__Total Points__: 0.5 (extra pts for the final grade)

__Name__:
[Your first name] [Your last name], [Your BC username]

(e.g. Donglai Wei, weidf)

__Submission__: Please rename the .ipynb file to __\<your_username\>_lab5.ipynb__ before you submit it to Canvas. Example: weidf_lab5.ipynb.

# Introduction

Let's learn to use existing deep learning libraries for image prediction and object detection in PyTorch.

## Download and display the image

In [2]:
# Download images
! wget https://post.healthline.com/wp-content/uploads/2020/08/3180-Pug_green_grass-732x549-thumbnail-732x549.jpg -O test_dog_easy.jpg
! wget https://www.chicagotribune.com/resizer/Z_oN8fZUymKMakZ7Y-KBqCwwEi0=/800x515/top/arc-anglerfish-arc2-prod-tronc.s3.amazonaws.com/public/BPLQ2KEPMJABHPUP7U565WVMNA.jpg -O test_dog_hard.jpg

# Download ImageNet labels
!wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt -O imagenet_classes.txt
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

--2021-10-18 14:07:44--  https://post.healthline.com/wp-content/uploads/2020/08/3180-Pug_green_grass-732x549-thumbnail-732x549.jpg
Resolving post.healthline.com (post.healthline.com)... 151.101.2.133, 151.101.66.133, 151.101.130.133, ...
Connecting to post.healthline.com (post.healthline.com)|151.101.2.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45602 (45K) [image/jpeg]
Saving to: ‘test_dog_easy.jpg’


2021-10-18 14:07:44 (30.8 MB/s) - ‘test_dog_easy.jpg’ saved [45602/45602]

--2021-10-18 14:07:44--  https://www.chicagotribune.com/resizer/Z_oN8fZUymKMakZ7Y-KBqCwwEi0=/800x515/top/arc-anglerfish-arc2-prod-tronc.s3.amazonaws.com/public/BPLQ2KEPMJABHPUP7U565WVMNA.jpg
Resolving www.chicagotribune.com (www.chicagotribune.com)... 23.12.147.85, 23.12.147.69, 2600:1408:c400:e::17cd:6a0d, ...
Connecting to www.chicagotribune.com (www.chicagotribune.com)|23.12.147.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63717 (62K) [image/jpeg

In [None]:
import imageio
import matplotlib.pyplot as plt

dog_easy = imageio.imread('test_dog_easy.jpg')
dog_hard = imageio.imread('test_dog_hard.jpg')

plt.rcParams["figure.figsize"] = (10,5)
plt.subplot(121)
plt.imshow(dog_easy)
plt.axis('off')
plt.title('easy case')
plt.subplot(122)
plt.imshow(dog_hard)
plt.axis('off')
plt.title('hard case')
plt.show()

# Part 1. Image Classification with PyTorch

## (a) Download the AlexNet model and 1,000 class labels

In [None]:
import torch
import torch.nn.functional as F
import torchvision.models as models

from torchvision import transforms
import numpy as np
from PIL import Image

# Download models
alexnet = models.alexnet(pretrained=True)

## (b) Preprocess image

In [None]:
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    # Option 1: take only the center crop
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # Option 2: take 10 crops, i.e. (1 center + 4 corners) * (original + horizontal flip);
    # to use it, replace the CenterCrop and ToTensor lines above with the two lines below
    #transforms.TenCrop(224),
    #transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops])),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# The torchvision transforms above expect PIL images (not NumPy arrays), so convert first

im_batch = torch.stack([preprocess(Image.fromarray(dog_easy)), \
                        preprocess(Image.fromarray(dog_hard))])
if im_batch.ndim == 3:
  # [None]: create an extra dimension in front as the batch
  im_batch = im_batch[None]

print('input batch size:', im_batch.shape)
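
To make the preprocessing concrete, here is a minimal pure-Python sketch of the arithmetic behind `Resize(256)`, `ToTensor`, and `Normalize`. The helper names `resized_size` and `normalize_pixel` are made up for illustration, the 549x732 image size is an assumption read off the first image's file name, and the rounding only approximates what torchvision does internally.

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def resized_size(height, width, short_side=256):
    """Scale the shorter side to `short_side`, keeping the aspect ratio."""
    if height <= width:
        return short_side, int(short_side * width / height)
    return int(short_side * height / width), short_side

def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize ((x - mean) / std)."""
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb_uint8, MEAN, STD)]

# Assumed image size: 549x732 (height x width), as the file name suggests.
new_h, new_w = resized_size(549, 732)
print('resized to:', new_h, new_w)

# A black pixel ends up negative in every channel after normalization.
print('normalized black pixel:', normalize_pixel([0, 0, 0]))
```

After resizing, `CenterCrop(224)` cuts a 224x224 window out of the middle, so every image in the batch has the same shape regardless of its original size.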

## (c) Run inference

In [None]:
# Switch to evaluation mode so that dropout is disabled during inference
alexnet.eval()
# the model doesn't include the softmax layer, so apply it to the logits
with torch.no_grad():
  pred = alexnet(im_batch)
prob = F.softmax(pred, dim=1)

for i in range(pred.shape[0]):
  print('------ %s -----' % (['Easy case', 'Hard case'][i]))
  # Show the top-5 categories per image
  top5_prob, top5_catid = torch.topk(prob[i], 5)
  for j in range(top5_prob.size(0)):
    print(categories[top5_catid[j]], '%.2f' % top5_prob[j].item())
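
As a rough illustration of what the softmax and top-k steps in the cell above compute, here is a small pure-Python sketch. The function names and the four toy logits are hypothetical; the real model emits 1,000 logits per image and the notebook uses `F.softmax` and `torch.topk` instead.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Return (index, probability) pairs for the k largest probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]

# Hypothetical logits for a 4-class problem.
toy_logits = [2.0, 1.0, 0.1, -1.0]
toy_probs = softmax(toy_logits)
for idx, p in top_k(toy_probs, 2):
    print(idx, '%.2f' % p)
```

Softmax preserves the ranking of the logits, so the top-5 class indices are the same with or without it; it only makes the scores readable as probabilities.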

## [TODO] Exercise 1: Image classification with ResNet50 for these two images
Hint: repeat (a) and (c) above with the new model

In [None]:
#### TODO #####

# Part 2. Object Detection with PyTorch

Let's try out different pipelines for the hard case above to detect the dog.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
import numpy as np
import cv2
from google.colab.patches import cv2_imshow

# Download models
alexnet = torchvision.models.alexnet(pretrained=True)
alexnet.eval()  # evaluation mode: disables dropout for inference

## (a) Sliding CNN
Let's iterate over square bounding boxes of a fixed size (151x151 patches) with a fixed stride (71 pixels in each direction). In practice, we would need to repeat this computation for several bounding-box sizes.
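
The window grid described above can be sketched in plain Python as follows. This is an illustrative sketch, not the notebook's implementation: the helper `sliding_windows` is hypothetical, and the 515x800 (height x width) image size is an assumption based on the dimensions embedded in the image URL.

```python
def sliding_windows(height, width, patch_size, stride):
    """List (x, y, w, h) boxes for every full patch that fits in the image."""
    num_row = (height - patch_size) // stride + 1
    num_col = (width - patch_size) // stride + 1
    boxes = []
    for row in range(num_row):
        for col in range(num_col):
            boxes.append((col * stride, row * stride, patch_size, patch_size))
    return num_row, num_col, boxes

# Assumed image size of 515x800 (height x width).
num_row, num_col, boxes = sliding_windows(515, 800, patch_size=151, stride=71)
print(num_row, num_col, len(boxes))

# A flat patch index maps back to its grid position with divmod.
row, col = divmod(17, num_col)
print(row, col, boxes[17])
```

Calling the helper again with other values of `patch_size` would give the extra box sizes mentioned above, at the cost of one more batch of patches per size.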

### (i) Get bounding boxes (sliding window)

In [None]:
def imageToPatch(img, row_id, col_id, patch_size, stride_size):
  return img[stride_size*row_id : stride_size*row_id+patch_size,\
             stride_size*col_id : stride_size*col_id+patch_size]

# image to patches
dog_hard = imageio.imread('test_dog_hard.jpg')
im_size = dog_hard.shape
patch_size = 151
stride_size = 71
num_row = (im_size[0] - patch_size) // stride_size + 1
num_col = (im_size[1] - patch_size) // stride_size + 1
print('#row=%d, #col=%d' % (num_row, num_col))

plt.rcParams["figure.figsize"] = (10,6)
count = 1
for y in range(num_row):
  for x in range(num_col):
    plt.subplot(num_row, num_col, count)
    patch = imageToPatch(dog_hard, y, x, patch_size, stride_size)
    plt.imshow(patch)
    plt.axis('off')
    count += 1

plt.show()

### (ii) Preprocess image patches

In [None]:
preprocess = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# convert image into patches
patches = []
for y in range(num_row):
  for x in range(num_col):
    patch = imageToPatch(dog_hard, y, x, patch_size, stride_size)
    patches.append(preprocess(Image.fromarray(patch)))

im_batch = torch.stack(patches)
print('input batch size:', im_batch.shape)

### (iii) Run inference

In [None]:
pred = alexnet(im_batch)
prob = F.softmax(pred, dim=1).detach().numpy()

# plot the top probability for each patch
plt.rcParams["figure.figsize"] = (10,2)
prob_max = prob.max(axis=1)
plt.plot(prob_max)

In [None]:
# let's see what it detects in the most confident patches
pos_ids = np.where(prob_max > 0.3)[0]
row_ids = pos_ids // num_col
col_ids = pos_ids - row_ids * num_col

plt.rcParams["figure.figsize"] = (10,10)
for i in range(len(pos_ids)):
  plt.subplot(3,3,i+1)
  patch = imageToPatch(dog_hard, row_ids[i], col_ids[i], patch_size, stride_size)
  plt.imshow(patch)
  plt.axis('off')
  plt.title(categories[np.argmax(prob[pos_ids[i]])])

## (b) R-CNN


### (i) Get bounding boxes (selective search)

In [None]:
# cv2 reads in images as BGR; pytorch pretrain models take RGB input
# create a new variable to avoid confusion...
dog_hard_cv2 = cv2.imread('test_dog_hard.jpg')
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(dog_hard_cv2)
ss.switchToSelectiveSearchFast()
# rects: Nx4 matrix
# each row: x,y,w,h
rects = ss.process()
print('Selective search: find %d boxes' % rects.shape[0])

Selective search: find 4782 boxes


In [None]:
# plot the first 20 bounding boxes
image = dog_hard_cv2.copy()
for i in range(20):
  image = cv2.rectangle(image, tuple(rects[i,:2]), tuple(rects[i,:2]+rects[i,2:]), (0,255,0), 2)
cv2_imshow(image)

### (ii) Preprocess image patches

In [None]:
preprocess = transforms.Compose([
    transforms.Resize((224,224)), # scale both sides to 224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# keep every 8th bounding box
rects_sub = rects[::8]
patches = [None] * rects_sub.shape[0]
for i in range(len(patches)):  
  x,y,w,h = rects_sub[i]  
  patches[i] = preprocess(Image.fromarray(dog_hard[y:y+h, x:x+w]))

im_batch = torch.stack(patches)
print('input batch size:', im_batch.shape)

### (iii) Run inference

In [None]:
pred = alexnet(im_batch)
prob = F.softmax(pred, dim=1).detach().numpy()

# plot the top probability for each patch
prob_max = prob.max(axis=1)
plt.rcParams["figure.figsize"] = (10,2)
plt.plot(prob_max)

In [None]:
# let's see what it detects in the most confident patches
pos_sort = np.argsort(-prob_max)

plt.rcParams["figure.figsize"] = (10,10)
for i in range(16):
  plt.subplot(4,4,i+1)
  x,y,w,h = rects_sub[pos_sort[i]]
  patch = dog_hard[y:y+h, x:x+w]
  plt.imshow(cv2.resize(patch,(224,224)))
  plt.axis('off')
  plt.title(categories[np.argmax(prob[pos_sort[i]])])

In [None]:
# Non-maximum suppression
rects_sub_pt = torch.as_tensor(np.hstack([rects_sub[:,:2], rects_sub[:,:2]+rects_sub[:,2:]]).astype(np.float32))
idx = torchvision.ops.nms(rects_sub_pt, torch.as_tensor(prob_max), 0.1)

plt.rcParams["figure.figsize"] = (10,10)
for i in range(16):
  plt.subplot(4,4,i+1)
  x,y,w,h = rects_sub[idx[i]]
  patch = dog_hard[y:y+h, x:x+w]
  plt.imshow(cv2.resize(patch,(224,224)))
  plt.axis('off')
  plt.title(categories[np.argmax(prob[idx[i]])])
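
For intuition about the non-maximum suppression step, here is a minimal pure-Python sketch of the greedy procedure: convert `(x, y, w, h)` boxes to corner format, then repeatedly keep the highest-scoring box and discard boxes whose IoU with a kept box exceeds the threshold. The helper names and the toy boxes are hypothetical; the notebook itself relies on `torchvision.ops.nms`.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold):
    """Keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical boxes in (x, y, w, h) form, like the rows returned by selective search.
toy_rects = [(10, 10, 100, 100), (20, 20, 100, 100), (200, 200, 50, 50)]
toy_scores = [0.9, 0.8, 0.7]
toy_boxes = [xywh_to_xyxy(r) for r in toy_rects]
print(greedy_nms(toy_boxes, toy_scores, iou_threshold=0.1))
```

With a threshold as low as 0.1, almost any overlap removes the lower-scoring box, which is why the surviving patches in the plot above are mostly disjoint regions.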

## (c) Fast/Faster R-CNN (Detectron2!)

### (i) Install Detectron2

In [None]:
!pip install pyyaml==5.1
# This is the current pytorch version on Colab. Uncomment this if Colab changes its pytorch version
# !pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime

### (ii) Run inference

In [None]:
# check pytorch installation: 
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
assert torch.__version__.startswith("1.9")   # please manually install torch 1.9 if Colab changes its default version

# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

1.9.0+cu111 True


In [None]:
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.INPUT.FORMAT = 'RGB'
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_C4_3x.yaml"))
#cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_C4_3x.yaml")
predictor = DefaultPredictor(cfg)

The checkpoint state_dict contains keys that are not used by the model:
  proposal_generator.anchor_generator.cell_anchors.0


In [None]:
dog_hard = imageio.imread('test_dog_hard.jpg')
outputs = predictor(dog_hard)

In [None]:
outputs['instances'].__dict__['_fields'].keys()

dict_keys(['pred_boxes', 'scores', 'pred_classes'])

In [None]:
# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(dog_hard, MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))

plt.rcParams["figure.figsize"] = (30,30)
plt.imshow(out.get_image())
plt.axis('off')
plt.show()

### (iii) Model examination

#### Q1. How many modules are in the model?

In [None]:
# what are the modules
for module in predictor.model._modules:
  print(module)

#### Q2. What are the input and output sizes of the **backbone** module?

In [None]:
print('Input image size:', dog_hard.shape)

In [None]:
predictor.model.backbone

In [None]:
activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook


In [None]:
predictor.model.backbone.res4._modules['5'].conv3.norm.register_forward_hook(get_activation('backbone'))
outputs = predictor(dog_hard)
print('Output feature size:', activation['backbone'].shape)

#### Q3. What are the input and output sizes of the **proposal_generator** module?
Check out the size for `objectness_logits` and `anchor_deltas`

In [None]:
predictor.model.proposal_generator

In [None]:
predictor.model.proposal_generator.rpn_head.objectness_logits.register_forward_hook(get_activation('objectness_logits'))
predictor.model.proposal_generator.rpn_head.anchor_deltas.register_forward_hook(get_activation('anchor_deltas'))

outputs = predictor(dog_hard)

#print('Output feature size:', (dog_hard_cv2).shape)

In [None]:
print('Output objectness size:', activation['objectness_logits'].shape)
print('Output anchor_deltas size:', activation['anchor_deltas'].shape)

#### [TODO] Exercise 2. How many boxes does the model predict?

In [None]:
????

#### Q4. What are the input and output sizes of the **roi_heads** module?

In [None]:
predictor.model.roi_heads

In [None]:
predictor.model.roi_heads.pooler.level_poolers._modules['0'].register_forward_hook(get_activation('level_poolers'))
predictor.model.roi_heads.box_predictor.cls_score.register_forward_hook(get_activation('cls_score'))
predictor.model.roi_heads.box_predictor.bbox_pred.register_forward_hook(get_activation('bbox_pred'))
outputs = predictor(dog_hard)

In [None]:
print('Input level_poolers size:', activation['level_poolers'].shape)
print('Output cls_score size:', activation['cls_score'].shape)
print('Output bbox_pred size:', activation['bbox_pred'].shape)

Input level_poolers size: torch.Size([1000, 1024, 14, 14])
Output cls_score size: torch.Size([1000, 81])
Output bbox_pred size: torch.Size([1000, 320])
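
The second dimension of `cls_score` and `bbox_pred` can be read off from the number of classes. The sketch below is plain arithmetic under two assumptions that are not stated in the notebook: the model was trained on the 80 COCO object categories, and its box head predicts a separate set of four regression values per class.

```python
num_proposals = 1000   # first dimension of every tensor printed above
num_classes = 80       # assumed: the COCO object categories

# cls_score: one logit per class, plus one extra for "background".
cls_score_shape = (num_proposals, num_classes + 1)

# bbox_pred: four box-regression values per foreground class.
bbox_pred_shape = (num_proposals, num_classes * 4)

print(cls_score_shape)
print(bbox_pred_shape)
```

Under these assumptions the two shapes match the printed output, and the 14x14 spatial size of the pooled features is the fixed resolution to which every proposal is resampled before classification.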


In [None]:
predictor.model

## (d) YOLO

### (i) Install YOLO

In [None]:
! pip install -qr https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt  # install dependencies

### (ii) Run inference

In [None]:
import torch

# Model
model_yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Images
imgs = ['https://www.chicagotribune.com/resizer/Z_oN8fZUymKMakZ7Y-KBqCwwEi0=/800x515/top/arc-anglerfish-arc2-prod-tronc.s3.amazonaws.com/public/BPLQ2KEPMJABHPUP7U565WVMNA.jpg']  # batch of images

# Inference
results = model_yolo(imgs)
results_np = results.xyxy[0].detach().cpu().numpy().astype(int)

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master

requirements: PyYAML>=5.3.1 not found and is required by YOLOv5, attempting auto-update...

YOLOv5 🚀 2021-10-18 torch 1.9.0+cu111 CUDA:0 (Tesla K80, 11441.1875MB)

requirements: 1 package updated per /root/.cache/torch/hub/ultralytics_yolov5_master/requirements.txt
requirements: ⚠️ Restart runtime or rerun command for updates to take effect



Fusing layers... 
Model Summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


In [None]:
# plot all detected bounding boxes
image = dog_hard_cv2.copy()
for i in range(results_np.shape[0]):
  image = cv2.rectangle(image, tuple(results_np[i,:2]), tuple(results_np[i,2:4]), (0,255,0), 2)
cv2_imshow(image)

In [None]:
plt.rcParams["figure.figsize"] = (10,10)
for i in range(min(8, results_np.shape[0])):  # show at most 8 detections
  plt.subplot(4,2,i+1)
  x1,y1,x2,y2,sc, cls = results_np[i]
  patch = dog_hard[y1:y2, x1:x2]
  plt.imshow(cv2.resize(patch,(224,224)))
  plt.axis('off')
  plt.title(MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes[int(cls)])
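
Each row of `results.xyxy[0]` has the layout `[x1, y1, x2, y2, confidence, class_id]`. Because `results_np` casts the whole array to `int`, the confidence column is truncated (to 0 for any score below 1). The sketch below shows one way to split a row into typed fields while keeping the score as a float; the helper `parse_detection`, the toy row, and the shortened class list are hypothetical.

```python
def parse_detection(row, class_names):
    """Split one [x1, y1, x2, y2, confidence, class_id] row into typed fields."""
    x1, y1, x2, y2 = (int(round(v)) for v in row[:4])
    confidence = float(row[4])   # keep as float: an int cast would truncate it
    class_id = int(row[5])
    return (x1, y1, x2, y2), confidence, class_names[class_id]

# Hypothetical detection row and a made-up, shortened class list.
toy_names = ['person', 'bicycle', 'dog']
toy_row = [48.7, 30.2, 310.9, 420.0, 0.87, 2.0]
box, conf, name = parse_detection(toy_row, toy_names)
print(box, conf, name)
```

The integer box coordinates are what the slicing `dog_hard[y1:y2, x1:x2]` needs, while the float confidence remains available for filtering or for display in the plot titles.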

### (iii) Model examination

In [None]:
model_yolo 

In [None]:
model_yolo.model.model._modules['24'].m._modules['2'].register_forward_hook(get_activation('detection'))
results = model_yolo(imgs)
print('Detection shape', activation['detection'].shape)
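
The channel dimension of the hooked detection layer can be explained with a little arithmetic. This sketch assumes the pretrained `yolov5s` weights use 80 COCO classes and 3 anchors per grid cell at each scale; the 20x20 grid in the example is hypothetical and depends on the input size.

```python
num_classes = 80         # assumed: COCO classes used by the pretrained weights
anchors_per_cell = 3     # assumed: number of anchors per grid cell at each scale

# Each anchor predicts 4 box values, 1 objectness score, and one score per class.
values_per_anchor = 4 + 1 + num_classes
head_channels = anchors_per_cell * values_per_anchor
print(head_channels)

def grid_predictions(grid_h, grid_w):
    """Number of candidate boxes produced by one detection scale."""
    return anchors_per_cell * grid_h * grid_w

# Hypothetical 20x20 grid, e.g. a 640x640 input at stride 32.
print(grid_predictions(20, 20))
```

Unlike the R-CNN pipelines earlier in the lab, all of these candidate boxes come out of a single forward pass, and non-maximum suppression then reduces them to the handful of detections in `results`.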