## Testing Pretrained ResNet101

### Test a sample using the pre-trained ResNet101 (original code)

In [1]:
# https://learnopencv.com/pytorch-for-beginners-image-classification-using-pre-trained-models/

import torch, torchvision
from PIL import Image

# load model
resnet = torchvision.models.resnet101(pretrained=True)

# set network to evaluation mode
resnet.eval()

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])


img = Image.open("dog.jpg")  # any dog photo works: download one from the Internet or take one yourself
img_t = transform(img)
print(img_t.shape)

batch_t = torch.unsqueeze(img_t, 0)


# perform inference (no gradients are needed at test time)
with torch.no_grad():
    out = resnet(batch_t)

# print top-5 classes predicted by model
_, indices = torch.sort(out, descending=True)
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100

for idx in indices[0][:5]:
    print('Label:', idx, '. Confidence Score:', percentage[idx].item(), '%')

Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/jovyan/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth


torch.Size([3, 224, 224])
Label: tensor(162) . Confidence Score: 81.00029754638672 %
Label: tensor(168) . Confidence Score: 9.432700157165527 %
Label: tensor(208) . Confidence Score: 2.8158085346221924 %
Label: tensor(161) . Confidence Score: 1.579605221748352 %
Label: tensor(211) . Confidence Score: 1.131980538368225 %


Refer to https://gist.github.com/ageitgey/4e1342c10a71981d0b491e1b8227328b to check whether the predicted class indices map to meaningful labels.

### Task (edited code)

Modify the code above to perform data augmentation on the test sample by averaging the scores of 5 crops: center, upper left, lower left, lower right, and upper right.

Please briefly discuss the advantages and disadvantages of test-time data augmentation.

In [2]:
import torch, torchvision
from PIL import Image

# load model
resnet = torchvision.models.resnet101(pretrained=True)

# set network to evaluation mode
resnet.eval()

transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    # torchvision.transforms.CenterCrop(224),  # removed: the 5 crops are taken manually below
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])

img = Image.open("dog.jpg")
img_t = transform(img)

height = img_t.shape[1]
width = img_t.shape[2]

# 5 slices of img_t
center_crop = img_t[:, round(height/2 - 112): round(height/2 + 112), round(width/2 - 112): round(width/2 + 112)]
upper_left_crop = img_t[:, :224, :224]
lower_left_crop = img_t[:, height - 224:, :224]
lower_right_crop = img_t[:, height - 224:, width - 224:]
upper_right_crop = img_t[:, :224, width - 224:]

batch_t = torch.stack((center_crop, upper_left_crop, lower_left_crop, lower_right_crop, upper_right_crop), 0)

# perform inference (no gradients are needed at test time)
with torch.no_grad():
    out = resnet(batch_t)
out_sum = torch.sum(out, 0)  # sum the scores for each class index across the 5 crops

# print top-5 classes predicted by model
_, indices = torch.sort(out_sum, descending=True)
percentage = torch.nn.functional.softmax(out_sum, dim=0) * 100

for idx in indices[:5]:
    print('Label:', idx, '. Confidence Score:', percentage[idx].item(), '%')

Label: tensor(162) . Confidence Score: 99.99964141845703 %
Label: tensor(168) . Confidence Score: 0.00034762214636430144 %
Label: tensor(161) . Confidence Score: 1.0993042451445945e-05 %
Label: tensor(164) . Confidence Score: 2.276629516018147e-07 %
Label: tensor(166) . Confidence Score: 1.4632828282401533e-09 %


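The code now takes the scores of all 5 crops into account. Instead of averaging, I summed the raw scores (logits) for each class across the 5 crops in the mini-batch and then applied softmax. Summing and averaging give the same class ranking, because the sum is just 5 times the mean. The percentages are not the same, however: softmax of the sum is sharper than softmax of the mean, so the confidence scores above (for example 99.9996% for class 162) are inflated compared with what averaging would report.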
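
To make the difference between summing and averaging concrete, here is a small sketch with made-up logits (5 crops, 3 classes; these numbers are illustrative and are not model output). It compares the summed-logit approach used above with two common ways of averaging.

```python
import torch

# Made-up logits for 5 crops and 3 classes (illustrative only)
out = torch.tensor([[2.0, 1.0, 0.0],
                    [1.5, 1.0, 0.5],
                    [2.5, 0.5, 0.0],
                    [1.0, 1.2, 0.3],
                    [2.0, 0.8, 0.1]])

# Sum the logits, then softmax (what the cell above computes)
p_sum = torch.softmax(out.sum(dim=0), dim=0)

# Average the logits, then softmax
p_mean_logits = torch.softmax(out.mean(dim=0), dim=0)

# Softmax per crop, then average the probabilities
p_mean_probs = torch.softmax(out, dim=1).mean(dim=0)

print('sum of logits:   ', p_sum)
print('mean of logits:  ', p_mean_logits)
print('mean of softmax: ', p_mean_probs)
```

The first two variants always agree on the ranking, but the summed version reports a noticeably higher top-1 confidence. Which averaging variant is preferable is a design choice; the task statement only asks for the scores to be averaged.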

Advantages:
- The prediction relies on multiple different views of the same image instead of a single crop, so the final score is less sensitive to where the object sits in the frame and tends to be more reliable.

Disadvantages:
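- Longer prediction time, as each augmented image needs to be generated and processed (here, 5 crops go through the network instead of 1).
- A crop might not contain the object of interest, or might contain only part of it, and its scores can then pull the combined prediction toward an incorrect class.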
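
One way to check whether an individual crop is hurting the combined prediction is to compare each crop's top-1 class with the combined one. The sketch below uses made-up logits (5 crops, 3 classes) and a hypothetical helper `per_crop_top1`; with the real model, `out` would be the `(5, 1000)` tensor returned by `resnet(batch_t)`.

```python
import torch

def per_crop_top1(out):
    """Return the top-1 class index per crop for logits of shape (num_crops, num_classes)."""
    return out.argmax(dim=1)

# Made-up logits for 5 crops and 3 classes (illustrative only)
out = torch.tensor([[2.0, 1.0, 0.0],
                    [1.5, 1.0, 0.5],
                    [2.5, 0.5, 0.0],
                    [1.0, 1.2, 0.3],
                    [2.0, 0.8, 0.1]])

top1 = per_crop_top1(out)              # prediction of each crop on its own
combined = out.mean(dim=0).argmax()    # prediction after averaging the scores

# Positions of crops whose own prediction differs from the combined one
disagree = (top1 != combined).nonzero().flatten().tolist()

print('per-crop top-1:', top1.tolist())
print('combined top-1:', combined.item())
print('disagreeing crops:', disagree)
```

A crop that disagrees with the combined prediction is a candidate for having missed the object, although disagreement alone does not prove that.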