<a href="https://colab.research.google.com/github/ToniRV/MIT_6.862_Applied_Machine_Learning/blob/master/Miniplaces1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6.869 Miniplaces Challenge - Part 1

The Miniplaces challenge is a two-part challenge. Each part counts as one pset.
In the challenge, you will classify scenes into one of several categories (such as "desert" or "forest").

In this part, we'll use weights pretrained on a different dataset that is also used for scene classification. We'll examine how to visualize filters and feature maps to better understand how a neural net arrives at a decision about a particular scene.

Next week, you'll implement your own neural net to do scene classification, and try to improve it as much as you can.

# Requirements installation

First, let's install everything needed to run this notebook.

In [0]:
!pip install Pillow==4.1.1
!pip install -U image
!pip install opencv-python


from io import BytesIO
from IPython.display import clear_output, Image, display
import numpy as np
import PIL.Image
import cv2

We will load PyTorch, our main tool to play with neural networks. 

In [0]:
!pip install torch
!pip install torchvision

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
import torch.hub

from os.path import exists


# Loading Images and PyTorch models


Once we have loaded all the relevant libraries, we will load the model. We will begin with a scene classification model trained on the Places dataset with a ResNet-50 architecture.

![ResNet architecture](https://www.codeproject.com/KB/AI/1248963/resnet.png)



In [0]:
resnet = models.resnet50(num_classes=365)
# print(resnet)

In [0]:
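# Helper function to download files without wget
import requests

def download(url, fn=None):
    # Default to the last path component of the URL as the filename
    if fn is None:
        fn = url.split('/')[-1]
    r = requests.get(url)
    if r.status_code == 200:
        # Use a context manager so the file is closed after writing
        with open(fn, 'wb') as f:
            f.write(r.content)
        print("{} downloaded: {:.2f} KB".format(fn, len(r.content)/1024.))
    else:
        print("download failed (status {}): {}".format(r.status_code, url))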

In [0]:
# Download the pretrained weights

download('http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar')



We will load the pretrained weights into the model. 

In [0]:
# map_location='cpu' lets the checkpoint load on a machine without a GPU
sd = torch.load('resnet50_places365.pth.tar', map_location='cpu') # pytorch 1.1
sd = sd['state_dict']
# When a model is wrapped in nn.DataParallel for training, its parameter names are prefixed with "module."
# Our plain ResNet does not use that wrapper, so we strip the prefix before loading the state dict
sd = {k.replace('module.', ''): v for k, v in sd.items()}
resnet.load_state_dict(sd)
resnet.eval()
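
To see where the `module.` prefix comes from, here is a minimal sketch on a toy model. The helper `make_model` and the tiny two-layer network are hypothetical stand-ins for ResNet-50, not part of the pset. The sketch assumes the checkpoint was saved from a model wrapped in `nn.DataParallel`, which is the usual source of this prefix.

```python
import torch.nn as nn

# Hypothetical two-layer model standing in for ResNet-50
def make_model():
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# nn.DataParallel stores the real model under the attribute "module",
# so every key in its state dict gains a "module." prefix
wrapped = nn.DataParallel(make_model())
saved = wrapped.state_dict()
prefixed_keys = list(saved.keys())

# Strip only a leading "module." so the keys match an unwrapped model
prefix = 'module.'
cleaned = {(k[len(prefix):] if k.startswith(prefix) else k): v
           for k, v in saved.items()}

plain = make_model()
result = plain.load_state_dict(cleaned)  # raises if any key is missing or unexpected
print(prefixed_keys[0], '->', list(cleaned.keys())[0])
```

Stripping only a leading prefix is slightly more conservative than `k.replace('module.', '')`, which would also rewrite the substring if it appeared in the middle of a key.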



# Visualizing Network Filters

First, we will define a function to display images from numpy arrays. 

In [0]:
def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))



Now, we will focus on visualizing the filters of the ResNet network. Let's take a look at the first layer.

In [0]:
print(resnet.conv1.weight.data.size()) # Access convolutional filters
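
As a reading aid for the printed size, the sketch below builds a standalone convolution with the same hyperparameters as the first layer of torchvision's ResNet (7x7 kernels, 3 input channels, 64 output channels; the variable names are illustrative). It shows that the weight tensor is laid out as `(out_channels, in_channels, kernel_h, kernel_w)`, so indexing with `[i, 0, :, :]` selects the first input channel of filter `i`.

```python
import torch.nn as nn

# Standalone layer with the same hyperparameters as ResNet's first convolution
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7,
                 stride=2, padding=3, bias=False)

# Layout: (out_channels, in_channels, kernel_h, kernel_w)
print(conv.weight.shape)

# First input channel of filter 0: a single 7x7 slice
first_filter_slice = conv.weight.data[0, 0, :, :]
print(first_filter_slice.shape)
```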


Now, let's write a function to visualize the filters. You have to complete the following code, with one line normalizing the filter values:

In [0]:
def visualize_filters(conv_w,output_size = None):
    #TODO: Normalize conv_w values to 0-1 range  
    w_normalized = None
    map_t = 255*w_normalized
    map_t = map_t.numpy()
    map_t = map_t.astype(np.uint8)
    if output_size is not None:
        map_t = cv2.resize(map_t,(output_size,output_size))
    return map_t
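
One common way to bring values into the 0-1 range is min-max normalization. The sketch below is only one possible approach and is not necessarily the intended solution to the TODO; the function name `minmax_normalize` and the `eps` parameter are assumptions made for illustration.

```python
import torch

def minmax_normalize(t, eps=1e-10):
    # Shift so the minimum is 0, then scale so the maximum is (at most) 1.
    # eps guards against division by zero when all values are equal.
    return (t - t.min()) / (t.max() - t.min() + eps)

w = torch.tensor([[-0.5, 0.0], [0.25, 0.5]])
w01 = minmax_normalize(w)
print(w01)
```

Filter weights can be negative, which is why the minimum is subtracted before scaling rather than dividing by the maximum alone.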
 


We will display the filters of the initial convolutional layer:

In [0]:
for i in range(30):
  print('Visualizing conv1 filter',i)
  vis = visualize_filters(resnet.conv1.weight.data[i,0,:,:],50)
  showarray(vis)


## Exercise: Visualize filters for another convolutional layer in ResNet

In [0]:
for i in range(30):
  print('Visualizing conv2 filter',i)
  vis_conv2 = visualize_filters(resnet.layer3[0].conv2.weight.data[i,0,:,:],50)
  showarray(vis_conv2)


# Predicting classes with a pre-trained model


To make the results easier to read, we will load the label <-> index assignment for the Places dataset.

In [0]:
# Load labels
from urllib.request import urlopen

synset_url = 'http://gandissect.csail.mit.edu/models/categories_places365.txt'
classlabels = [r.split(' ')[0][3:] for r in urlopen(synset_url).read().decode('utf-8').split('\n')]
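
To make the parsing expression above easier to follow, here is the same expression applied to a few sample lines. The sample text is illustrative and assumes each line of the file has the form `/<letter>/<name> <index>`; the indices shown are placeholders, not the real ones.

```python
# Illustrative lines in the assumed "/<letter>/<name> <index>" format
sample = "/a/airfield 0\n/a/airplane_cabin 1\n/f/forest/broadleaf 2\n"

# split(' ')[0] keeps the path, and [3:] drops the leading "/<letter>/"
labels = [r.split(' ')[0][3:] for r in sample.split('\n')]
print(labels)

# A trailing newline yields one empty entry at the end; filter it if needed
labels_clean = [name for name in labels if name]
print(labels_clean)
```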


We will load one image to use throughout the pset.

In [0]:
from torchvision import transforms


download('http://6.869.csail.mit.edu/fa19/miniplaces_part1/rio.jpg')
img0 = PIL.Image.open('rio.jpg').convert('RGB')
  
img_numpy = np.array(img0)


showarray(img_numpy)

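First, let's take a look at the raw prediction of the model.

The model predicts Places365 scene categories (the `classlabels` list loaded above), not ImageNet classes. For reference, you can find the ImageNet classes here: https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a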

In [0]:
# Despite its name, this transform resizes the whole image to 227x227 rather than cropping it
center_crop = transforms.Compose([
    transforms.Resize((227,227)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

im = center_crop(img0)
# Add a batch dimension, run the model, and drop the batch dimension again
out = resnet(im.unsqueeze(0)).squeeze()
print(out.size())
# Indices of the 5 highest-scoring classes
categories = out.topk(5)[1]

print(categories)
for idx in categories:
    print(classlabels[idx])
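
The raw outputs of the model are unnormalized scores (logits). If you want probabilities, a softmax over the class dimension is the usual conversion. The sketch below uses hypothetical logits for a 4-class problem rather than the real 365-class output.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores (logits) for a 4-class problem
logits = torch.tensor([2.0, 0.5, -1.0, 3.0])

# Softmax turns logits into probabilities that sum to 1
probs = F.softmax(logits, dim=0)
top_probs, top_idx = probs.topk(2)
print(top_idx, top_probs)

# Softmax is monotonic, so the ranking matches the ranking of the raw logits
same_ranking = top_idx.tolist() == logits.topk(2)[1].tolist()
print(same_ranking)
```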



# Visualizing Internal Activations of the Network

Let's look at what parts of the image cause different units to activate (send some positive signal). All of these activations combine to inform the final inference. 

The convolutional layers of ResNet essentially build a semantic representation of what is contained in the image. This is followed by a global average-pooling layer and a single fully connected layer, which use the information from that representation to categorize the image.

So, let's remove the last two layers (which do classification) to get the underlying representation, and we'll visualize the activations that went into that representation from different units.

In [0]:
def generate_featuremap_unit(resnet,unit_id,im_input):
    #Extract activation from model
    #TODO: remove the last 2 layers of resnet 
    model_cut  = None
    # Mark the model as being used for inference
    model_cut.eval()
    # Crop the image
    im = center_crop(im_input)
    # Place the image into a batch of size 1, and use the model to get an intermediate representation
    out = model_cut(im.unsqueeze(0))
    # Print the shape of our representation
    print(out.size())
    # Extract the only result from this batch, and take just the `unit_id`th channel
    out_final = out.squeeze()[unit_id]
    # Return this channel
    return out_final
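
One common PyTorch idiom for dropping trailing layers is to rebuild the model from a slice of its child modules. The sketch below applies it to a hypothetical `TinyNet` (not ResNet) whose last two children are a pooling layer and a fully connected layer. It is an illustration of the idiom under those assumptions, not necessarily the intended solution to the TODO.

```python
import torch
import torch.nn as nn

# Hypothetical small network with the same overall layout as ResNet:
# convolutional features, then global average pooling, then one linear layer
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 5)

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x)

net = TinyNet().eval()

# children() yields submodules in registration order; drop the last two
trunk = nn.Sequential(*list(net.children())[:-2])

with torch.no_grad():
    feats = trunk(torch.randn(1, 3, 32, 32))

feature_shape = tuple(feats.shape)
print(feature_shape)  # batch, channels, height, width
```

Note that this only works when the forward pass is a plain chain of the child modules up to the cut point, since `nn.Sequential` ignores any extra logic in the original `forward`.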
    

    

In [0]:
def visualize_featuremap(im_input,feature_map,alpha=0.3):
    # Normalize so the maximum is 1; the small epsilon (1e-10) avoids division by zero if feature_map is all zeros
    feature_map = feature_map/(feature_map.max()+1e-10)
    # Convert to numpy (detach() just separates a tensor from the gradient graph)
    feat_numpy = feature_map.detach().numpy()
    # Resize the feature map to our original image size (our strided conv layers reduce the size of the image)
    feat_numpy = cv2.resize(feat_numpy,(im_input.shape[1],im_input.shape[0]))
    # Invert to make the heatmap look more natural
    map_t = 1-feat_numpy
    # Add an extra dimension to make this a [H,W,C=1] image 
    feat_numpy = np.expand_dims(feat_numpy, axis=2)
    
    # Convert to image (UINT8 from 0-255)
    map_t = 255*map_t
    map_t = map_t.astype(np.uint8)
    # Use a color map to change this from BW to a nice color
    map_t = cv2.applyColorMap(map_t, cv2.COLORMAP_JET)
    # Combine the heatmap with the original image so you can see which section of the image is activated
    im_final = np.multiply((alpha*im_input + (1-alpha)*map_t), feat_numpy) + np.multiply(im_input, 1-feat_numpy)
    # Return final visualization
    return im_final
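
The final blending step can be checked on a tiny example. The sketch below reproduces the same formula with NumPy on a hypothetical 2x2 image and heatmap (all values are made up for illustration): where the activation is 0 the original pixel is kept, and where it is 1 the pixel becomes the alpha blend of image and heatmap.

```python
import numpy as np

alpha = 0.3
image = np.full((2, 2, 3), 100.0)   # hypothetical uniform grey image
heat = np.full((2, 2, 3), 200.0)    # hypothetical uniform heatmap colour
# Activation in [0, 1] with a trailing channel axis so it broadcasts over RGB
act = np.array([[0.0, 1.0], [0.5, 0.25]])[:, :, None]

# Alpha blend of image and heatmap, applied in proportion to the activation
blend = alpha * image + (1 - alpha) * heat
result = blend * act + image * (1 - act)
print(result[:, :, 0])
```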


In [0]:
feat = generate_featuremap_unit(resnet,300,img0)
im_final = visualize_featuremap(img_numpy,feat)
showarray(im_final)

Exercise: Find other units that detect other relevant concepts in the image. 




In [0]:
#TODO: Find different units 

In [0]:
# (6.869 required) Find the unit index that has the maximum weight in the fully connected layer and deactivate that unit. Compare the original prediction and the new prediction
import matplotlib.pyplot as plt


out_original = resnet(im.unsqueeze(0)).squeeze() # original prediction
sorted_classes = np.argsort(-out_original.data.cpu().numpy())
class_ids = sorted_classes[:5][0]
print("top 1 class id:", class_ids)
# Torch.max will help - returns (maxvalue,maxindex)
_, index = torch.max(resnet.fc.weight[class_ids,:], 0) #find the unit index that has the maximum weights in the fully connected layer 



#TODO: remove the last 2 layers of resnet 
model_cut = None

# Get the representation for this model
out1 = model_cut(im.unsqueeze(0))
# Shape is now (1, # units, H, W)
#TODO: deactivate the unit that has the maximum weight (set all values for that unit to 0)
### DO SOMETHING HERE ###

out2 = resnet.fc(resnet.avgpool(out1).squeeze().unsqueeze(0)).squeeze()

def plot_top_classes(values, top_k=5, title = None):
  sorted_classes = np.argsort(-values)
  class_ids = sorted_classes[:top_k]
  class_names = [classlabels[it] for it in list(class_ids)]
  class_values = values[class_ids]
  print("{} top {} class names ".format(title, top_k), class_names)
  print("{} top {} class values ".format(title, top_k), class_values)
  plt.bar(class_names, class_values)
  plt.xticks(rotation=60)
  plt.title(title)

plt.figure(0)
plot_top_classes(out.data.cpu().numpy(), title = 'Original')
plt.figure(1)
plot_top_classes(out2.data.cpu().numpy(), title = 'Modified')
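
To build intuition for what deactivating a unit does, here is a minimal sketch on toy tensors. The feature map, the linear layer, and all sizes are hypothetical. Because global average pooling followed by a linear layer is linear in the features, zeroing one channel changes a class score by exactly that channel's weight times its mean activation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.rand(1, 6, 4, 4)   # hypothetical non-negative (post-ReLU-like) feature map
fc = nn.Linear(6, 3)                # hypothetical classifier on pooled features
class_id = 0

with torch.no_grad():
    # Unit with the largest weight for the chosen class
    unit = torch.argmax(fc.weight[class_id]).item()

    before = fc(features.mean(dim=(2, 3)))

    # Deactivate the unit by zeroing its whole channel
    ablated = features.clone()
    ablated[:, unit] = 0
    after = fc(ablated.mean(dim=(2, 3)))

    drop = (before - after)[0, class_id]
    expected = fc.weight[class_id, unit] * features[0, unit].mean()

print(float(drop), float(expected))
```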

# Visualizing model activations with Class Activation Maps (CAMs)


Now that we have loaded the image and the model, we will explore how to visualize the internal activations of the model. We will start by visualizing which parts of the image are responsible for the final decision.

![Class Activation Mapping framework](https://camo.githubusercontent.com/fb9a2d0813e5d530f49fa074c378cf83959346f7/687474703a2f2f636e6e6c6f63616c697a6174696f6e2e637361696c2e6d69742e6564752f6672616d65776f726b2e6a7067)



We create a version of the model without the last two layers, so that we can access the last convolutional layer.

In [0]:
#TODO: remove the last 2 layers of resnet 
model = None
model.eval()

We compute the activations using Class Activation Mapping for a given output label.

In [0]:
def generate_featuremap_CAM(model,unit_id,im_input):
    #Extract activation from model
    
    im = center_crop(im_input)
    model.eval()
    out = model(im.unsqueeze(0)) #1 x 2048 x h x w
    w = out.size(3)
    h = out.size(2)
    b = out.size(0)
    c = out.size(1)
    print(out.size())
    # print(b,c,h,w)
    # fc input: N x 2048
    
    #TODO: convert the shape of the output (out variable) to (h*w) x c 
    # The .view() function and .transpose() functions will help
    ### Do stuff here ###
    
    print(out.size())

    #TODO: Run the fully connected layer from resnet to compute the weighted average with out as the input variable
    out_final = None
    print(out_final.size())
    
    out_final = out_final.view(b,h*w,-1).transpose(1,2).view(b,-1,h,w)
    print(out_final.size())
    out_final = out_final.squeeze()[unit_id]
    # print(out_final.size())
    return out_final
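
The idea behind the reshaping can be sketched on toy tensors: apply the fully connected layer at every spatial location instead of after pooling, then fold the result back into a map per class. All sizes and the linear layer below are hypothetical, and this is an illustration of the technique rather than the intended solution to the TODOs. Since the layer is linear, averaging each class map over space recovers the usual class score.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
b, c, h, w, k = 1, 6, 4, 5, 3       # hypothetical batch, channels, height, width, classes
features = torch.randn(b, c, h, w)
fc = nn.Linear(c, k)

with torch.no_grad():
    # (b, c, h, w) -> (b, h*w, c): one c-dimensional vector per location
    flat = features.view(b, c, h * w).transpose(1, 2)
    # Linear layers act on the last dimension, giving (b, h*w, k)
    per_location = fc(flat)
    # Back to one (h, w) map per class: (b, k, h, w)
    cam = per_location.transpose(1, 2).reshape(b, k, h, w)

    # Standard path: global average pooling, then the linear layer
    logits = fc(features.mean(dim=(2, 3)))
    cam_matches_logits = torch.allclose(cam.mean(dim=(2, 3)), logits, atol=1e-5)

print(tuple(cam.shape), cam_matches_logits)
```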
    

    


We can visualize the most activated regions in the image for the top 5 classes.

In [0]:
for i in range(categories.shape[0]):
  print('Visualizing category',classlabels[categories[i]])
  feat = generate_featuremap_CAM(model, categories[i].item(),img0)
  im_result = visualize_featuremap(img_numpy,feat)
  showarray(im_result)
