# CSE527 Programming Assignment 4
**Due date: 23:59 on November 26th 2021**

This semester we will use Google Colab for the assignments, which gives us access to resources, such as GPUs, that some of us might not have on our local machines. You will need to use your Stony Brook (*.stonybrook.edu) account for coding and Google Drive to save your results.

## Google Colab Tutorial
---
Go to https://colab.research.google.com/notebooks/ and open the tutorial notebook "Welcome to Colaboratory", where you can learn the basics of using Google Colab.

Settings used for assignments: ***Edit -> Notebook Settings -> Runtime Type (Python 3)***.


## Description
---
You can train a deep network from scratch if you have enough data (it's not always obvious whether or not you do); if you don't, you can instead fine-tune a pre-trained network, as in this problem.

In Problem 1, you will fine-tune a pretrained ResNet and use it to classify JPL interaction video frames.

For Problem 2, you are going to use temporal pooling/convolution to classify the video files, using a similar pretrained network as your baseline model.



There are 2 problems in this homework with a total of 120 points including 20 bonus points. Be sure to read **Submission Guidelines** below. They are important. For the problems requiring text descriptions, you might want to add a markdown block for that.

## Dataset
---
Save the dataset into your working folder in your Google Drive for this homework. <br>
Under your root folder, there should be a folder named "data" (i.e. XXX/Surname_Givenname_SBUID/data) containing the images.
**Do not upload** the data subfolder when submitting on Blackboard, due to the size limit. There should be only one .ipynb file under your root folder Surname_Givenname_SBUID.

## Some Tutorials (PyTorch)
---
- You will be using PyTorch as the deep learning toolbox (follow the [link](http://pytorch.org) for installation).
- For PyTorch beginners, please read this [tutorial](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) before doing your homework.
- Feel free to study more tutorials at http://pytorch.org/tutorials/.
- Find cool visualizations at http://playground.tensorflow.org.




In [None]:
# import packages here
import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
import random 
import time

import torch
import torchvision
import torchvision.transforms as transforms

from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

In [None]:
# Mount your google drive where you've saved your assignment folder
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
ls

[0m[01;34mgdrive[0m/  [01;34msample_data[0m/


## Problem 1 Data Preparation and Fine-tuning
## First-Person Activity Recognition: What Are They Doing to Me?
In this part of the assignment, you will implement an activity classifier using the JPL dataset. You will use an ImageNet pre-trained CNN that serves as a feature extractor.
## About JPL dataset
This first-person dataset contains videos of interactions between humans and the observer. We attached a GoPro2 camera to the head of our humanoid model and asked human participants to interact with the humanoid by performing activities. To emulate the mobility of a real robot, we also placed wheels below the humanoid and had an operator move it by pushing it from behind. Videos were recorded continuously during human activities, and each video sequence contains 0 to 3 activities. The videos have 320×240 resolution at 30 fps.

There are 7 different types of activities in the dataset:
<ol>

### 4 positive (i.e., friendly) interactions with the observer:

<li> 'Shaking hands with the observer', <li> 'hugging the observer', <li> 'petting the observer', and <li> 'waving a hand to the observer'

### 1 neutral interaction:
<li> the situation where two persons have a conversation about the observer while occasionally pointing at it.

### 2 negative (i.e., hostile) interactions:
<li> 'Punching the observer' and <li> 'throwing objects to the observer'
</ol>
We therefore assign a label to each action, for example:


```
{
  'Shaking hands with the observer': 1, 
  'hugging the observer': 2, 
  'petting the observer': 3, 
  'waving a hand to the observer': 4,
  'the situation where two persons have a conversation about the observer while occasionally pointing at it': 5,
  'Punching the observer': 6,
  'throwing objects to the observer': 7
}
```
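The training code later derives each clip's label from its file name and subtracts 1, because `nn.CrossEntropyLoss` expects class indices in the range 0 to C-1. A minimal sketch of that mapping, assuming (from the file listing and `prepare_sets` below) that names follow `<participant>_<activity>.avi`, with an optional extra suffix as in `4_4_1.avi`, and that the activity number follows the numbering above; `parse_video_name` is a hypothetical helper, not part of the assignment:

```python
import os


def parse_video_name(filename):
    """Hypothetical helper: '4_4_1.avi' -> (participant=4, class_index=3).

    Assumes '<participant>_<activity>[_<suffix>]' naming. The 1-based
    activity id is shifted to a 0-based class index for nn.CrossEntropyLoss.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("_")
    participant = int(parts[0])
    class_index = int(parts[1]) - 1
    return participant, class_index


print(parse_video_name("10_7.avi"))
print(parse_video_name("4_4_1.avi"))
```

The participant number is what decides the train/test split further down.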



### Problem 1.0
## Loading the JPL dataset: 5 points
Download the segmented version from [here](https://drive.google.com/file/d/1eivyF3gPbS3ejea-NYebMBzS40xsRrqF/view?usp=sharing). 
Save the videos into your working folder in your Google Drive.
Under your root folder, there should be a folder named "data" (i.e. XXX/Surname_Givenname_SBUID/data) containing the jpl_vid directory, into which you should extract the JPL dataset. Do not upload the data subfolder when submitting on Blackboard, due to the size limit. There should be only one .ipynb file under your root folder Surname_Givenname_SBUID. 
In the first part of data preparation, we convert the videos into images: from each video we keep either all frames or `n_frames` equally spaced frames, and store them as .jpg files. The data folder therefore contains two subdirectories: jpl_vid and jpl_img. **We will delete the jpl_img directory from your data folder before evaluating.**
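Choosing which frames to keep is a small indexing problem. As an alternative to the stepping loop in `get_frames` below, the indices can be computed up front with `np.linspace`; a sketch, where `equally_spaced_indices` is a hypothetical helper (each index could then be passed to `vid.set(cv2.CAP_PROP_POS_FRAMES, i)` before a read):

```python
import numpy as np


def equally_spaced_indices(frame_count, n_frames=-1):
    """Return the indices of the frames to keep from a video.

    n_frames == -1 keeps every frame. Otherwise n_frames indices are spread
    evenly from the first to the last frame (inclusive). If the video is
    shorter than n_frames, every frame is kept (so fewer than n_frames).
    """
    if n_frames == -1 or n_frames >= frame_count:
        return list(range(frame_count))
    return np.linspace(0, frame_count - 1, n_frames).astype(int).tolist()


print(equally_spaced_indices(100, 5))  # [0, 24, 49, 74, 99]
print(equally_spaced_indices(4))       # [0, 1, 2, 3]
```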


In [None]:
%cd '/content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4'
%ls

/content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4
alexnet_trained_nov30_130_v2.pkl  alexnet_trained_v1.pkl    modelpa4q1_gpu.pkl
alexnet_trained_nov30.pkl         CSE527_PA4_fall_21.ipynb  modelpa4q1.pkl
alexnet_trained_nov30_v2.pkl      data/


In [None]:
# make directories (-p: no error if they already exist)
!mkdir -p ./data/jpl_vid ./data/jpl_img


In [None]:
#Unzip the main video file
!cp -r jpl_interaction_segmented_iyuv.zip ./data/jpl_vid
%cd ./data/jpl_vid
!unzip "jpl_interaction_segmented_iyuv.zip" 

In [None]:
!rm -rf /content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4/data/jpl_img

In [None]:
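# Use the absolute path: the working directory is still ./data/jpl_vid after the unzip cell
!mkdir -p /content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4/data/jpl_img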

In [None]:
#to save space remove the ... jpl_interaction_segmented_iyuv.zip file
!rm jpl_interaction_segmented_iyuv.zip

In [None]:
%cd '/content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4'

/content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4


In [None]:
# Add new imports here
import os
import sys
import glob
import random
import time

import cv2
import numpy as np
import matplotlib.pyplot as plt

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

cwd = os.getcwd()

In [None]:
os.getcwd()

'/content/gdrive/MyDrive/ColabNotebooks/CSE527CV/kumar_arun_114708780_pa4'

In [None]:
def get_frames(filename, n_frames=-1):
    #--------------------------------------------------
    #       Given the filename of a video, return its frames (RGB) and the number of frames returned.
    #       Example: the path /data/jpl_img/10_1/ should contain the frames from 10_1.avi
    #
    #       If n_frames is -1, return all the frames of the video.
    #       Otherwise, return only n_frames frames, equally spaced across the entire video.
    #       Frames are read with the CV2 library.
    #--------------------------------------------------
    print(filename)
    frames = []

    vid = cv2.VideoCapture(filename)
    frame_count = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))

    if n_frames == -1:
        jump = 1
        n_frames = frame_count        # keep every frame
    else:
        jump = frame_count / n_frames

    count = 0
    while vid.isOpened() and len(frames) < n_frames:
        ret, frame = vid.read()
        if not ret:
            break
        # OpenCV decodes frames as BGR; convert to RGB
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        count += jump
        vid.set(cv2.CAP_PROP_POS_FRAMES, int(count))
    vid.release()

    v_len = len(frames)
    print(v_len)

    return frames, v_len

def store_frames(frames, path2store):
    for ii, frame in enumerate(frames):
        # frames are RGB; cv2.imwrite expects BGR
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        path2img = os.path.join(path2store, "frame" + str(ii) + ".jpg")
        cv2.imwrite(path2img, frame)

In [None]:
path2data = "./data"
sub_folder = "jpl_vid"
sub_folder_jpg = "jpl_img"
path2video = os.path.join(path2data, sub_folder)
listOfCategories = os.listdir(path2video)
listOfCategories, len(listOfCategories)

(['1_1.avi',
  '1_2.avi',
  '1_3.avi',
  '1_4.avi',
  '1_5.avi',
  '1_6.avi',
  '1_7.avi',
  '2_1.avi',
  '2_2.avi',
  '2_3.avi',
  '2_4.avi',
  '2_5.avi',
  '2_6.avi',
  '2_7.avi',
  '3_1.avi',
  '3_2.avi',
  '3_3.avi',
  '3_5.avi',
  '3_6.avi',
  '3_7.avi',
  '4_1.avi',
  '4_2.avi',
  '4_3.avi',
  '4_4_1.avi',
  '4_4_2.avi',
  '4_5.avi',
  '4_6.avi',
  '4_7.avi',
  '5_1.avi',
  '5_2.avi',
  '5_3.avi',
  '5_4.avi',
  '5_5.avi',
  '5_6.avi',
  '5_7.avi',
  '6_1.avi',
  '6_2.avi',
  '6_3.avi',
  '6_4.avi',
  '6_5.avi',
  '6_7.avi',
  '7_1.avi',
  '7_2.avi',
  '7_3.avi',
  '7_4.avi',
  '7_5.avi',
  '7_7.avi',
  '8_1.avi',
  '8_2.avi',
  '8_3.avi',
  '8_5.avi',
  '8_7.avi',
  '8_6.avi',
  '7_6.avi',
  '6_6.avi',
  '8_4.avi',
  '9_7.avi',
  '9_6.avi',
  '9_4.avi',
  '9_1.avi',
  '9_3.avi',
  '9_5.avi',
  '9_2.avi',
  '10_7.avi',
  '10_6.avi',
  '10_4.avi',
  '10_1.avi',
  '10_2.avi',
  '10_5.avi',
  '10_3.avi',
  '11_7.avi',
  '11_6.avi',
  '11_4.avi',
  '11_1.avi',
  '11_3.avi',
  '11_5.a

In [None]:
n_frames = 30

In [None]:
extension = ".avi"
#--------------------------------------------------
#choose a value for n_frames below to optimize your solution. We might randomly choose n_frames while evaluating 
#--------------------------------------------------

for root, dirs, files in os.walk(path2video, topdown=False):
    for name in files:
        if extension not in name:
            continue
        path2vid = os.path.join(root, name)
        frames, vlen = get_frames(path2vid, n_frames= n_frames)
        print(vlen)
        path2store = path2vid.replace(sub_folder, sub_folder_jpg)
        path2store = path2store.replace(extension, "")
        print(path2store)
        os.makedirs(path2store, exist_ok= True)
        store_frames(frames, path2store)
    print("-"*50) 

./data/jpl_vid/1_1.avi
30
30
./data/jpl_img/1_1
./data/jpl_vid/1_2.avi
30
30
./data/jpl_img/1_2
./data/jpl_vid/1_3.avi
30
30
./data/jpl_img/1_3
./data/jpl_vid/1_4.avi
30
30
./data/jpl_img/1_4
./data/jpl_vid/1_5.avi
30
30
./data/jpl_img/1_5
./data/jpl_vid/1_6.avi
30
30
./data/jpl_img/1_6
./data/jpl_vid/1_7.avi
30
30
./data/jpl_img/1_7
./data/jpl_vid/2_1.avi
30
30
./data/jpl_img/2_1
./data/jpl_vid/2_2.avi
30
30
./data/jpl_img/2_2
./data/jpl_vid/2_3.avi
30
30
./data/jpl_img/2_3
./data/jpl_vid/2_4.avi
30
30
./data/jpl_img/2_4
./data/jpl_vid/2_5.avi
30
30
./data/jpl_img/2_5
./data/jpl_vid/2_6.avi
30
30
./data/jpl_img/2_6
./data/jpl_vid/2_7.avi
30
30
./data/jpl_img/2_7
./data/jpl_vid/3_1.avi
30
30
./data/jpl_img/3_1
./data/jpl_vid/3_2.avi
30
30
./data/jpl_img/3_2
./data/jpl_vid/3_3.avi
30
30
./data/jpl_img/3_3
./data/jpl_vid/3_5.avi
30
30
./data/jpl_img/3_5
./data/jpl_vid/3_6.avi
30
30
./data/jpl_img/3_6
./data/jpl_vid/3_7.avi
30
30
./data/jpl_img/3_7
./data/jpl_vid/4_1.avi
30
30
./data/jpl_

## Training, Test and Validation set
**Training set:**
Participants 1-9


**Test set:**
Participants 10-12 (also used as the validation set in the training loops below)

In [None]:
def prepare_sets(path2ajpgs):
    listOfCats = os.listdir(path2ajpgs)
    train_id = []
    train_label = []
    test_id = []
    test_label = []
    for i in range(len(listOfCats)):
      x = listOfCats[i].split("_")
      if(int(x[0]) > 9):
        test_id.append(listOfCats[i])
        test_label.append(int(x[1])-1)
      else:
        train_id.append(listOfCats[i])
        train_label.append(int(x[1])-1)
      
    return train_id, train_label, test_id, test_label

In [None]:
path2jpg = os.path.join(path2data, sub_folder_jpg)

In [None]:
train_ids, train_labels, test_ids, test_labels = prepare_sets(path2jpg)

In [None]:
test_labels

[6, 5, 3, 0, 1, 4, 2, 6, 5, 3, 0, 2, 4, 1, 6, 5, 3, 0, 2, 4, 1]

In [None]:
from torch.utils.data import Dataset, DataLoader, Subset
import glob
import os
from PIL import Image
import torch
import numpy as np
import random
np.random.seed(2020)
random.seed(2020)
torch.manual_seed(2020)

def frame_number(path):
    # ".../frame12.jpg" -> 12
    return int(os.path.splitext(os.path.basename(path))[0].replace("frame", ""))

class VideoDataset(Dataset):
    def __init__(self, ids, labels, transform):
        self.transform = transform
        self.ids = ids
        self.labels = labels

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        path2imgs = glob.glob(path2jpg + "/" + self.ids[idx] + "/*.jpg")
        # glob returns files in arbitrary order; sort numerically so the frames
        # stay in temporal order (frame2 before frame10)
        path2imgs = sorted(path2imgs, key=frame_number)
        if n_frames > 0:
            path2imgs = path2imgs[:n_frames]
        label = self.labels[idx]
        frames = [Image.open(p2i) for p2i in path2imgs]

        # Re-seed before each frame so random transforms act identically on every
        # frame of a clip (torchvision >= 0.8 transforms draw from torch's RNG,
        # so this only affects transforms that use `random`/`numpy`).
        seed = np.random.randint(1e9)
        frames_tr = []
        for frame in frames:
            random.seed(seed)
            np.random.seed(seed)
            frames_tr.append(self.transform(frame))
        if len(frames_tr) > 0:
            frames_tr = torch.stack(frames_tr)   # [T, C, H, W]
        return frames_tr, label

### Problem 1.1
## Dataloader: 5 points
We now wrap the `VideoDataset` instances for the training and test sets in PyTorch `DataLoader`s, which batch the clips for the network.
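PyTorch's default collate function stacks the `[T, C, H, W]` tensors returned by `__getitem__` into a `[B, T, C, H, W]` batch, which only works when every clip has the same number of frames (one reason to sample a fixed `n_frames`). A sketch with a hypothetical `ToyClips` dataset standing in for `VideoDataset`:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyClips(Dataset):
    """Hypothetical stand-in for VideoDataset: each item is (frames, label)
    with frames shaped [T, C, H, W], like the tensor built by torch.stack."""

    def __init__(self, n_items=5, n_frames=4, size=8):
        self.n_items, self.n_frames, self.size = n_items, n_frames, size

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        frames = torch.randn(self.n_frames, 3, self.size, self.size)
        return frames, idx % 7


loader = DataLoader(ToyClips(), batch_size=2, shuffle=False)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([2, 4, 3, 8, 8]) torch.Size([2])
```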

In [None]:
train_ds = VideoDataset(ids=train_ids, labels=train_labels,
                        transform=transforms.Compose([
                            transforms.ToTensor(),
                            transforms.Resize((224, 224), transforms.InterpolationMode.BILINEAR),
                            transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                 std=(0.229, 0.224, 0.225))]))

print(len(train_ds))

63


In [None]:
test_ds = VideoDataset(ids=test_ids, labels=test_labels,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Resize((224, 224), transforms.InterpolationMode.BILINEAR),
                           transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                std=(0.229, 0.224, 0.225))]))
print(len(test_ds))

21


In [None]:
#train_combine = ConcatDataset[train_ds_flip, train_ds]

In [None]:
#--------------------------------------------------
#create dataloader for all the datasets(train and test) 
#--------------------------------------------------
train_dl = DataLoader(train_ds, batch_size=7, shuffle=True, num_workers=1)
test_dl  = DataLoader(test_ds , batch_size=7, shuffle=False, num_workers=1)

In [None]:
#train_combine = [d for dl in [train_dl, train_dl_flip] for d in dl]

In [None]:
print(train_labels)

[0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 4, 5, 6, 0, 1, 2, 3, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 6, 0, 1, 2, 3, 4, 6, 0, 1, 2, 4, 6, 5, 5, 5, 3, 6, 5, 3, 0, 2, 4, 1]


In [None]:
print(test_labels)

[6, 5, 3, 0, 1, 4, 2, 6, 5, 3, 0, 2, 4, 1, 6, 5, 3, 0, 2, 4, 1]


In [None]:
# phase 0: training set, phase 1: test set (used for validation)
dataloaders = [train_dl, test_dl]

In [None]:
# derive the sizes from the datasets instead of hard-coding them ([63, 21])
dataset_sizes = [len(train_ds), len(test_ds)]

In [None]:
for xb,yb in train_dl:
    print(xb.shape, yb.shape)
    break

torch.Size([7, 30, 3, 224, 224]) torch.Size([7])


## Problem 1.2: Fine Tuning a Pre-Trained Deep Network: 40 points
The representations learned by deep convolutional networks generalize surprisingly well to other recognition tasks. 

But how do we use an existing deep network for a new recognition task? Take, for instance, the [ResNet network](https://pytorch.org/docs/stable/_modules/torchvision/models/resnet.html) [(paper)](https://arxiv.org/abs/1512.03385).


**Hints**:
- Many pre-trained models are available in PyTorch [here](http://pytorch.org/docs/master/torchvision/models.html).
- For fine-tuning pretrained network using PyTorch, please read this [tutorial](http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).

### Problem 1.2.1

## *Fine-tune* an existing network: 30 points
In this scenario, you take an existing network, replace the final layer (or more) with random weights, and train the entire network again with images and ground truth labels for your recognition task. You are effectively treating the pre-trained deep network as a better initialization than the random weights used when training from scratch. When you don't have enough training data to train a complex network from scratch (as here, with 7 classes and only 63 training videos), this is an attractive option. In [this paper](http://www.cc.gatech.edu/~hays/papers/deep_geo.pdf) from CVPR 2015, there wasn't enough data to train a deep network from scratch, but fine-tuning led to 4 times higher accuracy than using off-the-shelf networks directly.
You are required to implement the above strategy to fine-tune a pre-trained **ResNet** for this video frame classification task with 7 classes.



In [None]:
import copy
import os
import time

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
from torchvision import datasets, models, transforms


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
#import resnet
import torchvision
r3d_18_pre = torchvision.models.video.r3d_18(pretrained=True)

In [None]:
#--------------------------------------------------
#       Fine-Tune Pretrained Network
#--------------------------------------------------
r3d_18_pre = r3d_18_pre.to(device)
criterion = nn.CrossEntropyLoss()
optimizer_ft = optim.Adam(r3d_18_pre.parameters(), lr= 0.001)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
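With `step_size=7` and `gamma=0.1`, `StepLR` divides the learning rate by 10 every 7 epochs (one `scheduler.step()` per epoch, as in the training loops below), so over 25 epochs it falls from 1e-3 to 1e-6; this may partly explain why the losses change little in later epochs. A sketch that records the schedule using a dummy parameter:

```python
import torch
from torch import optim
from torch.optim import lr_scheduler

# A single dummy parameter is enough to drive the optimizer and scheduler.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.Adam([param], lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(25):
    lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
    optimizer.step()   # param.grad is None, so no update happens
    scheduler.step()   # once per epoch

print(lrs[0], lrs[7], lrs[14], lrs[21])
```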

In [None]:
r3d_18_pre

VideoResNet(
  (stem): BasicStem(
    (0): Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3), bias=False)
    (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (conv2): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (relu): ReLU(inplace=True)
    )
    (1): BasicBlock(
      (conv1): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1):

In [None]:
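# Replace the stem convolution: the clips arrive as [B, T, C, H, W], so Conv3d
# reads the n_frames frames as input channels (and the 3 colour channels as the
# time axis). The pretrained stem weights are discarded by this replacement.
r3d_18_pre.stem[0] = nn.Conv3d(n_frames, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3), bias=False)
# Replace the final fully connected layer with a new 7-way classifier
r3d_18_pre.fc = nn.Linear(in_features=512, out_features=7, bias=True)
r3d_18_pre = r3d_18_pre.to(device)

# Re-create the optimizer and scheduler AFTER replacing the layers: the optimizer
# built earlier only holds the old layers' parameters, so the new stem and fc
# would otherwise never be updated.
optimizer_ft = optim.Adam(r3d_18_pre.parameters(), lr=0.001)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

# Optional: freeze all but the last four children (stem..layer2 fixed)
# for child in list(r3d_18_pre.children())[:-4]:
#     for param in child.parameters():
#         param.requires_grad = False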



### Problem 1.2.2
### Training and Testing your fine-tuned Network: 10 points
You will fine-tune your network using every frame in the video as a sample with the class label. Use train_dl and test_dl and feed them to your fine-tuned network. Please provide detailed descriptions of:<br>
(1) which layers of ResNet have been replaced<br>
(2) the architecture of the new layers added including activation methods <br>
(3) the final accuracy on test set <br>

In [None]:
dataset_sizes

[63, 21]

In [None]:
#--------------------------------------------------
#       Train your model
#--------------------------------------------------

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    model.to(device)

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in range(len(dataloaders)):
            if phase == 0:
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                

                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 0):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    #loss.requires_grad = True

                    # backward + optimize only if in training phase
                    if phase == 0:
                        loss.backward()
                        optimizer.step()

  

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 0:
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 1 and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

In [None]:
model_ft = train_model(r3d_18_pre, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)

Epoch 0/24
----------
0 Loss: 1.6657 Acc: 0.3492
1 Loss: 4.3657 Acc: 0.1905

Epoch 1/24
----------
0 Loss: 0.6963 Acc: 0.6984
1 Loss: 6.8464 Acc: 0.1429

Epoch 2/24
----------
0 Loss: 0.4383 Acc: 0.8889
1 Loss: 5.2877 Acc: 0.2381

Epoch 3/24
----------
0 Loss: 0.2417 Acc: 0.9048
1 Loss: 2.4201 Acc: 0.4286

Epoch 4/24
----------
0 Loss: 0.1557 Acc: 0.9524
1 Loss: 1.6532 Acc: 0.4286

Epoch 5/24
----------
0 Loss: 0.0630 Acc: 1.0000
1 Loss: 2.4974 Acc: 0.2857

Epoch 6/24
----------
0 Loss: 0.0945 Acc: 0.9841
1 Loss: 1.6982 Acc: 0.3810

Epoch 7/24
----------
0 Loss: 0.0422 Acc: 1.0000
1 Loss: 1.1683 Acc: 0.5714

Epoch 8/24
----------
0 Loss: 0.0381 Acc: 1.0000
1 Loss: 0.9356 Acc: 0.6190

Epoch 9/24
----------
0 Loss: 0.0180 Acc: 1.0000
1 Loss: 0.8567 Acc: 0.6667

Epoch 10/24
----------
0 Loss: 0.0145 Acc: 1.0000
1 Loss: 0.8191 Acc: 0.7143

Epoch 11/24
----------
0 Loss: 0.0230 Acc: 1.0000
1 Loss: 0.8137 Acc: 0.7143

Epoch 12/24
----------
0 Loss: 0.0247 Acc: 1.0000
1 Loss: 0.8261 Acc: 0.71

In [None]:
#model_ft = train_model(r3d_18_pre, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)
#0 is training set
#1 is validation set

In [None]:
#Saving the model:
import pickle
pickle.dump(model_ft, open("./modelpa4q1.pkl", "wb"))

In [None]:
import pickle
pickle.dump(model_ft, open("./modelpa4q1_gpu.pkl", "wb"))
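Pickling the whole module, as above, stores a reference to the model's class along with tensors tied to the device they lived on, so reloading it later needs the same code and typically a GPU. PyTorch's serialization guide recommends saving the `state_dict` instead. A sketch with a hypothetical `make_model` standing in for the fine-tuned network:

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical small model standing in for model_ft.
    return nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 7))


model = make_model()
path = os.path.join(tempfile.gettempdir(), "model_state.pth")

# Save only the learned tensors, not the pickled module object.
torch.save(model.state_dict(), path)

# To restore: rebuild the same architecture, then load the weights.
# map_location="cpu" lets a checkpoint written on a GPU load on a CPU-only runtime.
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(),
                                              restored.state_dict().values()))
print(same)  # True
```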

In [None]:
ls

CSE527_PA4_fall_21.ipynb  jpl_interaction_segmented_iyuv.zip  modelpa4q1.pkl
[0m[01;34mdata[0m/                     modelpa4q1_gpu.pkl


## Problem 2: Video Classification
### Previous Implementation
This dataset was released as part of a [paper](http://michaelryoo.com/papers/cvpr2013_ryoo.pdf) in CVPR 2013. The paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers the temporal structures displayed in first-person activity videos. As stated in the paper, *We first introduce video features designed to capture global motion (Subsection 3.1) and local motion (Subsection 3.2) observed during humans’ various interactions with the observer. Next, in Subsection 3.3, we cluster features to form visual words and obtain histogram representations. In Subsection 3.4, multi-channel kernels are described.* These features were then used as input to an SVM classifier.

### Using CNNs
In this approach to video classification, we apply an image classifier to every single frame of the video. We then merge the per-frame feature vectors using a fusion layer, which needs to be built into the network itself. A fusion layer merges the outputs of separate networks that operate on temporally distant frames; it is normally implemented with max pooling, average pooling, or flattening. A fully connected layer then produces the output.
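The three fusion options differ only in how the time axis is collapsed. A sketch on a dummy feature tensor (the shapes, 30 frames and 4096-d features as in AlexNet's penultimate layer, are illustrative): mean and max pooling produce a fixed-size vector regardless of clip length but discard frame order, while flattening keeps the order but makes the following fully connected layer depend on a fixed number of frames.

```python
import torch

B, T, D = 2, 30, 4096             # batch, frames per clip, per-frame feature size
per_frame = torch.randn(B, T, D)  # e.g. features from a 2D CNN applied to each frame

mean_fused = per_frame.mean(dim=1)             # [B, D]    average pooling over time
max_fused = per_frame.max(dim=1).values        # [B, D]    max pooling over time
concat_fused = per_frame.flatten(start_dim=1)  # [B, T*D]  flattening keeps frame order

print(mean_fused.shape, max_fused.shape, concat_fused.shape)
```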



### Problem 2.1
### Temporal Pooling: 20 points
As suggested in this [paper](https://arxiv.org/abs/1503.08909), we position the temporal pooling layer right before the first fully connected layer, as illustrated in the paper. This layer performs either mean-pooling or max-pooling across all video frames. The structure of the CNN component is identical to the single-frame model. This network is able to collect all the spatial features in a given time window; however, the order of the temporal events is lost due to the nature of pooling across frames.


In [None]:
import torch
pretrained = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0


In [None]:
pretrained

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

In [None]:
pretrained.classifier[6] = nn.Linear(in_features=4096, out_features=7, bias=True)

In [None]:
dataset_sizes = []
dataset_sizes.append(len(train_ids))
dataset_sizes.append(len(test_ids))
print(dataset_sizes)

[63, 21]


In [None]:
#alexnet_pretrained.features[0] = nn.Conv2d(48, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))

In [None]:
# Train the AlexNet: hyperparameters
learning_rate = 0.001

# Defining the Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(pretrained.parameters(), lr=learning_rate)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

In [None]:
#--------------------------------------------------
#       Train your model
#--------------------------------------------------

def train_alexnet_model(model, criterion, optimizer, scheduler, num_epochs=25):

    since = time.time()
    model.to(device)

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in range(len(dataloaders)):
            if phase == 0:
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0
            running_total = 0  # number of frames seen in this phase

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                # inputs: [B, T, C, H, W] -> treat every frame as one sample
                t = inputs.size(1)
                inputs = inputs.flatten(0, 1).to(device)         # [B*T, C, H, W]
                labels = labels.repeat_interleave(t).to(device)  # one label per frame

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 0):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 0:
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
                running_total += inputs.size(0)
            if phase == 0:
                scheduler.step()

            epoch_loss = running_loss / running_total
            epoch_acc = running_corrects.double() / running_total

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 1 and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

    #print()

  

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)

    return model
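Because `train_alexnet_model` trains and scores individual frames (labels are repeated once per frame), its reported accuracy is per frame rather than per video. A hedged sketch of turning the same frame classifier into a video classifier at test time by averaging the frame-level softmax scores of each clip (late fusion); `video_level_accuracy` is a hypothetical helper, exercised here on a toy model and random clips:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def video_level_accuracy(model, loader, device="cpu"):
    """Average per-frame softmax scores over each clip and score the clip.

    Assumes loader yields (clips [B, T, C, H, W], labels [B]) and that model
    maps a batch of frames [N, C, H, W] to class logits [N, K].
    """
    model.eval()
    correct, total = 0, 0
    for clips, labels in loader:
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1).to(device)              # [B*T, C, H, W]
        probs = model(frames).softmax(dim=1).view(b, t, -1)  # [B, T, K]
        preds = probs.mean(dim=1).argmax(dim=1).cpu()        # one label per clip
        correct += (preds == labels).sum().item()
        total += b
    return correct / total


# Toy check with a hypothetical frame classifier and random clips.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 7))
toy_data = TensorDataset(torch.randn(6, 4, 3, 8, 8), torch.randint(0, 7, (6,)))
acc = video_level_accuracy(toy_model, DataLoader(toy_data, batch_size=3))
print(acc)
```

On the real data this would be called as `video_level_accuracy(pre_trained2, test_dl, device)`.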

In [None]:
dataset_sizes

[63, 21]

In [None]:
ct = 0

for child in pretrained.children():
  ct = ct+1
  if ct <= 1:
    print(ct)
    print(child)
    for param in child.parameters():
      param.requires_grad = False

1
Sequential(
  (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
  (1): ReLU(inplace=True)
  (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (4): ReLU(inplace=True)
  (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU(inplace=True)
  (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU(inplace=True)
  (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)


In [None]:
pre_trained2 = train_alexnet_model(pretrained, criterion, optimizer, exp_lr_scheduler, num_epochs=25)

Epoch 0/24
----------
0 Loss: 2.1141 Acc: 0.1608
1 Loss: 1.8102 Acc: 0.2873
Epoch 1/24
----------
0 Loss: 1.6934 Acc: 0.3376
1 Loss: 1.6809 Acc: 0.3413
Epoch 2/24
----------
0 Loss: 1.5672 Acc: 0.3942
1 Loss: 1.6179 Acc: 0.4127
Epoch 3/24
----------
0 Loss: 1.3974 Acc: 0.4884
1 Loss: 1.5720 Acc: 0.4159
Epoch 4/24
----------
0 Loss: 1.3143 Acc: 0.4942
1 Loss: 1.5159 Acc: 0.4937
Epoch 5/24
----------
0 Loss: 1.2745 Acc: 0.5228
1 Loss: 1.5036 Acc: 0.4317
Epoch 6/24
----------
0 Loss: 1.1885 Acc: 0.5677
1 Loss: 1.4547 Acc: 0.4873
Epoch 7/24
----------
0 Loss: 1.0925 Acc: 0.6233
1 Loss: 1.4485 Acc: 0.5048
Epoch 8/24
----------
0 Loss: 1.0613 Acc: 0.6471
1 Loss: 1.4443 Acc: 0.5206
Epoch 9/24
----------
0 Loss: 1.0689 Acc: 0.6471
1 Loss: 1.4429 Acc: 0.5095
Epoch 10/24
----------
0 Loss: 1.0654 Acc: 0.6497
1 Loss: 1.4416 Acc: 0.5127
Epoch 11/24
----------
0 Loss: 1.0404 Acc: 0.6741
1 Loss: 1.4390 Acc: 0.5048
Epoch 12/24
----------
0 Loss: 1.0314 Acc: 0.6772
1 Loss: 1.4365 Acc: 0.5048
Epoch 13/

In [None]:
import torch
# Note: despite the variable name `resnet`, this loads an ImageNet-pretrained GoogLeNet
resnet = torch.hub.load('pytorch/vision:v0.10.0', 'googlenet', pretrained=True)

Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0
Downloading: "https://download.pytorch.org/models/googlenet-1378be20.pth" to /root/.cache/torch/hub/checkpoints/googlenet-1378be20.pth


In [None]:
resnet

GoogLeNet(
  (conv1): BasicConv2d(
    (conv): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (maxpool1): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
  (conv2): BasicConv2d(
    (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv3): BasicConv2d(
    (conv): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  )
  (maxpool2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
  (inception3a): Inception(
    (branch1): BasicConv2d(
      (conv): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track

In [None]:
# Replace the 1000-way ImageNet head with a 7-way head (one output per interaction class)
resnet.fc = nn.Linear(in_features=1024, out_features=7, bias=True)

In [None]:
ct = 0

# Print the first six child modules and make sure their parameters are trainable
for child in resnet.children():
    ct = ct + 1
    if ct <= 6:
        print(ct, "------------------")
        print(child)
        for param in child.parameters():
            param.requires_grad = True

1 ------------------
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
2 ------------------
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
3 ------------------
ReLU(inplace=True)
4 ------------------
MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
5 ------------------
Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (1): BasicBlock(
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): R

In [None]:
# Save the model (the `with` block closes the file handle when done)
import pickle
with open("./alexnet_trained_nov30_130_v2.pkl", "wb") as f:
    pickle.dump(pre_trained2, f)
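Pickling the whole module ties the file to the exact class definitions and module layout present when it was saved. The approach recommended in the PyTorch serialization documentation is to save only the `state_dict()` with `torch.save`, then load it into a freshly constructed model. This is a minimal sketch, using a small `nn.Linear` as a stand-in for the fine-tuned AlexNet and a hypothetical file name in a temporary directory:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(16, 7)  # stand-in for the fine-tuned network
path = os.path.join(tempfile.mkdtemp(), "finetuned_weights.pt")  # hypothetical file name

# Save only the learned tensors, not the Python object.
torch.save(model.state_dict(), path)

# To restore, rebuild the same architecture first, then load the weights into it.
restored = nn.Linear(16, 7)
restored.load_state_dict(torch.load(path))

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values()))
print(weights_match)  # True
```

For the real model, the "same architecture" means an AlexNet whose last classifier layer has already been replaced with the 7-way layer before `load_state_dict` is called.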

In [None]:
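def removeLayer(alex_net):
    # Drop the final output layer of the classifier so the network returns the
    # 4096-dimensional activations of the preceding layers as features.
    # Note: this modifies the model passed in (in place) and also returns it.
    new_classifier = nn.Sequential(*list(alex_net.classifier.children())[:-1])
    alex_net.classifier = new_classifier
    return alex_net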
#alex_net = models.alexnet(pretrained=True)
#print(alex_net)
alex_net = removeLayer(pre_trained2)
print(alex_net)
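Because `nn.Sequential` supports slicing, `alex_net.classifier[:-1]` builds the same truncated head in one step. This is a minimal sketch with a scaled-down stand-in classifier. The layer sizes are hypothetical and chosen so the example runs quickly; in the notebook the head starts with `Linear(9216, 4096)`:

```python
import torch
import torch.nn as nn

# Scaled-down stand-in with the same layer pattern as AlexNet's classifier.
classifier = nn.Sequential(
    nn.Dropout(0.5), nn.Linear(64, 32), nn.ReLU(inplace=True),
    nn.Dropout(0.5), nn.Linear(32, 32), nn.ReLU(inplace=True),
    nn.Linear(32, 7),
)

# Slicing an nn.Sequential returns a new nn.Sequential sharing the same layers.
feature_head = classifier[:-1]
feature_head.eval()  # disable dropout for deterministic features

with torch.no_grad():
    feats = feature_head(torch.randn(5, 64))
print(feats.shape)  # torch.Size([5, 32])
```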

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

In [None]:
alex_net=pre_trained2

In [None]:
torch.cuda.empty_cache()
import gc
gc.collect()

834

In [None]:
alex_net

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
 

In [None]:
# Extract features:
def extract_imagenet_feature(dl):
  # Returns a tensor of shape (num_videos, n_frames, feature_dim) holding one
  # feature vector per frame, computed with the truncated alex_net.
  feature = []
  alex_net.eval()  # disable dropout so the extracted features are deterministic

  with torch.no_grad():  # no autograd graph is needed for feature extraction
    for videos, labels in dl:
      for i in range(len(videos)):
        frames = videos[i].to(device)  # (n_frames, C, H, W); .to() is not in-place
        output = alex_net(frames)
        feature.append(output)

  feature_tensor = torch.stack(feature)

  return feature_tensor


In [None]:
train_feauture = extract_imagenet_feature(dataloaders[0])
test_feauture = extract_imagenet_feature(dataloaders[1])
print(len(train_feauture), train_feauture.shape)

63 torch.Size([63, 30, 4096])


In [None]:
#--------------------------------------------------
#       Utilities for Temporal pooling
#--------------------------------------------------

In [None]:
from torch.nn import MaxPool3d, AvgPool3d
from torch.nn import MaxPool2d, AvgPool2d

In [None]:
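# Note: on a 3-D (videos, frames, features) tensor, AvgPool2d(30, 1) slides a 30x30
# window over (frames, features), so it also averages 30 neighbouring feature
# dimensions (4096 -> 4067); AvgPool2d(kernel_size=(30, 1)) would average over frames only.
pool = AvgPool2d(30,1)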
#train_feature_tensor = torch.stack(train_feature)
train_feature_pool = pool(train_feauture)
test_feature_pool = pool(test_feauture)

print(train_feature_pool.shape, test_feature_pool.shape)

# Convert the pooled torch tensors to NumPy arrays
test_feature_np = test_feature_pool.cpu().detach().numpy()
train_feature_np = train_feature_pool.cpu().detach().numpy()

train_feature_np = train_feature_np[:,0,:]
test_feature_np = test_feature_np[:,0,:]

print(train_feature_np.shape, test_feature_np.shape)

#Extract labels
test_labels_np = np.array(test_labels)
train_labels_np = np.array(train_labels)

torch.Size([63, 1, 4067]) torch.Size([21, 1, 4067])
(63, 4067) (21, 4067)
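The 4067 in the shapes above comes from how `AvgPool2d` treats a 3-D input: it reads it as an unbatched (C, H, W) tensor and applies the square 30×30 kernel with stride 1, giving 4096 − 30 + 1 = 4067 feature columns. This is a minimal sketch on random stand-in features, comparing that with pooling over the frame axis only (average and max):

```python
import torch
from torch.nn import AvgPool2d

n_videos, n_frames, dim = 4, 30, 4096
features = torch.randn(n_videos, n_frames, dim)  # (videos, frames, features)

# A 3-D input is treated as (C, H, W): kernel 30 -> a 30x30 window over (frames, features).
square = AvgPool2d(30, 1)(features)
# A (30, 1) window covers all frames but only one feature at a time.
frames_only = AvgPool2d(kernel_size=(n_frames, 1))(features)

# Equivalent reductions over the frame axis, without the extra singleton dimension.
mean_pooled = features.mean(dim=1)
max_pooled = features.max(dim=1).values  # temporal max pooling

print(square.shape)       # torch.Size([4, 1, 4067])
print(frames_only.shape)  # torch.Size([4, 1, 4096])
print(mean_pooled.shape, max_pooled.shape)
```

`features.mean(dim=1)` gives the same numbers as the `(30, 1)` kernel and already drops the singleton axis, so the `[:, 0, :]` indexing step is no longer needed.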


In [None]:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [None]:
print(train_labels_np.shape, train_feature_np.shape)

(63,) (63, 4067)


In [None]:
train_labels_np

array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 4, 5, 6, 0, 1,
       2, 3, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 6, 0, 1, 2,
       3, 4, 6, 0, 1, 2, 4, 6, 5, 5, 5, 3, 6, 5, 3, 0, 2, 4, 1])

In [None]:
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Fit a classifier on the temporally pooled video features and report its
# accuracy on the training and test videos.
def modelfit(seed=0):

  # Other classifiers that were tried (uncomment one to swap it in):
  #clf = KNeighborsClassifier(n_neighbors=10, algorithm="kd_tree")
  #clf = DecisionTreeClassifier(max_depth=8, random_state=seed)
  #clf = XGBClassifier(learning_rate=0.01, max_depth=5)
  clf = svm.SVC(kernel="poly", degree=2, coef0=0, gamma='scale', tol=1e-5,
                max_iter=50000, shrinking=True, decision_function_shape='ovr')

  clf.fit(train_feature_np, train_labels_np)

  def calculate(dataset, labels, split_name):
    # Predict every video at once; dataset has shape (num_videos, feature_dim)
    label_pred = clf.predict(dataset)
    accuracy = np.mean(np.asarray(labels) == label_pred)
    print("The accuracy of model is", split_name, "{:.2f}%".format(accuracy * 100))

  calculate(train_feature_np, train_labels_np, "Train")
  calculate(test_feature_np, test_labels_np, "Test")

modelfit()

The accuracy of model is Train 90.48%
The accuracy of model is Test 38.10%
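With only 63 training and 21 test videos, a single train/test split gives a coarse accuracy estimate: each test video is worth about 4.8 percentage points. The notebook already imports `LeaveOneOut`, `cross_val_score`, `make_pipeline` and `StandardScaler` without using them. This is a minimal sketch of how they could be combined. It runs on hypothetical random stand-in features rather than the real pooled features, so the printed accuracy means nothing about the actual model:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 63 pooled feature vectors, 7 classes.
X = rng.normal(size=(63, 64))
y = np.arange(63) % 7
X[np.arange(63), y] += 3.0  # give each class a signal along one feature axis

# Scale features, then fit the same kind of polynomial SVM as above.
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, gamma="scale"))
# One fit per sample: train on 62 videos, test on the remaining one.
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("LOO accuracy: {:.2f}%".format(scores.mean() * 100))
```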


### Problem 2.2
### Network Definition: 20 points
### Feature Extraction using an ImageNet pre-trained CNN 
Use the fine-tuned ResNet model from Problem 1 to extract features from every video frame.



In [None]:
#--------------------------------------------------
# Define your Vid_Classifier.
# You may add extra parameters here.
# Remember to define:
  # a base model, which is an ImageNet pre-trained CNN: extract one feature vector per frame
  # a max pooling layer that finds the maximum feature map over a local temporal neighborhood
  # a fully connected layer to unify the feature maps
# You may also want to include other parameters for your module.
# If you are using a k-NN classifier, please indicate it clearly in your code.
#--------------------------------------------------
  
from torchvision import models
from torch import nn
class Vid_Classifier(nn.Module):
    def __init__(self, params_model):
        super(Vid_Classifier, self).__init__()
        num_classes = params_model["num_classes"]
        dr_rate= params_model["dr_rate"] #drop out rate
        pretrained = params_model["pretrained"]
        #--------------------------------------------------
        #Your code here
        #--------------------------------------------------
             
    def forward(self, x):
        #--------------------------------------------------
        #Your code here
        #--------------------------------------------------
        raise NotImplementedError  # placeholder so the cell runs; replace with your code
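This is one possible shape for such a module, sketched under assumptions rather than as the reference solution:

- The backbone is any module mapping a batch of frames (N, C, H, W) to feature vectors (N, D), for example the fine-tuned ResNet from Problem 1 with its `fc` replaced by `nn.Identity()`.
- `nn.MaxPool1d` takes the maximum over a local window of `pool_size` frames.
- The remaining time steps are averaged.
- Dropout plus a fully connected layer produce the class scores.

The class name `VidClassifierSketch`, the `pool_size` parameter and the final averaging step are hypothetical choices. The `num_classes` and `dr_rate` entries of `params_model` map directly onto the constructor arguments.

```python
import torch
import torch.nn as nn

class VidClassifierSketch(nn.Module):
    # Hypothetical design: per-frame CNN features -> local temporal max pooling
    # -> average over time -> dropout -> fully connected classifier.
    def __init__(self, backbone, feat_dim, num_classes, dr_rate=0.1, pool_size=3):
        super().__init__()
        self.backbone = backbone  # maps (N, C, H, W) -> (N, feat_dim)
        # Max over a window of pool_size neighbouring frames; with an odd
        # pool_size and this padding, the number of time steps is unchanged.
        self.temporal_pool = nn.MaxPool1d(pool_size, stride=1, padding=pool_size // 2)
        self.dropout = nn.Dropout(dr_rate)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        b, t = x.shape[:2]                               # x: (B, T, C, H, W)
        feats = self.backbone(x.flatten(0, 1))           # (B*T, D)
        feats = feats.reshape(b, t, -1).transpose(1, 2)  # (B, D, T) for MaxPool1d
        feats = self.temporal_pool(feats)                # (B, D, T)
        feats = feats.mean(dim=2)                        # (B, D)
        return self.fc(self.dropout(feats))              # (B, num_classes)


# Tiny stand-in backbone so the sketch runs offline; in the notebook this would
# be the fine-tuned ResNet with its fc layer replaced by nn.Identity().
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = VidClassifierSketch(backbone, feat_dim=8, num_classes=7)
logits = model(torch.randn(2, 5, 3, 16, 16))  # 2 videos, 5 frames each
print(logits.shape)  # torch.Size([2, 7])
```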


In [None]:
num_classes = 7

In [None]:
params_model={
        "num_classes": num_classes,
        "dr_rate": 0.1,
        "pretrained" : True,}
model = Vid_Classifier(params_model) 

In [None]:
model

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

In [None]:
# path2weights = "./models/weights.pt"
# torch.save(model.state_dict(), path2weights)

### Problem 2.2
**Train and Test: 10 points**
### Problem 2.2
**Fusion based implementation: 20 bonus points**


In [None]:
#define your training function

**Answer**:

Accuracy on test set: 

## Submission guidelines
---
You need to submit a single zip file to Blackboard, as described below.

Please generate a pdf file that includes a ***google shared link*** (explained in the next paragraph). This pdf file should be named ***Surname_Givenname_SBUID_pa*\*.pdf** (example: Jordan_Michael_111234567_pa4.pdf for this assignment).

To generate the ***google shared link***, first create a folder named ***Surname_Givenname_SBUID_pa**** in your Google Drive with your Stony Brook account. The structure of the files in the folder should be exactly the same as the one you downloaded. For instance, in this homework:

```
Surname_Givenname_SBUID_pa4
        |---data
        |---CSE527-PA4-fall21.ipynb
```
Note that this folder should be in your Google Drive with your Stony Brook account.

Then right-click this folder, click ***Get shareable link***, and in the People text field enter the TAs' emails: ***bjha@cs.stonybrook.edu***, ***li.wenchen@stonybrook.edu***, ***yifeng.huang@stonybrook.edu***. Make sure that TAs who have the link **can edit**, ***not just*** **can view**, and also **UNCHECK** the **Notify people** box.

Note that in Google Colab, we will only grade the version of the code saved right before the timestamp of your Blackboard submission.

To submit to Blackboard, zip ***Surname_Givenname_SBUID_pa*\*.pdf** and ***Surname_Givenname_SBUID_pa**** folder together and name your zip file as ***Surname_Givenname_SBUID_pa*\*.zip**. 

**DO NOT upload the datasets to Blackboard.**

The input and output paths are predefined; **DO NOT** change them (we assume that 'Surname_Givenname_SBUID_pa4' is your working directory, and all the paths are relative to this directory). The image read and write functions are already written for you. All you need to do is fill in the blanks as indicated to generate proper outputs.


-- DO NOT change the folder structure, please just fill in the blanks. <br>

You are encouraged to post and answer questions on Piazza. Based on the amount of email that we have received in past years, there might be delays in replying to personal emails. Please ask questions on Piazza and send emails only for personal issues.

If you alter the folder structures, the grading of your homework will be significantly delayed and possibly penalized.

Be aware that your code will undergo plagiarism check both vertically and horizontally. Please do your own work.

<!--Write your report here in markdown or html-->
