# 4.1.0 Prednet and Breakfast Clips

## Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [1]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p torch,torchvision,cv2,h5py,pandas,matplotlib,seaborn,jupyterlab,lab

Wed Feb 26 2020 17:58:59 

CPython 3.6.10
IPython 7.12.0

torch 1.2.0
torchvision 0.1.8
cv2 3.4.2
h5py 2.8.0
pandas 1.0.1
matplotlib 3.1.3
seaborn 0.10.0
jupyterlab 1.2.6
lab 0+untagged.29.g1bb7899.dirty

compiler   : GCC 7.3.0
system     : Linux
release    : 4.15.0-76-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 16
interpreter: 64bit
Git hash   : 1bb789928298409106529d6e8942da764a7e8865
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html). With `%autoreload 1`, modules imported with `%aimport` are reloaded every time before code is executed.

Running `%autoreload 2` inverts this behavior: every module is auto-reloaded *except* those excluded with `%aimport -module`.

In [2]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Some options (that may or may not have worked) are `inline`, `notebook`, and `gtk`.

In [3]:
# Set the matplotlib mode
%matplotlib inline

## Set the GPU

Check current GPU usage first so we don't grab devices that other jobs are already using.

In [4]:
!nvidia-smi

Wed Feb 26 17:59:14 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  TITAN Xp            Off  | 00000000:04:00.0 Off |                  N/A |
| 44%   70C    P2   190W / 250W |  11851MiB / 12196MiB |     56%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:05:00.0 Off |                  N/A |
| 51%   80C    P2   119W / 250W |  11851MiB / 12196MiB |     53%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:08:00.0 Off |                  N/A |
| 54%   

In [5]:
%env CUDA_VISIBLE_DEVICES=

env: CUDA_VISIBLE_DEVICES=
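
Setting `CUDA_VISIBLE_DEVICES` to an empty string hides every GPU from CUDA applications started in this process, so the cells below run on the CPU. The same thing can be done from plain Python, as long as it happens before CUDA is initialized. A minimal sketch follows; `set_visible_gpus` is a hypothetical helper, not part of the `lab` package.

```python
import os


def set_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    An empty list produces an empty string, which hides all GPUs.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Hide every GPU, mirroring the `%env CUDA_VISIBLE_DEVICES=` line above.
set_visible_gpus([])
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

To claim specific devices instead, pass their indices, for example `set_visible_gpus([1, 3])`.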


## Imports

In [20]:
from pathlib import Path

import torch
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

Local imports that may or may not be autoreloaded. This section contains things that will likely have to be re-imported multiple times, and have additions or subtractions made throughout the project.

In [7]:
# Constants to be used throughout the package
%aimport lab
import lab
%aimport lab.index
from lab.index import DIR_DATA_INT, DIR_DATA_RAW
%aimport lab.breakfast
import lab.breakfast as bk
%aimport lab.breakfast.constants
from lab.breakfast.constants import SEED
# Import the data subdirectories
%aimport lab.breakfast.index
from lab.breakfast.index import (DIR_BREAKFAST, 
                                 DIR_BREAKFAST_DATA, 
                                 DIR_COARSE_SEG, 
                                 DIR_FINE_SEG,
                                )
%aimport lab.breakfast.dataloader
from lab.breakfast.dataloader import Breakfast64DimFVDataset

## Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [8]:
sns.set()

## The Dataloader

In [9]:
ds = Breakfast64DimFVDataset()

In [10]:
ds[0]

(array([[-4.930377, -3.729419, -2.974148, ...,  4.952208, -0.0524  ,
         -2.923864],
        [-4.745723, -3.4886  , -3.447279, ...,  5.638719, -1.229474,
         -2.064759],
        [-4.840712, -3.733112, -3.627248, ...,  5.671757, -1.623616,
         -1.68041 ],
        ...,
        [-2.588912,  0.885864, -1.724053, ...,  2.534779, -7.805843,
         21.473339],
        [-1.485294,  0.796884, -2.246547, ...,  1.793978, -8.247074,
         22.106295],
        [-1.318826,  0.558616, -2.55644 , ...,  1.375821, -8.771461,
         22.229328]]),
 '/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/38_pancake_stereo_1_start_1244_event_55_seed_117.npy')

## Splitting the Data

See [this Stack Overflow answer](https://stackoverflow.com/questions/50544730/how-do-i-split-a-custom-dataset-into-training-and-test-datasets/50544887#50544887) on splitting a custom dataset into training and test sets with samplers.

In [23]:
np.random.seed(SEED)

ds_length = len(ds)
indices = list(range(ds_length))
n_test = 128
batch_size = 16

np.random.shuffle(indices)
train_indices, test_indices = indices[n_test:], indices[:n_test]

In [24]:
train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)

train_loader = DataLoader(ds, batch_size=batch_size, sampler=train_sampler)
test_loader = DataLoader(ds, batch_size=batch_size, sampler=test_sampler)
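
The split above shuffles the indices once and holds out the first `n_test` of them, so the two samplers should never share a clip. The sketch below checks that property without touching the dataset. It uses the standard library's `random` module as a stand-in for `np.random`, and the dataset size and seed are hypothetical values chosen only for illustration.

```python
import random


def holdout_split(n_items, n_test, seed):
    """Shuffle indices 0..n_items-1 and hold out the first n_test for testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # stdlib stand-in for np.random.shuffle
    return indices[n_test:], indices[:n_test]


# Hypothetical sizes and seed; the notebook uses len(ds) and SEED instead.
train_idx, test_idx = holdout_split(n_items=1000, n_test=128, seed=0)

# The two index sets should not overlap and should cover the whole dataset.
assert set(train_idx).isdisjoint(test_idx)
assert sorted(train_idx + test_idx) == list(range(1000))
print(len(train_idx), len(test_idx))
```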

In [25]:
num_epochs = 10

for epoch in range(num_epochs):
    for batch_idx, (data, path) in enumerate(train_loader):
        if batch_idx == 1:
            print(Path(path[0]).stem)

23_sandwich_cam01_start_905_event_53_seed_117
16_cereals_cam01_start_0_event_32_seed_117
18_scrambledegg_webcam02_start_723_event_19_seed_117
29_scrambledegg_stereo_1_start_480_event_30_seed_117
17_pancake_stereo_1_start_468_event_62_seed_117
12_salat_webcam01_start_1923_event_40_seed_117
12_friedegg_stereo_1_start_222_event_42_seed_117
22_pancake_cam01_start_2114_event_23_seed_117
24_coffee_stereo_1_start_115_event_27_seed_117
30_scrambledegg_cam01_start_1825_event_8_seed_117
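
The clip names printed above follow a regular pattern, so the metadata can be recovered from the file stem. The sketch below is one way to do that. The field names `prefix`, `activity`, and `camera` are guesses about what the leading parts mean, not names taken from the `lab` package. Only `start`, `event`, and `seed` appear literally in the file names.

```python
import re
from pathlib import Path

# Pattern inferred from the stems printed above; field meanings are assumptions.
CLIP_NAME = re.compile(
    r"^(?P<prefix>\d+)_(?P<activity>[a-z]+)_(?P<camera>.+)"
    r"_start_(?P<start>\d+)_event_(?P<event>\d+)_seed_(?P<seed>\d+)$"
)


def parse_clip_name(path):
    """Split a clip path into its name fields, converting numbers to int."""
    match = CLIP_NAME.match(Path(path).stem)
    if match is None:
        raise ValueError(f"Unrecognized clip name: {path}")
    fields = match.groupdict()
    for key in ("prefix", "start", "event", "seed"):
        fields[key] = int(fields[key])
    return fields


print(parse_clip_name("38_pancake_stereo_1_start_1244_event_55_seed_117.npy"))
```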


## The Model

In [77]:
%aimport lab.breakfast.prednet
from lab.breakfast.prednet import PredNet
%aimport lab.breakfast.lstm
from lab.breakfast.lstm import LSTM

In [55]:
num_epochs = 10
A_channels = (1, 32, 48, 96)
R_channels = (1, 32, 48, 96)
lr = 0.001 # if epoch < 75 else 0.0001
nt = 64 # num of time steps

### PyTorch CUDA Version Errors

#### PyTorch 1.3 Error

In [34]:
layer_loss_weights = Variable(torch.FloatTensor([[1.], [0.], [0.], [0.]]).cuda())
time_loss_weights = 1./(nt - 1) * torch.ones(nt, 1)
time_loss_weights[0] = 0
time_loss_weights = Variable(time_loss_weights.cuda())

AssertionError: 
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

In [35]:
torch._C._cuda_getDriverVersion()

9010
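
The integer above is easier to read once decoded. The sketch below assumes it follows CUDA's usual version encoding of `1000 * major + 10 * minor`. `decode_cuda_version` is a hypothetical helper, not a PyTorch function.

```python
def decode_cuda_version(version):
    """Decode an integer CUDA version, assuming 1000 * major + 10 * minor."""
    major, rest = divmod(version, 1000)
    minor = rest // 10
    return major, minor


major, minor = decode_cuda_version(9010)
print(f"Driver supports up to CUDA {major}.{minor}")
```

Under that assumption, `9010` corresponds to CUDA 9.1.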

In [36]:
torch.__version__

'1.3.1'

Installing PyTorch `1.2` to see whether an older release resolves the issue.

#### PyTorch 1.2 Errors

In [18]:
layer_loss_weights = Variable(torch.FloatTensor([[1.], [0.], [0.], [0.]]).cuda())
time_loss_weights = 1./(nt - 1) * torch.ones(nt, 1)
time_loss_weights[0] = 0
time_loss_weights = Variable(time_loss_weights.cuda())

AssertionError: 
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

#### Cudatoolkit 9.2

Switching the `cudatoolkit` version did not work with the `miniconda` setup the lab uses. Trying to uninstall packages or bump their versions produced these errors:

```
ERROR conda.core.link:_execute(700): An error occurred while uninstalling package 'pytorch::pytorch-1.2.0-py3.6_cuda10.0.130_cudnn7.6.2_0'.
Rolling back transaction: done

[Errno 13] Permission denied: '/media/data/conda/abdullah/envs/bk/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so' -> '/media/data/conda/abdullah/envs/bk/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so.c~'
()
```

## Using CUDA 10

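The root cause appears to be that the machine's NVIDIA driver only supported an older CUDA version (it reported `9010`) than the one the installed PyTorch build was compiled against (the conda package above is a `cuda10.0` build). Machines with CUDA 10 had no problems.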

In [56]:
# layer_loss_weights = Variable(torch.FloatTensor([[1.], [0.], [0.], [0.]]).cuda())
# time_loss_weights = 1./(nt - 1) * torch.ones(nt, 1)
# time_loss_weights[0] = 0
# time_loss_weights = Variable(time_loss_weights.cuda())

layer_loss_weights = Variable(torch.FloatTensor([[1.], [0.], [0.], [0.]]))
time_loss_weights = 1./(nt - 1) * torch.ones(nt, 1)
time_loss_weights[0] = 0
time_loss_weights = Variable(time_loss_weights)

In [57]:
model = PredNet(R_channels, A_channels, output_mode='error')
# if torch.cuda.is_available():
#     print('Using GPU.')
#     model.cuda()

In [58]:
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

def lr_scheduler(optimizer, epoch):
    if epoch < num_epochs //2:
        return optimizer
    else:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.0001
        return optimizer
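
The scheduler above is a single step: the learning rate stays at its initial value for the first half of training and drops to `0.0001` afterwards. The sketch below separates the schedule from the optimizer update so it can be checked on its own. `DummyOptimizer` is a hypothetical stand-in that only exposes `param_groups`, the attribute the original function relies on.

```python
def lr_for_epoch(epoch, num_epochs, base_lr=0.001, low_lr=0.0001):
    """Return base_lr for the first half of training and low_lr afterwards."""
    return base_lr if epoch < num_epochs // 2 else low_lr


def set_lr(optimizer, lr):
    """Write lr into every parameter group of the optimizer."""
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return optimizer


class DummyOptimizer:
    """Hypothetical stand-in exposing only `param_groups`."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


opt = DummyOptimizer(lr=0.001)
schedule = []
for epoch in range(10):
    set_lr(opt, lr_for_epoch(epoch, num_epochs=10))
    schedule.append(opt.param_groups[0]["lr"])
print(schedule)
```

PyTorch also ships built-in schedulers such as `torch.optim.lr_scheduler.MultiStepLR`, which may be a cleaner choice if the schedule grows more complex.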

In [None]:
for epoch in range(num_epochs):
    optimizer = lr_scheduler(optimizer, epoch)
    for batch_idx, (data, path) in tqdm(enumerate(train_loader)):
        data = Variable(data)
        errors = model(data) # batch x n_layers x nt
        loc_batch = errors.size(0)
        errors = torch.mm(errors.view(-1, nt), time_loss_weights) # batch*n_layers x 1
        errors = torch.mm(errors.view(loc_batch, -1), layer_loss_weights)
        errors = torch.mean(errors)

        optimizer.zero_grad()

        errors.backward()

        optimizer.step()
        if batch_idx % 10 == 0:
            # `errors` is a 0-dim tensor after torch.mean, so read it with .item()
            print(f'Epoch: {epoch}/{num_epochs}, step: {batch_idx}/{len(ds)//batch_size}, '
                  f'errors: {errors.item()}')
            
        break
    break
#torch.save(model.state_dict(), 'training.pt')
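
For reference, the two matrix products in the loop amount to a weighted sum over time steps and layers, followed by a mean over the batch. Writing $E_{b,l,t}$ for the entries of `errors`, $\tau_t$ for `time_loss_weights`, and $\lambda_l$ for `layer_loss_weights` (symbols introduced here only for this note), the loss as read from the code is:

$$
\mathcal{L} = \frac{1}{B} \sum_{b=1}^{B} \sum_{l=0}^{L-1} \lambda_l \sum_{t=0}^{n_t-1} \tau_t \, E_{b,l,t},
\qquad
\tau_t =
\begin{cases}
0 & t = 0 \\
\dfrac{1}{n_t - 1} & t \ge 1
\end{cases},
\qquad
\lambda = (1, 0, 0, 0)
$$

With these weights, only the lowest layer contributes, and the first time step is ignored.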

### Putting Everything in One Cell

In [143]:
np.random.seed(SEED)

ds_length = len(ds)
indices = list(range(ds_length))
n_test = 128
batch_size = 8

np.random.shuffle(indices)
train_indices, test_indices = indices[n_test:], indices[:n_test]

train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)

train_loader = DataLoader(ds, batch_size=batch_size, sampler=train_sampler)
test_loader = DataLoader(ds, batch_size=batch_size, sampler=test_sampler)

num_epochs = 10
A_channels = (64, 128, 256, 512)
R_channels = (64, 128, 256, 512)

lr = 0.001 # if epoch < 75 else 0.0001
nt = 64 # num of time steps

layer_loss_weights = Variable(torch.FloatTensor([[1.], [0.], [0.], [0.]]))
time_loss_weights = 1./(nt - 1) * torch.ones(nt, 1)
time_loss_weights[0] = 0
time_loss_weights = Variable(time_loss_weights)

print('llw', layer_loss_weights.shape, 'tlw', time_loss_weights.shape)

model = PredNet(R_channels, A_channels, output_mode='error')

optimizer = torch.optim.Adam(model.parameters(), lr=lr)

def lr_scheduler(optimizer, epoch):
    if epoch < num_epochs //2:
        return optimizer
    else:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.0001
        return optimizer
    
for epoch in range(num_epochs):
    optimizer = lr_scheduler(optimizer, epoch)
    for batch_idx, (data, path) in tqdm(enumerate(train_loader)):
        data = Variable(data)
        errors = model(data) # batch x n_layers x nt
        loc_batch = errors.size(0)
        errors = torch.mm(errors.view(-1, nt), time_loss_weights) # batch*n_layers x 1
        errors = torch.mm(errors.view(loc_batch, -1), layer_loss_weights)
        errors = torch.mean(errors, axis=0)

        optimizer.zero_grad()
        errors.backward()

        optimizer.step()
        if batch_idx % 10 == 0:
            print(f'Epoch: {epoch}/{num_epochs}, step: {batch_idx}/{len(ds)//batch_size}, '
                  f'errors: {errors.data[0]}')
            
        break
    break
#torch.save(model.state_dict(), 'training.pt')



0it [00:00, ?it/s]

llw torch.Size([4, 1]) tlw torch.Size([64, 1])
torch.Size([8, 4, 64])
8
torch.Size([32, 1])
torch.Size([8, 1])
torch.Size([1])


0it [00:03, ?it/s]

Epoch: 0/10, step: 0/882, errors: 1.7341536283493042





In [111]:
model.r_channels

(1, 64, 128, 256, 0)

In [87]:
2 * model.a_channels[2] + model.r_channels[3]

256

In [88]:
model.a_channels[2]

128

In [89]:
model.r_channels[3]

0

In [92]:
model.n_layers - 1

3