# Autopilot at Tesla

- Image: (3, 960, 1280)
- ResNet-50-like backbone
- FPN / DeepLabV3 / UNet-like heads
- ~15 tasks, each with sub-tasks; for example, object detection has the sub-task of distinguishing stationary from moving objects.

Goals:
- Be able to access another camera's scene to predict the depth of the scene in front of the car.
- Stitching up images across space and time happens with RNNs.
- One RNN takes in scenes from the camera and creates 3D space
- 8 hydronets that share backbone neural nets to predict different objects 
- R-CNNs on videos?
    - 8 cameras, 16 time steps, batch size 32: 8 * 16 * 32 = 4096 images
    
Requires a combination of data parallel and model parallel training
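
To get a feel for why a single device is not enough, the batch arithmetic above can be worked through. This is a rough sketch that assumes float32 inputs (4 bytes per value), which the notes do not state; it counts only the raw input tensor, not activations or weights.

```python
# Rough size of one training batch of raw images.
# Assumption (not stated in the notes): inputs are stored as float32.
CAMERAS, TIME_STEPS, BATCH_SIZE = 8, 16, 32
CHANNELS, HEIGHT, WIDTH = 3, 960, 1280
BYTES_PER_VALUE = 4  # float32

images_per_batch = CAMERAS * TIME_STEPS * BATCH_SIZE
bytes_per_image = CHANNELS * HEIGHT * WIDTH * BYTES_PER_VALUE
batch_gib = images_per_batch * bytes_per_image / 2**30

print(images_per_batch)      # 4096
print(round(batch_gib, 2))   # 56.25
```

Under these assumptions the inputs alone would not fit on one accelerator, before any activations are stored for the backward pass.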


## Tasks:
- Explore ResNet50
- Check transfer learning w/ ResNet50
- Explore adding heads to models


In ResNets, suppose the desired underlying mapping is H(x). If we let the stacked nonlinear layers fit another mapping F(x) := H(x) - x, the original mapping is recast as F(x) + x.
The hypothesis is that it is easier to optimize the residual mapping than the original, unreferenced mapping. In the extreme case where an identity mapping is optimal (for example, when layers are added to a shallower architecture to make it deeper), it is easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers.

F(x) + x is realized by a feedforward network with "shortcut connections", i.e. connections that skip one or more layers. Identity shortcuts add no parameters; their outputs are simply added to the outputs of the stacked layers.
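
Written out, the formulation reads as follows. The symbols $y$ (block output), $\{W_i\}$ (weights of the stacked layers) and $W_s$ (a linear projection, e.g. a 1x1 convolution) do not appear in these notes; they follow the notation commonly used for residual blocks. The last line is the projection variant, needed only when the input and output dimensions differ.

```latex
\begin{aligned}
F(x) &:= H(x) - x \quad\Longrightarrow\quad H(x) = F(x) + x \\
y &= F(x, \{W_i\}) + x \\
y &= F(x, \{W_i\}) + W_s\, x
\end{aligned}
```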

### Two results:
1. Extremely deep residual nets are easy to optimize, whereas the counterpart "plain" nets, which simply stack layers, show higher training error as depth increases.
2. Deep residual nets readily gain accuracy from greatly increased depth, producing results substantially better than previous networks.

### Residual Learning:
Let H(x) be the desired underlying mapping of a few stacked layers, with x the input to those layers.
This formulation is motivated by the degradation problem: if identity-mapping layers are stacked on top of a shallow model, the training error of the deeper model should be no greater than that of its shallower counterpart. With residual functions, if identity mappings are optimal, the solvers may simply drive the weights of the multiple nonlinear layers toward zero to approach identity mappings.
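
The "drive the weights toward zero" argument can be checked on a toy example. The sketch below uses plain Python lists rather than a framework; `relu`, `linear` and `residual_block` are illustrative names, not part of the notebook's code.

```python
# Toy residual block on plain Python lists (no framework).
# All names here are illustrative and not part of the notebook's code.

def relu(values):
    return [max(0.0, v) for v in values]

def linear(values, weights):
    # weights[i][j] connects input j to output i
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, w1, w2):
    # F(x) = W2 * relu(W1 * x); the block outputs F(x) + x
    fx = linear(relu(linear(x, w1)), w2)
    return [f + xi for f, xi in zip(fx, x)]

x = [1.5, -2.0, 0.25]
zeros = [[0.0] * 3 for _ in range(3)]

# With all weights at zero the residual branch contributes nothing,
# so the block reduces to the identity mapping.
y = residual_block(x, zeros, zeros)
print(y == x)  # True
```

A plain (non-residual) stack with the same zero weights would output all zeros instead, so it would have to learn the identity explicitly.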

### Identity Mapping by Shortcuts

Residual learning is applied to every few stacked layers.

## Coding ResNet34 (pre-activation)

In [2]:
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import transforms

In [3]:
# Define a pre-activation, non-bottleneck residual block.
# (A bottleneck block would instead first apply a 1x1 convolution to reduce the
# depth of the input volume, then a 3x3 convolution on the reduced volume.)
class PreActivationBlock(nn.Module):
    """PreActivation Residual Block i.e. no ReLU after adding x."""
    
    expansion = 1
    
    def __init__(self, in_slices, slices, stride=1):   # learnable block components
        super(PreActivationBlock, self).__init__()     # convolution and batch normalization
        
        self.bn_1 = nn.BatchNorm2d(in_slices)          # pass in the image or input if it is second block
        self.conv_1 = nn.Conv2d(in_channels=in_slices, out_channels=slices,      # output is the slices, which gets
                               kernel_size=3,stride=stride,padding=1,            # passed into the second bn and conv layer
                               bias=False)
        
        self.bn_2 = nn.BatchNorm2d(slices)
        self.conv_2 = nn.Conv2d(in_channels=slices, out_channels=slices,
                                kernel_size=3, stride=1, padding=1,
                                bias=False)
        
        # If the block changes the spatial size (stride != 1) or the channel count
        # (in_slices != expansion * slices), x cannot be added to the output directly.
        # In that case the shortcut is a 1x1 convolution with the same stride
        # and output channels as the main path; otherwise x is passed through unchanged.
        if stride != 1 or in_slices != self.expansion * slices:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels=in_slices,
                          out_channels=self.expansion*slices,     # self.expansion is a hyperparameter
                          kernel_size=1,                     # that allows us to expand the output depth of the res block
                          stride=stride,
                          bias=False)
            )
            
    # define the forward function
    def forward(self, x):
        # take the x and after batch norm, apply Relu to every single element of that
        out = F.relu(self.bn_1(x))     # perform ReLU on the first batch norm.
        # we define ReLU in the forward method because it does not have learnable parameters
        
        # reuse bn+relu in down-sampling layers:
        # if the shortcut needs a convolution because the input/output dimensions
        # differ, it is applied to the pre-activated tensor F.relu(self.bn_1(x));
        # if x already has the same dimensions as out, it is used untouched
        
        shortcut = self.shortcut(out) if hasattr(self, 'shortcut') else x
        
        # then we pass that to the first conv layer
        out = self.conv_1(out)
        out = F.relu(self.bn_2(out))
        out = self.conv_2(out)
        # add output of the second conv layer to the shortcut
        out += shortcut
        return out
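
The shortcut has to produce exactly the same shape as the main path, which is why its 1x1 convolution reuses the block's stride. A small sketch using the standard convolution output-size formula, floor((n + 2p - k) / s) + 1; the helper name `conv_out_size` is made up for illustration and is not a PyTorch function.

```python
def conv_out_size(n, kernel_size, stride, padding):
    """Spatial output size of a convolution over an input of size n (no dilation)."""
    return (n + 2 * padding - kernel_size) // stride + 1

for stride in (1, 2):
    # main path: conv_1 (3x3, given stride, padding 1), then conv_2 (3x3, stride 1, padding 1)
    main = conv_out_size(conv_out_size(32, 3, stride, 1), 3, 1, 1)
    # shortcut: 1x1 convolution with the same stride and no padding
    shortcut = conv_out_size(32, 1, stride, 0)
    print(stride, main, shortcut)
```

For a 32x32 input both paths give 32 at stride 1 and 16 at stride 2, so the element-wise addition in `forward` is well defined.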
    

In [4]:
# let's implement the residual network itself:
class PreActivationResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        """
        :param block: type of residual block (regular or bottleneck)
        :param num_blocks: a list with 4 int values.
             Each value reflects the number of residual blocks in the group
        :param num_classes: number of output classes
        """
        super(PreActivationResNet, self).__init__()


    
        self.in_slices=64
        
        # first layer: the original ResNet starts with a 7x7 convolution (stride 2)
        # with 64 output channels; for 32x32 CIFAR-10 images we use a 3x3
        # convolution with stride 1 instead, keeping the 64 output channels
        self.conv_1 = nn.Conv2d(in_channels=3, out_channels=64,
                                kernel_size=3, stride=1, padding=1,
                                bias=False)
        # the network has 4 groups of residual blocks, just like the original
        # implementation; groups 2-4 downsample with stride 2 (32 -> 16 -> 8 -> 4),
        # so avg_pool2d(out, 4) in forward() leaves 512 * block.expansion features
        self.layer_1 = self._make_group(block, 64, num_blocks[0], stride=1)
        self.layer_2 = self._make_group(block, 128, num_blocks[1], stride=2)
        self.layer_3 = self._make_group(block, 256, num_blocks[2], stride=2)
        self.layer_4 = self._make_group(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)
        
    def _make_group(self, block, slices, num_blocks, stride):
        """
        Create one residual group
        """
        strides = [stride] + [1] * (num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_slices, slices, stride))
            self.in_slices = slices * block.expansion

        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = self.conv_1(x)
        out = self.layer_1(out)
        out = self.layer_2(out)
        out = self.layer_3(out)
        out = self.layer_4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)

        return out
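
For the final `nn.Linear(512 * block.expansion, num_classes)` to accept the flattened features, the feature map must be 1x1 after `F.avg_pool2d(out, 4)`. The sketch below traces the spatial size of a 32x32 CIFAR-10 image through four groups for a given stride schedule. `trace_feature_count` is a hypothetical helper, and it assumes that only the first block of each group applies the group stride, via a 3x3 convolution with padding 1, as in `PreActivationBlock`.

```python
def trace_feature_count(input_size, group_strides, out_channels=512, pool=4):
    """Number of flattened features reaching the linear layer."""
    size = input_size
    for stride in group_strides:
        # 3x3 convolution with padding 1: only the stride changes the size
        size = (size + 2 * 1 - 3) // stride + 1
    size //= pool  # avg_pool2d with kernel 4 (stride defaults to the kernel size)
    return out_channels * size * size

print(trace_feature_count(32, (1, 2, 2, 2)))  # 512
print(trace_feature_count(32, (1, 1, 1, 2)))  # 8192
```

Only a schedule that reduces 32 to 4 before pooling yields the 512 features the linear layer expects; anything coarser raises a shape mismatch in `self.linear`.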
        

In [5]:
# resnet configuration with 34 layers:
def PreActivationResNet34():
    return PreActivationResNet(block=PreActivationBlock,
                               num_blocks = [3,4,6,3])
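
The "34" can be recovered from the configuration: each non-bottleneck block holds two 3x3 convolutions, and the first convolution and the final linear layer add two more. The 1x1 shortcut convolutions are conventionally not counted.

```python
num_blocks = [3, 4, 6, 3]
convs_per_block = 2        # conv_1 and conv_2 in PreActivationBlock
stem_and_classifier = 2    # first convolution + final linear layer

depth = convs_per_block * sum(num_blocks) + stem_and_classifier
print(depth)  # 34
```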

In [7]:
PreActivationResNet34().eval()

PreActivationResNet(
  (conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (layer_1): Sequential(
    (0): PreActivationBlock(
      (bn_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
    (1): PreActivationBlock(
      (bn_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
    (2): PreActivationBlock(
      (bn_1): 

In [8]:
def train_model(model, loss_function, optimizer, data_loader):
    """Train one Epoch"""
    
    # set training mode
    model.train()
    
    current_loss = 0.0
    current_acc = 0
    
    # iterate over the training data
    for i, (inputs, labels) in enumerate(data_loader):
        # forward
        inputs = inputs.to(device)
        labels = labels.to(device)
        
        # zero the parameter gradients
        optimizer.zero_grad()
        
        with torch.set_grad_enabled(True):
            # forward
            outputs = model(inputs)
            _, predictions = torch.max(outputs, 1)
            loss = loss_function(outputs, labels)
            
            # backward
            loss.backward()
            optimizer.step()
            
        # stats
        current_loss += loss.item() * inputs.size(0)
        current_acc += torch.sum(predictions == labels.data)
        
    total_loss = current_loss / len(data_loader.dataset)
    total_acc = current_acc.double() / len(data_loader.dataset)
    
    print('Train Loss: {:.4f}; Accuracy: {:.4f}'.format(total_loss, total_acc))
    

In [9]:
# define function to test a single epoch
def test_model(model, loss_function, data_loader):
    """Test for a single epoch"""
    
    # set model for eval mode:
    model.eval()
    
    current_loss = 0.0
    current_acc = 0
    
    # iterate over the validation data
    for i, (inputs, labels) in enumerate(data_loader):
        # send the input/labels to the GPU
        inputs = inputs.to(device)
        labels = labels.to(device)
        
        # forward
        with torch.set_grad_enabled(False):
            outputs = model(inputs)
            _, predictions = torch.max(outputs, 1)
            loss = loss_function(outputs, labels)
            
        current_loss += loss.item() * inputs.size(0)
        current_acc += torch.sum(predictions == labels.data)
        
    total_loss = current_loss / len(data_loader.dataset)
    total_acc = current_acc.double() / len(data_loader.dataset)
    
    print('Test Loss: {:.4f}; Accuracy: {:.4f}'.format(total_loss, total_acc))
    
    return total_loss, total_acc
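
Both functions multiply `loss.item()` by the batch size before accumulating. `nn.CrossEntropyLoss` returns the mean over the batch by default, so this weighting is what makes the final division by the dataset size an exact per-sample average, even when the last batch is smaller. A framework-free sketch with made-up per-sample losses:

```python
# Made-up per-sample losses for a "dataset" of 5 samples, batch size 2
sample_losses = [0.9, 0.7, 0.4, 0.6, 0.2]
batches = [sample_losses[i:i + 2] for i in range(0, len(sample_losses), 2)]

weighted_sum = 0.0
unweighted_sum = 0.0
for batch in batches:
    batch_mean = sum(batch) / len(batch)      # what a mean-reduced loss reports
    weighted_sum += batch_mean * len(batch)   # as in train_model / test_model
    unweighted_sum += batch_mean

per_sample_mean = weighted_sum / len(sample_losses)   # matches the true mean
mean_of_batch_means = unweighted_sum / len(batches)   # skewed by the short last batch
print(per_sample_mean, mean_of_batch_means)
```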

### The network has been defined, now load the data:

In [11]:
# training data transformation
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4821, 0.4465), (0.2470, 0.2435, 0.2616))
])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True,
                                         transform=transform_train)
train_loader = torch.utils.data.DataLoader(dataset=train_set,
                                           batch_size=100,
                                           shuffle=True,
                                           num_workers=2)

# transform the test data - no randomcrop or RandomHorizontalFlip
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4821, 0.4465), (0.2470, 0.2435, 0.2616))

])
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                         download=True,
                                         transform=transform_test)
test_loader = torch.utils.data.DataLoader(dataset=testset,
                                          batch_size=100,
                                          shuffle=False,
                                          num_workers=2)


Files already downloaded and verified
Files already downloaded and verified
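
`transforms.Normalize(mean, std)` standardizes each channel as `(value - mean) / std`, after `ToTensor()` has scaled pixels to [0, 1]. A plain-Python sketch of that per-channel operation on a single pixel, using the same statistics as above; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
MEAN = (0.4914, 0.4821, 0.4465)
STD = (0.2470, 0.2435, 0.2616)

def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))

print(normalize_pixel(MEAN))             # (0.0, 0.0, 0.0): the mean maps to zero
print(normalize_pixel((1.0, 1.0, 1.0)))  # white pixel, roughly 2 std above the mean
```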


In [13]:
# load the model:
model = PreActivationResNet34()

# select gpu 0, if available
# otherwise, fallback to cpu
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# transfer the model to the GPU
model = model.to(device)

# loss function:
loss_function = nn.CrossEntropyLoss()

# we'll optimize all parameters:
optimizer = optim.Adam(model.parameters())

# let's train:

EPOCHS = 25

test_acc = []   # create an empty list:

for epoch in range(EPOCHS):
    print('Epoch {}/{}'.format(epoch+1, EPOCHS))
    
    train_model(model, loss_function, optimizer, train_loader)
    _, acc = test_model(model, loss_function, test_loader)
    test_acc.append(acc)
plot_accuracy(test_acc)

Epoch 1/25


RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:62] data. DefaultCPUAllocator: not enough memory: you tried to allocate %dGB. Buy new RAM!0


In [12]:
def plot_accuracy(accuracy: list):
    """Plot accuracy"""
    plt.figure()
    plt.plot(accuracy)
    plt.xticks(
        [i for i in range(0, len(accuracy))],
        [i + 1 for i in range(0, len(accuracy))])
    
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.show()

In [None]:
plot_accuracy(test_acc)