# Residual Networks (ResNet)


![alt text](r.png "Title")




- Adding extra layers should not hurt the model’s performance: each block adds its input back to the output of its layers, so if the new layers are not useful, regularisation can push their weights towards zero and the block simply passes its input through the skip connection.

- If the extra layers are useful, their weights or kernels stay non-zero even under regularisation, and model performance can improve.

- Therefore, thanks to the “skip connection” (also called the “residual connection”), adding new layers should in principle not reduce the model’s performance, and it may improve it.

- By stacking these ResNet blocks on top of each other, you can form a very deep network. The shortcut also makes it very easy for any one of the blocks to learn an identity function.
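
The identity argument in the bullets above can be checked with a minimal sketch. `TinyResidual` is a hypothetical toy block, not the `ResidualBlock` defined later in this document; it assumes a single linear layer as the residual branch and zeroes its parameters to mimic layers that turned out not to be useful.

```python
import torch
import torch.nn as nn

class TinyResidual(nn.Module):
    """Hypothetical toy block: output = F(x) + x, with F a single linear layer."""
    def __init__(self, features):
        super().__init__()
        self.branch = nn.Linear(features, features)

    def forward(self, x):
        return self.branch(x) + x  # the skip connection adds the input back

block = TinyResidual(4)

# Mimic a "useless" layer: set every parameter of the branch to zero.
with torch.no_grad():
    block.branch.weight.zero_()
    block.branch.bias.zero_()

x = torch.randn(2, 4)
y = block(x)
identity_recovered = torch.equal(y, x)  # True when the block passes x through unchanged
```

Note that the `ResidualBlock` used later applies a ReLU after the addition, so it reproduces its input exactly only when that input is non-negative, which is the case after a preceding ReLU.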



Two main types of blocks are used in a ResNet:


1. The identity block: the same as the block shown above. It is the standard block used in ResNets and corresponds to the case where the input activation has the same dimensions as the output activation.

2. The convolutional block: used when the input and output dimensions do not match. The difference from the identity block is that there is a CONV2D layer in the shortcut path.

![alt text](rr1.png "Title")
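
As a rough illustration of why the convolutional block needs a CONV2D layer in the shortcut, the sketch below compares the shapes of the two paths. The tensor sizes are assumptions chosen for the example, and the 1x1 shortcut convolution is one common choice; the code later in this document uses a 3x3 convolution for the same purpose.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width), assumed sizes

# Main path: halves the spatial size and doubles the number of channels.
main_path = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1, bias=False)

# Shortcut path: a 1x1 convolution with the same stride, so the shapes match.
shortcut = nn.Conv2d(16, 32, kernel_size=1, stride=2, bias=False)

main_shape = tuple(main_path(x).shape)
shortcut_shape = tuple(shortcut(x).shape)

# The element-wise addition only works because both paths produce the same shape.
out = main_path(x) + shortcut(x)
```

Without the shortcut convolution, the input `x` (16 channels, 32x32) could not be added to the main path's output (32 channels, 16x16).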

In [2]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

In [4]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
num_epochs = 25
batch_size = 100
learning_rate = 0.001

# Image preprocessing modules
transform = transforms.Compose([
    transforms.Pad(4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),
    transforms.ToTensor()])


In [5]:
# CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                             train=True, 
                                             transform=transform,
                                             download=True)

test_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                            train=False, 
                                            transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

Files already downloaded and verified


![resnetimage](https://user-images.githubusercontent.com/30661597/78585170-f4ac7c80-786b-11ea-8b00-8b751c65f5ca.PNG)

## Residual Block Class

In [6]:
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample
        
    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

In [4]:
# The network has 3 layers (see the colours in the diagram above), each with a different number of channels and 2 residual blocks.
# make_layer builds one layer. Arguments: block, out_channels, blocks (number of residual blocks in the layer), stride.
# layers is the list [2, 2, 2], since each layer has 2 residual blocks; layers[0] is the block count of the first layer.
# The shortcut needs downsampling when either:
# 1) the number of input channels differs from the number of output channels, or
# 2) the input feature map size differs from the output feature map size, which happens when stride is not 1
#    (for example, stride 2 halves the height and width).
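
The effect of the stride described in the notes above can be worked out with the standard convolution output-size formula. This is a small sketch; `conv_output_size` is a hypothetical helper, and its defaults mirror the 3x3 kernel with padding 1 used by `conv3x3`.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size: floor((size + 2*padding - kernel_size) / stride) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32  # CIFAR-10 images are 32x32
sizes = []
for stride in (1, 2, 2):  # stride of the first block in layer1, layer2, layer3
    size = conv_output_size(size, stride=stride)
    sizes.append(size)

print(sizes)  # feature map size after each layer
```

The final size of 8 is why the `ResNet` class below uses `nn.AvgPool2d(8)` before the fully connected layer.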

## ResNet Class

In [7]:
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(3, 16)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self.make_layer(block, 16, layers[0])  
        self.layer2 = self.make_layer(block, 32, layers[1], 2)
        self.layer3 = self.make_layer(block, 64, layers[2], 2)
        self.avg_pool = nn.AvgPool2d(8)  # the final feature map is 8x8
        self.fc = nn.Linear(64, num_classes)  # the 64 channels from layer3 are the input to the classifier
        
    def make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if (stride != 1) or (self.in_channels != out_channels):
            downsample = nn.Sequential(conv3x3(self.in_channels, out_channels, stride=stride),
                                       nn.BatchNorm2d(out_channels))
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels  # the output channels of this layer become the input channels of the next
        for i in range(1, blocks):
            layers.append(block(out_channels, out_channels))  # remaining blocks keep the shape unchanged
        return nn.Sequential(*layers)  # unpack the list into positional arguments
    
    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

## Create the ResNet model, loss function, and optimizer

In [8]:
model = ResNet(ResidualBlock, [2, 2, 2]).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
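
The model's `forward` returns raw scores (logits) with no softmax, which is what `nn.CrossEntropyLoss` expects. The sketch below, using made-up logits and labels, shows that the loss is equivalent to a log-softmax followed by the negative log-likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up scores for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])  # integer class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)

# The same value computed in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
same = torch.allclose(loss, manual)
```

Adding a softmax layer to the model before this loss would therefore apply the normalisation twice.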

## Training the Model

In [9]:
decay = 0
model.train()
for epoch in range(num_epochs):
    
    # Decay the learning rate every 20 epochs
    if (epoch+1) % 20 == 0:
        decay+=1
        optimizer.param_groups[0]['lr'] = learning_rate * (0.5**decay)
        print("The new learning rate is {}".format(optimizer.param_groups[0]['lr']))
        
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}"
                   .format(epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

Epoch [1/25], Step [100/500] Loss: 1.6557
Epoch [1/25], Step [200/500] Loss: 1.2898
Epoch [1/25], Step [300/500] Loss: 1.3153
Epoch [1/25], Step [400/500] Loss: 1.2694
Epoch [1/25], Step [500/500] Loss: 1.1604
Epoch [2/25], Step [100/500] Loss: 1.1571
Epoch [2/25], Step [200/500] Loss: 1.1095
Epoch [2/25], Step [300/500] Loss: 0.9807
Epoch [2/25], Step [400/500] Loss: 1.0693
Epoch [2/25], Step [500/500] Loss: 0.8838
Epoch [3/25], Step [100/500] Loss: 0.9405
Epoch [3/25], Step [200/500] Loss: 0.9049
Epoch [3/25], Step [300/500] Loss: 0.8155
Epoch [3/25], Step [400/500] Loss: 0.8773
Epoch [3/25], Step [500/500] Loss: 0.6850
Epoch [4/25], Step [100/500] Loss: 0.7337
Epoch [4/25], Step [200/500] Loss: 0.6807
Epoch [4/25], Step [300/500] Loss: 0.9716
Epoch [4/25], Step [400/500] Loss: 0.9532
Epoch [4/25], Step [500/500] Loss: 0.8800
Epoch [5/25], Step [100/500] Loss: 0.6968
Epoch [5/25], Step [200/500] Loss: 0.7616
Epoch [5/25], Step [300/500] Loss: 0.7219
Epoch [5/25], Step [400/500] Loss:
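
The learning-rate rule in the training loop above can be traced on its own, without training anything. This sketch reuses the same hyper-parameters and condition; `schedule` and `first_decay_epoch` are names introduced only for the illustration.

```python
learning_rate = 0.001
num_epochs = 25

decay = 0
lr = learning_rate
schedule = []  # learning rate used in each epoch
for epoch in range(num_epochs):
    # Same rule as the training loop: halve the rate at the start of every 20th epoch.
    if (epoch + 1) % 20 == 0:
        decay += 1
        lr = learning_rate * (0.5 ** decay)
    schedule.append(lr)

# 1-based number of the first epoch trained with a reduced rate.
first_decay_epoch = next(i + 1 for i, v in enumerate(schedule) if v < learning_rate)
```

With 25 epochs the rate is halved only once: epochs 1 to 19 use 0.001 and epochs 20 to 25 use 0.0005.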

## Testing the Model

In [10]:
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))

Accuracy of the model on the test images: 86.8 %
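
The accuracy computation in the test loop can be followed on a tiny example. The logits and labels below are made up for illustration and are not model outputs.

```python
import torch

# Made-up logits for 3 samples and 3 classes.
outputs = torch.tensor([[0.1, 2.3, -0.5],
                        [1.7, 0.2, 0.9],
                        [0.0, 0.4, 3.1]])
labels = torch.tensor([1, 0, 0])

# torch.max along dim 1 returns (max values, indices); the indices are the predicted classes.
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()  # number of matching predictions
accuracy = 100 * correct / labels.size(0)     # percentage of correct predictions
```

Here the first two samples are classified correctly and the third is not, so the accuracy is two out of three.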
