<a href="https://colab.research.google.com/github/Renan-Domingues/PyTorchRecipes/blob/main/ZeroingGradients.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zeroing out Gradients in PyTorch

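By default, gradients accumulate rather than being overwritten: each call to `.backward()` adds to the values already stored in a parameter's `.grad`.
It is therefore important to zero them at the right point when training a neural network.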

# Introduction

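Gradient descent minimizes the loss by repeatedly updating the model's parameters in the direction opposite to the gradient of the loss.

In the training loop, we zero the gradients before every backward pass (once per mini-batch, not once per epoch), so that gradients from previous iterations are not added into the current update.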

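A minimal sketch of this accumulation behaviour, using a single scalar parameter `w` instead of a network. The helper `grads_over_steps` is hypothetical, written only for illustration. Since `y = 3 * w`, every backward pass contributes `dy/dw = 3` to `w.grad`:

```python
import torch


def grads_over_steps(zero_each_step, steps=3):
    """Run `steps` backward passes of y = 3 * w and record w.grad after each one."""
    w = torch.tensor(2.0, requires_grad=True)  # leaf tensor; dy/dw = 3 for y = 3 * w
    history = []
    for _ in range(steps):
        if zero_each_step and w.grad is not None:
            w.grad.zero_()  # reset the stored gradient in place
        y = 3 * w           # build a fresh graph each iteration
        y.backward()        # ADDS dy/dw into w.grad
        history.append(w.grad.item())
    return history


print(grads_over_steps(zero_each_step=False))  # [3.0, 6.0, 9.0]
print(grads_over_steps(zero_each_step=True))   # [3.0, 3.0, 3.0]
```

Without zeroing, the stored gradient grows with every pass even though the true gradient is always 3.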
# Steps

1. Import all necessary libraries for loading our data
2. Load and normalize the dataset
3. Build the neural network
4. Define the loss function and optimizer
5. Zero the gradients while training the network

### 1. Import necessary libraries for loading our data

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms
```

### 2. Load and normalize the dataset

```python
# ToTensor() scales pixel values to [0, 1]; Normalize(mean=0.5, std=0.5)
# then maps each of the three RGB channels to [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

trainset = torchvision.datasets.CIFAR10(root='./data',
                                        train=True,
                                        download=True,
                                        transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data',
                                       train=False,
                                       download=True,
                                       transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
```

```
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:01<00:00, 106715309.87it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
```


### 3. Build the neural network

```python
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels (RGB), 6 output, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max-pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5x5 after the second pool
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one output per CIFAR-10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)                # flatten all dimensions except the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

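The `16 * 5 * 5` input size of `fc1` follows from CIFAR-10's 32×32 images: each 5×5 convolution without padding shrinks the side by 4 and each 2×2 max-pool halves it, giving 32 → 28 → 14 → 10 → 5. A quick sketch that traces these shapes through the convolutional layers alone (layer definitions copied from `Net`; the random batch is only a stand-in for real data):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(4, 3, 32, 32)   # random batch shaped like 4 CIFAR-10 images
shapes = [tuple(x.shape)]
x = pool(F.relu(conv1(x)))      # 32 -> 28 (conv) -> 14 (pool)
shapes.append(tuple(x.shape))
x = pool(F.relu(conv2(x)))      # 14 -> 10 (conv) -> 5 (pool)
shapes.append(tuple(x.shape))
flat = torch.flatten(x, 1)      # 16 * 5 * 5 = 400 features per image
shapes.append(tuple(flat.shape))

for s in shapes:
    print(s)
```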

### 4. Define a loss function and optimizer

```python
net = Net()

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```

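`optimizer.zero_grad()` only touches the parameters that were handed to the optimizer (here, `net.parameters()`). Its `set_to_none` argument controls whether each `.grad` is filled with zeros or reset to `None`; PyTorch 2.0 and later default to `set_to_none=True`. A small sketch on a stand-in `nn.Linear` model (not the `Net` above), passing both settings explicitly so the result does not depend on the installed version:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)  # stand-in model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)


def backward_once():
    loss = model(torch.randn(8, 3)).pow(2).mean()
    loss.backward()  # populates model.weight.grad and model.bias.grad


backward_once()
optimizer.zero_grad(set_to_none=False)  # keep the gradient tensors, fill them with zeros
zeros_after = model.weight.grad.clone()
print(zeros_after)                      # all zeros

backward_once()
optimizer.zero_grad(set_to_none=True)   # drop the gradient tensors entirely
print(model.weight.grad)                # None
```

Setting gradients to `None` avoids a memory write, and the next `backward()` simply allocates fresh gradient tensors.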
### 5. Zero the gradients while training the network

We loop over the data iterator, feed the inputs to the network, and optimize.

We zero the gradients at the start of every mini-batch, so that each `optimizer.step()` uses only the gradients computed from the current batch rather than the sum of the gradients from all previous batches.

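To see what the `optimizer.zero_grad()` call in the loop prevents, here is a sketch on a small stand-in model and two random mini-batches, `batch_a` and `batch_b` (all hypothetical). It computes gradients without calling `optimizer.step()`, so the parameters stay fixed and the gradients can be compared directly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # stand-in for the network
criterion = nn.CrossEntropyLoss()

batch_a = (torch.randn(5, 4), torch.randint(0, 3, (5,)))
batch_b = (torch.randn(5, 4), torch.randint(0, 3, (5,)))


def weight_grad(inputs, labels):
    """Gradient of the loss on one batch, computed from a clean slate."""
    model.zero_grad()
    criterion(model(inputs), labels).backward()
    return model.weight.grad.clone()


g_a = weight_grad(*batch_a)
g_b = weight_grad(*batch_b)

# Without zeroing between batches, the second backward pass adds onto the first
model.zero_grad()
for inputs, labels in (batch_a, batch_b):
    criterion(model(inputs), labels).backward()
accumulated = model.weight.grad.clone()

print(torch.allclose(accumulated, g_a + g_b))  # True
```

In the training loop, skipping `zero_grad()` would make the update for batch B use `g_a + g_b` instead of `g_b`.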
```python
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print the average loss over the last 2000 mini-batches
        running_loss += loss.item()
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```

```
[1,  2000] loss: 2.165
[1,  4000] loss: 1.829
[1,  6000] loss: 1.630
[1,  8000] loss: 1.551
[1, 10000] loss: 1.507
[1, 12000] loss: 1.446
[2,  2000] loss: 1.393
[2,  4000] loss: 1.360
[2,  6000] loss: 1.330
[2,  8000] loss: 1.333
[2, 10000] loss: 1.307
[2, 12000] loss: 1.288
Finished Training
```


You can also call `net.zero_grad()`, i.e. `nn.Module.zero_grad()` on the model itself. This is equivalent to `optimizer.zero_grad()` as long as all of the model's parameters were passed to that optimizer.
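A sketch of that caveat: if the optimizer is built over only some of the parameters (here, hypothetically, just the last layer of a small two-layer stand-in model called `model`), `optimizer.zero_grad()` leaves the other gradients untouched, while `model.zero_grad()` clears every parameter of the module. `set_to_none=False` is passed so both calls fill with zeros regardless of the PyTorch version:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
# Optimizer over the last layer only; model[0] is NOT registered with it
optimizer = optim.SGD(model[2].parameters(), lr=0.001)


def backward_once():
    model(torch.randn(3, 4)).sum().backward()


def all_zero(t):
    """True if the gradient tensor exists and contains only zeros."""
    return t is not None and torch.count_nonzero(t).item() == 0


backward_once()
optimizer.zero_grad(set_to_none=False)
last_cleared = all_zero(model[2].weight.grad)   # registered with the optimizer
first_cleared = all_zero(model[0].weight.grad)  # not registered, so still holds a gradient
print(last_cleared, first_cleared)              # expected: True False

model.zero_grad(set_to_none=False)              # Module.zero_grad() covers every parameter
print(all_zero(model[0].weight.grad))           # expected: True
```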