During training, we do a forward pass. Once the output is obtained, we compare the predicted output to the actual labels, and once we know how close the predicted values are to the actual labels, we tweak the weights inside the network so that the values the network predicts move closer to the true values (labels).

All of this is for a single batch, and we repeat this process for every batch until we have covered every sample in our training set. After we've completed this process for all of the batches and passed over every sample in our training set, we say that an epoch is complete. 

We use the word epoch to represent a time period in which our entire training set has been covered.
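For a sense of scale: FashionMNIST's training set has 60,000 images, so with batch_size=10 one epoch is 6,000 batches. The sketch below checks the same arithmetic on a small synthetic dataset. Random tensors stand in for real images, and the sizes are illustrative assumptions. Calling len() on a DataLoader gives the number of batches per epoch, and the last batch can be smaller than the others.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a dataset: 105 fake 28x28 grayscale images with labels 0-9
num_samples = 105
images = torch.randn(num_samples, 1, 28, 28)
labels = torch.randint(0, 10, (num_samples,))
dataset = TensorDataset(images, labels)

batch_size = 10
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# One epoch = one pass over every batch; the last batch here holds only 5 samples
batches_per_epoch = len(loader)
print(batches_per_epoch)  # 11, i.e. ceil(105 / 10)

seen = 0
for batch_images, batch_labels in loader:
    seen += batch_images.shape[0]
print(seen)  # 105: every sample is visited exactly once per epoch

# The same arithmetic for the FashionMNIST training set (60,000 images)
print(math.ceil(60_000 / batch_size))  # 6000
```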

During training, we run as many epochs as necessary to reach our desired level of accuracy. Putting this together, the training process consists of the following steps:

1. Get batch from the training set.
2. Pass batch to network.
3. Calculate the loss (difference between the predicted values and the true values).
4. Calculate the gradient of the loss function with respect to the network's weights.
5. Update the weights using the gradients to reduce the loss.
6. Repeat steps 1-5 until one epoch is completed.
7. Repeat steps 1-6 for as many epochs as required to reach the desired level of accuracy.

We use a loss function to perform step 3, backpropagation to perform step 4, and an optimization algorithm to perform step 5.
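Putting steps 1 to 7 together, a full training loop could look like the sketch below. It is a minimal illustration under stated assumptions, not this notebook's code. Random tensors stand in for FashionMNIST, the small model TinyNet is hypothetical, and the number of epochs is fixed at 3 rather than chosen by monitoring accuracy. Note the optimizer.zero_grad() call: PyTorch accumulates gradients across backward() calls, so they are cleared before each batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for FashionMNIST: 200 fake 28x28 images, 10 classes
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=20, shuffle=True)

# Hypothetical minimal model, used only to keep the example short
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, t):
        return self.fc(t.flatten(start_dim=1))

network = TinyNet()
optimizer = optim.Adam(network.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(3):                                   # step 7: repeat for several epochs
    total_loss = 0.0
    for batch_images, batch_labels in train_loader:     # step 1; step 6 = loop until the epoch ends
        preds = network(batch_images)                   # step 2: forward pass
        loss = F.cross_entropy(preds, batch_labels)     # step 3: calculate the loss
        optimizer.zero_grad()                           # clear gradients left from the previous batch
        loss.backward()                                 # step 4: gradients via backpropagation
        optimizer.step()                                # step 5: update the weights
        total_loss += loss.item()
    epoch_losses.append(total_loss)
    print(f"epoch {epoch}: total loss {total_loss:.4f}")
```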

In [1]:
import torch 
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [2]:
train_set = torchvision.datasets.FashionMNIST(
    root='./data'
    ,train=True
    ,download=True
    ,transform=transforms.Compose([
        transforms.ToTensor()
    ])
)
train_loader = torch.utils.data.DataLoader(train_set
    ,batch_size=10
    ,shuffle=True
)

In [3]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60) # "linear", "dense" and "fully connected" layer all mean the same thing
        self.out = nn.Linear(in_features=60, out_features=10)
    def forward(self, t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (3) hidden conv layer
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # (4) hidden linear layer
        t = t.reshape(-1, 12 * 4 * 4)
        t = self.fc1(t)
        t = F.relu(t)

        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)

        # (6) output layer
        t = self.out(t)
        # t = F.softmax(t, dim=1)  # not needed: F.cross_entropy applies log_softmax itself, so we return raw scores (logits)

        return t
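The 12 * 4 * 4 in fc1 comes from tracking a 28x28 image through the conv and pooling layers. A 5x5 convolution with no padding shrinks each side by 4 (28 → 24), 2x2 max pooling with stride 2 halves it (24 → 12), the second convolution gives 8 and the second pooling gives 4, with 12 output channels. The sketch below checks this with a random dummy batch. It rebuilds only the two conv layers with the same settings as Network so that it runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same conv settings as conv1 and conv2 in Network
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

t = torch.randn(10, 1, 28, 28)                                # dummy batch of 10 grayscale 28x28 images
t = F.max_pool2d(F.relu(conv1(t)), kernel_size=2, stride=2)
print(t.shape)                                                # torch.Size([10, 6, 12, 12])
t = F.max_pool2d(F.relu(conv2(t)), kernel_size=2, stride=2)
print(t.shape)                                                # torch.Size([10, 12, 4, 4])

# Flattening each sample gives the in_features expected by fc1
flat = t.reshape(-1, 12 * 4 * 4)
print(flat.shape)                                             # torch.Size([10, 192])
```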


In [4]:
network = Network()

In [5]:
batch = next(iter(train_loader)) # Getting a batch
images, labels = batch

In [6]:
preds = network(images)

## Calculating the Loss

In [7]:
# Everything up to this point has been covered before. Let's get to step 3.
loss = F.cross_entropy(preds, labels) # Calculating the loss

In [8]:
loss.item()
# This is the loss value that we want to minimize.
# The cross_entropy() function returns a scalar-valued tensor,
# so we use the item() method to get the loss as a Python number.

2.321331024169922
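The cross_entropy() function combines log_softmax with the negative log-likelihood loss: for each sample it takes the log-softmax of the predictions, picks the entry for the true class, negates it, and by default averages over the batch. The sketch below reproduces this by hand on random logits (the values are arbitrary). It also suggests why an untrained 10-class network's loss sits near 2.3: if the predictions are roughly uniform, the loss is about ln(10) ≈ 2.303.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
preds = torch.randn(10, 10)            # random logits: batch of 10 samples, 10 classes
labels = torch.randint(0, 10, (10,))   # random true classes

builtin = F.cross_entropy(preds, labels)

# Manual version: log-softmax over classes, pick the true class, negate, average
log_probs = F.log_softmax(preds, dim=1)
manual = -log_probs[torch.arange(10), labels].mean()
print(builtin.item(), manual.item())   # the two values match

# Perfectly uniform predictions (all logits equal) give exactly ln(10)
uniform = F.cross_entropy(torch.zeros(10, 10), labels)
print(uniform.item(), math.log(10))    # both about 2.3026
```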

In [9]:
# Helper function that counts the number of correct predictions
def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

In [11]:
get_num_correct(preds, labels)
# Only 1 of the 10 predictions in this batch is correct.

1
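To see what each step of get_num_correct() does, here it is applied to a small hand-made example with 3 samples and 4 classes (the scores are made up for illustration). argmax(dim=1) picks the highest-scoring class for each sample, eq() compares those classes with the labels element-wise, and sum().item() counts the matches as a Python int.

```python
import torch

def get_num_correct(preds, labels):
    return preds.argmax(dim=1).eq(labels).sum().item()

preds = torch.tensor([
    [0.1, 2.0, 0.3, 0.0],   # highest score at class 1
    [1.5, 0.2, 0.1, 0.4],   # highest score at class 0
    [0.0, 0.1, 0.2, 3.0],   # highest score at class 3
])
labels = torch.tensor([1, 2, 3])

print(preds.argmax(dim=1))              # predicted classes 1, 0, 3
print(preds.argmax(dim=1).eq(labels))   # True, False, True
print(get_num_correct(preds, labels))   # 2
```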

## Calculating the Gradients
Calculating the gradients is very easy using PyTorch. Our network's weights are nn.Module parameters, which are tensors with requires_grad=True, so PyTorch's autograd builds a computation graph under the hood. As our tensor flowed forward through the network, all of the computations were added to the graph. PyTorch then uses this graph to calculate the gradients of the loss function with respect to the network's weights.
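A tiny example of the same mechanism outside a network: a tensor created with requires_grad=True makes PyTorch record every operation on it, the result carries a grad_fn pointing into the recorded graph, and backward() walks that graph to fill in .grad. The tensors w and x are arbitrary values chosen so the gradient is easy to check by hand.

```python
import torch

w = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)  # plays the role of a weight
x = torch.tensor([1.0, 3.0, 4.0])                       # plays the role of an input

loss = (w * x).sum() ** 2     # a scalar built from w, so it is part of the graph
print(loss.grad_fn)           # the last recorded operation (a PowBackward node)
print(w.grad)                 # None: backward() has not run yet

loss.backward()
# d/dw (w.x)^2 = 2 * (w.x) * x, and here w.x = 2 - 3 + 2 = 1
print(w.grad)                 # tensor([2., 6., 8.])
```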

In [13]:
print(network.conv1.weight.grad)
# None means that no gradients have been calculated for conv1's weights yet.

None


In [14]:
loss.backward() # Calculating the gradients

# The gradients are stored in each weight's .grad attribute; the optimizer uses them to update the weights.

In [16]:
network.conv1.weight.grad.shape


torch.Size([6, 1, 5, 5])
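Two details are worth knowing before the update step. First, every parameter's gradient has the same shape as the parameter itself, which is why conv1.weight.grad has shape [6, 1, 5, 5], just like conv1.weight. Second, calling backward() again adds to the existing .grad instead of replacing it, so training loops usually clear gradients (for example with optimizer.zero_grad()) before each backward pass. The sketch below shows both on a small nn.Linear layer with random data, which is an illustrative stand-in for the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(5, 4)
y = torch.randint(0, 3, (5,))

F.cross_entropy(layer(x), y).backward()
first = layer.weight.grad.clone()
print(layer.weight.grad.shape == layer.weight.shape)  # True: gradient shape matches weight shape

# A second backward pass on a freshly computed loss accumulates into .grad
F.cross_entropy(layer(x), y).backward()
print(torch.allclose(layer.weight.grad, 2 * first))   # True: the two gradients were summed

# Clear gradients before the next pass; depending on the PyTorch version
# they become None (newer releases) or tensors of zeros (older releases)
layer.zero_grad()
```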

## Updating the Weights
To create our optimizer, we use the torch.optim package, which has many optimization algorithm implementations that we can use. We'll use Adam for our example.

To the Adam class constructor, we pass the network parameters (this is how the optimizer is able to access the gradients) and the learning rate.

In [17]:
optimizer = optim.Adam(network.parameters(), lr=0.01)
optimizer.step() # Updating the weights

When the step() function is called, the optimizer updates the weights using the gradients that are stored in the network's parameters.

This means that we should expect our loss to be reduced if we pass the same batch through the network again.
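Adam's update rule is more involved (it keeps running averages of past gradients), so the sketch below swaps in plain SGD purely because its rule fits on one line: each weight moves against its gradient, w ← w - lr * w.grad. It checks that optim.SGD's step() does exactly that on a small random nn.Linear layer, which stands in for the network here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

lr = 0.1
optimizer = optim.SGD(layer.parameters(), lr=lr)  # plain SGD, not the Adam optimizer used above

loss = F.cross_entropy(layer(x), y)
loss.backward()

# What plain SGD should produce: one step against the gradient (computed before the update)
expected = layer.weight.detach() - lr * layer.weight.grad

optimizer.step()  # updates layer.weight in place using layer.weight.grad
print(torch.allclose(layer.weight.detach(), expected))  # True
```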

In [18]:
preds = network(images)
loss.item() # Still the old value: loss has not been recomputed from the new predictions yet

2.321331024169922

In [19]:
loss = F.cross_entropy(preds, labels)

In [20]:
loss.item()
# The loss is reduced.

2.1843767166137695

In [22]:
get_num_correct(preds, labels)
# This time 3 predictions are correct. This isn't always the case: in the early steps the number of
# correct predictions sometimes goes down. What we really care about is whether the loss is decreasing.

3

### Train Using a Single Batch
We can summarize the code for training with a single batch in the following way:

In [23]:
network = Network()

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
optimizer = optim.Adam(network.parameters(), lr=0.01)

batch = next(iter(train_loader)) # Get Batch
images, labels = batch

preds = network(images) # Pass Batch
loss = F.cross_entropy(preds, labels) # Calculate Loss

loss.backward() # Calculate Gradients
optimizer.step() # Update Weights

print('loss1:', loss.item())
preds = network(images)
loss = F.cross_entropy(preds, labels)
print('loss2:', loss.item())

loss1: 2.3059682846069336
loss2: 2.283470392227173
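A common sanity check is to repeat the single-batch update several times on that same batch and confirm that the loss keeps falling. The sketch below does this under a few assumptions: a random batch of 100 images stands in for FashionMNIST, and the model is an nn.Sequential with the same layers as Network, rewritten that way for brevity. Because the same network is backpropagated repeatedly, optimizer.zero_grad() is called on every step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Same layers as Network, written as nn.Sequential for brevity
network = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 12, kernel_size=5), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(12 * 4 * 4, 120), nn.ReLU(),
    nn.Linear(120, 60), nn.ReLU(),
    nn.Linear(60, 10),
)
optimizer = optim.Adam(network.parameters(), lr=0.01)

# Synthetic stand-in for one batch of 100 images (values in [0, 1], like ToTensor output)
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

losses = []
for step in range(20):
    preds = network(images)
    loss = F.cross_entropy(preds, labels)
    optimizer.zero_grad()   # without this, gradients from earlier steps would pile up
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```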
