In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

Forward Propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

Backward Propagation: In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using gradient descent.
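
The two steps can be traced by hand on a one-parameter function. The sketch below is illustrative only: the values of `w`, `x`, `target`, and the learning rate `lr` are made up, and the update is applied manually rather than through an optimizer.

```python
import torch

# One-parameter "network": prediction = w * x (all values are illustrative).
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
target = torch.tensor(12.0)

# Forward propagation: compute the guess and its squared error.
prediction = w * x
error = (prediction - target) ** 2

# Backward propagation: autograd fills w.grad with d(error)/dw.
error.backward()

# One gradient descent step with a hypothetical learning rate.
lr = 0.01
with torch.no_grad():
    w_new = w - lr * w.grad

print(w.grad, w_new)
```

Here the derivative is 2 * (w * x - target) * x, so the step moves `w` in the direction that reduces the error.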

We create a random data tensor to represent a batch of 100 images, each with 3 channels (RGB) and a height and width of 64, and a corresponding labels tensor initialized to random values. The pretrained model produces 1000 scores per image, one for each possible output class, so the labels tensor has shape (100, 1000).

In [2]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(100, 3, 64, 64)
labels = torch.rand(100, 1000)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth

In [3]:
type(ResNet18_Weights.DEFAULT)

<enum 'ResNet18_Weights'>

In [4]:
# data is indexed as data[image][channel][row][column]; data[0][0] is the first channel (a 64 x 64 matrix) of the first image
data[0][0]

tensor([[0.0654, 0.7688, 0.5981,  ..., 0.2081, 0.0020, 0.8625],
        [0.2944, 0.9093, 0.6864,  ..., 0.1150, 0.7419, 0.1909],
        [0.2712, 0.0591, 0.3737,  ..., 0.9167, 0.4531, 0.8166],
        ...,
        [0.6636, 0.7274, 0.8041,  ..., 0.4303, 0.1411, 0.6717],
        [0.7548, 0.7448, 0.5770,  ..., 0.7100, 0.4344, 0.7886],
        [0.5408, 0.0873, 0.0228,  ..., 0.5652, 0.8728, 0.7646]])

In [5]:
data.shape

torch.Size([100, 3, 64, 64])

In [6]:
labels.shape

torch.Size([100, 1000])

We run the input data through each of the model's layers to make a prediction. This is the forward pass.

In [7]:
prediction = model(data) # forward pass

In [8]:
prediction.shape # why does the prediction have the same shape as labels?

torch.Size([100, 1000])

Answer: The prediction has 1000 columns because ResNet18 was trained on the ImageNet dataset, which consists of over 1 million images from 1000 different classes, so the model outputs one score per class for every input image. The labels tensor was created with the same shape, (100, 1000), to match.

The ImageNet dataset is a popular benchmark for image classification models, and the task of classifying images into one of 1000 categories is a challenging and complex problem. By training the ResNet18 model on this dataset, the model has learned to extract useful features from images and use them to accurately predict the correct class label for a given image.

The ResNet18 architecture is a deep neural network that uses a stack of convolutional layers, arranged in residual blocks, to extract features from images, followed by global average pooling and a single fully connected layer that performs the classification. This final layer has 1000 output nodes, each corresponding to one of the 1000 classes in the ImageNet dataset. During training, the weights of the network are adjusted using backpropagation to minimize the difference between the predicted class scores and the true class labels for the training data.

By training on the ImageNet dataset, the ResNet18 model has become a widely used pre-trained model for image classification tasks, as the learned features can be fine-tuned for other classification tasks with smaller datasets.

In [9]:
loss = (prediction - labels).sum()

In [10]:
loss

tensor(-49970., grad_fn=<SumBackward0>)

Here, loss is a single scalar computed from the batch of predictions and labels. Summing the raw differences is only a toy objective that gives autograd something to differentiate; a real classifier would use a loss such as cross-entropy. loss.backward() computes the gradient of the loss with respect to every parameter of the network and stores it in that parameter's .grad attribute. These gradients can then be used to update the parameters of the network during the optimization step.
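
To see where the gradients end up, the sketch below repeats the same pattern on a small `nn.Linear` layer standing in for the network (the layer sizes and random data are assumptions chosen to keep the example fast).

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)            # small stand-in for the network
inputs = torch.rand(8, 4)
targets = torch.rand(8, 2)

grad_before = layer.weight.grad    # None: no backward pass has run yet
print(grad_before)

loss = (layer(inputs) - targets).sum()
weight_before = layer.weight.detach().clone()
loss.backward()

# Gradients now exist, with the same shape as the parameter...
print(layer.weight.grad.shape)
# ...but the parameter values themselves have not changed.
print(torch.equal(layer.weight.detach(), weight_before))
```

The backward pass only fills in `.grad`; changing the weights is left to the optimizer.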

In [11]:
loss.backward() # backward pass: computes gradients and stores them in each parameter's .grad (the parameters are not updated yet)

We load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.

In the cell below, torch.optim.SGD creates an instance of the stochastic gradient descent (SGD) optimizer, a popular optimization algorithm for training neural networks. The optimizer takes as input the parameters of the model (i.e., the weights and biases of the ResNet18 architecture), and sets the learning rate (lr) to 1e-2 and the momentum to 0.9.

The lr argument sets the step size at each iteration of the optimization algorithm. A larger learning rate can result in faster convergence during training, but can also cause the optimization process to overshoot and potentially fail to converge. A smaller learning rate can result in slower convergence, but can be more stable.

The momentum argument sets the momentum term for the optimization algorithm. Momentum is used to accelerate the optimization process by adding a fraction of the previous update to the current update. This can help the optimizer to overcome local minima and speed up convergence.
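
The momentum update can be checked by hand against the optimizer. The sketch below assumes the default `torch.optim.SGD` settings (no dampening, no Nesterov momentum, no weight decay) and uses a made-up toy loss; `p_manual` and `velocity` are illustrative names.

```python
import torch

lr, momentum = 1e-2, 0.9

# Parameter updated by the optimizer.
p = torch.tensor([1.0], requires_grad=True)
optim = torch.optim.SGD([p], lr=lr, momentum=momentum)

# The same update rule, tracked by hand.
p_manual = torch.tensor([1.0])
velocity = torch.zeros(1)

for _ in range(3):
    optim.zero_grad()
    loss = (p ** 2).sum()          # toy loss whose gradient is 2 * p
    loss.backward()
    optim.step()

    grad = 2 * p_manual
    velocity = momentum * velocity + grad   # accumulate past gradients
    p_manual = p_manual - lr * velocity     # step along the velocity

print(p.detach(), p_manual)
```

Both values should agree, which shows that each step reuses a fraction of the previous update direction.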

The optimizer object is then used in the training loop to update the parameters of the model based on the gradients computed during the backpropagation step. The optimizer.step() method updates the parameters based on the gradients, and the optimizer.zero_grad() method resets the gradients to zero for the next iteration. The goal of the optimization process is to minimize the loss function, which measures the difference between the predicted output of the model and the ground truth label for a given input.
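
Put together, a training loop repeats these calls for many iterations. The following is a minimal sketch on a small linear model with synthetic data and a mean squared error loss, all of which are assumptions for illustration; the notebook itself performs only a single step on ResNet18.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)                      # small stand-in model
inputs = torch.rand(16, 3)
targets = inputs.sum(dim=1, keepdim=True)    # synthetic regression target
loss_fn = nn.MSELoss()
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

losses = []
for step in range(50):
    optim.zero_grad()                        # clear gradients from the last step
    loss = loss_fn(model(inputs), targets)   # forward pass
    loss.backward()                          # backward pass
    optim.step()                             # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling `zero_grad()` each iteration matters because PyTorch accumulates gradients into `.grad` rather than overwriting them.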

In [12]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

In [13]:
optim.step() # gradient descent

In [14]:
model.parameters()

<generator object Module.parameters at 0x6ffcb1985550>
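
model.parameters() returns a generator, so printing it shows only the generator object. To look at the parameters themselves, iterate over it or use `named_parameters()`. The sketch below uses a small `nn.Sequential` model as a stand-in so the output stays short; the layer sizes are arbitrary.

```python
import torch
from torch import nn

# Small stand-in model (sizes are arbitrary).
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Materialize the generator to count the parameter tensors.
params = list(model.parameters())
print(len(params))

# named_parameters() pairs each tensor with its name.
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Total number of individual weights and biases.
total = sum(p.numel() for p in model.parameters())
print(total)
```

The same loop works on the ResNet18 model, where it lists the weights and biases of every layer.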