# Objectives :

* Introduction to PyTorch tensors
* Automatic differentiation and computational graphs
* Introduction to nn.Module and building the first neural network model in PyTorch


# **Introduction to Pytorch**


PyTorch is an open-source machine learning framework that enables developers to create and deploy deep learning models. Developed by Facebook AI Research, PyTorch has gained immense popularity due to its ease of use, flexibility, and dynamic computational graph feature. It offers a Python-based interface that allows users to build and train neural networks and perform various other machine learning tasks. PyTorch has a large and active community of contributors who constantly improve and expand its functionalities.

One of the key features of PyTorch is its *dynamic computational graph*, which enables users to define and modify their models on-the-fly during runtime. This is in contrast to frameworks such as TensorFlow 1.x, which required users to define a *static computational graph* before runtime. Additionally, PyTorch offers a variety of optimization algorithms, including stochastic gradient descent, Adam, and RMSprop. PyTorch also has excellent support for GPUs, making it an ideal choice for deep learning tasks that require significant computational power. Overall, PyTorch is a powerful and intuitive framework that has become a popular choice for deep learning practitioners and researchers alike.

**Recall :** In the context of neural networks, a computational graph is a directed acyclic graph (DAG) that represents the mathematical operations performed by the network to transform the input data into output predictions. Each node in the graph represents a mathematical operation, and each edge represents the flow of data from one operation to another. The input data and the parameters of the neural network are also represented as nodes in the graph. The computational graph is used to compute the gradient of the loss function with respect to the parameters of the neural network during backpropagation, which is used to update the parameters to minimize the loss.

In PyTorch, computational graphs are constructed dynamically as the model is executed. This means that the graph is generated on-the-fly from the operations actually performed on the input data and the parameters of the model, allowing for flexibility in the construction of complex neural network architectures. The graph is not stored as a single object: each tensor produced by an operation carries a `grad_fn` attribute that references the function that created it, and these function objects link back to their inputs. The computational graph is an essential component of PyTorch's autograd package, which is used to automatically compute the gradients needed to update the parameters of the model during training.
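
To make the idea of a dynamically built graph concrete, here is a minimal sketch. It relies only on the public `is_leaf` and `grad_fn` tensor attributes; the function name `piecewise` is hypothetical and chosen purely for illustration.

```python
import torch


def piecewise(x):
    # Ordinary Python control flow decides which operations are recorded,
    # so the graph can differ from one call to the next.
    if x.item() > 0:
        return x * 2
    return x ** 2


a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(-3.0, requires_grad=True)

ya = piecewise(a)  # the graph records a multiplication
yb = piecewise(b)  # the graph records a power

# Tensors created by the user are leaves and have no grad_fn;
# results of operations carry a grad_fn that points back into the graph.
print(a.is_leaf, a.grad_fn)
print(type(ya.grad_fn).__name__)
print(type(yb.grad_fn).__name__)

ya.backward()
yb.backward()
print(a.grad, b.grad)  # d(2x)/dx = 2 and d(x^2)/dx = 2x = -6
```

The two calls run different branches, so each call records a different graph, and `backward()` differentiates whichever one was actually executed.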

# Pytorch main packages: 

PyTorch consists of four main packages:
* torch: a general-purpose array library similar to NumPy that can do computations on GPU.
* torch.autograd: a package for automatically obtaining gradients.
* torch.nn: a neural net library with common layers and cost functions.
* torch.optim: an optimization package with common optimization algorithms like SGD, Adam, etc.

# **Installing Pytorch**

First, you need to install PyTorch. You can install PyTorch using pip by running the following command:

In [None]:
# You can run the code in this notebook without installing anything. However, if you want to run
# the code on your local machine, you need to install the required packages:
# pip install torch torchvision

# **Creating Tensors in Pytorch**

Tensors are a fundamental data structure in PyTorch, used to represent data and computations. They are similar to arrays in other programming languages, but with some additional features that make them well-suited for working with deep learning models.

In PyTorch, a tensor is a multi-dimensional array of values, with a fixed size and data type. Tensors can have any number of dimensions, from zero to any positive integer. The size of each dimension can also vary, as long as it is consistent across all elements in the tensor.

Here's an example of how to create a tensor in PyTorch:

In [None]:
import torch

# create a 1-dimensional tensor of size 3
x = torch.tensor([1, 2, 3])

# create a 2-dimensional tensor of size 2x3
y = torch.tensor([[1, 2, 3], [4, 5, 6]])

# create a 3-dimensional tensor of size 2x2x2
z = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

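In the above example, we have created three tensors with different dimensions using the torch.tensor() function. We can also specify the data type of the tensor using the dtype argument. If it is not specified, the data type is inferred from the data: Python integers, as used here, give torch.int64, while Python floats give torch.float32 by default.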
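
The following short sketch shows how `torch.tensor()` picks a data type from its input and how the `dtype` argument overrides that choice. The variable names are illustrative only.

```python
import torch

ints = torch.tensor([1, 2, 3])            # Python ints are stored as torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])    # Python floats use the default float dtype
forced = torch.tensor([1, 2, 3], dtype=torch.float32)  # explicit dtype wins

print(ints.dtype, floats.dtype, forced.dtype)
```

This matters in practice because many operations, including gradient computation, require floating-point tensors.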


A new tensor can be created from an existing one. For example, we can create a new tensor of zeros with the same properties (shape and data type) as the x we created above:


In [None]:
x = torch.zeros_like(x)
print(x) # tensor([0, 0, 0])

tensor([0, 0, 0])




Tensors in PyTorch can be manipulated using a variety of mathematical operations, such as addition, subtraction, multiplication, and division. Here's an example of how to perform arithmetic operations on tensors:

In [None]:
# create two tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# perform addition
c = a + b

# perform multiplication
d = a * b

In [None]:
print("a + b",c)
print("a * b ",d) # notice this is element-wise multiplication

a + b tensor([5, 7, 9])
a * b  tensor([ 4, 10, 18])


In the above example, we have created two tensors and performed addition and multiplication operations on them. PyTorch also supports many other mathematical operations on tensors, such as matrix multiplication, dot product, and element-wise operations like sin, cos, tanh, etc.

PyTorch tensors can also be used for deep learning models, as they can store both input data and model parameters. Tensors can be passed as input to a neural network model, and the model will perform computations on them to generate the output.

Overall, tensors are a fundamental building block in PyTorch, and understanding their properties and operations is essential for working with deep learning models.





# Shape of a tensor

In PyTorch, the shape of a tensor refers to the number of dimensions and the size of each dimension. For example, a 2-dimensional tensor with a shape of (3, 4) has two dimensions, with the first dimension having a size of 3 and the second dimension having a size of 4.

The shape of a tensor can be inspected using the .shape attribute, which returns a tuple containing the size of each dimension. Here's an example:

In [None]:
import torch

# create a 2-dimensional tensor of size 3x4
x = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# get the shape of the tensor
print(x.shape)  # prints torch.Size([3, 4])


torch.Size([3, 4])


In the above example, we create a 2-dimensional tensor x of size 3x4 and then print its shape using the .shape attribute.

# Additional properties

We can also inspect other properties of a tensor, such as its data type, its number of elements, and the device it is stored on. Here's an example:

In [None]:
import torch

# create a 1-dimensional tensor of size 5 with random values
x = torch.randn(5)

# get the data type of the tensor
print(x.dtype)  # prints torch.float32

# get the number of elements in the tensor
print(x.numel())  # prints 5

# get where the tensor is stored:

x.device
# device(type='cpu')

torch.float32
5


device(type='cpu')
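
As a small sketch of how the device property is typically used, the snippet below picks a GPU when one is available and falls back to the CPU otherwise. The variable name `device` is a common convention, not something defined earlier in this notebook.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(5)
x = x.to(device)  # .to() returns a tensor on the requested device

print(x.device)
```

Tensors that take part in the same operation have to live on the same device, which is why models and their inputs are usually moved together.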

# Tensor multiplication in PyTorch

## Dot product

The dot product is a basic operation in linear algebra that takes two vectors and produces a scalar. In PyTorch, the dot product of two tensors can be computed using the torch.dot() function. However, this function only works for 1-dimensional tensors.

In [None]:
import torch

# create two 1-dimensional tensors of size 3
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# compute the dot product of the two tensors
z = torch.dot(x, y)

print(z)  # prints tensor(32)

tensor(32)


In the above example, we create two 1-dimensional tensors x and y of size 3 and compute their dot product using the torch.dot() function.

## Matrix multiplication

Matrix multiplication is a common operation in linear algebra and is used extensively in deep learning models. In PyTorch, matrix multiplication can be performed using the torch.mm() function or the torch.matmul() function.

In [None]:
import torch

# create two 2-dimensional tensors of size 2x3 and 3x4
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
y = torch.tensor([[7, 8, 9, 10], [11, 12, 13, 14], [15, 16, 17, 18]])

# perform matrix multiplication
z = torch.mm(x, y)

print(z)  # prints tensor([[ 74,  80,  86,  92], [173, 188, 203, 218]])

tensor([[ 74,  80,  86,  92],
        [173, 188, 203, 218]])


In the above example, we create two 2-dimensional tensors x of size 2x3 and y of size 3x4 and perform their matrix multiplication using the torch.mm() function. The resulting tensor z has a size of 2x4.

Note that the torch.matmul() function is more general than torch.mm(), which only accepts 2-dimensional tensors: torch.matmul() can handle tensors with more than two dimensions and automatically broadcasts the input tensors if needed.

In [None]:
import torch

# create a 3-dimensional tensor of size 2x3x4 and a 2-dimensional tensor of size 4x5
x = torch.randn(2, 3, 4)
y = torch.randn(4, 5)

# perform matrix multiplication
z = torch.matmul(x, y)

print(z.shape)  # prints torch.Size([2, 3, 5])

torch.Size([2, 3, 5])


In this example, we create a 3-dimensional tensor x of size 2x3x4 and a 2-dimensional tensor y of size 4x5. The first dimension of x represents the batch size, the second dimension represents the number of rows, and the third dimension represents the number of columns. The y tensor represents a weight matrix of a neural network with 4 inputs and 5 outputs.

We perform matrix multiplication between x and y using the torch.matmul() function. Since x is a 3-dimensional tensor and y is a 2-dimensional tensor, PyTorch automatically broadcasts y along the first dimension of x to perform the matrix multiplication. The resulting tensor z has a size of 2x3x5, where the first dimension represents the batch size, the second dimension represents the number of rows, and the third dimension represents the number of outputs.
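
One way to check this description of broadcasting is to compare `torch.matmul()` against an explicit loop over the batch. This is a sketch for verification only; the name `z_loop` is illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, 4)  # batch of 2 matrices, each 3x4
y = torch.randn(4, 5)     # one 4x5 matrix shared by the whole batch

z = torch.matmul(x, y)    # batched result of shape (2, 3, 5)

# Equivalent computation, one batch element at a time.
z_loop = torch.stack([torch.mm(x[i], y) for i in range(x.shape[0])])

print(z.shape)
print(torch.allclose(z, z_loop, atol=1e-5))
```

Each slice `x[i]` is multiplied by the same `y`, which is exactly what broadcasting along the batch dimension means here.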

### Remark:
In a neural network, the output of each layer is obtained by performing matrix multiplication between the input and the weight matrix, followed by the addition of a bias term and the application of an activation function. The torch.matmul() function is a fundamental building block for implementing this operation efficiently in PyTorch.
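
The remark can be sketched in a few lines. The sizes (4 inputs, 5 outputs), the zero bias, and the choice of ReLU as the activation are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 4)  # batch of 2 inputs with 4 features each
W = torch.randn(4, 5)  # weight matrix mapping 4 inputs to 5 outputs
b = torch.zeros(5)     # bias, one value per output

# Matrix multiplication, then bias (broadcast over the batch), then activation.
h = torch.relu(torch.matmul(x, W) + b)

print(h.shape)
```

In practice `nn.Linear` packages the weight and bias together, as used later in this notebook.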

# Computational graphs in PyTorch

In PyTorch, computational graphs are built using the autograd package, which allows automatic differentiation for training neural networks. Here's an example of how to create a simple computational graph in PyTorch:



In [14]:
import torch
import torch.nn as nn

# Define the input data
x = torch.tensor(2.0, requires_grad=True)

# Define the computation graph
y = x ** 2 + 3 * x - 1

# Compute the gradients
y.backward()

# Access the gradients
grad_x = x.grad

# Explanation using math symbols
"""
Given:
x = 2.0

Computation Graph:
y = x^2 + 3x - 1

Expanding the terms:
y = x^2 + 3x - 1
  = 2.0^2 + 3(2.0) - 1
  = 4.0 + 6.0 - 1
  = 9.0

Gradients:
∂y/∂x = 2x + 3
∂y/∂x (at x = 2.0) = 2(2.0) + 3
                     = 4.0 + 3
                     = 7.0
"""

# Print the results
print("Input x:", x.item())
print("Output y:", y.item())
print("Gradient of x:", grad_x.item())

Input x: 2.0
Output y: 9.0
Gradient of x: 7.0


Now let's see a more complex example:


In [15]:
import torch

# Define the input data
x1 = torch.tensor(2.0, requires_grad=True)
x2 = torch.tensor(3.0, requires_grad=True)

# Define the computation graph
y = x1 ** 2 + 3 * x2 - torch.sin(x1 * x2)

# Compute the gradients
y.backward()

# Access the gradients
grad_x1 = x1.grad # ∂y/∂x1 
grad_x2 = x2.grad # ∂y/∂x2

# Explanation using math symbols
"""
Given:
x1 = 2.0
x2 = 3.0

Computation Graph:
y = x1^2 + 3x2 - sin(x1 * x2)

Expanding the terms and applying sin element-wise:
y = x1^2 + 3x2 - sin(x1 * x2)
  = 2.0^2 + 3(3.0) - sin(2.0 * 3.0)
  = 4.0 + 9.0 - sin(6.0)
  ≈ 13.279415130615234

Gradients:
∂y/∂x1 = 2x1 - x2 * cos(x1 * x2)
∂y/∂x1 (at x1 = 2.0, x2 = 3.0) = 2(2.0) - 3.0 * cos(2.0 * 3.0)
                                = 4.0 - 3.0 * cos(6.0)
                                ≈  1.1194891929626465

∂y/∂x2 = 3 - x1 * cos(x1 * x2)
∂y/∂x2 (at x1 = 2.0, x2 = 3.0) = 3 - 2.0 * cos(2.0 * 3.0)
                                = 3 - 2.0 * cos(6.0)
                                ≈  1.0796594619750977
"""

# Print the results
print("Input x1:", x1.item())
print("Input x2:", x2.item())
print("Output y:", y.item())
print("Gradient of x1:", grad_x1.item())
print("Gradient of x2:", grad_x2.item())

Input x1: 2.0
Input x2: 3.0
Output y: 13.279415130615234
Gradient of x1: 1.1194891929626465
Gradient of x2: 1.0796594619750977


# Automatic differentiation 


Automatic differentiation is a technique used in computational graphs to compute the derivatives of functions or expressions with respect to their input variables. It allows for efficient and accurate calculation of gradients, which are crucial for training machine learning models.

In the context of neural networks, automatic differentiation is used to determine the gradients of the loss function with respect to the model's parameters. These gradients are then used in optimization algorithms, such as gradient descent, to update the model's parameters and minimize the loss.

There are two main types of automatic differentiation: forward-mode differentiation and backward-mode differentiation (also known as reverse-mode differentiation).

Forward-mode differentiation: In forward-mode differentiation, the derivatives are calculated by applying the chain rule in a forward manner. It computes the derivatives of each operation in the computational graph as the values flow forward from inputs to outputs. However, forward-mode differentiation becomes inefficient when the number of input variables is large compared to the number of output variables.
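
Forward mode can be illustrated with dual numbers, which carry a value and its derivative through every operation. The sketch below is plain Python, not how PyTorch works internally; the class name `Dual` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative with respect to one chosen input."""
    val: float
    dot: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def f(x):
    # y = x^2 + 3x - 1, the function from the first autograd example
    return x * x + 3 * x + (-1)


x = Dual(2.0, 1.0)  # seed the derivative: dx/dx = 1
y = f(x)
print(y.val, y.dot)  # 9.0 7.0
```

One forward pass yields the derivative with respect to a single seeded input, so a function with many inputs needs many passes. That is the inefficiency described above.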

Backward-mode differentiation (reverse-mode differentiation): Backward-mode differentiation is the most commonly used method for automatic differentiation in deep learning frameworks like PyTorch and TensorFlow. It computes the derivatives by applying the chain rule in a backward manner. It starts from the final output and computes the gradients of the output with respect to the intermediate variables by traversing the computational graph in the reverse direction. This method is efficient because it reuses intermediate results from the forward pass, and it is especially useful when the number of input variables is large compared to the number of output variables, as in neural networks, where a single scalar loss depends on many parameters.

PyTorch, as well as other deep learning frameworks, implements automatic differentiation by automatically constructing and evaluating the computational graph during the forward pass and then using backpropagation to compute the gradients efficiently during the backward pass. This makes it easy to compute gradients and perform gradient-based optimization for training neural networks.
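
The reverse-mode procedure can be sketched in plain Python for the function from the second autograd example above. This is a simplified illustration under stated assumptions, not PyTorch's implementation; the names `Var`, `sin`, and `backward` are hypothetical.

```python
import math


class Var:
    """A scalar node that remembers its parents and the local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(v):
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))


def backward(output):
    # Order the nodes so that each one is processed after every node that uses it.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


x1, x2, three = Var(2.0), Var(3.0), Var(3.0)
y = x1 * x1 + three * x2 - sin(x1 * x2)  # graph is recorded during this forward pass
backward(y)
print(y.value, x1.grad, x2.grad)
```

A single backward sweep produces the gradients for both inputs at once, and the values agree with the PyTorch output shown earlier up to floating-point precision.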

# Introduction to nn.Module : building a custom network

In [17]:
import torch
import torch.nn as nn


class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.weight1 = nn.Parameter(torch.tensor(2.0))
        self.weight2 = nn.Parameter(torch.tensor(3.0))
        self.bias = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        y = self.weight1 * x + self.weight2 * x + self.bias
        return y


# Create an instance of the network
model = MyNetwork()

# Define the input data
x = torch.tensor(2.0)

# Pass the input through the network
output = model(x)

# Print the results
print("Input x:", x.item())
print("Output:", output.item())


Input x: 2.0
Output: 11.0


1. We import the necessary modules, `torch` and `torch.nn` to work with PyTorch and its neural network components.

2. We define a custom neural network architecture `MyNetwork` that inherits from `nn.Module`. This allows us to utilize the functionality provided by `nn.Module` for managing and optimizing the network.

3. Inside the `MyNetwork` class, the `__init__` method is called when an instance of the class is created. We use `super()` to initialize the parent class, `nn.Module`. 

4. We define three learnable parameters, `weight1`, `weight2`, and `bias`, using `nn.Parameter`. These parameters will be automatically registered as tensors that require gradients and will be optimized during training.

5. The `forward` method defines the forward pass of the network. It takes an input `x`, and performs a linear operation by multiplying `weight1` with `x`, `weight2` with `x`, and adding the `bias`. The result is returned as the output `y`.

6. We create an instance of the network by calling `MyNetwork()`, which initializes the network with the defined parameters.

7. We define an input tensor `x` with a value of `2.0`.

8. Finally, we pass the input through the network by calling `model(x)` and store the output in the `output` variable.

9. We print the input `x` and the output `output` to display the results.
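
To see what registering a parameter with `nn.Parameter` provides, the sketch below lists the parameters of a network with the same structure and inspects their gradients after a backward pass. The class name `TinyNetwork` and the variable `tiny` are hypothetical, chosen so that nothing defined above is overwritten.

```python
import torch
import torch.nn as nn


class TinyNetwork(nn.Module):  # hypothetical name, same structure as MyNetwork
    def __init__(self):
        super().__init__()
        self.weight1 = nn.Parameter(torch.tensor(2.0))
        self.weight2 = nn.Parameter(torch.tensor(3.0))
        self.bias = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        return self.weight1 * x + self.weight2 * x + self.bias


tiny = TinyNetwork()

# nn.Module tracks every nn.Parameter assigned as an attribute.
for name, p in tiny.named_parameters():
    print(name, p.item(), p.requires_grad)

out = tiny(torch.tensor(2.0))
out.backward()

# d(out)/d(weight1) = x, d(out)/d(weight2) = x, d(out)/d(bias) = 1
print(tiny.weight1.grad, tiny.weight2.grad, tiny.bias.grad)
```

An optimizer receives exactly this collection through `tiny.parameters()`, which is how it knows which tensors to update.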


# Creating first model :

Import the following; we will discuss each line later, as we need it.


In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.transforms import Normalize, Compose
import os
import matplotlib.pyplot as plt
import numpy as np


# CIFAR-10 dataset

This dataset consists of 60,000 32x32 color images, all labeled as one of 10 classes. The training set is 50,000 images, while the test set is 10,000.


Source: https://www.cs.toronto.edu/~kriz/cifar.html




After importing CIFAR10 from torchvision, our next step is to download the dataset and prepare it for loading into the neural network.

To ensure that the images are normalized before being fed to the model, we define a transformation function and use **torchvision.transforms.Normalize** to normalize all of the images in the training and test datasets. This method requires the desired mean and standard deviation as arguments. Since CIFAR10 images are in color, we need to provide a value for each color channel (R, G, B). In this case, we set both the mean and the standard deviation to 0.5 for every channel. Since ToTensor() scales pixel values to the range [0, 1], this maps them to the range [-1, 1], centered around 0. However, there are other, more precise approaches to normalization that could also be used.

In [5]:
transform = Compose(
    [ToTensor(),
     Normalize((0.5, 0.5, 0.5),  # mean
               (0.5, 0.5, 0.5))] # std. deviation
)
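
The effect of these values can be checked by hand. The sketch below applies the per-channel formula `(value - mean) / std` to a few pixel values in plain Python; the helper name `normalize` is hypothetical.

```python
mean, std = 0.5, 0.5


def normalize(value):
    # Per-channel formula applied by Normalize: (value - mean) / std
    return (value - mean) / std


# ToTensor() scales pixel values into [0, 1] before Normalize sees them.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The darkest pixel maps to -1.0, mid-gray to 0.0, and the brightest to 1.0.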

# Download the dataset

In [6]:

training_data = CIFAR10(root="cifar",
                        train = True, # train set, 50k images
                        download = True,
                        transform=transform)
test_data = CIFAR10(root = "cifar",
                    train = False, # test set, 10k images
                    download = True,
                    transform = transform)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to cifar/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:05<00:00, 28862648.14it/s]


Extracting cifar/cifar-10-python.tar.gz to cifar
Files already downloaded and verified


# Preparing data to feed it to the network:



With our dataset downloaded and normalized, we can now prepare it to be fed to the neural network. To do this, we utilize the PyTorch DataLoader, which allows us to specify the batch size for the data.

In [7]:
batch_size = 2
train_dataloader = DataLoader(training_data, 
                              batch_size=batch_size, 
                              shuffle=True)
test_dataloader = DataLoader(test_data, 
                             batch_size=batch_size, 
                             shuffle=True)


The **DataLoader** is an iterable, so to explore it further, we can examine the dimensions of one iteration by looking at train_dataloader.

In [None]:
for X, y in train_dataloader:
  print(f"Shape of X [N, C, H, W]: {X.shape}")
  print(f"Shape of y: {y.shape} {y.dtype}")
  break

Shape of X [N, C, H, W]: torch.Size([2, 3, 32, 32])
Shape of y: torch.Size([2]) torch.int64


# Defining the model :
To begin building our neural network, we first define our model class, which we'll call NeuralNetwork. This model will be a subclass of PyTorch's nn.Module, which is the base class for all neural network modules in PyTorch.

Since our dataset contains color images, each image has a shape of (3, 32, 32), representing a 32x32 tensor in each of the 3 RGB color channels. Since our initial model will consist of fully-connected layers, we need to flatten the input image data using nn.Flatten(). This turns each image into a vector of 3072 (32 x 32 x 3) values. We then use nn.Linear() to create the linear layers, with ReLU activations between them. The output of our model will be 10 logits, corresponding to the 10 classes in our dataset.

Once we've defined the structure of the model, we define the sequence of the forward pass. Since our model is a simple sequential model, our forward method will be straightforward and compute an output tensor from input tensors.

Optionally, we can print the model once it's defined to get a summary of its structure.

In [8]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(32*32*3, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
        
model = NeuralNetwork()

print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=3072, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
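
The layer sizes in the printed summary also determine how many learnable parameters the model has. The following sketch is plain arithmetic, assuming each nn.Linear layer holds one weight per input-output pair plus one bias per output, as the `bias=True` entries above indicate. The names `layers` and `total_params` are illustrative.

```python
# (in_features, out_features) of the three nn.Linear layers in the model
layers = [(32 * 32 * 3, 1024), (1024, 512), (512, 10)]

total_params = 0
for n_in, n_out in layers:
    n_params = n_in * n_out + n_out  # weight matrix plus one bias per output
    print(f"{n_in:>5d} -> {n_out:>4d}: {n_params:,} parameters")
    total_params += n_params

print(f"total: {total_params:,}")  # total: 3,676,682
```

Most of the parameters sit in the first layer, because every one of the 3072 input values is connected to all 1024 units.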


# Optimizer 
In PyTorch, you can use CrossEntropyLoss() as your loss function, which is the standard choice for multi-class classification tasks such as this one. It takes the raw logits produced by the model, so no softmax layer is needed at the output. For other tasks, you may want to use different loss functions that are more appropriate. To optimize our model, we will use stochastic gradient descent, which is available in the torch.optim package, along with other optimizers such as Adam and RMSprop. We simply need to pass the model parameters and the learning rate lr to the SGD() optimizer. If you want to apply momentum or weight decay in your model optimization, you can specify those using the momentum and weight_decay parameters in the SGD() optimizer (both of which default to 0).

In [9]:

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD( model.parameters(), lr=0.001 ) # momentum=0.9
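
For a single sample, cross-entropy is the negative log of the softmax probability assigned to the true class. The sketch below computes it in plain Python to show the arithmetic; the helper name `cross_entropy` and the example logits are hypothetical, and this is not PyTorch's own implementation.

```python
import math


def cross_entropy(logits, target):
    # Negative log-softmax of the target class, computed stably by
    # subtracting the largest logit before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


uniform = cross_entropy([0.0] * 10, target=3)         # no preference among 10 classes
confident = cross_entropy([0.0, 5.0, 0.0], target=1)  # strong, correct preference

print(uniform)    # ln(10), about 2.3026
print(confident)
```

A model with no preference among 10 classes has a loss of ln(10), roughly 2.30, which is close to the first loss value printed in the training log further down.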

# Training Loop :
Let's discuss the definition of the training loop. In this step, we will create our train() function, which will accept train_dataloader, model, loss_fn, and optimizer as arguments during the training process. The size variable is the length of the entire training dataset, which is 50,000. After that, we set the model to training mode by calling model.train(), which is a PyTorch nn.Module method. This switches layers such as dropout and batch normalization to their training behavior (our model contains neither, but calling it is good practice). In contrast, when we evaluate the model's performance, we call model.eval() instead.

Next, we iterate through each mini-batch. To train on a GPU, the model and each mini-batch would first have to be moved there with to(device); the code below omits this step and runs on the default device. We feed the mini-batch to our model, compute the loss, and then backpropagate. Before starting backpropagation, we need to run optimizer.zero_grad(), which resets the gradients. This step is necessary because PyTorch accumulates gradients across successive backward() calls by default. Accumulation can be desired in some cases, for example to simulate a larger batch size by summing gradients over several mini-batches, but not here.

After the loss.backward() step, which uses the loss to compute the gradient, we use optimizer.step() to update the weights. Finally, we print updates on the training process, outputting the computed loss every 2000 mini-batches (every 4000 training samples with a batch size of 2).

In [10]:
def train(dataloader, model, loss_fn, optimizer):
    # Get the size of the training dataset
    size = len(dataloader.dataset)
    
    # Set the model to training mode
    model.train()
    
    # Iterate through each mini-batch of the training data
    for batch, (X, y) in enumerate(dataloader):
        
        # Compute the model predictions
        pred = model(X)
        
        # Compute the loss between the predictions and the true labels
        loss = loss_fn(pred, y)
        
        # Reset the gradients
        optimizer.zero_grad()
        
        # Compute the gradients using backpropagation
        loss.backward()
        
        # Update the model parameters using the optimizer
        optimizer.step()
        
        # Print the current loss and the training progress every 2000 mini-batches
        if batch % 2000 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
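
The reason for resetting gradients before each backward pass can be seen on a single scalar. This is a minimal sketch; it resets the gradient directly with `w.grad.zero_()`, which is comparable in effect to what `optimizer.zero_grad()` does for every parameter it manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
print(w.grad)  # tensor(3.)

(3 * w).backward()  # no reset: the new gradient is added to the old one
print(w.grad)  # tensor(6.)

w.grad.zero_()  # reset the stored gradient in place
(3 * w).backward()
print(w.grad)  # tensor(3.)
```

Without the reset, every update step would use the sum of all gradients computed so far rather than the gradient of the current mini-batch.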

In [11]:
def test(dataloader, model, loss_fn):
    # Get the size of the dataset and the number of batches
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    # Set the model to evaluation mode
    model.eval()
    # Initialize variables for tracking test loss and number of correct predictions
    test_loss, correct = 0, 0
    # Use torch.no_grad() to disable gradient tracking during testing
    with torch.no_grad():
        # Iterate through each mini-batch in the data loader
        for X, y in dataloader:
            # Make predictions using the trained model
            pred = model(X)
            # Compute the test loss using the specified loss function
            test_loss += loss_fn(pred, y).item()
            # Count the number of correct predictions by comparing predicted labels to ground truth labels
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # Compute the average test loss and accuracy across all mini-batches
    test_loss /= num_batches
    correct /= size
    # Print out the test error, accuracy, and average test loss
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [12]:
# we just train for 2 epochs
epochs = 2
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.285184  [    0/50000]
loss: 1.793684  [ 4000/50000]
loss: 1.726945  [ 8000/50000]
loss: 1.239966  [12000/50000]
loss: 1.765733  [16000/50000]
loss: 2.099144  [20000/50000]
loss: 1.573969  [24000/50000]
loss: 1.326612  [28000/50000]
loss: 0.936158  [32000/50000]
loss: 1.644593  [36000/50000]
loss: 0.942943  [40000/50000]
loss: 1.550724  [44000/50000]
loss: 2.535042  [48000/50000]
Test Error: 
 Accuracy: 44.0%, Avg loss: 1.600083 

Epoch 2
-------------------------------
loss: 1.283306  [    0/50000]
loss: 2.011713  [ 4000/50000]
loss: 1.411191  [ 8000/50000]
loss: 1.717604  [12000/50000]
loss: 0.114138  [16000/50000]
loss: 1.911854  [20000/50000]
loss: 0.846423  [24000/50000]
loss: 1.322043  [28000/50000]
loss: 2.667917  [32000/50000]
loss: 1.373672  [36000/50000]
loss: 2.000410  [40000/50000]
loss: 1.014187  [44000/50000]
loss: 0.506572  [48000/50000]
Test Error: 
 Accuracy: 48.4%, Avg loss: 1.459302 

Done!


Exercise : try to improve the accuracy of the above model. What would you change?


# Save the model 

In [None]:
torch.save(model.state_dict(), "cifar_fc.pth")


# Load the model 
When you want to load your model for inference, use torch.load() to grab your saved model, and map the learned parameters with load_state_dict.

In [None]:

model = NeuralNetwork()
model.load_state_dict(torch.load("cifar_fc.pth"))
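
The save and load steps can be tried in isolation on a small stand-in model. In this sketch, the `nn.Linear(4, 2)` model, the temporary directory, and the file name `demo.pth` are all assumptions made for illustration; they are not part of the CIFAR-10 example.

```python
import os
import tempfile

import torch
from torch import nn

net = nn.Linear(4, 2)  # small stand-in model, not the CIFAR-10 network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pth")  # hypothetical file name
    torch.save(net.state_dict(), path)    # saves only the learned tensors

    restored = nn.Linear(4, 2)            # must have the same architecture
    restored.load_state_dict(torch.load(path))

restored.eval()  # switch to evaluation mode before inference

print(list(net.state_dict().keys()))
print(torch.equal(net.weight, restored.weight))
```

Because a state dict stores only tensors keyed by parameter name, the model class has to be constructed first and then filled with the saved values.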



# Evaluating the Model


In [16]:
correct = 0
total = 0

with torch.no_grad():
   for data in test_dataloader:
     images, labels = data
     outputs = model(images)
     _, predicted = torch.max(outputs.data, 1)
     total += labels.size(0)
     correct += (predicted == labels).sum().item()
     
print(f'Model accuracy: {100 * correct // total} %')

Model accuracy: 48 %


Exercise : Check how the model performs on individual classes.