TensorFlow and PyTorch are two of the most popular deep learning frameworks:

•	TensorFlow: Used in industry (Google, NVIDIA, Uber).
•	PyTorch: More popular in research and academia (Facebook AI, OpenAI, Hugging Face).


•	Use PyTorch if you want a more flexible, Pythonic, and research-friendly framework.
•	Use TensorFlow if you need a production-ready framework with strong deployment capabilities.



torchvision.datasets.MNIST is a built-in dataset loader in PyTorch that provides access to the MNIST dataset, which consists of 70,000 grayscale images of handwritten digits (0-9), each of size 28×28 pixels.

Torchvision is a PyTorch package that provides datasets, model architectures, and image transformations for computer vision tasks. It includes popular datasets like MNIST, CIFAR10, ImageNet, and COCO.

In [None]:
import torch
import torchvision
from torch.utils.data import Dataset,DataLoader
import torch.nn as nn

1. **`import torch`**: This imports the core PyTorch library, which provides the fundamental building blocks for neural networks, including tensor operations, automatic differentiation, and neural network modules.

2. **`import torchvision`**: This imports the `torchvision` library, which is an extension of PyTorch that provides datasets, model architectures, and image transformations commonly used in computer vision tasks.  It often simplifies loading common datasets and pre-trained models.

3. **`from torch.utils.data import Dataset, DataLoader`**: This imports the `Dataset` and `DataLoader` classes from `torch.utils.data`.
    - `Dataset` is an abstract class representing a dataset. You typically create a custom subclass of `Dataset` to load and process your specific data.
    - `DataLoader` is a utility for efficiently loading data in batches during training.  It handles shuffling, batching, and parallel loading of data.

4. **`import torch.nn as nn`**: This imports the `torch.nn` module, which provides building blocks for constructing neural networks.  It includes layers (e.g., linear, convolutional, recurrent), activation functions, loss functions, and other components necessary for defining the architecture of a neural network.
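
As a minimal sketch of how `Dataset` and `DataLoader` fit together, the following defines a toy dataset and batches it. The class `SquaresDataset` and its data are hypothetical, made up purely for illustration; they are not part of PyTorch or of this notebook.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples; the DataLoader uses this to know when an epoch ends.
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair.
        return self.x[idx], self.y[idx]


toy_set = SquaresDataset(10)
# num_workers=0 keeps loading in the main process, which is enough for a toy example.
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(xb.shape) for xb, yb in toy_loader]
print(batch_shapes)  # the last batch is smaller because 10 is not a multiple of 4
```

A custom subclass only needs `__len__` and `__getitem__`; batching, shuffling, and parallel loading are left to the `DataLoader`.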


In [None]:
train_set=torchvision.datasets.MNIST(root='/data',
                                     train=True,
                                     download=True,
                                     transform=torchvision.transforms.ToTensor()) # ToTensor converts each PIL image into a float tensor
test_set=torchvision.datasets.MNIST(root='/data',
                                     train=False,
                                     download=True,
                                    transform=torchvision.transforms.ToTensor())

# autograd (and hence backpropagation) only works on tensors, so the data must be converted to tensors

100%|██████████| 9.91M/9.91M [00:00<00:00, 58.6MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 40.1MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 12.9MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 8.60MB/s]


train_set: Loads the training set of the MNIST dataset.

root='/data': Specifies the directory where the dataset will be stored.  If the data is not present, it will be downloaded to this location.  You might need to adjust this path if your data is in another directory.

train=True:  Indicates that this is the training dataset.

 download=True: Automatically downloads the dataset if it's not already present at the specified root directory.

 transform=torchvision.transforms.ToTensor(): Applies a transformation to each image in the dataset. torchvision.transforms.ToTensor() converts the images (which are typically PIL images) into PyTorch tensors, making them suitable for use with PyTorch models.  The tensors are scaled to have values in the range [0, 1].
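
To illustrate the scaling, here is a hand-written approximation of what `ToTensor()` does to a grayscale image. It is a sketch on a random stand-in image, not a call into torchvision, and the variable names are assumptions for illustration.

```python
import torch

# Stand-in for one MNIST image: 28x28 pixels with integer intensities 0..255.
raw = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

# Roughly what ToTensor does for a grayscale image:
# add a channel dimension, convert to float32, and scale to [0, 1].
scaled = raw.unsqueeze(0).to(torch.float32) / 255.0

print(scaled.shape, scaled.dtype)
print(scaled.min().item(), scaled.max().item())
```

The extra leading dimension is the channel axis, which is why a single MNIST sample has shape `(1, 28, 28)` rather than `(28, 28)`.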


In [None]:
#Checking for GPU
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu') #the device to use for computations (GPU if available, otherwise CPU).
device

device(type='cuda')

In [None]:
train_loader=torch.utils.data.DataLoader(train_set,batch_size=32,num_workers=4,shuffle=True)
test_loader=torch.utils.data.DataLoader(test_set,batch_size=32,num_workers=4,shuffle=True)
# `num_workers=4`: 4 worker processes load data in parallel, which can speed up data loading, especially for large datasets.
# Shuffling is important for the training set; for the test set it is not necessary (shuffle=False would work equally well).



#Method 1

In [None]:
# Similar to building a Sequential model in Keras.
# PyTorch has no Dense layer; nn.Linear plays that role.
# nn.Flatten(): flattens each input into a single vector. For example, a 28x28 image becomes a vector of size 784 (28 * 28). This is necessary because fully connected layers (like nn.Linear) expect each sample to be a single vector.


# nn.Linear(784, 512): a fully connected (dense) layer. It takes the 784-dimensional input vector from the previous layer and applies a linear transformation to produce a 512-dimensional output vector, learning a weight matrix and a bias vector to do so.


model=nn.Sequential(nn.Flatten(),
    nn.Linear(784,512),
    nn.ReLU(),
    nn.Linear(512,256),
    nn.ReLU(),
    nn.Linear(256,64),
    nn.ReLU(),
    nn.Linear(64,10)  #this is the output layer
)
model=torch.compile(model).to(device) # Compiles the model using torch.compile for potential performance gains (e.g. using TorchDynamo or AOTAutograd) and moves it to the specified device (GPU if available, otherwise CPU).
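# Note: torch.compile does not initialize anything; the weights and biases were already initialized when the nn.Linear layers were constructed.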

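PyTorch models are defined either with `nn.Sequential` or by subclassing `nn.Module`; there is no equivalent of the Keras functional API.

`nn.CrossEntropyLoss` works on raw logits (comparable to `from_logits=True` in Keras), so no activation function is required in the output layer.

Weights and biases are initialized when the layers are constructed; `torch.compile` only optimizes how the model runs.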
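
To make the shapes concrete, the sketch below pushes a dummy batch through a shortened version of the network and records the output shape after each layer. The shortened layer stack and the variable names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.zeros(32, 1, 28, 28)  # a dummy batch shaped like MNIST images
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append((type(layer).__name__, tuple(x.shape)))

for name, shape in shapes:
    print(name, shape)

# The parameters already exist right after construction, before any compile step.
print(layers[1].weight.shape)
```

The final layer returns one raw score per class, which is the form the loss function used in this notebook expects.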

In [None]:
# PyTorch has no model.compile()/model.fit() as in Keras; the training loop has to be written by hand
optimizer=torch.optim.Adam(model.parameters(),lr=3e-4)
loss_fn=nn.CrossEntropyLoss()   # expects integer class labels, like sparse categorical cross-entropy in Keras

**`model.parameters()`**: This passes the parameters (weights and biases) of the model to the optimizer, which needs to know which values to adjust during training. `model.parameters()` returns an iterator over all parameter tensors registered in the model; the optimizer only updates those for which a gradient was computed (`requires_grad=True`, the default for layer parameters).
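
As a quick illustration, the following sketch rebuilds the same layer stack (without `torch.compile`) and lists what `parameters()` yields. The name `net` is an assumption chosen to avoid clashing with the notebook's `model`.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Each nn.Linear contributes a weight matrix and a bias vector.
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

total = sum(p.numel() for p in net.parameters())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(total, trainable)  # 550346 550346
```

`nn.Flatten` and `nn.ReLU` have no parameters, so only the four linear layers appear in the listing.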


In [None]:
# What is loss_fn = nn.CrossEntropyLoss()?

# The loss function `nn.CrossEntropyLoss()` is used for multi-class classification.
# It combines a log-softmax (to turn raw scores into log-probabilities) with a negative log-likelihood loss.
# This is appropriate when the model's output is a vector of raw scores (logits) for each class.
# The targets are integer class indices, and the loss is the negative log-probability assigned to the true class, averaged over the batch.
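
The equivalence can be checked numerically. This is a small sketch on random logits; the tensors and names are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # integer class indices, not one-hot

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), manual.item())  # the two values agree up to floating-point error
```

Because the softmax is applied inside the loss, adding a softmax layer to the model as well would apply it twice.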



In [None]:
# If you needed binary cross-entropy, you'd use:
# loss_fn = nn.BCELoss()  # For probabilities (output should be between 0 and 1)
# or
# loss_fn = nn.BCEWithLogitsLoss()  # For logits (raw output from the last layer, no sigmoid applied)
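
The relationship between the two binary losses can be sketched the same way: applying a sigmoid and then `nn.BCELoss` should match `nn.BCEWithLogitsLoss` on the raw scores. The data below is random and purely illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits_b = torch.randn(8)                      # one raw score per sample
targets_b = torch.randint(0, 2, (8,)).float()  # the BCE losses expect float targets

with_logits = nn.BCEWithLogitsLoss()(logits_b, targets_b)
with_probs = nn.BCELoss()(torch.sigmoid(logits_b), targets_b)

print(with_logits.item(), with_probs.item())
```

`nn.BCEWithLogitsLoss` is generally the more numerically stable choice, since it fuses the sigmoid into the loss computation.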


In [None]:
# Core PyTorch has no built-in accuracy function, so we write our own. Accuracy is computed per batch.
def batch_accuracy(output, y, N):
    pred = torch.argmax(output,dim=1)
    correct = (pred==y).sum().item()    # `.item()` extracts the value from the resulting tensor as a Python number
    return correct / N

This function calculates the accuracy of the model's predictions on one batch of data. Step by step:

**def batch_accuracy(output, y, N):** defines the function and its inputs:

**output:** The model's output for the batch. Here it holds one raw score (logit) per class for each sample.

**y:** The true labels (ground truth) for the same batch.

**N:** The number of samples in the batch.

**pred = torch.argmax(output, dim=1):** computes the predicted class label for each sample. `torch.argmax` returns the index of the maximum value along the given dimension (`dim=1` is the class dimension), which selects the class with the highest score.

**correct = (pred == y).sum().item():** counts the correct predictions in the batch.
- `(pred == y)` creates a Boolean tensor that is `True` where the prediction matches the label.
- `.sum()` counts the `True` values.
- `.item()` converts the resulting one-element tensor into a standard Python number.

**return correct / N:** divides the number of correct predictions by the batch size, giving the batch accuracy as a fraction (multiply by 100 for a percentage).

In summary, this function takes the model's output, the true labels, and the batch size as input. It then determines the predicted labels, counts the correct predictions, and finally calculates and returns the accuracy for the batch. This accuracy value is essential for evaluating the performance of your model during training and testing.
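
A small worked example may help. The function is repeated so the snippet runs on its own; the scores and labels are made up for illustration.

```python
import torch


def batch_accuracy(output, y, N):
    pred = torch.argmax(output, dim=1)
    correct = (pred == y).sum().item()
    return correct / N


output = torch.tensor([
    [2.0, 0.1, 0.3],   # highest score at index 0
    [0.2, 1.5, 0.1],   # highest score at index 1
    [0.3, 0.2, 0.9],   # highest score at index 2
    [1.1, 0.4, 0.2],   # highest score at index 0
])
y = torch.tensor([0, 1, 2, 1])   # the last sample is misclassified

acc = batch_accuracy(output, y, y.shape[0])
print(acc)  # 3 of 4 correct -> 0.75
```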

In [None]:
epochs=10
def training():
  for i in range(epochs):
    loss=0
    accuracy=0
    model.train()
    for indx,(image,label) in enumerate(train_loader):
      image=image.to(device)  # send data to GPU as we did for model
      label=label.to(device)
      output=model(image)
      # forward propagation ends here

      batch_loss=loss_fn(output,label)
      # backward propagation starts here

      optimizer.zero_grad()  # reset the gradients of all parameters so that gradients from the previous batch do not carry over

      batch_loss.backward()  # backpropagation: compute the gradient of the loss with respect to every trainable parameter (requires_grad=True)

      optimizer.step()  # update the parameters (weights and biases) using the gradients computed by batch_loss.backward()

      loss+=batch_loss.item()  # accumulate the loss across batches
      accuracy+=batch_accuracy(output,label,label.shape[0])  # label is a 1-D tensor, so label.shape[0] is the number of samples in the batch (the N argument)
    print(f'Training Epoch: {i+1}, Accuracy: {accuracy/len(train_loader)}, Loss: {loss/len(train_loader)}')

    #len(train_loader)=number of batches

In PyTorch, gradients are accumulated by default during backpropagation. If they are not reset to zero before processing a new batch, the gradients from the previous batch are added to the gradients of the current batch, which leads to incorrect weight updates and hinders learning.

Calling **optimizer.zero_grad()** ensures that the gradients reflect only the current batch of data, allowing for proper weight updates.
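
The accumulation behavior is easy to see on a single scalar. This is a minimal sketch with a hypothetical tensor `w`, not part of the notebook's model.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # d(3w)/dw = 3

(3 * w).backward()           # no reset in between
accumulated = w.grad.item()  # 3 + 3 = 6

w.grad.zero_()               # manual reset; optimizer.zero_grad() plays this role in the training loop
(3 * w).backward()
after_reset = w.grad.item()  # back to 3

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```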

**In summary**: The loop goes through each batch of data in your training set. For each batch, it gives you:

- `indx`: The batch number (starting from 0).
- `image`: A tensor containing the images in the batch.
- `label`: A tensor containing the correct labels for the images in the batch.


`model.train()` sets the model to training mode. This matters because some layers, such as dropout and batch normalization, behave differently during training and evaluation. In training mode, dropout randomly zeroes activations and batch normalization uses the statistics of the current batch; in evaluation mode (set with `model.eval()`), dropout is disabled and batch normalization uses its stored running statistics. Forgetting to switch back to training mode after an evaluation pass can distort results and hinder learning.
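
The network in this notebook contains no dropout or batch normalization layers, so the sketch below uses a standalone `nn.Dropout` (an assumption for illustration) to show the difference between the two modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # roughly half the entries are zeroed, the rest are scaled by 1/(1-p) = 2

drop.eval()
eval_out = drop(x)    # dropout is a no-op in evaluation mode

print((train_out == 0).float().mean().item())  # fraction of zeroed entries, close to 0.5
print(torch.equal(eval_out, x))
```

Calling `train()` or `eval()` on a model sets the mode of every submodule, so one call at the top of the training or evaluation function is enough.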


`training()`:  The main training loop.
    *   Iterates over epochs (passes over the entire training data).
    *   Iterates over batches of data from the training `DataLoader`.
    *   For each batch:
        1.  Moves data to the device.
        2.  Passes data through the model (`output = model(image)`).
        3.  Calculates the loss using the loss function.
        4.  Resets the gradients from the previous batch (`optimizer.zero_grad()`).
        5.  Performs backpropagation (`batch_loss.backward()`), computing the gradients.
        6.  Updates the model's weights using the optimizer (`optimizer.step()`).
        7.  Calculates the accuracy of the batch.
    *   Prints the average training loss and accuracy for each epoch.
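
The steps above can be sketched end to end on a tiny synthetic problem that runs on the CPU in well under a second. The data, the small network, and names such as `net`, `opt`, and `criterion` are assumptions for illustration; this is not the MNIST setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Synthetic 3-class problem (an assumption for illustration, not MNIST).
X = torch.randn(256, 20)
y = (X @ torch.randn(20, 3)).argmax(dim=1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

net = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(5):
    net.train()
    running = 0.0
    for xb, yb in loader:
        out = net(xb)               # forward pass
        loss = criterion(out, yb)   # compute the loss
        opt.zero_grad()             # clear old gradients
        loss.backward()             # compute new gradients
        opt.step()                  # update the weights
        running += loss.item()
    epoch_losses.append(running / len(loader))

print(epoch_losses)  # the average loss should fall from epoch to epoch
```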

In [None]:
def evaluate():
  accuracy=0
  loss=0
  # Set the model to evaluation mode. This matters for layers like dropout and batch normalization, which behave differently during training and evaluation.
  model.eval()
  # Disable gradient tracking. Gradients are not needed for inference, so this saves memory and computation.
  with torch.no_grad(): # no backpropagation is done here
    for (image,label) in test_loader:
      image=image.to(device)
      label=label.to(device)
      output=model(image)  # Performs a forward pass through the model to obtain the predicted output

      batch_loss=loss_fn(output,label)
      accuracy+=batch_accuracy(output,label,label.shape[0]) #label is a one-dimensional array
      loss+=batch_loss.item()
    print(f'Test Accuracy: {accuracy*100/len(test_loader)}, Test Loss: {loss/len(test_loader)}')
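
One detail worth noting: averaging per-batch accuracies gives every batch the same weight, so a smaller final batch (the 10,000 MNIST test images do not divide evenly by 32) counts as much as a full one. A possible sample-weighted alternative is sketched below; the helper `dataset_accuracy` and the toy data are hypothetical and not part of the notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader


def dataset_accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified samples over the whole loader."""
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            pred = model(xb).argmax(dim=1)
            correct += (pred == yb).sum().item()
            total += yb.shape[0]
    return correct / total


# Toy check: 48 samples in batches of 32 gives one full batch and one of 16.
torch.manual_seed(0)
X = torch.randn(48, 5)
y = torch.randint(0, 3, (48,))
toy_model = nn.Linear(5, 3)
toy_loader = DataLoader(TensorDataset(X, y), batch_size=32)

acc = dataset_accuracy(toy_model, toy_loader)
with torch.no_grad():
    expected = (toy_model(X).argmax(dim=1) == y).float().mean().item()

print(acc, expected)
```

Counting correct predictions and samples separately makes the result independent of the batch size.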


In [None]:
training()

W0404 09:30:21.709000 233 torch/_inductor/utils.py:1137] [0/0] Not enough SMs to use max_autotune_gemm mode


Training Epoch: 1, Accuracy: 0.9007166666666667, Loss: 0.33793503912091255




Training Epoch: 2, Accuracy: 0.9629, Loss: 0.12633406118626395
Training Epoch: 3, Accuracy: 0.9751, Loss: 0.08113887280840426
Training Epoch: 4, Accuracy: 0.9825666666666667, Loss: 0.05728902567047626
Training Epoch: 5, Accuracy: 0.9869, Loss: 0.04198549642139114
Training Epoch: 6, Accuracy: 0.9900333333333333, Loss: 0.031191517790624252
Training Epoch: 7, Accuracy: 0.9921833333333333, Loss: 0.0238970316521707
Training Epoch: 8, Accuracy: 0.99325, Loss: 0.020613962910869546
Training Epoch: 9, Accuracy: 0.9957333333333334, Loss: 0.014239662292811166
Training Epoch: 10, Accuracy: 0.9950833333333333, Loss: 0.014465812071080048


In [None]:
evaluate()

Test Accuracy: 98.01317891373802, Test Loss: 0.07694631563104506


#Method 2

In [None]:

class first_nn(nn.Module):
  def __init__(self,input_size,hidden_size,num_classes):
    super(first_nn,self).__init__()
    self.flat=nn.Flatten()
    self.l1=nn.Linear(input_size,hidden_size)
    self.relu=nn.ReLU()
    self.l2=nn.Linear(hidden_size,hidden_size)
    self.l3=nn.Linear(hidden_size,num_classes)
  def forward(self,x):
    output=self.flat(x)
    output=self.l1(output)
    output=self.relu(output)
    output=self.l2(output)
    output=self.relu(output)
    output=self.l3(output)
    return output

In [None]:
model1=first_nn(784,512,10)
model1=torch.compile(model1).to(device)

In [None]:
optimizer=torch.optim.Adam(model1.parameters())
loss_fn=nn.CrossEntropyLoss()

In [None]:
epochs=10
def training1():
  for i in range(epochs):
    loss=0
    accuracy=0
    model1.train()
    for (image,label) in train_loader:
      image=image.to(device)
      label=label.to(device)

      output=model1(image)

      batch_loss=loss_fn(output,label)
      optimizer.zero_grad()
      batch_loss.backward()
      optimizer.step()
      loss+=batch_loss.item()
      accuracy+=batch_accuracy(output,label,label.shape[0])
    print(f'Training Epoch: {i+1}, Accuracy: {accuracy/len(train_loader)}, Loss: {loss/len(train_loader)}')

In [None]:
def evaluate1():
  accuracy=0
  loss=0
  model1.eval()
  with torch.no_grad():
    for (image,label) in test_loader:
      image=image.to(device)
      label=label.to(device)

      output=model1(image)

      batch_loss=loss_fn(output,label)
      accuracy+=batch_accuracy(output,label,label.shape[0])
      loss+=batch_loss.item()
    print(f'Test Accuracy: {accuracy*100/len(test_loader)}, Test Loss: {loss/len(test_loader)}')


In [None]:
evaluate1()

Test Accuracy: 99.6235341151386, Test Loss: 0.011436372119413927
