#### DEEP LEARNING WITH CUDA LAB REPORT NO. 3


This report focuses on the PyTorch library.
We'll cover topics such as:
  * Tensors
  * Building, training, and evaluating models
  


In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

### Tensors
At its most basic, a tensor is a mathematical object that generalizes scalars, **vectors**, and matrices to higher dimensions.

In the context of PyTorch and machine learning, tensors are essentially multi-dimensional arrays or matrices that are used for storing data.
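
To make the idea of "generalizing to higher dimensions" concrete, here is a minimal sketch that builds tensors with zero to three dimensions and prints their rank and shape. The variable names are chosen only for this illustration.

```python
import torch

scalar = torch.tensor(3.0)              # 0-D tensor: a single number
vector = torch.tensor([1.0, 2.0, 3.0])  # 1-D tensor: a vector
matrix = torch.ones(2, 3)               # 2-D tensor: a matrix
batch = torch.zeros(4, 2, 3)            # 3-D tensor: e.g. a stack of four 2x3 matrices

# ndim is the number of dimensions; shape lists the size along each of them
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("batch", batch)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```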

Let's see what tensors are and what we can do with them.



### Creating tensors

Tensors can be created in various ways, including from existing data (like Python lists or NumPy arrays) or by using built-in PyTorch functions that generate tensors of specific types.
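
One detail worth knowing when creating tensors from existing data is whether the data is copied. The sketch below, using illustrative variable names, suggests the difference: `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` makes an independent copy.

```python
import numpy as np
import torch

source = np.array([1, 2, 3])
shared = torch.from_numpy(source)  # shares memory with the NumPy array
copied = torch.tensor(source)      # copies the data into a new tensor

source[0] = 100                    # modify the NumPy array in place

print(shared)  # the change is visible through the shared tensor
print(copied)  # the copy still holds the original values
```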



In [None]:
a = torch.tensor([1, 2, 3, 4])  # Creates a 1-D tensor with 4 elements (shape [4])
b = torch.ones(2, 3)  # Creates a 2x3 tensor filled with ones
c = torch.rand(2, 5)  # Creates a 2x5 tensor filled with random values drawn uniformly from [0, 1)

# We can also create tensors from existing data:

d = [[1, 2], [3, 4]]
tensor_d = torch.tensor(d)  # Using the nested list d, we can create a 2x2 tensor

e = np.array([[1, 2], [1, 2], [3, 4], [3, 4]])
tensor_e = torch.from_numpy(e)  # Using a NumPy array (creates a 4x2 tensor)

# We can also specify the data type
f = torch.ones(2, 5, dtype=torch.float)
g = torch.arange(4, dtype=torch.int)


## Tensor attributes


In [None]:
# Shape - returns the size of the tensor along each dimension
print("Shape of the tensor:", a.shape)
print("Shape of the tensor:", c.shape)

# Dtype - returns the data type of the tensor's elements
print("Data type of the tensor:", a.dtype)

# Device - every tensor is stored on a specific device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a_gpu = a.to(device)  # Returns a tensor on `device` (the GPU, if available); `a` itself stays on the CPU
print("Tensor stored a  on:", a.device)  # Output: cpu (`.to` does not move `a` in place)
print("Tensor stored a_gpu on:", a_gpu.device)  # Output: cuda:0 if a GPU is available, otherwise cpu

# Requires_grad - indicates whether the tensor is part of a computational graph and requires gradients
print("Does the tensor require gradients?:", a.requires_grad)  # Output: False (default)

Shape of the tensor: torch.Size([4])
Shape of the tensor: torch.Size([2, 5])
Data type of the tensor: torch.int64
Tensor stored a  on: cpu
Tensor stored a_gpu on: cuda:0
Does the tensor require gradients?: False
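
The `requires_grad` attribute becomes relevant once a tensor takes part in a computation whose gradient we want. As a small sketch (the tensor `x` is introduced only for this example), note that only floating-point (or complex) tensors can require gradients, so an integer tensor like `a` above could not be used here.

```python
import torch

# A floating-point tensor that tracks operations for automatic differentiation
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()  # y = x1^2 + x2^2 + x3^2
y.backward()        # computes dy/dx and stores it in x.grad

print(x.grad)       # dy/dx = 2 * x -> tensor([2., 4., 6.])
```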


## Tensor operations

In [None]:

tensor_a = torch.tensor([1, 2, 3])
tensor_b = torch.tensor([4, 5, 6])

# Element-wise addition
sum_tensors = tensor_a + tensor_b
print("Element-wise addition:", sum_tensors)  # Output: tensor([5, 7, 9])

# Element-wise multiplication
product_tensors = tensor_a * tensor_b
print("Element-wise multiplication:", product_tensors)  # Output: tensor([4, 10, 18])

# Scalar multiplication
scalar_product = tensor_a * 2
print("Scalar multiplication:", scalar_product)  # Output: tensor([2, 4, 6])

# Slicing a tensor
sliced_tensor = tensor_b[1:]
print("Sliced tensor:", sliced_tensor)  # Output: tensor([5, 6])

# Computing the mean (requires floating-point dtype)
float_tensor = tensor_a.to(dtype=torch.float)
mean_value = float_tensor.mean()
print("Mean value:", mean_value)  # Output: tensor(2.)

# Transposing a matrix
matrix = torch.tensor([[1, 2], [3, 4]])
transposed_matrix = matrix.t()
print("Transposed matrix:", transposed_matrix)  # Output: tensor([[1, 3], [2, 4]])





Element-wise addition: tensor([5, 7, 9])
Element-wise multiplication: tensor([ 4, 10, 18])
Scalar multiplication: tensor([2, 4, 6])
Sliced tensor: tensor([5, 6])
Mean value: tensor(2.)
Transposed matrix: tensor([[1, 3],
        [2, 4]])


In PyTorch, both the `view` and `reshape` methods change the shape of a tensor. In both cases the requested shape must contain the same number of elements as the original one. `view` never copies data: it returns a new tensor that shares the original tensor's data, so the tensor's layout in memory must also be compatible with the requested shape (a contiguous tensor always is). If the layout doesn't support the view, `view` raises an error.

`reshape`, on the other hand, can handle these situations: it returns a view when possible and otherwise returns a tensor with a copy of the data.
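
The difference only shows up for tensors whose memory layout is not contiguous. The following sketch, with illustrative variable names, uses a transposed matrix as one such case: `view` fails on it, `reshape` succeeds by copying, and a `view` of a contiguous tensor shares data with the original.

```python
import torch

base = torch.arange(6).reshape(2, 3)  # contiguous 2x3 tensor
transposed = base.t()                 # 3x2 tensor with a non-contiguous layout

print(transposed.is_contiguous())     # False

# view cannot reinterpret the non-contiguous layout as a flat tensor
view_failed = False
try:
    transposed.view(6)
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

# reshape falls back to copying the data
flat = transposed.reshape(6)
print(flat)                           # tensor([0, 3, 1, 4, 2, 5])

# A view of a contiguous tensor shares data with the original
shared = base.view(6)
shared[0] = 99
print(base[0, 0])                     # tensor(99)
```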

In [None]:
a_tensor = torch.arange(12)
view_tensor = a_tensor.view(3, 4)  # Changing the shape using view
reshape_tensor = a_tensor.reshape(4, 3)  # Changing the shape using reshape

print(a_tensor)
print("\n")
print(view_tensor)
print("\n")
print(reshape_tensor)

# view_tensor = a_tensor.view(4, 4)  # Raises an error: 4 * 4 = 16 elements, but the tensor has only 12
# reshape_tensor = a_tensor.reshape(4, 4)  # Also raises an error: reshape cannot change the number of elements either


tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])


tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])


tensor([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])


### Building Machine Learning models in PyTorch


In PyTorch, creating a neural network model involves defining a class that inherits from `nn.Module`, the base class for all neural network modules.

This custom class typically includes an `__init__` method, where the layers and other components of the model are instantiated.

The `forward` method is then implemented to describe the forward propagation of data through the network. Its definition is crucial because it dictates how data flows through the model. Backward propagation, by contrast, does not have to be written by hand: PyTorch records the operations performed in `forward` in a dynamic computation graph and uses it to compute the gradients automatically during training.
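
As a minimal sketch of this division of labor, the hypothetical `TinyNet` below (not part of the lab model) defines only `__init__` and `forward`. Calling `backward()` on a scalar computed from its output fills in the gradients of its parameters without any hand-written backward pass.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical model used only for this illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)  # layers are created in __init__

    def forward(self, x):
        # Only the forward data flow is described here
        return torch.relu(self.linear(x))

torch.manual_seed(0)
net = TinyNet()
inputs = torch.randn(3, 4)  # a batch of 3 samples with 4 features each

grad_before = net.linear.weight.grad
print(grad_before)          # None: no backward pass has run yet

loss = net(inputs).sum()    # any scalar computed from the output
loss.backward()             # autograd walks the recorded graph backwards

print(net.linear.weight.grad.shape)  # torch.Size([2, 4])
```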




In [None]:
class MLPShallow(nn.Module):      # inheriting from nn.Module
  def __init__(self):
    super().__init__()

    # Inside the __init__ method, we define the layers of the model as attributes of the module.
    # nn.Linear is the classical dense (fully connected) layer.
    # Each layer type (nn.Linear, nn.Conv2d, etc.) comes with its own parameters, like the number of input and output features.
    self.input_layer = nn.Linear(784, 256)
    self.hidden_layer1 = nn.Linear(256, 512)
    self.hidden_layer2 = nn.Linear(512, 512)
    self.hidden_layer3 = nn.Linear(512, 256)
    self.hidden_layer4 = nn.Linear(256, 256)
    self.hidden_layer5 = nn.Linear(256, 128)
    self.hidden_layer6 = nn.Linear(128, 64)
    self.output_layer = nn.Linear(64, 10)

  # Besides nn.Linear, PyTorch provides many more layer types to choose from, for example:
  # normalization layers, vision layers, padding layers, and convolution layers.
  # The forward method implements the computational graph describing the data flow; here x is our data.
  # After the first layer, x holds activations (processed data), but we keep the same symbol
  # because it represents the data flowing through the network.
  # Note that we define only the forward propagation of the signal; the backpropagation
  # is performed automatically by the framework.

  def forward(self, x):

    # Although not stored as attributes, activation functions (F.relu, F.softmax, etc.)
    # are applied to the outputs of layers within the forward method.

    x = F.leaky_relu(self.input_layer(x))     # F is torch.nn.functional from the imports
    x = F.leaky_relu(self.hidden_layer1(x))
    x = F.leaky_relu(self.hidden_layer2(x))
    x = F.leaky_relu(self.hidden_layer3(x))
    x = F.leaky_relu(self.hidden_layer4(x))
    x = F.leaky_relu(self.hidden_layer5(x))
    x = F.leaky_relu(self.hidden_layer6(x))


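    # The output layer returns raw scores (logits) without softmax,
    # because nn.CrossEntropyLoss applies log-softmax internally.
    # Applying softmax here as well, for example
    #   x = F.softmax(self.output_layer(x), dim=1)
    # would normalize the scores twice and is not equivalent when training with CrossEntropyLoss.
    x = self.output_layer(x)

    return x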


model = MLPShallow()

# Create a loss function and an optimizer.
# CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits.
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
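
To illustrate what "applies softmax internally" means for `nn.CrossEntropyLoss`, the sketch below compares it with an explicit log-softmax followed by the negative log likelihood loss. The logits and labels are random placeholders, not data from this report. It also prints the loss obtained when softmax is applied before the loss function, which is generally a different value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)            # placeholder raw scores: 5 samples, 10 classes
labels = torch.tensor([0, 3, 9, 1, 7]) # placeholder class indices

# CrossEntropyLoss on raw logits ...
ce = nn.CrossEntropyLoss()(logits, labels)

# ... matches log-softmax followed by the negative log likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))      # True

# Applying softmax first means the scores get normalized twice
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), labels)
print(ce.item(), double.item())
```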


### Loading data



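We'll use the MNIST image classification dataset of handwritten digits, loaded below through torchvision.datasets.MNIST. It contains a training set of 60,000 examples and a test set of 10,000 examples. (Fashion-MNIST, which has the same image size and split sizes, is available as torchvision.datasets.FashionMNIST.)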


In [None]:
# Describe how to preprocess the data by chaining transformations:
# ToTensor converts images to float tensors in [0, 1]; Normalize with mean 0.5 and std 0.5 rescales them to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# Download the data and apply the transformations
training_data = torchvision.datasets.MNIST(root="../data", train=True, transform=transform, download=True)
test_data = torchvision.datasets.MNIST(root="../data", train=False, transform=transform, download=True)


# Load the data, specifying:
#   batch_size  - the number of images passed through the network in one forward/backward pass
#   shuffle     - whether to reshuffle the data at every epoch, so the model does not see the
#                 samples in the same order each time (not needed for the test set)
#   num_workers - how many subprocesses to use for data loading
training_data_loader = DataLoader(training_data, batch_size=64, shuffle=True, num_workers=2)
test_data_loader = DataLoader(test_data, batch_size=64, shuffle=False, num_workers=2)
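
To see what a data loader yields without downloading anything, here is a small sketch on synthetic data. The random "images" and labels are placeholders shaped like single-channel 28x28 pictures, and `num_workers` is left at its default of 0. It also shows the flattening step that an MLP needs, turning each 1x28x28 image into a vector of 784 values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 10 single-channel 28x28 "images" with random class labels
images = torch.rand(10, 1, 28, 28)
labels = torch.randint(0, 10, (10,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)

flat_shapes = []
for batch_images, batch_labels in loader:
    # Flatten every image in the batch: (N, 1, 28, 28) -> (N, 784)
    flat = batch_images.view(batch_images.shape[0], -1)
    flat_shapes.append(tuple(flat.shape))
    print(tuple(batch_images.shape), "->", tuple(flat.shape))

print(len(loader))  # 3 batches: 4 + 4 + 2 samples
```

The last batch is smaller because 10 is not a multiple of the batch size.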

### Training

Compared to TensorFlow, training in PyTorch is handled differently: we are responsible for writing and optimizing the training loop ourselves.


In [None]:

# Training loop
num_epochs = 30
model.train()  # Make sure the model is in training mode
for epoch in range(num_epochs):
  # Initialize variables to track training loss and accuracy
  training_loss = 0.0
  correct = 0
  total = 0
  for i, data in enumerate(training_data_loader, 0):
    # Unpack the batch of data; 'inputs' are the features, 'labels' are the target output
    inputs, labels = data

    # Flatten the input images into vectors for MLP processing
    inputs = inputs.view(inputs.shape[0], -1)

    # Clear previously calculated gradients before the forward pass
    optimizer.zero_grad()

    # Perform the forward pass through the model
    outputs = model(inputs)

    # The difference between the model's predictions (outputs) and the actual targets (labels)
    # is computed using a loss function. This loss guides the model's learning.
    loss = loss_function(outputs, labels)

    # Calling loss.backward() computes the gradient of the loss with respect to each parameter.
    # These gradients are used to adjust the parameters in the direction that reduces the loss.
    loss.backward()

    # The optimizer updates the model's parameters based on the gradients computed during backpropagation.
    # Different optimization algorithms (SGD, Adam, etc.) adjust the parameters in various ways.
    optimizer.step()

    # Accumulate the loss for reporting. '.item()' extracts the loss's scalar value.
    training_loss += loss.item()

    # Calculate the number of correctly predicted labels
    _, predicted = outputs.max(1)
    total += labels.size(0)
    correct += predicted.eq(labels).sum().item()

    # Update the running average loss and accuracy over the batches seen so far in this epoch
    avg_loss = training_loss / (i + 1)
    avg_acc = 100. * correct / total

  # Print the average loss and accuracy for the epoch
  print(f'Training Loss: {avg_loss:.3f} | Training acc: {avg_acc:.3f} for epoch:  {epoch}')

Training Loss: 2.300 | Training acc: 11.222 for epoch:  0
Training Loss: 1.658 | Training acc: 38.757 for epoch:  1
Training Loss: 0.477 | Training acc: 86.770 for epoch:  2
Training Loss: 0.260 | Training acc: 93.028 for epoch:  3
Training Loss: 0.192 | Training acc: 94.840 for epoch:  4
Training Loss: 0.152 | Training acc: 95.803 for epoch:  5
Training Loss: 0.123 | Training acc: 96.592 for epoch:  6
Training Loss: 0.103 | Training acc: 97.070 for epoch:  7
Training Loss: 0.089 | Training acc: 97.422 for epoch:  8
Training Loss: 0.078 | Training acc: 97.698 for epoch:  9
Training Loss: 0.063 | Training acc: 98.188 for epoch:  10
Training Loss: 0.060 | Training acc: 98.240 for epoch:  11
Training Loss: 0.054 | Training acc: 98.427 for epoch:  12
Training Loss: 0.045 | Training acc: 98.665 for epoch:  13
Training Loss: 0.043 | Training acc: 98.700 for epoch:  14
Training Loss: 0.037 | Training acc: 98.917 for epoch:  15
Training Loss: 0.034 | Training acc: 98.958 for epoch:  16
Trainin

### Evaluating performance

Evaluating the model once, after the training loop has finished, means the test metrics are computed a single time for the final model rather than after every epoch, which saves computation. Note that the final model is not guaranteed to be the best one seen during training; this evaluation simply measures the model's state after the last epoch on data it has not been trained on.
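
The evaluation code in this section relies on two switches, `model.eval()` and `torch.no_grad()`. The sketch below hints at what each one does, using a hypothetical stand-in model with a dropout layer (the lab model has none, so this is an assumption made purely for illustration).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; Dropout behaves differently in training and evaluation mode
net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.randn(3, 4)

net.eval()                  # switch layers such as Dropout to evaluation behavior
with torch.no_grad():       # do not record operations for gradient computation
    out_a = net(x)
    out_b = net(x)

print(net.training)               # False
print(out_a.requires_grad)        # False: no graph was built
print(torch.equal(out_a, out_b))  # True: dropout is disabled in evaluation mode
```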

In [None]:
model.eval()

# Initialize variables to track test loss and accuracy
test_loss = 0
correct = 0

# No gradient calculations needed for inference, which saves memory and computations
with torch.no_grad():

    # Iterate over all test data batches
    for i, (image, label) in enumerate(test_data_loader):

        # Reshape images to match the input expected by the network (flatten the images)
        image = image.view(image.shape[0], -1)

        # Forward pass: compute predicted outputs by passing images to the model
        output = model(image)

        # Calculate the batch's loss with cross-entropy, which applies log-softmax to the raw
        # logits returned by the model. 'reduction=sum' sums the losses across all examples in the batch.
        test_loss += F.cross_entropy(output, label, reduction='sum').item()

        # Get the index of the largest logit (the predicted class label)
        pred = output.argmax(dim=1, keepdim=True)

        # Count how many predictions match the true labels
        correct += pred.eq(label.view_as(pred)).sum().item()

    # Calculate average loss over all test data
    test_loss /= len(test_data_loader.dataset)

    # Print test set results including average loss and accuracy
    print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(test_loss, correct, len(test_data_loader.dataset), 100. * correct / len(test_data_loader.dataset)))


Test set: Avg. loss: -24.8135, Accuracy: 9804/10000 (98%)

Note: the average loss in this recorded output is negative because it was produced by applying F.nll_loss directly to raw logits, whereas F.nll_loss expects log-probabilities. With F.cross_entropy, as in the cell above, the loss is non-negative. The accuracy does not depend on the loss function.



### Summary

We've found that the PyTorch library offers a robust and flexible approach, granting developers and researchers extensive control over their models. This level of freedom and the dynamic computation graph PyTorch employs significantly benefit complex model development and research, where intricate customizations and iterative adjustments are common. While the report primarily focuses on PyTorch, it's worth noting that frameworks like TensorFlow can be advantageous for rapid prototyping and simpler networks, thanks to their streamlined workflows and comprehensive toolsets. Ultimately, PyTorch stands out in scenarios that demand deep model customization and hands-on control.