<table align="center">
  <td align="center"><a target="_blank" href="http://introtodeeplearning.com">
        <img src="https://i.ibb.co/Jr88sn2/mit.png" style="padding-bottom:5px;" />
      Visit MIT Deep Learning</a></td>
  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/aamini/introtodeeplearning/blob/master/lab1/solutions/Part1_TensorFlow_Solution.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/aamini/introtodeeplearning/blob/master/lab1/solutions/Part1_TensorFlow_Solution.ipynb">
        <img src="https://i.ibb.co/xfJbPmL/github.png"  height="70px" style="padding-bottom:5px;"  />View Source on GitHub</a></td>
</table>

# Copyright Information


In [None]:
# Copyright 2024 MIT Introduction to Deep Learning. All Rights Reserved.
#
# Licensed under the MIT License. You may not use this file except in compliance
# with the License. Use and/or modification of this code outside of MIT Introduction
# to Deep Learning must reference:
#
# © MIT Introduction to Deep Learning
# http://introtodeeplearning.com
#

# Lab 1: Intro to PyTorch and Music Generation with RNNs

In this lab, you'll get exposure to using PyTorch and learn how it can be used for solving deep learning tasks. Go through the code and run each cell. Along the way, you'll encounter several ***TODO*** blocks -- follow the instructions to fill them out before running those cells and continuing.


# Part 1: Intro to PyTorch

## 0.1 Install PyTorch

PyTorch is a popular deep learning library known for its flexibility and dynamic computation graph. Here we'll learn how computations are represented and how to define a simple neural network in PyTorch. Let's import PyTorch and install a couple of dependencies.

In [None]:
import torch
import torch.nn as nn

# Download and import the MIT Introduction to Deep Learning package
!pip install mitdeeplearning --quiet
import mitdeeplearning as mdl

import numpy as np
import matplotlib.pyplot as plt
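
Before going further, it can help to confirm which PyTorch build is active and whether a GPU is visible. The snippet below is a minimal sketch; the `device` variable is an illustrative name and is not used elsewhere in this lab.

```python
import torch

# Report the installed PyTorch version
print(torch.__version__)

# Pick a GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```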

## 1.1 Why is PyTorch called PyTorch?

PyTorch is called PyTorch because it provides a Pythonic interface for creating and manipulating tensors, which are multi-dimensional arrays. PyTorch also provides the ability to perform computation on these tensors, define neural networks, and train them efficiently.

Let’s start by creating some tensors and inspecting their properties:


In [None]:
integer = torch.tensor(1234)
decimal = torch.tensor(3.14159265359)

print(f"`integer` is a {integer.ndim}-d Tensor: {integer}")
print(f"`decimal` is a {decimal.ndim}-d Tensor: {decimal}")
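
Beyond `ndim`, two other properties worth inspecting are `shape` and `dtype`. The sketch below assumes PyTorch's default settings, under which Python integers become `torch.int64` and Python floats become `torch.float32`.

```python
import torch

integer = torch.tensor(1234)
decimal = torch.tensor(3.14159265359)

# A 0-d Tensor has an empty shape
print(integer.shape, decimal.shape)

# With default settings, ints map to int64 and floats to float32
print(integer.dtype, decimal.dtype)

# .item() converts a single-element Tensor to a plain Python number
value = integer.item()
print(type(value), value)
```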


Vectors and lists can be used to create 1-d Tensors:

In [None]:
fibonacci = torch.tensor([1, 1, 2, 3, 5, 8])
count_to_100 = torch.tensor(range(100))

print(f"`fibonacci` is a {fibonacci.ndim}-d Tensor with shape: {fibonacci.shape}")
print(f"`count_to_100` is a {count_to_100.ndim}-d Tensor with shape: {count_to_100.shape}")
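
As an aside, `torch.arange` builds the same counting Tensor directly, without going through a Python `range`. A small sketch comparing the two (the variable names here are illustrative):

```python
import torch

count_from_range = torch.tensor(range(100))
count_from_arange = torch.arange(100)

# Both are 1-d integer Tensors holding 0, 1, ..., 99
print(count_from_arange.shape)
print(torch.equal(count_from_range, count_from_arange))
```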


Next, let’s create 2-d (i.e., matrices) and higher-rank Tensors. In image processing and computer vision, we will use 4-d Tensors with dimensions corresponding to batch size, image height, image width, and number of color channels.


In [None]:
### Defining higher-order Tensors ###

'''TODO: Define a 2-d Tensor'''
matrix = torch.tensor([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]) # TODO
# matrix = # TODO

assert isinstance(matrix, torch.Tensor), "matrix must be a torch Tensor object"
assert matrix.ndim == 2

'''TODO: Define a 4-d Tensor.'''
# Use torch.zeros to initialize a 4-d Tensor of zeros with size 10 x 256 x 256 x 3.
#   You can think of this as 10 images where each image is RGB 256 x 256.
images = torch.zeros(10, 256, 256, 3) # TODO
# images = # TODO

assert isinstance(images, torch.Tensor), "images must be a torch Tensor object"
assert images.ndim == 4, "images must be of rank 4"
assert images.shape == (10, 256, 256, 3), "images is incorrect shape"
print(f"images is a {images.ndim}-d Tensor with shape: {images.shape}")
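
The `images` Tensor above uses a channels-last layout (batch, height, width, channels). PyTorch's built-in convolution layers such as `nn.Conv2d` expect channels-first input (batch, channels, height, width) instead. A minimal sketch of converting between the two with `permute`, using a smaller hypothetical batch to keep it light:

```python
import torch

# Hypothetical batch: 10 RGB images of 32 x 32 in channels-last layout
images_nhwc = torch.zeros(10, 32, 32, 3)

# Reorder dimensions to (batch, channels, height, width)
images_nchw = images_nhwc.permute(0, 3, 1, 2)

print(images_nhwc.shape)
print(images_nchw.shape)
```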

The **shape** of a Tensor provides the number of elements in each Tensor dimension. The **shape** is quite useful, and we'll use it often. You can also use slicing to access subtensors within a higher-rank Tensor:

In [None]:
row_vector = matrix[1]
column_vector = matrix[:, 1]
scalar = matrix[0, 1]

print(f"`row_vector`: {row_vector}")
print(f"`column_vector`: {column_vector}")
print(f"`scalar`: {scalar}")
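
To make the effect of each index explicit, the sketch below prints the shape of every slice. An integer index drops a dimension, while a range such as `1:3` keeps it; `block` is an extra illustrative slice that is not part of the lab.

```python
import torch

matrix = torch.tensor([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])

row_vector = matrix[1]        # second row, shape (4,)
column_vector = matrix[:, 1]  # second column, shape (2,)
scalar = matrix[0, 1]         # single element, shape ()
block = matrix[:, 1:3]        # range slice keeps both dimensions, shape (2, 2)

for name, t in [("row_vector", row_vector), ("column_vector", column_vector),
                ("scalar", scalar), ("block", block)]:
    print(f"{name}: shape={tuple(t.shape)}")
```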

## 1.2 Computations on Tensors

A convenient way to think about and visualize computations in PyTorch is in terms of graphs. We can define this graph in terms of Tensors, which hold data, and the mathematical operations that act on these Tensors in some order. Let's look at a simple example, and define this computation using PyTorch:

![alt text](https://raw.githubusercontent.com/aamini/introtodeeplearning/master/lab1/img/add-graph.png)

In [None]:
# Create the nodes in the graph and initialize values
a = torch.tensor(15)
b = torch.tensor(61)

# Add them!
c1 = torch.add(a, b)
c2 = a + b  # PyTorch overrides the "+" operation so that it is able to act on Tensors
print(f"c1: {c1}")
print(f"c2: {c2}")


Notice that the output is a Tensor with value 76. PyTorch built a computation graph from these operations, executed it immediately, and gave us back the result.

Now let's consider a slightly more complicated example:

![alt text](https://raw.githubusercontent.com/aamini/introtodeeplearning/master/lab1/img/computation-graph.png)

Here, we take two inputs, `a, b`, and compute an output `e`. Each node in the graph represents an operation that takes some input, does some computation, and passes its output to another node.

Let's define a simple function in PyTorch that implements this computation:

In [None]:
### Defining Tensor computations ###

# Construct a simple computation function
def func(a, b):
    '''TODO: Define the operation for c, d, e.'''
    c = torch.add(a, b)  # TODO
    # c = # TODO
    d = torch.subtract(b, 1)  # TODO
    # d = # TODO
    e = torch.multiply(c, d)  # TODO
    # e = # TODO
    return e


Now, we can call this function to execute the computation graph given some inputs `a,b`:

In [None]:
# Consider example values for a,b
a, b = torch.tensor(1.5), torch.tensor(2.5)
# Execute the computation
e_out = func(a, b)
print(f"e_out: {e_out}")

Notice how our output is a Tensor with value defined by the output of the computation, and that the output has an empty shape, `torch.Size([])`, because it is a single scalar value.
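
The same graph can be checked by hand: with `a = 1.5` and `b = 2.5`, we get `c = 4.0`, `d = 1.5`, and `e = 6.0`. The sketch below rewrites the function with overloaded Python operators to confirm this; `func_with_operators` is an illustrative name, not part of the lab.

```python
import torch

def func_with_operators(a, b):
    # Same graph as above, written with overloaded Python operators
    c = a + b
    d = b - 1
    e = c * d
    return e

a = torch.tensor(1.5)
b = torch.tensor(2.5)
e_out = func_with_operators(a, b)

# By hand: c = 1.5 + 2.5 = 4.0, d = 2.5 - 1 = 1.5, e = 4.0 * 1.5 = 6.0
print(e_out, e_out.shape)
```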

## 1.3 Neural networks in PyTorch
We can also define neural networks in PyTorch. PyTorch uses the `torch.nn` module, which provides a framework for building and training deep learning models.

Let's consider the example of a simple perceptron defined by just one linear (or fully-connected) layer: $ y = \sigma(Wx + b) $, where $W$ represents a matrix of weights, $b$ is a bias, $x$ is the input, $\sigma$ is the sigmoid activation function, and $y$ is the output.

Tensors can flow through abstract types called layers, which in PyTorch are implemented as [```nn.Module```](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects -- the building blocks of neural networks. Layers implement common neural network operations, and are used to update weights, compute losses, and define inter-layer connectivity. We will first define a layer to implement the simple perceptron defined above.

In [None]:
### Defining a network Layer ###

# num_inputs: number of input nodes
# num_outputs: number of output nodes
# x: input to the layer

class OurDenseLayer(torch.nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(OurDenseLayer, self).__init__()
        self.W = torch.nn.Parameter(torch.randn(num_inputs, num_outputs))
        self.bias = torch.nn.Parameter(torch.randn(num_outputs))

    def forward(self, x):
        '''TODO: define the operation for z.'''
        z = torch.matmul(x, self.W) + self.bias  # TODO
        # z = # TODO

        '''TODO: define the operation for y.'''
        y = torch.sigmoid(z)  # TODO
        # y = # TODO
        return y


Now, let's test the output of our layer.

In [None]:
# Define a layer and test the output!
layer = OurDenseLayer(2, 3)
x_input = torch.tensor([[1, 2.]])
y = layer(x_input)

print(f"input shape: {x_input.shape}")
print(f"output shape: {y.shape}")
print(f"output result: {y}")
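
To connect the layer back to the formula $ y = \sigma(Wx + b) $, the sketch below carries out the same computation on plain Tensors and writes the sigmoid out explicitly. `W` and `bias` are hypothetical stand-ins for the layer's parameters, and the fixed seed is only there to make the run repeatable.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-ins for the layer's W (2 x 3) and bias (3,)
W = torch.randn(2, 3)
bias = torch.randn(3)
x_input = torch.tensor([[1, 2.]])  # shape (1, 2): one example, two features

# z = xW + b, then y = sigmoid(z) = 1 / (1 + exp(-z))
z = x_input @ W + bias
y_manual = 1 / (1 + torch.exp(-z))
y_builtin = torch.sigmoid(z)

print(y_manual.shape)
print(torch.allclose(y_manual, y_builtin))
```

A (1, 2) input multiplied by a (2, 3) weight matrix gives a (1, 3) output, which matches the output shape printed by the layer above.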

Conveniently, PyTorch has defined a number of ```nn.Modules``` (or Layers) that are commonly used in neural networks, for example a [```nn.Linear```](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) module. 

Now, instead of using a single ```Module``` to define our simple neural network, we'll use the [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) module from PyTorch and a single [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer to define our network. With the `Sequential` API, you can readily create neural networks by stacking together layers like building blocks.

In [None]:
### Defining a neural network using Sequential API ###

# Define the model
'''TODO: Define a neural network with a single linear layer followed by a sigmoid non-linearity'''
# model = nn.Sequential( ''' TODO ''' )
model = nn.Sequential(
    nn.Linear(2, 3),  # Dense layer with input size 2 and output size 3
    nn.Sigmoid()      # Sigmoid activation function
)

# Test the model with example input
x_input = torch.tensor([[1, 2.]])
model_output = model(x_input)
print(f"input shape: {x_input.shape}")
print(f"output shape: {model_output.shape}")
print(f"output result: {model_output}")


That's it! We've defined our model using the Sequential API and tested it on an example input.
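
It can be useful to look inside `nn.Linear` on a small input. The sketch below relies on the documented behavior that `nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ weight.T + bias`, which is the transpose of the `W` layout used in `OurDenseLayer`. The names `linear` and `manual` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

linear = nn.Linear(2, 3)

# nn.Linear stores weight as (out_features, in_features) and bias as (out_features,)
print(linear.weight.shape, linear.bias.shape)

x_input = torch.tensor([[1, 2.]])

# The layer computes x @ weight.T + bias
manual = x_input @ linear.weight.T + linear.bias
print(torch.allclose(linear(x_input), manual, atol=1e-6))
```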

With PyTorch, you can also create more flexible models by subclassing [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html):

The `nn.Module` class captures what we refer to as a "layer" or as a "model". Using Subclassing, we can create a class for our model, and then define the forward pass through the network using the `forward` function. Subclassing affords the flexibility to define custom layers, custom training loops, custom activation functions, and custom models. Let's define the same neural network layer as above (i.e., Linear layer with an activation function after it) now using Subclassing rather than the `Sequential` model.

In [None]:
### Defining a model using subclassing ###

class LinearWithActivation(nn.Module):
    def __init__(self, num_inputs, num_outputs, activation):
        super(LinearWithActivation, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
        if activation == "sigmoid":
            self.activation = nn.Sigmoid()
        else:
            raise ValueError(f"Need to implement support for activation {activation}")

    def forward(self, inputs):
        linear_output = self.linear(inputs)
        output = self.activation(linear_output)
        return output

# Test the subclassed model
model = LinearWithActivation(2, 3, activation="sigmoid")
x_input = torch.tensor([[1, 2.]])
y = model(x_input)
print(f"input shape: {x_input.shape}")
print(f"output shape: {y.shape}")
print(f"output result: {y}")
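
One way to extend the class above to more activations is a lookup table from names to module classes. This is only a sketch under stated assumptions: `ACTIVATIONS` and `LinearWithAnyActivation` are hypothetical names, and the choice of supported activations is illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical lookup table; the keys are illustrative, not part of the lab
ACTIVATIONS = {"sigmoid": nn.Sigmoid, "relu": nn.ReLU, "tanh": nn.Tanh}

class LinearWithAnyActivation(nn.Module):
    def __init__(self, num_inputs, num_outputs, activation):
        super().__init__()
        if activation not in ACTIVATIONS:
            raise ValueError(f"Unsupported activation: {activation}")
        self.linear = nn.Linear(num_inputs, num_outputs)
        self.activation = ACTIVATIONS[activation]()

    def forward(self, inputs):
        return self.activation(self.linear(inputs))

model = LinearWithAnyActivation(2, 3, activation="relu")
y = model(torch.tensor([[1, 2.]]))
print(y.shape)
```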


Importantly, Subclassing affords us a lot of flexibility to define custom models. For example, we can use boolean arguments in the `forward` function to specify different network behaviors, such as different behaviors during training and inference. Suppose that in some instances we want our network to simply output the input, without any perturbation. We define a boolean argument `isidentity` to control this behavior:

In [None]:
### Custom behavior with subclassing ###

class LinearButSometimesIdentity(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearButSometimesIdentity, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)

    '''TODO: Implement the behavior where the network outputs the input, unchanged, under control of the isidentity argument.'''
    def forward(self, inputs, isidentity=False):
        if isidentity: 
            return inputs  # TODO
            # return # TODO
        else:
            return self.linear(inputs) # TODO
            # return # TODO


Let's test this behavior:

In [None]:
# Test the LinearButSometimesIdentity model
model = LinearButSometimesIdentity(2, 3)
x_input = torch.tensor([[1, 2.]])

out_with_linear = model(x_input)  # TODO
# out_with_linear = # TODO

out_with_identity = model(x_input, isidentity=True)  # TODO
# out_with_identity = # TODO

print(f"input: {x_input}")
print(f"output with linear: {out_with_linear}")
print(f"output with identity: {out_with_identity}")
print("is out_with_identity equal to x_input? ", torch.allclose(out_with_identity, x_input))
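
For the specific case of training versus inference, every `nn.Module` already carries a boolean `training` attribute that `.train()` and `.eval()` switch on and off, so an explicit argument is not always needed. The sketch below uses it to select the behavior; `LinearIdentityInEval` is a hypothetical class, and returning the input unchanged in eval mode is purely illustrative.

```python
import torch
import torch.nn as nn

class LinearIdentityInEval(nn.Module):  # hypothetical name
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, num_features)

    def forward(self, inputs):
        # self.training is True after .train() and False after .eval()
        if self.training:
            return self.linear(inputs)
        return inputs

model = LinearIdentityInEval(2)
x_input = torch.tensor([[1, 2.]])

model.train()
out_train = model(x_input)

model.eval()
out_eval = model(x_input)

print(out_train.shape)
print(torch.equal(out_eval, x_input))
```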


Now that we have learned how to define `Modules` as well as neural networks in PyTorch using both the `Sequential` and Subclassing APIs, we're ready to turn our attention to how to actually implement network training with backpropagation.

## 1.4 Automatic Differentiation in PyTorch

In PyTorch, `torch.autograd` is used for [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation), which is critical for training deep learning models with [backpropagation](https://en.wikipedia.org/wiki/Backpropagation). 

We will use the PyTorch [`.backward()`](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html) method to compute gradients. When a forward pass is made through the network, PyTorch dynamically builds a computational graph that records the operations applied to Tensors created with `requires_grad=True`; calling `backward()` then performs backpropagation through that graph and stores each gradient in the corresponding Tensor's `.grad` attribute.

Let's compute the gradient of $ y = x^2 $:

In [None]:
### Gradient computation ###

# y = x^2
# Example: x = 3.0
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()  # Compute the gradient

dy_dx = x.grad
print("dy_dx of y=x^2 at x=3.0 is: ", dy_dx)
assert dy_dx == 6.0
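
One detail worth seeing in isolation is that `backward()` adds to `.grad` rather than overwriting it. The sketch below calls it twice on the same function to show the accumulation, then resets the gradient with `zero_()`; the variable names are illustrative.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First backward pass: dy/dx = 2x = 6
(x ** 2).backward()
first = x.grad.clone()

# Second backward pass without zeroing: gradients add up to 12
(x ** 2).backward()
accumulated = x.grad.clone()

# Resetting the gradient gives a clean start
x.grad.zero_()
(x ** 2).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```

This is why training loops zero the gradients once per iteration.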


In training neural networks, we use differentiation and stochastic gradient descent (SGD) to optimize a loss function. Now that we have a sense of how PyTorch's autograd can be used to compute and access derivatives, we will look at an example where we use automatic differentiation and SGD to find the minimum of $ L=(x-x_f)^2 $. Here $x_f$ is the target value that we want $x$ to converge to, and $L$ represents a loss that we are trying to minimize. While we can clearly solve this problem analytically ($ x_{min}=x_f $), working through it with PyTorch's autograd sets us up nicely for future labs where we use gradient descent to optimize entire neural network losses.

In [None]:
### Function minimization with SGD ###

# Initialize x
x = torch.randn(1, requires_grad=True)
print(f"Initializing x={x.item()}")

learning_rate = 1e-2  # Learning rate
history = []
x_f = 4  # Target value

for i in range(500):
    # Compute the loss
    loss = (x - x_f) ** 2  # TODO
    # loss = # TODO

    # Backward pass 
    loss.backward()

    # Update x with gradient descent
    with torch.no_grad():
        x -= learning_rate * x.grad
    
    # Zero the gradients
    x.grad.zero_()

    history.append(x.item())

# Plot the evolution of x as we optimize toward x_f
plt.plot(history)
plt.plot([0, 500], [x_f, x_f])
plt.legend(('Predicted', 'True'))
plt.xlabel('Iteration')
plt.ylabel('x value')
plt.show()
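
The curve in the plot can be checked against a closed form. Since $ dL/dx = 2(x - x_f) $, each update gives $ x_{t+1} - x_f = (1 - 2\eta)(x_t - x_f) $, where $\eta$ is the learning rate, so the distance to $x_f$ shrinks by a factor of 0.98 per step for $\eta = 0.01$. The sketch below repeats the loop with `torch.optim.SGD` in place of the manual update and compares the result with that formula. The fixed starting value of 0.0 is an assumption made so the run is repeatable.

```python
import torch

x_f = 4.0
learning_rate = 1e-2
num_steps = 500

x = torch.tensor([0.0], requires_grad=True)  # fixed start instead of a random one
optimizer = torch.optim.SGD([x], lr=learning_rate)

for _ in range(num_steps):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = ((x - x_f) ** 2).sum()  # scalar loss
    loss.backward()                # compute dL/dx
    optimizer.step()               # x <- x - lr * dL/dx

# Closed form: x_t = x_f + (1 - 2*lr)^t * (x_0 - x_f)
predicted = x_f + (1 - 2 * learning_rate) ** num_steps * (0.0 - x_f)
print(x.item(), predicted)
```

Using an optimizer object keeps the loop the same when `x` is replaced by the parameters of a full network.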


This covers the basic concepts in PyTorch for Tensors, operations, neural networks, and automatic differentiation.
