<div style="background-image: linear-gradient(145deg, rgba(35, 47, 62, 1) 0%, rgba(0, 49, 129, 1) 40%, rgba(32, 116, 213, 1) 60%, rgba(244, 110, 197, 1) 85%, rgba(255, 173, 151, 1) 100%); padding: 1rem 2rem; width: 95%"><img style="width: 60%;" src="../../images/MLU_logo.png"></div>

# <a name="0">MLU Mathematical Fundamentals for Machine Learning</a>
# <a name="0">Lecture 4: Differential calculus</a>
## <a name="0">Lab 4.1: Torch and Automatic Differentiation</a>

 1. <a href="#1">Getting familiar with torch tensors</a> 
 2. <a href="#2">Operations with torch tensors</a> 
 3. <a href="#3">Automatic differentiation with autograd</a> 

[**PyTorch**](https://pytorch.org/) is an open-source deep learning framework used primarily for building and training neural networks. It is an optimized tensor library that runs on both CPUs and GPUs, offering flexibility and high performance for machine learning tasks. Originally developed by Meta AI, PyTorch is now governed under the Linux Foundation umbrella and is one of the most popular deep learning frameworks.

In this lab, you will get familiar with PyTorch and in particular will learn how its automatic differentiation module, `autograd`, works. With this knowledge, you'll be well equipped to implement the gradient descent algorithm. This will serve you when tackling the final project.

In [None]:
# Upgrade libraries
!pip install -q --upgrade pip
!pip install -q --upgrade scikit-learn

In [None]:
%%capture
# Import libraries
import numpy as np
import matplotlib.pyplot as plt

import torch

from IPython.display import Markdown, display

## <a name="1">Getting familiar with torch tensors</a> 
(<a href="#0">Go to top</a>)

Tensors are the fundamental data structure in PyTorch. They are similar to NumPy arrays but can be used on GPUs for accelerated computing. Tensors can represent scalars, vectors, matrices, and higher-dimensional data.

Let's create some tensors!

In [None]:
# 0D tensor (scalar)
scalar = torch.tensor(42)
display(Markdown("#### Scalar tensor:"))
print(scalar)
print(f"Size: {scalar.size()}")
print(f"Shape: {scalar.shape}\n")

# 1D tensor (vector)
vector = torch.tensor([1, 2, 3, 4, 5])
display(Markdown("#### 1D tensor (vector):"))
print(vector)
print(f"Size: {vector.size()}")
print(f"Shape: {vector.shape}\n")

# 2D tensor (matrix)
matrix = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
display(Markdown("#### 2D tensor (matrix):"))
print(matrix)
print(f"Size: {matrix.size()}")
print(f"Shape: {matrix.shape}\n")
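
Tensors are not limited to two dimensions, and every tensor also carries a data type. The following minimal sketch builds a 3D tensor and inspects its `ndim`, `shape`, and `dtype`. The variable names `tensor_3d` and `float_vector` are chosen here for illustration only.

```python
import torch

# 3D tensor: 2 blocks, each a 3x4 matrix, holding the values 0..23
tensor_3d = torch.arange(24).reshape(2, 3, 4)

print(tensor_3d.ndim)    # number of dimensions
print(tensor_3d.shape)   # size along each dimension
print(tensor_3d.dtype)   # integer data defaults to torch.int64

# Floating-point data defaults to torch.float32
float_vector = torch.tensor([1.0, 2.0, 3.0])
print(float_vector.dtype)
```

The data type matters later in this lab: gradient tracking is only available for floating-point (and complex) tensors, which is why the autograd examples use values such as `2.0` rather than `2`.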

PyTorch tensors can also be created from a NumPy array and converted back to a NumPy array.

In [None]:
# From NumPy array
np_array = np.array([1, 2, 3, 4])
x_from_np = torch.from_numpy(np_array)
display(Markdown("#### Torch - NumPy conversion:"))
print(f"Tensor from NumPy: {x_from_np}")
print(f"Tensor back to NumPy: {x_from_np.numpy()}")
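
One detail worth knowing about this conversion: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory. The sketch below contrasts it with `torch.tensor`, which makes an independent copy. The names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

np_array = np.array([1, 2, 3, 4])
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy of the data

np_array[0] = 100                     # modify the NumPy array in place

print(shared)   # reflects the change made to np_array
print(copied)   # still holds the original values
```

If you need the tensor to stay unchanged when the array is modified (or the other way around), prefer the copying constructor.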

### Random Tensor Initialization
Random initialization is crucial in machine learning, particularly when working with neural networks. PyTorch provides several ways to create tensors with random values:

In [None]:
# Create a tensor with values uniformly distributed between 0 and 1
uniform_tensor = torch.rand(1000)

# Create a tensor with values uniformly distributed between -3 and 3
uniform_scaled = torch.rand(1000) * 6 - 3  # scales [0,1] to [-3,3]

# Visualize the distributions
plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)
plt.hist(uniform_tensor.numpy(), bins=30)
plt.title('Uniform Distribution [0, 1]')
plt.xlabel('Value')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
plt.hist(uniform_scaled.numpy(), bins=30)
plt.title('Uniform Distribution [-3, 3]')
plt.xlabel('Value')
plt.ylabel('Count')

plt.tight_layout()
plt.show()
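
The scaling used above generalizes to any interval: if `u` is uniform on [0, 1), then `low + (high - low) * u` is uniform on [low, high). The sketch below wraps this in a small helper and also shows how `torch.manual_seed` makes random draws reproducible. The helper `uniform_between` is a hypothetical name introduced here, not a PyTorch function.

```python
import torch

def uniform_between(low, high, n):
    """Hypothetical helper: draw n samples uniformly from [low, high)."""
    return low + (high - low) * torch.rand(n)

# Seeding the generator makes the random draws reproducible
torch.manual_seed(0)
first = uniform_between(-3.0, 3.0, 1000)

torch.manual_seed(0)
second = uniform_between(-3.0, 3.0, 1000)

print(torch.equal(first, second))               # same seed, same samples
print(first.min().item(), first.max().item())   # both within [-3, 3]
```

Fixing the seed is useful when debugging, because a model initialized with random values then starts from the same point on every run.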

### Exercise 1

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 80%; max-height:80%; margin: 5px;" src="../../images/MLU_challenge.png" alt="MLU challenge" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>It is your turn!</b></p>
        <p><b>Exercise 1. Gaussian initialization.</b></p>
        <p>Initialize two torch tensors with 1000 points each, the first sampling from a standard normal distribution of mean 0 and standard deviation 1, and the second from a normal distribution of mean 2 and standard deviation 0.4.</p>
        <p>Plot both distributions. Use the same <code>x_min</code>, <code>x_max</code> limits for the x axes in both plots so that you can see the relative size of one and the other. If you plot the density on the y axes, rather than the count, you'll be able to appreciate the difference between the two distributions more clearly.</p>
        </span>
</div>

In [None]:
###### YOUR CODE HERE ######






###### END OF CODE ######

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="../../images/MLU_question.png" alt="MLU solution" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p>You can use the function <code>torch.normal</code> to initialize the tensor. It takes as parameters the mean and standard deviation of the Gaussian distribution, together with a parameter <code>size</code> for the size of the returned object. To return a vector of dimension <code>dim</code>, you need to pass <code>size=(dim,)</code>.</p>
        <p>If you're stuck, remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display a sample solution.</p>
    </span>
</div>

In [None]:
# %load solutions/lab41_ex1_solutions.txt

## <a name="2">Operations with torch tensors</a> 
(<a href="#0">Go to top</a>)

PyTorch supports various operations on tensors. Here are some common ones:

In [None]:
# Basic arithmetic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

display(Markdown("#### Basic Operations:"))
print(a)
print(b)
print()
print("Addition:", a + b)  # or torch.add(a, b)
print("Multiplication:", a * b)  # element-wise multiplication, or torch.mul
print("Subtraction:", b - a)
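
Two related operations are worth distinguishing from the element-wise ones above. A scalar combined with a tensor is applied to every element, and the dot product of two vectors is the sum of their element-wise products. A minimal sketch, reusing the same values of `a` and `b`:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# A scalar is applied to every element of the tensor
print(a + 10)
print(a * 2)

# Dot product: 1*4 + 2*5 + 3*6 = 32
dot = torch.dot(a, b)
print(dot)

# Same value, computed as the sum of the element-wise product
print((a * b).sum())
```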

Matrix multiplication can be computed using different syntaxes, including the `@` operator, `torch.matmul()`, and `torch.mm()`. 

In [None]:
mat1 = torch.tensor([[1, 2], [3, 4]])
mat2 = torch.tensor([[5, 6], [7, 8]])

print(mat1)
print(mat2)

display(Markdown("#### Matrix Multiplication:"))
    
# Using @ operator
mat_mul = mat1 @ mat2
print(f"mat1 @ mat2:\n{mat_mul}")
print()

# Using torch.matmul function
mat_mul_func = torch.matmul(mat1, mat2)
print(f"torch.matmul(mat1, mat2):\n{mat_mul_func}")
print()

# Using torch.mm function (only for 2D tensors)
mat_mul_mm = torch.mm(mat1, mat2)
print(f"torch.mm(mat1, mat2):\n{mat_mul_mm}")

display(Markdown("#### Matrix-Vector Multiplication:"))
vec = torch.tensor([1, 2])
mat_vec_mul = mat1 @ vec
print(f"mat1 @ [1, 2]:\n{mat_vec_mul}")
print()
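
Matrix multiplication requires the inner dimensions to agree: a tensor of shape (m, k) can be multiplied by one of shape (k, n), and the result has shape (m, n). The sketch below checks this rule and shows that PyTorch raises a `RuntimeError` when the shapes are incompatible. The names `A`, `B`, and `mismatch_raised` are illustrative.

```python
import torch

A = torch.ones(4, 3)
B = torch.ones(3, 2)

# (4, 3) @ (3, 2): inner dimensions match, result has shape (4, 2)
product = A @ B
print(product.shape)

# (3, 2) @ (4, 3): inner dimensions 2 and 4 differ, so this fails
mismatch_raised = False
try:
    B @ A
except RuntimeError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```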


## <a name="3">Automatic Differentiation with autograd</a> 
(<a href="#0">Go to top</a>)

One of the most powerful features of PyTorch is its automatic differentiation engine, known as Autograd. This system enables the efficient computation of gradients, which is crucial for training deep learning models.

PyTorch Autograd tracks operations on tensors and builds a computational graph dynamically. Each tensor has a `.grad_fn` attribute that references the function that created it; for tensors created directly by the user, `.grad_fn` is `None`. To compute gradients, we call `.backward()` on a scalar tensor, and PyTorch automatically computes the gradients for all tensors in the graph that require gradients.

To use Autograd, you need to create tensors with `requires_grad=True`:

In [None]:
# Create a tensor with gradient tracking
x = torch.tensor([2.0, 3.5], requires_grad=True)
print("x:", x)
print("Requires gradient:", x.requires_grad)
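
To make the role of `.grad_fn` more concrete, the following sketch compares a tensor created directly (a leaf of the graph) with a tensor produced by operations on it. The names `w` and `y` are chosen here for illustration.

```python
import torch

x = torch.tensor([2.0, 3.5], requires_grad=True)  # created directly: a leaf
w = torch.tensor([1.0, 1.0])                      # no gradient tracking
y = x * 3 + w                                     # result of operations on x

print(x.is_leaf, x.grad_fn)   # leaf tensor, so grad_fn is None
print(y.is_leaf, y.grad_fn)   # not a leaf, grad_fn records the last operation
print(y.requires_grad)        # tracking propagates from x to y
print(w.requires_grad)        # w was created without tracking
```

Gradient tracking propagates automatically: any tensor computed from at least one tensor with `requires_grad=True` will itself require gradients.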

When we define a variable in terms of tensors that track gradients, PyTorch automatically sets up the computation graph for that variable. This allows us to compute gradients with respect to the input tensors. 

Let's explore this concept with an example:
$$
z = x^2 + y^3
$$

PyTorch automatically creates a computation graph for this operation, and `z` knows that it was created as a result of operations on `x` and `y`.

The gradient of $z$ at each point is given by the vector of partial derivatives $(2 x, 3 y^2)$. At point $(x, y)=(2,2)$ the value of the gradient is $(4, 12)$.  

The `.grad_fn` attribute of `z` shows the function that created this tensor.

In [None]:
# Creating tensors with gradient tracking
x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

# Performing operations
z = x**2 + y**3

# Computing gradients
z.backward()

# Should be 4.0 (derivative of x^2 --> 2x at x=2)
print("Gradient of z with respect to x:", x.grad)
# Should be 12.0 (derivative of y^3 --> 3y^2 at y=2)
print("Gradient of z with respect to y:", y.grad)  
print()
print(f"z.grad_fn: {z.grad_fn}")
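
One behavior to keep in mind when implementing gradient descent: `.backward()` adds the newly computed gradient to whatever is already stored in `.grad` instead of overwriting it. The sketch below illustrates this with the single-variable function $x^2$ and shows how to reset the stored gradient with `.grad.zero_()`. The list `recorded` is an illustrative name.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
recorded = []  # copies of x.grad after each backward pass

# First backward pass: d(x^2)/dx = 2x = 4 at x = 2
(x ** 2).backward()
recorded.append(x.grad.clone())

# Second pass without resetting: the new gradient is added to the old one
(x ** 2).backward()
recorded.append(x.grad.clone())

# Reset the stored gradient in place, then differentiate again
x.grad.zero_()
(x ** 2).backward()
recorded.append(x.grad.clone())

print(recorded)  # gradients 4, 8 (accumulated), and 4 again after the reset
```

In an iterative algorithm, the gradient therefore needs to be reset before each new backward pass, otherwise every step uses the sum of all previous gradients.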

### Exercise 2

The sigmoid function, defined as: 
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
has a simple derivative that can be written in terms of the function:
$$
\frac{d}{dx}\sigma(x) = \sigma(x)(1 - \sigma(x))
$$
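
For reference, this identity follows from the chain rule together with the observation that $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$:
$$
\frac{d}{dx}\sigma(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)(1 - \sigma(x))
$$
As a quick sanity check, at $x = 0$ we have $\sigma(0) = \frac{1}{2}$, so the derivative equals $\frac{1}{4}$, which is its maximum value.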

<div style="align: left; border: 4px solid cornflowerblue; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 80%; max-height:80%; margin: 5px;" src="../../images/MLU_challenge.png" alt="MLU challenge" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>It is your turn!</b></p>
        <p><b>Exercise 2. Sigmoid derivative.</b></p>
        <p>Using PyTorch autograd, compute and plot the derivative of the sigmoid function in the interval x = (-10, 10).</p>
        <p>Compare with the analytical computation of the derivative.</p>
        </span>
</div>

In [None]:
###### YOUR CODE HERE ######






###### END OF CODE ######

<div style="align: left; border: 4px solid lightcoral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; width: 65%">
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="../../images/MLU_question.png" alt="MLU solution" width=12% height=12%/>
    <span style="padding: 20px; align: left;">
        <p><b>Challenge Help</b></p>
        <p>Implement the sigmoid function to return a torch tensor. Apply the function to a tensor containing equally spaced points in the interval (-10, 10). Then compute the derivative by calling <code>torch.autograd.backward(y, torch.ones_like(x), create_graph=True)</code>, where <code>y</code> contains the values of the sigmoid on the desired interval. Read more about how to invoke <code>autograd</code> <a href="https://stackoverflow.com/questions/69148622/difference-between-autograd-grad-and-autograd-backward">here</a>.</p>
        <p>To be able to plot torch tensors with matplotlib, you need to detach them from the computation graph and convert them to NumPy arrays with <code>.detach().numpy()</code>.</p>
        <p>If you're stuck, remove the <code>#</code> before the <code>load</code> instruction in the next code cell to display a sample solution.</p>
    </span>
</div>

In [None]:
# %load solutions/lab41_ex2_solutions.txt

<div style="display: flex; align-items: center; justify-content: left; background-color:#330066; width:99%;"> 
        <img style="float: left; max-width: 100%; max-height:100%; margin: 15px;" src="../../images/MLU_robot.png" alt="MLU robot" width="100" height="100"/>
    <span style="color: white; padding-left: 10px; align: left; margin: 15px;">
        <h3>Congratulations!</h3>
        You have completed Lab 4.1: Torch and Automatic Differentiation of Lecture 4: Differential Calculus of MLU Mathematical Fundamentals for Machine Learning.
        <br/>
    </span>
</div>