### Problem 1: basic operations in PyTorch
- 1.1 Create a 3x3 tensor filled with zeros and another one filled with ones. Perform element-wise addition and subtraction between the two tensors.

- 1.2 Create a tensor of shape (3, 4) where each element is taken from a uniform distribution in the interval $[-1, 2]$. Calculate its mean and standard deviation using PyTorch functions.

- 1.3 Create two tensors of different shapes (e.g., one (3, 4) and another (4, 3)) and perform matrix multiplication between them.

### 1.1 Create a 3x3 tensor filled with zeros and another one filled with ones. Perform element-wise addition and subtraction between the two tensors.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import torch

a = torch.zeros([3, 3])
b = torch.ones([3, 3])

print('Element-wise Addition: \n ', torch.add(a, b))
print('Element-wise Subtraction: \n ', torch.sub(a, b), '\n', torch.sub(b, a))

Element-wise Addition: 
  tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Element-wise Subtraction: 
  tensor([[-1., -1., -1.],
        [-1., -1., -1.],
        [-1., -1., -1.]]) 
 tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])


### 1.2 Create a tensor of shape (3, 4) where each element is taken from a uniform distribution in the interval $[-1, 2]$. Calculate its mean and standard deviation using PyTorch functions.

In [3]:
c = torch.distributions.uniform.Uniform(-1, 2).sample([3, 4])
print(c)
print('Mean of tensor c = ', torch.mean(c), '\n', 'Standard deviation of tensor c = ', torch.std(c))

tensor([[ 0.5193,  0.7880, -0.3425,  1.5424],
        [ 1.4085, -0.0309,  1.6380,  0.8759],
        [ 1.2010,  1.3906,  1.6726,  0.4333]])
Mean of tensor c =  tensor(0.9247) 
 Standard deviation of tensor c =  tensor(0.6690)
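
As a sanity check on the numbers above: a uniform distribution on $[a, b]$ has mean $(a+b)/2$ and standard deviation $(b-a)/\sqrt{12}$, which is 0.5 and about 0.866 for $[-1, 2]$. With only 12 samples, estimates such as 0.9247 and 0.6690 can land well away from those values. The sketch below draws a much larger sample, rescaling `torch.rand` instead of using `torch.distributions`. The names `low`, `high`, `big` and the seed are illustrative assumptions, not part of the original solution.

```python
import math
import torch

torch.manual_seed(0)  # illustrative seed, only for reproducibility

low, high = -1.0, 2.0

# torch.rand samples from U[0, 1); rescale it to [low, high)
big = low + (high - low) * torch.rand(100_000)

expected_mean = (low + high) / 2              # theoretical mean
expected_std = (high - low) / math.sqrt(12)   # theoretical standard deviation

print("sample mean:", big.mean().item(), "expected:", expected_mean)
print("sample std: ", big.std().item(), "expected:", expected_std)
```

With this many samples, both estimates should sit close to the theoretical values.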


### 1.3 Create two tensors of different shapes (e.g., one (3, 4) and another (4, 3)) and perform matrix multiplication between them.

In [4]:
d = torch.tensor([[1, 3, 2, 4], [3, 1, 5, 5]])
e = torch.tensor(([[[[4, 3, 2, 1], [9, 1, 2, 8], [2, 1, 7, 6], [1, 1, 1, 3]]]]))
print('Matrix multiplication: \n', torch.matmul(d, e))

Matrix multiplication: 
 tensor([[[[39, 12, 26, 49],
          [36, 20, 48, 56]]]])
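
Note that `d` above has shape (2, 4) and `e` has shape (1, 1, 4, 4), so `torch.matmul` treats `e` as a batch of 4x4 matrices and returns a result of shape (1, 1, 2, 4). A minimal sketch using the shapes suggested in the problem statement, (3, 4) and (4, 3), could look like this. The tensors `m1` and `m2` are hypothetical examples.

```python
import torch

m1 = torch.arange(12).reshape(3, 4)   # shape (3, 4)
m2 = torch.arange(12).reshape(4, 3)   # shape (4, 3)

# The inner dimensions (4 and 4) agree, so the product has shape (3, 3)
prod = torch.matmul(m1, m2)           # equivalent to m1 @ m2
print(prod.shape)
print(prod)
```

Swapping the operands, `m2 @ m1`, is also valid here and gives a (4, 4) result, whereas multiplying two (3, 4) tensors would fail because the inner dimensions differ.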


### Problem 2: broadcasting in PyTorch
First read the broadcasting tutorials for NumPy and PyTorch:
- NumPy: https://numpy.org/doc/stable/user/basics.broadcasting.html
- PyTorch: https://pytorch.org/docs/stable/notes/broadcasting.html

Then complete the following problems:
- 2.1 Create a tensor of shape (3, 4) and add a 1D tensor (e.g., shape (4,)) to it. Explain the result using broadcasting.
- 2.2 Multiply a tensor of shape (5, 1) by a 1D tensor (e.g., shape (5,)). Describe how broadcasting affects the multiplication.
- 2.3 Create a tensor of shape (3, 4) and add a 2D tensor (e.g., shape (3, 1)) to it. Explain how broadcasting expands the dimensions for addition.
- 2.4 Multiply a 3D tensor of shape (4, 1, 3) by a 2D tensor of shape (1, 3). Describe the result and how broadcasting operates in this case.

### 2.1 Create a tensor of shape (3, 4) and add a 1D tensor (e.g., shape (4,)) to it. Explain the result using broadcasting.

In [12]:
t1 = torch.tensor([[4, 3, 2, 1], [2, 1, 7, 6], [1, 1, 1, 3]])  # shape (3, 4)
t2 = torch.tensor([2, 1, 2, 3])  # shape (4,)
print(t1 + t2) # or torch.add(t1, t2)

# The result is a (3, 4) tensor.
# Shapes are aligned from the trailing dimension, so t2 of shape (4,) is treated as shape (1, 4). Its single row is then
# virtually replicated 3 times to match t1, so each element of t2 is added to the element in the same column of every row of t1.

tensor([[6, 4, 4, 4],
        [4, 2, 9, 9],
        [3, 2, 3, 6]])
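
The replication that broadcasting performs implicitly can be written out with `Tensor.expand`, which returns a view without copying data. The sketch below assumes a (3, 4) tensor and a (4,) tensor with the same values as above. The names `a34`, `row` and `row_expanded` are illustrative.

```python
import torch

a34 = torch.tensor([[4, 3, 2, 1], [2, 1, 7, 6], [1, 1, 1, 3]])  # shape (3, 4)
row = torch.tensor([2, 1, 2, 3])                                # shape (4,)

# Broadcasting treats `row` as shape (1, 4) and repeats it along dim 0
row_expanded = row.expand(3, 4)   # a view; no data is copied

implicit = a34 + row              # broadcasting done by PyTorch
explicit = a34 + row_expanded     # same addition with the expansion spelled out

print(implicit.shape)
print(torch.equal(implicit, explicit))
```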


### 2.2 Multiply a tensor of shape (5, 1) by a 1D tensor (e.g., shape (5,)). Describe how broadcasting affects the multiplication.

In [22]:
x = torch.ones(5, 1)
y = torch.randn(5, )

print('x = ', x)
print('y = ', y)
print('x * y = ', torch.mul(x, y)) # or (x * y)

# Here, the result is a (5, 5) tensor.
# Shapes are aligned from the trailing dimension, so y of shape (5,) is treated as shape (1, 5). Comparing (5, 1) with (1, 5),
# each pair of sizes contains a 1, so the tensors are broadcastable: x is stretched across 5 columns and y across 5 rows,
# both to shape (5, 5). Element-wise multiplication is then performed, so entry (i, j) of the result is x[i, 0] * y[j].

x =  tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.]])
y =  tensor([-0.0335, -0.3700,  0.1075, -0.4375,  1.0039])
x * y =  tensor([[-0.0335, -0.3700,  0.1075, -0.4375,  1.0039],
        [-0.0335, -0.3700,  0.1075, -0.4375,  1.0039],
        [-0.0335, -0.3700,  0.1075, -0.4375,  1.0039],
        [-0.0335, -0.3700,  0.1075, -0.4375,  1.0039],
        [-0.0335, -0.3700,  0.1075, -0.4375,  1.0039]])
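
Because a (5, 1) tensor times a (5,) tensor yields a (5, 5) outer product rather than a length-5 element-wise product, this case is a common source of silent shape bugs. The sketch below contrasts the two, assuming small hypothetical tensors `col` and `vec`.

```python
import torch

col = torch.arange(1., 6.).reshape(5, 1)   # shape (5, 1)
vec = torch.arange(1., 6.)                 # shape (5,)

# Shape that broadcasting will produce for these two operands
print(torch.broadcast_shapes(col.shape, vec.shape))

outer = col * vec                    # (5, 1) * (1, 5) -> (5, 5)
elementwise = col.squeeze(1) * vec   # (5,) * (5,) -> (5,)

# The broadcast product equals the outer product of the two vectors
print(torch.equal(outer, torch.outer(col.squeeze(1), vec)))
print(elementwise)
```

If an element-wise product is what is intended, dropping the size-1 dimension first (as in `col.squeeze(1)`) is one way to get it.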


### 2.3 Create a tensor of shape (3, 4) and add a 2D tensor (e.g., shape (3, 1)) to it. Explain how broadcasting expands the dimensions for addition.

In [26]:
t3 = torch.randint(1, 4, (3, 4))
t4 = torch.randint(3, 6, (3, 1))

print('t3 = ', t3)
print('t4 = ', t4)
print('t3 + t4 = ', torch.add(t3, t4))

# The result is a (3, 4) tensor. These tensors are broadcastable for addition because their first dimensions match (3 and 3)
# and the second dimension of t4 has size 1.
# As in 2.1 above, the smaller tensor is expanded, here column-wise: the single column of t4 is virtually replicated into
# 4 columns to match the (3, 4) shape, so the element in each row of t4 is added to every element in the same row of t3.

t3 =  tensor([[1, 3, 1, 1],
        [2, 1, 1, 1],
        [3, 2, 3, 2]])
t4 =  tensor([[4],
        [5],
        [5]])
t3 + t4 =  tensor([[5, 7, 5, 5],
        [7, 6, 6, 6],
        [8, 7, 8, 7]])


### 2.4 Multiply a 3D tensor of shape (4, 1, 3) by a 2D tensor of shape (1, 3). Describe the result and how broadcasting operates in this case.

In [18]:
t5 = torch.randint(1, 4, (4, 1, 3))
t6 = torch.randint(3, 6, (1, 3))

print('t5 = ', t5)
print('t6 = ', t6)
print('t5 * t6 = ', torch.mul(t5, t6))

# The result is a 3D tensor of shape (4, 1, 3). The two tensors are broadcastable because, aligning shapes from the trailing
# dimension, the sizes match in the last dimension (3 and 3) and in the second-to-last dimension (1 and 1).
# t6 has no third dimension, so broadcasting prepends a dimension of size 1 and expands it to 4 to match t5. The single row
# of t6 is then multiplied element-wise with each of the 4 rows of t5.

t5 =  tensor([[[3, 3, 3]],

        [[1, 1, 2]],

        [[2, 3, 1]],

        [[1, 2, 1]]])
t6 =  tensor([[4, 3, 3]])
t5 * t6 =  tensor([[[12,  9,  9]],

        [[ 4,  3,  6]],

        [[ 8,  9,  3]],

        [[ 4,  6,  3]]])
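
The rule used in all four cases can be stated once: align the shapes from the right, pad the shorter one with leading 1s, and require each pair of sizes to be equal or to contain a 1. The sketch below applies that rule by hand and compares it with `torch.broadcast_shapes`. The helper `broadcast_shape` is a hypothetical illustration, not a PyTorch function.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule by hand."""
    # Left-pad the shorter shape with 1s so both have the same length
    n = max(len(shape_a), len(shape_b))
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        # Each pair of sizes must be equal, or one of them must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"sizes {da} and {db} are not broadcastable")
        out.append(db if da == 1 else da)
    return tuple(out)

# The four shape pairs from problems 2.1 to 2.4
cases = [((3, 4), (4,)), ((5, 1), (5,)), ((3, 4), (3, 1)), ((4, 1, 3), (1, 3))]
for sa, sb in cases:
    by_hand = broadcast_shape(sa, sb)
    by_torch = tuple(torch.broadcast_shapes(sa, sb))
    print(sa, sb, "->", by_hand, by_hand == by_torch)
```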


### Problem 3:
The following code is taken from the lecture notes week3-gradient-descent. It implements the gradient descent method for the linear regression problem.

Replace the gradient descent part (i.e., the use of the function gradient(X, y, w) on the full data set) with mini-batch stochastic gradient descent, using mini-batch sizes 16, 32, and 64. Show the results (including the final w and the history of the loss during iteration).

In [38]:
# Generate some random data
N = 1024
X = 2 * np.random.rand(N, 1)
y = 4 + 3 * X + np.random.randn(N, 1) * 0.1

# Add a column of ones to the feature matrix for the bias term
X_b = np.hstack((X, np.ones((N, 1))))

# Initialize the weights to zeros
w = np.zeros((2, 1))

# Set the learning rate and number of epochs (full passes over the data)
alpha = 0.1
num_iters = 1000

# Define the gradient of the cost function
def gradient(X, y, w):
    m = X.shape[0]
    grad = (2 / m) * X.T.dot(X.dot(w) - y)
    return grad

# Define the MSE loss (the cost function)
def loss_comp(X, y, w):
    loss = np.mean((X.dot(w) - y) ** 2)
    return loss
    
# Define the batch sizes and a loss history for each of them
batch_sizes = [16, 32, 64]
loss_history = {16: [], 32: [], 64: []}
    
# Run mini-batch stochastic gradient descent to update the weights with the different batch sizes
for bs in batch_sizes:
    w = np.zeros((2, 1))
    for i in range(num_iters):
        # Reshuffle the data at the start of every epoch
        rearr = np.random.permutation(N)
        X_rearr = X_b[rearr]
        y_rearr = y[rearr]
        for j in range(0, N, bs):
            X_batch = X_rearr[j:j + bs]
            y_batch = y_rearr[j:j + bs]
            grad = gradient(X_batch, y_batch, w)
            w = w - alpha * grad
        # Compute and store loss
        loss = loss_comp(X_b, y, w)
        loss_history[bs].append(loss)
        if i % 100 == 0:
            print('Epoch:', i, ',', 'Loss:', loss, ',', 'Slope:', w[0][0], ',', 'Intercept:', w[1][0])
    # Print the learned parameters
    print()
    print("Batch size: ", bs)
    print("End loss: ", loss_history[bs][-1])
    print("Slope: ", w[0])
    print("Intercept: ", w[1])
    print()

Epoch:  0 , Loss:  0.012254493553822854 , Slope:  3.0802101495897873 , Intercept:  3.8959833219915425
Epoch:  100 , Loss:  0.009388981540211436 , Slope:  2.9896950642558955 , Intercept:  3.998711776732756
Epoch:  200 , Loss:  0.009377810474703223 , Slope:  2.9994885384892696 , Intercept:  4.0122366394333735
Epoch:  300 , Loss:  0.009251533476782375 , Slope:  2.9912436779647678 , Intercept:  4.006211334784177
Epoch:  400 , Loss:  0.009584644074174826 , Slope:  3.0005735263386826 , Intercept:  4.0180502488409235
Epoch:  500 , Loss:  0.009246480018580732 , Slope:  2.9917089529221004 , Intercept:  4.006630066649127
Epoch:  600 , Loss:  0.009420606839088802 , Slope:  3.0030083429304875 , Intercept:  4.009867711244844
Epoch:  700 , Loss:  0.009414944940874626 , Slope:  2.986453981350423 , Intercept:  4.001344761660903
Epoch:  800 , Loss:  0.009325816804288253 , Slope:  2.9960726402354654 , Intercept:  3.995212721613862
Epoch:  900 , Loss:  0.009246366775358974 , Slope:  2.9976719218922354 , 
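
For comparison across batch sizes, the same procedure can be wrapped in a function that returns both the final weights and the per-epoch loss history. This is a minimal sketch under stated assumptions, not the lecture's implementation: the function name `minibatch_sgd`, the fixed seeds, and the reduced epoch count of 50 are all illustrative choices.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size, alpha=0.1, num_epochs=50, seed=0):
    """Sketch of mini-batch SGD for least-squares linear regression."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.zeros((X.shape[1], 1))
    history = []
    for _ in range(num_epochs):
        perm = rng.permutation(n)                  # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # MSE gradient computed on the current mini-batch only
            grad = (2 / len(idx)) * Xb.T @ (Xb @ w - yb)
            w = w - alpha * grad
        # Record the loss on the full data set after each epoch
        history.append(float(np.mean((X @ w - y) ** 2)))
    return w, history

# Synthetic data of the same form as above: y = 4 + 3x + noise
data_rng = np.random.default_rng(0)
X = 2 * data_rng.random((1024, 1))
y = 4 + 3 * X + 0.1 * data_rng.standard_normal((1024, 1))
X_b = np.hstack((X, np.ones((1024, 1))))           # columns: [x, 1]

results = {bs: minibatch_sgd(X_b, y, bs) for bs in (16, 32, 64)}
for bs, (w, history) in results.items():
    print("batch size", bs, "slope", w[0, 0], "intercept", w[1, 0], "final loss", history[-1])
```

Each `history` list can then be plotted against the epoch index (for example with `plt.plot`) to show the loss history for each batch size.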

***

1. How difficult did you find the homework? Please give a rating from 1 to 5 (1 is the easiest and 5 is the most difficult).

Ans: 3

2. How difficult did you find the lecture? Please give a rating from 1 to 5 (1 is the easiest and 5 is the most difficult).

Ans: 3
