# Assignment 2 

**Total marks: 10 (This assignment totals 20 marks; the overall score will be scaled by a factor of 0.5.)**

> ## Task 1 : Ascending the Gradient Descent [6 marks]

> Use the dataset below for Task 1:
```python
import numpy as np

np.random.seed(45)
num_samples = 40

# Generate data
x1 = np.random.uniform(-1, 1, num_samples)
f_x = 3*x1 + 4
eps = np.random.randn(num_samples)
y = f_x + eps
```

> 1. Use ```torch.autograd``` to find the true gradient on the above dataset using linear regression (in the form $\theta_1x + \theta_0$) for any given values of $(\theta_0,\theta_1)$. **[1 mark]**


In [1]:
import numpy as np
import torch

# Set seed and generate data
np.random.seed(45)
num_samples = 40

# Generate data
x1 = np.random.uniform(-1, 1, num_samples)
f_x = 3 * x1 + 4
eps = np.random.randn(num_samples)
y = f_x + eps

# Convert data to PyTorch tensors
x1_tensor = torch.tensor(x1, dtype=torch.float32).view(-1, 1)
y_tensor = torch.tensor(y, dtype=torch.float32).view(-1, 1)

# Define the linear regression model
def linear_regression(x, theta_0, theta_1):
    return theta_1 * x + theta_0

# Initialize parameters
theta_0 = torch.tensor(0.0, requires_grad=True)  # Intercept
theta_1 = torch.tensor(0.0, requires_grad=True)  # Slope

# Forward pass
y_pred = linear_regression(x1_tensor, theta_0, theta_1)

# Define the loss (Mean Squared Error)
loss = torch.mean((y_pred - y_tensor) ** 2)

# Backward pass to compute gradients
loss.backward()

# Print gradients
print(f"Gradient with respect to theta_0: {theta_0.grad.item()}")
print(f"Gradient with respect to theta_1: {theta_1.grad.item()}")


Gradient with respect to theta_0: -7.447054386138916
Gradient with respect to theta_1: -1.0253016948699951
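
As a sanity check, the autograd result can be compared with the closed-form gradient of the MSE loss, $\frac{\partial L}{\partial \theta_0} = \frac{2}{N}\sum_i (\theta_1 x_i + \theta_0 - y_i)$ and $\frac{\partial L}{\partial \theta_1} = \frac{2}{N}\sum_i (\theta_1 x_i + \theta_0 - y_i)\,x_i$. The sketch below is not part of the original solution: the helper names `autograd_gradient` and `closed_form_gradient` are hypothetical, and it assumes `float64` (rather than the `float32` used above) so that the two results can be compared tightly. It also accepts any $(\theta_0, \theta_1)$, as the question asks.

```python
import numpy as np
import torch

# Same dataset as in the task statement
np.random.seed(45)
num_samples = 40
x1 = np.random.uniform(-1, 1, num_samples)
y = 3 * x1 + 4 + np.random.randn(num_samples)

# float64 is an assumption made here to keep rounding error small
x = torch.tensor(x1, dtype=torch.float64)
t = torch.tensor(y, dtype=torch.float64)

def autograd_gradient(theta_0_value, theta_1_value):
    """Gradient of the MSE loss at (theta_0, theta_1), computed by torch.autograd."""
    theta_0 = torch.tensor(theta_0_value, dtype=torch.float64, requires_grad=True)
    theta_1 = torch.tensor(theta_1_value, dtype=torch.float64, requires_grad=True)
    loss = torch.mean((theta_1 * x + theta_0 - t) ** 2)
    loss.backward()
    return theta_0.grad.item(), theta_1.grad.item()

def closed_form_gradient(theta_0_value, theta_1_value):
    """Same gradient from the analytical formula for the MSE of a linear model."""
    residual = theta_1_value * x + theta_0_value - t
    return (2 * residual.mean()).item(), (2 * (residual * x).mean()).item()

print("autograd:   ", autograd_gradient(0.0, 0.0))
print("closed form:", closed_form_gradient(0.0, 0.0))
```

Both helpers are expected to agree up to floating-point precision at any parameter values, which gives some confidence that the autograd graph was built as intended.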


> 2. Using the same $(\theta_0,\theta_1)$ as above, calculate the stochastic gradient for all points in the dataset. Then, find the average of all those gradients and show that the stochastic gradient is a good estimate of the true gradient.  **[1 mark]**


In [3]:
# Dataset generation
def get_dataset():
    np.random.seed(45)
    num_samples = 40
    x1 = np.random.uniform(-1, 1, num_samples)
    f_x = 3 * x1 + 4
    eps = np.random.randn(num_samples)
    y = f_x + eps

    x_train = torch.tensor(x1, dtype=torch.float32).unsqueeze(1)
    y_train = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

    return x_train, y_train

x_train, y_train = get_dataset()

In [4]:
import torch

# Initialize parameters at random values; the same values are used for both
# the true gradient and the per-point (stochastic) gradients below
theta_0 = torch.randn(1, requires_grad=True, dtype=torch.float32)
theta_1 = torch.randn(1, requires_grad=True, dtype=torch.float32)

# Define the linear regression model
def linear_regression_model(x):
    return theta_1 * x + theta_0

# Define the MSE loss function
def mse_loss(y_true, y_pred):
    return torch.mean((y_pred - y_true) ** 2)

# Compute true gradients
y_pred = linear_regression_model(x_train)
loss = mse_loss(y_train, y_pred)
loss.backward()

true_grad_values = (theta_0.grad.item(), theta_1.grad.item())

# Stochastic gradient function
def stochastic_gradient(x_train, y_train):
    stoch_grad_theta_0 = []
    stoch_grad_theta_1 = []
    
    # theta_0 and theta_1 are read from module scope; they are never reassigned
    # here, so no `global` statement is needed

    for i in range(x_train.shape[0]):
        # Select one data point and add a batch dimension
        x_single = x_train[i].unsqueeze(0)
        y_single = y_train[i].unsqueeze(0)

        # Forward pass
        y_pred_i = linear_regression_model(x_single)
        loss = mse_loss(y_single, y_pred_i)
        
        # Zero the accumulated gradients so that this backward pass
        # yields the gradient of the current point only
        if theta_0.grad is not None:
            theta_0.grad.zero_()
        if theta_1.grad is not None:
            theta_1.grad.zero_()
        
        # Backward pass
        loss.backward()

        # Store gradients
        stoch_grad_theta_0.append(theta_0.grad.item())
        stoch_grad_theta_1.append(theta_1.grad.item())
    
    avg_theta_0 = sum(stoch_grad_theta_0) / len(stoch_grad_theta_0)
    avg_theta_1 = sum(stoch_grad_theta_1) / len(stoch_grad_theta_1)
    
    return avg_theta_0, avg_theta_1

# Compute stochastic gradients
stochastic_grad_values = stochastic_gradient(x_train, y_train)

# Results
print(f"True Grad Values - θ0: {true_grad_values[0]}, θ1: {true_grad_values[1]}")
print(f"Stochastic Grad Values - θ0: {stochastic_grad_values[0]}, θ1: {stochastic_grad_values[1]}")

True Grad Values - θ0: -8.68634033203125, θ1: -0.6147982478141785
Stochastic Grad Values - θ0: -8.686340588331223, θ1: -0.6147983506321907


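The true gradient and the average of the per-point stochastic gradients agree to about six decimal places; the small remaining difference is consistent with `float32` rounding. This is expected: the MSE loss is the mean of the per-point squared errors, so its gradient is exactly the mean of the per-point gradients. A single-point stochastic gradient is therefore an unbiased, though noisy, estimate of the true gradient.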
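
The agreement above can also be checked with `torch.autograd.grad`, which returns gradients directly instead of accumulating them in `.grad`, so no manual zeroing is needed. The following is a minimal sketch rather than the original solution: the helper `batch_gradient` is a hypothetical name, the parameters are fixed at $(0, 0)$ for illustration, and `float64` is assumed to reduce rounding error. It also reports the standard deviation of the per-point gradients, as a rough indication of how noisy a single-point estimate is.

```python
import numpy as np
import torch

# Same dataset as in the task statement
np.random.seed(45)
num_samples = 40
x1 = np.random.uniform(-1, 1, num_samples)
y = 3 * x1 + 4 + np.random.randn(num_samples)

x = torch.tensor(x1, dtype=torch.float64)
t = torch.tensor(y, dtype=torch.float64)

def batch_gradient(indices, theta_0_value, theta_1_value):
    """Gradient of the MSE over the points selected by `indices`, as a tensor [d/dtheta_0, d/dtheta_1]."""
    theta_0 = torch.tensor(theta_0_value, dtype=torch.float64, requires_grad=True)
    theta_1 = torch.tensor(theta_1_value, dtype=torch.float64, requires_grad=True)
    loss = torch.mean((theta_1 * x[indices] + theta_0 - t[indices]) ** 2)
    grad_0, grad_1 = torch.autograd.grad(loss, (theta_0, theta_1))
    return torch.stack([grad_0, grad_1])

# True gradient: all points at once
true_gradient = batch_gradient(torch.arange(num_samples), 0.0, 0.0)

# Stochastic gradients: one point at a time, stacked into shape (num_samples, 2)
per_sample = torch.stack(
    [batch_gradient(torch.tensor([i]), 0.0, 0.0) for i in range(num_samples)]
)
mean_gradient = per_sample.mean(dim=0)
spread = per_sample.std(dim=0)

print("true gradient:            ", true_gradient)
print("mean of per-point grads:  ", mean_gradient)
print("std dev of per-point grads:", spread)
```

The mean of the per-point gradients should match the full-batch gradient up to rounding, while the non-zero spread shows that any individual stochastic gradient can differ noticeably from it.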