# **Linear Least Squares using Gradient Descent**

This notebook demonstrates the implementation of Linear Least Squares (LLS) using the Gradient Descent optimization algorithm. The objective is to find the optimal parameters (weights and bias) for a linear model that best fits a given dataset by minimizing the sum of squared errors.

# **1. Theoretical Background**

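Each observation consists of an input row vector $x_i \in \mathbb{R}^n$, with features $x^j_i$ for $j = 1, \dots, n$, and a scalar target $y_i$. The linear model predicts

$$ \hat{y}_i = x_i \cdot w + b $$

where $w \in \mathbb{R}^n$ holds the weights $w^j$ and $b$ is the bias. The error for a single observation is half its squared residual, $E_i = \frac{1}{2}(y_i - \hat{y}_i)^2$, and the total error minimized over the $m$ observations is $E = \sum_{i=1}^{m} E_i$. Substituting the linear model into $E_i$ gives:

$$E_i = \frac{1}{2}(y_i - (x_i \cdot w + b))^2 $$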

To minimize this error using Gradient Descent, we need to compute the partial derivatives of $E_i$ with respect to the bias ($b$) and each weight ($w^j$).

The derivative of $E_i$ with respect to $b$ is:

$$ \frac{\partial E_i}{\partial b} = -(y_i - (x_i \cdot w + b)) = \hat{y}_i - y_i $$

The derivative of $E_i$ with respect to each weight $w^j$ (corresponding to feature $x^j_i$) is:

$$ \frac{\partial E_i}{\partial w^j} = -(y_i - (x_i \cdot w + b)) \cdot x^j_i = (\hat{y}_i - y_i) \cdot x^j_i $$
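The two derivatives can be checked numerically. The sketch below compares them against central finite differences of $E_i$ at a random point. The helper names (`per_sample_error`, `analytic_grads`, `numeric_grads`) and the step `h` are illustrative assumptions, and `w` is kept as a flat 1-D array for brevity rather than the `(D, 1)` column used later in the notebook.

```python
import numpy as np

def per_sample_error(w, b, x_i, y_i):
    """E_i = 0.5 * (y_i - (x_i . w + b))**2 for one observation."""
    return 0.5 * (y_i - (x_i @ w + b)) ** 2

def analytic_grads(w, b, x_i, y_i):
    """Gradients from the derivation: dE_i/dw^j = (y_hat_i - y_i) * x_i^j, dE_i/db = y_hat_i - y_i."""
    r = (x_i @ w + b) - y_i
    return r * x_i, r

def numeric_grads(w, b, x_i, y_i, h=1e-6):
    """Central finite differences of E_i with respect to each w^j and to b."""
    gw = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        gw[j] = (per_sample_error(w + e, b, x_i, y_i)
                 - per_sample_error(w - e, b, x_i, y_i)) / (2 * h)
    gb = (per_sample_error(w, b + h, x_i, y_i)
          - per_sample_error(w, b - h, x_i, y_i)) / (2 * h)
    return gw, gb

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.5          # assumed: 3 features, flat weight vector
x_i, y_i = rng.normal(size=3), 1.2      # one made-up observation

gw_a, gb_a = analytic_grads(w, b, x_i, y_i)
gw_n, gb_n = numeric_grads(w, b, x_i, y_i)
print("max weight-gradient gap:", np.max(np.abs(gw_a - gw_n)))
print("bias-gradient gap:", abs(gb_a - gb_n))
```

Because $E_i$ is quadratic in each parameter, the central differences agree with the analytic gradients up to floating-point rounding.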

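Gradient Descent moves each parameter a small step in the direction opposite to its gradient. Because these gradients come from the error $E_i$ of a single observation, the parameters are updated once per observation (the stochastic, or per-sample, form of Gradient Descent):

$$ w^j \leftarrow w^j - \eta \frac{\partial E_i}{\partial w^j} = w^j - \eta (\hat{y}_i - y_i) \cdot x^j_i $$
$$ b \leftarrow b - \eta \frac{\partial E_i}{\partial b} = b - \eta (\hat{y}_i - y_i) $$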

where $\eta$ is the learning rate.
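To make the update rule concrete, the sketch below applies one per-sample step to a single observation with hand-checkable numbers. `sgd_step` is a hypothetical helper, not part of the notebook. With $w = 0$, $b = 0$, $x_i = 2$, $y_i = 3$ and $\eta = 0.1$, the residual is $\hat{y}_i - y_i = -3$, so $w$ moves to $0 - 0.1 \cdot (-3) \cdot 2 = 0.6$ and $b$ to $0 - 0.1 \cdot (-3) = 0.3$.

```python
import numpy as np

def sgd_step(w, b, x_i, y_i, eta):
    """One per-sample Gradient Descent step on E_i = 0.5 * (y_i - (x_i . w + b))**2.

    w: (D, 1) weight column, b: float bias, x_i: (D,) input row, y_i: float target.
    """
    x_col = np.asarray(x_i, dtype=float).reshape(-1, 1)  # x_i as a (D, 1) column
    residual = (x_col.T @ w).item() + b - y_i             # y_hat_i - y_i, as a float
    w_new = w - eta * residual * x_col                    # w^j <- w^j - eta * (y_hat_i - y_i) * x_i^j
    b_new = b - eta * residual                            # b <- b - eta * (y_hat_i - y_i)
    return w_new, b_new

w, b = np.zeros((1, 1)), 0.0
x_i, y_i, eta = np.array([2.0]), 3.0, 0.1
w, b = sgd_step(w, b, x_i, y_i, eta)
print("w after one step:", w.ravel(), "b after one step:", b)
```

After the step the prediction is $2 \cdot 0.6 + 0.3 = 1.5$, so the residual shrinks from $-3$ to $-1.5$. In general one step multiplies that observation's residual by $1 - \eta(1 + \lVert x_i \rVert^2)$, so its magnitude shrinks whenever $0 < \eta(1 + \lVert x_i \rVert^2) < 2$.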

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # Only needed for 3D plots (n = 2); unused in the 1D case

np.random.seed(42)  # Set random seed for reproducibility
```


# **2. Parameters and Data Generation**

This section defines the key simulation parameters (the number of input features and observations, the learning rate, and the number of iterations). It then generates a synthetic dataset from a known linear relationship with added Gaussian noise, which is used to train the LLS model.

```python
n = 1      # Number of input features (dimensions of x). Currently set for 1D input.
m = 100    # Number of observations (data points)
eta = 0.01 # Learning rate for Gradient Descent
num_iters = 100 # Number of iterations (epochs) for Gradient Descent
noise_level = 0.1 # Standard deviation of Gaussian noise added to the true y values

# Generate random input data X, uniform in [0, 1). Each row is an observation, each column is a feature.
X = np.random.rand(m, n)

# Define the 'true' underlying weights (w_true) and bias (b_true) with positive, sequential values for clarity
b_true = 1.0 # True bias term
w_true = np.arange(1, n + 1).reshape(-1, 1) # True weights for features: [1, 2, ..., n]

# Stack b_true on top of w_true so the design matrix A (below) can be multiplied by a single vector
w_true_full = np.vstack((np.array([[b_true]]), w_true))

print(f"True bias (b_true): {b_true:.4f}")
print(f"True weights (w_true): {w_true.T}")

# Construct the design matrix A by prepending a column of ones for the bias term
A = np.hstack((np.ones((m, 1)), X))

# Compute the targets from the true linear model and add Gaussian noise
y = A @ w_true_full + noise_level * np.random.randn(m, 1)
```
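Because the model is linear in $(b, w)$, the same problem also has an exact solution that Gradient Descent should approach: the least-squares solution of $A\theta \approx y$, where $\theta$ stacks the bias on top of the weights in the same order as `w_true_full`. The sketch below regenerates this section's data (same seed and call order, assuming no other random calls in between) and solves it with `np.linalg.lstsq`, giving a reference against which the learned `w` and `b` can be compared.

```python
import numpy as np

np.random.seed(42)                      # same seed and generation order as the cells above
n, m, noise_level = 1, 100, 0.1
X = np.random.rand(m, n)
w_true_full = np.vstack(([[1.0]], np.arange(1, n + 1).reshape(-1, 1)))
A = np.hstack((np.ones((m, 1)), X))     # design matrix with a column of ones for the bias
y = A @ w_true_full + noise_level * np.random.randn(m, 1)

# Exact minimizer of ||A theta - y||^2; theta[0] is the bias, theta[1:] the weights
theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
b_ls, w_ls = theta_ls[0, 0], theta_ls[1:]
print(f"Closed-form bias: {b_ls:.4f}, weights: {w_ls.T}")
```

The closed-form values differ slightly from `b_true` and `w_true` because of the added noise; a well-converged Gradient Descent run should land close to `b_ls` and `w_ls` rather than exactly on the true parameters.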


# **3. Linear Least Squares by Gradient Descent Function**

The function `LLS_by_GD` implements the core Gradient Descent algorithm for Linear Least Squares. Starting from zero, it updates the model parameters (weights `w` and bias `b`) after each observation, moving them opposite to the gradient of that observation's error $E_i$. One pass over all observations is an epoch; the function runs for a specified number of epochs and records the average error over each one.

```python
def LLS_by_GD(X, y, eta, num_iters):
    """
    Performs Linear Least Squares regression using (per-sample) Gradient Descent.

    Args:
        X (np.array): Input features (N_observations, N_features).
        y (np.array): True output values (N_observations, 1).
        eta (float): The learning rate.
        num_iters (int): The number of iterations (epochs) to perform.

    Returns:
        tuple: A tuple containing:
            - w (np.array): Learned weights (N_features, 1).
            - b (float): Learned bias.
            - avg_E_at_each_iteration (np.array): History of average errors per iteration.
    """
    N, D = X.shape # N: number of observations, D: number of features

    # Initialize weights (w) and bias (b) to zeros
    w = np.zeros((D, 1)) # Weights as a column vector
    b = 0.0              # Bias as a scalar

    # Initialize array to store the average error at each iteration
    avg_E_at_each_iteration = np.zeros(num_iters)

    # Gradient Descent main loop (iterates over epochs)
    for k in range(num_iters):
        # Initialize total error for the current epoch
        E_current_epoch = 0.0

        # Iterate over each training example (Stochastic Gradient Descent update)
        for i in range(N):
            # Current example as a (D, 1) column vector
            x_i = X[i, :].reshape(-1, 1)

            # Predict the output with the current w and b; .item() turns the (1, 1) product into a float
            y_pred = (x_i.T @ w).item() + b

            # Residual (prediction error: predicted - actual), a Python float so that b stays a float
            residual = y_pred - y[i].item()

            # Accumulate the squared error for this example for the epoch error
            E_current_epoch += 0.5 * residual**2

            # Gradients: (D, 1) for the weights (matches w's shape), scalar for the bias
            gradient_w = residual * x_i
            gradient_b = residual

            # Move the parameters in the opposite direction of the gradient
            w = w - eta * gradient_w
            b = b - eta * gradient_b

        # Store the average error for the current epoch
        avg_E_at_each_iteration[k] = E_current_epoch / N

    return w, b, avg_E_at_each_iteration
```
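`LLS_by_GD` updates the parameters after every observation. For comparison, a full-batch variant takes one step per iteration using the gradient of the average error $\frac{1}{N}\sum_i E_i$, which is the mean of the per-sample gradients derived in Section 1. The function below (`LLS_by_batch_GD`) is a hypothetical sketch, not part of the original notebook, and its usage example draws its own data with `np.random.default_rng(0)` rather than reusing the dataset from Section 2.

```python
import numpy as np

def LLS_by_batch_GD(X, y, eta, num_iters):
    """Hypothetical full-batch Gradient Descent on the average error (1/N) * sum_i E_i.

    One update per iteration, using the per-sample gradients averaged over all N rows.
    """
    N, D = X.shape
    w = np.zeros((D, 1))
    b = 0.0
    avg_E = np.zeros(num_iters)
    for k in range(num_iters):
        residual = X @ w + b - y                  # (N, 1) vector of y_hat_i - y_i
        avg_E[k] = 0.5 * np.mean(residual ** 2)   # average error before this update
        w = w - eta * (X.T @ residual) / N        # mean of residual_i * x_i
        b = b - eta * np.mean(residual)           # mean of residual_i
    return w, b, avg_E

# Usage on synthetic 1D data of the same form as Section 2 (assumed seed and sizes)
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 1.0 + 1.0 * X + 0.1 * rng.standard_normal((100, 1))
w_b, b_b, E_b = LLS_by_batch_GD(X, y, eta=0.5, num_iters=5000)
print("batch GD weights:", w_b.T, "bias:", b_b)
```

Each full-batch iteration costs one pass over the data, like one epoch of `LLS_by_GD`, but makes a single update. For a quadratic objective like this one, full-batch Gradient Descent converges from any starting point only if $0 < \eta < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of $A^\top A / N$ ($A$ being the design matrix with the column of ones); `eta=0.5` is an assumption chosen to sit inside that range for data of this form.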


# **4. Model Training and Visualization**

This section runs `LLS_by_GD` to train the linear model on the generated data. It then plots the original data points together with the learned regression line (for 1D input; the regression plane for 2D input is not implemented), prints the learned weights and bias, and plots the average error over the training iterations to show the convergence of Gradient Descent.

```python
# Fit a linear model to the data using Gradient Descent
w_learned, b_learned, E_history = LLS_by_GD(X, y, eta, num_iters)

# Plotting the data and the fitted hyperplane/line based on the number of features (n)
plt.figure(figsize=(10, 7))

if n == 1:
    # For 1D input, plot scatter points and the regression line
    plt.scatter(X[:, 0], y, label='Original Data', color='blue', alpha=0.7)
    
    # Generate points for the fitted line
    x_line = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    y_line = x_line @ w_learned + b_learned
    plt.plot(x_line, y_line, color='red', linewidth=2, label='Fitted Regression Line')
    
    plt.xlabel('Feature X')
    plt.ylabel('Output Y')
    plt.title('Linear Regression with Gradient Descent (1D Input)')
    plt.legend()
    plt.grid(True)
    plt.show()

elif n == 2:
    # For 2D input, a 3D plot would be required (a 3D scatter plus a surface for the fitted plane).
    # Left unimplemented here, as in the original MATLAB script ('error TBC').
    print("Plotting for n=2 is not implemented in this version (requires 3D visualization).")
    # Example structure for a 3D plot (requires more detailed implementation):
    # fig = plt.figure()
    # ax = fig.add_subplot(111, projection='3d')
    # ax.scatter(X[:, 0], X[:, 1], y, label='Original Data')
    # ... define grid for surface and plot ...
    # plt.show()

else:
    # For higher dimensions, visualization is not straightforward
    print(f"Visualization for n={n} features is not supported in this script.")

# Print the learned weights and bias
print(f"\nLearned weights (w): {w_learned.T}")
print(f"Learned bias (b): {b_learned:.4f}")

# Plot the cost function (average error) values along the iterations
plt.figure(figsize=(10, 6))
plt.plot(range(1, num_iters + 1), E_history, color='green', linewidth=2)
plt.xlabel('Iteration Number')
plt.ylabel('Average Error (E)')
plt.title('Average Error vs. Iteration in Gradient Descent')
plt.grid(True)
plt.show()
```
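The `E_history` curve is built from per-sample errors that are each computed just before that sample's update, so `w` and `b` change within an epoch and the last entry is not exactly the average error of the returned parameters. The hypothetical helper below evaluates the average error $\frac{1}{N}\sum_i E_i$ at fixed parameters. Its tiny dataset is made up so the result can be checked by hand: the residuals are $0$, $1$ and $-1$, giving $\frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$.

```python
import numpy as np

def average_error(X, y, w, b):
    """Average of E_i = 0.5 * (y_i - (x_i . w + b))**2 over all observations, at fixed w and b."""
    residual = X @ w + b - y               # (N, 1) residuals y_hat_i - y_i
    return 0.5 * float(np.mean(residual ** 2))

# Made-up check: predictions are 1, 3, 5 against targets 1, 2, 6
X_demo = np.array([[0.0], [1.0], [2.0]])
y_demo = np.array([[1.0], [2.0], [6.0]])
w_demo, b_demo = np.array([[2.0]]), 1.0
print("average error:", average_error(X_demo, y_demo, w_demo, b_demo))
```

On the notebook's data it would be called as `average_error(X, y, w_learned, b_learned)` after training, to report the error of the final parameters alongside the curve.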
