# Linear Regression: Analytic Solution, Full-Batch Gradient Descent, and Stochastic Gradient Descent

This notebook fits a simple linear regression model using three methods: the Analytic Solution, Full-Batch Gradient Descent, and Stochastic Gradient Descent. For each method we obtain the regression coefficients and evaluate the fit using the Sum Squared Error (SSE) and R-squared (R²).

In [3]:
import numpy as np
import matplotlib.pyplot as plt


In [5]:
# Dataset
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

### Analytic Solution

The analytic (closed-form) least-squares coefficients, calculated by hand and rounded, are:
b0 = 1.235
b1 = 1.17

In [8]:
# Coefficients
b0 = 1.235
b1 = 1.17

# Predict y values
y_pred_manual = b0 + b1 * x


Next, we calculate the Sum Squared Error (SSE) and R-squared (R²) for these predictions.
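
For reference, the two metrics follow the standard definitions below, where $\hat{y}_i = b_0 + b_1 x_i$ is the prediction for sample $i$ and $\bar{y}$ is the mean of the observed values. The symbols $\hat{y}_i$, $\bar{y}$, and $n$ are notation introduced here for illustration, not names from the notebook.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SS}_{\text{total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SS}_{\text{total}}}
```

For this dataset, $\bar{y} = 6.5$ and $\mathrm{SS}_{\text{total}} = 118.5$.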

In [15]:
# Calculate Sum Squared Error (SSE)
SSE_manual = np.sum((y - y_pred_manual) ** 2)

# Calculate R-squared (R²)
SS_total = np.sum((y - np.mean(y)) ** 2)
R2_manual = 1 - (SSE_manual / SS_total)
print(f" R-squared: {R2_manual}")
print(f" Sum Squared Error: {SSE_manual}")

 R-squared: 0.9525379746835443
 Sum Squared Error: 5.624249999999998
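
As a cross-check on the hand-calculated coefficients, the closed-form least-squares solution can also be computed directly. The sketch below is one way to do it, assuming the same `x` and `y` as above. The names `X_design`, `theta_closed`, `SSE_closed`, and `R2_closed` are illustrative and not part of the original notebook, and `np.linalg.lstsq` is a stand-in for the hand calculation rather than the method actually used.

```python
import numpy as np

# Same dataset as in the notebook
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Design matrix with a column of ones for the intercept (illustrative name)
X_design = np.column_stack((np.ones(x.shape[0]), x))

# Least-squares solution of X_design @ theta ≈ y
theta_closed, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0_closed, b1_closed = theta_closed

# Evaluate with the same metrics used above
SSE_closed = np.sum((y - X_design @ theta_closed) ** 2)
R2_closed = 1 - SSE_closed / np.sum((y - np.mean(y)) ** 2)

print(f"b0 = {b0_closed:.4f}, b1 = {b1_closed:.4f}")
print(f"SSE = {SSE_closed:.4f}, R² = {R2_closed:.4f}")
```

The unrounded values are approximately b0 ≈ 1.2364 and b1 ≈ 1.1697, so the hand-calculated pair differs from them only by rounding.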


### Full-Batch Gradient Descent

We initialize both parameters (coefficients) to zero. The learning rate controls how big a step we take in the direction of the negative gradient at each iteration. The number of iterations specifies how many times we update the coefficients.


In [26]:
# Initialize coefficients
theta_gd = np.zeros(2)  # Initialize theta (b0 and b1) to zero
learning_rate = 0.01  # Set the learning rate
num_iterations = 1000  # Set the number of iterations

In [28]:
# Add a column of ones to x for the intercept term (bias)
X = np.column_stack((np.ones(x.shape[0]), x))

In this step, we run gradient descent: at each iteration the gradient of the cost function is computed over the full dataset, and the coefficients are moved in the direction of the negative gradient.
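
Written out, the update implemented in the next cell corresponds to the following, assuming the cost is the halved mean squared error. The symbols $J$, $\theta$, $\alpha$, and $n$ are notation introduced here: $\theta$ is `theta_gd`, $\alpha$ is `learning_rate`, and $n$ is `len(y)`.

```latex
J(\theta) = \frac{1}{2n} \lVert X\theta - y \rVert^2, \qquad
\nabla_\theta J(\theta) = \frac{1}{n} X^\top (X\theta - y), \qquad
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)
```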

In [31]:
# Perform gradient descent
for _ in range(num_iterations):
    # Calculate the gradient of the cost function
    gradients = (X.T @ (X @ theta_gd - y)) / len(y)
    # Update the coefficients by taking a step proportional to the learning rate
    theta_gd -= learning_rate * gradients


Once we have the final coefficients from gradient descent, we make predictions and calculate the Sum Squared Error (SSE) and R-squared (R²) to evaluate the model's performance.

In [38]:
# Predict y values using the final coefficients
y_pred_gd = X @ theta_gd

# Calculate SSE to measure the total deviation of the predictions from the actual values
SSE_gd = np.sum((y - y_pred_gd) ** 2)

# Calculate R-squared (R²) to measure how well the model explains the variance in the data
R2_gd = 1 - (SSE_gd / SS_total)



In [40]:
# Output the final coefficients, SSE, and R²
print(f"Final coefficients (b0, b1): {theta_gd}")
print(f"Sum Squared Error (SSE): {SSE_gd}")
print(f"R-squared (R²): {R2_gd}")

Final coefficients (b0, b1): [1.17580361 1.17935476]
Sum Squared Error (SSE): 5.634861529064237
R-squared (R²): 0.9524484259150697
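
After 1000 iterations the coefficients are close to, but not exactly at, the hand-calculated values. One way to see how the fit progresses is to record the SSE after every update. The following is a minimal sketch that repeats the loop above with the same settings; `theta` and `sse_history` are illustrative names that do not appear in the original notebook.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
X = np.column_stack((np.ones(x.shape[0]), x))

theta = np.zeros(2)      # start from zero, as above
learning_rate = 0.01
num_iterations = 1000
sse_history = []         # illustrative: SSE after each update

for _ in range(num_iterations):
    gradients = (X.T @ (X @ theta - y)) / len(y)
    theta -= learning_rate * gradients
    sse_history.append(np.sum((y - X @ theta) ** 2))

# Print a few checkpoints to see the SSE shrinking
for step in (1, 10, 100, 1000):
    print(f"iteration {step:4d}: SSE = {sse_history[step - 1]:.4f}")
```

The recorded values could also be passed to `plt.plot` (matplotlib is already imported at the top) to visualize convergence.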


### Stochastic Gradient Descent

Stochastic gradient descent uses the same update rule, but at each iteration it estimates the gradient from a single randomly chosen sample instead of the full dataset. The learning rate and number of iterations defined earlier are reused.

In [60]:
# Initialize coefficients
theta_sgd = np.zeros(2)

In [68]:
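# Perform stochastic gradient descent
for _ in range(num_iterations):
    # Pick one training example uniformly at random
    i = np.random.randint(0, len(y))
    # Gradient of the halved squared error for that single example
    gradients = X[i] * (X[i] @ theta_sgd - y[i])
    # Update the coefficients by taking a step proportional to the learning rate
    theta_sgd -= learning_rate * gradients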


In [72]:
# Predict y values
y_pred_sgd = X @ theta_sgd

# Calculate SSE and R²
SSE_sgd = np.sum((y - y_pred_sgd) ** 2)
R2_sgd = 1 - (SSE_sgd / SS_total)
print(f"Final coefficients (b0, b1): {theta_sgd}")
print(f"Sum Squared Error (SSE): {SSE_sgd}")
print(f"R-squared (R²): {R2_sgd}")

Final coefficients (b0, b1): [1.11041811 1.26582508]
Sum Squared Error (SSE): 7.326818650335652
R-squared (R²): 0.9381703067482223
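
Because each SGD update uses one randomly drawn sample and the loop above is unseeded, the coefficients and metrics printed above will differ from run to run. A minimal sketch of a reproducible variant follows, assuming NumPy's `np.random.default_rng` with an arbitrary seed of 0. The function `run_sgd` and its parameters are hypothetical helpers, not part of the original notebook.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
X = np.column_stack((np.ones(x.shape[0]), x))


def run_sgd(X, y, learning_rate=0.01, num_iterations=1000, seed=0):
    """Single-sample SGD with a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seed value is an arbitrary choice
    theta = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        i = rng.integers(0, len(y))          # index of one random sample
        error = X[i] @ theta - y[i]          # prediction error for that sample
        theta -= learning_rate * error * X[i]
    return theta


theta_a = run_sgd(X, y)
theta_b = run_sgd(X, y)  # same seed, so the same sequence of samples

sse = np.sum((y - X @ theta_a) ** 2)
r2 = 1 - sse / np.sum((y - np.mean(y)) ** 2)
print(f"coefficients: {theta_a}, SSE: {sse:.4f}, R²: {r2:.4f}")
print("identical across runs:", np.array_equal(theta_a, theta_b))
```

The numbers will not match the output above, since a different random sequence is used. Changing `seed` gives a different but repeatable result.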
