# Gradient Descent from First Principles: A Python Demonstration

## Background 
Linear regression is a fundamental machine learning algorithm used to model the relationship between a dependent variable (y) and independent variable(s) (x). Instead of using built-in libraries, we will implement linear regression from first principles using gradient descent optimization. This approach helps understand how machine learning algorithms work under the hood by:

- Defining a loss function (sum of squared errors)
- Computing gradients (partial derivatives)
- Iteratively updating parameters to minimize loss

Our objective is to find optimal parameters (b0, b1) for the equation y = b0 + b1x that best fit our data, and validate our implementation against statsmodels OLS to ensure correctness.
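
For reference, the three steps listed above can be written out explicitly. This is a sketch of the standard derivation; the symbols $e_i$, $n$, and $\alpha$ are notation chosen here to match the variables `ei`, `n`, and `learning_rate` in the code that follows. The code tracks the sum of squared errors $L$ as its loss but takes gradients of the averaged loss $L/n$. Both have the same minimizer, so the fitted parameters are unaffected.

$$
e_i = y_i - (b_0 + b_1 x_i), \qquad L(b_0, b_1) = \sum_{i=1}^{n} e_i^2
$$

$$
\frac{\partial (L/n)}{\partial b_0} = -\frac{2}{n} \sum_{i=1}^{n} e_i, \qquad
\frac{\partial (L/n)}{\partial b_1} = -\frac{2}{n} \sum_{i=1}^{n} e_i \, x_i
$$

$$
b_0 \leftarrow b_0 - \alpha \, \frac{\partial (L/n)}{\partial b_0}, \qquad
b_1 \leftarrow b_1 - \alpha \, \frac{\partial (L/n)}{\partial b_1}
$$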

### Import Libraries

In [1]:
import numpy as np

#### Note: For this demonstration, we use a simple synthetic dataset of 10 data points (X from 1 to 10). This lets us focus on how gradient descent works without worrying about data complexities. The same principles apply to real-world datasets; you would only need to adjust the learning rate and possibly apply feature scaling for faster convergence.
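
To illustrate the feature-scaling remark in the note, here is a minimal sketch that standardizes `X` before running the same gradient descent update. The names `X_scaled`, `c0`, `c1`, `mu`, and `sigma`, the learning rate of 0.1, and the 300-iteration budget are assumptions made for this example, not values from the notebook. On standardized input the loss surface is well conditioned, so a much larger learning rate is stable and far fewer iterations are needed.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 4.2, 5.8, 8.1, 10.3, 11.9, 14.2, 16.1, 17.8, 20.2])

# Standardize X to zero mean and unit variance (assumed scaling choice)
mu, sigma = X.mean(), X.std()
X_scaled = (X - mu) / sigma

c0, c1 = 0.0, 0.0        # parameters in the scaled feature space
learning_rate = 0.1      # assumed value; much larger than on raw X
n = len(X)

for _ in range(300):
    ei = y - (c0 + c1 * X_scaled)
    # Gradients of the mean squared error with respect to c0 and c1
    dL_dc0 = (-2 / n) * np.sum(ei)
    dL_dc1 = (-2 / n) * np.sum(ei * X_scaled)
    c0 -= learning_rate * dL_dc0
    c1 -= learning_rate * dL_dc1

# Map the parameters back to the original scale of X:
# y = c0 + c1 * (x - mu) / sigma  =>  b1 = c1 / sigma, b0 = c0 - c1 * mu / sigma
b1 = c1 / sigma
b0 = c0 - c1 * mu / sigma
print(f"b0 (intercept) = {b0:.4f}")
print(f"b1 (slope) = {b1:.4f}")
```

The recovered `b0` and `b1` should agree with the least-squares fit on the unscaled data, since standardizing `X` only reparameterizes the same line.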

In [2]:


X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 4.2, 5.8, 8.1, 10.3, 11.9, 14.2, 16.1, 17.8, 20.2])


# Initialize parameters
b0 = 0.0  # intercept
b1 = 0.0  # slope
learning_rate = 0.001  # step size for each parameter update
iterations = 100000  # maximum number of iterations
n = len(X)  # number of data points

# Lists to store loss history
loss_history = []

# Gradient Descent with early stopping
tolerance = 1e-12  # stop once the loss changes by less than this between iterations
previous_loss = float('inf')

for i in range(iterations):
    # Calculate predictions and errors
    y_pred = b0 + b1 * X
    ei = y - y_pred
    
    # Calculate loss as the sum of squared errors (tracked and used for the convergence check)
    loss = np.sum(ei ** 2)
    loss_history.append(loss)
    
    # Check if loss is minimized (converged)
    if abs(previous_loss - loss) < tolerance:
        print(f"\nConverged at iteration {i+1}!")
        print(f"Loss improvement: {abs(previous_loss - loss):.10f} < tolerance: {tolerance}")
        break
    
    previous_loss = loss
    
    # Calculate gradients of the mean squared error (loss / n) with respect to b0 and b1
    dL_db0 = (-2 / n) * np.sum(ei)
    dL_db1 = (-2 / n) * np.sum(ei * X)
    
    # Update parameters
    b0 = b0 - learning_rate * dL_db0
    b1 = b1 - learning_rate * dL_db1
    
    # Print progress every 1000 iterations
    if (i + 1) % 1000 == 0:
        print(f"Iteration {i+1}: b0 = {b0:.4f}, b1 = {b1:.4f}, Loss = {loss:.4f}")

# Final results
print("\nFinal Parameters:")
print(f"b0 (intercept) = {b0:.4f}")
print(f"b1 (slope) = {b1:.4f}")

Iteration 1000: b0 = 0.2155, b1 = 1.9785, Loss = 0.3159
Iteration 2000: b0 = 0.1713, b1 = 1.9848, Loss = 0.2956
Iteration 3000: b0 = 0.1423, b1 = 1.9890, Loss = 0.2869
Iteration 4000: b0 = 0.1232, b1 = 1.9917, Loss = 0.2831
Iteration 5000: b0 = 0.1107, b1 = 1.9935, Loss = 0.2815
Iteration 6000: b0 = 0.1024, b1 = 1.9947, Loss = 0.2808
Iteration 7000: b0 = 0.0970, b1 = 1.9955, Loss = 0.2805
Iteration 8000: b0 = 0.0935, b1 = 1.9960, Loss = 0.2803
Iteration 9000: b0 = 0.0911, b1 = 1.9963, Loss = 0.2803
Iteration 10000: b0 = 0.0896, b1 = 1.9965, Loss = 0.2803
Iteration 11000: b0 = 0.0886, b1 = 1.9967, Loss = 0.2803
Iteration 12000: b0 = 0.0879, b1 = 1.9968, Loss = 0.2802
Iteration 13000: b0 = 0.0875, b1 = 1.9968, Loss = 0.2802
Iteration 14000: b0 = 0.0872, b1 = 1.9969, Loss = 0.2802
Iteration 15000: b0 = 0.0870, b1 = 1.9969, Loss = 0.2802
Iteration 16000: b0 = 0.0869, b1 = 1.9969, Loss = 0.2802
Iteration 17000: b0 = 0.0868, b1 = 1.9969, Loss = 0.2802
Iteration 18000: b0 = 0.0868, b1 = 1.997

### Optimal Results

In [3]:
# Final results
print("\nFinal Parameters:")
print(f"b0 (intercept) = {b0:.4f}")
print(f"b1 (slope) = {b1:.4f}")


Final Parameters:
b0 (intercept) = 0.0867
b1 (slope) = 1.9970
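
The Background mentions validating the result against statsmodels OLS, but that step does not appear in this section. As a stand-in, the sketch below computes the closed-form least-squares solution with `np.linalg.lstsq` and compares it with the gradient descent output. The names `A`, `b0_ols`, `b1_ols`, `b0_gd`, and `b1_gd` are introduced for this example, and the gradient descent values are copied from the printed output above rather than recomputed.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 4.2, 5.8, 8.1, 10.3, 11.9, 14.2, 16.1, 17.8, 20.2])

# Design matrix: a column of ones for the intercept, then the feature
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ [b0, b1] ~ y
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0_ols, b1_ols = coef

# Gradient descent results as printed above (rounded to 4 decimals)
b0_gd, b1_gd = 0.0867, 1.9970

print(f"Closed form:      b0 = {b0_ols:.4f}, b1 = {b1_ols:.4f}")
print(f"Gradient descent: b0 = {b0_gd:.4f}, b1 = {b1_gd:.4f}")
print("Match within 1e-3:",
      np.allclose([b0_ols, b1_ols], [b0_gd, b1_gd], atol=1e-3))
```

Agreement to about four decimal places suggests the gradient descent loop has converged to the ordinary least-squares fit for this dataset.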
