Approaching optimization in a linear regression model  
The optimization goal is to find the parameters (weights w and bias b) that minimize the Mean Squared Error (MSE) loss.  

* Goal: $\min_{w,\, b} L(w, b)$  

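Analytical (closed-form) solution  
$w = (X^\top X)^{-1} X^\top y$  
Here $X$ must include a column of ones if the bias $b$ is to be absorbed into $w$.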
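
The closed-form solution can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation: it assumes synthetic data drawn from y = 2x1 - x2 + 0.5x3 + 1 + noise, a fixed seed for reproducibility, and the illustrative names `X_aug`, `theta`, `w_closed`, and `b_closed`, none of which appear in the original.

```python
import numpy as np

# Synthetic data (assumption: fixed seed so the run is reproducible)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + 0.1 * rng.standard_normal(100)

# Append a column of ones so the bias is estimated as the last coefficient
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the normal equations (X^T X) theta = X^T y without forming an explicit inverse
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w_closed, b_closed = theta[:3], theta[3]

# np.linalg.lstsq solves the same least-squares problem and is usually
# the safer choice when X^T X is ill-conditioned
theta_lstsq, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print("Closed-form weights:", w_closed)
print("Closed-form bias:", b_closed)
```

Solving the linear system rather than computing the inverse directly is generally cheaper and numerically more stable.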

Numerical (iterative) optimization: when the dataset is very large, or the loss includes regularization terms that rule out a closed form (such as an L1 penalty), use gradient-based optimization instead.
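
The gradients used by gradient descent follow from the MSE definition. As a short derivation sketch, with $n$ the number of samples, $\hat{y} = Xw + b\mathbf{1}$ the predictions, and $\eta$ the learning rate (this notation is assumed here, not taken from the original):

$$
\begin{aligned}
L(w, b) &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \\
\nabla_w L &= \frac{2}{n} X^\top (\hat{y} - y) \\
\nabla_b L &= \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \\
w &\leftarrow w - \eta \, \nabla_w L, \qquad b \leftarrow b - \eta \, \nabla_b L
\end{aligned}
$$

These expressions correspond to `grad_w`, `grad_b`, and the two update lines in the gradient descent loop.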

In [None]:
import numpy as np

# Data
X = np.random.randn(100, 3)
y = X @ np.array([2, -1, 0.5]) + 1.0 + 0.1 * np.random.randn(100)

"""
The true model that generates the data is
y = 2*x1 - 1*x2 + 0.5*x3 + 1 + noise

"""

# Initialize params
w = np.zeros(3)
b = 0.0
lr = 0.01

# Gradient Descent
for epoch in range(1000):
    y_pred = X @ w + b
    loss = np.mean((y_pred - y)**2)  # MSE, computed for monitoring only; not used in the update
    grad_w = (2 / len(y)) * X.T @ (y_pred - y)
    grad_b = (2 / len(y)) * np.sum(y_pred - y)
    w -= lr * grad_w
    b -= lr * grad_b

"""
- grad_w is the gradient of the loss with respect to the weights, ∇_w L
- grad_b is the gradient of the loss with respect to the bias, ∇_b L

"""

print("Optimized weights:", w)
print("Optimized bias:", b)


Optimized weights: [ 1.98678917 -1.00814136  0.48887382]
Optimized bias: 0.9932611969247721


Regularization  
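L2 Regularization (Ridge Regression): Adds a penalty proportional to the sum of the squared weights, $\lambda \sum_j w_j^2$, to the loss. This discourages large weight values but rarely makes any weight exactly zero.  
L1 Regularization (Lasso Regression): Adds a penalty proportional to the sum of the absolute values of the weights, $\lambda \sum_j |w_j|$. Lasso can drive the weights of irrelevant features to exactly 0, which effectively performs feature selection.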
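
The difference between the two penalties can be seen in a small experiment. The sketch below is illustrative only and rests on several assumptions not in the original: synthetic data with five features of which the last two are irrelevant, no bias term, an arbitrary regularization strength `lam`, and proximal gradient descent (ISTA) as the Lasso solver. Ridge is solved in closed form, since its penalty is differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
n, d = 200, 5
X = rng.standard_normal((n, d))
# Hypothetical ground truth: the last two features have zero weight
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.standard_normal(n)

lam = 0.1  # regularization strength, chosen arbitrarily for illustration

# Unregularized least squares, for comparison
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: minimize (1/n)||Xw - y||^2 + lam * ||w||_2^2 (closed form)
w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink toward zero, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Lasso: minimize (1/n)||Xw - y||^2 + lam * ||w||_1 via proximal gradient descent
lr = 0.1
w_lasso = np.zeros(d)
for _ in range(1000):
    grad = (2 / n) * X.T @ (X @ w_lasso - y)  # gradient of the MSE term only
    w_lasso = soft_threshold(w_lasso - lr * grad, lr * lam)

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
print("Lasso weights:", w_lasso)
```

In this setup, Ridge shrinks all weights but leaves them nonzero, while the soft-thresholding step in Lasso is what allows weights to land at exactly zero.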