# Gradient Descent, Regression, MLE and MSE
Sterling Hayden

## Parameter update rule for Multiple Linear Regression using Gradient Descent


y = b<sub>0</sub> + b<sub>1</sub>x<sub>1</sub> + b<sub>2</sub>x<sub>2</sub> + ... + b<sub>J</sub>x<sub>J</sub>  
  
  
Where:
- y is the predicted output (dependent variable).
- x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>J</sub> are the J input features (independent variables).
- b<sub>0</sub> is the y-intercept.
- b<sub>1</sub>, b<sub>2</sub>, ..., b<sub>J</sub> are the weights associated with each feature.
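As a minimal sketch of the model above, the prediction for every data point can be computed with one matrix-vector product. The function name `predict` and the toy values below are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def predict(X, b0, b):
    """Return b0 + sum_j b[j] * X[i, j] for every row i of X."""
    return b0 + X @ b

# Hypothetical toy values: 2 data points, J = 3 features.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.5]])
b0 = 4.0                         # intercept
b = np.array([3.0, 2.0, -1.5])   # one weight per feature

y_hat = predict(X, b0, b)
print(y_hat)  # one prediction per row of X
```

Storing the weights as a vector keeps the code independent of the number of features J.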

**Step 1:** Define MSE    
MSE = 1/(2N) ∑<sub>i=1</sub><sup>N</sup> (y<sub>i</sub> - (b<sub>0</sub> + ∑<sub>j=1</sub><sup>J</sup> b<sub>j</sub> x<sub>ij</sub>))<sup>2</sup>  
  
Here:
- N is the number of data points.
- i indexes the data points.
- j indexes the features.
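The loss in Step 1 could be computed along these lines. This is a sketch that assumes the 1/(2N) scaling used in these notes; the function name `mse` and the toy data are made up for illustration.

```python
import numpy as np

def mse(X, y, b0, b):
    """Squared-error loss with the 1/(2N) scaling from Step 1."""
    residuals = y - (b0 + X @ b)   # y_i minus the prediction for point i
    return np.sum(residuals ** 2) / (2 * len(y))

# Hypothetical toy data: N = 2 data points, J = 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 11.0])

loss = mse(X, y, b0=0.0, b=np.array([1.0, 1.0]))
print(loss)
```

The factor 1/2 does not change where the minimum lies; it only cancels the 2 that appears when the square is differentiated.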

**Step 2:** Partial derivatives of MSE  
   
For b<sub>0</sub>:  
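∂MSE/∂b<sub>0</sub> = (-1/N) ∑<sub>i=1</sub><sup>N</sup> (y<sub>i</sub> - (b<sub>0</sub> + ∑<sub>k=1</sub><sup>J</sup> b<sub>k</sub> x<sub>ik</sub>))  
  
For b<sub>j</sub> (the inner sum runs over k so that it does not clash with the fixed index j):  
∂MSE/∂b<sub>j</sub> = (-1/N) ∑<sub>i=1</sub><sup>N</sup> x<sub>ij</sub> (y<sub>i</sub> - (b<sub>0</sub> + ∑<sub>k=1</sub><sup>J</sup> b<sub>k</sub> x<sub>ik</sub>))  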
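One way to sanity-check the partial derivatives is to compare them with central finite differences of the loss. The sketch below assumes the 1/(2N) scaling from Step 1; the helper names `mse` and `gradients` and the random test data are illustrative assumptions.

```python
import numpy as np

def mse(X, y, b0, b):
    """Squared-error loss with the 1/(2N) scaling."""
    r = y - (b0 + X @ b)
    return np.sum(r ** 2) / (2 * len(y))

def gradients(X, y, b0, b):
    """Analytic partial derivatives of the loss w.r.t. b0 and each b_j."""
    N = len(y)
    r = y - (b0 + X @ b)        # residuals y_i - prediction_i
    grad_b0 = -np.sum(r) / N    # dMSE/db0
    grad_b = -(X.T @ r) / N     # dMSE/db_j for j = 1..J
    return grad_b0, grad_b

# Arbitrary test point (made-up data, only used for the check).
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = rng.random(20)
b0, b = 0.5, np.array([0.1, -0.2, 0.3])

grad_b0, grad_b = gradients(X, y, b0, b)

# Central finite differences as an independent numerical estimate.
eps = 1e-6
num_b0 = (mse(X, y, b0 + eps, b) - mse(X, y, b0 - eps, b)) / (2 * eps)
num_b = np.array([
    (mse(X, y, b0, b + eps * e) - mse(X, y, b0, b - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_b0, num_b0, atol=1e-6),
      np.allclose(grad_b, num_b, atol=1e-6))
```

Because the loss is quadratic in the parameters, the central difference agrees with the analytic gradient up to floating-point rounding.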


**Step 3:** Update weights  
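b<sub>0</sub> <- b<sub>0</sub> - α ∂MSE/∂b<sub>0</sub>  
b<sub>j</sub> <- b<sub>j</sub> - α ∂MSE/∂b<sub>j</sub>  
  
Here α > 0 is the learning rate (step size). All parameters are updated simultaneously, using gradients evaluated at the current values.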

**Step 4:** Repeat steps 2 and 3 until convergence, for example until the gradients (or the change in MSE) fall below a small tolerance.
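Steps 1 to 4 can be put together as in the following sketch. The learning rate `lr`, the iteration cap `n_iters`, the stopping tolerance `tol`, and the synthetic data are assumptions chosen for illustration. The result is compared with NumPy's direct least-squares solution, which minimizes the same loss.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=20000, tol=1e-10):
    """Fit b0 and b by batch gradient descent on the 1/(2N) squared-error loss."""
    N, J = X.shape
    b0, b = 0.0, np.zeros(J)            # start from all-zero parameters
    for _ in range(n_iters):
        r = y - (b0 + X @ b)            # residuals at the current parameters
        grad_b0 = -np.sum(r) / N        # Step 2: partial derivatives
        grad_b = -(X.T @ r) / N
        b0 -= lr * grad_b0              # Step 3: simultaneous update
        b -= lr * grad_b
        # Step 4: stop once every gradient component is tiny.
        if max(abs(grad_b0), np.max(np.abs(grad_b))) < tol:
            break
    return b0, b

# Synthetic data (assumed for illustration): 100 points, 3 features.
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 3))
y = 4 + 3 * X[:, 0] + 2 * X[:, 1] - 1.5 * X[:, 2] + rng.standard_normal(100)

b0, b = gradient_descent(X, y)

# Reference: direct least-squares solution on [1, X].
A = np.column_stack([np.ones(len(y)), X])
reference, *_ = np.linalg.lstsq(A, y, rcond=None)

print("gradient descent:", b0, b)
print("least squares:   ", reference)
```

If `lr` is too large the iterates diverge, and if it is too small convergence is slow, so in practice the learning rate usually needs some tuning for the data at hand.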

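Several Python libraries can fit linear regression models for us. One of the most commonly used is scikit-learn. Note that its `LinearRegression` class solves the least-squares problem directly rather than by running gradient descent (scikit-learn's `SGDRegressor` fits linear models with stochastic gradient descent), but it minimizes the same squared-error objective. Here's how we can use scikit-learn to perform linear regression easily: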

```python
# Imports for linear regression
from sklearn.linear_model import LinearRegression
import numpy as np

# Generate synthetic data: 100 samples, 3 features, Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 3)
y = 4 + 3*X[:, 0] + 2*X[:, 1] - 1.5*X[:, 2] + np.random.randn(100)

# Create the linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Retrieve the fitted intercept and feature weights
intercept = model.intercept_
coefficients = model.coef_

print("Intercept (b0):", intercept)
print("Slope (b1):", coefficients[0])
print("Slope (b2):", coefficients[1])
print("Slope (b3):", coefficients[2])
```


```
Intercept (b0): 3.9502139024683567
Slope (b1): 2.7956623869503465
Slope (b2): 1.9899783163480775
Slope (b3): -1.3733845071909832
```


## Parameter update rule for Logistic Regression using Gradient Descent

## Proof that MSE is a special case of MLE