# Univariate Linear Regression with Stochastic Gradient Descent from Scratch

This example implements univariate linear regression with Stochastic Gradient Descent (SGD) from scratch. We generate data with scikit-learn, compute the gradients by hand, and iteratively update the model parameters. Each step is explained in detail, with a focus on the gradient computation.

## 1. Introduction to Gradient Descent
Gradient Descent is an optimization algorithm that minimizes a loss function by iteratively moving the model parameters (weight w and bias b) a small step in the direction opposite to the gradient.
- **Loss Function**: Mean Squared Error (MSE):

![mse loss](https://raw.githubusercontent.com/Ebimsv/Machine_Learning_Course/refs/heads/main/pics/MSE.png)

- Here, x_i is the feature value, y_i is the true value, and w and b are the model parameters.

- **Gradient Computation**: The gradients of the loss function with respect to w and b are:

![Gradient computation](https://raw.githubusercontent.com/Ebimsv/Machine_Learning_Course/refs/heads/main/pics/Gradient_computation.png)
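For readers who cannot load the figures, the loss and its gradients can be written out in text form. This is a sketch that assumes the common MSE convention with a 1/n factor and the prediction w x_i + b; the figures may use a different constant (for example 1/(2n)), which only rescales the gradients.

$$
\begin{aligned}
L(w, b) &= \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)^2 \\
\frac{\partial L}{\partial w} &= -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (w x_i + b)\bigr) \\
\frac{\partial L}{\partial b} &= -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)
\end{aligned}
$$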

In Stochastic Gradient Descent, we compute these gradients for a single data point at a time:

![gradients_w_b](https://raw.githubusercontent.com/Ebimsv/Machine_Learning_Course/refs/heads/main/pics/gradients_w_b.png)
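The single-sample gradients can be checked numerically. The sketch below assumes the per-sample loss (y_i - (w x_i + b))^2 with no extra constant factor; the helper names `sample_gradients` and `sample_loss` and the input values are illustrative and not part of the original notebook.

```python
def sample_loss(x_i, y_i, w, b):
    """Squared error of the prediction w*x_i + b for one data point."""
    return (y_i - (w * x_i + b)) ** 2

def sample_gradients(x_i, y_i, w, b):
    """Analytic gradients of sample_loss with respect to w and b."""
    residual = y_i - (w * x_i + b)
    grad_w = -2.0 * x_i * residual
    grad_b = -2.0 * residual
    return grad_w, grad_b

# Illustrative values (not taken from the dataset used later)
x_i, y_i, w, b = 1.5, 2.0, 0.5, 0.1
grad_w, grad_b = sample_gradients(x_i, y_i, w, b)

# Central finite differences as an independent check of the formulas
eps = 1e-6
num_grad_w = (sample_loss(x_i, y_i, w + eps, b) - sample_loss(x_i, y_i, w - eps, b)) / (2 * eps)
num_grad_b = (sample_loss(x_i, y_i, w, b + eps) - sample_loss(x_i, y_i, w, b - eps)) / (2 * eps)

print("analytic:", grad_w, grad_b)
print("numeric: ", num_grad_w, num_grad_b)
```

If the analytic and numeric values agree to several decimal places, the gradient formulas are consistent with the loss they were derived from.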

## 2. Importing Libraries

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
```

## 3. Generate and Prepare Dataset
We create a dataset with one feature (univariate), standardize both X and y to zero mean and unit variance for faster convergence, and split the data into training and testing sets.

```python
# Generate a synthetic dataset (univariate)
X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)
X = X.flatten()  # Flatten X to 1D array

# Normalize X and y
X = (X - np.mean(X)) / np.std(X)
y = (y - np.mean(y)) / np.std(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
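As a quick sanity check, the same z-score formula should leave any array with a mean of about 0 and a standard deviation of about 1. The sketch below uses a hypothetical array `raw` generated with NumPy rather than the `make_regression` data, so it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical un-normalized feature: mean near 50, spread near 10
raw = rng.normal(loc=50.0, scale=10.0, size=200)

# Same z-score formula as used for X and y
standardized = (raw - np.mean(raw)) / np.std(raw)

print(f"before: mean = {raw.mean():.2f}, std = {raw.std():.2f}")
print(f"after:  mean = {standardized.mean():.2e}, std = {standardized.std():.4f}")
```

With `n_samples=200` and `test_size=0.2`, the split yields 160 training points and 40 test points.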

## 4. Initialize Parameters
Randomly initialize the model parameters w (weight) and b (bias), and set the learning rate and the number of epochs.

```python
w = np.random.randn()  # Random initial weight
b = np.random.randn()  # Random initial bias
learning_rate = 0.01  # Step size for each parameter update
epochs = 1000  # Number of full passes over the training set
```
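Because the initial values are drawn without a seed, every run starts from a different point. If reproducible runs are wanted, one option is a seeded NumPy `Generator`, sketched below; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: any fixed seed works
w = rng.standard_normal()  # random initial weight, same on every run
b = rng.standard_normal()  # random initial bias, same on every run

print(w, b)
```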

## 5. Define Stochastic Gradient Descent
We implement the SGD algorithm by iterating through the dataset, computing the gradients for w and b, and updating the parameters.

```python
import numpy as np
from sklearn.utils import shuffle as sklearn_shuffle

def sgd(X, y, w, b, learning_rate, epochs):
    pass
```
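The `sgd` stub above leaves the body empty. One way to fill it in is sketched below under stated assumptions: the per-sample loss is the squared error with gradient factor 2, the samples are reshuffled every epoch, and the function returns `(w, b, loss_history)` with one full-dataset MSE per epoch. The name `sgd_sketch`, the `seed` argument, and the use of NumPy's `permutation` in place of the imported `sklearn_shuffle` are choices made for this example, not the notebook's own code.

```python
import numpy as np

def sgd_sketch(X, y, w, b, learning_rate, epochs, seed=0):
    """One possible SGD loop for y ≈ w*x + b; returns (w, b, loss_history)."""
    rng = np.random.default_rng(seed)  # assumption: seeded for reproducibility
    loss_history = []
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch
        for i in rng.permutation(len(X)):
            residual = y[i] - (w * X[i] + b)
            # Single-sample gradients of the squared error
            grad_w = -2.0 * X[i] * residual
            grad_b = -2.0 * residual
            # Step against the gradient
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        # Record the full-dataset MSE once per epoch
        loss_history.append(float(np.mean((y - (w * X + b)) ** 2)))
    return w, b, loss_history

# Noise-free toy data, so the loop should recover w close to 2 and b close to 1
rng = np.random.default_rng(1)
X_toy = rng.standard_normal(200)
y_toy = 2.0 * X_toy + 1.0

w_fit, b_fit, losses = sgd_sketch(X_toy, y_toy, w=0.0, b=0.0, learning_rate=0.01, epochs=50)
print(f"w = {w_fit:.3f}, b = {b_fit:.3f}, final MSE = {losses[-1]:.2e}")
```

Testing on noise-free data first is a cheap way to catch sign errors in the update: if the loop cannot recover a known line, it will not fit noisy data either.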

## 6. Train the Model
We train the model using the training dataset and store the final parameters and loss history.
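A training cell might look like the sketch below. To run on its own it includes a compact, assumed implementation standing in for the Section 5 function, and a synthetic, roughly standardized stand-in for `X_train` and `y_train` (160 points); the data-generating values 0.9 and 0.3 are made up for illustration. The learning rate and epoch count match the values set in Section 4.

```python
import numpy as np

def sgd(X, y, w, b, learning_rate, epochs):
    # Compact stand-in for the Section 5 function so this cell runs on its own
    rng = np.random.default_rng(0)
    loss_history = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            error = (w * X[i] + b) - y[i]
            w -= learning_rate * 2.0 * error * X[i]
            b -= learning_rate * 2.0 * error
        loss_history.append(float(np.mean((w * X + b - y) ** 2)))
    return w, b, loss_history

# Synthetic stand-in for the standardized training split
rng = np.random.default_rng(42)
X_train = rng.standard_normal(160)
y_train = 0.9 * X_train + 0.3 * rng.standard_normal(160)

# Random starting point, then train and keep the per-epoch loss
w_init, b_init = rng.standard_normal(), rng.standard_normal()
w, b, loss_history = sgd(X_train, y_train, w_init, b_init, learning_rate=0.01, epochs=1000)

print(f"Trained parameters: w = {w:.4f}, b = {b:.4f}")
print(f"Final training MSE: {loss_history[-1]:.4f}")
```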

## 7. Evaluate the Model
We evaluate the trained model on both the training and testing datasets by computing the Mean Squared Error (MSE).
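The evaluation reduces to one small function applied to each split. The sketch below defines a hypothetical helper `mse` and calls it on hand-picked illustrative values; in the notebook it would be called with the trained `w` and `b` on `(X_train, y_train)` and `(X_test, y_test)`.

```python
import numpy as np

def mse(X, y, w, b):
    """Mean squared error of the line w*x + b on the points (X, y)."""
    predictions = w * X + b
    return float(np.mean((y - predictions) ** 2))

# Hypothetical values for illustration only
X_demo = np.array([-1.0, 0.0, 1.0, 2.0])
y_demo = np.array([-0.8, 0.1, 0.9, 2.2])
w_demo, b_demo = 1.0, 0.0

print(f"MSE: {mse(X_demo, y_demo, w_demo, b_demo):.4f}")  # prints "MSE: 0.0250"
```

Comparing the two numbers is the point of the exercise: a test MSE close to the training MSE suggests the fitted line generalizes to unseen points.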

## 8. Visualize the Results
We visualize the loss history and compare predicted vs actual values.
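A plotting cell could be sketched as follows. The helper name `plot_results`, the figure layout, and the placeholder inputs are assumptions made so that the snippet runs on its own; in the notebook the real `loss_history`, `X_test`, `y_test`, `w`, and `b` would be passed in instead.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_results(loss_history, X, y, w, b):
    """Plot the loss curve and the fitted line over the data; returns the figure."""
    fig, (ax_loss, ax_fit) = plt.subplots(1, 2, figsize=(12, 4))

    # Left panel: loss per epoch
    ax_loss.plot(loss_history)
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("MSE")
    ax_loss.set_title("Loss history")

    # Right panel: actual points and the fitted line
    order = np.argsort(X)  # sort so the line is drawn left to right
    ax_fit.scatter(X, y, alpha=0.6, label="Actual")
    ax_fit.plot(X[order], w * X[order] + b, color="red", label="Predicted")
    ax_fit.set_xlabel("X (standardized)")
    ax_fit.set_ylabel("y (standardized)")
    ax_fit.set_title("Predicted vs actual")
    ax_fit.legend()

    fig.tight_layout()
    return fig

# Placeholder inputs (made up), standing in for the real training outputs
rng = np.random.default_rng(0)
X_demo = rng.standard_normal(40)
y_demo = 0.9 * X_demo + 0.3 * rng.standard_normal(40)
loss_demo = 0.09 + np.exp(-np.arange(50) / 5.0)  # made-up decaying curve

fig = plot_results(loss_demo, X_demo, y_demo, w=0.9, b=0.0)
plt.show()
```

A loss curve that flattens out indicates the parameters have stopped changing much; a curve that oscillates or grows usually points to a learning rate that is too large.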