# Gradient Descent

Gradient Descent is an optimization algorithm that minimizes a cost (loss) function by repeatedly moving the parameters a small step in the direction opposite to the gradient, which is the direction in which the loss decreases fastest locally.

## Steps:
1. Initialize parameters (weights)
2. Compute predictions using current parameters
3. Calculate loss (error)
4. Compute gradients (partial derivatives)
5. Update parameters
6. Repeat until convergence
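
To make the steps concrete before moving to regression, here is a minimal sketch on a one-parameter toy problem. The function `minimize_quadratic` and the loss $f(w) = (w - 3)^2$ are illustrative assumptions, chosen only because the minimum ($w = 3$) is known in advance. A fixed number of steps stands in for a real convergence test.

```python
def minimize_quadratic(learning_rate=0.1, steps=100, w_init=0.0):
    """Minimize the toy loss f(w) = (w - 3)**2 with plain gradient descent."""
    w = w_init                         # step 1: initialize the parameter
    losses = []
    for _ in range(steps):             # step 6: repeat (fixed step count here)
        loss = (w - 3.0) ** 2          # steps 2-3: evaluate the loss at the current w
        losses.append(loss)
        grad = 2.0 * (w - 3.0)         # step 4: derivative of the loss w.r.t. w
        w -= learning_rate * grad      # step 5: move against the gradient
    return w, losses


w_final, losses = minimize_quadratic()
print(f"w after {len(losses)} steps: {w_final:.6f}")
```

Each update shrinks the distance to the minimum by a constant factor of $1 - 2 \cdot 0.1 = 0.8$, so the loss falls geometrically.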

## 1. Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")

## 2. Example: Linear Regression with Gradient Descent
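We will fit a straight line $y = mx + c$ using gradient descent. In the code, the intercept $c$ and the slope $m$ are stored together in a parameter vector `theta` (intercept first). Note that the variable `m` in the code denotes the number of samples, not the slope.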

In [None]:
# Generate a synthetic dataset: true intercept 4, true slope 3, unit Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

plt.scatter(X, y, alpha=0.7)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Synthetic Data")
plt.show()

## 3. Gradient Descent Implementation

In [None]:
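def gradient_descent(X, y, learning_rate=0.1, epochs=1000):
    """Fit linear regression by batch gradient descent on the MSE loss.

    X has shape (m, n) and y has shape (m, 1). Returns theta with shape
    (n + 1, 1), intercept first, and the list of per-epoch losses.
    """
    m, n = X.shape  # m samples, n features
    X_b = np.c_[np.ones((m, 1)), X]  # prepend a column of ones for the bias term

    theta = np.random.randn(n + 1, 1)  # random initialization
    losses = []

    for epoch in range(epochs):
        # Gradient of the MSE with respect to theta, using the whole dataset
        gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
        theta -= learning_rate * gradients

        # MSE measured after this epoch's update
        loss = np.mean((X_b.dot(theta) - y) ** 2)
        losses.append(loss)

        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.4f}")

    return theta, losses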
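
The `gradients` line in the function follows from differentiating the mean squared error. As a short derivation, writing $\eta$ for `learning_rate` (a symbol introduced here for convenience), $m$ for the number of samples, and $x_b^{(i)}$ for the $i$-th row of `X_b`:

$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^\top x_b^{(i)} - y^{(i)} \right)^2 = \frac{1}{m} \lVert X_b \theta - y \rVert^2
$$

$$
\nabla_\theta J(\theta) = \frac{2}{m} X_b^\top \left( X_b \theta - y \right)
$$

$$
\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta)
$$

The second line is what `2/m * X_b.T.dot(X_b.dot(theta) - y)` computes, and the third is the in-place update `theta -= learning_rate * gradients`.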

## 4. Run Training

In [None]:
theta, losses = gradient_descent(X, y, learning_rate=0.1, epochs=1000)
print("Learned parameters:", theta.ravel())

plt.plot(losses)
plt.title("Loss over Epochs")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.show()
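
One way to sanity-check the learned parameters is to compare them with the exact least-squares solution. The sketch below is not part of the original notebook: it regenerates the same synthetic data, solves the problem with `np.linalg.lstsq`, and repeats a compact version of the gradient descent loop so that the cell runs on its own. The names `theta_ls` and `theta_gd` are illustrative.

```python
import numpy as np

# Same synthetic data as above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]  # add bias column

# Exact least-squares solution, used here only as a reference
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Compact batch gradient descent with the same settings as the training run
theta_gd = np.random.randn(2, 1)
for _ in range(1000):
    gradients = 2 / len(X_b) * X_b.T.dot(X_b.dot(theta_gd) - y)
    theta_gd -= 0.1 * gradients

print("Least squares:   ", theta_ls.ravel())
print("Gradient descent:", theta_gd.ravel())
```

With these settings the two estimates should agree to several decimal places. Both differ somewhat from the true values 4 and 3 because of the noise in the data.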

## 5. Visualize Final Fit

In [None]:
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2,1)), X_new]
y_predict = X_new_b.dot(theta)

plt.scatter(X, y, alpha=0.7)
plt.plot(X_new, y_predict, "r-", linewidth=2)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression with Gradient Descent")
plt.show()

## 6. Gradient Descent Variants
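- **Batch Gradient Descent**: Uses the whole dataset for every step (what we implemented). Each step is exact but costly on large datasets.
- **Stochastic Gradient Descent (SGD)**: Uses one sample per step. Each step is much cheaper, but the gradient estimate is noisy, so the loss fluctuates instead of decreasing smoothly.
- **Mini-Batch Gradient Descent**: Uses a small random batch per step, trading some of SGD's per-step speed for a less noisy gradient estimate.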
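
As a rough sketch of how the mini-batch variant differs from the batch version above, the function below reshuffles the data every epoch and updates `theta` once per batch. The function name `minibatch_gradient_descent` and the defaults for `batch_size`, `learning_rate`, and `seed` are assumptions made for illustration, not tuned values. Setting `batch_size=1` gives SGD, and `batch_size=len(X)` recovers batch gradient descent.

```python
import numpy as np


def minibatch_gradient_descent(X, y, learning_rate=0.05, epochs=200,
                               batch_size=16, seed=0):
    """Mini-batch gradient descent for linear regression (MSE loss)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    X_b = np.c_[np.ones((m, 1)), X]          # add bias column
    theta = rng.standard_normal((n + 1, 1))  # random initialization

    for _ in range(epochs):
        order = rng.permutation(m)           # reshuffle samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            X_batch, y_batch = X_b[idx], y[idx]
            # Gradient estimated from this batch only
            gradients = 2 / len(idx) * X_batch.T.dot(X_batch.dot(theta) - y_batch)
            theta -= learning_rate * gradients
    return theta


# Same synthetic data as in the linear regression example
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

theta_mb = minibatch_gradient_descent(X, y)
print("Mini-batch estimate:", theta_mb.ravel())
```

Because the learning rate is constant, the estimate keeps jittering around the minimum rather than settling exactly on it. Decaying the learning rate over time is a common way to reduce that jitter.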

## ✅ Summary
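- Gradient Descent minimizes the loss through repeated small parameter updates against the gradient.
- The learning rate controls the step size: too small and convergence is slow, too large and the updates can overshoot and diverge.
- Variants: Batch, Stochastic, Mini-Batch.
- Most optimizers used in ML/DL, such as SGD with momentum and Adam, build on the same gradient-based update.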
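
The effect of the learning rate can be seen on a toy loss. The sketch below assumes $f(w) = w^2$, whose gradient is $2w$, so each update multiplies $w$ by $1 - 2 \cdot \text{learning\_rate}$. The helper `distance_from_minimum` and the three rates are illustrative choices.

```python
def distance_from_minimum(learning_rate, steps=50, w_init=1.0):
    """Run gradient descent on f(w) = w**2 and return |w| at the end."""
    w = w_init
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2*w
    return abs(w)


for lr in (0.01, 0.1, 1.1):
    print(f"learning_rate={lr}: |w| after 50 steps = {distance_from_minimum(lr):.3g}")
```

With a rate of 0.01 the parameter moves toward the minimum slowly, with 0.1 it gets very close within 50 steps, and with 1.1 the factor $1 - 2.2 = -1.2$ has magnitude above 1, so the iterates grow instead of shrinking.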