## __For n-dimensional data__

- __Find the derivative for the general nth term__

# 📘 Gradient Descent – Mathematical Formulation (n-Dimensions)

---

## 1️⃣ Dataset (n-dimensional)

Assume:
- Number of features = **n**
- Input variables: $x_1, x_2, \dots, x_n$
- Output variable: $y$

Example (2 features):

| $x_1$ | $x_2$ | $y$ |
|-----|-----|-----|
| 8.1 | 93  | 3.2 |
| 7.5 | 95  | 3.5 |

---

## 2️⃣ Linear Regression Model

### Hypothesis Function

$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$

Vector form:

$
\hat{y} = \boldsymbol{\beta}^T \mathbf{x}
$

Where:
- $\boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \dots, \beta_n]$
- $\mathbf{x} = [1, x_1, x_2, \dots, x_n]$
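As a minimal sketch of the vector form, the snippet below builds the augmented input (a leading 1 for the bias) from the two-feature example table and evaluates $\hat{y} = \boldsymbol{\beta}^T \mathbf{x}$ for each row with NumPy. The parameter values in `beta` are only illustrative, and the name `X_aug` is introduced here for convenience.

```python
import numpy as np

# Feature values from the two-row example table (columns: x1, x2).
X = np.array([[8.1, 93.0],
              [7.5, 95.0]])

# Prepend a column of ones so the bias beta_0 is handled by the dot product.
X_aug = np.column_stack([np.ones(X.shape[0]), X])

# Illustrative parameters [beta_0, beta_1, beta_2]; an assumption for this sketch.
beta = np.array([0.0, 1.0, 1.0])

# One prediction per row: y_hat_i = beta^T x_i
y_hat = X_aug @ beta
print(y_hat)
```

With $n$ features the augmented matrix has $n + 1$ columns, which matches the $n + 1$ entries of $\boldsymbol{\beta}$.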

---

## 3️⃣ Initialization

- Initialize parameters (fixed starting values are used here, although random initialization is also common):

$
\beta_0 = 0,\quad \beta_1 = 1,\quad \beta_2 = 1,\dots
$

- Set:
  - Epochs = 100
  - Learning rate ($\eta$) = 0.1

---

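## 4️⃣ Loss Function (Squared Error)

For $m$ data points, the loss below is half the sum of squared errors. It differs from the mean squared error only by a constant factor, so both are minimized by the same parameters: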

$
L = \frac{1}{2} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
$

Substituting the hypothesis (shown for two features):

$
L = \frac{1}{2} \sum_{i=1}^{m}
\left(y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})\right)^2
$

For two samples:

$
L = \frac{1}{2}
\Big[
(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2
\Big]
$
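To make the two-sample loss concrete, here is a small sketch that evaluates it for the rows of the example table. The helper name `half_sse` and the parameter values $\beta = [0, 1, 1]$ used to form the predictions are assumptions made for illustration.

```python
import numpy as np

def half_sse(y, y_hat):
    """Half the sum of squared errors: L = 1/2 * sum((y - y_hat)^2)."""
    residual = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return 0.5 * np.sum(residual ** 2)

# Targets from the example table.
y = np.array([3.2, 3.5])

# Predictions under the illustrative parameters beta = [0, 1, 1] (an assumption):
# y_hat_1 = 8.1 + 93, y_hat_2 = 7.5 + 95
y_hat = np.array([8.1 + 93.0, 7.5 + 95.0])

loss = half_sse(y, y_hat)
print(loss)
```

The large value reflects how far these starting parameters are from fitting the two rows, which is exactly what gradient descent is meant to reduce.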

---

## 5️⃣ Partial Derivatives (Gradients)

### 🔹 Gradient w.r.t. Bias ($\beta_0$)

$
\frac{\partial L}{\partial \beta_0}
= \frac{1}{2}
\Big[
2(y_1 - \hat{y}_1)(-1) + 2(y_2 - \hat{y}_2)(-1)
\Big]
$

Simplified:

$
\frac{\partial L}{\partial \beta_0}
= - \sum_{i=1}^{m} (y_i - \hat{y}_i)
$

---

### 🔹 Gradient w.r.t. Weight ($\beta_1$)

$
\frac{\partial L}{\partial \beta_1}
= \frac{1}{2}
\Big[
2(y_1 - \hat{y}_1)(-x_{11})
+ 2(y_2 - \hat{y}_2)(-x_{21})
\Big]
$

General form:

$
\frac{\partial L}{\partial \beta_1}
= - \sum_{i=1}^{m} (y_i - \hat{y}_i)x_{i1}
$

---

### 🔹 Gradient w.r.t. Weight ($\beta_2$)

$
\frac{\partial L}{\partial \beta_2}
= - \sum_{i=1}^{m} (y_i - \hat{y}_i)x_{i2}
$

---

## 6️⃣ General Gradient Formula (n Features)

For any parameter $\beta_j$:

$
\frac{\partial L}{\partial \beta_j}
= - \sum_{i=1}^{m} (y_i - \hat{y}_i)x_{ij}
$

Where:
- $x_{ij}$ = value of the $j^{th}$ feature in the $i^{th}$ sample (with $x_{i0} = 1$, so the same formula covers the bias $\beta_0$)
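The general formula can be evaluated for all parameters at once as $-X^T (y - X\boldsymbol{\beta})$, where $X$ carries a leading column of ones. The sketch below computes this on the example table and compares it with central finite differences of the loss as a sanity check. The function names and the parameter values are assumptions for illustration.

```python
import numpy as np

def loss(beta, X_aug, y):
    """L = 1/2 * sum_i (y_i - y_hat_i)^2 with y_hat = X_aug @ beta."""
    residual = y - X_aug @ beta
    return 0.5 * np.sum(residual ** 2)

def gradient(beta, X_aug, y):
    """dL/dbeta_j = -sum_i (y_i - y_hat_i) * x_ij, for all j at once."""
    residual = y - X_aug @ beta
    return -X_aug.T @ residual

# Example table with a leading column of ones (so x_i0 = 1 for the bias).
X_aug = np.array([[1.0, 8.1, 93.0],
                  [1.0, 7.5, 95.0]])
y = np.array([3.2, 3.5])
beta = np.array([0.0, 1.0, 1.0])  # illustrative values, an assumption

analytic = gradient(beta, X_aug, y)

# Central finite differences as an independent check of the formula.
eps = 1e-6
numeric = np.zeros_like(beta)
for j in range(beta.size):
    step = np.zeros_like(beta)
    step[j] = eps
    numeric[j] = (loss(beta + step, X_aug, y) - loss(beta - step, X_aug, y)) / (2 * eps)

print(analytic)
print(numeric)
```

The two printed vectors should agree closely, which supports the sign and the index placement in the formula.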

---

## 7️⃣ Gradient Descent Update Rule

### Parameter Update

$
\beta_j^{(new)} =
\beta_j^{(old)} - \eta \frac{\partial L}{\partial \beta_j}
$

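So each parameter is updated with its own partial derivative (its own slope):

$
\beta_0 = \beta_0 - \eta \frac{\partial L}{\partial \beta_0}
$

$
\beta_1 = \beta_1 - \eta \frac{\partial L}{\partial \beta_1}
$

$
\beta_2 = \beta_2 - \eta \frac{\partial L}{\partial \beta_2}
$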
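A minimal sketch of the update rule applied repeatedly is shown below. It uses the starting values, learning rate, and epoch count from the initialization step, but the dataset is a small synthetic one (hypothetical, generated from $y = 1 + 2x_1 + 3x_2$) chosen so that this learning rate converges.

```python
import numpy as np

# Small synthetic dataset (hypothetical): generated from y = 1 + 2*x1 + 3*x2.
X = np.array([[-1.0, 0.0],
              [0.0, -1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y = np.array([-1.0, -2.0, 3.0, 4.0])

X_aug = np.column_stack([np.ones(len(X)), X])  # leading 1s for beta_0
beta = np.array([0.0, 1.0, 1.0])               # starting values from the initialization step
eta = 0.1                                      # learning rate
epochs = 100

for _ in range(epochs):
    residual = y - X_aug @ beta        # y_i - y_hat_i
    grad = -X_aug.T @ residual         # dL/dbeta_j for every j
    beta = beta - eta * grad           # simultaneous update of all parameters

print(beta)
```

All parameters are updated from the same residuals within an epoch, so no update sees a partially changed parameter vector.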

---

## 8️⃣ Dimensional Insight

- Features → **n**
- Parameters → **n + 1**
- Loss surface → defined over an **(n + 1)-dimensional** parameter space
- Gradient descent searches for the minimum in this **high-dimensional space**

---

## 9️⃣ Key Exam Notes

- Gradient = direction of steepest increase
- Gradient Descent moves in **opposite direction**
- Learning rate controls step size
- Too large $\eta$ → divergence
- Too small $\eta$ → slow convergence
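The effect of the learning rate can be seen on a toy problem. The sketch below runs gradient descent on the one-parameter loss $f(b) = b^2$ (a hypothetical example, not the regression loss) with three values of $\eta$; the function name `run_gd` is an assumption.

```python
def run_gd(eta, steps=20, start=1.0):
    """Gradient descent on the toy loss f(b) = b**2, whose gradient is 2*b."""
    b = start
    for _ in range(steps):
        b = b - eta * 2 * b
    return b

small = run_gd(eta=0.001)   # barely moves from the start
moderate = run_gd(eta=0.1)  # shrinks steadily toward the minimum at 0
large = run_gd(eta=1.1)     # overshoots more on every step

print(abs(small), abs(moderate), abs(large))
```

Each step multiplies $b$ by $(1 - 2\eta)$, so the iterate shrinks only when $|1 - 2\eta| < 1$ and grows in magnitude otherwise.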

---

## ✅ One-Line Summary

Gradient Descent in n-dimensions minimizes the loss function by iteratively updating all parameters using partial derivatives with respect to each parameter.


## Gradient Descent Diagram


![Gradient Descent Diagram](Screen_shot/Screenshot%202026-01-14%20211955.png)
![Gradient Descent Diagram](Screen_shot/Screenshot%202026-01-14%20213325.png)
![Gradient Descent Diagram](Screen_shot/Screenshot%202026-01-14%20222421.png)
![Gradient Descent Diagram](Screen_shot/Screenshot%202026-01-14%20223307.png)



In [25]:
from sklearn.datasets import load_diabetes
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [26]:
x,y = load_diabetes(return_X_y=True)
print(x.shape)
print(y.shape)

(442, 10)
(442,)


In [27]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=2)

In [28]:
x_train

array([[-0.00188202, -0.04464164, -0.06979687, ..., -0.03949338,
        -0.06291688,  0.04034337],
       [-0.00914709, -0.04464164,  0.01103904, ..., -0.03949338,
         0.01703607, -0.0052198 ],
       [ 0.02354575,  0.05068012, -0.02021751, ..., -0.03949338,
        -0.09643495, -0.01764613],
       ...,
       [ 0.06350368,  0.05068012, -0.00405033, ..., -0.00259226,
         0.08449153, -0.01764613],
       [-0.05273755,  0.05068012, -0.01806189, ...,  0.1081111 ,
         0.03606033, -0.04249877],
       [ 0.00175052,  0.05068012,  0.05954058, ...,  0.1081111 ,
         0.06898589,  0.12732762]])

In [29]:
reg = LinearRegression()
reg.fit(x_train,y_train)

In [30]:
print(reg.coef_)
print(reg.intercept_)

[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]
151.88331005254167


In [31]:
y_pred = reg.predict(x_test)
r2_score(y_test,y_pred)

0.4399338661568968

In [38]:
class GDRegressor:
    def __init__(self,learning_rate = 0.01 , epochs = 100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self,x_train,y_train):
        # Initialize the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        # The number of columns in x_train determines how many coefficients there are
        self.coef_ = np.ones(x_train.shape[1])
        for i in range(self.epochs):
            # Predictions with the current parameters
            y_hat = np.dot(x_train,self.coef_) + self.intercept_

            # Gradient of the mean squared error w.r.t. the intercept
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)

            # Gradient of the mean squared error w.r.t. each coefficient
            coef_der  = -2 * np.dot((y_train - y_hat),x_train)/x_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)

    def predict(self,x_test):
        return np.dot(x_test,self.coef_) + self.intercept_

![Gradient Descent Diagram](Screen_shot/Screenshot%202026-01-14%20230631.png)

In [39]:
gdr = GDRegressor(epochs=10)


In [36]:
gdr.fit(x_train,y_train)

In [41]:
y_pred = gdr.predict(x_test)

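TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

This error comes from the cell execution order: `fit` ran in cell 36, but `gdr` was re-created in cell 39, so the new object's `coef_` and `intercept_` are still `None` when `predict` is called. Re-running `gdr.fit(x_train,y_train)` after creating `gdr` and before calling `predict` resolves it.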
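One way to make a call to `predict` before `fit` fail with a clearer message is a small guard. The sketch below is a hypothetical variant (`GDRegressorChecked` is not part of the notebook) that follows the same mean-squared-error updates and is exercised on tiny synthetic data rather than the diabetes dataset.

```python
import numpy as np

class GDRegressorChecked:
    """Hypothetical variant of the notebook's GDRegressor with a fitted-state check."""

    def __init__(self, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self, X, y):
        self.intercept_ = 0.0
        self.coef_ = np.ones(X.shape[1])
        for _ in range(self.epochs):
            residual = y - (X @ self.coef_ + self.intercept_)
            # Gradients of the mean squared error, as in the notebook's class.
            self.intercept_ -= self.lr * (-2 * np.mean(residual))
            self.coef_ -= self.lr * (-2 * (residual @ X) / X.shape[0])
        return self

    def predict(self, X):
        if self.coef_ is None or self.intercept_ is None:
            raise RuntimeError("Call fit() before predict().")
        return X @ self.coef_ + self.intercept_

# Tiny synthetic data (y = 1 + 2*x), only to exercise the call order.
X_demo = np.array([[0.0], [1.0], [2.0]])
y_demo = np.array([1.0, 3.0, 5.0])

model = GDRegressorChecked(learning_rate=0.1, epochs=1000)
try:
    model.predict(X_demo)          # not fitted yet
except RuntimeError as err:
    print(err)

model.fit(X_demo, y_demo)          # fit first ...
print(model.predict(X_demo))       # ... then predict
```

Returning `self` from `fit` is a design choice borrowed from scikit-learn's convention, so that `GDRegressorChecked().fit(X, y).predict(X)` can be chained.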