# 1. Why partial derivatives matter in ML
ML models have many parameters:
* weights
* biases
* feature coefficients
Partial derivatives answer:
    “How does the loss change if I change one parameter, keeping others fixed?”

That is the foundation of gradient descent.

# 2. Core Idea (Must Understand)

For a function of several variables, such as the $f(x, y)$ below, a partial derivative measures how the function changes with respect to one variable while the others are held fixed.

$$f(x, y)$$

* $\frac{\partial f}{\partial x} \longrightarrow$ treat $y$ as **constant**
* $\frac{\partial f}{\partial y} \longrightarrow$ treat $x$ as **constant**
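
One way to see what "treat $y$ as constant" means in code is to freeze one variable at a value and take the slope of the resulting one-variable function. The following is only a sketch: the function `f`, the helper `slope_1d`, the point $(2, 3)$, and the step size `h` are all assumptions chosen for illustration.

```python
def f(x, y):
    # Illustrative two-variable function (assumed for this sketch)
    return x**2 * y

def slope_1d(g, t, h=1e-5):
    # Central-difference slope of a one-variable function g at t
    return (g(t + h) - g(t - h)) / (2 * h)

x0, y0 = 2.0, 3.0

# Freeze y at y0: the result depends on x only
g_x = lambda x: f(x, y0)
df_dx_at_point = slope_1d(g_x, x0)   # analytic value: 2*x*y = 12

# Freeze x at x0: the result depends on y only
g_y = lambda y: f(x0, y)
df_dy_at_point = slope_1d(g_y, y0)   # analytic value: x^2 = 4

print(df_dx_at_point, df_dy_at_point)
```

Each partial derivative is just an ordinary derivative of the frozen, one-variable function.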

---

# 3. Basic Example (Given)

Given the function:
$$f(x, y) = x^2 + 3y$$

### Manual Derivatives

$$\frac{\partial f}{\partial x} = 2x$$

$$\frac{\partial f}{\partial y} = 3$$



In [2]:
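# Analytic partial derivatives of f(x, y) = x^2 + 3y
def df_dx(x, y):
    return 2*x   # depends on x only

def df_dy(x, y):
    return 3     # constant everywhere

print(df_dx(2, 3), df_dy(2, 5))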

4 3


# 4. Gradient Vector (Very Important)

The **gradient** collects all partial derivatives into a single vector. 
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$

In [3]:
# Example: gradient of f(x, y) = x^2 + 3y
import numpy as np

def gradient(x, y):
    return np.array([2*x, 3])

ML meaning:
* the gradient points in the direction of steepest increase of the function
* gradient descent moves in the opposite direction to reduce the loss
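
To make these two points concrete, the sketch below takes one small step along the gradient and one against it for the Section 3 function $f(x, y) = x^2 + 3y$, then compares the function values. The starting point and the step size `step` are arbitrary choices for illustration.

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3*y

def gradient(p):
    x, y = p
    return np.array([2*x, 3.0])

p = np.array([2.0, 3.0])   # arbitrary starting point
step = 0.01                # small step size (assumed)

uphill = p + step * gradient(p)     # move along the gradient
downhill = p - step * gradient(p)   # move against the gradient

# The function value rises uphill and falls downhill
print(f(downhill), f(p), f(uphill))
```

For a small enough step, moving against the gradient lowers the function value, which is the update gradient descent repeats.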

### 5. Second example: $f(x,y) = x^{2}+xy+y^{2}$

### Manual derivatives 
$$\frac{\partial f}{\partial x} = 2x+y$$
$$\frac{\partial f}{\partial y} = x+2y$$

In [4]:
def grad_f(x,y):
    df_dx = 2*x + y
    df_dy = x + 2*y
    return df_dx,df_dy
grad_f(1,2)

(4, 5)

### 6. ML Loss Function example
Mean Squared Error (single data point):

$$L(w,b) = (wx+b-y)^{2}$$

Partial Derivatives
$$\frac{\partial L}{\partial w} = 2(wx+b-y)x$$

$$\frac{\partial L}{\partial b} = 2(wx+b-y)$$
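
Both results follow from the chain rule. As a short derivation, introduce the residual $e = wx + b - y$ (a symbol added here for convenience), so that $L = e^{2}$:

$$\frac{\partial L}{\partial w} = \frac{dL}{de}\cdot\frac{\partial e}{\partial w} = 2e \cdot x$$

$$\frac{\partial L}{\partial b} = \frac{dL}{de}\cdot\frac{\partial e}{\partial b} = 2e \cdot 1$$

Here $\frac{\partial e}{\partial w} = x$ because $b$ and $y$ are held constant, and $\frac{\partial e}{\partial b} = 1$ because $wx$ and $y$ are held constant.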

In [7]:
def grad_mse(w,b,x,y):
    error = w*x+b-y
    dL_dw = 2*error*x
    dL_db = 2*error
    return dL_dw, dL_db
grad_mse(1,0,2,3)
# Linear regression trained by gradient descent uses these same gradients,
# averaged over all data points.

(-4, -2)

### 7. Numerical partial derivatives

In [19]:
def numerical_partial(f,x,y,var="x",h=1e-5):
    if var == 'x':
        return (f(x+h,y)-f(x-h,y))/(2*h)
    else:
        return (f(x,y+h)-f(x,y-h))/(2*h)    
    
def f(x,y):
    return x**2 - y**2

print(numerical_partial(f,1,2,'x'))

print(numerical_partial(f,1,2,'y'))

# Used in:
#     gradient checking
#     debugging ML models

2.0000000000131024
-4.000000000026205
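
As a sketch of the gradient-checking use mentioned above, the block below compares the analytic MSE gradient from Section 6 with central-difference estimates. The functions are redefined so the block runs on its own; the helper name `check_gradient` and the tolerance `tol` are assumptions for illustration, not a standard API.

```python
def loss(w, b, x, y):
    # Squared error for a single data point
    return (w*x + b - y)**2

def grad_mse(w, b, x, y):
    # Analytic gradient (same formulas as Section 6)
    error = w*x + b - y
    return 2*error*x, 2*error

def check_gradient(w, b, x, y, h=1e-5, tol=1e-6):
    # Central-difference estimates of dL/dw and dL/db
    num_dw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2*h)
    num_db = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2*h)
    dw, db = grad_mse(w, b, x, y)
    # Pass only if both analytic values agree with the estimates
    return abs(num_dw - dw) < tol and abs(num_db - db) < tol

print(check_gradient(1.0, 0.0, 2.0, 3.0))
```

If the analytic gradient had a sign error or a missing factor, the two values would disagree by far more than `tol`, which is what makes this a useful debugging check.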


### 8. Gradient descent with partial derivatives

In [21]:
w,b = 0,0
lr = 0.1
x,y =  2,4

for _ in range(10):
    dw,db = grad_mse(w,b,x,y)
    w -= lr*dw
    b -= lr*db
w,b
# Shows learning via partial derivatives

(1.6, 0.8)
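
To watch the loss fall step by step, the sketch below repeats the same loop with a smaller learning rate and records the loss after every update. The learning rate of 0.05 and the `losses` list are choices made for this illustration; `grad_mse` is redefined so the block runs on its own.

```python
def grad_mse(w, b, x, y):
    error = w*x + b - y
    return 2*error*x, 2*error

w, b = 0.0, 0.0
lr = 0.05          # smaller than the 0.1 used above (assumed for illustration)
x, y = 2, 4

losses = []
for _ in range(10):
    dw, db = grad_mse(w, b, x, y)
    w -= lr*dw
    b -= lr*db
    losses.append((w*x + b - y)**2)   # loss after this update

print(losses[0], losses[-1])
```

Each update multiplies the error $wx + b - y$ by $1 - 2\,\text{lr}\,(x^{2} + 1)$. With `lr = 0.05` and $x = 2$ that factor is $0.5$, so the loss shrinks by a factor of four per step. With `lr = 0.1` the factor is $0$, which is why the original loop reaches `(1.6, 0.8)` after a single update and then stays there.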

# 9. Practice Exercises (Do These)

### Basic (Multivariate)

1.  **Compute partial derivatives of:**
    $$f(x, y) = x^2 + y^2$$
2.  **Evaluate gradient at $(1, 1)$** (using the partial derivatives from Q1).

### Intermediate (Multivariate)

3.  **Find partial derivatives of:**
    $$f(x, y) = 3x^2y + 2y$$
4.  **Verify using numerical approximation** (Compute the partial derivatives for $f(x, y) = 3x^2y + 2y$ using both the analytical method and a numerical method like the central difference formula, and compare the results).

### Advanced 
5.  **Derive the gradients of MSE for multiple data points.**
6.  **Implement gradient descent manually.**
7.  **Visualize the loss surface and the gradient direction.**
8.  **Explain why gradients vanish at minima.**

## Understand:
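* partial derivative = sensitivity of the loss to one parameter
* gradient = vector of all partial derivatives; its negative gives the update direction
* holding the other parameters constant = isolating the effect of one parameter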