## Calculus for AI & ML

### 1. What is Calculus? (Why it matters in AI)
Calculus helps us understand:
- How outputs change when inputs change
- How to minimize loss/error in ML models

**In ML:**
- Training = minimizing a loss function
- Minimization uses derivatives (gradients)

**Placement takeaway:**
> Calculus = backbone of **optimization**, **gradient descent**, **backpropagation**

### 2. Functions (Core Foundation)

**Definition**

A **function** maps input → output:

$f(x) = y$

**In ML:**
- Model = function  
- Input = features  
- Output = prediction  

Example:
```text
y = f(x) = wx + b
```
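The same linear model can be written as an ordinary Python function. This is a minimal sketch: the name `linear_model` and the values of `w` and `b` are illustrative assumptions, not fitted parameters.

```python
def linear_model(x: float, w: float = 2.0, b: float = 1.0) -> float:
    """Map an input feature x to a prediction y = w*x + b (w, b are example values)."""
    return w * x + b


prediction = linear_model(3.0)  # 2.0 * 3.0 + 1.0
print(prediction)
```

Training would change `w` and `b`. The function itself (the "model") stays the same shape.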

**Must know:**
- Domain, range
- Linear vs non-linear functions

**Interview line:**
> "A machine learning model is essentially a function that maps input features to predictions."

### 3. Composite Functions (Function inside Function)
**Definition**

$(f \circ g)(x) = f(g(x))$

**In Neural Networks:**
- Each layer applies a function
- Output of one layer → input of next

**Example:**
```python
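import math
w, x, b = 0.5, 2.0, 0.1        # example values (assumed)
z = w * x + b                  # inner function g(x): linear step
a = 1 / (1 + math.exp(-z))     # outer function f(z): sigmoid σ(z)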
```

**Why important:**
- Backpropagation uses the chain rule
- Deep learning = nested composite functions
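A network can be viewed as the layers chained together. The sketch below assumes two hypothetical scalar "layers" with made-up weights. The helper `compose` applies its functions left to right, so `compose(g, f)(x)` equals $(f \circ g)(x) = f(g(x))$.

```python
import math
from functools import reduce
from typing import Callable

Fn = Callable[[float], float]


def compose(*fns: Fn) -> Fn:
    """Apply the functions left to right: compose(g, f)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)


# Two hypothetical scalar layers: a linear step followed by an activation.
layer1: Fn = lambda x: math.tanh(0.5 * x + 0.1)
layer2: Fn = lambda h: 1 / (1 + math.exp(-(1.5 * h - 0.2)))  # sigmoid output

network = compose(layer1, layer2)  # network(x) == layer2(layer1(x))
print(network(2.0))
```

Adding more layers just means passing more functions to `compose`. That is the "nested composite function" view that backpropagation differentiates.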

**Interview focus:**
> "Neural networks are compositions of functions optimized using the chain rule."

### 4. Scalar Multiplication & Addition (Functions)
**Scalar Multiplication**

$g(x) = c \cdot f(x)$

**Addition**

$h(x) = f(x) + g(x)$

**In ML:**
- Scaling loss functions
- Combining multiple loss terms (regularization)

**Example:**
```python
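mse, reg_term, lam = 0.8, 2.5, 0.01   # illustrative values; lam = λ (regularization strength)
loss = mse + lam * reg_term           # total = data loss + λ · regularization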
```

**Must know:**
- Scaling by a positive constant $c$ scales the gradient's size (and so the effective step size), not its direction
- Addition combines objectives
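Here is a minimal sketch of a combined loss. It assumes NumPy and uses an L2 penalty (sum of squared weights) as the regularization term, which is one common choice among several. The function names and `lam = 0.1` are illustrative, not a specific library's API.

```python
import numpy as np


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


def l2_penalty(w: np.ndarray) -> float:
    """Sum of squared weights (L2 regularization term)."""
    return float(np.sum(w ** 2))


def total_loss(y_true: np.ndarray, y_pred: np.ndarray, w: np.ndarray, lam: float = 0.1) -> float:
    """Addition of two objectives, with the penalty scaled by lam (λ)."""
    return mse(y_true, y_pred) + lam * l2_penalty(w)


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.5])
w = np.array([0.5, -1.0])
print(total_loss(y_true, y_pred, w))
```

By the sum rule, the gradient of `total_loss` is the MSE gradient plus `lam` times the penalty gradient. This is why λ controls how strongly the penalty pulls on the weights.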

### 5. Scalar Multiplication & Addition (Inputs)
**Definition**

Transforming the input before it enters the function:

$f(ax + b)$

**In ML:**
- Feature scaling
- Bias terms
- Normalization

**Example:**
```python
x_normalized = (x - mean) / std
```
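The one-liner above assumes `x`, `mean` and `std` already exist. The following is a self-contained sketch using NumPy on a small made-up feature matrix, with one mean and one standard deviation per feature column:

```python
import numpy as np

# Toy feature matrix (assumed data): rows = samples, columns = features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

mean = X.mean(axis=0)   # per-feature mean
std = X.std(axis=0)     # per-feature (population) standard deviation
X_normalized = (X - mean) / std

print(X_normalized.mean(axis=0))  # approximately 0 for each column
print(X_normalized.std(axis=0))   # approximately 1 for each column
```

After standardization both columns are on the same scale, even though the second was 200 times larger. In practice `mean` and `std` are computed on the training data and reused for validation and test data. A constant feature (std = 0) would need special handling.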

**Practical importance:**
- Faster convergence
- Stable gradients

**Interview point:**
> "Feature scaling improves gradient descent convergence."

### 6. Differentiation (Heart of ML)
**Definition**

Derivative = rate of change

$\frac{dy}{dx}$

**Interpretation:**
- Slope of the curve at a point
- Its sign shows which way the loss increases; stepping the opposite way reduces loss

**In ML:**
- Gradient = derivative of the loss with respect to the parameters
- Used in Gradient Descent

**Example:**
```python
loss = lambda w: (w - 3) ** 2        # example loss L(w)
dloss_dw = lambda w: 2 * (w - 3)     # its derivative dL/dw
```

**Absolute must-know:**
- What derivative represents
- Why derivative = optimization tool
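To see the derivative as a rate of change in numbers, a central-difference estimate can be compared with the analytic derivative. The loss $(w - 3)^2$ and the step size `h` are illustrative assumptions.

```python
def loss(w: float) -> float:
    """Example loss L(w) = (w - 3)^2, minimized at w = 3 (illustrative choice)."""
    return (w - 3.0) ** 2


def numerical_derivative(f, w: float, h: float = 1e-5) -> float:
    """Central-difference estimate of df/dw: (f(w + h) - f(w - h)) / (2h)."""
    return (f(w + h) - f(w - h)) / (2 * h)


for w in (0.0, 3.0, 5.0):
    # Analytic derivative is 2(w - 3): negative left of the minimum, zero at it, positive right of it.
    print(w, numerical_derivative(loss, w), 2 * (w - 3.0))
```

The sign tells you the direction: at `w = 0` the derivative is negative, so increasing `w` lowers the loss. At `w = 5` it is positive, so decreasing `w` lowers the loss.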

### 7. Differentiation Rules (Placement MUST)
**Power Rule**

$\frac{d}{dx}(x^n) = nx^{n-1}$

**Constant Rule**

$\frac{d}{dx}(c) = 0$

**Sum Rule**

$\frac{d}{dx}(f + g) = f' + g'$
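Here is a worked example on a polynomial. It uses the three rules above plus the fact that a constant factor carries through the derivative, $\frac{d}{dx}(c f) = c f'$:

$$
\frac{d}{dx}\left(3x^2 + 5x + 7\right) = 3 \cdot 2x^{1} + 5 \cdot 1x^{0} + 0 = 6x + 5
$$

Each term is differentiated separately (sum rule). $x^2$ and $x^1$ use the power rule, and the constant $7$ contributes $0$ (constant rule).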

**Chain Rule (MOST IMPORTANT)**

$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$

**ML connection:**
- Backpropagation = repeated chain rule
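Here is a minimal sketch of backpropagation through a single neuron, $L = (\sigma(wx + b) - y)^2$. The squared-error loss and all numeric values are assumptions for illustration. The gradient is the product of the local derivatives along the chain, and it is checked against a numerical estimate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative single-neuron setup (assumed values): input x, target y, parameters w and b.
x, y = 2.0, 1.0
w, b = 0.5, 0.1


def loss_fn(w: float) -> float:
    """L(w) = (sigmoid(w*x + b) - y)^2 as a function of w only."""
    return (sigmoid(w * x + b) - y) ** 2


# Forward pass: each line is one function in the composition.
z = w * x + b
a = sigmoid(z)
L = (a - y) ** 2

# Backward pass: multiply the local derivatives (chain rule).
dL_da = 2 * (a - y)   # d/da of (a - y)^2
da_dz = a * (1 - a)   # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
dz_dw = x             # d/dw of (w*x + b)
dL_dw = dL_da * da_dz * dz_dw

# Sanity check against a central-difference estimate.
h = 1e-6
numeric = (loss_fn(w + h) - loss_fn(w - h)) / (2 * h)
print(dL_dw, numeric)
```

A deep network repeats this same multiply-the-local-derivatives step layer by layer, from the loss back to every parameter.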

**Interview warning:**
- If you don't understand the chain rule → deep learning will feel like magic

### 8. Finding Minima / Maxima (Optimization)
**Concept**
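- Minimum = lowest loss
- Maximum = rarely the target in ML; maximizing something (e.g. likelihood) is usually rewritten as minimizing its negative

**Method:**
- Find the derivative
- Set the derivative = 0 and solve for the critical points
- Check the nature of each point (e.g. second derivative > 0 → minimum)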

$\frac{dL}{dw} = 0$
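The method can be applied to a toy one-parameter loss chosen for illustration, $L(w) = (w - 3)^2 + 1$:

$$
\frac{dL}{dw} = 2(w - 3) = 0 \;\Rightarrow\; w = 3, \qquad \frac{d^2L}{dw^2} = 2 > 0 \;\Rightarrow\; \text{minimum, with } L(3) = 1
$$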

**In ML:**
- Training = find minimum loss
- Gradient descent approximates this

**Practical reality:**
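- For most models (e.g. deep networks) $\frac{dL}{dw} = 0$ can't be solved analytically; linear regression is a notable exception
- So we use iterative updates (gradient descent)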

### 9. Gradient Descent (Practical Implementation)
**Update Rule**

$w = w - \alpha \cdot \frac{dL}{dw}$

**Where:**
- α = learning rate

**Intuition:**
- Move in the direction of steepest decrease (the negative gradient)

**What you actually use in code:**
```python
w = w - lr * grad
```
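That one line sits inside a training loop. This is a minimal sketch that fits `y = w*x + b` with the MSE loss, assuming NumPy and toy data generated from `y = 2x + 1`. The learning rate and step count were picked by hand for this data.

```python
import numpy as np

# Toy data (assumed) generated from y = 2x + 1, so the ideal parameters are w = 2, b = 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # initial parameters
lr = 0.05         # learning rate α (hand-picked for this data)

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the MSE loss L = mean(error^2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Update rule: move against the gradient.
    w = w - lr * grad_w
    b = b - lr * grad_b

print(w, b)  # should approach 2 and 1
```

Each parameter gets its own partial derivative. The update for both uses the same rule, $w = w - \alpha \cdot \frac{dL}{dw}$.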

**Interview note:**
- Learning rate too high → divergence
- Too low → slow convergence
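Both failure modes show up on the simplest possible loss, $L(w) = w^2$ (gradient $2w$). In this sketch the three learning rates, the start value and the step count are arbitrary choices for illustration.

```python
def run_gd(lr: float, steps: int = 50, w0: float = 5.0) -> float:
    """Minimize L(w) = w^2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w  # each step multiplies w by (1 - 2*lr)
    return w


for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: final w = {run_gd(lr):.4g}")
```

Each step multiplies `w` by $(1 - 2\alpha)$. With `lr=0.001` that factor is 0.998, so `w` barely leaves 5. With `lr=0.1` it is 0.8, so `w` shrinks toward the minimum at 0. With `lr=1.1` it is -1.2, so `w` overshoots back and forth and grows every step.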