# **1. Limits & Continuity**

---

## 1. **Limit**

### **Definition**

The **limit** of $f(x)$ as $x$ approaches $a$ is $L$, written

$$
\lim_{x \to a} f(x) = L
$$

if $f(x)$ can be made arbitrarily close to $L$ by taking $x$ sufficiently close to $a$ (with $x \neq a$). The value $f(a)$ itself plays no role.

---

### **Types of Limits**

* **Finite:** $\lim_{x \to 2} (3x+1) = 7$
* **At infinity:** $\lim_{x \to \infty} \frac{1}{x} = 0$
* **Infinite:** $\lim_{x \to 0^+} \frac{1}{x} = +\infty$
* **One-sided:**

  * Left-hand limit: $\lim_{x \to a^-} f(x)$
  * Right-hand limit: $\lim_{x \to a^+} f(x)$

---

### **Basic Limit Rules**

If $\lim_{x \to a} f(x) = A$ and $\lim_{x \to a} g(x) = B$ (both finite):

* Sum: $\lim (f+g) = A+B$
* Product: $\lim (f \cdot g) = A \cdot B$
* Quotient: $\lim \frac{f}{g} = \frac{A}{B}$, if $B \neq 0$

---

### **Special Limits**

* $\lim_{x \to 0} \frac{\sin x}{x} = 1$
* $\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = e$

✔️ The first underlies the derivative of $\sin x$; the second defines $e$, the base of the exponentials and logarithms used throughout ML losses.
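The two special limits above can be checked numerically. The following is a minimal sketch; the helper names `sinc_ratio` and `compound` are illustrative and not part of the original notes.

```python
import math

def sinc_ratio(x: float) -> float:
    """Return sin(x)/x for a nonzero x."""
    return math.sin(x) / x

def compound(x: float) -> float:
    """Return (1 + 1/x)**x for a positive x."""
    return (1.0 + 1.0 / x) ** x

# sin(x)/x should drift toward 1 as x shrinks toward 0.
for x in (1e-1, 1e-2, 1e-3):
    print(f"x={x:g}  sin(x)/x={sinc_ratio(x):.8f}")

# (1 + 1/x)^x should drift toward e as x grows.
for x in (1e1, 1e3, 1e5):
    print(f"x={x:g}  (1+1/x)^x={compound(x):.6f}  e={math.e:.6f}")
```

Evaluating near the limit point only suggests the limit; it does not prove it.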

---

## 2. **Continuity**

### **Definition**

A function $f(x)$ is **continuous at $x=a$** if:

1. $f(a)$ is defined.
2. $\lim_{x \to a} f(x)$ exists.
3. $\lim_{x \to a} f(x) = f(a)$.

---

### **Types of Discontinuities**

* **Removable:** “Hole” in the graph (e.g., $\frac{x^2-1}{x-1}$ at $x=1$).
* **Jump:** Left and right limits both exist but differ.
* **Infinite:** Goes to $\pm \infty$ near a point (vertical asymptote).
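The three cases can be told apart roughly by sampling just left and right of the point. The sketch below is a crude heuristic under assumed tolerances (`h`, `tol`), not a rigorous test; all function names are hypothetical.

```python
def one_sided_limits(f, a, h=1e-6):
    """Approximate the left and right limits of f at a by sampling at a - h and a + h."""
    return f(a - h), f(a + h)

def hole(x):
    # (x^2 - 1)/(x - 1) equals x + 1 everywhere except x = 1, where it is undefined.
    return (x * x - 1.0) / (x - 1.0)

def step(x):
    # Unit step: jumps from 0 to 1 at x = 0.
    return 0.0 if x < 0 else 1.0

def classify(f, a, tol=1e-3):
    """Heuristic label based on the two one-sided samples (assumed thresholds)."""
    left, right = one_sided_limits(f, a)
    if abs(left) > 1.0 / tol or abs(right) > 1.0 / tol:
        return "infinite"
    if abs(left - right) > tol:
        return "jump"
    # Removable if f(a) is missing or differs from the limit; continuous otherwise.
    return "two-sided limit exists"

print(classify(hole, 1.0))
print(classify(step, 0.0))
print(classify(lambda x: 1.0 / x, 0.0))
```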

---

### **Example**

1. $f(x) = x^2$ → continuous everywhere.
2. $f(x) = \frac{1}{x}$ → discontinuous at $x=0$ (undefined there; infinite discontinuity).

---

## ⚡ ML Intuition

* **Gradients require limits** (derivative is defined using limits).
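* **Continuity supports stable optimization** → small parameter changes cause small loss changes; gradient descent additionally needs differentiability (at least almost everywhere).
* **Discontinuous functions** → harder to optimize (no derivative exists at the jump points).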

---

## Quick Summary

| Concept       | Meaning                               | ML Connection                  |
| ------------- | ------------------------------------- | ------------------------------ |
| Limit         | Value a function approaches near a point | Foundation of derivatives      |
| Continuity    | No jumps, breaks, holes               | Needed for smooth optimization |
| Discontinuity | Function undefined / jumps            | Non-smooth loss functions      |

---
---
---

# **2. Differentiation Rules**

---

## 1. **Definition of Derivative**

$$
f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
$$

👉 Measures the **instantaneous rate of change** (slope).
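The limit definition can be approximated by evaluating the difference quotient for shrinking $h$. A minimal sketch, with `forward_difference` and `square` as illustrative names:

```python
def forward_difference(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the limit definition."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

# The exact derivative of x^2 at x = 3 is 6. For this function the quotient
# equals 6 + h (up to rounding), so it approaches 6 as h shrinks.
for h in (1e-1, 1e-3, 1e-5):
    print(h, forward_difference(square, 3.0, h))
```

In floating point, making `h` extremely small eventually hurts accuracy because of cancellation in `f(x + h) - f(x)`.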

---

## 2. **Basic Rules**

* **Constant Rule:**

$$
\frac{d}{dx}[c] = 0
$$

* **Power Rule:**

$$
\frac{d}{dx}[x^n] = nx^{n-1}
$$

* **Constant Multiple:**

$$
\frac{d}{dx}[c f(x)] = c f'(x)
$$

* **Sum/Difference:**

$$
\frac{d}{dx}[f(x) \pm g(x)] = f'(x) \pm g'(x)
$$

---

## 3. **Product Rule**

If $h(x) = f(x) g(x)$:

$$
h'(x) = f'(x)g(x) + f(x)g'(x)
$$

**Example:**
$\frac{d}{dx}[x e^x] = 1 \cdot e^x + x \cdot e^x = (1+x)e^x$

✔️ **ML Use:** Loss terms that are products of two parameter-dependent factors (for a fixed input $x$, $w \cdot x$ needs only the constant-multiple rule).

---

## 4. **Quotient Rule**

If $h(x) = \frac{f(x)}{g(x)}$:

$$
h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2}, \quad g(x) \neq 0
$$

**Example:**
$\frac{d}{dx}\left(\frac{x^2}{x+1}\right) = \frac{2x(x+1) - x^2(1)}{(x+1)^2} = \frac{x^2+2x}{(x+1)^2}$

✔️ **ML Use:** Activation functions written as quotients, like the sigmoid $\frac{1}{1+e^{-x}}$.

---

## 5. **Chain Rule**

If $y = f(g(x))$:

$$
\frac{dy}{dx} = f'(g(x)) \cdot g'(x)
$$

**Example:**
$\frac{d}{dx}[\sin(x^2)] = \cos(x^2) \cdot 2x$

✔️ **ML Use:** Backpropagation is just repeated application of chain rule!
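The chain-rule example can be sanity-checked against a numerical derivative. This is a sketch only; `central_difference`, `composite`, and `composite_derivative` are names introduced here for illustration.

```python
import math

def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient, accurate to O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def composite(x):
    """y = sin(x^2), i.e. outer function sin(u) with inner function u = x^2."""
    return math.sin(x * x)

def composite_derivative(x):
    # Chain rule: outer derivative cos(u) evaluated at u = x^2, times inner derivative 2x.
    return math.cos(x * x) * 2.0 * x

x0 = 1.3
print(central_difference(composite, x0), composite_derivative(x0))
```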

---

## 6. **Derivatives of Common Functions**

* $\frac{d}{dx}[e^x] = e^x$
* $\frac{d}{dx}[\ln x] = \frac{1}{x}$
* $\frac{d}{dx}[\sin x] = \cos x$
* $\frac{d}{dx}[\cos x] = -\sin x$
* $\frac{d}{dx}[\tanh x] = 1 - \tanh^2 x$

✔️ Sigmoid derivative:

$$
\sigma(x) = \frac{1}{1+e^{-x}}, \quad \sigma'(x) = \sigma(x)(1-\sigma(x))
$$

(super important in neural nets).
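The identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$ can be compared with a finite-difference estimate. A minimal sketch; the function names and the step size `h` are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Closed-form derivative sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The derivative peaks at $x = 0$ with value $0.25$ and shrinks toward zero for large $|x|$.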

---

## ⚡ Quick ML Connection

| Rule                        | Use in ML                          |
| --------------------------- | ---------------------------------- |
| Product Rule                | Weight × input terms               |
| Quotient Rule               | Softmax, sigmoid derivatives       |
| Chain Rule                  | Backpropagation in deep learning   |
| Power Rule                  | Polynomial loss, regularization    |
| Exponential/Log derivatives | Logistic regression, cross-entropy |

---
---
---


# **3. Gradient = Vector of Partial Derivatives**

---

## 1. **Partial Derivative**

For a multivariable function $f(x_1, x_2, \dots, x_n)$:

The **partial derivative w.r.t. $x_i$** is:

$$
\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i+h, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h}
$$

👉 It measures how $f$ changes if only $x_i$ changes, keeping others fixed.

---

### **Example**

$$
f(x,y) = x^2y + 3y
$$

* $\frac{\partial f}{\partial x} = 2xy$
* $\frac{\partial f}{\partial y} = x^2 + 3$
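The partial derivatives in this example can be verified by nudging one coordinate at a time. A sketch under the assumption that a central difference with step `h` is accurate enough; `partial` is a hypothetical helper.

```python
def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. coordinate i."""
    up = list(point)
    down = list(point)
    up[i] += h      # move only coordinate i; all others stay fixed
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

def f(x, y):
    return x * x * y + 3.0 * y

# At (2, 5): df/dx = 2xy = 20 and df/dy = x^2 + 3 = 7.
print(partial(f, (2.0, 5.0), 0), partial(f, (2.0, 5.0), 1))
```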

---

## 2. **Gradient**

The **gradient** of $f(x_1, x_2, \dots, x_n)$ is the **vector of partial derivatives**:

$$
\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right]^T
$$

👉 Gradient points in the **direction of steepest increase** of the function.

---

### **Example**

For $f(x,y) = x^2 + y^2$:

$$
\nabla f(x,y) = [2x, \; 2y]^T
$$

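At point $(1,1)$: $\nabla f = [2, 2]^T$.

Geometric meaning: at $(1,1)$ the function increases fastest in the direction $[2,2]^T$ (directly away from the origin), at a rate of $\|\nabla f\| = 2\sqrt{2}$ per unit distance.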

---

## 3. **Gradient in ML**

* Loss function $L(w)$ depends on **parameters (weights)** $w$.
* The gradient $\nabla L(w)$ tells us how the loss changes with each weight; stepping against it reduces the loss (for a small enough step).

**Gradient Descent Update Rule:**

$$
w := w - \eta \nabla L(w)
$$

where $\eta$ = learning rate.
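The update rule can be sketched on the toy function $f(x,y) = x^2 + y^2$, whose gradient is $[2x, 2y]^T$. The learning rate `eta=0.1` and the step count are arbitrary choices for illustration, and the function names are hypothetical.

```python
def grad_f(w):
    """Gradient of f(x, y) = x^2 + y^2, i.e. [2x, 2y]."""
    return [2.0 * w[0], 2.0 * w[1]]

def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Apply w := w - eta * grad(w) for a fixed number of steps."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

w_final = gradient_descent(grad_f, [1.0, 1.0])
print(w_final)  # both coordinates end up very close to the minimizer (0, 0)
```

With this function each step multiplies both coordinates by $1 - 2\eta$, so any $\eta$ in $(0, 1)$ converges and larger values diverge.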

---

### **ML Examples**

* **Linear Regression**: Setting the gradient w.r.t. the weights to zero gives the Normal Equation; following the negative gradient step by step gives iterative updates.
* **Logistic Regression**: Gradient involves sigmoid derivative.
* **Neural Networks**: Gradients flow back using **backpropagation** (chain rule + gradients).

---

## 4. **Hessian (Second Derivative Generalization)**

* Hessian = matrix of second partial derivatives.
* Describes local curvature (bowl-shaped, dome-shaped, or saddle-shaped).
* Used in advanced optimization (Newton’s method).

---

## ⚡ Quick Summary

| Concept            | Formula                                                                | ML Connection                 |
| ------------------ | ---------------------------------------------------------------------- | ----------------------------- |
| Partial derivative | $\frac{\partial f}{\partial x_i}$                                      | Sensitivity to one variable   |
| Gradient           | $\nabla f = [\partial f/\partial x_1, \dots, \partial f/\partial x_n]^T$ | Direction of steepest increase |
| Gradient Descent   | $w := w - \eta \nabla L(w)$                                            | Training ML models            |
| Hessian            | Matrix of 2nd derivatives                                              | Optimization, convexity check |

---
---
---

# **4. Jacobian & Hessian**

---

## 1. **Jacobian**

### **Definition**

For a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$

$$
\mathbf{f}(\mathbf{x}) =
\begin{bmatrix}
f_1(x_1, \dots, x_n) \\
f_2(x_1, \dots, x_n) \\
\vdots \\
f_m(x_1, \dots, x_n)
\end{bmatrix}
$$

The **Jacobian matrix** is the $m \times n$ matrix of first-order partial derivatives (row $i$ is the gradient of $f_i$):

$$
J = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
$$

---

### **Example**

$$
\mathbf{f}(x,y) =
\begin{bmatrix}
f_1(x,y) = x^2y \\
f_2(x,y) = \sin(x) + y
\end{bmatrix}
$$

Jacobian:

$$
J =
\begin{bmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{bmatrix}
=
\begin{bmatrix}
2xy & x^2 \\
\cos(x) & 1
\end{bmatrix}
$$
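The Jacobian in this example can be cross-checked with finite differences, filling one column per input variable. A minimal sketch; `numeric_jacobian` and the other names are illustrative.

```python
import math

def vector_f(x, y):
    """f(x, y) = [x^2 * y, sin(x) + y]."""
    return [x * x * y, math.sin(x) + y]

def analytic_jacobian(x, y):
    """Rows are outputs f_1, f_2; columns are inputs x, y."""
    return [[2.0 * x * y, x * x], [math.cos(x), 1.0]]

def numeric_jacobian(func, point, h=1e-6):
    """Build the Jacobian column by column with central differences."""
    m = len(func(*point))
    n = len(point)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(point)
        down = list(point)
        up[j] += h
        down[j] -= h
        f_up, f_down = func(*up), func(*down)
        for i in range(m):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return jac

print(analytic_jacobian(1.0, 2.0))
print(numeric_jacobian(vector_f, (1.0, 2.0)))
```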

---

### **ML Connection**

* Used in **backpropagation** (mapping gradients between layers).
* **Change of variables** in probability (determinant of Jacobian for density transforms).
* **Neural networks:** Jacobian tells how small input changes affect outputs.

---

## 2. **Hessian**

### **Definition**

For a scalar function $f(x_1, x_2, \dots, x_n)$:

The **Hessian matrix** is the matrix of second-order partial derivatives:

$$
H =
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
$$

👉 Always square ($n \times n$), and symmetric whenever the second partial derivatives are continuous (Clairaut's/Schwarz's theorem).

---

### **Example**

$$
f(x,y) = x^2 + xy + y^2
$$

* $\frac{\partial^2 f}{\partial x^2} = 2$
* $\frac{\partial^2 f}{\partial x \partial y} = 1$
* $\frac{\partial^2 f}{\partial y^2} = 2$

Hessian:

$$
H =
\begin{bmatrix}
2 & 1 \\
1 & 2
\end{bmatrix}
$$
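The curvature of this example can be read off the eigenvalues of $H$. For a symmetric $2 \times 2$ matrix they have a closed form, sketched below; the helper name is hypothetical.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] from the characteristic polynomial."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius

low, high = symmetric_2x2_eigenvalues(2.0, 1.0, 2.0)
print(low, high)  # 1.0 3.0
```

Both eigenvalues are positive, so $H$ is positive definite and $f(x,y) = x^2 + xy + y^2$ is strictly convex.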

---

### **ML Connection**

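* **Optimization:** at a critical point ($\nabla f = 0$):

  * All eigenvalues of $H$ > 0 → local minimum (if this holds everywhere, $f$ is strictly convex).
  * All eigenvalues < 0 → local maximum (locally concave).
  * Mixed signs → saddle point.
* **Newton’s Method:** Uses the Hessian to rescale the gradient step; near a minimum it typically needs far fewer iterations than gradient descent, at a higher cost per iteration.
* **Deep learning:** Curvature (e.g., the largest Hessian eigenvalue) helps in choosing step sizes and in diagnosing saddle points and sharp minima; vanishing/exploding gradients are governed by products of layer Jacobians rather than by the Hessian.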
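As an illustration of Newton's method, the standard update $w := w - H^{-1} \nabla f(w)$ can be applied to the example $f(x,y) = x^2 + xy + y^2$, whose Hessian is the constant matrix $[[2, 1], [1, 2]]$. The sketch below solves the $2 \times 2$ system with Cramer's rule; all names are introduced here for illustration.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x^2 + xy + y^2."""
    return (2.0 * x + y, x + 2.0 * y)

HESSIAN = ((2.0, 1.0), (1.0, 2.0))  # constant because f is quadratic

def solve_2x2(m, rhs):
    """Solve m @ d = rhs for a 2x2 matrix m via Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    d0 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    d1 = (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det
    return d0, d1

def newton_step(x, y):
    """One Newton update: subtract the solution of H d = grad f."""
    dx, dy = solve_2x2(HESSIAN, gradient(x, y))
    return x - dx, y - dy

print(newton_step(1.0, 1.0))
```

Because $f$ is exactly quadratic, a single step lands on the minimizer $(0, 0)$ from any starting point; for general losses the step is only locally accurate.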

---

## ⚡ Quick Summary

| Concept  | Definition                                           | ML Use                    |
| -------- | ---------------------------------------------------- | ------------------------- |
| Jacobian | Matrix of first-order partials (vector → vector)     | Backprop, transformations |
| Hessian  | Matrix of second-order partials (scalar → curvature) | Optimization, convexity   |

---
---
---

# **5. Taylor Series Approximation**

---

## 1. **Idea**

A **smooth function** can be approximated near a point by a polynomial using its derivatives.

At $x = a$:

$$
f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \dots
$$

👉 This is called the **Taylor Series expansion** of $f(x)$ about $a$.

---

## 2. **Special Case: Maclaurin Series (a = 0)**

$$
f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f^{(3)}(0)}{3!}x^3 + \dots
$$

---

### **Examples**

* $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$
* $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$
* $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$

✔️ Expansions like these are useful for analyzing neural-net nonlinearities near a point (e.g., $\tanh x \approx x$ for small $x$).
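The partial sums of the Maclaurin series for $e^x$ and $\sin x$ can be computed term by term. A minimal sketch in which each term is obtained from the previous one; the function names are illustrative.

```python
import math

def exp_maclaurin(x, terms):
    """Sum the first `terms` terms of 1 + x + x^2/2! + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)          # x^k/k!  ->  x^(k+1)/(k+1)!
    return total

def sin_maclaurin(x, terms):
    """Sum the first `terms` nonzero terms of x - x^3/3! + x^5/5! - ..."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))  # next odd-power term, sign flipped
    return total

for n in (2, 4, 8):
    print(n, exp_maclaurin(1.0, n), sin_maclaurin(0.5, n))
print(math.e, math.sin(0.5))
```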

---

## 3. **First-Order Approximation (Linearization)**

$$
f(x) \approx f(a) + f'(a)(x-a)
$$

👉 This is just the **tangent line approximation** near $x=a$.
Used in **gradient descent** (local linear approximation of loss).

---

## 4. **Second-Order Approximation (Quadratic)**

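$$
f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x}-\mathbf{a}) + \frac{1}{2}(\mathbf{x}-\mathbf{a})^T H (\mathbf{x}-\mathbf{a})
$$

(where $H$ = Hessian at $\mathbf{a}$; in one variable this reduces to $f(a) + f'(a)(x-a) + \frac{1}{2}f''(a)(x-a)^2$).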

👉 This captures **curvature**, useful in **Newton’s method**.

---

## 5. **Error Term (Remainder)**

The error in truncating the series after the degree-$n$ term (Lagrange form):

$$
R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}
$$

for some $c$ between $a$ and $x$.

✔️ Bounds the approximation error whenever $|f^{(n+1)}|$ is bounded between $a$ and $x$.
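For $f(x) = e^x$ about $a = 0$, every derivative is $e^x$, so the remainder satisfies $|R_n(x)| \le e^{\max(x,0)} \frac{|x|^{n+1}}{(n+1)!}$. The sketch below compares this bound with the actual truncation error; the function names are hypothetical.

```python
import math

def exp_taylor(x, n):
    """Degree-n Maclaurin polynomial of e^x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n):
    """Upper bound on |R_n(x)| for e^x about a = 0.

    f^(n+1)(c) = e^c with c between 0 and x, so e^c <= e^max(x, 0).
    """
    return math.exp(max(x, 0.0)) * abs(x) ** (n + 1) / math.factorial(n + 1)

for n in (2, 4, 6):
    actual = abs(math.exp(1.0) - exp_taylor(1.0, n))
    print(n, actual, lagrange_bound(1.0, n))
```

The actual error stays below the bound, and both shrink rapidly as $n$ grows.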

---

## 6. **ML Intuition**

* **Optimization:**

  * Gradient descent ≈ first-order Taylor expansion.
  * Newton’s method ≈ second-order expansion (uses Hessian).

* **Activation Functions:**

  * Sigmoid, tanh can be approximated using Taylor expansion for analysis.

* **Kernel methods:**

  * Some kernel feature maps come from series expansions (e.g., the RBF kernel via the Taylor series of $e^x$).

* **Uncertainty:**

  * A second-order expansion of the log-posterior around its mode gives a Gaussian approximation in Bayesian inference (the Laplace approximation).

---

## ⚡ Quick Summary

| Approximation | Formula                                                       | ML Use                 |
| ------------- | ------------------------------------------------------------- | ---------------------- |
| 1st order     | $f(x) \approx f(a) + f'(a)(x-a)$                              | Gradient descent       |
| 2nd order     | $f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^T(\mathbf{x}-\mathbf{a}) + \frac{1}{2}(\mathbf{x}-\mathbf{a})^T H (\mathbf{x}-\mathbf{a})$ | Newton’s method        |
| Higher order  | Add cubic, quartic… terms                                     | Function approximation |

---
---
---

# **6. Integration Basics**

---

## 1. **What is Integration?**

* **Differentiation** = rate of change (slope).
* **Integration** = accumulation (area under a curve).

👉 Think of integration as the **inverse of differentiation**.

---

## 2. **Indefinite Integral (Antiderivative)**

The **indefinite integral** of $f(x)$:

$$
\int f(x)\, dx = F(x) + C
$$

where $F'(x) = f(x)$, and $C$ = constant of integration.

---

### **Basic Rules**

* Power rule: $\int x^n dx = \frac{x^{n+1}}{n+1} + C, \quad (n \neq -1)$
* Constant multiple: $\int c f(x) dx = c \int f(x) dx$
* Sum rule: $\int (f+g) dx = \int f dx + \int g dx$

---

### **Common Integrals**

* $\int e^x dx = e^x + C$
* $\int \frac{1}{x} dx = \ln |x| + C$
* $\int \sin x dx = -\cos x + C$
* $\int \cos x dx = \sin x + C$

---

## 3. **Definite Integral (Area Under Curve)**

$$
\int_a^b f(x)\, dx = F(b) - F(a)
$$

👉 Represents the **net area under $f(x)$** between $x=a$ and $x=b$.

---

### **Example**

$$
\int_0^1 x^2 dx = \left[\frac{x^3}{3}\right]_0^1 = \frac{1}{3}
$$
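The same integral can be approximated numerically as a sum of thin rectangles. A minimal sketch using the midpoint rule; `midpoint_rule` and the choice `n=1000` are assumptions for illustration.

```python
def midpoint_rule(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

approx = midpoint_rule(lambda x: x * x, 0.0, 1.0)
print(approx, 1.0 / 3.0)  # the two values agree to several decimal places
```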

---

## 4. **Fundamental Theorem of Calculus**

Connects derivatives and integrals:

1. Differentiation undoes integration.

   $$
   \frac{d}{dx} \left( \int_a^x f(t) dt \right) = f(x)
   $$

2. Integration accumulates derivatives: if $F'(x) = f(x)$, then

   $$
   \int_a^b f(x) dx = F(b) - F(a)
   $$

---

## 5. **ML Applications**

* **Probability:**

  * Continuous distributions → area under density = 1.
  * Example: $\int_{-\infty}^\infty p(x) dx = 1$.

* **Expectation & Variance:**

  * $E[X] = \int x p(x) dx$.
  * $Var(X) = \int (x-\mu)^2 p(x) dx$.

* **Neural nets:** Softmax normalizes by a sum over classes; for continuous outputs the sum becomes an integral.

* **Partition functions:** In probabilistic models (e.g., Boltzmann machines).
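The normalization, expectation, and variance integrals above can be approximated numerically for a standard normal density. This sketch truncates the infinite range to $[-10, 10]$, an assumption that is harmless here because the tails are negligible; all function names are illustrative.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def moments(pdf, a, b):
    """Return (total mass, mean, variance) of a density restricted to [a, b]."""
    mass = integrate(pdf, a, b)
    mean = integrate(lambda x: x * pdf(x), a, b)
    var = integrate(lambda x: (x - mean) ** 2 * pdf(x), a, b)
    return mass, mean, var

mass, mean, var = moments(normal_pdf, -10.0, 10.0)
print(mass, mean, var)  # close to 1, 0, and 1 respectively
```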

---

## ⚡ Quick Summary

| Concept             | Formula                                  | Meaning                 |
| ------------------- | ---------------------------------------- | ----------------------- |
| Indefinite Integral | $\int f(x) dx = F(x) + C$                | General antiderivative  |
| Definite Integral   | $\int_a^b f(x) dx = F(b) - F(a)$         | Area under curve        |
| Fundamental Theorem | $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$   | Derivatives ↔ Integrals |
| ML Use              | Probability, expectations, normalization | Ensures valid models    |

---
---
---

# **7. Multivariable Calculus: Gradient, Divergence, Curl**

---

## 1. **Gradient**

### **Definition**

For a scalar function $f(x,y,z)$:

$$
\nabla f = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right]^T
$$

👉 The **gradient vector** points in the **direction of steepest increase** of $f$.

---

### **Example**

$$
f(x,y) = x^2 + y^2 \quad \Rightarrow \quad \nabla f = [2x, \; 2y]
$$

At $(1,1)$: gradient = $[2,2]^T$, pointing directly away from the origin.

✔️ **ML Use:** Gradient = backbone of **gradient descent** for loss minimization.

---

## 2. **Divergence**

### **Definition**

For a vector field $\mathbf{F} = [F_x, F_y, F_z]$:

$$
\nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}
$$

👉 Measures the **“net flow out”** of a point (like sources/sinks).

---

### **Example**

$$
\mathbf{F} = [x, y, z] \quad \Rightarrow \quad \nabla \cdot \mathbf{F} = 1+1+1 = 3
$$
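The divergence of this field can be estimated numerically by summing the diagonal partial derivatives. A minimal sketch with hypothetical helper names:

```python
def divergence(field, point, h=1e-5):
    """Sum of dF_i/dx_i at `point`, each estimated by a central difference."""
    total = 0.0
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        total += (field(*up)[i] - field(*down)[i]) / (2.0 * h)
    return total

def radial_field(x, y, z):
    """F = [x, y, z], which points away from the origin everywhere."""
    return (x, y, z)

print(divergence(radial_field, (0.3, -1.2, 2.0)))  # approximately 3 at any point
```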

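✔️ **ML Use:** The divergence operator appears in continuous normalizing flows (the log-density changes at rate $-\nabla \cdot \mathbf{F}$ along the flow) and in the Fokker-Planck equations behind diffusion models. Despite the name, KL divergence is a different concept: a measure of dissimilarity between probability distributions.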

---

## 3. **Curl**

### **Definition**

For a vector field $\mathbf{F} = [F_x, F_y, F_z]$:

$$
\nabla \times \mathbf{F} =
\begin{bmatrix}
\frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z} \\
\frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x} \\
\frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y}
\end{bmatrix}
$$

👉 Measures **rotation** or swirling strength of a vector field.

---

### **Example**

$$
\mathbf{F} = [-y, x, 0] \quad \Rightarrow \quad \nabla \times \mathbf{F} = [0, 0, 2]
$$

This represents a counter-clockwise rotation in the xy-plane.
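The curl formula can be checked numerically on this rotating field. The sketch below estimates each partial derivative with a central difference; `partial_of`, `curl`, and `rotation_field` are names introduced for illustration.

```python
def partial_of(field, point, comp, var, h=1e-5):
    """Central-difference estimate of d(field[comp]) / d(point[var])."""
    up = list(point)
    down = list(point)
    up[var] += h
    down[var] -= h
    return (field(*up)[comp] - field(*down)[comp]) / (2.0 * h)

def curl(field, point):
    """Curl of a 3D vector field at `point`, following the component formula."""
    X, Y, Z = 0, 1, 2
    return (
        partial_of(field, point, Z, Y) - partial_of(field, point, Y, Z),
        partial_of(field, point, X, Z) - partial_of(field, point, Z, X),
        partial_of(field, point, Y, X) - partial_of(field, point, X, Y),
    )

def rotation_field(x, y, z):
    """F = [-y, x, 0], a rigid counter-clockwise rotation about the z-axis."""
    return (-y, x, 0.0)

print(curl(rotation_field, (0.7, -0.4, 1.1)))  # approximately (0, 0, 2)
```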

✔️ **ML Use:**

* Physics-inspired ML (fluid simulations, robotics).
* Vector field learning (e.g., diffusion models).

---

## 4. **Big Picture**

* **Gradient ($\nabla f$)** → slope (steepest ascent).
* **Divergence ($\nabla \cdot F$)** → expansion/contraction of vector fields.
* **Curl ($\nabla \times F$)** → rotation of vector fields.

---

## ⚡ Quick Summary

| Concept    | Formula           | Meaning                           | ML Connection                      |
| ---------- | ----------------- | --------------------------------- | ---------------------------------- |
| Gradient   | $\nabla f$        | Steepest increase of scalar field | Gradient descent                   |
| Divergence | $\nabla \cdot F$  | Net outflow at a point            | Density change in flow-based models (unrelated to KL divergence) |
| Curl       | $\nabla \times F$ | Local rotation of field           | Physics-inspired ML, vector fields |

---
---
---