# Common Math Symbols and Notations in Machine Learning

## 1. General Notations
- **Scalars**: Lowercase letters ($x$, $y$, $\theta$)
- **Vectors**: Bold lowercase letters ($\mathbf{x}$, $\mathbf{w}$)
- **Matrices**: Bold uppercase letters ($\mathbf{X}$, $\mathbf{W}$)
- **Tensors**: Calligraphic letters ($\mathcal{T}$); conventions vary between texts
- **Sets**: Uppercase, often calligraphic, letters ($\mathcal{D}$ for a dataset)
- **Indexing**: $x_i$ ($i$-th element of vector $\mathbf{x}$), $W_{i,j}$ (entry in row $i$, column $j$ of matrix $\mathbf{W}$)
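
These conventions map fairly directly onto code. The sketch below assumes plain Python lists (no array library). It uses a hypothetical helper `entry` to bridge the 1-based indices common in math notation and Python's 0-based indexing.

```python
# Scalars, vectors, matrices, and tensors as plain Python values (no array library assumed).
theta = 0.01                     # scalar
x = [0.5, -1.2, 3.0]             # vector x with 3 elements
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]            # 2x3 matrix W, stored as a list of rows
T = [[[0.0] * 2 for _ in range(3)] for _ in range(4)]  # 4x3x2 tensor of zeros

def entry(W, i, j):
    """Hypothetical helper: return W_{i,j} using 1-based math indices."""
    return W[i - 1][j - 1]

# Math x_1 is x[0] in Python; W_{2,3} is W[1][2].
print(x[0], entry(W, 2, 3))      # 0.5 6.0
```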

---

## 2. Loss Functions & Errors
| Symbol       | Meaning |
|--------------|---------|
| $\mathcal{L}$ | General loss function |
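| $J(\theta)$ | Cost function, typically the average loss over the training set |
| $\ell(y, \hat{y})$ | Loss between true $y$ and predicted $\hat{y}$ |
| $\text{MSE}$ | Mean squared error, $\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ |
| $\text{CE}$ | Cross-entropy loss for one sample over $K$ classes, $-\sum_{k=1}^{K} y_k \log \hat{y}_k$ |
| $\epsilon$ | Error or noise term, e.g. $y = f(\mathbf{x}) + \epsilon$ |
| $\mathbb{E}[\cdot]$ | Expectation, e.g. expected loss $\mathbb{E}_{(\mathbf{x}, y) \sim p}[\ell(y, \hat{y})]$ |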
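
A minimal sketch of the two losses in the table, written with the standard library only. It assumes `y_true` for cross-entropy is a one-hot (or probability) vector and `y_pred` holds predicted class probabilities; the clipping constant `eps` is an illustrative choice, not part of the definition.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum_i (y_i - y_hat_i)^2."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy -sum_k y_k * log(y_hat_k) for a single sample.

    y_true: one-hot (or probability) vector over K classes.
    y_pred: predicted class probabilities; clipped at eps to avoid log(0).
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))        # 4/3, about 1.333
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))    # -log(0.7), about 0.357
```

With a one-hot target, the cross-entropy sum collapses to $-\log \hat{y}_k$ for the true class $k$, which is why the second call equals $-\log 0.7$.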

---

## 3. Optimization & Gradients
| Symbol       | Meaning |
|--------------|---------|
| $\nabla_\theta J$ | Gradient of $J$ w.r.t. $\theta$ |
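| $\partial J / \partial \theta$ | Partial derivative of $J$ w.r.t. a single parameter $\theta$ |
| $\eta$ | Learning rate |
| $\theta_{t+1} = \theta_t - \eta \nabla_\theta J(\theta_t)$ | Gradient descent update rule |
| $\alpha$ | Momentum coefficient (notation varies: $\beta$, $\gamma$, or $\mu$ are also common, and some texts use $\alpha$ for the learning rate) |
| $\beta_1, \beta_2$ | Adam decay rates for the first- and second-moment estimates (commonly $0.9$ and $0.999$) |
| $\lambda$ | Regularization coefficient |
| $\arg\min_\theta J(\theta)$ | Value of $\theta$ that minimizes $J$ |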
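
To make $\eta$, $\alpha$, $\beta_1$, $\beta_2$, and $\lambda$ concrete, here is a scalar sketch. It assumes one common form of heavy-ball momentum ($v \leftarrow \alpha v - \eta g$, then $\theta \leftarrow \theta + v$; libraries differ in how they scale these terms) and the Adam update with bias correction. The objective $J(\theta) = (\theta - 3)^2 + \lambda \theta^2$ is a made-up L2-regularized example, chosen because its minimizer $3 / (1 + \lambda)$ is known in closed form.

```python
import math

def momentum_step(theta, velocity, grad, eta=0.1, alpha=0.9):
    """One heavy-ball momentum step (one common convention among several)."""
    velocity = alpha * velocity - eta * grad   # decaying sum of past gradient steps
    return theta + velocity, velocity

def adam_step(theta, m, v, grad, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment (uncentered) estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical objective: J(theta) = (theta - 3)^2 + lam * theta^2.
lam = 0.1

def grad_J(theta):
    return 2 * (theta - 3) + 2 * lam * theta

# Momentum run: theta should approach argmin J = 3 / (1 + lam), about 2.7273.
theta, vel = 0.0, 0.0
for _ in range(500):
    theta, vel = momentum_step(theta, vel, grad_J(theta))
print(theta)

# On its first step, Adam moves theta by roughly eta in the downhill direction,
# largely independent of the gradient's magnitude.
theta_adam, m, v = adam_step(1.0, 0.0, 0.0, grad_J(1.0), t=1)
print(theta_adam)  # about 1.001
```

The bias correction matters most early on: without it, `m` and `v` start at zero and the first few steps would be much smaller than `eta`.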

---

## 4. Probability & Statistics
| Symbol       | Meaning |
|--------------|---------|
| $p(x)$ | Probability density/mass function |
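| $P(y \mid x)$ | Conditional probability of $y$ given $x$ |
| $\mathbb{E}[X]$ | Expectation (mean) of $X$ |
| $\text{Var}(X)$ | Variance, $\mathbb{E}[(X - \mathbb{E}[X])^2]$ |
| $\mathcal{N}(\mu, \sigma^2)$ | Gaussian (normal) distribution with mean $\mu$ and variance $\sigma^2$ |
| $\sim$ | "Distributed as", e.g. $x \sim \mathcal{N}(0, 1)$ |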
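
One way to sanity-check this notation is to sample $x \sim \mathcal{N}(\mu, \sigma^2)$ with Python's standard `random` module and estimate $\mathbb{E}[X]$ and $\text{Var}(X)$ empirically. The sketch also shows a common pitfall: the notation's second parameter is the variance, whereas `random.gauss(mu, sigma)` expects the standard deviation. The parameter values and seed are arbitrary.

```python
import math
import random
import statistics

def gaussian_pdf(x, mu, sigma2):
    """Density p(x) of N(mu, sigma2); the second parameter is the variance."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 1.0, 4.0
rng = random.Random(0)  # seeded for reproducibility
# random.gauss takes the standard deviation, so pass sqrt(sigma2), not sigma2.
samples = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(100_000)]  # x ~ N(1, 4)

print(statistics.fmean(samples))      # sample estimate of E[X], close to 1.0
print(statistics.pvariance(samples))  # sample estimate of Var(X), close to 4.0
print(gaussian_pdf(0.0, 0.0, 1.0))    # 1 / sqrt(2 * pi), about 0.3989
```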

---

## 5. Linear Algebra
| Symbol       | Meaning |
|--------------|---------|
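| $\mathbf{X}^T$ | Transpose, with $(\mathbf{X}^T)_{i,j} = X_{j,i}$ |
| $\mathbf{W}^\dagger$ | Moore-Penrose pseudoinverse (also written $\mathbf{W}^+$) |
| $\|\mathbf{x}\|_2$ | L2 (Euclidean) norm, $\sqrt{\sum_i x_i^2}$ |
| $\|\mathbf{x}\|_1$ | L1 norm, $\sum_i \lvert x_i \rvert$ |
| $\mathbf{I}$ | Identity matrix |
| $\text{tr}(\mathbf{A})$ | Trace, $\sum_i A_{i,i}$ (sum of the diagonal entries) |
| $\det(\mathbf{A})$ | Determinant |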
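
A small pure-Python sketch of most of these operations, with matrices assumed to be lists of rows. The determinant uses cofactor expansion, which is clear but only practical for small matrices. The pseudoinverse is left out; in practice an array library such as NumPy provides it as `numpy.linalg.pinv`.

```python
import math

def transpose(A):
    """X^T: row i of the result is column i of A."""
    return [list(col) for col in zip(*A)]

def l2_norm(x):
    """||x||_2 = sqrt(sum_i x_i^2)."""
    return math.sqrt(sum(xi * xi for xi in x))

def l1_norm(x):
    """||x||_1 = sum_i |x_i|."""
    return sum(abs(xi) for xi in x)

def identity(n):
    """n x n identity matrix I."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def trace(A):
    """tr(A): sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def det(A):
    """det(A) by cofactor expansion along the first row (small matrices only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # drop row 0 and column j
        total += (-1) ** j * A[0][j] * det(minor)
    return total

A = [[1, 2], [3, 4]]
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
print(l2_norm([3, 4]), l1_norm([3, -4]))  # 5.0 7
print(trace(A), det(A))                   # 5 -2
```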

---

## 6. Neural Networks
| Symbol       | Meaning |
|--------------|---------|
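| $\sigma(z)$ | Activation function; often specifically the logistic sigmoid $1 / (1 + e^{-z})$ |
| $\text{ReLU}(z)$ | Rectified linear unit, $\max(0, z)$ |
| $\mathbf{h}^{(l)}$ | Activations (output) of hidden layer $l$ |
| $\mathbf{W}^{(l)}$ | Weight matrix of layer $l$ |
| $\mathbf{b}^{(l)}$ | Bias vector of layer $l$ |
| $\odot$ | Hadamard (element-wise) product |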
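
Putting these symbols together, a common convention for one fully connected layer is $\mathbf{h}^{(l)} = \sigma(\mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)})$. The sketch below assumes that convention (some texts use row vectors multiplied on the right instead) and uses ReLU as $\sigma$. The weights and inputs are arbitrary illustrative numbers.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hadamard(a, b):
    """Element-wise product a ⊙ b of two equal-length vectors."""
    return [ai * bi for ai, bi in zip(a, b)]

def dense_layer(W, h_prev, b, activation=relu):
    """h = activation(W h_prev + b), with W stored as a list of rows."""
    pre = [sum(w * h for w, h in zip(row, h_prev)) + bi for row, bi in zip(W, b)]
    return [activation(z) for z in pre]

W1 = [[1.0, -1.0], [2.0, 0.0]]   # illustrative weight matrix W^(1)
b1 = [0.0, 1.0]                  # illustrative bias vector b^(1)
h0 = [1.0, 2.0]                  # input, treated as h^(0)

h1 = dense_layer(W1, h0, b1)
print(h1)                                # [0.0, 3.0]
print(hadamard([1, 2, 3], [4, 5, 6]))    # [4, 10, 18]
print(sigmoid(0.0))                      # 0.5
```

Here the pre-activations are $[-1, 3]$, and ReLU zeroes the negative one.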

---

## 7. Special Notations
| Symbol       | Meaning |
|--------------|---------|
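| $\mathbb{1}$ | Indicator function: $\mathbb{1}[\text{condition}] = 1$ if the condition holds, else $0$ |
| $\delta_{ij}$ | Kronecker delta: $1$ if $i = j$, else $0$ |
| $\langle \mathbf{x}, \mathbf{y} \rangle$ | Dot (inner) product, $\sum_i x_i y_i$ |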
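
A short sketch of these three notations in Python. The accuracy example shows a typical use of the indicator, $\frac{1}{n} \sum_i \mathbb{1}[\hat{y}_i = y_i]$, and the identity matrix is built entry by entry as $I_{ij} = \delta_{ij}$. The labels are made up for illustration.

```python
def indicator(condition):
    """1[condition]: 1 if the condition holds, else 0."""
    return 1 if condition else 0

def kronecker_delta(i, j):
    """delta_ij: 1 if i == j, else 0."""
    return indicator(i == j)

def dot(x, y):
    """<x, y> = sum_i x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Accuracy written with the indicator: (1/n) * sum_i 1[y_hat_i == y_i].
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
accuracy = sum(indicator(p == t) for p, t in zip(y_pred, y_true)) / len(y_true)

# Identity matrix entries are exactly I_ij = delta_ij.
I3 = [[kronecker_delta(i, j) for j in range(3)] for i in range(3)]

print(accuracy)                     # 0.5
print(I3)                           # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(dot([1, 2, 3], [4, 5, 6]))    # 32
```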

---

## 8. Example: Gradient Descent Update
$$
\theta_{t+1} = \theta_t - \eta \nabla_\theta J(\theta_t)
$$
where:
- $\theta$ = model parameters
- $\eta$ = learning rate
- $\nabla_\theta J$ = gradient of the cost $J$ with respect to $\theta$
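
The update can be run end to end on a toy problem. This sketch assumes a linear model $\hat{y} = w x + b$, so $\theta = (w, b)$, and takes MSE as $J$. Its partial derivatives are $\partial J / \partial w = \frac{2}{n} \sum_i (\hat{y}_i - y_i) x_i$ and $\partial J / \partial b = \frac{2}{n} \sum_i (\hat{y}_i - y_i)$. The data are synthetic and noise-free, generated from $y = 2x + 1$, so $\theta$ should approach $(2, 1)$.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of J(w, b) = (1/n) * sum_i (w*x_i + b - y_i)^2 with respect to (w, b)."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2 / n * sum(r * x for r, x in zip(residuals, xs))
    db = 2 / n * sum(residuals)
    return dw, db

def gradient_descent(xs, ys, eta=0.05, steps=1000):
    """theta_{t+1} = theta_t - eta * grad J(theta_t), with theta = (w, b) starting at 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = grad_mse(w, b, xs, ys)
        w, b = w - eta * dw, b - eta * db   # update both parameters simultaneously
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]   # synthetic, noise-free data from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))   # 2.0 1.0
```

The choice of $\eta$ matters: for this particular data, the largest eigenvalue of the Hessian of $J$ is about $8.4$, so gradient descent diverges once $\eta$ exceeds roughly $2 / 8.4 \approx 0.24$.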