# **1. Scalars, Vectors, Matrices, Tensors**

---

## 1. **Scalar**

* **Definition:** A single number (real or complex).
* **Dimension:** 0D (just a value).
* **Examples:**

  * Temperature = 37°C
  * Learning rate $\eta = 0.01$
* **Notation:** Usually lowercase letters (e.g., $a, x, y$).

✔️ Think of a scalar as just one point on the number line.

---

## 2. **Vector**

* **Definition:** An **ordered list of numbers** (1D array). Represents magnitude + direction.
* **Dimension:** 1D
* **Examples:**

  * Position in 3D space: $v = [x, y, z]$
  * Feature vector: $[height, weight, age]$
* **Notation:** Bold lowercase letters ($\mathbf{v}$), or with arrow ($\vec{v}$).
* **Operations:**

  * Addition: $[1,2] + [3,4] = [4,6]$
  * Dot product: $\mathbf{a} \cdot \mathbf{b} = \sum a_i b_i$

✔️ Think of a vector as a **list of features** or a point in space.
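
The two operations above can be sketched in plain Python. The helper names `vec_add` and `dot` are illustrative choices, not part of any library, and vectors are assumed to be simple lists of numbers.

```python
def vec_add(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    """Dot product: sum of the products of matching components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


print(vec_add([1, 2], [3, 4]))  # [4, 6]
print(dot([1, 2], [3, 4]))      # 1*3 + 2*4 = 11
```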

---

## 3. **Matrix**

* **Definition:** A **2D array** of numbers arranged in rows and columns.
* **Dimension:** 2D
* **Examples:**

  * $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
  * Image in grayscale = matrix of pixel values.
* **Notation:** Bold uppercase letters ($\mathbf{A}, \mathbf{B}$).
* **Operations:**

  * Matrix Addition: Add element-wise.
  * Matrix Multiplication: Row × Column rule.
  * Transpose: Flip rows ↔ columns.

✔️ Think of a matrix as a **table of values** or **multiple vectors stacked**.

---

## 4. **Tensor**

* **Definition:** A **generalization of scalars, vectors, and matrices** to higher dimensions.
* **Dimension:** nD (multi-dimensional arrays).
* **Examples:**

  * Scalar → 0D Tensor
  * Vector → 1D Tensor
  * Matrix → 2D Tensor
  * Color Image → 3D Tensor (Height × Width × Channels)
  * Batch of Images → 4D Tensor (Batch × Height × Width × Channels)
* **Notation:** Bold script letters or calligraphic letters ($\mathcal{T}$).

✔️ Think of a tensor as a **container of numbers with multiple axes (dimensions)**.
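
To make the "number of axes" idea concrete, here is a minimal sketch that reads the shape off a nested Python list. The `shape` helper is hypothetical and assumes regular (non-ragged) nesting; array libraries expose the same information directly.

```python
def shape(t):
    """Return the shape of a regularly nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None  # descend along the first element
    return tuple(dims)


scalar = 5
vector = [1, 2, 3]
matrix = [[1, 2], [3, 4]]
# A tiny "color image": 2 rows x 4 columns x 3 channels
image = [[[0, 0, 0] for _ in range(4)] for _ in range(2)]

print(shape(scalar))  # ()        -> 0 axes
print(shape(vector))  # (3,)      -> 1 axis
print(shape(matrix))  # (2, 2)    -> 2 axes
print(shape(image))   # (2, 4, 3) -> 3 axes
```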

---

## Quick Visualization

| Object | Dimension | Example                                        | ML Use Case                  |
| ------ | --------- | ---------------------------------------------- | ---------------------------- |
| Scalar | 0D        | $a = 5$                                        | Learning rate, loss value    |
| Vector | 1D        | $[1, 2, 3]$                                    | Feature vector               |
| Matrix | 2D        | $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ | Weight matrix in NN          |
| Tensor | nD        | Image = $(64, 64, 3)$                          | Data representation in ML/DL |

---
---
---

# **2. Matrix Operations**

---

## 1. **Matrix Addition**

* **Rule:** Add corresponding elements (same dimensions required).
* If $A$ and $B$ are both $m \times n$, then:

  $$
  (A + B)_{ij} = A_{ij} + B_{ij}
  $$

**Example:**

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
$$

$$
A + B = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}
$$

✔️ Simple element-wise operation.
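
A minimal sketch of the rule in plain Python, with matrices stored as lists of rows. The name `mat_add` is an illustrative choice.

```python
def mat_add(A, B):
    """Element-wise sum of two matrices with identical shapes."""
    same_shape = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
```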

---

## 2. **Matrix Multiplication**

Two types: **Scalar × Matrix** and **Matrix × Matrix**

### (a) Scalar Multiplication

Multiply each entry by a scalar $k$.

$$
kA = \begin{bmatrix} ka_{11} & ka_{12} \\ ka_{21} & ka_{22} \end{bmatrix}
$$

Example:

$$
2 \times \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}
$$

---

### (b) Matrix × Matrix Multiplication

* **Rule:** Row of 1st × Column of 2nd.
* If $A$ is $m \times n$, $B$ must be $n \times p$.
* Result = $m \times p$.

Formula:

$$
(A \times B)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}
$$

**Example:**

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
$$

$$
A \times B =
\begin{bmatrix}
(1\cdot5 + 2\cdot7) & (1\cdot6 + 2\cdot8) \\
(3\cdot5 + 4\cdot7) & (3\cdot6 + 4\cdot8)
\end{bmatrix}
=
\begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
$$

✔️ Order matters: $A \times B \neq B \times A$ in general.
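
The row × column rule translates almost directly into code. This is a sketch using lists of rows; `matmul` is a hypothetical helper name, and it reuses the example matrices above to show that swapping the order changes the result.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(matmul(B, A))  # [[23, 34], [31, 46]]  (different from A x B)
```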

---

## 3. **Matrix Transpose**

* **Rule:** Flip over diagonal → rows become columns.
* If $A$ is $m \times n$, then $A^T$ is $n \times m$.

$$
(A^T)_{ij} = A_{ji}
$$

**Example:**

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
$$

$$
A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
$$

✔️ Useful in dot products, orthogonality, and linear transformations.
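
In Python, a transpose of a list-of-rows matrix can be sketched with `zip(*A)`, which regroups the entries column by column. The function name `transpose` is illustrative.

```python
def transpose(A):
    """Turn the rows of A into columns: result[i][j] == A[j][i]."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]
print(transpose(transpose(A)) == A)  # True: transposing twice gives A back
```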

---

## ⚡ Quick Summary

* **Addition:** Same size → add element-wise.
* **Multiplication:** Row × Column (inner dimensions must match: $(m \times n)(n \times p) \to m \times p$).
* **Transpose:** Flip rows ↔ columns.

---
---
---

# **3. Identity & Inverse Matrices**

---

## 1. **Identity Matrix** ($I$)

* **Definition:** A **square matrix** with **1’s on the diagonal** and **0’s elsewhere**.
* Acts like **“1” for matrices** under multiplication.
* Size: $n \times n$.

$$
I = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

**Property:**

$$
AI = IA = A
$$

✔️ Think of it as the **neutral element** in matrix multiplication.

---

## 2. **Inverse Matrix** ($A^{-1}$)

* **Definition:** For a square matrix $A$, its inverse $A^{-1}$ is defined such that:

$$
A A^{-1} = A^{-1} A = I
$$

* **Conditions:**

  1. $A$ must be **square** ($n \times n$).
  2. $A$ must be **non-singular** (determinant ≠ 0).

---

### **How to Compute Inverse (2×2 Case)**

If

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

then

$$
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
$$

where $ad - bc = \det(A)$.

**Example:**

$$
A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}
$$

$$
\det(A) = (2)(3) - (1)(5) = 6 - 5 = 1
$$

$$
A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}
$$
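
The 2×2 formula can be checked numerically. The sketch below uses `fractions.Fraction` so the arithmetic stays exact; `inverse_2x2` is a hypothetical helper name, and the final lines verify $A A^{-1} = I$ for the example above.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] via (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0)")
    f = Fraction(1) / Fraction(det)
    return [[d * f, -b * f], [-c * f, a * f]]


A = [[2, 1], [5, 3]]
A_inv = inverse_2x2(A)
print(A_inv == [[3, -1], [-5, 2]])  # True

# Verify that A times its inverse is the identity matrix.
product = [
    [sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
    for i in range(2)
]
print(product == [[1, 0], [0, 1]])  # True
```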

---

### **Key Properties of Inverses**

* $(A^{-1})^{-1} = A$
* $(AB)^{-1} = B^{-1}A^{-1}$ (note the **reverse order**)
* $(A^T)^{-1} = (A^{-1})^T$
* If $\det(A) = 0$ → **No inverse** (singular matrix).

---

## ⚡ Why Important in ML?

* In **Linear Regression**:
  The least-squares solution of $y \approx X\beta$ (assuming $X^T X$ is invertible) is:

  $$
  \hat{\beta} = (X^T X)^{-1} X^T y
  $$
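* In **PCA**: Eigendecomposition $A = Q \Lambda Q^{-1}$ uses the inverse of the eigenvector matrix $Q$; for a symmetric covariance matrix, $Q^{-1} = Q^T$.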
* In **Optimization**: Newton’s method uses Hessian inverse.

---
---
---

# **4. Determinant & Rank of a Matrix**

---

## 1. **Determinant** ($\det(A)$ or $|A|$)

* **Definition:** A scalar value that represents the **scaling factor** of a matrix transformation.
* Defined only for **square matrices** ($n \times n$).

---

### **Geometric Meaning**

* $|\det(A)|$ = area (2D) or volume (3D) scaling factor after applying matrix $A$; a negative determinant means the transformation also flips orientation.
* If $\det(A) = 0$ → transformation **collapses space** (loses dimension).

---

### **Computation**

* **2×2 Matrix:**

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \det(A) = ad - bc
$$

* **3×3 Matrix:**

$$
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
$$

$$
\det(A) = a(ei - fh) - b(di - fg) + c(dh - eg)
$$
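
The 3×3 cofactor expansion above maps one-to-one onto code. This is a sketch with a hypothetical helper `det3`; the two test matrices are made-up examples, one invertible and one singular.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


invertible = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
singular = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(det3(invertible))  # -3 (non-zero, so an inverse exists)
print(det3(singular))    # 0  (rows are linearly dependent, no inverse)
```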

---

### **Key Properties**

* $\det(AB) = \det(A) \cdot \det(B)$
* $\det(A^T) = \det(A)$
* If $\det(A) = 0$, matrix is **singular** (no inverse).

---

✔️ **ML Use:**

* Inverse exists iff $\det(A) \neq 0$.
* Covariance matrix determinant → volume of distribution spread.
* Jacobian determinant → change of variables in probability (used in normalizing flows).

---

## 2. **Rank of a Matrix**

* **Definition:** Maximum number of **linearly independent rows or columns** in a matrix (the row count and the column count always agree).
* Rank tells how much **useful information** a matrix has.

---

### **Interpretation**

* **Full Rank:** Rank = min(rows, cols). → No redundancy.
* **Rank Deficient:** Rank < min(rows, cols). → Some rows/cols are linear combinations of others.

---

### **Examples**

**Example 1:**

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}
$$

Here row2 = 3×row1 → **Rank = 1** (not full rank).

**Example 2:**

$$
B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

Rank = 2 (full rank).

---

### **How to Find Rank**

* Reduce matrix to **Row Echelon Form (REF)** or **Reduced Row Echelon Form (RREF)** → count nonzero rows.
* Alternatively → number of non-zero singular values (from SVD).
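
The row-reduction approach can be sketched as follows: eliminate below each pivot and count how many pivots were found. Exact `Fraction` arithmetic is an assumption made here to avoid floating-point tolerance issues; the function name `rank` is illustrative.

```python
from fractions import Fraction


def rank(M):
    """Rank via Gaussian elimination to row echelon form (counts pivot rows)."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


print(rank([[1, 2], [3, 6]]))  # 1 (second row is 3 x first row)
print(rank([[1, 0], [0, 1]]))  # 2 (full rank)
```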

---

### **Key Facts**

* $\text{rank}(A) \leq \min(m, n)$.
* If rank < n (columns), then system $Ax=b$ has **no unique solution** (it has either no solution or infinitely many).
* Rank is crucial in **linear regression, PCA, and dimensionality reduction**.

---

## ⚡ Summary

* **Determinant:** Scalar that shows scaling + invertibility.
* **Rank:** Number of independent vectors (information capacity).
* If $\det(A) = 0 \Rightarrow \text{rank}(A) < n$.

---
---
---


# **5. Dot Product & Cross Product**

---

## 1. **Dot Product** ($\mathbf{a} \cdot \mathbf{b}$)

### **Definition:**

* A scalar value (not a vector).
* Measures **similarity** between two vectors.

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^n a_i b_i
$$

or geometrically:

$$
\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos\theta
$$

where $\theta$ is the angle between vectors.

---

### **Example:**

$$
\mathbf{a} = [1, 2, 3], \quad \mathbf{b} = [4, 5, 6]
$$

$$
\mathbf{a} \cdot \mathbf{b} = 1\cdot4 + 2\cdot5 + 3\cdot6 = 32
$$

---

### **Key Properties:**

* $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$ (commutative).
* $\mathbf{a} \cdot \mathbf{a} = \|\mathbf{a}\|^2$.
* If $\mathbf{a} \cdot \mathbf{b} = 0 \Rightarrow \mathbf{a}, \mathbf{b}$ are **orthogonal**.

---

✔️ **ML Use Cases:**

* Cosine Similarity:
  $\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\|\mathbf{b}\|}$ (used in NLP, embeddings).
* Loss functions & projections.
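
Cosine similarity follows directly from the two dot product formulas. A minimal sketch, using the example vectors above; `dot` and `cosine_similarity` are hypothetical helper names.

```python
import math


def dot(a, b):
    """Sum of component-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||); undefined for zero vectors."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


a = [1, 2, 3]
b = [4, 5, 6]
print(dot(a, b))                              # 32
print(round(cosine_similarity(a, b), 4))      # 0.9746
print(cosine_similarity([1, 0], [0, 1]))      # 0.0 (orthogonal vectors)
```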

---

## 2. **Cross Product** ($\mathbf{a} \times \mathbf{b}$)

### **Definition:**

* A vector (not a scalar).
* Only defined in **3D space**.
* Result is **perpendicular** to both $\mathbf{a}, \mathbf{b}$.

$$
\mathbf{a} \times \mathbf{b} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
a_1 & a_2 & a_3 \\
b_1 & b_2 & b_3
\end{vmatrix}
$$

$$
= (a_2b_3 - a_3b_2)\mathbf{i} - (a_1b_3 - a_3b_1)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k}
$$

---

### **Example:**

$$
\mathbf{a} = [1, 2, 3], \quad \mathbf{b} = [4, 5, 6]
$$

$$
\mathbf{a} \times \mathbf{b} =
\begin{bmatrix} (2\cdot6 - 3\cdot5), & (3\cdot4 - 1\cdot6), & (1\cdot5 - 2\cdot4) \end{bmatrix}
$$

$$
= [-3, 6, -3]
$$
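
The component formula can be verified in a few lines. This sketch defines a hypothetical `cross` helper and checks that the result is orthogonal to both inputs by taking dot products.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return [a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [4, 5, 6]
c = cross(a, b)
print(c)                      # [-3, 6, -3]
print(dot(c, a), dot(c, b))   # 0 0 (perpendicular to both a and b)
print(cross(b, a))            # [3, -6, 3] (sign flips when order is swapped)
```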

---

### **Geometric Meaning:**

* Magnitude = area of parallelogram formed by $\mathbf{a}, \mathbf{b}$.

$$
\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\|\|\mathbf{b}\|\sin\theta
$$

---

### **Key Properties:**

* $\mathbf{a} \times \mathbf{b} = -(\mathbf{b} \times \mathbf{a})$.
* $\mathbf{a} \times \mathbf{a} = \mathbf{0}$.
* Result is **orthogonal** to both vectors.

---

✔️ **ML Use Cases:**

* Less common than dot product in ML, but appears in **3D computer vision, graphics, robotics**.
* Useful in calculating **normal vectors** for 3D planes.

---

## ⚡ Summary

| Operation     | Result | Dimension | Key Use Case                               |
| ------------- | ------ | --------- | ------------------------------------------ |
| Dot Product   | Scalar | Any nD    | Similarity, projections, embeddings        |
| Cross Product | Vector | Only 3D   | Geometry, 3D ML/vision, orthogonal vectors |

---
---
---


# **6. Vector Norms (L1, L2, ∞ Norm)**

---

## 1. **What is a Norm?**

* A **function** that assigns a non-negative length (magnitude) to a vector.
* Must satisfy:

  1. $\|\mathbf{x}\| \geq 0$ and $\|\mathbf{x}\| = 0 \iff \mathbf{x} = 0$
  2. $\|c \mathbf{x}\| = |c|\|\mathbf{x}\|$ (scaling property)
  3. $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$ (triangle inequality)

---

## 2. **L1 Norm (Manhattan Norm / Taxicab Norm)**

$$
\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|
$$

* Distance = “grid-like path” (like moving on city streets).
* As a regularization penalty, encourages **sparsity** (many model coefficients are driven exactly to 0).

**Example:**

$$
\mathbf{x} = [3, -4, 5]
$$

$$
\|\mathbf{x}\|_1 = |3| + |-4| + |5| = 12
$$

✔️ **ML Use:** Lasso Regression (L1 regularization).

---

## 3. **L2 Norm (Euclidean Norm)**

$$
\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}
$$

* Standard **Euclidean distance** (straight-line).
* Differentiable everywhere except at $\mathbf{0}$; its square $\|\mathbf{x}\|_2^2$ is smooth everywhere → good for optimization.

**Example:**

$$
\mathbf{x} = [3, -4, 5]
$$

$$
\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2 + 5^2} = \sqrt{50} \approx 7.07
$$

✔️ **ML Use:** Ridge Regression (L2 regularization), Gradient Descent step sizes.

---

## 4. **Infinity Norm (Max Norm, $L_\infty$)**

$$
\|\mathbf{x}\|_\infty = \max_i |x_i|
$$

* Largest absolute value among vector elements.
* Focuses on the “worst-case” component.

**Example:**

$$
\mathbf{x} = [3, -4, 5]
$$

$$
\|\mathbf{x}\|_\infty = \max(|3|, |-4|, |5|) = 5
$$

✔️ **ML Use:** Robust optimization, adversarial ML (bounding perturbations).

---

## 5. **Comparison of Norms**

For $\mathbf{x} = [3, -4, 5]$:

| Norm       | Formula                                                    | Value |
| ---------- | ---------------------------------------------------------- | ----- |
| L1         | $\lvert 3 \rvert + \lvert -4 \rvert + \lvert 5 \rvert$     | 12    |
| L2         | $\sqrt{3^2 + (-4)^2 + 5^2}$                                | 7.07  |
| $L_\infty$ | $\max(\lvert 3 \rvert, \lvert -4 \rvert, \lvert 5 \rvert)$ | 5     |
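
The three norms can be computed side by side in a few lines. A minimal sketch with illustrative function names, using only the standard library:

```python
import math


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


def linf_norm(x):
    """Largest absolute value."""
    return max(abs(v) for v in x)


x = [3, -4, 5]
print(l1_norm(x))            # 12
print(round(l2_norm(x), 2))  # 7.07
print(linf_norm(x))          # 5
```

For any vector, $\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_2 \leq \|\mathbf{x}\|_1$, which matches the values 5, 7.07, and 12 here.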

---

## ⚡ ML Intuition

* **L1** → Sparsity (feature selection).
* **L2** → Stability, smooth solutions.
* **$L_\infty$** → Robustness against max perturbation.

---
---
---

# **7. Linear Independence, Basis, Dimension**

---

## 1. **Linear Independence**

* A set of vectors $\{v_1, v_2, ..., v_n\}$ is **linearly independent** if **no vector can be written as a linear combination of others**.
* Otherwise, they are **linearly dependent**.

**Formally:**

$$
c_1v_1 + c_2v_2 + \dots + c_nv_n = 0 \quad \Rightarrow \quad c_1 = c_2 = \dots = c_n = 0
$$

---

**Example (Independent):**

$$
v_1 = [1, 0], \quad v_2 = [0, 1]
$$

No way to express one using the other → Independent.

**Example (Dependent):**

$$
v_1 = [1, 2], \quad v_2 = [2, 4]
$$

Here $v_2 = 2v_1$ → Dependent.

✔️ **ML Use:** Linear independence means features contain **unique information**.
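
For two vectors in 2D, independence can be tested with a determinant: the vectors are independent exactly when the 2×2 matrix built from them has non-zero determinant. A sketch, assuming integer (or otherwise exact) inputs so that comparing with 0 is safe; `independent_2d` is a hypothetical name.

```python
def independent_2d(v1, v2):
    """True if two 2D vectors are linearly independent (non-zero determinant)."""
    return v1[0] * v2[1] - v1[1] * v2[0] != 0


print(independent_2d([1, 0], [0, 1]))  # True
print(independent_2d([1, 2], [2, 4]))  # False, since v2 = 2 * v1
```

For more vectors or higher dimensions, the usual check is whether the rank of the stacked matrix equals the number of vectors.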

---

## 2. **Basis of a Vector Space**

* A **basis** is a **set of linearly independent vectors** that can represent **all vectors in the space** via linear combinations.
* Basis vectors are like the **"coordinate system"** for the space.

**Example:**

* Standard basis of $\mathbb{R}^2$:

$$
e_1 = [1, 0], \quad e_2 = [0, 1]
$$

Any vector $[x, y]$ can be written as $x e_1 + y e_2$.

✔️ **Different bases** can represent the same space (e.g., rotated axes).

✔️ **ML Use:** PCA finds a **new basis** (principal components) that explains variance.

---

## 3. **Dimension of a Vector Space**

* The **dimension** of a space = number of vectors in its **basis**.
* It represents the **degrees of freedom** or amount of independent information.

**Examples:**

* Line ($\mathbb{R}^1$) → Dimension = 1
* Plane ($\mathbb{R}^2$) → Dimension = 2
* 3D Space ($\mathbb{R}^3$) → Dimension = 3

✔️ **ML Use:**

* Dimensionality = number of features.
* Dimensionality reduction (PCA, autoencoders) → reduce redundancy.

---

## ⚡ Quick Summary

| Concept             | Meaning                                            | ML Connection                     |
| ------------------- | -------------------------------------------------- | --------------------------------- |
| Linear Independence | Vectors don’t overlap in information               | Ensures features aren’t redundant |
| Basis               | Linearly independent set of vectors that spans the space | PCA basis, embeddings             |
| Dimension           | Number of vectors in basis                         | Feature space size                |

---
---
---

# **8. Eigenvalues & Eigenvectors**

---

## 1. **Definition**

For a square matrix $A$:

$$
A \mathbf{v} = \lambda \mathbf{v}
$$

* $\mathbf{v}$ = **Eigenvector** (non-zero vector).
* $\lambda$ = **Eigenvalue** (scalar).

👉 Interpretation: Applying $A$ to $\mathbf{v}$ only **scales it**; $\mathbf{v}$ stays on the same line through the origin (flipped if $\lambda < 0$).

---

## 2. **How to Find Them**

* From equation:

$$
(A - \lambda I)\mathbf{v} = 0
$$

* Non-trivial solution exists if:

$$
\det(A - \lambda I) = 0
$$

This gives the **characteristic polynomial**, solving it → eigenvalues $\lambda$.

* Substitute $\lambda$ back to solve for $\mathbf{v}$.

---

## 3. **Example (2×2 Case)**

$$
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
$$

Characteristic equation:

$$
\det(A - \lambda I) =
\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix}
= (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3
$$

Solve: $\lambda^2 - 4\lambda + 3 = 0 \Rightarrow \lambda = 1, 3$.

For $\lambda = 3$:

$$
(A - 3I)v = 0 \Rightarrow \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} v = 0
$$

Eigenvector: $[1, 1]^T$.

For $\lambda = 1$:
Eigenvector: $[1, -1]^T$.
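
For a 2×2 matrix the characteristic polynomial is $\lambda^2 - \text{trace}(A)\,\lambda + \det(A) = 0$, so the eigenvalues come from the quadratic formula. The sketch below (hypothetical helper names, real eigenvalues only) reproduces the example and checks $A\mathbf{v} = \lambda\mathbf{v}$ for both eigenvectors.

```python
import math


def eigenvalues_2x2(M):
    """Real eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = M
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (tr - root) / 2, (tr + root) / 2


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[2, 1], [1, 2]]
print(eigenvalues_2x2(A))   # (1.0, 3.0)
print(matvec(A, [1, 1]))    # [3, 3]  = 3 * [1, 1]
print(matvec(A, [1, -1]))   # [1, -1] = 1 * [1, -1]
```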

---

## 4. **Geometric Meaning**

* Eigenvectors = **special directions** that remain unchanged (up to scaling) under transformation.
* Eigenvalues = scaling factors along those directions.

✔️ Example: Stretching a rubber sheet → eigenvectors = directions of stretch, eigenvalues = stretch amount.

---

## 5. **Key Properties**

* An $n \times n$ matrix has exactly $n$ eigenvalues counted with multiplicity (possibly complex), so at most $n$ distinct ones.
* $\det(A) = \prod \lambda_i$.
* $\text{trace}(A) = \sum \lambda_i$.
* If $A$ is **symmetric** and all eigenvalues > 0 → matrix is **positive definite** (important in optimization).

---

## 6. **ML Use Cases**

### 🔹 **Principal Component Analysis (PCA)**

* Covariance matrix $C = \frac{1}{n}X^TX$, where $X$ holds the mean-centered data with one sample per row.
* Eigenvectors of $C$ = **principal directions** (new feature axes).
* Eigenvalues = variance explained by each principal component.

---

### 🔹 **Optimization (Convexity & Curvature)**

* Eigenvalues of the Hessian matrix $H$ (second derivatives) at a critical point, where the gradient is zero:

  * If all eigenvalues > 0 → locally convex (local min).
  * If all eigenvalues < 0 → locally concave (local max).
  * Mixed signs → saddle point.

---

### 🔹 **Other Uses**

* Spectral clustering (graph Laplacians).
* PageRank (Google) uses eigenvector centrality.
* Deep learning stability → weight matrix eigenvalues control gradient explosion/vanishing.

---

## ⚡ Summary

| Concept          | Meaning                               | ML Connection                              |
| ---------------- | ------------------------------------- | ------------------------------------------ |
| Eigenvector      | Special direction preserved by matrix | PCA axes, graph embeddings                 |
| Eigenvalue       | Scaling factor along eigenvector      | Variance explained, curvature strength     |
| Large eigenvalue | Strong effect in that direction       | Dominant feature/component                 |
| Small eigenvalue | Weak/flat direction                   | Can be removed in dimensionality reduction |

---
---
---

# **9. Orthogonality & Projections**

---

## 1. **Orthogonality**

### **Definition**

* Two vectors $\mathbf{a}, \mathbf{b}$ are **orthogonal** if:

$$
\mathbf{a} \cdot \mathbf{b} = 0
$$

* Means they are **perpendicular** in space.

---

### **Properties**

* Non-zero orthogonal vectors are **linearly independent**.
* An **orthogonal basis** = set of mutually perpendicular vectors.
* If all basis vectors are unit length → **orthonormal basis**.

---

**Example:**

$$
\mathbf{a} = [1, 0], \quad \mathbf{b} = [0, 1]
$$

$\mathbf{a} \cdot \mathbf{b} = 0$ → orthogonal.

✔️ **ML Use:** Word embeddings, PCA (principal components are orthogonal).

---

## 2. **Projection of a Vector onto Another**

### **Definition**

Projection of vector $\mathbf{a}$ onto vector $\mathbf{b}$:

$$
\text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|^2} \mathbf{b}
$$

---

### **Geometric Meaning**

* It’s the **shadow** of $\mathbf{a}$ onto the direction of $\mathbf{b}$.
* Decomposes $\mathbf{a}$ into:

  $$
  \mathbf{a} = \text{proj}_{\mathbf{b}}(\mathbf{a}) + \mathbf{a}_\perp
  $$

  where $\mathbf{a}_\perp$ is orthogonal to $\mathbf{b}$.

---

### **Example**

$$
\mathbf{a} = [3, 4], \quad \mathbf{b} = [1, 0]
$$

$$
\text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{(3)(1) + (4)(0)}{1^2 + 0^2}[1, 0] = [3, 0]
$$

So $[3,4]$ is decomposed into $[3,0]$ (parallel to $\mathbf{b}$) and $[0,4]$ (orthogonal).
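
The projection formula and the decomposition can be sketched in code as follows. The helper names `dot` and `project` are illustrative; the last line confirms that the leftover component is orthogonal to $\mathbf{b}$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(a, b):
    """Projection of a onto b: ((a . b) / (b . b)) * b."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = dot(a, b) / bb
    return [scale * x for x in b]


a = [3, 4]
b = [1, 0]
parallel = project(a, b)
perp = [x - p for x, p in zip(a, parallel)]

print(parallel)      # [3.0, 0.0]
print(perp)          # [0.0, 4.0]
print(dot(perp, b))  # 0.0 (orthogonal to b)
```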

---

## 3. **Projections onto Subspaces (Generalized)**

* For a matrix $X$ whose columns are linearly independent (a basis of the subspace), the projection of vector $y$ onto the subspace spanned by the columns of $X$ is:

$$
\hat{y} = X(X^TX)^{-1}X^T y
$$

✔️ This is exactly the formula used in **Least Squares Regression** (fitting line/plane to data).
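
As a small illustration, the sketch below fits a line $y \approx \beta_0 + \beta_1 x$ by solving the normal equations $(X^TX)\beta = X^Ty$ in closed form, where $X$ has a column of ones and a column of $x$ values. The data points are made up, integer inputs are assumed so that `Fraction` keeps the arithmetic exact, and `fit_line` is a hypothetical name. The residual $y - \hat{y}$ comes out orthogonal to both columns of $X$, which is what "projection onto the column space" means.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Least-squares intercept and slope from the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular (all x values are identical)")
    intercept = Fraction(sxx * sy - sx * sxy, det)
    slope = Fraction(n * sxy - sx * sy, det)
    return intercept, slope


xs = [0, 1, 2]
ys = [1, 2, 2]
intercept, slope = fit_line(xs, ys)
y_hat = [intercept + slope * x for x in xs]       # projection of y
residual = [y - yh for y, yh in zip(ys, y_hat)]   # part of y outside the subspace

print(intercept, slope)                             # 7/6 1/2
print(sum(residual))                                # 0 (orthogonal to the ones column)
print(sum(x * r for x, r in zip(xs, residual)))     # 0 (orthogonal to the x column)
```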

---

## 4. **ML Use Cases**

* **PCA** → Projects data onto eigenvector directions (principal components).
* **Least Squares Regression** → Projects target $y$ onto the column space of $X$.
* **Embeddings** → Orthogonal directions carry non-overlapping information (zero dot product; uncorrelated when the data are mean-centered).
* **Orthogonalization** → Gram-Schmidt used to build orthogonal basis.

---

## ⚡ Quick Summary

| Concept                        | Formula                                                         | ML Connection                |
| ------------------------------ | --------------------------------------------------------------- | ---------------------------- |
| Orthogonality                  | $\mathbf{a} \cdot \mathbf{b} = 0$                               | Ensures independent features |
| Projection (onto $\mathbf{b}$) | $\frac{\mathbf{a}\cdot \mathbf{b}}{\|\mathbf{b}\|^2}\mathbf{b}$ | Used in regression, PCA      |
| Orthonormal basis              | Unit-length orthogonal vectors                                  | PCA, embeddings              |

---
---
---