# **1. Linear Algebra: Scalars & Vectors**

---

## 🔹 1. **What is a Scalar?**

* A **scalar** is a single number (value) representing **magnitude only**, with no direction.
* It can be an **integer**, **real number**, or **complex number**.

### 🔢 Examples:

* 5, -3.2, 0, π
* In Python: `a = 5` is a scalar.

### 📌 **Notation**:

* Usually denoted as: lowercase letters like `a`, `b`, `α`, `β`.

### 🤖 **In AI/ML:**

* Scalars appear as:

  * **Learning rate (α)** in gradient descent.
  * **Loss value** after a forward pass in training.
  * **Single pixel intensity** in grayscale images.

---

## 🔹 2. **What is a Vector?**

A **vector** is an **ordered list of numbers** (scalars) that has both:

* **Magnitude**
* **Direction**

### 🔢 Example:

* `v = [3, 4]` is a vector in 2D space
* `x = [5.2, -1.7, 3.0]` is a vector in 3D space

### 📌 **Notation**:

* Bold lowercase: **v**, **x**, or with arrow: →𝑣
* In programming (Python/NumPy): `v = np.array([3, 4])`

### 🧠 **Intuition**:

* A vector is like an arrow pointing from origin to a coordinate point.
* In 2D/3D: easy to visualize.
* In higher dimensions (e.g., 300D word embeddings), it’s still a direction in space, just not visually intuitive.

---

## 🔹 3. **Types of Vectors**:

| Type               | Description                     | Example           |
| ------------------ | ------------------------------- | ----------------- |
| **Row Vector**     | 1 × n matrix                    | `[1, 2, 3]`       |
| **Column Vector**  | n × 1 matrix                    | `[[1], [2], [3]]` |
| **Zero Vector**    | All elements are 0              | `[0, 0, 0]`       |
| **Unit Vector**    | Magnitude = 1                   | `[1/√2, 1/√2]`    |
| **One-Hot Vector** | Only one element is 1, others 0 | `[0, 0, 1, 0]`    |
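
The vector types in the table can be built directly in NumPy. This is a minimal sketch; the variable names and the one-hot class index are illustrative choices, not part of the notes above.

```python
import numpy as np

row = np.array([[1, 2, 3]])            # shape (1, 3): row vector
col = np.array([[1], [2], [3]])        # shape (3, 1): column vector
zero = np.zeros(3)                     # zero vector
unit = np.array([1, 1]) / np.sqrt(2)   # unit vector: its length is 1

# One-hot vector for (illustrative) class index 2 out of 4 classes
one_hot = np.zeros(4)
one_hot[2] = 1.0

print(row.shape, col.shape)   # (1, 3) (3, 1)
print(one_hot)                # [0. 0. 1. 0.]
```

Transposing the row vector (`row.T`) gives the column vector, which is why the two are often treated as the same data with a different shape.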

---

## 🔹 4. **Key Operations on Vectors**

| Operation                 | Formula            | Meaning                                                                                        |
| ------------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
| **Addition**              | `a + b`            | Add elements position-wise                                                                     |
| **Scalar Multiplication** | `k * v`            | Multiply each element by a scalar                                                              |
| **Dot Product**           | `a • b = Σ aᵢbᵢ`   | Alignment of two vectors, scaled by both magnitudes (divide by `‖a‖‖b‖` for cosine similarity) |
| **Norm (Magnitude)**      | `‖v‖ = √(Σ vᵢ²)`   | Length of the vector                                                                           |

### ✳️ Example:

```python
import numpy as np
v = np.array([3, 4])
magnitude = np.linalg.norm(v)  # → 5.0
```
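
The other operations in the table can be sketched the same way. The vectors `a` and `b` below are illustrative values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

added = a + b        # position-wise addition -> [5 7 9]
scaled = 2 * a       # scalar multiplication  -> [2 4 6]
dot = np.dot(a, b)   # 1*4 + 2*5 + 3*6 = 32
```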

---

## 🔹 5. **Real Use-Cases of Vectors in AI**

| Domain                     | How Vectors Are Used                                             |
| -------------------------- | ---------------------------------------------------------------- |
| **NLP**                    | Word embeddings (e.g., Word2Vec turns "king" into a 300D vector) |
| **Computer Vision**        | Flattened image pixels as high-dimensional vectors               |
| **Recommendation Systems** | User/item preferences as feature vectors                         |
| **Optimization**           | Gradients are vectors used to update model weights               |
| **Clustering (ML)**        | Each data point is a vector in feature space                     |

---

## 🔹 6. **Short Tricks / Memory Hacks**

* ✅ Remember: **scalars** = size only, **vectors** = size + direction
* ✅ Dot product tells **how aligned** two vectors are.
* ✅ A unit vector just tells you the **pure direction**.

---

## 📌 **Summary (Flash Notes)**

| Concept    | Scalar              | Vector                          |
| ---------- | ------------------- | ------------------------------- |
| Definition | Single value        | List of values                  |
| Direction  | ❌ No                | ✅ Yes                           |
| Visual     | Point               | Arrow                           |
| Notation   | `a`, `α`            | `v`, `→v`, bold lowercase       |
| Used In    | Learning rate, loss | Features, embeddings, gradients |
| Examples   | `5`, `π`            | `[2, -1]`, `[1, 0, 0]`          |

---
---
---
---

# **2. Linear Algebra: Norm, Vector Space, Cosine Similarity & Basic Terms**

---

## 🔹 1. **Norm of a Vector**

The **norm** of a vector is a measure of its **length or magnitude**. It tells you **how far** the vector is from the origin in space.

### 📌 **Common Types of Norms**:

| Type                    | Formula              | Description                               | Use in AI                       |
| ----------------------- | -------------------- | ----------------------------------------- | ------------------------------- |
| **L1 Norm (Manhattan)** | `‖v‖₁ = Σ \|vᵢ\|`    | Sum of absolute values                    | Sparsity (Lasso regularization) |
| **L2 Norm (Euclidean)** | `‖v‖₂ = √(Σ vᵢ²)`    | Euclidean distance from the origin        | Most common in ML & DL          |
| **Infinity Norm (Max)** | `‖v‖∞ = max \|vᵢ\|`  | Largest absolute value among the elements | Rare, used in adversarial ML    |

### 🧠 **Example**:

Let `v = [3, 4]`

* L2 norm: `‖v‖₂ = √(3² + 4²) = √25 = 5`

```python
import numpy as np
v = np.array([3, 4])
np.linalg.norm(v)  # Output: 5.0
```
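
The L1 and infinity norms use the same function through its `ord` argument. A short sketch, using an illustrative vector with a negative entry to show that absolute values are taken:

```python
import numpy as np

v = np.array([3, -4])

l1 = np.linalg.norm(v, ord=1)          # |3| + |-4| = 7.0
l2 = np.linalg.norm(v)                 # sqrt(9 + 16) = 5.0 (default is L2)
linf = np.linalg.norm(v, ord=np.inf)   # max(|3|, |-4|) = 4.0
```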

---

## 🔹 2. **Unit Vector**

A **unit vector** has **magnitude 1** and only gives **direction**.

### ✳️ Formula:

$$
\hat{v} = \frac{v}{‖v‖}
$$

### 📌 In AI:

Dividing a vector by its norm **normalizes** it to unit length. This is used to **standardize inputs**, for example before computing cosine similarity.
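
The formula translates almost directly into code. The helper name `normalize` and the `eps` guard against the zero vector are assumptions added for this sketch.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Return v scaled to unit length (hypothetical helper)."""
    norm = np.linalg.norm(v)
    if norm < eps:
        # The zero vector has no direction, so it cannot be normalized
        raise ValueError("cannot normalize the zero vector")
    return v / norm

v_hat = normalize(np.array([3.0, 4.0]))   # [0.6, 0.8]
```

The result keeps the direction of `[3, 4]` but has length 1.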

---

## 🔹 3. **Vector Space**

A **vector space** is a collection of vectors that can be:

* Added together
* Multiplied by scalars
  ...and still remain within the space.

### 🧠 **Example**:

The 2D plane `ℝ²` is a vector space. If `v₁ = [1, 2]` and `v₂ = [3, 4]`, then `v₁ + v₂ = [4, 6]` is also in `ℝ²`.

### ✅ **Properties of a Vector Space**:

* Closure under addition & scalar multiplication
* Existence of zero vector
* Existence of additive inverse
* Distributive & associative laws

### 🤖 **In AI:**

* **Word embeddings** live in a high-dimensional vector space.
* **Linear models** (like SVM, logistic regression) operate within vector spaces.

---

## 🔹 4. **Cosine Similarity**

Measures the **cosine of the angle** between two vectors, ignoring their magnitudes.

### ✳️ Formula:

$$
\cos(θ) = \frac{A \cdot B}{‖A‖ ‖B‖}
$$

* Ranges from `-1` (opposite direction) through `0` (orthogonal) to `1` (same direction)
* Scaling either vector by a positive constant does not change the value

### 🧠 **Why use Cosine Similarity?**

* When **magnitude doesn’t matter**, only direction (e.g., text data, embeddings)
* Often preferred over Euclidean distance for high-dimensional, sparse data (e.g., text), where vector length mostly reflects document length rather than content

```python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

A = np.array([[1, 2]])
B = np.array([[2, 4]])
cosine_similarity(A, B)  # Output: approximately [[1.0]] (B is a positive multiple of A)
```
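
The same quantity can be computed from the formula with NumPy alone. This is a minimal sketch; the function name `cosine_sim` is a hypothetical helper, and it assumes neither input is the zero vector.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between a and b (both must be non-zero)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

same = cosine_sim(np.array([1, 2]), np.array([2, 4]))       # close to 1
orthogonal = cosine_sim(np.array([1, 0]), np.array([0, 1])) # 0
opposite = cosine_sim(np.array([1, 2]), np.array([-1, -2])) # close to -1
```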

### 🤖 **In AI:**

| Application    | Use                                     |
| -------------- | --------------------------------------- |
| NLP            | Compare word embeddings                 |
| Recommendation | Item/user similarity                    |
| Clustering     | Similarity measure between feature vectors |

---

## 🔹 5. **Orthogonality**

Two vectors are **orthogonal** if their **dot product is 0**, meaning they are **perpendicular**. Non-zero orthogonal vectors are also **linearly independent**.

### ✳️ Dot Product:

$$
a \cdot b = 0 \Leftrightarrow \text{Vectors are orthogonal}
$$

### 🤖 In AI:

* **Orthogonal vectors** → uncorrelated features (when the features are mean-centered).
* Helps in **feature selection** and reducing **multicollinearity**.
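
Checking orthogonality is a one-line dot product. The vectors below are illustrative; with floating-point data, comparing against zero with a tolerance (e.g., `np.isclose`) is safer than `== 0`.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([-2, 1])
c = np.array([2, 4])

print(np.dot(a, b))   # 0  -> a and b are orthogonal
print(np.dot(a, c))   # 10 -> a and c are not orthogonal
```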

---

## 🔹 6. **Span**

The **span** of a set of vectors is the **set of all vectors** that can be formed by their **linear combinations**.

### 🧠 Example:

If vectors `v1 = [1, 0]`, `v2 = [0, 1]` then their span covers all of ℝ².
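
To make "all linear combinations" concrete, the sketch below uses a different pair of vectors (an illustrative choice, not the pair above) that also spans ℝ², and finds the coefficients that reproduce a target vector.

```python
import numpy as np

v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])
target = np.array([3.0, 1.0])

# Columns of M are the spanning vectors; solve M @ coeffs = target
M = np.column_stack([v1, v2])
coeffs = np.linalg.solve(M, target)        # [2., 1.]

rebuilt = coeffs[0] * v1 + coeffs[1] * v2  # equals target
```

Because a solution exists for any `target`, these two vectors span the whole plane.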

### 🤖 In AI:

* Determines the **expressiveness** of the feature space
* **Basis vectors** span a space → used in dimensionality reduction (PCA)

---

## 🔹 7. **Linear Independence**

A set of vectors is **linearly independent** if **none** of them can be written as a **linear combination** of the others.

### 🤖 In AI:

* Ensures that features provide **unique** information
* Basis vectors in PCA are linearly independent
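
One practical way to test independence, sketched here under the assumption that the vectors fit in memory as columns of a matrix, is to compare the matrix rank with the number of vectors. The function name is a hypothetical helper.

```python
import numpy as np

def is_linearly_independent(vectors):
    """True if no vector is a linear combination of the others."""
    M = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(M) == len(vectors))

print(is_linearly_independent([np.array([1, 0]), np.array([0, 1])]))  # True
print(is_linearly_independent([np.array([1, 2]), np.array([2, 4])]))  # False
```

The second pair fails because `[2, 4]` is twice `[1, 2]`.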

---

## ✅ Quick Summary Table:

| Term                    | Meaning                                                    | Use in AI                               |
| ----------------------- | ---------------------------------------------------------- | --------------------------------------- |
| **Norm**                | Length of a vector                                         | Feature normalization, gradient control |
| **Unit Vector**         | Vector with magnitude 1                                    | Direction only, normalization           |
| **Vector Space**        | Set of vectors closed under addition/scalar multiplication | Feature representation, embeddings      |
| **Cosine Similarity**   | Angle between vectors                                      | Similarity in NLP, recsys               |
| **Orthogonality**       | Perpendicular vectors (dot = 0)                            | Feature independence                    |
| **Span**                | All vectors formed from combinations                       | Coverage of feature space               |
| **Linear Independence** | No vector depends on others                                | Avoid redundancy in features            |

---

## 🚀 Use-Cases Recap:

| Concept             | Real Use                                                          |
| ------------------- | ----------------------------------------------------------------- |
| Norms               | Control size of gradients in training (avoid exploding gradients) |
| Cosine Similarity   | Search engines, sentence similarity, recommendation engines       |
| Vector Space        | Word embeddings (GloVe, BERT), Feature vectors                    |
| Orthogonality       | Ensure non-overlapping features                                   |
| Linear Independence | Dimensionality reduction, noise removal                           |

---
---
---
---

# **3. Linear Algebra: Dot Product and Projections**

---

## 🔹 1. **Dot Product (Scalar Product)**

### ✅ **Definition**:

The **dot product** of two vectors results in a **scalar** and tells how much two vectors **align** with each other.

### ✳️ **Formula (Algebraic)**:

For two vectors `A = [a₁, a₂, ..., aₙ]`, `B = [b₁, b₂, ..., bₙ]`:

$$
A \cdot B = a₁b₁ + a₂b₂ + ... + aₙbₙ = \sum_{i=1}^{n} a_i b_i
$$

### ✳️ **Formula (Geometric)**:

$$
A \cdot B = ‖A‖‖B‖ \cos(θ)
$$

Where:

* `‖A‖` = magnitude (norm) of vector A
* `θ` = angle between A and B

---

### 🧠 **Interpretation**:

* If `A • B > 0`: angle < 90°, vectors point in **broadly the same direction**
* If `A • B = 0`: vectors are **orthogonal (perpendicular)**
* If `A • B < 0`: angle > 90°, vectors point in **broadly opposite directions**

---

### 🧪 **Example**:

```python
import numpy as np
a = np.array([2, 3])
b = np.array([4, 1])
dot = np.dot(a, b)  # Output: 11
```

$$
A \cdot B = 2×4 + 3×1 = 8 + 3 = 11
$$
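
Combining the algebraic and geometric formulas gives the angle between the two vectors from the example. This is a sketch; the `np.clip` call is an added safeguard against rounding pushing the cosine slightly outside `[-1, 1]`.

```python
import numpy as np

a = np.array([2, 3])
b = np.array([4, 1])

# cos(theta) = (a . b) / (||a|| ||b||) = 11 / sqrt(13 * 17)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(theta_deg)   # roughly 42 degrees, so the angle is below 90
```

A positive dot product and an angle below 90° are two views of the same fact.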

---

### 🤖 **Applications in AI/ML**:

| Use Case                 | Description                                                |
| ------------------------ | ---------------------------------------------------------- |
| **Cosine Similarity**    | Normalized dot product tells **text/vector similarity**    |
| **Attention Mechanisms** | Query • Key → Attention score (Transformer, GPT)           |
| **Neural Networks**      | Neuron output: `z = w • x + b`                             |
| **Loss Gradients**       | Backprop uses dot product between error and weight vectors |

---

## 🔹 2. **Projection of a Vector**

### ✅ **Definition**:

The **projection** of vector `A` onto vector `B` is the **shadow of A** in the direction of B.

### ✳️ **Formula (scalar projection)**:

$$
\text{proj}_{B}(A) = \frac{A \cdot B}{‖B‖}
$$

### ✳️ **Formula (vector projection)**:

$$
\vec{\text{proj}}_{B}(A) = \left( \frac{A \cdot B}{‖B‖^2} \right) B
$$

* It gives a new vector **in the direction of B**, but scaled to how much A lies in that direction.

---

### 🧠 **Geometric Intuition**:

Imagine shining a light on A — its shadow on B is the **projection**.

* If A and B are aligned → projection is full magnitude.
* If A and B are orthogonal → projection is 0.

---

### 🧪 **Example**:

Let `A = [3, 4]` and `B = [1, 0]`

```python
import numpy as np

A = np.array([3, 4])
B = np.array([1, 0])
scalar_proj = np.dot(A, B) / np.linalg.norm(B)  # Output: 3.0
vector_proj = (np.dot(A, B) / np.dot(B, B)) * B  # Output: [3., 0.]
```

The vector `[3, 0]` is A's projection onto B.
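
Projection also splits a vector into a part along B and a part perpendicular to it. The sketch below uses an illustrative `B = [1, 1]` that is not axis-aligned; the function name `project` is a hypothetical helper.

```python
import numpy as np

def project(a, b):
    """Vector projection of a onto b (b must be non-zero)."""
    return (np.dot(a, b) / np.dot(b, b)) * b

A = np.array([3.0, 4.0])
B = np.array([1.0, 1.0])

parallel = project(A, B)        # [3.5, 3.5], the part of A along B
perpendicular = A - parallel    # [-0.5, 0.5], the leftover part

print(np.dot(perpendicular, B)) # 0.0 -> leftover is orthogonal to B
```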

---

### 🤖 **Applications in AI/ML**:

| Use Case                               | Description                                                                |
| -------------------------------------- | -------------------------------------------------------------------------- |
| **Gradient Descent**                   | The directional derivative along a unit vector is the scalar projection of the gradient onto it |
| **PCA (Principal Component Analysis)** | Projects high-dimensional data onto **principal axes**                     |
| **Word Embeddings**                    | Project one word vector onto another to measure **semantic relationships** |
| **Feature Extraction**                 | Useful in dimensionality reduction and **noise filtering**                 |

---

## ✅ Summary Table

| Concept               | Formula              | Output | Intuition          | AI Use                  |
| --------------------- | -------------------- | ------ | ------------------ | ----------------------- |
| **Dot Product**       | `A • B = Σ aᵢbᵢ`     | Scalar | Measures alignment | Neural nets, similarity |
| **Scalar Projection** | `(A • B)/‖B‖`        | Scalar | Shadow length      | Similarity, projection  |
| **Vector Projection** | `((A • B)/‖B‖²) × B` | Vector | Shadow vector      | PCA, attention          |
| **Orthogonal**        | `A • B = 0`          | -      | Independent        | Feature design          |

---

## ✍️ Bonus: Visualization Tip

* Think of **dot product** as asking: *“How much of A goes in B’s direction?”*
* Think of **projection** as: *“Let’s find the actual portion of A that lies on B.”*

---
---
---
---

# **4. Linear Algebra: Matrices and Matrix Operations**

---

## 🔹 1. **What is a Matrix?**

### ✅ **Definition**:

A **matrix** is a **2D rectangular array** of numbers, arranged in **rows** and **columns**.

If a matrix has `m` rows and `n` columns, we say it's of **dimension** `m × n`.

$$
A = \begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix} \quad (2 × 3 \text{ matrix})
$$

### 📌 **Terminology**:

* **Element** at row `i`, column `j` is written as `aᵢⱼ`
* **Square matrix**: Rows = Columns (`n × n`)
* **Column vector**: `n × 1` matrix
* **Row vector**: `1 × n` matrix

---

## 🔹 2. **Basic Matrix Operations**

---

### 📘 A. **Matrix Addition / Subtraction**

* Only possible if matrices are of **same size**

$$
A + B = [aᵢⱼ + bᵢⱼ]
$$

### 🔍 AI Use:

* Combining outputs from layers
* Adding bias terms to layer outputs (`WX + b`)

---

### 📘 B. **Scalar Multiplication**

* Multiply **each element** by a scalar

$$
kA = [k × aᵢⱼ]
$$

### 🔍 AI Use:

* Scaling features, or scaling gradients by the learning rate

---

### 📘 C. **Matrix Multiplication**

* A `m × n` matrix `A` can be multiplied by a `n × p` matrix `B`
* Result: `m × p` matrix

$$
C = A × B \quad \text{where } c_{ij} = \sum_{k=1}^{n} a_{ik} × b_{kj}
$$

> Not **element-wise**: each entry `cᵢⱼ` is the **dot product** of row `i` of A and column `j` of B.

### 🔍 AI Use:

* **Core of neural network forward pass**:

  * `Z = W × X + b` where:

    * `X` is input matrix
    * `W` is weight matrix
    * `Z` is output (logits)
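
The shape rule for `Z = W × X + b` can be checked with random data. This is a sketch under assumptions: the layer sizes are illustrative, examples are stored as columns of `X`, and `b` is a column vector that NumPy broadcasts across the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, batch = 4, 3, 5          # illustrative sizes

X = rng.normal(size=(n_input, batch))       # one column per example
W = rng.normal(size=(n_hidden, n_input))    # weight matrix
b = rng.normal(size=(n_hidden, 1))          # bias, broadcast over columns

Z = W @ X + b                               # (3 x 4) @ (4 x 5) -> (3 x 5)
print(Z.shape)                              # (3, 5)
```

Each column of `Z` is the layer output for one example, computed as `W @ x + b`.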

---

### 📘 D. **Transpose of a Matrix (Aᵀ)**

* Flip matrix over its diagonal

$$
(Aᵀ)_{ij} = A_{ji}
$$

If

$$
A = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix} \quad ⇒ \quad Aᵀ = \begin{bmatrix}1 & 3\\ 2 & 4\end{bmatrix}
$$

### 🔍 AI Use:

* Used in **vector-matrix operations**
* Helps align dimensions during dot products
* Transposing weight matrices in backpropagation

---

### 📘 E. **Identity Matrix (I)**

* Square matrix with `1s` on diagonal and `0s` elsewhere

$$
I = \begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
$$

$$
AI = IA = A
$$

### 🔍 AI Use:

* Acts like **1** for matrix multiplication
* Used in **initialization**, **linear transformations**, and **solving equations**

---

### 📘 F. **Inverse Matrix (A⁻¹)**

* For square matrix A, if:

$$
AA^{-1} = A^{-1}A = I
$$

...then `A⁻¹` is the inverse.

> Only **non-singular** matrices (det(A) ≠ 0) have inverses.

### 🔍 AI Use:

* Solving equations like `Ax = b` → `x = A⁻¹b`
* In practice, avoided due to instability; **pseudo-inverse or numerical solvers** are used instead
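
A small comparison of the two routes, using an illustrative 2×2 system. Both give the same answer here; `np.linalg.solve` is the usual choice because it never forms the inverse explicitly.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x_solve = np.linalg.solve(A, b)   # solves A x = b directly
x_inv = np.linalg.inv(A) @ b      # textbook route via the inverse

print(x_solve)                    # approximately [0.8, 1.4]
```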

---

### 📘 G. **Element-wise Operations**

Sometimes denoted with `⊙` (Hadamard Product):

* Multiply corresponding elements of same-sized matrices

```python
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])
C = A * B  # Element-wise multiplication
```
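
In NumPy the two kinds of product use different operators, which is an easy source of bugs. A short side-by-side using the same matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

hadamard = A * B   # element-wise: [[10, 40], [90, 160]]
matmul = A @ B     # matrix product: [[70, 100], [150, 220]]
```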

### 🔍 AI Use:

* Activation functions applied element-wise
* Gate computations in **LSTMs**, **attention**

---

## 🔹 3. **Matrix Properties (Quick Table)**

| Property                 | Description        |
| ------------------------ | ------------------ |
| **Associativity**        | A(BC) = (AB)C      |
| **Distributivity**       | A(B + C) = AB + AC |
| **Non-Commutative**      | AB ≠ BA (usually)  |
| **Transpose of Product** | (AB)ᵀ = BᵀAᵀ       |
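
These properties can be spot-checked numerically. The matrices below are illustrative integer examples, so the comparisons are exact.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 3]])

print(np.array_equal(A @ (B @ C), (A @ B) @ C))    # True  (associative)
print(np.array_equal(A @ (B + C), A @ B + A @ C))  # True  (distributive)
print(np.array_equal(A @ B, B @ A))                # False (not commutative)
print(np.array_equal((A @ B).T, B.T @ A.T))        # True  (transpose of product)
```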

---

## 🔹 4. **Matrix in AI — Real Examples**

| Concept        | Matrix Example                             | AI Usage                |
| -------------- | ------------------------------------------ | ----------------------- |
| **Inputs**     | Image: `28x28` → `784x1`                   | Fully connected (MLP) input layers |
| **Weights**    | Hidden layer weights: `n_hidden × n_input` | Feedforward computation |
| **Batch Data** | Batch of 64 inputs → `64 × input_dim`      | Training in batches     |
| **Embeddings** | Word matrix: `vocab_size × embedding_dim`  | NLP                     |
| **Attention**  | Matrices: `Q × Kᵀ` = Score                 | Transformers, GPT       |

---

## ✅ Summary Flash Notes

| Operation | Meaning            | AI/ML Use                     |
| --------- | ------------------ | ----------------------------- |
| `A + B`   | Add element-wise   | Combine features              |
| `kA`      | Scalar multiply    | Scale features                |
| `AB`      | Matrix multiply    | Neural net forward pass       |
| `Aᵀ`      | Transpose          | Adjust shape for dot product  |
| `A⁻¹`     | Inverse            | Solve equations (theoretical) |
| `I`       | Identity           | Neutral element               |
| `A ⊙ B`   | Element-wise mult. | Activation, attention gates   |

---

## ✍️ Visual Intuition:

* Think of matrices as **data transformers** — each operation transforms input vectors to new representations.
* In neural networks: matrices **rotate, stretch, and compress** data through layers.

---
---
---
---

# **5. Linear Algebra: Determinant and Rank**

*(With AI/ML Applications)*

---

## 🔷 1. **Determinant of a Matrix**

### ✅ **Definition**:

The **determinant** is a **scalar value** that summarizes certain properties of a square matrix.

For a square matrix `A`, the determinant is denoted as `det(A)` or `|A|`.

---

### 🧮 **For Small Matrices**:

#### 🔹 2×2 Matrix:

$$
A = \begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\Rightarrow \text{det}(A) = ad - bc
$$

#### 🔹 3×3 Matrix:

$$
A = \begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix}
\Rightarrow \text{det}(A) = a(ei - fh) - b(di - fg) + c(dh - eg)
$$

For larger matrices, we use **cofactor expansion** or **LU decomposition**.
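
The 3×3 formula above can be written out term by term and compared with NumPy's LU-based routine. The function name `det3` and the example matrix are illustrative.

```python
import numpy as np

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

M = np.array([[2, 0, 1],
              [1, 3, 2],
              [1, 1, 4]])

print(det3(M))            # 18
print(np.linalg.det(M))   # approximately 18.0
```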

---

### 🧠 **Geometric Meaning**:

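* **2D**: `|det(A)|` is the area of the parallelogram formed by the 2 column vectors
* **3D**: `|det(A)|` is the volume of the parallelepiped formed by the 3 column vectors
* **Higher Dimensions**: Generalizes to hypervolume
* **Sign**: A negative determinant means the transformation flips orientation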

---

### 🔍 **Key Properties**:

| Property                              | Implication                            |
| ------------------------------------- | -------------------------------------- |
| `det(A) = 0`                          | Matrix is **singular**, not invertible |
| `det(AB) = det(A) × det(B)`           | Multiplicative                         |
| `det(Aᵀ) = det(A)`                    | Transpose doesn't change determinant   |
| Swapping rows ⇒ Changes sign of `det` | Useful in LU-based computations        |
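
Each property in the table can be verified on small matrices. This sketch uses illustrative values and `np.isclose`, since determinants are computed in floating point.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
S = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row is twice the first

det = np.linalg.det
print(np.isclose(det(A @ B), det(A) * det(B)))   # True: multiplicative
print(np.isclose(det(A.T), det(A)))              # True: transpose-invariant
print(np.isclose(det(A[[1, 0]]), -det(A)))       # True: row swap flips sign
print(np.isclose(det(S), 0.0))                   # True: S is singular
```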

---

### 🤖 **Use in AI/ML**:

| Application                      | Description                                                              |
| -------------------------------- | ------------------------------------------------------------------------ |
| **Model Invertibility**          | `det(A) = 0` means we **can’t solve** linear systems reliably            |
| **Feature Space Transformation** | Measures change in volume under transformation                           |
| **Jacobian Determinant**         | In generative models (e.g., **normalizing flows**) to compute likelihood |
| **Stability Checks**             | In optimization algorithms & numerical solvers                           |

---

### ✳️ **Python Example**:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
det = np.linalg.det(A)  # Output: approximately -2.0 (floating-point result)
```

---

## 🔷 2. **Rank of a Matrix**

### ✅ **Definition**:

The **rank** of a matrix is the **maximum number of linearly independent rows or columns**.

* Denoted as `rank(A)`
* Rank ≤ min(rows, columns)

---

### 🔍 **What Does Rank Tell Us?**

| Rank             | Meaning                                               |
| ---------------- | ----------------------------------------------------- |
| Full Rank        | Rank = min(m, n): all rows or all columns (whichever are fewer) are linearly independent |
| Rank < min(m, n) | Some rows/columns are linear combinations of others (redundant) |

---

### 🧠 **Geometric Interpretation**:

* Rank is the **dimension of the space spanned** by the matrix's columns (called the **column space**).
* For instance, a 3D dataset lying on a 2D plane → rank = 2.

---

### 📌 **Key Properties**:

| Property                           | Description                               |
| ---------------------------------- | ----------------------------------------- |
| `rank(A) = rank(Aᵀ)`               | Symmetry                                  |
| `rank(AB) ≤ min(rank(A), rank(B))` | Multiplication effect                     |
| Full rank matrix has inverse       | If square and `rank = n`, then invertible |

---

### 🤖 **Use in AI/ML**:

| Application                        | Description                                                        |
| ---------------------------------- | ------------------------------------------------------------------ |
| **PCA (Dimensionality Reduction)** | Rank = number of meaningful dimensions                             |
| **Redundancy Detection**           | Low-rank data suggests **feature redundancy** (multicollinearity)  |
| **Linear Regression**              | Design matrix `X` must have full column rank for a **unique** least-squares solution of `Xw = y` |
| **Data Compression**               | Low-rank approximations to reduce storage (SVD, autoencoders)      |

---

### ✳️ **Python Example**:

```python
import numpy as np

A = np.array([[1, 2], [2, 4]])
rank = np.linalg.matrix_rank(A)  # Output: 1 (because second row is twice the first)
```

---

## ✅ Summary Table

| Concept         | Formula   | Tells You                          | AI Use                                       |
| --------------- | --------- | ---------------------------------- | -------------------------------------------- |
| **Determinant** | `det(A)`  | Volume scale factor; invertibility | Generative models, numerical stability       |
| **Rank**        | `rank(A)` | # of independent directions        | PCA, redundancy detection, model solvability |

---

## 🧩 Real-World AI Examples

| Task                  | Matrix Use                                                      |
| --------------------- | --------------------------------------------------------------- |
| **PCA**               | Keep the eigenvectors with the `k` largest eigenvalues → rank bounds how many are non-zero |
| **Linear Regression** | `Xw = y` has a unique least-squares solution only if `rank(X) = #features` |
| **Autoencoders**      | Compress to a low-rank latent space                             |
| **Text Embeddings**   | Analyze rank to reduce dimensions via SVD or LSA                |

---
---
---
---

# **6. Linear Algebra: Eigenvalues and Eigenvectors**

---

## 🔷 1. **What are Eigenvalues and Eigenvectors?**

### ✅ **Definition**:

For a **square matrix** `A`, an **eigenvector** `v` is a **non-zero vector** such that when you multiply it by the matrix `A`, the result is a **scaled version** of the same vector `v`.

$$
A \cdot v = \lambda \cdot v
$$

* `v`: Eigenvector
* `λ` (lambda): **Eigenvalue**

---

### 🧠 **Visual Intuition**:

Think of a matrix `A` as a **transformation** (like stretching, rotating, scaling).
Most vectors change **direction and length** under `A`.
But **eigenvectors stay on the same line** and are only **scaled** by λ (a negative λ flips them).

---

### 🔢 **Example (2×2 Matrix)**:

Let:

$$
A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix},\quad
v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad
v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
$$

Then:

$$
A v_1 = 2v_1,\quad A v_2 = 3v_2
$$

→ `v₁` and `v₂` are eigenvectors with eigenvalues 2 and 3, respectively.

---

## 🔷 2. **How to Compute Eigenvalues and Eigenvectors**

### 🧮 Step 1: Characteristic Equation

$$
A \cdot v = \lambda \cdot v \Rightarrow (A - \lambda I)v = 0
$$

A non-zero solution `v` exists only when `A − λI` is singular, so we require:

$$
\det(A - \lambda I) = 0
$$

This gives a degree-`n` polynomial equation in λ → **characteristic equation**.
Its roots are the **eigenvalues**.

### 🧮 Step 2: Solve for Eigenvectors

Once you know λ, plug into:

$$
(A - \lambda I)v = 0
$$

and solve for non-zero `v`.
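
As a worked instance of the two steps, take the symmetric matrix with rows `[2, 1]` and `[1, 2]` (chosen here for illustration). Step 1 gives the eigenvalues:

$$
\det(A - \lambda I) = \det\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix}
= (2-\lambda)^2 - 1 = (\lambda - 1)(\lambda - 3) = 0
\Rightarrow \lambda_1 = 1,\quad \lambda_2 = 3
$$

Step 2 gives one eigenvector per eigenvalue (any non-zero multiple also works):

$$
\lambda = 3:\ \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} v = 0 \Rightarrow v = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\qquad
\lambda = 1:\ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} v = 0 \Rightarrow v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
$$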

---

### ✳️ **Python Example**:

```python
import numpy as np

A = np.array([[2, 1], [1, 2]])
# Column k of `eigenvectors` is the (unit-length) eigenvector for eigenvalues[k]
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)
```
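
A quick way to confirm the result is to check the defining equation `Av = λv` for every returned pair. This sketch reuses the same matrix; note that NumPy does not guarantee any particular ordering of the eigenvalues.

```python
import numpy as np

A = np.array([[2, 1], [1, 2]])
eigenvalues, eigenvectors = np.linalg.eig(A)

for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]                      # k-th eigenvector (a column)
    lhs = A @ v                                 # transform the eigenvector
    rhs = eigenvalues[k] * v                    # scale the eigenvector
    print(np.allclose(lhs, rhs))                # True for each pair
```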

---

## 🔷 3. **Key Properties**

| Property                                                  | Description       |
| --------------------------------------------------------- | ----------------- |
| Only for square matrices                                  | A must be `n × n` |
| Eigenvectors are non-zero                                 | `v ≠ 0`           |
| At most `n` distinct eigenvalues                          | Exactly `n` counting multiplicity (possibly complex) |
| Eigenvectors of **symmetric matrices** for distinct eigenvalues are **orthogonal** | Useful in PCA |

---

## 🔷 4. **Use of Eigenvalues and Eigenvectors in AI/ML**

### ✅ A. **Principal Component Analysis (PCA)**

* PCA finds the **directions of maximum variance** in data.
* These directions are the **eigenvectors** of the **covariance matrix**.
* The corresponding **eigenvalues** tell how much variance each component explains.

$$
\text{Covariance Matrix} = \frac{1}{n} X^T X \quad (\text{columns of } X \text{ mean-centered})
\Rightarrow \text{Eigenvectors = PCA directions}
$$

### ✅ B. **Spectral Clustering**

* Uses **eigenvectors of graph Laplacians** to cluster nodes in graphs.

### ✅ C. **Understanding Model Stability**

* In optimization, the **eigenvalues of the Hessian matrix** at a critical point show whether it is a **minimum** (all positive), a **maximum** (all negative), or a **saddle point** (mixed signs).
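
The sign test on Hessian eigenvalues can be sketched as follows. The function name, the tolerance `tol`, and the two example Hessians (for `f = x² + y²` and `f = x² − y²`) are illustrative assumptions; `np.linalg.eigvalsh` is used because a Hessian is symmetric.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from its (symmetric) Hessian H."""
    eig = np.linalg.eigvalsh(H)          # real eigenvalues of a symmetric matrix
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "inconclusive"                # some eigenvalue is (near) zero

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))    # local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```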

### ✅ D. **Deep Learning (Backpropagation)**

* In recurrent or very deep networks, the magnitudes of the eigenvalues (more generally, singular values) of **weight matrices** can indicate **vanishing/exploding gradients** → helps with stability.

### ✅ E. **Quantum Computing, GNNs, LSA, SVD**

* All rely heavily on eigen-decomposition concepts.

---

## 🔷 5. **Summary Table**

| Concept         | Meaning                                          | AI/ML Use                             |
| --------------- | ------------------------------------------------ | ------------------------------------- |
| **Eigenvector** | Vector that keeps direction under transformation | PCA directions, stable directions     |
| **Eigenvalue**  | Scalar that scales the eigenvector               | Explained variance, gradient strength |
| **Symmetric A** | Orthogonal eigenvectors                          | Used in PCA, SVD                      |
| **Large λ**     | Strong variation in that direction               | Keep top λs in PCA                    |
| **λ = 0**       | Data lies in a lower dimension                   | Rank deficiency                       |

---

## 🔷 6. **Bonus: PCA Explained via Eigen Concepts**

### Let data matrix `X`:

* Center the columns of `X`, then compute the **covariance matrix**: `C = (1/n) XᵀX`
* Compute **eigenvectors of C** → principal directions
* Select top `k` eigenvectors with largest eigenvalues
* Project data onto those `k` directions → reduced dimensional space
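
The steps above can be sketched in NumPy. This is a minimal illustration, not a replacement for a library PCA: the toy data (three features, the third nearly a copy of the first), the seed, and `k = 2` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Third feature is almost a copy of the first, so the data is nearly 2D
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                          # center each column
C = (Xc.T @ Xc) / Xc.shape[0]                    # covariance matrix (3 x 3)

eigenvalues, eigenvectors = np.linalg.eigh(C)    # eigh: C is symmetric
order = np.argsort(eigenvalues)[::-1]            # sort largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

k = 2
components = eigenvectors[:, :k]                 # top-k principal directions
X_reduced = Xc @ components                      # projected data (200 x 2)
explained = eigenvalues[:k].sum() / eigenvalues.sum()
```

Here `explained` is the fraction of total variance kept by the top `k` directions, which is how the eigenvalues guide the choice of `k`.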

---

## 📌 Flash Notes

| Term              | Description                          |
| ----------------- | ------------------------------------ |
| `Av = λv`         | Definition of eigenvalue/eigenvector |
| `det(A - λI) = 0` | Characteristic equation              |
| **\|λ\| > 1**     | Stretching                           |
| **\|λ\| = 1**     | No change in length                  |
| **\|λ\| < 1**     | Compression                          |
| **λ = 0**         | Collapse (no variance)               |
| **λ < 0**         | Direction is reversed                |

---
---
---
---

# **7. Linear Algebra: Vector Spaces & Transformations**

---

## 🔷 1. **What is a Vector Space?**

### ✅ **Definition**:

A **vector space** is a set of vectors (real or complex) that is:

* **Closed under vector addition**
* **Closed under scalar multiplication**

That means:
If `v₁` and `v₂` are in the vector space **V**, then:

* `v₁ + v₂` ∈ V
* `a * v₁` ∈ V for any scalar `a`

---

### 📌 **Requirements (Axioms)**:

A valid vector space must follow these rules:

| Property                            | Example                       |
| ----------------------------------- | ----------------------------- |
| Closure under addition              | `v + w ∈ V`                   |
| Closure under scalar multiplication | `a * v ∈ V`                   |
| Associativity & commutativity       | `(u + v) + w = u + (v + w)`, `u + v = v + u` |
| Zero vector exists                  | `0 ∈ V` such that `v + 0 = v` |
| Every vector has an inverse         | `v + (-v) = 0`                |

---

### 🔢 **Examples of Vector Spaces**:

| Space          | Example                                                |
| -------------- | ------------------------------------------------------ |
| ℝ² (2D plane)  | `[1, 2]`, `[3, 4]`                                     |
| ℝ³ (3D space)  | `[1, 0, -1]`                                           |
| ℝⁿ             | Word embeddings (300D vectors)                         |
| Matrix space   | All `m × n` matrices                                   |
| Function space | Real-valued functions such as `f(x) = sin(x)`, with pointwise addition and scaling |

---

### 🧠 **Why Are Vector Spaces Important in AI?**

They allow us to:

* Represent data as geometric objects
* Use **linear algebra** to manipulate, reduce, and transform high-dimensional information
* Define **direction, distance, and angles** between points (vectors)
* Analyze **data structure**, redundancy, and learnable patterns

---

## 🔷 2. **Basis and Dimension**

### ✅ **Basis**:

A **basis** of a vector space is a **minimal set of linearly independent vectors** that **span** the space.

> Think of it as the set of "building blocks" needed to reconstruct any vector in the space.

### ✅ **Dimension**:

The number of basis vectors → **dimension** of the space.

### 🔢 Examples:

* Basis of ℝ²: `[(1,0), (0,1)]` → 2D
* Word embeddings: 300 basis vectors → 300D space

---

## 🔷 3. **Linear Transformation (Mapping Vectors)**

### ✅ **Definition**:

A **linear transformation** `T` maps vectors from one space to another:

$$
T: V → W
$$

It satisfies:

* **T(v + w) = T(v) + T(w)**
* **T(c \* v) = c \* T(v)**

> In matrix form:

$$
T(v) = A \cdot v
$$

Where `A` is a matrix representing the transformation.
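
The two linearity conditions can be checked numerically for any matrix. A minimal sketch, assuming a random 3 × 2 matrix `A` (so `T` maps ℝ² to ℝ³); the seed and the scalar `c` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))  # illustrative matrix: maps R^2 -> R^3

def T(v):
    """Linear transformation represented by the matrix A."""
    return A @ v

v = rng.normal(size=2)
w = rng.normal(size=2)
c = 1.7

# Additivity: T(v + w) = T(v) + T(w)
additive = np.allclose(T(v + w), T(v) + T(w))

# Homogeneity: T(c * v) = c * T(v)
homogeneous = np.allclose(T(c * v), c * T(v))

print(additive, homogeneous)
```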

---

### 🧠 **Visual Intuition**:

Think of a matrix as:

* Rotating vectors
* Stretching/shrinking them
* Projecting them onto lower dimensions
* Reflecting or flipping them

Each transformation reshapes the vector space. Neural network layers apply such matrix transformations (followed by non-linearities), and training **learns** the matrix entries.

---

### 🔢 Examples:

#### 🔹 Rotation Matrix (2D):

$$
R = \begin{bmatrix}
\cos \theta & -\sin \theta \\
\sin \theta & \cos \theta
\end{bmatrix}
$$

#### 🔹 Scaling Matrix:

$$
S = \begin{bmatrix}
2 & 0 \\
0 & 3
\end{bmatrix}
\Rightarrow \text{Scales x by 2 and y by 3}
$$

#### 🔹 Projection Matrix:

$$
P = \begin{bmatrix}
1 & 0 \\
0 & 0
\end{bmatrix}
\Rightarrow \text{Projects vector onto the x-axis}
$$
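
As a quick sketch, the three matrices above can be applied to one vector with NumPy. The angle `theta = π/2` and the test vector `v = [1, 1]` are illustrative choices.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees (illustrative choice)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

v = np.array([1.0, 1.0])

rotated = R @ v    # approximately [-1, 1]
scaled = S @ v     # [2, 3]
projected = P @ v  # [1, 0]

print(rotated, scaled, projected)
```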

---

## 🔷 4. **Kernel and Image**

| Term                    | Meaning                                     |
| ----------------------- | ------------------------------------------- |
| **Kernel (Null Space)** | Set of all vectors `v` such that `T(v) = 0` |
| **Image (Range)**       | Set of all output vectors `T(v)`            |

* If the kernel contains **non-zero vectors**, the transformation **loses information**: distinct inputs map to the same output, as happens in **dimensionality reduction**.
* Transformations with **full column rank** have only the **zero vector in the kernel**.
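
One way to inspect the kernel and image numerically is through the rank and the SVD. The sketch below uses the x-axis projection matrix as its example; the tolerance `1e-10` is an arbitrary choice, and the simple mask on singular values assumes a square matrix.

```python
import numpy as np

# Projection onto the x-axis
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

rank = np.linalg.matrix_rank(P)  # dimension of the image
nullity = P.shape[1] - rank      # dimension of the kernel (rank-nullity theorem)

# Right singular vectors whose singular value is (near) zero span the kernel
_, s, Vt = np.linalg.svd(P)
kernel_basis = Vt[s < 1e-10]     # one row per kernel direction

# Every kernel vector is mapped to zero
maps_to_zero = np.allclose(P @ kernel_basis.T, 0.0)

print(rank, nullity, kernel_basis, maps_to_zero)
```

Here the kernel is the y-axis: the projection discards the y-coordinate, which is exactly the information that is lost.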

---

## 🔷 5. **AI & ML Applications of Vector Spaces & Transformations**

| Area                      | Use of Vector Spaces/Transformations                                                        |
| ------------------------- | ------------------------------------------------------------------------------------------- |
| **PCA**                   | Transforms high-D data to lower-D using eigenvectors                                        |
| **Neural Networks**       | Each layer applies a **linear transformation** followed by non-linearity                    |
| **Word Embeddings**       | Words live in a vector space; directions capture meaning (e.g., king - man + woman ≈ queen) |
| **Autoencoders**          | Learn a compressed subspace that represents the data                                        |
| **GANs**                  | Latent vectors lie in a learned vector space, transformed into images                       |
| **Graph Neural Networks** | Project graph nodes into new vector spaces at each layer                                    |
| **Transformer Attention** | Query, Key, Value vectors are linearly transformed for attention calculation                |

---

## 📌 Flash Summary

| Concept                   | Definition                                            | Use in AI                   |
| ------------------------- | ----------------------------------------------------- | --------------------------- |
| **Vector Space**          | Set of vectors closed under addition and scalar mult. | All ML data lives here      |
| **Basis**                 | Minimal spanning set                                  | Defines dimensions          |
| **Dimension**             | # of basis vectors                                    | Complexity of data          |
| **Linear Transformation** | Map vectors via matrices                              | Neural layers, embeddings   |
| **Kernel**                | Vectors mapped to zero                                | Dimensionality loss check   |
| **Image**                 | Output of transformation                              | Useful in PCA & activations |

---
---
---
---

# **8. Linear Algebra: Orthogonality and Orthonormal Basis**

---

## 🔷 1. **Orthogonality**

### ✅ **Definition**:

Two vectors `u` and `v` are **orthogonal** if their **dot product is zero**:

$$
u \cdot v = 0
$$

This means the vectors are **perpendicular** in space.

---

### 🧠 **Why Important?**

Non-zero orthogonal vectors are **linearly independent** → no overlap in information.

---

### 🔢 **Example**:

$$
u = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
v = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Rightarrow u \cdot v = 0
$$

→ Vectors are **orthogonal** (90° apart)

---

## ✅ Properties of Orthogonal Vectors

| Property                                  | Description                  |
| ----------------------------------------- | ---------------------------- |
| `u • v = 0`                               | Vectors are perpendicular    |
| `‖u + v‖² = ‖u‖² + ‖v‖²`                  | **Pythagoras theorem** holds |
| Non-zero orthogonal vectors ⇒ Linearly independent | No redundancy      |
| Used in QR, PCA, SVD                      | Computational stability      |
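
The first two properties are easy to verify numerically. A minimal sketch with two illustrative vectors along the axes:

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([0.0, 4.0])

dot = np.dot(u, v)  # 0.0, so u and v are orthogonal

# Pythagoras: ||u + v||^2 equals ||u||^2 + ||v||^2 for orthogonal vectors
lhs = np.linalg.norm(u + v) ** 2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2

# Angle between the vectors, from cos(angle) = (u . v) / (||u|| ||v||)
cos_angle = dot / (np.linalg.norm(u) * np.linalg.norm(v))
angle_deg = np.degrees(np.arccos(cos_angle))

print(dot, lhs, rhs, angle_deg)
```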

---

## 🤖 **Orthogonality in AI/ML**

| Task                 | Use of Orthogonality                                           |
| -------------------- | -------------------------------------------------------------- |
| **PCA**              | Principal components are orthogonal directions of max variance |
| **Neural Nets**      | Orthogonal initialization of weights improves convergence      |
| **Word Embeddings**  | Orthogonal directions = unrelated concepts                     |
| **Clustering**       | Cosine similarity of 0 means two vectors are orthogonal        |
| **Gradient Descent** | Orthogonal gradients reduce interference during updates        |

---

## 🔷 2. **Orthonormal Vectors**

### ✅ **Definition**:

Vectors are **orthonormal** if they are:

1. **Orthogonal** (perpendicular)
2. Each has **unit length** (magnitude = 1)

$$
u \cdot v = 0,\quad ‖u‖ = 1,\quad ‖v‖ = 1
$$

---

### 🔢 **Example**:

$$
e₁ = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
e₂ = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\Rightarrow \text{Orthonormal Basis of ℝ²}
$$

---

## ✅ Properties of Orthonormal Vectors

| Property           | Meaning                                                              |
| ------------------ | -------------------------------------------------------------------- |
| `u • u = 1`        | Unit length                                                          |
| `u • v = 0`        | Perpendicular                                                        |
| **Easy to invert** | If a square matrix `Q` has orthonormal columns, its inverse is its **transpose**: `Q⁻¹ = Qᵀ` |
| **Efficient projections** | Projection coefficients are plain dot products, with no linear system to solve |
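
The "inverse equals transpose" property can be checked on a concrete matrix. In this sketch the orthonormal columns come from `np.linalg.qr` applied to a random 3 × 3 matrix; the seed and size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))  # arbitrary square matrix

# QR factorization returns Q with orthonormal columns
Q, _ = np.linalg.qr(A)

# Orthonormal columns: Q^T Q is the identity
columns_orthonormal = np.allclose(Q.T @ Q, np.eye(3))

# For square Q, the transpose is therefore the inverse
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)

print(columns_orthonormal, inverse_is_transpose)
```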

---

## 🔷 3. **Orthonormal Basis**

### ✅ **Definition**:
An **orthonormal basis** is a set of **orthonormal vectors** that **span a vector space**.

So any vector in the space can be written as a **linear combination** of those basis vectors.

---

### 🔢 Example in ℝ³:
$$
e₁ = [1, 0, 0],\quad
e₂ = [0, 1, 0],\quad
e₃ = [0, 0, 1]
$$

These form the **standard orthonormal basis** for ℝ³.

---

## 🤖 **Applications in AI/ML**

| Application                            | Description                                                                |
| -------------------------------------- | -------------------------------------------------------------------------- |
| **PCA**                                | Eigenvectors of the covariance matrix form an **orthonormal basis** of the data space |
| **Autoencoders**                       | Some variants encourage orthogonal latent features to **reduce feature overlap** |
| **SVD (Singular Value Decomposition)** | Decomposes matrix into orthonormal matrices (U, V)                         |
| **QR Decomposition**                   | Used to solve least-squares regression; `Q` gives an orthonormal basis of the column space |
| **Orthogonal Initialization**          | Helps stabilize deep learning training                                     |
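
As an illustration of the SVD row, the sketch below decomposes a small made-up matrix `A` and checks that the factors returned by `np.linalg.svd` are orthonormal.

```python
import numpy as np

# Illustrative 3 x 2 matrix
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Reduced SVD: U is 3 x 2, s holds 2 singular values, Vt is 2 x 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and rows of Vt are orthonormal
u_orthonormal = np.allclose(U.T @ U, np.eye(2))
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(2))

# The factors multiply back to A
reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)

print(u_orthonormal, v_orthonormal, reconstructs)
```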

---

### 📌 Fast Projection using Orthonormal Basis

If the columns of `U = [u₁, u₂, ..., uₖ]` are orthonormal, the projection of `x` onto the space they span is:

$$
\text{proj}_{U}(x) = \sum_{i=1}^{k} (x \cdot uᵢ) uᵢ
$$

This makes dimensionality reduction (like PCA) **computationally cheap** and **interpretable**.
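
In matrix form the sum above is `U @ (U.T @ x)`. A minimal sketch, assuming the orthonormal vectors are the first two standard basis vectors of ℝ³ and `x` is an arbitrary example:

```python
import numpy as np

# Columns u1, u2 are orthonormal and span the xy-plane in R^3
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([2.0, -1.0, 5.0])

coeffs = U.T @ x     # dot products x . u_i -> [2, -1]
proj = U @ coeffs    # sum of (x . u_i) u_i -> [2, -1, 0]
residual = x - proj  # part of x outside the subspace -> [0, 0, 5]

# The residual is orthogonal to every basis vector
residual_orthogonal = np.allclose(U.T @ residual, 0.0)

print(coeffs, proj, residual, residual_orthogonal)
```

The `coeffs` vector is the low-dimensional representation of `x`, which is what PCA keeps after projecting onto the principal components.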

---

## 🔷 4. **Gram-Schmidt Process** (To create orthonormal basis)

Given a set of vectors, the **Gram-Schmidt algorithm** constructs an orthonormal basis.
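
For reference, the textbook form of the process is sketched below. The symbols `v_1, ..., v_n` (linearly independent input vectors), `w_k` (the remainder after removing earlier directions), and `u_k` (the resulting orthonormal vectors) are notation introduced here for illustration.

$$
w_k = v_k - \sum_{j=1}^{k-1} (v_k \cdot u_j)\, u_j,
\qquad
u_k = \frac{w_k}{\|w_k\|},
\qquad k = 1, \dots, n
$$

Each step removes from `v_k` its components along the directions already found, then rescales what is left to unit length.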

### ✳️ Python Snippet:

```python
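import numpy as np

def gram_schmidt(V, tol=1e-10):
    """Orthonormalize the rows of V, skipping (near-)dependent vectors."""
    U = []
    for v in np.asarray(V, dtype=float):
        # Remove the components of v along every basis vector found so far
        for u in U:
            v = v - np.dot(v, u) * u
        norm = np.linalg.norm(v)
        if norm > tol:  # a zero remainder means v was linearly dependent
            U.append(v / norm)
    return np.array(U)

V = np.array([[1, 1], [1, 0]])
orthonormal_basis = gram_schmidt(V)
# Rows are orthonormal: orthonormal_basis @ orthonormal_basis.T is the identity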
```

---

## ✅ Flash Summary

| Concept           | Meaning                                                   | Use in ML                         |
| ----------------- | --------------------------------------------------------- | --------------------------------- |
| Orthogonal        | Vectors at 90°                                            | Feature independence              |
| Orthonormal       | Orthogonal + Unit Length                                  | Basis in PCA, SVD                 |
| Orthonormal Basis | Orthonormal set that spans the space                      | Efficient vector projections      |
| Gram-Schmidt      | Turns linearly independent vectors into orthonormal basis | Underlies QR, used in regression  |

---
---
---
---
---