# Machine Learning Foundation Series: Intro to Linear Algebra

At its core, linear algebra is the study of solving systems of linear equations. The simplest case is a single equation in one unknown:

$$
2x + 5 = 25.
$$

Subtracting 5 from both sides and dividing by 2 gives the unique solution $x = 10$. In contrast, if an equation contains any non‐linear terms (for example, $x^2$ or $\sqrt{x}$), it falls outside the realm of linear algebra.

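A system of linear equations has exactly one of three outcomes (the geometric descriptions refer to two equations in two unknowns, each of which is a line in the plane):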
- A **unique solution** (intersecting lines),
- **No solution** (parallel lines), or
- **Infinitely many solutions** (coincident lines).
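
The three cases can be told apart numerically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The following is a minimal sketch of that idea; the helper name `classify_system` and the three example systems are illustrative assumptions, not part of the original material.

```python
import numpy as np

def classify_system(A, b):
    """Hypothetical helper: classify the system A @ x = b by comparing ranks."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))  # augmented matrix [A | b]
    if rank_A < rank_Ab:
        return "no solution"                # inconsistent equations
    if rank_A == A.shape[1]:
        return "unique solution"            # as many independent equations as unknowns
    return "infinitely many solutions"      # consistent, but under-determined

# x + y = 2 and x - y = 0: intersecting lines
intersecting = classify_system([[1, 1], [1, -1]], [2, 0])
# x + y = 2 and x + y = 3: parallel lines
parallel = classify_system([[1, 1], [1, 1]], [2, 3])
# x + y = 2 and 2x + 2y = 4: coincident lines
coincident = classify_system([[1, 1], [2, 2]], [2, 4])

print(intersecting, parallel, coincident, sep="\n")
```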

### Example: The Sheriff and the Bank Robber

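Imagine a sheriff in a car traveling at 180 km/h (which is 3 km/min) chasing a bank robber in a car traveling at 150 km/h (2.5 km/min). The bank robber gets a 5‑minute head start. Let $T$ be the time in minutes since the robber set off, so the sheriff has been driving for $T - 5$ minutes. Setting the two distance functions equal gives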

$$
2.5T = 3(T - 5),
$$

which expands to $2.5T = 3T - 15$, so $0.5T = 15$ and $T = 30$ minutes. The crossover distance is $2.5 \times 30 = 75$ km. This simple system illustrates the nature of linear equations: two straight lines meeting at a single intersection point.
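
The same result can be checked numerically. The sketch below rewrites the chase as two equations in two unknowns, the distance $d$ (km) and the robber's travel time $T$ (minutes), namely $d - 2.5T = 0$ and $d - 3T = -15$, and hands them to `np.linalg.solve`. The matrix layout and variable names are assumptions made for illustration.

```python
import numpy as np

# Unknowns, in order: d (distance in km), T (minutes since the robber set off)
#   robber:  d = 2.5 * T        ->  d - 2.5 * T = 0
#   sheriff: d = 3 * (T - 5)    ->  d - 3.0 * T = -15
A = np.array([[1.0, -2.5],
              [1.0, -3.0]])
b = np.array([0.0, -15.0])

d, T = np.linalg.solve(A, b)
print(f"T = {T:.1f} min, d = {d:.1f} km")  # matches the hand calculation: 30 min, 75 km
```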

---

## Tensors: The Data Structures of Linear Algebra

Tensors generalize the concepts of scalars, vectors, and matrices to any number of dimensions:

- **Scalars (0D Tensors):** Single numeric values.
- **Vectors (1D Tensors):** Ordered arrays representing points or directions in space.
- **Matrices (2D Tensors):** Rectangular arrays with rows and columns.
- **Higher-Dimensional Tensors:** For example, a 4D tensor can represent a batch of images with dimensions corresponding to (batch size, height, width, channels).

These data structures are created and manipulated using Python libraries:
- **NumPy:** The foundation for array and matrix operations.
- **TensorFlow:** Provides tensors along with automatic differentiation and production-level tools.
- **PyTorch:** Known for its intuitive API and GPU acceleration.

A typical Python snippet for creating a vector with NumPy might look like this:

```python
import numpy as np
x = np.array([25, 3])
```
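
The same `np.array` call covers the other tensor ranks listed earlier. As a small illustration (the values are arbitrary and chosen only for this sketch), the `ndim` and `shape` attributes report how many dimensions each array has:

```python
import numpy as np

scalar = np.array(25)                          # 0D tensor
vector = np.array([25, 3])                     # 1D tensor
matrix = np.array([[25, 3], [5, 26], [3, 7]])  # 2D tensor: 3 rows, 2 columns
batch = np.zeros((32, 28, 28, 3))              # 4D tensor: (batch, height, width, channels)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    print(name, t.ndim, t.shape)
```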

Note that transposing a one-dimensional NumPy array has no effect: `x.T` has the same shape as `x`. To get distinct row and column vectors, create the array with an extra dimension (e.g., using nested brackets, as in `np.array([[25, 3]])`).
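
A short sketch of this behavior, reusing the vector from the snippet above (the names `row` and `col` are illustrative):

```python
import numpy as np

x = np.array([25, 3])        # 1D array, shape (2,)
x_t = x.T                    # transposing a 1D array leaves the shape unchanged

row = np.array([[25, 3]])    # nested brackets: a 2D row vector, shape (1, 2)
col = row.T                  # now the transpose is a column vector, shape (2, 1)

print(x.shape, x_t.shape)    # both one-dimensional
print(row.shape, col.shape)
```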

---

## Vector Operations and Norms

Vectors not only indicate a point in space but also carry magnitude and direction. A common operation is **transposition**, which turns a row vector into a column vector (and vice versa). In notation, the transpose of vector $\mathbf{x}$ is written as

$$
\mathbf{x}^T.
$$

### Norms

Norms are functions that measure the "length" or "magnitude" of a vector. Some common norms include:

- **L2 Norm (Euclidean Norm):**

$$
\| \mathbf{x} \|_2 = \sqrt{\sum_{i} x_i^2}
$$

- **L1 Norm:**

$$
\| \mathbf{x} \|_1 = \sum_{i} |x_i|
$$

- **Squared L2 Norm:**

$$
\| \mathbf{x} \|_2^2 = \sum_{i} x_i^2
$$

- **Max Norm (Infinity Norm):**

$$
\| \mathbf{x} \|_\infty = \max_i |x_i|
$$

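These norms appear throughout machine learning. For example, the L1 and squared L2 norms are used as penalty terms when regularizing regression models (lasso and ridge regression, respectively), and the L2 norm measures Euclidean distance between points in high-dimensional spaces.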
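
Each of the four formulas above maps onto a one-line NumPy expression. The sketch below uses an arbitrary example vector; `np.linalg.norm` computes the L2 norm by default and accepts `ord=1` or `ord=np.inf` for the others.

```python
import numpy as np

x = np.array([3, -4])

l2 = np.linalg.norm(x)                  # sqrt(3^2 + (-4)^2) = 5
l1 = np.linalg.norm(x, ord=1)           # |3| + |-4| = 7
squared_l2 = x @ x                      # 3^2 + (-4)^2 = 25, no square root needed
max_norm = np.linalg.norm(x, ord=np.inf)  # max(|3|, |-4|) = 4

print(l2, l1, squared_l2, max_norm)
```

Computing the squared L2 norm as a dot product of the vector with itself avoids the square root entirely.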

A **unit vector** is a special vector whose L2 norm is equal to 1:

$$
\| \mathbf{u} \|_2 = 1.
$$
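
Any nonzero vector can be turned into a unit vector by dividing it by its L2 norm. A minimal sketch with an arbitrary example vector:

```python
import numpy as np

x = np.array([3.0, 4.0])
u = x / np.linalg.norm(x)    # divide by the L2 norm of x, which is 5

print(u)                     # same direction as x
print(np.linalg.norm(u))     # length is 1, up to floating-point rounding
```

The zero vector has norm 0 and cannot be normalized this way.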

---

## Special Vectors: Basis, Orthogonal, and Orthonormal

In any vector space, certain vectors serve as building blocks:

- **Basis Vectors:**  
  A set of vectors from which any vector in the space can be built as a linear combination. They are typically chosen as unit vectors along the axes. For a 2D space, the common basis vectors are

  $$
  \mathbf{i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
  $$

- **Orthogonal Vectors:**  
  Two vectors are orthogonal if their dot product is zero,

  $$
  \mathbf{x} \cdot \mathbf{y} = 0,
  $$

  meaning they are at a right angle to one another.

- **Orthonormal Vectors:**  
  Vectors that are both orthogonal and of unit length. The standard basis vectors $\mathbf{i}$ and $\mathbf{j}$ above are orthonormal, although a basis does not have to be.
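
These three definitions can be checked directly in NumPy. In the sketch below, `is_orthonormal` is a hypothetical helper written for this example: it stacks the vectors as columns of a matrix $V$ and tests whether $V^T V$ equals the identity, which holds exactly when every pair has dot product 0 and every vector has length 1.

```python
import numpy as np

def is_orthonormal(*vectors, tol=1e-9):
    """Hypothetical helper: True if the vectors are mutually orthogonal unit vectors."""
    V = np.column_stack(vectors).astype(float)  # one vector per column
    gram = V.T @ V                              # entry (a, b) is the dot product of vectors a and b
    return np.allclose(gram, np.eye(V.shape[1]), atol=tol)

i = np.array([1, 0])
j = np.array([0, 1])

v = 25 * i + 3 * j       # a linear combination of the basis vectors
dot_ij = np.dot(i, j)    # zero, so i and j are orthogonal

a = np.array([2, 0])
b = np.array([0, 3])     # orthogonal to a, but neither has unit length

print(v, dot_ij)
print(is_orthonormal(i, j))
print(is_orthonormal(a, b))
```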

---

## Matrices and High-Dimensional Tensors

A **matrix** is a two-dimensional tensor. In standard notation, matrices are denoted by uppercase bold letters (e.g., $\mathbf{X}$) with elements $x_{ij}$ where $i$ indicates the row and $j$ the column. For example, a matrix with three rows and two columns has the shape (3, 2).

Slicing operations allow you to extract entire rows or columns:

- **Entire row:** $\mathbf{X}[i, :]$
- **Entire column:** $\mathbf{X}[:, j]$
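
In NumPy the same slicing notation works as written, with one caveat: indices start at 0, so the first row is `X[0, :]`. A short sketch with an arbitrary 3×2 matrix:

```python
import numpy as np

X = np.array([[25, 2],
              [5, 26],
              [3, 7]])

print(X.shape)     # three rows, two columns
print(X[0, :])     # entire first row
print(X[:, 1])     # entire second column
print(X[1, 0])     # single element: second row, first column
```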

Higher-dimensional tensors extend these concepts. For instance, in deep learning, a common 4D tensor for image data has the shape:

$$
(\text{batch size}, \text{height}, \text{width}, \text{channels})
$$

A batch of 32 images with dimensions 28×28 pixels and 3 color channels would be represented as a tensor of shape $(32, 28, 28, 3)$.
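
As a rough sketch, such a batch can be mocked up in NumPy with an array of zeros standing in for real pixel data (the variable names are illustrative). Indexing along the first axis picks out one image, and indexing along the last axis picks out one color channel across the whole batch:

```python
import numpy as np

# Placeholder batch: 32 images, 28x28 pixels, 3 color channels
images = np.zeros((32, 28, 28, 3), dtype=np.float32)

first_image = images[0]            # one image: (height, width, channels)
first_channel = images[..., 0]     # first channel of every image: (batch, height, width)

print(images.ndim, images.shape)
print(first_image.shape, first_channel.shape)
print(images.size)                 # total number of values: 32 * 28 * 28 * 3
```

This channels-last layout is the default in TensorFlow. PyTorch convolution layers expect channels first, i.e. (batch size, channels, height, width).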