# Linear algebra 1

Contents

1. Notation and conventions (rows/columns)
2. Basic vector operations (addition, scaling)
3. Norms (L2, Lp, triangle inequality)
4. Unit vectors and normalization
5. Dot products (formulas, Cauchy-Schwarz, orthogonality)
6. Matrix multiplication (row perspective, column perspective)

In this notebook, we recall some basic facts about vectors and matrices. 

# Notation and conventions

## Direct products of sets
Let $S_1,\dotsc,S_n$ be a finite collection of non-empty sets. The *direct* (or *Cartesian*) *product* of these sets is the set
$$S_1 \times \dotsb \times S_n = \{ (x_1,\dotsc,x_n) \mid x_i \in S_i \textup{ for } i=1,\dotsc,n \}.$$
The reason for the "$\times$" notation is that if each $S_i$ is a finite set (of size $|S_i|$), then the size of the direct product is the product of the sizes: $$|S_1 \times \dotsb \times S_n| = |S_1| \times \dotsb \times |S_n|.$$
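
The size formula can be checked on a small case. The sketch below uses three made-up finite sets (the names `S1`, `S2`, `S3` and their contents are assumptions chosen for illustration) and enumerates their direct product with `itertools.product` from the Python standard library.

```python
from itertools import product

# Three small finite sets (hypothetical example values).
S1 = {0, 1}
S2 = {"a", "b", "c"}
S3 = {"x", "y"}

# itertools.product enumerates every tuple (x1, x2, x3) with xi in Si.
tuples = list(product(S1, S2, S3))

# The number of tuples should equal the product of the sizes: 2 * 3 * 2.
product_size = len(tuples)
expected_size = len(S1) * len(S2) * len(S3)
print(product_size, expected_size)
```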

## The space $\mathbb{R}^n$
The real line is denoted $\mathbb{R}$, and sometimes, $\mathbb{R}^1$. The notation $\mathbb{R}^n$ represents the direct product of $\mathbb{R}$ with itself $n$ times:
\begin{align*}
    \mathbb{R}^n & \coloneqq \underbrace{\mathbb{R} \times \dotsb \times \mathbb{R}}_{n \textup{ copies}} \\
                & = \{ (x_1,\dotsc,x_n) \mid x_1,\dotsc,x_n \in \mathbb{R} \}.
\end{align*}
These are the most intuitive spaces in which to do geometry and visualize things, and as such, elements of $\mathbb{R}^n$ are often referred to as *points*. For example, $\mathbb{R}^1$ is simply the real line, $\mathbb{R}^2$ is a plane, $\mathbb{R}^3$ is an abstract representation of physical 3D space, and so on.

## Vectors
When we want to do arithmetic with points in $\mathbb{R}^n$, such as addition and scalar multiplication, it becomes appropriate to consider them as **vectors**. Informally, a vector is an arrow that starts at the origin and ends at some point. For us, vectors will always mean (unless otherwise specified) **column vectors**, denoted as follows:
\begin{equation*}
    \vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n.
\end{equation*}
When we want to instead work with **row vectors**, we use the superscript "$(\cdot)^T$" (for **transpose**) to denote our vector:
\begin{equation*}
    \vec{x}^T = [x_1 \, \dotsb \, x_n] \in \mathbb{R}^n.
\end{equation*}
Since the transpose of a row is a column, the notation $\vec{x} = [x_1 \, \dotsb \, x_n]^T$ means that $\vec{x}$ is a column vector.
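
One possible way to make the row/column distinction concrete is to store a vector as a matrix with a single column or a single row. The sketch below uses plain nested Python lists to stay dependency-free (an array library would be the more usual choice in practice); the helper `transpose` is a hypothetical name introduced here.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]

# A column vector in R^3: a 3 x 1 matrix (three rows, one entry each).
x_col = [[1.0], [2.0], [3.0]]

# Its transpose is a row vector: a 1 x 3 matrix.
x_row = transpose(x_col)

# Transposing the row gives back the original column.
x_back = transpose(x_row)
print(x_row, x_back)
```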

## Displacement vectors
The vectors $\vec{x}$ or $\vec{x}^T$ above should be thought of as the arrow starting at the origin and ending at the point $(x_1,\dotsc,x_n)$. Very often, it is useful to visualize vectors with different starting points. When the starting point is not the origin, we refer to these as *displacement vectors*. For example, given any starting point $P = (a_1,\dotsc,a_n) \in \mathbb{R}^n$, the same vector $\vec{x}$ above can be visualized as the displacement vector which starts at $P$ and ends at the point $Q = (a_1 + x_1,\dotsc,a_n + x_n)$.

The term "displacement" comes from physics: if an object starts at a point $P = (a_1,\dotsc,a_n)$ and ends at $Q = (b_1,\dotsc,b_n)$, then the displacement of the object is given by the displacement vector
\begin{equation*}
    \vec{PQ} = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix} \in \mathbb{R}^n.
\end{equation*}
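
As a minimal sketch of the formula for $\vec{PQ}$, the code below computes a displacement vector coordinate by coordinate and then checks that starting at $P$ and moving by $\vec{PQ}$ lands on $Q$. The function names `displacement` and `translate` and the sample points are assumptions made for this example.

```python
def displacement(P, Q):
    """Displacement vector from point P to point Q: entries b_i - a_i."""
    return [b - a for a, b in zip(P, Q)]

def translate(P, x):
    """End point of the vector x when it is drawn starting at P."""
    return [a + xi for a, xi in zip(P, x)]

P = [1.0, 2.0]
Q = [4.0, 6.0]

PQ = displacement(P, Q)   # entries are 4 - 1 and 6 - 2
end = translate(P, PQ)    # moving from P by PQ should land on Q
print(PQ, end)
```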

# Basic properties of vectors

## Distance between points
The **norm** or **magnitude** of a vector $\vec{x} = \begin{bmatrix} \; x_1 & \dotsb & x_n \; \end{bmatrix}^T \in \mathbb{R}^n$ is simply its length. The formula for the length is a generalization of the Pythagorean Theorem:
\begin{equation*}
    || \vec{x} ||^2 = x_1^2 + \dotsb + x_n^2.
\end{equation*}
Given points $P = (a_1,\dotsc,a_n)$ and $Q = (b_1,\dotsc,b_n) \in \mathbb{R}^n$, the *distance* between $P$ and $Q$ is defined to be the norm of the displacement vector $\vec{PQ}$:
\begin{equation*}
    \textup{Distance between $P$ and $Q$ } = ||\vec{PQ}|| = \sqrt{ \sum_{i=1}^n (b_i -a_i)^2 }.
\end{equation*}
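
The norm and distance formulas translate directly into code. The following is a sketch using only the standard library; `norm` and `distance` are hypothetical helper names, and the sample points are chosen so that the answer is the familiar 3-4-5 right triangle.

```python
import math

def norm(x):
    """Euclidean length: square root of the sum of squared entries."""
    return math.sqrt(sum(xi ** 2 for xi in x))

def distance(P, Q):
    """Distance between points P and Q: norm of the displacement vector."""
    return norm([b - a for a, b in zip(P, Q)])

length = norm([3.0, 4.0])                  # sqrt(9 + 16)
dist = distance([1.0, 2.0], [4.0, 6.0])    # displacement is [3.0, 4.0]
print(length, dist)
```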

## Linear combinations
Vectors $\vec{x}$ and $\vec{y}$ in the same space $\mathbb{R}^m$ can be added to give a new vector $\vec{x}+\vec{y} \in \mathbb{R}^m$; the addition is done coordinate-wise. Note that if we view points $P$ and $Q$ as vectors $\vec{u}$ and $\vec{v}$, then the displacement vector $\vec{PQ}$ is given by $\vec{PQ} = \vec{v} - \vec{u}$.

Given a scalar $\lambda \in \mathbb{R}$, we can scale $\vec{x}$ by $\lambda$ to get a new vector $\lambda\vec{x} \in \mathbb{R}^m$. Given vectors $\vec{x}_1,\dotsc,\vec{x}_n,\vec{y} \in \mathbb{R}^m$, we say that $\vec{y}$ is a *linear combination* of the $\vec{x}_i$'s if there exist scalars $c_1,\dotsc,c_n \in \mathbb{R}$ such that
\begin{equation*}
    \vec{y} = c_1\vec{x}_1 + \dotsb + c_n \vec{x}_n \in \mathbb{R}^m.
\end{equation*}
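
A linear combination can be computed one coordinate at a time. The sketch below assumes all vectors have the same length $m$; the function name `linear_combination` and the sample vectors and coefficients are illustrative choices, not part of the text above.

```python
def linear_combination(coeffs, vectors):
    """Return c_1 * x_1 + ... + c_n * x_n, computed coordinate by coordinate."""
    m = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(m)]

x1 = [1.0, 0.0, 2.0]
x2 = [0.0, 1.0, -1.0]

# y = 2 * x1 + 3 * x2, so y is a linear combination of x1 and x2.
y = linear_combination([2.0, 3.0], [x1, x2])
print(y)
```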

The **span** of the vectors $\vec{x}_1,\dotsc,\vec{x}_n$ is the set of all linear combinations:
\begin{equation*}
    \textup{span}(\vec{x}_1,\dotsc,\vec{x}_n) = \{c_1\vec{x}_1 + \dotsb + c_n \vec{x}_n \in \mathbb{R}^m \mid c_1,\dotsc,c_n \in \mathbb{R} \}.
\end{equation*}
Note that $\textup{span}(\vec{x}_1,\dotsc,\vec{x}_n)$ is a subset of $\mathbb{R}^m$. 
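
As a small worked example (the vectors here are chosen purely for illustration), take $\vec{e}_1 = [1 \, 0 \, 0]^T$ and $\vec{e}_2 = [0 \, 1 \, 0]^T$ in $\mathbb{R}^3$. Every linear combination has the form
\begin{equation*}
    c_1\vec{e}_1 + c_2\vec{e}_2 = \begin{bmatrix} c_1 \\ c_2 \\ 0 \end{bmatrix},
\end{equation*}
so
\begin{equation*}
    \textup{span}(\vec{e}_1,\vec{e}_2) = \left\{ \begin{bmatrix} c_1 \\ c_2 \\ 0 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; c_1,c_2 \in \mathbb{R} \right\},
\end{equation*}
which is the plane of points whose third coordinate is zero. In particular, the vector $[0 \, 0 \, 1]^T$ is not in this span, so the span is a proper subset of $\mathbb{R}^3$.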

## Dot product
The **dot product** of two vectors $\vec{x} = \begin{bmatrix} \; x_1 & \dotsb & x_n \; \end{bmatrix}^T$ and $\vec{y} = \begin{bmatrix} \; y_1 & \dotsb & y_n \; \end{bmatrix}^T$ is the scalar defined as follows:
\begin{equation*}
    \vec{x} \cdot \vec{y} = x_1y_1 + \dotsb + x_ny_n.
\end{equation*}
Note that $\vec{x}^T$ can be thought of as a $1 \times n$ matrix, and $\vec{y}$ can be thought of as an $n \times 1$ matrix. Then, the dot product is simply given by matrix multiplication:
\begin{equation*}
    \vec{x} \cdot \vec{y} = \vec{x}^T \vec{y} \in \mathbb{R}.
\end{equation*}
Often, one can compute the dot product of two vectors without knowing all their entries. Indeed, if $\theta$ denotes the angle between $\vec{x}$ and $\vec{y}$, then we have the formula
\begin{equation*}
    \vec{x} \cdot \vec{y} = ||\vec{x} || \cdot ||\vec{y}|| \cdot \cos(\theta).
\end{equation*}
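
The two formulas for the dot product can be compared numerically. The sketch below computes the dot product from the entries, then recovers the angle by rearranging the cosine formula. It assumes both vectors are nonzero (otherwise the angle is undefined); `dot`, `norm`, and `angle` are hypothetical helper names.

```python
import math

def dot(x, y):
    """Dot product: sum of products of matching entries."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean length, using ||x||^2 = x . x."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle in radians between nonzero vectors x and y."""
    c = dot(x, y) / (norm(x) * norm(y))
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return math.acos(c)

x = [1.0, 0.0]
y = [1.0, 1.0]

theta = angle(x, y)                                    # 45 degrees, i.e. pi / 4
from_angle = norm(x) * norm(y) * math.cos(theta)       # should match dot(x, y)
print(dot(x, y), theta, from_angle)
```

A dot product of zero corresponds to $\cos(\theta) = 0$, that is, to perpendicular vectors; for instance `dot([1.0, 0.0], [0.0, 1.0])` is zero.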



## Rows and columns of datasets
Suppose we have an ML problem with features $X_1,\dotsc,X_n$ and target $Y$. For simplicity, assume that all features and the target are continuous variables. Suppose there are $m$ instances or observations for which we know the values of $X_1,\dotsc,X_n,Y$. These values can be put together into a labelled dataset that looks like:
\begin{equation*}
    \begin{bmatrix}
        x_{11}  & \dotsb & x_{1n} & y_1\\
        \vdots  & \ddots & \vdots & \vdots\\
        x_{m1}  & \dotsb & x_{mn} & y_m
    \end{bmatrix}.
\end{equation*}
We write $\mathbb{R}^{m \times n}$ to denote the set of $m\times n$ matrices. The **design matrix** is the $m \times n$ matrix $$X = [x_{ij}] \in \mathbb{R}^{m \times n}.$$ (This notation means that the entry in the $i$-th row and $j$-th column is $x_{ij}$.) So,
\begin{equation*}
    \textup{Dataset } = \begin{bmatrix} \; X & \vec{y} \; \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}.
\end{equation*}
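
To make the block structure $\begin{bmatrix} \; X & \vec{y} \; \end{bmatrix}$ concrete, the sketch below splits a small made-up dataset (the numbers are placeholders, not real data) into its design matrix and target vector. It stores the dataset as a list of rows; in practice an array or data-frame library would typically be used instead.

```python
# A toy labelled dataset with m = 4 instances and n = 2 features.
# Each row is [x_i1, x_i2, y_i]; the values are arbitrary placeholders.
dataset = [
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 20.0],
    [5.0, 6.0, 30.0],
    [7.0, 8.0, 40.0],
]

# Design matrix X: every column except the last.
X = [row[:-1] for row in dataset]

# Target vector y: the last column.
y = [row[-1] for row in dataset]

m, n = len(X), len(X[0])
print(m, n, len(dataset[0]))  # the dataset itself has n + 1 columns
```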

- **Rows.** Observe that the $i$-th row of $X$ is
\begin{equation*}
    \vec{r}_i = \begin{bmatrix} \; x_{i1} & \dotsb & x_{in} \; \end{bmatrix} \in \mathbb{R}^n.
\end{equation*}
By the **feature space**, we mean the space $\mathbb{R}^n$ of all possible row vectors. Each vector $\vec{r}$ in the feature space corresponds to an instance, and the $j$-th coordinate of $\vec{r}$ records the value of the feature $X_j$ for that instance.

In our framework of supervised ML, the "ground truth" function $\mathbf{F}$ is a function from the feature space to $\mathbb{R}$ (which should be thought of as the "target space", i.e. all possible $Y$-values):
\begin{equation*}
    \mathbf{F}: \mathbb{R}^n \to \mathbb{R}.
\end{equation*}
By assumption, for every row $(\vec{r},y)$ in the labelled dataset, this ground truth function satisfies
\begin{equation*}
    y = \mathbf{F}(\vec{r}) + \epsilon,
\end{equation*}
where $\epsilon$ is some small error term. 
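
The relationship $y = \mathbf{F}(\vec{r}) + \epsilon$ can be simulated on toy data. Everything in the sketch below is an assumption made for illustration: the particular function `F`, the Gaussian form and scale of the error term, and the sample rows. In a real problem $\mathbf{F}$ is unknown and only the rows and labels are observed.

```python
import random

def F(r):
    """Hypothetical ground truth on R^2, chosen only for illustration."""
    return 2.0 * r[0] - 1.0 * r[1] + 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible

rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Each label is the ground truth value plus a small random error term.
errors = [rng.gauss(0.0, 0.01) for _ in rows]
labels = [F(r) + eps for r, eps in zip(rows, errors)]

# Subtracting F(r) from the label recovers the error term (up to rounding).
residuals = [y - F(r) for r, y in zip(rows, labels)]
print(labels, residuals)
```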

- **Columns.**  The columns of $X$ are vectors in $\mathbb{R}^m$; each column $\vec{x}_i$ corresponds to the feature $X_i$. The last column $\vec{y} \in \mathbb{R}^m$ corresponds to the target. We often summarize this info by writing
\begin{equation*}
    \textup{Dataset } = \begin{bmatrix} \; \vec{x}_1 & \dotsb & \vec{x}_n & \vec{y} \; \end{bmatrix} = \begin{bmatrix} \; X & \vec{y} \; \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}.
\end{equation*}
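
The column view can be sketched by regrouping a dataset stored as a list of rows. The toy values below are placeholders, and using `zip(*dataset)` to collect the columns is one possible approach with plain Python lists.

```python
# A toy labelled dataset with m = 3 instances and n = 2 features.
dataset = [
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 20.0],
    [5.0, 6.0, 30.0],
]

# zip(*dataset) groups the k-th entries of all rows, i.e. the k-th column.
columns = [list(col) for col in zip(*dataset)]

feature_columns = columns[:-1]  # the feature columns x_1, ..., x_n
y = columns[-1]                 # the target column

print(feature_columns, y)
```

Each column has length $m$, one entry per instance.
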
Another way to view the dataset is in terms of rows rather than columns.