# Cheatsheet 1. Vectors

This cheatsheet covers vectors only. Matrices will be covered in a separate notebook. 

```python
import numpy as np
```

# Table of Contents
(clickable if saved as HTML or downloaded as .ipynb)  
- [Definitions](#vectors)
- [Vector Operations](#vecop)
- [Vector Norm](#vecnorm)
- [Dot Product](#dotproduct)
- [Outer Product](#outerproduct)
- [Vectors with Complex Numbers](#complexvectors)
- [Spaces and Subspaces](#spaces)
- [Span](#span)
- [Linear Independence](#linindep)
- [Basis](#basis)

<a id="vectors"></a>
# Definitions

Vector - an ordered list of numbers. Typical notations for the vector: 
$$\vec{v}, \boldsymbol{v}$$
In this cheatsheet the vectors will be written as simple lowercase letters - e.g., $a$, $b$. 

Geometrically, a vector specifies a direction and a magnitude (length). A vector is in standard position when its tail is at the origin.   
<b>IMPORTANT</b>: In linear algebra a vector is assumed to be a column vector. 

<a id="vecop"></a>
# Basic Vector Operations

- Vector addition:  
    - Geometrically: place the tail of one vector at the head of the other; the sum goes from the free tail to the free head. 
    - Algebraically: add elementwise. 
- Scalar multiplication:   
    - A scalar is typically denoted by a lowercase Greek letter: $\alpha, \beta, \lambda$
    - Scalar multiplication never changes the direction of a vector. It can reverse it when $\lambda < 0$, but the result stays on the same line through the origin, so the direction is considered to be the same (the angle doesn't change!).
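Both operations map directly onto NumPy array arithmetic. A minimal sketch; the vectors `a`, `b` and the scalar `lam` are arbitrary values chosen for illustration:

```python
import numpy as np

a = np.array([1, 2, -4])
b = np.array([3, 0, 1])

# Algebraic vector addition: elementwise sum
s = a + b            # [4, 2, -3]

# Scalar multiplication: every element is scaled by the same number
lam = -2
scaled = lam * a     # [-2, -4, 8]

# A negative scalar flips the vector, but it stays on the same line:
# the cosine of the angle between scaled and a is -1
cos = scaled @ a / (np.linalg.norm(scaled) * np.linalg.norm(a))
print(s, scaled, cos)
```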

<a id="vecnorm"></a>
# Vector Norm 

The vector magnitude (vector norm) follows from the Pythagorean theorem and can be expressed using the dot product: 
$$\lVert{\vec{v}}\rVert = \sqrt{\vec{v}^T \vec{v}}$$

<b>In Python:</b>

```python
import numpy as np 
v = [1, 2, -4]
np.linalg.norm(v)
# Output: 4.58257569495584
```
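To connect the formula with the library call, a short sketch computing $\sqrt{v^T v}$ by hand and comparing it with `np.linalg.norm` (same example vector as above):

```python
import numpy as np

v = np.array([1, 2, -4])

# The norm as the square root of the dot product of v with itself
manual = np.sqrt(v @ v)       # sqrt(1 + 4 + 16) = sqrt(21)
builtin = np.linalg.norm(v)

print(manual, builtin)
```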

<a id="dotproduct"></a>
# Dot Product (Inner Product)

Dot product: a single number that provides information about the relation between two vectors. 

$$\alpha = a \cdot b = \langle a,b \rangle = a^T b = \sum_{i=1}^{n}a_i b_i = \lVert a \rVert \lVert b \rVert \cos{\theta_{ab}}$$

<b>In Python:</b>

```python
a = [1, 2, 3]
b = [4, 5, 6]
np.dot(a, b)
# Output: 32
```

<b>Important note</b>: $a^T b = b^T a$ 

The dot product is only defined for vectors of the same length, since every element of $a$ must be paired with an element of $b$. 

From this equation we can also find the angle between any two vectors:

$$\cos \theta_{ab} = {\frac{\alpha}{\lVert a \rVert \lVert b \rVert}}$$

$$\theta_{ab} = \arccos \Big({\frac{\alpha}{\lVert a \rVert \lVert b \rVert}}\Big)$$

Another way to calculate the dot product is as the product of the vector norms, scaled by the cosine of the angle between them:

$$\alpha = \lVert a \rVert \lVert b \rVert \cos{\theta_{ab}}$$

The sign of the dot product (equivalently, of the cosine, since norms are non-negative) tells us about the angle between the vectors. There are four cases worth noting: 

1) $\cos \theta > 0 \implies \alpha > 0$: acute angle  
2) $\cos \theta < 0 \implies \alpha < 0$: obtuse angle  
3) $\cos \theta = 0 \implies \alpha = 0$: orthogonal   
4) $\cos \theta = \pm 1 \implies \alpha = \pm \lVert a \rVert \lVert b \rVert$: collinear (pointing in the same direction for $+1$, in opposite directions for $-1$) 
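The arccos formula above translates into a small helper. A minimal sketch, assuming non-zero input vectors; `angle_between` is a hypothetical name, not a NumPy function. The cosine is clipped to $[-1, 1]$ because floating-point rounding can push it slightly outside that range, where `np.arccos` would return `nan`.

```python
import numpy as np

def angle_between(a, b):
    """Angle in radians between two non-zero vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # Guard against rounding slightly outside [-1, 1] before arccos
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(np.degrees(angle_between([1, 0], [0, 3])))         # orthogonal
print(np.degrees(angle_between([1, 2, 3], [4, 5, 6])))   # acute
print(np.degrees(angle_between([1, 1], [-2, -2])))       # collinear, opposite directions
```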

### Proof for the dot product
We start with the law of cosines: 

$${\lVert a - b \rVert}^2 = {\lVert a \rVert}^2 + {\lVert b \rVert}^2 - 2 {\lVert a \rVert} {\lVert b \rVert} \cos \theta_{ab} $$

(Note: if $\theta_{ab} = 90^\circ$, we get the standard Pythagorean theorem.)

Now, let's write out the norm using the definitions for the dot product: 

$${\lVert a - b \rVert}^2 = (a-b)^T (a-b) = a^T a - a^T b - b^T a + b^T b = \\ = a^T a - a^T b - a^T b + b^T b = 
a^T a - 2 a^T b + b^T b $$

Now we can equate these two parts: 
$${\lVert a \rVert}^2 + {\lVert b \rVert}^2 - 2 {\lVert a \rVert} {\lVert b \rVert} \cos \theta_{ab} = a^T a - 2 a^T b + b^T b$$

Reminder: ${\lVert v \rVert}^2 = v^T v$, so the squared-norm terms on both sides cancel out; dividing both sides by $-2$ then gives:

$${\lVert a \rVert} {\lVert b \rVert} \cos \theta_{ab} =  a^T b$$
QED
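A numerical spot check of the proof (not a proof itself): the expansion step on random vectors, and the final identity on a pair whose angle is known geometrically to be $45^\circ$. The random vectors and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)   # arbitrary test vectors
b = rng.standard_normal(4)

# Expansion step: ||a - b||^2 == a^T a - 2 a^T b + b^T b
lhs = np.linalg.norm(a - b) ** 2
rhs = a @ a - 2 * (a @ b) + b @ b
print(np.isclose(lhs, rhs))

# Final result on a known angle: 45 degrees between [1, 0] and [1, 1]
a2 = np.array([1.0, 0.0])
b2 = np.array([1.0, 1.0])
geometric = np.linalg.norm(a2) * np.linalg.norm(b2) * np.cos(np.pi / 4)
print(np.isclose(a2 @ b2, geometric))
```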


<a id="outerproduct"></a> 
       
# Outer Product 

Unlike the inner product, the outer product produces a matrix. It is denoted as follows: 

$$v w^T$$
Reminder, $v$ and $w$ are column vectors. 

$$\begin{bmatrix}
1\\
2\\
3\\
4\\
\end{bmatrix}
\begin{bmatrix}
a&b&c&d\\
\end{bmatrix}=
\begin{bmatrix}
1a&1b&1c&1d\\
2a&2b&2c&2d\\
3a&3b&3c&3d\\
4a&4b&4c&4d\\
\end{bmatrix}
$$

There are two perspectives on the outer product: column and row.   
Column perspective: each column of the new matrix is the first vector scaled by the corresponding element of the second vector. For example: 

$$\begin{bmatrix}
1\\
2\\
3\\
4\\
\end{bmatrix}
\begin{bmatrix}
a\\
\end{bmatrix}=
\begin{bmatrix}
1a\\
2a\\
3a\\
4a\\
\end{bmatrix}
$$

Row perspective: each row of the new matrix is the whole second vector scaled by the corresponding element of the first vector. For example: 

$$\begin{bmatrix}
1\\
\end{bmatrix}
\begin{bmatrix}
a&b&c&d\\
\end{bmatrix}=
\begin{bmatrix}
1a&1b&1c&1d\\
\end{bmatrix}
$$
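NumPy computes the outer product with `np.outer`. The sketch below checks both perspectives; the numbers in `w` simply stand in for the symbols $a, b, c, d$ above.

```python
import numpy as np

v = np.array([1, 2, 3, 4])
w = np.array([10, 20, 30, 40])   # stands in for [a, b, c, d]

M = np.outer(v, w)               # 4x4 matrix with M[i, j] = v[i] * w[j]

# Column perspective: column j is v scaled by w[j]
for j in range(len(w)):
    assert np.array_equal(M[:, j], v * w[j])

# Row perspective: row i is w scaled by v[i]
for i in range(len(v)):
    assert np.array_equal(M[i, :], v[i] * w)

print(M)
```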

<a id="complexvectors"></a> 
       
# Vectors with Complex Numbers

$i=\sqrt{-1}$ (so that we can solve equations like $x^2 + 1 = 0$).

Complex vectors are important for the Fourier transform (I hope I'll get there eventually...*sobs*)

### Complex numbers multiplication

$$z = a+ ib, z \in \mathbb{C}$$
$$w = c+id, w \in \mathbb{C}$$
$$zw = (a+ib)(c+id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(ad + bc)$$
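Python has a built-in `complex` type (with `j` in place of $i$), so the formula can be checked directly. The values of $a, b, c, d$ below are arbitrary:

```python
z = 1 + 2j   # a = 1, b = 2
w = 3 + 4j   # c = 3, d = 4

a, b = z.real, z.imag
c, d = w.real, w.imag

# (ac - bd) + i(ad + bc)
formula = complex(a * c - b * d, a * d + b * c)
print(z * w, formula)
```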

### Hermitian Transpose

<table>
    <tr>
        <th>Complex Number</th>
        <th>Conjugate</th>
    </tr>
    <tr>
        <td>$a+ib$</td>
        <td>$a-ib$</td>
    </tr>
    <tr>    
        <td>$a-ib$</td>
        <td>$a+ib$</td>    
    </tr>    
    <tr>    
        <td>$a$ (imaginary part is 0)</td>
        <td>$a$</td>    
    </tr>        
</table>

$${\begin{bmatrix}
1+3i\\
-2i\\
4\\
-5\\
\end{bmatrix}}^H=
{\begin{bmatrix}
1+3i\\
-2i\\
4\\
-5\\
\end{bmatrix}}^*=
\begin{bmatrix}
1-3i&2i&4&-5\\
\end{bmatrix}$$

The rationale for the Hermitian transpose is simple. Let's say we have a complex number $z=3+4i$. If we plot it on the real-imaginary plane (x: real part, y: imaginary part), the magnitude of the vector is 5 (by the Pythagorean theorem). However, if we take a plain dot product with the traditional transpose, we do not get $5^2$: 

$$[3+4i]^T*[3+4i]=9+12i+12i+16i^2=9+24i-16=-7+24i$$

Note: here the transpose doesn't change anything, since the vector has just one element. 
However, if we use the Hermitian transpose, the result makes sense: 

$$[3+4i]^H*[3+4i]=[3-4i]*[3+4i]=9+12i-12i-16i^2=9+16=25$$

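In Python: 

```python
import numpy as np

# Built-in complex(2, -5); the np.complex alias was deprecated and removed in NumPy 1.24
v = np.array([3, 4j, 5+2j, complex(2, -5)])
print(v.T)                           # .T does nothing to a 1-D array
print(np.transpose(v))               # same as v.T
print(np.transpose(v.conjugate()))   # conjugation flips every imaginary part

# Output:
# [3.+0.j 0.+4.j 5.+2.j 2.-5.j]
# [3.+0.j 0.+4.j 5.+2.j 2.-5.j]
# [3.-0.j 0.-4.j 5.-2.j 2.+5.j]
```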
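The $z = 3+4i$ example can be reproduced with NumPy. `np.dot` does not conjugate, while `np.vdot` conjugates its first argument, which matches $z^H z$ for 1-D vectors:

```python
import numpy as np

z = np.array([3 + 4j])

# Plain transpose, no conjugation: (3+4i)(3+4i) = -7+24i
plain = np.dot(z, z)

# Hermitian version: conjugate the first vector, (3-4i)(3+4i) = 25
hermitian = np.vdot(z, z)
explicit = z.conj() @ z

print(plain, hermitian, explicit)
```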


# Unit Vectors

A unit vector is a vector $\mu v$ such that: 
$$\lVert \mu v \rVert = 1$$

For a non-zero $v$ we can state that: 
$$\mu = \frac{1}{\lVert v \rVert}$$
which follows from $\lVert \mu v \rVert = |\mu| \lVert v \rVert$: 
$$\lVert \mu v \rVert = \frac{1}{\lVert v \rVert} {\lVert v \rVert} = 1$$
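Normalizing a vector is just multiplication by $\mu = 1/\lVert v \rVert$. A minimal sketch; `normalize` is a hypothetical helper name, and it rejects the zero vector because its norm is 0:

```python
import numpy as np

def normalize(v):
    """Return the unit vector in the direction of v (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    mu = 1.0 / norm
    return mu * v

u = normalize([3, 4])
print(u, np.linalg.norm(u))
```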

### Cosine Similarity
Recall that we can write the dot product as follows: 

$$v_1^T v_2 = \lVert v_1 \rVert \lVert v_2 \rVert \cos (v_1, v_2)$$

Now, if we divide both sides by the norms of $v_1$ and $v_2$ (we're allowed to do so because, for non-zero vectors, the norms are non-zero scalars): 

$$\frac{v_1^T v_2}{\lVert v_1 \rVert \lVert v_2 \rVert} =  \cos (v_1, v_2)$$

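This expression is known as cosine similarity and has various applications in machine learning. It is closely related to Pearson's correlation: Pearson's correlation coefficient is the cosine similarity of the two vectors after subtracting each vector's mean. 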
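A sketch of cosine similarity, and of its relation to Pearson's correlation, compared against `np.corrcoef`. The data vectors are arbitrary examples and `cosine_similarity` is a hypothetical helper name:

```python
import numpy as np

def cosine_similarity(x, y):
    """cos of the angle between non-zero vectors x and y (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 4.0])

cos_sim = cosine_similarity(x, y)
# Pearson's r is the cosine similarity of the mean-centered vectors
r_centered = cosine_similarity(x - x.mean(), y - y.mean())
r_numpy = np.corrcoef(x, y)[0, 1]

print(cos_sim, r_centered, r_numpy)
```

Without centering, the two measures generally differ, as they do for these vectors.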

<a id="spaces"></a>
# Spaces and Subspaces

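A field is a set on which addition, subtraction, multiplication and division (except by zero) are valid operations, i.e. their results stay in the set. Examples:   
- $\mathbb{R}$ - real,   
- $\mathbb{C}$ - complex.

Note that $\mathbb{Z}$ (integers) is **not** a field: dividing two integers does not always give an integer (e.g., $1/2$).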

**Subspace** is the set of all vectors that can be created by taking linear combinations of some vector or set of vectors. Examples: 
- $\lambda v; \lambda \in \mathbb{R}$  
- $\lambda v + \beta \omega; \lambda, \beta \in \mathbb{R}$

A vector subspace must:   
- be closed under addition and scalar multiplication  
- contain the zero vector. 

Or, more formally:  

$$\forall v, \omega \in V, \forall \lambda, \alpha \in \mathbb{R}; \lambda v + \alpha \omega \in V$$

Reads: for all $v$ and $\omega$ that are in the subspace $V$ and for all $\lambda$ and $\alpha$ that are real numbers, all their linear combinations defined as $\lambda v + \alpha \omega$ must stay in $V$. This covers the case with the zero vector as well - we just need to set $\lambda$ and $\alpha$ to 0. 


Example:  
$$\lambda{\begin{bmatrix}
1\\
2\\
4\\
\end{bmatrix}}, \mu{\begin{bmatrix}
2\\
1\\
4\\
\end{bmatrix}}$$

These two vectors span a plane that extends infinitely and **passes through the origin** (-> the subspace contains the zero vector). There are infinitely many such subspaces, spanned by different pairs of vectors.

The ambient N-D space is the space that contains all of these subspaces. For example, ambient 3D space contains:   
- 0D subspace (exactly one: the origin). 
- 1D subspaces (infinitely many, because infinitely many lines go through the origin). 
- 2D subspaces (infinitely many, because infinitely many planes go through the origin).  
- 3D subspace (exactly one: the whole space).

**More vectors $\neq$ more dimensions**

### Example in $\mathbb{R}^5$

$$\text{0D:} \Big\{ \begin{bmatrix}
0\\
0\\
0\\
0\\
0\\
\end{bmatrix}\Big\}, \text{1D:} \Big\{ \lambda \begin{bmatrix}
0\\
1\\
3\\
1\\
0\\
\end{bmatrix} \Big\}, \text{2D:} \Big\{\alpha \begin{bmatrix}
0\\
1\\
3\\
1\\
0\\
\end{bmatrix}, \beta \begin{bmatrix}
9\\
4\\
2\\
3\\
1\\
\end{bmatrix}\Big\}, \\ \text{Still 2D:} \Big\{\alpha \begin{bmatrix}
0\\
0\\
0\\
0\\
1\\
\end{bmatrix}, \beta \begin{bmatrix}
0\\
0\\
0\\
1\\
0\\
\end{bmatrix}, \gamma \begin{bmatrix}
0\\
0\\
0\\
2\\
0\\
\end{bmatrix}\Big\}, \text{3D:} \Big\{\alpha \begin{bmatrix}
0\\
0\\
0\\
0\\
1\\
\end{bmatrix}, \beta \begin{bmatrix}
0\\
0\\
0\\
1\\
0\\
\end{bmatrix}, \gamma \begin{bmatrix}
0\\
0\\
1\\
0\\
0\\
\end{bmatrix}\Big\}$$
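One way to confirm the dimensions above numerically is `np.linalg.matrix_rank`, used here as a black box (rank belongs to the matrices notebook): it returns the number of linearly independent columns, i.e. the dimension of the subspace the columns span.

```python
import numpy as np

# Each set from the example, with the vectors stored as columns
one_d = np.column_stack([[0, 1, 3, 1, 0]])
two_d = np.column_stack([[0, 1, 3, 1, 0], [9, 4, 2, 3, 1]])
still_2d = np.column_stack([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [0, 0, 0, 2, 0]])
three_d = np.column_stack([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [0, 0, 1, 0, 0]])

for name, S in [("1D", one_d), ("2D", two_d), ("still 2D", still_2d), ("3D", three_d)]:
    # Number of vectors vs. dimension of the subspace they span
    print(name, S.shape[1], "vectors -> dimension", np.linalg.matrix_rank(S))
```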

### Subspace vs Subset

**Subset**: a set of points that satisfy some conditions. It doesn't need to be closed under addition and scalar multiplication or include the origin, and that is the key difference from a **subspace**. 

#### Examples
1) All points on the XY plane such that $x>0, y > 0$. Subset, but **NOT** a subspace: if we multiply a vector in this subset by -1, we end up outside the subset.   

2) All points on the XY plane such that $x^2 + y^2 = 1$ (the unit circle). Subset, but **NOT** a subspace, because it doesn't even contain the origin.  

3) All points in XY such that $y=4x, \forall x$. Subset and subspace. Goes through the origin. No matter what scalar we take, we're still on that line.   

4) All points in XY such that $y=4x+1, \forall x$. Subset, but **NOT** a subspace. It doesn't go through the origin.  

So, a basic method to determine whether a subset is also a subspace is as follows: 
- Determine whether the origin is in the set. 
- Try to write down the criteria in terms of scalars and vectors of the form $\alpha v + \beta \omega$, and check that every such combination stays in the set.
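The two checks can be turned into a sampling-based sanity test for examples 3 and 4. This is a sketch under clear limits: random sampling can only find counterexamples and can never prove that a set is a subspace. All function names are hypothetical helpers written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example 3: the line y = 4x (membership test and point generator)
def on_line(p):
    return np.isclose(p[1], 4 * p[0])

def sample_line(x):
    return np.array([x, 4 * x])

# Example 4: the shifted line y = 4x + 1
def on_shifted_line(p):
    return np.isclose(p[1], 4 * p[0] + 1)

def sample_shifted_line(x):
    return np.array([x, 4 * x + 1])

def passes_subspace_checks(contains, sample, trials=200):
    """Can find counterexamples; passing does NOT prove the set is a subspace."""
    # Check 1: the origin must be in the set
    if not contains(np.zeros(2)):
        return False
    # Check 2: random combinations lam * v + alpha * w of members must stay in the set
    for _ in range(trials):
        v = sample(rng.normal())
        w = sample(rng.normal())
        lam, alpha = rng.normal(size=2)
        if not contains(lam * v + alpha * w):
            return False
    return True

print(passes_subspace_checks(on_line, sample_line))
print(passes_subspace_checks(on_shifted_line, sample_shifted_line))
```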

<a id="span"></a>
# Span
Span: the entire space that can be reached by linear combinations of the vectors. The same span can be formed by DIFFERENT sets of vectors: they span the same space, but we reach a given vector with different coefficients. Some sets are just more convenient in certain cases (for example, for eigendecomposition).

$$\text{span}(\{v_1, \dots, v_n\}) = \{\alpha_1 v_1 + \dots + \alpha_n v_n \mid \alpha_i \in \mathbb{R} \}$$

### Example

$$v = \begin{bmatrix}
1\\
2\\
0\\
\end{bmatrix}, \omega = \begin{bmatrix}
3\\
2\\
1\\
\end{bmatrix}, S = \Big\{ \begin{bmatrix}
1\\
1\\
0\\
\end{bmatrix}, \begin{bmatrix}
1\\
7\\
0\\
\end{bmatrix}  \Big\}$$

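$v$ is in the span of S, because we can get exactly this vector by taking 5/6 of the first vector of S and 1/6 of the second. $\omega$, on the other hand, is not in the span of S: both vectors of S have 0 as their third element, so no linear combination of them can produce the 1 in the third element of $\omega$. 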
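Span membership can be tested numerically with a least-squares fit: find the best combination of the vectors in S and check whether it reproduces the target exactly. A minimal sketch; `in_span` is a hypothetical helper and the tolerance `1e-10` is an arbitrary choice suited to these small, well-scaled vectors.

```python
import numpy as np

# The vectors of S stored as the columns of a 3x2 matrix
S = np.column_stack([[1, 1, 0], [1, 7, 0]]).astype(float)
v = np.array([1, 2, 0], dtype=float)
omega = np.array([3, 2, 1], dtype=float)

def in_span(S, target, tol=1e-10):
    """Return (is_in_span, coefficients) via least squares (hypothetical helper)."""
    coeffs, *_ = np.linalg.lstsq(S, target, rcond=None)
    # target is in the span iff the best combination hits it exactly
    residual = np.linalg.norm(S @ coeffs - target)
    return bool(residual < tol), coeffs

print(in_span(S, v))      # coefficients close to [5/6, 1/6]
print(in_span(S, omega))  # not in the span: the third element cannot be matched
```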

### Example

Here is a very important example of when a different spanning set can be more convenient. Let's say we have two sets of vectors that span the same plane: 

$$S_1 = \Big\{ \begin{bmatrix}
1\\
0\\
\end{bmatrix}, \begin{bmatrix}
0\\
1\\
\end{bmatrix}  \Big\}, S_2 = \Big\{ \begin{bmatrix}
-3/2\\
-1/2\\
\end{bmatrix}, \begin{bmatrix}
-1/2\\
-1/2\\
\end{bmatrix}  \Big\}$$  

The first set gives the standard XY plane: the Cartesian coordinate system with orthogonal vectors in the basis. So if we have a vector $v$ that is defined as $[3/2, 1/2]$ in $S_1$, we can get a nicer version of it by switching to $S_2$, where it is just $[-1,0]$ (since $-1 \cdot [-3/2, -1/2] = [3/2, 1/2]$).
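Finding the coordinates of $v$ in $S_2$ means solving $B c = v$, where the columns of $B$ are the vectors of $S_2$. A short sketch with `np.linalg.solve`:

```python
import numpy as np

# Columns are the vectors of S_2
B = np.column_stack([[-3/2, -1/2], [-1/2, -1/2]])
v = np.array([3/2, 1/2])   # coordinates of v in S_1 (the standard basis)

# Coordinates c of the same vector in S_2: B @ c == v
c = np.linalg.solve(B, v)
print(c)
```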

<a id="linindep"></a>
# Linear Independence

Formal definition:  
The vectors are linearly dependent if there is a linear combination, such that:  

$$\lambda_1 v_1 + \lambda_2 v_2 + ... + \lambda_n v_n = 0, \lambda_i \in \mathbb{R}, \exists \lambda_i \neq 0$$ 

(i.e. excluding the case when all lambdas are zero - trivial solution). 

**A set of M vectors is independent if each vector points in a geometric dimension not reachable using other vectors in the set.**

More informally: vectors are linearly dependent if one or more of them can be expressed as a linear combination of the others. 

**Checking for linear independence:** 
- Step 1: Count the vectors and compare with $N$ in $\mathbb{R}^N$ (more than $N$ vectors -> dependent). 
- Step 2: Check for 0s in corresponding dimensions. Also, if the zero vector is in the set -> dependent.
- Educated guess and test (try to find a linear combination that gives the zero vector).  
- Matrix rank method. 


### Examples: 

1) $\{ \omega_1, \omega_2 \} = \Big\{ \begin{bmatrix}
1\\
2\\
3\\
\end{bmatrix}, \begin{bmatrix}
2\\
4\\
6\\
\end{bmatrix}  \Big\}$ - linearly dependent, because $\omega_2 = 2 \omega_1$  

2) $\{ v_1, v_2, v_3 \} = \Big\{ \begin{bmatrix}
0\\
2\\
5\\
\end{bmatrix}, \begin{bmatrix}
-27\\
5\\
-37\\
\end{bmatrix}, \begin{bmatrix}
3\\
1\\
8\\
\end{bmatrix} \Big\}$ - linearly dependent, because $v_2=7 v_1 - 9 v_3$
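The matrix rank method applied to both examples: stack the vectors as columns; they are independent exactly when the rank equals the number of vectors. The sketch also verifies the stated combination $v_2 = 7 v_1 - 9 v_3$.

```python
import numpy as np

omegas = np.column_stack([[1, 2, 3], [2, 4, 6]])
vs = np.column_stack([[0, 2, 5], [-27, 5, -37], [3, 1, 8]])

# Independent iff rank equals the number of vectors (columns)
for name, A in [("omega", omegas), ("v", vs)]:
    rank = np.linalg.matrix_rank(A)
    verdict = "independent" if rank == A.shape[1] else "dependent"
    print(name, "rank", rank, "of", A.shape[1], "->", verdict)

# Verify v2 = 7 v1 - 9 v3 (rows of vs.T are the columns of vs)
v1, v2, v3 = vs.T
print(np.array_equal(v2, 7 * v1 - 9 * v3))
```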

### Theorem: Maximum N independent vectors in $\mathbb{R}^N$

- Any set of $M>N$ vectors in $\mathbb{R}^N$ is **dependent**  
- Any set of $M<=N$ vectors in $\mathbb{R}^N$ **could be independent**  
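The first statement of the theorem can be illustrated by actually finding a non-trivial combination that gives zero for $M = 4$ random vectors in $\mathbb{R}^3$. This sketch uses the SVD (not covered in this notebook): with more columns than rows, the last right-singular vector lies in the null space, so its entries are valid $\lambda_i$ coefficients that are not all zero. The random matrix and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # 4 random vectors in R^3, stored as columns

# More vectors than dimensions -> non-trivial null space.
# The last row of Vt (a unit vector) lies in it.
_, _, Vt = np.linalg.svd(A)
lambdas = Vt[-1]                  # lambda_1..lambda_4, not all zero

print(lambdas, A @ lambdas)       # A @ lambdas is (numerically) the zero vector
```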

<a id = "basis"></a>
# Basis 

A set of vectors forms a **basis** for a subspace if the set spans that subspace and its vectors are linearly independent. 

Examples of basis: 

$$\mathbb{R}^2: S = \Big\{ \begin{bmatrix}
1\\
0\\
\end{bmatrix}, \begin{bmatrix}
0\\
1\\
\end{bmatrix} \Big\}$$

$$\mathbb{R}^3: L = \Big\{ \begin{bmatrix}
1\\
0\\
0\\
\end{bmatrix}, \begin{bmatrix}
0\\
1\\
0\\
\end{bmatrix}, \begin{bmatrix}
0\\
0\\
1\\
\end{bmatrix} \Big\}$$

The basis for $\mathbb{R}^2$ shown above is not the only one to define the same space. We can also define it as: 

$$T = \Big\{ \begin{bmatrix}
1\\
1\\
\end{bmatrix}, \begin{bmatrix}
0\\
2\\
\end{bmatrix} \Big\}$$

Now, if a point has coordinates $(2,1)$ in S, its coordinates in T are $(2,-\frac{1}{2})$. Looks worse, right? But the point $(3,3)$ in S gets a more compact representation in T: $(3,0)$.
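A sketch that checks T is a basis of $\mathbb{R}^2$ (exactly 2 independent vectors) and reproduces both coordinate conversions. `is_basis` is a hypothetical helper written for this illustration:

```python
import numpy as np

def is_basis(vectors):
    """True if the vectors form a basis of R^n, n = length of each vector (hypothetical helper)."""
    A = np.column_stack(vectors)
    n = A.shape[0]
    # A basis of R^n needs exactly n vectors, and they must be independent
    return bool(A.shape[1] == n and np.linalg.matrix_rank(A) == n)

T = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]
print(is_basis(T))

# Coordinates in T of the points (2, 1) and (3, 3) given in S
A_T = np.column_stack(T)
print(np.linalg.solve(A_T, [2.0, 1.0]))   # (2, -1/2)
print(np.linalg.solve(A_T, [3.0, 3.0]))   # (3, 0)
```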

**Important:**  
We **CAN'T** have *extra* vectors in a basis, because then there would be more than one way to express a vector in that span. Without *redundant* vectors, every vector in the span has a unique representation.  
There are infinitely many bases, but each must contain only linearly independent vectors.