# 0. Background
As we progress in our understanding of the math surrounding machine learning, AI, and data science, there will be a host of linear algebra concepts that we are forced to reckon with. From PCA and its use of eigenvalues and eigenvectors, to neural networks' reliance on linear combinations and matrix multiplication, the list goes on and on. Having a very solid grasp of linear algebra is crucial to understanding _how_ and _why_ these algorithms work.

This notebook in particular is going to focus on the connection between the following:

> * **Linear Combinations**
> * **Linear Transformations**
> * **The Dot Product**
> * **Functions**

These concepts are incredibly prevalent and linked to each other in beautiful ways. However, this link is generally missing from the way linear algebra is taught, particularly when studying machine learning. Before moving on, I recommend reviewing my notebook concerning vectors.

# 1. Linear Combination 
If you go to Wikipedia, you can find the following definition regarding a **linear combination**:

> A linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results. For example, a linear combination of $x$ and $y$ would be any expression of the form $ax + by$, where $a$ and $b$ are constants.

Now, this can be defined slightly more formally with regard to vectors as follows:

> If $\vec{v_1},...,\vec{v_n}$ is a set of vectors, and $a_1,...,a_n$ is a set of scalars, then their linear combination takes the form:<br>
$$a_1\vec{v_1} + a_2\vec{v_2}+...+a_n\vec{v_n}$$
Here, each $\vec{v_i}$ is a vector, so the expression can be expanded as:<br>
<br>
$$a_1\begin{bmatrix}
    v_1^1 \\
    v_1^2 \\
    \vdots \\
    v_1^m
\end{bmatrix}
+
a_2\begin{bmatrix}
    v_2^1 \\
    v_2^2 \\
    \vdots \\
    v_2^m
\end{bmatrix}
+ \cdots +
a_n\begin{bmatrix}
    v_n^1 \\
    v_n^2 \\
    \vdots \\
    v_n^m
\end{bmatrix}
$$
where, for generality, we have defined each $\vec{v_i}$ to be an $m$-dimensional vector, and the superscript indexes the component (it is not an exponent). Notice that the final result is a single $m$-dimensional vector. So, for instance, in the simplest case, $m = 1$, we could have:<br>
<br>
$$a_1\begin{bmatrix}
    v_1^1
\end{bmatrix}
+
a_2\begin{bmatrix}
    v_2^1
\end{bmatrix}
+ \cdots +
a_n\begin{bmatrix}
    v_n^1
\end{bmatrix}
$$
<br>
$$a_1\begin{bmatrix}
    v_1
\end{bmatrix}
+
a_2\begin{bmatrix}
    v_2
\end{bmatrix}
+ \cdots +
a_n\begin{bmatrix}
    v_n
\end{bmatrix}
$$<br>
$$a_1v_1+a_2v_2+ \cdots +a_nv_n$$
and end up with a 1-dimensional vector, often just viewed as a scalar.
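
The definition above can be sketched numerically. This is a minimal illustration, assuming NumPy is available; the names `scalars`, `vectors`, and `vectors_1d`, along with all of the numbers, are made up for the example and do not come from the text.

```python
import numpy as np

# Illustrative values only: n = 3 vectors, each of dimension m = 2.
scalars = np.array([2.0, -1.0, 0.5])   # a_1, a_2, a_3
vectors = np.array([[1.0, 0.0],        # v_1
                    [0.0, 1.0],        # v_2
                    [4.0, 2.0]])       # v_3

# Scale each vector by its scalar, then add the results.
combo = sum(a * v for a, v in zip(scalars, vectors))

# The same sum written as one product of the scalars with the stacked vectors.
combo_matmul = scalars @ vectors

print(combo, combo_matmul)   # both are the 2-dimensional vector (4, 0)

# The m = 1 case: each "vector" has a single component,
# so the result has a single component as well.
vectors_1d = np.array([[3.0], [1.0], [2.0]])
combo_1d = scalars @ vectors_1d
print(combo_1d.shape)        # (1,)
```

Whatever the number of vectors $n$, the result always has the same dimension $m$ as the vectors being combined.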

Now, this definition is good to have in mind, but we can make it more concrete by looking at it visually. For instance, suppose you have a pair of numbers that is meant to describe a vector, such as:

$$\begin{bmatrix}
    3 \\
    -2
\end{bmatrix} 
$$

<img src="images/linear-comb-1.png" width="300">

We can think of each coordinate as a scalar (how does it stretch or squish vectors?). In linear algebra, there are two very important vectors, commonly known as $\hat{i}$ and $\hat{j}$, the unit vectors pointing along the $x$ and $y$ axes:

<img src="images/linear-comb-2.png" width="300">

Now, we can think of the coordinates of our vector as stretching $\hat{i}$ and $\hat{j}$:

<img src="images/linear-comb-3.png" width="300">

In this sense, the vector that these coordinates describe is the sum of two scaled vectors:

#### $$(3)\hat{i} + (-2)\hat{j}$$

Note that $\hat{i}$ and $\hat{j}$ have a special name; they are referred to as the _basis vectors_ of the _xy_ coordinate system. This means that when you think about vector coordinates as scalars, the basis vectors are what those coordinates are actually scaling.
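
To see the coordinates acting as scalars on the basis vectors, here is a small sketch, assuming NumPy and the standard basis of the _xy_ plane. The variable names `i_hat`, `j_hat`, and `v` are chosen for this example only.

```python
import numpy as np

i_hat = np.array([1, 0])   # unit vector along the x axis
j_hat = np.array([0, 1])   # unit vector along the y axis

# Scale each basis vector by the matching coordinate, then add.
v = 3 * i_hat + (-2) * j_hat

print(v)   # the vector with coordinates (3, -2)
```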

Now, this brings us to our first definition:

> **Linear Combination:** Any time you are scaling two vectors and then adding them together, you have a linear combination. For example: <br>
$$(3)\hat{i} + (-2)\hat{j}$$<br>
Or, more generally:
$$a\vec{v} + b \vec{w}$$<br>
where both $a$ and $b$ are scalars.

This can be seen visually: 

<img src="images/linear-comb-4.png" width="300">

And we can see that as we scale $\vec{v}$ and $\vec{w}$ we can create many different linear combinations:

<img src="images/linear-comb-5.png" width="400">

This brings up another definition, _span_. 

> **Span**: The set of all possible vectors that you can reach with a linear combination of a given pair of vectors is known as the _span_ of those two vectors. 

So, the span of most pairs of 2-d vectors is the entire 2-d plane. However, if the two vectors line up, then their span is just the line that both vectors lie on. When two vectors do happen to line up, we say that they are _linearly dependent_, and one can be expressed as a scalar multiple of the other. On the other hand, if they do not line up, they are said to be _linearly independent_.
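
One way to check this numerically is to look at the rank of the matrix whose columns are the two vectors: rank 2 means the pair spans the whole plane, and rank 1 means the pair spans only a line. The sketch below assumes NumPy; `span_dimension` is a hypothetical helper written for this example, and the vectors are made-up values.

```python
import numpy as np

def span_dimension(v, w):
    """Dimension of the span of v and w (hypothetical helper for this example)."""
    # Put the vectors side by side as columns and count the independent ones.
    return int(np.linalg.matrix_rank(np.column_stack([v, w])))

v = np.array([1.0, 2.0])
w_independent = np.array([3.0, -1.0])   # does not line up with v
w_dependent = np.array([-2.0, -4.0])    # equals -2 * v, so it lines up with v

print(span_dimension(v, w_independent))   # 2
print(span_dimension(v, w_dependent))     # 1
```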



# 2. Linear Transformations and Matrices
Linear transformations are absolutely fundamental to understanding matrix-vector multiplication (well, unless you want to rely on memorization). To start, let's just parse the term "linear transformation".

A transformation is essentially just another way of saying _function_. This is where the first bit of confusion can arise, though, if you are being particularly thoughtful about the process: what exactly is a function? It is helpful to define it before moving forward.

**Function**<br>
Generally, in mathematics we view a function as a process that takes in an input and returns an output (this coincides nicely with the computer science view as well). It can be viewed as:

#### $$x \rightarrow f(x) \rightarrow y$$
Or, expanded as:
#### $$x_1, x_2, ... , x_n \rightarrow f(x_1, x_2, ... , x_n) \rightarrow y$$

This is how it is _generally_ encountered, where anywhere from one to several inputs are taken in, and a single output is produced. However, let's define a function more rigorously:

> A function is a _process_ or a relation that associates each element $x$ of a set $X$, the _domain_ of the function, to a single element $y$ of another set $Y$ (possibly the same set), the codomain of the function.

The important point to recognize from the above definition is that, while it is common for a function to map elements from a set $X$ to a different set $Y$, the two sets can be the _same_. Hence, although it is not encountered quite as often in ML, a function can map $X \rightarrow X$.

Now, back to our term transformation; it is something that takes in inputs, and spits out an output for each one. In the context of linear algebra, we like to think about transformations that take in some vector, and spit out another vector:

$$\begin{bmatrix}
    5 \\
    7
\end{bmatrix} \rightarrow 
L(\vec{v})
\rightarrow
\begin{bmatrix}
    2 \\
    -3
\end{bmatrix}
$$

Here we see an example of a function that does not necessarily map to a different space, but potentially back into the same one. Generally, when we have a function that takes in two inputs, we end up with one output:

#### $$f(x,y) = z$$

However, we can clearly see here that we take in two inputs (coordinates of the vector) and end up with two outputs (coordinates of the transformed vector). 
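
The contrast can be sketched in code. The text does not say which rule $L$ uses, so the formula inside `L` below is an assumption: it is one of many rules that happen to send $(5, 7)$ to $(2, -3)$, picked purely for illustration. The scalar function `f` is likewise a made-up example, and NumPy is assumed.

```python
import numpy as np

def f(x, y):
    """Two inputs, one output: f(x, y) = z (illustrative rule)."""
    return x + y

def L(v):
    """Vector in, vector out (hypothetical rule, not taken from the text)."""
    x, y = v
    return np.array([-x + y, 5 * x - 4 * y])

print(f(5, 7))               # 12, a single number
print(L(np.array([5, 7])))   # a vector with two coordinates: (2, -3)
```

Both the input and the output of `L` live in the same 2-d space, which is the $X \rightarrow X$ situation described earlier.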

So, why use the word transformation instead of function if they essentially mean the same thing? Well, it is to be suggestive of _movement_! The way to think about functions of vectors is to use movement. If a transformation takes some input vector to some output vector, we imagine that input vector moving over to the output vector:

<img src="images/linear-trans-1.png" width="400">

And in order to think about the transformation as a whole, we can think about _every possible input vector_ moving over to its corresponding _output vector_. 
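
We obviously cannot enumerate every possible input vector, but a small grid of them gives the flavor. The sketch below assumes NumPy. The rule inside `L` is a hypothetical example rather than a transformation defined in the text, and the grid bounds are arbitrary.

```python
import numpy as np

def L(v):
    """Hypothetical transformation, used only for illustration."""
    x, y = v
    return np.array([-x + y, 5 * x - 4 * y])

# A 3 x 3 grid of input vectors with coordinates in {-1, 0, 1}.
xs, ys = np.meshgrid(np.arange(-1, 2), np.arange(-1, 2))
inputs = np.column_stack([xs.ravel(), ys.ravel()])   # shape (9, 2)

# Move every input vector to its corresponding output vector.
outputs = np.array([L(v) for v in inputs])           # shape (9, 2)

for v_in, v_out in zip(inputs, outputs):
    print(v_in, "->", v_out)
```

Plotting `inputs` and `outputs` side by side (for example, as arrows from the origin) would give a picture of the whole plane moving at once.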