<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Linear algebra: review of basic concepts with numpy
---

<a id="learning-objectives"></a>
## Learning Objectives

### Core

- Recognize scalars, vectors, and matrices
- Know how to calculate additions and dot products of vectors and matrices
- Implement basic linear algebra concepts in NumPy
 - create vectors and matrices (`np.array`)
 - vector addition/subtraction 
 - vector norm (`np.linalg.norm`)
 - scalar product/matrix product (`np.dot`)
 - identity matrix (`np.eye`)
 - transposed matrix (`.T`)
 - inverse matrix (`np.linalg.inv`)

### Target
- Recognize uses of linear algebra in machine learning
 - distance between actual and predicted values
 - least squares
- Check for equality of numpy arrays (`np.isclose` and `np.allclose`)


### Stretch
- Implement linear algebra operations through the use of lists

### Lesson Guide

- [Introduction](#introduction)
- [Scalars, vectors and matrices](#scalars-vectors-and-matrices)
- [Types of vectors](#types-of-vectors)
	- [Vectors and Matrices are useful for multi-dimensional concepts](#vectors-and-matrices-are-useful-for-multi-dimensional-concepts)
- [Basic matrix algebra](#basic-matrix-algebra)
	- [Addition and subtraction](#addition-and-substraction)
	- [Scalar multiplication](#scalar-multiplication)
	- [Vector norm](#vector-norm)
	- [Dot product](#dot-product)
	- [Matrix multiplication](#matrix-multiplication)
	- [The identity matrix](#the-identity-matrix)
	- [The transposed matrix](#the-transposed-matrix)
	- [The inverse matrix](#the-inverse-matrix)
- [Applications to machine learning](#applications-to-machine-learning)
	- [Distance between actual values and predicted values](#distance-between-actual-values-and-predicted-values)
	- [Least squares](#least-squares)
- [Independent practice](#independent-practice)
- [Additional resources](#additional-resources)

In [1]:
import numpy as np

<a id="introduction"></a>
## Introduction
---
As data scientists we will often deal with data contained in lists or tables. Often we will transform the data to a numerical form so that we can handle it as vectors and matrices. Doing so enables us to use the machinery of linear algebra to understand our data and to make predictions and forecasts. For this reason, in this lesson we will review some of the basic concepts of linear algebra.

As you learned in a previous lesson, lists and tuples in python are very flexible. They can contain elements of any python data type in any kind of mixture. That is an attractive feature for many python applications, but not for the linear algebra tasks of a data scientist's everyday life. The flexibility of python lists considerably increases computation time, and their indexing does not map naturally onto matrix operations.

That is the main reason why numpy was created. It provides a new data type, the numpy array, which is more restrictive than python lists: all of its elements must share a single data type. Data types can be numeric or strings, but not mixtures. Using arrays leads to much better numerical performance and is crucial for any kind of more advanced numerical computation with python, not only in linear algebra.
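A quick way to see the single-data-type rule in action is to inspect the `dtype` attribute of a few arrays. The sketch below is only illustrative; the variable names are made up for this example.

```python
import numpy as np

# A list can mix types freely; an array settles on one common dtype.
ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])    # the integers are upcast to floats
mixed_strings = np.array([1, "a", 3.0])  # everything is converted to a string

print(ints.dtype)                # an integer dtype (platform dependent)
print(mixed_numbers.dtype)       # float64
print(mixed_strings.dtype.kind)  # 'U' stands for unicode string
```

Note that numpy converts silently rather than raising an error, so checking `dtype` is a useful habit when an array behaves unexpectedly.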

<a id="scalars-vectors-and-matrices"></a>
## Scalars, vectors and matrices
---

A **scalar** is a single number like

$$a = 3$$

A **vector** is an ordered tuple or list of numbers like

$$\vec{u} = \left( \begin{array}{ccc}
1&3&7
\end{array} \right)$$

We can use numpy arrays to encode vectors.

In [2]:
u = np.array([1, 3, 7])
print(type(u))

<class 'numpy.ndarray'>


In [3]:
# illustrate shape
np.array([1, 3, 7]).shape

(3,)

In [4]:
np.array([[1, 3, 7]]).shape

(1, 3)

In [5]:
np.array([[1], [3], [7]]).shape

(3, 1)

In [6]:
# display the numpy array
u

array([1, 3, 7])

In [7]:
# print the numpy array
print(u)

[1 3 7]


An $m \times n$ **matrix** $A$ (read "$m$ by $n$") is a rectangular array of numbers with $m$ rows and $n$ columns. Each number in the matrix is an entry. The entry in row $i$ and column $j$ is conventionally denoted $A_{ij}$.

$$A= \left( \begin{array}{cccc}
A_{11} & A_{12} & ... & A_{1n}  \\
A_{21} & A_{22} & ... & A_{2n}  \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & ... & A_{mn}
\end{array} \right)$$

A matrix can be encoded by forming a list of lists and putting it into numpy array format.

#### Pure python version

In [8]:
[[1, 3, 7], [4, 6, 3], [2, 5, 6]]

[[1, 3, 7], [4, 6, 3], [2, 5, 6]]

#### Numpy version

In [9]:
A = np.array([[1, 3, 7], [4, 6, 3], [2, 5, 6]])
A

array([[1, 3, 7],
       [4, 6, 3],
       [2, 5, 6]])

Printing looks more like the maths way of writing matrices.

In [10]:
print(A)

[[1 3 7]
 [4 6 3]
 [2 5 6]]


We can check the shape of a given vector or matrix:

In [11]:
print(u.shape)
print(A.shape)

(3,)
(3, 3)
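Individual entries, rows and columns can be read off with square-bracket indexing. The following is a small sketch using the matrix from above; keep in mind that numpy counts from 0 while the mathematical notation counts from 1.

```python
import numpy as np

A = np.array([[1, 3, 7], [4, 6, 3], [2, 5, 6]])

# The mathematical entry A_{12} (row 1, column 2) is stored at index [0, 1].
print(A[0, 1])   # 3
print(A[1, :])   # second row: [4 6 3]
print(A[:, 2])   # third column: [7 3 6]
```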


<a id="types-of-vectors"></a>
## Types of vectors
---

A (column) **vector** is a matrix with a single column. Its entries are called the components of the vector.

$$\vec{v} = \left( \begin{array}{c}
1 \\
3 \\
7 \\
\end{array} \right)$$

Components are denoted $v_{i}$.

Create the vector:

In [12]:
v = np.array([1, 3, 7])
print(v)
print(v.shape)

[1 3 7]
(3,)


Set the shape explicitly:

In [13]:
v.shape = (3, 1)
print(v.shape)
print(v)

(3, 1)
[[1]
 [3]
 [7]]


A matrix with a single row is a **row vector**.

$$\vec{u} = \left( \begin{array}{ccc}
1&3&7
\end{array} \right)$$

In [14]:
u = np.array([1, 3, 7])
print(u.shape)
print(u)

(3,)
[1 3 7]


Set the shape explicitly:

In [15]:
u.shape = (1, 3)
print(u.shape)
print(u)

(1, 3)
[[1 3 7]]
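Assigning to `.shape` changes the array in place. A commonly used alternative is `reshape`, which returns a reshaped array and leaves the original untouched. The sketch below shows a few equivalent ways of obtaining row and column shapes; the variable names are only for illustration.

```python
import numpy as np

u = np.array([1, 3, 7])

row = u.reshape(1, 3)
col = u.reshape(-1, 1)      # -1 lets numpy infer that dimension
col_alt = u[:, np.newaxis]  # indexing with np.newaxis adds an axis

# u keeps its 1-D shape; the others are 2-D
print(u.shape, row.shape, col.shape, col_alt.shape)
```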


<a id="vectors-and-matrices-are-useful-for-multi-dimensional-concepts"></a>
### Vectors and Matrices are useful for multi-dimensional concepts

<center><img src="./assets/images/r3_vectors.png" style="width:500px;height:350px;"></center>
We can represent vectors as arrows in n-dimensional space, having magnitude and direction.

(Image from: 
[Louis Scharf, Linear Algebra: Vectors. OpenStax CNX. Sep 17, 2009](http://cnx.org/contents/3d05d982-e21c-4f8a-ab5a-d3e94186f924@6).)

<a id="basic-matrix-algebra"></a>
## Basic matrix algebra

<a id="addition-and-substraction"></a>
### Addition and subtraction
Vector **addition** is straightforward, if the two vectors are of equal dimensions:

$$\vec{v} = \left( \begin{array}{c}
1 \\
3 \\
7
\end{array} \right), \;\; \vec{w} = \left( \begin{array}{c}
1 \\
0 \\
1\end{array} \right)$$

In [16]:
# using column vectors
v = np.array([1, 3, 7])
v.shape = (3, 1)
w = np.array([1, 0, 1])
w.shape = (3, 1)

$$\vec{v} + \vec{w} =
\left( \begin{array}{c}
1 \\
3 \\
7
\end{array} \right) 
+ \left( \begin{array}{c}
1 \\
0 \\
1
\end{array} \right) = 
\left( \begin{array}{c}
1+1 \\
3+0 \\
7+1
\end{array} \right) = 
\left( \begin{array}{c}
2 \\
3 \\
8
\end{array} \right)
$$

In [17]:
v + w

array([[2],
       [3],
       [8]])

Having the vectors be rows or columns will _not_ affect the operation, as long as both vectors are of the same shape.

In [18]:
# using 1-D arrays (printed like row vectors)
v = np.array([1, 3, 7])
w = np.array([1, 0, 1])
v + w

array([2, 3, 8])

#### How would subtraction work?
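One way to explore this question is to try it on the same two vectors. As a minimal sketch, subtraction works elementwise in the same way as addition:

```python
import numpy as np

v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

# Each component of w is subtracted from the matching component of v
difference = v - w
print(difference)   # [0 3 6]
```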

### What if our vectors have different shapes?

First, let's look at adding vectors with the same orientation (row) but different sizes.

In [19]:
v = np.array([2, 3, 9])
w = np.array([2])
v + w

array([ 4,  5, 11])

We can see that the single value in vector $w$ is added to each of the values in vector $v$. Numpy calls this behaviour **broadcasting**.

\begin{eqnarray}\vec{v} + \vec{w} 
&=&
\left( \begin{array}{c}
2&3&9
\end{array} \right) + \left( \begin{array}{c}
2 
\end{array} \right) \\
&=& 
\left( \begin{array}{c}
2+2&3+2&9+2
\end{array} \right) \\
&=& 
\left( \begin{array}{c}
4&5&11
\end{array} \right)
\end{eqnarray}

Let's see how this applies to vectors of different shapes.

In [20]:
v = np.array([1, 2, 3])
print(v.shape)
w = np.array([3, 6, 9])
w.shape = (3, 1)
print(w.shape)
print(v + w)

(3,)
(3, 1)
[[ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


We start with an array of shape `(3,)` and another of shape `(3, 1)`, and end up with a matrix of shape `(3, 3)`. How is this possible?

Below we can see that the first value in vector $w$ is added to all of the values in vector $v$ to make the first row of the matrix. This process is repeated with the second and third values of vector $w$ to create the second and third rows of the resulting matrix.


\begin{equation}
\vec{v} + \vec{w} =
\left( \begin{array}{c}
1&2&3
\end{array} \right)
+ \left( \begin{array}{c}
3 \\
6 \\
9
\end{array} \right) = 
\left( \begin{array}{c}
1+3&2+3&3+3 \\
1+6&2+6&3+6 \\
1+9&2+9&3+9
\end{array} \right) = 
\left( \begin{array}{c}
4&5&6 \\
7&8&9 \\
10&11&12
\end{array} \right)
\end{equation}
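The resulting shape can be checked without performing the addition, and not every pair of shapes is compatible. The following sketch uses `np.broadcast` to report the combined shape and shows what happens when the sizes cannot be matched.

```python
import numpy as np

v = np.array([1, 2, 3])            # shape (3,)
w = np.array([[3], [6], [9]])      # shape (3, 1)

# np.broadcast reports the shape the combined result would have
print(np.broadcast(v, w).shape)    # (3, 3)
print(v + w)

# Sizes 3 and 2 cannot be stretched to match, so numpy raises a ValueError
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```

A dimension can only be stretched when it has size 1 (or is missing), which is why the single-element vector and the column vector both worked above.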


<a id="scalar-multiplication"></a>
### Scalar multiplication
We scale a vector with **scalar multiplication**, multiplying a vector by a scalar (single number):

$$ 2 \left( \begin{array}{c}
1 \\
3 \\
7
\end{array} \right) = 
 \left( \begin{array}{c}
2 \cdot 1 \\
2 \cdot 3 \\
2 \cdot 7
\end{array} \right) = 
 \left( \begin{array}{c}
2 \\
6 \\
14
\end{array} \right)$$

In [21]:
sv = np.array([1, 3, 7])
sv.shape = (3, 1)
2 * sv

array([[ 2],
       [ 6],
       [14]])

If $\vec{a} = \left( \begin{array}{c}
2 \\
1 
\end{array} \right)$ and $b = 3$:

$b\cdot\vec{a} = 3\cdot\left( \begin{array}{c}
2 \\
1 
\end{array} \right) 
= \left( \begin{array}{c}
6 \\
3 
\end{array} \right)$

<center>![](./assets/images/scalar_multiplication_3a.png)</center>

The nature of the operation remains the same for vectors with row shapes as well.

In [22]:
sv = np.array([[2, 5, 3]])
2 * sv

array([[ 4, 10,  6]])

<a id="vector-norm"></a>
### Vector norm
The **magnitude** of a vector with $n$ components is interpretable as its length in $n$-dimensional space, and is calculable via the Euclidean distance.

For a vector

$$\vec{v} = \left( \begin{array}{c}
v_{1} \\
v_{2} \\
\vdots \\
v_{n}
\end{array} \right)$$

its magnitude is given by 

$$\| \vec{v} \| = \sqrt{v_{1}^{2} + v_{2}^{2} + ... + v_{n}^{2}}$$

For example for the vector 

$$\vec{v} = 
\left( \begin{array}{c}
3 \\
4
\end{array} \right)$$ 

the magnitude is

$$\| \vec{v} \| = \sqrt{3^{2} + 4^{2}} = 5$$

This is also called the vector **norm**. You will see this often in machine learning in the context of _least squares_.

To calculate the norm of a vector, we can use a function from numpy's linalg package.

In [23]:
np.linalg.norm(np.array([3, 4]))

5.0
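To connect the function call to the formula, the norm can also be computed step by step. This is a small sketch that follows the definition above directly:

```python
import numpy as np

v = np.array([3, 4])

# Square each component, add them up, then take the square root
manual_norm = np.sqrt(np.sum(v ** 2))

print(manual_norm)          # 5.0
print(np.linalg.norm(v))    # 5.0
```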

#### Independent practice  
Use vectors $u$ and $v$ given below to calculate the following. First calculate by hand, then with numpy.  

- Find the sum of vectors $u$ and $v$
- Find the difference between $u$ and $v$
- Find the scalar multiple of $u$ by `3` (that is, $3u$)
- Find the magnitude of vector $u$

In [24]:
u = np.array([3.0, 4.0])
v = np.array([2.0, 1.0])

<a id="dot-product"></a>
### Dot product
The **dot product**, also called the **scalar product**, of two $n$-dimensional vectors is:

$$ \vec{v} \cdot \vec{w} =\sum _{i=1}^{n}v_{i}w_{i}=v_{1}w_{1}+v_{2}w_{2}+\cdots +v_{n}w_{n} $$

For two vectors

$$\vec{v} = \left( \begin{array}{c}
1 \\
3 \\
7
\end{array} \right), \vec{w} 
= \left( \begin{array}{c}
1 \\
0 \\
1
\end{array} \right)$$

the dot product gives 

$$ \vec{v} \cdot \vec{w} = 1*1 + 3*0 + 7*1 = 8 $$

In numpy, it is calculated in the following way:

In [25]:
v = np.array([1, 3, 7])
w = np.array([1, 0, 1])
v.dot(w)

8

If the dot product of two vectors is equal to zero, they are said to be **orthogonal** to each other. Can you find vectors which are orthogonal to each other?
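Numpy offers several equivalent spellings of the dot product, and the orthogonality condition is easy to test in code. In the sketch below, `is_orthogonal` is a hypothetical helper written for this example, not a numpy function.

```python
import numpy as np

v = np.array([1, 3, 7])
w = np.array([1, 0, 1])

# Three equivalent ways to compute the dot product
print(v.dot(w), np.dot(v, w), v @ w)   # 8 8 8

def is_orthogonal(a, b):
    """Return True if the dot product of a and b is numerically zero."""
    return bool(np.isclose(np.dot(a, b), 0.0))

print(is_orthogonal(np.array([1, 0]), np.array([0, 1])))   # True
print(is_orthogonal(v, w))                                 # False
```

Using `np.isclose` instead of `== 0` guards against small rounding errors when the vectors contain floats.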

<a id="matrix-multiplication"></a>
### Matrix multiplication
**Matrix multiplication**, $A_{mn} * B_{ij}$, is valid when the left matrix has the same number of columns as the right matrix has rows ($n = i$). Each entry of the result is the dot product of a row of the left matrix with a column of the right matrix, so the result has $m$ rows and $j$ columns.

![](./assets/images/matrix-multiply-a.gif)
(Image: mathsisfun.com)

$$\left( \begin{array}{c}
1 & 2 & 3  \\
4 & 5 & 6
\end{array} \right)*
\left( \begin{array}{c}
7 & 8 \\
9 & 10 \\
11 & 12 
\end{array} \right) = 
\left( \begin{array}{c}
1*7 + 2*9 + 3*11 & ... \\
... & ... \\
\end{array} \right)
= 
\left( \begin{array}{c}
58 & 64 \\
139 & 154 \\
\end{array} \right)
$$

In [26]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
A.dot(B)

array([[ 58,  64],
       [139, 154]])
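To make the row-times-column rule concrete, here is a sketch of matrix multiplication written with plain python lists. The function `matmul_lists` is a hypothetical helper for illustration; in practice `A.dot(B)` or the `@` operator is the better choice.

```python
import numpy as np

def matmul_lists(A, B):
    """Multiply two matrices stored as lists of rows."""
    n_inner = len(B)
    if any(len(row) != n_inner for row in A):
        raise ValueError("columns of A must match rows of B")
    n_cols = len(B[0])
    # Entry (i, j) is the dot product of row i of A and column j of B
    return [
        [sum(A[i][k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]

print(matmul_lists(A, B))           # [[58, 64], [139, 154]]
print(np.array(A) @ np.array(B))    # the same values as a numpy array
```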

<a id="the-identity-matrix"></a>
### The identity matrix
The **identity matrix** $\mathbb{1}$ is the square matrix with ones on the diagonal and zeros elsewhere, for which $\mathbb{1} A = A$. This is like multiplying a number by 1, resulting in no change.

$$ \mathbb{1} \cdot \left( \begin{array}{c}
A_{11} & A_{12} \\
A_{21} & A_{22} 
\end{array} \right)
= \left( \begin{array}{c}
A_{11} & A_{12} \\
A_{21} & A_{22} 
\end{array} \right)$$

E.g.:

$$ \left( \begin{array}{c}
1 & 0 \\
0 & 1 
\end{array} \right)
\cdot \left( \begin{array}{c}
3 & 4 \\
5 & 6 
\end{array} \right)
= \left( \begin{array}{c}
(1 \cdot 3 + 0 \cdot 5) & (1 \cdot 4 + 0 \cdot 6) \\
(0 \cdot 3 + 1 \cdot 5) & (0 \cdot 4 + 1 \cdot 6) 
\end{array} \right)
=
\left( \begin{array}{c}
3 & 4 \\
5 & 6 
\end{array} \right)$$

In numpy, the identity matrix of the required size is obtained with `np.eye` (replace `n` with the value you desire):

In [27]:
n = 3
np.eye(n)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
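The defining property can be checked numerically. The sketch below reuses the $2 \times 2$ example from the equation above and multiplies by the identity from both sides.

```python
import numpy as np

A = np.array([[3, 4], [5, 6]])
identity = np.eye(2)

# Multiplying by the identity from either side returns A
print(identity.dot(A))
print(np.array_equal(identity.dot(A), A))   # True
print(np.array_equal(A.dot(identity), A))   # True

# np.eye returns floats by default; pass dtype=int for an integer identity
print(np.eye(2, dtype=int))
```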

### Independent Practice
Calculate the matrix product of the matrices below by hand, and then check your result using numpy.


$$ \left( \begin{array}{c}
1&2&3 \\
4&5&6
\end{array} \right) \cdot 
\left( \begin{array}{c}
1&2 \\
3&4 \\
5&6
\end{array} \right)$$

<a id="the-transposed-matrix"></a>

### The transposed matrix

Another useful matrix operation is transposition, which transforms an $m\times n$ matrix $A$ into an $n\times m$ matrix $A^T$, its transpose.

The transpose of the matrix $A$ is obtained by exchanging row and column indices of the original matrix. 
For example the transpose of 

$$
A =
\left(
\begin{array}{ccc} 
2 & 1 & 0\\
-2& 3 & 4\\
 1& 0 & 2
\end{array}
\right)
$$

is

$$
A^T =
\left(
\begin{array}{ccc} 
2 & -2 & 1\\
1 &  3 & 0\\
0 &  4 & 2
\end{array}
\right)
$$

In numpy, the transpose is calculated easily:

In [28]:
C = np.array([[2, 1, 0], [-2, 3, 4], [1, 0, 2]])
C.T

array([[ 2, -2,  1],
       [ 1,  3,  0],
       [ 0,  4,  2]])

Transposing twice gives the original matrix:

In [29]:
C.T.T

array([[ 2,  1,  0],
       [-2,  3,  4],
       [ 1,  0,  2]])
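The effect on the shape is easier to see with a non-square matrix. The sketch below also shows a common stumbling block: a 1-D array has only one axis, so transposing it changes nothing.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
print(A.T.shape)                       # (3, 2)

# A 1-D array has a single axis, so .T leaves it unchanged
u = np.array([1, 3, 7])
print(u.T.shape)                       # (3,)

# Give it two dimensions first to turn a row into a column
row = u.reshape(1, 3)
print(row.T.shape)                     # (3, 1)
```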

<a id="the-inverse-matrix"></a>
### The inverse matrix

The inverse of a matrix $A$, denoted $A^{-1}$, is the matrix whose matrix product with the original matrix gives the identity matrix,

$$A^{-1}\cdot A = {\mathbb 1}\ .$$

It is obtained with `np.linalg.inv()`.

**Note:** Only square matrices can have an inverse, and not all of them do.

If it exists, the inverse matrix can be calculated with the command below; otherwise numpy raises a `LinAlgError`.

In [30]:
C

array([[ 2,  1,  0],
       [-2,  3,  4],
       [ 1,  0,  2]])

In [31]:
np.linalg.inv(C)

array([[ 0.3 , -0.1 ,  0.2 ],
       [ 0.4 ,  0.2 , -0.4 ],
       [-0.15,  0.05,  0.4 ]])

Verify that this is the inverse:

In [32]:
np.round(np.linalg.inv(C).dot(C))

array([[ 1., -0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Whether two matrices are equal up to numerical precision can be checked with `np.allclose`:

In [33]:
np.allclose(np.linalg.inv(C).dot(C), np.eye(3))

True
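Two related points are worth trying out: what happens when no inverse exists, and how `np.isclose` differs from `np.allclose`. The following is a sketch; the matrix `S` is made up for this example so that its second row is a multiple of the first.

```python
import numpy as np

# The second row is twice the first, so this matrix has no inverse
S = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("not invertible:", err)

# np.isclose compares elementwise; np.allclose reduces that to one boolean
C = np.array([[2, 1, 0], [-2, 3, 4], [1, 0, 2]])
product = np.linalg.inv(C).dot(C)
print(np.isclose(product, np.eye(3)))
print(np.allclose(product, np.eye(3)))   # True
```

Comparing with `==` instead would be fragile here, because floating-point rounding can leave entries that differ from 0 or 1 by a tiny amount.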

<a id="applications-to-machine-learning"></a>
## Applications to machine learning
---

<a id="distance-between-actual-values-and-predicted-values"></a>

### Distance between actual values and predicted values
We often need to know the distance between predicted values and actual values.
For $n$ observations, we calculate this as the norm of the difference vector:

$$\| \vec{\rm actual} - \vec{\rm predicted} \| =\sqrt{({\rm actual}_1 - {\rm predicted}_1)^2 + ({\rm actual}_2 - {\rm predicted}_2)^2 + \cdots + ({\rm actual}_n - {\rm predicted}_n)^2}$$
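In numpy this is a subtraction followed by a norm. The values in the sketch below are made up purely for illustration.

```python
import numpy as np

# Made-up values, not taken from any real model
actual = np.array([3.0, 5.0, 2.0])
predicted = np.array([2.0, 5.0, 4.0])

# Euclidean distance between the two vectors
distance = np.linalg.norm(actual - predicted)
print(distance)   # sqrt(1 + 0 + 4) = sqrt(5), about 2.236
```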

<a id="least-squares"></a>

### Least squares
Many machine learning models are formulated as a minimization problem of the following form:

$$\min \| \vec{y} - f(X) \|$$

The goal is to minimize the distance between model predictions and actual data.
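As a minimal sketch, assume that $f$ is a linear model, $f(X) = X\beta$ (the lesson does not specify $f$, so this is an assumption). In that case `np.linalg.lstsq` finds the coefficients $\beta$ that minimize the norm. The data below are made up so that they lie exactly on the line $y = 1 + 2x$.

```python
import numpy as np

# Made-up data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (intercept) and a column of x values
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients that minimize ||y - X @ beta||
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(beta)                            # approximately [1. 2.]
print(np.linalg.norm(y - X @ beta))    # approximately 0
```

With noisy data the remaining distance would be larger than zero, and `beta` would describe the line that comes closest to the points in the least squares sense.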

<a id="independent-practice"></a>
## Independent practice
---

Review the numpy operations and try out their examples [here](http://docs.scipy.org/doc/numpy/reference/routines.linalg.html).

<a id="additional-resources"></a>
## Additional resources
---

+ For a surprisingly comprehensive (yet dense!) review, be sure to check out [Linear algebra in four pages](http://www-bcf.usc.edu/~lototsky/MATH408/LinAlg1.pdf)
+ This [deck](http://cseweb.ucsd.edu/classes/wi05/cse252a/linear_algebra_review.pdf) provides great insight into linear operations and advanced geometric topics
+ Stanford's Review and Reference [26-page](http://cs229.stanford.edu/section/cs229-linalg.pdf) guide provides a nice review
+ Spend some time on [Khan Academy](https://www.khanacademy.org/math/linear-algebra/matrix-transformations#concept-intro)!