In [1]:
import numpy as np

In [2]:
# Avoid misleading representation of floating values (for inverse matrices in dot product 
# for instance)
# See https://stackoverflow.com/questions/24537791/numpy-matrix-inversion-rounding-errors
np.set_printoptions(suppress=True)

In [3]:
%%html
<style>
.pquote {
  text-align: left;
  margin: 40px 0 40px auto;
  width: 70%;
  font-size: 1.5em;
  font-style: italic;
  display: block;
  line-height: 1.3em;
  color: #5a75a7;
  font-weight: 600;
  border-left: 5px solid rgba(90, 117, 167, .1);
  padding-left: 6px;
}
.notes {
  font-style: italic;
  display: block;
  margin: 40px 10%;
}
img + em {
  text-align: center;
  display: block;
  color: gray;
  font-size: 0.9em;
  font-weight: 600;
}
</style>

$$
\newcommand\bs[1]{\boldsymbol{#1}}
$$

# Introduction

We will see some very important concepts in this chapter. The dot product appears in most of the equations behind data science algorithms, so it's worth the effort to understand it. Then we will see some properties of this operation. Finally, we will get some intuition on the link between matrices and systems of linear equations.

# 2.2 Multiplying Matrices and Vectors

The standard way to multiply matrices is not to multiply each element of one with the corresponding element of the other (this is called the *element-wise product*), but to calculate the sum of the products between rows and columns. The matrix product, also called **dot product**, is calculated as follows:

<img src="images/dot-product.png" width="400" alt="An example of how to calculate the dot product between a matrix and a vector" title="The dot product between a matrix and a vector">
<em>The dot product between a matrix and a vector</em>

The number of columns of the first matrix must be equal to the number of rows of the second matrix. Thus, if the shape of the first matrix is ($m \times n$), then the second matrix must be of shape ($n \times x$). The resulting matrix will then have the shape ($m \times x$).
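
As a quick check of the shape rule, the following sketch multiplies two arrays with compatible shapes and shows that NumPy raises a `ValueError` when the inner dimensions differ. The array names and contents are arbitrary placeholders, not part of the original example.

```python
import numpy as np

# Hypothetical matrices, filled with ones only to illustrate shapes
first = np.ones((3, 2))    # shape (m x n) = (3 x 2)
second = np.ones((2, 4))   # shape (n x x) = (2 x 4)

product = first.dot(second)
print(product.shape)       # shape (m x x)

# Swapping the operands gives inner dimensions 4 and 3, which do not match
try:
    second.dot(first)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```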

## The dot product of a matrix and a vector.

$$\bs{A} \times \bs{b} = \bs{C}$$

with $
\bs{A}=
\begin{bmatrix}
    1 & 2\\\\
    3 & 4\\\\
    5 & 6
\end{bmatrix}
$ and $\bs{b}=\begin{bmatrix}
    2\\\\
    4
\end{bmatrix}$. We saw that the formula is the following:

$$
\begin{align*}
&\begin{bmatrix}
    A_{1,1} & A_{1,2} \\\\
    A_{2,1} & A_{2,2} \\\\
    A_{3,1} & A_{3,2}
\end{bmatrix}\times
\begin{bmatrix}
    B_{1,1} \\\\
    B_{2,1}
\end{bmatrix}=\\\\
&\begin{bmatrix}
    A_{1,1}B_{1,1} + A_{1,2}B_{2,1} \\\\
    A_{2,1}B_{1,1} + A_{2,2}B_{2,1} \\\\
    A_{3,1}B_{1,1} + A_{3,2}B_{2,1}
\end{bmatrix}
\end{align*}
$$

So we will have:

$$
\begin{align*}
&\begin{bmatrix}
    1 & 2 \\\\
    3 & 4 \\\\
    5 & 6
\end{bmatrix}\times
\begin{bmatrix}
    2 \\\\
    4
\end{bmatrix}=\\\\
&\begin{bmatrix}
    1 \times 2 + 2 \times 4 \\\\
    3 \times 2 + 4 \times 4 \\\\
    5 \times 2 + 6 \times 4
\end{bmatrix}=
\begin{bmatrix}
    10 \\\\
    22 \\\\
    34
\end{bmatrix}
\end{align*}
$$

To drive the point home, note the shapes of the matrices. We can see in this example that the shape of $\bs{A}$ is ($3 \times 2$) and the shape of $\bs{b}$ is ($2 \times 1$). So the shape of $\bs{C}$ must be ($3 \times 1$).

### The dot product with numpy
The numpy function `dot()` can be used to compute the matrix product (or dot product). Let's reproduce the example above:

In [4]:
A = np.array([[1, 2], [3, 4], [5, 6]])
A

array([[1, 2],
       [3, 4],
       [5, 6]])

In [5]:
B = np.array([[2], [4]])
B

array([[2],
       [4]])

In [6]:
C = np.dot(A, B)
C

array([[10],
       [22],
       [34]])

It is equivalent to use the method `dot()` of Numpy arrays:

In [7]:
C = A.dot(B)
C

array([[10],
       [22],
       [34]])
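
Since Python 3.5, the `@` operator offers a third way to write the same product for NumPy arrays. The sketch below reuses the values of $\bs{A}$ and $\bs{B}$ from the cells above and checks that the three spellings agree; the result variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2], [4]])

C_function = np.dot(A, B)   # function form
C_method = A.dot(B)         # method form
C_operator = A @ B          # operator form

print(np.array_equal(C_function, C_operator))
print(np.array_equal(C_method, C_operator))
```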

## The dot product of two matrices.

$$\bs{A} \times \bs{B} = \bs{C}$$

with:

$$\bs{A}=\begin{bmatrix}
    1 & 2 & 3 \\\\
    4 & 5 & 6 \\\\
    7 & 8 & 9 \\\\
    10 & 11 & 12
\end{bmatrix}
$$

and:

$$\bs{B}=\begin{bmatrix}
    2 & 7 \\\\
    1 & 2 \\\\
    3 & 6
\end{bmatrix}
$$

So we have:

$$
\begin{align*}
&\begin{bmatrix}
    1 & 2 & 3 \\\\
    4 & 5 & 6 \\\\
    7 & 8 & 9 \\\\
    10 & 11 & 12
\end{bmatrix}\times
\begin{bmatrix}
    2 & 7 \\\\
    1 & 2 \\\\
    3 & 6
\end{bmatrix}=\\\\
&\begin{bmatrix}
    2 \times 1 + 1 \times 2 + 3 \times 3 & 7 \times 1 + 2 \times 2 + 6 \times 3 \\\\
    2 \times 4 + 1 \times 5 + 3 \times 6 & 7 \times 4 + 2 \times 5 + 6 \times 6 \\\\
    2 \times 7 + 1 \times 8 + 3 \times 9 & 7 \times 7 + 2 \times 8 + 6 \times 9 \\\\
    2 \times 10 + 1 \times 11 + 3 \times 12 & 7 \times 10 + 2 \times 11 + 6 \times 12
\end{bmatrix}\\\\
&=
\begin{bmatrix}
    13 & 29 \\\\
    31 & 74 \\\\
    49 & 119 \\\\
    67 & 164
\end{bmatrix}
\end{align*}
$$

Let's check the result with Numpy:

In [8]:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [9]:
B = np.array([[2, 7], [1, 2], [3, 6]])
B

array([[2, 7],
       [1, 2],
       [3, 6]])

In [10]:
C = A.dot(B)
C

array([[ 13,  29],
       [ 31,  74],
       [ 49, 119],
       [ 67, 164]])

## Formalization of the dot product
Algebraically, each entry of the dot product is the sum of the products of the corresponding entries of a row of the first matrix and a column of the second:
$$
C_{i,j} = \sum_{k}A_{i,k}B_{k,j}
$$
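
To make the summation concrete, here is a minimal sketch that computes each $C_{i,j}$ with explicit loops and compares the result with NumPy's `dot()`. The helper `dot_with_loops` is a hypothetical name introduced for illustration; it is far slower than `dot()` and only meant to mirror the formula.

```python
import numpy as np

def dot_with_loops(A, B):
    """Compute the matrix product entry by entry from the summation formula."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # index summed over
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
B = np.array([[2, 7], [1, 2], [3, 6]])

C_loops = dot_with_loops(A, B)
print(C_loops)
print(np.array_equal(C_loops, A.dot(B)))
```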

At this point, your gut might be telling you that the dot product is a convoluted way to multiply two matrices. Your intuition for the best approach probably points toward simple element-wise multiplication (like we did with addition in the last lesson). 
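
For comparison, this short sketch computes both products on the same pair of matrices. The values are arbitrary and chosen only for illustration; in NumPy, `*` is the element-wise product while `dot()` is the matrix product.

```python
import numpy as np

# Arbitrary 2x2 matrices chosen for illustration
A = np.array([[2, 3], [6, 5]])
B = np.array([[5, 3], [2, 2]])

elementwise = A * B          # multiplies entries at matching positions
matrix_product = A.dot(B)    # sums products of rows and columns

print(elementwise)
print(matrix_product)
```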

To show you the value of the dot product, let's look at a real-life example of why we multiply matrices with the dot product method.

## Exercise:  Calculating Weekly Pie Sales with the Dot Product
Say you own a pie shop and sell the following three types of pies:
 - Apple pies cost 3 dollars each
 - Cherry pies cost 4 dollars each
 - Blueberry pies cost 2 dollars each

And this table tells you how many you've sold in the past 4 days:

<img src="images/pie_sales.png">

To calculate total sales for Monday, you'd do this:<br>
>Apple pie value + Cherry pie value + Blueberry pie value<br>
>or, mathematically...<br>
>$3 \times 13 + 4 \times 8 + 2 \times 6 = 83$

By matching each price to each quantity, multiplying, and then summing all of the results, we can get our total sales. That process is actually the dot product!
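
Monday's figure can be reproduced with NumPy as a quick sanity check. The sketch uses only the Monday quantities quoted in the calculation above; the variable names are illustrative, and the other days are left for the exercise.

```python
import numpy as np

prices = np.array([3, 4, 2])         # apple, cherry, blueberry (dollars each)
monday_sold = np.array([13, 8, 6])   # Monday quantities from the calculation above

# Multiply each price by its quantity, then sum the results
monday_total = np.dot(prices, monday_sold)
print(monday_total)
```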

To prove this, create in the cell below a vector $\bs{p}$ and a matrix $\bs{S}$ containing the prices and sales data, respectively. Then, use numpy to calculate the dot product and determine total pie sales for each day. Finally, sum those daily sales to get the total sales.

In [15]:
# Your code here!!!




# Properties of the dot product

We will now see some interesting properties that arise from matrix multiplication. Knowing - or at least accepting - these properties will be useful in coming chapters.

## Matrix multiplication is distributive

$$\bs{A}(\bs{B}+\bs{C}) = \bs{AB}+\bs{AC}$$

### Example

$$
\bs{A}=\begin{bmatrix}
    2 & 3 \\\\
    1 & 4 \\\\
    7 & 6
\end{bmatrix}, 
\bs{B}=\begin{bmatrix}
    5 \\\\
    2
\end{bmatrix}, 
\bs{C}=\begin{bmatrix}
    4 \\\\
    3
\end{bmatrix}
$$


$$
\begin{align*}
\bs{A}(\bs{B}+\bs{C})&=\begin{bmatrix}
    2 & 3 \\\\
    1 & 4 \\\\
    7 & 6
\end{bmatrix}\times
\left(\begin{bmatrix}
    5 \\\\
    2
\end{bmatrix}+
\begin{bmatrix}
    4 \\\\
    3
\end{bmatrix}\right)=
\begin{bmatrix}
    2 & 3 \\\\
    1 & 4 \\\\
    7 & 6
\end{bmatrix}\times
\begin{bmatrix}
    9 \\\\
    5
\end{bmatrix}\\\\
&=
\begin{bmatrix}
    2 \times 9 + 3 \times 5 \\\\
    1 \times 9 + 4 \times 5 \\\\
    7 \times 9 + 6 \times 5
\end{bmatrix}=
\begin{bmatrix}
    33 \\\\
    29 \\\\
    93
\end{bmatrix}
\end{align*}
$$

is equivalent to

$$
\begin{align*}
\bs{A}\bs{B}+\bs{A}\bs{C} &= \begin{bmatrix}
    2 & 3 \\\\
    1 & 4 \\\\
    7 & 6
\end{bmatrix}\times
\begin{bmatrix}
    5 \\\\
    2
\end{bmatrix}+
\begin{bmatrix}
    2 & 3 \\\\
    1 & 4 \\\\
    7 & 6
\end{bmatrix}\times
\begin{bmatrix}
    4 \\\\
    3
\end{bmatrix}\\\\
&=
\begin{bmatrix}
    2 \times 5 + 3 \times 2 \\\\
    1 \times 5 + 4 \times 2 \\\\
    7 \times 5 + 6 \times 2
\end{bmatrix}+
\begin{bmatrix}
    2 \times 4 + 3 \times 3 \\\\
    1 \times 4 + 4 \times 3 \\\\
    7 \times 4 + 6 \times 3
\end{bmatrix}\\\\
&=
\begin{bmatrix}
    16 \\\\
    13 \\\\
    47
\end{bmatrix}+
\begin{bmatrix}
    17 \\\\
    16 \\\\
    46
\end{bmatrix}=
\begin{bmatrix}
    33 \\\\
    29 \\\\
    93
\end{bmatrix}
\end{align*}
$$

In [13]:
A = np.array([[2, 3], [1, 4], [7, 6]])
A

array([[2, 3],
       [1, 4],
       [7, 6]])

In [14]:
B = np.array([[5], [2]])
B

array([[5],
       [2]])

In [15]:
C = np.array([[4], [3]])
C

array([[4],
       [3]])

$\bs{A}(\bs{B}+\bs{C})$:

In [16]:
D = A.dot(B+C)
D

array([[33],
       [29],
       [93]])

is equivalent to $\bs{AB}+\bs{AC}$:

In [17]:
D = A.dot(B) + A.dot(C)
D

array([[33],
       [29],
       [93]])
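
The worked example covers one set of values. As a broader sanity check (not a proof), this sketch draws random integer matrices with a fixed seed and compares both sides of the identity; the shapes, value range, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Random integer matrices with compatible shapes
A = rng.integers(-9, 10, size=(3, 2))
B = rng.integers(-9, 10, size=(2, 4))
C = rng.integers(-9, 10, size=(2, 4))

left = A.dot(B + C)
right = A.dot(B) + A.dot(C)
print(np.array_equal(left, right))
```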

## Matrix multiplication is associative

$$\bs{A}(\bs{BC}) = (\bs{AB})\bs{C}$$


In [18]:
A = np.array([[2, 3], [1, 4], [7, 6]])
A

array([[2, 3],
       [1, 4],
       [7, 6]])

In [19]:
B = np.array([[5, 3], [2, 2]])
B

array([[5, 3],
       [2, 2]])

$\bs{A}(\bs{BC})$:


In [20]:
D = A.dot(B.dot(C))
D

array([[100],
       [ 85],
       [287]])

is equivalent to $(\bs{AB})\bs{C}$:

In [21]:
D = (A.dot(B)).dot(C)
D

array([[100],
       [ 85],
       [287]])
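
Note that the cells above reuse the $\bs{C}$ defined in the distributivity example. For reference, here is a self-contained version that redefines all three arrays and compares the two groupings directly.

```python
import numpy as np

A = np.array([[2, 3], [1, 4], [7, 6]])
B = np.array([[5, 3], [2, 2]])
C = np.array([[4], [3]])   # same values as the C defined earlier

left = A.dot(B.dot(C))     # A(BC)
right = A.dot(B).dot(C)    # (AB)C
print(np.array_equal(left, right))
```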

## But matrix multiplication is not necessarily* commutative

$$\bs{AB} \neq \bs{BA}$$

*<sub>An exception is when one matrix is the Identity Matrix. We'll learn about this in a later lesson.</sub>

In [22]:
A = np.array([[2, 3], [6, 5]])
A

array([[2, 3],
       [6, 5]])

In [23]:
B = np.array([[5, 3], [2, 2]])
B

array([[5, 3],
       [2, 2]])

$\bs{AB}$:

In [24]:
AB = np.dot(A, B)
AB

array([[16, 12],
       [40, 28]])

is different from $\bs{BA}$:

In [25]:
BA = np.dot(B, A)
BA

array([[28, 30],
       [16, 16]])
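
The exception mentioned in the footnote can be checked with the same $\bs{A}$. This sketch builds a $2 \times 2$ identity matrix with `np.eye` (the identity matrix itself is covered in a later lesson) and confirms that the order of the factors does not matter in that case.

```python
import numpy as np

A = np.array([[2, 3], [6, 5]])
I = np.eye(2, dtype=int)   # 2x2 identity matrix

AI = A.dot(I)
IA = I.dot(A)
print(np.array_equal(AI, IA))
```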

## However vector multiplication is commutative

$$\bs{x^{ \text{T}}y} = \bs{y^{\text{T}}x} $$

In [32]:
x = np.array([[2], [6]])
x

array([[2],
       [6]])

In [27]:
y = np.array([[5], [2]])
y

array([[5],
       [2]])

$\bs{x^\text{T}y}$:

In [28]:
x_ty = x.T.dot(y)
x_ty

array([[22]])

is equivalent to $\bs{y^\text{T}x}$:

In [29]:
y_tx = y.T.dot(x)
y_tx

array([[22]])
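
The same check can be written with one-dimensional arrays, in which case no transpose is needed and `np.dot` returns a plain scalar instead of a $1 \times 1$ matrix. The names `x_flat` and `y_flat` are illustrative.

```python
import numpy as np

x_flat = np.array([2, 6])
y_flat = np.array([5, 2])

xy = np.dot(x_flat, y_flat)
yx = np.dot(y_flat, x_flat)
print(xy, yx)
```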

## Exercise
Use numpy to show that

$$(\bs{AB})^{\text{T}} = \bs{B}^\text{T}\bs{A}^\text{T}$$

$\bs{A}$ and $\bs{B}$ are defined for you below.

In [34]:
A = np.array([[2, 3], [1, 4], [7, 6]])
A

array([[2, 3],
       [1, 4],
       [7, 6]])

In [35]:
B = np.array([[5, 3], [2, 2]])
B

array([[5, 3],
       [2, 2]])

$(\bs{AB})^{\text{T}}$:

In [36]:
# Complete the code below
AB_t = 
AB_t

SyntaxError: invalid syntax (<ipython-input-36-1032b3c0e167>, line 2)

$\bs{B}^\text{T}\bs{A}^\text{T}$:

In [37]:
# Complete the code below
B_tA = 
B_tA

SyntaxError: invalid syntax (<ipython-input-37-9c7737eb3719>, line 2)

# Representing a System of Linear Equations
Matrices and the dot product can be used to represent a system of linear equations; and linear algebra can be applied to these matrices to solve the system of linear equations. This fact is part of the reason why linear algebra has so many applications to machine learning.

In case your high school algebra is rusty - mine sure is! - a [linear equation](https://en.wikipedia.org/wiki/Linear_equation#Matrix_form) is an algebraic equation in which each term is either a constant or the product of a constant and (the first power of) a single variable (however, different variables may occur in different terms).

A [system of linear equations](https://en.wikipedia.org/wiki/System_of_linear_equations) (also called a linear system) is a collection of two or more linear equations involving the same set of variables. For example,

$$
\begin{cases}
y = -\frac{1}{2}x + 6 \\\\
y = -\frac{3}{4}x + \frac{13}{2}
\end{cases}
$$

A solution to a linear system is an assignment of values to the variables such that all of the equations are simultaneously satisfied. Above, the solution is $x = 2$ and $y = 5$.
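
A quick way to convince yourself is to plug $x = 2$ into both equations and check that each one returns $y = 5$. The variable names below are illustrative.

```python
x = 2

# Evaluate the right-hand side of each equation at x = 2
y_first = -1 / 2 * x + 6
y_second = -3 / 4 * x + 13 / 2

print(y_first, y_second)
```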

Matrices can be used to describe a system of linear equations. But first, we need to [rewrite](https://www.khanacademy.org/math/algebra/two-var-linear-equations/standard-form/v/converting-from-slope-intercept-to-standard-form) those equations in [general (or standard) form](https://en.wikipedia.org/wiki/Linear_equation#Forms_for_two-dimensional_linear_equations). Doing that, we get:

$$
\begin{cases}
x + 2y = 12 \\\\
3x + 4y = 26
\end{cases}
$$
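
For reference, the conversion only requires clearing the fractions and moving the $x$ term to the left-hand side:

$$
\begin{align*}
y = -\frac{1}{2}x + 6 \;&\Rightarrow\; 2y = -x + 12 \;\Rightarrow\; x + 2y = 12 \\\\
y = -\frac{3}{4}x + \frac{13}{2} \;&\Rightarrow\; 4y = -3x + 26 \;\Rightarrow\; 3x + 4y = 26
\end{align*}
$$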

Once an equation is in standard form, i.e.:

$$Ax + By = C$$

it can be slightly rewritten using notation more amenable to matrices:

$$A_{1,1}x_{1} + A_{1,2}x_{2} = b_{1}$$

Note how the $B$ was replaced by $A_{1,2}$, the $y$ was replaced by $x_2$, and the $C$ was replaced by $b_1$. We did this in order to keep with matrix notation:  the $A$'s remained uppercase to denote that they'll belong to a matrix while the $x$'s and $b$ were left lowercase to denote that they'll belong to a vector. You can see that here, where we go from standard form to matrix form:
$$\begin{pmatrix} A_{1,1}&A_{1,2} \end{pmatrix}\begin{pmatrix}x_{1}\\x_{2}\end{pmatrix} = \begin{pmatrix}b_{1}\end{pmatrix}.$$

But since we have a system of two linear equations, standard form would actually look like this:

$$A_{1,1}x_{1} + A_{1,2}x_{2} = b_1$$
$$A_{2,1}x_{1} + A_{2,2}x_{2} = b_2$$

and so the matrix representation becomes:

$$\begin{pmatrix}
A_{1,1} & A_{1,2}\\
A_{2,1} & A_{2,2}
\end{pmatrix}
\begin{pmatrix}
x_1\\x_2
\end{pmatrix} = 
\begin{pmatrix}
b_1\\
b_2
\end{pmatrix}.$$

Plugging in the numbers from our linear system, we get:

$$\begin{pmatrix}
1 & 2\\
3 & 4
\end{pmatrix}
\begin{pmatrix}
x_1\\x_2
\end{pmatrix} = 
\begin{pmatrix}
12\\
26
\end{pmatrix}.$$

And then here's the magic! It's the **dot product** of the two leftmost terms, which we'll call $\bs{A}$ and $\bs{x}$, that gives us the original set of equations in standard form:

<img src="images/system-linear-equations-matrix-form.png" width="400" alt="Matrix form of a system of linear equation" title="Matrix form of a system of linear equation">
<em>Matrix form of a system of linear equations</em>
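
The matrix form can be checked numerically. In this sketch, $\bs{x}$ is filled with the known solution ($x_1 = 2$, $x_2 = 5$), and the dot product $\bs{Ax}$ should reproduce the right-hand side of the system.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # weights of the two equations
x = np.array([[2], [5]])         # known solution: x_1 = 2, x_2 = 5
b = np.array([[12], [26]])       # right-hand side of the system

Ax = A.dot(x)
print(Ax)
print(np.array_equal(Ax, b))
```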

What's great about this notation is that it can be extended indefinitely for any number of equations with any number of variables. 

First, you have matrix $\bs{A}$, which is the leftmost matrix in the image above. What is matrix $\bs{A}$? It contains the *weights* for each variable (that's why there are $n$ columns) and each equation (that's why there are $m$ rows):

$$
\bs{A}=
\begin{bmatrix}
    A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\\\
    A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\\\
    \cdots & \cdots & \cdots & \cdots \\\\
    A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}
$$

And then there's vector $\bs{x}$. It simply contains all of the $n$ unknown variables.

$$
\bs{x}=
\begin{bmatrix}
    x_1 \\\\
    x_2 \\\\
    \cdots \\\\
    x_n
\end{bmatrix}
$$

The whole system of equations can therefore be written like this, where the vector $\bs{b}$ holds the $m$ right-hand-side constants:

$$
\begin{bmatrix}
    A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\\\
    A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\\\
    \cdots & \cdots & \cdots & \cdots \\\\
    A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}
\times
\begin{bmatrix}
    x_1 \\\\
    x_2 \\\\
    \cdots \\\\
    x_n
\end{bmatrix}
=
\begin{bmatrix}
    b_1 \\\\
    b_2 \\\\
    \cdots \\\\
    b_m
\end{bmatrix}
$$

Or more simply:

$$\bs{Ax}=\bs{b}$$

Just remember that $\bs{A}$ is a matrix and $\bs{b}$ and $\bs{x}$ are vectors! And more importantly, note that $\bs{Ax}$ signifies the dot product operation.

> If you're curious about solving systems of linear equations, there are several named methods, like [Gauss-Jordan](https://en.wikipedia.org/wiki/Gaussian_elimination) elimination, which can be expressed algorithmically as matrix elementary row operations. You could also check out this [module in the sympy package](http://docs.sympy.org/dev/modules/solvers/solveset.html#sympy.solvers.solveset.linsolve).
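
NumPy also ships a general-purpose solver, `np.linalg.solve`. Here is a minimal sketch for the $2 \times 2$ system used in this lesson; it is not the inverse-matrix approach that the next chapter covers, just a quick way to confirm the solution.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([12, 26])

# Solve Ax = b for x
solution = np.linalg.solve(A, b)
print(solution)
print(np.allclose(A.dot(solution), b))
```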

# What's next?
Next we will see two important matrices: the identity matrix and the inverse matrix. We will see why they are important in linear algebra and how to use them with numpy. Finally, we will see an example on how to solve a system of linear equations with the inverse matrix.
