<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Linear Algebra

---

## Introduction

### Demo

Let's look at the Ames housing data again.

In [2]:
import numpy as np
import pandas as pd

In [3]:
# Adjust this path to wherever ames_train.csv is saved on your machine
ames_df = pd.read_csv('C:/Users/Ashley/Desktop/General_Assembly/Lessons/descriptive_statistics_in_python-master/assets/data/ames_train.csv')

In [5]:
# Pick out two features to focus on
# (.copy() gives X its own data, so adding a column to it later leaves ames_df untouched)
X = ames_df.loc[:, ['OverallQual', 'GarageCars']].copy()
print(X.head())
X.shape

   OverallQual  GarageCars
0            7           2
1            6           2
2            7           2
3            7           3
4            8           3


(1460, 2)

`X` is a 2D array of numbers with 1460 rows and 2 columns. We call a two-dimensional array of numbers a **matrix**.

In [6]:
y = ames_df.loc[:, 'SalePrice']
print(y.head())
print(y.shape)

0    208500
1    181500
2    223500
3    140000
4    250000
Name: SalePrice, dtype: int64
(1460,)


`y` is a one-dimensional sequence of numbers with 1460 elements. We call a one-dimensional sequence of numbers a **vector**.

We could model the average price of a house as some baseline number, plus some multiple of "OverallQual", plus some multiple of "GarageCars". To do that, let's add a column of 1s to `X` (so that the baseline can be treated as the coefficient of just another column), multiply each column by a constant, and add up the results within each row.

In [7]:
X.loc[:, 'Constant'] = 1

In [8]:
X.head()

|   | OverallQual | GarageCars | Constant |
|---|---|---|---|
| 0 | 7 | 2 | 1 |
| 1 | 6 | 2 | 1 |
| 2 | 7 | 2 | 1 |
| 3 | 7 | 3 | 1 |
| 4 | 8 | 3 | 1 |


Let's take a wild guess that the average sale price in dollars is $10{,}000 \cdot \text{OverallQual} + 20{,}000 \cdot \text{GarageCars} + 100{,}000$:

In [9]:
wild_guess = (10_000 * X.loc[:, 'OverallQual']
              + 20_000 * X.loc[:, 'GarageCars']
              + 100_000 * X.loc[:, 'Constant']
             )
wild_guess.head()

0    210000
1    200000
2    210000
3    230000
4    240000
dtype: int64

How good was our guess? Let's calculate a standard measure of regression model error, the Root Mean Squared Error (RMSE).

In [10]:
np.sqrt(((wild_guess - y)**2).mean())

62851.14548349673
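Written out, the expression in the cell above is the following, where $\hat{y}_i$ is the prediction for house $i$ (an entry of `wild_guess`), $y_i$ is that house's actual sale price, and $n = 1460$ is the number of houses:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

Squaring makes every error count as positive, and taking the square root puts the result back in dollars.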

We could play around with the numbers we multiplied each column by (called *coefficients*) to try to reduce this measure of error, but luckily we don't have to: **it is possible to use linear algebra to derive a formula for the set of coefficients that gives us the smallest possible RMSE.**

If you are curious, the formula is $(X^TX)^{-1}X^Ty$. My favorite presentation of a derivation of this result is at the end of [this video](https://www.youtube.com/watch?v=5u4G23_OohI).
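A minimal sketch of that formula in NumPy, assuming a small made-up dataset rather than the Ames file: the toy prices (`y_toy`) are generated exactly from known coefficients (`true_coefs`), so the formula should recover them. `X_toy`, `true_coefs`, and `y_toy` are hypothetical names for illustration. Instead of forming the inverse explicitly, the sketch solves the equivalent system $(X^TX)b = X^Ty$ with `np.linalg.solve`, which is generally the more numerically stable choice; `np.linalg.lstsq` is included as a cross-check.

```python
import numpy as np

# Hypothetical toy data (not the Ames file): columns are OverallQual, GarageCars, Constant
X_toy = np.array([[7, 2, 1],
                  [6, 2, 1],
                  [7, 3, 1],
                  [8, 3, 1],
                  [5, 1, 1]], dtype=float)

# Made-up "true" coefficients; the toy prices are generated from them with no noise
true_coefs = np.array([10_000.0, 20_000.0, 100_000.0])
y_toy = X_toy @ true_coefs

# (X^T X)^{-1} X^T y, computed by solving (X^T X) b = X^T y rather than inverting X^T X
coefs = np.linalg.solve(X_toy.T @ X_toy, X_toy.T @ y_toy)

# Cross-check against NumPy's general least-squares solver
coefs_lstsq, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)

print(coefs)
print(coefs_lstsq)
```

Because the toy prices contain no noise, both results should match `true_coefs` up to floating-point rounding. With real data such as the Ames prices, the same formula instead returns whichever coefficients give the smallest RMSE.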

Incidentally, modeling a target as a weighted sum of features like this, with the weights chosen to minimize error, is called **linear regression**, the first type of model that we will discuss in this course.

### Linear Algebra and Data Science

Not only linear regression but most of what we do in data science consists of vector and matrix operations, which are the topic of linear algebra. As a result, **linear algebra is the most important area of math for data science.**

You don't need to be able to carry out linear algebra calculations by hand, but a data scientist does need to have a strong understanding of how they work.

We can't teach you everything that a data scientist needs to know about linear algebra in this course, but we can give you a refresher if you are familiar with it and introduce you to some of the key concepts if you are not. The lesson README provides links to additional resources.

## NumPy

NumPy is the foundational Python library for working with vectors and matrices.

NumPy arrays look similar to Python lists, but they represent vectors and matrices and are optimized for math rather than for serving as general-purpose containers.

In [11]:
# Python lists aren't designed for doing math
[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

In [13]:
# NumPy applies basic arithmetic operations elementwise
np.array([1,2,3]) + np.array([4,5,6])

array([5, 7, 9])

Levels of nesting in a NumPy array correspond to different dimensions.

In [16]:
# Python lists can be nested
[[1,2],[3,4], [5,6]]

[[1, 2], [3, 4], [5, 6]]

In [17]:
# NumPy interprets nesting as indicating dimensions.
# This array represents a 3x2 matrix (3 rows, 2 columns).
np.array([[1,2],[3,4], [5,6]])

array([[1, 2],
       [3, 4],
       [5, 6]])

In [18]:
# A NumPy array has a `shape` attribute
np.array([[1,2],[3,4], [5,6]]).shape

(3, 2)

`pandas` DataFrames are built on top of NumPy arrays.

In [26]:
# Create a DataFrame
b = pd.DataFrame({'x': [1, 3, 5], 'y': [2, 4, 6]})
b

|   | x | y |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 3 | 4 |
| 2 | 5 | 6 |


In [27]:
# Pull out the underlying numpy array
b.values

array([[1, 2],
       [3, 4],
       [5, 6]], dtype=int64)
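The pandas documentation recommends `DataFrame.to_numpy()` over the `.values` attribute for pulling out the underlying array. For a DataFrame like `b`, whose columns all share one dtype, the two give the same result. A self-contained sketch:

```python
import numpy as np
import pandas as pd

b = pd.DataFrame({'x': [1, 3, 5], 'y': [2, 4, 6]})

# to_numpy() returns the DataFrame's data as a 2D NumPy array
arr = b.to_numpy()
print(arr.shape)  # (3, 2): 3 rows, 2 columns
print(np.array_equal(arr, b.values))
```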

## Scalars, Vectors, and Matrices

- A **scalar** is a single number, e.g. $a=5.328$ or $b=7$.
- A **vector** is an ordered sequence of numbers, e.g. $\vec{u} = \left[ \begin{array}{c} 1&3&7 \end{array} \right]$
- An $m\times n$ **matrix** is a rectangular array of numbers with $m$ rows and $n$ columns. The entry in the $i$th row and $j$th column of a matrix $\mathbf{A}$ is denoted $a_{ij}$.

$$\mathbf{A}= \left[ \begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}  \\
a_{21} & a_{22} & \cdots & a_{2n}  \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array} \right]$$

**Conventions**:

- Use a lowercase letter to denote a scalar.
- Use a lowercase letter to denote a vector (such as `y`), possibly with some kind of annotation such as an arrow overhead (but not in code).
- Use an uppercase letter to denote a matrix (such as `X`).
- **Math**: start counting at 1.
- **Python**: start counting at 0.

**Exercise (1 min., post to Slack right away)**

- What is $M_{32}$ in the example below, using the standard **mathematician**'s convention of counting from 1?

In [28]:
M = np.array([[1, 3, 7], [2, 6, 3], [9, 8, 0], [4, 5, 6]])
M

array([[1, 3, 7],
       [2, 6, 3],
       [9, 8, 0],
       [4, 5, 6]])

**Answer:** 8

- What is `M[3, 2]`, using the **Python** convention of counting from 0?

**Answer:** 6

$\blacksquare$

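To translate between the two conventions, subtract 1 from each math index: $M_{32}$ (third row, second column) is `M[2, 1]` in Python, while `M[3, 2]` is $M_{43}$ in math notation. A quick check using the same `M`:

```python
import numpy as np

M = np.array([[1, 3, 7], [2, 6, 3], [9, 8, 0], [4, 5, 6]])

# Math convention M_{ij} (counting from 1) corresponds to M[i - 1, j - 1] in Python
i, j = 3, 2
print(M[i - 1, j - 1])  # 8

# Python convention: M[3, 2] is the 4th row, 3rd column (M_{43} in math notation)
print(M[3, 2])  # 6
```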

## Basic Matrix Algebra

### Addition and Subtraction
Vector **addition** is done elementwise:

$\vec{v} + \vec{w} =
\left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] + \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right] = 
\left[ \begin{array}{c}
1+1 \\
3+0 \\
7+1
\end{array} \right] = 
\left[ \begin{array}{c}
2 \\
3 \\
8
\end{array} \right]
$

So is vector **subtraction**:

$\vec{v} - \vec{w} =
\left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] - \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right] = 
\left[ \begin{array}{c}
1-1 \\
3-0 \\
7-1
\end{array} \right] = 
\left[ \begin{array}{c}
0 \\
3 \\
6
\end{array} \right]
$

In [40]:
# Create numpy arrays corresponding to v and w above
v = np.array([1,3,7])
w = np.array([1,0,1])

In [33]:
# Add and subtract
print(v+w)
print(v-w)

[2 3 8]
[0 3 6]


**Matrix** addition and subtraction are also done elementwise.

Addition:

$A + B = \left[ \begin{array}{c}
1 & 2 & 6  \\
4 & 8 & 3  \\
9 & 2 & 1 \\
\end{array} \right]
+
\left[ \begin{array}{c}
9 & 7 & 3  \\
6 & 8 & 2  \\
4 & 2 & 4 \\
\end{array} \right]
=
\left[ \begin{array}{c}
1+9 & 2+7 & 6+3  \\
4+6 & 8+8 & 3+2  \\
9+4 & 2+2 & 1+4 \\
\end{array} \right]
=
\left[ \begin{array}{c}
10 & 9 & 9  \\
10 & 16 & 5  \\
13 & 4 & 5 \\
\end{array} \right]
$

Subtraction:

$A - B = \left[ \begin{array}{c}
1 & 2 & 6  \\
4 & 8 & 3  \\
9 & 2 & 1 \\
\end{array} \right]
-
\left[ \begin{array}{c}
9 & 7 & 3  \\
6 & 8 & 2  \\
4 & 2 & 4 \\
\end{array} \right]
=
\left[ \begin{array}{c}
1-9 & 2-7 & 6-3  \\
4-6 & 8-8 & 3-2  \\
9-4 & 2-2 & 1-4 \\
\end{array} \right]
=
\left[ \begin{array}{c}
-8 & -5 & 3  \\
-2 & 0 & 1  \\
5 & 0 & -3 \\
\end{array} \right]
$

In [34]:
# Create 2D NumPy arrays corresponding to A and B above
A = np.array([[1, 2, 6], [4, 8, 3], [9, 2, 1]])
B = np.array([[9, 7, 3], [6, 8, 2], [4, 2, 4]])

In [35]:
# Add and subtract
print(A+B)
print(A-B)

[[10  9  9]
 [10 16  5]
 [13  4  5]]
[[-8 -5  3]
 [-2  0  1]
 [ 5  0 -3]]


### Multiplication

Multiplication involving vectors and matrices can take a few different forms.

#### Elementwise Multiplication

$\vec{v} * \vec{w} =
\left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] * \left[ \begin{array}{c}
1 \\
0 \\
1
\end{array} \right] = 
\left[ \begin{array}{c}
1*1 \\
3*0 \\
7*1
\end{array} \right] = 
\left[ \begin{array}{c}
1 \\
0 \\
7
\end{array} \right]
$


In [36]:
# NumPy uses `*` for elementwise multiplication
print(v*w)

[1 0 7]


#### Scalar Multiplication: A Vector or Matrix times a *Number*
Multiplying a vector or matrix by a scalar (a single number) means multiplying each of its elements by that scalar:

$2 \cdot \vec{v} = 2 \cdot \left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right] = 
 \left[ \begin{array}{c}
2 \cdot 1 \\
2 \cdot 3 \\
2 \cdot 7
\end{array} \right] = 
 \left[ \begin{array}{c}
2 \\
6 \\
14
\end{array} \right]$

$3 \cdot A =
3 \cdot \left[ \begin{array}{c}
1 & 2 & 6  \\
4 & 8 & 3  \\
9 & 2 & 1 \\
\end{array} \right]
=
\left[ \begin{array}{c}
3\cdot1 & 3\cdot2 & 3\cdot6  \\
3\cdot4 & 3\cdot8 & 3\cdot3  \\
3\cdot9 & 3\cdot2 & 3\cdot1 \\
\end{array} \right]
=
\left[ \begin{array}{c}
3 & 6 & 18  \\
12 & 24 & 9  \\
27 & 6 & 3 \\
\end{array} \right]
$

In [37]:
# Multiply v by a scalar
print(2*v)

[ 2  6 14]


In [38]:
# Multiply A by a scalar
print(A*3)

[[ 3  6 18]
 [12 24  9]
 [27  6  3]]


#### Dot Product: Multiplying *Two Vectors* and Reducing the Result to a Single Number
Suppose three Girl Scout troops have 1, 3, and 7 members, respectively. Each member of the first troop sells 5 boxes of Girl Scout cookies; each member of the second troop sells 4; and each member of the third troop sells 2. How many boxes did they sell in all?

$1 \cdot 5 + 3 \cdot 4 + 7 \cdot 2 = 5 + 12 + 14 = 31$

This calculation is an example of a **dot product**, which you calculate by multiplying corresponding elements of two vectors and adding up the results:

$\vec{v} = \left[ \begin{array}{c}
1 \\
3 \\
7
\end{array} \right], \vec{w} = \left[ \begin{array}{c}
5 \\
4 \\
2
\end{array} \right]$

$ \vec{v} \cdot \vec{w} = \left[ \begin{array}{c}
1 & 3 & 7
\end{array} \right] \cdot \left[ \begin{array}{c}
5 \\
4 \\
2
\end{array} \right] = 1 \cdot 5 + 3 \cdot 4 + 7 \cdot 2 = 31 $

If you think of vectors as arrows in space, then the dot product of two vectors reflects both *how big they are* and *the extent to which they point in the same direction.*

In [41]:
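# Vectors from the Girl Scout example (this redefines w from earlier)
v = np.array([1, 3, 7])
w = np.array([5, 4, 2])
# Calculate their dot product
print(v @ w)

31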

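One way to make the arrow picture precise is the identity $\vec{v} \cdot \vec{w} = \|\vec{v}\|\,\|\vec{w}\|\cos\theta$, where $\|\vec{v}\|$ is the length of $\vec{v}$ and $\theta$ is the angle between the two vectors. The sketch below checks it for a few made-up 2D vectors whose angles are easy to see (45°, 90°, and 180°); `np.linalg.norm` computes a vector's length, and `dot_via_angle` is a hypothetical helper for illustration.

```python
import numpy as np

def dot_via_angle(a, b, theta):
    """Compute |a| * |b| * cos(theta), the geometric form of the dot product."""
    return np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

east = np.array([1.0, 0.0])
northeast = np.array([1.0, 1.0])  # 45 degrees from east
north = np.array([0.0, 1.0])      # 90 degrees from east
west = np.array([-1.0, 0.0])      # 180 degrees from east

for other, theta in [(northeast, np.pi / 4), (north, np.pi / 2), (west, np.pi)]:
    # The two numbers on each line should agree up to floating-point rounding
    print(east @ other, dot_via_angle(east, other, theta))
```

Vectors pointing in the same general direction have a positive dot product, perpendicular vectors have a dot product of 0, and vectors pointing in opposite directions have a negative one.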

#### Matrix-Vector Multiplication

Suppose you wanted to take the dot product of each vector in a set with one particular vector. You could write each vector in the set as a row and stack the rows on top of each other to form a matrix, then collect the resulting dot products into a vector in the same order. That is matrix-vector multiplication.

Let's see [what this process looks like](http://matrixmultiplication.xyz/).

**This is exactly what we did in the demo at the start of this lesson.**

Consider the first row of `X`:

In [42]:
X.head(1)

|   | OverallQual | GarageCars | Constant |
|---|---|---|---|
| 0 | 7 | 2 | 1 |


We multiplied the three elements of this row by $10{,}000$, $20{,}000$, and $100{,}000$, respectively, and added up the results. That's just this dot product:

$\left[ \begin{array}{ccc}
7 &
2 &
1
\end{array} \right] \cdot \left[ \begin{array}{c}
10{,}000 \\
20{,}000 \\
100{,}000
\end{array} \right]
= 7 \cdot 10{,}000 + 2 \cdot 20{,}000 + 1 \cdot 100{,}000 = 70{,}000 + 40{,}000 + 100{,}000 = 210{,}000
$

To calculate more predictions for more houses, we just stack up the vectors representing the features of those houses and stack up the resulting dot products:

$\left[ \begin{array}{ccc}
7 & 2 & 1 \\
6 & 2 & 1 \\
\vdots & \vdots & \vdots
\end{array} \right] \cdot \left[ \begin{array}{c}
10{,}000 \\
20{,}000 \\
100{,}000
\end{array} \right]
=
\left[ \begin{array}{c}
7 \cdot 10{,}000 + 2 \cdot 20{,}000 + 1 \cdot 100{,}000 \\
6 \cdot 10{,}000 + 2 \cdot 20{,}000 + 1 \cdot 100{,}000 \\
\vdots
\end{array} \right]
= 
\left[ \begin{array}{c}
210{,}000 \\
200{,}000 \\
\vdots
\end{array} \right]
$

In [44]:
# Multiply X by our wild-guess coefficients (matrix-vector multiplication)
X @ np.array([10_000, 20_000, 100_000])

0       210000
1       200000
2       210000
3       230000
4       240000
5       190000
6       220000
7       210000
8       210000
9       170000
10      170000
11      250000
12      170000
13      230000
14      180000
15      210000
16      200000
17      180000
18      190000
19      170000
20      240000
21      190000
22      220000
23      190000
24      170000
25      240000
26      190000
27      240000
28      170000
29      160000
         ...  
1430    190000
1431    200000
1432    160000
1433    200000
1434    190000
1435    200000
1436    180000
1437    240000
1438    200000
1439    210000
1440    200000
1441    200000
1442    260000
1443    180000
1444    210000
1445    180000
1446    170000
1447    220000
1448    160000
1449    150000
1450    150000
1451    240000
1452    190000
1453    150000
1454    210000
1455    200000
1456    200000
1457    190000
1458    170000
1459    170000
Length: 1460, dtype: int64

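To see that `@` between a matrix and a vector really is a stack of row-by-vector dot products, here is a sketch that computes the same thing with an explicit loop. It uses the first four rows of `X` as printed earlier, hard-coded so the snippet runs without the CSV file, along with the wild-guess coefficients.

```python
import numpy as np

# First four rows of X (OverallQual, GarageCars, Constant), copied from the output above
X_rows = np.array([[7, 2, 1],
                   [6, 2, 1],
                   [7, 2, 1],
                   [7, 3, 1]])
coefs = np.array([10_000, 20_000, 100_000])

# One dot product per row, collected in the same order as the rows
looped = np.array([row @ coefs for row in X_rows])

# Matrix-vector multiplication does all of those dot products at once
vectorized = X_rows @ coefs

print(looped)      # [210000 200000 210000 230000]
print(vectorized)  # same values, matching wild_guess.head() above
```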
#### Matrix-Matrix Multiplication

Suppose we wanted to take the dot product of every vector in one set with every vector in another set. We could write the vectors in the first set as rows and stack them on top of each other, as before; write the vectors in the second set as columns and place them side by side; and store the dot product of the $i$th row of the first matrix and the $j$th column of the second matrix in the $i$th row and $j$th column of an output matrix. That's matrix multiplication.

Let's see [what this process looks like](http://matrixmultiplication.xyz/).

**Matrix multiplication is just a bunch of vector dot products.**
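The sketch below spells out that definition with two made-up matrices: entry $(i, j)$ of the product is the dot product of row $i$ of the first matrix with column $j$ of the second. It also shows that NumPy's `*` is *not* matrix multiplication: even on square matrices, it multiplies matching entries, as it did for vectors.

```python
import numpy as np

# Made-up example matrices: P is 2x3, Q is 3x2
P = np.array([[1, 2, 3],
              [4, 5, 6]])
Q = np.array([[7, 8],
              [9, 10],
              [11, 12]])

# Entry (i, j) of the product = dot product of row i of P with column j of Q
looped = np.zeros((P.shape[0], Q.shape[1]), dtype=int)
for i in range(P.shape[0]):
    for j in range(Q.shape[1]):
        looped[i, j] = P[i, :] @ Q[:, j]

print(looped)  # 58, 64 in the first row; 139, 154 in the second
print(P @ Q)   # same result from NumPy's matrix multiplication operator

# `*` is elementwise, not matrix multiplication, even for square matrices
S = np.array([[1, 2],
              [3, 4]])
print(S * S)   # squares each entry
print(S @ S)   # row-by-column dot products
```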

In the demo at the start of the lesson, we said that $(X^TX)^{-1}X^Ty$ was the formula for the coefficients that minimize RMSE. Let's take a look at the $X^TX$ component of this expression.

$X^T$, the **transpose** of $X$, is the result of swapping rows with columns, so that, for example, the 10th row of $X$ is the 10th column of $X^T$:

In [45]:
X.T

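|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1450 | 1451 | 1452 | 1453 | 1454 | 1455 | 1456 | 1457 | 1458 | 1459 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OverallQual | 7 | 6 | 7 | 7 | 8 | 5 | 8 | 7 | 7 | 5 | ... | 5 | 8 | 5 | 5 | 7 | 6 | 6 | 7 | 5 | 5 |
| GarageCars | 2 | 2 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 1 | ... | 0 | 3 | 2 | 0 | 2 | 2 | 2 | 1 | 1 | 1 |
| Constant | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |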

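A small sketch of what transposing does to shapes and indices, using a made-up 3x2 array `A_demo`: the shape flips, row $i$ of the original becomes column $i$ of the transpose, and entry `[i, j]` of the transpose is entry `[j, i]` of the original.

```python
import numpy as np

A_demo = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])  # made-up 3x2 example

At = A_demo.T
print(A_demo.shape, At.shape)  # (3, 2) (2, 3)

# Row 0 of A_demo is column 0 of its transpose
print(A_demo[0, :], At[:, 0])

# Entry [i, j] of the transpose is entry [j, i] of the original
print(At[0, 2], A_demo[2, 0])  # 5 5
```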

To make the problem easier, let's throw away all but the first four rows of `X`.

In [46]:
X_small = X.head(4)

In [47]:
print(X_small.T.values)
print(X_small.values)

[[7 6 7 7]
 [2 2 2 3]
 [1 1 1 1]]
[[7 2 1]
 [6 2 1]
 [7 2 1]
 [7 3 1]]


Multiplying $X^T$ by $X$ amounts to finding the dot product of every row of $X^T$ (i.e. every column of $X$) with every column of $X$.

![](../assets/images/matrix_mult_whiteboard.png)

In [48]:
# Calculate the matrix product of X.T with X, using all 1460 rows
X.T@X

|   | OverallQual | GarageCars | Constant |
|---|---|---|---|
| OverallQual | 57105 | 16642 | 8905 |
| GarageCars | 16642 | 5374 | 2580 |
| Constant | 8905 | 2580 | 1460 |

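The cell above uses all 1460 rows. To check a by-hand calculation on the four-row `X_small` instead, here is the same product on those rows, hard-coded from the printed output so the snippet runs without the CSV file. Each entry is the dot product of two columns of `X_small`; for example, the top-left entry is $7^2 + 6^2 + 7^2 + 7^2 = 183$.

```python
import numpy as np

# X_small as printed above: columns OverallQual, GarageCars, Constant
X_small = np.array([[7, 2, 1],
                    [6, 2, 1],
                    [7, 2, 1],
                    [7, 3, 1]])

XtX = X_small.T @ X_small  # 3x3: one row and one column per feature
print(XtX)

# Entry (i, j) is the dot product of column i and column j of X_small
print(X_small[:, 0] @ X_small[:, 1])  # OverallQual . GarageCars = 61
```

The bottom-right entry (Constant times Constant) simply counts the rows: 4 here, and 1460 in the full product above.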

**Exercise (4 mins., in pairs)**

Compute this matrix product by hand (I suggest on paper), and then with Python. Make sure that your answers agree!

**Note:** data scientists don't have to be good at multiplying matrices by hand -- the computer does that. However, working some problems by hand is important for developing an understanding of the process.

$
\left[ \begin{array}{c}
1 & 2 \\
4 & 8 \\
\end{array} \right]
\times \left[ \begin{array}{c}
2 & 9 \\
3 & 6 \\
\end{array} \right]
$

In [50]:
n = np.array([[1, 2], [4, 8]])
m = np.array([[2, 9], [3, 6]])
print(n@m)

[[ 8 21]
 [32 84]]


In [51]:
# Compare the hand-computed answer with NumPy's result
np.array_equal(n @ m, np.array([[8, 21], [32, 84]]))

True

$\blacksquare$

## How Linear Algebra Is Used in Machine Learning

- Supervised learning models usually use a **matrix** of features $X$ (with one column for each variable and one row for each observation) to predict a corresponding **vector** target $y$ (with one row for each observation).
- Unsupervised learning models usually look for structure in a **matrix** of features $X$.
- The math underlying these models typically consists of standard linear algebra operations (multiplication, inversion, projection, etc.).

The computer will do the math for you, but understanding how it works will help you choose appropriate algorithms and diagnose problems when you aren't getting what you expect.