In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
import numpy.linalg as la

## Linear Algebra

https://numpy.org/doc/stable/reference/routines.linalg.html

Vectors

* Norm
* Dot Product
* Similarity
* Projection
* Linear Independence
    
Matrices

* Operations
* System of linear equations
* Eigenvalues and eigenvectors
* Matrix Decomposition


## Vectors

**Definition** A vector is an object that has a magnitude and a direction

###  Magnitude of a vector 

The length of a vector v = $(v_1,v_2,...,v_n)$ is called its **norm**
 
 <div style="font-size: 115%;">
$$ ||v|| = \sqrt{\sum_{i=1}^{n}v_i^2}$$
</div>

This is often referred to as the L2 Norm, $||v||_2$

#### Unit Vector

Denote the unit vector for a vector v by $\hat{v}$

<div style="font-size: 115%;">
$$ \hat{v} = \frac{v}{||v||}$$
</div>

In [3]:
def Norm(v):
    return np.sqrt(np.sum(v**2))
v = np.array([1,2,3,4,5])
Norm_v = Norm(v)
Unit_v = v/Norm_v
print(f'Norm v : {Norm_v} \nUnit v: {Unit_v}\nNorm Unit v: {Norm(Unit_v)}' )

Norm v : 7.416198487095663 
Unit v: [0.13483997 0.26967994 0.40451992 0.53935989 0.67419986]
Norm Unit v: 1.0
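
As a cross-check on the hand-written `Norm` function, the sketch below compares it with NumPy's built-in `np.linalg.norm`, which computes the L2 norm by default for a 1-D array. The variable names are illustrative only.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])

# L2 norm computed directly from the definition
manual_l2 = np.sqrt(np.sum(v**2))

# np.linalg.norm defaults to the L2 norm for 1-D input
library_l2 = np.linalg.norm(v)

# The unit vector has norm 1 by construction
unit_v = v / library_l2

print(manual_l2, library_l2, np.linalg.norm(unit_v))
```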


### Direction of a vector

Given a vector $v = (v_1,v_2,...,v_n)$, the direction of $v$ is the vector $(\frac{v_1}{||v||},\frac{v_2}{||v||},...,\frac{v_n}{||v||})$

In [4]:
Unit_v

array([0.13483997, 0.26967994, 0.40451992, 0.53935989, 0.67419986])

### Dot product 

Multiply vectors to get a scalar

<div style="font-size: 115%;">
$$  v = (v_1,v_2,...,v_n), w = (w_1,w_2,...,w_n)$$
</div>    

#### Algebraic Definition

<div style="font-size: 115%;">
$$  v \cdot w = \sum_{i=1}^{n} v_i w_i $$
</div>

#### Geometric Definition

<div style="font-size: 115%;">
$$  v \cdot w = \cos(\theta)\Vert v \Vert \Vert w\Vert$$
</div>

In [5]:
v = np.array([1,2,3,4,5])
w = np.array([2,4,6,8,10])
np.vdot(v,w),v.dot(w),w.dot(v)

(110, 110, 110)

In [6]:
v = np.array([2,2])
w = np.array([2,0])
np.vdot(v,w)

4

In [7]:
import math

round(np.cos(math.radians(45)) * la.norm(v) * la.norm(w),10)


4.0

### Measures of similarity of two vectors

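Similarity of vectors is used heavily in Support Vector Machines and for semantic similarity in NLP. A common measure is cosine similarity, the cosine of the angle between the two vectors, obtained by rearranging the geometric definition of the dot product:

$$ \cos(\theta) = \frac{v \cdot w}{\Vert v \Vert \Vert w \Vert}$$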

In [8]:
v = np.array([1,1,1,1])

w = np.array([2,2,2,2])
np.dot(v,w)/(la.norm(v)*la.norm(w))

1.0
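
The cosine of the angle ranges from 1 (same direction) through 0 (orthogonal) to -1 (opposite direction). A minimal sketch, assuming both vectors are non-zero; the helper name `cosine_similarity` is hypothetical and not part of NumPy:

```python
import numpy as np

def cosine_similarity(v, w):
    """Cosine of the angle between v and w (assumes neither is the zero vector)."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

same_direction = cosine_similarity(np.array([1, 1, 1, 1]), np.array([2, 2, 2, 2]))
orthogonal = cosine_similarity(np.array([1, 1]), np.array([-1, 1]))
opposite = cosine_similarity(np.array([1, 2]), np.array([-1, -2]))

print(same_direction, orthogonal, opposite)
```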

#### Orthogonal (perpendicular) vectors
 
Two vectors are orthogonal when $a\cdot{b} = 0$

In [9]:
a = np.array((1,1))
b= np.array((-1,1))
adotb = np.vdot(a,b)
adotb

0

### Projection

![](proj.png)

#### Vector Projection of x onto y

<div style="font-size: 115%;">
$$ proj_y x = \frac{x\cdot{y}}{||y||^2}y$$
</div>

#### Scalar Projection of x onto y

The (signed) length of the projection

<div style="font-size: 115%;">
$$ comp_y x = \frac{x\cdot{y}}{||y||}$$
</div>

Equivalently, $x\cdot\hat{y}$ is the length of the projection of x onto y, where $\hat{y} = \frac{y}{||y||}$ is the unit vector in the direction of y.

In [10]:
y = np.array([3,0])
x = np.array([2,1])
unit_y = y/Norm(y)
np.vdot(x,y)/Norm(y)

2.0

In [11]:
theta = np.arcsin(1/Norm(x))
Norm(x)*np.cos(theta)                  

2.0
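
The cells above compute only the scalar projection. The vector projection formula can be sketched the same way, using the same `x` and `y`; the residual `x - proj` should be orthogonal to `y`.

```python
import numpy as np

x = np.array([2, 1])
y = np.array([3, 0])

# Vector projection of x onto y: (x.y / ||y||^2) * y
proj = (np.dot(x, y) / np.dot(y, y)) * y

# What is left of x after removing its component along y
residual = x - proj

print("projection:", proj)
print("residual:", residual)
print("residual . y:", np.dot(residual, y))
```

The norm of `proj` matches the scalar projection computed earlier, since here `x` and `y` form an acute angle.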

### Linear Independence

A set of vectors $(v_1,v_2...v_n)$ is linearly dependent if at least one of the vectors can be expressed as a linear combination of the others. For a pair of vectors this means they lie on the same line (one is a scalar multiple of the other).

Vectors that are not linearly dependent are linearly independent.

A set of vectors is linearly independent if:
<div style="font-size: 115%;"> 
$$ a_1 v_1 + a_2 v_2 + ...+ a_n v_n = 0 \text{ iff all } a_i = 0$$
</div>


In [12]:
# Not linearly independent since 2v - 1w = 0 (every component is zero, so the sum is too)
v = np.array([1,2,3,4])
w = np.array([2,4,6,8])

np.sum(2*v + (-1*w))


0

In [13]:
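# Here w = 2v + 1, an affine (not linear) relation: v and w are still linearly independent,
# and the cosine computed below only rounds to 1.0
v = np.array([1,2,3,4])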
w = np.array([3,5,7,9])
np.sum(2*v + (-1*(w - 1)))

0

In [14]:
2*v-w+1

array([0, 0, 0, 0])

In [15]:
print(f'cosine of angle between the vectors: {np.round(np.dot(v,w)/(la.norm(v)*la.norm(w)),2)}')

cosine of angle between the vectors: 1.0
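
A sum of zero or a rounded cosine is not a reliable test of dependence. One common alternative, sketched here, is to stack the vectors as columns and compare the matrix rank (via `np.linalg.matrix_rank`) with the number of vectors. The helper name `is_linearly_independent` is hypothetical.

```python
import numpy as np

def is_linearly_independent(*vectors):
    """True if the stacked column vectors have full column rank."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

v = np.array([1, 2, 3, 4])
w_dependent = np.array([2, 4, 6, 8])    # w = 2v
w_affine = np.array([3, 5, 7, 9])       # w = 2v + 1, not a scalar multiple of v

print(is_linearly_independent(v, w_dependent))
print(is_linearly_independent(v, w_affine))
```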


## Matrices (2-dimensional arrays)

Matrices specify **Linear Transformations**


### Matrix Multiplication Operations

#### Matrix Multiplication

To multiply two matrices they must be compatible: the number of columns of the first matrix must equal the number of rows of the second.

The result has the number of rows of the first matrix and the number of columns of the second: an m×n matrix times an n×p matrix is m×p.
    

In [16]:
# numpy matmul
A = np.array([1,2,3,4,5,6,7,8]).reshape(4,2)
B = np.array([1,2,3,4,5,6]).reshape(2,3)
# 4x2 * 2x3 = 4x3

In [17]:
print(A@B)

[[ 9 12 15]
 [19 26 33]
 [29 40 51]
 [39 54 69]]


##### np.matmul implements the @ operator

In [18]:
M = np.matmul(A,B)
print("Shape: ", M.shape)
print("M\n",M)

Shape:  (4, 3)
M
 [[ 9 12 15]
 [19 26 33]
 [29 40 51]
 [39 54 69]]


In [19]:
# numpy dot product
np.dot(A,B)

array([[ 9, 12, 15],
       [19, 26, 33],
       [29, 40, 51],
       [39, 54, 69]])

#### Matrix by a scalar

In [20]:
A*2

array([[ 2,  4],
       [ 6,  8],
       [10, 12],
       [14, 16]])

In [21]:
np.dot(A,2)

array([[ 2,  4],
       [ 6,  8],
       [10, 12],
       [14, 16]])

#### Hadamard Product

The Hadamard product is element-by-element multiplication, often denoted $A \circ B$.

In NumPy, it is the `*` operator applied to arrays.



In [22]:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )

print("A\n",A)
print("B\n",B)
M = A*B # Hadamard
print("Shape of A*B: ", M.shape)
print("M\n", M)

A
 [[1 1]
 [0 1]]
B
 [[2 0]
 [3 4]]
Shape of A*B:  (2, 2)
M
 [[2 0]
 [0 4]]


The matrices must have the same shape, or shapes that NumPy can broadcast to a common shape.

In [23]:
C = np.array([5]*8).reshape(4,2)
try: 
    A*C
except ValueError:
    print('Value Error')

Value Error


In [24]:
D = np.array([[5],[6]]).reshape(2,1)
print("A\n",A)
print("D\n",D)
print(A.shape,D.shape)
print(A*D)

A
 [[1 1]
 [0 1]]
D
 [[5]
 [6]]
(2, 2) (2, 1)
[[5 5]
 [0 6]]


#### Matrix times a vector

A matrix acting as a transformation on a vector changes the orientation and length of the vector, i.e. it rotates and stretches or shrinks it.

In NumPy: `np.matmul` or `np.dot`

In [25]:
A= np.arange(10).reshape(5,2)
v = np.array([11,12])
print("A\n",A)
print("v\n",v)
b = A.dot(v)
print("b\n",b)

A
 [[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
v
 [11 12]
b
 [ 12  58 104 150 196]


In [26]:
print(np.matmul(A,v))

[ 12  58 104 150 196]
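
To make the "rotate and stretch" description concrete, here is a small sketch using a standard 2D rotation matrix and a uniform scaling matrix. The specific angle and scale factor are arbitrary choices for illustration.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counter-clockwise

# Standard 2D rotation matrix
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Uniform scaling by a factor of 2
S = 2 * np.eye(2)

v = np.array([1.0, 0.0])

rotated = R @ v          # direction changes, length is preserved
stretched = S @ v        # length changes, direction is preserved
both = (S @ R) @ v       # rotate first, then stretch

print(np.round(rotated, 10), stretched, np.round(both, 10))
```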


### Diagonal and Trace

The diagonal of a matrix consists of the elements $a_{ij}$ where $i=j$.

The trace of a matrix is the sum of diagonal elements.

In [27]:
A = np.array([1,2,3])
np.diag(A)


array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [28]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [29]:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("A\n",A)
print(f'Trace of A: {np.trace(A)}')

A
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Trace of A: 15


### Transpose

In [30]:
A = np.array([[1,2,3],[4,5,6]])
print("A\n",A)
print("A-transpose\n",A.transpose())
print("<AA-transpose>\n",np.dot(A,A.transpose())) #Notice its a square matrix


A
 [[1 2 3]
 [4 5 6]]
A-transpose
 [[1 4]
 [2 5]
 [3 6]]
<AA-transpose>
 [[14 32]
 [32 77]]


In [31]:
A.T

array([[1, 4],
       [2, 5],
       [3, 6]])

### Determinant
 
The determinant is only defined for square matrices.

For 2x2 matrix $\begin{bmatrix}a & b \\ c & d\end{bmatrix}$, det = ad-bc 

A matrix has an inverse if and only if its determinant is non-zero.

The determinant reflects what the linear transformation of a matrix does. Its absolute value is the factor by which the transformation scales area (in 2D) or volume (in higher dimensions).

In [32]:
A = np.arange(4).reshape(2,2)
la.det(A)

-2.0
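
The area-scaling interpretation can be checked on simple cases. In this sketch (the example matrices are chosen for illustration), a diagonal matrix stretches the unit square into a 2-by-3 rectangle, while a matrix with dependent rows collapses the plane onto a line.

```python
import numpy as np

# Stretches x by 2 and y by 3: the unit square becomes a 2x3 rectangle
stretch = np.array([[2.0, 0.0],
                    [0.0, 3.0]])

# Second row is twice the first: the plane is squashed onto a line
collapse = np.array([[1.0, 2.0],
                     [2.0, 4.0]])

det_stretch = np.linalg.det(stretch)
det_collapse = np.linalg.det(collapse)

print(det_stretch, det_collapse)
```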

###  Matrix Inversion ($A^{-1}$)

$$ AA^{-1} = I$$

To be invertible, a matrix must be square and have a non-zero determinant.

If det(A) = 0 then A is called a singular matrix.

In [33]:
A = np.arange(9).reshape(3,3)
A[0,0] = 1
A[2,2] = 1
print("A\n",A)
B = la.inv(A)
print("A Inverse\n",B)
print("AA-inverse\n",np.round(np.matmul(A,B)))


A
 [[1 1 2]
 [3 4 5]
 [6 7 1]]
A Inverse
 [[ 3.1 -1.3  0.3]
 [-2.7  1.1 -0.1]
 [ 0.3  0.1 -0.1]]
AA-inverse
 [[ 1.  0.  0.]
 [ 0.  1.  0.]
 [-0.  0.  1.]]


In [34]:
A = np.array([[1,3],[2,6]])
try:
    la.inv(A)
except la.LinAlgError:
    print("Singular Matrix")

Singular Matrix


In [35]:
A

array([[1, 3],
       [2, 6]])

#### Moore-Penrose Inverse (for real-valued matrices)

For any m×n matrix A (not necessarily square) that has full rank (i.e. linearly independent rows or columns), the pseudoinverse has a closed form. If the columns of A are linearly independent then:
$$ A^+ = (A^TA)^{-1}A^T$$
$$\text{Left Inverse } A^+A = I$$

If the rows of A are linearly independent:
$$ A^+ = A^T(AA^T)^{-1}$$
$$\text{Right Inverse } AA^+ = I$$

In [36]:
A = np.array([[1,2,3,4],
              [5,7,9,10]]).reshape(4,2)

print("Linearly independent columns\n",A)

AT_A = np.matmul(A.T,A)
A_plus = np.matmul(la.inv(AT_A),A.T)
print(f'A+\n{A_plus}')
I = np.matmul(A_plus,A).round(2)
I

Linearly independent columns
 [[ 1  2]
 [ 3  4]
 [ 5  7]
 [ 9 10]]
A+
[[-0.38515901 -0.17314488 -0.45229682  0.46289753]
 [ 0.32862191  0.16607774  0.41342756 -0.32155477]]


array([[1., 0.],
       [0., 1.]])

In [37]:
A_plus = la.pinv(A)
I = np.matmul(A_plus,A)

print(f'Left inverse: {A_plus}\nI = {I.round(4)}')

Left inverse: [[-0.38515901 -0.17314488 -0.45229682  0.46289753]
 [ 0.32862191  0.16607774  0.41342756 -0.32155477]]
I = [[ 1. -0.]
 [-0.  1.]]
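
The cells above cover the left inverse. The right-inverse formula for a matrix with linearly independent rows can be sketched the same way; the 2×4 matrix below is an illustrative choice built from the same numbers.

```python
import numpy as np

# 2x4 matrix whose two rows are linearly independent
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 7.0, 9.0, 10.0]])

# Right inverse: A+ = A^T (A A^T)^{-1}
A_plus = A.T @ np.linalg.inv(A @ A.T)

print("A A+ =\n", (A @ A_plus).round(4))
print("matches pinv:", np.allclose(A_plus, np.linalg.pinv(A)))
```

Note that only `A @ A_plus` is the identity here; `A_plus @ A` is a 4×4 matrix of rank 2 and cannot be.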


### Linear operations on vectors (in a vector space)
 
Matrices are linear operators acting on column vectors.
       
#### Linear system of equations
<div style="font-size: 115%;"> 
$$  Ax = b$$
</div>

$$
\begin{bmatrix}
    2 & 4 & 3 \\
    6 & 16 & 10 \\
    4 & 12 & 9
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3\end{bmatrix}
=
\begin{bmatrix} 21 \\ 32 \\ 43\end{bmatrix}
$$
  
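Solving via $x = A^{-1}b$ requires first finding $A^{-1}$.

Finding the inverse means solving $AA^{-1} = I$, which is itself a set of linear systems (one per column of $I$). This is inefficient, and can introduce additional numerical error, compared with solving $Ax = b$ directly.

More efficient algorithms use a matrix decomposition such as LU decomposition, which factors A into a lower triangular matrix (L) and an upper triangular matrix (U), usually together with a row permutation for numerical stability. The system $Ax = b$ then becomes $L(Ux) = b$, which is solved in two triangular steps: first $Ly = b$ for y, then $Ux = y$ for x.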
   


In [38]:
A = np.array((2,4,3,6,16,10,4,12,9)).reshape(3,3)
b = np.array((21,32,43))
x = la.solve(A,b)
print("Solution\n",x)

Solution
 [ 10.   -11.75  16.  ]
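
The LU approach can be sketched explicitly with SciPy, assuming `scipy` is available (the notebook already imports `scipy.stats`). `scipy.linalg.lu` returns a permutation matrix `P` along with `L` and `U` such that `A = P @ L @ U`. This is an illustration of the idea, not necessarily what `la.solve` does internally.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2.0, 4.0, 3.0],
              [6.0, 16.0, 10.0],
              [4.0, 12.0, 9.0]])
b = np.array([21.0, 32.0, 43.0])

# Factor A = P L U (P is a permutation matrix, so its inverse is P.T)
P, L, U = lu(A)

# Forward substitution: solve L y = P^T b
y = solve_triangular(L, P.T @ b, lower=True)

# Back substitution: solve U x = y
x = solve_triangular(U, y)

print("x =", x)
print("residual norm:", np.linalg.norm(A @ x - b))
```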


### Eigenvalues and Eigenvectors

Matrices are linear operators. The eigenvectors of a linear operator (matrix) are those non-zero vectors that stay on the same line under the linear transformation. They are only stretched, shrunk, or flipped, by the factor given by the eigenvalue.

<div style="font-size: 115%;">
$$Ax =  \lambda x$$
</div>

$\lambda$ is an eigenvalue, x is an eigenvector

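Only n×n (square) matrices have eigenvalues and eigenvectors. An n×n matrix has n eigenvalues (counted with multiplicity, and possibly complex) and at most n linearly independent eigenvectors.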

Two useful properties:

* Sum of the eigenvalues = trace of A, 
* Product of eigenvalues = det(A) 

    
To find the eigenvalues and eigenvectors, solve:

<div style="font-size: 115%;">   
$$(A - \lambda I)x = 0$$ 
</div>

for non-zero x. Non-zero solutions exist only when $\det(A - \lambda I) = 0$:

* each root $\lambda$ of this equation is an eigenvalue
* the non-zero solutions x for that $\lambda$ are its eigenvectors

NumPy's linear algebra module provides the function `eig`, which returns the eigenvalues as a vector and the eigenvectors as the columns of a matrix.
     

In [39]:
A = np.array([i for i in range(9)]).reshape(3,3)
print("A\n",A)
E = la.eig(A)

# Eigenvalues
print("The eigenvalues are: ", np.round(E[0],5))

# Eigenvectors

print("The eigenvectors are: \n" , np.round(E[1],3))

A
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
The eigenvalues are:  [13.34847 -1.34847 -0.     ]
The eigenvectors are: 
 [[ 0.165  0.8    0.408]
 [ 0.506  0.104 -0.816]
 [ 0.847 -0.591  0.408]]


In [40]:
e,v = la.eig(A)
print("Eigenvalues\n",e)
print("Eigenvectors\n",v)

Eigenvalues
 [ 1.33484692e+01 -1.34846923e+00 -1.26963291e-15]
Eigenvectors
 [[ 0.16476382  0.79969966  0.40824829]
 [ 0.50577448  0.10420579 -0.81649658]
 [ 0.84678513 -0.59128809  0.40824829]]


In [41]:
for i in range(len(E[0])):
    print(f"Eigenvalue {i+1}: {E[0][i]}, Eigenvector: {E[1][:,i]} \n")

Eigenvalue 1: 13.348469228349538, Eigenvector: [0.16476382 0.50577448 0.84678513] 

Eigenvalue 2: -1.3484692283495348, Eigenvector: [ 0.79969966  0.10420579 -0.59128809] 

Eigenvalue 3: -1.2696329104036546e-15, Eigenvector: [ 0.40824829 -0.81649658  0.40824829] 
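
The defining relation $Ax = \lambda x$ and the two properties listed earlier (sum of eigenvalues equals the trace, product equals the determinant) can be checked numerically. A minimal sketch on the same matrix, with tolerances to absorb floating-point error:

```python
import numpy as np

A = np.arange(9, dtype=float).reshape(3, 3)
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]
for lam, vec in zip(eigenvalues, eigenvectors.T):
    print("A v == lambda v:", np.allclose(A @ vec, lam * vec))

trace_matches = np.isclose(eigenvalues.sum(), np.trace(A))
det_matches = np.isclose(eigenvalues.prod(), np.linalg.det(A), atol=1e-9)

print("sum of eigenvalues == trace:", trace_matches)
print("product of eigenvalues == det:", det_matches)
```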



#### Positive Definite Matrices

A (symmetric) matrix A is positive definite if:

<div style="font-size: 115%;"> 
$$x^tAx > 0 \text{ for all non-zero x} \in R^n $$
</div>

It is positive semi-definite if:

<div style="font-size: 115%;"> 
$$x^tAx \ge 0 \text{ for all x} \in R^n $$ 
</div>

Equivalently, A is positive definite if $\langle Ax,x \rangle > 0$ for all non-zero real-valued vectors x, where $\langle \cdot,\cdot \rangle$ is the inner product (i.e. the dot product).

A symmetric matrix is positive definite if its eigenvalues are all > 0, and positive semi-definite if they are all $\ge 0$.

For a symmetric 2x2 matrix, 
$\begin{bmatrix}a & b \\b & c\end{bmatrix}\text{ is positive definite if } a > 0 \text{ and } ac-b^2 > 0$
 
Positive definite matrices are the generalization of positive real numbers to matrices. This means you can take square roots.
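
The eigenvalue test can be sketched as follows. The helper name `is_positive_definite` is hypothetical; `np.linalg.eigvalsh` is NumPy's eigenvalue routine for symmetric matrices. The Cholesky factor shown at the end is one kind of matrix "square root" ($A = LL^t$), used here as an illustration rather than the only possible meaning.

```python
import numpy as np

def is_positive_definite(M):
    """Check symmetry, then require all eigenvalues to be strictly positive."""
    if not np.allclose(M, M.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])   # a > 0 and ac - b^2 = 3 > 0
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])    # ac - b^2 = -3 < 0

print(is_positive_definite(A), is_positive_definite(B))

# Cholesky factorization exists for positive definite matrices: A = L L^T
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))
```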



### Eigendecomposition

If an $n \times n$ matrix A has n linearly independent eigenvectors, then A may be decomposed as follows:

<div style="font-size: 115%;"> 
$$A = B \Lambda B^{-1}$$
</div>

* $\Lambda$ is a diagonal matrix of the eigenvalues
* B is a matrix whose columns are the corresponding linearly independent eigenvectors



In [42]:
A = np.array([[0,1,1],[2,1,0],[3,4,5]])
print("A\n",A)
u, V = la.eig(A)
print(f'B\n {V}\nLAMBDA\n {np.diag(u)}\nB-inverse\n {la.inv(V)}')

A
 [[0 1 1]
 [2 1 0]
 [3 4 5]]
B
 [[ 1.80228488e-01  6.72063326e-01 -2.06609884e-16]
 [ 7.42582208e-02 -7.24947536e-01 -7.07106781e-01]
 [ 9.80817725e-01  1.50936928e-01  7.07106781e-01]]
LAMBDA
 [[ 5.85410197  0.          0.        ]
 [ 0.         -0.85410197  0.        ]
 [ 0.          0.          1.        ]]
B-inverse
 [[ 0.70644772  0.82712339  0.82712339]
 [ 1.29850561 -0.22181124 -0.22181124]
 [-1.25707872 -1.09994388  0.31426968]]


#### $B \Lambda B^{-1}$

In [43]:
print(np.dot(V,np.dot(np.diag(u), la.inv(V))).round(5))


[[ 0.  1.  1.]
 [ 2.  1. -0.]
 [ 3.  4.  5.]]
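
One consequence of the decomposition is that powers of A reduce to powers of the diagonal entries, since $A^k = B \Lambda^k B^{-1}$. A brief sketch on the same matrix, compared against `np.linalg.matrix_power`:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [2, 1, 0],
              [3, 4, 5]], dtype=float)

u, V = np.linalg.eig(A)

# A^3 = B Lambda^3 B^{-1}; cubing a diagonal matrix just cubes its entries
A_cubed = V @ np.diag(u**3) @ np.linalg.inv(V)

print(np.allclose(A_cubed, np.linalg.matrix_power(A, 3)))
```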


### Singular Value Decomposition
$$A = UDV^t$$

Where U and V are orthogonal, i.e. 
      
$$U^{-1}=U^t$$ 

$$V^{-1} = V^t$$

D is a diagonal matrix whose diagonal entries are the singular values of A (note that `la.svd` returns them as a 1-D array, not as a matrix)

SVD is used in Latent Semantic Analysis and Principal Component Analysis

In [44]:
X = np.array((1,1/2,1/3,1/2,1/3,1/4,1/3,1/4,1/5)).reshape(3,3)
print("X\n",X)
S = la.svd(X)
print("U: \n", np.round(S[0],3))
print("D: \n", np.round(S[1],3))
print("V-transpose: \n", np.round(S[2],3))

X
 [[1.         0.5        0.33333333]
 [0.5        0.33333333 0.25      ]
 [0.33333333 0.25       0.2       ]]
U: 
 [[-0.827  0.547  0.128]
 [-0.46  -0.528 -0.714]
 [-0.323 -0.649  0.689]]
D: 
 [1.408 0.122 0.003]
V-transpose: 
 [[-0.827 -0.46  -0.323]
 [ 0.547 -0.528 -0.649]
 [ 0.128 -0.714  0.689]]
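
As a sanity check, the factors can be multiplied back together and the orthogonality of U and V verified. This sketch rebuilds the same `X`; because the singular values come back as a 1-D array, `np.diag` is used to form D.

```python
import numpy as np

X = np.array([[1, 1/2, 1/3],
              [1/2, 1/3, 1/4],
              [1/3, 1/4, 1/5]])

U, d, Vt = np.linalg.svd(X)

# Rebuild X = U D V^t, with D formed from the 1-D array of singular values
X_rebuilt = U @ np.diag(d) @ Vt

print("reconstruction ok:", np.allclose(X_rebuilt, X))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(3)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(3)))
```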
