## Vectors

A vector is an ordered collection of numbers that can be interpreted as having both a magnitude and a direction. It's important to note that a vector is an <b>element</b> of a vector space.

Matrices, on the other hand, are rectangular arrays of values. A vector can be viewed as a matrix with a single row or a single column.

With that said, we <i>can</i> represent a vector with a list, for example: 

In [12]:
A = [2.0, 3.0, 5.0]

In [13]:
# Using numpy to declare a vector 
import numpy as np
v = np.array([2,3,5])
u = np.array([1,2,3])

print("This is vector v: ")
print(v)

print("This is vector u: ")
print(u)

This is vector v: 
[2 3 5]
This is vector u: 
[1 2 3]


In [53]:
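# Once more, this time as numpy matrix objects (column vectors)
# np.matrix parses the string; older NumPy also offers the np.mat alias
v_mat=np.matrix('2;3;5')
u_mat=np.matrix('1;2;3')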

print("This is vector v_mat: ")
print(v_mat)

print("This is vector u_mat: ")
print(u_mat)

This is vector v_mat: 
[[2]
 [3]
 [5]]
This is vector u_mat: 
[[1]
 [2]
 [3]]


## So, what's the difference?

NumPy matrices are strictly 2-dimensional, while NumPy arrays are N-dimensional. Matrix objects are a subclass of arrays, so they inherit all the attributes and methods of arrays.

The main advantage of NumPy arrays is that they are more general than 2-dimensional matrices. What happens when you want a 3-dimensional array? (A tensor is commonly represented as a multi-dimensional array.) Then you have to use a NumPy array, not a matrix object.
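A small sketch of the practical difference, assuming a NumPy version that still ships the `np.matrix` class (the names `arr`, `mat`, and `cube` are illustrative only): `*` multiplies arrays element by element but performs matrix multiplication on matrix objects, and only arrays can have more than two dimensions.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.matrix([[1, 2], [3, 4]])

# For arrays, * is element-wise; for matrix objects, * is matrix multiplication
elementwise = arr * arr
matrix_product = mat * mat

# The @ operator gives the matrix product for plain arrays
same_product = arr @ arr

# Arrays can have any number of dimensions; a matrix object is always 2-D
cube = np.zeros((2, 3, 4))

print(elementwise)
print(matrix_product)
print(cube.ndim, mat.ndim)
```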

# Operations on Vectors

In [15]:
#addition
u+v
    # or
np.add(u,v)

#dot product

print(sum(u*v))
    # or
np.dot(u,v)

#vector magnitude (Euclidean norm)
from scipy import linalg
print(linalg.norm(u))
print(linalg.norm(v))
    

23
3.7416573867739413
6.164414002968976


In [16]:
#Formula for calculating magnitude below

![title](linear-algebra-with-python/norm.png)
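Assuming the formula pictured is the usual Euclidean norm, $\|u\| = \sqrt{u_1^2 + u_2^2 + \dots + u_n^2}$, the following sketch computes it by hand and compares the result with NumPy's built-in function. The names `manual_norm` and `library_norm` are illustrative.

```python
import numpy as np

u = np.array([1, 2, 3])

# Euclidean norm: square each component, sum, then take the square root
manual_norm = np.sqrt(np.sum(u ** 2))

# NumPy's built-in equivalent
library_norm = np.linalg.norm(u)

print(manual_norm, library_norm)
```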

# Matrices

A matrix is a 2D array that stores real or complex numbers. You can think of a matrix as multiple vectors stacked in an array! You can use numpy to create matrices:

In [17]:
matrix1 = np.array(
    [[0, 4],
     [2, 0]]
)
matrix2 = np.array(
    [[-1, 2],
     [1, -2]]
)


## Challenge

Using `numpy`, create a 3 x 5 matrix with values of your choice. 

In [18]:
challenge_matrix = np.array([[1,3,2,4,5],
                             [1,3,2,4,5],
                             [1,3,2,4,5]])

challenge_matrix.shape

(3, 5)

# Operations on Matrices

### Matrix Addition

In [19]:
matrix_sum = matrix1 + matrix2

In [20]:
matrix_sum

array([[-1,  6],
       [ 3, -2]])

### Matrix Subtraction

In [21]:
matrix_diff = matrix1 - matrix2

In [22]:
matrix_diff

array([[1, 2],
       [1, 2]])

### Matrix dot product

In [23]:
np.dot(matrix1,matrix2)

array([[ 4, -8],
       [-2,  4]])

### Matrix Transpose

In [26]:
# np.dot(A, B) requires A to be m x n and B to be n x l:
# the number of columns of A must equal the number of rows of B.
# matrix1 is 2 x 2 and matrix_3 is 3 x 2, so np.dot(matrix1, matrix_3) is undefined;
# taking the transpose of matrix_3 gives a 2 x 3 matrix that can be multiplied.

matrix_3 = np.array([[1,3],[3,3],[1,1]])
print(matrix_3)
#print(np.dot(matrix1,matrix_3))  # fails: shapes (2,2) and (3,2) are not aligned

import time

# Time the two equivalent ways of taking the transpose
start = time.time()
matrix_3_transpose = matrix_3.transpose()
end = time.time()

elapsed = end - start  # named so that the time module is not shadowed

start_1 = time.time()
np.transpose(matrix_3)
end_1 = time.time()

elapsed_1 = end_1 - start_1
print(np.dot(matrix1,matrix_3_transpose))

[[1 3]
 [3 3]
 [1 1]]
[[12 12  4]
 [ 2  6  2]]


### Properties of Matrix Multiplication

Matrix multiplication is not commutative, i.e. AB and BA are not equal in general:

In [27]:
a_b = np.dot(matrix1,matrix2)
print(a_b)

[[ 4 -8]
 [-2  4]]


In [28]:
b_a = np.dot(matrix2,matrix1)
print(b_a)

[[ 4 -4]
 [-4  4]]


### Generate Identity Matrix

In [29]:
identity_4 = np.eye(4)
identity_4

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

### Matrix Inverse

In [31]:
# We know that a matrix A is invertible if there exists a matrix A^-1 such that
# A^-1(A) = I and AA^-1 = I

inverse = np.linalg.inv(matrix_sum)

In [32]:
inverse

array([[0.125 , 0.375 ],
       [0.1875, 0.0625]])
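As a quick sanity check of the definition above, this sketch multiplies the inverse by the original matrix in both orders and compares each product with the identity. It rebuilds `matrix_sum` from the values shown earlier; `left` and `right` are illustrative names.

```python
import numpy as np

matrix_sum = np.array([[-1, 6], [3, -2]])
inverse = np.linalg.inv(matrix_sum)

# Both products should reproduce the 2 x 2 identity, up to floating-point error
left = inverse @ matrix_sum
right = matrix_sum @ inverse

print(np.allclose(left, np.eye(2)))
print(np.allclose(right, np.eye(2)))
```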

## Challenge

A =  [[1 2]
      [3 4]]
      
B = [[-2.   1. ]
     [ 1.5 -0.5]]
     
C = [[1 2 3]
     [4 5 6]]
     
Given matrices A and B, multiply AB and call this mat1. Multiply BA and call this mat2. Are A and B inverses of each other?

Given matrix C, create an identity matrix id1 such that C*id1 is defined, and call the product mat3.

Given matrix C, create an identity matrix id2 such that id2*C is defined, and call the product mat4.

In [40]:
challenge_A = np.array([[1,2],[3,4]])

inverse_challenge = np.linalg.inv(challenge_A)

inverse_challenge

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

## Matrix Trace

The trace of a square (n x n) matrix is the sum of its elements along the main diagonal (upper left to lower right).

In [43]:
# matrix_3 is 3 x 2 rather than square; np.trace still sums the entries at (0,0) and (1,1)
np.trace(matrix_3)

4

### Matrix Determinant
The determinant of a square matrix is a signed (alternating) sum, taken over all permutations of the column indices, of products of matrix entries with one entry chosen from each row and column. The formula is as follows:

![title](linear-algebra-with-python/det.png)

To compute a determinant, we require a square matrix.

In [45]:
# To get the determinant in Python:

det = np.linalg.det(matrix_sum)

# Determinants are instrumental in solving systems of linear equations: a square system
# Ax = b has a unique solution if and only if the determinant of A is non-zero.

# Specifically, the determinant of a square matrix A detects whether A is invertible:
# If det(A) = 0 then A is not invertible
# If det(A) is not zero then A is invertible

In [46]:
det

-16.000000000000007
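For a 2 x 2 matrix the determinant reduces to $ad - bc$, which makes the result above easy to check by hand. The sketch below does that for `matrix_sum` and also shows a singular matrix (the hypothetical `singular`, whose second row is twice the first) with determinant zero.

```python
import numpy as np

matrix_sum = np.array([[-1, 6], [3, -2]])

# 2 x 2 formula: det([[a, b], [c, d]]) = a*d - b*c
a, b = matrix_sum[0]
c, d = matrix_sum[1]
det_by_hand = a * d - b * c

# A matrix whose second row is a multiple of the first is singular
singular = np.array([[1, 2], [2, 4]])
det_singular = np.linalg.det(singular)

print(det_by_hand)
print(np.isclose(det_singular, 0.0))
```

The small difference between the exact value and the floating-point result printed by `np.linalg.det` comes from the numerical factorization NumPy uses internally.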

## Solving systems of linear Equations

In [46]:
# Say we want to solve the following system of equations:
#
#    x +  y +  z =  6
#        2y + 5z = -4
#   2x + 5y -  z = 27
#
# This can be represented by Ax = b:
#
#   [1  1  1] [x]   [ 6]
#   [0  2  5] [y] = [-4]
#   [2  5 -1] [z]   [27]
#
# So,
# x = A^-1(b)

In [47]:
matrix = np.array([[1,1,1],[0,2,5],[2,5,-1]])
matrix

array([[ 1,  1,  1],
       [ 0,  2,  5],
       [ 2,  5, -1]])

In [48]:
inverse = np.linalg.inv(matrix)
b = np.array([6,-4,27])
solution = np.dot(inverse,b)

In [49]:
print("Solution")
print("x:",solution[0])
print("y:",solution[1])
print("z:",solution[2])

Solution
x: 5.0
y: 2.9999999999999996
z: -1.9999999999999998
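Forming the inverse explicitly works here, but `np.linalg.solve` solves the system directly and is generally the preferred route for a square, non-singular system. A minimal sketch on the same system, with a substitution check (`residual` is an illustrative name):

```python
import numpy as np

A = np.array([[1, 1, 1], [0, 2, 5], [2, 5, -1]])
b = np.array([6, -4, 27])

# Solve A x = b directly, without forming the inverse
solution = np.linalg.solve(A, b)

# Substitute back to confirm the solution satisfies the system
residual = A @ solution - b

print(solution)
print(np.allclose(residual, 0))
```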


## Applicability to Data Science

One of the most fundamental methods of numerical prediction is linear regression. In linear regression we ask the basic question: how does Y change with respect to some change in X? So, the task here is to compute the parameter values that capture this relationship.

$ Y  = B_0 + B_1X_1 + B_2X_2 + ... + B_nX_n $

Say we wanted to understand the impact of a new fertilizer on crop yields. In this case, our Y would be crop yield (in kg per acre) and the X's would be various variables referring to the fertilizer. We could have tens or hundreds of X's for which we would need to compute individual parameters using perhaps tens of thousands of data rows.

Without linear algebra, computing these parameters is extremely difficult in many cases and impractical in others (imagine having to do this by finding the roots by hand). As we saw earlier, sklearn and statsmodels are built on top of NumPy and its ability to perform array operations efficiently. So, when we use sklearn or statsmodels we are ultimately using linear algebra.

Let's use a dataset to do this via linear algebra, using the matrix inverse.

In [48]:
import pandas as pd
df = pd.read_csv("yield_forecast.csv")
df = df.drop(["asd_desc","state"],axis = 1)
print(df.shape)


(203, 15)


In [49]:
df

Unnamed: 0,cyield,irig_flag,days_under0,dewPoint,precipAccumulation,precip,days_under_n10,days_over42,days_over32,humidity,temp_delta,temperatureMin,apparentTemperatureMin,precipIntensity,y_pred
0,91.7,1,0.0,33.626884,25.550,118.0,0.0,203.0,214.0,0.499395,24.403767,44.021628,41.704372,0.003418,95.130667
1,110.0,1,0.0,34.370654,8.030,157.0,0.0,206.0,213.0,0.522570,25.096075,42.816776,40.463224,0.002680,95.130667
2,95.9,1,0.0,35.726977,9.752,129.0,0.0,208.0,215.0,0.518465,25.992884,44.433674,42.092744,0.002350,95.130667
3,104.2,1,0.0,37.326402,1.089,146.0,0.0,213.0,213.0,0.512617,27.430888,46.196215,43.915981,0.002568,95.130667
4,97.4,1,0.0,38.406419,3.805,132.0,0.0,212.0,215.0,0.528884,27.271953,46.369070,44.069163,0.002329,95.130667
5,145.0,1,0.0,34.378551,1.254,137.0,0.0,207.0,213.0,0.531636,26.164019,40.776402,38.332290,0.001316,135.631579
6,135.9,1,0.0,31.529673,3.898,108.0,0.0,204.0,210.0,0.496449,27.080561,39.440280,36.112850,0.001222,142.791667
7,126.8,1,0.0,29.416308,2.191,106.0,0.0,212.0,214.0,0.441168,29.742430,39.777336,36.011589,0.000644,142.791667
8,137.5,1,0.0,32.244766,5.621,106.0,0.0,202.0,214.0,0.507804,29.107523,38.919673,34.844907,0.000912,137.315000
9,114.5,1,0.0,34.437804,2.945,148.0,0.0,206.0,213.0,0.533645,24.807477,41.086822,38.864159,0.001537,113.253226


In [50]:
y = df["y_pred"]
#list_ = [int(i) for i in range(189)]
#y = df.drop(list_)
df = df.drop(["y_pred"],axis = 1)

In [51]:
#y = df["y_pred"]
#df = df.drop(["y_pred"],axis = 1)


#df = df.drop(list_)

In [52]:
#Converting df to a matrix X
X = np.asmatrix(df)
#taking the transpose of X and assigning it to x
x= np.transpose(X)
#multiplying the transpose by X
T= x.dot(X) # so this is X^T X
T.shape

(14, 14)

In [54]:
T

matrix([[1.60748394e+06, 1.12576000e+04, 0.00000000e+00, 6.06163503e+05,
         6.47434845e+04, 1.91853010e+06, 0.00000000e+00, 3.25699320e+06,
         3.41810440e+06, 8.93360879e+03, 4.19126165e+05, 7.29574957e+05,
         6.85956913e+05, 2.70966222e+01],
        [1.12576000e+04, 1.02000000e+02, 0.00000000e+00, 3.60654389e+03,
         3.58882000e+02, 1.18100000e+04, 0.00000000e+00, 2.02410000e+04,
         2.12570000e+04, 5.28811847e+01, 2.58985458e+03, 4.40652807e+03,
         4.13759482e+03, 1.48768122e-01],
        [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00, 0.00000000e+00],
        [6.06163503e+05, 3.60654389e+03, 0.00000000e+00, 2.74861221e+05,
         3.23393690e+04, 8.44690489e+05, 0.00000000e+00, 1.41072674e+06,
         1.48196746e+06, 4.06711968e+03, 1.82037670e+05, 3.25186173e+05

In [91]:

# inverse of T, provided it is invertible; otherwise we use the pseudoinverse
# T contains all-zero rows and columns, so it is singular and we compute the pseudoinverse
inv = np.linalg.pinv(T)
# calculating θ = (X^T X)^+ X^T y
theta=(inv.dot(X.T)).dot(y)
    

In [92]:
theta

matrix([[ 5.23699103e-01,  2.21845220e+01, -1.19943571e-08,
         -3.18646475e+00, -1.89460250e-03, -9.53212642e-02,
          3.55884015e-10,  2.64542300e-01, -2.32925737e-01,
          1.11240114e+02,  3.92686137e-01,  3.50291062e+00,
         -1.72575230e+00, -4.18702196e+02]])
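The same computation can be sketched on a small synthetic problem where the answer is known. Everything below is hypothetical illustration data, not the yield dataset: `X` has an intercept column plus two random features, `true_theta` is chosen by hand, and the result of the normal equation is compared with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 50 rows, an intercept column and 2 features
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * rng.normal(size=50)

# Normal equation with the pseudoinverse: theta = (X^T X)^+ X^T y
theta_normal = np.linalg.pinv(X.T @ X) @ X.T @ y

# Least-squares solver for comparison
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq))
```

Both routes should recover values close to `true_theta`; the least-squares solver avoids forming $X^TX$ explicitly, which tends to be better conditioned numerically.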

# Challenge 

Find the solution to the following system:



1x + 2y + 3z = 11

4x + 5y + 6z = -13

5x + 7y + 9z = -8

## Eigendecomposition: Eigenvalues and Eigenvectors

At its most basic, matrix decomposition is the process of representing some matrix of interest as a product of two or more matrices. It’s often useful to express a given matrix as the product of other, simpler matrices. These matrix decompositions (also known as factorizations) can help us understand the structure of matrices by revealing their constituents.

The fundamental eigenvalue equation is

![title](linear-algebra-with-python/bab09470c0a743ad21485fe638ce04b8fdf3e68c.svg)

This states that for a **square** matrix A, an eigenvector v is any non-zero vector that is only scaled when multiplied by A. In essence, applying A to an eigenvector v does not move it off the line it spans: A only scales v by the scalar value λ, called an eigenvalue (a negative λ flips the vector along that line).

It is useful here to think of A as a linear transformation. [Figure placeholder: Mona Lisa picture illustrating the effect of the transformation.]


So, the vector v only stretches or shrinks. 


To find the eigenvalues of a matrix, start from the eigenvalue equation, insert the identity matrix I, and rewrite the equation:

Av = λv

Av = λ(Iv)

Av - λ(Iv) = 0

(A - λI)*v = 0

The eigenvalues of A are those values of λ that satisfy det(A − λI) = 0, because the last line above has a non-zero solution v if and only if the determinant of the matrix (A − λI) is zero.
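For a 2 x 2 matrix the condition det(A − λI) = 0 expands to the quadratic $\lambda^2 - \mathrm{tr}(A)\,\lambda + \det(A) = 0$. The sketch below, using the matrix A = [[7, 3], [3, -1]] as an example, finds the roots of that polynomial and compares them with `np.linalg.eigvals`. The names `coefficients` and `roots` are illustrative.

```python
import numpy as np

A = np.array([[7, 3], [3, -1]])

# For a 2 x 2 matrix, det(A - lambda*I) = lambda^2 - trace(A)*lambda + det(A)
coefficients = [1.0, -np.trace(A), np.linalg.det(A)]

# The eigenvalues are the roots of the characteristic polynomial
roots = np.roots(coefficients)

print(np.sort(roots))
print(np.sort(np.linalg.eigvals(A)))
```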


In [59]:
A = np.array([[7,3],[3,-1]])
print(A)
print("Eigenvalues :",np.linalg.eigvals(A))
np.trace(A)

[[ 7  3]
 [ 3 -1]]
Eigenvalues : [ 8. -2.]


6

In [60]:
# To simultaneously compute eigenvectors and eigenvalues

eig_vals,eig_vecs = np.linalg.eig(A)

In [61]:
print("Eigenvalues :",eig_vals)
print("Eigenvectors:",eig_vecs)

Eigenvalues : [ 8. -2.]
Eigenvectors: [[ 0.9486833  -0.31622777]
 [ 0.31622777  0.9486833 ]]


In [176]:
print("The vector,",eig_vecs[:,0])
print("corresponds to,",eig_vals[0])
print("The vector,",eig_vecs[:,1])
print("corresponds to,",eig_vals[1])

The vector, [0.9486833  0.31622777]
corresponds to, 8.0
The vector, [-0.31622777  0.9486833 ]
corresponds to, -2.0

In [177]:
#Lets's do a quick check:
A_check = np.dot(A,eig_vecs[:,0])

In [178]:
A_check

array([7.58946638, 2.52982213])

In [179]:
A_eig_check = np.dot(eig_vals[0],eig_vecs[:,0])

In [182]:
A_eig_check 

array([7.58946638, 2.52982213])

In [185]:
#Yay!! Compare with a tolerance, since exact equality of floats can fail
np.allclose(A_check, A_eig_check)

True

## Challenge 

In [188]:
#Can the following matrix be eigendecomposed?
challenge_matrix = matrix2
print(challenge_matrix)

[[-1  2]
 [ 1 -2]]


### Discussion: 
The eigendecomposition of a matrix is a similarity transformation (a change of basis) where the new basis matrix consists of eigenvectors of the matrix. Its role in machine learning becomes clear when we use it for Singular Value Decomposition.
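The change of basis described above can be written as $A = V \Lambda V^{-1}$, where the columns of V are the eigenvectors and Λ is the diagonal matrix of eigenvalues. A minimal sketch that rebuilds the example matrix A = [[7, 3], [3, -1]] from its eigenvalues and eigenvectors (`reconstructed` is an illustrative name):

```python
import numpy as np

A = np.array([[7, 3], [3, -1]])
eig_vals, eig_vecs = np.linalg.eig(A)

# Change of basis: A = V * diag(eigenvalues) * V^-1
reconstructed = eig_vecs @ np.diag(eig_vals) @ np.linalg.inv(eig_vecs)

print(np.allclose(reconstructed, A))
```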

# Singular Value Decomposition

What if we consider a non-square, m x n matrix A? We can no longer apply the eigendecomposition as above. The SVD exists for every rectangular or square matrix, whereas the eigendecomposition can only exist for square matrices, and even among square matrices it sometimes doesn't exist. Why? A square n x n matrix can be eigendecomposed only if it has n linearly independent eigenvectors, and some (defective) matrices do not. Note that a zero eigenvalue, such as the λ = 0 found in the challenge, does not by itself rule out an eigendecomposition; it means the matrix is singular, i.e. not invertible.


The singular value decomposition breaks a matrix into the product of three matrices: an m × m orthogonal matrix U whose columns are the left singular vectors, an m × n matrix D with the singular values $\sigma_i$ on the diagonal, and an n × n orthogonal matrix $V^T$ whose rows are the right singular vectors.

![title](linear-algebra-with-python/svd.gif)

The singular values of an m×n matrix A are the square roots of the eigenvalues of the n×n matrix $A^TA$.

Thus, if A is an N×N real symmetric matrix, its singular values are the absolute values of its eigenvalues. The two coincide only when no eigenvalue is negative, so it is not generally the case! We will demonstrate this.



To find the matrices U, D, and V, perform eigendecomposition on the matrix products $AA^T$ and $A^TA$: the eigenvectors of $AA^T$ give the columns of U, the eigenvectors of $A^TA$ give the columns of V, and the square roots of their shared non-zero eigenvalues give the singular values in D. These products are used because multiplying a matrix by its transpose is a trick for turning a non-square matrix into a square matrix while preserving some of its properties. The matrix $AA^T$ has the same column space as A while $A^TA$ has the same row space as A.
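The link between singular values and the eigenvalues of $A^TA$ can be checked numerically. This sketch uses a 3 x 2 matrix `M` with the same values as `matrix_3` and compares the square roots of the eigenvalues of $M^TM$ with the singular values returned by `np.linalg.svd`. The names `M`, `from_eig`, and `from_svd` are illustrative.

```python
import numpy as np

M = np.array([[1, 3], [3, 3], [1, 1]])  # 3 x 2, same values as matrix_3

# Eigenvalues of the symmetric 2 x 2 matrix M^T M (eigvalsh returns them ascending)
eigenvalues = np.linalg.eigvalsh(M.T @ M)

# Singular values are the square roots, conventionally listed largest first
from_eig = np.sqrt(eigenvalues[::-1])

# Singular values computed directly
from_svd = np.linalg.svd(M, compute_uv=False)

print(from_eig)
print(from_svd)
```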


In [193]:
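# Wrapping matrix_3 in a list creates a stack holding one 3 x 2 matrix,
# which is why the results below carry an extra leading dimension
non_square_matrix = [matrix_3]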

In [194]:
# np.linalg.svd returns V already transposed, so V and V_1 below hold V^T
U, s, V = np.linalg.svd(A, full_matrices=True) # square matrix (m x m)
U_1, s_1, V_1 = np.linalg.svd(non_square_matrix, full_matrices=False) # non-square matrix (m x n)

In [187]:
U

array([[-0.9486833 , -0.31622777],
       [-0.31622777,  0.9486833 ]])

In [191]:
s

array([8., 2.])

In [192]:
V

array([[-0.9486833 , -0.31622777],
       [ 0.31622777, -0.9486833 ]])

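As we can see, for the symmetric matrix A the singular values (8 and 2) are the absolute values of the eigenvalues (8 and -2), and the singular vectors match the eigenvectors up to sign.
Below, we have the SVD computation for the non_square_matrix.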

In [195]:
U_1

array([[[-0.56231339,  0.82692421],
        [-0.78448919, -0.53345732],
        [-0.2614964 , -0.17781911]]])

In [196]:
s_1

array([[5.34803427, 1.18259439]])

In [197]:
V_1

array([[[-0.59410191, -0.80438978],
        [-0.80438978,  0.59410191]]])
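To confirm that the three factors really describe the original matrix, they can be multiplied back together. A minimal sketch, assuming the plain 2-D matrix `M` (same values as `matrix_3`) rather than the stacked version used above, and naming the third factor `Vt` to stress that NumPy returns it already transposed:

```python
import numpy as np

M = np.array([[1, 3], [3, 3], [1, 1]])

# np.linalg.svd returns V already transposed
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rebuild M from the three factors: U (3 x 2), diag(s) (2 x 2), Vt (2 x 2)
rebuilt = U @ np.diag(s) @ Vt

print(np.allclose(rebuilt, M))
```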

## Application of SVD to Data Science: PCA

### SVD to PCA

When we want to reduce the dimensionality of a matrix (think of a dataframe), we often use SVD, since it can help us strip a dataset of redundancy and noise and leave us only with its important features. SVD is an important concept utilized in Principal Component Analysis. Redundant features cause a lot of problems when running machine learning algorithms. Also, running an algorithm on the original data set can be slow and require a lot of memory.

Principal component analysis (PCA) refers to the process by which principal components are computed. Principal components allow us to summarize a set of variables with a smaller number of representative variables that collectively explain most of the variability in the original set. PCA is an unsupervised approach, since it involves only a set of features $X_1, X_2, \dots, X_p$ and no associated response Y.


So in PCA, we try to discover relationships between our variables and reduce the variables down to uncorrelated, synthetic representations. These relationships are discovered by understanding the directions of maximum variance in our data.

Let X be an n x m matrix (n samples and m variables) whose columns have been centered by subtracting their means. Then its covariance matrix C is given by:
                               $C = X^TX / (n-1)$

What is the covariance matrix? The covariance matrix captures the joint variability of pairs of variables $X_1$ and  $X_2$ and therefore captures the covariance of all pairs of variables. 

In this matrix, C, we may encounter some zero and non-zero values. A zero entry means the two variables are uncorrelated (which does not by itself guarantee independence). So, why are we worried about the covariance matrix? As mentioned before, PCA is interested in identifying improved representations that explain most of the variability.

Once the covariance matrix is computed, we must identify the hierarchy of our new low-dimensional space. That is, we find the direction along which our first principal component lies, then our second, and so on, until we have captured enough of the explained variance. To do this, we use SVD: we compute the singular values and their associated eigenvectors.

Roughly speaking, the singular values (square roots of eigenvalues) with the lowest values bear the least information about the data, and those are the ones we want to drop. The common approach is to rank the eigenvectors from highest to lowest corresponding singular value and choose the top k eigenvectors.

Once we do this, we now have a new subspace which defines our data and a hierarchy of the variables that exist in that subspace. 
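The steps above can be sketched end to end on synthetic data. Everything here is hypothetical illustration: the data are random, two of the three variables are made strongly correlated on purpose, and `k = 2` is an arbitrary choice of how many components to keep.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 100 samples, 3 variables, two of them correlated
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

# Center each column so that X^T X / (n - 1) is the covariance matrix
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Rows of Vt are the principal directions; s**2 / (n - 1) are their variances
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = s ** 2 / (n - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep the top k components and project the data onto them
k = 2
X_reduced = Xc @ Vt[:k].T

print(explained_ratio)
print(X_reduced.shape)
```

Because the singular values come back sorted from largest to smallest, the first rows of `Vt` are already the directions of greatest variance, so choosing the top k is a simple slice.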

# Homework for Linear Algebra

Problem:
The matrix A has $(1, 2, 1)^T$ and $(1, 1, 0)^T$ as eigenvectors, both with eigenvalue 7, and its trace is 2.
Find the determinant of A.