# Vectors and Matrices in Python
This notebook is part of a collection of supplementary material designed to bring students up to speed on the mathematics required for COMP47750 Mathematics with Python.   
This notebook introduces the basics of matrix multiplication and eigenvalue decomposition.  
This material is covered in the lecture **M3 Matrices**. 

In [1]:
import numpy as np
import scipy
import pandas as pd
from scipy.sparse.csgraph import laplacian
import matplotlib.pyplot as plt 
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
%matplotlib inline

Pre-multiplying a vector `p1` by the matrix `moves` gives a new vector `p2`.

In [2]:
moves = np.array([[0.6,0.1,0.1],
                  [0.2,0.7,0.3],
                  [0.2,0.2,0.6],
                 ])

In [3]:
p1 =  np.array([30,20,35])

In [4]:
p2 = moves.dot(p1)
p2

array([23.5, 30.5, 31. ])
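
One way to read this product, as a minimal sketch: each column of `moves` sums to 1, so the multiplication redistributes the entries of `p1` without changing their total. The names `col_sums`, `p2_by_columns`, `total_before` and `total_after` are introduced here for illustration only.

```python
import numpy as np

moves = np.array([[0.6, 0.1, 0.1],
                  [0.2, 0.7, 0.3],
                  [0.2, 0.2, 0.6]])
p1 = np.array([30, 20, 35])

# Each column of moves sums to 1 (a column-stochastic matrix).
col_sums = moves.sum(axis=0)

# moves @ p1 computes the same product as moves.dot(p1).
p2 = moves @ p1

# The product is also a weighted sum of the columns of moves,
# with the entries of p1 as the weights.
p2_by_columns = sum(p1[j] * moves[:, j] for j in range(3))

total_before = p1.sum()
total_after = p2.sum()
print(col_sums, total_before, total_after)
```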

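If we repeatedly pre-multiply a vector by a matrix, its direction converges to that of the eigenvector with the largest eigenvalue (in absolute value).  
For `moves`, whose columns each sum to 1, that eigenvalue is 1, so the vector itself settles down rather than growing or shrinking. 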

In [5]:
p = p1
for it in range(1,9):
    p = moves.dot(p)
    print('p%d' % (it),np.around(p,1))

p1 [23.5 30.5 31. ]
p2 [20.2 35.3 29.4]
p3 [18.6 37.6 28.8]
p4 [17.8 38.7 28.5]
p5 [17.4 39.2 28.4]
p6 [17.2 39.4 28.4]
p7 [17.1 39.6 28.3]
p8 [17.1 39.6 28.3]
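
The loop above runs for a fixed 8 steps. A possible variant, sketched here on the assumption that a tolerance of `1e-10` is tight enough, stops once the vector no longer changes. The names `tol`, `max_iter` and `p_next` are illustrative and not part of the lecture material.

```python
import numpy as np

moves = np.array([[0.6, 0.1, 0.1],
                  [0.2, 0.7, 0.3],
                  [0.2, 0.2, 0.6]])
p = np.array([30.0, 20.0, 35.0])

tol = 1e-10       # assumed stopping threshold
max_iter = 1000   # safety cap on the number of steps

for it in range(1, max_iter + 1):
    p_next = moves @ p
    # Stop when one more multiplication barely moves the vector.
    if np.linalg.norm(p_next - p) < tol:
        p = p_next
        break
    p = p_next

print(it, np.around(p, 2))

# At the fixed point, multiplying by moves leaves p unchanged,
# which is the definition of an eigenvector with eigenvalue 1.
print(np.allclose(moves @ p, p))
```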


Using `np.linalg.eig` to get the eigenvectors and eigenvalues of the matrix `moves`.

In [6]:
e_val, e_vec = np.linalg.eig(moves)
e_val

array([1. , 0.5, 0.4])

The eigenvectors are columns in the `e_vec` matrix. 

In [7]:
np.around(e_vec,2) # round floats to 2 decimal places

array([[ 0.33,  0.71,  0.  ],
       [ 0.77, -0.71, -0.71],
       [ 0.55,  0.  ,  0.71]])
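
As a quick check of what these columns mean, the sketch below tests the defining property `moves @ v == e_val * v` for each eigenvalue and eigenvector pair. The variable `all_ok` is an illustrative name.

```python
import numpy as np

moves = np.array([[0.6, 0.1, 0.1],
                  [0.2, 0.7, 0.3],
                  [0.2, 0.2, 0.6]])
e_val, e_vec = np.linalg.eig(moves)

for i in range(len(e_val)):
    v = e_vec[:, i]            # the i-th eigenvector is the i-th column
    lhs = moves @ v            # apply the matrix
    rhs = e_val[i] * v         # scale by the matching eigenvalue
    print(i, np.allclose(lhs, rhs))

# The same check for all pairs at once: broadcasting multiplies
# column i of e_vec by e_val[i].
all_ok = np.allclose(moves @ e_vec, e_vec * e_val)
print(all_ok)
```

Each column returned by `np.linalg.eig` is scaled to unit length, which is why the entries do not sum to 1.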

Accessing a single eigenvector: column 0, which pairs with the first eigenvalue (1.0).

In [8]:
e_vec[:,0]

array([0.32929278, 0.76834982, 0.5488213 ])
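
To connect this eigenvector with the repeated multiplication earlier, a minimal sketch: rescale it so its entries have the same total as `p1` (85). The result should be close to the last vectors printed by the loop. The names `k`, `v` and `steady` are illustrative.

```python
import numpy as np

moves = np.array([[0.6, 0.1, 0.1],
                  [0.2, 0.7, 0.3],
                  [0.2, 0.2, 0.6]])
p1 = np.array([30, 20, 35])

e_val, e_vec = np.linalg.eig(moves)

# Pick the eigenvector for the largest eigenvalue
# (the eigenvalues of moves are real).
k = np.argmax(e_val)
v = e_vec[:, k]

# An eigenvector is only defined up to scale, so rescale it
# to carry the same total as the starting vector.
steady = v / v.sum() * p1.sum()
print(np.around(steady, 1))
```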

## Web Pages example
A network showing the links between 5 web pages.  
This network is shown as a graph in lecture **M3 Matrices**.     
Entry `(i,j)` indicates an edge from `j` to `i`. 

In [9]:
wp = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 1],
               [0, 0, 1, 0, 0],
               [1, 1, 1, 1, 0]]
             )

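The eigenvector corresponding to the largest eigenvalue gives a score for each page: roughly, how likely a random walk along the links is to end up there.  
`np.linalg.eig` scales it to unit length, so the entries are relative scores rather than probabilities that sum to 1.  
The final page (E) has the largest score, so it is the most central.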

In [10]:
e_val, e_vec = np.linalg.eig(wp)
np.around(e_vec[:,0].real,2)

array([0.38, 0.2 , 0.49, 0.26, 0.71])
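
Column 0 happens to hold the dominant eigenvector here, but `np.linalg.eig` does not promise any ordering. A more defensive sketch selects it explicitly and rescales the scores to sum to 1. The page labels A to E and the names `scores` and `ranking` are assumptions made for illustration.

```python
import numpy as np

wp = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 1],
               [0, 0, 1, 0, 0],
               [1, 1, 1, 1, 0]])

e_val, e_vec = np.linalg.eig(wp)

# Select the eigenvalue with the largest real part rather than
# relying on it being first.
k = np.argmax(e_val.real)
v = e_vec[:, k].real

# Dividing by the sum removes any overall sign flip and makes
# the entries add up to 1.
scores = v / v.sum()

pages = ['A', 'B', 'C', 'D', 'E']
ranking = [pages[i] for i in np.argsort(-scores)]
print(np.around(scores, 2), ranking)
```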

In [11]:
np.around(e_vec.real,2)

array([[ 0.38, -0.07, -0.07, -0.32, -0.32],
       [ 0.2 , -0.51, -0.51,  0.32,  0.32],
       [ 0.49,  0.02,  0.02, -0.58, -0.58],
       [ 0.26,  0.61,  0.61,  0.48,  0.48],
       [ 0.71,  0.23,  0.23,  0.24,  0.24]])
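
Several columns above look identical because only the real parts are shown. As a sketch of why: a real matrix that is not symmetric can have complex eigenvalues, and these come in conjugate pairs whose eigenvectors share the same real part. The names `n_complex` and `pairs_ok` are illustrative.

```python
import numpy as np

wp = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 1],
               [0, 0, 1, 0, 0],
               [1, 1, 1, 1, 0]])

e_val, e_vec = np.linalg.eig(wp)
print(np.around(e_val, 2))

# Count the eigenvalues with a non-zero imaginary part.
n_complex = int(np.sum(np.abs(e_val.imag) > 1e-12))

# Conjugating every eigenvalue gives back the same set of values.
pairs_ok = np.allclose(np.sort_complex(e_val),
                       np.sort_complex(e_val.conj()))
print(n_complex, pairs_ok)
```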

## Symmetric Matrices
`wp_sim` is the undirected version of `wp`.  
The edges don't have a direction, so `wp_sim[i,j] == wp_sim[j,i]`.

In [12]:
wp_sim = np.array([[0, 1, 0, 0, 1],
                   [1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 1],
                   [0, 0, 1, 0, 1],
                   [1, 1, 1, 1, 0]]
             )

In [13]:
e_val, e_vec = np.linalg.eig(wp_sim)

In [14]:
e_val

array([ 2.93543233,  0.61803399, -0.46259842, -1.61803399, -1.47283391])

In [15]:
np.around(e_vec,2)

array([[-0.35, -0.6 , -0.44, -0.37, -0.43],
       [-0.47, -0.37,  0.51,  0.6 , -0.14],
       [-0.47,  0.37,  0.51, -0.6 , -0.14],
       [-0.35,  0.6 , -0.44,  0.37, -0.43],
       [-0.56,  0.  , -0.31,  0.  ,  0.77]])
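
For symmetric matrices NumPy also provides `np.linalg.eigh`. The sketch below uses it as an alternative to `np.linalg.eig`; this is a swap made here for illustration, not what the notebook uses. It returns real eigenvalues in ascending order, so the largest is last rather than first. The names `w` and `Q` are illustrative.

```python
import numpy as np

wp_sim = np.array([[0, 1, 0, 0, 1],
                   [1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 1],
                   [0, 0, 1, 0, 1],
                   [1, 1, 1, 1, 0]])

# eigh assumes the input is symmetric.
w, Q = np.linalg.eigh(wp_sim)
print(np.around(w, 3))

# Eigenvalues are sorted in ascending order.
print(np.all(np.diff(w) >= 0))

# Each column of Q still satisfies the eigenvector equation.
print(np.allclose(wp_sim @ Q, Q * w))
```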

Because this matrix is symmetric, its eigenvectors form an orthonormal basis.  
That is, each has unit length and they are orthogonal to each other: the dot product of any two different eigenvectors is 0 (up to floating-point rounding).

In [16]:
e_vec[:,0].dot(e_vec[:,1])

-2.4547353189242355e-16

In [17]:
np.around(e_vec.dot(e_vec.T),2)

array([[ 1., -0., -0.,  0.,  0.],
       [-0.,  1., -0., -0., -0.],
       [-0., -0.,  1.,  0.,  0.],
       [ 0., -0.,  0.,  1., -0.],
       [ 0., -0.,  0., -0.,  1.]])
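
One consequence of having an orthonormal basis, sketched below: any vector can be written as a combination of the eigenvectors, with coefficients given by dot products, and the matrix itself can be rebuilt from its eigenvalues and eigenvectors. The test vector `x` is arbitrary, and `np.linalg.eigh` is used here as an assumption of this sketch because it is designed for symmetric input.

```python
import numpy as np

wp_sim = np.array([[0, 1, 0, 0, 1],
                   [1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 1],
                   [0, 0, 1, 0, 1],
                   [1, 1, 1, 1, 0]])

w, Q = np.linalg.eigh(wp_sim)   # columns of Q are orthonormal

# Coefficients of x in the eigenvector basis are dot products.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # arbitrary example vector
coeffs = Q.T @ x

# Recombining the eigenvectors with those coefficients recovers x.
x_rebuilt = Q @ coeffs
print(np.allclose(x, x_rebuilt))

# The matrix equals Q * diag(w) * Q^T.
A_rebuilt = Q @ np.diag(w) @ Q.T
print(np.allclose(A_rebuilt, wp_sim))
```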

This is not true in general for matrices that are not symmetric, e.g. `wp`, whose eigenvectors are complex.

In [18]:
e_val, e_vec = np.linalg.eig(wp)
e_vec[:,0].dot(e_vec[:,1])

(0.20227590856795358+0.046948429280693364j)
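
For complex vectors the usual inner product conjugates one of its arguments. A fuller check, sketched here, builds the matrix of all pairwise inner products and compares it with the identity. It also tests whether `wp` commutes with its transpose, which is the condition for a real matrix to have an orthonormal set of eigenvectors. The names `gram`, `is_normal` and `is_orthonormal` are illustrative.

```python
import numpy as np

wp = np.array([[0, 0, 0, 0, 1],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 1],
               [0, 0, 1, 0, 0],
               [1, 1, 1, 1, 0]])

# A real matrix has an orthonormal eigenbasis only if it
# commutes with its transpose.
is_normal = np.allclose(wp @ wp.T, wp.T @ wp)

e_val, e_vec = np.linalg.eig(wp)

# Entry (i, j) is the inner product of eigenvectors i and j,
# with the first one conjugated.
gram = e_vec.conj().T @ e_vec
is_orthonormal = np.allclose(gram, np.eye(5))

print(is_normal, is_orthonormal)
print(np.around(np.abs(gram), 2))
```

The diagonal of `gram` is 1 because each eigenvector has unit length; the off-diagonal entries show how far the eigenvectors are from being orthogonal.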

## Rectangular Term Document Matrices
`X` is a term x document matrix with 12 rows (terms) and 9 columns (documents).  
From this we can produce two symmetric square matrices by multiplying by the transpose:
  - a 12x12 term co-occurrence matrix
  - a 9x9 document similarity matrix (shared terms)

In [19]:
X = np.array([[1,0,0,1,0,0,0,0,0],
              [1,0,1,0,0,0,0,0,0],
              [1,1,0,0,0,0,0,0,0],
              [0,1,1,0,1,0,0,0,0],
              [0,1,1,2,0,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,0,1,1,0,0,0,0,0],
              [0,1,0,0,0,0,0,0,1],
              [0,0,0,0,0,1,1,1,0],
              [0,0,0,0,0,0,1,1,1],
              [0,0,0,0,0,0,0,1,1]]
            )
X.shape

(12, 9)

In [20]:
X.T

array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]])

In [21]:
XT = X.T

The term co-occurrence matrix.   
Entry `(i,j)` in this matrix is the dot product of rows `i` and `j` of `X`.

In [22]:
XdotXT = X.dot(XT)
XdotXT.shape

(12, 12)

In [23]:
XdotXT

array([[2, 1, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0],
       [1, 2, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0],
       [1, 1, 2, 1, 1, 1, 1, 0, 1, 0, 0, 0],
       [0, 1, 1, 3, 2, 2, 2, 1, 1, 0, 0, 0],
       [2, 1, 1, 2, 6, 1, 1, 3, 1, 0, 0, 0],
       [0, 0, 1, 2, 1, 2, 2, 0, 1, 0, 0, 0],
       [0, 0, 1, 2, 1, 2, 2, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 3, 0, 0, 2, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0, 2, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 2],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2]])
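
To make the dot-product reading concrete, the sketch below recomputes one entry by hand and checks two properties of the whole matrix. The name `C` and the choice of rows 4 and 7 are illustrative.

```python
import numpy as np

X = np.array([[1,0,0,1,0,0,0,0,0],
              [1,0,1,0,0,0,0,0,0],
              [1,1,0,0,0,0,0,0,0],
              [0,1,1,0,1,0,0,0,0],
              [0,1,1,2,0,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,0,1,1,0,0,0,0,0],
              [0,1,0,0,0,0,0,0,1],
              [0,0,0,0,0,1,1,1,0],
              [0,0,0,0,0,0,1,1,1],
              [0,0,0,0,0,0,0,1,1]])

C = X @ X.T   # same as X.dot(X.T)

# Entry (4, 7) is the dot product of term rows 4 and 7.
entry = X[4] @ X[7]
print(entry, C[4, 7])

# The matrix is symmetric because row_i . row_j == row_j . row_i.
print(np.array_equal(C, C.T))

# The diagonal holds each term's dot product with itself.
print(np.array_equal(np.diag(C), (X ** 2).sum(axis=1)))
```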

The document similarity matrix, based on shared terms.  
Entry `(i,j)` is the dot product of columns `i` and `j` of `X`. 

In [24]:
XTdotX = XT.dot(X)
XTdotX.shape

(9, 9)

In [25]:
XTdotX

array([[3, 1, 1, 1, 0, 0, 0, 0, 0],
       [1, 6, 2, 2, 3, 0, 0, 0, 1],
       [1, 2, 4, 3, 1, 0, 0, 0, 0],
       [1, 2, 3, 6, 0, 0, 0, 0, 0],
       [0, 3, 1, 0, 3, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 2, 2, 1],
       [0, 0, 0, 0, 0, 1, 2, 3, 2],
       [0, 1, 0, 0, 0, 0, 1, 2, 3]])
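
The two square matrices are closely related. As a sketch linking this section back to eigenvalues: both are symmetric, so their eigenvalues are real, and the 9 eigenvalues of the document matrix should reappear as the 9 largest eigenvalues of the term matrix, with the remaining 3 equal to 0. The names `term_eigs` and `doc_eigs` are illustrative, and `np.linalg.eigvalsh` is used on the assumption that only the eigenvalues are needed.

```python
import numpy as np

X = np.array([[1,0,0,1,0,0,0,0,0],
              [1,0,1,0,0,0,0,0,0],
              [1,1,0,0,0,0,0,0,0],
              [0,1,1,0,1,0,0,0,0],
              [0,1,1,2,0,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,1,0,0,1,0,0,0,0],
              [0,0,1,1,0,0,0,0,0],
              [0,1,0,0,0,0,0,0,1],
              [0,0,0,0,0,1,1,1,0],
              [0,0,0,0,0,0,1,1,1],
              [0,0,0,0,0,0,0,1,1]])

# eigvalsh returns the eigenvalues of a symmetric matrix
# in ascending order.
term_eigs = np.linalg.eigvalsh(X @ X.T)   # 12 values
doc_eigs = np.linalg.eigvalsh(X.T @ X)    # 9 values

# The 9 largest term eigenvalues match the document eigenvalues.
print(np.allclose(term_eigs[-9:], doc_eigs))

# The 3 smallest term eigenvalues are zero up to rounding.
print(np.allclose(term_eigs[:3], 0))
```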