# Linear Algebra and Representation in Numpy and Pandas

This notebook looks at linear algebra and how its structures are represented in both NumPy and pandas, including conversion between the different data structures.

References:
https://minireference.com/static/tutorials/linear_algebra_in_4_pages.pdf

In [32]:
import pandas as pd
import numpy as np
from numpy import array

# Basic Concepts
A vector is a one-dimensional array.

A matrix is a two-dimensional array.

A tensor is a generalization of vectors and matrices and is easily understood as a multidimensional array. A vector is a one-dimensional (first-order) tensor and a matrix is a two-dimensional (second-order) tensor.

Vectors, matrices and tensors can all be represented in Python using NumPy's N-dimensional array (ndarray).

In [22]:
# create array
np_array1d = np.array([5,10,15,20,25,30])
print(np_array1d)
print("Dimensions:", np_array1d.ndim)
print("Shape:", np_array1d.shape)
print(type(np_array1d))
print()

# create matrix
matrix = np.array([[5,6],[10,11]])
print(matrix)
print("Dimensions:", matrix.ndim)
print("Shape:", matrix.shape)
print(type(matrix))
print()

# create tensor (3D array)
from numpy import array
tensor = array([
  [[1,2,3],    [4,5,6],    [7,8,9]],
  [[11,12,13], [14,15,16], [17,18,19]],
  [[21,22,23], [24,25,26], [27,28,29]],
  ])
print(tensor)
print("Dimensions:", tensor.ndim)
print("Shape:", tensor.shape)

[ 5 10 15 20 25 30]
Dimensions: 1
Shape: (6,)
<class 'numpy.ndarray'>

[[ 5  6]
 [10 11]]
Dimensions: 2
Shape: (2, 2)
<class 'numpy.ndarray'>

[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[11 12 13]
  [14 15 16]
  [17 18 19]]

 [[21 22 23]
  [24 25 26]
  [27 28 29]]]
Dimensions: 3
Shape: (3, 3, 3)


# Basic Numpy Setup
Some of the different ways of creating numpy arrays of different dimensions.

Here we use array(), but we can also use:
* empty() - creates an array of the specified shape without initialising it (the entries are whatever values happen to be in memory).
* zeros() - creates an array of the specified shape filled with 0.
* ones() - creates an array of the specified shape filled with 1.
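A minimal sketch of the three constructors listed above; the shape (2, 3) and the fill value 7 are arbitrary choices for illustration. Because empty() does not initialise memory, the example overwrites every element before printing.

```python
import numpy as np

# zeros() and ones() fill every element with a known value (float64 by default)
z = np.zeros((2, 3))
o = np.ones((2, 3))

# empty() only allocates memory: its contents are whatever was already there,
# so every element must be written before it is read
e = np.empty((2, 3))
e[:] = 7

print(z)
print(o)
print(e)
print(z.dtype, o.dtype, e.dtype)
```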

In [8]:
print("1d numpy array\n")
np_array1d = np.array([5,10,15,20,25,30])
print(np_array1d)
print("Dimensions:", np_array1d.ndim)
print("Shape:", np_array1d.shape)
print(type(np_array1d))
print()

print("2d numpy array with single elements in the first dimension\n")
np_array2d16 = np.array([[5,10,15,20,25,30]])
print(np_array2d16)
print("Dimensions:", np_array2d16.ndim)
print("Shape:", np_array2d16.shape)
print(type(np_array2d16))
print()

print("2d numpy array with single elements in the second dimension\n")
np_array2d61 = np.array([[5],[10],[15],[20],[25],[30]])
print(np_array2d61)
print("Dimensions:", np_array2d61.ndim)
print("Shape:", np_array2d61.shape)
print(type(np_array2d61))
print()

print("2d numpy array with different number of multiple elements in the second dimension\n")
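# Ragged rows cannot form a 2D array; dtype=object (required since NumPy 1.24) gives a 1D array of lists
np_array2different = np.array([[5,6],[10,11],[15,16],[20],[25],[30]], dtype=object)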
print(np_array2different)
print("Dimensions:", np_array2different.ndim)
print("Shape:", np_array2different.shape)
print(type(np_array2different))
print()

print("2d numpy array with multiple elements in the second dimension\n")
np_array2d62 = np.array([[5,6],[10,11],[15,16],[20,21],[25,26],[30,31]])
print(np_array2d62)
print("Dimensions:", np_array2d62.ndim)
print("Shape:", np_array2d62.shape)
print(type(np_array2d62))
print()
   

1d numpy array

[ 5 10 15 20 25 30]
Dimensions: 1
Shape: (6,)
<class 'numpy.ndarray'>

2d numpy array with single elements in the first dimension

[[ 5 10 15 20 25 30]]
Dimensions: 2
Shape: (1, 6)
<class 'numpy.ndarray'>

2d numpy array with single elements in the second dimension

[[ 5]
 [10]
 [15]
 [20]
 [25]
 [30]]
Dimensions: 2
Shape: (6, 1)
<class 'numpy.ndarray'>

2d numpy array with different number of multiple elements in the second dimension

[list([5, 6]) list([10, 11]) list([15, 16]) list([20]) list([25])
 list([30])]
Dimensions: 1
Shape: (6,)
<class 'numpy.ndarray'>

2d numpy array with multiple elements in the second dimension

[[ 5  6]
 [10 11]
 [15 16]
 [20 21]
 [25 26]
 [30 31]]
Dimensions: 2
Shape: (6, 2)
<class 'numpy.ndarray'>



# Basic Pandas Setup

## Data Frames
Some of the different ways of manually creating pandas DataFrames

In [9]:
# from dictionary
test_df = pd.DataFrame({'Column1' : 1,
                        'Column2' : [5,10,15,20,25,30]})
print(test_df.head())
print()

# same data as above, but from a list of lists (one inner list per row); the columns get default integer labels
test_df = pd.DataFrame([[1,5], [1,10], [1,15], [1,20], [1,25], [1,30]])
print(test_df.head())
print()

# from a 1d numpy array
test_df = pd.DataFrame({'Column1' : 1,
                        'Column2' : np_array1d})
print(test_df.head())
print()

# from 2d numpy array with single element in the first dimension
test_df = pd.DataFrame(np_array2d16)
print(test_df.head())
print()

# from 2d numpy array with single elements in the second dimension
test_df = pd.DataFrame(np_array2d61)
print(test_df.head())
print()

# from 2d numpy array with different number of multiple elements in the second dimension
test_df = pd.DataFrame(np_array2different)
print(test_df.head())
print()

# from 2d numpy array with multiple elements in the second dimension
test_df = pd.DataFrame(np_array2d62)
print(test_df.head())
print()

   Column1  Column2
0        1        5
1        1       10
2        1       15
3        1       20
4        1       25

   0   1
0  1   5
1  1  10
2  1  15
3  1  20
4  1  25

   Column1  Column2
0        1        5
1        1       10
2        1       15
3        1       20
4        1       25

   0   1   2   3   4   5
0  5  10  15  20  25  30

    0
0   5
1  10
2  15
3  20
4  25

          0
0    [5, 6]
1  [10, 11]
2  [15, 16]
3      [20]
4      [25]

    0   1
0   5   6
1  10  11
2  15  16
3  20  21
4  25  26



## Series
Some of the different ways of creating a pandas Series.
* When creating a Series from a NumPy array, the array must be one-dimensional; a 2D array (even one of shape (n, 1)) raises an error.

In [10]:
test_series = pd.Series([5,10,15,20,25,30])
print(test_series.head())
print()

test_series = pd.Series(np_array1d)
print(test_series.head())
print(test_series.shape)

0     5
1    10
2    15
3    20
4    25
dtype: int64

0     5
1    10
2    15
3    20
4    25
dtype: int32
(6,)
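To illustrate the one-dimensional requirement noted above, the sketch below (using its own small example array) tries to build a Series from a (3, 1) array and then flattens it first. The exact exception type depends on the pandas version: recent releases raise ValueError, older ones raised a plain Exception, so the sketch catches the common base class.

```python
import numpy as np
import pandas as pd

col = np.array([[5], [10], [15]])  # shape (3, 1): 2D, so not valid Series data

raised = False
try:
    pd.Series(col)
except Exception as err:  # ValueError in recent pandas, plain Exception in older versions
    raised = True
    print(type(err).__name__, err)

# Flattening to 1D first makes the data acceptable
s = pd.Series(col.ravel())
print(s.shape)
```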


# Conversion Pandas -> Numpy

DataFrame column selection: selecting with a single label, e.g. df[1], returns a Series, whereas selecting with a list of labels, e.g. df[[1]] (even a one-item list), returns a DataFrame.

In [11]:
print("Select single column by single value (df[])")
newdf = test_df[1]
print(newdf)
print(type(newdf))

print()
print("Select single (or multiple) columns by list (df[[]])")
newdf = test_df[[1]]
print(newdf)
print(type(newdf))


Select single column by single value (df[])
0     6
1    11
2    16
3    21
4    26
5    31
Name: 1, dtype: int32
<class 'pandas.core.series.Series'>

Select single (or multiple) columns by list (df[[]])
    1
0   6
1  11
2  16
3  21
4  26
5  31
<class 'pandas.core.frame.DataFrame'>


DataFrame to NumPy conversions. DataFrame.values always returns a 2D array, even when the DataFrame has only one column.

In [12]:
print("Input Dataframe (2 columns):")
print(test_df.head(10))
print("DataFrame.values:")
print(type(test_df.values), test_df.values.shape)
print(test_df.values)
print()

print("Input Dataframe (1 column):")
print(test_df[[1]].head(10))
print("DataFrame.values:")
print(type(test_df[[1]].values), test_df[[1]].values.shape)
print(test_df[[1]].values)
print()

Input Dataframe (2 columns):
    0   1
0   5   6
1  10  11
2  15  16
3  20  21
4  25  26
5  30  31
DataFrame.values:
<class 'numpy.ndarray'> (6, 2)
[[ 5  6]
 [10 11]
 [15 16]
 [20 21]
 [25 26]
 [30 31]]

Input Dataframe (1 column):
    1
0   6
1  11
2  16
3  21
4  26
5  31
DataFrame.values:
<class 'numpy.ndarray'> (6, 1)
[[ 6]
 [11]
 [16]
 [21]
 [26]
 [31]]
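pandas 0.24 and later also provide DataFrame.to_numpy(), which the pandas documentation recommends over the .values attribute. A minimal sketch, assuming a small stand-in for test_df built from its own data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[5, 6], [10, 11], [15, 16]]))

# to_numpy() is the recommended alternative to the .values attribute
arr = df.to_numpy()
print(type(arr), arr.shape)

# Selecting with a list keeps a DataFrame, so the result stays 2D
col = df[[1]].to_numpy()
print(col.shape)
```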



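Series to NumPy conversions. Note that this is not the same as converting a DataFrame with one column: Series.values is a 1D array of shape (6,), whereas the single-column DataFrame above gave a 2D array of shape (6, 1).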

In [13]:
print("Input Series:")
print(test_series.head(10))
print("Series.values:")
print(type(test_series.values), test_series.values.shape)
print(test_series.values)
print()

Input Series:
0     5
1    10
2    15
3    20
4    25
5    30
dtype: int32
Series.values:
<class 'numpy.ndarray'> (6,)
[ 5 10 15 20 25 30]
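Since the Series and single-column DataFrame layouts differ, it is often necessary to move between them. A minimal sketch with its own small Series; ravel(), reshape(-1, 1) and squeeze() are one possible set of conversions, not the only ones.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 10, 15])
df1 = s.to_frame()  # single-column DataFrame built from the Series

v1 = s.to_numpy()    # 1D, shape (3,)
v2 = df1.to_numpy()  # 2D, shape (3, 1)
print(v1.shape, v2.shape)

# Converting between the two layouts
print(v2.ravel().shape)         # column -> flat vector, shape (3,)
print(v1.reshape(-1, 1).shape)  # flat vector -> column, shape (3, 1)
print(df1.squeeze().shape)      # one-column DataFrame -> Series, shape (3,)
```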



# Operations
Note that certain operations require arrays / matrices to have particular shapes. Broadcasting is the name given to the method NumPy uses to allow arithmetic between arrays of different shapes.

NumPy compares the two shapes starting from the trailing (last) dimension and working backwards. Two dimensions are compatible when they are equal or when one of them is 1 (a missing leading dimension counts as 1). Each size-1 dimension is then stretched, in effect replicating ("broadcasting") the smaller array, so that both operands end up with the same shape. If any pair of dimensions is incompatible, NumPy raises an error.
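A short sketch of the broadcasting rule, using arrays with the same shapes as np_array1d (6,) and np_array2d61 (6, 1); it explains why adding those two produces the 6 x 6 grid shown in the addition output.

```python
import numpy as np

row = np.array([5, 10, 15, 20, 25, 30])  # shape (6,)
col = row.reshape(6, 1)                  # shape (6, 1)

# Shapes are compared from the trailing dimension backwards:
# (6,) is treated as (1, 6); pairing it with (6, 1) gives 6 vs 1 and 1 vs 6,
# and each size-1 dimension is stretched, so the result is (6, 6)
grid = row + col
print(grid.shape)

# A scalar behaves like an array of shape () and broadcasts to any shape
print((row + 1).shape)

# Incompatible trailing dimensions (6 vs 4, neither is 1) raise an error
try:
    row + np.ones(4)
except ValueError as err:
    print("ValueError:", err)
```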

Addition is element-wise.

In [19]:
print("Adding series.values (1d np array) and 1d numpy array:")
print(test_series.values)
print(np_array1d)
print(test_series.values + np_array1d)

print("\nAdding series.values (1d np array) with 2d (1,x) array:")
print(test_series.values)
print(np_array2d16)
print(test_series.values + np_array2d16)

print("\nAdding series.values (1d np array) with 2d (x,1) array:")
print(test_series.values + np_array2d61)

print("\nAdding two 2d (x,1) arrays):")
print(np_array2d61 + np_array2d61)

print("\nAdding two 2d (1,x) arrays):")
print(np_array2d16 + np_array2d16)

print("\nAdding single column DataFrame and 2d (x,1) arrays:")
result = test_df[[1]] + np_array2d61
print(result)
#print("Dimensions:", np_array2d16.ndim)
#print("Shape:", np_array2d16.shape)
print(type(result))

print("\nAdding single column DataFrame and 1d np arrays:")
print("test_df[[1]] + np_array1d - this doesn't work and gives a 'ValueError: Unable to coerce to Series, length must be 1: given 6' error. Use series.values instead")

print("\nAdding single column DataFrame and 2d (1,x) arrays:")
print("test_df[[1]] + np_array2d16 - this doesn't work and gives a 'ValueError: Unable to coerce to DataFrame, shape must be (6, 1): given (1, 6)' error. Use series.values instead")

print("\nAdding series and 1d np arrays:")
print(test_series + np_array1d)

print("\nAdding series and np arrays:")
print("test_series + np_array2d16 - this doesn't work and gives a 'Exception: Data must be 1-dimensional' error. Use series.values instead")


Adding series.values (1d np array) and 1d numpy array:
[ 5 10 15 20 25 30]
[ 5 10 15 20 25 30]
[10 20 30 40 50 60]

Adding series.values (1d np array) with 2d (1,x) array:
[ 5 10 15 20 25 30]
[[ 5 10 15 20 25 30]]
[[10 20 30 40 50 60]]

Adding series.values (1d np array) with 2d (x,1) array:
[[10 15 20 25 30 35]
 [15 20 25 30 35 40]
 [20 25 30 35 40 45]
 [25 30 35 40 45 50]
 [30 35 40 45 50 55]
 [35 40 45 50 55 60]]

Adding two 2d (x,1) arrays):
[[10]
 [20]
 [30]
 [40]
 [50]
 [60]]

Adding two 2d (1,x) arrays):
[[10 20 30 40 50 60]]

Adding single column DataFrame and 2d (x,1) arrays:
    1
0  11
1  21
2  31
3  41
4  51
5  61
<class 'pandas.core.frame.DataFrame'>

Adding single column DataFrame and 1d np arrays:
test_df[[1]] + np_array1d - this doesn't work and gives a 'ValueError: Unable to coerce to Series, length must be 1: given 6' error. Use series.values instead

Adding single column DataFrame and 2d (1,x) arrays:
test_df[[1]] + np_array2d16 - this doesn't work and gives a 'Value

Subtraction is also element-wise.

In [23]:
print("Subtracting series.values (1d np array) and 1d numpy array:")
print(test_series.values)
print(np_array1d)
print(test_series.values - np_array1d)

print("\nSubtracting series.values (1d np array) with 2d (1,x) array:")
print(test_series.values)
print(np_array2d16)
print(test_series.values - np_array2d16)

print("\nSubtracting series.values (1d np array) with 2d (x,1) array:")
print(test_series.values - np_array2d61)

print("\nSubtracting two 2d (x,1) arrays):")
print(np_array2d61 - np_array2d61)

print("\nSubtracting two 2d (1,x) arrays):")
print(np_array2d16 - np_array2d16)

print("\nSubtracting single column DataFrame and 2d (x,1) arrays:")
result = test_df[[1]] - np_array2d61
print(result)
#print("Dimensions:", np_array2d16.ndim)
#print("Shape:", np_array2d16.shape)
print(type(result))

print("\nSubtracting single column DataFrame and 1d np arrays:")
print("test_df[[1]] - np_array1d - this doesn't work and gives a 'ValueError: Unable to coerce to Series, length must be 1: given 6' error. Use series.values instead")

print("\nSubtracting single column DataFrame and 2d (1,x) arrays:")
print("test_df[[1]] - np_array2d16 - this doesn't work and gives a 'ValueError: Unable to coerce to DataFrame, shape must be (6, 1): given (1, 6)' error. Use series.values instead")

print("\nSubtracting series and 1d np arrays:")
print(test_series - np_array1d)

print("\nSubtracting series and np arrays:")
print("test_series - np_array2d16 - this doesn't work and gives a 'Exception: Data must be 1-dimensional' error. Use series.values instead")


Subtracting series.values (1d np array) and 1d numpy array:
[ 5 10 15 20 25 30]
[ 5 10 15 20 25 30]
[0 0 0 0 0 0]

Subtracting series.values (1d np array) with 2d (1,x) array:
[ 5 10 15 20 25 30]
[[ 5 10 15 20 25 30]]
[[0 0 0 0 0 0]]

Subtracting series.values (1d np array) with 2d (x,1) array:
[[  0   5  10  15  20  25]
 [ -5   0   5  10  15  20]
 [-10  -5   0   5  10  15]
 [-15 -10  -5   0   5  10]
 [-20 -15 -10  -5   0   5]
 [-25 -20 -15 -10  -5   0]]

Subtracting two 2d (x,1) arrays):
[[0]
 [0]
 [0]
 [0]
 [0]
 [0]]

Subtracting two 2d (1,x) arrays):
[[0 0 0 0 0 0]]

Subtracting single column DataFrame and 2d (x,1) arrays:
   1
0  1
1  1
2  1
3  1
4  1
5  1
<class 'pandas.core.frame.DataFrame'>

Subtracting single column DataFrame and 1d np arrays:
test_df[[1]] - np_array1d - this doesn't work and gives a 'ValueError: Unable to coerce to Series, length must be 1: given 6' error. Use series.values instead

Subtracting single column DataFrame and 2d (1,x) arrays:
test_df[[1]] - np_arr

* Multiplication - performed element-wise, i.e. a * b = (a1 * b1, a2 * b2, a3 * b3)
  * Multiplying by a scalar multiplies each element by that scalar
* Division - performed element-wise, i.e. a / b = (a1 / b1, a2 / b2, a3 / b3)
  * Dividing by a scalar divides each element by that scalar
* Dot product - gives a scalar result, i.e. a . b = a1 * b1 + a2 * b2 + a3 * b3

Division is element-wise, either by a scalar or by an array of the same (or a broadcast-compatible) shape.

Multiplication can likewise be element-wise or by a scalar.
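A minimal sketch of the element-wise and scalar operations and the vector dot product listed above, using two arbitrary 3-element vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a * b)     # element-wise product: 4, 10, 18
print(a * 2)     # scalar multiplication: 2, 4, 6
print(b / a)     # element-wise division (result is float): 4.0, 2.5, 2.0
print(b / 2)     # scalar division: 2.0, 2.5, 3.0
print(a.dot(b))  # 1*4 + 2*5 + 3*6 = 32, a single scalar
```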

Often, though, we want the matrix product (computed here with dot()), which takes two matrices A and B and gives a result matrix C. A must have the same number of columns as B has rows: if A has shape m x n and B has shape n x k, then C has shape m x k.

In [1]:
# matrix dot product
from numpy import array
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
B = array([[1, 2], [3, 4]])
print(B)
C = A.dot(B)
print(C)

[[1 2]
 [3 4]
 [5 6]]
[[1 2]
 [3 4]]
[[ 7 10]
 [15 22]
 [23 34]]
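For 2D arrays the @ operator (np.matmul) gives the same result as dot(). The sketch below reuses the A and B above and also shows that the product is not commutative: B @ A fails because the inner dimensions do not match.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3 x 2
B = np.array([[1, 2], [3, 4]])          # 2 x 2

C = A @ B       # same result as A.dot(B) for 2D arrays
print(C.shape)  # (m x n) times (n x k) gives m x k, here (3, 2)

# B @ A is (2 x 2) times (3 x 2): the inner dimensions 2 and 3 differ
try:
    B @ A
except ValueError as err:
    print("ValueError:", err)
```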


### Transpose
The transpose of a matrix flips it over its main diagonal: rows become columns and columns become rows, so an m x n matrix becomes an n x m matrix.

In [16]:
# transpose matrix
from numpy import array
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
C = A.T
print(C)

[[1 2]
 [3 4]
 [5 6]]
[[1 3 5]
 [2 4 6]]
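A few properties of the transpose, checked numerically on the matrices used above. The view check relies on NumPy's behaviour that .T on an ndarray returns a view rather than a copy.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3 x 2
B = np.array([[1, 2], [3, 4]])          # 2 x 2

print(A.T.shape)                              # (2, 3)
print(np.array_equal(A.T.T, A))               # transposing twice restores A
print(np.array_equal((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T

# .T returns a view, so no data is copied
print(np.shares_memory(A, A.T))
```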


In [None]:
# TODO: Inversion, Trace, Determinant, Rank
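The four topics listed in this placeholder cell (inversion, trace, determinant and rank) can each be computed with a single NumPy call. A minimal sketch on an arbitrary 2 x 2 matrix chosen for easy hand-checking, plus a singular matrix for contrast; an inverse only exists for square matrices with a non-zero determinant.

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 3.0]])

inv = np.linalg.inv(M)           # M @ inv is the identity matrix
tr = np.trace(M)                 # sum of the main diagonal: 2 + 3 = 5
det = np.linalg.det(M)           # 2*3 - 1*1 = 5 (up to rounding)
rank = np.linalg.matrix_rank(M)  # number of linearly independent rows/columns: 2

print(inv)
print(tr, det, rank)
print(np.allclose(M @ inv, np.eye(2)))

# A singular matrix (second row = 2 x first row) has determinant 0 and rank 1,
# so it has no inverse
S = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(S), np.linalg.matrix_rank(S))
```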

### Vector Norm / Magnitude
The norm (or magnitude) of a vector is its length measured from the origin (e.g. 0,0,0). A norm is always non-negative, and it is zero only for the zero vector.
* L1 norm - sum of the absolute values of the elements - the same as the Manhattan distance from the origin
* L2 norm - square root of the sum of the squared elements - the same as the Euclidean distance from the origin
* Max norm - maximum of the absolute values of the elements

L1 Norm

In [40]:
a = array([1, 2, 3])
print(a)
l1 = np.linalg.norm(a, 1)
print(l1)

[1 2 3]
6.0


L2 Norm

In [32]:
a = array([1, 2, 3])
print(a)
l2 = np.linalg.norm(a)  # default for a 1D array is the L2 norm
print(l2)

[1 2 3]
3.7416573867739413


Max Norm

In [39]:
a = array([1, 2, 3])
print(a)
linf = np.linalg.norm(a, np.inf)
print(linf)

[1 2 3]
3.0
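The three norms can also be computed directly from their definitions, which is a useful cross-check on np.linalg.norm. This sketch deliberately uses a vector with a negative entry (an arbitrary choice) to show why the absolute values matter: the plain maximum of [1, -5, 3] is 3, but the max norm is 5.

```python
import numpy as np

a = np.array([1, -5, 3])  # includes a negative entry on purpose

l1 = np.sum(np.abs(a))        # |1| + |-5| + |3| = 9
l2 = np.sqrt(np.sum(a ** 2))  # sqrt(1 + 25 + 9) = sqrt(35)
linf = np.max(np.abs(a))      # largest absolute value = 5 (np.max(a) would give 3)

print(l1, np.linalg.norm(a, 1))
print(l2, np.linalg.norm(a))
print(linf, np.linalg.norm(a, np.inf))
```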


## Numpy Array Manipulation
* np.reshape - gives a new shape to an array without changing its data
* np.ravel - returns a flattened 1d array (only copies data if necessary, so it usually returns a view)
* ndarray.flatten - returns a copy of the array collapsed into one dimension (flatten is an array method; there is no np.flatten function)
* np.vstack - stacks arrays vertically (row-wise)
* np.hstack - stacks arrays horizontally (column-wise)

For example, np.zeros((3,3)).ravel() returns:

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

The difference between ravel and flatten is that ravel only copies data when necessary and usually returns a view of the original array, while flatten always returns a copy.

reshape can also flatten an array t: t.reshape(-1), where -1 tells NumPy to infer the length of that dimension.
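A minimal sketch of the view-versus-copy difference between ravel, flatten and reshape(-1). The array t here is a hypothetical 2 x 3 example; ravel returns a view in this case because t is contiguous in memory.

```python
import numpy as np

t = np.arange(6).reshape(2, 3)

r = t.ravel()      # view of t's data (t is contiguous here)
f = t.flatten()    # always a new copy
s = t.reshape(-1)  # -1 asks NumPy to infer the length: 6

print(np.shares_memory(t, r), np.shares_memory(t, f))

# Writing through the view changes t; writing to the copy does not
r[0] = 100
f[1] = 200
print(t)
```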

vstack

In [15]:
a1 = np.array([1,2,3])
print(a1)
a2 = np.array([4,5,6])
print(a2)
a3 = np.vstack((a1, a2))
print(a3)
print(a3.shape)

[1 2 3]
[4 5 6]
[[1 2 3]
 [4 5 6]]
(2, 3)


hstack

In [16]:
a1 = np.array([1,2,3])
print(a1)
a2 = np.array([4,5,6])
print(a2)
a3 = np.hstack((a1, a2))
print(a3)
print(a3.shape)

[1 2 3]
[4 5 6]
[1 2 3 4 5 6]
(6,)


### Reshaping
1D -> 2D, e.g. some libraries such as scikit-learn may require a 1D array of output variables to be reshaped as a 2D array with one column, holding the outcome for each sample in its own row.


In [17]:
print("1d numpy array:")
print(np_array1d)
print(np_array1d.shape)

# reshape
print("reshaped to 2D:")
data = np_array1d.reshape((np_array1d.shape[0], 1))
print(data)
print(data.shape)

1d numpy array:
[ 5 10 15 20 25 30]
(6,)
reshaped to 2D:
[[ 5]
 [10]
 [15]
 [20]
 [25]
 [30]]
(6, 1)
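Two common shorthands produce the same (n, 1) column without spelling out shape[0]: reshape(-1, 1), where -1 lets NumPy infer the row count, and indexing with np.newaxis. A minimal sketch using the same values as np_array1d:

```python
import numpy as np

y = np.array([5, 10, 15, 20, 25, 30])  # shape (6,)

col_a = y.reshape(-1, 1)   # -1: infer the number of rows, 1 column
col_b = y[:, np.newaxis]   # insert a new axis of length 1 at the end
print(col_a.shape, col_b.shape)

# And back to 1D
print(col_a.ravel().shape)
```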


2D -> 3D, e.g. some models, such as recurrent neural networks like the LSTM layers in Keras, expect input as a 3D array of (samples, time steps, features).


In [18]:
print("2d numpy array:")
print(np_array2d62)
print(np_array2d62.shape)

# reshape
print("reshaped to 2D:")
data = np_array2d62.reshape((np_array2d62.shape[0], np_array2d62.shape[1], 1))
print(data)
print(data.shape)

2d numpy array:
[[ 5  6]
 [10 11]
 [15 16]
 [20 21]
 [25 26]
 [30 31]]
(6, 2)
reshaped to 2D:
[[[ 5]
  [ 6]]

 [[10]
  [11]]

 [[15]
  [16]]

 [[20]
  [21]]

 [[25]
  [26]]

 [[30]
  [31]]]
(6, 2, 1)


## Matrix Types

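Square Matrix - a matrix with the same number of rows and columns (n x n).

Symmetric Matrix - a square matrix whose top-right triangle mirrors its bottom-left triangle across the main diagonal, so that M[i, j] equals M[j, i]. A symmetric matrix is always square and equal to its own transpose.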

In [15]:
# symmetric matrix: equal to its own transpose
from numpy import array
M = array([[1, 2, 3, 4], [2, 1, 3, 2], [3, 3, 1, 3], [4, 2, 3, 1]])
print(M)
transposed = M.T
print(transposed)

[[1 2 3 4]
 [2 1 3 2]
 [3 3 1 3]
 [4 2 3 1]]
[[1 2 3 4]
 [2 1 3 2]
 [3 3 1 3]
 [4 2 3 1]]
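Rather than comparing the printed matrices by eye, symmetry can be checked programmatically. A minimal sketch using the M above and a hypothetical non-symmetric N for contrast:

```python
import numpy as np

M = np.array([[1, 2, 3, 4], [2, 1, 3, 2], [3, 3, 1, 3], [4, 2, 3, 1]])
print(np.array_equal(M, M.T))  # M is symmetric

N = np.array([[1, 2], [3, 4]])
print(np.array_equal(N, N.T))  # not symmetric: N[0, 1] = 2 but N[1, 0] = 3

# For float matrices, compare with a tolerance instead of exact equality
print(np.allclose(M, M.T))
```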


### Triangular Matrix
A triangular matrix is a square matrix whose non-zero values all lie in either the upper-right or the lower-left part, with the remaining elements filled with zeros.

A triangular matrix with values only on and above the main diagonal is called an upper triangular matrix, whereas one with values only on and below the main diagonal is called a lower triangular matrix.



In [7]:
# triangular matrices
from numpy import array
from numpy import tril
from numpy import triu
M = array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(M)
lower = tril(M)
print(lower)
upper = triu(M)
print(upper)

[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
[[1 0 0 0]
 [1 2 0 0]
 [1 2 3 0]
 [1 2 3 4]]
[[1 2 3 4]
 [0 2 3 4]
 [0 0 3 4]
 [0 0 0 4]]
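tril() and triu() also accept an offset k that moves the cut-off diagonal. A minimal sketch: with k=-1, tril() excludes the main diagonal, so the upper triangle and the strictly lower triangle do not overlap and add back up to the original M.

```python
import numpy as np

M = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

upper = np.triu(M)               # main diagonal and everything above it
strict_lower = np.tril(M, k=-1)  # k=-1 excludes the main diagonal
print(strict_lower)

# The two parts do not overlap, so adding them rebuilds M
print(np.array_equal(upper + strict_lower, M))
```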


### Diagonal Matrix
A diagonal matrix is one where values outside of the main diagonal have a zero value, where the main diagonal is taken from the top left of the matrix to the bottom right.

A diagonal matrix is often denoted with the variable D and may be represented as a full matrix or as a vector of values on the main diagonal.

In [9]:
# diagonal matrix
from numpy import array
from numpy import diag
M = array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(M)
# extract diagonal vector
d = diag(M)
print(d)
# create diagonal matrix from vector
D = diag(d)
print(D)

[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]
[1 2 3 4]
[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
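The vector representation mentioned above is enough for multiplication: multiplying by a diagonal matrix just scales each element by the matching diagonal entry. A minimal sketch with an arbitrary vector x:

```python
import numpy as np

d = np.array([1, 2, 3, 4])  # diagonal stored as a vector
D = np.diag(d)              # the same diagonal stored as a full 4 x 4 matrix
x = np.array([10, 20, 30, 40])

# Both forms give the same result; the vector form avoids storing the zeros
print(D @ x)
print(d * x)
```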


### Identity Matrix
An identity matrix is a square matrix that does not change a vector when multiplied.

The values of an identity matrix are known. All of the scalar values along the main diagonal (top-left to bottom-right) have the value one, while all other values are zero.

An identity matrix is often represented using the notation “I” or with the dimensionality “In”, where n is a subscript that indicates the dimensionality of the square identity matrix.

In [10]:
# identity matrix
from numpy import identity
I = identity(3)
print(I)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
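A quick check of the defining property: multiplying by the identity leaves a vector or matrix unchanged, from either side. The vector v and matrix A are arbitrary examples; np.eye(3) would build the same identity matrix.

```python
import numpy as np

I = np.identity(3)  # np.eye(3) builds the same matrix
v = np.array([2.0, -1.0, 5.0])
A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

print(np.array_equal(I @ v, v))  # the vector is unchanged
print(np.array_equal(I @ A, A))  # works for matrices too
print(np.array_equal(A @ I, A))  # and from either side
```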


## Eigendecomposition, Eigenvalues, and Eigenvectors
Matrix decompositions are a useful tool for reducing a square matrix to its constituent parts in order to simplify a range of more complex operations.

Perhaps the most widely used type of matrix decomposition is the eigendecomposition, which decomposes a matrix into eigenvectors and eigenvalues. This decomposition also plays a role in machine learning methods such as Principal Component Analysis (PCA).

An eigenvector v of a square matrix A is a non-zero vector whose direction is unchanged by A, i.e. A v = λ v for some scalar λ. NumPy's eig() returns eigenvectors normalised to unit length (magnitude 1.0). These are right eigenvectors, i.e. column vectors that multiply A from the right (as opposed to left eigenvectors, row vectors that multiply A from the left).

Each eigenvalue λ is the factor by which A scales its eigenvector. For example, a negative eigenvalue reverses the direction of the eigenvector as part of scaling it.

A symmetric matrix whose eigenvalues are all positive is referred to as a positive definite matrix, whereas if its eigenvalues are all negative, it is referred to as a negative definite matrix.

In [25]:
# eigendecomposition
from numpy import array
from numpy.linalg import eig
# define matrix
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A)
# calculate eigendecomposition
eigenvalues, eigenvectors = eig(A)
print(eigenvalues)
print(eigenvectors)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[ 1.61168440e+01 -1.11684397e+00 -9.75918483e-16]
[[-0.23197069 -0.78583024  0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735   0.61232756  0.40824829]]


In [27]:
# confirm first eigenvector - multiplying the original matrix by the first eigenvector
# should give the same result as multiplying the first eigenvector by the first eigenvalue.
B = A.dot(eigenvectors[:, 0])
print(B)
C = eigenvectors[:, 0] * eigenvalues[0]
print(C)

[ -3.73863537  -8.46653421 -13.19443305]
[ -3.73863537  -8.46653421 -13.19443305]


We can reverse the process and reconstruct the original matrix given only the eigenvectors and eigenvalues.

First, we need the eigenvectors as a matrix. eig() already returns them in this form, with each eigenvector as a column (eigenvectors[:, i] pairs with eigenvalues[i]). The eigenvalues need to be arranged into a diagonal matrix, which the NumPy diag() function can do.

Next, we need to calculate the inverse of the eigenvector matrix, which we can achieve with NumPy's linalg.inv() function. Finally, these elements need to be multiplied together with the dot() function.

In [31]:
from numpy import diag
# create inverse of eigenvectors matrix
R = np.linalg.inv(eigenvectors)
# create diagonal matrix from eigenvalues
L = diag(eigenvalues)
# reconstruct the original matrix
B = eigenvectors.dot(L).dot(R)
print(B)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
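The printed reconstruction looks exact, but it carries floating-point rounding, so comparisons are safer with np.allclose. A minimal sketch that checks every eigenpair at once (broadcasting multiplies column i of the eigenvector matrix by eigenvalues[i]), repeats the reconstruction A = Q diag(λ) Q^-1 with the @ operator, and confirms the unit-length normalisation.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]; broadcasting scales
# each column by its eigenvalue, so A v = lambda v is checked for all pairs
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))

# Reconstruction A = Q diag(lambda) Q^-1, compared with a tolerance
Q = eigenvectors
B = Q @ np.diag(eigenvalues) @ np.linalg.inv(Q)
print(np.allclose(B, A))

# Each returned eigenvector (column) has unit length
print(np.linalg.norm(eigenvectors, axis=0))
```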


## Sparse Matrix
A sparse matrix is a matrix composed mostly of zero values. Sparse matrices can cause problems with regard to the space needed to store them and the time spent processing them, since many operations are performed against zero values.

Sparse matrices can instead be represented by structures that ignore the zero values and only hold the non-zero values, for example:
* Dictionary of keys - maps (row, column) to a value
* List of tuples - each tuple contains (row, column, value)
* Compressed Sparse Row (CSR) - uses three 1d arrays: the non-zero values, their column indices, and row pointers marking where each row's values start
* Compressed Sparse Column (CSC) - the same as CSR, but with the roles of rows and columns swapped

CSR is often used as it supports efficient row access and matrix-vector multiplication.

SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. These sparse matrices work with many SciPy and NumPy functions.

In [6]:
# dense to sparse
from numpy import array, count_nonzero
from scipy.sparse import csr_matrix

# create dense matrix
A = array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
print(A)

# calculate sparsity
sparsity = 1.0 - count_nonzero(A) / A.size
print("Sparsity:", sparsity)

# convert to sparse matrix (CSR method)
S = csr_matrix(A)
print(S)

# reconstruct dense matrix
B = S.todense()
print(B)

[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]
Sparsity: 0.7222222222222222
  (0, 0)	1
  (0, 3)	1
  (1, 2)	2
  (1, 5)	1
  (2, 3)	2
[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]
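To see the three CSR arrays described above, a scipy csr_matrix exposes them as its data, indices and indptr attributes. A minimal sketch using the same dense matrix, plus a sparse matrix-vector product (with an arbitrary all-ones vector) that never builds the dense form.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
S = csr_matrix(A)

print(S.data)     # non-zero values, row by row: [1 1 2 1 2]
print(S.indices)  # column index of each stored value: [0 3 2 5 3]
print(S.indptr)   # row i's values are data[indptr[i]:indptr[i+1]]: [0 2 4 5]
print(S.nnz)      # number of stored values: 5

# Sparse matrix-vector product without converting back to a dense matrix
x = np.ones(6)
print(S @ x)      # row sums: [2. 3. 2.]
```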
