# 7. Vectors and Vector Arithmetic

We can represent a vector in Python as a NumPy array. 

7.4 Vector Arithmetic
7.4.1 Vector Addition

In [1]:
from numpy import array
a = array([1, 2, 3])
b = array([6, 7, 8])
# print(a+b) #Addition
# print(b-a) #Subtraction
# print(a*b) #Multiplication
# print(b/a) # Division
print(a.dot(b)) # dot product: the sum of the element-wise products of two vectors of the same
                # length, giving a scalar
print(b*3) #Vector-Scalar Multiplication

44
[18 21 24]
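
As a rough check of the definition, the dot product can be reproduced by summing the element-wise products. The sketch below reuses the vectors `a` and `b` from the cell above; the names `elementwise` and `manual_dot` are illustrative only.

```python
from numpy import array

a = array([1, 2, 3])
b = array([6, 7, 8])

elementwise = a * b               # element-wise product of the two vectors
manual_dot = elementwise.sum()    # summing those products gives the dot product

print(elementwise)
print(manual_dot)
print(manual_dot == a.dot(b))     # the two calculations agree
```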


# 8. Vector Norms

Calculating the size or length of a vector is often required either directly or as part of a broader vector or vector-matrix operation. The length of the vector is referred to as the vector norm or the vector's magnitude.

8.3 Vector L^1 Norm
The length of a vector can be calculated using the L^1 norm, where the 1 is a superscript of the L. The L^1 norm is calculated as the sum of the absolute vector values, where the absolute value of a scalar uses the notation |a|. For a vector v = (a1, a2, a3), the L^1 norm is |a1| + |a2| + |a3|.

8.4 Vector L^2 Norm
The length of a vector can be calculated using the L^2 norm, where the 2 is a superscript of the L. The L^2 norm calculates the distance of the vector coordinate from the origin of the vector space. As such, it is also known as the Euclidean norm as it is calculated as the Euclidean distance from the origin. 

The L^1 and L^2 norms are often used when fitting machine learning algorithms as a regularization method, i.e. a method to keep the coefficients of the model small and, in turn, the model less complex.

8.5 Vector Max Norm

The length of a vector can be calculated using the maximum norm, also called the max norm. The max norm of a vector is referred to as L^inf, where inf is a superscript. It is calculated as the largest absolute value in the vector. The max norm of a vector can be calculated in NumPy using the norm() function with the order parameter (ord) set to infinity. The max norm is also used as a regularization method in machine learning, such as on neural network weights, where it is called max norm regularization.


In [6]:
from numpy import array
from numpy.linalg import norm # L^1 and L^2 norm of a vector can be calculated in NumPy using the norm() function 
from math import inf # import infinity
a = array([1, 2, 3]) 
print(norm(a,1)) # L^1 norm, calculate the sum of the absolute vector values
print(norm(a)) #L^2 norm,  calculates the distance of the vector coordinate from the origin of
               # the vector space.
print(norm(a, inf)) #max (L^inf) norm

6.0
3.7416573867739413
3.0
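
The three norms can also be written out directly from their definitions, which may help when checking what norm() returns. This is a minimal sketch; the vector includes a negative entry (an assumption made here, not taken from the cell above) so that the role of the absolute value is visible.

```python
from math import sqrt, inf
from numpy import array
from numpy.linalg import norm

a = array([1, -2, 3])

l1 = abs(a).sum()            # L^1: sum of the absolute values
l2 = sqrt((a ** 2).sum())    # L^2: square root of the sum of squares
linf = abs(a).max()          # max norm: largest absolute value

# Each hand-written value should agree with NumPy's norm()
print(l1, norm(a, 1))
print(l2, norm(a))
print(linf, norm(a, inf))
```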


# 9. Matrices and Matrix Arithmetic

A matrix is a two-dimensional array of scalars with one or more columns and one or more rows. Entries are referred to by their two-dimensional subscript of row (i) and column (j). For example, we can define a 3-row, 2-column matrix as
            A = ((a1.1; a1.2); (a2.1; a2.2); (a3.1; a3.2)) 
            
9.3 Defining a Matrix
9.4 Matrix Arithmetic
9.4.1 Matrix Addition
Matrix Subtraction
Matrix Multiplication 
Matrix Division
Matrix-Matrix Multiplication

In [8]:
from numpy import array
A = array([[1, 2, 3], [4, 5, 6]])  #Defining a Matrix
#print(A)

B = array([[2, 3, 7], [4, 8, 9]])
C = array([[2, 3, 7], [4, 8, 9], [3, 5, 8]])
D = array([3,9,7])
#print(A+B) # Matrix Addition
#print(B-A) #Matrix Subtraction
#print(B*A) #Matrix Multiplication (element-wise, i.e. the Hadamard product)
#print(B/A) #Matrix Division (element-wise)
print(A.dot(D)) #Matrix-Vector Multiplication (A is 2x3, D is a vector of length 3)
#print(C)
#print(D)
print(C.dot(D)) #Matrix-Vector Multiplication

[42 99]
[ 82 147 110]
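
Both calls in the cell above multiply a matrix by the vector D. For a matrix-matrix product, the number of columns in the first matrix must equal the number of rows in the second. A minimal sketch reusing A and C from the cell above (the name `P` is illustrative):

```python
from numpy import array

A = array([[1, 2, 3], [4, 5, 6]])               # 2 rows, 3 columns
C = array([[2, 3, 7], [4, 8, 9], [3, 5, 8]])    # 3 rows, 3 columns

# The 3 columns of A match the 3 rows of C, so the result has 2 rows and 3 columns
P = A.dot(C)
print(P.shape)
print(P)
```

In Python 3.5 and later, `A @ C` is an equivalent way to write the same product.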


# 10 Types of Matrices

10.2 Square Matrix -  
A square matrix is a matrix where the number of rows (n) is equal to the number of columns (m), i.e. n = m.

10.3 Symmetric Matrix
A symmetric matrix is a type of square matrix where the top-right triangle is the same as the bottom-left triangle.

            1 2 3 
     m =    2 1 2 
            3 2 1 
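
One way to test for symmetry, sketched here on the matrix above, is to compare the matrix with its transpose; the name `is_symmetric` is illustrative.

```python
from numpy import array, array_equal

M = array([[1, 2, 3],
           [2, 1, 2],
           [3, 2, 1]])

# A symmetric matrix is equal to its own transpose
is_symmetric = array_equal(M, M.T)
print(is_symmetric)
```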
            
10.4 Triangular Matrix
A triangular matrix is a type of square matrix that has all values in the upper-right or lower-left of the matrix with the remaining elements filled with zero values. 

                   1 2 3           1 0 0
              m =  0 2 3           1 2 0
                   0 0 3           1 2 3 

10.5 Diagonal Matrix
A diagonal matrix is one where values outside of the main diagonal have a zero value, where the main diagonal is taken from the top left of the matrix to the bottom right.

         1 0 0
    D  = 0 2 0
         0 0 3
         
10.6 Identity Matrix
An identity matrix is a square matrix that does not change a vector when multiplied. All of the scalar values along the main diagonal (top-left to bottom-right) have the value one, while all other values are zero.

                1 0 0
            I = 0 1 0
                0 0 1
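
The defining property can be checked in a couple of lines. In this sketch the vector `v` is an arbitrary example chosen for illustration.

```python
from numpy import array, identity

I = identity(3)          # 3x3 identity matrix (floating point values)
v = array([4, 5, 6])

# Multiplying by the identity matrix leaves the vector unchanged
result = I.dot(v)
print(result)
```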

10.7 Orthogonal Matrix
Two vectors are orthogonal when their dot product equals zero. If the length of each vector is also 1, then the vectors are called orthonormal because they are both orthogonal and normalized.

v · w = 0  or  v · w^T = 0

This is intuitive when we consider that one line is orthogonal with another if it is perpendicular to it. An orthogonal matrix is a type of square matrix whose columns and rows are orthonormal unit vectors, i.e. perpendicular and with a length or magnitude of 1.

An Orthogonal matrix is often denoted as uppercase Q and defined formally as follows:

Q  Q^T = Q^T  Q = I 

In [19]:
from numpy import array
from numpy import tril
from numpy import triu
from numpy import diag
from numpy import identity
from numpy.linalg import inv

M = array([[2, 3, 7], [4, 8, 9], [3, 5, 8]])
#print(tril(M)) #lower triangular matrix
#print(triu(M)) #upper triangular matrix
#print(diag(M)) #extract diagonal vector
#print(diag(diag(M))) #create diagonal matrix from vector
#print(identity(3)) #Identity Matrix

Q = array([[1, 0], [0, -1]]) # orthogonal matrix: its columns (and rows) are orthonormal,
                             # i.e. their dot product is zero and each has length 1
V = inv(Q) # inverse of Q; for an orthogonal matrix this equals Q.T
print(Q*V) # element-wise product; it matches Q.dot(V) here only because Q is diagonal
# print(Q)
# print(V)
# print(Q.T) # transpose of Q
# print(Q.dot(Q.T)) # identity matrix, calculated from the dot product of the orthogonal
                    # matrix with its transpose


[[ 1.  0.]
 [-0.  1.]]
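
The cell above uses a diagonal Q, for which the element-wise and matrix products happen to coincide. As a sketch with a non-diagonal orthogonal matrix, a 2D rotation matrix can be used; the angle `theta` is an arbitrary value chosen for illustration, and allclose() is used because the results are floating point.

```python
from math import cos, sin, pi
from numpy import array, identity, allclose
from numpy.linalg import inv

theta = pi / 6   # arbitrary rotation angle, assumed for this example
Q = array([[cos(theta), -sin(theta)],
           [sin(theta),  cos(theta)]])

print(allclose(Q.dot(Q.T), identity(2)))   # Q . Q^T = I
print(allclose(Q.T.dot(Q), identity(2)))   # Q^T . Q = I
print(allclose(inv(Q), Q.T))               # the inverse equals the transpose
```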


# 11. Matrix Operations

Matrix operations are used in the description of many machine learning algorithms. Some operations can be used directly to solve key equations, whereas others provide useful shorthand or foundation in the description and the use of more complex matrix operations. 

11.2 Transpose

A defined matrix can be transposed, which creates a new matrix with the number of columns
and rows flipped. This is denoted by the superscript T next to the matrix A^T.

         C = A^T            1 2
                        A = 3 4             A^T = 1 3 5
                            5 6                   2 4 6
                            

11.3 Inverse

Matrix inversion is a process that finds another matrix that when multiplied with the matrix,
results in an identity matrix. Given a matrix A, find matrix B, such that AB= I^n  or BA = I^n.

        AB= BA = I^n

The operation of inverting a matrix is indicated by a -1 superscript next to the matrix; for example, A^-1. The result of the operation is referred to as the inverse of the original matrix; for example, B is the inverse of A. B = A^-1
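
A minimal sketch of this definition in NumPy, using a small 2x2 matrix chosen for illustration: the inverse is computed with inv() and then multiplied with the original matrix on both sides. The comparison uses allclose() because floating point results are rarely exactly the identity.

```python
from numpy import array, identity, allclose
from numpy.linalg import inv

A = array([[1.0, 2.0],
           [3.0, 4.0]])
B = inv(A)   # B = A^-1

print(B)
print(allclose(A.dot(B), identity(2)))   # AB = I
print(allclose(B.dot(A), identity(2)))   # BA = I
```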

11.4 Trace

A trace of a square matrix is the sum of the values on the main diagonal of the matrix (top-left to bottom-right). The operation of calculating a trace on a square matrix is described using the notation tr(A) where A is the square matrix on which the operation is being performed.

11.5 Determinant

The determinant describes the relative geometry of the vectors that make up the rows of the matrix. More specifically, the determinant of a matrix A tells you the volume of a box with sides given by rows of A. It is denoted by the det(A) notation. 

11.6 Rank

The rank of a matrix is the estimate of the number of linearly independent rows or columns in a matrix. The rank of a matrix M is often denoted as the function rank(). NumPy provides the matrix_rank() function for calculating the rank of an array. It uses the Singular-Value Decomposition (SVD) method to estimate the rank.

In [26]:
from numpy import array
from numpy import trace
from numpy.linalg import det
from numpy.linalg import inv # needed by the commented-out inverse examples below
from numpy.linalg import matrix_rank

A = array([
[1, 2],
[3, 4],
[5, 6]])
#print(A.T) #calculate transpose of A

A = array([
[1, 2],
[3, 4],])
#print(inv(A)) #invert matrix of A
# print(A.dot(inv(A))) #multiply A and invert of A

A = array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(trace(A)) #calculate trace of A, sum of the values on the main diagonal of the matrix
print(det(A)) #calculate determinant of A, the volume of a box with sides given by rows of A
print(matrix_rank(A)) #the estimate of the number of linearly independent rows or columns in 
                      # a matrix. 


15
-9.51619735392994e-16
2
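
The tiny determinant printed above is floating point noise around zero: the third row of A equals twice the second row minus the first, so the rows are linearly dependent and the rank is 2 rather than 3. A short sketch of how this might be checked, using isclose() with its default tolerance (an assumption about what counts as "zero" here):

```python
from numpy import array, isclose, array_equal
from numpy.linalg import det, matrix_rank

A = array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])

print(array_equal(A[2], 2 * A[1] - A[0]))   # third row is a combination of the others
print(isclose(det(A), 0.0))                 # determinant is zero up to rounding error
print(matrix_rank(A) < A.shape[0])          # rank is lower than the number of rows
```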


# 12. Sparse Matrices

Matrices that contain mostly zero values are called sparse, distinct from matrices where most of the values are non-zero, called dense. 

12.2 Sparse Matrix

A sparse matrix is a matrix that is comprised of mostly zero values. The sparsity of a matrix can be quantified with a score, which is the number of zero values in the matrix divided by the total number of elements in the matrix.

                    sparsity = (count of zero elements / total elements)
                    
                    A = 1 0 0 1 0 0
                        0 0 2 0 0 1
                        0 0 0 2 0 0

The example has 13 zero values of the 18 elements in the matrix, giving this matrix a sparsity score of 0.722 or about 72%.

12.3 Problems with Sparsity

    12.3.1 Space Complexity
    
  Very large matrices require a lot of memory, and some very large matrices that we wish to work with are sparse. A sparse matrix contains more zero values than data values. The problem with representing these sparse matrices as dense arrays is that memory is required and must be allocated for each 32-bit or even 64-bit zero value in the matrix. This is clearly a waste of memory resources, as those zero values do not contain any information.

    12.3.2 Time Complexity
   
  Assuming a very large sparse matrix can be fit into memory, we will want to perform operations on this matrix. If the matrix contains mostly zero-values, i.e. no data, then performing operations across this matrix may take a long time where the bulk of the computation performed will involve adding or multiplying zero values together. This is a problem of increased time complexity of matrix operations that increases with the size of the matrix. 
  

12.4 Sparse Matrices in Machine Learning

    12.4.1 Data
    
   Sparse matrices come up in some specific types of data, most notably observations that record the occurrence or count of an activity.
   e.g. Whether or not a user has watched a movie in a movie catalog 
        or Count of the number of listens of a song in a song catalog.
    
    12.4.2 Data Preparation
   
  Sparse matrices come up in encoding schemes used in the preparation of data, e.g.
  1) One hot encoding, used to represent categorical data as sparse binary vectors.
  2) Count encoding, used to represent the frequency of words in a vocabulary for a document
  3) TF-IDF encoding, used to represent normalized word frequency scores in a vocabulary.
  
    12.4.3 Areas of Study
  
  Some areas of study within machine learning must develop specialized methods to address sparsity directly, as the input data is almost always sparse, e.g.
  1) Natural language processing for working with documents of text
  2) Recommender systems for working with product usage within a catalog.
  3) Computer vision when working with images that contain lots of black pixels.
  
12.5 Working with Sparse Matrices

The solution to representing and working with sparse matrices is to use an alternate data structure to represent the sparse data. The zero values can be ignored and only the data or non-zero values in the sparse matrix need to be stored or acted upon. There are multiple data structures that can be used to efficiently construct a sparse matrix. e.g.
   
   1) Dictionary of Keys - A dictionary is used where a row and column index is mapped to a value
   2) List of Lists - Each row of the matrix is stored as a list, with each sublist containing the column index and the value.
   3) Coordinate List - A list of tuples is stored with each tuple containing the row index, column index, and the value.
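
The three construction-oriented structures above can be sketched with plain Python containers. This is an illustration of the ideas only, not how SciPy implements them; the names `dok`, `lil`, and `coo` are chosen for this example, and the matrix is the 3-row, 6-column example from section 12.2.

```python
dense = [
    [1, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 0, 1],
    [0, 0, 0, 2, 0, 0],
]

# Dictionary of Keys: (row, column) -> value, with zero values omitted
dok = {(i, j): v
       for i, row in enumerate(dense)
       for j, v in enumerate(row) if v != 0}

# List of Lists: one list per row, holding (column, value) pairs
lil = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]

# Coordinate List: one (row, column, value) tuple per non-zero entry
coo = [(i, j, v) for (i, j), v in dok.items()]

print(dok)
print(lil)
print(coo)
```

All three store only the 5 non-zero values instead of all 18 elements.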
   
There are also data structures that are more suitable for performing efficient operations.

   1) Compressed Sparse Row - The sparse matrix is represented using three one-dimensional arrays for the non-zero values, the extents of the rows, and the column indexes. Compressed Sparse Row, also called CSR for short, is often used to represent sparse matrices in machine learning given the efficient access and matrix multiplication that it supports.

   2) Compressed Sparse Column - The same as the Compressed Sparse Row method except the column indices are compressed and read first before the row indices.
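
The three one-dimensional arrays of the CSR format can be inspected on a SciPy csr_matrix through its `data`, `indices`, and `indptr` attributes. A minimal sketch, using the example matrix from section 12.2:

```python
from numpy import array
from scipy.sparse import csr_matrix

A = array([
    [1, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 0, 1],
    [0, 0, 0, 2, 0, 0]])

S = csr_matrix(A)
print(S.data)      # the non-zero values, stored row by row
print(S.indices)   # the column index of each stored value
print(S.indptr)    # row extents: row i occupies data[indptr[i]:indptr[i+1]]
```

Row 1, for example, spans positions 2 to 4 of `data`, which holds the values 2 and 1 found in columns 2 and 5.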

12.6 Sparse Matrices in Python

SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. Many linear algebra NumPy and SciPy functions that operate on NumPy arrays can transparently operate on SciPy sparse arrays. Machine learning libraries that use NumPy data structures can also operate transparently on SciPy sparse arrays, such as scikit-learn for general machine learning and Keras for deep learning.

A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function. In the example below, we define a 3 x 6 sparse matrix as a dense array (e.g. an ndarray), convert it to a CSR sparse representation, and then convert it back to a dense array by calling the todense() function.

NumPy does not provide a function to calculate the sparsity of a matrix. Nevertheless, we can calculate it easily by first finding the density of the matrix and subtracting it from one. The number of non-zero elements in a NumPy array can be given by the count_nonzero() function and the total number of elements in the array can be given by the size property of the array. Array sparsity can therefore be calculated as

                sparsity = 1.0 - count_nonzero(A) / A.size

In [41]:
from numpy import array 
from numpy import count_nonzero
from scipy.sparse import csr_matrix


A = array([
[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])

S = csr_matrix(A) #convert to sparse matrix (CSR method)
#print(S)
B = S.todense() #reconstruct dense matrix
#print(B) 
print(count_nonzero(A)) # Count Non zeros in Matrix A 
print(A.size) # Total element in Matrix A
sparsity = 1.0 - count_nonzero(A) / A.size #calculate sparsity of Matrix A
print(sparsity)

5
18
0.7222222222222222


# 13. Tensors and Tensor Arithmetic

13.2 What are Tensors

A tensor is a generalization of vectors and matrices and is easily understood as a multidimensional array, i.e. an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. A vector is a one-dimensional or first order tensor and a matrix is a two-dimensional or second order tensor.
Tensor notation is much like matrix notation with a capital letter representing a tensor and lowercase letters with subscript integers representing scalar values within the tensor. For example, below defines a 3 * 3 * 3 three-dimensional tensor T with dimensions index as t(i;j;k).

    t(1;1;1) t(1;2;1) t(1;3;1)       t(1;1;2) t(1;2;2) t(1;3;2)      t(1;1;3) t(1;2;3) t(1;3;3)
T = t(2;1;1) t(2;2;1) t(2;3;1)   ,   t(2;1;2) t(2;2;2) t(2;3;2)   ,  t(2;1;3) t(2;2;3) t(2;3;3)
    t(3;1;1) t(3;2;1) t(3;3;1)       t(3;1;2) t(3;2;2) t(3;3;2)      t(3;1;3) t(3;2;3) t(3;3;3)
                

13.3 Tensors in Python

Tensors can be represented in Python using the N-dimensional array(ndarray). A tensor can be defined in-line to the constructor of array() as a list of lists. The example below defines a 3 * 3 * 3 tensor as a NumPy ndarray. Here, we first define rows, then a list of rows stacked as columns, then a list of columns stacked as levels in a cube
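
A minimal sketch of such a definition, with the shape and a couple of element lookups. The values are arbitrary and chosen so that each number hints at its position; the name `T` follows the notation of the previous section.

```python
from numpy import array

T = array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
    [[21, 22, 23], [24, 25, 26], [27, 28, 29]],
])

print(T.shape)      # size along each of the three axes
print(T.ndim)       # number of axes (the order of the tensor)
print(T[0, 1, 2])   # first matrix, second row, third value
print(T[2, 0, 1])   # third matrix, first row, second value
```

Note that NumPy indexes from zero, whereas the subscript notation above starts at one.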

13.4 Tensor Arithmetic

As with matrices, we can perform element-wise arithmetic between tensors

    13.4.1 Tensor Addition
    
  The element-wise addition of two tensors with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise addition of the scalars in the parent tensors.
  
  A = a(1;1;1) a(1;2;1) a(1;3;1)    a(1;1;2) a(1;2;2) a(1;3;2) 
      a(2;1;1) a(2;2;1) a(2;3;1) ,  a(2;1;2) a(2;2;2) a(2;3;2)
         
  B = b(1;1;1) b(1;2;1) b(1;3;1)    b(1;1;2) b(1;2;2) b(1;3;2) 
      b(2;1;1) b(2;2;1) b(2;3;1) ,  b(2;1;2) b(2;2;2) b(2;3;2)

                        C = A+B  
                        
  C = a(1;1;1)+b(1;1;1) a(1;2;1)+b(1;2;1) a(1;3;1)+b(1;3;1)    a(1;1;2)+b(1;1;2) a(1;2;2)+b(1;2;2) a(1;3;2)+b(1;3;2) 
      a(2;1;1)+b(2;1;1) a(2;2;1)+b(2;2;1) a(2;3;1)+b(2;3;1) ,  a(2;1;2)+b(2;1;2) a(2;2;2)+b(2;2;2) a(2;3;2)+b(2;3;2) 
      
    13.4.2 Tensor Subtraction
   The element-wise subtraction of one tensor from another tensor with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise subtraction of the scalars in the parent tensors.
   
                       C = A-B        
  
  C = a(1;1;1)-b(1;1;1) a(1;2;1)-b(1;2;1) a(1;3;1)-b(1;3;1)    a(1;1;2)-b(1;1;2) a(1;2;2)-b(1;2;2) a(1;3;2)-b(1;3;2) 
      a(2;1;1)-b(2;1;1) a(2;2;1)-b(2;2;1) a(2;3;1)-b(2;3;1) ,  a(2;1;2)-b(2;1;2) a(2;2;2)-b(2;2;2) a(2;3;2)-b(2;3;2)
      
      
    13.4.3 Tensor Hadamard Product
  The element-wise multiplication of one tensor with another tensor with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise multiplication of the scalars in the parent tensors. As with matrices, the operation is referred to as the Hadamard Product to differentiate it from tensor multiplication.
  
                      C = A*B          
                      
  C = a(1;1;1)*b(1;1;1) a(1;2;1)*b(1;2;1) a(1;3;1)*b(1;3;1)    a(1;1;2)*b(1;1;2) a(1;2;2)*b(1;2;2) a(1;3;2)*b(1;3;2) 
      a(2;1;1)*b(2;1;1) a(2;2;1)*b(2;2;1) a(2;3;1)*b(2;3;1) ,  a(2;1;2)*b(2;1;2) a(2;2;2)*b(2;2;2) a(2;3;2)*b(2;3;2)

    13.4.4 Tensor Division
  The element-wise division of one tensor with another tensor with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise division of the scalars in the parent tensors.  
  
                      C = A/B
                      
  C  = a(1;1;1)/b(1;1;1) a(1;2;1)/b(1;2;1) a(1;3;1)/b(1;3;1)    a(1;1;2)/b(1;1;2) a(1;2;2)/b(1;2;2) a(1;3;2)/b(1;3;2) 
       a(2;1;1)/b(2;1;1) a(2;2;1)/b(2;2;1) a(2;3;1)/b(2;3;1) ,  a(2;1;2)/b(2;1;2) a(2;2;2)/b(2;2;2) a(2;3;2)/b(2;3;2)
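
The four element-wise operations can be tried on a pair of small tensors. This is a sketch with arbitrary values; B is filled with 2s (an assumption made for readability) so that each result is easy to verify by eye.

```python
from numpy import array

A = array([[[1, 2], [3, 4]],
           [[5, 6], [7, 8]]])
B = array([[[2, 2], [2, 2]],
           [[2, 2], [2, 2]]])

added = A + B        # element-wise addition
subtracted = A - B   # element-wise subtraction
hadamard = A * B     # element-wise multiplication (Hadamard product)
divided = A / B      # element-wise division (floating point result)

# Every result keeps the dimensions of the parent tensors
print(added.shape, subtracted.shape, hadamard.shape, divided.shape)
print(hadamard)
```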

13.5 Tensor Product
  The tensor product operator is often denoted as a circle with a small x in the middle. Given a tensor A with q dimensions and a tensor B with r dimensions, the product of these tensors will be a new tensor with the order of q + r or, said another way, q + r dimensions. The tensor product can be implemented in NumPy using the tensordot() function. The function takes as arguments the two tensors to be multiplied and the axes over which to sum the products, called the sum reduction. To calculate the tensor product, also called the tensor dot product in NumPy, the axes argument must be set to 0.
  
A = a(1;1) a(1;2)
    a(2;1) a(2;2)

B = b(1;1) b(1;2)
    b(2;1) b(2;2)

C = A x B = a(1;1) x b(1;1) b(1;2)   a(1;2) x b(1;1) b(1;2)
                     b(2;1) b(2;2)            b(2;1) b(2;2)
            a(2;1) x b(1;1) b(1;2)   a(2;2) x b(1;1) b(1;2)
                     b(2;1) b(2;2)            b(2;1) b(2;2)

C = a(1;1) x b(1;1)   a(1;1) x b(1;2)    a(1;2) x b(1;1)  a(1;2) x b(1;2)
    a(1;1) x b(2;1)   a(1;1) x b(2;2)    a(1;2) x b(2;1)  a(1;2) x b(2;2)
    a(2;1) x b(1;1)   a(2;1) x b(1;2)    a(2;2) x b(1;1)  a(2;2) x b(1;2)
    a(2;1) x b(2;1)   a(2;1) x b(2;2)    a(2;2) x b(2;1)  a(2;2) x b(2;2)
                        
                       
  

In [44]:
from numpy import array
from numpy import tensordot

A =  array([
[[1,2,3], [4,5,6], [7,8,9]],
[[11,12,13], [14,15,16], [17,18,19]],
[[21,22,23], [24,25,26], [27,28,29]]])

B =  array([
[[1,2,3], [4,5,6], [7,8,9]],
[[11,12,13], [14,15,16], [17,18,19]],
[[21,22,23], [24,25,26], [27,28,29]]])

#print(A.shape) # prints the shape of the tensor
#print(A) # the tensor is printed as a series of matrices, one for each layer, i.e. axis 0 specifies the level (like height), axis 1 specifies the row, and axis 2 specifies the column.

#print(A+B) # Add two tensors
#print(A-B) # subtract two tensors
#print(A*B) # multiply two tensors ##tensor Hadamard product
#print(A/B) # divide two tensors

A =  array([[1,2], [3,4]])
B =  array([[5,6], [7,8]])            
print(tensordot(A, B, axes = 0)) # axes=0 sums over no axes, giving the tensor product

[[[[ 5  6]
   [ 7  8]]

  [[10 12]
   [14 16]]]


 [[[15 18]
   [21 24]]

  [[20 24]
   [28 32]]]]
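
The printed result has four axes, matching the q + r rule (2 + 2). As a sketch of what tensordot() computes with axes set to 0, the same result can be written with einsum(), where every element of A is multiplied by every element of B. The einsum() subscript string is an equivalent formulation assumed here for illustration, not something the text above prescribes.

```python
from numpy import array, tensordot, einsum, array_equal

A = array([[1, 2], [3, 4]])
B = array([[5, 6], [7, 8]])

C = tensordot(A, B, axes=0)

print(C.ndim == A.ndim + B.ndim)   # order of the product is q + r
print(C.shape)

# C[i, j] is the whole of B scaled by the single value A[i, j]
print(array_equal(C[0, 1], A[0, 1] * B))

# The same product expressed index by index: C[i, j, k, l] = A[i, j] * B[k, l]
print(array_equal(C, einsum('ij,kl->ijkl', A, B)))
```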
