# Chapter 5: Matrices

In [1]:
# standard python libraries to import
import numpy as np
import matplotlib.pyplot as plt

## 5.1 Interpretations and uses of matrices

- This book focuses on matrices that you can "lay down on a table" (i.e., a grid of rows and columns), though they are not necessarily "2D" in the geometric sense, since each row/column of a matrix can represent a dimension.
- Matrices that don't lie flat on a table are *tensors* (e.g., the tensor calculus used in general relativity)
- Tensors will not be discussed in this book.

Matrices are ubiquitous and have innumerable uses.  A few examples:
- representing a linear transformation or mapping
- storing partial derivatives of a multivariable system
- representing a system of equations
- storing data (e.g. features x observations)
- representing regressors for statistical modeling
- storing geometric transformations for computer graphics
- storing kernels used in filtering or convolution
- representing financial information from various sectors of an economy or business
- housing parameters for a model that predicts changes in the spread of an infectious disease

## 5.2 Matrix terminology and notation

- mostly review...skipping the basics
- A "block matrix" is a matrix that comprises smaller matrices (note: this book doesn't cover block matrices)
- Matrix notation is typically M rows by N columns (mnemonic "**M**(R). **N**i(C)e guy")

## 5.3 Matrix dimensionalities

- Matrices are flexible with their notation, and flexibility can lead to confusion
- In practice, the dimensionality of a given matrix is determined case by case: it is either stated explicitly or inferred from context.

## 5.4 The transpose operation

- transposing a matrix works the same way as transposing a vector: swap the rows and columns
  - if $B = A^T$, then $B_{i,j} = A_{j,i}$
- transposing a matrix twice gives you the original matrix
  - $A^{TT} = A$
- example of a transposed matrix:
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}^T = 
\begin{bmatrix}
1 & 4 \\
2 & 5 \\
3 & 6
\end{bmatrix}

*note that transposing is like reflecting the matrix across its main diagonal (top-left to bottom-right)*

*first always stays first (i.e. 1st row -> 1st column, or 1st column -> 1st row)*

#### Code

In [2]:
# Transposing in code is the same as transposing vectors
A = np.random.randn(2,5)
print(A)

[[ 0.46518617  0.82340415 -0.12050009  2.04364095  0.75041181]
 [ 1.88589917 -0.70063764 -0.61802163  0.33387694 -0.65562649]]


In [3]:
# option 1) using the .T attribute of a numpy array
At1 = A.T
print(At1)

[[ 0.46518617  1.88589917]
 [ 0.82340415 -0.70063764]
 [-0.12050009 -0.61802163]
 [ 2.04364095  0.33387694]
 [ 0.75041181 -0.65562649]]


In [4]:
# option 2) using the np.transpose function
At2 = np.transpose(A)
print(At2)

[[ 0.46518617  1.88589917]
 [ 0.82340415 -0.70063764]
 [-0.12050009 -0.61802163]
 [ 2.04364095  0.33387694]
 [ 0.75041181 -0.65562649]]
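A quick sanity check of the two transpose properties listed above (a minimal sketch; the random 2x5 matrix is just illustrative data):

```python
import numpy as np

A = np.random.randn(2, 5)  # illustrative random data
At = A.T

# if B = A^T, then B[i, j] == A[j, i] for every index pair
print(all(At[i, j] == A[j, i] for i in range(5) for j in range(2)))  # True

# transposing twice returns the original matrix: A^TT = A
print(np.array_equal(A.T.T, A))  # True

# the shape flips from 2x5 to 5x2
print(A.shape, At.shape)  # (2, 5) (5, 2)
```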


## 5.5 Matrix zoology

- "fat" / wide matrices have more columns than rows
- "skinny" / tall matrices have more rows than columns

### Symmetric matrices

- a matrix is symmetric if it's mirrored across the diagonal
- A matrix is symmetric if it equals its transpose
- only square matrices can be symmetric (obviously)
- $A = A^T$
- $a_{i,j} = a_{j,i}$

\begin{bmatrix}
1 & 4 & \pi \\
4 & 7 & 2 \\
\pi & 2 & 0
\end{bmatrix}

- Symmetric matrices have many special properties and will be used throughout this book
- There are two ways to convert non-symmetric matrices into symmetric matrices. Both will be covered in future chapters.
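The symmetry condition $A = A^T$ can be checked directly in NumPy. A minimal sketch using the example matrix above, plus a small non-symmetric matrix `N` made up for contrast:

```python
import numpy as np

# the symmetric example matrix from the text
S = np.array([[1,     4, np.pi],
              [4,     7, 2    ],
              [np.pi, 2, 0    ]])

# a matrix is symmetric if it equals its transpose
print(np.array_equal(S, S.T))  # True

# a hypothetical non-symmetric matrix for comparison
N = np.array([[1, 2],
              [3, 4]])
print(np.array_equal(N, N.T))  # False
```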

### Skew-symmetric

- in a skew-symmetric matrix, the lower triangle is the sign-flipped mirror image of the upper triangle
- the diagonal of a skew-symmetric matrix must be all 0s, since $a_{i,i} = -a_{i,i}$ is only true when $a_{i,i} = 0$
- $A = -A^T$
- $a_{i,j} = -a_{j,i}$


\begin{bmatrix}
0 & -4 & 8 \\
4 & 0 & -5 \\
-8 & 5 & 0
\end{bmatrix}

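*note: Hermitian matrices (common in physics) are the complex-valued analog of symmetric matrices: $A = A^H$, where $A^H$ is the conjugate transpose. The complex analog of a skew-symmetric matrix is a skew-Hermitian matrix ($A = -A^H$).*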
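A similar sketch for the skew-symmetric example: it checks $A = -A^T$, shows the zero diagonal, and (as an extra illustration not from the text above) builds a skew-symmetric matrix from an arbitrary square matrix `M` via $M - M^T$:

```python
import numpy as np

# the skew-symmetric example from the text
K = np.array([[ 0, -4,  8],
              [ 4,  0, -5],
              [-8,  5,  0]])

print(np.array_equal(K, -K.T))  # True: A = -A^T
print(np.diag(K))               # [0 0 0]: the diagonal must be zero

# for any square M, M - M^T is skew-symmetric
M = np.random.randn(4, 4)
S = M - M.T
print(np.array_equal(S, -S.T))  # True
```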

### Identity

- The identity matrix $I$ is the matrix equivalent of the number 1: multiplying a matrix by $I$ leaves it unchanged
- 1's in diagonal, 0's everywhere else
- always a square matrix

\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}

### Zeros

- obviously just a matrix of 0's
- can be either square or rectangular
- also called the *additive identity matrix* (adding it to any matrix of the same size leaves that matrix unchanged)

#### Code

In [5]:
I = np.eye(4)
print(I)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [6]:
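O = np.ones((4,4))  # 4x4 matrix of ones (np.ones(4) would give a 1D vector)
print(O)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]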


In [7]:
Z = np.zeros((4,4))
print(Z)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
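To see why $I$ and the zeros matrix are called identities, a short sketch. It uses NumPy's `@` operator for matrix multiplication, which is covered in a later chapter; `A` is just random illustrative data:

```python
import numpy as np

A = np.random.randn(4, 4)
I = np.eye(4)
Z = np.zeros((4, 4))

# multiplying by I (on either side) leaves A unchanged, like multiplying a number by 1
print(np.allclose(I @ A, A), np.allclose(A @ I, A))  # True True

# adding Z leaves A unchanged, like adding 0 to a number
print(np.array_equal(A + Z, A))  # True
```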


### $A^TA$

- $A^TA$ is sometimes written as `AtA`
- $A^TA$ is one of the most important forms in all of applied linear algebra (covered in more detail next chapter)
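- $AA^T$ has analogous properties, but it is generally *not* equal to $A^TA$: if $A$ is $M \times N$, then $A^TA$ is $N \times N$ while $AA^T$ is $M \times M$. The $A^TA$ form is used far more often.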

Important properties of $A^TA$:
- always a square matrix, even if A is rectangular
- Symmetric, even if A isn't
- it is full-rank if A is full column-rank
- it is invertible if A is full column-rank
- it has the same row space as A
- it has orthogonal eigenvectors
- it is positive (semi)definite.
- it has non-negative, real-valued eigenvalues.
- it is called a "covariance matrix" if A is a data matrix (after mean-centering each column and dividing by the number of observations minus 1).
- it often looks pretty
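Several of the $A^TA$ properties above can be spot-checked numerically. A sketch, assuming a random tall matrix (random Gaussian matrices have full column rank with probability 1); `eigvalsh` is NumPy's eigenvalue routine for symmetric matrices, and the `-1e-10` tolerance is an arbitrary allowance for rounding error:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((5, 3))   # tall 5x3 matrix (full column rank)
AtA = A.T @ A                     # 3x3
AAt = A @ A.T                     # 5x5

print(AtA.shape, AAt.shape)       # (3, 3) (5, 5): not the same matrix
print(np.allclose(AtA, AtA.T))    # True: symmetric
print(np.linalg.matrix_rank(AtA)) # full rank (3), since A has full column rank

# eigvalsh assumes a symmetric input and returns real eigenvalues
evals = np.linalg.eigvalsh(AtA)
print(np.all(evals >= -1e-10))    # True: non-negative up to rounding error
```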

### Diagonal

- non-zero values appear only on the main diagonal; all off-diagonal elements are zero
- diagonal matrices can also be rectangular
- $I$ is one example of a diagonal matrix, as are the following

\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3
\end{bmatrix}, 
\begin{bmatrix}
7 & 0 & 0 & 0 \\
0 & \pi & 0 & 0
\end{bmatrix}

- If all the diagonal values are the same, the matrix can be written as a scalar times the identity matrix

$\begin{bmatrix}
7 & 0 \\
0 & 7
\end{bmatrix} = 
7\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
= 7I_2$

- Diagonal matrices are useful because they simplify operations like matrix multiplication and exponentiation.
- transforming a matrix into a diagonal matrix is called *diagonalization* (covered later)
- $D$ is often used to indicate a diagonal matrix, but note that many diagonal matrices are not labeled $D$
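A sketch of why diagonal matrices simplify multiplication and exponentiation: the result depends only on the diagonal elements, so it can be computed element-wise. It uses `@` and `np.linalg.matrix_power`, and the diagonal values are arbitrary:

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])
D = np.diag(d)

# powers of a diagonal matrix: raise each diagonal element to that power
print(np.allclose(np.linalg.matrix_power(D, 3), np.diag(d**3)))  # True

# product of two diagonal matrices: multiply their diagonals element-wise
E = np.diag([4.0, 5.0, 6.0])
print(np.allclose(D @ E, np.diag(d * np.diag(E))))  # True

# a diagonal with one repeated value is a scalar multiple of I
print(np.array_equal(np.diag([7, 7]), 7 * np.eye(2)))  # True
```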

#### Hollow Matrix

- The opposite of a diagonal matrix is a hollow matrix (0's on the diagonal)
- skew symmetric matrices are hollow

#### Code

In [8]:
D = np.diag([1,2,3,5])  # diagonal matrix
print(D)

[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 5]]


In [9]:
R = np.random.randn(3,4)
print(R)

[[-1.4192849   0.80970953  1.01493698 -0.52223646]
 [-0.10148739  0.58264719 -2.10766802 -0.94577382]
 [-0.96487829 -0.39610669 -1.00388355  0.6201468 ]]


In [10]:
d = np.diag(R)  # given a 2D array, np.diag extracts its diagonal elements as a 1D vector
print(d)

[-1.4192849   0.58264719 -1.00388355]


### Augmented

- An augmented matrix is the result of concatenating two or more matrices column-wise
  - i.e. tacking the columns from one matrix on the end of another
- You can only augment 2 matrices if they have the same number of rows (# of columns doesn't matter)

#### Code

In [11]:
# Augmenting matrix A and B => AB
A = np.random.randn(3,4)
print(A)

[[ 2.52433883 -1.33498509 -0.60150062  0.55633469]
 [-0.17116002  0.19594515  0.74102735  0.71382889]
 [ 0.29101293  2.26408239 -0.34446524 -1.50921809]]


In [12]:
B = np.random.randn(3, 2)
print(B)

[[-2.37702634 -0.52866817]
 [ 0.15127199  0.38832373]
 [ 0.05900842  0.22395287]]


In [13]:
AB = np.concatenate((A,B), axis=1)
print(AB)

[[ 2.52433883 -1.33498509 -0.60150062  0.55633469 -2.37702634 -0.52866817]
 [-0.17116002  0.19594515  0.74102735  0.71382889  0.15127199  0.38832373]
 [ 0.29101293  2.26408239 -0.34446524 -1.50921809  0.05900842  0.22395287]]
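Two related details, sketched with random illustrative data: `np.hstack` is equivalent to concatenating along `axis=1`, and augmenting fails when the row counts differ. (Note that the name `AB` above refers to the augmented matrix, not the matrix product.)

```python
import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(3, 2)

# np.hstack concatenates along axis=1 (column-wise)
AB = np.hstack((A, B))
print(AB.shape)                                            # (3, 6)
print(np.array_equal(AB, np.concatenate((A, B), axis=1)))  # True

# matrices with different numbers of rows cannot be augmented
C = np.random.randn(2, 2)
try:
    np.concatenate((A, C), axis=1)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```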


### Triangular

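- non-zero elements can appear only on the main diagonal and on one side of it; the other side is all zeros
- two kinds: upper triangular (zeros below the diagonal) and lower triangular (zeros above the diagonal)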
- can be rectangular
- sometimes, the zeros are left off / ignored for visual clarity

#### Code

In [14]:
# Triangular matrices (lower and upper)
A = np.random.randn(4,5)
print(A)

[[-0.70827514 -0.4758618  -0.28407267  0.39652402  0.13435338]
 [-0.13737639  1.35782143  0.39476863  0.40569077 -0.14598929]
 [-0.66888504  0.75073146 -0.74378422  0.69637979  1.12940427]
 [-0.64464889  0.27205418 -0.18347664 -0.84907938  1.82777886]]


In [15]:
L = np.tril(A)  # tril(ower) extracts the lower triangle
print(L)

[[-0.70827514  0.          0.          0.          0.        ]
 [-0.13737639  1.35782143  0.          0.          0.        ]
 [-0.66888504  0.75073146 -0.74378422  0.          0.        ]
 [-0.64464889  0.27205418 -0.18347664 -0.84907938  0.        ]]


In [16]:
U = np.triu(A)  # triu(pper) extracts the upper triangle
print(U)

[[-0.70827514 -0.4758618  -0.28407267  0.39652402  0.13435338]
 [ 0.          1.35782143  0.39476863  0.40569077 -0.14598929]
 [ 0.          0.         -0.74378422  0.69637979  1.12940427]
 [ 0.          0.          0.         -0.84907938  1.82777886]]


### Dense and sparse

- a matrix in which most or all of the elements are non-zero is called a **dense** matrix (sometimes also called a full matrix)
- dense/full is usually used in comparison to another matrix
- a **sparse** matrix contains mostly zeros and a relatively small number of non-zero elements
- sparse matrices can be stored and processed very efficiently (only the non-zero elements and their positions need to be stored), which is why many modern algorithms use them
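A minimal sketch of the storage savings, assuming SciPy's `scipy.sparse` module and its CSR (compressed sparse row) format, which is one of several sparse formats; the 1000x1000 identity matrix is just a convenient mostly-zero example:

```python
import numpy as np
from scipy import sparse

n = 1000
dense = np.eye(n)                # dense storage: all n*n values are stored
S = sparse.csr_matrix(dense)     # CSR storage: only non-zero values plus their indices

print(S.nnz)                     # 1000 non-zero elements out of 1,000,000

dense_bytes = dense.nbytes
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes
print(dense_bytes, sparse_bytes) # sparse storage is far smaller

# arithmetic works directly on the sparse representation
print(np.allclose(S @ np.ones(n), np.ones(n)))  # True
```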

### Orthogonal

- orthogonal matrices satisfy the following:
  - all columns are pairwise orthogonal.  This means that the dot product between any 2 columns is exactly 0.
  - Each column *i* has unit magnitude: $\|Q_i\|=1$.
    - *(remember that the magnitude of a vector, such as a column of a matrix, is the square root of the vector's dot product with itself)*

- $Q$ is often used to indicate an orthogonal matrix (but not always)
- $Q^TQ = I$
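The orthogonality conditions can be checked on a 2D rotation matrix, a standard example of an orthogonal matrix (the angle is arbitrary):

```python
import numpy as np

theta = np.pi / 6
# 2D rotation matrix: an orthogonal matrix
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.isclose(Q[:, 0] @ Q[:, 1], 0))           # True: columns are orthogonal
print(np.allclose(np.linalg.norm(Q, axis=0), 1))  # True: each column has unit magnitude
print(np.allclose(Q.T @ Q, np.eye(2)))            # True: Q^T Q = I
```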

### Toeplitz

- Toeplitz and Hankel matrices are closely related.
- In a Toeplitz matrix, each diagonal (running from top-left to bottom-right) contains a single repeated element

A Toeplitz matrix created from a vector (this wrap-around version is also known as a *circulant* matrix, a special case of Toeplitz):

$$
\begin{bmatrix}
a & b & c & d
\end{bmatrix} =>
\begin{bmatrix}
a & b & c & d \\
d & a & b & c \\
c & d & a & b \\
b & c & d & a
\end{bmatrix}
$$

### Hankel

- the "opposite" of a Toeplitz matrix: each anti-diagonal (running from bottom-left to top-right) contains a single repeated element

A Hankel matrix created from a vector:

$$
\begin{bmatrix}
a & b & c & d
\end{bmatrix} =>
\begin{bmatrix}
a & b & c & d \\
b & c & d & 0 \\
c & d & 0 & 0 \\
d & 0 & 0 & 0
\end{bmatrix}
$$
or with wrap-around:
$$
\begin{bmatrix}
a & b & c & d \\
b & c & d & a \\
c & d & a & b \\
d & a & b & c
\end{bmatrix}
$$

#### Code

In [17]:
# Toeplitz matrix
from scipy.linalg import hankel, toeplitz
t = [1,2,3,4]
T = toeplitz(t)
print(T)

[[1 2 3 4]
 [2 1 2 3]
 [3 2 1 2]
 [4 3 2 1]]


In [18]:
# Hankel matrix (slightly less intuitive to create than Toeplitz)
# hankel(c, r): c is the first column, r is the last row (r[0] is ignored)
H = hankel(t, r=[2,3,4,1])
print(H)

[[1 2 3 4]
 [2 3 4 3]
 [3 4 3 4]
 [4 3 4 1]]
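The scipy calls above don't reproduce the exact patterns drawn in the text: `toeplitz(t)` builds a symmetric Toeplitz matrix, and `r=[2,3,4,1]` gives a third Hankel pattern because `r` is the *last row* and its first element is ignored. A sketch of how the drawn wrap-around and zero-padded versions could be built with the same two functions (the column/row arguments are assumptions chosen to match the drawings):

```python
from scipy.linalg import hankel, toeplitz

t = [1, 2, 3, 4]

# toeplitz(c, r): c is the first column, r is the first row.
# A first column of [1, 4, 3, 2] reproduces the wrap-around Toeplitz pattern.
T_wrap = toeplitz([1, 4, 3, 2], r=t)
print(T_wrap)
# [[1 2 3 4]
#  [4 1 2 3]
#  [3 4 1 2]
#  [2 3 4 1]]

# hankel(c, r): c is the first column, r is the last row (r[0] is ignored)
H_zero = hankel(t)                  # r defaults to zeros: zero-padded version
H_wrap = hankel(t, r=[4, 1, 2, 3])  # last row [4, 1, 2, 3]: wrap-around version
print(H_zero)
print(H_wrap)
```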


## 5.6 Matrix addition and subtraction

- matrix addition and subtraction are simply element-wise addition / subtraction
- both matrices must have the same size (same number of rows and same number of columns)
- addition is commutative, $A + B = B + A$ (this may seem trivial, but it's important to remember because matrix multiplication is NOT commutative); subtraction is not, since $A - B = -(B - A)$

In [19]:
# Matrix addition
A = np.array([[0,5],[-4,6],[-3,0]])
B = np.array([[0,1],[1,1],[1,0]])
print(A + B)

[[ 0  6]
 [-3  7]
 [-2  0]]
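A sketch checking the addition/subtraction rules above with the same `A` and `B`. One caveat: NumPy *broadcasting* will silently add some differently shaped arrays (e.g., a 1x2 row added to every row of a 3x2 matrix), which is not matrix addition in the linear-algebra sense; shapes that can't broadcast, like the made-up `C` below, raise an error:

```python
import numpy as np

A = np.array([[0, 5], [-4, 6], [-3, 0]])
B = np.array([[0, 1], [1, 1], [1, 0]])

print(np.array_equal(A + B, B + A))     # True: addition is commutative
print(np.array_equal(A - B, B - A))     # False: subtraction is not
print(np.array_equal(A - B, -(B - A)))  # True: swapping the order flips the sign

# C has the same number of elements (6) as A, but a different shape
C = np.ones((2, 3))
try:
    A + C
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```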


## 5.7 Scalar-matrix multiplication

- matrix multiplication is involved enough to have its own chapter (later), but scalar multiplication is simple
- For scalar multiplication, just multiply the scalar value by each value of the matrix
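For example, multiplying a small integer matrix by 3 (a minimal sketch; the matrix is the one from the addition example):

```python
import numpy as np

A = np.array([[0, 5], [-4, 6], [-3, 0]])
print(3 * A)
# [[  0  15]
#  [-12  18]
#  [ -9   0]]

# scalar multiplication is commutative: 3A = A3
print(np.array_equal(3 * A, A * 3))  # True
```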

## 5.8 "Shifting" a matrix

- "Shifting" a matrix means adding a scalar multiple of the identity matrix to it: $\tilde{A} = A + \lambda I$
- shifting is applied only to square matrices.
- The new, shifted matrix is often denoted as $\tilde{A}$

In [20]:
# Example of shifting a matrix
A = np.array([[1,3,0],[1,3,0],[2,2,7]])
I = np.eye(3)
shift = 0.1 * I
print(A + shift)

[[1.1 3.  0. ]
 [1.  3.1 0. ]
 [2.  2.  7.1]]


Three properties of matrix shifting:
1. only diagonal elements are affected
2. it can make identical rows distinct (e.g., rows 1 and 2 in the example above)
3. the closer $\lambda$ (the scalar) is to zero, the more similar $\tilde{A}$ is to $A$.  (*I imagine this can be useful if you want to differentiate rows but keep the matrix relatively the same.*)

- Shifting a matrix has important applications in statistics, machine learning, deep learning, etc.
  - e.g. "regularization"
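Property 2 can be made concrete with the example matrix above by comparing its rank (the number of linearly independent rows or columns, a concept covered later) before and after the shift. A sketch using `np.linalg.matrix_rank`, with the same $\lambda = 0.1$:

```python
import numpy as np

A = np.array([[1, 3, 0], [1, 3, 0], [2, 2, 7]])  # rows 1 and 2 are identical
lam = 0.1                                         # shift amount (lambda)
A_shift = A + lam * np.eye(3)                     # A~ = A + lambda*I

print(np.linalg.matrix_rank(A))        # 2: the duplicated row adds no new information
print(np.linalg.matrix_rank(A_shift))  # 3: after shifting, all rows are independent
```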

## 5.9 Diagonal and trace

### Diagonal

- the diagonal elements of a matrix can be extracted and converted into a vector
  - this is useful in statistics: the diagonal elements of a covariance matrix contain the variance of each variable
- the matrix can be either square or rectangular
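A sketch of the covariance example, assuming hypothetical random data with observations in rows and variables in columns; `np.cov` normalizes by $N-1$ by default, so its diagonal matches `np.var` with `ddof=1`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))  # hypothetical data: 100 observations x 3 variables

C = np.cov(X, rowvar=False)        # 3x3 covariance matrix (variables are columns)
variances = np.diag(C)             # extract the diagonal as a vector

# the diagonal holds each variable's sample variance
print(np.allclose(variances, np.var(X, axis=0, ddof=1)))  # True
```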

### Trace

- Trace produces a single number from a square matrix
- **IMPORTANT:** trace is only defined for square matrices
- it is written as $\mathrm{tr}(A)$ and is defined as the sum of all diagonal elements of a matrix
- the trace operation has two applications in machine learning:
  - to compute the Frobenius norm of a matrix (measure of the magnitude of a matrix)
  - to measure the "distance" between 2 matrices
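Both applications rely on the identity $\|A\|_F = \sqrt{\mathrm{tr}(A^TA)}$, i.e., the square root of the sum of all squared elements. A sketch checking it against `np.linalg.norm`, with the "distance" taken to be the Frobenius norm of $A - B$ (one common choice, assumed here); note that $A^TA$ is square even when $A$ is not, so its trace is defined:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

# Frobenius norm via the trace of the (square) 3x3 matrix A^T A
fro_trace = np.sqrt(np.trace(A.T @ A))
print(np.isclose(fro_trace, np.linalg.norm(A, 'fro')))  # True

# "distance" between two same-sized matrices: Frobenius norm of their difference
diff = A - B
dist = np.sqrt(np.trace(diff.T @ diff))
print(np.isclose(dist, np.linalg.norm(A - B)))  # True (norm of a 2D array defaults to Frobenius)
```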

#### Code

In [21]:
# Trace
A = np.random.randn(4,4)
tr = np.trace(A)
print(tr)

-1.3827373924581665


## 5.10 - 5.11 Exercises

## 5.12 Coding Challenges

1. Goal: Create a matrix that contains the dot products between all pairs of columns from two other matrices.

- create two 4x2 matrices
- write a double for loop to compute the dot products between each column of both matrices
- the *i,j* element of the resulting matrix will be the dot product between column *i* of the first matrix and column *j* of the second matrix

In [22]:
A = np.random.randn(4, 2)
print(A)

[[-0.38131789  1.30087495]
 [-0.51721821 -1.23224382]
 [ 0.10024948  1.27267139]
 [-1.45061494  0.07074066]]


In [23]:
B = np.random.randn(4, 2)
print(B)

[[-0.12829338 -0.5594911 ]
 [ 1.1899798   0.60583369]
 [ 0.02650357  1.40881382]
 [ 1.32094341  1.24843604]]


In [24]:
# test - extract a column from B
print( B[:,0] )

[-0.12829338  1.1899798   0.02650357  1.32094341]


In [25]:
dot_prods = []  # empty list to hold the 4 dot products, in order [0,0], [0,1], [1,0], [1,1]

for i in range(0,2):  # from 0 up to but not including 2
    for j in range(0,2):
        col_A = A[:, i]  # column i of A
        col_B = B[:, j]  # column j of B
        dot = np.dot(col_A, col_B)
        dot_prods.append(dot)

print(dot_prods)

[-2.480081935393587, -1.769771369806117, -1.5060641615133419, 0.4069094496616006]
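The loop above collects the four dot products in a flat list, while the goal asks for a matrix. A sketch that stores them in a 2x2 array instead (element `[i, j]` comes from column `i` of `A` and column `j` of `B`), and confirms the result equals $A^TB$, previewing matrix multiplication:

```python
import numpy as np

A = np.random.randn(4, 2)
B = np.random.randn(4, 2)

# preallocate: element [i, j] = dot(column i of A, column j of B)
dot_mat = np.zeros((A.shape[1], B.shape[1]))
for i in range(A.shape[1]):
    for j in range(B.shape[1]):
        dot_mat[i, j] = np.dot(A[:, i], B[:, j])

print(dot_mat)

# the same matrix in one step: (A^T B)[i, j] is exactly that dot product
print(np.allclose(dot_mat, A.T @ B))  # True
```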


2. Create a symmetric matrix by starting with a dense matrix of random numbers and applying 3 matrix operations: convert to triangular, transpose, matrix addition.

In [26]:
A = np.random.randn(6, 6)   # create random number matrix
print(A)

[[-0.58360455  1.01974647  0.54125838  1.27438416 -0.60891399  1.83064104]
 [ 0.80417938 -0.32595505 -0.21122167 -0.57867439 -0.49743304  0.55881994]
 [-0.64780552 -1.98591846  0.9928516   0.48203502 -0.53457592 -0.55968225]
 [-0.10365128  1.0849641  -0.12319094  0.13725647  0.99102136 -0.00434617]
 [ 0.06209404 -1.59797683  1.14366104 -0.06044753  0.08217303 -0.94737571]
 [ 0.79227567  0.16936967 -1.8861463   0.20573651  0.75883644  0.67477437]]


In [27]:
# 2a) convert to triangular
tri_A = np.tril(A)
print(tri_A)

[[-0.58360455  0.          0.          0.          0.          0.        ]
 [ 0.80417938 -0.32595505  0.          0.          0.          0.        ]
 [-0.64780552 -1.98591846  0.9928516   0.          0.          0.        ]
 [-0.10365128  1.0849641  -0.12319094  0.13725647  0.          0.        ]
 [ 0.06209404 -1.59797683  1.14366104 -0.06044753  0.08217303  0.        ]
 [ 0.79227567  0.16936967 -1.8861463   0.20573651  0.75883644  0.67477437]]


In [28]:
# 2b) transpose
A_t = tri_A.T
print(A_t)

[[-0.58360455  0.80417938 -0.64780552 -0.10365128  0.06209404  0.79227567]
 [ 0.         -0.32595505 -1.98591846  1.0849641  -1.59797683  0.16936967]
 [ 0.          0.          0.9928516  -0.12319094  1.14366104 -1.8861463 ]
 [ 0.          0.          0.          0.13725647 -0.06044753  0.20573651]
 [ 0.          0.          0.          0.          0.08217303  0.75883644]
 [ 0.          0.          0.          0.          0.          0.67477437]]


In [29]:
# 2c) combine with addition
A_symm = tri_A + A_t
print(A_symm)

[[-1.1672091   0.80417938 -0.64780552 -0.10365128  0.06209404  0.79227567]
 [ 0.80417938 -0.65191011 -1.98591846  1.0849641  -1.59797683  0.16936967]
 [-0.64780552 -1.98591846  1.98570319 -0.12319094  1.14366104 -1.8861463 ]
 [-0.10365128  1.0849641  -0.12319094  0.27451294 -0.06044753  0.20573651]
 [ 0.06209404 -1.59797683  1.14366104 -0.06044753  0.16434605  0.75883644]
 [ 0.79227567  0.16936967 -1.8861463   0.20573651  0.75883644  1.34954873]]
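Two things worth noticing about this construction: the result is exactly symmetric, and its diagonal is *doubled* (e.g., -0.5836 became -1.1672), because the diagonal appears in both `tri_A` and its transpose. A sketch that checks both and shows one way (an assumption, not the only fix) to keep the original diagonal, using the `k` argument of `np.tril` to take the strictly lower triangle:

```python
import numpy as np

A = np.random.randn(6, 6)
tri_A = np.tril(A)
A_symm = tri_A + tri_A.T

print(np.array_equal(A_symm, A_symm.T))              # True: symmetric
print(np.allclose(np.diag(A_symm), 2 * np.diag(A)))  # True: the diagonal is doubled

# keep the original diagonal: mirror only the strictly lower triangle (k=-1)
A_symm2 = tri_A + np.tril(A, k=-1).T
print(np.array_equal(A_symm2, A_symm2.T))            # True
print(np.array_equal(np.diag(A_symm2), np.diag(A)))  # True
```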


3. Create a diagonal matrix of size 4x8 without using the diag() function. The diagonal elements should be 1,2,3,4. How much of your code would you need to change to create an 8x4 diagonal matrix?

In [30]:
A = np.array([1,2,3,4])
I = np.eye(4)
D = A * I  # 4x4: broadcasting scales column j of I by A[j], putting 1,2,3,4 on the diagonal
print(D)

[[1. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 4.]]


In [31]:
# one way to get an 8x4 diagonal matrix: stack a 4x4 zeros matrix below D
# (concatenate along axis=0, i.e., row-wise); axis=1 would give the 4x8 version instead
Z = np.zeros((4,4))
A2 = np.concatenate((D, Z), axis=0)
print(A2)

[[1. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 4.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
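The two cells above build 4x4 and 8x4 matrices; a more general sketch writes the diagonal values into a zeros matrix of any shape, so switching between 4x8 and 8x4 only requires changing the shape argument. `diag_matrix` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def diag_matrix(values, shape):
    """Hypothetical helper: place `values` on the main diagonal of a zeros matrix of `shape`."""
    M = np.zeros(shape)
    for i, v in enumerate(values):
        M[i, i] = v
    return M

D48 = diag_matrix([1, 2, 3, 4], (4, 8))  # 4x8
D84 = diag_matrix([1, 2, 3, 4], (8, 4))  # 8x4: only the shape argument changes
print(D48)
print(np.array_equal(D84, D48.T))  # True
```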
