# Chapter 7: Rank

- content: pp. 185 - 206  
- exercises: pp. 207 - 210

In [1]:
# common third-party imports (NumPy and Matplotlib)
import numpy as np
from matplotlib import pyplot as plt

## 7.1 Six things about matrix rank

The rank of a matrix is a single number associated with that matrix, and is relevant for nearly all applications of linear algebra.  Before learning how to compute the rank, we'll focus here on the concept and interpretations of rank.

1. The rank of the matrix is indicated by the letter *r* or by *rank*(A), and is a non-negative integer.  A matrix cannot have a rank of -2 or 4.7.  A rank of 0 is possible, but most matrices have a rank > 0.  In fact, only the zeros matrix can have a rank = 0.

2. The max possible rank of an $M \times N$ matrix is the smaller of $M$ or $N$.

3. Rank is a property of the entire matrix; it doesn't make sense to talk about the rank of the columns of the matrix, or the rank of the rows of the matrix.

4. The following shows terminology for full-rank matrices, depending on their sizes:
- $M$ x $M$ matrix: rank($A$) = $M$ --> "Full rank"
- $M>N$ matrix: rank($A$) = $N$ --> "Full column rank"
- $M<N$ matrix: rank($A$) = $M$ --> "Full row rank"
- if a matrix rank is less than the smaller of M or N, then it can be called any of: "reduced rank", "rank-deficient", "degenerate", "low-rank", or "singular"

5. The rank indicates the number of dimensions of *information* contained in the matrix. This is not the same as the total number of columns or rows in the matrix.

6. There are several definitions of rank that you will learn throughout this book, and several algorithms for computing the rank.  However, the key definition to keep in mind is that the **rank of a matrix is the largest number of columns that can form a linearly independent set.**  This is exactly the same as: **the largest number of rows that can form a linearly independent set.**

### Reflection - Why all the fuss about rank?

There are some operations in linear algebra that are valid *only* for full-rank matrices (the matrix inverse being the most important). Other operations are valid on reduced-rank matrices (e.g. eigen-decomposition) but having full rank endows some additional properties.  Furthermore, many computer algorithms return more reliable results with full-rank matrices than with reduced-rank matrices.  Indeed, one of the main goals of *regularization* in statistics and machine-learning is to increase numerical stability by ensuring that data matrices are full rank.

So yeah, matrix rank is a big deal.

## 7.2 Interpretations of matrix rank

### Algebraic interpretation

If you think of a matrix as comprising a set of vectors, then the rank of the matrix corresponds to the largest number of vectors that can form a linearly independent set.  (Remember that a set of vectors is linearly independent if no vector can be expressed as a linear combination of the other vectors.)

i.e. ignore any vectors of a matrix that are simply scaled versions of another vector in the matrix, or linear combinations of 2 or more other vectors in the matrix.
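
To make the algebraic interpretation concrete, here is a minimal sketch with a hand-built 3 x 3 matrix whose third column is a linear combination of the first two. The names `c1`, `c2`, `c3`, and `A_dep` are made up for this illustration; the third column adds no new information, so the rank should come out as 2.

```python
import numpy as np

# Two linearly independent columns
c1 = np.array([1.0, 0.0, 2.0])
c2 = np.array([0.0, 1.0, -1.0])

# A third column that is a linear combination of the first two
c3 = 2 * c1 + 3 * c2

# Stack the three vectors as the columns of a 3x3 matrix
A_dep = np.column_stack([c1, c2, c3])

# Only two columns can form a linearly independent set
rank_dep = np.linalg.matrix_rank(A_dep)
print(rank_dep)
```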

### Geometric interpretation

- Rank is the dimensionality of the subspace spanned by the columns (or the rows) of the matrix.  This is not necessarily the same as the ambient dimensionality of the space containing the matrix.
- For example, we can think of the matrix  $v = [4 \ 0 \ 1]^T$  as a vector that lives in $\mathbb{R}^3$, but the subspace it spans is only 1D.
  - This is because we can also reinterpret the 3D vector as 3 points living in $\mathbb{R}^1$ (3 points on a line), and you can scale each of the 3 points to equal another.
- **In fact, all isolated vectors have a rank of 1 (except the zeros vector with rank 0).**
- Geometrically, the rank of a matrix is the dimensionality of the subspaces spanned by either the columns or the rows.
- Regardless of the perspective you take (a column focused or row focused perspective), the dimensionality of the subspace spanned by those vectors--and thus, the rank of the matrix--is the same.
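
A quick numerical check of the geometric points above, as a sketch: the vector from the text has rank 1, the zeros vector has rank 0, and a matrix has the same rank as its transpose (column and row perspectives agree). The matrix `W` is a made-up example whose second row is twice the first.

```python
import numpy as np

# The 3x1 column vector from the text and the zeros vector
v = np.array([[4.0], [0.0], [1.0]])
zero_vec = np.zeros((3, 1))

rank_v = np.linalg.matrix_rank(v)            # a single nonzero vector spans a line
rank_zero = np.linalg.matrix_rank(zero_vec)  # the zeros vector spans nothing

# A 2x4 matrix whose second row is 2x the first row
W = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])

# Column perspective (W) and row perspective (W transposed) give the same rank
rank_W = np.linalg.matrix_rank(W)
rank_Wt = np.linalg.matrix_rank(W.T)

print(rank_v, rank_zero, rank_W, rank_Wt)
```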

## 7.3 Computing matrix rank

Computing the matrix rank of medium/large matrices is not trivial.  In fact, beyond small matrices, computers cannot actually *compute* the rank of a matrix; they can only *estimate* the rank to a reasonable degree of certainty.  That said, computing the rank of small matrices is not too difficult.

Below are 3 methods to compute the rank of a matrix.  The first can be implemented with current knowledge, the other 2 will be explained later in the book.

1. Count the largest number of columns (or rows) that can form a linearly independent set.  This involves a bit of trial-and-error and a bit of educated guessing.  You can follow the same tips for determining linear independence from Ch 4 (page 97).

2. Count the number of pivots in the echelon or row-reduced echelon form of the matrix (Ch 10).

3. Count the number of nonzero singular values from a singular value decomposition of the matrix (Ch 16).
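
Method 3 can be sketched in a few lines even before the SVD is covered. The helper name `rank_from_svd` is hypothetical, and the default tolerance below (largest singular value times the larger matrix dimension times machine epsilon) is an assumption modeled on the default documented for `np.linalg.matrix_rank`. "Nonzero" here really means "larger than a small tolerance".

```python
import numpy as np

def rank_from_svd(A, tol=None):
    """Estimate rank by counting singular values larger than a tolerance."""
    A = np.asarray(A, dtype=float)
    # Singular values only; the singular vectors are not needed
    s = np.linalg.svd(A, compute_uv=False)
    if tol is None:
        # Assumed default: scale machine epsilon by matrix size and magnitude
        tol = s.max() * max(A.shape) * np.finfo(s.dtype).eps
    return int(np.sum(s > tol))

# Usage: a random 3x6 matrix should have the maximum possible rank, 3
A_demo = np.random.default_rng(0).standard_normal((3, 6))
print(rank_from_svd(A_demo), np.linalg.matrix_rank(A_demo))
```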

In [2]:
# Compute rank using Python
A = np.random.randn(3,6)
r = np.linalg.matrix_rank(A)
print(r)

3


## 7.4 Rank and scalar multiplication

- Scalar multiplication has no effect on the rank of a matrix, with one exception when the scalar is 0.
- Note: This is clear because rank excludes scaled versions of vectors, so scaling all vectors equally has no effect.
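
The one exception (a scalar of 0) is easy to check directly. This is a small sketch using a seeded generator, which is an assumption made only so the result is repeatable; `M_demo` is a made-up name.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for repeatability)
M_demo = rng.standard_normal((3, 5))

rank_original = np.linalg.matrix_rank(M_demo)
# Any nonzero scalar, including a negative one, leaves the rank unchanged
rank_negated = np.linalg.matrix_rank(-2.5 * M_demo)
# Scaling by 0 produces the zeros matrix, which has rank 0
rank_zeroed = np.linalg.matrix_rank(0 * M_demo)

print(rank_original, rank_negated, rank_zeroed)
```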

In [3]:
# Show that scaling has no effect on rank
scalar = np.random.randn()
M = np.random.randn(3, 5)
rank = np.linalg.matrix_rank(M)
rank_scaled = np.linalg.matrix_rank(scalar * M)
print(rank, rank_scaled)

3 3


## 7.5 Rank of added matrices

- knowing the ranks of $A$ and $B$ doesn't automatically mean you'll know the rank of $A + B$
- But it *does* provide an upper bound on the rank of $A + B$

### Rule for max rank of added matrices:

$$rank(A + B) \leq rank(A) + rank(B)$$
(plus the additional constraint that the max possible rank of an $M \times N$ matrix is the smaller of $M$ or $N$)

e.g.
- A = 5x7 matrix
- B = 5x7 matrix
- if $rank(A)$ = 2, and $rank(B)$ = 5, then the formula says that $rank(A + B) \leq$ 7.
- but then you need to include the additional constraint (see multiple constraints below) where the max rank is smaller of M or N
- Thus, the max rank of A + B is 5.

### Subtraction
- The addition rule also applies to subtraction, since $A - B = A + (-1)B$ and multiplying by a nonzero scalar does not affect rank.

### Multiple constraints

- there are multiple constraints on the rank of a matrix. (e.g. the largest possible rank is the smaller of $M$ or $N$)
- So to calculate max rank in addition, you need to include the rule/formula above, along with the constraint that rank is smaller of M or N.
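
The 5x7 example can be checked numerically. This is a sketch under a few assumptions: the generator is seeded for repeatability, the variable names are made up, and the rank-2 matrix is built as a product with inner dimension 2 purely as a convenient way to get a known rank. It also shows that the rule is only an upper bound, since $A + (-A)$ has rank 0.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded generator (assumption, for repeatability)

# 5x7 matrix of rank 2, built as a product with inner dimension 2
A_low = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 7))
# 5x7 random matrix, which has full row rank (5)
B_full = rng.standard_normal((5, 7))

rank_A = np.linalg.matrix_rank(A_low)
rank_B = np.linalg.matrix_rank(B_full)

# Combine both constraints: the sum rule and the size limit min(M, N)
upper_bound = min(rank_A + rank_B, min(A_low.shape))
rank_sum = np.linalg.matrix_rank(A_low + B_full)

# The bound is only an upper limit: A + (-A) is the zeros matrix
rank_cancel = np.linalg.matrix_rank(A_low + (-A_low))

print(rank_A, rank_B, upper_bound, rank_sum, rank_cancel)
```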

## 7.6 Rank of multiplied matrices

- As with summed matrices, you cannot know the exact rank of a multiplied matrix product simply from the ranks of $A$ and $B$.
- But as with summed matrices, there is a rule for the upper bound

### Rule for max rank of multiplied matrices:

$$rank(AB) \leq min\{rank(A), rank(B)\}$$

i.e. The smaller of $rank(A)$, $rank(B)$ is the largest possible rank of the product matrix

e.g. $rank(A)$ = 2, $rank(B)$ = 5, then $rank(AB) \leq$ 2.

#### How to understand this rule?
- You can think about it in terms of the column space of the matrix $C = AB$ (which is the subspace spanned by the columns of a matrix).
- Think of the $j^{th}$ column of $C$ as being the matrix-vector product of matrix $A$ and the $j^{th}$ column in $B$.
$$Ab_j = c_j$$
- This means that each column of $C$ is a linear combination of the columns of $A$, with scalar weights defined by the corresponding column in $B$.
- In other words, each column of $C$ is in the subspace spanned by the columns of $A$, so $rank(C) \leq rank(A)$.  The same argument applied to the rows of $C$ gives $rank(C) \leq rank(B)$.
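
Here is a small numerical check of the product rule, as a sketch. The seeded generator and the names `A_r2` and `B_r5` are assumptions for illustration; the ranks (2 and 5) mirror the example in the text.

```python
import numpy as np

rng = np.random.default_rng(2)  # seeded generator (assumption, for repeatability)

# 6x5 matrix of rank 2 (inner dimension 2) and a full-rank 5x5 matrix
A_r2 = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
B_r5 = rng.standard_normal((5, 5))

rank_A = np.linalg.matrix_rank(A_r2)
rank_B = np.linalg.matrix_rank(B_r5)

# The product can never have a larger rank than either factor
prod_bound = min(rank_A, rank_B)
rank_prod = np.linalg.matrix_rank(A_r2 @ B_r5)

print(rank_A, rank_B, prod_bound, rank_prod)
```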

### Reflection
- The rules in the previous 2 sections prepare for the next 2 sections, which have **major** implications for applied linear algebra, primarily statistics and machine-learning.
- The more comfortable you are with matrix rank, the more intuitive advanced linear algebra concepts will be!

## 7.7 Rank of $A, A^T, A^TA, AA^T$

- The key take-home message from this section is that these 4 matrices: $A, A^T, A^TA, AA^T$ all have exactly the same rank.
- We already know that $A$ and $A^T$ have the same rank, because rank can be determined from either the column or the row perspective and both give the same answer.  So transposing a matrix has no effect on rank.

- Proving that $A^TA$ and $AA^T$ have the same rank as $A$ is not as easy and requires tools not yet learned in this book.
  - The author provided some proofs on pages 197-199.  Bookmark and return to them later if needed to review.
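
While the proofs wait, the claim is easy to check numerically. This is only a sanity check on one made-up matrix (a 4x6 matrix of rank 2, built with a seeded generator as an assumption), not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)  # seeded generator (assumption, for repeatability)

# 4x6 matrix of rank 2, built as a product with inner dimension 2
A_rect = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

# The four matrices have different sizes (4x6, 6x4, 6x6, 4x4) ...
candidates = [A_rect, A_rect.T, A_rect.T @ A_rect, A_rect @ A_rect.T]

# ... but they should all share the same rank
ranks = [int(np.linalg.matrix_rank(X)) for X in candidates]
print(ranks)
```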

## 7.8 Rank of random matrices

- A "random matrix" is a matrix that contains elements drawn at random from any type of distribution: normal/Gaussian, uniform, Poisson, etc.
- Random matrices have some interesting properties, and there are entire theories built around random matrices.
- The most interesting property of random matrices that is relevant for this book is that they are basically always full rank.
  - note: this is intuitive since true randomness should have no linear dependency except the extremely rare case of spurious/accidental ones.
- As long as the range of possible numbers to select from is large (e.g. any real between 0 and 1) then spurious/accidental linear dependency is extremely rare. But be aware that this no longer applies when the range of possible numbers is narrow (e.g. integers from 1 - 5).
- Thus, whenever you create random matrices using computers (using floating point numbers), you can safely assume that their rank is the maximum possible rank.
  - this is useful because you can create matrices with any pre-defined rank you want.
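
The difference between a wide and a narrow range of values can be seen in a small experiment. This is a sketch: the trial count, the 3x3 size, and the seeded generator are arbitrary choices made for illustration. It counts how many random matrices turn out rank-deficient when the entries are Gaussian floats versus integers from 1 to 5.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for repeatability)
n_trials = 2000

# Stacks of 3x3 matrices; matrix_rank returns one rank per matrix in the stack
gaussian_stack = rng.standard_normal((n_trials, 3, 3))
# Integers 1..5 (the upper limit 6 is exclusive), converted to floats
integer_stack = rng.integers(1, 6, size=(n_trials, 3, 3)).astype(float)

# Count matrices whose rank is below the maximum possible rank of 3
gaussian_deficient = int(np.sum(np.linalg.matrix_rank(gaussian_stack) < 3))
integer_deficient = int(np.sum(np.linalg.matrix_rank(integer_stack) < 3))

print("rank-deficient Gaussian matrices:", gaussian_deficient)
print("rank-deficient integer matrices: ", integer_deficient)
```

With Gaussian entries, accidental dependencies essentially never occur. With only five possible values per entry, repeated rows or columns show up in a noticeable fraction of the matrices.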

## 7.9 Boosting to full-rank by "shifting"

- Full-rank square matrices are absolutely fabulous to work with, but many matrices are rank-deficient.  So what are we to do?
- One solution is to transform a rank-deficient matrix into a full-rank matrix through "shifting" which we learned about in Ch 5.8.
  - reminder: "shifting" a matrix means adding a multiple of the identity matrix ($\tilde{A} = A + \lambda I$), which adds a small quantity to the diagonal elements without changing the off-diagonal elements.

- If we shift by a very small scalar, then the matrix $\tilde{A}$ stays close to the original matrix $A$.
- If we shift by a very large scalar, then the matrix $\tilde{A}$ is dominated by the scaled identity matrix $\lambda I$.

- In the context of statistics and machine learning, "shifting" is also called *regularization* or *matrix smoothing*.
- It is an important procedure for multivariate analyses such as principal components analysis and generalized eigendecomposition (which are the mathematical backbones of data compression and linear discriminant analyses).
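
A minimal sketch of shifting, using a made-up 3x3 matrix whose second row is twice the first (so its rank is 2) and an arbitrary shift of $\lambda = 0.01$:

```python
import numpy as np

# Rank-deficient square matrix: row 2 is 2x row 1
A_sing = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0],
                   [0.0, 1.0, 1.0]])

lam = 0.01  # shift amount (arbitrary small value chosen for this example)

# Shifting: add lambda to the diagonal, leave the off-diagonal elements alone
A_shifted = A_sing + lam * np.eye(3)

rank_before = np.linalg.matrix_rank(A_sing)
rank_after = np.linalg.matrix_rank(A_shifted)
print(rank_before, rank_after)
```

The shifted matrix differs from the original only on the diagonal, and only by 0.01, yet it is full rank.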

## 7.10 Difficulties in computing rank

The author briefly explains why it's difficult to compute the rank of large matrices:
- Computers suffer from floating point rounding errors that lead to uncertainties in distinguishing very small numbers from true 0.
- Numbers smaller than about $10^{-15}$ are considered to be 0 plus a small computer rounding error.
- Computer software (e.g. Python or MATLAB) will have some threshold for rounding numbers, and that threshold can affect the rank of a matrix.

In [4]:
## inspect the source code used to compute rank to check rounding threshold
??np.linalg.matrix_rank

Signature: np.linalg.matrix_rank(M, tol=None, hermitian=False)
Source:   
@array_function_dispatch(_matrix_rank_dispatcher)
def matrix_rank(M, tol=None, hermitian=False):
    """
    Return matrix rank of array using SVD method

    Rank of the array is the number of singular values of the array that are
    greater than `tol`.

    .. versionchanged:: 1.14
       Can now operate on stacks of matrices

    Parameters
    ----------
[...]

### Geometry

Now let's think about the difficulty in computing rank geometrically.
- Imagine a 3 x 3 matrix that represents some data collected from a satellite.
  - The columns are in $\mathbb{R}^3$ and let's imagine we know for a fact the 3 vectors all lie on a 2D plane (based on satellite sensor design).
  - this means, we know for a fact that the rank is 2.
- The satellite data is imperfect and there is a tiny bit of noise.
  - this noise causes the vectors to point slightly off of the 2D plane
- Due to the imperfect data, a computer will calculate the rank of this matrix as 3, but it really should be rank 2 excluding the noise.
- So given this info, you might want your rank-estimating-algorithm to ignore some small amount of noise based on what you know about the real life data.
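
The satellite scenario can be imitated with a small sketch. Everything here is made up for illustration: the "clean" matrix is a fixed 3x3 matrix of rank 2 (its third column is the sum of the first two), the noise level of `1e-6` is arbitrary, and the generator is seeded for repeatability. The `tol` argument of `np.linalg.matrix_rank` sets the threshold below which singular values are treated as zero.

```python
import numpy as np

rng = np.random.default_rng(4)  # seeded generator (assumption, for repeatability)

# "True" data: three vectors that lie on a 2D plane (column 3 = column 1 + column 2)
clean = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0]])

# Tiny sensor noise pushes the vectors slightly off the plane
noisy = clean + 1e-6 * rng.standard_normal((3, 3))

rank_clean = np.linalg.matrix_rank(clean)
# Default tolerance is close to machine precision, so the noise counts as information
rank_default = np.linalg.matrix_rank(noisy)
# A tolerance above the noise level recovers the rank we expect from the sensor design
rank_tolerant = np.linalg.matrix_rank(noisy, tol=1e-4)

print(rank_clean, rank_default, rank_tolerant)
```

Choosing the tolerance requires knowing something about the size of the noise, which is exactly the kind of real-life knowledge the notes above refer to.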

## 7.11 Rank and span

To test whether a vector $v$ is in the span of a set of vectors *S*:

1. Put the vectors from set *S* into a matrix $S$.
2. Compute the rank of $S$. Call that rank $r_1$.
3. Augment $S$ by $v$, thus creating $S_v = S \sqcup v$.
4. Compute the rank of $S_v$. Call that rank $r_2$.
5. If $r_2 > r_1$ then $v$ is **not** in the span of *S*.  
If $r_2 = r_1$ then $v$ **is** in the span of *S*.  
If $r_2 < r_1$ then check your math or code for a mistake (adding a column can never lower the rank).
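
The steps above can be sketched as a small function. The name `in_span` and the example vectors are made up for illustration, and the result relies on the default numerical tolerance of `np.linalg.matrix_rank`.

```python
import numpy as np

def in_span(S, v):
    """Return True if vector v is in the span of the columns of S."""
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float).reshape(-1, 1)  # make v a column vector
    r1 = np.linalg.matrix_rank(S)                  # rank of the set
    r2 = np.linalg.matrix_rank(np.hstack([S, v]))  # rank after augmenting by v
    return bool(r2 == r1)

# Columns of S_demo span the xy-plane in R^3
S_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])

print(in_span(S_demo, [2.0, 3.0, 0.0]))  # lies in the xy-plane
print(in_span(S_demo, [0.0, 0.0, 1.0]))  # points out of the xy-plane
```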

## 7.12 - 7.13 Exercises

do some for group discussion?

## 7.14 - 7.15 Code Challenges

1) The goal of this code challenge is to create random matrices with any arbitrary rank (though still limited by the constraints presented in this chapter). In particular, combine standard matrix multiplication (previous chapter) with the rule about rank and matrix multiplication (Equation 7.8) to create reduced-rank matrices comprising random numbers (hint: think about the "inner dimensions" of matrix multiplication).

In [5]:
# Goal rank = 2
A = np.random.randn(13,4)
B = np.random.randn(4,2)
# C is 13x2, so rank(C) <= min(rank(A), rank(B)) = min(4, 2) = 2
C = A@B
rank = np.linalg.matrix_rank(C)
print(rank)

2
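
A more general version of the same idea, as a sketch: put the target rank in the "inner dimension" of the product, so an $M \times r$ matrix times an $r \times N$ matrix gives an $M \times N$ matrix of rank $r$. The function name `random_matrix_with_rank` and its optional `rng` parameter are made up for this illustration.

```python
import numpy as np

def random_matrix_with_rank(m, n, r, rng=None):
    """Create an m-by-n matrix of Gaussian random numbers with rank r."""
    if not 0 <= r <= min(m, n):
        raise ValueError("r must be between 0 and min(m, n)")
    rng = np.random.default_rng() if rng is None else rng
    # The inner dimension r caps the rank of the product at r
    return rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Usage: a 10x8 matrix with rank 3
C_demo = random_matrix_with_rank(10, 8, 3, rng=np.random.default_rng(5))
print(C_demo.shape, np.linalg.matrix_rank(C_demo))
```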


2) The goal of this code challenge is to explore the tolerance level of your computer for computing the rank of matrices with tiny values.  Start by creating the 5x5 zeros matrix and confirm that its rank is 0.  Then add a 5x5 random numbers matrix scaled by machine-epsilon, which is the computer's estimate of its numerical precision due to round-off errors.  Now the rank of that summed matrix will be 5. Finally, keep scaling down the machine-epsilon until the rank of the summed matrix is 0. You can also compute the Frobenius norm to get a sense of the magnitude of the values in the matrix.

In [6]:
Z = np.zeros((5,5))
print(Z)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [7]:
print(np.linalg.matrix_rank(Z))

0


In [8]:
A = np.random.randn(5,5)
m_epsilon = np.finfo(float).eps
print(m_epsilon)

2.220446049250313e-16


In [9]:
A_scaled = A * m_epsilon
C = Z + A_scaled
print(np.linalg.matrix_rank(C))

5


In [10]:
# Note: trace(C.T @ C) is the *squared* Frobenius norm; apply np.sqrt to get the norm itself
frob = np.trace(C.T@C)
print(frob)

7.994687149941428e-31
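
For reference, the Frobenius norm is the square root of the trace of $C^TC$. A small sketch with a made-up 2x2 matrix (chosen so the numbers are easy to check by hand) compares the trace formula with NumPy's built-in `np.linalg.norm`:

```python
import numpy as np

# Entries 3 and 4 give a squared norm of 9 + 16 = 25
F = np.array([[3.0, 4.0],
              [0.0, 0.0]])

frob_squared = np.trace(F.T @ F)           # sum of squared entries
frob_from_trace = np.sqrt(frob_squared)    # Frobenius norm via the trace
frob_builtin = np.linalg.norm(F, 'fro')    # Frobenius norm via NumPy

print(frob_squared, frob_from_trace, frob_builtin)
```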


In [11]:
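for i in range(1, 100):
    # shrink every entry by another factor of machine epsilon
    C = C * m_epsilon
    # trace(C.T @ C) is the squared Frobenius norm, so it underflows to 0.0
    # well before the entries of C themselves do
    frob = np.trace(C.T@C)
    print("Frobenius norm after " + str(i) + " loops: " + str(frob))
    rank = np.linalg.matrix_rank(C)
    if rank == 0:
        print("Reached rank 0 after multiplying by machine epsilon " + str(i) + " times.")
        break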

Frobenius norm after 1 loops: 3.941685088788491e-62
Frobenius norm after 2 loops: 1.9434007920236585e-93
Frobenius norm after 3 loops: 9.58170567501884e-125
Frobenius norm after 4 loops: 4.724145632722918e-156
Frobenius norm after 5 loops: 2.3291836251410566e-187
Frobenius norm after 6 loops: 1.1483761893467074e-218
Frobenius norm after 7 loops: 5.661931751639373e-250
Frobenius norm after 8 loops: 2.79154787931114e-281
Frobenius norm after 9 loops: 1.376339366904e-312
Frobenius norm after 10 loops: 0.0
Frobenius norm after 11 loops: 0.0
Frobenius norm after 12 loops: 0.0
Frobenius norm after 13 loops: 0.0
Frobenius norm after 14 loops: 0.0
Frobenius norm after 15 loops: 0.0
Frobenius norm after 16 loops: 0.0
Frobenius norm after 17 loops: 0.0
Frobenius norm after 18 loops: 0.0
Frobenius norm after 19 loops: 0.0
Frobenius norm after 20 loops: 0.0
Reached rank 0 after multiplying by machine epsilon 20 times.
