
# **Matrix Factorization**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# **LU Decomposition**

## **Characteristics of an LU Decomposition**

The LU decomposition is often used to simplify the **solving of systems of linear equations**, such as **finding the coefficients in a linear regression**, as well as in **calculating the determinant and inverse** of a matrix.

* Lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix. 

* The product sometimes includes a permutation matrix as well. LU decomposition can be viewed as the matrix form of Gaussian elimination. 

* Computers usually solve square systems of linear equations using LU decomposition, and it is also a key step when inverting a matrix or computing the determinant of a matrix.
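As a rough illustration of the points above, the sketch below factors a matrix once with SciPy's `lu_factor`, reuses the factors to solve a linear system with `lu_solve`, and reads the determinant off the diagonal of U. The matrix `A` and right-hand side `b` are made-up values chosen only for the example.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical 3x3 system A x = b, chosen only for illustration
A = np.array([[4.0, 3.0, 2.0],
              [2.0, 1.0, 3.0],
              [3.0, 4.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Factor once (LU with partial pivoting); L and U are packed into one array
lu_packed, piv = lu_factor(A)

# Reuse the factors to solve for a given right-hand side
x = lu_solve((lu_packed, piv), b)

# det(A) = product of U's diagonal, with one sign flip per row interchange
n_swaps = int(np.sum(piv != np.arange(len(piv))))
det = (-1) ** n_swaps * np.prod(np.diag(lu_packed))

print(x)
print(det, np.linalg.det(A))
```

Factoring once and solving many times is the usual reason to keep the factors around: each extra right-hand side then costs only two triangular solves.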

The **LU decomposition is for square matrices** and decomposes a matrix into L and U components. Let A be a square matrix. An LU factorization refers to the factorization of A, with proper row and/or column orderings or permutations, into two factors – a **lower triangular matrix L** and an **upper triangular matrix U**:

> A = L U

* The LU decomposition is computed by a <u>direct elimination process</u> (Gaussian elimination) in a finite number of steps. Without row exchanges it **can fail when a zero pivot is encountered**, and it loses accuracy when a pivot is very small.

* In the lower triangular matrix, all elements above the diagonal are zero; in the upper triangular matrix, all elements below the diagonal are zero. For example, for a 3 × 3 matrix A, its LU decomposition looks like this:

> $\left[\begin{array}{lll}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right]=\left[\begin{array}{ccc}
l_{11} & 0 & 0 \\
l_{21} & l_{22} & 0 \\
l_{31} & l_{32} & l_{33}
\end{array}\right]\left[\begin{array}{ccc}
u_{11} & u_{12} & u_{13} \\
0 & u_{22} & u_{23} \\
0 & 0 & u_{33}
\end{array}\right]$

**Underdeterminism & Unit Triangular Matrix**

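* Without further constraints, the **system of equations for the entries of L and U is [underdetermined](https://en.m.wikipedia.org/wiki/Underdetermined_system)**: for an n × n matrix, A = LU gives n² equations in n² + n unknowns. The factors are therefore not unique; for example, the diagonal entries of L can be set to arbitrary non-zero values, with U rescaled to compensate.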

* Therefore, to find the unique LU decomposition, it is **necessary to put some restriction on L and U matrices**. For example, we can conveniently require the lower triangular matrix L to be a **unit triangular matrix** (i.e. set all the entries of its main diagonal to ones). 
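To make the unit-diagonal convention concrete, here is a minimal sketch of Gaussian elimination without pivoting that produces a unit lower triangular L (often called the Doolittle form). The helper name `lu_doolittle` and the test matrix are illustrative assumptions, not part of any library, and the routine simply gives up on a zero pivot instead of permuting rows.

```python
import numpy as np

def lu_doolittle(A):
    """LU factorization without pivoting; L has ones on its diagonal."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        if U[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: a row permutation would be needed")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # elimination multiplier
            U[i, k:] -= L[i, k] * U[k, k:]   # subtract a multiple of the pivot row
    return L, U

# Hypothetical matrix whose pivots are all non-zero
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_doolittle(A)
print(L)
print(U)
print(np.allclose(L @ U, A))
```

The multipliers stored in L are exactly the factors used to eliminate entries below each pivot, which is why LU can be read as the matrix form of Gaussian elimination.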

**Square matrices**

* Any square matrix A admits an LUP factorization. If A is [invertible](https://en.m.wikipedia.org/wiki/Invertible_matrix), then it admits an LU (or LDU) factorization if and only if all its leading principal [minors](https://en.m.wikipedia.org/wiki/Minor_(linear_algebra)) are nonzero. 

* If A is a singular matrix of rank k, then it admits an LU factorization if the first k leading principal minors are nonzero, although the converse is not true.

* If a square, invertible matrix has an LDU factorization (with all diagonal entries of L and U equal to 1), then the factorization is unique. In that case, the LU factorization is also unique if we require that the diagonal of L (or U) consists of ones.
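The condition on leading principal minors can be checked numerically. The sketch below uses a hypothetical helper, `leading_principal_minors`, and two made-up invertible 2 × 2 matrices: one whose minors are all non-zero, and one whose first minor is zero, so it has no LU factorization unless its rows are permuted.

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the top-left k x k blocks, for k = 1..n."""
    A = np.asarray(A, dtype=float)
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A_ok = np.array([[2.0, 1.0],
                 [1.0, 3.0]])    # minors are non-zero: LU exists
A_bad = np.array([[0.0, 1.0],
                  [1.0, 0.0]])   # first minor is zero: needs a row swap

print(leading_principal_minors(A_ok))
print(leading_principal_minors(A_bad))
```

Comparing determinants with zero is fragile in floating point, so this is meant as an illustration of the criterion rather than a robust test.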

**Symmetric positive definite matrices**

* If A is a symmetric (or [Hermitian](https://en.m.wikipedia.org/wiki/Hermitian_matrix), if A is complex) [positive definite](https://en.m.wikipedia.org/wiki/Definite_symmetric_matrix) matrix, we can arrange matters so that U is the [conjugate transpose](https://en.m.wikipedia.org/wiki/Conjugate_transpose) of L. That is, we can write A as

> A = LL*

* This decomposition is called the **Cholesky decomposition**. The Cholesky decomposition always exists and is unique — provided the matrix is positive definite. 

* Furthermore, computing the Cholesky decomposition is more efficient and [numerically more stable](https://en.m.wikipedia.org/wiki/Numerical_stability) than computing some other LU decompositions.

**General matrices**

* For a (not necessarily invertible) matrix over any field, the exact necessary and sufficient conditions under which it has an LU factorization are known. 

* The conditions are expressed in terms of the ranks of certain submatrices. The Gaussian elimination algorithm for obtaining LU decomposition has also been extended to this most general case.

## **Variations of LU Decomposition**

**LU factorization with partial pivoting (LUP Decomposition)**

A variation of this decomposition that is numerically more stable to solve in practice is called the LUP decomposition, or the **LU decomposition with partial pivoting**.

> A = P L U

The rows of the parent matrix are re-ordered to simplify the decomposition process and the **additional P matrix specifies a way to permute the result or return the result to the original order**.

It turns out that a proper permutation of rows (or columns) is sufficient for LU factorization. LU factorization with partial pivoting (LUP) usually refers to LU factorization with row permutations only:

> PA = LU

where L and U are again lower and upper triangular matrices, and P is a [permutation matrix](https://en.m.wikipedia.org/wiki/Permutation_matrix)*, which, when left-multiplied to A, reorders the rows of A. This P is the transpose (and inverse) of the P in the form A = P L U above. All square matrices can be factorized in this form, and the factorization is numerically stable in practice, which makes LUP decomposition a useful technique.

*A permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column and 0s elsewhere.*
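The two conventions, A = P L U and PA = LU, differ only in which side the permutation sits on. A small check with `scipy.linalg.lu`, which returns factors satisfying A = P L U, illustrates this; the matrix is a made-up invertible example.

```python
import numpy as np
from scipy.linalg import lu

# Hypothetical invertible matrix
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

P, L, U = lu(A)

print(np.allclose(A, P @ L @ U))         # SciPy's convention: A = P L U
print(np.allclose(P.T @ A, L @ U))       # equivalent row-permuted form
print(np.allclose(P.T @ P, np.eye(3)))   # a permutation matrix is orthogonal
```

Because the inverse of a permutation matrix is its transpose, moving P across the equals sign costs nothing.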

**LU factorization with full pivoting**

An LU factorization with full pivoting involves both row and column permutations:

> PAQ = LU

where L, U and P are defined as before, and Q is a permutation matrix that reorders the columns of A.

**LDU Decomposition**

An LDU decomposition is a decomposition of the form

> A = LDU

where D is a diagonal matrix, and L and U are unitriangular matrices, meaning that all the entries on the diagonals of L and U are one.
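One way to obtain an LDU form, sketched here under the assumption that A is invertible (so U has no zeros on its diagonal), is to pull the diagonal out of the U factor. Since `scipy.linalg.lu` pivots, the result is an LDU factorization of the row-permuted matrix P<sup>T</sup>A; the names `D` and `U1` are illustrative.

```python
import numpy as np
from scipy.linalg import lu

# Hypothetical invertible matrix
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

P, L, U = lu(A)            # A = P L U, L has a unit diagonal

d = np.diag(U)             # pivots (non-zero because A is invertible)
D = np.diag(d)
U1 = U / d[:, None]        # divide each row by its pivot -> unit diagonal

print(np.allclose(P.T @ A, L @ D @ U1))
print(np.diag(U1))
```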

Above we required that A be a square matrix, but these decompositions can all be generalized to rectangular matrices as well. In that case, **L and D are square matrices** both of which have the same number of rows as A, and U has exactly the same dimensions as A. Upper triangular should be interpreted as having only zero entries below the main diagonal, which starts at the upper left corner.

![LDU decomposition of a Walsh matrix](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/LDU_decomposition_of_Walsh_16.svg/640px-LDU_decomposition_of_Walsh_16.svg.png)

*LDU decomposition of a [Walsh matrix](https://en.m.wikipedia.org/wiki/Walsh_matrix)*

## **Example**

**Define Matrix**

In [None]:
# LU decomposition
from numpy import array

# define a square matrix
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


**Decompose**

In [None]:
# LU decomposition
from scipy.linalg import lu
P, L, U = lu(A)

In [None]:
print(P)

[[0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]


In [None]:
print(L)

[[1.         0.         0.        ]
 [0.14285714 1.         0.        ]
 [0.57142857 0.5        1.        ]]


In [None]:
print(U)

[[7.         8.         9.        ]
 [0.         0.85714286 1.71428571]
 [0.         0.         0.        ]]


**Reconstruct**

In [None]:
B = P.dot(L).dot(U)
print(B)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


In [None]:
# Check for differences between both matrices
X = B - A
print(X)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


# **QR Decomposition**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* The QR decomposition is for m x n matrices (not limited to square matrices) and decomposes a matrix into Q and R components.

> A = Q R

* Here A is the m x n matrix that we wish to decompose, Q is a matrix of size m x m, and R is an upper triangular matrix of size m x n.

* **Q is an orthogonal matrix (Q<sup>T</sup> Q = I) or, in the complex case, a unitary matrix** (Q<sup>*</sup> Q = I) and **R is an upper triangular matrix**. The QR decomposition is a special case of the [Iwasawa decomposition](https://en.m.wikipedia.org/wiki/Iwasawa_decomposition).

* The QR decomposition is computed with direct numerical methods in a finite number of steps. It exists for every m x n matrix, although floating-point rounding limits the accuracy of the computed factors.

* Like the LU decomposition, the QR decomposition is often used to solve systems of linear equations, although it is not limited to square matrices.

* By default, NumPy's qr function returns the Q and R matrices with smaller, ‘reduced’ dimensions, which is more economical. We can change this to return the expected sizes of m x m for Q and m x n for R by specifying the mode argument as ‘complete’, although this is not required for most applications.

Such a decomposition always exists and can be calculated using various algorithms. The best known are

* [Householder transformations](https://de.m.wikipedia.org/wiki/Householdertransformation)
* [Givens rotations](https://de.m.wikipedia.org/wiki/Givens-Rotation)
* [Gram–Schmidt orthogonalization](https://de.m.wikipedia.org/wiki/Gram-Schmidtsches_Orthogonalisierungsverfahren)
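The difference between the ‘reduced’ and ‘complete’ modes mentioned above is easiest to see from the shapes of the returned factors. A short sketch with `numpy.linalg.qr` on a 3 x 2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

Q_red, R_red = np.linalg.qr(A)                     # default: mode='reduced'
Q_full, R_full = np.linalg.qr(A, mode='complete')

print(Q_red.shape, R_red.shape)     # (3, 2) (2, 2)
print(Q_full.shape, R_full.shape)   # (3, 3) (3, 2)

# Both variants reproduce A; the columns of Q are orthonormal in both cases
print(np.allclose(Q_red @ R_red, A), np.allclose(Q_full @ R_full, A))
print(np.allclose(Q_red.T @ Q_red, np.eye(2)))
print(np.allclose(Q_full.T @ Q_full, np.eye(3)))
```

The reduced form drops the columns of Q that only multiply the zero rows of R, so nothing is lost for reconstruction.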

**Define Matrix**

In [None]:
from numpy import array

# define a 3x2 matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)

[[1 2]
 [3 4]
 [5 6]]


**Decompose**

In [None]:
# QR decomposition
from numpy.linalg import qr
Q, R = qr(A, mode='complete')  # full-size factors: Q is m x m, R is m x n

In [None]:
print(Q)

[[-0.16903085  0.89708523  0.40824829]
 [-0.50709255  0.27602622 -0.81649658]
 [-0.84515425 -0.34503278  0.40824829]]


In [None]:
print(R)

[[-5.91607978 -7.43735744]
 [ 0.          0.82807867]
 [ 0.          0.        ]]


**Reconstruct**

In [None]:
B = Q.dot(R)
print(B)

[[1. 2.]
 [3. 4.]
 [5. 6.]]
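Since the QR decomposition is not limited to square matrices, it can be used to solve overdetermined systems in the least-squares sense. The sketch below assumes A has full column rank and uses a made-up right-hand side `b`; it solves R x = Q<sup>T</sup> b with the reduced factors and compares the result with `numpy.linalg.lstsq`.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])   # hypothetical right-hand side

# Reduced QR: Q is 3 x 2 with orthonormal columns, R is 2 x 2 upper triangular
Q, R = np.linalg.qr(A)

# Least-squares solution of A x ~ b: solve the triangular system R x = Q^T b
x = solve_triangular(R, Q.T @ b)

# Reference solution from NumPy's least-squares routine
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)
print(np.allclose(x, x_ref))
```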


# **Cholesky Decomposition**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

The Cholesky decomposition is for square symmetric matrices where all eigenvalues are greater than zero, so-called [positive definite matrices](https://en.wikipedia.org/wiki/Definite_symmetric_matrix). For our interests in machine learning, we will focus on the Cholesky decomposition for real-valued matrices and ignore the cases when working with complex numbers.

> A = LL^T

Where A is the matrix being decomposed, L is the lower triangular matrix and L^T is the transpose of L.

The decomposition can also be written as the product of the upper triangular matrix and its transpose, for example:

> A = U^T . U

* Where U is the upper triangular matrix.

* The Cholesky decomposition is used for solving linear least squares for linear regression, as well as simulation and optimization methods.

* Symmetric positive definite matrices are rather special, but they occur quite frequently in some applications, so the Cholesky decomposition is good to know about. When it applies, it is roughly twice as efficient as the LU decomposition for solving linear equations and should be preferred.
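As a sketch of how this is used in practice, the example below checks positive definiteness through the eigenvalues, solves a linear system with SciPy's `cho_factor` and `cho_solve`, and shows that NumPy's `cholesky` raises an error for a symmetric matrix that is not positive definite. The right-hand side `b` and the matrix `A_indef` are made-up values.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])    # hypothetical right-hand side

# Positive definite: all eigenvalues of the symmetric matrix are > 0
is_pd = bool(np.all(np.linalg.eigvalsh(A) > 0))
print(is_pd)

# Factor once, then solve A x = b with two triangular solves
c, low = cho_factor(A)
x = cho_solve((c, low), b)
print(np.allclose(A @ x, b))

# A symmetric but indefinite matrix (eigenvalues 3 and -1) is rejected
A_indef = np.array([[1.0, 2.0],
                    [2.0, 1.0]])
try:
    np.linalg.cholesky(A_indef)
    failed = False
except np.linalg.LinAlgError:
    failed = True
print(failed)
```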

**Define Matrix**

In [None]:
from numpy import array

# define a 3x3 matrix
A = array([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
print(A)

[[2 1 1]
 [1 2 1]
 [1 1 2]]


**Decompose**

In [None]:
# Cholesky decomposition
from numpy.linalg import cholesky
L = cholesky(A)
print(L)

[[1.41421356 0.         0.        ]
 [0.70710678 1.22474487 0.        ]
 [0.70710678 0.40824829 1.15470054]]


**Reconstruct**

In [None]:
B = L.dot(L.T)
print(B)

[[2. 1. 1.]
 [1. 2. 1.]
 [1. 1. 2.]]
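The upper-triangular form A = U^T . U mentioned earlier uses the same factor, transposed. As a quick check, `scipy.linalg.cholesky` returns the upper factor by default (`lower=False`), whereas `numpy.linalg.cholesky` returns the lower one:

```python
import numpy as np
from scipy.linalg import cholesky as cholesky_upper

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

L = np.linalg.cholesky(A)   # lower triangular factor, A = L L^T
U = cholesky_upper(A)       # upper triangular factor, A = U^T U

print(np.allclose(U, L.T))
print(np.allclose(U.T @ U, A))
```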


# **Singular-Value Decomposition (SVD)**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* Matrix decomposition, also known as matrix factorization, involves describing a given matrix using its constituent elements.

* Perhaps the best-known and most widely used matrix decomposition method is the Singular-Value Decomposition, or SVD. Every matrix has an SVD, which makes it more broadly applicable than other methods, such as the eigendecomposition. As such, it is often used in a wide array of applications including compressing, denoising, and data reduction.

* The Singular-Value Decomposition, or SVD for short, is a matrix decomposition method for reducing a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.

* For the sake of simplicity, we will focus on the SVD for real-valued matrices and ignore the case for complex numbers.

> A = U . Sigma . V^T

* Where A is the real m x n matrix that we wish to decompose, U is an m x m matrix, Sigma (the uppercase Greek letter Σ) is an m x n diagonal matrix, and V^T is the transpose of an n x n matrix V, where T is a superscript.

* The diagonal values in the Sigma matrix are known as the singular values of the original matrix A. The columns of the U matrix are called the left-singular vectors of A, and the columns of V are called the right-singular vectors of A.

* The SVD is calculated via iterative numerical methods. We will not go into the details of these methods. Every rectangular matrix has a singular value decomposition; for a real matrix the factors are real, and complex entries arise only for complex matrices. Because of the limitations of floating-point arithmetic, the computed factors reproduce A only up to rounding error.

* The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD allows us to discover some of the same kind of information as the eigendecomposition. However, the SVD is more generally applicable.

* The SVD is used widely both in the calculation of other matrix operations, such as the matrix inverse, and as a data reduction method in machine learning. SVD can also be used in least squares linear regression, image compression, and denoising data.

* The SVD can be calculated by calling the svd() function. The function takes a matrix and returns the U, Sigma and V^T elements. The Sigma diagonal matrix is returned as a vector of singular values. The V matrix is returned in a transposed form, e.g. V.T.

**Define a Matrix**

In [None]:
from numpy import array
A = array([[1, 2], [3, 4], [5, 6]])
print(A)

[[1 2]
 [3 4]
 [5 6]]


**Decompose**

In [None]:
# Calculate Singular-Value Decomposition
from scipy.linalg import svd
U, s, VT = svd(A)

In [None]:
print(U)

[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]


In [None]:
print(s)

[9.52551809 0.51430058]


In [None]:
print(VT)

[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]


**Reconstruct**

* The original matrix can be reconstructed from the U, Sigma, and V^T elements.
* The U, s, and VT elements returned from svd() cannot be multiplied directly.
* The s vector must be converted into a diagonal matrix using the diag() function. By default, this function will create a square matrix that is n x n, relative to our original m x n matrix (here m > n). This causes a problem because the sizes of the matrices do not fit the rules of matrix multiplication, where the number of columns in a matrix must match the number of rows in the subsequent matrix.
* After creating the square Sigma diagonal matrix, the sizes of the matrices are relative to the original m x n matrix that we are decomposing, as follows:

> U (m x m) . Sigma (n x n) . V^T (n x n)

* Where, in fact, we require:

> U (m x m) . Sigma (m x n) . V^T (n x n)

* We can achieve this by creating a new Sigma matrix of all zero values that is m x n (e.g. more rows) and populate the first n x n part of the matrix with the square diagonal matrix calculated via diag().

In [None]:
from numpy import diag
from numpy import dot
from numpy import zeros

# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))

# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)

# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)

[[1. 2.]
 [3. 4.]
 [5. 6.]]
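An alternative to padding Sigma, sketched here with NumPy's `svd`, is to request the reduced (economy-size) SVD via `full_matrices=False`. U then has only as many columns as there are singular values, so `diag(s)` fits without modification.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Reduced SVD: U is m x k, s has k entries, VT is k x n, with k = min(m, n)
U, s, VT = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, VT.shape)   # (3, 2) (2,) (2, 2)

# No zero padding needed: diag(s) is k x k
B = U @ np.diag(s) @ VT
print(np.allclose(B, A))
```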


The above complication with the Sigma diagonal only exists with the case where m and n are not equal. The diagonal matrix can be used directly when reconstructing a square matrix, as follows.

In [None]:
# Reconstruct SVD
from numpy import array
from numpy import diag
from numpy import dot
from scipy.linalg import svd

# define a matrix
A = array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create n x n Sigma matrix
Sigma = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
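The singular values also hint at how SVD serves as a data reduction method. As a small sketch: the 3 x 3 matrix used here has rank 2, so its third singular value is zero up to rounding, and keeping only the first k = 2 singular triplets already reproduces it. The choice of `k` is an assumption made for this example.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

U, s, VT = np.linalg.svd(A)
print(s)                           # the last singular value is numerically zero
print(np.linalg.matrix_rank(A))    # 2

# Truncated SVD: keep only the k largest singular values and their vectors
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
print(np.allclose(A_k, A))
```

For a matrix of full rank the truncation would only approximate A, with an error governed by the discarded singular values.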


**Appendix: SVD for Pseudoinverse**

* The pseudoinverse is the generalization of the matrix inverse for square matrices to rectangular matrices where the number of rows and columns are not equal.

* It is also called the Moore-Penrose inverse, after two independent discoverers of the method, or the generalized inverse.

* Matrix inversion is not defined for matrices that are not square. When A has more columns than rows, then solving a linear equation using the pseudoinverse provides one of the many possible solutions.

* The pseudoinverse is denoted as A^+, where A is the matrix that is being inverted and + is a superscript. The pseudoinverse is calculated using the singular value decomposition of A:

> A^+ = VD^+U^T

Where A^+ is the pseudoinverse, D^+ is the pseudoinverse of the diagonal matrix Sigma and U^T is the transpose of U.

We can get U and V from the SVD operation.

> A = U . Sigma . V^T

The D^+ can be calculated by creating a diagonal matrix from Sigma, calculating the reciprocal of each non-zero element in Sigma, and taking the transpose if the original matrix was rectangular.



In [None]:
#          s11,   0,   0
# Sigma = (  0, s22,   0)
#            0,   0, s33

In [None]:
#        1/s11,     0,     0
# D^+ = (    0, 1/s22,     0)
#            0,     0, 1/s33

The pseudoinverse provides one way of solving the linear regression equation, specifically when there are more rows than there are columns, which is often the case. 

NumPy provides the function pinv() for calculating the pseudoinverse of a rectangular matrix. The example below defines a 4×2 matrix and calculates the pseudoinverse.

In [None]:
# Pseudoinverse
from numpy import array
from numpy.linalg import pinv

# define matrix
A = array([
	[0.1, 0.2],
	[0.3, 0.4],
	[0.5, 0.6],
	[0.7, 0.8]])
print(A)

[[0.1 0.2]
 [0.3 0.4]
 [0.5 0.6]
 [0.7 0.8]]


In [None]:
# calculate pseudoinverse
B = pinv(A)
print(B)

[[-1.00000000e+01 -5.00000000e+00  1.42385628e-14  5.00000000e+00]
 [ 8.50000000e+00  4.50000000e+00  5.00000000e-01 -3.50000000e+00]]
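The formula A^+ = VD^+U^T can be checked against pinv() directly. The sketch below builds D^+ by hand for the same 4×2 matrix, assuming all singular values are non-zero (true here, since the two columns are linearly independent); the variable names `D_plus` and `A_plus` are illustrative.

```python
import numpy as np

A = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])
m, n = A.shape

# Full SVD: U is m x m, s holds min(m, n) singular values, VT is n x n
U, s, VT = np.linalg.svd(A)

# D^+ is n x m: reciprocals of the singular values on the diagonal, zeros elsewhere
D_plus = np.zeros((n, m))
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)   # assumes every singular value is non-zero

# A^+ = V D^+ U^T
A_plus = VT.T @ D_plus @ U.T
print(A_plus)
print(np.allclose(A_plus, np.linalg.pinv(A)))
```

With more rows than columns and full column rank, A^+ acts as a left inverse, so `A_plus @ A` is the 2 x 2 identity up to rounding.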


# **Rank Factorization**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* Given an m × n matrix A of rank r, a rank decomposition or rank factorization of A is a factorization of A of the form A = C F, where C is an m × r matrix and F is an r × n matrix.

* Every finite-dimensional matrix has a rank decomposition.

* One can also construct a full rank factorization of A by using its singular value decomposition.
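A minimal sketch of the SVD-based construction: keep the first r singular triplets, fold the singular values into the left factor, and use the corresponding rows of V^T as the right factor. The example matrix (rank 2) and the use of `numpy.linalg.matrix_rank` to determine r are choices made for this illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])   # rank 2

U, s, VT = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

C = U[:, :r] * s[:r]   # m x r: first r left-singular vectors, scaled by singular values
F = VT[:r, :]          # r x n: first r right-singular vectors (as rows)

print(C.shape, F.shape)
print(np.allclose(C @ F, A))
```

Rank factorizations are not unique: for any invertible r x r matrix M, the pair (C M, M⁻¹ F) is another one.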

# **Eigendecomposition**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* Eigendecomposition, sometimes called spectral decomposition, is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way.
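As a short sketch in the style of the earlier sections, the example below decomposes a made-up diagonalizable 2 x 2 matrix with `numpy.linalg.eig` and reconstructs it as V . diag(w) . V^-1, where the columns of V are the eigenvectors and w holds the eigenvalues.

```python
import numpy as np

# Hypothetical diagonalizable matrix (eigenvalues 2 and 5)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# w: eigenvalues, V: matrix whose columns are the eigenvectors
w, V = np.linalg.eig(A)
print(w)
print(V)

# Each column v of V satisfies A v = lambda v
print(np.allclose(A @ V, V * w))

# Reconstruct A from its eigenvalues and eigenvectors
B = V @ np.diag(w) @ np.linalg.inv(V)
print(np.allclose(B, A))
```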

# **Non-negative matrix factorization**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a **group of algorithms** in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. 

* This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. 

* Since the problem is not exactly solvable in general, it is commonly approximated numerically.

![NNMF](https://upload.wikimedia.org/wikipedia/commons/f/f9/NMF.png)

*Illustration of approximate non-negative matrix factorization: the matrix V is represented by the two smaller matrices W and H, which, when multiplied, approximately **reconstruct V**.*
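One common member of this group of algorithms is the multiplicative update rule of Lee and Seung for the Frobenius-norm error. The notebook does not single out a particular algorithm, so the following is only a rough sketch under assumed settings: random non-negative data, rank `r = 2`, 200 iterations, and a small `eps` to avoid division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-negative data matrix V (6 x 5) and target rank r
V = rng.random((6, 5))
r = 2

# Random non-negative starting factors
W = rng.random((6, r))
H = rng.random((r, 5))
eps = 1e-10   # guards against division by zero

err_start = np.linalg.norm(V - W @ H)
for _ in range(200):
    # Multiplicative updates keep W and H non-negative by construction
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
err_end = np.linalg.norm(V - W @ H)

print(err_start, err_end)
print(W.min() >= 0, H.min() >= 0)
```

The product W . H only approximates V, which matches the point above that the problem is generally solved numerically rather than exactly.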