## Singular-Value Decomposition 

The Singular-Value Decomposition, or SVD for short, is a matrix decomposition method that reduces a matrix to its constituent parts in order to make certain subsequent matrix calculations simpler.

Every matrix has an SVD, whereas an eigendecomposition exists only for some square matrices. This makes the SVD more widely applicable and more stable than the eigendecomposition, and it is used in a wide range of applications including compression, denoising, and data reduction.
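
For an m x n matrix A, the decomposition can be written as A = U · Sigma · V^T, where U (m x m) and V (n x n) are orthogonal and Sigma holds the singular values on its diagonal. The following is a minimal sketch that checks these properties with `numpy.linalg.svd`; the matrix values and the variable names `u_orthogonal`, `v_orthogonal`, and `s_sorted` are chosen only for illustration.

```python
import numpy as np

# Illustrative 3 x 2 matrix (values chosen for this sketch)
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Full SVD: U is m x m, s holds min(m, n) singular values, VT is n x n
U, s, VT = np.linalg.svd(A)

# U and VT are orthogonal, so their transposes are their inverses
u_orthogonal = np.allclose(U.T @ U, np.eye(3))
v_orthogonal = np.allclose(VT @ VT.T, np.eye(2))

# Singular values are non-negative and sorted in descending order
s_sorted = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))

print(U.shape, s.shape, VT.shape)
print(u_orthogonal, v_orthogonal, s_sorted)
```

Note that the third return value is V already transposed, which is why it is named `VT` in this sketch.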

In [1]:
# Singular-value decomposition
from numpy import array
from scipy.linalg import svd

A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# factorize; note that svd returns V already transposed (V^T)
U, s, V = svd(A)
print(U)
print(s)
print(V)

[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
[9.52551809 0.51430058]
[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]


In [5]:
# Reconstruct rectangular matrix from svd
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd

A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# factorize
U, s, V = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(V))
print(B) # [[1. 2.]
         #  [3. 4.]
         #  [5. 6.]]

[[1. 2.]
 [3. 4.]
 [5. 6.]]
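
The zero-padded Sigma matrix is only needed because the full SVD returns a square U. As an alternative sketch, passing `full_matrices=False` to `numpy.linalg.svd` returns the reduced factors, so the reconstruction needs no padding. The matrix is the same one used above; scaling the columns of U by `s` is used here in place of building `diag(s)`.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is m x k, s has k values, VT is k x n, with k = min(m, n)
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Scaling column j of U by s[j] is equivalent to U @ diag(s)
B = (U * s) @ VT

print(U.shape, VT.shape)
print(np.allclose(A, B))
```

The reduced form stores less data for tall or wide matrices, which matters more as the matrix grows.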


In [6]:
# Reconstruct square matrix from svd
from numpy import array
from numpy import diag
from scipy.linalg import svd

A = array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# factorize
U, s, V = svd(A)
# create n x n Sigma matrix
Sigma = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(V))
print(B) # [[1. 2. 3.]
         #  [4. 5. 6.]
         #  [7. 8. 9.]]

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


## Pseudoinverse
The pseudoinverse generalizes the matrix inverse from square matrices to rectangular matrices, where the number of rows and columns are not equal. It is also called the Moore-Penrose inverse, after two independent discoverers of the method, or the generalized inverse.

In [7]:
# Pseudoinverse
from numpy import array
from numpy.linalg import pinv

A = array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]    
]) # 4 x 2 matrix

# calculate pseudoinverse
B = pinv(A)
print(B) # 2 x 4 matrix

[[-1.00000000e+01 -5.00000000e+00  1.28785871e-14  5.00000000e+00]
 [ 8.50000000e+00  4.50000000e+00  5.00000000e-01 -3.50000000e+00]]
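
One way to sanity-check a pseudoinverse is to test the four Moore-Penrose conditions that define it. The sketch below does this for the same 4 x 2 matrix using `numpy.linalg.pinv`; the dictionary name `conditions` and its labels are illustrative only.

```python
import numpy as np

A = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
A_pinv = np.linalg.pinv(A)

# The four Moore-Penrose conditions, checked up to floating-point tolerance
conditions = {
    "A A+ A = A": np.allclose(A @ A_pinv @ A, A),
    "A+ A A+ = A+": np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    "A A+ is symmetric": np.allclose((A @ A_pinv).T, A @ A_pinv),
    "A+ A is symmetric": np.allclose((A_pinv @ A).T, A_pinv @ A),
}
for name, ok in conditions.items():
    print(name, ok)
```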


In [8]:
# Pseudoinverse via svd
from numpy import array
from numpy import zeros
from numpy import diag
from numpy.linalg import svd

A = array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]    
]) # 4 x 2 matrix

# factorize (svd returns V already transposed)
U, s, V = svd(A)
# reciprocals of s (assumes no singular value is zero)
d = 1.0 / s
# create m x n D matrix
D = zeros(A.shape)
# populate D with n x n diagonal matrix
D[:A.shape[1], :A.shape[1]] = diag(d)
# calculate pseudoinverse: V . D^T . U^T
B = V.T.dot(D.T).dot(U.T)
print(B)

[[-1.00000000e+01 -5.00000000e+00  1.28508315e-14  5.00000000e+00]
 [ 8.50000000e+00  4.50000000e+00  5.00000000e-01 -3.50000000e+00]]
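
Taking the reciprocal of every singular value breaks down when a singular value is zero or nearly zero, as happens for rank-deficient matrices. The following is a hedged sketch of one common workaround: invert only the singular values above a relative cutoff and leave the rest at zero. The helper name `pinv_via_svd` and the default `rtol=1e-10` are assumptions made for this example, not values taken from any library.

```python
import numpy as np

def pinv_via_svd(A, rtol=1e-10):
    """Hypothetical helper: pseudoinverse that skips near-zero singular values."""
    U, s, VT = np.linalg.svd(A, full_matrices=False)
    # Assumed cutoff: a fraction of the largest singular value
    cutoff = rtol * s.max()
    keep = s > cutoff
    # Invert only the singular values that pass the cutoff
    d = np.zeros_like(s)
    d[keep] = 1.0 / s[keep]
    # V . diag(d) . U^T, written with column scaling instead of diag()
    return (VT.T * d) @ U.T

# Full-rank case: agrees with numpy's pinv
A = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
matches_numpy = np.allclose(pinv_via_svd(A), np.linalg.pinv(A))

# Rank-deficient case: the result still satisfies S S+ S = S
S = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
S_pinv = pinv_via_svd(S)
recovers_S = np.allclose(S @ S_pinv @ S, S)

print(matches_numpy, recovers_S)
```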


## Dimensionality Reduction
A popular application of SVD is dimensionality reduction. Data with a large number of features, such as more features (columns) than observations (rows), may be reduced to a smaller subset of features that are most relevant to the prediction problem.

In [10]:
# data reduction with svd
from numpy import array
from numpy import diag
from numpy import zeros
from scipy.linalg import svd

A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
])

# factorize
U, s, V = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))

# populate Sigma with m x m diagonal matrix (m < n here)
Sigma[:A.shape[0], :A.shape[0]] = diag(s)
# select the top n_elements singular values and vectors
n_elements = 2
Sigma = Sigma[:, :n_elements]
V = V[:n_elements, :]
# reconstruct (exact here because A has rank 2)
B = U.dot(Sigma.dot(V))
print(B)
# transform: both expressions give the same reduced data
T = U.dot(Sigma)
print(T)
T = A.dot(V.T)
print(T)

[[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
 [11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]
 [21. 22. 23. 24. 25. 26. 27. 28. 29. 30.]]
[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]
[[-18.52157747   6.47697214]
 [-49.81310011   1.91182038]
 [-81.10462276  -2.65333138]]
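
The value `n_elements = 2` was chosen by hand. One possible way to pick it from the data, sketched below, is to keep the smallest number of components whose squared singular values account for a chosen fraction of the total. The `threshold` of 0.999 and the names `energy` and `k` are assumptions made for this illustration.

```python
import numpy as np

# Same values as the 3 x 10 matrix used above
A = np.arange(1, 31, dtype=float).reshape(3, 10)

# Singular values only; the vectors are not needed for this check
s = np.linalg.svd(A, compute_uv=False)

# Cumulative fraction of the total squared singular values
energy = np.cumsum(s**2) / np.sum(s**2)

# Smallest k whose cumulative fraction reaches the (assumed) threshold
threshold = 0.999
k = int(np.searchsorted(energy, threshold) + 1)

print(energy)
print(k)
```

For this matrix the rule selects two components, which agrees with the rank-2 structure seen in the exact reconstruction.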


In [13]:
# svd data reduction with scikit-learn
from numpy import array
from sklearn.decomposition import TruncatedSVD

A = array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
])

# create transform
svd = TruncatedSVD(n_components=2)
# fit transform
svd.fit(A)
# apply transform
result = svd.transform(A)
print(result)

[[18.52157747  6.47697214]
 [49.81310011  1.91182038]
 [81.10462276 -2.65333138]]
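
The first column here has the opposite sign from the manual result earlier. Singular vectors are only determined up to sign: flipping a left singular vector together with its matching right singular vector leaves the decomposition unchanged. The NumPy-only sketch below illustrates this; the names `U_flipped`, `VT_flipped`, and `T_flipped` are illustrative.

```python
import numpy as np

A = np.arange(1, 31, dtype=float).reshape(3, 10)
U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 2
# Project onto the first k right singular vectors
T = A @ VT[:k].T

# Flip the sign of the first singular vector pair
U_flipped, VT_flipped = U.copy(), VT.copy()
U_flipped[:, 0] *= -1
VT_flipped[0, :] *= -1
T_flipped = A @ VT_flipped[:k].T

# The reconstruction is identical, and only the first column changes sign
same_reconstruction = np.allclose((U * s) @ VT, (U_flipped * s) @ VT_flipped)
first_column_negated = np.allclose(T[:, 0], -T_flipped[:, 0])
second_column_same = np.allclose(T[:, 1], T_flipped[:, 1])

print(same_reconstruction, first_column_negated, second_column_same)
```

Both sets of reduced features carry the same information, so a sign difference between implementations is not an error.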
