## Matrix Factorization
## SVD

In [1]:
import numpy as np

# Original matrix
A = np.array([[1, 0, 0, 0, 2],
              [0, 0, 3, 0, 0],
              [4, 0, 0, 5, 0],
              [0, 6, 0, 0, 0]])

# Perform SVD
U, S, VT = np.linalg.svd(A, full_matrices=False)

# Reconstruct the matrix
S = np.diag(S)
A_approx = np.dot(U, np.dot(S, VT))

print("Original Matrix:\n", A)
print("Reconstructed Matrix:\n", A_approx)


Original Matrix:
 [[1 0 0 0 2]
 [0 0 3 0 0]
 [4 0 0 5 0]
 [0 6 0 0 0]]
Reconstructed Matrix:
 [[ 1.00000000e+00  0.00000000e+00  0.00000000e+00  4.84887635e-16
   2.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  3.00000000e+00  0.00000000e+00
   0.00000000e+00]
 [ 4.00000000e+00  0.00000000e+00  0.00000000e+00  5.00000000e+00
  -6.05606821e-16]
 [-1.38683742e-15  6.00000000e+00  0.00000000e+00 -8.34665832e-16
   6.93418708e-16]]


In [4]:
np.round(A_approx,3)

array([[ 1.,  0.,  0.,  0.,  2.],
       [ 0.,  0.,  3.,  0.,  0.],
       [ 4.,  0.,  0.,  5., -0.],
       [-0.,  6.,  0., -0.,  0.]])

In [5]:
U, S , VT

(array([[ 0.10911677,  0.        ,  0.        , -0.99402894],
        [ 0.        ,  0.        ,  1.        ,  0.        ],
        [ 0.99402894,  0.        ,  0.        ,  0.10911677],
        [ 0.        ,  1.        ,  0.        ,  0.        ]]),
 array([[6.43732001, 0.        , 0.        , 0.        ],
        [0.        , 6.        , 0.        , 0.        ],
        [0.        , 0.        , 3.        , 0.        ],
        [0.        , 0.        , 0.        , 2.13562897]]),
 array([[ 6.34616971e-01,  0.00000000e+00,  0.00000000e+00,
          7.72082898e-01,  3.39013042e-02],
        [-2.31139569e-16,  1.00000000e+00,  0.00000000e+00,
         -1.39110972e-16,  1.15569785e-16],
        [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00,
          0.00000000e+00,  0.00000000e+00],
        [-2.61076179e-01,  0.00000000e+00,  0.00000000e+00,
          2.55467531e-01, -9.30900408e-01]]))

In [6]:
U.shape , S.shape , VT.shape

((4, 4), (4, 4), (4, 5))
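
The properties behind the outputs above can be checked directly. The following is a minimal sketch, not part of the original notebook: it recomputes the SVD of the same matrix (the name `s` for the 1-D array of singular values is our own choice) and tests the reconstruction, the orthonormality of the factors, and the ordering of the singular values.

```python
import numpy as np

A = np.array([[1, 0, 0, 0, 2],
              [0, 0, 3, 0, 0],
              [4, 0, 0, 5, 0],
              [0, 6, 0, 0, 0]], dtype=float)

# With full_matrices=False, U is (4, 4), s has length 4, VT is (4, 5)
U, s, VT = np.linalg.svd(A, full_matrices=False)

# The product U @ diag(s) @ VT matches A up to floating-point error
reconstruction_ok = np.allclose(A, U @ np.diag(s) @ VT)

# Columns of U and rows of VT are orthonormal
U_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]))
VT_orthonormal = np.allclose(VT @ VT.T, np.eye(VT.shape[0]))

# Singular values are returned in descending order
sorted_desc = np.all(np.diff(s) <= 0)

print(reconstruction_ok, U_orthonormal, VT_orthonormal, sorted_desc)
# prints: True True True True
```

Using `np.allclose` instead of `==` is the usual way to compare here, since entries such as `4.84887635e-16` in the reconstructed matrix are rounding noise rather than real values.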

1. Recommendation Systems
Example: Predicting ratings in a user-movie rating matrix. Rows represent users, columns represent movies, and each value is the rating a user gave to a movie. Many entries are missing.

Matrix Factorization:

ùëÖ‚âàùëÉùëÑùëá
R‚âàPQ T
 

Where:

ùëÖ R is the user-movie rating matrix.
ùëÉ P is the user-feature matrix (latent factors for users).
ùëÑ Q is the movie-feature matrix (latent factors for movies).

In [7]:
import numpy as np
from sklearn.decomposition import NMF

# User-movie rating matrix (0 indicates missing rating).
# Note: NMF has no notion of missing data, so it fits these zeros as real ratings of 0.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]])

model = NMF(n_components=2, init='random', random_state=0)
P = model.fit_transform(R)
Q = model.components_

# Predicted ratings
R_pred = np.dot(P, Q)

print("Original Ratings:\n", R)
print("Predicted Ratings:\n", np.round(R_pred, 2))


Original Ratings:
 [[5 3 0 1]
 [4 0 0 1]
 [1 1 0 5]
 [1 0 0 4]
 [0 1 5 4]]
Predicted Ratings:
 [[5.26 1.99 0.   1.46]
 [3.5  1.33 0.   0.97]
 [1.31 0.94 1.95 3.95]
 [0.98 0.72 1.53 3.08]
 [0.   0.65 2.84 5.22]]
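
Because scikit-learn's `NMF` fits the zeros as if they were real ratings, the predictions for unrated movies are pulled toward 0. A common alternative is to fit $R \approx P Q^T$ on the observed entries only. The following is a minimal sketch of that idea using plain gradient descent. It is not the notebook's method, and the number of factors, learning rate, regularization strength, and iteration count are all assumed values chosen for illustration.

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0  # True where a rating was actually observed

rng = np.random.default_rng(0)
n_factors = 2          # assumed number of latent factors
lr, reg = 0.01, 0.02   # assumed learning rate and L2 penalty
P = rng.random((R.shape[0], n_factors))  # user factors
Q = rng.random((R.shape[1], n_factors))  # movie factors

def observed_rmse(P, Q):
    """Root-mean-square error over the observed ratings only."""
    err = (R - P @ Q.T)[mask]
    return np.sqrt(np.mean(err ** 2))

rmse_before = observed_rmse(P, Q)
for _ in range(5000):
    # Residuals on observed entries; missing entries contribute nothing
    E = np.where(mask, R - P @ Q.T, 0.0)
    P_step = E @ Q - reg * P
    Q_step = E.T @ P - reg * Q
    P += lr * P_step
    Q += lr * Q_step
rmse_after = observed_rmse(P, Q)

R_pred = P @ Q.T
print("Observed-entry RMSE:", round(rmse_before, 3), "->", round(rmse_after, 3))
print(np.round(R_pred, 2))
```

The entries of `R_pred` at positions where `mask` is `False` are the model's guesses for unseen ratings. With only five users and four movies they should be read as an illustration, not as meaningful predictions.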


2. Data Compression
Example: Reducing the dimensionality of data while preserving important information.

Matrix Factorization:

$A \approx U \Sigma V^T$

Where:

$A$ is the original data matrix.
$U$, $\Sigma$, and $V^T$ are the decomposed matrices from Singular Value Decomposition (SVD).

In [8]:
import numpy as np

# Original matrix
A = np.random.random((100, 100))

# Perform SVD
U, S, VT = np.linalg.svd(A, full_matrices=False)

# Retain only the top k singular values
k = 10
U_k = U[:, :k]
S_k = np.diag(S[:k])
VT_k = VT[:k, :]

# Reconstructed matrix
A_approx = np.dot(U_k, np.dot(S_k, VT_k))

print("Original Shape:", A.shape)
print("Compressed Shape:", U_k.shape, S_k.shape, VT_k.shape)


Original Shape: (100, 100)
Compressed Shape: (100, 10) (10, 10) (10, 100)
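
The shapes alone do not say how much information the truncation loses. The sketch below (our addition, with a fixed seed so the run is repeatable) measures the Frobenius reconstruction error and compares it with the value predicted by the Eckart-Young theorem, which states that the error of the best rank-k approximation equals the root of the sum of the squared discarded singular values. It also counts how many numbers need to be stored.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 100))
U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

# Frobenius error of the rank-k approximation
err = np.linalg.norm(A - A_k, "fro")
# Eckart-Young: the same error, computed from the discarded singular values
err_from_s = np.sqrt(np.sum(s[k:] ** 2))

# Numbers stored: the full matrix vs. U_k, the k singular values, and VT_k
stored_full = A.size
stored_k = U[:, :k].size + k + VT[:k, :].size

print(f"relative error: {err / np.linalg.norm(A, 'fro'):.3f}")
print(f"storage ratio:  {stored_k / stored_full:.3f}")
```

A matrix of independent random numbers has little low-rank structure, so the relative error here is substantial. Real data with correlated rows or columns usually compresses far better at the same k.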


3. Image Processing
Example: Compressing an image using Singular Value Decomposition (SVD).

Matrix Factorization:

$A \approx U \Sigma V^T$

Where:

$A$ is the grayscale image matrix.
$U$, $\Sigma$, and $V^T$ are the decomposed matrices from SVD.

In [10]:
# import numpy as np
# import cv2
# import matplotlib.pyplot as plt

# # Load image and convert to grayscale
# image = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)

# # Perform SVD
# U, S, VT = np.linalg.svd(image, full_matrices=False)

# # Retain only the top k singular values
# k = 50
# U_k = U[:, :k]
# S_k = np.diag(S[:k])
# VT_k = VT[:k, :]

# # Reconstructed image
# image_approx = np.dot(U_k, np.dot(S_k, VT_k))

# # Display original and compressed image
# plt.figure(figsize=(10, 5))
# plt.subplot(1, 2, 1)
# plt.title('Original Image')
# plt.imshow(image, cmap='gray')

# plt.subplot(1, 2, 2)
# plt.title('Compressed Image')
# plt.imshow(image_approx, cmap='gray')
# plt.show()
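
The cell above needs OpenCV and a file named `example.jpg`. As a sketch that runs without either, the same steps can be applied to a synthetic image built in NumPy. The image itself and the 99.9% energy threshold used to pick k are assumptions made for this illustration; they are not taken from the original notebook.

```python
import numpy as np

# Synthetic 64x64 grayscale "image": a smooth gradient plus a bright square
x = np.linspace(0, 255, 64)
image = np.add.outer(x, x) / 2
image[16:32, 16:32] = 255.0

U, s, VT = np.linalg.svd(image, full_matrices=False)

# Smallest k whose singular values hold at least 99.9% of the squared energy
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(energy, 0.999) + 1)

# Rank-k reconstruction, rounded and clipped back to valid 8-bit pixel values
image_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]
image_k = np.clip(np.rint(image_k), 0, 255).astype(np.uint8)

print("k =", k, "of", len(s))
print("max pixel error:", np.abs(image - image_k).max())
```

Choosing k from the cumulative energy, rather than fixing it at 50, adapts the amount of compression to the image: simple images need only a few components, detailed ones need more.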


4. Topic Modeling
Example: Extracting topics from a collection of documents.

Matrix Factorization:

$X \approx WH$

Where:

$X$ is the document-term matrix (rows are documents, columns are terms), which is the layout `TfidfVectorizer` produces.
$W$ is the document-topic matrix.
$H$ is the topic-term matrix.

In [11]:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Sample documents
documents = [
    "I love reading about science and technology.",
    "Mathematics is a fundamental part of science.",
    "The future of technology is bright.",
    "I enjoy reading books on mathematics and science.",
    "Science and technology are interconnected."
]

# TF-IDF Vectorization
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Perform NMF
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

# Display topics
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(H):
    print(f"Topic {i}:")
    # Indices of the five highest-weighted terms, largest first
    top_terms = topic.argsort()[:-6:-1]
    print(" ".join(terms[j] for j in top_terms))


Topic 0:
mathematics science fundamental enjoy books
Topic 1:
technology interconnected science future bright
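
To show what a factorization $X \approx WH$ with nonnegative factors involves, here is a minimal NumPy sketch of the classic multiplicative update rules for NMF. It is not scikit-learn's default solver (which uses coordinate descent), and the small count matrix is hypothetical data invented for the example. Documents are in rows, as in the scikit-learn code above.

```python
import numpy as np

# Hypothetical document-term counts: 4 documents x 5 terms
X = np.array([[3, 2, 0, 0, 1],
              [2, 3, 0, 0, 0],
              [0, 0, 4, 2, 1],
              [0, 1, 3, 3, 0]], dtype=float)

rng = np.random.default_rng(0)
n_topics = 2
W = rng.random((X.shape[0], n_topics))  # document-topic weights
H = rng.random((n_topics, X.shape[1]))  # topic-term weights
eps = 1e-10                             # guards against division by zero

err_before = np.linalg.norm(X - W @ H)
for _ in range(500):
    # Multiplicative updates keep every entry of W and H nonnegative
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
err_after = np.linalg.norm(X - W @ H)

print("reconstruction error:", round(err_before, 3), "->", round(err_after, 3))
print("dominant topic per document:", W.argmax(axis=1))
```

Each row of `W` says how strongly a document expresses each topic, so `W.argmax(axis=1)` gives a simple hard assignment of documents to topics, complementing the per-topic term lists read from `H`.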
