# Practical No. 4: Implement the SVD algorithm and analyze it

### Singular Value Decomposition (SVD)

SVD is a matrix factorization technique that decomposes a matrix A into three matrices: U, Σ, and V^T. This decomposition is useful for various applications, including dimensionality reduction and feature extraction.

### Approach:

**Decomposition:** Decompose the m × n matrix A as A = UΣV^T, where U (m × m) and V (n × n) are orthogonal matrices and Σ is an m × n diagonal matrix whose diagonal entries are the non-negative singular values, conventionally sorted in descending order.

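**Dimensionality Reduction:** Reduce the dimensionality of the matrix by keeping only the k largest singular values and the corresponding columns of U and V. The resulting product is the best rank-k approximation of A in the least-squares (Frobenius norm) sense.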

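**Feature Extraction:** Use the truncated factors as latent features: each row of the reduced U (scaled by the singular values) describes a row of A, and each row of the reduced V describes a column of A.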

By reducing the dimensionality of the data, SVD can improve the performance of various machine learning algorithms, including recommender systems.
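The three steps above can be sketched with NumPy's built-in `np.linalg.svd`. This is only an illustration under stated assumptions: the 4 × 3 matrix and the choice `k = 2` are made up for the example, and the names `user_features` and `item_features` are hypothetical.

```python
import numpy as np

# Illustrative 4 x 3 matrix (values are made up for this example)
A = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
])

# Step 1: decomposition. full_matrices=False gives the compact form;
# singular values in s come back sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Step 2: dimensionality reduction. Keep the k largest singular values.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_k = U_k @ np.diag(s_k) @ Vt_k

# The Frobenius error of the rank-k approximation equals the root of the
# sum of squares of the discarded singular values.
err = np.linalg.norm(A - A_k)
print(err, np.sqrt(np.sum(s[k:] ** 2)))

# Step 3: feature extraction. One k-dimensional vector per row / column of A.
user_features = U_k * s_k   # shape (4, k)
item_features = Vt_k.T      # shape (3, k)
print(user_features.shape, item_features.shape)
```

The product `user_features @ item_features.T` gives back `A_k`, which is why the two factor matrices can stand in for the original data.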




In [None]:
def transpose(matrix):
    return [[matrix[j][i] for j in range(len(matrix))] for i in range(len(matrix[0]))]

In [None]:
def multiplyMatrices(A, B):
    result = [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))] for i in range(len(A))]
    return result

In [None]:
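def eigSymmetric(matrix):
    # NOTE: placeholder, not a general eigen-solver. It returns the diagonal
    # entries as eigenvalues and the identity matrix as eigenvectors, which is
    # only correct when `matrix` is already diagonal.
    n = len(matrix)
    eigenvalues = [matrix[i][i] for i in range(n)]
    eigenvectors = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    return eigenvalues, eigenvectors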

In [None]:
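def svd(A):
    AT = transpose(A)
    # Gram matrices: A^T.A (n x n) gives V, A.A^T (m x m) gives U
    ATA = multiplyMatrices(AT, A)
    AAT = multiplyMatrices(A, AT)

    eigenvalues_V, V = eigSymmetric(ATA)
    eigenvalues_U, U = eigSymmetric(AAT)

    # Sigma is m x n (rows match U, columns match V). Singular values are the
    # square roots of the eigenvalues, clamped at 0 to guard against round-off.
    Sigma = [[(max(eigenvalues_U[i], 0) ** 0.5 if i == j else 0) for j in range(len(V))] for i in range(len(U))]

    # V is returned as-is (not transposed); transpose it when reconstructing A
    return U, Sigma, V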

In [None]:
A = [[1, 1], [7, 7]]

In [None]:
U, Sigma, V = svd(A)

print("Left Singular Vectors (U):", U)
print("Singular Values (Sigma):", Sigma)
print("Right Singular Vectors (V):", V)

Left Singular Vectors (U): [[1, 0], [0, 1]]
Singular Values (Sigma): [[1.4142135623730951, 0], [0, 9.899494936611665]]
Right Singular Vectors (V): [[1, 0], [0, 1]]


In [None]:
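import numpy as np

# Reconstruct U . Sigma . V^T (svd() returns V, so transpose it here)
ans = np.dot(U, Sigma)
ans = np.dot(ans, np.transpose(V))
print("SVD of A: ", ans)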

SVD of A:  [[1.41421356 0.        ]
 [0.         9.89949494]]
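The product printed above is a diagonal matrix rather than the original A = [[1, 1], [7, 7]]. This happens because `eigSymmetric` returns the diagonal entries and the identity matrix, which is exact only for diagonal input. A minimal sketch of one possible alternative is shown below, assuming NumPy is acceptable here: it uses `np.linalg.eigh` for the eigen-decomposition of A^T A and derives U from V so that the paired singular vectors have consistent signs. The helper name `svd_via_eigh` and the `rel_tol` parameter are hypothetical.

```python
import numpy as np

def svd_via_eigh(A, rel_tol=1e-10):
    """Compact SVD built from the eigen-decomposition of A^T A (illustrative)."""
    A = np.asarray(A, dtype=float)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending
    eigvals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]            # reorder to largest first
    eigvals, V = eigvals[order], V[:, order]
    # Keep only eigenvalues that are clearly non-zero (numerical rank)
    keep = eigvals > rel_tol * eigvals[0]
    s = np.sqrt(eigvals[keep])
    V = V[:, keep]
    # Derive U from V instead of a second eigen-solve: u_i = A v_i / s_i
    U = (A @ V) / s
    return U, s, V.T

A = np.array([[1, 1], [7, 7]])
U, s, Vt = svd_via_eigh(A)
print("Singular values:", s)
print("Reconstruction:\n", np.round(U @ np.diag(s) @ Vt, 6))
```

Because this matrix has rank 1, only one singular value is kept, and the reconstruction matches A up to floating-point error.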


In [4]:
import numpy as np
import pandas as pd

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
movies = pd.read_csv('/content/drive/MyDrive/Recommendation System/movies.csv')
ratings = pd.read_csv('/content/drive/MyDrive/Recommendation System/ratings.csv')

In [6]:
movies.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [7]:
ratings.head()

,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [8]:
n_users = ratings.userId.unique().shape[0]
n_movies = ratings.movieId.unique().shape[0]
print('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_movies))

Number of users = 610 | Number of movies = 9724
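The user and movie counts determine the size of the user-item matrix, and comparing them with the number of ratings shows how sparse that matrix is. The sketch below illustrates the calculation on a made-up table; `ratings_toy` and its values are hypothetical and are not taken from `ratings.csv`.

```python
import pandas as pd

# Hypothetical miniature ratings table with the same columns as ratings.csv
ratings_toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 20, 30, 40],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5, 5.0],
})

n_users = ratings_toy["userId"].nunique()
n_movies = ratings_toy["movieId"].nunique()

# Density = observed ratings / all possible (user, movie) pairs
density = len(ratings_toy) / (n_users * n_movies)
print(f"{n_users} users x {n_movies} movies, density = {density:.2%}")
```

Running the same three lines on the real `ratings` DataFrame would show what fraction of the matrix is actually observed.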


In [9]:
raw_ratings_pivot = ratings.pivot(index = 'userId', columns ='movieId', values = 'rating')

In [14]:
ratings_pivot = raw_ratings_pivot.copy().fillna(0)
ratings_pivot.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId
1,4.0,0.0,4.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
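The pivot step turns the long ratings table into a user-by-movie matrix, and `fillna(0)` then replaces every missing rating with 0. The sketch below repeats both steps on a hypothetical toy table (`ratings_toy`, values made up) and keeps a mask of which entries were really rated, since that information is lost after filling.

```python
import pandas as pd

# Hypothetical miniature ratings table
ratings_toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 20, 30, 40],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5, 5.0],
})

# pivot needs each (userId, movieId) pair to appear at most once
raw = ratings_toy.pivot(index="userId", columns="movieId", values="rating")

# True where a rating exists; record this before filling the gaps
rated_mask = raw.notna()

filled = raw.fillna(0)
print(raw)
print(filled)
```

After `fillna(0)`, an unrated movie is indistinguishable from a movie rated 0, so any later step that needs to tell them apart has to use a mask such as `rated_mask`.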


In [13]:
user_ratings_mean = np.mean(ratings_pivot.values, axis=1)
user_ratings_demeaned = ratings_pivot.values - user_ratings_mean.reshape(-1, 1)
user_ratings_demeaned

array([[ 3.89582476, -0.10417524,  3.89582476, ..., -0.10417524,
        -0.10417524, -0.10417524],
       [-0.01177499, -0.01177499, -0.01177499, ..., -0.01177499,
        -0.01177499, -0.01177499],
       [-0.00976964, -0.00976964, -0.00976964, ..., -0.00976964,
        -0.00976964, -0.00976964],
       ...,
       [ 2.23215755,  1.73215755,  1.73215755, ..., -0.26784245,
        -0.26784245, -0.26784245],
       [ 2.98755656, -0.01244344, -0.01244344, ..., -0.01244344,
        -0.01244344, -0.01244344],
       [ 4.50611888, -0.49388112, -0.49388112, ..., -0.49388112,
        -0.49388112, -0.49388112]])
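Because the per-user mean above is taken over every movie, including the zeros that stand for unrated ones, the means come out very small (about 0.10 for the first user). A hedged alternative is to average only over the movies a user actually rated. The sketch below compares the two on a made-up matrix; `R` and its values are hypothetical.

```python
import numpy as np

# Hypothetical ratings matrix; NaN marks an unrated movie
R = np.array([
    [4.0,    np.nan, 5.0,    np.nan],
    [np.nan, 3.0,    np.nan, np.nan],
    [2.0,    4.0,    np.nan, 5.0],
])
rated = ~np.isnan(R)

# Mean over rated entries only
mean_rated = np.nanmean(R, axis=1)

# Mean after zero-filling, as done in the cell above
mean_with_zeros = np.nan_to_num(R, nan=0.0).mean(axis=1)

# De-mean only the rated entries; leave unrated ones at 0 (the user's average)
demeaned = np.where(rated, R - mean_rated[:, None], 0.0)

print(mean_rated)
print(mean_with_zeros)
print(demeaned)
```

With this variant an unrated movie is treated as "average for this user" rather than as a rating of 0, which is one common way to reduce the pull toward zero.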

In [15]:
from scipy.sparse.linalg import svds

In [16]:
U, Sigma, VT = svds(user_ratings_demeaned, k=10)

In [17]:
sigma = np.diag(Sigma)
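To make the shapes returned by `svds` concrete, the sketch below runs it on a small random matrix (illustrative data, not the ratings) and compares the result with the full decomposition from `np.linalg.svd`. It sorts the singular values explicitly rather than relying on the order in which `svds` returns them.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 6))      # illustrative data

k = 3                            # must be smaller than min(M.shape)
U, s, Vt = svds(M, k=k)

# Sort into descending order, permuting the vectors to match
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

print(U.shape, s.shape, Vt.shape)

# s is a 1-D array, so np.diag turns it into the k x k matrix Sigma
M_k = U @ np.diag(s) @ Vt

# The k values should match the k largest singular values of the full SVD
s_full = np.linalg.svd(M, compute_uv=False)
print(np.allclose(s, s_full[:k]))
```

The reconstruction `M_k` has the same shape as `M` but rank at most `k`, which is the property the prediction step relies on.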

In [20]:
all_user_predicted_ratings = np.dot(np.dot(U, sigma), VT) + user_ratings_mean.reshape(-1, 1)
all_user_predicted_ratings

array([[ 2.83580926,  0.92840164,  0.96771846, ..., -0.01899251,
        -0.01899251, -0.02889362],
       [ 0.19067575, -0.02629404, -0.02603574, ...,  0.00970949,
         0.00970949,  0.0171218 ],
       [ 0.03335256,  0.00868281,  0.01879297, ...,  0.00855682,
         0.00855682,  0.00662988],
       ...,
       [ 2.83634752,  1.93957034,  1.83140652, ..., -0.05033501,
        -0.05033501, -0.0042988 ],
       [ 0.83865362,  0.64825045,  0.31045898, ...,  0.00741363,
         0.00741363,  0.00822376],
       [ 0.82137275,  2.77356798, -0.29776494, ...,  0.08697474,
         0.08697474,  0.11876479]])

In [19]:
preds = pd.DataFrame(all_user_predicted_ratings, columns = ratings_pivot.columns)
preds.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
0,2.835809,0.928402,0.967718,-0.024039,0.221835,1.724209,0.12674,-0.013477,0.154454,2.017108,...,-0.018993,-0.017314,-0.020671,-0.020671,-0.018993,-0.020671,-0.018993,-0.018993,-0.018993,-0.028894
1,0.190676,-0.026294,-0.026036,0.005521,0.028316,0.088451,-0.061647,0.008842,-0.006453,-0.070653,...,0.009709,0.008904,0.010515,0.010515,0.009709,0.010515,0.009709,0.009709,0.009709,0.017122
2,0.033353,0.008683,0.018793,0.003493,-0.014331,0.075373,-0.015139,0.0041,0.015454,0.065764,...,0.008557,0.008553,0.008561,0.008561,0.008557,0.008561,0.008557,0.008557,0.008557,0.00663
3,1.558919,0.275447,0.271616,0.043859,0.183769,0.273353,0.346929,-0.054245,-0.036465,0.068682,...,-0.017409,-0.01727,-0.017547,-0.017547,-0.017409,-0.017547,-0.017409,-0.017409,-0.017409,-0.026551
4,1.272888,0.991241,0.42005,0.122955,0.535151,0.75333,0.634397,0.11759,0.110667,1.151538,...,-0.003708,-0.003825,-0.003592,-0.003592,-0.003708,-0.003592,-0.003708,-0.003708,-0.003708,-0.004525
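A prediction matrix like the one above is usually turned into recommendations by ranking, for each user, the movies they have not rated yet. The sketch below shows one way to do that on made-up data; `recommend`, `preds_toy`, and `ratings_pivot_toy` are hypothetical names. Note that `preds` above uses a default 0-based row index, whereas passing `index=ratings_pivot.index` when building the DataFrame, as done here, would keep the real `userId` labels.

```python
import numpy as np
import pandas as pd

# Hypothetical observed ratings (0 = not rated) and predicted scores
ratings_pivot_toy = pd.DataFrame(
    [[5.0, 0.0, 0.0, 3.0],
     [0.0, 4.0, 0.0, 0.0]],
    index=pd.Index([1, 2], name="userId"),
    columns=pd.Index([10, 20, 30, 40], name="movieId"),
)
predicted = np.array([[4.8, 3.9, 1.2, 2.7],
                      [2.1, 3.8, 3.5, 0.4]])
preds_toy = pd.DataFrame(predicted,
                         index=ratings_pivot_toy.index,
                         columns=ratings_pivot_toy.columns)

def recommend(user_id, preds, observed, n=2):
    """Return the n highest-scoring movies the user has not rated."""
    unseen = observed.loc[user_id] == 0
    return preds.loc[user_id][unseen].sort_values(ascending=False).head(n)

print(recommend(1, preds_toy, ratings_pivot_toy))
print(recommend(2, preds_toy, ratings_pivot_toy))
```

The returned Series is indexed by `movieId`, so it can be joined with the `movies` table to show titles.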


# Analysis

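Truncated SVD compresses the 610 × 9724 user-item matrix into k = 10 latent factors, so the data can be stored and approximately reconstructed from three much smaller matrices. However, the result is sensitive to noise and to how missing data is handled. In this practical, unrated movies are filled with 0 before the per-user mean is subtracted, so missing entries are treated as very low ratings. This pulls the predictions toward zero: most of the predicted values shown above are close to 0, and some are negative.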
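One way to make the trade-off between compression and accuracy concrete is to check how much of the matrix's squared Frobenius norm the k largest singular values retain. The sketch below does this on synthetic data (a rank-3 matrix plus small noise, made up for the example); the 90% threshold is an arbitrary assumption, not a recommendation from this practical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: rank-3 structure plus a little noise
M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 30))
M += 0.1 * rng.normal(size=M.shape)

# Singular values only, in descending order
s = np.linalg.svd(M, compute_uv=False)

# Fraction of squared Frobenius norm retained by the first k singular values
energy = np.cumsum(s ** 2) / np.sum(s ** 2)

# Smallest k whose retained fraction reaches the threshold
threshold = 0.90
k = int(np.searchsorted(energy, threshold) + 1)
print("k =", k)
print(np.round(energy[:5], 4))
```

Applied to a ratings matrix, a curve like `energy` indicates whether a given k, such as the k = 10 used here, discards a lot of structure or mostly noise.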

# Conclusion

Singular Value Decomposition (SVD) offers a powerful technique for dimensionality reduction in recommender systems. By decomposing the user-item rating matrix into three matrices and keeping only the largest singular values (k = 10 in this practical), SVD captures the latent factors behind user preferences and item characteristics. This gives a compact representation of the data that retains the information needed to predict user ratings and generate personalized recommendations.