# Movie Recommendation Using Singular Value Decomposition

# Purpose:

1. Recommend movies to a user from the available __movie and ratings data__ using unsupervised machine learning.

2. Use __Singular Value Decomposition__ (SVD), an unsupervised technique, to produce those recommendations.

# Steps:

Step 1: Import the dataset

Step 2: Summarize the dataset

Step 3: Create the rating matrix

Step 4: Normalize the rating matrix

Step 5: Compute the SVD (Singular Value Decomposition)

Step 6: Calculate cosine similarity, sort by most similar, and return the top N

Step 7: Select k principal components to represent the movies, choose a movie_id to find recommendations for, and print the top_n results

# Import Libraries

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings 
warnings.filterwarnings('ignore')

# Step1

# Import DataSet

## Importing and parsing the dataset as rating and movie details

In [11]:
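# ratings.dat has no header row and uses '::' as the field separator, which requires the python parsing engine
Rating_data=pd.read_csv("E:\\Data Science\\ratings.dat",names=['user_id','movie_id','rating','time'],engine='python',delimiter='::')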
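
To see what the `::`-delimited parsing does without needing the file on disk, here is a minimal sketch that reads three ratings lines from an in-memory string. The `sample` string and the `sample_ratings` name are illustrative assumptions; the values are the first three rows shown in the summary in Step 2.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the first lines of ratings.dat.
sample = (
    "1::1193::5::978300760\n"
    "1::661::3::978302109\n"
    "1::914::3::978301968\n"
)

# A multi-character separator such as '::' needs the python parsing engine.
sample_ratings = pd.read_csv(
    io.StringIO(sample),
    sep="::",
    engine="python",
    names=["user_id", "movie_id", "rating", "time"],
)
print(sample_ratings)
```

Because `names` is supplied, the first line is treated as data rather than as a header.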

# Step2

# Summarize the Dataset

In [13]:
Rating_data.head()

   user_id  movie_id  rating       time
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291


In [29]:
Rating_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000209 entries, 0 to 1000208
Data columns (total 4 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   user_id   1000209 non-null  int64
 1   movie_id  1000209 non-null  int64
 2   rating    1000209 non-null  int64
 3   time      1000209 non-null  int64
dtypes: int64(4)
memory usage: 30.5 MB


In [30]:
Rating_data.describe()

         user_id   movie_id     rating          time
count  1000209.0  1000209.0  1000209.0     1000209.0
mean    3024.512    1865.54   3.581564   972243700.0
std     1728.413   1096.041   1.117102    12152560.0
min          1.0        1.0        1.0   956703900.0
25%       1506.0     1030.0        3.0   965302600.0
50%       3070.0     1835.0        4.0   973018000.0
75%       4476.0     2770.0        4.0   975220900.0
max       6040.0     3952.0        5.0  1046455000.0


In [15]:
# movies.dat uses the same '::' separator and also has no header row
Movie_data=pd.read_csv(r"E:\Data Science\movies.dat",names=['movie_id','title','genre'],engine='python',delimiter='::')

In [16]:
Movie_data.head()

   movie_id                               title                         genre
0         1                    Toy Story (1995)   Animation|Children's|Comedy
1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
2         3             Grumpier Old Men (1995)                Comedy|Romance
3         4            Waiting to Exhale (1995)                  Comedy|Drama
4         5  Father of the Bride Part II (1995)                        Comedy


In [27]:
Movie_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3883 entries, 0 to 3882
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  3883 non-null   int64 
 1   title     3883 non-null   object
 2   genre     3883 non-null   object
dtypes: int64(1), object(2)
memory usage: 91.1+ KB


In [28]:
Movie_data.describe()

          movie_id
count       3883.0
mean   1986.049446
std    1146.778349
min            1.0
25%          982.5
50%         2010.0
75%         2980.5
max         3952.0


# Step3


# Creating the Rating Matrix of Shape m x u (movies x users)

In [25]:
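# Allocate a zero-filled (movies x users) matrix; np.ndarray(shape=...) would leave the memory uninitialized
rating_matrix=np.zeros(shape=(np.max(Rating_data.movie_id.values),np.max(Rating_data.user_id.values)),dtype=np.uint8)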

In [32]:
rating_matrix[Rating_data.movie_id.values-1,Rating_data.user_id.values-1]=Rating_data.rating.values

In [33]:
print(rating_matrix)

[[5 0 0 ... 0 0 3]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
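
The same construction can be checked by hand on a small case. The sketch below uses a hypothetical `toy_ratings` frame with 3 users and 4 movies (made-up values, not taken from the dataset) and fills a zero-initialized matrix with the same fancy-indexing assignment used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: 3 users, 4 movies, ids starting at 1.
toy_ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3],
    "movie_id": [1, 3, 2, 1, 4],
    "rating":   [5, 3, 4, 2, 1],
})

n_movies = toy_ratings.movie_id.max()
n_users = toy_ratings.user_id.max()

# Start from zeros so that every unrated (movie, user) pair is exactly 0.
toy_matrix = np.zeros((n_movies, n_users), dtype=np.uint8)

# Row = movie_id - 1, column = user_id - 1, value = rating.
toy_matrix[toy_ratings.movie_id.values - 1,
           toy_ratings.user_id.values - 1] = toy_ratings.rating.values
print(toy_matrix)
# [[5 0 2]
#  [0 4 0]
#  [3 0 0]
#  [0 0 1]]
```

Each row is one movie and each column is one user, so a 0 means "no rating recorded" rather than a rating of zero.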


# Step4

# Normalization

In [43]:
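# Subtract each movie's mean rating (row mean; unrated entries count as 0),
# transpose to users x movies, and scale by a constant factor
movie_means=np.mean(rating_matrix,axis=1,keepdims=True)
NormalizeMatrix=(rating_matrix-movie_means).T/np.sqrt(rating_matrix.shape[0]-1)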
print(NormalizeMatrix)

[[ 0.05685934 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]
 [-0.02268632 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]
 [-0.02268632 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]
 ...
 [-0.02268632 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]
 [-0.02268632 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]
 [ 0.02504108 -0.00591061 -0.00379817 ... -0.00052152 -0.0004109
  -0.00386402]]
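
As a sanity check on the normalization step, the sketch below applies the same centering and scaling to a hypothetical 4 x 3 `toy` matrix (made-up values) and confirms that every movie column of the result sums to zero. The constant scale factor only rescales the singular values; it does not change the singular vectors or any cosine similarity computed from them.

```python
import numpy as np

# Hypothetical toy rating matrix: 4 movies (rows) x 3 users (columns).
toy = np.array([[5, 0, 2],
                [0, 4, 0],
                [3, 0, 0],
                [0, 0, 1]], dtype=np.uint8)

# Centre each movie row on its own mean, then transpose to users x movies.
row_means = toy.mean(axis=1, keepdims=True)
normalized = (toy - row_means).T / np.sqrt(toy.shape[0] - 1)

# Same result as the one-line expression used in the notebook.
one_liner = (toy - np.asarray([np.mean(toy, 1)]).T).T / np.sqrt(toy.shape[0] - 1)

print(normalized.shape)                           # (3, 4)
print(np.allclose(normalized, one_liner))         # True
print(np.allclose(normalized.sum(axis=0), 0.0))   # True
```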


# Step5

# Computing SVD

In [42]:
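# np.linalg.svd returns the right singular vectors already transposed: each row of V is one component
U,S,V=np.linalg.svd(NormalizeMatrix)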
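
To make the shapes returned by `np.linalg.svd` concrete, here is a small sketch on a random stand-in matrix (the matrix `A`, its size, and the name `movie_factors` are assumptions for illustration). It uses `full_matrices=False`, which is not what the notebook calls; for a users x movies matrix with more users than movies it avoids building the large square `U` and should leave the movie-side factors unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))  # hypothetical stand-in: 6 users x 4 movies

# Reduced SVD: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, S.shape, Vt.shape)  # (6, 4) (4,) (4, 4)

# The factors reproduce A up to floating-point error.
A_rebuilt = (U * S) @ Vt
print(np.allclose(A, A_rebuilt))   # True

# Rows of Vt are components, so Vt.T has one row per movie.
# Keeping the first k columns gives a k-dimensional vector per movie.
k = 2
movie_factors = Vt.T[:, :k]
print(movie_factors.shape)         # (4, 2)
```

The singular values in `S` come back sorted from largest to smallest, which is why slicing the first `k` columns keeps the strongest components.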

# Step6

# Calculate cosine similarity, sort by most similar and return the top N

In [47]:
def similar(rating_data,movie_id,top_n):
    index=movie_id-1 # movie ids start at 1, row indexes at 0
    movie_row=rating_data[index,:]
    magnitude=np.sqrt(np.einsum('ij,ij->i',rating_data,rating_data)) # Euclidean norm of each row (row-wise sum of squares, then square root)
    similarity=np.dot(movie_row,rating_data.T)/(magnitude[index]*magnitude) # cosine similarity between the chosen movie and every movie
    sort_indexes=np.argsort(-similarity) # row indexes ordered from most to least similar
    return sort_indexes[:top_n]
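
The ranking logic is easy to verify on vectors whose angles are obvious. The sketch below is a self-contained variant (the name `top_similar` and the `vectors` array are illustrative, not from the notebook) that uses `np.linalg.norm` for the row norms, which is equivalent to the `einsum` expression above.

```python
import numpy as np

def top_similar(item_vectors, movie_id, top_n):
    """Return row indexes of the top_n rows most cosine-similar to movie_id."""
    index = movie_id - 1                           # ids are 1-based
    norms = np.linalg.norm(item_vectors, axis=1)   # length of every row
    cosine = item_vectors @ item_vectors[index] / (norms[index] * norms)
    return np.argsort(-cosine)[:top_n]             # most similar first

# Hypothetical 2-D item vectors.
vectors = np.array([[ 1.0, 0.0],    # movie 1
                    [ 2.0, 0.1],    # movie 2: almost the same direction as 1
                    [ 0.0, 1.0],    # movie 3: orthogonal to 1
                    [-1.0, 0.0]])   # movie 4: opposite to 1

print(top_similar(vectors, movie_id=1, top_n=3) + 1)  # [1 2 3]
```

The query movie always ranks first because its cosine similarity with itself is 1, so a caller that wants only other titles can request `top_n + 1` results and drop the first.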


# Step7

# Select k principal components to represent the movies, choose a movie_id to find recommendations for, and print the top_n results

In [51]:
k = 50
movie_id = 6
top_n = 5

sliced = V.T[:, :k] # first k components: one k-dimensional row per movie
indexes = similar(sliced, movie_id, top_n)

print('Recommendations for Movie {0}: \n'.format(
    Movie_data[Movie_data.movie_id == movie_id].title.values[0]))
for similar_id in indexes + 1: # convert row indexes back to 1-based movie ids
    print(Movie_data[Movie_data.movie_id == similar_id].title.values[0])

Recommendations for Movie Heat (1995): 

Heat (1995)
Ronin (1998)
True Romance (1993)
Out of Sight (1998)
Cop Land (1997)
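
The value `k = 50` above is a fixed choice. One common heuristic, shown here only as a sketch and not as the notebook's method, is to pick the smallest `k` whose squared singular values account for a chosen share of the total. The helper `smallest_k`, the 0.9 threshold, and the `S_toy` values are all assumptions for illustration.

```python
import numpy as np

def smallest_k(singular_values, energy=0.9):
    """Smallest k whose leading squared singular values reach `energy` of the total."""
    squared = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(squared) / squared.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(squared))  # guard against rounding at energy close to 1

# Hypothetical singular values, sorted from largest to smallest.
S_toy = np.array([10.0, 5.0, 2.0, 1.0])
print(smallest_k(S_toy, energy=0.9))  # 2
```

With the real decomposition, the same function could be called on the `S` array returned by `np.linalg.svd` to compare its suggestion against `k = 50`.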
