Importing the basic libraries

In [1]:
import numpy as np
import pandas as pd

Importing and parsing the dataset into ratings and movie details

In [7]:
ratingData = pd.read_csv("ratings.dat",
                         names=["user_id", "movie_id", "rating", "time"],
                         engine="python", delimiter="::")

movieData = pd.read_csv("movies.dat",
                        names=["movie_id", "title", "genre"],
                        engine="python",
                        delimiter="::",
                        encoding="latin-1")
print(ratingData)

         user_id  movie_id  rating       time
0              1      1193       5  978300760
1              1       661       3  978302109
2              1       914       3  978301968
3              1      3408       4  978300275
4              1      2355       5  978824291
...          ...       ...     ...        ...
1000204     6040      1091       1  956716541
1000205     6040      1094       5  956704887
1000206     6040       562       5  956704746
1000207     6040      1096       4  956715648
1000208     6040      1097       4  956715569

[1000209 rows x 4 columns]
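
The files use a two-character `::` separator, which is why the Python parsing engine is requested. A minimal sketch of the same call, assuming an in-memory sample built from the first three rows printed above (the names `sample` and `sample_ratings` are only for illustration):

```python
import io
import pandas as pd

# In-memory stand-in for ratings.dat, using the first three rows shown above
sample = io.StringIO(
    "1::1193::5::978300760\n"
    "1::661::3::978302109\n"
    "1::914::3::978301968\n"
)

sample_ratings = pd.read_csv(
    sample,
    names=["user_id", "movie_id", "rating", "time"],  # the file has no header row
    engine="python",  # multi-character separators need the Python engine
    sep="::",
)

print(sample_ratings)
print(sample_ratings.dtypes)
```

All four columns should come back as integers, so they can be used directly as array indices in the next step.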


Create the ratings matrix of shape (m x u): one row per movie, one column per user

In [9]:
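# np.zeros rather than np.ndarray: np.ndarray leaves memory uninitialised, so
# unrated cells would hold arbitrary values instead of 0 (hence out-of-range
# values such as 112 or 176 in the output recorded below)
ratingMatrix = np.zeros(
    shape=(np.max(ratingData.movie_id.values), np.max(ratingData.user_id.values)),
    dtype=np.uint8)
ratingMatrix[ratingData.movie_id.values-1, ratingData.user_id.values-1] = ratingData.rating.values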
print(ratingMatrix)

[[  5 112  22 ... 122   0   3]
 [176 144 185 ... 122   0   0]
 [ 48 142 186 ... 122   0   0]
 ...
 [236  12   0 ...   0   0   0]
 [239  12   0 ...   0   0   0]
 [240  12   0 ...   0   0   0]]
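
The recorded matrix above contains values such as 112 and 176, which cannot be ratings on the 1 to 5 scale; that is what uninitialised memory looks like. A minimal sketch of the same construction on a zero-initialised array, assuming a hypothetical toy set of (user, movie, rating) triples:

```python
import numpy as np

# Hypothetical toy data with 1-based ids, mirroring the columns of ratingData
user_ids = np.array([1, 1, 2, 3])
movie_ids = np.array([1, 3, 3, 2])
ratings = np.array([5, 3, 4, 1], dtype=np.uint8)

# Zero-initialised (movies x users) matrix; 0 marks "not rated"
toy_matrix = np.zeros((movie_ids.max(), user_ids.max()), dtype=np.uint8)

# Integer-array indexing writes each rating to its (movie, user) cell
toy_matrix[movie_ids - 1, user_ids - 1] = ratings

print(toy_matrix)
# [[5 0 0]
#  [0 0 1]
#  [3 4 0]]
```

Every cell that received no rating stays at 0, so the row means computed in the next step are reproducible from run to run.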


Normalisation: subtract each movie's mean rating

In [17]:
# Mean rating per movie (row); unrated zeros are included in the average
mean_ratings = np.mean(ratingMatrix, axis=1)
# Broadcast each row mean across its row
normalizedMatrix = ratingMatrix - mean_ratings[:, np.newaxis]
print(normalizedMatrix)


[[-64.16655629  42.83344371 -47.16655629 ...  52.83344371 -69.16655629
  -66.16655629]
 [ 80.55877483  48.55877483  89.55877483 ...  26.55877483 -95.44122517
  -95.44122517]
 [-50.08195364  43.91804636  87.91804636 ...  23.91804636 -98.08195364
  -98.08195364]
 ...
 [205.12549669 -18.87450331 -30.87450331 ... -30.87450331 -30.87450331
  -30.87450331]
 [208.01423841 -18.98576159 -30.98576159 ... -30.98576159 -30.98576159
  -30.98576159]
 [210.38261589 -17.61738411 -29.61738411 ... -29.61738411 -29.61738411
  -29.61738411]]
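
The subtraction relies on broadcasting: `mean_ratings[:, np.newaxis]` turns the vector of row means into a column, so each mean is subtracted across its own row. A small sketch on a hypothetical 2 x 3 matrix (the names `toy`, `row_means` and `centred` are illustrative):

```python
import numpy as np

# Hypothetical toy ratings: 2 movies x 3 users, 0 meaning "not rated"
toy = np.array([[5, 0, 1],
                [2, 2, 2]], dtype=np.uint8)

row_means = toy.mean(axis=1)               # one mean per movie: [2., 2.]
centred = toy - row_means[:, np.newaxis]   # (2, 3) minus (2, 1) broadcasts per row

print(centred)
# [[ 3. -2. -1.]
#  [ 0.  0.  0.]]
```

The result is promoted to `float64`, so subtracting from a `uint8` matrix does not wrap around, and every row of the centred matrix sums to zero.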


Computing SVD

In [18]:
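# Transpose to (users x movies); the constant scale factor does not change the singular vectors
A = normalizedMatrix.T / np.sqrt(ratingMatrix.shape[0]-1)
# np.linalg.svd returns the right singular vectors transposed, so V here holds V^T
U, S, V = np.linalg.svd(A)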
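
Two properties of `np.linalg.svd` matter for the steps that follow: the third return value is the transpose of the right singular vectors, and for centred data scaled by the square root of (samples - 1) the squared singular values equal the eigenvalues of the sample covariance matrix. A sketch on hypothetical random data (all names ending in `_toy`, and `X`, are assumptions for illustration; `full_matrices=False` is used here only to keep the factors compact):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))        # hypothetical data: 6 samples x 4 features
X = X - X.mean(axis=0)             # centre each feature (column)
A_toy = X / np.sqrt(X.shape[0] - 1)

# Compact SVD; Vh_toy holds the right singular vectors as rows
U_toy, S_toy, Vh_toy = np.linalg.svd(A_toy, full_matrices=False)

# The three factors reproduce A_toy
reconstructed = U_toy @ np.diag(S_toy) @ Vh_toy

# Eigenvalues of the sample covariance, sorted in descending order
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

print(np.allclose(reconstructed, A_toy))    # True
print(np.allclose(S_toy ** 2, cov_eigvals)) # True
```

This is why the leading columns of `V.T` can be read as principal components of the movie columns.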

Calculate cosine similarity, sort by most similar and return the top N

In [20]:
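def similar(data, movie_id, top_n):
  index = movie_id - 1  # movie ids start from 1
  movie_row = data[index, :]
  # Row-wise Euclidean norms via Einstein summation
  magnitude = np.sqrt(np.einsum("ij, ij -> i", data, data))
  # Cosine similarity: dot product divided by the product of the norms
  # (rows with zero norm give nan, which argsort places last)
  similarity = np.dot(movie_row, data.T) / (magnitude[index] * magnitude)
  sort_indexes = np.argsort(-similarity)
  return sort_indexes[:top_n]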
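
Dividing by the vector lengths is what separates cosine similarity from a raw dot product: without it, long vectors outrank vectors that merely point in the same direction. A toy sketch, assuming a hypothetical helper `cosine_top_n` and four hand-picked 2-D vectors:

```python
import numpy as np

def cosine_top_n(vectors, index, top_n):
    """Return indices of the top_n rows most cosine-similar to vectors[index]."""
    norms = np.linalg.norm(vectors, axis=1)
    similarity = vectors @ vectors[index] / (norms * norms[index])
    return np.argsort(-similarity)[:top_n]

# Hypothetical vectors: row 1 is long, row 3 is short but almost parallel to row 0
toy_vectors = np.array([[1.0, 0.0],
                        [10.0, 1.0],
                        [0.0, 3.0],
                        [2.0, 0.1]])

by_cosine = cosine_top_n(toy_vectors, 0, 3)
by_dot = np.argsort(-(toy_vectors @ toy_vectors[0]))[:3]

print(by_cosine)  # [0 3 1]
print(by_dot)     # [1 3 0]
```

With cosine similarity a vector is always most similar to itself, so the query row comes first; with the raw dot product the longest vector wins instead.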

Select k principal components to represent the movies, choose a movie_id to find recommendations for, and print the top_n results

In [21]:
k=50
movie_id=12
top_n=5

sliced= V.T[:,:k] #representative data
indexes= similar(sliced, movie_id, top_n)

print("Recommendations for Movie {0}: \n".format(
    movieData[movieData.movie_id==movie_id].title.values[0]))
for recommended_id in indexes + 1:  # avoid shadowing the built-in id()
  print(movieData[movieData.movie_id==recommended_id].title.values[0])

Recommendations for Movie Dracula: Dead and Loving It (1995): 

Gumby: The Movie (1995)
Dracula: Dead and Loving It (1995)
Sliver (1993)
Chasers (1994)
Bushwhacked (1995)
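
The value k=50 is a free choice. One common way to guide it, sketched here under assumptions, is to look at how much of the total variance the first k singular values capture. The singular values below are hypothetical; in the notebook the array `S` from the SVD step could be passed instead.

```python
import numpy as np

def explained_variance_ratio(singular_values):
    """Cumulative share of variance captured by the leading singular values."""
    squared = np.asarray(singular_values, dtype=float) ** 2
    return np.cumsum(squared) / squared.sum()

# Hypothetical singular values, sorted in descending order as np.linalg.svd returns them
S_example = np.array([4.0, 2.0, 1.0, 1.0])

cumulative = explained_variance_ratio(S_example)

# Smallest k whose leading components capture at least 90% of the variance
k_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)

print(cumulative)
print(k_for_90)  # 2
```

The 90% threshold is itself an assumption; the point is only that the singular values give a quantitative handle on the trade-off between compactness and fidelity.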
