<a href="https://colab.research.google.com/github/MusabUmama/Movie_Recommendation_system/blob/main/Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collaborative Movie Filtering

**The Dataset**

There are 4 datasets including movies, ratings, tags and links.


* Movies Dataset: This dataset contains information about movies, including movie
IDs, titles, and genres.

* Ratings Dataset: This dataset contains user ratings for movies, including user IDs, movie IDs, ratings, and timestamps.

* Tags Dataset: This dataset contains user-generated tags for movies, including user IDs, movie IDs, tags, and timestamps.

* Links Dataset: This dataset contains links between movie IDs in the dataset and external databases (IMDb and TMDB).

# **User-based Movie Filtering**

* The system makes recommendations based on similar users



In [119]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [120]:
import pandas as pd

In [121]:
# Importing the datasets
movies_data = pd.read_csv("/content/movies.csv")
ratings_data = pd.read_csv("/content/ratings.csv")
tags_data = pd.read_csv("/content/tags.csv")

In [122]:
movies_data.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [123]:
ratings_data.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [124]:
# Deleting the null rows
movies_data.dropna(inplace=True)
ratings_data.dropna(inplace=True)
tags_data.dropna(inplace=True)

# Deleting duplicate rows
movies_data.drop_duplicates(subset='movieId', keep='first')

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [125]:
movies_data.dtypes

movieId     int64
title      object
genres     object
dtype: object

In [126]:
ratings_data.dtypes

userId         int64
movieId        int64
rating       float64
timestamp      int64
dtype: object

In [127]:
tags_data.dtypes

userId        int64
movieId       int64
tag          object
timestamp     int64
dtype: object

In [128]:
# Merging the Ratings and Tags datasets based on 'movieId' and 'userId'
user_interactions_data = pd.merge(ratings_data, tags_data, on=['userId', 'movieId'], how='outer')

# Merging the user_interactions dataframe with the Movies dataset based on 'movieId' to create merged data frame
merged_data = pd.merge(user_interactions_data, movies_data, on='movieId', how='left')

In [129]:
merged_data.head()

Unnamed: 0,userId,movieId,rating,timestamp_x,tag,timestamp_y,title,genres
0,1,1,4.0,964982703.0,,,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247.0,,,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224.0,,,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815.0,,,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931.0,,,"Usual Suspects, The (1995)",Crime|Mystery|Thriller


The merged data frame has been created using relevant features from all the datasets.


In [130]:
from sklearn.metrics.pairwise import cosine_similarity

In [131]:
# Creating a user-item interaction matrix
interaction_matrix = pd.pivot_table(merged_data, values='rating', index='userId', columns='movieId', fill_value=0)

In [132]:
# Calculating the user similarity scores using cosine similarity
user_similarity_scores = cosine_similarity(interaction_matrix)

In [133]:
# Creating a data frame to store user similarity scores
user_similarity_data = pd.DataFrame(user_similarity_scores, index=interaction_matrix.index, columns=interaction_matrix.index)

# Replacing the diagonal values with zeros (self-similarity scores)
user_similarity_data.values[[range(user_similarity_data.shape[0])]*2] = 0

In [134]:
interaction_matrix.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,0.0,4.0,0.0,0.0,4.0,0.0,0,0.0,0.0,...,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,...,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,...,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,...,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0
5,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,...,0.0,0,0,0,0.0,0,0.0,0.0,0.0,0


In [135]:
user_similarity_data.head()

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [136]:
import numpy as np

In [137]:
# Choosing a target user
target_user_id = 1

In [138]:
# Finding the most similar users to the target user in descending order
similar_users = user_similarity_data[target_user_id].sort_values(ascending=False)

In [139]:
similar_users

userId
1      0.0
410    0.0
403    0.0
404    0.0
405    0.0
      ... 
205    0.0
206    0.0
207    0.0
208    0.0
610    0.0
Name: 1, Length: 610, dtype: float64

In [140]:
# Geting the movies that the user has not rated
rated_movies = interaction_matrix.loc[target_user_id]
unrated_movies = rated_movies[rated_movies == 0].index

In [141]:
# Calculating the weighted sum of ratings by similar users for unrated movies
unrated_movie_scores = interaction_matrix.loc[similar_users.index, unrated_movies].T.dot(similar_users)

In [142]:
# Sorting the recommended movies by their scores in descending order
recommended_movies_id = unrated_movie_scores.sort_values(ascending=False)

In [143]:
recommended_movies_id

movieId
2         0.0
55156     0.0
55190     0.0
55205     0.0
55207     0.0
         ... 
4624      0.0
4625      0.0
4626      0.0
4628      0.0
193609    0.0
Length: 9492, dtype: float64

In [144]:
recommended_movie_titles = merged_data[merged_data['movieId'].isin(recommended_movies_id.index)]['title']

In [145]:
recommended_movie_titles

232                        Shawshank Redemption, The (1994)
234                                Good Will Hunting (1997)
236                                Kill Bill: Vol. 1 (2003)
237                                       Collateral (2004)
238       Talladega Nights: The Ballad of Ricky Bobby (2...
                                ...                        
102879                  City of God (Cidade de Deus) (2002)
102880                                     Daredevil (2003)
102881                                     Daredevil (2003)
102882    Mary Shelley's Frankenstein (Frankenstein) (1994)
102883                               Shame (Skammen) (1968)
Name: title, Length: 86132, dtype: object

Some movies are recommended multiple times in the list due to multiple similar users rating those movies highly.

In [146]:
# List to store unique movies
unique_movies = []

In [147]:
# Adding the movies to unique list
for movie_id in recommended_movie_titles:
    if movie_id not in unique_movies:
        unique_movies.append(movie_id)

In [155]:
# Top 10 movies recommended
print("Top 10 recommendations:\n")
for movie in range(10):
  print(unique_movies[movie])

Top 10 recommendations:

Shawshank Redemption, The (1994)
Good Will Hunting (1997)
Kill Bill: Vol. 1 (2003)
Collateral (2004)
Talladega Nights: The Ballad of Ricky Bobby (2006)
Departed, The (2006)
Dark Knight, The (2008)
Step Brothers (2008)
Inglourious Basterds (2009)
Zombieland (2009)
