## Movies_Recommendation_System
   - By Debjeet Das
 
## To check out all the interesting ML projects, follow the link below :
   - https://github.com/Debjeet-Das/Machine_Learning_Projects

In [12]:
# Loading the libraries used in the following steps
# To install Surprise from the Anaconda prompt: conda install -c conda-forge scikit-surprise
from surprise import KNNWithMeans, Dataset, accuracy, Reader
from surprise.model_selection import train_test_split

In [14]:
# Loading the ratings dataset.
import pandas as pd
ratings = pd.read_csv("ratings_assignment.csv")

# The ratings in this dataset range from 0.5 to 5
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


## Recommendation System Predicting the User Rating :
  - Here we build a model that predicts movie ratings for a recommendation system.
  - We use cosine similarity to measure how alike two users are, based on the movies both of them have rated.
  - A higher cosine similarity (equivalently, a smaller cosine distance) means the two users rate movies in a similar way.
  - The ratings given to the movies in this dataset range from 0.5 to 5.
  - We split the dataset into a trainset and a testset.
  - The split returns two objects: a Surprise Trainset and a testset, which is a list of (user, movie, rating) tuples.
  - During training the model computes the cosine similarities between users; at prediction time it selects the k most similar users who have rated the movie.
  - KNNWithMeans then predicts the user's own mean rating plus the similarity-weighted average of how far each neighbour's rating for the movie lies from that neighbour's mean.
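
As a rough illustration of the two ideas in the list above (cosine similarity and a mean-centred k-nearest-neighbour prediction), here is a minimal pure-Python sketch. The `toy_ratings` dictionary and the helper names `cosine_sim`, `user_mean` and `predict_with_means` are made up for this example and are not part of Surprise; the library's own implementation may differ in details such as minimum-neighbour handling and clipping to the rating scale.

```python
from math import sqrt

# Hypothetical toy data: toy_ratings[user][movie] = rating
toy_ratings = {
    "u1": {"m1": 4.0, "m2": 5.0, "m3": 1.0},
    "u2": {"m1": 5.0, "m2": 4.0, "m4": 2.0},
    "u3": {"m1": 1.0, "m3": 5.0, "m4": 4.0},
    "u4": {"m1": 4.0, "m2": 4.0, "m4": 3.0},
}


def cosine_sim(a, b):
    """Cosine similarity between two users, using only the movies both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def user_mean(user_ratings):
    """Mean of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict_with_means(user, movie, k=2):
    """User's mean plus the similarity-weighted mean deviation of the k nearest raters."""
    mu_u = user_mean(toy_ratings[user])
    # Candidate neighbours: other users who have rated this movie
    candidates = [
        (cosine_sim(toy_ratings[user], r), v)
        for v, r in toy_ratings.items()
        if v != user and movie in r
    ]
    # Keep the k most similar users with a positive similarity
    top_k = [(s, v) for s, v in sorted(candidates, reverse=True)[:k] if s > 0]
    denom = sum(s for s, _ in top_k)
    if denom == 0:
        return mu_u  # no usable neighbours: fall back to the user's own mean
    numer = sum(s * (toy_ratings[v][movie] - user_mean(toy_ratings[v])) for s, v in top_k)
    return mu_u + numer / denom


print(round(predict_with_means("u1", "m4"), 2))
```

Subtracting each neighbour's mean before averaging compensates for users who rate everything high or everything low, which is the difference between a plain k-NN average and the "with means" variant.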
  
## Predicting the Movie Rating for a UserID:
  - Taking the userID and movieID as user inputs from the keyboard.
  - Showing the rating the user is predicted to give the movie if he/she watched it.
  - The prediction is based on the ratings that similar users gave to that movie; genres are not used by this model.

In [15]:
# Predicting the user rating for a movie

# Setting the rating scale to 0.5 - 5 for this dataset
reader = Reader(rating_scale=(0.5, 5))

# Building a Surprise dataset from the userId, movieId and rating columns
data = Dataset.load_from_df(ratings[["userId", "movieId", "rating"]], reader)

# Splitting the data into a trainset (70%) and a testset (30%)
trainset, testset = train_test_split(data, test_size=0.3, shuffle=True)

# Training a user-based KNNWithMeans model that uses cosine similarity between users
recom = KNNWithMeans(k=9, sim_options={"name": "cosine", "user_based": True})
recom.fit(trainset)

# Evaluating on the testset; accuracy.rmse prints the RMSE and returns it
test_pred = recom.test(testset)
rmse = accuracy.rmse(test_pred)

# Taking the UserID and MovieID from the keyboard and displaying the predicted rating
user_id = int(input("Enter UserID "))
movie_id = int(input("Enter MovieID "))
r = recom.predict(user_id, movie_id)
print("Predicted Movie Rating for the UserID {} is {:.2f} ".format(user_id, r.est))

Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.9247
Enter UserID 5
Enter MovieID 6
Predicted Movie Rating for the UserID 5 is 3.67 
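
The `RMSE: 0.9247` line in the output above is the root-mean-square error between the true and the estimated ratings on the testset. A minimal sketch of that calculation follows; the function name `rmse_of` and the sample pairs are hypothetical and only illustrate the formula. Since the split is shuffled without a fixed seed, the exact value can change from run to run.

```python
from math import sqrt


def rmse_of(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


# Hypothetical (true, estimated) ratings, not taken from the dataset
sample = [(4.0, 3.5), (5.0, 4.0), (2.5, 3.0)]
print(f"RMSE: {rmse_of(sample):.4f}")  # prints RMSE: 0.7071
```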


##### Loading Movies dataset:


In [16]:
import numpy as np
movies = pd.read_csv("movies_assignment.csv")
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [17]:
# Merging the ratings & movies dataframes on movieId
movie_rating = pd.merge(ratings, movies, on="movieId")

In [18]:
movie_rating

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
5,18,1,3.5,1455209816,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
6,19,1,4.0,965705637,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
7,21,1,3.5,1407618878,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
8,27,1,3.0,962685262,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
9,31,1,5.0,850466616,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [6]:
# Dropping the unwanted columns from the merged Dataframe
movie_rating = movie_rating.drop(["timestamp","genres"],axis= 1)

In [7]:
# merged Dataframe with movie titles and movieID
movie_rating

Unnamed: 0,userId,movieId,rating,title
0,1,1,4.0,Toy Story (1995)
1,5,1,4.0,Toy Story (1995)
2,7,1,4.5,Toy Story (1995)
3,15,1,2.5,Toy Story (1995)
4,17,1,4.5,Toy Story (1995)
5,18,1,3.5,Toy Story (1995)
6,19,1,4.0,Toy Story (1995)
7,21,1,3.5,Toy Story (1995)
8,27,1,3.0,Toy Story (1995)
9,31,1,5.0,Toy Story (1995)


In [8]:
# Finding the top 10 movies on the basis of the sum of ratings
movie_rating.pivot_table(index="movieId",values="rating",aggfunc="sum").sort_values(by="rating",ascending=False).head(10).reset_index()

Unnamed: 0,movieId,rating
0,318,1404.0
1,356,1370.0
2,296,1288.5
3,2571,1165.5
4,593,1161.0
5,260,1062.0
6,110,955.5
7,2959,931.5
8,527,929.5
9,480,892.5
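
Ranking by the sum of ratings rewards movies that have many ratings as well as movies that are rated highly. A small sketch, on hypothetical toy data, of putting the sum next to the mean and the count per movie so the two effects can be told apart:

```python
import pandas as pd

# Hypothetical toy ratings: movie 1 is popular, movie 2 is rated higher but less often
toy = pd.DataFrame({
    "movieId": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "rating": [4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 2.0, 2.0, 2.0],
})

# One row per movie with the sum, mean and number of ratings
stats = toy.groupby("movieId")["rating"].agg(["sum", "mean", "count"])

by_sum = stats.sort_values("sum", ascending=False)
by_mean = stats.sort_values("mean", ascending=False)

print(by_sum)
print(by_mean)
```

Here movie 1 leads when sorted by sum, while movie 2 leads when sorted by mean.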


# Main Recommendation system :
   - First we take the UserID as an input from the keyboard.
   - Then we collect the unique movieIDs from the dataframe and store them in unique_iid.
   - Next we find the movies that have already been rated (watched) by the user with the input UserID and store them in u_iid.
   - We then build an array containing only the movies the user has not watched and store it in not_watched_iid.
   - We create a testset of [UserID, movieID, 5] entries for all the unwatched movies of the user; the 5 is only a placeholder for the unknown true rating and does not affect the estimate.
   - Then we create a dataframe named ratingDF with movieID and predicted rating as its columns.
   - We set movieID as the index of the dataframe, which makes it easier to get the top 10 movies on the basis of ratings.
   - We sort the values by predicted rating and collect the top 10 movies for the same user.
   - We display the titles of those top 10 movies.
   - Because only unwatched movies are scored, every recommended movie is one he or she has not watched yet.
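
The steps above can be sketched without pandas or Surprise as follows. This is only an illustration: `recommend_top_n`, `toy_ratings` and `toy_estimate` are hypothetical names, and `toy_estimate` stands in for a trained model's prediction.

```python
def recommend_top_n(user_id, all_ratings, estimate, n=10):
    """Return the n unwatched movie ids with the highest estimated rating.

    all_ratings: list of (user_id, movie_id, rating) tuples.
    estimate: callable (user_id, movie_id) -> estimated rating.
    """
    all_movies = {movie for _, movie, _ in all_ratings}
    watched = {movie for user, movie, _ in all_ratings if user == user_id}
    not_watched = all_movies - watched
    scored = [(estimate(user_id, movie), movie) for movie in not_watched]
    # Highest estimate first; ties broken by movie id for a stable order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [movie for _, movie in scored[:n]]


# Hypothetical toy data and a stand-in for a trained model
toy_ratings = [
    (1, 10, 4.0), (1, 20, 3.0),
    (2, 10, 5.0), (2, 30, 4.5),
    (3, 40, 2.0), (3, 20, 3.5),
]


def toy_estimate(user_id, movie_id):
    return movie_id / 10.0


print(recommend_top_n(1, toy_ratings, toy_estimate, n=2))  # prints [40, 30]
```

Unlike filtering `movies` with `isin`, which returns the titles in the row order of `movies`, this keeps the list ordered by estimated rating.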

In [9]:
# Recommending top ten movies on the basis of the predicted ratings by the user

UserID = int(input("Enter the userID : "))

# unique MovieID throughout the dataset
unique_iid = movie_rating["movieId"].unique()

# movies which the user has watched
u_iid = movie_rating[movie_rating["userId"]==UserID]["movieId"]

# movies which the user has not watched
not_watched_iid = np.setdiff1d(unique_iid,u_iid)
not_watched_iid

# We are creating a testset which we will use for the prediction of ratings for all the unwatched movies
testset = [[UserID,i,5] for i in not_watched_iid]
predicted_movies = recom.test(testset)
predicted_movies


# Creating a dataframe with movieID and predicted ratings as columns in it.
mov_id=[]
rate = []

for i in predicted_movies:
    mov_id.append(i.iid)
    rate.append(i.est)

ratingDF = pd.DataFrame({"movieID":mov_id,"ratings":rate})
ratingDF = ratingDF.set_index("movieID")

# sorting the values by ratings and collecting the top 10 movies
recom_movies=ratingDF.sort_values(by="ratings",ascending = False).head(10)
top_10_movie = pd.Series(recom_movies.index) 


# Displaying the titles of the top 10 movies on the basis of predicted ratings
# These are all movies which he or she has not watched yet
name = movies[movies["movieId"].isin(top_10_movie)]["title"]
print("Movies that are recommended are : ")
print(name)

Enter the userID : 56
Movies that are recommended are : 
2740             Man with the Golden Arm, The (1955)
2947                         Two Family House (2000)
3908                              The Big Bus (1976)
4121       Victory (a.k.a. Escape to Victory) (1981)
6559                                 Ten, The (2007)
7551    Scooby-Doo! Curse of the Lake Monster (2010)
8079                It's Such a Beautiful Day (2012)
8933        Scooby-Doo! and the Samurai Sword (2009)
8936                Scooby-Doo Goes Hollywood (1979)
8941        Larry David: Curb Your Enthusiasm (1999)
Name: title, dtype: object


# Merged DF:
  - Collecting the top ten movies and creating a dataframe.
  - Creating another dataframe with the average rating per movie across users.
  - Merging the top 10 movies with the avg_movie_rating dataframe.
  - Loading the two datasets that contain the movie links and tags.
  - Then merging all the dataframes to get the final dataframe with MovieID, Title, Ratings, Tags, imdbID and Genres in it.
  - A movie with several tags appears once per tag in the final dataframe.

In [10]:
# Collecting the topten movies and creating a dataframe
top_ten_mov = pd.DataFrame(name).reset_index().drop("index",axis=1)

# Creating a dataframe with average rating  per movie 
avg_movie_rating = movie_rating.pivot_table(index=["title","movieId"],values="rating").reset_index()

# merging the top 10 movies with the avg_movie_rating dataframe
top_avg_merged = pd.merge(top_ten_mov,avg_movie_rating,how ="left",on = "title")

# Loading the two datasets that contain the movie links and tags
links = pd.read_csv("links_assignment.csv")
tags = pd.read_csv("tags_assignment.csv")

# Then merging all the dataframes to get the final dataframe with MovieID, Title, Ratings, Tags, imdbID, Genres in it.
top_avg_tag = pd.merge(top_avg_merged,tags,how ="left",on = "movieId")
top_avg_tag =top_avg_tag.drop(["userId","timestamp"],axis=1)

top_avg_tag_imdb = pd.merge(top_avg_tag,links,on="movieId",how = "left")
top_avg_tag_imdb =top_avg_tag_imdb.drop("tmdbId",axis=1)

top_avg_tag_imdb_genre = pd.merge(top_avg_tag_imdb,movies,how="left",on=["movieId","title"])
top_avg_tag_imdb_genre

Unnamed: 0,title,movieId,rating,tag,imdbId,genres
0,"Man with the Golden Arm, The (1955)",3678,5.0,,48347,Drama
1,Two Family House (2000),3951,5.0,In Netflix queue,202641,Drama
2,The Big Bus (1976),5490,5.0,,74205,Action|Comedy
3,Victory (a.k.a. Escape to Victory) (1981),5915,4.75,,83284,Action|Drama|War
4,"Ten, The (2007)",55020,4.5,,811106,Comedy
5,Scooby-Doo! Curse of the Lake Monster (2010),85295,5.0,,1618435,Adventure|Children|Comedy|Mystery
6,It's Such a Beautiful Day (2012),99764,4.25,moving,2396224,Animation|Comedy|Drama|Fantasy|Sci-Fi
7,It's Such a Beautiful Day (2012),99764,4.25,philosopical,2396224,Animation|Comedy|Drama|Fantasy|Sci-Fi
8,It's Such a Beautiful Day (2012),99764,4.25,surreal,2396224,Animation|Comedy|Drama|Fantasy|Sci-Fi
9,It's Such a Beautiful Day (2012),99764,4.25,weird,2396224,Animation|Comedy|Drama|Fantasy|Sci-Fi


- This is the final dataframe for the userID entered from the keyboard (56 in this run).
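
In the table above, It's Such a Beautiful Day (2012) takes four rows because the left merge with the tags produces one row per tag. If one row per movie is preferred, one possible approach (an assumption, not what the notebook does) is to join the tags per movie before merging. The two small dataframes below copy a few rows from the output above purely for illustration.

```python
import pandas as pd

# Miniature stand-ins for the top-10 table and the tags table
top = pd.DataFrame({
    "movieId": [3678, 3951, 99764],
    "title": [
        "Man with the Golden Arm, The (1955)",
        "Two Family House (2000)",
        "It's Such a Beautiful Day (2012)",
    ],
})
tags = pd.DataFrame({
    "movieId": [3951, 99764, 99764, 99764, 99764],
    "tag": ["In Netflix queue", "moving", "philosopical", "surreal", "weird"],
})

# Collapse all tags of a movie into one comma-separated string
tags_per_movie = tags.groupby("movieId")["tag"].agg(", ".join).reset_index()

# Left merge keeps movies without tags; their tag value is NaN
merged = top.merge(tags_per_movie, on="movieId", how="left")
print(merged)
```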
