# Method used: cosine similarity (KNN) and collaborative filtering

Why cosine similarity?
* Cosine similarity compares the direction of two vectors rather than their magnitude, so titles of different lengths can be compared fairly
* Applied to TF-IDF vectors, it gives higher scores to titles that share distinctive words with the query, while words common to many titles count for less
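
As a rough illustration of both points, the sketch below scores a query against three made-up titles (they are assumptions for the example, not rows read from `movies.csv`) and checks that scaling a vector leaves its cosine similarity unchanged.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy titles, used only for illustration
titles = ["Iron Man (2008)", "Iron Man 2 (2010)", "Toy Story (1995)"]

# Same vectorizer settings as the notebook: unigrams and bigrams
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(titles)

# Score the query against every title
query_vec = vectorizer.transform(["iron man"])
scores = cosine_similarity(query_vec, tfidf).flatten()

# Magnitude does not matter: a vector and a scaled copy point the same way
a = np.array([[1.0, 2.0, 0.0]])
same_direction = cosine_similarity(a, 3 * a)[0, 0]

print(dict(zip(titles, scores.round(3))))
print(same_direction)
```

The two Iron Man titles share words with the query and get a positive score, while the unrelated title scores zero.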

Why collaborative filtering?
* Collaborative filtering can recommend an item to user A based on the interests of a similar user B. It needs no hand-built item features, and in model-based variants the embeddings can be learned automatically
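
The idea can be sketched on a tiny, invented ratings table. The column names mirror `ratings.csv`, but the users, movies, and ratings below are assumptions made up for the example, and this is a simplified neighbour lookup rather than the notebook's full scoring.

```python
import pandas as pd

# Hypothetical toy ratings; only the column names follow ratings.csv
toy_ratings = pd.DataFrame({
    "userId":  ["A", "B", "B", "C"],
    "movieId": [1, 1, 2, 3],
    "rating":  [5.0, 5.0, 4.5, 5.0],
})

# Treat a rating above 4 as "liked", as the notebook does
liked = toy_ratings[toy_ratings["rating"] > 4]

# Movies that user A liked
a_movies = set(liked.loc[liked["userId"] == "A", "movieId"])

# Other users who liked at least one of the same movies
neighbours = liked[
    liked["movieId"].isin(a_movies) & (liked["userId"] != "A")
]["userId"].unique()

# Movies the neighbours liked that A has not rated yet
seen_by_a = set(toy_ratings.loc[toy_ratings["userId"] == "A", "movieId"])
candidates = liked[
    liked["userId"].isin(neighbours) & ~liked["movieId"].isin(seen_by_a)
]
recommended = sorted(candidates["movieId"].unique().tolist())

print(list(neighbours), recommended)
```

User B shares a liked movie with A, so B's other liked movie becomes the recommendation; user C has nothing in common with A and is ignored.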

## Overview
* Input the title
* Find the movieId of the best-matching title
* Find movies that users who liked this movie also rated highly, and score each one by how much more popular it is among those users than among users in general
* Display the recommendation
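
The scoring in the third step is a ratio of two shares. A minimal worked example, with shares invented purely for illustration (the movie labels and numbers are assumptions, not results from the dataset):

```python
import pandas as pd

# Hypothetical shares of users who rated each movie above 4
similar = pd.Series({"Niche sequel": 0.40, "Blockbuster": 0.60})  # among similar users
overall = pd.Series({"Niche sequel": 0.05, "Blockbuster": 0.50})  # among all users

rec = pd.concat([similar, overall], axis=1)
rec.columns = ["similar", "all"]

# A high ratio means the movie is far more popular with similar users
# than with the general audience
rec["score"] = rec["similar"] / rec["all"]
rec = rec.sort_values("score", ascending=False)

print(rec)
```

Although the blockbuster is liked by more of the similar users, it is liked by almost everyone, so the niche sequel ranks first.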

In [49]:
import string
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")
vectorizer = TfidfVectorizer(ngram_range=(1,2))
tfidf = vectorizer.fit_transform(movies["title"])

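def search(title):
    # Vectorize the query with the same TF-IDF vocabulary as the movie titles
    query_vec = vectorizer.transform([title])
    similarity = cosine_similarity(query_vec, tfidf).flatten()
    # argsort orders the scores, so after reversing, row 0 is the best match
    # (argpartition returns the top 5 in no guaranteed order)
    indices = np.argsort(similarity)[-5:][::-1]
    results = movies.iloc[indices]
    return results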

#Taking input from the user
title_input = input("Enter title: ")
title_input = string.capwords(title_input)
searchresult = search(title_input)

# movieId of the first search result, which is treated as the best match
movieId_ofinput = searchresult["movieId"].iloc[0]

def find_similar_movies(movie_id):

    # Users who rated the given movie above 4, i.e. users who liked it
    similar_users = ratings[(ratings["movieId"] == movie_id) & (ratings["rating"] > 4)]["userId"].unique()
    # Other movies those users rated above 4, as a share of the similar users
    similar_user_recs = ratings[(ratings["userId"].isin(similar_users)) & (ratings["rating"] > 4)]["movieId"]
    similar_user_recs = similar_user_recs.value_counts() / len(similar_users)
    # Keep only movies liked by more than 10% of the similar users
    similar_user_recs = similar_user_recs[similar_user_recs > .10]

    # Share of all users who rated each candidate movie above 4
    all_users = ratings[(ratings["movieId"].isin(similar_user_recs.index)) & (ratings["rating"] > 4)]
    all_user_recs = all_users["movieId"].value_counts() / len(all_users["userId"].unique())

    # Ratio of the share among similar users to the share among all users.
    # The higher the ratio, the more specific the movie is to similar users
    # and the better it suits as a recommendation
    rec_percentages = pd.concat([similar_user_recs, all_user_recs], axis=1)
    rec_percentages.columns = ["similar", "all"]
    rec_percentages["score"] = rec_percentages["similar"] / rec_percentages["all"]
    rec_percentages = rec_percentages.sort_values("score", ascending=False)
    return rec_percentages.head(10).merge(movies, left_index=True, right_on="movieId")[["title", "genres", "score"]]

#Invoking the function with the movieId of the movie name entered by the user
recommendation = find_similar_movies(movieId_ofinput)

if not recommendation.empty:
    print("\nYOU MAY ALSO LIKE:")
    display(recommendation)
else:
    print("Sorry! Couldn't find a suitable movie")

Enter title: iron man

YOU MAY ALSO LIKE:


| | title | genres | score |
|---|---|---|---|
| 6743 | Iron Man (2008) | Action\|Adventure\|Sci-Fi | 18.322581 |
| 7324 | Iron Man 2 (2010) | Action\|Adventure\|Sci-Fi\|Thriller\|IMAX | 18.322581 |
| 8301 | Day of the Doctor, The (2013) | Adventure\|Drama\|Sci-Fi | 14.658065 |
| 7620 | X-Men: First Class (2011) | Action\|Adventure\|Sci-Fi\|Thriller\|War | 14.658065 |
| 8151 | Iron Man 3 (2013) | Action\|Sci-Fi\|Thriller\|IMAX | 13.087558 |
| 8425 | X-Men: Days of Future Past (2014) | Action\|Adventure\|Sci-Fi | 12.825806 |
| 8699 | Untitled Spider-Man Reboot (2017) | Action\|Adventure\|Fantasy | 12.215054 |
| 8695 | Guardians of the Galaxy 2 (2017) | Action\|Adventure\|Sci-Fi | 11.451613 |
| 6746 | Taken (2008) | Action\|Crime\|Drama\|Thriller | 11.275434 |
| 8053 | Hobbit: An Unexpected Journey, The (2012) | Adventure\|Fantasy\|IMAX | 10.688172 |
