<a href="https://colab.research.google.com/github/aakanshadalmia/Movie-Recommendation-System/blob/main/Movie_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Movie Search Engine and Recommendation System**

In [10]:
import pandas as pd
import re

In [11]:
#Importing movies data

movies = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/movies_data/movies.csv")


In [12]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy



## Cleaning Movie Titles

**Aim:** Remove special characters from movie titles so that punctuation does not affect search

In [13]:
def clean_title(title):
    return re.sub("[^a-zA-Z0-9 ]","",title)

movies['cleaned_title'] = movies['title'].apply(clean_title)
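
To see what the pattern keeps and drops, here is a small self-contained check. It redefines `clean_title` so that it runs on its own. The first two samples are titles from this dataset; the third is a fragment of a title, used here only to show that the ASCII-only character class also removes accented letters such as `é`.

```python
import re

def clean_title(title):
    # Keep ASCII letters, digits and spaces; drop everything else.
    return re.sub("[^a-zA-Z0-9 ]", "", title)

# "\u00e9" is the accented letter e-acute.
samples = ["Toy Story (1995)", "Bug's Life, A (1998)", "destin d'Am\u00e9lie"]
cleaned = [clean_title(t) for t in samples]

for raw, out in zip(samples, cleaned):
    print(f"{raw!r} -> {out!r}")
```

If accented titles matter for search, one option would be to normalize them to ASCII before applying the pattern; the notebook does not do this.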

## Computing the TF-IDF matrix
**Aim:** Build the matrix from single words and two-word combinations (unigrams and bigrams)

In [14]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(ngram_range = (1,2))

tfidf = vectorizer.fit_transform(movies['cleaned_title'])
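
To make `ngram_range = (1,2)` concrete, the sketch below lists the terms a single title would contribute. It is a plain-Python approximation of `TfidfVectorizer`'s default word analyzer (lowercasing, tokens of two or more word characters), not the library code itself; `word_ngrams` is a hypothetical helper introduced only for illustration.

```python
import re

def word_ngrams(text, max_n=2):
    # Approximate the default analyzer: lowercase the text and keep
    # tokens made of at least two word characters.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    grams = []
    # Emit all unigrams first, then all bigrams of adjacent tokens.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

terms = word_ngrams("Toy Story 1995")
print(terms)
```

Each distinct term becomes one column of the TF-IDF matrix. Bigrams such as `toy story` let an exact phrase match score higher than two unrelated single-word matches.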

## **Building the search engine**

**Aim:** Given a movie title as input, return the five most similar movie titles from our database

In [15]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def search(title):
    title = clean_title(title)
    query_vec = vectorizer.transform([title])

    #Computing cosine similarity between given title and the above calculated tf-idf matrix
    similarity = cosine_similarity(query_vec,tfidf).flatten()

    #argpartition returns the five best matches in arbitrary order, so sort them by score (best first)
    indices = np.argpartition(similarity,-5)[-5:]
    indices = indices[np.argsort(similarity[indices])[::-1]]
    results = movies.iloc[indices]
    return results
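
A note on ranking: `np.argpartition` only guarantees that the k largest values land in the last k positions; it does not order them. The sketch below uses made-up similarity scores to show one way of getting the top matches ranked best-first, by sorting only the k selected indices.

```python
import numpy as np

# Hypothetical similarity scores for seven titles.
similarity = np.array([0.10, 0.90, 0.00, 0.40, 0.75, 0.20, 0.60])
k = 3

# Indices of the k largest scores, in no particular order.
top_k = np.argpartition(similarity, -k)[-k:]

# Sort just those k indices by their score, highest first.
ranked = top_k[np.argsort(similarity[top_k])[::-1]]
print(ranked, similarity[ranked])
```

Partitioning first and sorting only k items avoids a full sort of the similarity vector, which has one entry per movie in the dataset.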

## **Building an Interactive Search Box**

**Aim:** As the user types a movie name, show the five most similar titles from the database (the search runs once the input is at least five characters long)

In [18]:
import ipywidgets as widgets
from IPython.display import display

movie_input = widgets.Text(value="Toy Story", description='Enter Movie:', disabled = False)

movies_list = widgets.Output()

def on_type(data):
    with movies_list:
        movies_list.clear_output()
        input_title = data.new
        if len(input_title) >= 5:
            display(search(input_title))

movie_input.observe(on_type, names = 'value') 

display(movie_input, movies_list)            

Text(value='Toy Story', description='Enter Movie:')

Output()

In [20]:
# Reading in ratings data

ratings = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/movies_data/ratings.csv")

In [21]:
ratings

Unnamed: 0,userId,movieId,rating,timestamp
0,1,296,5.0,1147880044
1,1,306,3.5,1147868817
2,1,307,5.0,1147868828
3,1,665,5.0,1147878820
4,1,899,3.5,1147868510
...,...,...,...,...
25000090,162541,50872,4.5,1240953372
25000091,162541,55768,2.5,1240951998
25000092,162541,56176,2.0,1240950697
25000093,162541,58559,4.0,1240953434


## Find movie recommendations from similar users

**Aim:** Given a movie, find the other movies liked by users who also liked that movie. Here, "liked" means a rating above 4.0.


In [22]:
#Find similar people who liked a particular movie

movie_id = 1
similar_people = ratings[(ratings["movieId"] == movie_id) & (ratings["rating"] > 4.0)]['userId'].unique()
similar_people

array([    36,     75,     86, ..., 162527, 162530, 162533])

In [23]:
#Find movie recommendations from those similar users

similar_people_movie_recs = ratings[(ratings["userId"].isin(similar_people)) & (ratings['rating'] > 4.0)]
similar_people_movie_recs

Unnamed: 0,userId,movieId,rating,timestamp
5101,36,1,5.0,857131378
5105,36,34,5.0,834413787
5111,36,110,5.0,834412999
5114,36,150,5.0,839928587
5127,36,260,5.0,857131062
...,...,...,...,...
24998854,162533,60069,4.5,1280919889
24998861,162533,67997,4.5,1280920712
24998876,162533,78499,4.5,1281405901
24998884,162533,81591,4.5,1297289876


In [24]:
#Keeping only movies that were liked by more than 10% of similar people

similar_movie_recs = similar_people_movie_recs['movieId']
similar_movie_recs = similar_movie_recs.value_counts() / len(similar_people)
similar_movie_recs = similar_movie_recs[similar_movie_recs > 0.1]
similar_movie_recs

1        1.000000
318      0.445607
260      0.403770
356      0.370215
296      0.367295
           ...   
953      0.103053
551      0.101195
1222     0.100876
745      0.100345
48780    0.100186
Name: movieId, Length: 113, dtype: float64
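
The same steps can be traced by hand on a tiny table. The sketch below uses hypothetical toy ratings (not the real dataset) and mirrors the logic above: keep ratings above 4.0, find the users who liked the input movie, then compute the share of those users who liked each movie.

```python
import pandas as pd

# Hypothetical toy ratings: users 1-3 liked movie 10; user 4 rated it 3.0.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "movieId": [10, 20, 30, 10, 20, 10, 30, 10, 20],
    "rating":  [5.0, 4.5, 3.0, 4.5, 5.0, 5.0, 2.0, 3.0, 5.0],
})

movie_id = 10
liked = ratings[ratings["rating"] > 4.0]
similar_people = liked.loc[liked["movieId"] == movie_id, "userId"].unique()

# Share of similar users who liked each movie.
shares = (
    liked.loc[liked["userId"].isin(similar_people), "movieId"].value_counts()
    / len(similar_people)
)
shares = shares[shares > 0.1]
print(shares)
```

Movie 10 gets a share of 1.0 by construction, since every similar user liked it. Movie 20 was liked by two of the three similar users, and movie 30 drops out because nobody rated it above 4.0.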

In [25]:
#Finding the share of users who liked each recommended movie, out of all users who liked at least one of them

all_people = ratings[(ratings['movieId'].isin(similar_movie_recs.index)) & (ratings['rating'] > 4.0)]
all_people_recs = all_people['movieId'].value_counts() / len(all_people['userId'].unique())
all_people_recs

318      0.342220
296      0.284674
2571     0.244033
356      0.235266
593      0.225909
           ...   
551      0.040918
50872    0.039111
745      0.037031
78499    0.035131
2355     0.025091
Name: movieId, Length: 113, dtype: float64

In [34]:
#Creating a recommendation score

rec_percentage = pd.concat([similar_movie_recs,all_people_recs],axis = 1)
rec_percentage.columns = ["Similar","All"]

rec_percentage["Similarity Score"] = rec_percentage["Similar"] / rec_percentage["All"]

In [36]:
#Sorting the recommendations by recommendation score in descending order

rec_percentage = rec_percentage.sort_values("Similarity Score",ascending = False)

rec_percentage

Unnamed: 0,Similar,All,Similarity Score
1,1.000000,0.124728,8.017414
3114,0.280648,0.053706,5.225654
2355,0.110539,0.025091,4.405452
78499,0.152960,0.035131,4.354038
4886,0.235147,0.070811,3.320783
...,...,...,...
2858,0.216724,0.167634,1.292845
296,0.367295,0.284674,1.290232
79132,0.166817,0.131384,1.269693
4973,0.142501,0.112405,1.267747
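
The score is the ratio of two shares: how popular a movie is among similar users, relative to how popular it is overall. As a worked check, the sketch below recomputes it from the rounded `Similar` and `All` values in the table above for Toy Story 2 (movieId 3114) and Pulp Fiction (movieId 296); because the inputs are rounded, the results agree with the table only to a few decimal places.

```python
# Rounded shares copied from the table above: (Similar, All).
shares = {
    "Toy Story 2 (movieId 3114)": (0.280648, 0.053706),
    "Pulp Fiction (movieId 296)": (0.367295, 0.284674),
}

# Score = share among similar users / share among all users.
scores = {name: similar / overall for name, (similar, overall) in shares.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Pulp Fiction is liked by a larger share of similar users than Toy Story 2, but it is popular with everyone, so its score stays close to 1. Toy Story 2 is roughly five times more popular among similar users than in general, which is why it ranks higher.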


In [37]:
#Making a dataset with above information and movie titles to form the final recommendations

final_recommendations = rec_percentage.merge(movies, left_index = True, right_on = 'movieId')
final_recommendations

Unnamed: 0,Similar,All,Similarity Score,movieId,title,genres,cleaned_title
0,1.000000,0.124728,8.017414,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,Toy Story 1995
3021,0.280648,0.053706,5.225654,3114,Toy Story 2 (1999),Adventure|Animation|Children|Comedy|Fantasy,Toy Story 2 1999
2264,0.110539,0.025091,4.405452,2355,"Bug's Life, A (1998)",Adventure|Animation|Children|Comedy,Bugs Life A 1998
14813,0.152960,0.035131,4.354038,78499,Toy Story 3 (2010),Adventure|Animation|Children|Comedy|Fantasy|IMAX,Toy Story 3 2010
4780,0.235147,0.070811,3.320783,4886,"Monsters, Inc. (2001)",Adventure|Animation|Children|Comedy|Fantasy,Monsters Inc 2001
...,...,...,...,...,...,...,...
2766,0.216724,0.167634,1.292845,2858,American Beauty (1999),Drama|Romance,American Beauty 1999
292,0.367295,0.284674,1.290232,296,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller,Pulp Fiction 1994
14937,0.166817,0.131384,1.269693,79132,Inception (2010),Action|Crime|Drama|Mystery|Sci-Fi|Thriller|IMAX,Inception 2010
4867,0.142501,0.112405,1.267747,4973,"Amelie (Fabuleux destin d'Amélie Poulain, Le) ...",Comedy|Romance,Amelie Fabuleux destin dAmlie Poulain Le 2001


## **The Final Recommendation Function**
**Aim:** Build a function that takes a movie id and returns the top ten recommendations, using the collaborative filtering steps above

In [43]:
def collaborative_recs(movie_id):
    
    similar_people = ratings[(ratings["movieId"] == movie_id) & (ratings["rating"] > 4.0)]['userId'].unique()

    similar_people_movie_recs = ratings[(ratings["userId"].isin(similar_people)) & (ratings['rating'] > 4.0)]

    similar_movie_recs = similar_people_movie_recs['movieId']
    similar_movie_recs = similar_movie_recs.value_counts() / len(similar_people)
    similar_movie_recs = similar_movie_recs[similar_movie_recs > 0.1]

    all_people = ratings[(ratings['movieId'].isin(similar_movie_recs.index)) & (ratings['rating'] > 4.0)]
    all_people_recs = all_people['movieId'].value_counts() / len(all_people['userId'].unique())

    rec_percentage = pd.concat([similar_movie_recs,all_people_recs],axis = 1)
    rec_percentage.columns = ["Similar","All"]
    rec_percentage["Similarity Score"] = rec_percentage["Similar"] / rec_percentage["All"]
    rec_percentage = rec_percentage.sort_values("Similarity Score",ascending = False)

    final_recommendations = rec_percentage.merge(movies, left_index = True, right_on = 'movieId')
    final_recommendations = final_recommendations[['Similarity Score','title','genres']].head(10)
    
    #Dropping the old row labels so the result is numbered 0 to 9
    return final_recommendations.reset_index(drop = True)

## **Interactive widget for Movie Recommendations**

In [45]:
movie_input = widgets.Text(value="Toy Story", description='Enter Movie:', disabled = False)

recommendations_list = widgets.Output()

def on_type(data):
    with recommendations_list:
        recommendations_list.clear_output()
        input_title = data.new
        if len(input_title) >= 5:
            movie_id = search(input_title).iloc[0]['movieId']
            display(collaborative_recs(movie_id))

movie_input.observe(on_type, names = 'value') 

display(movie_input, recommendations_list)            

Text(value='Toy Story', description='Enter Movie:')

Output()