# <p style="background-color:#8B0000; font-family:newtimeroman;color:#FFF9ED; font-size:150%; text-align:center; border-radius: 15px 50px;"> Item-Based | Content-Based | Hybrid Recommendation Systems</p>

<div style="border-radius:10px; border:#DEB887 solid; padding: 15px; background-color: #FFFAF0; font-size:100%; text-align:left">
    
### Objective
The goal of this notebook is to build three recommendation approaches on a movie dataset and see how they relate. We show how each one works, where it is strong, and how combining them in a hybrid can make movie recommendations more relevant and personalized.

### Dataset
We use "The Movies Dataset". From `movies_metadata.csv` we take each movie's id, title, overview, release date, vote average, and vote count; from `ratings_small.csv` we take the user ratings. Combining these content features with user interactions lets us build all three recommenders.

### Methodology
* Item-Based Recommendation: We start with an item-based approach, building a user-movie interaction matrix, measuring how similar movies are from the correlation of those interactions, and recommending the most similar movies.

* Content-Based Recommendation: Next, we use the textual features of movies, specifically their overviews. We vectorize the overviews with TF-IDF and recommend movies whose overviews are most similar.

* Hybrid Recommendation System: Finally, we combine the two approaches. The hybrid score multiplies the scaled item-based and content-based similarities, so a movie ranks highly only when both signals support it.

</div>



# <p style="background-color:#228B22; font-family:newtimeroman;color:#FFF9ED; font-size:100%; text-align:center; border-radius: 15px 50px;"> ⇣ Reading and Cleaning Data ⇣</p>

In [1]:
import pandas as pd

In [2]:
movies = pd.read_csv("/kaggle/input/the-movies-dataset/movies_metadata.csv",
                    usecols=["id","overview","title","vote_average","vote_count","release_date"])

In [3]:
movies.head()

Unnamed: 0,id,overview,release_date,title,vote_average,vote_count
0,862,"Led by Woody, Andy's toys live happily in his ...",1995-10-30,Toy Story,7.7,5415.0
1,8844,When siblings Judy and Peter discover an encha...,1995-12-15,Jumanji,6.9,2413.0
2,15602,A family wedding reignites the ancient feud be...,1995-12-22,Grumpier Old Men,6.5,92.0
3,31357,"Cheated on, mistreated and stepped on, the wom...",1995-12-22,Waiting to Exhale,6.1,34.0
4,11862,Just when George Banks has recovered from his ...,1995-02-10,Father of the Bride Part II,5.7,173.0


In [4]:
movies.isnull().sum()

id                0
overview        954
release_date     87
title             6
vote_average      6
vote_count        6
dtype: int64

In [5]:
movies = movies.dropna()

In [6]:
movies.isnull().sum()

id              0
overview        0
release_date    0
title           0
vote_average    0
vote_count      0
dtype: int64

In [7]:
movies.dtypes

id               object
overview         object
release_date     object
title            object
vote_average    float64
vote_count      float64
dtype: object

In [8]:
movies.duplicated().sum()

28

In [9]:
movies = movies.drop_duplicates()

In [10]:
movies = movies.reset_index(drop=True)

In [11]:
movies.shape

(44407, 6)

In [12]:
ratings = pd.read_csv("/kaggle/input/the-movies-dataset/ratings_small.csv")

In [13]:
ratings

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
...,...,...,...,...
99999,671,6268,2.5,1065579370
100000,671,6269,4.0,1065149201
100001,671,6365,4.0,1070940363
100002,671,6385,2.5,1070979663


In [14]:
ratings["date"] = pd.to_datetime(ratings["timestamp"],unit="s")

In [15]:
ratings

Unnamed: 0,userId,movieId,rating,timestamp,date
0,1,31,2.5,1260759144,2009-12-14 02:52:24
1,1,1029,3.0,1260759179,2009-12-14 02:52:59
2,1,1061,3.0,1260759182,2009-12-14 02:53:02
3,1,1129,2.0,1260759185,2009-12-14 02:53:05
4,1,1172,4.0,1260759205,2009-12-14 02:53:25
...,...,...,...,...,...
99999,671,6268,2.5,1065579370,2003-10-08 02:16:10
100000,671,6269,4.0,1065149201,2003-10-03 02:46:41
100001,671,6365,4.0,1070940363,2003-12-09 03:26:03
100002,671,6385,2.5,1070979663,2003-12-09 14:21:03


In [16]:
ratings = ratings.drop("timestamp",axis=1)

In [17]:
ratings

Unnamed: 0,userId,movieId,rating,date
0,1,31,2.5,2009-12-14 02:52:24
1,1,1029,3.0,2009-12-14 02:52:59
2,1,1061,3.0,2009-12-14 02:53:02
3,1,1129,2.0,2009-12-14 02:53:05
4,1,1172,4.0,2009-12-14 02:53:25
...,...,...,...,...
99999,671,6268,2.5,2003-10-08 02:16:10
100000,671,6269,4.0,2003-10-03 02:46:41
100001,671,6365,4.0,2003-12-09 03:26:03
100002,671,6385,2.5,2003-12-09 14:21:03


In [18]:
ratings.isnull().sum()

userId     0
movieId    0
rating     0
date       0
dtype: int64

In [19]:
ratings.duplicated().sum()

0

In [20]:
movies["id"].nunique()

44405

In [21]:
movies = movies.rename(columns={"id":"movieId"})

In [22]:
movies

Unnamed: 0,movieId,overview,release_date,title,vote_average,vote_count
0,862,"Led by Woody, Andy's toys live happily in his ...",1995-10-30,Toy Story,7.7,5415.0
1,8844,When siblings Judy and Peter discover an encha...,1995-12-15,Jumanji,6.9,2413.0
2,15602,A family wedding reignites the ancient feud be...,1995-12-22,Grumpier Old Men,6.5,92.0
3,31357,"Cheated on, mistreated and stepped on, the wom...",1995-12-22,Waiting to Exhale,6.1,34.0
4,11862,Just when George Banks has recovered from his ...,1995-02-10,Father of the Bride Part II,5.7,173.0
...,...,...,...,...,...,...
44402,30840,"Yet another version of the classic epic, with ...",1991-05-13,Robin Hood,5.7,26.0
44403,111109,An artist struggles to finish his work while a...,2011-11-17,Century of Birthing,9.0,3.0
44404,67758,"When one of her hits goes wrong, a professiona...",2003-08-01,Betrayal,3.8,6.0
44405,227506,"In a small town live two brothers, one a minis...",1917-10-21,Satan Triumphant,0.0,0.0


In [23]:
movies.dtypes

movieId          object
overview         object
release_date     object
title            object
vote_average    float64
vote_count      float64
dtype: object

In [24]:
ratings.dtypes

userId              int64
movieId             int64
rating            float64
date       datetime64[ns]
dtype: object

In [25]:
movies["movieId"] = movies["movieId"].astype("int64")
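The cast above works because every remaining `movieId` parses as an integer. Since the column was read as `object`, a more defensive variant is to coerce unparseable values to `NaN` and drop them explicitly. The sketch below shows this on made-up data; the `toy` frame and its values are illustrative assumptions, not rows from the dataset.

```python
import pandas as pd

# Toy stand-in for the movies frame; the values are made up for illustration.
toy = pd.DataFrame({"movieId": ["862", "8844", "not-an-id"], "title": ["A", "B", "C"]})

# Turn anything that is not a number into NaN instead of raising an error.
toy["movieId"] = pd.to_numeric(toy["movieId"], errors="coerce")

# Drop the rows that failed to parse, then cast to a true integer dtype.
toy = toy.dropna(subset=["movieId"]).astype({"movieId": "int64"}).reset_index(drop=True)
print(toy)
```

With this pattern a malformed id removes one row instead of stopping the notebook.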

# <p style="background-color:#228B22; font-family:newtimeroman;color:#FFF9ED; font-size:100%; text-align:center; border-radius: 15px 50px;"> ⇣ Item-Based ⇣</p>

In [26]:
df = pd.merge(movies, ratings, on="movieId", how="inner")

In [27]:
df

Unnamed: 0,movieId,overview,release_date,title,vote_average,vote_count,userId,rating,date
0,949,"Obsessive master thief, Neil McCauley leads a ...",1995-12-15,Heat,7.7,1886.0,23,3.5,2006-05-27 09:11:32
1,949,"Obsessive master thief, Neil McCauley leads a ...",1995-12-15,Heat,7.7,1886.0,102,4.0,2000-04-24 17:55:42
2,949,"Obsessive master thief, Neil McCauley leads a ...",1995-12-15,Heat,7.7,1886.0,232,2.0,2000-04-07 07:31:37
3,949,"Obsessive master thief, Neil McCauley leads a ...",1995-12-15,Heat,7.7,1886.0,242,5.0,2000-04-25 18:53:45
4,949,"Obsessive master thief, Neil McCauley leads a ...",1995-12-15,Heat,7.7,1886.0,263,3.0,2005-06-04 00:56:15
...,...,...,...,...,...,...,...,...,...
44818,64197,Plucked from an orphanage as a literal love sl...,2007-06-25,Travelling with Pets,6.0,5.0,73,4.0,2015-09-06 04:24:51
44819,64197,Plucked from an orphanage as a literal love sl...,2007-06-25,Travelling with Pets,6.0,5.0,544,5.0,2015-07-01 22:30:19
44820,64197,Plucked from an orphanage as a literal love sl...,2007-06-25,Travelling with Pets,6.0,5.0,648,3.5,2009-05-10 10:37:14
44821,98604,"Masha Krapivina - is yet beautiful, and not th...",2012-02-14,Cinderella,4.6,6.0,352,4.0,2015-01-06 05:26:26


In [28]:
df["movieId"].nunique()

2808
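One caveat about this merge: as far as we can tell from the dataset description, `movieId` in `ratings_small.csv` is a MovieLens identifier, while `id` in `movies_metadata.csv` is a TMDB identifier, so joining them directly may pair ratings with the wrong titles. The sketch below shows, on toy data, how a link table could translate between the two before merging. It assumes the dataset's `links_small.csv` file with `movieId` and `tmdbId` columns; the toy frames are illustrative stand-ins.

```python
import pandas as pd

# Toy stand-ins; in the notebook these would come from ratings_small.csv,
# movies_metadata.csv and (assumed) links_small.csv.
toy_ratings = pd.DataFrame({"userId": [1, 2], "movieId": [1, 6], "rating": [4.0, 5.0]})
toy_movies = pd.DataFrame({"tmdbId": [862, 949], "title": ["Toy Story", "Heat"]})
toy_links = pd.DataFrame({"movieId": [1, 6], "tmdbId": [862.0, 949.0]})

# tmdbId may contain missing values, so drop them and cast before joining.
toy_links = toy_links.dropna(subset=["tmdbId"]).astype({"tmdbId": "int64"})

# ratings -> links (MovieLens id to TMDB id) -> movies (TMDB id to title)
toy_df = (
    toy_ratings.merge(toy_links, on="movieId", how="inner")
               .merge(toy_movies, on="tmdbId", how="inner")
)
print(toy_df[["userId", "movieId", "tmdbId", "title", "rating"]])
```

The results shown in the rest of this notebook come from the direct merge above, not from this mapping.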

In [29]:
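# Boolean user x movie matrix: True where the user rated the movie.
# notnull() discards the rating value and keeps only whether a rating exists.
user_title_df = df.groupby(["userId","movieId"])["rating"].mean().unstack().notnull()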

In [30]:
user_title_df

userId \ movieId,2,3,5,6,11,12,13,14,15,16,...,132961,133365,134158,134569,134881,140174,142507,148652,158238,160718
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
5,False,True,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
667,False,False,False,True,True,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
668,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
669,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
670,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
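To make the reshaping step concrete, here is the same `groupby` / `unstack` / `notnull` chain on a small made-up ratings table. The `toy_` names and values are assumptions for illustration only.

```python
import pandas as pd

toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0],
})

# One row per user, one column per movie; cells hold the mean rating or NaN.
toy_wide = toy_ratings.groupby(["userId", "movieId"])["rating"].mean().unstack()

# notnull() keeps only whether a rating exists, not its value.
toy_seen = toy_wide.notnull()
print(toy_seen)
```

User 1 rated movies 10 and 20, so that row is `True, True, False`. The rating values themselves no longer play a role after this step.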


In [31]:
picked = 1949

In [32]:
filtered = user_title_df[picked]

In [33]:
user_title_df_wo = user_title_df.drop(picked,axis=1)

In [34]:
# Pearson correlation between the picked movie's column and every other movie's column
movies_similarity = user_title_df_wo.corrwith(filtered)

In [35]:
movies_similarity.sort_values(ascending=False).head(20)

movieId
4912     0.561427
1948     0.540254
1950     0.529912
2099     0.492436
5747     0.471598
1937     0.469822
26242    0.469822
898      0.455261
2132     0.450053
7916     0.444532
8460     0.444532
963      0.444532
39231    0.444532
4925     0.444532
4836     0.444532
4274     0.444532
7096     0.444532
964      0.444532
3405     0.439783
650      0.439783
dtype: float64
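Because the matrix is boolean, `corrwith` here measures how strongly two movies tend to be rated by the same users, not whether they receive similar scores. A small sketch on made-up data illustrates this. The cast to float is our precaution, not something the notebook does, and all `toy`-style names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Rows are users, columns are movie ids; 1.0 means "this user rated the movie".
seen = pd.DataFrame(
    {
        10: [1, 1, 0, 0, 1],
        20: [1, 1, 0, 0, 0],
        30: [0, 0, 1, 1, 0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
).astype(float)

picked_toy = 10
target = seen[picked_toy]
others = seen.drop(columns=picked_toy)

# Correlate every remaining column with the picked movie's column.
toy_similarity = others.corrwith(target).sort_values(ascending=False)
print(toy_similarity)
```

Movie 20 shares most of its audience with movie 10 and gets a positive score, while movie 30 is rated by exactly the users who skipped movie 10 and gets a correlation of -1.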

In [36]:
movies_similarity = movies_similarity.sort_values(ascending=False).reset_index()
movies_similarity.columns = ["movieId","movies_similarity"]

In [37]:
movies_similarity

Unnamed: 0,movieId,movies_similarity
0,4912,0.561427
1,1948,0.540254
2,1950,0.529912
3,2099,0.492436
4,5747,0.471598
...,...,...
2802,671,-0.027973
2803,1249,-0.030554
2804,198,-0.032185
2805,70,-0.034141


# <p style="background-color:#228B22; font-family:newtimeroman;color:#FFF9ED; font-size:100%; text-align:center; border-radius: 15px 50px;"> ⇣ Content-Based ⇣</p>

In [38]:
movies

Unnamed: 0,movieId,overview,release_date,title,vote_average,vote_count
0,862,"Led by Woody, Andy's toys live happily in his ...",1995-10-30,Toy Story,7.7,5415.0
1,8844,When siblings Judy and Peter discover an encha...,1995-12-15,Jumanji,6.9,2413.0
2,15602,A family wedding reignites the ancient feud be...,1995-12-22,Grumpier Old Men,6.5,92.0
3,31357,"Cheated on, mistreated and stepped on, the wom...",1995-12-22,Waiting to Exhale,6.1,34.0
4,11862,Just when George Banks has recovered from his ...,1995-02-10,Father of the Bride Part II,5.7,173.0
...,...,...,...,...,...,...
44402,30840,"Yet another version of the classic epic, with ...",1991-05-13,Robin Hood,5.7,26.0
44403,111109,An artist struggles to finish his work while a...,2011-11-17,Century of Birthing,9.0,3.0
44404,67758,"When one of her hits goes wrong, a professiona...",2003-08-01,Betrayal,3.8,6.0
44405,227506,"In a small town live two brothers, one a minis...",1917-10-21,Satan Triumphant,0.0,0.0


In [39]:
movies.loc[1, "overview"]

"When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures."

In [40]:
# Replace punctuation and digits with spaces before vectorizing.
movies["overview"] = movies["overview"].str.replace(r"[^\w\s]"," ",regex=True).str.replace(r"[\d]"," ",regex=True)

In [41]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [42]:
# Ignore English stop words and terms that appear in fewer than 5 overviews.
tfidf = TfidfVectorizer(stop_words="english", min_df=5)
tfidf_matrix = tfidf.fit_transform(movies["overview"])

In [43]:
from sklearn.metrics.pairwise import cosine_similarity

In [44]:
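# Dense 44,407 x 44,407 matrix of pairwise similarities; as float64 this needs roughly 16 GB of memory.
similarity = cosine_similarity(tfidf_matrix,tfidf_matrix)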
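`cosine_similarity(tfidf_matrix, tfidf_matrix)` materializes one entry per pair of movies, which is memory-hungry for tens of thousands of titles. If only one movie's neighbours are needed, a lighter option is to compare that single row against the whole matrix. The sketch below does this on three made-up overviews; the texts and the `toy_` names are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

toy_overviews = [
    "a detective hunts a serial killer",
    "a retired detective hunts a killer in a small town",
    "toys come alive when their owner leaves the room",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_overviews)

# Slice with 0:1 to keep a 2-D (1 x n_terms) row, then compare it to every overview.
row_scores = cosine_similarity(toy_matrix[0:1], toy_matrix).ravel()
print(row_scores)
```

The first overview matches itself with a score of 1, overlaps with the second through shared terms, and shares no terms with the third, so that score is 0.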

In [45]:
# Row position of the picked movie (movies was re-indexed 0..n-1 earlier).
index = movies[movies["movieId"] == picked].index[0]

In [46]:
movies

Unnamed: 0,movieId,overview,release_date,title,vote_average,vote_count
0,862,Led by Woody Andy s toys live happily in his ...,1995-10-30,Toy Story,7.7,5415.0
1,8844,When siblings Judy and Peter discover an encha...,1995-12-15,Jumanji,6.9,2413.0
2,15602,A family wedding reignites the ancient feud be...,1995-12-22,Grumpier Old Men,6.5,92.0
3,31357,Cheated on mistreated and stepped on the wom...,1995-12-22,Waiting to Exhale,6.1,34.0
4,11862,Just when George Banks has recovered from his ...,1995-02-10,Father of the Bride Part II,5.7,173.0
...,...,...,...,...,...,...
44402,30840,Yet another version of the classic epic with ...,1991-05-13,Robin Hood,5.7,26.0
44403,111109,An artist struggles to finish his work while a...,2011-11-17,Century of Birthing,9.0,3.0
44404,67758,When one of her hits goes wrong a professiona...,2003-08-01,Betrayal,3.8,6.0
44405,227506,In a small town live two brothers one a minis...,1917-10-21,Satan Triumphant,0.0,0.0


In [47]:
pd.DataFrame(similarity[index], columns=["similarity"])

Unnamed: 0,similarity
0,0.000000
1,0.000000
2,0.000000
3,0.000000
4,0.000000
...,...
44402,0.021734
44403,0.000000
44404,0.000000
44405,0.007536
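A row of similarity scores becomes a recommendation list once it is attached to the movie table, the movie itself is removed, and the highest scores are kept. The sketch below shows one way to do that on made-up data; it relies on the frame having a default 0..n-1 index so that positions line up, and all `toy_` names and values are assumptions.

```python
import numpy as np
import pandas as pd

toy_movies = pd.DataFrame({"movieId": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]})

# Similarity of the movie at position 0 to every movie, itself included.
toy_row = np.array([1.0, 0.2, 0.0, 0.7])
toy_index = 0

toy_ranked = (
    toy_movies.assign(similarity=toy_row)     # aligned by position
              .drop(index=toy_index)          # a movie is always most similar to itself
              .nlargest(2, "similarity")      # keep the two closest movies
)
print(toy_ranked)
```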


# <p style="background-color:#228B22; font-family:newtimeroman;color:#FFF9ED; font-size:100%; text-align:center; border-radius: 15px 50px;"> ⇣ Hybrid ⇣</p>

In [48]:
from sklearn.preprocessing import MinMaxScaler

def hybrid(movieid, how_many=5):
    # item-based: correlate every other movie's column with the requested movie's column
    filtered = user_title_df[movieid]
    user_title_df_wo = user_title_df.drop(movieid,axis=1)
    movies_similarity = user_title_df_wo.corrwith(filtered).reset_index()
    movies_similarity.columns = ["movieId","movies_similarity"]
    movies_similarity["movies_similarity"] = MinMaxScaler().fit_transform(movies_similarity[["movies_similarity"]])
    
    #content-based: one row per movie, paired with its cosine similarity to the requested movie
    index = movies[movies["movieId"] == movieid].index[0]
    content_similarity = pd.DataFrame({"movieId": movies["movieId"].to_numpy(), "cosine_similarity": similarity[index]})
    content_similarity["cosine_similarity"] = MinMaxScaler().fit_transform(content_similarity[["cosine_similarity"]])
    
    #merge
    merged = pd.merge(movies_similarity,content_similarity,on = "movieId", how="inner")
    merged["hybrid"] = merged["movies_similarity"] * merged["cosine_similarity"]
    
    #hybrid
    merged = merged.sort_values(by="hybrid", ascending=False)
    
    last_df = pd.merge(merged,movies,on="movieId", how="inner")[["title","hybrid","movieId","release_date"]].head(how_many)
    return last_df

In [49]:
hybrid(1949, how_many=5)

Unnamed: 0,title,hybrid,movieId,release_date
0,Seven Blood-Stained Orchids,0.089864,3699,1972-06-30
1,The Pledge,0.070376,5955,2001-01-09
2,Antibodies,0.069062,8332,2005-04-24
3,Mercy,0.059701,5247,2000-02-11
4,Nightwatch,0.059638,6498,1994-02-25
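The hybrid score is the product of the two min-max scaled similarities, so a movie needs support from both signals to rank highly. The sketch below reproduces that scoring step on made-up numbers; the `toy_` names and values are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

toy_scores = pd.DataFrame({
    "movieId": [1, 2, 3],
    "movies_similarity": [0.50, 0.10, -0.10],   # item-based (correlation)
    "cosine_similarity": [0.00, 0.30, 0.60],    # content-based
})

# MinMaxScaler rescales each column independently to the range [0, 1].
cols = ["movies_similarity", "cosine_similarity"]
toy_scores[cols] = MinMaxScaler().fit_transform(toy_scores[cols])

# Multiply the scaled scores and rank by the product.
toy_scores["hybrid"] = toy_scores["movies_similarity"] * toy_scores["cosine_similarity"]
toy_ranked = toy_scores.sort_values("hybrid", ascending=False)
print(toy_ranked)
```

Movie 1 has the best item-based score and movie 3 the best content score, yet movie 2 ranks first because it is the only one with a non-zero value on both. A consequence of multiplying is that the lowest-scoring movie in either column always ends up with a hybrid score of 0.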
