<a href="https://www.kaggle.com/code/mostafahabibi1994/movie-recommender-system-content-based?scriptVersionId=155141018" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


In [2]:
# pd.read_csv already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('/kaggle/input/imdb-dataset-of-top-1000-movies-and-tv-shows/imdb_top_1000.csv')
df.head(3)

Unnamed: 0,Poster_Link,Series_Title,Released_Year,Certificate,Runtime,Genre,IMDB_Rating,Overview,Meta_score,Director,Star1,Star2,Star3,Star4,No_of_Votes,Gross
0,https://m.media-amazon.com/images/M/MV5BMDFkYT...,The Shawshank Redemption,1994,A,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,80.0,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110,28341469
1,https://m.media-amazon.com/images/M/MV5BM2MyNj...,The Godfather,1972,A,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,100.0,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367,134966411
2,https://m.media-amazon.com/images/M/MV5BMTMxNT...,The Dark Knight,2008,UA,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,84.0,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232,534858444


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Poster_Link    1000 non-null   object 
 1   Series_Title   1000 non-null   object 
 2   Released_Year  1000 non-null   object 
 3   Certificate    899 non-null    object 
 4   Runtime        1000 non-null   object 
 5   Genre          1000 non-null   object 
 6   IMDB_Rating    1000 non-null   float64
 7   Overview       1000 non-null   object 
 8   Meta_score     843 non-null    float64
 9   Director       1000 non-null   object 
 10  Star1          1000 non-null   object 
 11  Star2          1000 non-null   object 
 12  Star3          1000 non-null   object 
 13  Star4          1000 non-null   object 
 14  No_of_Votes    1000 non-null   int64  
 15  Gross          831 non-null    object 
dtypes: float64(2), int64(1), object(13)
memory usage: 125.1+ KB


In [4]:
df.describe()

Unnamed: 0,IMDB_Rating,Meta_score,No_of_Votes
count,1000.0,843.0,1000.0
mean,7.9493,77.97153,273692.9
std,0.275491,12.376099,327372.7
min,7.6,28.0,25088.0
25%,7.7,70.0,55526.25
50%,7.9,79.0,138548.5
75%,8.1,87.0,374161.2
max,9.3,100.0,2343110.0


In [5]:
df.isna().sum()

Poster_Link        0
Series_Title       0
Released_Year      0
Certificate      101
Runtime            0
Genre              0
IMDB_Rating        0
Overview           0
Meta_score       157
Director           0
Star1              0
Star2              0
Star3              0
Star4              0
No_of_Votes        0
Gross            169
dtype: int64

In [6]:
# Drop the three columns with missing values, plus the poster URL, which is not used
df = df.drop(columns=['Certificate', 'Gross', 'Poster_Link', 'Meta_score'])
df.head(3)

Unnamed: 0,Series_Title,Released_Year,Runtime,Genre,IMDB_Rating,Overview,Director,Star1,Star2,Star3,Star4,No_of_Votes
0,The Shawshank Redemption,1994,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110
1,The Godfather,1972,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367
2,The Dark Knight,2008,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232


# Simple Recommender System

**This recommender system ranks movies by a weighted average of their ratings, so titles with few votes are pulled toward the overall mean rating.**
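
Written out, the weighted average computed by `simple_recommender` in the cells that follow has the form below. The symbols $R$, $v$, $m$ and $C$ are labels chosen here for readability; they correspond to `IMDB_Rating`, `No_of_Votes`, `min_vote` and `mean_vote` in the code.

$$
\mathrm{WR} = \frac{v}{v + m}\,R + \frac{m}{v + m}\,C
$$

When $v \gg m$ the score approaches the movie's own rating $R$; when $v \ll m$ it approaches the overall mean rating $C$.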

In [7]:
# Vote threshold: the 90th percentile of vote counts
min_vote = df['No_of_Votes'].quantile(.9)
min_vote

699297.7

In [8]:
# Despite its name, mean_vote holds the mean IMDB rating across all movies
mean_vote = df['IMDB_Rating'].mean()
mean_vote

7.949299999999999

In [9]:
max_vote_movies = df.loc[df['No_of_Votes'] >= min_vote]
max_vote_movies.shape

(100, 12)

In [10]:
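def simple_recommender(x, mean_=mean_vote, min_=min_vote):
    # Weighted rating: blend the movie's own rating with the overall mean rating,
    # weighted by its vote count relative to the vote threshold min_
    rate = x['IMDB_Rating']
    voters = x['No_of_Votes']
    return (voters / (voters + min_) * rate) + (min_ / (voters + min_) * mean_)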

In [11]:
max_vote_movies = max_vote_movies.assign(weighted_avg=max_vote_movies.apply(simple_recommender, axis=1))
max_vote_movies = max_vote_movies.sort_values('weighted_avg' ,ascending = False )
max_vote_movies = max_vote_movies.round({'weighted_avg': 2})
max_vote_movies[['Series_Title','Released_Year','IMDB_Rating','weighted_avg']].head(10)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,weighted_avg
0,The Shawshank Redemption,1994,9.3,8.99
1,The Godfather,1972,9.2,8.82
2,The Dark Knight,2008,9.0,8.76
6,Pulp Fiction,1994,8.9,8.64
5,The Lord of the Rings: The Return of the King,2003,8.9,8.62
3,The Godfather: Part II,1974,9.0,8.6
8,Inception,2010,8.8,8.58
9,Fight Club,1999,8.8,8.57
11,Forrest Gump,1994,8.8,8.56
7,Schindler's List,1993,8.9,8.55
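
As a quick sanity check, the top three scores in the table above can be reproduced outside pandas. The sketch below is plain Python and re-implements the same formula in a standalone helper; `weighted_rating` is a name introduced here and is not part of the notebook. The threshold, mean, ratings and vote counts are copied from the outputs of cells 7, 8 and 2.

```python
# Values copied from the notebook outputs above
MIN_VOTES = 699297.7      # 90th percentile of No_of_Votes (cell 7)
MEAN_RATING = 7.9493      # mean IMDB_Rating (cell 8, rounded)

def weighted_rating(rating, votes, mean_rating=MEAN_RATING, min_votes=MIN_VOTES):
    """Blend a movie's rating with the overall mean, weighted by vote count."""
    weight = votes / (votes + min_votes)
    return weight * rating + (1 - weight) * mean_rating

# (title, IMDB_Rating, No_of_Votes) for the first three rows of the dataset
movies = [
    ("The Shawshank Redemption", 9.3, 2343110),
    ("The Godfather", 9.2, 1620367),
    ("The Dark Knight", 9.0, 2303232),
]

scores = {title: round(weighted_rating(r, v), 2) for title, r, v in movies}
for title, score in scores.items():
    print(f"{title}: {score}")
```

The rounded scores come out as 8.99, 8.82 and 8.76, matching the `weighted_avg` column.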


# Content Based Recommender System

**In this type of recommender system, we use features such as the overview, director, and actors to recommend a movie to a user.**

> First, we use the overview (plot summary) of a movie to recommend other, similar movies.

In [12]:
df['Overview'].head(5)

0    Two imprisoned men bond over a number of years...
1    An organized crime dynasty's aging patriarch t...
2    When the menace known as the Joker wreaks havo...
3    The early life and career of Vito Corleone in ...
4    A jury holdout attempts to prevent a miscarria...
Name: Overview, dtype: object

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer,CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity,linear_kernel

In [14]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['Overview'])
tfidf_matrix.shape

(1000, 5426)

In [15]:
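# TfidfVectorizer L2-normalises each row by default, so the dot product
# computed by linear_kernel is the cosine similarity between overviews
sim_vec = linear_kernel(tfidf_matrix, tfidf_matrix)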

Now that we have the pairwise similarity matrix, we can define a function that recommends movies based on their overviews.
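
To make the similarity step concrete, here is a minimal plain-Python sketch of what TF-IDF followed by a dot product computes. It is an illustration only, not scikit-learn's implementation. The toy overviews are invented and already tokenised with stop words removed. The weighting uses the smoothed IDF `ln((1 + n) / (1 + df)) + 1` and L2 normalisation, which is the formula scikit-learn documents for its default settings. The helper names `tfidf_vectors`, `dot` and `cosine` are introduced here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors (as dicts) for tokenised docs."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1
        weights = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                   for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Invented, pre-tokenised overviews (assumption: stop words already removed)
toy_overviews = [
    ["jury", "debates", "murder", "trial"],
    ["lawyer", "defends", "man", "murder", "trial"],
    ["robot", "explores", "distant", "planet"],
]
vecs = tfidf_vectors(toy_overviews)

print(dot(vecs[0], vecs[1]), cosine(vecs[0], vecs[1]))  # equal up to rounding
print(dot(vecs[0], vecs[2]))                            # no shared terms
```

Because every vector has unit length, the plain dot product and the cosine similarity agree, which is why `linear_kernel` is sufficient on a TF-IDF matrix. Overviews with no terms in common score exactly zero.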

In [16]:
indexes = pd.Series(df.index , index = df['Series_Title'])
indexes

Series_Title
The Shawshank Redemption      0
The Godfather                 1
The Dark Knight               2
The Godfather: Part II        3
12 Angry Men                  4
                           ... 
Breakfast at Tiffany's      995
Giant                       996
From Here to Eternity       997
Lifeboat                    998
The 39 Steps                999
Length: 1000, dtype: int64

In [17]:
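def recommend_movies(title, sim_vec, top_n=10):
    # Row position of the requested title
    id_ = indexes[title]
    # Pair each movie's position with its similarity to the requested movie
    sim_movies = list(enumerate(sim_vec[id_]))
    sim_movies = sorted(sim_movies, key=lambda pair: pair[1], reverse=True)
    # Position 0 is normally the movie itself, so skip it and keep the next top_n
    sim_movies = sim_movies[1:top_n + 1]
    R_movies = [i for i, _ in sim_movies]
    cols = ['Series_Title', 'Released_Year', 'IMDB_Rating', 'Genre', 'Director', 'Star1', 'Star2', 'Star3', 'Star4']
    return df[cols].iloc[R_movies]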
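
One detail worth noting: skipping the first sorted entry assumes the queried movie itself sorts into position 0. That holds in the usual case, but if another movie ties at the top score (for example, an identical overview) and has a lower row index, the stable sort puts that movie first and the queried movie stays in the results. Below is a minimal sketch of a variant that excludes the query by index instead. `top_similar` and the toy similarity row are hypothetical and not part of the notebook.

```python
import heapq

def top_similar(sim_row, query_idx, top_n=10):
    """Indices of the top_n highest scores in sim_row, excluding query_idx."""
    candidates = ((score, idx) for idx, score in enumerate(sim_row)
                  if idx != query_idx)
    # nlargest compares (score, idx) tuples, so equal scores are ordered
    # by the larger index first
    best = heapq.nlargest(top_n, candidates)
    return [idx for score, idx in best]

# Toy similarity row for movie 3; movie 1 ties with it at the top score
toy_row = [0.10, 1.00, 0.40, 1.00, 0.25]
result = top_similar(toy_row, query_idx=3, top_n=3)
print(result)  # [1, 2, 4]
```

With this toy row, sorting and dropping the first entry would discard movie 1 and return movie 3 (the query itself) as its own top recommendation.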

In [18]:
recommend_movies('12 Angry Men' , sim_vec)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,Genre,Director,Star1,Star2,Star3,Star4
791,Road to Perdition,2002,7.7,"Crime, Drama, Thriller",Sam Mendes,Tom Hanks,Tyler Hoechlin,Rob Maxey,Liam Aiken
95,Amélie,2001,8.3,"Comedy, Romance",Jean-Pierre Jeunet,Audrey Tautou,Mathieu Kassovitz,Rufus,Lorella Cravotta
69,Memento,2000,8.4,"Mystery, Thriller",Christopher Nolan,Guy Pearce,Carrie-Anne Moss,Joe Pantoliano,Mark Boone Junior
40,American History X,1998,8.5,Drama,Tony Kaye,Edward Norton,Edward Furlong,Beverly D'Angelo,Jennifer Lien
495,The Man from Earth,2007,7.9,"Drama, Fantasy, Mystery",Richard Schenkman,David Lee Smith,Tony Todd,John Billingsley,Ellen Crawford
162,L.A. Confidential,1997,8.2,"Crime, Drama, Mystery",Curtis Hanson,Kevin Spacey,Russell Crowe,Guy Pearce,Kim Basinger
270,Kaze no tani no Naushika,1984,8.1,"Animation, Adventure, Fantasy",Hayao Miyazaki,Sumi Shimamoto,Mahito Tsujimura,Hisako Kyôda,Gorô Naya
479,X-Men: Days of Future Past,2014,7.9,"Action, Adventure, Sci-Fi",Bryan Singer,Patrick Stewart,Ian McKellen,Hugh Jackman,James McAvoy
675,Back to the Future Part II,1989,7.8,"Adventure, Comedy, Sci-Fi",Robert Zemeckis,Michael J. Fox,Christopher Lloyd,Lea Thompson,Thomas F. Wilson
539,Le charme discret de la bourgeoisie,1972,7.9,Comedy,Luis Buñuel,Fernando Rey,Delphine Seyrig,Paul Frankeur,Bulle Ogier


In [19]:
recommend_movies("Breakfast at Tiffany's" , sim_vec)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,Genre,Director,Star1,Star2,Star3,Star4
916,The Visitor,2007,7.6,Drama,Tom McCarthy,Richard Jenkins,Haaz Sleiman,Danai Gurira,Hiam Abbass
868,Rebel Without a Cause,1955,7.7,Drama,Nicholas Ray,James Dean,Natalie Wood,Sal Mineo,Jim Backus
870,Sabrina,1954,7.7,"Comedy, Drama, Romance",Billy Wilder,Humphrey Bogart,Audrey Hepburn,William Holden,Walter Hampden
972,Delicatessen,1991,7.6,"Comedy, Crime",Marc Caro,Jean-Pierre Jeunet,Marie-Laure Dougnac,Dominique Pinon,Pascal Benezech
963,Die Hard: With a Vengeance,1995,7.6,"Action, Adventure, Thriller",John McTiernan,Bruce Willis,Jeremy Irons,Samuel L. Jackson,Graham Greene
425,Rosemary's Baby,1968,8.0,"Drama, Horror",Roman Polanski,Mia Farrow,John Cassavetes,Ruth Gordon,Sidney Blackmer
213,Inside Out,2015,8.1,"Animation, Adventure, Comedy",Pete Docter,Ronnie Del Carmen,Amy Poehler,Bill Hader,Lewis Black
885,Victoria,2015,7.6,"Crime, Drama, Romance",Sebastian Schipper,Laia Costa,Frederick Lau,Franz Rogowski,Burak Yigit
485,Un prophète,2009,7.9,"Crime, Drama",Jacques Audiard,Tahar Rahim,Niels Arestrup,Adel Bencherif,Reda Kateb
699,Midnight Cowboy,1969,7.8,Drama,John Schlesinger,Dustin Hoffman,Jon Voight,Sylvia Miles,John McGiver


**The overview-based recommender works, but let's build a second one that uses genres, cast, and crew instead of the movie overview.**

In [20]:
df.head(3)

Unnamed: 0,Series_Title,Released_Year,Runtime,Genre,IMDB_Rating,Overview,Director,Star1,Star2,Star3,Star4,No_of_Votes
0,The Shawshank Redemption,1994,142 min,Drama,9.3,Two imprisoned men bond over a number of years...,Frank Darabont,Tim Robbins,Morgan Freeman,Bob Gunton,William Sadler,2343110
1,The Godfather,1972,175 min,"Crime, Drama",9.2,An organized crime dynasty's aging patriarch t...,Francis Ford Coppola,Marlon Brando,Al Pacino,James Caan,Diane Keaton,1620367
2,The Dark Knight,2008,152 min,"Action, Crime, Drama",9.0,When the menace known as the Joker wreaks havo...,Christopher Nolan,Christian Bale,Heath Ledger,Aaron Eckhart,Michael Caine,2303232


In [21]:
# Combine genre, director and the four lead actors into one text field per movie
df['meta_soup'] = df['Genre'] + ' ' + df['Director'] + ' ' + df['Star1'] + ' ' + df['Star2'] + ' ' + df['Star3'] + ' ' + df['Star4']
df['meta_soup'].head(3)

0    Drama Frank Darabont Tim Robbins Morgan Freema...
1    Crime, Drama Francis Ford Coppola Marlon Brand...
2    Action, Crime, Drama Christopher Nolan Christi...
Name: meta_soup, dtype: object

In [22]:
count_vec = CountVectorizer(stop_words='english')
count_matrix = count_vec.fit_transform(df['meta_soup'])
count_matrix.shape

(1000, 4318)
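
A caveat about this bag of words: `CountVectorizer` splits each name into separate word tokens, so "John Ford" and "John Woo" share the token `john` even though they are different people. One common workaround is to collapse each name into a single token before building the soup. The sketch below illustrates the difference in plain Python. The regular expression only approximates `CountVectorizer`'s default token pattern, and `name_to_token` is a hypothetical helper, not part of the notebook.

```python
import re

def tokenize(text):
    # Approximates CountVectorizer's default: lowercase, words of 2+ characters
    return set(re.findall(r"\b\w\w+\b", text.lower()))

def name_to_token(name):
    # Hypothetical helper: "John Ford" -> "johnford", one token per person
    return re.sub(r"\W+", "", name.lower())

directors = ["John Ford", "John Woo"]

# Word-level tokens: the two directors overlap on the first name
split_overlap = tokenize(directors[0]) & tokenize(directors[1])

# Collapsed tokens: each person maps to a distinct token
joined_overlap = {name_to_token(directors[0])} & {name_to_token(directors[1])}

print(split_overlap)   # {'john'}
print(joined_overlap)  # set()
```

If this were applied to the notebook, only the `Director` and `Star1` to `Star4` columns would be collapsed; `Genre` would stay as is, so that "Crime" and "Drama" remain separate tokens.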

In [23]:
sim_vec2 = cosine_similarity(count_matrix , count_matrix)

In [24]:
indexes = pd.Series(df.index , index = df['Series_Title'])

In [25]:
recommend_movies('12 Angry Men' , sim_vec2)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,Genre,Director,Star1,Star2,Star3,Star4
416,Dog Day Afternoon,1975,8.0,"Biography, Crime, Drama",Sidney Lumet,Al Pacino,John Cazale,Penelope Allen,Sully Boyar
849,Serpico,1973,7.7,"Biography, Crime, Drama",Sidney Lumet,Al Pacino,John Randolph,Jack Kehoe,Biff McGuire
457,The Grapes of Wrath,1940,8.0,"Drama, History",John Ford,Henry Fonda,Jane Darwell,John Carradine,Charley Grapewin
305,On the Waterfront,1954,8.1,"Crime, Drama, Thriller",Elia Kazan,Marlon Brando,Karl Malden,Lee J. Cobb,Rod Steiger
546,In the Heat of the Night,1967,7.9,"Crime, Drama, Mystery",Norman Jewison,Sidney Poitier,Rod Steiger,Warren Oates,Lee Grant
850,Enter the Dragon,1973,7.7,"Action, Crime, Drama",Robert Clouse,Bruce Lee,John Saxon,Jim Kelly,Ahna Capri
981,On Golden Pond,1981,7.6,Drama,Mark Rydell,Katharine Hepburn,Henry Fonda,Jane Fonda,Doug McKeon
295,The Man Who Shot Liberty Valance,1962,8.1,"Drama, Western",John Ford,James Stewart,John Wayne,Vera Miles,Lee Marvin
674,Dip huet seung hung,1989,7.8,"Action, Crime, Drama",John Woo,Yun-Fat Chow,Danny Lee,Sally Yeh,Kong Chu
978,"Planes, Trains & Automobiles",1987,7.6,"Comedy, Drama",John Hughes,Steve Martin,John Candy,Laila Robins,Michael McKean


In [26]:
recommend_movies("Breakfast at Tiffany's" , sim_vec2)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,Genre,Director,Star1,Star2,Star3,Star4
870,Sabrina,1954,7.7,"Comedy, Drama, Romance",Billy Wilder,Humphrey Bogart,Audrey Hepburn,William Holden,Walter Hampden
446,Roman Holiday,1953,8.0,"Comedy, Romance",William Wyler,Gregory Peck,Audrey Hepburn,Eddie Albert,Hartley Power
562,The Philadelphia Story,1940,7.9,"Comedy, Romance",George Cukor,Cary Grant,Katharine Hepburn,James Stewart,Ruth Hussey
547,Charade,1963,7.9,"Comedy, Mystery, Romance",Stanley Donen,Cary Grant,Audrey Hepburn,Walter Matthau,James Coburn
760,Flipped,2010,7.7,"Comedy, Drama, Romance",Rob Reiner,Madeline Carroll,Callan McAuliffe,Rebecca De Mornay,Anthony Edwards
703,My Fair Lady,1964,7.8,"Drama, Family, Musical",George Cukor,Audrey Hepburn,Rex Harrison,Stanley Holloway,Wilfrid Hyde-White
284,Paper Moon,1973,8.1,"Comedy, Crime, Drama",Peter Bogdanovich,Ryan O'Neal,Tatum O'Neal,Madeline Kahn,John Hillerman
319,Sunrise: A Song of Two Humans,1927,8.1,"Drama, Romance",F.W. Murnau,George O'Brien,Janet Gaynor,Margaret Livingston,Bodil Rosing
95,Amélie,2001,8.3,"Comedy, Romance",Jean-Pierre Jeunet,Audrey Tautou,Mathieu Kassovitz,Rufus,Lorella Cravotta
471,Captain Fantastic,2016,7.9,"Comedy, Drama",Matt Ross,Viggo Mortensen,George MacKay,Samantha Isler,Annalise Basso


In [27]:
recommend_movies("Inception" , sim_vec2)

Unnamed: 0,Series_Title,Released_Year,IMDB_Rating,Genre,Director,Star1,Star2,Star3,Star4
155,Batman Begins,2005,8.2,"Action, Adventure",Christopher Nolan,Christian Bale,Michael Caine,Ken Watanabe,Liam Neeson
21,Interstellar,2014,8.6,"Adventure, Drama, Sci-Fi",Christopher Nolan,Matthew McConaughey,Anne Hathaway,Jessica Chastain,Mackenzie Foy
63,The Dark Knight Rises,2012,8.4,"Action, Adventure",Christopher Nolan,Christian Bale,Tom Hardy,Anne Hathaway,Gary Oldman
343,The Revenant,2015,8.0,"Action, Adventure, Drama",Alejandro G. Iñárritu,Leonardo DiCaprio,Tom Hardy,Will Poulter,Domhnall Gleeson
477,Star Wars: Episode VII - The Force Awakens,2015,7.9,"Action, Adventure, Sci-Fi",J.J. Abrams,Daisy Ridley,John Boyega,Oscar Isaac,Domhnall Gleeson
482,Edge of Tomorrow,2014,7.9,"Action, Adventure, Sci-Fi",Doug Liman,Tom Cruise,Emily Blunt,Bill Paxton,Brendan Gleeson
493,Star Trek,2009,7.9,"Action, Adventure, Sci-Fi",J.J. Abrams,Chris Pine,Zachary Quinto,Simon Pegg,Leonard Nimoy
496,Letters from Iwo Jima,2006,7.9,"Action, Adventure, Drama",Clint Eastwood,Ken Watanabe,Kazunari Ninomiya,Tsuyoshi Ihara,Ryô Kase
746,Star Trek Into Darkness,2013,7.7,"Action, Adventure, Sci-Fi",J.J. Abrams,Chris Pine,Zachary Quinto,Zoe Saldana,Benedict Cumberbatch
36,The Prestige,2006,8.5,"Drama, Mystery, Sci-Fi",Christopher Nolan,Christian Bale,Hugh Jackman,Scarlett Johansson,Michael Caine


**The second part of the recommender system is now done.**