# Recommender System (Part 1)

## Recommendation Based on Popularity

In this section, I will create a recommendation engine based only on the popularity of films. The popularity metric I will be using is IMDB's weighted rating formula.


Weighted Rating (WR) = (v/v+m)*R + (m/v+m)*C

where, v = Number of votes for the movie
        m = minimum votes requred to be listed in the chart
        R = Average rating of the movie
        C = Mean votes.
        
        
This is a system that is not personalized based on a user or an item. This is neither content based nor collaborative based. It is just a simple system that uses popularity metric to suggest films.

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from ast import literal_eval

In [13]:
data = pd.read_csv("movies_metadata.csv")
data.shape
data.head(20)

  interactivity=interactivity, compiler=compiler, result=result)


Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0
5,False,,60000000,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",,949,tt0113277,en,Heat,"Obsessive master thief, Neil McCauley leads a ...",...,1995-12-15,187436818.0,170.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,A Los Angeles Crime Saga,Heat,False,7.7,1886.0
6,False,,58000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",,11860,tt0114319,en,Sabrina,An ugly duckling having undergone a remarkable...,...,1995-12-15,0.0,127.0,"[{'iso_639_1': 'fr', 'name': 'Français'}, {'is...",Released,You are cordially invited to the most surprisi...,Sabrina,False,6.2,141.0
7,False,,0,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",,45325,tt0112302,en,Tom and Huck,"A mischievous young boy, Tom Sawyer, witnesses...",...,1995-12-22,0.0,97.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,The Original Bad Boys.,Tom and Huck,False,5.4,45.0
8,False,,35000000,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",,9091,tt0114576,en,Sudden Death,International action superstar Jean Claude Van...,...,1995-12-22,64350171.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Terror goes into overtime.,Sudden Death,False,5.5,174.0
9,False,"{'id': 645, 'name': 'James Bond Collection', '...",58000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",http://www.mgm.com/view/movie/757/Goldeneye/,710,tt0113189,en,GoldenEye,James Bond must unmask the mysterious head of ...,...,1995-11-16,352194034.0,130.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,No limits. No fears. No substitutes.,GoldenEye,False,6.6,1194.0


In [14]:
data['genres'] = data['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

In [15]:
#Vote counts
vote_counts = data[data['vote_count'].notnull()]['vote_count'].astype('int')

# Lets choose m to be 95 %ile
m = vote_counts.quantile(0.95)

#Vote average
vote_average = data[data['vote_average'].notnull()]['vote_average'].astype('int')

# Mean vote average
C = vote_average.mean()


In [16]:
data['year'] = data['release_date'].apply(lambda x: str(x).split('-')[0])
data['year']

0        1995
1        1995
2        1995
3        1995
4        1995
         ... 
45461     nan
45462    2011
45463    2003
45464    1917
45465    2017
Name: year, Length: 45466, dtype: object

We keep only those movies where vote counts are not null, vote counts are greater than 95%ile and vote averages are not null. We create a new data frame called "qualified"


In [17]:
qualified = data[(data['vote_count'] >=m) & (data['vote_count'].notnull()) & (data['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]

qualified['vote_count'] = qualified['vote_count'].astype('int')

qualified['vote_average'] = qualified['vote_average'].astype('int')


In [18]:
#Applying formula
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

In [19]:
qualified['wr'] = qualified.apply(weighted_rating, axis=1)

In [20]:
qualified = qualified.sort_values('wr', ascending = False).head(250)
qualified.head(10)

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,wr
15480,Inception,2010,14075,8,29.1081,"[Action, Thriller, Science Fiction, Mystery, A...",7.917588
12481,The Dark Knight,2008,12269,8,123.167,"[Drama, Action, Crime, Thriller]",7.905871
22879,Interstellar,2014,11187,8,32.2135,"[Adventure, Drama, Science Fiction]",7.897107
2843,Fight Club,1999,9678,8,63.8696,[Drama],7.881753
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892,8,32.0707,"[Adventure, Fantasy, Action]",7.871787
292,Pulp Fiction,1994,8670,8,140.95,"[Thriller, Crime]",7.86866
314,The Shawshank Redemption,1994,8358,8,51.6454,"[Drama, Crime]",7.864
7000,The Lord of the Rings: The Return of the King,2003,8226,8,29.3244,"[Adventure, Fantasy, Action]",7.861927
351,Forrest Gump,1994,8147,8,48.3072,"[Comedy, Drama, Romance]",7.860656
5814,The Lord of the Rings: The Two Towers,2002,7641,8,29.4235,"[Adventure, Fantasy, Action]",7.851924


## Popular Movies Based on a Genre



In [21]:
s = data.apply(lambda x: pd.Series(x['genres']),axis=1).stack().reset_index(level=1, drop=True)
s.head(10)


  """Entry point for launching an IPython kernel.


0    Animation
0       Comedy
0       Family
1    Adventure
1      Fantasy
1       Family
2      Romance
2       Comedy
3       Comedy
3        Drama
dtype: object

In [22]:
s.name = 'genre'
gen_md = data.drop('genres',axis = 1).join(s)

In [23]:
def build_chart(genre, percentile = 0.85):
    df = gen_md[gen_md['genre'] == genre]
    vote_avg = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    C = vote_avg.mean()
    
    vote_ct = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    m = vote_ct.quantile(percentile)
    
    qualified = df[(df['vote_count'] >= m) & (df['vote_average'].notnull()) & (df['vote_count'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    
    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)
    
    return qualified

In [24]:
rom = build_chart('Romance')
rom.head(10)

Unnamed: 0,title,year,vote_count,vote_average,popularity,wr
10309,Dilwale Dulhania Le Jayenge,1995,661.0,9.1,34.457,8.653196
40251,Your Name.,2016,1030.0,8.5,34.461252,8.248941
351,Forrest Gump,1994,8147.0,8.2,48.3072,8.16915
1132,Cinema Paradiso,1988,834.0,8.2,14.177,7.925222
40882,La La Land,2016,4745.0,7.9,19.681686,7.853086
22168,Her,2013,4215.0,7.9,13.8295,7.847311
7208,Eternal Sunshine of the Spotless Mind,2004,3758.0,7.9,12.9063,7.841055
876,Vertigo,1958,1162.0,8.0,18.2082,7.811667
15530,Mr. Nobody,2009,1616.0,7.9,11.8171,7.767085
38798,Captain Fantastic,2016,1569.0,7.9,16.519276,7.763322


In [25]:
act = build_chart("Action")
act.head(10)

Unnamed: 0,title,year,vote_count,vote_average,popularity,wr
12481,The Dark Knight,2008,12269.0,8.3,123.167,8.243159
1154,The Empire Strikes Back,1980,5998.0,8.2,19.471,8.089546
15480,Inception,2010,14075.0,8.1,29.1081,8.053512
7000,The Lord of the Rings: The Return of the King,2003,8226.0,8.1,29.3244,8.021345
256,Star Wars,1977,6778.0,8.1,42.1497,8.005086
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892.0,8.0,32.0707,7.929579
5814,The Lord of the Rings: The Two Towers,2002,7641.0,8.0,29.4235,7.918382
23753,Guardians of the Galaxy,2014,10014.0,7.9,53.2916,7.839511
2458,The Matrix,1999,9079.0,7.9,33.3663,7.833434
13605,Inglourious Basterds,2009,6598.0,7.9,16.8956,7.809236


In [128]:
fan = build_chart('Fantasy')
fan.head(10)

Unnamed: 0,title,year,vote_count,vote_average,popularity,wr
5481,Spirited Away,2001,3968.0,8.3,41.0489,8.034694
7000,The Lord of the Rings: The Return of the King,2003,8226.0,8.1,29.3244,7.974898
3030,The Green Mile,1999,4166.0,8.2,19.9668,7.954879
4863,The Lord of the Rings: The Fellowship of the Ring,2001,8892.0,8.0,32.0707,7.888126
5814,The Lord of the Rings: The Two Towers,2002,7641.0,8.0,29.4235,7.870711
17437,Harry Potter and the Deathly Hallows: Part 2,2011,6141.0,7.9,24.9907,7.747091
9698,Howl's Moving Castle,2004,2049.0,8.2,16.136,7.742589
2884,Princess Mononoke,1997,2041.0,8.2,17.1667,7.741087
7725,Harry Potter and the Prisoner of Azkaban,2004,6037.0,7.7,28.4603,7.556913
10839,V for Vendetta,2006,4562.0,7.7,20.2144,7.514339


## Most Popular higher rated movies


In [65]:
worstpop = data.loc[data['popularity'].notnull()][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
worstpop['popularity']


0         21.9469
1         17.0155
2         11.7129
3         3.85949
4         8.38752
           ...   
45461    0.072051
45462    0.178241
45463    0.903007
45464    0.003503
45465    0.163015
Name: popularity, Length: 45461, dtype: object

In [92]:
def WOrstPop(vote_avg=5):
    df = worstpop[worstpop['vote_average']> vote_avg]
    df['popularity'] = df['popularity'].astype('str')
    df = df.sort_values('popularity',ascending=False)
    return df



In [97]:
Worst_popular = WOrstPop(5).head(250)
Worst_popular.head(10)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,title,year,vote_count,vote_average,popularity
34313,Chameli Ki Shaadi,1986,0.0,0.0,9e-06
38576,Hyper Sapien: People from Another Star,1986,0.0,0.0,9e-06
12488,Black Dawn,2005,21.0,3.6,9.989719
14043,The Haunting of Molly Hartley,2008,48.0,4.3,9.982123
497,The Next Karate Kid,1994,202.0,4.8,9.9697
10464,The Fog,2005,191.0,3.8,9.929809
24591,Children of the Corn,2009,45.0,4.3,9.895965
12286,Pledge This!,2006,34.0,2.7,9.867616
558,Chasers,1994,22.0,4.7,9.85327
11445,Eragon,2006,990.0,4.9,9.851133
