## IMDB movie recommendation

IMDB uses this famous formula:
<br>
weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
<br>
 where:
 <br>
  R = average for the movie (mean) = (Rating)
  <br>
  v = number of votes for the movie = (votes)
  <br>
  m = minimum votes required to be listed in the Top 250 (currently 1250)
  <br>
  C = the mean vote across the whole report (currently 6.8)

This formula is exceedingly useful, but I have a beef with the "m"
variable, because it seems arbitrary. As far as I can tell, the other
three variables should be enough to calculate what score a movie would
have if it had a quadrillion votes.
So why is this "m" nonsense thrown in, and is there any formula that avoids it?
<br>
Source: Google
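
One way to read the formula is that "m" behaves like m extra votes cast at the global mean C, so a movie with few votes is pulled toward C and a movie with very many votes keeps roughly its own mean R. The sketch below illustrates this with the m and C values quoted above; the rating R = 9.0 and the vote counts are made-up numbers chosen only for illustration.

```python
def weighted_rank(R, v, m, C):
    """IMDB-style weighted rank: blend the movie mean R with the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# m and C are the values quoted in the text; R = 9.0 is a hypothetical rating.
m, C, R = 1250, 6.8, 9.0

# As v grows, the weight v / (v + m) approaches 1 and WR approaches R.
for v in (10, 1250, 100_000, 10**15):
    print(f"v={v:>25,}  WR={weighted_rank(R, v, m, C):.4f}")
```

When v equals m the two weights are equal, so WR is the midpoint of R and C. At a quadrillion votes the m term is negligible, which suggests that m matters only while the vote count is small.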

In [69]:
import numpy as np
import pandas as pd

In [70]:
movie_data= pd.read_csv(r'C:\Users\Bharath\movies_metadata.csv', encoding='utf-8',low_memory=False)
movie_data.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [71]:
print('Columns Names ',movie_data.columns)

Columns Names  Index(['adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'video',
       'vote_average', 'vote_count'],
      dtype='object')


In [72]:
# Keep only the columns needed for the weighted rating
movie_data = movie_data[['title','vote_average','vote_count']]
movie_data.head(10)

Unnamed: 0,title,vote_average,vote_count
0,Toy Story,7.7,5415.0
1,Jumanji,6.9,2413.0
2,Grumpier Old Men,6.5,92.0
3,Waiting to Exhale,6.1,34.0
4,Father of the Bride Part II,5.7,173.0
5,Heat,7.7,1886.0
6,Sabrina,6.2,141.0
7,Tom and Huck,5.4,45.0
8,Sudden Death,5.5,174.0
9,GoldenEye,6.6,1194.0


In [73]:
# C: mean of vote_average across all movies in the dataset
C = movie_data['vote_average'].mean()
C

5.618207215133889

In [74]:
# m: minimum vote count to qualify, taken here as the 80th percentile of vote_count
m = movie_data['vote_count'].quantile(0.80)
m

50.0
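
The choice of the 0.80 quantile is itself a judgment call. A minimal sketch of how the cutoff and the number of qualifying movies move with the quantile is shown here; `toy_votes` is a hypothetical series of vote counts, not data from `movies_metadata.csv`.

```python
import pandas as pd

# Hypothetical vote counts, used only to illustrate Series.quantile
toy_votes = pd.Series([0, 2, 5, 8, 12, 20, 35, 50, 400, 5000])

for q in (0.50, 0.80, 0.95):
    cutoff = toy_votes.quantile(q)          # linear interpolation by default
    kept = int((toy_votes >= cutoff).sum())  # movies that would pass the filter
    print(f"q={q:.2f}  cutoff={cutoff:.1f}  movies kept={kept}")
```

A higher quantile raises m, which both removes more movies and pulls the remaining scores more strongly toward C.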

In [75]:
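# Keep movies with at least m votes; .copy() avoids SettingWithCopyWarning when adding columns later
movie_data = movie_data[movie_data['vote_count'] >= m].copy()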
movie_data.head()

Unnamed: 0,title,vote_average,vote_count
0,Toy Story,7.7,5415.0
1,Jumanji,6.9,2413.0
2,Grumpier Old Men,6.5,92.0
4,Father of the Bride Part II,5.7,173.0
5,Heat,7.7,1886.0


In [76]:
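def weighted_rating(x, m=m, C=C):
    """Return the IMDB weighted rating for one row of movie_data."""
    v = x['vote_count']    # number of votes for the movie
    R = x['vote_average']  # mean rating of the movie
    # Blend the movie's own mean with the global mean C, weighted by v and m
    return (v / (v + m)) * R + (m / (v + m)) * C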

In [77]:
# Define a new feature 'score' and calculate its value with `weighted_rating()`
movie_data['score'] = movie_data.apply(weighted_rating, axis=1)
movie_data.head(10)

Unnamed: 0,title,vote_average,vote_count,score
0,Toy Story,7.7,5415.0,7.680953
1,Jumanji,6.9,2413.0,6.873979
2,Grumpier Old Men,6.5,92.0,6.18951
4,Father of the Bride Part II,5.7,173.0,5.681661
5,Heat,7.7,1886.0,7.646235
6,Sabrina,6.2,141.0,6.047698
8,Sudden Death,5.5,174.0,5.526386
9,GoldenEye,6.6,1194.0,6.560539
10,The American President,6.5,199.0,6.322933
11,Dracula: Dead and Loving It,5.7,210.0,5.684271
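
The same score can also be computed without `apply`, using column arithmetic. The sketch below is one possible vectorized form, not the notebook's own method. It rebuilds three rows from the output above in a hypothetical `sample` frame and hard-codes the `C` and `m` values printed earlier so that it runs on its own.

```python
import pandas as pd

# Three rows copied from the output above; `sample` is a stand-in for movie_data
sample = pd.DataFrame({
    "title": ["Toy Story", "Jumanji", "Grumpier Old Men"],
    "vote_average": [7.7, 6.9, 6.5],
    "vote_count": [5415.0, 2413.0, 92.0],
})
C = 5.618207215133889  # global mean printed earlier
m = 50.0               # 80th-percentile vote count printed earlier

v = sample["vote_count"]
R = sample["vote_average"]
# Whole-column arithmetic: same formula as weighted_rating, applied to every row at once
sample["score"] = (v / (v + m)) * R + (m / (v + m)) * C
print(sample)
```

On a frame with tens of thousands of rows this form is typically faster than a row-wise `apply`, and it should give the same scores.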


In [81]:
# Sort movies by score, highest first
movie_data = movie_data.sort_values('score', ascending=False)

# top 10 movies
movie_data[['title', 'score']].head(10)

Unnamed: 0,title,score
10309,Dilwale Dulhania Le Jayenge,8.855148
314,The Shawshank Redemption,8.482863
834,The Godfather,8.476278
40251,Your Name.,8.366584
12481,The Dark Knight,8.289115
2843,Fight Club,8.286216
292,Pulp Fiction,8.284623
522,Schindler's List,8.270109
23673,Whiplash,8.269704
5481,Spirited Away,8.266628
