**Load Libraries**

In [1]:
import numpy as np
import pandas as pd

**Dataset Loading**

Credits Data File

In [5]:
df1 = pd.read_csv('tmdb_5000_credits.csv')
#df1.head()
print("Shape of Credits dataset")
#df1.shape
df1.info()

Shape of Credits dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   movie_id  4803 non-null   int64 
 1   title     4803 non-null   object
 2   cast      4803 non-null   object
 3   crew      4803 non-null   object
dtypes: int64(1), object(3)
memory usage: 150.2+ KB


Movies Data File

In [6]:
df2 = pd.read_csv('tmdb_5000_movies.csv')
#df2.head()
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

**Merging Data Files**

In [10]:
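# Rename credits' 'movie_id' to 'id' so both frames share the join key
df1.columns = ['id', 'title', 'cast', 'crew']
# Note: this overwrites df2, so re-running the cell merges the credits in again and
# adds duplicate suffixed columns (title_x, cast_y, crew_x, ...), as in the output below
df2 = df2.merge(df1, on='id')
#df2.head()
df2.columns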

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title_x', 'vote_average',
       'vote_count', 'title_y', 'cast_x', 'crew_x', 'title_x', 'cast_y',
       'crew_y', 'title_y', 'cast_x', 'crew_x', 'title', 'cast_y', 'crew_y'],
      dtype='object')
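The renaming and merge above can be reproduced on a tiny, made-up pair of frames. This is a minimal sketch, assuming toy rows in place of the real TMDB files; it shows where the `_x`/`_y` suffixes come from, what a second merge of the same credits does, and one pattern (a fresh variable name plus pandas' `suffixes` and `validate` arguments of `DataFrame.merge`) that keeps such a cell safe to re-run. The names `movies`, `credits`, `merged`, `remerged` and `movies_credits` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows standing in for tmdb_5000_movies.csv and tmdb_5000_credits.csv
movies = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "vote_count": [10, 20]})
credits = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["A", "B"],
    "cast": ["[]", "[]"],
    "crew": ["[]", "[]"],
})

# Rename by name rather than by position, so the key is 'id' whatever the column order
credits = credits.rename(columns={"movie_id": "id"})

# Default merge: the shared 'title' column gets suffix _x (left) and _y (right)
merged = movies.merge(credits, on="id")
print(list(merged.columns))
# ['id', 'title_x', 'vote_count', 'title_y', 'cast', 'crew']

# Merging the already-merged frame again adds a second copy of the credits columns
remerged = merged.merge(credits, on="id")
print(list(remerged.columns))
# ['id', 'title_x', 'vote_count', 'title_y', 'cast_x', 'crew_x', 'title', 'cast_y', 'crew_y']

# Safer: write to a new name, keep the left title unsuffixed,
# and let pandas check that each id appears at most once on each side
movies_credits = movies.merge(
    credits, on="id", suffixes=("", "_credits"), validate="one_to_one"
)
print(list(movies_credits.columns))
# ['id', 'title', 'vote_count', 'title_credits', 'cast', 'crew']
```

Because `movies` and `credits` are never reassigned by the merge, running the last statement twice gives the same six columns each time.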

**Processing**

Weighted Rating (WR) = $\left(\frac{v}{v+m} \cdot R\right) + \left(\frac{m}{v+m} \cdot C\right)$

where,

**v** is the number of votes for the movie;
**m** is the minimum number of votes required to be listed in the chart;
**R** is the average rating of the movie; and
**C** is the mean vote across all movies in the dataset.
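As a quick check of the formula, it can be written as a plain function of the four quantities. This is a sketch: the helper name `wr_scalar` is hypothetical (chosen so it does not shadow the notebook's `weighted_rating`), and the values of m and C are the ones computed in the cells below. Because the two weights v/(v+m) and m/(v+m) sum to 1, WR always lies between R and C: a movie with few votes is pulled toward C, while one with many votes stays close to R.

```python
def wr_scalar(v: float, R: float, m: float, C: float) -> float:
    """Weighted rating: a blend of the movie's own average R and the global mean C."""
    w = v / (v + m)  # weight on R; m / (v + m) == 1 - w is the weight on C
    return w * R + (1 - w) * C

# Values of m and C as computed later in this notebook
m_ex = 1838.4000000000015
C_ex = 6.092171559442011

print(wr_scalar(0, 9.0, m_ex, C_ex))               # no votes: exactly C, 6.092171559442011
print(round(wr_scalar(8205, 8.5, m_ex, C_ex), 6))  # The Shawshank Redemption: 8.059258
```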

In [16]:
C = df2['vote_average'].mean()
C

6.092171559442011

**Minimum votes to be listed**

In [12]:
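# m = 90th percentile of vote counts: a movie needs at least as many votes
# as roughly 90% of the movies in the dataset to be listed
m = df2['vote_count'].quantile(0.9)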
m

1838.4000000000015
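The threshold is not a whole number because `Series.quantile` interpolates linearly between neighbouring values by default. A minimal sketch on ten made-up vote counts (the names `toy_counts`, `m_toy` and `qualifying` are hypothetical) shows the arithmetic: the 0.9 quantile sits at position 0.9 × (10 − 1) = 8.1 in the sorted data, one tenth of the way from the 9th value (90) to the 10th (100).

```python
import pandas as pd

# Ten hypothetical vote counts, already sorted for readability
toy_counts = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Default interpolation='linear': 90 + 0.1 * (100 - 90), i.e. about 91
m_toy = toy_counts.quantile(0.9)
print(m_toy)

# Only entries with at least m_toy votes qualify, i.e. roughly the top 10%
qualifying = toy_counts[toy_counts >= m_toy]
print(qualifying.tolist())  # [100]
```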

**Filtering the movies that qualify for the chart**

In [14]:
# Keep only movies with at least m votes; .copy() gives an independent frame
lists_movies = df2.loc[df2['vote_count'] >= m].copy()
lists_movies.shape

(481, 32)

There are 481 movies that qualify for the chart, about 10% of the 4,803 movies, as expected from a 90th-percentile cutoff. Let us calculate our metric for each qualifying movie.
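Filtering with a boolean mask and then calling `.copy()` yields an independent frame, so adding a column to the chart later does not touch the full dataset and does not trigger pandas' `SettingWithCopyWarning`. A sketch on hypothetical toy data; `toy_df`, `chart`, `threshold` and the `flag` column are made up for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({"title": ["A", "B", "C"], "vote_count": [50, 500, 5000]})
threshold = 400  # hypothetical stand-in for m

# The boolean mask selects the qualifying rows; .copy() detaches them from toy_df
chart = toy_df.loc[toy_df["vote_count"] >= threshold].copy()
chart["flag"] = True  # modifies only the copy

print(chart.shape)               # (2, 3)
print("flag" in toy_df.columns)  # False
```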

**Defining the weighted-rating function**

In [17]:
def weighted_rating(x, m=m, C=C):
    """Weighted rating of one movie (x is a row of the DataFrame)."""
    v = x['vote_count']
    R = x['vote_average']
    # Calculation based on the formula above (here m ≈ 1838.4, C ≈ 6.09)
    return (v / (v + m) * R) + (m / (m + v) * C)

In [18]:
# Define a new feature 'score' and calculate its value with weighted rating
lists_movies['score'] = lists_movies.apply(weighted_rating, axis=1)
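`DataFrame.apply` with `axis=1` calls the Python function once per row. Because the formula uses only element-wise arithmetic, the same score can be computed on whole columns at once, which is usually much faster on large frames. A sketch, assuming a hypothetical helper `weighted_rating_vec` and two rows with the vote counts and averages of The Shawshank Redemption and Fight Club from the final chart; it checks that the row-wise and vectorized results agree.

```python
import numpy as np
import pandas as pd

def weighted_rating_vec(df: pd.DataFrame, m: float, C: float) -> pd.Series:
    """Column-wise version of the weighted rating (hypothetical helper)."""
    v = df["vote_count"]
    R = df["vote_average"]
    return v / (v + m) * R + m / (v + m) * C

m_ex = 1838.4000000000015
C_ex = 6.092171559442011

toy = pd.DataFrame({"vote_count": [8205, 9413], "vote_average": [8.5, 8.3]})

# Row-wise, in the style of the notebook's apply(..., axis=1) call
row_wise = toy.apply(
    lambda row: row["vote_count"] / (row["vote_count"] + m_ex) * row["vote_average"]
    + m_ex / (m_ex + row["vote_count"]) * C_ex,
    axis=1,
)
# Vectorized: one expression over whole columns
vectorized = weighted_rating_vec(toy, m_ex, C_ex)

print(np.allclose(row_wise, vectorized))  # True
print(vectorized.round(6).tolist())
```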

In [21]:
#lists_movies.head(3)
lists_movies.shape

(481, 33)

**Sorting the Movies**

In [23]:
# Sort movies by the computed score, highest first
lists_movies = lists_movies.sort_values('score', ascending=False)

# Show the top 10 movies
lists_movies[['title_x', 'vote_count', 'vote_average', 'score']].head(10)


|      | title_x | vote_count | vote_average | score |
|------|---------|-----------:|-------------:|------:|
| 1881 | The Shawshank Redemption | 8205 | 8.5 | 8.059258 |
| 662 | Fight Club | 9413 | 8.3 | 7.939256 |
| 65 | The Dark Knight | 12002 | 8.2 | 7.920020 |
| 3232 | Pulp Fiction | 8428 | 8.3 | 7.904645 |
| 96 | Inception | 13752 | 8.1 | 7.863239 |
| 3337 | The Godfather | 5893 | 8.4 | 7.851236 |
| 95 | Interstellar | 10867 | 8.1 | 7.809479 |
| 809 | Forrest Gump | 7927 | 8.2 | 7.803188 |
| 329 | The Lord of the Rings: The Return of the King | 8064 | 8.1 | 7.727243 |
| 1990 | The Empire Strikes Back | 5879 | 8.2 | 7.697884 |
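Sorting the whole frame and taking `.head(10)` works; `DataFrame.nlargest` is a shorter way to ask for the n highest scores directly. A sketch on made-up scores (the frame `scores_df` is hypothetical); note that the two approaches are not guaranteed to order tied scores the same way.

```python
import pandas as pd

scores_df = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "score": [6.5, 7.9, 7.1, 8.0]}
)

# Full sort, then keep the first two rows
top_sorted = scores_df.sort_values("score", ascending=False).head(2)
# Ask directly for the two largest scores
top_n = scores_df.nlargest(2, "score")

print(top_n["title"].tolist())               # ['D', 'B']
print(top_sorted.index.equals(top_n.index))  # True
```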
