# Functions for director rating and actor rating

This file contains functions that compute the average rating of a director's or actor's movies released before a given year.

## Unraveling the data

In [3]:
import pickle

# Edit this file path if needed; the default should work
file_path = '../processed_data/combined_dataset.pkl'

with open(file_path, 'rb') as file:
    data = pickle.load(file)

print(data)

                 final_title  final_budget  final_worldwide_boxoffice  \
0      !Women Art Revolution           NaN                        NaN   
1        #1 Cheerleader Camp           NaN                        NaN   
2               #chicagoGirl           NaN                        NaN   
3                    #Horror     1500000.0                        0.0   
4             #Pellichoopulu      200000.0                  5500000.0   
...                      ...           ...                        ...   
49861                １リットルの涙           NaN                        NaN   
49862      １３号待避線より　その護送車を狙え           NaN                        NaN   
49863   ２０世紀少年< 第1章> 終わりの始まり    20000000.0                 31244858.0   
49864                ３－４Ｘ１０月           NaN                        NaN   
49865     ＳＭガールズ セイバーマリオネットＲ           NaN                        NaN   

       final_domestic_boxoffice  \
0                           NaN   
1                           NaN   
2                 

## Director Rating Function

First, we define a function that takes a director's name and a year and returns the average rating of that director's movies released before that year:

In [6]:
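def director_rating_before_year(Name, Year):
    """Return the mean 'final_rating' of the director's movies released before Year."""
    directorname = str(Name)
    # Keep only this director's movies with a release year strictly before Year
    vallist = data[(data['director'] == directorname) & (data['final_year'] < Year)]
    # .mean() skips NaN ratings and returns NaN if no movies match
    rating = vallist['final_rating'].mean()
    return rating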

An example of using this function:

In [7]:
director_rating_before_year('Christopher Nolan', 2010)

np.float64(7.650000000000001)
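To make the filtering behaviour concrete, here is a minimal sketch on a made-up table. The `toy` frame, the director names in it, and the helper `director_rating_before_year_toy` are all hypothetical and are not part of the real dataset; the helper only mirrors the logic of the function above while taking the frame as an argument.

```python
import pandas as pd

# Toy stand-in for the real dataset; every row here is invented for illustration.
toy = pd.DataFrame({
    "director": ["A. Example", "A. Example", "A. Example", "B. Sample"],
    "final_year": [2000, 2005, 2010, 2008],
    "final_rating": [6.0, 8.0, 9.0, 7.0],
})

def director_rating_before_year_toy(frame, name, year):
    """Mean rating of the movies `name` directed strictly before `year`."""
    earlier = frame[(frame["director"] == str(name)) & (frame["final_year"] < year)]
    return earlier["final_rating"].mean()

# The 2010 movie is excluded because the comparison is strict (<, not <=).
avg_2010 = director_rating_before_year_toy(toy, "A. Example", 2010)
# No movies before 2000, so the mean of an empty selection is NaN.
avg_none = director_rating_before_year_toy(toy, "A. Example", 2000)

print(avg_2010)
print(avg_none)
```

Two things are worth noticing: a movie released in the cutoff year itself does not count, and a director with no earlier movies gets NaN rather than 0.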

## Actor Rating Function

For this, we first add a 'star2' column that turns each entry of the 'star' column into a list of actor names, which is easier to search:

In [9]:
data['star2'] = (data['star'].str.replace("\n","")).str.split(", ")

#Here is an example of what it looks like
data['star2'].iloc[9]

['Warren Beatty', 'Goldie Hawn', 'Gert Fröbe', 'Robert Webber']
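As a small sketch of what the replace-then-split step does, consider the two made-up entries below. The exact layout of the raw 'star' strings (names separated by a comma, a space, and a newline) is an assumption inferred from the code above, and the actor names are placeholders.

```python
import pandas as pd

# Two invented 'star' entries: one cast string and one missing value.
stars = pd.Series(["Actor One, \nActor Two, \nActor Three", None])

# Same two steps as above: drop the newlines, then split on ", ".
split = stars.str.replace("\n", "").str.split(", ")

print(split.iloc[0])                    # a list of names
print(isinstance(split.iloc[1], list))  # the missing entry is not turned into a list
```

Missing entries stay missing instead of becoming lists, so any code that loops over 'star2' needs an `isinstance(..., list)` check or a similar guard.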

Now we define the function:

In [10]:
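def actor_rating_before_year(Name, Year):
    """Return the mean 'final_rating' of the movies starring the actor released before Year."""
    actorname = str(Name)
    # Treat missing cast lists as empty, without modifying the 'star2' column itself
    stars = data['star2'].apply(lambda x: x if isinstance(x, list) else [])
    # True where the actor is in the cast and the movie was released before Year
    mask = stars.apply(lambda names: actorname in names) & (data['final_year'] < Year)
    vallist = data[mask]
    rating = vallist['final_rating'].mean()
    return rating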

And here are some examples:

In [12]:
print(actor_rating_before_year('Tom Cruise', 2022))
print(actor_rating_before_year('Ben Stiller', 2022))

6.46969696969697
5.896551724137932


In case it helps, here is how to build an alphabetical list of all the actors (only the first 10 are printed):

In [30]:
# Collect every unique actor name, skipping rows with no cast list, and sort alphabetically
ActorsTotalList = sorted({actor for sublist in data['star2'] if isinstance(sublist, list) for actor in sublist})
print(ActorsTotalList[:10])

["'Ganja' Karuppu", "'Hurricane Ryu' Hariken", "'Little Billy' Rhodes", "'Spring' Mark Adley", "'University' Jeevan", "'Weird Al' Yankovic", '50 Cent', 'A Martinez', 'A. Michael Baldwin', 'A. Scott']


As one example of how to use this, to list the first 20 actors together with their actor ratings, we would do the following:

In [17]:
def actorratinglistbeforeyear(Alist, Year):
    return [[actor, actor_rating_before_year(actor, Year)] for actor in Alist]

actorratinglistbeforeyear(ActorsTotalList[:20],2010)

[["'Ganja' Karuppu", np.float64(6.8)],
 ["'Hurricane Ryu' Hariken", np.float64(5.4)],
 ["'Little Billy' Rhodes", np.float64(1.0)],
 ["'Spring' Mark Adley", np.float64(6.1)],
 ["'University' Jeevan", np.float64(7.1)],
 ["'Weird Al' Yankovic", np.float64(6.7)],
 ['50 Cent', np.float64(5.975)],
 ['A Martinez', np.float64(0.0)],
 ['A. Michael Baldwin', np.float64(5.666666666666667)],
 ['A. Scott', np.float64(4.5)],
 ['A.C. Peterson', np.float64(5.5)],
 ['A.J. Buckley', np.float64(5.466666666666668)],
 ['A.J. Clarke', np.float64(6.5)],
 ['A.J. Cook', np.float64(5.3)],
 ['A.J. Langer', np.float64(5.366666666666667)],
 ['A.J. van der Merwe', nan],
 ['A.W. Baskcomb', np.float64(6.0)],
 ['AJ Bowen', nan],
 ['AJ Michalka', nan],
 ['Aachi Manorama', np.float64(5.6)]]
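Calling the per-actor function once per name rescans the whole table each time, which may get slow for a long list of actors. One possible alternative, shown here only as a sketch on made-up data, is to explode the cast lists into one row per (movie, actor) pair and group by actor. The `toy` frame and the helper name `all_actor_ratings_before_year` are hypothetical.

```python
import pandas as pd

# Invented rows; 'star2' holds cast lists, with one missing entry.
toy = pd.DataFrame({
    "star2": [["Actor One", "Actor Two"], ["Actor One"], None, ["Actor Two"]],
    "final_year": [2001, 2005, 2007, 2012],
    "final_rating": [6.0, 8.0, 5.0, 9.0],
})

def all_actor_ratings_before_year(frame, year):
    """Mean rating per actor over movies released strictly before `year`."""
    earlier = frame[frame["final_year"] < year]
    # One row per (movie, actor); rows whose cast list was missing are dropped.
    exploded = earlier.explode("star2").dropna(subset=["star2"])
    return exploded.groupby("star2")["final_rating"].mean()

ratings_2010 = all_actor_ratings_before_year(toy, 2010)
print(ratings_2010)
```

One difference from the list above: actors with no movies before the cutoff are simply absent from the grouped result, whereas the per-actor function returns `nan` for them (as with 'AJ Bowen' in the output above).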

## Adjusting for inflation

In [18]:
# Adjust an amount from the given year to 2025 values, assuming a constant 3.25% annual inflation rate
def adjust_inflation(num, year):
    diff = 2025 - year
    final_num = num * (1.0325)**diff
    return final_num

In [None]:
# Here's an example
print(adjust_inflation(8000, 2024))

8260.0
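The function above compounds a fixed 3.25% rate once per year between the input year and 2025. The sketch below spells that out with the rate and target year as explicit parameters; the name `adjust_inflation_sketch` and its keyword arguments are hypothetical and are not used elsewhere in this file.

```python
def adjust_inflation_sketch(num, year, rate=0.0325, target_year=2025):
    """Compound `num` at a constant annual `rate` from `year` up to `target_year`."""
    years_elapsed = target_year - year
    return num * (1 + rate) ** years_elapsed

one_year = adjust_inflation_sketch(8000, 2024)   # one year of growth: 8000 * 1.0325
same_year = adjust_inflation_sketch(100, 2025)   # zero years elapsed, so unchanged
three_years = adjust_inflation_sketch(100, 2022) # three years: 100 * 1.0325 ** 3

print(one_year, same_year, three_years)
```

A single constant rate is a simplification, so the adjusted values are rough estimates, especially for older movies.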


In [36]:
# Example: applying it to the worldwide box office column
data['ww_boxoffice_adjusted'] = adjust_inflation(data['final_worldwide_boxoffice'], data['final_year'])
print(data[[ 'final_year','final_worldwide_boxoffice', 'ww_boxoffice_adjusted','final_title']][:30])

    final_year  final_worldwide_boxoffice  ww_boxoffice_adjusted  \
0       2010.0                        NaN                    NaN   
1       2010.0                        NaN                    NaN   
2       2013.0                        NaN                    NaN   
3       2015.0                        0.0           0.000000e+00   
4       2016.0                  5500000.0           7.334546e+06   
5       2016.0                        NaN                    NaN   
6       2014.0                        NaN                    NaN   
7       2008.0                        NaN                    NaN   
8       2008.0                        NaN                    NaN   
9       1971.0                        NaN                    NaN   
10      2013.0                        NaN                    NaN   
11      1917.0                        NaN                    NaN   
12      2014.0                  1625847.0           2.311375e+06   
13      2005.0                        NaN       

7334545.928564244