# 4-Oscar Prediction with AutoML
Once our dataframe has been assembled (see the scraping and table_assembling notebooks), we have the data we need to predict the Best Picture winner. [AutoML](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) offers a quick but powerful route through the machine learning process: H2O's AutoML trains many models on the dataset, ranks them with cross-validation, and picks the best one. For my purposes, I use it to confirm and compare against the Preferential Balloting Random Forest model I created.
If you are gunning to win your office's Oscar pool, scroll down to see the results.
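To make the idea concrete, here is a toy sketch of the selection loop that AutoML automates: fit several candidate models, score each with k-fold cross-validation, and sort them into a leaderboard. This is not H2O's implementation. AutoML also tunes hyperparameters, can build stacked ensembles, and ranks by a configurable metric (this notebook uses AUCPR). The candidates (a majority-class baseline and single-feature threshold "stumps"), the accuracy metric, and the 20-row dataset are all hypothetical.

```python
import random


def kfold_indices(n, k, seed=1):
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_majority(X, y):
    """Baseline candidate: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def make_stump(j):
    """Candidate: threshold on feature j, with the threshold learned on the training fold."""
    def fit(X, y):
        def train_correct(t):
            return sum(int(x[j] >= t) == lab for x, lab in zip(X, y))
        best_t = max(sorted({x[j] for x in X}), key=train_correct)
        return lambda x: int(x[j] >= best_t)
    return fit


def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def leaderboard(candidates, X, y, k=5):
    """Score every candidate with k-fold CV and sort best-first."""
    scored = [(name, cross_val_accuracy(fit, X, y, k)) for name, fit in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: feature 0 mimics a precursor-award win, feature 1 a nomination count
y = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
X = [(lab, 5 + (i % 7)) for i, lab in enumerate(y)]

candidates = {
    "stump_feature_0": make_stump(0),
    "stump_feature_1": make_stump(1),
    "majority": fit_majority,
}
board = leaderboard(candidates, X, y)
for name, score in board:
    print(f"{name:16s} {score:.3f}")
```

The leaderboard is simply the candidates sorted by their cross-validated score; the top entry plays the role of `aml.leader`.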

In [1]:
import pandas as pd
import numpy as np
import h2o

In [3]:
from h2o.estimators import H2OXGBoostEstimator
h2o.__version__

'3.40.0.4'

# Machine Learning - Using H2O AutoML

In [4]:
full_table = pd.read_csv('./data/processed_results/osc_df')

In [5]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321. connected.


| Property | Value |
|---|---|
| H2O_cluster_uptime | 1 day 2 hours 24 mins |
| H2O_cluster_timezone | Europe/Berlin |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.40.0.4 |
| H2O_cluster_version_age | 23 days |
| H2O_cluster_name | H2O_from_python_Aleksandra_Czaplak_8vsa57 |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 1.542 Gb |
| H2O_cluster_total_cores | 4 |
| H2O_cluster_allowed_cores | 4 |


First year of existence of each award (used below to choose `min_year`):
- golden_globes 1943
- pga 1989
- bafta 1960
- dga 1948
- sag 1995
- cannes 1970
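The choice of `min_year` follows from this list: the latest first year among the six awards is the first year in which every feature can be populated. A short sketch of that calculation, using the years exactly as listed (the dictionary keys are illustrative labels, not column names from the dataframe):

```python
# First year of each award, as listed above
first_year = {"golden_globes": 1943, "pga": 1989, "bafta": 1960,
              "dga": 1948, "sag": 1995, "cannes": 1970}

# Every feature can be populated only once the newest award exists
min_year = max(first_year.values())
newest = max(first_year, key=first_year.get)
print(min_year, newest)  # 1995 sag
```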

In [6]:
# Use only years in which every award listed above already existed;
# the SAG Awards (1995 in the list above) are the newest
min_year = 1995

# H2O's AutoML

In [7]:
# AutoML uses cross-validation, so we do not specify a separate validation set
train = full_table.loc[((full_table['year'] < 2022) & (full_table['year'] >= min_year))]

print('training set contains:', train.shape[0], 'movies')

training set contains: 186 movies
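One caveat about this window: `year < 2022` keeps 2019 in the training set, and 2019 is the year scored in the prediction section, so those probabilities are in-sample. Below is a minimal sketch of a year-based split that keeps the target year out of training, plus a leave-one-year-out loop for evaluating every ceremony in turn. It uses plain dicts with hypothetical rows so it runs without pandas; with `full_table`, the training filter would be `full_table[(full_table['year'] >= min_year) & (full_table['year'] < target_year)]`. If you want H2O's own cross-validation to respect ceremony years, `aml.train` accepts a `fold_column`, which could hold a year-based fold id (not something this notebook does).

```python
def split_for_target_year(rows, target_year, min_year):
    """Train on ceremonies in [min_year, target_year); test on target_year only."""
    train = [r for r in rows if min_year <= r["year"] < target_year]
    test = [r for r in rows if r["year"] == target_year]
    return train, test


def leave_one_year_out(rows, min_year):
    """Yield (year, train, test) with each ceremony year held out in turn."""
    eligible = [r for r in rows if r["year"] >= min_year]
    for year in sorted({r["year"] for r in eligible}):
        train = [r for r in eligible if r["year"] != year]
        test = [r for r in eligible if r["year"] == year]
        yield year, train, test


# Hypothetical stand-in for full_table: five nominees per year, 1990-2022
rows = [{"year": year, "film": f"film_{year}_{i}"}
        for year in range(1990, 2023) for i in range(5)]

train, test = split_for_target_year(rows, target_year=2019, min_year=1995)
print(len(train), len(test))  # 120 5

folds = list(leave_one_year_out(rows, min_year=1995))
print(len(folds))  # 28
```

Leave-one-year-out keeps all nominees of a ceremony together, which matters here because exactly one film per year can win.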


In [8]:
train = train.drop('Unnamed: 0', axis=1)
train.columns

Index(['year', 'film', 'wiki', 'winner', 'nominations', 'Oscar_win',
       'nom_gg_drama', 'winner_gg_drama', 'nom_gg_comedy', 'winner_gg_comedy',
       'nom_pga', 'winner_pga', 'nom_bafta', 'winner_bafta', 'nom_dga',
       'winner_dga', 'nom_sag', 'winner_sag', 'nom_cannes', 'winner_cannes'],
      dtype='object')

In [9]:
print(type(train))

<class 'pandas.core.frame.DataFrame'>


In [10]:
from h2o.automl import H2OAutoML, get_leaderboard

# Convert the pandas training set to an H2OFrame
train1 = h2o.H2OFrame(train)

# Identify predictors and response
predictors = ['year', 'nom_gg_drama', 'winner_gg_drama', 'nom_gg_comedy', 'winner_gg_comedy',
              'nom_pga', 'winner_pga', 'nom_bafta', 'winner_bafta', 'nom_dga', 'winner_dga',
              'nom_sag', 'winner_sag', 'nom_cannes', 'winner_cannes', 'nominations']

x = predictors
y = 'Oscar_win'

# For binary classification, response should be a factor
train1[y] = train1[y].asfactor()

# Run AutoML for up to 100 base models. Because max_models is set, no runtime limit
# applies by default (the 1-hour default only kicks in when neither max_models
# nor max_runtime_secs is given).
aml = H2OAutoML(max_models=100,
                seed=1,
                keep_cross_validation_predictions=True,
                exclude_algos=['StackedEnsemble'],
                balance_classes=True,
                sort_metric='AUCPR')

aml.train(x=x, y=y, training_frame=train1)

# AutoML Leaderboard
lb = aml.leaderboard

# Optionally add extra model information to the leaderboard
lb = get_leaderboard(aml, extra_columns='ALL')

# Print all rows (instead of default 10 rows)
lb.head(rows=lb.nrows)

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
AutoML progress: |█
17:24:51.318: AutoML: XGBoost is not available; skipping it.
17:24:51.352: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:52.242: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:52.242: _min_rows param, The dataset size is too small to split for min_rows=100.0: must have at least 200.0 (weighted) rows, but have only 186.0.
17:24:52.244: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:53.213: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:53.892: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:54.340: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:54.872: _train param, Dropping bad and constant columns: [winner_cannes, nom_cannes]
17:24:56.542: _train param, Droppi
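With 27 winners among 186 training films, the classes are heavily imbalanced, which is why `balance_classes=True` is set. H2O balances by over/under-sampling the training data; the sketch below shows only the simplest version of the idea, random oversampling of the minority class with replacement until the class counts match. The function name and row layout are hypothetical, and H2O's own procedure (also governed by options such as `class_sampling_factors` and `max_after_balance_size`) may differ.

```python
import random
from collections import Counter


def oversample_minority(rows, label_key, seed=1):
    """Duplicate randomly chosen rows of the smaller classes (with replacement)
    until every class has as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)  # keep every original row
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical rows with the training set's class counts: 159 non-winners, 27 winners
rows = [{"film_id": i, "Oscar_win": int(i >= 159)} for i in range(186)]
balanced = oversample_minority(rows, "Oscar_win")
counts = Counter(row["Oscar_win"] for row in balanced)
print(counts[0], counts[1])  # 159 159
```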

In [9]:
top_model = aml.leader
top_model

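Model summary (DeepLearning leader):

| layer | units | type | dropout | l1 | l2 | mean_rate | rate_rms | momentum | mean_weight | weight_rms | mean_bias | bias_rms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14 | Input | 5.0 | | | | | | | | | |
| 2 | 50 | RectifierDropout | 40.0 | 0.0 | 0.0 | 0.0052819 | 0.0270572 | 0.0 | 0.0706238 | 0.3019202 | -0.2531923 | 0.3544667 |
| 3 | 2 | Softmax | | 0.0 | 0.0 | 0.0019572 | 0.0036744 | 0.0 | -0.0821257 | 0.9244528 | -0.011181 | 0.3788637 |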

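Training confusion matrix at the max-F1 threshold (0.3524093); rows are actual, columns predicted:

| | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 156.0 | 3.0 | 0.0189 | (3.0/159.0) |
| 1 | 10.0 | 17.0 | 0.3704 | (10.0/27.0) |
| Total | 166.0 | 20.0 | 0.0699 | (13.0/186.0) |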

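Training data, maximum metrics and the thresholds at which they occur:

| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.3524093 | 0.7234043 | 18.0 |
| max f2 | 0.050816 | 0.7668712 | 46.0 |
| max f0point5 | 0.5289677 | 0.8241758 | 14.0 |
| max accuracy | 0.5289677 | 0.9301075 | 14.0 |
| max precision | 1.0 | 1.0 | 0.0 |
| max recall | 0.0231833 | 1.0 | 65.0 |
| max specificity | 1.0 | 1.0 | 0.0 |
| max absolute_mcc | 0.3524093 | 0.6945175 | 18.0 |
| max min_per_class_accuracy | 0.0863697 | 0.8518519 | 40.0 |
| max mean_per_class_accuracy | 0.050816 | 0.8686233 | 46.0 |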

Training gains/lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain | kolmogorov_smirnov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0107527 | 1.0 | 6.8888889 | 6.8888889 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0740741 | 0.0740741 | 588.8888889 | 588.8888889 | 0.0740741 |
| 2 | 0.0215054 | 0.9999959 | 6.8888889 | 6.8888889 | 1.0 | 0.9999999 | 1.0 | 0.9999999 | 0.0740741 | 0.1481481 | 588.8888889 | 588.8888889 | 0.1481481 |
| 3 | 0.0322581 | 0.9998203 | 6.8888889 | 6.8888889 | 1.0 | 0.9999137 | 1.0 | 0.9999712 | 0.0740741 | 0.2222222 | 588.8888889 | 588.8888889 | 0.2222222 |
| 4 | 0.0430108 | 0.9945451 | 6.8888889 | 6.8888889 | 1.0 | 0.9988702 | 1.0 | 0.999696 | 0.0740741 | 0.2962963 | 588.8888889 | 588.8888889 | 0.2962963 |
| 5 | 0.0537634 | 0.9124077 | 6.8888889 | 6.8888889 | 1.0 | 0.9858973 | 1.0 | 0.9969362 | 0.0740741 | 0.3703704 | 588.8888889 | 588.8888889 | 0.3703704 |
| 6 | 0.1021505 | 0.3525419 | 4.5925926 | 5.8011696 | 0.6666667 | 0.5578856 | 0.8421053 | 0.7889649 | 0.2222222 | 0.5925926 | 359.2592593 | 480.1169591 | 0.5737247 |
| 7 | 0.1505376 | 0.2653283 | 1.5308642 | 4.4285714 | 0.2222222 | 0.3352631 | 0.6428571 | 0.6431321 | 0.0740741 | 0.6666667 | 53.0864198 | 342.8571429 | 0.6037736 |
| 8 | 0.2043011 | 0.1207151 | 2.7555556 | 3.9883041 | 0.4 | 0.1676764 | 0.5789474 | 0.5180122 | 0.1481481 | 0.8148148 | 175.5555556 | 298.8304094 | 0.7141859 |
| 9 | 0.3010753 | 0.0483993 | 1.1481481 | 3.0753968 | 0.1666667 | 0.0666777 | 0.4464286 | 0.3729404 | 0.1111111 | 0.9259259 | 14.8148148 | 207.5396825 | 0.7309574 |
| 10 | 0.4086022 | 0.0353265 | 0.3444444 | 2.3567251 | 0.05 | 0.0404583 | 0.3421053 | 0.2854451 | 0.037037 | 0.962963 | -65.5555556 | 135.6725146 | 0.6484976 |
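Each gains/lift row can be checked by hand. The base rate is 27 winners out of 186 training films, lift is a group's response rate divided by that base rate, and gain is `100 * (lift - 1)`. For example, group 1 holds the top 2 rows (0.0107527 × 186 ≈ 2), both winners, and group 6 holds 9 rows with 6 winners. A quick arithmetic check using exact fractions:

```python
from fractions import Fraction

N_FILMS, N_WINNERS = 186, 27  # totals from the training confusion matrix
BASE_RATE = Fraction(N_WINNERS, N_FILMS)


def lift(winners_in_group, rows_in_group):
    """Group response rate divided by the overall winner rate."""
    return Fraction(winners_in_group, rows_in_group) / BASE_RATE


def gain(winners_in_group, rows_in_group):
    """Percentage improvement over the base rate: 100 * (lift - 1)."""
    return 100 * (lift(winners_in_group, rows_in_group) - 1)


print(round(float(lift(2, 2)), 7), round(float(gain(2, 2)), 7))  # 6.8888889 588.8888889
print(round(float(lift(6, 9)), 7), round(float(gain(6, 9)), 7))  # 4.5925926 359.2592593
```

A top-group lift of about 6.9 is the ceiling here: it is what you get when every film in the group is a winner.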

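Cross-validation confusion matrix at the max-F1 threshold (0.1897039); rows are actual, columns predicted:

| | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 144.0 | 15.0 | 0.0943 | (15.0/159.0) |
| 1 | 10.0 | 17.0 | 0.3704 | (10.0/27.0) |
| Total | 154.0 | 32.0 | 0.1344 | (25.0/186.0) |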

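Cross-validation data, maximum metrics and the thresholds at which they occur:

| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.1897039 | 0.5762712 | 31.0 |
| max f2 | 0.1897039 | 0.6071429 | 31.0 |
| max f0point5 | 0.9415002 | 0.6349206 | 8.0 |
| max accuracy | 0.9415002 | 0.8924731 | 8.0 |
| max precision | 0.9999998 | 1.0 | 0.0 |
| max recall | 0.0032229 | 1.0 | 142.0 |
| max specificity | 0.9999998 | 1.0 | 0.0 |
| max absolute_mcc | 0.1897039 | 0.4996136 | 31.0 |
| max min_per_class_accuracy | 0.0606926 | 0.7037037 | 53.0 |
| max mean_per_class_accuracy | 0.1897039 | 0.767645 | 31.0 |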
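Both confusion matrices are reported at their max-F1 threshold, so F1 computed from the matrix counts should reproduce the "max f1" values (0.7234043 on training data, 0.5762712 under cross-validation). The sketch below does that arithmetic; the drop from training to cross-validation is the more honest estimate of how the model handles unseen films.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Counts read from the confusion matrices (class 1 = Best Picture winner)
train_cm = dict(tp=17, fp=3, fn=10, tn=156)
cv_cm = dict(tp=17, fp=15, fn=10, tn=144)

for name, cm in [("training", train_cm), ("cross-validation", cv_cm)]:
    p, r, f1, acc = binary_metrics(**cm)
    print(f"{name:17s} precision={p:.4f} recall={r:.4f} F1={f1:.7f} accuracy={acc:.7f}")
```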

Cross-validation gains/lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain | kolmogorov_smirnov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0107527 | 0.9999811 | 6.8888889 | 6.8888889 | 1.0 | 0.9999998 | 1.0 | 0.9999998 | 0.0740741 | 0.0740741 | 588.8888889 | 588.8888889 | 0.0740741 |
| 2 | 0.0215054 | 0.9993527 | 6.8888889 | 6.8888889 | 1.0 | 0.9999777 | 1.0 | 0.9999887 | 0.0740741 | 0.1481481 | 588.8888889 | 588.8888889 | 0.1481481 |
| 3 | 0.0322581 | 0.985408 | 3.4444444 | 5.7407407 | 0.5 | 0.9985298 | 0.8333333 | 0.9995024 | 0.037037 | 0.1851852 | 244.4444444 | 474.0740741 | 0.1788959 |
| 4 | 0.0430108 | 0.949711 | 6.8888889 | 6.0277778 | 1.0 | 0.9651555 | 0.875 | 0.9909157 | 0.0740741 | 0.2592593 | 588.8888889 | 502.7777778 | 0.25297 |
| 5 | 0.0537634 | 0.9192653 | 3.4444444 | 5.5111111 | 0.5 | 0.9331563 | 0.8 | 0.9793638 | 0.037037 | 0.2962963 | 244.4444444 | 451.1111111 | 0.2837177 |
| 6 | 0.1021505 | 0.6285782 | 0.0 | 2.9005848 | 0.0 | 0.7645958 | 0.4210526 | 0.8776316 | 0.0 | 0.2962963 | -100.0 | 190.0584795 | 0.2271139 |
| 7 | 0.1505376 | 0.2736724 | 4.5925926 | 3.4444444 | 0.6666667 | 0.432021 | 0.5 | 0.7343996 | 0.2222222 | 0.5185185 | 359.2592593 | 244.4444444 | 0.4304682 |
| 8 | 0.2043011 | 0.1509563 | 2.0666667 | 3.0818713 | 0.3 | 0.1847457 | 0.4473684 | 0.5897539 | 0.1111111 | 0.6296296 | 106.6666667 | 208.1871345 | 0.4975542 |
| 9 | 0.3010753 | 0.0580568 | 0.7654321 | 2.3373016 | 0.1111111 | 0.0975445 | 0.3392857 | 0.4315437 | 0.0740741 | 0.7037037 | -23.4567901 | 133.7301587 | 0.4709993 |
| 10 | 0.4032258 | 0.0342205 | 0.0 | 1.7451852 | 0.0 | 0.0433447 | 0.2533333 | 0.3332 | 0.0 | 0.7037037 | -100.0 | 74.5185185 | 0.3515024 |

Cross-validation metrics summary (5 folds):

| metric | mean | sd | cv_1_valid | cv_2_valid | cv_3_valid | cv_4_valid | cv_5_valid |
|---|---|---|---|---|---|---|---|
| accuracy | 0.7790896 | 0.1870809 | 0.8684211 | 0.5405405 | 0.972973 | 0.6216216 | 0.8918919 |
| auc | 0.6687955 | 0.1405028 | 0.8666667 | 0.6214285 | 0.6714286 | 0.7058824 | 0.4785714 |
| err | 0.2209104 | 0.1870809 | 0.131579 | 0.4594594 | 0.027027 | 0.3783784 | 0.1081081 |
| err_count | 8.2 | 6.9065185 | 5.0 | 17.0 | 1.0 | 14.0 | 4.0 |
| f0point5 | 0.4603295 | 0.3526573 | 0.8510638 | 0.1282051 | 0.8333333 | 0.2112676 | 0.2777778 |
| f1 | 0.4710682 | 0.282976 | 0.8648649 | 0.1904762 | 0.6666667 | 0.3 | 0.3333333 |
| f2 | 0.5477909 | 0.1996527 | 0.8791209 | 0.3703704 | 0.5555556 | 0.5172414 | 0.4166667 |
| lift_top_group | 4.1222224 | 8.08924 | 2.1111112 | 0.0 | 18.5 | 0.0 | 0.0 |
| logloss | 0.4337158 | 0.2180643 | 0.7732567 | 0.5186943 | 0.222864 | 0.312507 | 0.3412567 |
| max_per_class_error | 0.4094958 | 0.1496116 | 0.15 | 0.4857143 | 0.5 | 0.4117647 | 0.5 |
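The `mean` and `sd` columns summarize the five fold-level values, and they match the arithmetic mean and the sample (n - 1) standard deviation of each row, e.g. for accuracy and err_count. The spread across folds is wide (fold AUC ranges from about 0.48 to 0.87), which is not surprising with only 37 or 38 films, and a handful of winners, per fold. A short check using the values copied from the table:

```python
from statistics import mean, stdev

# Per-fold values copied from the cross-validation summary
fold_accuracy = [0.8684211, 0.5405405, 0.972973, 0.6216216, 0.8918919]
fold_err_count = [5, 17, 1, 14, 4]

print(round(mean(fold_accuracy), 7))  # 0.7790896
print(round(stdev(fold_accuracy), 4))  # sample standard deviation, about 0.1871
print(mean(fold_err_count))  # 8.2
print(round(stdev(fold_err_count), 4))  # about 6.9065
```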

Scoring history:

| timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_r2 | training_auc | training_pr_auc | training_lift | training_classification_error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-05-18 15:35:20 | 0.000 sec | | 0.0 | 0 | 0.0 | | | | | | | |
| 2023-05-18 15:35:20 | 14 min 57.881 sec | 77500 obs/sec | 10.0 | 1 | 1860.0 | 0.3281038 | 0.4167286 | 0.132464 | 0.808642 | 0.4972985 | 6.8888889 | 0.1129032 |
| 2023-05-18 15:35:25 | 15 min 2.890 sec | 87865 obs/sec | 2370.0 | 237 | 440820.0 | 0.2488482 | 0.2048569 | 0.5009615 | 0.9367575 | 0.7875639 | 6.8888889 | 0.0698925 |
| 2023-05-18 15:35:30 | 15 min 7.891 sec | 98158 obs/sec | 5280.0 | 528 | 982080.0 | 0.2480757 | 0.2025354 | 0.504055 | 0.9439786 | 0.7986576 | 6.8888889 | 0.0752688 |
| 2023-05-18 15:35:33 | 15 min 10.849 sec | 101425 obs/sec | 7060.0 | 706 | 1313160.0 | 0.2461841 | 0.1987398 | 0.5115894 | 0.9430468 | 0.8008137 | 6.8888889 | 0.0698925 |

Variable importances (top 10 shown):

| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| winner_sag | 1.0 | 1.0 | 0.1255337 |
| winner_gg_drama | 0.8694185 | 0.8694185 | 0.1091413 |
| nominations | 0.8031203 | 0.8031203 | 0.1008186 |
| winner_dga | 0.6849936 | 0.6849936 | 0.0859898 |
| winner_pga | 0.6611996 | 0.6611996 | 0.0830028 |
| nom_gg_drama | 0.6321517 | 0.6321517 | 0.0793563 |
| nom_bafta | 0.4765725 | 0.4765725 | 0.0598259 |
| winner_bafta | 0.4700401 | 0.4700401 | 0.0590059 |
| nom_dga | 0.42396 | 0.42396 | 0.0532213 |
| year | 0.4210126 | 0.4210126 | 0.0528513 |


In [16]:
# `path` is treated as a directory: H2O writes the model inside it, named after the model id
h2o.save_model(top_model, './basic_model/model.zip')

'C:\\Users\\Aleksandra Czaplak\\Desktop\\oscars_ml\\oscar_predictions\\basic_model\\model.zip\\DeepLearning_grid_1_AutoML_1_20230518_152000_model_13'

In [17]:
model = h2o.load_model('./basic_model/model.zip/DeepLearning_grid_1_AutoML_1_20230518_152000_model_13')

In [18]:
model

(Output identical to the `top_model` summary above.)


In [14]:
lb.as_data_frame().to_csv('./basic_model/leaderboard.csv')

## Predict the winner

In [9]:
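# Predict on 2019's films.
# Note: 2019 also falls inside the training window (1995 to 2021), so this is an
# in-sample check rather than a true out-of-sample forecast.
test = full_table.loc[(full_table['year'] == 2019)]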

# Convert the 2019 nominees to an H2OFrame
test = h2o.H2OFrame(test)

# For binary classification, response should be a factor
test[y] = test[y].asfactor()

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [10]:
preds = top_model.predict(test)

preds

xgboost prediction progress: |████████████████████████████████████████████| 100%


| predict | p0 | p1 |
|---|---|---|
| 0 | 0.872341 | 0.127659 |
| 0 | 0.871212 | 0.128788 |
| 0 | 0.871916 | 0.128084 |
| 0 | 0.871639 | 0.128361 |
| 0 | 0.872341 | 0.127659 |
| 0 | 0.872341 | 0.127659 |
| 1 | 0.508026 | 0.491974 |
| 0 | 0.871212 | 0.128788 |
| 0 | 0.805078 | 0.194922 |




In [11]:
test['pred'] = preds['predict']
test['probA'] = preds['p1']
test_pd = test.as_data_frame(use_pandas=True)

In [12]:
final_rankings = test_pd[['film', 'probA']].sort_values('probA', ascending=False)
# Rescale p1 so the nominees' scores sum to 100 (a relative share, not a calibrated probability)
final_rankings['%_confidence'] = final_rankings['probA'] / final_rankings['probA'].sum() * 100
final_rankings

| | film | probA | %_confidence |
|---|---|---|---|
| 6 | 1917 (2019 film) | 0.491974 | 31.061029 |
| 8 | Parasite (2019 film) | 0.194922 | 12.306475 |
| 1 | The Irishman | 0.128788 | 8.131125 |
| 7 | Once Upon a Time in Hollywood | 0.128788 | 8.131125 |
| 3 | Joker (2019 film) | 0.128361 | 8.104137 |
| 2 | Jojo Rabbit | 0.128084 | 8.086673 |
| 0 | Ford v Ferrari | 0.127659 | 8.059812 |
| 4 | Little Women (2019 film) | 0.127659 | 8.059812 |
| 5 | Marriage Story | 0.127659 | 8.059812 |
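`%_confidence` is each film's share of the summed `probA` (p1) values. Each nominee is scored independently, so the raw p1 values need not sum to 1 across a year's nominees; rescaling makes them sum to 100, but the result is a relative share, not a calibrated probability of winning. Note also that H2O assigns the `predict` label using a decision threshold chosen by the model (by default the one that maximizes F1), which is why 1917 is labelled 1 with p1 below 0.5. Recomputing the column from the printed p1 values (small differences from the table come from the six-digit rounding of p1):

```python
# p1 for each 2019 nominee, copied from the prediction output
p1 = {
    "Ford v Ferrari": 0.127659,
    "The Irishman": 0.128788,
    "Jojo Rabbit": 0.128084,
    "Joker (2019 film)": 0.128361,
    "Little Women (2019 film)": 0.127659,
    "Marriage Story": 0.127659,
    "1917 (2019 film)": 0.491974,
    "Once Upon a Time in Hollywood": 0.128788,
    "Parasite (2019 film)": 0.194922,
}

total = sum(p1.values())
share = {film: 100 * p / total for film, p in p1.items()}
ranking = sorted(share.items(), key=lambda kv: kv[1], reverse=True)

for film, pct in ranking[:3]:
    print(f"{film:32s} {pct:6.2f}%")
```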


# And the Oscar goes to...

In [13]:
# Take the top-ranked film and drop the Wikipedia disambiguation suffix, e.g. " (2019 film)"
bp_winner = final_rankings['film'].iloc[0].split('(')[0].strip()
print(f'And the Oscar goes to...\n🎉🏆{bp_winner}🏆🎉')

And the Oscar goes to...
🎉🏆1917🏆🎉
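The `split('(')` step works for these nominees, but it keeps only the text before the *first* parenthesis, so a title that starts with one, such as *(500) Days of Summer*, would come out empty. A slightly more defensive sketch, assuming the film names follow Wikipedia's `Title (YEAR film)` disambiguation pattern, removes only a trailing parenthesized suffix (`clean_title` is a hypothetical helper, not part of the notebook):

```python
import re

# Matches one parenthesized group at the very end of the string, e.g. " (2019 film)"
_TRAILING_SUFFIX = re.compile(r"\s*\([^()]*\)\s*$")


def clean_title(wiki_title):
    """Strip a trailing Wikipedia disambiguation suffix from a film title."""
    return _TRAILING_SUFFIX.sub("", wiki_title).strip()


for title in ["1917 (2019 film)", "The Irishman", "(500) Days of Summer"]:
    print(repr(clean_title(title)))
```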
