<h1 style="text-align:center">Creating a Hall of Fame Predictor</h1>

In the last notebook, we noticed that goalies who made the Hall of Fame had more wins (W), more games played (GP), and a higher career save %.  In this notebook, we will build a model to predict whether goalies who are not yet eligible for the HOF will eventually be inducted.  We will use a k-nearest neighbors (KNN) classifier for this task.

In [1]:
import pandas as pd

GoaliesData = pd.read_csv('GoalieBios1950-2020-Filtered-For-Analysis.csv', index_col = 0)
GoaliesData.head()

Unnamed: 0,Player,Country,Draft Position,1st Season,HOF,GP,W,Career Save %,GAA
1,Jeff Zatkoff,USA,74.0,20132014,N,48,18,0.91225,2.4925
5,Wendell Young,CAN,73.0,19851986,N,187,59,0.8768,3.861
9,Ken Wregget,CAN,45.0,19831984,N,575,225,0.885647,3.666471
10,Chris Worthy,GBR,,19681969,N,26,5,0.861333,4.706667
11,Gump Worsley,CAN,,19521953,Y,860,333,0.912842,2.812857


In [36]:
from sklearn.neighbors import KNeighborsClassifier

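# Features that separated Hall of Famers from the rest in the previous notebook
features = ['GP', 'W', 'Career Save %']

X = GoaliesData[features]
# One-hot encode HOF into two columns, 'N' and 'Y'; column 1 ('Y') marks Hall of Famers
y = pd.get_dummies(GoaliesData.HOF)

# Label each goalie by the majority vote of his 10 nearest neighbors
KNN_model = KNeighborsClassifier(n_neighbors = 10)
KNN_model.fit(X,y)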

KNeighborsClassifier(n_neighbors=10)

We have now created a KNN classifier model to predict whether a goalie will be in the hall of fame or not.  Let's test it out!
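One caveat about the model above: KNN measures distance on the raw feature values, so GP and W (in the hundreds) dominate, while save % (which differs only in the second or third decimal place) barely affects which neighbors are chosen. The sketch below shows one possible remedy, standardizing the features in a pipeline before fitting, along with a simpler 1-D 0/1 target in place of the one-hot frame. The rows are toy values loosely based on the `head()` output plus one invented row, and `toy`, `scaled_knn`, and `n_neighbors=3` are illustrative choices, not part of the original analysis.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy rows for illustration only (assumption): not the real CSV.
toy = pd.DataFrame({
    'GP': [860, 48, 575, 26, 900, 187],
    'W': [333, 18, 225, 5, 450, 59],
    'Career Save %': [0.913, 0.912, 0.886, 0.861, 0.918, 0.877],
    'HOF': ['Y', 'N', 'N', 'N', 'Y', 'N'],
})

features = ['GP', 'W', 'Career Save %']
X_toy = toy[features]
# 1-D binary target: 1 = Hall of Famer, 0 = not
y_toy = (toy['HOF'] == 'Y').astype(int)

# StandardScaler rescales each feature to zero mean and unit variance,
# so all three features contribute to the distance on a similar scale.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_toy, y_toy)

predictions = scaled_knn.predict(X_toy)
```

With a 1-D target, `predict` returns one 0/1 value per goalie rather than an `[N, Y]` pair, so a helper written against this model would test `prediction == 1` directly.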

In [37]:
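def is_he_HOF(name, model):
    """Print whether the model predicts that the named goalie makes the Hall of Fame."""
    player_stats = GoaliesData.loc[GoaliesData.Player == name, features]
    if player_stats.empty:
        print(f"No goalie named {name!r} in the data")
        return
    # y was one-hot encoded, so each prediction is an [N, Y] pair; index 1 is the 'Y' column
    prediction = model.predict(player_stats)[0][1]
    if prediction == 1:
        print("He will be in the Hall of Fame!")
    else:
        print("Nope")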
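The helper above gives a hard yes/no answer. Since a KNN prediction is a vote among neighbors, it can also be useful to see how close the vote was. The following is a minimal sketch, assuming a model trained on a 1-D 0/1 target (unlike the one-hot target used in this notebook) and uniform neighbor weights; `hof_vote_share`, the toy rows, and the player names are hypothetical.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

features = ['GP', 'W', 'Career Save %']

# Toy rows for illustration only (assumption): names and stats are made up.
toy = pd.DataFrame({
    'Player': ['Goalie A', 'Goalie B', 'Goalie C', 'Goalie D', 'Goalie E', 'Goalie F'],
    'GP': [860, 48, 575, 26, 900, 187],
    'W': [333, 18, 225, 5, 450, 59],
    'Career Save %': [0.913, 0.912, 0.886, 0.861, 0.918, 0.877],
    'HOF': ['Y', 'N', 'N', 'N', 'Y', 'N'],
})

toy_model = KNeighborsClassifier(n_neighbors=3)
toy_model.fit(toy[features], (toy['HOF'] == 'Y').astype(int))


def hof_vote_share(name, model, data):
    """Return the fraction of nearest neighbors labeled as Hall of Famers."""
    row = data.loc[data['Player'] == name, features]
    if row.empty:
        raise ValueError(f"No goalie named {name!r} in the data")
    # With uniform weights, column 1 of predict_proba is the share of
    # neighbors whose label is 1 (HOF).
    return model.predict_proba(row)[0, 1]


share = hof_vote_share('Goalie A', toy_model, toy)
```

A share near 0.5 would flag a borderline case that the plain "Nope" or "He will be in the Hall of Fame!" output hides.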

In [38]:
#Roberto Luongo first; likely to be a hall of famer

is_he_HOF('Roberto Luongo', KNN_model)

He will be in the Hall of Fame!


In [39]:
#Now let's try Mike Smith

is_he_HOF('Mike Smith', KNN_model)

Nope


In [40]:
#How about Marc-Andre Fleury

is_he_HOF('Marc-Andre Fleury', KNN_model)

He will be in the Hall of Fame!


In [41]:
is_he_HOF('Ryan Miller', KNN_model)

Nope


In [42]:
is_he_HOF('Carey Price', KNN_model)

Nope


In [43]:
is_he_HOF('Henrik Lundqvist', KNN_model)

He will be in the Hall of Fame!


I built several Hall of Fame predictors using decision trees and KNN.  Of these, this KNN classifier gave the predictions that best matched what one would expect.  However, after reanalyzing the data, there are a few considerations for future models:
<ol>
    <li>GP and W are correlated with HOF, but many goalies in the Hall of Fame have fewer games played, depending on the length of their careers, so win % may be a better metric to use in a future model</li>
    <li>It would be useful to find the data and create a column for the number of Stanley Cups won, which may increase the accuracy of the model</li>
    <li>In future models, it would be a good idea to scale up the save % for goalies who played during the 1980s, when league-wide save percentages were lower</li>
</ol>
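The first and third considerations above can be sketched as derived columns. This is only one possible approach: win % is computed as W / GP, and the save % adjustment divides each goalie's save % by the mean for his debut decade, which is an assumed stand-in for "scaling up" and ignores careers that span several decades. The rows and the column names `Win %`, `Debut Decade`, and `Relative Save %` are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only (assumption): not taken from the real CSV.
toy = pd.DataFrame({
    'Player': ['Goalie A', 'Goalie B', 'Goalie C', 'Goalie D'],
    '1st Season': [19811982, 19851986, 20052006, 20102011],
    'GP': [400, 200, 600, 300],
    'W': [200, 80, 330, 150],
    'Career Save %': [0.880, 0.870, 0.915, 0.910],
})

# Win % is a rate, so it does not penalize shorter careers the way raw W does.
toy['Win %'] = toy['W'] / toy['GP']

# '1st Season' is stored like 19811982; the first four digits give the debut
# year, which is then rounded down to its decade.
toy['Debut Decade'] = (toy['1st Season'] // 10000) // 10 * 10

# Save % relative to the average of goalies who debuted in the same decade.
decade_mean = toy.groupby('Debut Decade')['Career Save %'].transform('mean')
toy['Relative Save %'] = toy['Career Save %'] / decade_mean
```

A value above 1 in `Relative Save %` means the goalie was better than the average of his debut-decade peers, which puts a 1980s goalie and a 2010s goalie on a more comparable footing.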

In [45]:
#let's save the model so we can use it in the future

import joblib
joblib.dump(KNN_model, 'KNN_model.pkl')

['KNN_model.pkl']
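For reference, a saved model can be restored later with `joblib.load`. The sketch below uses a stand-in model fit on made-up rows and a temporary directory so that it runs on its own; in this notebook the call would be `joblib.load('KNN_model.pkl')`. The names `toy_model` and `reloaded_model` are illustrative.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Stand-in model fit on made-up rows (assumption) so the snippet is self-contained.
X_toy = pd.DataFrame({
    'GP': [860, 48, 575, 26, 900, 187],
    'W': [333, 18, 225, 5, 450, 59],
    'Career Save %': [0.913, 0.912, 0.886, 0.861, 0.918, 0.877],
})
y_toy = [1, 0, 0, 0, 1, 0]
toy_model = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'toy_KNN_model.pkl')
    joblib.dump(toy_model, model_path)       # save, as in the cell above
    reloaded_model = joblib.load(model_path)  # restore in a later session

# The restored model should reproduce the original model's predictions.
same_predictions = (reloaded_model.predict(X_toy) == toy_model.predict(X_toy)).all()
```

Pickle files should only be loaded from trusted sources, and it is safest to load them with the same scikit-learn version that created them.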