## ML Model Evaluation
Evaluate and compare different classification models to find the best-performing algorithm for predicting the target feature of the data set.

### 1. Import the framework and generate the models for comparison

The general steps use the methods of ML_Framework.py, but the more sophisticated predictions need some additional libraries as well. The most crucial point is how the data is split for training and testing, since the percentage of drafted players is relatively low (around 1%).

To evaluate the models more reliably, the folds need to have similar class distributions. For this purpose, I will use Stratified K-Fold Cross Validation. This method splits the data into a specified number of folds (parameter: number of splits); in each iteration, one fold serves as the test set and the remaining folds form the training set. In each iteration the model is fitted to the training data and makes predictions on the test set. Finally, the predictions are evaluated to analyze the performance of every iteration.
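
The effect of stratification can be illustrated on synthetic data. The sketch below is not part of the project code: the 10,000 rows, the 1% positive rate and the placeholder features are assumptions chosen only to mimic the class imbalance described above. It counts the positive samples that land in each test fold with a plain `KFold` and with `StratifiedKFold`.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic stand-in for the real data: 10,000 rows, 1% positives (assumed numbers).
rng = np.random.default_rng(0)
y = np.zeros(10_000, dtype=int)
y[:100] = 1
rng.shuffle(y)
X = rng.normal(size=(10_000, 3))  # placeholder features; only their row count matters here


def positives_per_fold(splitter):
    """Count the positive samples in each test fold produced by the splitter."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]


plain = positives_per_fold(KFold(n_splits=10, shuffle=True, random_state=1))
stratified = positives_per_fold(StratifiedKFold(n_splits=10, shuffle=True, random_state=1))

print("KFold positives per fold:          ", plain)
print("StratifiedKFold positives per fold:", stratified)
```

With the stratified splitter every test fold receives the same share of the rare class, whereas the plain splitter leaves that share to chance.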

In [1]:
import ML_Framework
from statistics import mean, stdev
from sklearn import preprocessing
from sklearn.model_selection import StratifiedKFold
import numpy as np

### 2. Define different classification models for comparison

The ML_Framework.py file contains a Model class. Calling the Model class creates a Model instance.

In [2]:
random_model = ML_Framework.Model() # random model
logreg_model = ML_Framework.Model(user_defined_model='lr',logreg_params=['l2','lbfgs']) # logistic regression model with 'l2' penalty and 'lbfgs' solver settings
dtree_model = ML_Framework.Model(user_defined_model='dt', dectree_params=['best',10,'gini']) # decision tree model with 'best' splitter, maximum depth = 10 and 'gini' criterion settings
randfor_model = ML_Framework.Model(user_defined_model='rf',randfor_params=[20,5,'gini']) # random forest model with number of estimator trees = 20, maximum depth = 5 and 'gini' criterion settings

### 3. Transform the input data for all models using the framework's transform() method
The transform method applies all of the data cleansing, transformation and missing-value handling steps from the 2nd and 3rd sections of the EDA.ipynb file.

In [3]:
random_model.transform()
logreg_model.transform()
dtree_model.transform()
randfor_model.transform()

### 4. Model evaluation using Stratified K-Fold Cross Validation
After cleaning the data and separating the predictors from the target feature, the next steps are model fitting, prediction and evaluation, in that order. For more reliable results, these steps are repeated in a loop using stratified k-fold cross validation. The main point of stratified k-fold is that it preserves the percentage of samples of each class in every fold, which gives a more realistic training and test split for the model.

Each of the following subsections focuses on one of the four models created in the previous step. The output of the cell at the end of each subsection is the evaluation of that model's performance.
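
Since the subsections repeat the same split, fit, predict and evaluate loop, the pattern can be sketched once as a helper. This is a minimal sketch with plain scikit-learn estimators, not the `ML_Framework` implementation: the function name `fold_precisions`, the synthetic data and the choice of `DummyClassifier` as a stand-in for the random model are all assumptions made for illustration.

```python
from statistics import mean, stdev

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def fold_precisions(estimator, X, y, n_splits, seed=1):
    """Fit on each training fold and return the class-1 precision per test fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        estimator.fit(X[train_idx], y[train_idx])
        y_pred = estimator.predict(X[test_idx])
        # zero_division=0 scores a fold without positive predictions as 0
        score = precision_score(y[test_idx], y_pred, pos_label=1, zero_division=0)
        scores.append(float(score))
    return scores


# Synthetic imbalanced data standing in for the scaled draft features (assumption).
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0)

candidates = {
    "random": DummyClassifier(strategy="stratified", random_state=0),
    "decision tree": DecisionTreeClassifier(max_depth=10, random_state=0),
}
results = {}
for name, estimator in candidates.items():
    scores = fold_precisions(estimator, X, y, n_splits=10)
    results[name] = (max(scores), min(scores), mean(scores), stdev(scores))
    print(f"{name}: max={max(scores):.3f} min={min(scores):.3f} "
          f"mean={mean(scores):.3f} stdev={stdev(scores):.3f}")
```

Passing `zero_division=0` handles folds in which a model predicts no positives explicitly, which is an alternative to silencing all warnings.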

#### 4.1 Random model evaluation

Apply different numbers of splits, then fit, predict and evaluate the model's performance on those splits.

In [4]:
# for the random model it is necessary to ignore cases where the model was unable to predict the target feature
import warnings
warnings.filterwarnings('ignore')

for n in [10,50,100]:
    skf = StratifiedKFold(n_splits=n, shuffle=True, random_state=1)
    prec_skf = []

    for train_index, test_index in skf.split(random_model.X_scaled, random_model.y):
        # split X and y
        random_model.X_train, random_model.X_test = random_model.X_scaled[train_index], random_model.X_scaled[test_index]
        random_model.y_train, random_model.y_test = random_model.y[train_index], random_model.y[test_index]
        
        # evaluate the model
        cm, cr = random_model.evaluate(random_model.random_model())

        prec_skf.append(cr.iloc[1,1]) # precision for drafted_flag = 1 predictions

    # print out evaluation results
    print('\nNumber of splits: {}'.format(n))
#    print('List of possible precision:', prec_skf)
    print('Maximum Precision:',
        max(prec_skf)*100, '%')
    print('Minimum Precision:',
        min(prec_skf)*100, '%')
    print('Overall Precision:',
        mean(prec_skf)*100, '%')
    print('Standard Deviation:', stdev(prec_skf))


Number of splits: 10
Maximum Precision: 3.3333333333333335 %
Minimum Precision: 0.0 %
Overall Precision: 0.8306010928961749 %
Standard Deviation: 0.011763944218474078

Number of splits: 50
Maximum Precision: 8.333333333333332 %
Minimum Precision: 0.0 %
Overall Precision: 0.6666666666666666 %
Standard Deviation: 0.022837292968155808

Number of splits: 100
Maximum Precision: 16.666666666666664 %
Minimum Precision: 0.0 %
Overall Precision: 0.5 %
Standard Deviation: 0.028574434666294213


The random model clearly performs very poorly at predicting the players who are going to be drafted. The maximum precision rises as the number of splits increases, but even with 100 splits it is only around 17%, which is a pretty low value, and the overall precision actually falls from about 0.8% to 0.5%.
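
One way to see why the maximum grows with the number of splits is to write the reported maxima as fractions. The snippet below only re-expresses the three maximum values printed above; the cap of 100 on the denominator is an assumption that is sufficient for these numbers.

```python
from fractions import Fraction

# Maximum per-fold scores of the random model, copied from the output above.
max_scores = {10: 0.033333333333333335, 50: 0.08333333333333332, 100: 0.16666666666666664}

# limit_denominator recovers the small fraction hidden behind each float
fractions_found = {n: Fraction(score).limit_denominator(100) for n, score in max_scores.items()}

for n_splits, frac in fractions_found.items():
    print(f"{n_splits:>3} splits: max score = {frac} (step size 1/{frac.denominator})")
```

The denominators shrink from 30 to 12 to 6 as the test folds get smaller, so a single correct prediction moves the per-fold score by a larger step. A higher maximum at 100 splits therefore reflects coarser, noisier per-fold scores rather than a better model.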

#### 4.2 Logistic Regression model evaluation

In [5]:
for n in [10,50,100]:
    skf = StratifiedKFold(n_splits=n, shuffle=True, random_state=1)
    prec_skf = []

    for train_index, test_index in skf.split(logreg_model.X_scaled, logreg_model.y):
        # split X and y
        logreg_model.X_train, logreg_model.X_test = logreg_model.X_scaled[train_index], logreg_model.X_scaled[test_index]
        logreg_model.y_train, logreg_model.y_test = logreg_model.y[train_index], logreg_model.y[test_index]
        
        # fit the model
        logreg_model.fit()

        # predict target feature values and evaluate the model
        cm, cr = logreg_model.evaluate(logreg_model.predict())

        prec_skf.append(cr.iloc[1,1]) # precision for drafted_flag = 1 predictions

    # print out evaluation results
    print('\nNumber of splits: {}'.format(n))
#    print('List of possible precision:', prec_skf)
    print('Maximum Precision:',
        max(prec_skf)*100, '%')
    print('Minimum Precision:',
        min(prec_skf)*100, '%')
    print('Overall Precision:',
        mean(prec_skf)*100, '%')
    print('Standard Deviation:', stdev(prec_skf))


Number of splits: 10
Maximum Precision: 40.0 %
Minimum Precision: 19.672131147540984 %
Overall Precision: 28.42896174863388 %
Standard Deviation: 0.06767814040748778

Number of splits: 50
Maximum Precision: 58.333333333333336 %
Minimum Precision: 8.333333333333332 %
Overall Precision: 28.423076923076923 %
Standard Deviation: 0.13799980427797415

Number of splits: 100
Maximum Precision: 83.33333333333334 %
Minimum Precision: 0.0 %
Overall Precision: 28.547619047619044 %
Standard Deviation: 0.18739985071305545


The Logistic Regression model's maximum precision reaches 83% with 100 splits, which means that in its best split it is able to predict about 4 out of 5 drafted players correctly. The logistic regression clearly beats the random algorithm, and in favorable splits it is good at finding the players who are going to be drafted.

However, the overall precision is under 30%, with a standard deviation of almost 20 percentage points at 100 splits, which means the high performance is not a usual phenomenon. Unfortunately, it is not a stable model for this prediction.
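
When interpreting these scores it helps to keep the two definitions apart: precision is TP / (TP + FP), the share of predicted draftees who were really drafted, while recall is TP / (TP + FN), the share of drafted players the model found. The sketch below assumes that `cr` is a pandas DataFrame built from scikit-learn's `classification_report(..., output_dict=True)`; how `ML_Framework`'s `evaluate` method actually builds it is not shown here. Under that assumption, a positional lookup such as `cr.iloc[1,1]` depends on the row and column order of the report and may pick the recall of class 1 instead of its precision, whereas a label-based lookup does not. The check may be worthwhile, because the per-fold scores of every model above fall on the same fractions (multiples of 1/12 at 50 splits and of 1/6 at 100 splits), which is what a ratio over a fixed number of actual positives per fold would produce.

```python
import pandas as pd
from sklearn.metrics import classification_report, precision_score, recall_score

# Toy labels: 4 drafted players (1) among 12; the model flags 5 players.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

# Assumed layout: one row per class, one column per metric.
cr = pd.DataFrame(classification_report(y_true, y_pred, output_dict=True)).transpose()

# Select by label so the result does not depend on row or column order.
precision_1 = cr.loc["1", "precision"]  # TP / (TP + FP) = 3 / 5
recall_1 = cr.loc["1", "recall"]        # TP / (TP + FN) = 3 / 4
print(f"precision(class 1) = {precision_1:.2f}, recall(class 1) = {recall_1:.2f}")
```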

#### 4.3 Decision Tree model evaluation

In [6]:
for n in [10,50,100]:
    skf = StratifiedKFold(n_splits=n, shuffle=True, random_state=1)
    prec_skf = []

    for train_index, test_index in skf.split(dtree_model.X_scaled, dtree_model.y):
        # split X and y
        dtree_model.X_train, dtree_model.X_test = dtree_model.X_scaled[train_index], dtree_model.X_scaled[test_index]
        dtree_model.y_train, dtree_model.y_test = dtree_model.y[train_index], dtree_model.y[test_index]
        
        # fit the model
        dtree_model.fit()

        # predict target feature values and evaluate the model
        cm, cr = dtree_model.evaluate(dtree_model.predict())

        prec_skf.append(cr.iloc[1,1]) # precision for drafted_flag = 1 predictions

    # print out evaluation results
    print('\nNumber of splits: {}'.format(n))
#    print('List of possible precision:', prec_skf)
    print('Maximum Precision:',
        max(prec_skf)*100, '%')
    print('Minimum Precision:',
        min(prec_skf)*100, '%')
    print('Overall Precision:',
        mean(prec_skf)*100, '%')
    print('Standard Deviation:', stdev(prec_skf))


Number of splits: 10
Maximum Precision: 43.333333333333336 %
Minimum Precision: 24.59016393442623 %
Overall Precision: 33.404371584699454 %
Standard Deviation: 0.05018755404797663

Number of splits: 50
Maximum Precision: 58.333333333333336 %
Minimum Precision: 0.0 %
Overall Precision: 35.205128205128204 %
Standard Deviation: 0.14770491689880583

Number of splits: 100
Maximum Precision: 83.33333333333334 %
Minimum Precision: 0.0 %
Overall Precision: 33.88095238095238 %
Standard Deviation: 0.20117428802302334


The decision tree algorithm performs slightly better than the logistic regression (about 34% overall precision versus 28%), but it faces the same problem: a low overall precision and a high standard deviation. This means the model works very well on specific data splits but does not perform well in general, just like the logistic regression model.

#### 4.4 Random Forest model evaluation

In [7]:
for n in [10,50,100]:
    skf = StratifiedKFold(n_splits=n, shuffle=True, random_state=1)
    prec_skf = []

    for train_index, test_index in skf.split(randfor_model.X_scaled, randfor_model.y):
        # split X and y
        randfor_model.X_train, randfor_model.X_test = randfor_model.X_scaled[train_index], randfor_model.X_scaled[test_index]
        randfor_model.y_train, randfor_model.y_test = randfor_model.y[train_index], randfor_model.y[test_index]
        
        # fit the model
        randfor_model.fit()

        # predict target feature values and evaluate the model
        cm, cr = randfor_model.evaluate(randfor_model.predict())

        prec_skf.append(cr.iloc[1,1]) # precision for drafted_flag = 1 predictions

    # print out evaluation results
    print('\nNumber of splits: {}'.format(n))
#    print('List of possible precision:', prec_skf)
    print('Maximum Precision:',
        max(prec_skf)*100, '%')
    print('Minimum Precision:',
        min(prec_skf)*100, '%')
    print('Overall Precision:',
        mean(prec_skf)*100, '%')
    print('Standard Deviation:', stdev(prec_skf))


Number of splits: 10
Maximum Precision: 20.0 %
Minimum Precision: 9.836065573770492 %
Overall Precision: 13.633879781420767 %
Standard Deviation: 0.039472233356886086

Number of splits: 50
Maximum Precision: 41.66666666666667 %
Minimum Precision: 0.0 %
Overall Precision: 13.448717948717947 %
Standard Deviation: 0.0961973562816552

Number of splits: 100
Maximum Precision: 50.0 %
Minimum Precision: 0.0 %
Overall Precision: 14.309523809523808 %
Standard Deviation: 0.14215375174880387


The random forest model performed worse than all previously built models except the random model, with a maximum precision of 50% and an overall precision of only 14% for the drafted players. At 100 splits it also had a fairly large standard deviation of about 14 percentage points.

### 5. Conclusion

The random model had a really low precision score, so it was an easy task for the other models to beat it, and all of the constructed machine learning algorithms surpassed its performance.
Among the built models, the decision tree classifier and the logistic regression reached the same maximum precision (83% with 100 splits), while the decision tree had the highest overall precision (about 34%).

On the other hand, none of the models produced a good overall precision score, which means that during the cross validation there were only a few favorable splits (or just the one with the maximum precision) on which a model reached a high precision.
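
The comparison can be summarized by ranking the models on their overall score. The numbers below are the mean and standard deviation at 100 splits, rounded from the outputs in sections 4.1 to 4.4; the dictionary name and the layout of the printed table are only illustrative.

```python
# (mean, standard deviation) of the per-fold scores at 100 splits, as fractions of 1,
# rounded from the outputs in sections 4.1 to 4.4.
results_100_splits = {
    "random": (0.0050, 0.0286),
    "logistic regression": (0.2855, 0.1874),
    "decision tree": (0.3388, 0.2012),
    "random forest": (0.1431, 0.1422),
}

# Rank the models by their mean score, best first.
ranked = sorted(results_100_splits.items(), key=lambda item: item[1][0], reverse=True)

print(f"{'model':<20} {'mean':>7} {'stdev':>7}")
for name, (avg, sd) in ranked:
    print(f"{name:<20} {avg:>7.1%} {sd:>7.1%}")
```

For every model the standard deviation is at least half as large as the mean, which is another way of saying that the per-fold results are unstable.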

### 6. Further steps

The machine learning task for this project was to find those college basketball players who are most likely to be selected in the NBA draft.
The goal was to build a good predictive model that provides high-precision results. This was a binary task, since the target feature has only two outcomes: a player is either selected in the draft or not.
However, based on the input data, there are many other features that could be predicted, such as which NBA team will select a certain player in the draft, or in which round that player will be selected (first or second).

In addition, the algorithms introduced above have hyperparameters. These were not tuned during the model building steps, so there might be better-performing settings of these models.
It is also important to mention that the cross validation process had its own limitations, such as computer memory. A more powerful machine might be able to run the models on 1000 different data splits, which might lead to better precision scores.
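
As a possible starting point for the hyperparameter tuning mentioned above, the sketch below runs a small grid search over a decision tree with scikit-learn's `GridSearchCV`, scored by precision on stratified folds. It does not use `ML_Framework`; the synthetic data and the parameter grid are assumptions for illustration, and sensible ranges would have to be chosen for the real draft data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the scaled draft features (assumption).
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0)

# Hypothetical search space; sensible ranges depend on the real data.
param_grid = {"max_depth": [3, 5, 10], "criterion": ["gini", "entropy"]}

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    # score folds without positive predictions as 0 instead of raising a warning
    scoring=make_scorer(precision_score, zero_division=0),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Mean cross-validated precision:", round(search.best_score_, 3))
```

The same pattern should carry over to the logistic regression and random forest settings by swapping the estimator and the grid.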