# Libraries

In [1]:
from __future__ import print_function  # __future__ imports must come first

import pickle
from sklearn.metrics import log_loss

# Get Data & Models For Testing

In [2]:
path = "/Users/davidziganto/Repositories/Synthetic_Dataset_Generation/pickle_files/py35/"

def load_pickle(filename):
    '''Load one pickled object from the directory given by path.'''
    with open(path + filename, 'rb') as picklefile:
        return pickle.load(picklefile)

# Held-out test data
X_test = load_pickle("X_test_py35.pkl")
y_test = load_pickle("y_test_py35.pkl")

# Fitted models, named after the tier each is expected to land in
knn_needs_improvement = load_pickle("knn_needs_improvement_py35.pkl")
rf_satisfactory = load_pickle("rf_satisfactory_py35.pkl")
gbc_proficient = load_pickle("gbc_proficient_py35.pkl")

# Calculate Log Loss

In [3]:
log_loss(y_test, knn_needs_improvement.predict_proba(X_test))

0.73921970647492385

In [4]:
log_loss(y_test, rf_satisfactory.predict_proba(X_test))

0.59110226449157899

In [5]:
log_loss(y_test, gbc_proficient.predict_proba(X_test))

0.58273276192776557
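
For reference, log loss is the mean negative log of the probability a model assigns to each sample's true class, so lower is better. The following is a minimal pure-Python sketch of that calculation on toy data. The function name `mean_log_loss`, the toy labels and probabilities, and the clipping constant are illustrative assumptions, not part of this notebook or of scikit-learn.

```python
import math

def mean_log_loss(y_true, probas, labels):
    """Mean negative log-probability assigned to each sample's true class."""
    eps = 1e-15  # assumed clipping constant, used only to avoid log(0)
    total = 0.0
    for label, row in zip(y_true, probas):
        p = row[labels.index(label)]      # probability given to the true class
        p = min(max(p, eps), 1 - eps)     # keep p strictly inside (0, 1)
        total += -math.log(p)
    return total / len(y_true)

# Toy example: three samples, two classes (hypothetical values).
y_true = [0, 1, 1]
probas = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
loss = mean_log_loss(y_true, probas, labels=[0, 1])
print(round(loss, 4))
```

The third sample, where the true class received a probability of only 0.4, contributes most of the loss. Confident wrong predictions are penalized heavily, which is why the scores reported here separate the three models.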

# Create Pickle_Dict

In [6]:
pickle_dict = {'knn': knn_needs_improvement, 'rf_': rf_satisfactory, 'gbc': gbc_proficient}

# Auto_Score()

In [7]:
def auto_score(pickle_dict):
    '''
    Input:
        pickle_dict: dictionary where key is username | ID and value is model
    Output:
        dictionary of username : [log loss value, classification]
    '''
    # Build a new dictionary so the fitted models in pickle_dict are not overwritten.
    results = {}
    for k, v in pickle_dict.items():
        # X_test and y_test are the globals loaded earlier in the notebook.
        score = log_loss(y_test, v.predict_proba(X_test))
        if score < 0.59:
            results[k] = [score, "Proficient"]
        elif score <= 0.62:
            results[k] = [score, "Satisfactory"]
        else:
            results[k] = [score, "Needs Improvement"]

    return results

In [8]:
output = auto_score(pickle_dict)
output

{'gbc': [0.58273276192776557, 'Proficient'],
 'knn': [0.73921970647492385, 'Needs Improvement'],
 'rf_': [0.59110226449157899, 'Satisfactory']}

In [9]:
for k, (score, label) in output.items():
    print(k, score, label)

gbc 0.582732761928 Proficient
rf_ 0.591102264492 Satisfactory
knn 0.739219706475 Needs Improvement


# Rationale For Scoring

Default settings for Decision Trees (DT), Gradient Boosting Classifier (GBC), KNN, Logistic Regression (LR), Multinomial Naive Bayes (NB), and Random Forest (RF) yield the following log loss values on the test set:

|Algo|Log Loss|Type|
|---|---|---|
|DT|13.228|Std|
|GBC|0.582|Adv|
|KNN|2.321|Std|
|LR|0.589|Std|
|NB|0.922|Std|
|RF|0.622|Std|


**Std** = standard algorithm; **Adv** = advanced algorithm

Tuned versions yield the following log loss values:

|Algo|Log Loss|Type|
|---|---|---|
|DT|0.976|Std|
|GBC|0.583|Adv|
|KNN|0.739|Std|
|LR|0.590|Std|
|NB|0.922|Std|
|RF|0.591|Std|

The goal here is to determine each individual's skill level in achieving performant modeling results. As such, it made sense to set the threshold for *satisfactory* below all but the lowest of the log loss values yielded by the standard algorithms at their default settings (logistic regression, at 0.589). The threshold is set at 0.62, just under the 0.622 of a default random forest.

Furthermore, to separate out the high-achieving students, a category called *proficient* is included. To achieve this status, a student must use modeling techniques that were either not covered or covered only briefly. The threshold (0.590) was therefore set just below the log loss of a tuned random forest (0.591). A tuned gradient boosted classifier, for instance, can achieve a log loss of 0.583.
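
As a quick check of the two thresholds, the sketch below applies them to the tuned log loss values from the table above. The function name `classify` and its keyword arguments are hypothetical. The scores are the table's values rounded to three decimals, so a model sitting right at a boundary could land differently with its unrounded score.

```python
# Thresholds taken from the text; names here are illustrative only.
PROFICIENT_BELOW = 0.59
SATISFACTORY_AT_MOST = 0.62

def classify(score, proficient_below=PROFICIENT_BELOW,
             satisfactory_at_most=SATISFACTORY_AT_MOST):
    """Map a log loss value to a skill tier (lower log loss is better)."""
    if score < proficient_below:
        return "Proficient"
    if score <= satisfactory_at_most:
        return "Satisfactory"
    return "Needs Improvement"

# Tuned log loss values from the table, rounded to three decimals.
tuned = {"DT": 0.976, "GBC": 0.583, "KNN": 0.739,
         "LR": 0.590, "NB": 0.922, "RF": 0.591}

tiers = {algo: classify(score) for algo, score in tuned.items()}
for algo in sorted(tiers):
    print(algo, tuned[algo], tiers[algo])
```

Keeping the thresholds as keyword arguments makes it easy to try alternative cutoffs, for example `classify(0.60, proficient_below=0.585)`, without editing the function.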

**NOTES**
1. You may notice that a stock logistic regression model (0.589) already clears the *satisfactory* threshold. This is not ideal. The original goal was to set the satisfactory threshold just below the lowest log loss value achieved by default models. The dataset will require tweaking to get enough variance for this to make sense, so this should be addressed in the next iteration.
2. There will be some variability with inherently non-deterministic algorithms like random forest. The techniques will be the same, but the results may vary slightly depending on how the model was seeded.
3. This is a first attempt. Score thresholds can be adjusted as we collect data in our pilot program. In other words, this is a WIP.
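
The seeding effect described in note 2 can be illustrated with a small sketch. It assumes scikit-learn's `RandomForestClassifier` and uses synthetic data from `make_classification` as a stand-in, not the dataset used in this notebook. The helper `rf_log_loss` and the seed values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

# Synthetic stand-in data (an assumption; not the notebook's dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train = X[:200], y[:200]
X_holdout, y_holdout = X[200:], y[200:]

def rf_log_loss(seed):
    """Fit a small random forest with the given seed; return holdout log loss."""
    model = RandomForestClassifier(n_estimators=25, random_state=seed)
    model.fit(X_train, y_train)
    return log_loss(y_holdout, model.predict_proba(X_holdout))

loss_a = rf_log_loss(seed=42)
loss_b = rf_log_loss(seed=42)  # same seed: same forest, same score
loss_c = rf_log_loss(seed=7)   # different seed: score typically differs slightly

print(loss_a == loss_b)
print(loss_a, loss_c)
```

Fixing `random_state` makes a submission's score reproducible. A model whose score sits close to a threshold could still change tier under a different seed.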