# Useful Notebook: Report Testing Data Prediction Probabilities and Illustrate Accessing Evaluation Metrics
**This notebook will (1) show users how to access all model evaluation metrics from internal pickle files, and (2) generate model (class 1) prediction probabilities for instances of the respective testing dataset.**

*This notebook is designed to run after having run STREAMLINE (at least phases 1-6) and will use the files from a specific STREAMLINE experiment folder, as well as save new output files to that same folder.*

***
## Notebook Details
STREAMLINE outputs pickled objects with (1) all the metric results, (2) elements needed to build the ROC and PRC plots, as well as (3) the prediction probabilities on the testing data across all datasets, algorithm models, and CV dataset partitions. 

This notebook illustrates how the user can access the pickled metric information saved as a list object. 

It covers (1) grabbing and calculating all average metric scores over the CV partitions, (2) grabbing the elements needed to build the average ROC plot, (3) grabbing the elements needed to build the average PRC plot, (4) grabbing and reporting average model feature importance scores, and (5) grabbing and reporting the model testing prediction probabilities for each instance of the dataset. 
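The cells below read each pickled list by position. As a minimal sketch, the positions used in this notebook can be given readable names. The index layout is taken from how the cells below index `results`; the dictionary keys and the `unpack_metrics` helper are illustrative names, not part of STREAMLINE, and the toy list holds placeholder values rather than real results.

```python
# Index layout as used by the cells in this notebook; key names are illustrative.
PICKLE_FIELDS = [
    "metric_list",   # 0: list of standard classification metrics
    "fpr",           # 1: false positive rates (ROC)
    "tpr",           # 2: true positive rates (ROC)
    "roc_auc",       # 3: ROC AUC
    "prec",          # 4: precision values (PRC)
    "recall",        # 5: recall values (PRC)
    "prec_rec_auc",  # 6: PRC AUC
    "ave_prec",      # 7: average precision of PRC
    "fi",            # 8: feature importance scores
    "probas_",       # 9: testing data prediction probabilities
]


def unpack_metrics(results):
    """Return a dict mapping a readable name to each pickled element."""
    if len(results) < len(PICKLE_FIELDS):
        raise ValueError("expected at least %d elements, got %d"
                         % (len(PICKLE_FIELDS), len(results)))
    return {name: results[i] for i, name in enumerate(PICKLE_FIELDS)}


# Toy stand-in for one unpickled list (placeholder values only)
toy_results = [[0.5] * 13, [0, 1], [0, 1], 0.5, [1, 0.5],
               [0, 1], 0.5, 0.5, [0.0], [[0.5, 0.5]]]
named = unpack_metrics(toy_results)
print(sorted(named))
```

Accessing `named["roc_auc"]` instead of `results[3]` makes later cells easier to read, at the cost of one extra helper.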

When run, this last item creates a new folder (`prediction_probas`) inside the `model_evaluation` folder of each dataset in the pipeline's output experiment folder. There, the class 1 prediction probabilities are saved as one `.csv` file per algorithm and CV partition pair. Each file contains the instance's true outcome value, the unique instance ID, and the predicted probability that the instance belongs to class 1 (which typically encodes cases or the less frequent class). 
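Once these files exist, they can be read back and stacked for further analysis. The sketch below assumes the file naming used by the last cell of this notebook (`<algorithm>_CV_<n>_class1_probas.csv`) and the `1_prob` column it writes. The `load_class1_probas` helper and the `Class` and `InstanceID` column names are hypothetical placeholders, and the toy files are written to a temporary folder so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def load_class1_probas(folder, algorithm, cv_partitions):
    """Stack the per-partition probability files for one algorithm."""
    frames = []
    for cv in range(cv_partitions):
        path = os.path.join(folder, f"{algorithm}_CV_{cv}_class1_probas.csv")
        frame = pd.read_csv(path)
        frame["cv"] = cv  # remember which partition each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Toy files with placeholder column names and values (not real results)
with tempfile.TemporaryDirectory() as tmp:
    for cv in range(2):
        toy = pd.DataFrame({"Class": [0, 1],
                            "InstanceID": [2 * cv, 2 * cv + 1],
                            "1_prob": [0.2, 0.8]})
        toy.to_csv(os.path.join(tmp, f"Decision Tree_CV_{cv}_class1_probas.csv"),
                   index=False)
    combined = load_class1_probas(tmp, "Decision Tree", 2)

print(combined)
```

For a real run, `folder` would point at a dataset's `model_evaluation/prediction_probas` folder.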
 

***
## Notebook Run Parameters
* This notebook has been set up to run 'as-is' on the experiment folder generated when running the demo of STREAMLINE in any mode (if no run parameters were changed). 
* If you have run STREAMLINE on different target data or saved the experiment to some other folder outside of STREAMLINE, you need to edit `experiment_path` below to point to the respective experiment folder.
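Before running the remaining cells, it can help to confirm that `experiment_path` points at a complete experiment folder. The sketch below only checks for the two pickle files this notebook loads later (`metadata.pickle` and `algInfo.pickle`). The `missing_experiment_files` helper is a hypothetical name, and a full STREAMLINE output folder contains more than these two files.

```python
import os

# Files this notebook unpickles from the experiment folder
REQUIRED_FILES = ["metadata.pickle", "algInfo.pickle"]


def missing_experiment_files(path):
    """Return the required file names that are not present under `path`."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(path, name))]


missing = missing_experiment_files("../DemoOutput/demo_experiment")
if missing:
    print("Check experiment_path; missing: " + str(missing))
else:
    print("Experiment folder looks usable.")
```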

In [1]:
experiment_path = "../DemoOutput/demo_experiment" # path to the target experiment folder 
target_data_list = None # None to generate output for all analyzed target datasets, otherwise provide a (str) list of target dataset names to run
algorithms = [] # use an empty list to re-evaluate all modeling algorithms that were run in the pipeline, otherwise specify a (str) list of algorithm identifiers.

***
## Housekeeping
### Import Packages

In [2]:
import os
import pandas as pd
import pickle
import numpy as np
from statistics import mean
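from numpy import interp # scipy.interp was a deprecated alias of numpy.interp and is absent from recent SciPy releases
from scipy import stats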
import warnings
warnings.filterwarnings('ignore')

# Jupyter Notebook Hack: This code ensures that the results of multiple commands within a given cell are all displayed, rather than just the last. 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### Automatically Detect Dataset Names

In [3]:
# Get dataset paths for all completed dataset analyses in experiment folder
datasets = os.listdir(experiment_path)

# Name of experiment folder
experiment_name = experiment_path.split('/')[-1] 

remove_list = ['.DS_Store', 'metadata.pickle', 'metadata.csv', 'algInfo.pickle',
                'DatasetComparisons', 'jobs', 'jobsCompleted', 'logs',
                'KeyFileCopy', 'dask_logs',
                experiment_name + '_ML_Pipeline_Report.pdf']
for text in remove_list:
    if text in datasets:
        datasets.remove(text)

datasets = sorted(datasets) # ensures consistent ordering of datasets
print("Analyzed Datasets: " + str(datasets))

Analyzed Datasets: ['hcc_data', 'hcc_data_custom']


### Load Other Necessary Parameters

In [4]:
# Unpickle metadata from previous phase
with open(experiment_path + '/' + "metadata.pickle", 'rb') as file:
    metadata = pickle.load(file)
# Load variables specified earlier in the pipeline from metadata
class_label = metadata['Class Label']
instance_label = metadata['Instance Label']
cv_partitions = int(metadata['CV Partitions'])

# Unpickle algorithm information from previous phase
with open(experiment_path + '/' + "algInfo.pickle", 'rb') as file:
    algInfo = pickle.load(file)
# Keep any user-specified subset from the run parameters (empty list = all algorithms).
# Assumes the requested names match the algInfo keys (e.g. 'Decision Tree').
requested_algorithms = list(algorithms)
algorithms = []
abbrev = {}
for key in algInfo:
    if algInfo[key][0] and (not requested_algorithms or key in requested_algorithms): # If that algorithm was used (and requested)
        algorithms.append(key)
        abbrev[key] = (algInfo[key][1])

print("Algorithms Ran: " + str(algorithms))

Algorithms Ran: ['Decision Tree', 'Logistic Regression', 'Naive Bayes']


***
## From Pickle: Extract Metric List and Calculate CV Averages

In [5]:
def print_results(algorithm, full_path):
    # Define evaluation stats variable lists
        s_bac = [] # balanced accuracies
        s_ac = [] # standard accuracies
        s_f1 = [] # F1 scores
        s_re = [] # recall values
        s_sp = [] # specificities
        s_pr = [] # precision values
        s_tp = [] # true positives
        s_tn = [] # true negatives
        s_fp = [] # false positives
        s_fn = [] # false negatives
        s_npv = [] # negative predictive values
        s_lrp = [] # likelihood ratio positive values
        s_lrm = [] # likelihood ratio negative values
        
        aucs = [] #areas under ROC curve
        praucs = [] #area under PRC curve
        aveprecs = [] #average precisions for PRC
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            file = open(result_file, 'rb')
            results = pickle.load(file)
            file.close()
            
            #Separate pickled results
            metric_list = results[0] #First item in pickled list is the metric list (set of standard classification metrics)
            roc_auc = results[3] #Fourth item is the ROC AUC
            prec_rec_auc = results[6] #Seventh item is the PRC AUC
            ave_prec = results[7] #Eighth item is the average precision of PRC
            
            #Separate metrics from metricList
            s_bac.append(metric_list[0])
            s_ac.append(metric_list[1])
            s_f1.append(metric_list[2])
            s_re.append(metric_list[3])
            s_sp.append(metric_list[4])
            s_pr.append(metric_list[5])
            s_tp.append(metric_list[6])
            s_tn.append(metric_list[7])
            s_fp.append(metric_list[8])
            s_fn.append(metric_list[9])
            s_npv.append(metric_list[10])
            s_lrp.append(metric_list[11])
            s_lrm.append(metric_list[12])
            
            aucs.append(roc_auc)
            praucs.append(prec_rec_auc)
            aveprecs.append(ave_prec)
            
        results = {'Balanced Accuracy': mean(s_bac), 'Accuracy': mean(s_ac), 
                   'F1_Score': mean(s_f1), 'Sensitivity (Recall)': mean(s_re), 
                   'Specificity': mean(s_sp),'Precision (PPV)': mean(s_pr), 
                   'TP': mean(s_tp), 'TN': mean(s_tn), 'FP': mean(s_fp), 
                   'FN': mean(s_fn), 'NPV': mean(s_npv), 'LR+': mean(s_lrp), 
                   'LR-': mean(s_lrm), 'ROC_AUC': mean(aucs),'PRC_AUC': mean(praucs), 
                   'PRC_APS': mean(aveprecs)}
        print(results)

In [6]:
if target_data_list: # User specified one or more analyzed datasets above
    # Build a new list rather than removing items while iterating, which skips elements
    datasets = [each for each in datasets if each in target_data_list]

for each in datasets: 
    print("---------------------------------------")
    print("Dataset: "+str(each))
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    for algorithm in algorithms: #loop through algorithms
        print("Algorithm: "+str(algorithm))
        print_results(algorithm, full_path)

---------------------------------------
Dataset: hcc_data
---------------------------------------
Algorithm: Decision Tree
{'Balanced Accuracy': 0.6582633053221288, 'Accuracy': 0.6787878787878788, 'F1_Score': 0.572663139329806, 'Sensitivity (Recall)': 0.5714285714285714, 'Specificity': 0.7450980392156863, 'Precision (PPV)': 0.6203463203463203, 'TP': 12, 'TN': 25, 'FP': 8, 'FN': 9, 'NPV': 0.7433481152993348, 'LR+': 2.9634920634920627, 'LR-': 0.561941251596424, 'ROC_AUC': 0.7163865546218487, 'PRC_AUC': 0.6373904038003011, 'PRC_APS': 0.5794160399202416}
Algorithm: Logistic Regression
{'Balanced Accuracy': 0.6942110177404295, 'Accuracy': 0.696969696969697, 'F1_Score': 0.6297674418604651, 'Sensitivity (Recall)': 0.6825396825396826, 'Specificity': 0.7058823529411765, 'Precision (PPV)': 0.5893416927899686, 'TP': 14, 'TN': 24, 'FP': 10, 'FN': 6, 'NPV': 0.7871017871017871, 'LR+': 2.3566137566137564, 'LR-': 0.4458041958041958, 'ROC_AUC': 0.780578898225957, 'PRC_AUC': 0.6878193986710414, 'PRC_APS
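The averages printed above hide how much each metric varies between CV partitions. As a minimal sketch, the per-partition values could be tabulated with pandas and summarized by mean and sample standard deviation. The `summarize_cv` helper is a hypothetical name, and the toy rows hold placeholder values, not results from the demo.

```python
import pandas as pd


def summarize_cv(per_cv_rows):
    """Tabulate per-partition metrics and report their mean and sample std."""
    per_cv = pd.DataFrame(per_cv_rows)     # one row per CV partition
    return per_cv.agg(["mean", "std"]).T   # one row per metric


# Placeholder values for three CV partitions
toy_rows = [
    {"Balanced Accuracy": 0.6, "ROC_AUC": 0.7},
    {"Balanced Accuracy": 0.7, "ROC_AUC": 0.8},
    {"Balanced Accuracy": 0.8, "ROC_AUC": 0.9},
]
summary = summarize_cv(toy_rows)
print(summary)
```

In `print_results`, one dictionary per partition could be collected inside the CV loop and passed to such a helper instead of calling `mean` on each list.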

## From Pickle: Extract Lists of True and False Positive Rates for Constructing ROC

In [7]:
if target_data_list: # User specified one or more analyzed datasets above
    # Build a new list rather than removing items while iterating, which skips elements
    datasets = [each for each in datasets if each in target_data_list]

for each in datasets: 
    print("---------------------------------------")
    print("Dataset: "+str(each))
    print("---------------------------------------")
    full_path = experiment_path+ '/' + each
    for algorithm in algorithms: #loop through algorithms
        print("Algorithm: "+str(algorithm))
        # Define evaluation stats variable lists
        tprs = [] # true positive rates
        mean_fpr = np.linspace(0, 1, 100) # used to plot all CVs in single ROC plot
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            # Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            file = open(result_file, 'rb')
            results = pickle.load(file)
            file.close()
            
            #Separate pickled results
            fpr = results[1]
            tpr = results[2]

            tprs.append(interp(mean_fpr, fpr, tpr))
            tprs[-1][0] = 0.0

        results = {'tprs': np.mean(tprs, axis=0)}
        
        print(results)
        #print('fprs: '+str(mean_fpr))

---------------------------------------
Dataset: hcc_data
---------------------------------------
Algorithm: Decision Tree
{'tprs': array([0.        , 0.04633638, 0.09267276, 0.13900914, 0.18534552,
       0.2316819 , 0.28763829, 0.29854097, 0.30944364, 0.32063492,
       0.33262787, 0.34462081, 0.36238576, 0.39073272, 0.41907969,
       0.44045214, 0.45299022, 0.4655283 , 0.47806638, 0.49060446,
       0.50314254, 0.51736412, 0.53262787, 0.54789161, 0.56430976,
       0.58120891, 0.59810806, 0.61500722, 0.63190637, 0.64880552,
       0.6572299 , 0.66452431, 0.67181872, 0.67911313, 0.68640754,
       0.69370195, 0.70099636, 0.70829077, 0.71558518, 0.72287959,
       0.730174  , 0.73746841, 0.74476282, 0.75205723, 0.75888133,
       0.76472205, 0.77056277, 0.77640349, 0.78224421, 0.78808493,
       0.79345839, 0.7983646 , 0.8032708 , 0.80817701, 0.81308321,
       0.81798942, 0.82289562, 0.82780183, 0.83270803, 0.83761424,
       0.84252044, 0.84742665, 0.85233285, 0.85723906, 0.8621452
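The ROC cell above averages the curves "vertically": each partition's true positive rates are interpolated onto a shared grid of false positive rates and then averaged point by point. The sketch below reproduces that idea on toy curves and adds a trapezoidal estimate of the area under the averaged curve. The `average_roc` helper and the toy curves are illustrative only.

```python
import numpy as np


def average_roc(fold_curves, n_points=100):
    """Average (fpr, tpr) curves on a common FPR grid; return grid, mean TPR, AUC."""
    mean_fpr = np.linspace(0, 1, n_points)
    tprs = []
    for fpr, tpr in fold_curves:
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # fpr must be increasing
        interp_tpr[0] = 0.0                         # force the curve through (0, 0)
        tprs.append(interp_tpr)
    mean_tpr = np.mean(tprs, axis=0)
    # Trapezoidal rule written out explicitly
    auc = float(np.sum(np.diff(mean_fpr) * (mean_tpr[1:] + mean_tpr[:-1]) / 2.0))
    return mean_fpr, mean_tpr, auc


# Two toy three-point ROC curves (placeholder values)
toy_curves = [
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0])),
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.6, 1.0])),
]
grid, mean_tpr, mean_auc = average_roc(toy_curves)
print(round(mean_auc, 3))
```

Note that the area under the averaged curve need not equal the mean of the per-partition AUC values reported earlier.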

## From Pickle: Extract Lists of Precision and Recall Values for Constructing PRC

In [8]:
if target_data_list: # User specified one or more analyzed datasets above
    # Build a new list rather than removing items while iterating, which skips elements
    datasets = [each for each in datasets if each in target_data_list]

for each in datasets: 
    print("---------------------------------------")
    print("Dataset: "+str(each))
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    for algorithm in algorithms: #loop through algorithms
        print("Algorithm: "+str(algorithm))
        # Define evaluation stats variable lists
        precs = [] # precision values interpolated onto the common recall grid
        mean_recall = np.linspace(0, 1, 100) # used to plot all CVs in single PRC plot
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            file = open(result_file, 'rb')
            results = pickle.load(file)
            file.close()
            
            #Separate pickled results
            prec = results[4]
            recall = results[5]

            precs.append(interp(mean_recall, recall, prec))

        results = {'precs': np.mean(precs, axis=0)}

        print(results)
        #print('recall: '+str(mean_recall))

---------------------------------------
Dataset: hcc_data
---------------------------------------
Algorithm: Decision Tree
{'precs': array([1.        , 0.98133432, 0.96266864, 0.94400297, 0.92533729,
       0.90667161, 0.88800593, 0.86934026, 0.85067458, 0.8320089 ,
       0.81334322, 0.79467754, 0.77601187, 0.75734619, 0.73868051,
       0.73041293, 0.72387837, 0.71734381, 0.71080924, 0.70559743,
       0.70832213, 0.71104683, 0.71377152, 0.71649622, 0.71659728,
       0.71320015, 0.70980302, 0.70640589, 0.70300877, 0.69961164,
       0.69621451, 0.69281738, 0.68942025, 0.68602313, 0.68561039,
       0.68519765, 0.68478491, 0.68437217, 0.68395944, 0.6835467 ,
       0.68313396, 0.68272122, 0.68230848, 0.68082057, 0.67852627,
       0.67623197, 0.67393767, 0.67164337, 0.66934907, 0.66705477,
       0.66476047, 0.66246617, 0.62265019, 0.62095199, 0.61925378,
       0.61755557, 0.61585737, 0.61353798, 0.61039034, 0.6072427 ,
       0.60409506, 0.60094743, 0.59527453, 0.58859154, 0.581908
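`numpy.interp` expects its x-coordinates to be increasing and does not check this itself. Whether the pickled recall values are already stored in increasing order is not stated in this notebook, so the following is a defensive sketch: it flips arrays that arrive in decreasing order before interpolating. The `interp_precision` helper and the toy values are illustrative.

```python
import numpy as np


def interp_precision(mean_recall, recall, prec):
    """Interpolate precision onto `mean_recall`, flipping decreasing recall first."""
    recall = np.asarray(recall, dtype=float)
    prec = np.asarray(prec, dtype=float)
    steps = np.diff(recall)
    if np.all(steps <= 0):
        # Recall is non-increasing: reverse both arrays together
        recall, prec = recall[::-1], prec[::-1]
    elif not np.all(steps >= 0):
        raise ValueError("recall must be monotonic")
    return np.interp(mean_recall, recall, prec)


# Toy curve stored with decreasing recall (placeholder values)
toy_recall = [1.0, 0.5, 0.0]
toy_prec = [0.5, 0.75, 1.0]
interpolated = interp_precision(np.array([0.0, 0.25, 0.5, 1.0]), toy_recall, toy_prec)
print(interpolated)
```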

## From Pickle: Extract Average Model Feature Importance Estimates (Over CVs)

In [9]:
if target_data_list: # User specified one or more analyzed datasets above
    # Build a new list rather than removing items while iterating, which skips elements
    datasets = [each for each in datasets if each in target_data_list]
print("Dataset: " + str(datasets))

for each in datasets: 
    print("---------------------------------------")
    print("Dataset: "+str(each))
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    original_headers = pd.read_csv(full_path + "/exploratory/ProcessedFeatureNames.csv", sep=',').columns.values.tolist() # Get Original Headers
    for algorithm in algorithms: #loop through algorithms
        print("Algorithm: "+str(algorithm))
        # Define evaluation stats variable lists
        FI_ave = [0] * len(original_headers)  # used to save average FI scores over all cvs. (all original features in dataset prior to feature selection included)
        
        for cv_count in range(0, cv_partitions): # loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            file = open(result_file, 'rb')
            results = pickle.load(file)
            file.close()
            
            #Separate pickled results
            fi = results[8]
            
            # Add this partition's feature importance scores to the running sums (takes into account that all features are not in each CV partition)
            headers = pd.read_csv(full_path + '/CVDatasets/' + each + '_CV_' + str(cv_count) + '_Test.csv').columns.values.tolist()
            if instance_label is not None: 
                headers.remove(instance_label)
            headers.remove(class_label)
            for j, feature in enumerate(original_headers):
                if feature in headers:  # Check if current feature from original dataset was in the partition
                    # Deal with features not being in original order (find index of current feature with list.index())
                    f_index = headers.index(feature)
                    FI_ave[j] += fi[f_index]
            
        #Turn FI sums into averages
        for i in range(0, len(FI_ave)):
            FI_ave[i] = FI_ave[i] / float(cv_partitions)

        # Pair each feature name with its average score (both lists share the same order)
        fi_dict = dict(zip(original_headers, FI_ave))
                
        print(fi_dict)

Dataset: ['hcc_data', 'hcc_data_custom']
---------------------------------------
Dataset: hcc_data
---------------------------------------
Algorithm: Decision Tree
{'Gender': 0.0, 'Symptoms': 0.040989729225023336, 'Alcohol': 0.0, 'Hepatitis B Surface Antigen': 0.0, 'Hepatitis B e Antigen': 0.0, 'Hepatitis B Core Antibody': 0.0, 'Hepatitis C Virus Antibody': 0.0, 'Cirrhosis': 0.0, 'Endemic Countries': 0.0, 'Smoking': 0.014519140989729215, 'Diabetes': 0.0, 'Obesity': 0.0, 'Hemochromatosis': 0.0, 'Arterial Hypertension': 0.0, 'Chronic Renal Insufficiency': 0.0, 'Human Immunodeficiency Virus': 0.0, 'Nonalcoholic Steatohepatitis': 0.0, 'Esophageal Varices': 0.0, 'Splenomegaly': 0.0, 'Portal Hypertension': 0.0, 'Portal Vein Thrombosis': 0.0, 'Liver Metastasis': 0.0, 'Radiological Hallmark': 0.0, 'Age at diagnosis': 0.0, 'Grams of Alcohol per day': 0.0, 'Packs of cigarets per year': 0.043569094304388434, 'Performance Status*': 0.0, 'Encephalopathy degree*': 0.0, 'Ascites degree*': 0.0, 'Inter
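The averaging above has to cope with partitions that contain different subsets of features in different orders. A minimal sketch of the same alignment, keyed by feature name instead of list position, is shown below. The `average_importances` helper and the toy feature names are hypothetical; as in the cell above, a feature that is absent from a partition contributes zero to its average.

```python
def average_importances(all_features, per_cv):
    """Average importance scores per feature over CV partitions.

    per_cv: list of (headers, scores) pairs, one pair per partition, where
    scores[i] belongs to headers[i]. Missing features count as zero.
    """
    totals = {feature: 0.0 for feature in all_features}
    for headers, scores in per_cv:
        for name, score in zip(headers, scores):
            if name in totals:
                totals[name] += score
    n_partitions = len(per_cv)
    return {feature: totals[feature] / n_partitions for feature in all_features}


# Toy data: feature "C" is missing from the first partition, "B" from the second
toy_features = ["A", "B", "C"]
toy_per_cv = [
    (["B", "A"], [0.2, 0.4]),
    (["A", "C"], [0.6, 0.1]),
]
toy_average = average_importances(toy_features, toy_per_cv)
print(toy_average)
```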

## Extract and Output Testing Data Prediction Probabilities 

In [10]:
if target_data_list: # User specified one or more analyzed datasets above
    # Build a new list rather than removing items while iterating, which skips elements
    datasets = [each for each in datasets if each in target_data_list]

for each in datasets: 
    print("---------------------------------------")
    print("Dataset: "+str(each))
    print("---------------------------------------")

    full_path = experiment_path + '/' + each
    
    # Make folder in experiment folder/datafolder to store all prediction probabilities per algorithm/CV combination
    if not os.path.exists(full_path + '/model_evaluation/prediction_probas'):
        os.mkdir(full_path + '/model_evaluation/prediction_probas')
        
    original_headers = pd.read_csv(full_path + "/exploratory/OriginalFeatureNames.csv", sep=',').columns.values.tolist() #Get Original Headers
    for algorithm in algorithms: #loop through algorithms
        print("Algorithm: "+str(algorithm))

        for cv_count in range(0,cv_partitions): #loop through cv's
            print("CV: "+str(cv_count))
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            file = open(result_file, 'rb')
            results = pickle.load(file)
            file.close()
            
            #Load associated testing dataset
            test_data = pd.read_csv(full_path + '/CVDatasets/'+each+'_CV_' + str(cv_count) + '_Test.csv')
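            id_columns = [class_label] if instance_label is None else [class_label, instance_label] # instance label may be absent
            probas_summary = test_data[id_columns].copy() # copy so the new column below is not written to a view of test_data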

            #Separate pickled results
            probas_ = results[9]
            print(probas_[:,1])
            probas_summary['1_prob'] = probas_[:,1]
            file_name = full_path + '/model_evaluation/prediction_probas/' + algorithm + '_CV_'+str(cv_count) + '_class1_probas.csv'
            probas_summary.to_csv(file_name, index=False)

---------------------------------------
Dataset: hcc_data
---------------------------------------
Algorithm: Decision Tree
CV: 0
[0.17647059 0.         0.         0.17647059 0.74074074 0.66666667
 0.52       0.8        0.17647059 0.52       0.74074074 0.52
 0.         0.         0.52       0.74074074 0.         0.17647059
 0.52       0.74074074 0.17647059 0.8        0.74074074 0.74074074
 0.         0.         0.         0.74074074 0.74074074 0.
 0.         0.52       0.         0.52       0.74074074 0.
 0.         0.66666667 0.8        0.8        0.8        0.74074074
 0.66666667 0.52       0.17647059 0.         0.8        0.17647059
 0.52       0.66666667 0.52       0.66666667 0.74074074 0.52
 0.        ]
CV: 1
[0.14285714 0.         0.         0.14285714 0.72727273 0.42857143
 1.         0.14285714 0.42857143 0.         0.         0.14285714
 0.42857143 1.         0.14285714 0.42857143 0.42857143 0.42857143
 0.57142857 0.14285714 0.42857143 0.42857143 1.         0.14285714
 0.571428

Algorithm: Logistic Regression
CV: 0
[0.49937902 0.49989999 0.49968432 0.50016151 0.50029795 0.50049514
 0.49996443 0.50069195 0.50012185 0.50039792 0.50022023 0.49977679
 0.49949293 0.49950127 0.50033799 0.49984405 0.49954262 0.49962425
 0.49986198 0.49978206 0.50037223 0.4996494  0.50004679 0.50064132
 0.49927522 0.50011289 0.49949689 0.49988253 0.49972181 0.49946543
 0.49950785 0.50093662 0.49939134 0.5000207  0.50000344 0.49911686
 0.49944021 0.49984783 0.50057227 0.5000306  0.50095936 0.50058817
 0.50003119 0.50029857 0.49998768 0.49955158 0.49985027 0.49977487
 0.49995249 0.4999719  0.50017748 0.50015648 0.50024103 0.50009393
 0.49986369]
CV: 1
[0.50745122 0.56005795 0.46815172 0.56137637 0.47656185 0.44202094
 0.52030529 0.50860368 0.4301014  0.5844921  0.37770534 0.62265053
 0.4368264  0.71550092 0.4751473  0.50836567 0.52641062 0.50150488
 0.49904259 0.92519694 0.51981221 0.58312777 0.48446665 0.57632342
 0.53661365 0.47938174 0.45880153 0.56983433 0.51990444 0.53553996
 0.646