# Accessing Pickled Metrics and Prediction Probabilities on Testing Data (for STREAMLINE)
The pipeline outputs pickled objects with all the metric results, elements needed to build the ROC and PRC plots, as well as the prediction probabilities on the testing data across all datasets, algorithm models, and CV dataset partitions. 

This notebook illustrates how the user can access the pickled metric information saved as a list object. 

It includes (1) grabbing and calculating all average metric scores over the CV partitions, (2) grabbing the elements needed to build the average ROC plot, (3) grabbing the elements needed to build the average PRC plot, (4) grabbing and reporting average model feature importance scores, and (5) grabbing and reporting the model testing prediction probabilities for each instance of the dataset. 

When run, this last item creates a new 'prediction_probas' folder inside each dataset's 'model_evaluation' folder in the pipeline's output experiment folder. There, the case (i.e. class 1) prediction probabilities are saved as a .csv file for each algorithm and CV partition pair. Each file contains the instance's true outcome value, the unique instance ID, and the predicted probability of the instance being a case (class 1). 
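
The cells below read each pickled list by position. As a minimal sketch, those positions can be given names. The helper names `RESULT_FIELDS` and `load_metrics` and the stand-in file name are hypothetical (not part of STREAMLINE), and the layout is inferred only from the indices used later in this notebook.

```python
import os
import pickle
import tempfile

# Positions inferred from the indices used in this notebook (assumption:
# the pickled list holds at least these ten items, in this order).
RESULT_FIELDS = [
    "metric_list",   # 0: standard classification metrics
    "fpr",           # 1: false positive rates (ROC)
    "tpr",           # 2: true positive rates (ROC)
    "roc_auc",       # 3: ROC AUC
    "prec",          # 4: precision values (PRC)
    "recall",        # 5: recall values (PRC)
    "prec_rec_auc",  # 6: PRC AUC
    "ave_prec",      # 7: average precision of PRC
    "fi",            # 8: feature importance scores
    "probas_",       # 9: prediction probabilities
]


def load_metrics(result_file):
    """Unpickle one metrics file and label its first ten items by name."""
    with open(result_file, "rb") as file:
        results = pickle.load(file)
    return dict(zip(RESULT_FIELDS, results))


# Demonstration with a stand-in pickle (the integers 0-9 replace real results)
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo_CV_0_metrics.pickle")
    with open(demo_file, "wb") as file:
        pickle.dump(list(range(10)), file)
    demo = load_metrics(demo_file)

print(demo["roc_auc"], demo["probas_"])
```

Naming the positions once avoids repeating bare indices such as `results[3]` in every cell, at the cost of one extra helper to maintain if the pickled layout changes.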
 

## Import Packages

In [2]:
import os
import pandas as pd
import pickle
import numpy as np
from statistics import mean
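from numpy import interp # scipy.interp is a deprecated alias of numpy.interp, so import it from NumPy directly
from scipy import stats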
import warnings
warnings.filterwarnings('ignore')

# Jupyter Notebook Hack: This code ensures that the results of multiple commands within a given cell are all displayed, rather than just the last. 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Set Run Parameters

In [3]:
experiment_path = "../demo/demo"
target_data_list = None # None to process all analyzed datasets; otherwise a list of dataset names to process
algorithms = [] # placeholder; replaced below with every modeling algorithm that was run in the pipeline

## Automatically Detect Dataset Names

In [4]:
# Name of experiment folder
experiment_name = experiment_path.split('/')[-1]

# Get dataset paths for all completed dataset analyses in experiment folder
datasets = os.listdir(experiment_path)
remove_list = ['.DS_Store', 'metadata.pickle', 'metadata.csv', 'algInfo.pickle',
                'DatasetComparisons', 'jobs', 'jobsCompleted', 'logs',
                'KeyFileCopy', 'dask_logs',
                experiment_name + '_ML_Pipeline_Report.pdf']
for text in remove_list:
    if text in datasets:
        datasets.remove(text)

datasets = sorted(datasets) # ensures consistent ordering of datasets
print("Analyzed Datasets: " + str(datasets))

Analyzed Datasets: ['hcc-data_example', 'hcc-data_example_custom']
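
The remove list above has to name every non-dataset file in the experiment folder. A possible alternative, sketched here under the assumption that each analyzed dataset is a subfolder and that the only other subfolders are the ones already named in `remove_list`, is to keep directories only. The function name `find_datasets` is hypothetical.

```python
import os
import tempfile

# Non-dataset folders, copied from remove_list above
SKIP_FOLDERS = ("DatasetComparisons", "jobs", "jobsCompleted", "logs",
                "KeyFileCopy", "dask_logs")


def find_datasets(experiment_path, skip=SKIP_FOLDERS):
    """Return the sorted names of subfolders that are not in `skip`."""
    return sorted(
        name for name in os.listdir(experiment_path)
        if os.path.isdir(os.path.join(experiment_path, name))
        and name not in skip
    )


# Demonstration on a throwaway folder that mimics an experiment folder
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("hcc-data_example_custom", "hcc-data_example", "logs", "jobs"):
        os.mkdir(os.path.join(tmp, folder))
    for file_name in ("metadata.pickle", "algInfo.pickle", ".DS_Store"):
        open(os.path.join(tmp, file_name), "w").close()
    found = find_datasets(tmp)

print(found)
```

With this approach, stray files such as a report PDF or `.DS_Store` never need to be listed, because only folders are considered.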


## Load Other Necessary Parameters

In [5]:
# Unpickle metadata from previous phase
with open(experiment_path + '/' + "metadata.pickle", 'rb') as file:
    metadata = pickle.load(file)
# Load variables specified earlier in the pipeline from metadata
outcome_label = metadata['Class Label']
instance_label = metadata['Instance Label']
cv_partitions = int(metadata['CV Partitions'])

# Unpickle algorithm information from previous phase
with open(experiment_path + '/' + "algInfo.pickle", 'rb') as file:
    algInfo = pickle.load(file)
algorithms = []
abbrev = {}
for key in algInfo:
    if algInfo[key][0]: # If that algorithm was used
        algorithms.append(key)
        abbrev[key] = (algInfo[key][1])

print("Algorithms Ran: " + str(algorithms))

Algorithms Ran: ['Decision Tree', 'Logistic Regression', 'Naive Bayes']


## Extract Metric List and Calculate CV Averages

In [6]:
def print_results(algorithm, full_path):
        # Define evaluation stats variable lists
        s_bac = [] # balanced accuracies
        s_ac = [] # standard accuracies
        s_f1 = [] # F1 scores
        s_re = [] # recall values
        s_sp = [] # specificities
        s_pr = [] # precision values
        s_tp = [] # true positives
        s_tn = [] # true negatives
        s_fp = [] # false positives
        s_fn = [] # false negatives
        s_npv = [] # negative predictive values
        s_lrp = [] # likelihood ratio positive values
        s_lrm = [] # likelihood ratio negative values
        
        aucs = [] #areas under ROC curve
        praucs = [] #area under PRC curve
        aveprecs = [] #average precisions for PRC
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            with open(result_file, 'rb') as file:
                results = pickle.load(file)
            
            #Separate pickled results
            metric_list = results[0] #First item in pickled list is the metric list (set of standard classification metrics)
            roc_auc = results[3] #Fourth item is the ROC AUC
            prec_rec_auc = results[6] #Seventh item is the PRC AUC
            ave_prec = results[7] #Eighth item is the average precision of PRC
            
            #Separate metrics from metricList
            s_bac.append(metric_list[0])
            s_ac.append(metric_list[1])
            s_f1.append(metric_list[2])
            s_re.append(metric_list[3])
            s_sp.append(metric_list[4])
            s_pr.append(metric_list[5])
            s_tp.append(metric_list[6])
            s_tn.append(metric_list[7])
            s_fp.append(metric_list[8])
            s_fn.append(metric_list[9])
            s_npv.append(metric_list[10])
            s_lrp.append(metric_list[11])
            s_lrm.append(metric_list[12])
            
            aucs.append(roc_auc)
            praucs.append(prec_rec_auc)
            aveprecs.append(ave_prec)
            
        results = {'Balanced Accuracy': mean(s_bac), 'Accuracy': mean(s_ac), 
                   'F1_Score': mean(s_f1), 'Sensitivity (Recall)': mean(s_re), 
                   'Specificity': mean(s_sp),'Precision (PPV)': mean(s_pr), 
                   'TP': mean(s_tp), 'TN': mean(s_tn), 'FP': mean(s_fp), 
                   'FN': mean(s_fn), 'NPV': mean(s_npv), 'LR+': mean(s_lrp), 
                   'LR-': mean(s_lrm), 'ROC_AUC': mean(aucs),'PRC_AUC': mean(praucs), 
                   'PRC_APS': mean(aveprecs)}
        print(results)
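
The averaging in `print_results` can also be written by stacking the per-partition metric lists into one array. The sketch below assumes each metric list holds the 13 values in the order used above; `METRIC_NAMES`, `summarize_metrics`, and the two synthetic partitions are hypothetical and only illustrate the idea. It additionally reports the sample standard deviation, which the function above does not.

```python
import numpy as np

# Metric order taken from the indices used in print_results above
METRIC_NAMES = ['Balanced Accuracy', 'Accuracy', 'F1_Score',
                'Sensitivity (Recall)', 'Specificity', 'Precision (PPV)',
                'TP', 'TN', 'FP', 'FN', 'NPV', 'LR+', 'LR-']


def summarize_metrics(metric_lists):
    """Return {metric name: (mean, sample std)} over CV partitions.

    metric_lists: one 13-item metric list per CV partition.
    """
    scores = np.asarray(metric_lists, dtype=float)  # shape (partitions, 13)
    means = scores.mean(axis=0)
    stds = scores.std(axis=0, ddof=1)
    return {name: (float(m), float(s))
            for name, m, s in zip(METRIC_NAMES, means, stds)}


# Synthetic metric lists for two partitions (made-up values)
fold_a = [0.5] * 6 + [4, 6, 4, 2] + [0.7, 1.8, 0.6]
fold_b = [0.7] * 6 + [5, 7, 3, 1] + [0.8, 2.0, 0.4]
summary = summarize_metrics([fold_a, fold_b])

print(summary['Balanced Accuracy'])
print(summary['TP'])
```

Casting to `float` before averaging means count metrics such as TP are reported as fractional averages rather than whole numbers.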

## Extract list of increasing false positive rates and true positive rates for constructing ROC

In [7]:
if target_data_list: # User specified a subset of the analyzed datasets above
    # Build a new list instead of removing items from the list being iterated over, which can skip entries
    datasets = [each for each in datasets if each in target_data_list]
print("Vizualized Datasets: " + str(datasets))
for each in datasets: 
    print("---------------------------------------")
    print(each)
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    for algorithm in algorithms: #loop through algorithms
        print(algorithm)
        print_results(algorithm, full_path)

Vizualized Datasets: ['hcc-data_example', 'hcc-data_example_custom']
---------------------------------------
hcc-data_example
---------------------------------------
Decision Tree
{'Balanced Accuracy': 0.6214935064935065, 'Accuracy': 0.618014705882353, 'F1_Score': 0.5451887440045334, 'Sensitivity (Recall)': 0.6357142857142857, 'Specificity': 0.6072727272727273, 'Precision (PPV)': 0.5115692640692641, 'TP': 4, 'TN': 6, 'FP': 4, 'FN': 2, 'NPV': 0.7425152625152625, 'LR+': 1.8635487528344672, 'LR-': 0.6204034391534392, 'ROC_AUC': 0.648452380952381, 'PRC_AUC': 0.6141227277624337, 'PRC_APS': 0.512492967272379}
Logistic Regression
{'Balanced Accuracy': 0.6753896103896104, 'Accuracy': 0.6838235294117647, 'F1_Score': 0.6071950271950272, 'Sensitivity (Recall)': 0.6571428571428571, 'Specificity': 0.6936363636363636, 'Precision (PPV)': 0.5839357864357864, 'TP': 4, 'TN': 7, 'FP': 3, 'FN': 2, 'NPV': 0.7821356421356421, 'LR+': 2.5734126984126986, 'LR-': 0.4938586545729403, 'ROC_AUC': 0.768463203463203

In [8]:
if target_data_list: # User specified a subset of the analyzed datasets above
    # Build a new list instead of removing items from the list being iterated over, which can skip entries
    datasets = [each for each in datasets if each in target_data_list]
print("Vizualized Datasets: " + str(datasets))

for each in datasets: 
    print("---------------------------------------")
    print(each)
    print("---------------------------------------")
    full_path = experiment_path+ '/' + each
    for algorithm in algorithms: #loop through algorithms
        print(algorithm)
        # Define evaluation stats variable lists
        tprs = [] # true positive rates
        mean_fpr = np.linspace(0, 1, 100) # used to plot all CVs in single ROC plot
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            # Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            with open(result_file, 'rb') as file:
                results = pickle.load(file)
            
            #Separate pickled results
            fpr = results[1]
            tpr = results[2]

            tprs.append(interp(mean_fpr, fpr, tpr))
            tprs[-1][0] = 0.0

        results = {'tprs': np.mean(tprs, axis=0)}
        
        print(results)
        #print('fprs: '+str(mean_fpr))

Vizualized Datasets: ['hcc-data_example', 'hcc-data_example_custom']
---------------------------------------
hcc-data_example
---------------------------------------
Decision Tree
{'tprs': array([0.        , 0.03661215, 0.05655764, 0.07650313, 0.09644861,
       0.1163941 , 0.13633959, 0.15628507, 0.17623056, 0.19617605,
       0.23223505, 0.24664903, 0.26106301, 0.27547699, 0.28989097,
       0.30430495, 0.31871894, 0.33313292, 0.3475469 , 0.36196088,
       0.37643498, 0.39114959, 0.4058642 , 0.4205788 , 0.43529341,
       0.45000802, 0.46472262, 0.4961039 , 0.50896665, 0.52182941,
       0.53385041, 0.54390733, 0.55396425, 0.56402116, 0.57407808,
       0.584135  , 0.59419192, 0.60424884, 0.61430576, 0.62436267,
       0.66685506, 0.67466731, 0.68247956, 0.69029181, 0.69810406,
       0.72258297, 0.72885201, 0.73512105, 0.74139009, 0.74765913,
       0.75236492, 0.75550746, 0.75864999, 0.76179253, 0.76493506,
       0.7680776 , 0.77122014, 0.77436267, 0.77750521, 0.78064775,
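
The averaging step in the cell above can be isolated as follows. This is a minimal sketch on synthetic curves, using `numpy.interp` to resample each partition's ROC curve onto a shared grid of false positive rates before averaging. The function name `mean_roc` and the two made-up curves are hypothetical.

```python
import numpy as np


def mean_roc(fold_curves, n_points=100):
    """Average per-partition ROC curves on a common FPR grid.

    fold_curves: iterable of (fpr, tpr) array pairs, with fpr increasing.
    """
    mean_fpr = np.linspace(0, 1, n_points)
    tprs = []
    for fpr, tpr in fold_curves:
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # resample onto the grid
        interp_tpr[0] = 0.0  # force the curve to start at the origin
        tprs.append(interp_tpr)
    return mean_fpr, np.mean(tprs, axis=0)


# Two synthetic partitions (made-up values)
folds = [
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0])),
    (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.5, 1.0])),
]
mean_fpr, mean_tpr = mean_roc(folds, n_points=5)

print(mean_fpr)
print(mean_tpr)
```

Resampling is needed because each partition's curve has its own set of thresholds, so the raw `fpr` arrays generally differ in length and cannot be averaged element by element.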
       

## Extract list of increasing precision and recall values for constructing PRC

In [9]:
if target_data_list: # User specified a subset of the analyzed datasets above
    # Build a new list instead of removing items from the list being iterated over, which can skip entries
    datasets = [each for each in datasets if each in target_data_list]
print("Vizualized Datasets: " + str(datasets))

for each in datasets: 
    print("---------------------------------------")
    print(each)
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    for algorithm in algorithms: #loop through algorithms
        print(algorithm)
        # Define evaluation stats variable lists
        precs = [] # precision values interpolated onto mean_recall
        mean_recall = np.linspace(0, 1, 100) # used to plot all CVs in single PRC plot
        
        for cv_count in range(0, cv_partitions): #loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            with open(result_file, 'rb') as file:
                results = pickle.load(file)
            
            #Separate pickled results
            prec = results[4]
            recall = results[5]

            precs.append(interp(mean_recall, recall, prec))

        results = {'precs': np.mean(precs, axis=0)}

        print(results)
        #print('recall: '+str(mean_recall))

Vizualized Datasets: ['hcc-data_example', 'hcc-data_example_custom']
---------------------------------------
hcc-data_example
---------------------------------------
Decision Tree
{'precs': array([0.9       , 0.88996939, 0.87993878, 0.86990817, 0.85987756,
       0.84984695, 0.83981635, 0.82978574, 0.81975513, 0.80972452,
       0.79969391, 0.7896633 , 0.77963269, 0.76960208, 0.75957147,
       0.74954086, 0.73951025, 0.73270836, 0.72913518, 0.72556201,
       0.72198883, 0.71841565, 0.71484247, 0.71126929, 0.70769612,
       0.70412294, 0.70054976, 0.69697658, 0.69340341, 0.69100373,
       0.68907345, 0.68714317, 0.68521289, 0.66661595, 0.66367557,
       0.66073519, 0.65779481, 0.65485443, 0.65191405, 0.64897367,
       0.6460333 , 0.64309292, 0.64015254, 0.63721216, 0.63427178,
       0.6313314 , 0.62839102, 0.62545064, 0.62251026, 0.61956988,
       0.58317155, 0.5809343 , 0.57869706, 0.57645982, 0.57422257,
       0.57198533, 0.56974809, 0.56650074, 0.56190659, 0.55731245,
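
`numpy.interp` expects its x-coordinates in increasing order. Whether the pickled `recall` arrays are already stored that way is not stated in this notebook, so the sketch below sorts defensively before interpolating. The function name `interp_precision` and the sample values are hypothetical.

```python
import numpy as np


def interp_precision(mean_recall, recall, prec):
    """Interpolate precision onto mean_recall, sorting by recall first."""
    recall = np.asarray(recall, dtype=float)
    prec = np.asarray(prec, dtype=float)
    order = np.argsort(recall, kind="stable")  # ascending recall
    return np.interp(mean_recall, recall[order], prec[order])


# Synthetic curve given in descending recall order (made-up values)
mean_recall = np.linspace(0, 1, 5)
recall_desc = [1.0, 0.5, 0.0]
prec_desc = [0.5, 0.75, 1.0]
curve = interp_precision(mean_recall, recall_desc, prec_desc)

print(curve)
```

If the arrays are already ascending, the sort leaves them unchanged, so the extra step is harmless in that case.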
      

## Extract Average Model Feature Importance Estimates (Over CVs)

In [10]:
if target_data_list: # User specified a subset of the analyzed datasets above
    # Build a new list instead of removing items from the list being iterated over, which can skip entries
    datasets = [each for each in datasets if each in target_data_list]
print("Vizualized Datasets: " + str(datasets))

for each in datasets: 
    print("---------------------------------------")
    print(each)
    print("---------------------------------------")
    full_path = experiment_path + '/' + each
    original_headers = pd.read_csv(full_path + "/exploratory/OriginalFeatureNames.csv", sep=',').columns.values.tolist() # Get Original Headers
    if instance_label is not None:
        original_headers.remove(instance_label)
    original_headers.remove(outcome_label)
    for algorithm in algorithms: #loop through algorithms
        print(algorithm)
        # Define evaluation stats variable lists
        FI_ave = [0] * len(original_headers)  # used to save average FI scores over all cvs. (all original features in dataset prior to feature selection included)
        
        for cv_count in range(0, cv_partitions): # loop through cv's
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            with open(result_file, 'rb') as file:
                results = pickle.load(file)
            
            #Separate pickled results
            fi = results[8]
            
            # Add this partition's feature importance scores to the running sums (not every feature is present in every CV partition)
            headers = pd.read_csv(full_path + '/CVDatasets/' + each + '_CV_' + str(cv_count) + '_Test.csv').columns.values.tolist()
            if instance_label is not None:
                headers.remove(instance_label)
            headers.remove(outcome_label)
            for j, feature in enumerate(original_headers):
                if feature in headers:  # Check if current feature from original dataset was in the partition
                    # Features may not be in their original order, so look up the feature's position in this partition
                    f_index = headers.index(feature)
                    FI_ave[j] += fi[f_index]
            
        #Turn FI sums into averages
        for i in range(0, len(FI_ave)):
            FI_ave[i] = FI_ave[i] / float(cv_partitions)

        # Pair each original feature name with its average FI score (both lists share the same order)
        fi_dict = dict(zip(original_headers, FI_ave))
                
        print(fi_dict)

Vizualized Datasets: ['hcc-data_example', 'hcc-data_example_custom']
---------------------------------------
hcc-data_example
---------------------------------------
Decision Tree
{'Gender': 0.0, 'Symptoms': 0.0, 'Alcohol': 0.0, 'Hepatitis B Surface Antigen': 0.0, 'Hepatitis B e Antigen': 0.0, 'Hepatitis B Core Antibody': 0.0, 'Hepatitis C Virus Antibody': 0.0, 'Cirrhosis': 0.0, 'Endemic Countries': 0.0, 'Smoking': 0.0, 'Diabetes': 0.0, 'Obesity': 0.0, 'Hemochromatosis': 0.0, 'Arterial Hypertension': 0.0, 'Chronic Renal Insufficiency': 0.0, 'Human Immunodeficiency Virus': 0.0, 'Nonalcoholic Steatohepatitis': 0.0, 'Esophageal Varices': 0.0, 'Splenomegaly': 0.0, 'Portal Hypertension': 0.0, 'Portal Vein Thrombosis': 0.0, 'Liver Metastasis': 0.010902936689549971, 'Radiological Hallmark': 0.0, 'Age at diagnosis': 0.0033535762483130984, 'Grams of Alcohol per day': 0.0, 'Packs of cigarets per year': 0.0, 'Performance Status*': 0.023189359267734563, 'Encephalopathy degree*': 0.0, 'Ascites degr
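
The alignment logic above (summing scores by feature name, then dividing by the number of partitions) can be sketched on its own with plain dictionaries. The function name `average_feature_importance` and all scores below are made up for illustration; a feature missing from a partition contributes 0 to its sum, as in the cell above.

```python
def average_feature_importance(original_features, fold_headers, fold_scores):
    """Average FI scores per feature over CV partitions.

    fold_headers[i] lists the features present in partition i, and
    fold_scores[i] holds their scores in the same order.
    """
    totals = {feature: 0.0 for feature in original_features}
    for headers, scores in zip(fold_headers, fold_scores):
        for feature, score in zip(headers, scores):
            if feature in totals:
                totals[feature] += score
    n_folds = len(fold_headers)
    return {feature: total / n_folds for feature, total in totals.items()}


# Synthetic example: each partition keeps a different subset of features
original = ['Gender', 'Smoking', 'Diabetes']
headers_by_fold = [['Smoking', 'Gender'], ['Diabetes', 'Smoking']]
scores_by_fold = [[0.2, 0.4], [0.1, 0.6]]
fi_average = average_feature_importance(original, headers_by_fold, scores_by_fold)

print(fi_average)
```

Keying the sums by feature name removes the need to track list positions, and the resulting dictionary keeps the original feature order.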

## Extract and Report Case (i.e. class 1) Prediction Probabilities For all instances in each Testing Dataset

In [11]:
if target_data_list: # User specified a subset of the analyzed datasets above
    # Build a new list instead of removing items from the list being iterated over, which can skip entries
    datasets = [each for each in datasets if each in target_data_list]
print("Vizualized Datasets: " + str(datasets))

for each in datasets: 
    print("---------------------------------------")
    print(each)
    print("---------------------------------------")

    full_path = experiment_path + '/' + each
    
    # Make folder in experiment folder/datafolder to store all prediction probabilities per algorithm/CV combination
    os.makedirs(full_path + '/model_evaluation/prediction_probas', exist_ok=True)
        
    original_headers = pd.read_csv(full_path + "/exploratory/OriginalFeatureNames.csv", sep=',').columns.values.tolist() #Get Original Headers
    for algorithm in algorithms: #loop through algorithms
        print(algorithm)

        for cv_count in range(0,cv_partitions): #loop through cv's
            print(cv_count)
            #Load pickled metric file for given algorithm and cv
            result_file = full_path + '/model_evaluation/pickled_metrics/' + abbrev[algorithm] + "_CV_" + str(cv_count) + "_metrics.pickle"
            with open(result_file, 'rb') as file:
                results = pickle.load(file)
            
            #Load associated testing dataset
            test_data = pd.read_csv(full_path + '/CVDatasets/'+each+'_CV_' + str(cv_count) + '_Test.csv')
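            id_columns = [outcome_label] if instance_label is None else [outcome_label, instance_label] # skip the ID column if the data has none
            probas_summary = test_data[id_columns].copy() # copy so the new column is not written to a view of test_data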

            #Separate pickled results
            probas_ = results[9]
            print(probas_[:,1])
            probas_summary['1_prob'] = probas_[:,1]
            file_name = full_path + '/model_evaluation/prediction_probas/' + algorithm + '_CV_'+str(cv_count) + '_case_probas.csv'
            probas_summary.to_csv(file_name, index=False)

Vizualized Datasets: ['hcc-data_example', 'hcc-data_example_custom']
---------------------------------------
hcc-data_example
---------------------------------------
Decision Tree
0
[0.29222864 0.67017359 0.29222864 0.67017359 0.67017359 0.29222864
 0.67017359 0.29222864 0.29222864 0.67017359 0.29222864 0.67017359
 0.67017359 0.67017359 0.67017359 0.29222864 0.67017359]
1
[0.42051756 0.         0.64596273 0.         0.64596273 0.12674095
 0.         0.         0.         0.         0.69088937 0.56086287
 0.64596273 0.42051756 0.64596273 0.64596273 0.64596273]
2
[0.23930636 0.63179427 0.63179427 0.63179427 0.23930636 0.63179427
 0.23930636 0.63179427 0.63179427 0.63179427 0.23930636 0.63179427
 0.23930636 0.63179427 0.63179427 0.63179427 0.23930636]
3
[0.71964956 0.22822492 0.5816092  0.22822492 0.22822492 0.71964956
 0.5816092  0.5816092  0.5816092  0.5816092  0.5816092  0.71964956
 0.71964956 0.22822492 0.71964956 0.22822492 0.5816092 ]
4
[0.19277108 0.19277108 0.19277108 0.19277108 0
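
Once the per-partition .csv files exist, they can be read back and combined. The sketch below follows the file naming used in the cell above (`<algorithm>_CV_<n>_case_probas.csv`) but runs on throwaway files. The column names `Class` and `InstanceID`, the function name `load_case_probas`, and the 0.5 decision threshold are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def load_case_probas(folder, algorithm, cv_partitions):
    """Stack the case-probability files of one algorithm over all partitions."""
    frames = []
    for cv_count in range(cv_partitions):
        file_name = os.path.join(
            folder, algorithm + '_CV_' + str(cv_count) + '_case_probas.csv')
        frame = pd.read_csv(file_name)
        frame['cv'] = cv_count  # remember which partition each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Write two small stand-in files, then read them back (made-up values)
with tempfile.TemporaryDirectory() as tmp:
    for cv_count, probs in enumerate([[0.29, 0.67], [0.42, 0.0]]):
        stand_in = pd.DataFrame({
            'Class': [0, 1],
            'InstanceID': [2 * cv_count, 2 * cv_count + 1],
            '1_prob': probs,
        })
        stand_in.to_csv(
            os.path.join(tmp, 'Decision Tree_CV_' + str(cv_count) + '_case_probas.csv'),
            index=False)
    combined = load_case_probas(tmp, 'Decision Tree', 2)

# Assumed threshold: call an instance a case when its probability is >= 0.5
combined['predicted'] = (combined['1_prob'] >= 0.5).astype(int)

print(combined)
```

Because each instance appears in exactly one testing partition, the combined table holds one row per instance, which makes it convenient for per-instance error review.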