# Applicability of Training an ML Model with TDA Using the Delaunay-Rips Complex vs. the Rips and Alpha Complexes
Author: Amish Mishra  
Date: November 1, 2022

## Notes
* We abbreviate "Delaunay-Rips" as DR.
* We refer to the pipeline that uses DR, Rips, or Alpha to train and validate the corresponding ML model as the "DR method", "Rips method", or "Alpha method", respectively.
* TODO: prefix the folder names with 1, 2, 3, ... to show the order in which they are used.

## Import the necessary libraries

In [4]:
import pandas
from scipy.stats import median_test
from persistence_stats import generate_training_validation_pers_stats
from train_ml_classifiers import train_ml_classifiers
from validate_ml_classifiers import validate_ml_classifiers

## 1. Generate Persistence Statistics from Persistence Diagrams using DR, Rips, and Alpha

In [3]:
types = ['Training', 'Validation']
methods = ['rips', 'alpha', 'del_rips']
for t in types:
    for m in methods:
        generate_training_validation_pers_stats(type_of_data=t, method=m)

---- Using rips ----
CGMH_preprocessed_data/Training/1.csv
Loading CGMH_preprocessed_data/Training/1.csv


KeyboardInterrupt: 
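
The internals of `generate_training_validation_pers_stats` are not shown in this notebook. As a rough illustration of what "persistence statistics" can mean, the sketch below summarizes a single persistence diagram by the mean, standard deviation, median, and IQR of its lifetimes and midpoints. The function name `persistence_stats`, the choice of statistics, and the toy diagram are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def persistence_stats(diagram):
    """Summarize an (n, 2) array of (birth, death) pairs (hypothetical helper)."""
    diagram = np.asarray(diagram, dtype=float)
    # Drop points with infinite death so the statistics stay finite.
    finite = diagram[np.isfinite(diagram[:, 1])]
    lifetimes = finite[:, 1] - finite[:, 0]
    midpoints = (finite[:, 0] + finite[:, 1]) / 2.0

    stats = {}
    for name, values in (("lifetime", lifetimes), ("midpoint", midpoints)):
        q75, q25 = np.percentile(values, [75, 25])
        stats[f"{name}_mean"] = float(np.mean(values))
        stats[f"{name}_std"] = float(np.std(values))
        stats[f"{name}_median"] = float(np.median(values))
        stats[f"{name}_iqr"] = float(q75 - q25)
    return stats

# Toy diagram (made up): three finite points and one essential class.
toy_diagram = [(0.0, 1.0), (0.2, 0.6), (0.5, 2.5), (0.0, np.inf)]
toy_stats = persistence_stats(toy_diagram)
print(toy_stats)
```

A fixed-length vector like this can be computed per homology dimension and concatenated, which gives a classifier the same number of features for every sample regardless of how many points each diagram has. The sketch assumes at least one finite point per diagram.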

## 2. Train ML models (SVM) based on Persistence Statistics

In [2]:
func_arr = ['rips', 'alpha', 'del_rips']
for func in func_arr:
    train_ml_classifiers(func)

Training classifer based on rips
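
The body of `train_ml_classifiers` is likewise not shown. The following is a minimal sketch of fitting an SVM on a feature matrix with scikit-learn, assuming each row is one sample's vector of persistence statistics and each label is binary. The synthetic data, the linear kernel, and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for persistence-statistic features: 20 samples per class,
# 4 features each, with well-separated class means (made-up data).
X_neg = rng.normal(loc=0.0, scale=0.1, size=(20, 4))
X_pos = rng.normal(loc=5.0, scale=0.1, size=(20, 4))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 20 + [1] * 20)

# Fit a support vector classifier; the kernel choice here is an assumption.
clf = SVC(kernel="linear")
clf.fit(X, y)

train_acc = clf.score(X, y)
print(f"training accuracy: {train_acc:.2f}")
```

Real persistence statistics are unlikely to separate this cleanly, so in practice the features would usually be standardized and the kernel and regularization strength tuned on held-out data.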


## 3. Validate ML models

In [2]:
func_arr = ['rips', 'alpha', 'del_rips']
for func in func_arr:
    validate_ml_classifiers(func)

Validating rips svm...




KeyboardInterrupt: 
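
The tables in Section 4 use the row labels `se`, `sp`, `acc`, `pr`, `f1`, and `kappa`. Assuming these stand for sensitivity, specificity, accuracy, precision, F1 score, and Cohen's kappa, each can be derived from the four confusion matrix counts. The sketch below shows the standard formulas; the function name and the example counts are hypothetical, and `validate_ml_classifiers` may compute them differently.

```python
def confusion_matrix_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion matrix counts."""
    total = tp + fp + tn + fn
    se = tp / (tp + fn)              # sensitivity (recall)
    sp = tn / (tn + fp)              # specificity
    acc = (tp + tn) / total          # accuracy
    pr = tp / (tp + fp)              # precision
    f1 = 2 * pr * se / (pr + se)     # harmonic mean of precision and recall

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_yes + p_no
    kappa = (acc - p_e) / (1 - p_e)

    return {"se": se, "sp": sp, "acc": acc, "pr": pr, "f1": f1, "kappa": kappa}

# Made-up counts for illustration.
metrics = confusion_matrix_metrics(tp=30, fp=20, tn=130, fn=20)
print(metrics)
```

The sketch omits guards against zero denominators, which would be needed when a class is never predicted or never present. The `auc` and `aps` rows require classifier scores rather than counts, so they are not covered here.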

## 4. Generate performance metrics

### Calculate the median and IQR for each method's performance metrics table

In [5]:
func_arr = ['rips', 'alpha', 'del_rips']
all_perf_stats_by_func = {'rips':0, 'alpha':0, 'del_rips':0}

for func in func_arr:
    print(f'========== {func} performance ==========')
    perf_metrics = pandas.read_pickle(
        f'performance_metrics_tables/perf_metrics_{func}_svm_classifier.pkl')
    summary_metrics = pandas.DataFrame({'median':[], 'iqr':[]})
    # print(perf_metrics)
    summary_metrics['median'] = perf_metrics.median(axis=1)
    quantile_75 = perf_metrics.quantile(0.75, axis=1)
    quantile_25 = perf_metrics.quantile(0.25, axis=1)
    summary_metrics['iqr'] = quantile_75 - quantile_25
    # The first four rows hold the confusion matrix elements;
    # their median and IQR are not relevant, so skip them.
    relevant_summary_metrics = summary_metrics.iloc[4:]
    all_perf_stats_by_func[func] = perf_metrics.iloc[4:]
    print(relevant_summary_metrics)

========== rips performance ==========
         median       iqr
se     0.515625  0.130270
sp     0.844482  0.204973
acc    0.753103  0.149609
pr     0.327684  0.293320
f1     0.382445  0.161490
auc    0.694226  0.135264
aps    0.449661  0.204775
kappa  0.232994  0.215981
========== alpha performance ==========
         median       iqr
se     0.451613  0.144379
sp     0.863142  0.206726
acc    0.778846  0.135322
pr     0.343558  0.282201
f1     0.353982  0.119013
auc    0.681994  0.111956
aps    0.343546  0.203456
kappa  0.256327  0.167096
========== del_rips performance ==========
         median       iqr
se     0.470588  0.118972
sp     0.829421  0.202676
acc    0.748619  0.145028
pr     0.287234  0.268333
f1     0.321127  0.119739
auc    0.663149  0.128349
aps    0.332181  0.190546
kappa  0.206814  0.180448
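
To make the row-wise summary concrete, here is a small self-contained example of the same median and IQR computation on a toy table. The values and the column names `run_1` to `run_4` are made up; what the columns of the real pickled tables represent is not stated in this notebook.

```python
import pandas as pd

# Toy metrics table: one row per metric, one column per run (hypothetical).
toy = pd.DataFrame(
    {
        "run_1": [0.50, 0.80],
        "run_2": [0.60, 0.90],
        "run_3": [0.40, 0.70],
        "run_4": [0.70, 1.00],
    },
    index=["se", "sp"],
)

# axis=1 aggregates across columns, giving one value per metric row.
summary = pd.DataFrame(
    {
        "median": toy.median(axis=1),
        "iqr": toy.quantile(0.75, axis=1) - toy.quantile(0.25, axis=1),
    }
)
print(summary)
```

`DataFrame.quantile` interpolates linearly between data points by default, so with few columns the IQR depends on that interpolation choice.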


### Perform a row-by-row median test pairwise between DR method, Rips method, and Alpha method

In [6]:
# Run Mood's median test on each metric (row) for every pair of methods
p_val_df = pandas.DataFrame({
    'p-value for rips vs alpha': [],
    'p-value for rips vs del-rips': [],
    'p-value for alpha vs del-rips': [],
})
for idx in all_perf_stats_by_func['rips'].index:
    rips_row = all_perf_stats_by_func['rips'].loc[[idx]].values[0]
    alpha_row = all_perf_stats_by_func['alpha'].loc[[idx]].values[0]
    del_rips_row = all_perf_stats_by_func['del_rips'].loc[[idx]].values[0]
    # median_test returns (statistic, p-value, grand median, contingency table)
    _, p_r_a, _, _ = median_test(rips_row, alpha_row)
    _, p_r_d, _, _ = median_test(rips_row, del_rips_row)
    _, p_a_d, _, _ = median_test(alpha_row, del_rips_row)
    p_val_df.loc[len(p_val_df.index)] = [p_r_a, p_r_d, p_a_d]
# Label the rows with the metric names
p_val_df.index = all_perf_stats_by_func['rips'].index

In [8]:
print(p_val_df[['p-value for rips vs del-rips', 'p-value for alpha vs del-rips']])

       p-value for rips vs del-rips  p-value for alpha vs del-rips
se                         0.586214                       0.586214
sp                         1.000000                       0.586214
acc                        0.586214                       0.586214
pr                         0.276303                       0.586214
f1                         0.276303                       1.000000
auc                        1.000000                       1.000000
aps                        0.276303                       1.000000
kappa                      0.586214                       0.276303
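
For readers unfamiliar with `scipy.stats.median_test`, the sketch below runs it on two made-up samples and unpacks all four return values. The data are chosen so that the two groups clearly differ; they are not taken from this study.

```python
from scipy.stats import median_test

# Two toy samples with no overlap (made-up data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group_b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

# Returns the test statistic, the p-value, the grand median of the pooled
# data, and a 2 x k table of counts above / not above the grand median.
stat, p_value, grand_median, table = median_test(group_a, group_b)

print(f"grand median: {grand_median}")
print(f"contingency table:\n{table}")
print(f"p-value: {p_value:.4f}")
```

Because the test depends only on how many values in each sample fall above or below the pooled median, the p-value can take only a limited set of distinct values for a given pair of sample sizes. This is consistent with the same few p-values recurring in the table above.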
