### Experiments

This notebook demonstrates the experiments carried out for "Evaluating Explainability of Graph Neural Networks for Disease Subnetwork Detection".

The code here runs multiple iterations of the process: training a model -> running the explainer -> generating explanations -> evaluating explainability metrics.

Four explainability metrics are used: validity+, validity-, sparsity and fidelity. The average of each metric in each iteration is reported and saved to file at the end of this notebook.

### Setup

First, we set up the necessary imports and the variables that will be used later:

- Number of times to run the entire workflow
```
big_loop_iterations = 10  # recommended
```
- Number of times to run the explainer on a trained model before evaluating it (the mean explanation is used for the evaluation)
```
explainer_runs = 10  # recommended
```
- Array of thresholds used to transform the soft node mask into a hard node mask. Each value is the percentage of nodes taken as important: at threshold 30, for example, the nodes with the top 30% of mask values are selected as important. Validity+ and validity- are calculated at every threshold in the array. Fidelity is also calculated at every threshold when hard masks are used (see use_softmask_fidelity).
```
thresholds = [30, 50]
```
- The number of samples used when evaluating fidelity (a single integer, as passed to the fidelity evaluation in the Run cell).
```
samples = 10
```
- Whether or not to use a soft mask to evaluate fidelity. If not using a soft mask, hard masks at each of the thresholds are used to evaluate fidelity.
```
use_softmask_fidelity = False
```
- The filepath (any signifier you choose) used to name the directory and files where the results are saved.
```
filepath = "23_May_KIRC_data"
```
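To make the thresholding step concrete, here is a minimal sketch of how a soft node mask could be turned into a hard mask that keeps the top percentage of nodes. The function name `top_percent_hard_mask` and the toy importance values are hypothetical, and this is not necessarily how GNNSubNet implements the step internally; ties between equal mask values are broken arbitrarily.

```python
import numpy as np

def top_percent_hard_mask(soft_mask, threshold):
    """Return a 0/1 mask that keeps the top `threshold` percent of nodes (hypothetical helper)."""
    soft_mask = np.asarray(soft_mask, dtype=float)
    # Number of nodes to keep; at least one so the mask is never empty
    k = max(1, int(round(len(soft_mask) * threshold / 100)))
    # Indices of the k largest importance values
    top_idx = np.argsort(soft_mask)[::-1][:k]
    hard_mask = np.zeros(len(soft_mask), dtype=int)
    hard_mask[top_idx] = 1
    return hard_mask

# Toy soft mask over 10 nodes (illustrative values only)
soft = np.array([0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.5, 0.05])
print(top_percent_hard_mask(soft, 30))  # [1 0 0 1 0 1 0 0 0 0]
```

With `thresholds = [30, 50]`, this selection would be repeated once per threshold, giving one hard mask per value.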

In [None]:
# See description above for what these variables represent
big_loop_iterations = 1
explainer_runs = 1
thresholds = [30, 50]
# thresholds = [90, 80, 70, 60, 50, 40, 30, 20, 10]
samples = 10
use_softmask_fidelity = False

# Label used to name the results directory and files
filepath = "6 Jun"

# Set up directory for result files
import os
results_dir = f'./results_{filepath}'  # avoid shadowing the built-in dir()
os.makedirs(results_dir, exist_ok=True)

To start, we choose a dataset to work with (either KIRC or synthetic data) and load this dataset.

In [None]:
from GNNSubNet import GNNSubNet as gnn
from GNNSubNet import explainability_evaluator as eval
import pandas as pd
import numpy as np

# # Kidney data set  ------------------------- #
# loc   = "GNNSubNet/datasets/kirc/"
# ppi   = f'{loc}/KIDNEY_RANDOM_PPI.txt'
# feats = [f'{loc}/KIDNEY_RANDOM_Methy_FEATURES.txt', f'{loc}/KIDNEY_RANDOM_mRNA_FEATURES.txt']
# targ  = f'{loc}/KIDNEY_RANDOM_TARGET.txt'

# # Synthetic data set  ------------------------- #
loc   = "GNNSubNet/datasets/synthetic/"
ppi   = f'{loc}/NETWORK_synthetic.txt'
feats = [f'{loc}/FEATURES_synthetic.txt']
targ  = f'{loc}/TARGET_synthetic.txt'

# Read in the selected data set (synthetic by default; see the commented-out KIRC paths above)
g = gnn.GNNSubNet(loc, ppi, feats, targ, normalize=False)

# Get some general information about the data dimensions
g.summary()

### Run

Run the workflow of train, explain and evaluate for the given number of big_loop_iterations.

NB: if running on KIRC data for 10 iterations with all metrics, this cell can take **5 to 7 hours** to fully run!

NB: this cell only saves the raw results for each test graph in each iteration. The cell below gathers the averages of all metrics into a single results table and saves the results table. **Ensure that both cells are run together to obtain the full results.**

Read the description further below to see the format in which these results are saved.
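As a rough illustration of how the raw per-graph files relate to the averaged scores, the sketch below writes a small array in the same way as the run cell (`np.savetxt` with a comma delimiter) and reads it back to compute a mean. The temporary directory and the three scores are made up for the example; only the `{i}_sparsities.csv` naming pattern is taken from the notebook.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw sparsity scores for three test graphs in iteration 0
    raw_scores = np.array([0.05, 0.10, 0.12])
    path = os.path.join(tmp, "0_sparsities.csv")
    np.savetxt(path, raw_scores, delimiter=",", fmt="%s")

    # Read the raw file back and reduce it to the per-iteration mean
    loaded = np.loadtxt(path, delimiter=",")
    mean_sparsity = float(np.mean(loaded))

print(loaded.shape, mean_sparsity)
```

The same pattern should apply to the raw fidelity files, assuming they also hold one value per test graph.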

In [None]:
model_info = []
fidelity = []
validity_plus = []
validity_minus = []
validity_plus_matrix = []
validity_minus_matrix = []
sparsity = []

for i in range(big_loop_iterations):
    g = gnn.GNNSubNet(loc, ppi, feats, targ, normalize=False)
    g.train()

    # Check the performance of the classifier
    accuracy = g.accuracy

    # Run the explainer the desired number of times
    g.explain(explainer_runs)

    # Save node mask
    np.savetxt(f"results_{filepath}/{i}_node_mask.csv", g.node_mask, delimiter=",", fmt= "% s")

    # Initialise evaluator
    ev = eval.explainability_evaluator(g)

    if use_softmask_fidelity:
        # Fidelity with softmask
        f = ev.evaluate_RDT_fidelity(use_softmask=True, samples = samples)

        # Save raw results in case needed for further analysis
        filename = f"results_{filepath}/{i}_fidelities_{samples}_samples_softmask.csv"
        np.savetxt(filename, f, delimiter=',', fmt ='% s')

        # Save mean fidelity to list to create processed table
        fidelity.append( [i, accuracy, samples, 0, np.mean(f)])
    else:
        # Fidelity with hardmasks of varying thresholds
        for t in thresholds:
            f = ev.evaluate_RDT_fidelity(use_softmask=False, samples = samples, threshold=t)

            # Save raw results in case needed for further analysis
            filename = f"results_{filepath}/{i}_fidelities_{samples}_samples_top_{t}_hardmask.csv"
            np.savetxt(filename, f, delimiter=',', fmt ='% s')

            # Save mean fidelity to list to create processed table
            fidelity.append( [i, accuracy, samples, t, np.mean(f)])

    # Sparsity
    sparsities = ev.evaluate_sparsity()
    # Save raw results in case needed for further analysis
    filename = f"results_{filepath}/{i}_sparsities.csv"
    np.savetxt(filename, sparsities, delimiter=',', fmt ='% s')
    # Save mean sparsity to list to create processed table
    sparsity.append([i, accuracy, np.mean(sparsities)])

    # Validity with hardmasks of varying thresholds
    for t in thresholds:
        v_plus, v_minus, mat_plus, mat_minus = ev.evaluate_validity(threshold=t, confusion_matrix=True)
        validity_plus.append([i, accuracy, t, v_plus])
        validity_minus.append([i, accuracy, t, v_minus])
        validity_plus_matrix.append([i, accuracy, t, mat_plus[0,0], mat_plus[0,1], mat_plus[1,0], mat_plus[1,1]])
        validity_minus_matrix.append([i, accuracy, t, mat_minus[0,0], mat_minus[0,1], mat_minus[1,0], mat_minus[1,1]])


### Results
Finally, we can view the processed tables and save them to file.

One CSV file is created for each of the four metrics, plus two CSV files holding the confusion matrix entries for validity+ and validity-. You can see the columns that each of them contains in the code snippet below.

Additionally, a single CSV file named "results_{filepath}.csv" contains the combined results of all four metrics.

All files are placed in a directory named "results_{filepath}".
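The combined table in this notebook attaches the validity columns to the fidelity table by row position. As a hedged alternative, the sketch below joins the tables on their shared key columns instead, which does not depend on row order. The small data frames are toy stand-ins that reuse the column names from this notebook; the join keys are an assumption about which columns identify a row.

```python
import pandas as pd

fidelity_table = pd.DataFrame(
    [[0, 100.0, 10, 30, 1.0], [0, 100.0, 10, 50, 1.0]],
    columns=["Iteration", "Model accuracy", "Samples", "Threshold", "Fidelity score"],
)
# Deliberately listed in a different row order than the fidelity table
validity_table_plus = pd.DataFrame(
    [[0, 100.0, 50, 0.55], [0, 100.0, 30, 0.53]],
    columns=["Iteration", "Model accuracy", "Threshold", "Validity+ score"],
)
sparsity_table = pd.DataFrame(
    [[0, 100.0, 0.087605]],
    columns=["Iteration", "Model accuracy", "Sparsity score"],
)

# Join on the columns that identify a row rather than relying on row position
keys = ["Iteration", "Threshold"]
combined = fidelity_table.merge(
    validity_table_plus[keys + ["Validity+ score"]], on=keys, how="left"
)
# Sparsity has one row per iteration, so it is repeated for every threshold
combined = combined.merge(
    sparsity_table[["Iteration", "Sparsity score"]], on="Iteration", how="left"
)
print(combined)
```

A key-based join like this would also expose mismatches (as missing values) when the fidelity and validity tables do not cover the same thresholds, for example when a soft mask is used for fidelity.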

**NB: See visualisation_and_analysis.ipynb for the final results and plots.**

In [None]:
import pandas as pd
fidelity_table = pd.DataFrame(fidelity, columns=['Iteration','Model accuracy','Samples','Threshold','Fidelity score'])
fidelity_table.to_csv(f"results_{filepath}/fidelity_table.csv", float_format="%.3f", index=False)

sparsity_table = pd.DataFrame(sparsity, columns=["Iteration", "Model accuracy", "Sparsity score"])
print(sparsity_table)
sparsity_table.to_csv(f"results_{filepath}/sparsity_table.csv", float_format="%.3f", index=False)

validity_table_plus = pd.DataFrame(validity_plus, columns=["Iteration", "Model accuracy", "Threshold", "Validity+ score"])
validity_table_plus.to_csv(f"results_{filepath}/validity_table_plus.csv", float_format="%.3f", index=False)

validity_table_minus = pd.DataFrame(validity_minus, columns=["Iteration", "Model accuracy", "Threshold", "Validity- score"])
validity_table_minus.to_csv(f"results_{filepath}/validity_table_minus.csv", float_format="%.3f", index=False)

validity_plus_matrix = pd.DataFrame(validity_plus_matrix, columns=["Iteration", "Model accuracy", "Threshold", "00", "01", "10", "11"])
validity_plus_matrix.to_csv(f"results_{filepath}/validity_matrix_plus.csv", index=False)

validity_minus_matrix = pd.DataFrame(validity_minus_matrix, columns=["Iteration", "Model accuracy", "Threshold", "00", "01", "10", "11"])
validity_minus_matrix.to_csv(f"results_{filepath}/validity_matrix_minus.csv", index=False)

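# Construct a single table holding all the metrics.
# Copy first so that adding columns does not modify fidelity_table itself.
final_table = fidelity_table.copy()
# NB: the validity columns are attached by row position. The rows line up with the
# fidelity rows only when use_softmask_fidelity = False (one row per iteration and threshold).
final_table["Validity+ score"] = validity_table_plus["Validity+ score"]
final_table["Validity- score"] = validity_table_minus["Validity- score"]
final_table = pd.merge(final_table, sparsity_table, on=["Iteration", "Model accuracy"], how='outer')
final_table.to_csv(f"results_{filepath}/results_{filepath}.csv", index=False)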

final_table

   Iteration  Model accuracy  Sparsity score
0          0           100.0        0.087605


|   | Iteration | Model accuracy | Samples | Threshold | Fidelity score | Validity+ score | Validity- score | Sparsity score |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 100.0 | 10 | 30 | 1.0 | 0.53 | 1.0 | 0.087605 |
| 1 | 0 | 100.0 | 10 | 50 | 1.0 | 0.55 | 1.0 | 0.087605 |
