#### **Computing the Average Weighted Path (AWP) metric.**

To objectively quantify the non-random localization of specific subjects (such as participants with a diagnosis), we compute the average weighted path (AWP) metric and compare it against a null distribution, built by recomputing the metric over X iterations with randomly selected subjects. The p-value for a given condition is then obtained by comparing its observed value to this null distribution.
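
The comparison against a null distribution can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used by `AverageWeightedPath`: the toy metric (`mean_pairwise_distance`), the one-sided direction of the test (smaller values counted as more extreme), and the `+1` correction in the p-value are all choices made for this example.

```python
import random


def mean_pairwise_distance(positions, members):
    """Toy stand-in for the AWP: mean absolute distance between all member pairs."""
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return sum(abs(positions[a] - positions[b]) for a, b in pairs) / len(pairs)


def permutation_pvalue(positions, members, iterations=100, seed=0):
    """Compare the observed metric to a null built from random subject subsets."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions, members)
    subjects = list(positions)
    # Null distribution: same group size, subjects drawn at random.
    null = [
        mean_pairwise_distance(positions, rng.sample(subjects, len(members)))
        for _ in range(iterations)
    ]
    # ASSUMPTION: one-sided test, smaller distance means tighter localization.
    n_extreme = sum(value <= observed for value in null)
    # The +1 terms keep the estimate strictly above zero.
    return observed, (n_extreme + 1) / (iterations + 1)


# Hypothetical toy data: 50 subjects on a line, the "diagnosed" group is the first 5.
positions = {f"sub-{i}": float(i) for i in range(50)}
members = [f"sub-{i}" for i in range(5)]
observed, pval = permutation_pvalue(positions, members)
print(f"observed = {observed}, p = {pval:.4f}")
```

With only 100 iterations, the smallest p-value this estimator can return is 1/101, which is one reason the full analysis uses many more iterations.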

**This is a *REALLY* long-running process (the results in Gagnon et al. were computed on high-performance computing clusters). The CLI commands are detailed below, but they should not be run on a local computer, as they can take up to 7 days using all available resources. If you want to try them, reduce the number of iterations to a maximum of 100.**

The AWP will be computed on the pooled data from all studies, and on each cohort individually, for the following variables:

1. Anxiety Disorder (AD)
1. Attention Deficit-Hyperactivity Disorder (ADHD)
1. Conduct Disorder (CD)
1. Depressive Disorder (DD)
1. Obsessive-Compulsive Disorder (OCD)
1. Oppositional Defiant Disorder (ODD)
1. Psychopathology index (at least one of the above diagnoses)

In [1]:
# Imports
import os

from scipy.stats import false_discovery_control

from neurostatx.io.utils import load_df_in_any_format

In [2]:
# Setting up relevant paths.
repository_path = "/Users/anthonygagnon/code/Article-s-Code/" # CHANGE THIS
abcd_base_path = "/Volumes/T7/CCPM/ABCD/Release_5.1/abcd-data-release-5.1/" # CHANGE THIS
geste_base_dir = "/Volumes/T7/CCPM/GESTE/" # CHANGE THIS
banda_dir = '/Volumes/T7/CCPM/BANDA/BANDARelease1.1/' # CHANGE THIS
output_folder = "/Volumes/T7/CCPM/RESULTS_JUNE_24/" # CHANGE THIS
data_dir = f"{output_folder}/fuzzyclustering/"
output_dir = f"{output_folder}/awp/"

# Create output directory if it does not exist.
os.makedirs(output_dir, exist_ok=True)

#### **Running the AWP computation using all studies**

The following cell contains the CLI command that computes the AWP for all variables using data from the three cohorts. **As mentioned above, to test it, reduce the number of iterations to at most 100 and set the number of cores to use (using the --processes flag).**
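
For a quick local trial, the command can be assembled programmatically with a reduced iteration count. The helper below is a hypothetical convenience (`build_awp_command` is not part of neurostatx); it only reuses the flags that appear in the commands of this notebook, and the paths and the `AWP_TEST` folder name are placeholders.

```python
import os
import shlex


def build_awp_command(graph_path, out_folder, labels, iterations=100,
                      processes=None, cohort=None):
    """Assemble the AverageWeightedPath call as a list of arguments."""
    if processes is None:
        # Leave one core free for the rest of the system.
        processes = max(1, (os.cpu_count() or 2) - 1)
    args = ["AverageWeightedPath", "--in-graph", graph_path]
    if cohort is not None:
        args += ["--cohort", str(cohort)]
    args += ["--iterations", str(iterations), "--weight", "membership",
             "--method", "dijkstra", "--processes", str(processes),
             "--out-folder", out_folder]
    for label in labels:
        args += ["--label-name", label]
    # Remaining short flags copied as-is from the commands in this notebook.
    args += ["-v", "-s", "-f"]
    return args


# Placeholder paths; replace with data_dir / output_dir from the setup cell.
cmd = build_awp_command("fuzzyclustering/GraphNetwork.gml", "awp/AWP_TEST/",
                        ["ADHD", "PSYPATHO"], processes=4)
print(shlex.join(cmd))
```

The printed string can be pasted after a `!` in a notebook cell, or the list can be passed to `subprocess.run(cmd, check=True)`.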

In [None]:
# Running AWP computations using all studies.

!AverageWeightedPath --in-graph "{data_dir}/GraphNetwork.gml" --iterations 5000 --weight membership \
    --method dijkstra --processes 40 --out-folder "{output_dir}/AWP_GLOBAL/" --label-name AD --label-name ADHD \
    --label-name CD --label-name DD --label-name OCD --label-name ODD --label-name PSYPATHO -v -s -f

#### **Running the AWP computation using the ABCD cohort**

The following cell contains the CLI command that computes the AWP for all variables using data from the ABCD cohort. **As mentioned above, to test it, reduce the number of iterations to at most 100 and set the number of cores to use (using the --processes flag).**

In [None]:
# Running AWP computations using the ABCD study.

!AverageWeightedPath --in-graph "{data_dir}/GraphNetwork.gml" --cohort 1 --iterations 5000 --weight membership \
    --method dijkstra --processes 40 --out-folder "{output_dir}/AWP_ABCD/" --label-name AD --label-name ADHD \
    --label-name CD --label-name DD --label-name OCD --label-name ODD --label-name PSYPATHO -v -s -f

#### **Running the AWP computation using the BANDA cohort**

The following cell contains the CLI command that computes the AWP for all variables using data from the BANDA cohort. **As mentioned above, to test it, reduce the number of iterations to at most 100 and set the number of cores to use (using the --processes flag).**

In [None]:
# Running AWP computations using the BANDA study.

!AverageWeightedPath --in-graph "{data_dir}/GraphNetwork.gml" --cohort 2 --iterations 5000 --weight membership \
    --method dijkstra --processes 40 --out-folder "{output_dir}/AWP_BANDA/" --label-name AD --label-name ADHD \
    --label-name DD --label-name OCD --label-name ODD --label-name PSYPATHO -v -s -f

#### **Running the AWP computation using the GESTE cohort**

The following cell contains the CLI command that computes the AWP for all variables using data from the GESTE cohort. **As mentioned above, to test it, reduce the number of iterations to at most 100 and set the number of cores to use (using the --processes flag).**

In [None]:
# Running AWP computations using the GESTE study.

!AverageWeightedPath --in-graph "{data_dir}/GraphNetwork.gml" --cohort 3 --iterations 5000 --weight membership \
    --method dijkstra --processes 40 --out-folder "{output_dir}/AWP_GESTE/" --label-name ADHD \
    --label-name PSYPATHO -v -s -f

#### **Load back the results to apply FDR correction.**

Since we perform multiple statistical tests (one for each diagnosis), we correct the resulting p-values using the Benjamini-Hochberg [1] method.

[1] Benjamini, Yoav, and Yosef Hochberg. “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57.1 (1995): 289-300.
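
To make the correction concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It is meant to illustrate the procedure that `false_discovery_control(pvals, method='bh')` applies in the cells below, not to replace it; the function name `benjamini_hochberg` and the example p-values are made up for this illustration.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    # Indices that sort the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per hypothetical diagnosis.
example_pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(example_pvals)
print([round(p, 4) for p in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

Each raw p-value is scaled by `m / rank`, so the number of tests in the family matters: the same raw p-value is penalized more in the 7-diagnosis families than in the 2-diagnosis GESTE family.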

In [17]:
# For the global AWP computations.
# Load back the computed AWP values. Since they are in different files, we need to load them separately, append the p-values to a list, perform FDR correction, and append them to the original data frame. This will be done using 2 for loops (not the most efficient way, but it works).
diagnosis = ['AD', 'ADHD', 'CD', 'DD', 'OCD', 'ODD', 'PSYPATHO']

pvals = []
for dx in diagnosis:
    results = load_df_in_any_format(f'{output_dir}/AWP_GLOBAL/statistics_{dx}.xlsx')
    pvals.append(results.iloc[1, 1])

# Perform FDR correction.
pval_corrected = false_discovery_control(pvals, method='bh')

# Append the corrected p-values to the original data frame.
for i, dx in enumerate(diagnosis):
    results = load_df_in_any_format(f'{output_dir}/AWP_GLOBAL/statistics_{dx}.xlsx')
    results.columns = ['Index', 'Statistics']
    results.set_index('Index', inplace=True)
    results.loc['FDR-corrected pval'] = pval_corrected[i]
    results.to_excel(f'{output_dir}/AWP_GLOBAL/statistics_{dx}.xlsx', index=True,
                     header=True)

In [None]:
# For the ABCD study.
# Load back the computed AWP values. Since they are in different files, we need to load them separately, append the p-values to a list, perform FDR correction, and append them to the original data frame. This will be done using 2 for loops (not the most efficient way, but it works).
diagnosis = ['AD', 'ADHD', 'CD', 'DD', 'OCD', 'ODD', 'PSYPATHO']

pvals = []
for dx in diagnosis:
    results = load_df_in_any_format(f'{output_dir}/AWP_ABCD/statistics_{dx}.xlsx')
    pvals.append(results.iloc[1, 1])

# Perform FDR correction.
pval_corrected = false_discovery_control(pvals, method='bh')

# Append the corrected p-values to the original data frame.
for i, dx in enumerate(diagnosis):
    results = load_df_in_any_format(f'{output_dir}/AWP_ABCD/statistics_{dx}.xlsx')
    results.columns = ['Index', 'Statistics']
    results.set_index('Index', inplace=True)
    results.loc['FDR-corrected pval'] = pval_corrected[i]
    results.to_excel(f'{output_dir}/AWP_ABCD/statistics_{dx}.xlsx', index=True,
                     header=True)

In [18]:
# For the BANDA study.
# Load back the computed AWP values. Since they are in different files, we need to load them separately, append the p-values to a list, perform FDR correction, and append them to the original data frame. This will be done using 2 for loops (not the most efficient way, but it works).
diagnosis = ['AD', 'ADHD', 'DD', 'OCD', 'ODD', 'PSYPATHO']

pvals = []
for dx in diagnosis:
    results = load_df_in_any_format(f'{output_dir}/AWP_BANDA/statistics_{dx}.xlsx')
    pvals.append(results.iloc[1, 1])

# Perform FDR correction.
pval_corrected = false_discovery_control(pvals, method='bh')

# Append the corrected p-values to the original data frame.
for i, dx in enumerate(diagnosis):
    results = load_df_in_any_format(f'{output_dir}/AWP_BANDA/statistics_{dx}.xlsx')
    results.columns = ['Index', 'Statistics']
    results.set_index('Index', inplace=True)
    results.loc['FDR-corrected pval'] = pval_corrected[i]
    results.to_excel(f'{output_dir}/AWP_BANDA/statistics_{dx}.xlsx', index=True,
                     header=True)

In [19]:
# For the GESTE study.
# Load back the computed AWP values. Since they are in different files, we need to load them separately, append the p-values to a list, perform FDR correction, and append them to the original data frame. This will be done using 2 for loops (not the most efficient way, but it works).
diagnosis = ['ADHD', 'PSYPATHO']

pvals = []
for dx in diagnosis:
    results = load_df_in_any_format(f'{output_dir}/AWP_GESTE/statistics_{dx}.xlsx')
    pvals.append(results.iloc[1, 1])

# Perform FDR correction.
pval_corrected = false_discovery_control(pvals, method='bh')

# Append the corrected p-values to the original data frame.
for i, dx in enumerate(diagnosis):
    results = load_df_in_any_format(f'{output_dir}/AWP_GESTE/statistics_{dx}.xlsx')
    results.columns = ['Index', 'Statistics']
    results.set_index('Index', inplace=True)
    results.loc['FDR-corrected pval'] = pval_corrected[i]
    results.to_excel(f'{output_dir}/AWP_GESTE/statistics_{dx}.xlsx', index=True,
                     header=True)