#### Regional Homogeneity (ReHo) 

Regional Homogeneity (ReHo) is a measure of the local synchronization of brain activity in resting-state fMRI. It evaluates how similar the time series of a given voxel is to those of its neighboring voxels. (A voxel, or volumetric pixel, is the smallest unit of a 3D brain scan: a tiny cube of brain tissue whose activity is measured at a specific location.) The assumption behind ReHo is that functionally related neighboring voxels exhibit synchronized activity, so voxels within the same region should show similar fluctuations in the BOLD signal over time.

ReHo is calculated using Kendall's Coefficient of Concordance (KCC), a statistic that evaluates how consistently a voxel's time series aligns with those of its neighboring voxels. Typically, ReHo is computed over a cluster of 27 voxels (the center voxel and its 26 nearest neighbors in 3D space), capturing localized functional connectivity rather than long-range connections between distant brain regions.
A high ReHo value indicates that a voxel's activity is highly synchronized with its neighbors, suggesting strong local functional connectivity. Conversely, a low ReHo value means the voxel's activity differs substantially from that of its neighbors, indicating weak or disrupted local connectivity in that area.
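
As a minimal sketch of the KCC computation described above, the function below evaluates Kendall's W for a single cluster using the standard formula W = 12·S / (K²(n³ − n)), where n is the number of time points, K is the number of voxels, and S is the sum of squared deviations of the per-time-point rank sums from their mean. The function name `kendalls_w` and the synthetic data are illustrative assumptions, not part of the ADHD-200 pipeline, and no correction for tied ranks is applied.

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(time_series):
    """Kendall's W for an array of shape (n_timepoints, n_voxels).

    Hypothetical helper for illustration; no tie correction is applied.
    """
    n, k = time_series.shape
    # Rank each voxel's time series over time (ties receive average ranks)
    ranks = rankdata(time_series, axis=0)
    # Sum the ranks across voxels at each time point
    rank_sums = ranks.sum(axis=1)
    # Sum of squared deviations of the rank sums from their mean
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (k ** 2 * (n ** 3 - n))

# Synthetic example: 27 voxels that all follow the same ordering over time
synchronized = np.tile(np.arange(10.0)[:, None], (1, 27))
w_sync = kendalls_w(synchronized)   # identical rankings give W equal to 1 (up to rounding)

# Synthetic example: 27 independent noise time series
rng = np.random.default_rng(0)
w_noise = kendalls_w(rng.standard_normal((100, 27)))  # expected to be much closer to 0
```

W ranges from 0 (no agreement among the voxels' rankings) to 1 (identical rankings), which is why it serves as a bounded measure of local synchronization.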

In ADHD research, ReHo is used to explore abnormal local connectivity patterns in the brain. Studies have shown that individuals with ADHD often display altered ReHo values in key brain regions such as the prefrontal cortex, motor regions, and the default mode network (DMN). For instance, higher ReHo in certain prefrontal areas might reflect overactive local synchronization, possibly linked to attentional deficits, while lower ReHo in motor regions could be associated with poor regulation of movement and hyperactivity. These patterns provide insight into how local neural networks function differently in ADHD compared to neurotypical individuals.


In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import os
import math
import numpy as np

from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = 'all'

import warnings
warnings.filterwarnings("ignore")


In [9]:
import os
import numpy as np
import pandas as pd
from scipy.stats import kendalltau
from joblib import Parallel, delayed

# List of sites to process
sites = ["KKI", "NeuroIMAGE", "NYU", "OHSU", "Peking_1", "Peking_2", "Peking_3"]

# Base folder containing all sites
base_folder = "fMRI/ADHD200_CC200_TCs_filtfix/"

# Dictionary to store ReHo data for each site
site_reho_data = {}

# Compute a region-level ReHo proxy for one subject (parallelized).
# Note: this is the mean pairwise Kendall's tau between each CC200 region and
# all other regions, not voxel-wise Kendall's W over a 27-voxel neighborhood.
def compute_reho_for_subject(fmri_data_array):
    num_regions = fmri_data_array.shape[1]

    # Mean Kendall's tau between region i and every other region
    def region_reho(i):
        region_series = fmri_data_array[:, i]  # Time series of region i
        others = np.delete(fmri_data_array, i, axis=1)  # Exclude region i itself
        tau_values = [kendalltau(region_series, others[:, j])[0] for j in range(others.shape[1])]
        return np.nanmean(tau_values)  # Mean Kendall's tau, ignoring NaNs

    # Compute the value for every region in parallel
    reho_values = Parallel(n_jobs=-1)(delayed(region_reho)(i) for i in range(num_regions))

    return np.array(reho_values)

# Load and Merge Phenotypic Data for All Sites
phenotype_data_list = []
for site in sites:
    pheno_csv = f"{site}_phenotypic.csv"
    phenotype_path = os.path.join(base_folder, site, pheno_csv)

    if os.path.exists(phenotype_path):
        phenotype_data = pd.read_csv(phenotype_path)
        phenotype_data["ScanDir ID"] = phenotype_data["ScanDir ID"].astype(str).str.zfill(7)  # Standardizing ID format
        phenotype_data = phenotype_data[["ScanDir ID", "DX"]]  # Using only necessary columns
        phenotype_data_list.append(phenotype_data)

# Merge all phenotype datasets into one DataFrame
phenotype_data = pd.concat(phenotype_data_list, axis=0)

# Loop through each site
for site in sites:
    print(f"Processing site: {site}")

    # Define path to the site’s fMRI data
    site_folder = os.path.join(base_folder, site)

    # Dictionary to store ReHo values for subjects at this site
    reho_data_dict = {}

    # Loop through each subject folder in the site
    for subject_id in sorted(os.listdir(site_folder)):
        subject_path = os.path.join(site_folder, subject_id)

        # Ensure that subject_path is a directory (not a file)
        if os.path.isdir(subject_path):

            # Handle multiple rest sessions for NYU & OHSU
            if site == "NYU":
                rest_files = [f"sfnwmrda{subject_id}_session_1_rest_1_cc200_TCs.1D",
                              f"sfnwmrda{subject_id}_session_1_rest_2_cc200_TCs.1D"]
            elif site == "OHSU":
                rest_files = [f"sfnwmrda{subject_id}_session_1_rest_1_cc200_TCs.1D",
                              f"sfnwmrda{subject_id}_session_1_rest_2_cc200_TCs.1D",
                              f"sfnwmrda{subject_id}_session_1_rest_3_cc200_TCs.1D"]
            else:
                rest_files = [f"sfnwmrda{subject_id}_session_1_rest_1_cc200_TCs.1D"]

            # List to store concatenated time-series data
            merged_time_series = []

            # Loop through all rest session files
            for rest_file in rest_files:
                fmri_file_path = os.path.join(subject_path, rest_file)

                # If the file exists, process it
                if os.path.exists(fmri_file_path):
                    # sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas
                    fmri_data = pd.read_csv(fmri_file_path, sep=r"\s+", header=None, skiprows=1)
                    fmri_data = fmri_data.iloc[:, 2:]  # Remove metadata columns
                    fmri_data = fmri_data.astype(float)  # Convert to float type
                    merged_time_series.append(fmri_data.to_numpy())

            # If data exists, concatenate all rest session data along the time dimension
            if merged_time_series:
                fmri_data_array = np.concatenate(merged_time_series, axis=0)

                # Compute ReHo using optimized function
                reho_values = compute_reho_for_subject(fmri_data_array)

                # Store the computed ReHo values for the subject
                reho_data_dict[subject_id] = reho_values

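    # Convert ReHo dictionary to DataFrame (one row per subject, one column per region).
    # Column names are derived from the DataFrame itself, so this does not rely on
    # fmri_data_array, which is undefined or stale if the last subject had no data.
    site_reho_df = pd.DataFrame.from_dict(reho_data_dict, orient='index')
    site_reho_df.columns = [f"ReHo_{i+1}" for i in range(site_reho_df.shape[1])]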

    # Store the site’s ReHo DataFrame
    site_reho_data[site] = site_reho_df

# Merge all site ReHo datasets into a single dataset
merged_reho_df = pd.concat(site_reho_data.values(), axis=0)

# Merge with ADHD labels (DX)
merged_reho_df["ScanDir ID"] = merged_reho_df.index
final_reho_df = phenotype_data.merge(merged_reho_df, on="ScanDir ID", how="inner")




Processing site: KKI
Processing site: NeuroIMAGE
Processing site: NYU
Processing site: OHSU
Processing site: Peking_1
Processing site: Peking_2
Processing site: Peking_3


In [10]:
final_reho_df

Unnamed: 0,ScanDir ID,DX,ReHo_1,ReHo_2,ReHo_3,ReHo_4,ReHo_5,ReHo_6,ReHo_7,ReHo_8,...,ReHo_181,ReHo_182,ReHo_183,ReHo_184,ReHo_185,ReHo_186,ReHo_187,ReHo_188,ReHo_189,ReHo_190
0,1018959,0,-0.000693,-0.002784,0.005485,0.013734,0.009531,0.012442,0.015524,0.004018,...,-0.000228,-0.004366,0.007430,0.014627,0.001317,-0.006228,0.001716,-0.000210,-0.012670,-0.006496
1,1019436,3,0.001641,0.000157,-0.000600,0.008507,-0.005215,0.021348,0.037630,-0.002221,...,0.005336,0.026657,-0.010433,0.010363,0.016745,0.020867,-0.026912,-0.010929,0.002073,-0.014229
2,1043241,0,-0.025526,-0.007973,-0.003846,-0.008930,0.002580,0.012220,0.027966,0.006901,...,-0.004057,0.029532,0.017034,0.011103,-0.010745,-0.009088,0.008249,-0.008181,-0.001200,-0.007629
3,1266183,0,0.017338,-0.020910,0.026822,-0.028537,0.004145,0.018429,0.023084,0.018687,...,-0.009916,0.024002,0.001976,-0.016781,0.019819,0.008669,0.015358,-0.016745,-0.011874,0.012493
4,1535233,0,0.025408,-0.004181,0.019778,0.010639,0.042414,0.037103,0.037410,0.053010,...,-0.033017,-0.009099,-0.008167,-0.052603,0.034772,-0.018005,0.037081,0.002573,0.026957,0.030917
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
615,5669389,0,0.041583,0.031368,0.018500,0.003853,0.005902,0.035916,0.007781,0.037332,...,0.029162,-0.003820,0.023286,-0.027044,0.047881,-0.004888,0.033479,0.043651,0.021794,0.022059
616,6383713,1,-0.011001,0.023553,0.045278,0.023125,-0.010543,0.009812,-0.002366,0.047632,...,0.023663,-0.011299,0.034275,0.012848,0.053413,-0.014896,0.022261,0.042729,0.033602,0.029257
617,6477085,0,-0.090311,0.077690,0.036559,-0.020320,0.005736,-0.015036,-0.008798,0.056555,...,0.046824,0.049709,0.059777,0.007273,0.057467,0.073680,0.053626,0.080742,0.041598,0.056369
618,7994085,0,-0.009549,0.003129,0.000436,0.005846,0.003584,0.004464,0.017457,0.018368,...,0.002915,0.000692,-0.005344,-0.006689,0.002440,0.005350,0.000327,-0.005584,0.002570,-0.007553


In [7]:
from scipy.stats import kendalltau

features_only = FC_df.drop(columns = ["ScanDir ID", "DX"])



In [11]:
phenotype_data

Unnamed: 0,ScanDir ID,DX
0,1018959,0
1,1019436,3
2,1043241,0
3,1266183,0
4,1535233,0
...,...,...
37,5669389,0
38,6383713,1
39,6477085,0
40,7994085,0


In [7]:
missing_subjects = set(phenotype_data["ScanDir ID"]) - set(final_reho_df["ScanDir ID"])
print("Missing Subjects:", missing_subjects)


Missing Subjects: {'0010055', '0010098', '0010105', '0010016', '0010127', '0010027'}
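
The set difference above only reports IDs that have a phenotype row but no ReHo row. One way to audit the merge in both directions is an outer merge with pandas' `indicator=True` option, sketched below on toy data. The frames `pheno` and `reho` and all IDs and values in them are made up for illustration; they stand in for `phenotype_data` and `merged_reho_df`.

```python
import pandas as pd

# Toy stand-ins for phenotype_data and merged_reho_df (IDs and values are made up)
pheno = pd.DataFrame({"ScanDir ID": ["0000001", "0000002", "0000003"],
                      "DX": [0, 1, 0]})
reho = pd.DataFrame({"ScanDir ID": ["0000001", "0000003", "0000004"],
                     "ReHo_1": [0.01, -0.02, 0.03]})

# The outer merge keeps every ID; the _merge column records where each row came from
audit = pheno.merge(reho, on="ScanDir ID", how="outer", indicator=True)

# IDs with a diagnosis label but no ReHo features
pheno_only = audit.loc[audit["_merge"] == "left_only", "ScanDir ID"].tolist()
# IDs with ReHo features but no diagnosis label
reho_only = audit.loc[audit["_merge"] == "right_only", "ScanDir ID"].tolist()

print("Phenotype only:", pheno_only)
print("ReHo only:", reho_only)
```

IDs that appear only on the ReHo side would point to a formatting mismatch (for example, zero-padding) rather than missing scan files, so checking both lists helps separate the two cases.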


In [15]:
final_reho_df.to_csv("All_ReHo.csv", index = True)