Imports

In [1]:
import warnings
warnings.filterwarnings('ignore')

# Nifti Import

**From Directory**
___


---

## Neuroimaging File Extraction Dictionary

The `segments_dict` is a predefined dictionary structured to facilitate the extraction of specific types of neuroimaging files. Each key in the dictionary represents a distinct neuroimaging segment, and its associated value is another dictionary containing the following fields:

- **path**: This should be filled with the absolute path to the base directory containing the neuroimaging files for the corresponding segment. 
- **glob_name_pattern**: This is the string pattern that will be used to "glob" or search for the specific files within the provided path. It helps in identifying and extracting the desired files based on their naming conventions.

Here's a breakdown of the segments and their respective fields:

### 1. Cortical Thickness (Nifti)
- **path**: Absolute path to the base directory containing CT files.
- **glob_name_pattern**: File pattern to search for CT files.

---

**Instructions**: Please fill out the `path` and `glob_name_pattern` fields for each segment in the `segments_dict`. This will ensure that the extraction process can locate and identify the appropriate neuroimaging files for further analysis.
- The `*_glob_name_pattern` variables do not need a leading slash ("/"). This is already accounted for.
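
To make the relationship between `path` and `glob_name_pattern` concrete, here is a minimal sketch using only the standard library. It does not reproduce how `import_matrices_from_folder` resolves files internally (that is not shown in this notebook); `find_files` and the `sub01`/`sub02` folder names are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path

def find_files(base_directory: str, glob_name_pattern: str) -> list:
    """Return sorted paths under base_directory that match a relative glob pattern."""
    # Path.glob expects a pattern relative to the base directory,
    # so the pattern is written without a leading slash.
    return sorted(str(p) for p in Path(base_directory).glob(glob_name_pattern))

# Build a throwaway directory tree with two hypothetical subject folders.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("sub01", "sub02"):
        vol_dir = Path(tmp) / sub / "vol"
        vol_dir.mkdir(parents=True)
        (vol_dir / f"{sub}_vol_MNI152.nii.gz").touch()
        (vol_dir / "notes.txt").touch()  # not a NIfTI file, should not match

    matches = find_files(tmp, "*/vol/*.nii.gz")
    match_names = [Path(m).name for m in matches]
    print(match_names)
```

Running a check like this against your own base directory before the import step is a quick way to confirm that the pattern picks up one file per subject and nothing else.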

---

In [1]:
base_directory = r'/Volumes/Expansion/datasets/adni/neuroimaging/all_patients_sbm_v2'
ct_glob_name_pattern = r'*/vol/*.nii.gz'


In [None]:
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder #<----- CALVIN IMPORT

def import_ct_dataframes_from_folders(base_directory, ct_glob_name_pattern):
    """
    Imports cortical thickness (CT) dataframes from a base directory using a glob name pattern.
    
    Parameters:
    - base_directory (str): The base directory where the data resides.
    - ct_glob_name_pattern (str): Glob pattern for cortical thickness files, relative to base_directory.
    
    Returns:
    - dict: A dictionary with a single key, 'ct', holding the dataframe of imported CT data
            (rows are voxels, columns are the imported files).
    """
    

    segments_dict = {
        'ct': {'path': base_directory, 'glob_name_pattern': ct_glob_name_pattern},
    }

    dataframes_dict = {}

    for k, v in segments_dict.items():
        dataframes_dict[k] = import_matrices_from_folder(connectivity_path=v['path'], file_pattern=v['glob_name_pattern'])
        print(f'Imported {k} data with {dataframes_dict[k].shape[0]} voxels and {dataframes_dict[k].shape[1]} patients')
        print(f'These are the filenames per subject {dataframes_dict[k].columns}')
        print('--------------------------------')

    return dataframes_dict


In [None]:
dataframes_dict = import_ct_dataframes_from_folders(base_directory, ct_glob_name_pattern)

**Extract Subject ID From File Names**
- Using the example filenames printed above, define two general strings:
1) The text preceding the subject ID (`preceding_id`). If nothing precedes the subject ID, enter "".
- Do NOT include mwp[1/2/3] in this; that prefix is removed automatically.
2) The text following the subject ID (`proceeding_id`). If nothing follows the subject ID, enter "".
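
As a rough illustration of what the two strings do, the sketch below applies the same split logic to a single filename. `extract_subject_id` is a hypothetical helper written for this example (the notebook's own function works on dataframe columns), and both filenames are made up.

```python
def extract_subject_id(filename: str, preceding_id: str, proceeding_id: str) -> str:
    """Keep the text between preceding_id and proceeding_id; empty markers are skipped."""
    subject = filename
    if preceding_id:
        # Take the part after the first occurrence of the preceding marker.
        subject = subject.split(preceding_id)[1]
    if proceeding_id:
        # Take the part before the first occurrence of the following marker.
        subject = subject.split(proceeding_id)[0]
    return subject

# Hypothetical filename with nothing in front of the subject ID.
subject_id = extract_subject_id("002_S_0295_vol_MNI152_s6.nii.gz", "", "_vol_MNI152")
print(subject_id)

# Hypothetical filename with text on both sides of the subject ID.
other_id = extract_subject_id("sub-0042_T1w.nii.gz", "sub-", "_T1w")
print(other_id)
```

If the renamed columns still contain extra text, the marker strings probably do not match the filenames exactly, so compare them character by character with the printout from the import step.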

In [4]:
preceding_id = ''
proceeding_id = '_vol_MNI152'

In [5]:
import re

def remove_specific_mwp_integer_pattern(text):
    # Define the pattern to search for: 'mwp' followed by [1], [2], or [3]
    pattern = r'mwp[123]'
    # Replace the first occurrence of the pattern with an empty string
    return re.sub(pattern, '', text, count=1)


def extract_and_rename_subject_id(dataframe, split_command_dict):
    """
    Renames the columns of a dataframe based on specified split commands.

    Parameters:
    - dataframe (pd.DataFrame): The dataframe whose columns need to be renamed.
    - split_command_dict (dict): A dictionary where the key is the split string 
                                 and the value is the order to take after splitting 
                                 (0 for before the split, 1 for after the split, etc.).

    Returns:
    - pd.DataFrame: Dataframe with renamed columns.

    Example:
    >>> data = {'subject_001': [1, 2, 3], 'patient_002': [4, 5, 6], 'control_003': [7, 8, 9]}
    >>> df = pd.DataFrame(data)
    >>> split_commands = {'_': 1}
    >>> new_df = extract_and_rename_subject_id(df, split_commands)
    >>> print(new_df.columns)
    Index(['001', '002', '003'], dtype='object')
    """

    raw_names = dataframe.columns
    name_mapping = {}

    # For each column name in the dataframe
    for name in raw_names:
        new_name = name  # Default to the original name in case it doesn't match any split command

        # Check each split command to see if it applies to this column name
        for k, v in split_command_dict.items():
            if k in new_name:
                new_name = remove_specific_mwp_integer_pattern(new_name)
                if k !='':
                    new_name = new_name.split(k)[v]
        # Add the original and new name to the mapping
        name_mapping[name] = new_name

    # Rename columns in the dataframe based on the mapping
    return dataframe.rename(columns=name_mapping)

def rename_dataframe_subjects(dataframes_dict, preceding_id, proceeding_id):
    """
    Renames the subjects in the provided dataframes based on the split commands.

    Parameters:
    - dataframes_dict (dict): A dictionary containing dataframes with subjects to be renamed.
    - preceding_id (str): The delimiter for taking the part after the split.
    - proceeding_id (str): The delimiter for taking the part before the split.

    Returns:
    - dict: A dictionary containing dataframes with subjects renamed.
    """
    
    split_command_dict = {preceding_id: 1, proceeding_id: 0}
    
    for k, v in dataframes_dict.items():
        dataframes_dict[k] = extract_and_rename_subject_id(dataframe=dataframes_dict[k], split_command_dict=split_command_dict)
        print('Dataframe: ', k)
        display(dataframes_dict[k])
        print('------------- \n')

    return dataframes_dict


In [None]:
renamed_dfs = rename_dataframe_subjects(dataframes_dict, preceding_id, proceeding_id)

# Import Control Segments

In [7]:
base_directory_control = '/Volumes/Expansion/datasets/adni/neuroimaging/true_control/freesurfer_reconall_sbm'
control_cortical_thickness_glob_name_pattern = '*/vol/*_vol_MNI152_s6.nii.gz'

In [None]:
control_dataframes_dict = import_ct_dataframes_from_folders(base_directory_control,control_cortical_thickness_glob_name_pattern)

# Generate Z-Scored Atrophy Maps for Each Segment

In [9]:
import pandas as pd
from typing import Tuple
def threshold_probabilities(patient_df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    patient_df = patient_df.where(patient_df > threshold, 0)
    return patient_df

def calculate_z_scores(control_df: pd.DataFrame, patient_df: pd.DataFrame, matter_type=None) -> pd.DataFrame:
    """
    Calculates voxel-wise z-scores for each patient, using the voxel-wise mean and standard deviation of the control group.

    Args:
    control_df (pd.DataFrame): DataFrame where each column represents a control subject, 
                               and each row represents flattened image data for a voxel.
    patient_df (pd.DataFrame): DataFrame where each column represents a patient, 
                               and each row represents flattened image data for a voxel.
    matter_type (optional): Currently unused.

    Returns:
    patient_z_scores (pd.DataFrame): DataFrame of voxel-wise z-scores calculated for each patient using control mean and std.
    """

    # Mask both dataframes so that only values above the threshold are kept; all others are set to 0.
    # The 0.2 cutoff follows the typical masking of MNI152 segments, which uses P > 0.2 for a given segment.
    threshold = 0.2
    patient_df = threshold_probabilities(patient_df, threshold)
    control_df = threshold_probabilities(control_df, threshold)

    # Calculate mean and standard deviation for each voxel in control group
    control_mean = control_df.mean(axis=1)
    control_std = control_df.std(axis=1)

    # Initialize DataFrame to store patient z-scores
    patient_z_scores = pd.DataFrame()

    # Calculate z-scores for each patient using control mean and std
    for patient in patient_df.columns:
        patient_z_scores[patient] = (patient_df[patient] - control_mean) / control_std
    return patient_z_scores
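
The z-score step can be checked by hand on toy data. The sketch below is not the notebook's function: `z_score_against_controls` is a hypothetical name, the thresholding step is left out, and the replacement of zero standard deviations with NaN is an added assumption (the cell above divides by zero as-is, which yields inf or NaN at such voxels).

```python
import numpy as np
import pandas as pd

def z_score_against_controls(control_df: pd.DataFrame, patient_df: pd.DataFrame) -> pd.DataFrame:
    """Voxel-wise z-scores of patients relative to the control mean and std."""
    control_mean = control_df.mean(axis=1)
    control_std = control_df.std(axis=1)  # pandas default is the sample std (ddof=1)
    # Assumption for this sketch: voxels with zero control spread become NaN.
    control_std = control_std.replace(0, np.nan)
    # axis=0 aligns on the row index, so each voxel uses its own control statistics.
    return patient_df.sub(control_mean, axis=0).div(control_std, axis=0)

# Two voxels (rows), three controls and two patients (columns); all values are made up.
controls = pd.DataFrame({"c1": [2.0, 3.0], "c2": [3.0, 3.0], "c3": [4.0, 3.0]})
patients = pd.DataFrame({"p1": [1.0, 3.0], "p2": [3.0, 2.5]})

z = z_score_against_controls(controls, patients)
print(z)
```

At the first voxel the controls have mean 3 and sample standard deviation 1, so patient `p1` (value 1) gets z = -2 and `p2` (value 3) gets z = 0. At the second voxel every control has the same value, so the z-score is undefined there.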

In [10]:
def process_atrophy_dataframes(dataframes_dict, control_dataframes_dict):
    """
    Processes the provided dataframes to calculate z-scores and determine significant atrophy.

    Parameters:
    - dataframes_dict (dict): Dictionary containing patient dataframes.
    - control_dataframes_dict (dict): Dictionary containing control dataframes.

    Returns:
    - tuple: A tuple containing two dictionaries - atrophy_dataframes_dict and significant_atrophy_dataframes_dict.
    """
    
    atrophy_dataframes_dict = {}
    significant_atrophy_dataframes_dict = {}

    for k in dataframes_dict.keys():
        atrophy_dataframes_dict[k] = calculate_z_scores(control_df=control_dataframes_dict[k], patient_df=dataframes_dict[k])
        if k == 'cerebrospinal_fluid':
            significant_atrophy_dataframes_dict[k] = atrophy_dataframes_dict[k].where(atrophy_dataframes_dict[k] > 2, 0)
        else:
            significant_atrophy_dataframes_dict[k] = atrophy_dataframes_dict[k].where(atrophy_dataframes_dict[k] < -2, 0)
        print('Dataframe: ', k)
        display(atrophy_dataframes_dict[k])  # show the computed z-scores rather than the raw input
        print('------------- \n')

    return atrophy_dataframes_dict, significant_atrophy_dataframes_dict


In [None]:
unthresholded_atrophy_dataframes_dict, significant_atrophy_dataframes_dict = process_atrophy_dataframes(dataframes_dict, control_dataframes_dict)
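
For reference, this is how the `where` call used for thresholding behaves on a small, made-up z-score table. Values that satisfy the condition are kept, and everything else, including NaN, is replaced by 0.

```python
import pandas as pd

# Made-up z-scores: three voxels (rows) for two subjects (columns).
z_scores = pd.DataFrame({
    "sub_a": [-3.1, -1.0, float("nan")],
    "sub_b": [-2.0, -2.4, 2.6],
})

# Keep only z < -2 (the non-CSF branch above); the comparison is strict,
# so exactly -2.0 is not kept, and NaN compares False and becomes 0.
significant = z_scores.where(z_scores < -2, 0)
print(significant)
```

Because the cutoff is strict and NaN is zeroed, voxels where the control standard deviation is 0 never appear in the thresholded maps.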

**Save the Atrophy Results**

Save Raw Z-Scores

In [12]:
import os
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti #<-----CALVIN IMPORT
from tqdm import tqdm

def save_nifti_to_bids(dataframes_dict, bids_base_dir, analysis='tissue_segment_z_scores', ses=None, dry_run=True):
    """
    Saves NIFTI images to a BIDS directory structure.
    
    Parameters:
    - dataframes_dict (dict): Dictionary containing dataframes with NIFTI data, one column per subject.
    - bids_base_dir (str): The base directory where the BIDS structure starts.
    - analysis (str, optional): Name of the output subdirectory inside each session folder.
    - ses (str, optional): Session identifier. If None, defaults to '01'.
    - dry_run (bool, optional): If True (default), print the output path for each image instead of saving it.
                                Set to False to actually save the NIFTI images.
    
    Note:
    This function assumes a predefined BIDS directory structure
    (<bids_base_dir>/sub-<subject>/ses-<session>/<analysis>) and saves the NIFTI images accordingly.
    
    Example:
    >>> dfs = { ... }  # some dictionary with dataframes
    >>> save_nifti_to_bids(dfs, '/path/to/base/dir')
    """
    
    for k in tqdm(dataframes_dict.keys()):
        for col in dataframes_dict[k].columns:
            
            # Define BIDS Directory Architecture
            sub_no = col
            if ses is None:
                ses_no = '01'
            else:
                ses_no = ses
            
            # Define the Save Directory
            out_dir = os.path.join(bids_base_dir, f'sub-{sub_no}', f'ses-{ses_no}', analysis)
            
            # Save Image to BIDS Directory
            if dry_run:
                # Dry run: report the target path without touching the filesystem
                print(os.path.join(out_dir, f'sub-{sub_no}_{k}'))
            else:
                # Create the directory only when an image is actually written
                os.makedirs(out_dir, exist_ok=True)
                view_and_save_nifti(matrix=dataframes_dict[k][col],
                                    out_dir=out_dir,
                                    output_name=(f'sub-{sub_no}_{k}'))
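
The output location for each image follows directly from the subject column name, the session, and the `analysis` argument. The sketch below only builds the path string, so it can be used to preview the layout without writing anything. `bids_output_stem` is a hypothetical helper, and `/data/study` and the subject label are placeholders.

```python
import os

def bids_output_stem(bids_base_dir, sub_no, key, analysis, ses=None):
    """Build <base>/sub-<id>/ses-<ses>/<analysis>/sub-<id>_<key> without creating anything."""
    ses_no = '01' if ses is None else ses  # same default session as save_nifti_to_bids
    out_dir = os.path.join(bids_base_dir, f'sub-{sub_no}', f'ses-{ses_no}', analysis)
    return os.path.join(out_dir, f'sub-{sub_no}_{key}')

# Placeholder base directory and subject label, for illustration only.
stem = bids_output_stem('/data/study', '002_S_0295', 'ct',
                        analysis='unthresholded_tissue_segment_z_scores')
print(stem)
```

The file extension is not part of this stem; it is presumably added by `view_and_save_nifti` when the image is written.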


# Save the Z-Scored Maps

Unthresholded Maps

In [20]:
base_directory = '/Volumes/Expansion/datasets/adni/neuroimaging/all_patients_sbm_v2'

In [None]:
save_nifti_to_bids(unthresholded_atrophy_dataframes_dict, bids_base_dir=base_directory, analysis='unthresholded_tissue_segment_z_scores', dry_run=False)

Thresholded Maps - The 'Real' Atrophy


In [None]:
save_nifti_to_bids(significant_atrophy_dataframes_dict, bids_base_dir=base_directory, analysis='thresholded_tissue_segment_z_scores', dry_run=False);

All Done. Enjoy your atrophy seeds.

--Calvin