Imports

In [1]:
import warnings
warnings.filterwarnings('ignore')

# Nifti Import

**From Directory**
___

Here's a brief markup (in Markdown format) that explains the purpose and usage of the `segments_dict`:

---

## Neuroimaging File Extraction Dictionary

The `segments_dict` is a predefined dictionary structured to facilitate the extraction of specific types of neuroimaging files. Each key in the dictionary represents a distinct neuroimaging segment, and its associated value is another dictionary containing the following fields:

- **path**: This should be filled with the absolute path to the base directory containing the neuroimaging files for the corresponding segment. 
- **glob_name_pattern**: This is the string pattern that will be used to "glob" or search for the specific files within the provided path. It helps in identifying and extracting the desired files based on their naming conventions.

Here's a breakdown of the segments and their respective fields:

### 1. Cerebrospinal Fluid (CSF)
- **path**: Absolute path to the base directory containing CSF files.
- **glob_name_pattern**: File pattern to search for CSF files.

### 2. Grey Matter
- **path**: Absolute path to the base directory containing grey matter files.
- **glob_name_pattern**: File pattern to search for grey matter files.

### 3. White Matter
- **path**: Absolute path to the base directory containing white matter files.
- **glob_name_pattern**: File pattern to search for white matter files.

---

**Instructions**: Please fill out the `path` and `glob_name_pattern` fields for each segment in the `segments_dict`. This will ensure that the extraction process can locate and identify the appropriate neuroimaging files for further analysis.
- < *_name_pattern > variables do not need a leading slash ("/"). This is already accounted for. 

---

# Import Segmented Patients for Atrophy Detection

In [2]:
base_directory = r'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/vbm_derivatives/mri'
grey_matter_glob_name_pattern = '*smwp1*resampled*'
white_matter_glob_name_pattern = '*smwp2*resampled*'
csf_glob_name_pattern = '*smwp3*resampled*'

In [3]:
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder #<----- CALVIN IMPORT

def import_dataframes_from_folders(base_directory, grey_matter_glob_name_pattern, white_matter_glob_name_pattern, csf_glob_name_pattern):
    """
    Imports dataframes from specified directories and glob name patterns.
    
    Parameters:
    - base_directory (str): The base directory where the data resides.
    - grey_matter_glob_name_pattern (str): Glob pattern for grey matter data.
    - white_matter_glob_name_pattern (str): Glob pattern for white matter data.
    - csf_glob_name_pattern (str): Glob pattern for cerebrospinal fluid data.
    
    Returns:
    - dict: A dictionary containing dataframes for grey matter, white matter, and cerebrospinal fluid.
    """
    

    segments_dict = {
        'grey_matter': {'path': base_directory, 'glob_name_pattern': grey_matter_glob_name_pattern},
        'white_matter': {'path': base_directory, 'glob_name_pattern': white_matter_glob_name_pattern},
        'cerebrospinal_fluid': {'path': base_directory, 'glob_name_pattern': csf_glob_name_pattern}
    }

    dataframes_dict = {}

    for k, v in segments_dict.items():
        dataframes_dict[k] = import_matrices_from_folder(connectivity_path=v['path'], file_pattern=v['glob_name_pattern'])
        print(f'Imported data {k} data with {dataframes_dict[k].shape[0]} voxels and {dataframes_dict[k].shape[1]} patients')
        print(f'These are the filenames per subject {dataframes_dict[k].columns[-1]}')
        print('--------------------------------')

    return dataframes_dict


In [4]:
dataframes_dict = import_dataframes_from_folders(base_directory, grey_matter_glob_name_pattern, white_matter_glob_name_pattern, csf_glob_name_pattern)

I will search:  /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/vbm_derivatives/mri/*smwp1*resampled*
Imported data grey_matter data with 902629 voxels and 50 patients
These are the filenames per subject smwp10039_resampled.nii
--------------------------------
I will search:  /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/vbm_derivatives/mri/*smwp2*resampled*
Imported data white_matter data with 902629 voxels and 50 patients
These are the filenames per subject smwp20050_resampled.nii
--------------------------------
I will search:  /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/vbm_derivatives/mri/*smwp3*resampled*
Imported data cerebrospinal_fluid data with 902629 voxels and 50 patients
These are the filenames per subject smwp30023_resampled.nii
--------------------------------


**Extract Subject ID From File Names**
- Using the example filenames that have been printed above, please define a general string:
1) Preceding the subject ID. If nothing preceding subject identifier, enter "".
- Do NOT include mwp[1/2/3] in this. 
2) Proceeding the subject ID. If nothing proceeding subject identifier, enter "".

In [5]:
preceding_id = 's'
proceeding_id = '_re'

In [6]:
import re

def remove_specific_mwp_integer_pattern(text):
    # Define the pattern to search for: 'mwp' followed by [1], [2], or [3]
    pattern = r'mwp[123]'
    # Replace the first occurrence of the pattern with an empty string
    return re.sub(pattern, '', text, count=1)


def extract_and_rename_subject_id(dataframe, split_command_dict):
    """
    Renames the columns of a dataframe based on specified split commands.

    Parameters:
    - dataframe (pd.DataFrame): The dataframe whose columns need to be renamed.
    - split_command_dict (dict): A dictionary where the key is the split string 
                                 and the value is the order to take after splitting 
                                 (0 for before the split, 1 for after the split, etc.).

    Returns:
    - pd.DataFrame: Dataframe with renamed columns.

    Example:
    >>> data = {'subject_001': [1, 2, 3], 'patient_002': [4, 5, 6], 'control_003': [7, 8, 9]}
    >>> df = pd.DataFrame(data)
    >>> split_commands = {'_': 1}
    >>> new_df = extract_and_rename_subject_id(df, split_commands)
    >>> print(new_df.columns)
    Index(['001', '002', '003'], dtype='object')
    """

    raw_names = dataframe.columns
    name_mapping = {}

    # For each column name in the dataframe
    for name in raw_names:
        new_name = name  # Default to the original name in case it doesn't match any split command

        # Check each split command to see if it applies to this column name
        for k, v in split_command_dict.items():
            if k in new_name:
                new_name = remove_specific_mwp_integer_pattern(new_name)
                if k !='':
                    new_name = new_name.split(k)[v]
        # Add the original and new name to the mapping
        name_mapping[name] = new_name

    # Rename columns in the dataframe based on the mapping
    return dataframe.rename(columns=name_mapping)

def rename_dataframe_subjects(dataframes_dict, preceding_id, proceeding_id):
    """
    Renames the subjects in the provided dataframes based on the split commands.

    Parameters:
    - dataframes_dict (dict): A dictionary containing dataframes with subjects to be renamed.
    - preceding_id (str): The delimiter for taking the part after the split.
    - proceeding_id (str): The delimiter for taking the part before the split.

    Returns:
    - dict: A dictionary containing dataframes with subjects renamed.
    """
    
    split_command_dict = {preceding_id: 1, proceeding_id: 0}
    
    for k, v in dataframes_dict.items():
        dataframes_dict[k] = extract_and_rename_subject_id(dataframe=dataframes_dict[k], split_command_dict=split_command_dict)
        print('Dataframe: ', k)
        display(dataframes_dict[k])
        print('------------- \n')

    return dataframes_dict


In [7]:
dataframes_dict = rename_dataframe_subjects(dataframes_dict, preceding_id, proceeding_id)

Dataframe:  grey_matter


Unnamed: 0,0048,0038,0001,0016,0012,0005,0015,0002,0028,0006,...,0010,0007,0029,0003,0014,0049,0004,0013,0017,0039
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

Dataframe:  white_matter


Unnamed: 0,0036,0021,0018,0025,0032,0046,0042,0045,0041,0022,...,0034,0023,0033,0024,0019,0020,0037,0043,0047,0050
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

Dataframe:  cerebrospinal_fluid


Unnamed: 0,0045,0041,0022,0035,0031,0026,0008,0036,0021,0018,...,0043,0047,0050,0040,0044,0009,0027,0030,0034,0023
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 



# Import Control Segments

In [8]:
base_directory_control = '/Volumes/OneTouch/resources/adni/neuroimaging/true_control/anat/mri'
control_grey_matter_glob_name_pattern = '*smwp1*resampled*'
control_white_matter_glob_name_pattern = '*smwp2*resampled*'
control_csf_glob_name_pattern = '*smwp3*resampled*'

In [9]:
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder
def import_control_dataframes(base_directory, control_grey_matter_glob_name_pattern, control_white_matter_glob_name_pattern, control_csf_glob_name_pattern):
    """
    Imports control dataframes from specified directories and glob name patterns.

    Parameters:
    - base_directory (str): The base directory where the data resides.
    - control_grey_matter_glob_name_pattern (str): Glob pattern for grey matter data.
    - control_white_matter_glob_name_pattern (str): Glob pattern for white matter data.
    - control_csf_glob_name_pattern (str): Glob pattern for cerebrospinal fluid data.

    Returns:
    - dict: A dictionary containing control dataframes for grey matter, white matter, and cerebrospinal fluid.
    """
    
    segments_dict = {
        'grey_matter': {'path': base_directory, 'glob_name_pattern': control_grey_matter_glob_name_pattern},
        'white_matter': {'path': base_directory, 'glob_name_pattern': control_white_matter_glob_name_pattern},
        'cerebrospinal_fluid': {'path': base_directory, 'glob_name_pattern': control_csf_glob_name_pattern}
    }

    control_dataframes_dict = {}
    for k, v in segments_dict.items():
        control_dataframes_dict[k] = import_matrices_from_folder(connectivity_path=v['path'], file_pattern=v['glob_name_pattern']);
        print(f'Imported data {k} data with {control_dataframes_dict[k].shape[0]} voxels and {control_dataframes_dict[k].shape[1]} patients')
        print(f'Example subject filename: {control_dataframes_dict[k].columns[-1]}')
        print('--------------------------------')

    return control_dataframes_dict


In [10]:
control_dataframes_dict = import_control_dataframes(base_directory_control, control_grey_matter_glob_name_pattern, control_white_matter_glob_name_pattern, control_csf_glob_name_pattern)

I will search:  /Volumes/OneTouch/resources/adni/neuroimaging/true_control/anat/mri/*smwp1*resampled*
Imported data grey_matter data with 902629 voxels and 136 patients
Example subject filename: smwp1002_S_4264_resampled.nii
--------------------------------
I will search:  /Volumes/OneTouch/resources/adni/neuroimaging/true_control/anat/mri/*smwp2*resampled*
Imported data white_matter data with 902629 voxels and 136 patients
Example subject filename: smwp2941_S_4376_resampled.nii
--------------------------------
I will search:  /Volumes/OneTouch/resources/adni/neuroimaging/true_control/anat/mri/*smwp3*resampled*
Imported data cerebrospinal_fluid data with 902629 voxels and 136 patients
Example subject filename: smwp3941_S_4376_resampled.nii
--------------------------------


**Extract Subject ID From File Names**
- Using the example filenames that have been printed above, please define a general string:
1) Preceding the subject ID. If nothing preceding subject identifier, enter "".
- **Do NOT include mwp[1/2/3] in this.**
2) Proceeding the subject ID. If nothing proceeding subject identifier, enter "".


- The example filenames were all provided above

In [11]:
preceding_id = 's'
proceeding_id = '_re'

In [12]:
control_dataframes_dict = rename_dataframe_subjects(control_dataframes_dict, preceding_id, proceeding_id)

Dataframe:  grey_matter


Unnamed: 0,002_S_4270,006_S_4150,006_S_4357,006_S_4449,006_S_4485,009_S_4337,009_S_4388,009_S_4612,010_S_4345,010_S_4442,...,941_S_4066,941_S_4100,941_S_4255,941_S_4292,941_S_4365,941_S_4376,002_S_4213,002_S_4225,002_S_4262,002_S_4264
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

Dataframe:  white_matter


Unnamed: 0,002_S_4213,002_S_4225,002_S_4262,002_S_4264,002_S_4270,006_S_4150,006_S_4357,006_S_4449,006_S_4485,009_S_4337,...,153_S_4125,153_S_4139,153_S_4151,153_S_4372,941_S_4066,941_S_4100,941_S_4255,941_S_4292,941_S_4365,941_S_4376
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

Dataframe:  cerebrospinal_fluid


Unnamed: 0,002_S_4213,002_S_4225,002_S_4262,002_S_4264,002_S_4270,006_S_4150,006_S_4357,006_S_4449,006_S_4485,009_S_4337,...,153_S_4125,153_S_4139,153_S_4151,153_S_4372,941_S_4066,941_S_4100,941_S_4255,941_S_4292,941_S_4365,941_S_4376
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 



# Import Covariates
Expects a CSV as below: 
```
+---------+----------------------------+--------------+--------------+--------------+
|Covariate| Subject 1                  | Subject 2    | Subject . . .| Subject N    |
+---------+----------------------------+--------------+--------------+--------------+
| Male    | 0                          | 1            | 1            | 1            |
| Female  | 1                          | 0            | 0            | 0            |
| Age     | 65                         | 72           | 87           | 90           |
+---------+----------------------------+--------------+--------------+--------------+
```
**1 is True, 0 is False, Age is represented in years.**

In [13]:
import pandas as pd
from typing import Tuple
def import_covariates(control_covariates_csv_path: str, patient_covariates_csv_path: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Import the covariates given paths. 
    Remove NaNs
    """
    control_covariates_df = pd.read_csv(control_covariates_csv_path, index_col=0).dropna(axis=1)
    patient_covariates_df = pd.read_csv(patient_covariates_csv_path, index_col=0).dropna(axis=1)
    
    return control_covariates_df, patient_covariates_df

In [14]:
control_covariates_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/paths_and_covariates/experimental_group_covariates.csv'
patient_covariates_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/paths_and_covariates/control_group_covariates.csv'

In [15]:
patient_covariates_df, control_covariates_df = import_covariates(control_covariates_csv_path=control_covariates_csv_path, patient_covariates_csv_path=patient_covariates_csv_path)
control_covariates_df

Unnamed: 0_level_0,002_S_4213,002_S_4225,002_S_4262,002_S_4264,006_S_4150,006_S_4357,006_S_4449,006_S_4485,009_S_4337,009_S_4388,...,153_S_4125,153_S_4139,153_S_4151,153_S_4372,941_S_4066,941_S_4100,941_S_4255,941_S_4292,941_S_4365,941_S_4376
Covariates,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Age,78.112329,69.964384,72.934247,74.191781,73.945205,73.723288,66.969863,73.347945,72.032877,66.931507,...,75.89863,70.676712,72.364384,70.153425,78.775342,78.621918,72.517808,70.99726,80.410959,76.594521
Male,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
Female,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,...,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0


In [16]:
patient_covariates_df

Unnamed: 0_level_0,47,42,18,29,35,5,4,19,1,30,...,27,45,25,15,23,33,39,14,37,11
Covariates,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Age,70.838356,82.10137,82.345205,74.526027,73.665753,69.791781,79.934247,77.10137,65.230137,75.39726,...,71.917808,73.723288,64.060274,71.065753,62.452055,83.460274,69.106849,83.438356,62.40274,76.923288
Male,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
Female,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0


Improve the Naming of the Covariate Subjects
- Pending construction. Should have named it appropraitely when you built your CSV. 
- Will code PRN. 

# Generate W-Scored Atrophy Maps for Each Segment

In [27]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from tqdm import tqdm
from typing import Tuple

class CalvinWMap():
    """
    This is a class to orchestrate W-mapping process. It is not optimal, but it is easy to code and easy to follow.
    Will initialize with the requisite dictionaries containing dataframes of information, as well as covariate dataframes. 
    
    Improvements:
    Hat-matrix based vectorization to extract betas. 
    Apply betas on vectorized basis to extract predictions. 
    vectorized extraction of prediction error standard deviation. 
    betas can be used on the patient data rather than calling the regression. 
    Vectorized standaridization of prediction error. 
    """
    def __init__(self, dataframes_dict: dict, control_dataframes_dict: dict, control_covariates_df: pd.DataFrame, patient_covariates_df: pd.DataFrame, use_intercept: bool=False, mask: bool=True):
        """
        Need to provide the dataframe dictionaries and dataframes of importance. 
        
        Args:
        - dataframes_dict (dict): Dictionary containing patient dataframes.
        - control_dataframes_dict (dict): Dictionary containing control dataframes.
        - control_covariates_df (pd.DataFrame): DataFrame where each column represents represents a control subject,
                                        and each row represents the covariate. 
        - patient_covariates_df (pd.DataFrame): Same as above, but for patients. 
        - use_intercept (bool): If true, model will use an intercept for the GLM, which is atypical. Defaults to False. 
        - mask (bool): If true, will mask the data to conserve memory
        """
        self.dataframes_dict =  dataframes_dict
        self.control_dataframes_dict = control_dataframes_dict
        self.control_covariates_df = control_covariates_df
        self.patient_covariates_df = patient_covariates_df
        self.use_intercept = use_intercept
        self.mask = mask
    
    def threshold_probabilities(self, df: pd.DataFrame, threshold: float=0.2) -> pd.DataFrame:
        """
        This will mask the raw probabilities. 
        Generally, VBM probabilities under 0.2 are masked out.
        
        Will then find all voxels which are nonzero across all dataframes and create a mask from them. 
        Will then return the masked dataframe and the mask for computational speed.
        """
        df = df.where(df > threshold, 0)
        nonzero_mask = df.sum(axis=1) > 0
        return df, nonzero_mask

    def sort_dataframes(self, voxel_df: pd.DataFrame, covariate_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """
        Will sort the rows of the voxelwise DF and the covariate DF to make sure they are identically organized.
        Then will check that the columns are equivalent. 
        """
        # Force Columns to Match
        voxel_cols = set(voxel_df.columns.astype(str).sort_values().values)
        covariate_cols = set(covariate_df.columns.astype(str).sort_values().values)
        shared_columns = list(voxel_cols.intersection(covariate_cols))
        # This will occur when columns have strange naming, such as subject 1 being 0001 verus 1. 
        if len(shared_columns) == 0:
            voxel_cols = voxel_df.columns.astype(int).astype(str).sort_values().values
            covariate_cols = covariate_df.columns.astype(int).astype(str).sort_values().values
            
            voxel_df.columns = voxel_cols
            covariate_df.columns = covariate_cols
            
            shared_columns = list(set(voxel_cols).intersection(set(covariate_cols)))
            
        return voxel_df.loc[:, shared_columns], covariate_df.loc[:, shared_columns]
    
    def mask_dataframe(self, control_df: pd.DataFrame, patient_df: pd.DataFrame, threshold: float=0.2):
        """
        Simple masking function.
        """
        # Now you can use the function to apply a threshold to patient_df and control_df
        patient_df, _ = self.threshold_probabilities(patient_df, threshold)
        control_df, nonzero_mask = self.threshold_probabilities(control_df, threshold)
        
        whole_mask = control_df.index
        masked_patient_df = patient_df.loc[nonzero_mask, :]
        masked_control_df = control_df.loc[nonzero_mask, :]
        return whole_mask, nonzero_mask, masked_patient_df, masked_control_df
    
    def unmask_dataframe(self, whole_mask: pd.Index, nonzero_mask:pd.Index, patient_df: pd.DataFrame, patient_w_scores:pd.DataFrame):
        """
        Simple unmasking function.
        """
        unmasked_w_score = pd.DataFrame(index=whole_mask, columns=patient_df.columns, data=0)
        unmasked_w_score.loc[nonzero_mask, :] = patient_w_scores.loc[nonzero_mask, :]
        return unmasked_w_score
    
    def calculate_w_scores_vectorized(self, control_df: pd.DataFrame, patient_df: pd.DataFrame, debug: bool=True) -> pd.DataFrame:
        """
        Calculate voxelwise W-scores in a vectorized manner using linear regression.
        This applies a single linear regression across the entire dataset, resulting in inherent smoothing. 
        It is STRONGLY advised to set mask=True when running this.

        This function performs a linear regression using sklearn's LinearRegression, fitted on control data
        and applied to both control and patient data. The regression is done once across all voxels simultaneously,
        treating each voxel's values across subjects as independent responses. This vectorized approach
        efficiently handles the calculations by leveraging matrix operations, which are computationally
        optimized in libraries like numpy and sklearn.

        Args:
            control_df (pd.DataFrame): DataFrame where each column represents a control subject,
                                    and each row represents flattened image data for a voxel.
            patient_df (pd.DataFrame): DataFrame where each column represents a patient,
                                    and each row represents flattened image data for a voxel.
            debu (bool): if true, prints out summary metrics

        Returns:
            pd.DataFrame: A DataFrame of the same shape as patient_df, containing the W-scores for each voxel.

        Explanation of Process:
            1. **Setup Response Variables:** Both control and patient data are transposed to shape subjects as rows
            and voxels as columns, facilitating simultaneous regression across all voxels.
            2. **Design Matrix:** A common design matrix is created from sorted_control_covariate_df, which is used
            for fitting the model. This matrix contains the predictors (covariates) for each subject.
            3. **Fit the Model:** The LinearRegression model is fitted using the control data. The model is designed
            to handle multiple response variables (voxels) simultaneously, which are treated as independent.
            This method ensures that the relationship modeled in the voxelwise approach is maintained across
            all voxels simultaneously, mirroring the structure where each voxel is analyzed independently but more efficiently.
            4. **Prediction and Error Calculation:** The model predicts both control and patient data. Residuals
            are computed for control predictions to determine the variability unexplained by the model across voxels.
            5. **Compute W-scores:** W-scores are calculated by dividing the prediction errors (patient data minus
            their predictions) by the voxel-specific residual standard deviation, providing a normalized measure
            of deviation for each patient voxel relative to the control model.
        """
        # Optional masking for memory consveration
        if self.mask:
            whole_mask, nonzero_mask, patient_df, control_df = self.mask_dataframe(control_df, patient_df)
        
        # Design matrix X for control group, outcomes Y for control group
        X_control = self.sorted_control_covariate_df.T 
        Y_control = control_df.T.values
        
        # Fit model on control data across all voxels
        control_model = LinearRegression(fit_intercept=self.use_intercept)
        control_model.fit(X_control, Y_control)

        # Design matrix X for experimental group, outcomes Y for experimental group
        X_patient = self.sorted_patient_covariate_df.T
        Y_patient = patient_df.T.values
        
        # Predict on experimental group and calculate errors
        PREDICTION = control_model.predict(X_patient)
        RESIDUALS = Y_patient - PREDICTION
        RSS = np.sum(RESIDUALS**2, axis=0)
        DF = Y_patient.shape[0] - X_patient.shape[1] - int(self.use_intercept)
        RSD = np.sqrt(RSS / DF)

        # Compute W-scores for patient data
        w_scores = RESIDUALS / RSD
        if debug:
            print(X_patient.shape, Y_patient.shape, PREDICTION.shape, RESIDUALS.shape, RSS.shape, RSD.shape, w_scores.shape)

        # Reshape W-scores to DataFrame format
        w_scores_df = pd.DataFrame(w_scores.T, index=control_df.index, columns=patient_df.columns)
        if self.mask:
            w_scores_df = self.unmask_dataframe(whole_mask, nonzero_mask, patient_df, w_scores_df)
        
        return w_scores_df


    def calculate_w_scores(self, control_df: pd.DataFrame, patient_df: pd.DataFrame) -> pd.DataFrame:
        """
        Function to calculate voxelwise W-map.
        1) This will first perform a regression on the control voxel to identify the standard deviation of the error. 
        2) Then this will use the first model to predict the value of the experimental voxels.
        3) Then this will divide the prediction error by the residual standard deviation, giving the W-score.
        4) This will then iterate over all voxels until a W-map is complete. 
        
        This is a slow function, but it is easy to code. 
        
        Args: 
        control_df (pd.DataFrame): DataFrame where each column represents a control subject, 
                                and each row represents flattened image data for a voxel.
        patient_df (pd.DataFrame): DataFrame where each column represents a patient, 
                                and each row represents flattened image data for a voxel.
                                
        Note:
        The covariates_df MUST have the same subject ID in for column names as the dataframe with the voxels
        """
        if self.mask:
            whole_mask, nonzero_mask, patient_df, control_df = self.mask_dataframe(control_df, patient_df)
        patient_w_scores = pd.DataFrame(index=patient_df.index, columns=patient_df.columns)

        for voxel in tqdm(control_df.index, desc='Fitting voxelwise model'):
            ## CONTROL FIT
            # Set predictors to shape (samples, regressors)
            X_control = self.sorted_control_covariate_df.T  

            # Set observations to shape (samples, 1)
            y_control = control_df.loc[voxel, :].values.reshape(-1, 1)

            # Fit linear regression to control data
            control_model = LinearRegression(fit_intercept=self.use_intercept)
            control_model.fit(X=X_control, y=y_control)
            
            
            ## EXPERIMENTAL FIT
            # Predict on patient data
            X_patient = self.sorted_patient_covariate_df.T  # Transpose to match orientation as above.
            Y_patient = patient_df.loc[voxel, :].values.reshape(-1, 1)
            Yi_patient = control_model.predict(X_patient)
            
            # Derive Mean Squared Error 
            RESIDUALS = Y_patient - Yi_patient
            SSE = np.sum(RESIDUALS**2)
            
            # Derive Adjusted Degrees of Freedom (DF = n - p)
            DF = Y_patient.shape[0] - X_patient.shape[1] - int(self.use_intercept)
            
            # Derive Residual Standard Deviation (Root(Sum Squared Errors / Adjusted Degrees of Freedom)) AKA Root(MSE)
            RSD = np.sqrt(SSE/DF)
            
            # Calculate W-scores
            patient_w_scores.loc[voxel, :] = RESIDUALS.flatten() / RSD
            
        # Unmask W-scores
        if self.mask:
            patient_w_scores = self.unmask_dataframe(whole_mask, nonzero_mask, patient_df, patient_w_scores)
        return patient_w_scores
            
    def process_atrophy_dataframes(self, dataframes_dict, control_dataframes_dict, vectorize=True):
        """
        Processes the provided dataframes to calculate z-scores and determine significant atrophy.

        Parameters:
        - dataframes_dict (dict): Dictionary containing patient dataframes.
        - control_dataframes_dict (dict): Dictionary containing control dataframes.
        - vector (bool): If set to false, will consider the statistical distribution of each voxel independently.
            If set to true, will 

        Returns:
        - tuple: A tuple containing two dictionaries - atrophy_dataframes_dict and significant_atrophy_dataframes_dict.
        """
        
        atrophy_dataframes_dict = {}
        significant_atrophy_dataframes_dict = {}

        for k in dataframes_dict.keys():
            # Make sure the covariates line up with the voxels
            self.sorted_control_voxel_df, self.sorted_control_covariate_df = self.sort_dataframes(control_dataframes_dict[k], self.control_covariates_df)
            self.sorted_patient_voxel_df, self.sorted_patient_covariate_df = self.sort_dataframes(dataframes_dict[k], self.patient_covariates_df)
            
            # Submit
            if vectorize:
                atrophy_dataframes_dict[k] = self.calculate_w_scores_vectorized(control_df=self.sorted_control_voxel_df, patient_df=self.sorted_patient_voxel_df)
            else:
                atrophy_dataframes_dict[k] = self.calculate_w_scores(control_df=self.sorted_control_voxel_df, patient_df=self.sorted_patient_voxel_df)
            
            # Threshold
            if k == 'cerebrospinal_fluid':
                significant_atrophy_dataframes_dict[k] = atrophy_dataframes_dict[k].where(atrophy_dataframes_dict[k] > 2, 0)
            else:
                significant_atrophy_dataframes_dict[k] = atrophy_dataframes_dict[k].where(atrophy_dataframes_dict[k] < -2, 0)
            print('Dataframe: ', k)
            display(dataframes_dict[k])
            print('------------- \n')
        
        return atrophy_dataframes_dict, significant_atrophy_dataframes_dict
    
    def run(self):
        """
        Orchestration method. 
        """
        atrophy_dataframes_dict, significant_atrophy_dataframes_dict = self.process_atrophy_dataframes(self.dataframes_dict, self.control_dataframes_dict)
        return atrophy_dataframes_dict, significant_atrophy_dataframes_dict


In [28]:
wmapper = CalvinWMap(dataframes_dict=dataframes_dict, control_dataframes_dict=control_dataframes_dict, control_covariates_df=control_covariates_df, patient_covariates_df=patient_covariates_df)
unthresholded_atrophy_dataframes_dict, significant_atrophy_dataframes_dict = wmapper.run()

(49, 3) (49, 223612) (49, 223612) (49, 223612) (223612,) (223612,) (49, 223612)
Dataframe:  grey_matter


Unnamed: 0,1,10,11,12,13,14,15,16,17,18,...,46,47,48,49,5,50,6,7,8,9
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

(49, 3) (49, 173121) (49, 173121) (49, 173121) (173121,) (173121,) (49, 173121)
Dataframe:  white_matter


Unnamed: 0,1,10,11,12,13,14,15,16,17,18,...,46,47,48,49,5,50,6,7,8,9
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 

(49, 3) (49, 216095) (49, 216095) (49, 216095) (216095,) (216095,) (49, 216095)
Dataframe:  cerebrospinal_fluid


Unnamed: 0,1,10,11,12,13,14,15,16,17,18,...,46,47,48,49,5,50,6,7,8,9
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


------------- 



**Derive Significant Atrophy Map**

In [19]:
import pandas as pd

def finalize_atrophy_dataframes(dataframes_dict):
    """
    Summates the absolute values of DataFrames within a dictionary 
    and adds the summation as a new key-value pair with the key 'composite'.
    
    Parameters:
    - dataframes_dict (dict): A dictionary containing DataFrames.
    
    Returns:
    - dict: The input dictionary updated with the 'composite' key representing the summation of absolute values.
    
    Example:
    >>> dfs = {
    ...     'a': pd.DataFrame({'col1': [-1, 2], 'col2': [3, -4]}),
    ...     'b': pd.DataFrame({'col1': [5, -6], 'col2': [-7, 8]})
    ... }
    >>> summed_dfs = summate_absolute_dataframes(dfs)
    >>> print(summed_dfs['composite'])
       col1  col2
    0     6    10
    1     8    12
    """
    
    # Create an empty DataFrame to store the summation of absolute values
    composite_df = pd.DataFrame()
    for k in dataframes_dict.keys():
        abs_df = dataframes_dict[k].abs() # Take the absolute value of the DataFrame

        if composite_df.empty:  # If the composite_df is still empty, initialize it with the first absolute DataFrame
            composite_df = dataframes_dict[k].abs().copy()
        else:
            composite_df += abs_df  # Otherwise, add the absolute values to the composite DataFrame
    
    # Add the composite DataFrame to the dictionary with key 'composite'
    dataframes_dict['composite'] = composite_df
    
    return dataframes_dict


In [20]:
thresholded_atrophy_dataframes_dict = finalize_atrophy_dataframes(significant_atrophy_dataframes_dict)

**Save the Atrophy Results**

Save Raw W-Scores

In [21]:
import os
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti #<-----CAlVIN IMPORT
from tqdm import tqdm

def save_nifti_to_bids(dataframes_dict, bids_base_dir, analysis='tissue_segment_w_scores', ses=None, dry_run=True):
    """
    Saves NIFTI images to a BIDS directory structure.
    
    Parameters:
    - dataframes_dict (dict): Dictionary containing dataframes with NIFTI data.
    - bids_base_dir (str): The base directory where the BIDS structure starts.
    - ses (str, optional): Session identifier. If None, defaults to '01'.
    
    Note:
    This function assumes a predefined BIDS directory structure and saves the NIFTI 
    images accordingly. The function currently has the view_and_save_nifti call commented out 
    for safety. Uncomment this call if you wish to actually save the NIFTI images.
    
    Example:
    >>> dfs = { ... }  # some dictionary with dataframes
    >>> save_nifti_to_bids(dfs, '/path/to/base/dir')
    """
    
    for k in tqdm(dataframes_dict.keys()):
        for col in dataframes_dict[k].columns:
            # Define BIDS Directory Architecture
            sub_no = col
            if ses is None:
                ses_no = '01'
            else:
                ses_no = ses
            
            # Define and Initialize the Save Directory
            out_dir = os.path.join(bids_base_dir, f'sub-{sub_no}', f'ses-{ses_no}', analysis)
            os.makedirs(out_dir, exist_ok=True)
            
            # Save Image to BIDS Directory
            if dry_run:
                print(out_dir+f'/sub-{sub_no}_{k}')
            else:
                if col == dataframes_dict[k].columns[-1]:
                    silent=False
                else:
                    silent=True
                
                view_and_save_nifti(matrix=dataframes_dict[k][col].fillna(0),
                                    out_dir=out_dir,
                                    output_name=(f'sub-{sub_no}_{k}'),
                                    silent=silent)


# Save the W-Scored Maps

Unthresholded Maps

In [22]:
base_directory='/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm'

In [23]:
save_nifti_to_bids(unthresholded_atrophy_dataframes_dict, bids_base_dir=base_directory, analysis='tissue_segment_w_scores', dry_run=False);

  0%|          | 0/3 [00:00<?, ?it/s]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/tissue_segment_w_scores


 33%|███▎      | 1/3 [00:11<00:23, 11.68s/it]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/tissue_segment_w_scores


 67%|██████▋   | 2/3 [00:21<00:10, 10.80s/it]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/tissue_segment_w_scores


100%|██████████| 3/3 [00:32<00:00, 10.69s/it]


Thresholded Maps - The 'Real' Atrophy


In [24]:
save_nifti_to_bids(thresholded_atrophy_dataframes_dict, bids_base_dir=base_directory, analysis='thresholded_tissue_segment_w_scores', dry_run=False);

  0%|          | 0/4 [00:00<?, ?it/s]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/thresholded_tissue_segment_w_scores


 25%|██▌       | 1/4 [00:09<00:27,  9.24s/it]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/thresholded_tissue_segment_w_scores


 50%|█████     | 2/4 [00:18<00:18,  9.25s/it]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/thresholded_tissue_segment_w_scores


 75%|███████▌  | 3/4 [00:27<00:09,  9.26s/it]

Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/sub-4/ses-01/thresholded_tissue_segment_w_scores


100%|██████████| 4/4 [00:37<00:00,  9.26s/it]


All Done. Enjoy your atrophy seeds.

--Calvin