# Get Files and Put them in a CSV
- First, we will find the nifti files of interest and put their paths in a dataframe.
- Second, choose one of two options.
    - 1) Save the nifti paths to a CSV file on their own.
    - 2) Add the nifti paths to another CSV and save the combined file.

# 01 Option A (Preferred) - Find Files From Paths

**Search for the Files**
_______
**Formatting the Directory-Pattern Dictionary**

The function glob_multiple_file_paths expects a dictionary in which each key-value pair corresponds to a root directory and a file pattern to search for within that directory. The keys are the root directories where the search starts, and the values are the file patterns to match.

Example Dictionary Format:

>dir_pattern_dict = {
>    '/path/to/first/root_dir': '*.nii',
>
>    '/path/to/second/root_dir': '*.nii.gz',
>
>    '/another/path': '*_label.nii'
>     # Add more key-value pairs as needed
>}
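
To make the dictionary format concrete, here is a minimal sketch of how each root directory and pattern pair could be turned into a glob call. The helper collect_paths is hypothetical and only stands in for glob_multiple_file_paths, whose internals are not shown in this notebook. It assumes the pattern is joined onto the root directory and globbed with recursive=True.

```python
import glob
import os
import tempfile


def collect_paths(dir_pattern_dict):
    """Hypothetical stand-in: glob each (root directory, pattern) pair."""
    matches = []
    for root_dir, pattern in dir_pattern_dict.items():
        # recursive=True lets a '**' component span nested directories
        matches.extend(glob.glob(os.path.join(root_dir, pattern), recursive=True))
    return sorted(matches)


# Build a small throwaway directory tree so the example runs anywhere
tmp_root = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp_root, "sub-01", "anat"))
for rel_path in ("top.nii", "notes.txt", os.path.join("sub-01", "anat", "deep.nii.gz")):
    open(os.path.join(tmp_root, rel_path), "w").close()

flat = collect_paths({tmp_root: "*.nii"})        # top level of the root only
nested = collect_paths({tmp_root: "**/*.nii*"})  # the root and every subdirectory
print([os.path.relpath(p, tmp_root) for p in flat])
print([os.path.relpath(p, tmp_root) for p in nested])
```

In this sketch the first pattern finds only top.nii, while the second also finds the compressed file two directories down.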

Using Wildcards:

The file patterns can include wildcards to match multiple files:
- *: Matches zero or more characters
    - *.nii will match all files ending with .nii
- **: Matches any number of nested directories, so the search is recursive
- ?: Matches any single character
    - file?.nii will match file1.nii, file2.nii, etc.
- [seq]: Matches any single character in seq
    - file[1-3].nii will match file1.nii, file2.nii, file3.nii
- [!seq]: Matches any single character NOT in seq
    - file[!1-3].nii will match any file with exactly one character other than 1, 2, or 3 in that position, like file4.nii, file5.nii, etc.

Feel free to combine these wildcards to create complex file patterns. For example, *_??.nii will match files like file_01.nii, file_02.nii, etc.
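
A quick way to check a pattern before running a search is to test it against a few file names. The sketch below uses fnmatchcase from Python's standard library, which applies the same single-name wildcard rules as glob. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Made-up file names for illustration
names = ["file1.nii", "file4.nii", "file10.nii", "file_01.nii", "file_1.nii"]


def matching(pattern):
    """Return the names from the list above that match the pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("file?.nii"))       # exactly one character after 'file'
print(matching("file[1-3].nii"))   # one character, and it must be 1, 2, or 3
print(matching("file[!1-3].nii"))  # one character, and it must not be 1, 2, or 3
print(matching("*_??.nii"))        # underscore followed by exactly two characters
```

Note that ?, [seq], and [!seq] each consume exactly one character, so file10.nii matches none of the first three patterns.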

**Non-Cognitive Controls. Pending**
dir_pattern_dict = {
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/grafman/derivatives/network_maps/grafman_noncognitive_controls': '**/*.nii*',
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/kletenik_ms/derivatives/symptom_maps': '**/*CONTRAST*.nii',
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/corbetta/derivatives/symptom_networks/noncognitive_controls/r_map': '**/*nii',
    
}

In [None]:
# Define the dictionary with root directories and file patterns
dir_pattern_dict = {
    '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds': '*/*/unthresholded_tissue_segment_z_scores/*grey*no*',
}

## Glob the files and check to see if acceptable

In [None]:
save_files = False

In [6]:
from calvin_utils.file_utils.file_path_collector import glob_multiple_file_paths
import os

# Glob every root directory/pattern pair and collect the matches in a DataFrame.
# save_files is False above, so no save_path is supplied here.
path_df = glob_multiple_file_paths(dir_pattern_dict, save=save_files, save_path=None)

# Display the DataFrame to check that the expected files were found
display(path_df)

Unnamed: 0,paths
0,/Users/cu135/Dropbox (Partners HealthCare)/stu...
1,/Users/cu135/Dropbox (Partners HealthCare)/stu...
2,/Users/cu135/Dropbox (Partners HealthCare)/stu...
3,/Users/cu135/Dropbox (Partners HealthCare)/stu...
4,/Users/cu135/Dropbox (Partners HealthCare)/stu...
5,/Users/cu135/Dropbox (Partners HealthCare)/stu...
6,/Users/cu135/Dropbox (Partners HealthCare)/stu...
7,/Users/cu135/Dropbox (Partners HealthCare)/stu...
8,/Users/cu135/Dropbox (Partners HealthCare)/stu...
9,/Users/cu135/Dropbox (Partners HealthCare)/stu...


# 01 Option B - Import a Spreadsheet and Get the Files From It
**The CSV is expected to be in this format**
- ID and absolute paths to niftis are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```

In [None]:
spreadsheet_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/paths_and_covariates/merged_dataframe.csv'
sheet = None

In [None]:
import os
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=spreadsheet_path, output_dir=os.path.dirname(spreadsheet_path), sheet=sheet)
# Read the spreadsheet into a DataFrame and display it
path_df = cal_palm.read_and_display_data()

In [None]:
import os
import pandas as pd
from glob import glob

def iterate_fill_file_path_in_df(dataframe: pd.DataFrame, target_dict: dict) -> pd.DataFrame:
    """Fill every path column described in target_dict by calling fill_file_path_in_df."""
    # Each key is a subject ID column (e.g., 'blinded_id', 'PTID')
    for subject_col, path_configs in target_dict.items():
        # Each value is a list of single-entry dictionaries: {path_col: {root_dir: target_name}}
        for path_config in path_configs:
            for path_col, path_info in path_config.items():
                for root_dir, target_name in path_info.items():
                    # Fill one path column for every subject
                    dataframe = fill_file_path_in_df(dataframe, subject_col, path_col, root_dir, target_name)
    return dataframe


def fill_file_path_in_df(dataframe: pd.DataFrame, sub_id_col: str, path_col: str, root_directory: str, target_name: str, debug: bool = False) -> pd.DataFrame:
    """
    Iterate over each subject in the dataframe, substitute the subject ID into the glob target, and write the first matching path into path_col.
    target_name is expected to be a wildcarded glob pattern in which '<sub_id>' marks where the subject ID from sub_id_col is inserted.
    If nothing matches, the cell is set to None.
    """
    # Iterate over each row; only the index is needed
    for index, _ in dataframe.iterrows():
        # Get the subject ID for that row
        subject = str(dataframe.loc[index, sub_id_col])
        newname = target_name.replace("<sub_id>", subject)
        glob_target = os.path.join(root_directory, newname)
        globbed_path = glob(glob_target)
        if debug:
            print("target_name: ", newname)
            print("I will check: ", glob_target)
            print("I found: ", globbed_path)

        # Keep the first match; leave the cell empty when nothing was found
        dataframe.loc[index, path_col] = globbed_path[0] if globbed_path else None
    return dataframe
    

Takes a Dict of Targets

```
target_dict = {subject_col: [{path_col1: {root: target_name}},
                             {path_col2: {root: target_name}},
                             etc]}
```
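
The '<sub_id>' placeholder is easiest to understand on a small example. The sketch below uses only the standard library and a throwaway directory. The helper resolve_subject_path is hypothetical and mirrors what fill_file_path_in_df does for one row, with one difference: it sorts the matches before taking the first one so that the result is repeatable. The file names and subject IDs are made up.

```python
import glob
import os
import tempfile


def resolve_subject_path(root_directory, target_name, subject):
    """Hypothetical helper: substitute the subject ID, glob, return the first match or None."""
    pattern = os.path.join(root_directory, target_name.replace("<sub_id>", str(subject)))
    found = sorted(glob.glob(pattern))  # sorted here for repeatability (an addition in this sketch)
    return found[0] if found else None


# Build a throwaway directory tree with one made-up file per subject
tmp_root = tempfile.mkdtemp()
for sub in ("101", "102"):
    os.makedirs(os.path.join(tmp_root, f"sub-{sub}", "vol"))
    open(os.path.join(tmp_root, f"sub-{sub}", "vol", f"{sub}_MNI152_grey.nii.gz"), "w").close()

target_name = "*/vol/<sub_id>*MNI152*.nii.gz"
# Subject 999 has no file on disk, so it should resolve to None
resolved = {sub: resolve_subject_path(tmp_root, target_name, sub) for sub in (101, 102, 999)}
for sub, path in resolved.items():
    print(sub, "->", path)
```

A subject with no matching file resolves to None, which is a convenient way to spot missing data after the dataframe has been filled.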

In [None]:
target_dict ={'blinded_id': [
                            {'z6_csf_paths': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds':'*/*/unthresholded_tissue_segment_z_scores/*00*<sub_id>*cerebro*no*'}},
                            {'z6_wm_paths': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds':'*/*/unthresholded_tissue_segment_z_scores/*00*<sub_id>*white*no*'}},
                            {'z6_gm_paths': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds':'*/*/unthresholded_tissue_segment_z_scores/*00*<sub_id>*grey*no*'}},
                            {'z6_ct_paths': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds':'*/*/unthresholded_tissue_segment_z_scores/*00*<sub_id>*ct*no*'}},
                            {'w6_csf_path': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/with_intercept': '*/*/tissue_segment_w_scores/sub-<sub_id>*cerebro*'}},
                            {'w6_wm_path': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/with_intercept':'*/*/tissue_segment_w_scores/sub-<sub_id>*white*'}},
                            {'w6_gm_path': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/vbm/with_intercept':'*/*/tissue_segment_w_scores/sub-<sub_id>*grey*'}}
                            ],
              'PTID': [
                            {'w6_ct_path': {'/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/wmaps/sbm/10mm_covariates_untrhesholded':'*/vol/<sub_id>*MNI152*.nii.gz'}}
              ]
              }

In [None]:
path_df = iterate_fill_file_path_in_df(path_df, target_dict)
path_df

# 02 Option B - Import Another CSV and Add the Paths to It
**The CSV is expected to be in this format**
- ID and absolute paths to niftis are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```

In [7]:
spreadsheet_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/paths_and_covariates/merged_dataframe.csv'
sheet = None  # If using an Excel file, enter the sheet name as a string here

In [None]:
import os
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=spreadsheet_path, output_dir=os.path.dirname(spreadsheet_path), sheet=sheet)
# Read the spreadsheet into a DataFrame and display it
data_df = cal_palm.read_and_display_data()

What Should the Column Be Called?

In [None]:
column_name = 'File_Paths'

In [None]:
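# Assignment aligns on the DataFrame index, so the rows of path_df and data_df must be in the same order
data_df[column_name] = path_df['paths']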

Save Results

In [None]:
data_df.to_csv('/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/paths_and_covariates/master_metadata_list.csv')

Hope this was helpful

--Calvin