# Get Files and Put Them in a CSV
- First, we will find the NIfTI files of interest and put their paths in a DataFrame.
- Second, you have two options:
    - 1) Save the NIfTI paths to a CSV file on their own.
    - 2) Add the NIfTI paths to an existing CSV and save the combined file.

# 01 - Find Files From Paths

**Search for the Files**
_______
Formatting the Directory-Pattern Dictionary

The function glob_multiple_file_paths expects a dictionary in which each key is a root directory where the search starts and each value is the file pattern to match within that directory.

Example Dictionary Format:

>dir_pattern_dict = {
>    '/path/to/first/root_dir': '*.nii',
>
>    '/path/to/second/root_dir': '*.nii.gz',
>
>    '/another/path': '*_label.nii',
>    # Add more key-value pairs as needed
>}
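The internals of glob_multiple_file_paths are not shown in this notebook, so the following is only a minimal sketch of how a directory-pattern dictionary could be consumed with Python's standard glob module. The helper name collect_paths and the temporary directory layout are hypothetical and are not part of calvin_utils.

```python
import glob
import os
import tempfile


def collect_paths(dir_pattern_dict):
    """Hypothetical stand-in: glob each pattern under its root directory."""
    paths = []
    for root_dir, pattern in dir_pattern_dict.items():
        # recursive=True lets a '**' component descend into subdirectories
        matches = glob.glob(os.path.join(root_dir, pattern), recursive=True)
        paths.extend(sorted(matches))
    return paths


# Build a throwaway directory tree so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    conn_dir = os.path.join(tmp, "sub-01", "conn")
    os.makedirs(conn_dir)
    for name in ("a.nii.gz", "notes.txt"):
        open(os.path.join(conn_dir, name), "w").close()

    found = collect_paths({tmp: "sub-*/conn/*.nii.gz"})
    found_names = [os.path.basename(p) for p in found]

print(found_names)
```

Only the file that matches the pattern is returned; the text file in the same folder is ignored. The resulting list could be wrapped in a DataFrame with a 'paths' column to mirror what the notebook displays.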

Using Wildcards:

The file patterns can include wildcards to match multiple files:
- *: Matches zero or more characters
    - *.nii will match all files ending with .nii
- **: Matches any number of nested directories, so the search becomes recursive (Python's glob module only treats ** this way when called with recursive=True)
- ?: Matches any single character
    - file?.nii will match file1.nii, file2.nii, etc.
- [seq]: Matches any single character in seq
    - file[1-3].nii will match file1.nii, file2.nii, file3.nii
- [!seq]: Matches any single character NOT in seq
    - file[!1-3].nii will match any file with a single character other than 1, 2, or 3 in that position, like file4.nii, file5.nii, etc.

Feel free to combine these wildcards to create complex file patterns. For example, *_??.nii will match files like file_01.nii, file_02.nii, etc.
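The wildcard rules above can be checked without touching the file system. This is a small sketch using the standard library's fnmatch module, which applies the same *, ?, [seq] and [!seq] rules to plain strings; the file names are made up for illustration. Note that fnmatch does not give ** any special meaning, so recursive patterns still need glob.

```python
from fnmatch import fnmatchcase

# Hypothetical file names used only to exercise the patterns
names = ["file1.nii", "file4.nii", "file_01.nii", "file_1.nii", "scan.nii.gz"]


def matching(pattern):
    """Return the example names that match a wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


for pattern in ("*.nii", "file?.nii", "file[1-3].nii", "file[!1-3].nii", "*_??.nii"):
    print(pattern, "->", matching(pattern))
```

Testing a pattern this way before running a large search is a cheap way to catch cases such as *.nii silently skipping .nii.gz files.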

Where to Save to

In [8]:
out_dir = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/datasets/Jung_TMS_AD'
filename = 'master_list_lesions.csv'

**Non-Cognitive Controls. Pending**
dir_pattern_dict = {
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/grafman/derivatives/network_maps/grafman_noncognitive_controls': '**/*.nii*',
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/kletenik_ms/derivatives/symptom_maps': '**/*CONTRAST*.nii',
    '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/corbetta/derivatives/symptom_networks/noncognitive_controls/r_map': '**/*nii',
    
}

In [9]:
# Define the dictionary with root directories and file patterns
dir_pattern_dict = {
    '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/datasets/Jung_TMS_AD/derivatives': 'sub-*/ses-01/conn/*.nii.gz'}

## Glob the files and check to see if acceptable

In [10]:
save_files = False

In [11]:
import os

from calvin_utils.file_utils.file_path_collector import glob_multiple_file_paths

# Glob every root directory/pattern pair and collect the matches in a DataFrame.
# save_files is False here, so nothing should be written to disk at this step;
# the CSV is written in a later cell once the paths look acceptable.
path_df = glob_multiple_file_paths(dir_pattern_dict, save=save_files, save_path=None)

# Display the DataFrame to check that the expected files were found
display(path_df)

Unnamed: 0,paths
0,/Users/cu135/Partners HealthCare Dropbox/Calvi...
1,/Users/cu135/Partners HealthCare Dropbox/Calvi...
2,/Users/cu135/Partners HealthCare Dropbox/Calvi...
3,/Users/cu135/Partners HealthCare Dropbox/Calvi...
4,/Users/cu135/Partners HealthCare Dropbox/Calvi...
5,/Users/cu135/Partners HealthCare Dropbox/Calvi...
6,/Users/cu135/Partners HealthCare Dropbox/Calvi...
7,/Users/cu135/Partners HealthCare Dropbox/Calvi...
8,/Users/cu135/Partners HealthCare Dropbox/Calvi...
9,/Users/cu135/Partners HealthCare Dropbox/Calvi...


In [12]:
print(path_df.iloc[0,0])

/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/datasets/Jung_TMS_AD/derivatives/sub-S38/ses-01/conn/sub-S38-MNI152_T1_2mm-tms_sphere_roi_Precom_T.nii.gz


In [13]:
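import re

# Define the strings that immediately precede and follow the subject ID
preceding = 'sub-'
following = '/ses-01'

# Extract the text between them into a new column 'subject'.
# re.escape keeps characters in the delimiters from being read as regex syntax;
# expand=False returns a Series rather than a one-column DataFrame.
pattern = f'{re.escape(preceding)}(.*?){re.escape(following)}'
path_df['subject'] = path_df['paths'].str.extract(pattern, expand=False)

# Display the updated DataFrame
display(path_df)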

Unnamed: 0,paths,subject
0,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S38
1,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S31
2,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S09
3,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S36
4,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S37
5,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S06
6,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S46
7,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S48
8,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S24
9,/Users/cu135/Partners HealthCare Dropbox/Calvi...,S12
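Because 'sub-' appears twice in each path (once in the folder name and once in the file name), it is worth confirming which occurrence the pattern captures. The sketch below applies the same 'sub-' and '/ses-01' delimiters to the example path printed earlier, using only the standard re module; the variable names are illustrative.

```python
import re

# Example path copied from the printed output above
example_path = (
    "/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/datasets/"
    "Jung_TMS_AD/derivatives/sub-S38/ses-01/conn/"
    "sub-S38-MNI152_T1_2mm-tms_sphere_roi_Precom_T.nii.gz"
)

# Lazy capture of whatever sits between 'sub-' and '/ses-01'
pattern = re.escape("sub-") + "(.*?)" + re.escape("/ses-01")
match = re.search(pattern, example_path)
subject = match.group(1) if match else None

print(subject)
```

The search stops at the first match, which is the folder-level 'sub-S38/ses-01', so the file-name prefix does not interfere. A path with no match would give None here, and a missing value in the DataFrame column.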


In [14]:
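import os
# Create the output directory if it does not exist, then write the paths to CSV.
# index=False keeps the row index from being written as an extra unnamed column.
os.makedirs(out_dir, exist_ok=True)
path_df.to_csv(os.path.join(out_dir, filename), index=False)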

# 02 - Import Another CSV and Add the Paths to It
**The CSV is expected to be in this format**
- The ID column and the absolute paths to the NIfTI files are critical.
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```

In [17]:
# spreadsheet_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/collaborations/hart_epilepsy_meta_analysis/master_list.csv'
# sheet = None #If using Excel, enter a string here

In [18]:
# from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# # Instantiate the CalvinStatsmodelsPalm class
# cal_palm = CalvinStatsmodelsPalm(input_csv_path=spreadsheet_path, output_dir=os.path.dirname(spreadsheet_path), sheet=sheet)
# # Read the spreadsheet into a DataFrame and display it
# data_df = cal_palm.read_and_display_data()

# 03 - Save The New CSV

In [19]:
# data_df['nifti_paths'] = path_df['paths']
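Assigning one column to another DataFrame lines rows up by index position, which only works if both tables list subjects in the same order. A safer alternative, sketched here under the assumption that the spreadsheet's ID column holds the same labels as the extracted 'subject' column, is to merge on that key. The two small DataFrames and their values are made-up stand-ins for path_df and data_df.

```python
import pandas as pd

# Hypothetical stand-ins for path_df and data_df
path_df = pd.DataFrame({
    "paths": ["/path/to/sub-S38.nii.gz", "/path/to/sub-S31.nii.gz"],
    "subject": ["S38", "S31"],
})
data_df = pd.DataFrame({
    "ID": ["S31", "S38", "S09"],
    "Covariate_1": [0.5, 0.7, 0.6],
})

# Left-merge so every spreadsheet row is kept, matched on subject label
merged_df = data_df.merge(
    path_df.rename(columns={"paths": "nifti_paths"}),
    how="left",
    left_on="ID",
    right_on="subject",
).drop(columns="subject")

# Subjects in the spreadsheet that have no NIfTI file
missing = merged_df.loc[merged_df["nifti_paths"].isna(), "ID"].tolist()
print(missing)

# To save, something like: merged_df.to_csv(os.path.join(out_dir, filename), index=False)
```

Listing the unmatched IDs before saving makes it obvious when a subject label is formatted differently in the two sources.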

Hope this was helpful

--Calvin