## Narrowing the data based on MRI sequence type
Previously I sorted the data into 4 cohorts (see "mri_data_sort_to_cohorts.ipynb"). <br>
Now I want to keep only the relevant sequence types: MPRAGE and FSPGR. <br>
<br>
The 4 cohorts contain 3 different folder naming conventions. Here is one example of each: <br>
1) 1018_NACC282203_20170908ni <br>
2) mri129ni<br>
3) NACC497363_128401136192134176253428319601354034337135ni<br> 

All folders of type 1) contain MPRAGE scans, all folders of type 2) contain FSPGR scans, and folders of type 3) contain either FSPGR or MPRAGE scans.
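
The three conventions can be told apart with regular expressions. The sketch below is a hypothetical helper (`NAME_PATTERNS` and `naming_convention` are not part of the original notebook). Its patterns are inferred from the three example names only, so other folders in the cohorts may need looser patterns.

```python
import re

# Assumed patterns, inferred from the three example folder names only
NAME_PATTERNS = {
    1: re.compile(r'^\d{4}_NACC\d+_\d{8}ni$'),  # e.g. 1018_NACC282203_20170908ni
    2: re.compile(r'^mri\d+ni$'),               # e.g. mri129ni
    3: re.compile(r'^NACC\d+_\d+ni$'),          # e.g. NACC497363_1284...ni
}


def naming_convention(folder_name):
    """Return 1, 2 or 3 for a recognised naming convention, else None."""
    for number, pattern in NAME_PATTERNS.items():
        if pattern.match(folder_name):
            return number
    return None


for name in ['1018_NACC282203_20170908ni', 'mri129ni']:
    print(name, '->', naming_convention(name))
```

Returning `None` for unrecognised names makes it easy to list folders that fit none of the three conventions before anything is deleted.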

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os 
import re
import shutil

### 1) 1018_NACC282203_20170908ni
Directory structure example: <br>
<br>
1018_NACC282203_20170908ni 
- 1018_NACC282203_20170908 
    - Mag_Images_17_1312211075219452552017090815014550782025439000
    - mIP_ImagesSW_19_1312211075219452552017090815014550780625438000
    - MPRAGE_GRAPPA2_6_1312211075219452552017090814552268696468275000
    - Pha_Images_18_1312211075219452552017090815014550782825440000
    - SWI_Images_20_1312211075219452552017090815014550784425442000
    - T2FLAIRSPACENEW_7_1312211075219452552017090814555267209169171000

I will only keep the MPRAGE folder and delete the others.
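
Before touching the disk, the keep/delete decision can be checked on the folder names alone. This is a minimal sketch using the six names from the listing above. `split_by_prefix` is a hypothetical helper, and it assumes that a name starting with `MPRAGE` is enough to identify the folder to keep.

```python
def split_by_prefix(names, keep_prefix='MPRAGE'):
    """Split folder names into (keep, drop) based on a case-sensitive prefix."""
    keep = [name for name in names if name.startswith(keep_prefix)]
    drop = [name for name in names if not name.startswith(keep_prefix)]
    return keep, drop


example_subfolders = [
    'Mag_Images_17_1312211075219452552017090815014550782025439000',
    'mIP_ImagesSW_19_1312211075219452552017090815014550780625438000',
    'MPRAGE_GRAPPA2_6_1312211075219452552017090814552268696468275000',
    'Pha_Images_18_1312211075219452552017090815014550782825440000',
    'SWI_Images_20_1312211075219452552017090815014550784425442000',
    'T2FLAIRSPACENEW_7_1312211075219452552017090814555267209169171000',
]

keep, drop = split_by_prefix(example_subfolders)
print('Keep:', keep)
print('Drop:', drop)
```

`str.startswith` is case-sensitive, so a name such as `mIP_ImagesSW_...` is never mistaken for an MPRAGE folder.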

In [7]:
ncPath = '../../NACC_data/sorted_cohorts/NC/'
mciPath = '../../NACC_data/sorted_cohorts/MCI/'
alzdPath = '../../NACC_data/sorted_cohorts/ALZD/'
transPath = '../../NACC_data/sorted_cohorts/TRANS/'

In [8]:
# Regular expression pattern for folder names starting with 4 digits followed by an underscore
pattern = r'^\d{4}_'

In [9]:
# List to store the matching folders
matching_folders = []

# Iterate over the items in the directory
for item in os.listdir(ncPath):
    item_path = os.path.join(ncPath, item)
    
    # Check if the item is a folder and matches the pattern
    if os.path.isdir(item_path) and re.match(pattern, item):
        matching_folders.append(item_path)

In [10]:
for folder in matching_folders:
    print(folder)

../../NACC_data/sorted_cohorts/NC/1018_NACC282203_20170908ni
../../NACC_data/sorted_cohorts/NC/1018_NACC282203_20201106ni
../../NACC_data/sorted_cohorts/NC/1018_NACC711567_20200114ni
../../NACC_data/sorted_cohorts/NC/1018_NACC711567_20201214ni
../../NACC_data/sorted_cohorts/NC/1018_NACC822475_20171116ni
../../NACC_data/sorted_cohorts/NC/1018_NACC822475_20201119ni


In [11]:
# Prefix of the subfolder to keep (a plain string, not a regular expression)
keep_prefix = 'MPRAGE'

In [12]:
subfs = []

# Loop through each folder in matching_folders
for folder in matching_folders:

    # Get the path to the subfolder (e.g., 1018_NACC282203_20170908)
    subfolder_path = os.path.join(folder, os.listdir(folder)[0])         # only 1 subfolder exists at this level

    # List all subfolders in the subfolder_path
    subfolders = os.listdir(subfolder_path)

    subfs.append(subfolders)

# Display the collected subfolder names
subfs


[['Mag_Images_17_1312211075219452552017090815014550782025439000', 'mIP_ImagesSW_19_1312211075219452552017090815014550780625438000', 'MPRAGE_GRAPPA2_6_1312211075219452552017090814552268696468275000', 'Pha_Images_18_1312211075219452552017090815014550782825440000', 'SWI_Images_20_1312211075219452552017090815014550784425442000', 'T2FLAIRSPACENEW_7_1312211075219452552017090814555267209169171000'], ['Mag_Images_14_1312211075219452552020110611062354047895758000', 'mIP_ImagesSW_16_1312211075219452552020110611062354046095757000', 'MPRAGE_GRAPPA2_6_1312211075219452552020110610432126420639580000', 'Pha_Images_15_1312211075219452552020110611062354048995759000', 'SWI_Images_17_1312211075219452552020110611062354050895761000', 'T2FLAIRSPACENEW_7_1312211075219452552020110610555731873340561000'], ['Mag_Images_14_1312211075219452552020011409280154872195758000', 'mIP_ImagesSW_16_1312211075219452552020011409280154870295757000', 'MPRAGE_GRAPPA2_6_1312211075219452552020011409045885927639580000', 'Pha_Images

In [None]:
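# Loop over every scan folder again so that each path is built from its own parent
for folder in matching_folders:

    # Only 1 subfolder exists at this level (e.g., 1018_NACC282203_20170908)
    subfolder_path = os.path.join(folder, os.listdir(folder)[0])

    for subfolder in os.listdir(subfolder_path):

        subfolder_full_path = os.path.join(subfolder_path, subfolder)

        # Delete every directory whose name does not start with 'MPRAGE'
        if os.path.isdir(subfolder_full_path) and not subfolder.startswith(keep_prefix):
            shutil.rmtree(subfolder_full_path)
            print(f"Deleted: {subfolder_full_path}")

        else:
            print(f"Kept: {subfolder_full_path}")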
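
Since `shutil.rmtree` cannot be undone, it may be worth previewing the deletions first. The sketch below is one possible way to do that, not the notebook's actual procedure. `prune_subfolders` and its `dry_run` flag are hypothetical, and the demonstration runs on a temporary directory with shortened folder names rather than on the real cohort data.

```python
import os
import shutil
import tempfile


def prune_subfolders(parent_dir, keep_prefix='MPRAGE', dry_run=True):
    """Return (kept, removed) directory names found directly under parent_dir.

    Hypothetical helper: with dry_run=True nothing is deleted, the lists only
    show what would happen. Loose files are ignored.
    """
    kept, removed = [], []
    for name in sorted(os.listdir(parent_dir)):
        full_path = os.path.join(parent_dir, name)
        if not os.path.isdir(full_path):
            continue
        if name.startswith(keep_prefix):
            kept.append(name)
        else:
            removed.append(name)
            if not dry_run:
                shutil.rmtree(full_path)
    return kept, removed


# Demonstration on a throwaway directory, not on the real cohort data
with tempfile.TemporaryDirectory() as tmp:
    for name in ['MPRAGE_GRAPPA2_6', 'SWI_Images_20', 'T2FLAIRSPACENEW_7']:
        os.mkdir(os.path.join(tmp, name))

    preview = prune_subfolders(tmp, dry_run=True)    # nothing deleted yet
    remaining_after_preview = sorted(os.listdir(tmp))

    result = prune_subfolders(tmp, dry_run=False)    # now actually delete
    remaining_after_delete = sorted(os.listdir(tmp))

print('Would remove:', preview[1])
print('Left on disk:', remaining_after_delete)
```

Running with `dry_run=True` first and checking the `removed` list gives a chance to catch an unexpected folder name before switching to `dry_run=False`.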