# mPower Data Filtering and Cleaning
The criteria are based on the paper ***"Pre-trained convolutional neural networks identify Parkinson’s disease from spectrogram images of voice samples"*** [https://doi.org/10.1038/s41598-025-92105-6].
## Notebook Objective
The purpose of this notebook is to load the raw mPower participant data, merge the demographic and recording information, and systematically apply a series of filtering criteria to produce a clean dataset ready for analysis. The criteria are designed to isolate a specific cohort of People with Parkinson's Disease (PwPD) and Healthy Controls (HC) based on age, health history, and medication status.
## Summary of Criteria
1. Groups: PwPD (professional diagnosis = TRUE) and HC (professional diagnosis = FALSE).
2. Age: 50–70 years old.
3. Health Exclusions: No self-reported diagnosis of Depression, Anxiety, Schizophrenia, Bipolar disorder, Asthma, Stroke, or COPD.
4. Consistency:
    - PwPD must report taking PD medication.
    - HC must report not taking PD medication.
5. PwPD-Specific Filters:
    - Medication Timing: Only include recordings taken 'Immediately before Parkinson medication' or at 'Another time'. Exclude the "ON" state ('Just after Parkinson medication').
    - DBS: Exclude subjects who report deep brain stimulation (DBS).
6. Final Cohort: One unique, high-quality recording per subject.
## Steps:
### Step 1: Setup and Data Loading
**Reasoning:** We begin by importing the necessary libraries (pandas for data manipulation, os and shutil for file handling, scikit-learn for the train/test split, and tqdm for progress bars) and loading the two separate CSV files. `demographic_info.csv` contains participant-level data such as age and health history, while `info.csv` contains recording-specific data such as medication timing. We need both to apply all the criteria.

In [None]:
import pandas as pd
import os

from sklearn.model_selection import train_test_split

import shutil
from tqdm import tqdm


In [None]:
CONFIG_PATHS_FILE_PATH = r"D:\Projects\Voice\PD\config_directories.csv"
CONFIG_PATHS_FILE = pd.read_csv(CONFIG_PATHS_FILE_PATH)
CONFIG_PATHS_FILE

In [None]:
mPower_DEMO_INFO_FILE = CONFIG_PATHS_FILE[CONFIG_PATHS_FILE['data'] == 'mPower_DEMO_INFO_FILE']
mPower_DEMO_INFO_FILE = mPower_DEMO_INFO_FILE['path'].iloc[0]
mPower_demo_df = pd.read_csv(mPower_DEMO_INFO_FILE)
mPower_demo_df.info()

In [None]:
mPower_AUDIO_INFO_FILE = CONFIG_PATHS_FILE[CONFIG_PATHS_FILE['data'] == 'mPower_AUDIO_INFO_FILE']
mPower_AUDIO_INFO_FILE = mPower_AUDIO_INFO_FILE['path'].iloc[0]
mPower_audio_df = pd.read_csv(mPower_AUDIO_INFO_FILE)
mPower_audio_df.info()
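
The path lookups above repeat the same filter-then-`iloc[0]` pattern. A minimal sketch of a helper that wraps it is shown here, assuming the configuration table has the `data` and `path` columns used above. The function name `get_config_path` and the toy rows are hypothetical and not part of the original notebook.

```python
import pandas as pd


def get_config_path(config_df: pd.DataFrame, key: str) -> str:
    """Return the 'path' value for the row whose 'data' column equals `key`."""
    matches = config_df.loc[config_df["data"] == key, "path"]
    if matches.empty:
        # Fail with a readable message instead of an IndexError from .iloc[0]
        raise KeyError(f"No entry named {key!r} in the configuration table")
    return matches.iloc[0]


# Toy configuration table (hypothetical paths, for illustration only)
toy_config = pd.DataFrame(
    {
        "data": ["mPower_DEMO_INFO_FILE", "mPower_AUDIO_INFO_FILE"],
        "path": ["demo.csv", "audio.csv"],
    }
)
print(get_config_path(toy_config, "mPower_DEMO_INFO_FILE"))
```

A missing key then raises a `KeyError` that names the entry, which is easier to diagnose than an out-of-bounds `iloc` error.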

### Step 2: Merging Datasets
**Reasoning:** The participant information is split across two files, linked by the healthCode identifier. We perform an inner merge to combine these tables. An inner merge ensures that we only keep records for participants who have entries in both the demographic and the recording info tables, which is a fundamental requirement for our analysis.

In [None]:
# Merge the two DataFrames on the common 'healthCode' column
merged_df = pd.merge(
    mPower_demo_df,
    mPower_audio_df,
    on='healthCode',
    how='inner',
    suffixes=('_demographic', '_info')
)

print(f"Data successfully merged. The merged DataFrame has {len(merged_df)} records.")
print("First 5 rows of the merged data:")
merged_df.head()
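
The effect of the inner merge can be checked on a toy example. The sketch below uses made-up tables and adds the `validate` argument of `pd.merge`, which is an optional safeguard and not something the original cell uses. It assumes the demographic table holds at most one row per `healthCode`.

```python
import pandas as pd

# Toy participant-level table: one row per healthCode
demo = pd.DataFrame({"healthCode": ["a", "b", "c"], "age": [55, 62, 68]})

# Toy recording-level table: 'a' has two recordings, 'c' has none,
# and 'd' has no demographic entry
audio = pd.DataFrame(
    {"healthCode": ["a", "a", "b", "d"], "audio_audio.m4a": [101, 102, 103, 104]}
)

merged = pd.merge(
    demo,
    audio,
    on="healthCode",
    how="inner",
    validate="one_to_many",  # raises MergeError if a healthCode repeats in `demo`
)

# Two rows for 'a' and one for 'b'; 'c' and 'd' are dropped by the inner join
print(len(merged))
print(sorted(merged["healthCode"].unique()))
```

Because the merge is one-to-many, the merged table has one row per recording, which is why `healthCode` is expected to repeat in the checks that follow.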

In [None]:
df = merged_df
column_to_check = df['audio_audio.m4a']
print("Column 'audio_audio.m4a': are all values unique?")
are_all_unique = column_to_check.is_unique
total_count = len(column_to_check)
unique_count = column_to_check.nunique()

print(f"Total number of entries: {total_count}")
print(f"Number of unique entries: {unique_count}")
print(f"Conclusion: Are all entries unique? -> {are_all_unique}")

if not are_all_unique:
    print("Reason: The total count is not equal to the unique count, so duplicates exist.")
print("-" * 50)

In [None]:
df = merged_df
column_to_check = df['healthCode']
print("Column 'healthCode': are all values unique?")
are_all_unique = column_to_check.is_unique
total_count = len(column_to_check)
unique_count = column_to_check.nunique()

print(f"Total number of entries: {total_count}")
print(f"Number of unique entries: {unique_count}")
print(f"Conclusion: Are all entries unique? -> {are_all_unique}")

if not are_all_unique:
    print("Reason: The total count is not equal to the unique count, so duplicates exist.")
print("-" * 50)
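
The two uniqueness checks above differ only in the column name. As a sketch, they could be folded into one helper; the name `report_uniqueness` and the toy data are hypothetical.

```python
import pandas as pd


def report_uniqueness(df: pd.DataFrame, column: str) -> bool:
    """Print how many values in `column` are distinct and return whether all are."""
    series = df[column]
    total_count = len(series)
    unique_count = series.nunique()  # missing values are not counted
    print(f"Column {column!r}: {unique_count} unique of {total_count} entries")
    return series.is_unique


toy = pd.DataFrame({"healthCode": ["a", "a", "b"], "audio_audio.m4a": [1, 2, 3]})
report_uniqueness(toy, "healthCode")
report_uniqueness(toy, "audio_audio.m4a")
```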

### Step 3: General Filtering (Applied to All Subjects)
**Reasoning:** Before separating participants into PwPD and HC groups, we can apply filters that are common to everyone. This is more efficient and simplifies the subsequent steps. These criteria are:

- Age: The study focuses on a specific age bracket (50-70) to control for age-related voice changes.

- Co-morbid Conditions: We exclude several conditions known to affect voice quality to ensure that any observed vocal changes are more likely attributable to Parkinson's Disease rather than a co-morbidity.

In [None]:
df = merged_df.copy()

# --- General Filter 1: Age Range (50-70) ---
print(f"Records before age filter: {len(df)}")
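# between() is inclusive on both ends; .copy() avoids a SettingWithCopyWarning
# when the 'health-history' column is modified below
df = df[df['age'].between(50, 70)].copy()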
print(f"Records after age filter: {len(df)}")
print("-" * 50)


# --- General Filter 2: Co-morbid Conditions ---
print("Applying general filter to exclude co-morbid health conditions...")
exclusion_list = [
    'Depression', 'Anxiety', 'Schizophrenia', 'Bipolar disorder',
    'Asthma', 'Stroke', 'Chronic Obstructive Pulmonary Disease'
]
exclusion_pattern = '|'.join(exclusion_list)

# We fill NaN values with an empty string to prevent errors with string operations
df['health-history'] = df['health-history'].fillna('')
mask = df['health-history'].str.contains(exclusion_pattern, case=False, na=False)

# We use the '~' operator to keep rows that DO NOT match the exclusion mask
df = df[~mask]
print(f"Records remaining after excluding co-morbidities: {len(df)}")
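
The exclusion mask relies on a regular-expression alternation built from the condition names. The sketch below reproduces the idea on made-up `health-history` values (the real column's formatting is an assumption) and adds `re.escape`, which the original cell does not use, as a guard in case a condition name ever contains a regex metacharacter.

```python
import re

import pandas as pd

exclusion_list = ["Depression", "Anxiety", "Stroke"]
# Escape each term so it is matched literally, then join with the regex OR
exclusion_pattern = "|".join(re.escape(term) for term in exclusion_list)

toy = pd.DataFrame(
    {
        "health-history": [
            "High Blood Pressure",
            "anxiety,High Blood Pressure",
            None,
            "Stroke",
        ]
    }
)

# Missing histories become empty strings, so they never match the pattern
mask = toy["health-history"].fillna("").str.contains(
    exclusion_pattern, case=False, regex=True
)
kept = toy[~mask]
print(kept)
```

Note that this is substring matching: any entry that merely contains one of the terms is excluded, and a participant with no recorded history is kept.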

### Step 4: Separate into PwPD and HC Groups
**Reasoning:** This is the most critical step for our analysis. We use the `professional-diagnosis` column to stratify the dataset into the two cohorts of interest. Any participant without a clear 'True' or 'False' value for this diagnosis is unclassifiable and must be removed.

In [None]:
# Drop any rows where the diagnosis is not specified, as they cannot be classified
df = df.dropna(subset=['professional-diagnosis'])

print(f"Records remaining after dropping null professional-diagnosis: {len(df)}")
# Create the two separate DataFrames based on the diagnosis
pwpd_df = df[df['professional-diagnosis'] == True].copy()
hc_df = df[df['professional-diagnosis'] == False].copy()

print("Separated into initial PwPD and HC groups:")
print(f"  - Initial PwPD records: {len(pwpd_df)}")
print(f"  - Initial HC records:   {len(hc_df)}")

### Step 5: Apply Group-Specific and Consistency Filters
**Reasoning:** Now that the groups are separate, we apply the final, group-specific rules. These rules ensure data consistency and select for the desired medication state in the PwPD group.

- Healthy Controls (HC): For data to be logical, an HC subject should not be taking Parkinson's medication. We enforce this by keeping only the records where they explicitly state they do not.

- People with Parkinson's Disease (PwPD):

    1. **Consistency:** A PwPD subject in this study is expected to be on medication. We remove the conflicting records where a PwPD subject states they do not take medication.

    2. **Medication Timing:** The study aims to capture the "OFF" medication state, when PD symptoms are typically more pronounced. We therefore select for recordings taken 'Immediately before Parkinson medication' or at 'Another time', and explicitly exclude the "ON" state ('Just after Parkinson medication').

In [None]:
# --- Filter 5a: Healthy Controls (HC) ---
print("Applying HC-specific consistency filter...")
# Criterion: HC subjects must have answered ‘I don’t take Parkinson medications’.
hc_df = hc_df[hc_df['medTimepoint'] == 'I don\'t take Parkinson medications']
print(f"HC records remaining after consistency check: {len(hc_df)}")
print("-" * 50)


# --- Filter 5b: People with Parkinson's Disease (PwPD) ---
print("Applying PwPD-specific filters...")
# Criterion 1 (Consistency): PwPD must not have answered 'I don’t take Parkinson medications'.
pwpd_df = pwpd_df[pwpd_df['medTimepoint'] != 'I don\'t take Parkinson medications']
print(f"PwPD records after consistency check: {len(pwpd_df)}")

# Criterion 2 (Medication Timing): Select for 'OFF' medication state.
included_med_times = ['Immediately before Parkinson medication', 'Another time']
pwpd_df = pwpd_df[pwpd_df['medTimepoint'].isin(included_med_times)]
print(f"PwPD records after medication timing filter: {len(pwpd_df)}")
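
To see how the two PwPD filters interact, the sketch below applies them to a toy `medTimepoint` column. The values are the ones quoted in this notebook; the exact strings in the real data are assumed to match them.

```python
import pandas as pd

NO_MEDS = "I don't take Parkinson medications"
included_med_times = ["Immediately before Parkinson medication", "Another time"]

toy_pwpd = pd.DataFrame(
    {
        "medTimepoint": [
            "Immediately before Parkinson medication",
            "Another time",
            "Just after Parkinson medication",
            NO_MEDS,
            None,
        ]
    }
)

# Two-step version, as in the cell above: consistency check, then timing filter
two_step = toy_pwpd[toy_pwpd["medTimepoint"] != NO_MEDS]
two_step = two_step[two_step["medTimepoint"].isin(included_med_times)]

# One-step version: the allow-list alone
one_step = toy_pwpd[toy_pwpd["medTimepoint"].isin(included_med_times)]

print(two_step.equals(one_step))
```

On this toy data both versions keep the same rows, because the allow-list already rejects the "no medication" answer and missing values. Keeping the two steps separate is still useful for reporting how many records each criterion removes.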

### Step 6: Finalize Cohorts and Save Results
**Reasoning:** We first apply the last PwPD criterion and exclude subjects who report deep brain stimulation (DBS). The original criteria then call for "one good recording per subject". While the quality assessment must be done manually (e.g., by listening to the audio files), we can programmatically ensure there is only one record per unique participant by dropping duplicate `healthCode` entries. Finally, we combine the two clean cohorts into a single DataFrame and save it to a new CSV file for the analysis phase.

In [None]:
print("Applying DBS filter for PwPD...")
# Keep only subjects who explicitly report no DBS; rows with a missing value are dropped too
pwpd_df = pwpd_df[pwpd_df['deep-brain-stimulation'] == False]
print(f"PwPD records after DBS filter: {len(pwpd_df)}")

In [None]:
# Ensure one unique record per subject by dropping duplicates based on healthCode
pwpd_final = pwpd_df.drop_duplicates(subset='healthCode', keep='first')
hc_final = hc_df.drop_duplicates(subset='healthCode', keep='first')

print("--- Final Subject Counts ---")
print("Compare these counts with the target counts in the study description.")
print(f"  - Final PwPD subjects: {len(pwpd_final)}")
print(f"  - Final HC subjects:   {len(hc_final)}")
print("-" * 50)
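
`drop_duplicates(keep='first')` keeps whichever row happens to come first, so the selected recording depends on row order. The sketch below shows one way to make the choice explicit by sorting first. It assumes `createdOn_info` is the recording timestamp and that the earliest recording is the one wanted; the source does not state which recording should be kept.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "healthCode": ["a", "b", "a", "b"],
        "createdOn_info": [300, 200, 100, 400],
        "audio_audio.m4a": [11, 12, 13, 14],
    }
)

# Sort by timestamp so that 'first' means 'earliest', then restore the row order
earliest = (
    toy.sort_values("createdOn_info")
    .drop_duplicates(subset="healthCode", keep="first")
    .sort_index()
)
print(earliest)
```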

In [None]:
hc_final

In [None]:
pwpd_final

### Step 6 (continued): Split Data for Training and Testing

In [None]:
TEST_SIZE = 0.3
RANDOM_STATE = 42
SELECTED_mPower_FILES_PATH = r"D:\Projects\Voice\PD\codes\2\data\selected_mPower_id.csv"

In [None]:
# Label the cohorts (1 = PwPD, 0 = HC); assign() returns new frames
# and avoids a SettingWithCopyWarning
pwpd_final = pwpd_final.assign(label=1)
hc_final = hc_final.assign(label=0)

combined_df = pd.concat([pwpd_final, hc_final], ignore_index=True)

X = combined_df.drop('label', axis=1)
y = combined_df['label']

# Stratify on the label so both splits keep the same PwPD/HC ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

combined_df['data_set'] = 'train'
combined_df.loc[X_test.index, 'data_set'] = 'test'
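
The stratified split and the `data_set` tagging can be verified on a small balanced example. This is a sketch on made-up data; only `train_test_split` and its `stratify` argument come from the cell above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy cohort: 10 PwPD (label 1) and 10 HC (label 0)
toy = pd.DataFrame({"subject": range(20), "label": [1] * 10 + [0] * 10})

train_df, test_df = train_test_split(
    toy, test_size=0.3, random_state=42, stratify=toy["label"]
)

# Tag every row with its split, using the index labels returned by the split
toy["data_set"] = "train"
toy.loc[test_df.index, "data_set"] = "test"

# Rows: label, columns: split
print(pd.crosstab(toy["label"], toy["data_set"]))
```

With 20 rows and `test_size=0.3`, six rows go to the test split, three from each class, so the class ratio is the same in both splits.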


In [None]:
combined_df

In [None]:
combined_df.reset_index(drop=True, inplace=True)
combined_df.to_csv(SELECTED_mPower_FILES_PATH, index=False)
print(f"Final DataFrame successfully saved to:\n{SELECTED_mPower_FILES_PATH}")

### Step 7: Copy mPower Audio Files

In [None]:
CONFIG_PATHS_FILE

In [None]:
# --- Define Paths ---
cleaned_df_path = SELECTED_mPower_FILES_PATH

mPower_AUDIO_FOLDER_PATH = CONFIG_PATHS_FILE[CONFIG_PATHS_FILE['data'] == 'mPower_AUDIO_FOLDER_PATH']
source_cache_base_path = mPower_AUDIO_FOLDER_PATH['path'].iloc[0]
destination_base_path = r"D:\Projects\Voice\PD\codes\2\data\mPower_copied_audio_files"

destination_hc_path = os.path.join(destination_base_path, "HC")
destination_pwpd_path = os.path.join(destination_base_path, "PwPD")

# --- Create Destination Directories ---
os.makedirs(destination_hc_path, exist_ok=True)
os.makedirs(destination_pwpd_path, exist_ok=True)
print(f"Destination folders created/verified at '{destination_base_path}'")

In [None]:
print("\nBuilding an index of all files in the cache...")
# Maps record ID (the numeric folder name) -> full path of its audio file
record_id_map = {}

for root, dirs, files in tqdm(os.walk(source_cache_base_path), desc="Scanning Cache"):
    record_id = os.path.basename(root)
    # Only folders with an all-digit name hold a record
    if not record_id.isdigit():
        continue
    for filename in files:
        if filename.startswith("audio_audio.m4a"):
            record_id_map[record_id] = os.path.join(root, filename)
            break  # one audio file per record folder is enough

print(f"Index built. Found {len(record_id_map)} unique file records in the cache.")
print("-" * 50)
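
The indexing logic can be exercised without the real cache by building a throwaway directory tree. In the sketch below the layout `<root>/<bucket>/<record_id>/<file>` and the file name suffix are assumptions made for illustration; the function name `build_record_index` is hypothetical.

```python
import os
import tempfile


def build_record_index(cache_root: str, prefix: str = "audio_audio.m4a") -> dict:
    """Map each all-digit folder name under `cache_root` to its first matching file."""
    index = {}
    for root, _dirs, files in os.walk(cache_root):
        record_id = os.path.basename(root)
        if not record_id.isdigit():
            continue
        for filename in sorted(files):
            if filename.startswith(prefix):
                index[record_id] = os.path.join(root, filename)
                break
    return index


# Build a throwaway cache: two record folders and one folder that is not a record
with tempfile.TemporaryDirectory() as cache_root:
    for record_id in ("5405132", "5553666"):
        folder = os.path.join(cache_root, "bucket", record_id)
        os.makedirs(folder)
        open(os.path.join(folder, "audio_audio.m4a-abc.tmp"), "w").close()
    os.makedirs(os.path.join(cache_root, "bucket", "not_a_record"))

    toy_index = build_record_index(cache_root)
    print(sorted(toy_index))
```

Building the dictionary once means each later lookup by record ID is a constant-time operation instead of a fresh directory scan.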

In [None]:
final_cleaned_df = combined_df

In [None]:
# Initialize counters for our summary report
files_copied = 0
files_not_found = 0
not_found_list = []

print("Copying required files to organized destination folders...")
for index, row in tqdm(final_cleaned_df.iterrows(), total=final_cleaned_df.shape[0], desc="Copying Files"):
    # Get the ID from the 'audio_audio.m4a' column, e.g., '1234567'
    record_id = str(row['audio_audio.m4a'])
    is_pd = row['professional-diagnosis']
    
    # Find the full source path instantly from our map.
    # .get() is safer than [] as it returns None if the key is not found.
    source_file_path = record_id_map.get(record_id)
    
    # Check if we found the file in our map
    if source_file_path:
        # Determine destination folder (HC or PwPD) based on diagnosis
        target_folder = destination_pwpd_path if is_pd else destination_hc_path
        
        # Name the copy after its record ID, e.g., "1234567.m4a"
        new_filename = f"{record_id}.m4a"
        
        # Create the full path for the destination file
        destination_file_path = os.path.join(target_folder, new_filename)
        
        # Copy the file from the source to the destination with its new name
        try:
            shutil.copy2(source_file_path, destination_file_path)
            files_copied += 1
        except Exception as e:
            files_not_found += 1
            not_found_list.append(record_id)
            print(f"Error copying {source_file_path} to {destination_file_path}: {e}")
            
    else:
        # The record_id from our dataframe was not found in the cache map
        files_not_found += 1
        not_found_list.append(record_id)

print("\n--- File Organization Complete ---")
print(f"Total records processed: {len(final_cleaned_df)}")
print(f"Files successfully copied and renamed: {files_copied}")
print(f"Files not found in the cache index or failed to copy: {files_not_found}")

if not_found_list:
    print("\nThe following record IDs could not be located in the cache or copied:")
    print(not_found_list)
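
The loop above counts a failed copy and a missing index entry under the same counter. A minimal sketch of how the two cases could be told apart is given below; the helper name `copy_record` and the status strings are hypothetical.

```python
import os
import shutil
import tempfile


def copy_record(record_id: str, index: dict, target_folder: str) -> str:
    """Copy one record's audio file; return 'copied', 'missing', or 'failed'."""
    source = index.get(record_id)
    if source is None:
        return "missing"  # the ID is not in the cache index
    try:
        shutil.copy2(source, os.path.join(target_folder, f"{record_id}.m4a"))
    except OSError:
        return "failed"  # the file is indexed but could not be copied
    return "copied"


with tempfile.TemporaryDirectory() as tmp:
    source_file = os.path.join(tmp, "audio_audio.m4a-abc.tmp")
    open(source_file, "w").close()
    target = os.path.join(tmp, "HC")
    os.makedirs(target)

    # '111' exists, '222' is indexed but its file is gone, '333' is not indexed
    toy_index = {"111": source_file, "222": os.path.join(tmp, "gone.tmp")}
    statuses = [copy_record(rid, toy_index, target) for rid in ("111", "222", "333")]
    copied_names = os.listdir(target)

print(statuses)
```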

# Organize the Generated Spectrograms


In [2]:
import pandas as pd
import os
import shutil

In [3]:
mPower_selected_df = pd.read_csv(r"D:\Projects\Voice\PD\codes\2\data\selected_mPower_id.csv")
df = mPower_selected_df
df

Unnamed: 0,recordId_demographic,healthCode,createdOn_demographic,appVersion_demographic,phoneInfo_demographic,age,are-caretaker,deep-brain-stimulation,diagnosis-year,education,...,years-smoking,recordId_info,createdOn_info,appVersion_info,phoneInfo_info,audio_audio.m4a,audio_countdown.m4a,medTimepoint,label,data_set
0,f20af903-16e2-413d-8826-26fc7b51ef38,2257e233-5815-4211-b0f2-7f135d1604b0,1425928230000,"version 1.0, build 7",iPhone 6,65.0,False,False,2008.0,4-year college degree,...,,e17d2d80-e8f9-4019-af8d-b76db603cc12,1.425930e+12,"version 1.0, build 7",iPhone 6,5405132,5405148,Another time,1,train
1,f71be389-ec30-4229-97d1-f2b57881d27b,84cdec84-5148-47d3-822d-b829ef24d11f,1425928804000,"version 1.0, build 7",iPhone 5s (GSM),69.0,False,False,2010.0,Master's Degree,...,,90846b15-9110-41f1-8331-91a4e721d9e7,1.426020e+12,"version 1.0, build 7",iPhone 5s (GSM),5553666,5553705,Another time,1,test
2,470f5d70-8c02-47f2-8020-a1dee8b236e6,2b1f8af1-c928-4ca5-a240-fdd3ef6ead98,1425929464000,"version 1.0, build 7",iPhone 5s (GSM),64.0,False,False,2007.0,Some college,...,,c0116f3d-b856-496f-8439-d1d9c7309ccb,1.425930e+12,"version 1.0, build 7",iPhone 5s (GSM),5397597,5397613,Another time,1,train
3,81337c80-87d3-4fe6-8602-3c74b8eb3220,b5d300e5-4397-4f24-ab97-6ae76a3c956b,1425934230000,"version 1.0, build 7",iPhone 4S,69.0,False,False,2011.0,High School Diploma/GED,...,,dedd3294-c0a2-4125-a1df-68d2d58b9cc8,1.425990e+12,"version 1.0, build 7",iPhone 6,5490342,5490439,Immediately before Parkinson medication,1,train
4,c62d822e-c97f-4732-b28e-5749d3059c93,b8a9d62f-cdec-4cee-9cfb-5572f8db2ff9,1425935451000,"version 1.0, build 7",iPhone 5s (GSM),63.0,False,False,2013.0,Some graduate school,...,25.0,19c618ce-05be-4b92-bf5c-1cbc30cb5e4c,1.425940e+12,"version 1.0, build 7",iPhone 5s (GSM),5403466,5403482,Another time,1,train
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
489,c23283ce-4e90-4164-bf36-b789e1e93589,48beeb57-bf19-492a-8392-c30f73f618c0,1439688822000,"version 1.0.5, build 12",iPhone 6 Plus,51.0,False,,,Doctoral Degree,...,,79859663-ef4b-4610-ac56-a04baeba50cb,1.439690e+12,"version 1.0.5, build 12",iPhone 6 Plus,5510271,5510468,I don't take Parkinson medications,0,train
490,d3b37b15-5f00-4481-a96d-ef10743defe8,9acb95e0-251a-40af-ad35-97807faa0953,1440353951000,"version 1.0.5, build 12",iPhone 4S,55.0,False,False,,Doctoral Degree,...,40.0,8ea5efe4-656b-4605-b0d7-9f2ba308d8e2,1.440350e+12,"version 1.0.5, build 12",iPhone 4S,5576198,5576297,I don't take Parkinson medications,0,test
491,bdd76ecd-0bf9-4b0f-8d9d-7deb1dc614b1,9f255c4a-4e8c-42cc-a692-6b6366ed630c,1440793745000,"version 1.0.5, build 12",iPhone 6,68.0,False,False,,4-year college degree,...,,363e7e69-e030-4ed1-9dbd-96f7fcd788ab,1.440790e+12,"version 1.0.5, build 12",iPhone 6,5610075,5610237,I don't take Parkinson medications,0,train
492,96952882-8ea5-41dd-93d8-bae6f807d11f,b0ffce32-d364-486d-bd1d-8d346f2a3f87,1441223335000,"version 1.0.5, build 12",iPhone 6 Plus,53.0,False,,,Doctoral Degree,...,15.0,5a431ff5-8ec6-4245-8e83-12e6287aac64,1.441220e+12,"version 1.0.5, build 12",iPhone 6 Plus,6079694,6079733,I don't take Parkinson medications,0,train


In [4]:

def organize_mPower_files(df, source_folder, target_base_folder):
    for data_set in ['train', 'validation']:
        for class_name in ['healthy', 'parkinson']:
            print(os.path.join(target_base_folder, data_set, class_name))
            os.makedirs(os.path.join(target_base_folder, data_set, class_name), exist_ok=True)

    print("Destination folders created successfully!")

    label_mapping = {0: 'healthy', 1: 'parkinson'}
    # Loop through each row of the DataFrame
    for index, row in df.iterrows():
        # Get the data set and class from the DataFrame
        data_set = row['data_set']

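        # Skip rows that belong to neither the train nor the test split
        if data_set not in ['train', 'test']:
            continue

        # The test split is stored in the 'validation' folder
        if data_set == 'test':
            data_set = 'validation'

        label = row['label']
        class_name = label_mapping.get(label)
        if class_name is None:
            # Unknown label: os.path.join would fail on None, so skip the row
            print(f"Warning: unknown label {label!r}, skipping row {index}")
            continue
        # Construct the full original file path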
        original_filename = str(row['audio_audio.m4a']) + '.jpg'
        source_path = os.path.join(source_folder, class_name, original_filename)
        destination_folder = os.path.join(target_base_folder, data_set, class_name)
        destination_path = os.path.join(destination_folder, original_filename)
        # Check if the source file exists and copy it
        if os.path.exists(source_path):
            shutil.copy2(source_path, destination_path)
            # print(f"Copied {original_filename} to {destination_folder}")
        else:
            print(f"Warning: not found: {original_filename}")

    print("File organization complete!")

In [5]:
source_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel"
target_base_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel_organized"

organize_mPower_files(df, source_folder, target_base_folder)

source_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear"
target_base_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear_organized"

organize_mPower_files(df, source_folder, target_base_folder)


D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel_organized\train\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel_organized\train\parkinson
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel_organized\validation\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_mel_organized\validation\parkinson
Destination folders created successfully!
File organization complete!
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear_organized\train\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear_organized\train\parkinson
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear_organized\validation\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\mPower_linear_organized\validation\parkinson
Destination folders created successfully!
File organization complete!


# Organizing UAMS Data

In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split
import os
import shutil

In [16]:
UAMS_demo_df = pd.read_csv(r"D:\Datasets\UAMS\Demographics_age_sex.csv")
UAMS_demo_df

|     | Sample ID | Label | Age | Sex |
| --- | --- | --- | --- | --- |
| 0 | AH_064F_7AB034C9-72E4-438B-A9B3-AD7FDA1596C5 | HC | 69 | M |
| 1 | AH_114S_A89F3548-0B61-4770-B800-2E26AB3908B6 | HC | 43 | M |
| 2 | AH_121A_BD5BA248-E807-4CB9-8B53-47E7FFE5F8E2 | HC | 18 | F |
| 3 | AH_123G_559F0706-2238-447C-BA39-DB5933BA619D | HC | 28 | M |
| 4 | AH_195B_39DA6A45-F4CC-492A-80D4-FB79049ACC22 | HC | 68 | M |
| ... | ... | ... | ... | ... |
| 76 | AH_803T_66094C40-AE64-4AD3-AA97-B052C69DA3EF | HC | 23 | F |
| 77 | AH_821C_8F9D5EF0-18B2-4967-B36D-82E014792BC3 | HC | 35 | F |
| 78 | AH_888A_7F1444B0-B12C-4B55-AF2A-463395DCAF3C | HC | 61 | M |
| 79 | AH_904H_85B22FC1-BA09-4A17-A374-B00B2445CD27 | HC | 36 | F |


In [17]:
TEST_SIZE = 0.3
RANDOM_STATE = 42

X = UAMS_demo_df.drop('Label', axis=1)
y = UAMS_demo_df['Label']

# Stratify on the label so both splits keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
)

UAMS_demo_df['data_set'] = 'train'
UAMS_demo_df.loc[X_test.index, 'data_set'] = 'test'
df = UAMS_demo_df

In [22]:
def organize_UAMS_files(df, source_folder, target_base_folder):
    for data_set in ['train', 'validation']:
        for class_name in ['healthy', 'parkinson']:
            print(os.path.join(target_base_folder, data_set, class_name))
            os.makedirs(os.path.join(target_base_folder, data_set, class_name), exist_ok=True)

    print("Destination folders created successfully!")

    label_mapping = {'HC': 'healthy', 'PwPD': 'parkinson'}
    # Loop through each row of the DataFrame
    for index, row in df.iterrows():
        # Get the data set and class from the DataFrame
        data_set = row['data_set']

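        # Skip rows that belong to neither the train nor the test split
        if data_set not in ['train', 'test']:
            continue

        # The test split is stored in the 'validation' folder
        if data_set == 'test':
            data_set = 'validation'

        label = row['Label']
        class_name = label_mapping.get(label)
        if class_name is None:
            # Unknown label: os.path.join would fail on None, so skip the row
            print(f"Warning: unknown label {label!r}, skipping row {index}")
            continue
        # Construct the full original file path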
        original_filename = str(row['Sample ID']) + '.jpg'
        source_path = os.path.join(source_folder, class_name, original_filename)
        # Construct the full destination file path
        destination_folder = os.path.join(target_base_folder, data_set, class_name)
        destination_path = os.path.join(destination_folder, original_filename)

        if os.path.exists(source_path):
            shutil.copy2(source_path, destination_path)
        else:
            print(f"Warning: not found: {original_filename}")

    print("File organization complete!")

source_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel"
target_base_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel_organized"
organize_UAMS_files(df, source_folder, target_base_folder)

source_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear"
target_base_folder = r"D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear_organized"
organize_UAMS_files(df, source_folder, target_base_folder)

D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel_organized\train\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel_organized\train\parkinson
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel_organized\validation\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_mel_organized\validation\parkinson
Destination folders created successfully!
File organization complete!
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear_organized\train\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear_organized\train\parkinson
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear_organized\validation\healthy
D:\Projects\Voice\PD\codes\2\data\spectrogram_data\UAMS_linear_organized\validation\parkinson
Destination folders created successfully!
File organization complete!
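
`organize_mPower_files` and `organize_UAMS_files` differ only in the ID column, the label column, and the label mapping. As a sketch, one parameterized function could serve both datasets. The name `organize_files` and the toy data are hypothetical, and unlike the originals this version creates destination folders on demand and returns the list of missing files instead of printing warnings.

```python
import os
import shutil
import tempfile

import pandas as pd


def organize_files(df, source_folder, target_base_folder, id_col, label_col, label_mapping):
    """Copy <source>/<class>/<id>.jpg into <target>/<train|validation>/<class>/."""
    split_to_folder = {"train": "train", "test": "validation"}
    missing = []
    for _, row in df.iterrows():
        folder = split_to_folder.get(row["data_set"])
        class_name = label_mapping.get(row[label_col])
        if folder is None or class_name is None:
            continue  # unknown split or label: skip the row
        filename = f"{row[id_col]}.jpg"
        source_path = os.path.join(source_folder, class_name, filename)
        destination_folder = os.path.join(target_base_folder, folder, class_name)
        os.makedirs(destination_folder, exist_ok=True)
        if os.path.exists(source_path):
            shutil.copy2(source_path, destination_folder)
        else:
            missing.append(filename)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # Toy source tree with a single spectrogram for the healthy class
    source = os.path.join(tmp, "mel")
    os.makedirs(os.path.join(source, "healthy"))
    open(os.path.join(source, "healthy", "A1.jpg"), "w").close()

    toy = pd.DataFrame(
        {"Sample ID": ["A1", "B2"], "Label": ["HC", "PwPD"], "data_set": ["test", "train"]}
    )
    missing = organize_files(
        toy, source, os.path.join(tmp, "organized"), "Sample ID", "Label",
        {"HC": "healthy", "PwPD": "parkinson"},
    )
    copied = os.path.exists(os.path.join(tmp, "organized", "validation", "healthy", "A1.jpg"))

print(missing, copied)
```

For the mPower data the same function would be called with `id_col='audio_audio.m4a'`, `label_col='label'`, and the mapping `{0: 'healthy', 1: 'parkinson'}`.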
