> Ongoing (resting-state), eyes-closed EEG was recorded for ten minutes using the same amplifier across centers, a 128-channel Biosemi Active-two acquisition system (pin-type active, sintered Ag-AgCl electrodes). The reference electrodes were set to linked mastoids. Furthermore, external electrodes were placed in periocular locations to record blinks and eye movements. Analog filters were set at 0.03 and 100 Hz. The EEG was monitored online for detecting drowsiness, and myogenic and sweat artifacts.The EEG was processed offline using an in-house pipeline built upon pre-existing EEGLab functions85.

> Only basic steps were implemented (i.e., re-referencing, filtering, and eliminating bad channels) to allow dataset users to conduct custom analyses. The row data (_.bdf extension) was imported into EEGLab using the BDFimport plugging and processed in the _.set extension (default EEGLab extension).

> Recordings were re-referenced to the average of all channels (average reference), and band-pass filtered between 0.5 and 40 Hz using a zero-phase shift Butterworth filter of order = 8. Data were down sampled to 512 Hz, and Independent Component Analysis (ICA) was used to correct EEG artifacts induced by blinking and eye movements. Malfunctioning channels were identified using a semiautomatic detection method and replaced using weighted spherical interpolation.


# Characteristics of the dataset

Biosemi 128 channels

- 0.5-40 Hz
- 512 Hz
- ICA
- 10 minutes


# Step-by-step guide

- Add cap locations
- -divide data into equally sized epochs 30s
- compute power spectrum on the epochs
- compute `specparam` parameters on the epochs
- average the parameters across epochs
- visualize the parameters on the scalp per group
- visualize the differences between groups


In [5]:
import os
import pandas as pd
from pathlib import Path
import mne
from pathlib import Path
import matplotlib
mne.viz.set_browser_backend("matplotlib", verbose=None)

def create_hierarchical_dataframe(root_dir):
    data = []
    root = Path(root_dir)
    
    # Find all non-hidden .set and .fdt files
    files = [f for f in root.glob('**/*.*') 
             if (f.suffix in ['.set', '.fdt']) and not f.name.startswith('.')]
    
    # Process each file
    for file in files:
        # Get relative path components as a list
        rel_path = file.relative_to(root)
        path_parts = list(rel_path.parts)
        
        # Add full path as the last element
        data.append(path_parts + [str(file)])
    
    # Find max depth and create column names
    max_depth = max(len(row) - 1 for row in data) if data else 0
    folder_cols = [f'folder_{i}' for i in range(max_depth)]
    columns = folder_cols + ['filename', 'full_path']
    
    # Create DataFrame with consistent column count
    df_data = []
    for row in data:
        filename = row[-2]
        full_path = row[-1]
        folders = row[:-2]
        # Pad folders with empty strings if needed
        padded_folders = folders + [''] * (max_depth - len(folders))
        df_data.append(padded_folders + [filename, full_path])
    
    df = pd.DataFrame(df_data, columns=columns).sort_values('full_path')
    
    # Add extension column
    df['extension'] = df['filename'].apply(lambda x: os.path.splitext(x)[1])
    
    return df

# Usage
root_dir = "/Volumes/T7/BrainLat/EEG data"
df = create_hierarchical_dataframe(root_dir)
df.to_csv('eeg_file_hierarchy.csv', index=False)

# Summary stats
print(f"Total files: {len(df)}")
print(f"Set files: {sum(df['filename'].str.endswith('.set'))}")
print(f"Fdt files: {sum(df['filename'].str.endswith('.fdt'))}")
display(df.head())

Using matplotlib as 2D backend.
Total files: 256
Set files: 162
Fdt files: 94


Unnamed: 0,folder_0,folder_1,folder_2,folder_3,folder_4,filename,full_path,extension
159,1_AD,AR,sub-30001,eeg,,s6_sub-30001_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
158,1_AD,AR,sub-30002,eeg,,s6_sub-30002_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
157,1_AD,AR,sub-30004,eeg,,s6_sub-30004_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
170,1_AD,AR,sub-30008,eeg,,s6_sub-30008_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
168,1_AD,AR,sub-30009,eeg,,s6_sub-30009_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set


In [2]:
# Analysis cell - filter out folder_0 with value "4_MS"
filtered_df = (df
    .query('folder_0 != "4_MS"')
    .copy()
)

# Display filtered results
print(f"Original dataset size: {len(df)}")
print(f"Filtered dataset size: {len(filtered_df)}")
print(f"Removed {len(df) - len(filtered_df)} entries with folder_0 = '4_MS'")

# Distribution by extension in filtered dataset
extension_counts = (filtered_df
    .groupby('extension')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
)

# Distribution by top-level folders in filtered dataset
folder_counts = (filtered_df
    .groupby('folder_0')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
)

# Display summary statistics
display(extension_counts)
display(folder_counts)
display(filtered_df.head())

Original dataset size: 256
Filtered dataset size: 190
Removed 66 entries with folder_0 = '4_MS'


Unnamed: 0,extension,count
1,.set,129
0,.fdt,61


Unnamed: 0,folder_0,count
3,5_HC,78
2,3_PD,58
0,1_AD,35
1,2_bvFTD,19


Unnamed: 0,folder_0,folder_1,folder_2,folder_3,folder_4,filename,full_path,extension
159,1_AD,AR,sub-30001,eeg,,s6_sub-30001_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
158,1_AD,AR,sub-30002,eeg,,s6_sub-30002_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
157,1_AD,AR,sub-30004,eeg,,s6_sub-30004_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
170,1_AD,AR,sub-30008,eeg,,s6_sub-30008_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set
168,1_AD,AR,sub-30009,eeg,,s6_sub-30009_rs-HEP_eeg.set,/Volumes/T7/BrainLat/EEG data/1_AD/AR/sub-3000...,.set


In [3]:
# check pd_for example
pd_df = (df
    .query('folder_0 == "3_PD"')
    .copy()
)
display(pd_df.head())
# Get the first value from the full_path column

first_full_path = pd_df['full_path'].iloc[1]
print(first_full_path)

Unnamed: 0,folder_0,folder_1,folder_2,folder_3,folder_4,filename,full_path,extension
149,3_PD,AR,sub-40001,eeg,,s40001_AR_PD_reject.fdt,/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-4000...,.fdt
150,3_PD,AR,sub-40001,eeg,,s40001_AR_PD_reject.set,/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-4000...,.set
143,3_PD,AR,sub-40006,eeg,,s40006_AR_PD_reject.fdt,/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-4000...,.fdt
144,3_PD,AR,sub-40006,eeg,,s40006_AR_PD_reject.set,/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-4000...,.set
153,3_PD,AR,sub-40008,eeg,,s40008_AR_PD_reject.fdt,/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-4000...,.fdt


/Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40001/eeg/s40001_AR_PD_reject.set


# Checking all files 


In [None]:
counter = 0
error_folders = []  # List to store folder_2 values that cause errors
known_errors = ['3_PD_CL_sub-40005', '3_PD_CL_sub-40007', '3_PD_CL_sub-40008', '3_PD_CL_sub-40010', '3_PD_CL_sub-40012', '3_PD_CL_sub-40016', '3_PD_CL_sub-40017', '3_PD_CL_sub-40018' , '3_PD_CL_sub-40019', '3_PD_CL_sub-40020', '3_PD_CL_sub-40021', '3_PD_CL_sub-40022', '3_PD_CL_sub-40023', '3_PD_CL_sub-40024', '3_PD_CL_sub-40025', '3_PD_CL_sub-40026', ]  # List of known errors to ignore
missing_data = []

filtered_df_subset  = filtered_df[['folder_0','full_path', 'folder_1', 'folder_2', 'extension']].query('extension == ".set"')
filtered_df_subset['file_loadable'] = False  # Initialize all values to False
filtered_df_subset['id'] = filtered_df_subset['folder_0'] + '_' + filtered_df_subset['folder_1'] + '_' + filtered_df_subset['folder_2']

for index, row in filtered_df_subset.iterrows():
    if row['folder_0'] == "3_PD" and row['id'] not in known_errors:
            try:
                mne.io.read_epochs_eeglab(row['full_path'])
                filtered_df_subset.loc[index, 'file_loadable'] = True
                counter += 1  # Increase the counter by 1
            except Exception as e:
                print(f"Error reading {row['full_path']}: {e}")
                error_folders.append(row['id'])
            
    else:
        try:
            raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)
            filtered_df_subset.loc[index, 'file_loadable'] = True
            counter += 1  # Increase the counter by 1
        except Exception as e:
                print(f"Error reading {row['full_path']}: {e}")
                missing_data.append(row['id'])



print(counter)  


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40001/eeg/s40001_AR_PD_reject.set...
Not setting metadata
360 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40006/eeg/s40006_AR_PD_reject.set...


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Not setting metadata
380 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40008/eeg/s40008_AR_PD_reject.set...
Not setting metadata
369 matching events found


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40010/eeg/s40010_AR_PD_reject.set...
Not setting metadata
386 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40011/eeg/s40011_AR_PD_reject.set...


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Not setting metadata
294 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40012/eeg/s40012_AR_PD_reject.set...


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Not setting metadata
401 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/AR/sub-40013/eeg/s40013_AR_PD_reject.set...
Not setting metadata
422 matching events found
No baseline correction applied
0 projection items activated
Ready.


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40001/eeg/s40001_CH_PD_reject.set...
Not setting metadata


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


579 matching events found
No baseline correction applied
0 projection items activated
Ready.
Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40004/eeg/s40004_CH_PD_reject.set...
Not setting metadata
516 matching events found


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


No baseline correction applied
0 projection items activated
Ready.
Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40005/eeg/sub-40005_rs_eeg.fdt
Reading 0 ... 263167  =      0.000 ...   513.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40006/eeg/s40006_CH_PD_reject.set...


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Not setting metadata
585 matching events found
No baseline correction applied
0 projection items activated
Ready.
Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40007/eeg/sub-40007_rs_eeg.fdt
Reading 0 ... 414719  =      0.000 ...   809.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40008/eeg/sub-40008_rs_eeg.fdt
Reading 0 ... 236031  =      0.000 ...   460.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40009/eeg/s40009_CH_PD_reject.set...


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


Not setting metadata
609 matching events found
No baseline correction applied
0 projection items activated
Ready.
Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40010/eeg/sub-40010_rs_eeg.fdt
Reading 0 ... 351231  =      0.000 ...   685.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40011/eeg/s40011_CH_PD_reject.set...
Not setting metadata
566 matching events found


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


No baseline correction applied
0 projection items activated
Ready.
Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40012/eeg/sub-40012_rs_eeg.fdt
Reading 0 ... 460799  =      0.000 ...   899.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Extracting parameters from /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40015/eeg/s40015_CH_PD_reject.set...
Not setting metadata
589 matching events found


  mne.io.read_epochs_eeglab(row['full_path'])
  mne.io.read_epochs_eeglab(row['full_path'])


No baseline correction applied
0 projection items activated
Ready.
Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40016/eeg/sub-40016_rs_eeg.fdt
Reading 0 ... 374783  =      0.000 ...   731.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40017/eeg/sub-40017_rs_eeg.fdt
Reading 0 ... 247807  =      0.000 ...   483.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40018/eeg/sub-40018_rs_eeg.fdt
Reading 0 ... 425983  =      0.000 ...   831.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40019/eeg/sub-40019_rs_eeg.fdt
Reading 0 ... 394751  =      0.000 ...   770.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40020/eeg/sub-40020_rs_eeg.fdt
Reading 0 ... 247807  =      0.000 ...   483.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40021/eeg/sub-40021_rs_eeg.fdt
Reading 0 ... 553983  =      0.000 ...  1081.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40022/eeg/sub-40022_rs_eeg.fdt
Reading 0 ... 333311  =      0.000 ...   650.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40023/eeg/sub-40023_rs_eeg.fdt
Reading 0 ... 231423  =      0.000 ...   451.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40024/eeg/sub-40024_rs_eeg.fdt
Reading 0 ... 449023  =      0.000 ...   876.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40025/eeg/sub-40025_rs_eeg.fdt
Reading 0 ... 332799  =      0.000 ...   649.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/3_PD/CL/sub-40026/eeg/sub-40026_rs_eeg.fdt
Reading 0 ... 233471  =      0.000 ...   455.998 secs...


  raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)


Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100012/eeg/s6_sub-100012_rs_eeg.fdt
Reading 0 ... 176941  =      0.000 ...   345.588 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100015/eeg/s6_sub-100015_rs_eeg.fdt
Reading 0 ... 209992  =      0.000 ...   410.141 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100018/eeg/s6_sub-100018_rs_eeg.fdt
Reading 0 ... 185470  =      0.000 ...   362.246 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-10002/eeg/s6_sub-10002_rs_eeg.fdt
Reading 0 ... 186584  =      0.000 ...   364.422 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100020/eeg/s6_sub-100020_rs_eeg.fdt
Reading 0 ... 168239  =      0.000 ...   328.592 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100022/eeg/s6_sub-100022_rs_eeg.fdt
Reading 0 ... 228144  =      0.000 ...   445.594 secs...
Reading /Volumes/T7/BrainLat/EEG data/5_HC/AR/sub-100024/eeg/s6_sub-100024_rs_eeg.fdt
Reading 0 ... 253980  =      0.000 ...   496.055 secs...
R

In [19]:
print(missing_data)
print(counter)  # Prints: 
print(error_folders )
print(known_errors)  # Prints: 1
filtered_df_subset.to_csv('available_eeg.csv', index=False)


['5_HC_CL_sub-100013', '5_HC_CL_sub-100019', '5_HC_CL_sub-100023', '5_HC_CL_sub-100025', '5_HC_CL_sub-100027', '5_HC_CL_sub-100032', '5_HC_CL_sub-100036', '5_HC_CL_sub-100039', '5_HC_CL_sub-100040', '5_HC_CL_sub-100041', '5_HC_CL_sub-100042', '5_HC_CL_sub-100044', '5_HC_CL_sub-100045', '5_HC_CL_sub-100046', 'sub-100013', 'sub-100019', 'sub-100023', 'sub-100025', 'sub-100027', 'sub-100032', 'sub-100036', 'sub-100039', 'sub-100040', 'sub-100041', 'sub-100042', 'sub-100044', 'sub-100045', 'sub-100046']
115
[]
['3_PD_CL_sub-40005', '3_PD_CL_sub-40007', '3_PD_CL_sub-40008', '3_PD_CL_sub-40010', '3_PD_CL_sub-40012', '3_PD_CL_sub-40016', '3_PD_CL_sub-40017', '3_PD_CL_sub-40018', '3_PD_CL_sub-40019', '3_PD_CL_sub-40020', '3_PD_CL_sub-40021', '3_PD_CL_sub-40022', '3_PD_CL_sub-40023', '3_PD_CL_sub-40024', '3_PD_CL_sub-40025', '3_PD_CL_sub-40026']


In [None]:
# Group by folder_0 and folder_1 and count loadable files
loadable_summary = filtered_df_subset.groupby(['folder_0', 'folder_1']).agg({
    'file_loadable': ['count', 'sum', lambda x: round(x.mean() * 100, 2)]
}).reset_index()

# Rename the columns for clarity
loadable_summary.columns = ['folder_0', 'folder_1', 'total_files', 'loadable_files', 'loadable_percentage']

# Sort by folder_0 and folder_1
loadable_summary = loadable_summary.sort_values(['folder_0', 'folder_1'])

markdown_table = loadable_summary.to_markdown(index=False)

# Print the markdown table
print(markdown_table)


| folder_0   | folder_1   |   total_files |   loadable_files |   loadable_percentage |
|:-----------|:-----------|--------------:|-----------------:|----------------------:|
| 1_AD       | AR         |            16 |               16 |                100    |
| 1_AD       | CL         |            19 |               19 |                100    |
| 2_bvFTD    | AR         |            13 |               13 |                100    |
| 2_bvFTD    | CL         |             6 |                6 |                100    |
| 3_PD       | AR         |             7 |                7 |                100    |
| 3_PD       | CL         |            22 |               22 |                100    |
| 5_HC       | AR         |            19 |               19 |                100    |
| 5_HC       | CL         |            27 |               13 |                 48.15 |


# Load problematic files


In [18]:
bad_subset = filtered_df_subset.query('file_loadable == False')

for index, row in bad_subset.iterrows():
        try:
            raw = mne.io.read_raw_eeglab(row['full_path'], preload=True)
            #filtered_df_subset.loc[index, 'file_loadable'] = True
            counter += 1  # Increase the counter by 1
        except Exception as e:
            print(f"Error reading {row['full_path']}: {e}")
            missing_data.append(row['folder_2'])

Reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100013/eeg/s6_sub-100013_rs_eeg.fdt
Error reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100013/eeg/s6_sub-100013_rs_eeg.set: File /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100013/eeg/s6_sub-100013_rs_eeg.fdt not found.
Reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100019/eeg/s6_sub-100019_rs_eeg.fdt
Error reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100019/eeg/s6_sub-100019_rs_eeg.set: File /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100019/eeg/s6_sub-100019_rs_eeg.fdt not found.
Reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100023/eeg/s6_sub-100023_rs_eeg.fdt
Error reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100023/eeg/s6_sub-100023_rs_eeg.set: File /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100023/eeg/s6_sub-100023_rs_eeg.fdt not found.
Reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100025/eeg/s6_sub-100025_rs_eeg.fdt
Error reading /Volumes/T7/BrainLat/EEG data/5_HC/CL/sub-100025/eeg/s6_sub-100025_rs_eeg.set: Fi

In [40]:
# combine the two dataframes
df_metadata = pd.read_csv('brainlat_metadata.csv').query('condition!= "4_MS"')#.join(filtered_df_subset.set_index('full_path'), on='full_path', how='left')
display(df_metadata.head())
df_eeg = pd.read_csv('available_eeg.csv')

df_eeg = df_eeg.rename(columns={'folder_2': 'id_EEG',  'folder_0': 'condition',
    'folder_1': 'country' })[["condition", "country", "id_EEG",  "file_loadable"]]
df_eeg.head()
df_combined = pd.merge(
    df_metadata,
    df_eeg,
    on=['condition', 'country', 'id_EEG'],
    how='left'
)
display(df_combined.head())
df_combined.to_csv('brainlat_metadata_eeg_info.csv', index=False)


Unnamed: 0,condition,country,path,id_EEG,diagnosis,eeg,sex,Age,years_education,laterality,...,ifs_motor_series,ifs_conflicting_instructions,ifs_motor_inhibition,ifs_digits,ifs_months,ifs_visual_wm,ifs_proverb,ifs_verbal_inhibition,mini_sea_fer,mini_sea_tom
0,1_AD,AR,1_AD/AR,sub-30001,AD,1.0,1,81,12,1.0,...,0.0,3.0,0.0,3.0,0.0,1.0,1.0,2.0,8.6,14.3
1,1_AD,AR,1_AD/AR,sub-30002,AD,1.0,1,79,9,1.0,...,0.0,3.0,0.0,3.0,2.0,1.0,0.0,3.0,11.6,11.6
2,1_AD,AR,1_AD/AR,sub-30004,AD,1.0,1,70,9,1.0,...,3.0,3.0,0.0,3.0,2.0,1.0,0.0,4.0,13.7,12.0
3,1_AD,AR,1_AD/AR,sub-30008,AD,1.0,1,80,2,1.0,...,3.0,3.0,0.0,3.0,2.0,3.0,2.0,1.0,9.0,8.3
4,1_AD,AR,1_AD/AR,sub-30009,AD,1.0,0,82,7,1.0,...,3.0,2.0,1.0,2.0,0.0,2.0,3.0,4.0,12.4,15.0


Unnamed: 0,condition,country,path,id_EEG,diagnosis,eeg,sex,Age,years_education,laterality,...,ifs_conflicting_instructions,ifs_motor_inhibition,ifs_digits,ifs_months,ifs_visual_wm,ifs_proverb,ifs_verbal_inhibition,mini_sea_fer,mini_sea_tom,file_loadable
0,1_AD,AR,1_AD/AR,sub-30001,AD,1.0,1,81,12,1.0,...,3.0,0.0,3.0,0.0,1.0,1.0,2.0,8.6,14.3,True
1,1_AD,AR,1_AD/AR,sub-30002,AD,1.0,1,79,9,1.0,...,3.0,0.0,3.0,2.0,1.0,0.0,3.0,11.6,11.6,True
2,1_AD,AR,1_AD/AR,sub-30004,AD,1.0,1,70,9,1.0,...,3.0,0.0,3.0,2.0,1.0,0.0,4.0,13.7,12.0,True
3,1_AD,AR,1_AD/AR,sub-30008,AD,1.0,1,80,2,1.0,...,3.0,0.0,3.0,2.0,3.0,2.0,1.0,9.0,8.3,True
4,1_AD,AR,1_AD/AR,sub-30009,AD,1.0,0,82,7,1.0,...,2.0,1.0,2.0,0.0,2.0,3.0,4.0,12.4,15.0,True


In [46]:
non_boolean_loadable = df_combined[~df_combined['file_loadable'].isin([True, False])]
display(non_boolean_loadable.head())
non_boolean_loadable[['condition', 'country', 'id_EEG', 'file_loadable']].to_markdown(index=False) 

Unnamed: 0,condition,country,path,id_EEG,diagnosis,eeg,sex,Age,years_education,laterality,...,ifs_conflicting_instructions,ifs_motor_inhibition,ifs_digits,ifs_months,ifs_visual_wm,ifs_proverb,ifs_verbal_inhibition,mini_sea_fer,mini_sea_tom,file_loadable
51,2_bvFTD,CL,2_bVFTD/CL,sub-20009,FTD,1.0,1,57,14,1.0,...,3.0,3.0,2.0,2.0,0.0,2.0,0.0,4.0,6.9,
52,2_bvFTD,CL,2_bVFTD/CL,sub-20009,FTD,1.0,1,57,14,1.0,...,1.0,2.0,2.0,3.0,3.0,1.0,2.0,0.0,1.0,
58,2_bvFTD,CL,2_bVFTD/CL,sub-20009,FTD,1.0,1,65,16,1.0,...,3.0,3.0,2.0,2.0,0.0,2.0,0.0,4.0,6.9,
59,2_bvFTD,CL,2_bVFTD/CL,sub-20009,FTD,1.0,1,65,16,1.0,...,1.0,2.0,2.0,3.0,3.0,1.0,2.0,0.0,1.0,


'| condition   | country   | id_EEG    |   file_loadable |\n|:------------|:----------|:----------|----------------:|\n| 2_bvFTD     | CL        | sub-20009 |             nan |\n| 2_bvFTD     | CL        | sub-20009 |             nan |\n| 2_bvFTD     | CL        | sub-20009 |             nan |\n| 2_bvFTD     | CL        | sub-20009 |             nan |'