# Inspect export parameters from .edf files of an EEG database 
This notebook allows you to inspect the parameters of your .edf EEG database. \
It does not load the data or assess its quality. \
It helps you identify whether you need to re-export some participants' data (due to poor signal resolution or clipping, as explained later). \
It provides information that helps analyze your dataset (such as the different channel labels, sampling frequency, and filters). 

---
EDF (European Data Format) is a standard format for storing multichannel biological and physical signals: https://www.edfplus.info/. \
It was first published in 1992, and an upgraded version (EDF+) was released in 2003, adding support for discontinuous recordings and for annotations, stimuli, and events stored in UTF-8. \
It is a 16-bit format, meaning that each measured data point can take one of 2¹⁶ values between the minimum and maximum values that you set at export. \
The min/max values correspond to the dynamic range of your data.
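
As a rough illustration of what the dynamic range implies, the sketch below computes the smallest amplitude step that a 16-bit export can represent. The function name `edf_resolution` and the example ranges are assumptions chosen for illustration, not values taken from any particular dataset.

```python
def edf_resolution(physical_min, physical_max, n_bits=16):
    """Smallest representable amplitude step, in the channel's physical unit."""
    return (physical_max - physical_min) / (2 ** n_bits)

# Hypothetical dynamic ranges, in microvolts
narrow = edf_resolution(-500.0, 500.0)     # about 0.015 µV per step
wide = edf_resolution(-32768.0, 32768.0)   # exactly 1 µV per step

print(f"±500 µV range   -> {narrow:.4f} µV per step")
print(f"±32768 µV range -> {wide:.4f} µV per step")
```

A narrow range gives a fine resolution but clips any sample that falls outside it, while a wide range avoids clipping at the cost of coarser steps.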

---
When exporting to the .edf format (with software like Profusion from Compumedics), many parameters need to be set for each recorded channel, such as the channel's label, sampling frequency, filtering, units, and dynamic range. \
Exporting to .edf is tedious and time-consuming, so mistakes in parameters are easy to make. \
To avoid mistakes, some software lets you define a montage that applies your pre-defined parameters directly **(insert a link on how to make a routine in Compumedics).** \
Inspecting those parameters can also be necessary if you work on an existing dataset, to make sure that every participant's data is usable, especially if the data comes from multiple sleep clinics. 

---
This notebook reads the information directly from the .edf file headers, instead of loading the data and relying on existing packages. \
Existing packages, such as pyEDFlib or MNE, are either too rigid (not able to read some data) or do not return all the information (such as boundaries of dynamic range, filtering parameters, etc.).\
\
**This notebook is intended to stand alone: you just have to run it with the package voila.\
In your terminal, run `voila inspect_edf_voila` to start the notebook.\
Then, you will just have to select your database folder and read the outputs.**\
_The notebook will save summary tables as .tsv files (a file format that is easy to read with Excel and to import in a Python script) so that you can visually inspect them (or reload them later) if needed._\
_Those summaries will be stored in a summary folder within the study folder._\
\
This notebook is organized into 4 sections:
1. Select your study folder, extract the data's information, and return general information
2. Inspect EEG channels
3. Inspect EOG channels
4. Inspect ECG channels
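
The header-reading approach mentioned above relies on the fact that an EDF header is a sequence of fixed-width text fields. The sketch below is a minimal illustration of that idea on a made-up header built in memory; the names `GLOBAL_FIELDS` and `read_global_header` and all field values are hypothetical, and the notebook's own function additionally detects the encoding and reads the per-channel fields.

```python
import io

# Fixed-width text fields of the EDF global header (256 bytes in total)
GLOBAL_FIELDS = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_data_records", 8),
    ("duration_data_record", 8), ("n_channels", 4),
]

def read_global_header(stream, encoding="ascii"):
    """Read the global header fields one after the other from a binary stream."""
    return {name: stream.read(width).decode(encoding).strip()
            for name, width in GLOBAL_FIELDS}

# Build a made-up header in memory (hypothetical values, padded with spaces)
values = ["0", "X X X X", "Startdate X X X X", "01.01.01", "00.00.00",
          "768", "", "10", "1", "2"]
fake = b"".join(v.ljust(w).encode("ascii")
                for v, (_, w) in zip(values, GLOBAL_FIELDS))

header = read_global_header(io.BytesIO(fake))
print(len(fake), header["n_channels"], header["header_bytes"])
```

Because every field has a known width, no parsing library is needed: reading the right number of bytes in the right order is enough to recover each value as text.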

_A Jupyter notebook version exists in order to easily interact with the code_
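
To reload one of the saved summary tables in a script, any tab-separated reader works. The sketch below uses Python's built-in `csv` module on a small in-memory table; the column names follow those of the summary tables, while the rows are invented for illustration. With pandas, `pd.read_csv(path, sep='\t')` does the same job.

```python
import csv
import io

# Invented rows standing in for a summary .tsv file
tsv_text = (
    "subject\tchannel\tsampling_frequency\n"
    "sub01\tC3\t256\n"
    "sub01\tEOG1\t128\n"
    "sub02\tC3\t512\n"
)

# In practice, replace io.StringIO(tsv_text) with open(path, newline="")
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Collect the sampling frequencies found for each subject
sf_per_subject = {}
for row in rows:
    sf_per_subject.setdefault(row["subject"], set()).add(row["sampling_frequency"])

print(sf_per_subject)
```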

---
This notebook was developed on the ICEBERG database and tested on APOMORPHEE (from Noémie's internship).  \
last update 16/09/2025, YN

<hr style="height:4px; background-color:black; border:none;">
First, let's check if you have already installed the required packages:

In [None]:
#%% Import packages and define custom functions
# Import cell
try:
    import os
    import re
    import chardet
    import warnings
    import traceback
    import numpy as np
    import pandas as pd
    from pathlib import Path
    import ipywidgets as widgets
    from ipyfilechooser import FileChooser
    from IPython.display import display, HTML, clear_output
except ImportError as e:
    print("⚠️ Error: ", e)
else:
    print("✅ Packages and functions successfully imported!")

# custom function to detect automatically and return the encoding of edf file
def detect_encoding(byte_string, min_confidence=0.6):
    result = chardet.detect(byte_string)
    encoding = result['encoding']
    confidence = result['confidence']
    if encoding is None or confidence < min_confidence:
        raise UnicodeDecodeError("chardet", byte_string, 0, len(byte_string),
                                 f"\tUnable to reliably detect encoding. Detected: {encoding} with confidence {confidence}")
    return encoding

# custom function to read information from EDF headers, without using the pyedflib package (that was too strict for ICEBERG)
# EDF file should follow a strict format, dedicating a specific number of octets for each type of information.
# it means that we can read the info octet by octet by specifying the number of octets we expect for the next variable (that is known from the EDF norm)
def read_edf_header_custom(file_path):
    with open(file_path, 'rb') as f: # open the file in binary mode, to read octet by octet. 
        header = {}
        # detect encoding
        raw_header = f.read(256)
        encoding = detect_encoding(raw_header)
        # print(f"\tDetected encoding for {file_path} : {encoding}")
        # Rewind to the beginning of the file
        f.seek(0)
        
        # the first 256 octets are global subject info
        header['version'] = f.read(8).decode(encoding).strip()
        header['patient_id'] = f.read(80).decode(encoding).strip()
        header['recording_id'] = f.read(80).decode(encoding).strip()
        header['start_date'] = f.read(8).decode(encoding).strip()
        header['start_time'] = f.read(8).decode(encoding).strip()
        header['header_bytes'] = int(f.read(8).decode(encoding).strip())
        header['reserved'] = f.read(44).decode(encoding).strip()
        header['n_data_records'] = int(f.read(8).decode(encoding).strip())
        header['duration_data_record'] = float(f.read(8).decode(encoding).strip())
        header['n_channels'] = int(f.read(4).decode(encoding).strip())
        
        # get info per channel
        n = header['n_channels']
        channel_fields = {
            'channel': [],
            'transducer_type': [],
            'dimension': [],
            'physical_min': [],
            'physical_max': [],
            'digital_min': [],
            'digital_max': [],
            'prefiltering': [],
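            'sampling_frequency': [],  # EDF stores the number of samples per data record; this equals the frequency in Hz only if duration_data_record is 1 s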
            'reserved': [],
        }

        for key in channel_fields:
            length = {
                'channel': 16,
                'transducer_type': 80,
                'dimension': 8,
                'physical_min': 8,
                'physical_max': 8,
                'digital_min': 8,
                'digital_max': 8,
                'prefiltering': 80,
                'sampling_frequency': 8,
                'reserved': 32,
            }[key]
            channel_fields[key] = [f.read(length).decode(encoding).strip() for _ in range(n)]

        header.update(channel_fields)
    
    return header

# function to extract filter information from the string in headers
def extract_filter_value(s, tag):
    if pd.isna(s):
        return None
    match = re.search(rf'{tag}[:\s]*([\d\.]+)\s*', s, re.IGNORECASE)
    return float(match.group(1)) if match else None

# custom function to get the sampling frequency out of a dataframe (the df needs to have 'subject' and 'channel' as columns)
def get_sf(df, subject, channel):
    df_sf = df[(df['subject'] == subject) & (df['channel'] == channel)]
    if not df_sf.empty:
        return df_sf.iloc[0]['sampling_frequency']
    else:
        return None

# function to create a widget slider to select the configuration to inspect
def mk_config_slider(value = 1, min = 1, max = 5):
    config_slider = widgets.IntSlider(
    value=value,
    min=min,
    max=max,
    step=1,
    description='Selected configuration:',
    style={'description_width': '150px'},   # increase description width (to adjust based on the description)
    layout=widgets.Layout(width='400px'),   # to adjust widget size
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
    )
    return config_slider

# function to print the configuration of a dataset parameter
def print_config(i, config_dict, param):
    # the slider is 1-based, whereas the lists below are 0-based
    idx = i - 1
    # get the list of participants sharing this configuration
    value = list(config_dict.values())  
    v = value[idx]  
    # get the configuration itself (tuple of parameter values)
    key = list(config_dict.keys())
    k = key[idx]
    
    # print info
    print(f'Selected configuration: # {i}')
    print(f'\t{len(k)} {param}: {k}')
    print(f'\t{len(v)} participants: {v}')

# function to create a scrollable box for long output (e.g., cell loading the data) 
def print_in_scrollable_box(text, height=300, font_size="12px"):
    display(HTML(f'<pre style="overflow-y:scroll; height:{height}px; border:1px solid black; padding:10px; font-size:{font_size};">{text}</pre>'))


#%% select data folder (the rest of the code will be encapsulated in the folder selection so that voila waits for the selection to run the rest of cells)
# folder selector
chooser = FileChooser(os.getcwd())
chooser.title = "<b>Choose your study folder</b>"
chooser.show_only_dirs = True

# validation button
run_button = widgets.Button(description="Start the inspection", button_style='success')

# result display area
out = widgets.Output()

section1 = widgets.HTML("""
<hr style="height:4px; background-color:black; border:none;">
<h2>1. Select your data folder</h2>
""")
display(section1)

# main function to run the inspection when a folder has been selected
def run_inspection(_):
    out.clear_output()
    with out:
        chooser.folder_path = chooser.selected_path
        print("📁 Selected Path:", chooser.folder_path)
        
        # get the edf file list 
        chooser.edf_files = [
            f for f in Path(chooser.folder_path).rglob('*.edf')
            if not f.name.startswith('._') # don't select files starting with ._ (that can be found in mac for example)
            ]
        if not chooser.edf_files:
            print("⚠️ There is no .edf file in your folder")
            return  # nothing to inspect: stop here instead of failing later on an empty list
        else:
            print(f"\nThere are {len(chooser.edf_files)} .edf files in your folder!")
        
        # check the existence and/or create the summary folder that will receive the summary tables and the report
        chooser.summary_path = f'{chooser.folder_path}/summary'
        if not os.path.exists(chooser.summary_path):
            os.makedirs(chooser.summary_path)
            print("\nCreated summary folder at: " + chooser.summary_path)
        else:
            print("\nSummary folder already exists. \nPrevious summary tables (if any) will be overwritten at: \n" + chooser.summary_path)

        # get variables from the chooser widget
        folder_path = chooser.folder_path
        summary_path = chooser.summary_path
        edf_files = chooser.edf_files

        # check if there is a participants.tsv file to get different groups or sessions
        # if there is not a participants.tsv we will try to infer groups from subfolder organization or filename components (additional part from subject number)
        # in ICEBERG, subfolders define groups within the data folder
        # in APOMORPHEE, suffixes define nights ("session") 
        table_found = False
        for root, dirs, files in os.walk(folder_path):
            if 'participants.tsv' in files:
                table_found = True        
                print(f"Table containing participants information found at: {os.path.join(root, 'participants.tsv')} ")
                subj_table_path = os.path.join(root, 'participants.tsv')
        
        found_group = False
        if table_found:
            subj_table = pd.read_csv(subj_table_path, sep = '\t', dtype={'participant_id': str, 'group': str})
            if "group" in subj_table.columns:
                found_group = True
                print("We will extract participant's group from it")
            else:
                print("No column 'group' was found in the table")
                print("Group will be inferred from subfolder organization or filename component") 
            
        else:
            print(f"No table containing participants information (labelled 'participants.tsv') was found")
            print(f"If you have a table, please rename it 'participants.tsv' (and make sure you have columns labelled 'participant_id' and 'group' if any)")
            print(f"In the meantime, we will infer participant's group from subfolder organization or filename component")
            subj_table = pd.DataFrame()

        # initialize the list of dataframes storing file info, which will be concatenated at the end (this is better for performance)
        df_list = []
        # Initialize an empty list for files that could not be read
        failed_list = []
        # initialize output for a dynamic display (with a scroll box)
        output = ""
        dynamic_out = widgets.Output()
        display(dynamic_out)

        # Loop over the edf file list to extract parameters from each participant
        for e, edf_path in enumerate(edf_files):
            with dynamic_out:
                output += (f'file {e+1}/{len(edf_files)}, currently opening file: {edf_path}\n')
                dynamic_out.clear_output(wait=True)
                print_in_scrollable_box(output, font_size = "12px")
                
                # read file with the custom function
                try:
                    edf_header = read_edf_header_custom(edf_path) 
                    
                    # get subject name (corresponding to file_name)
                    sub_name = edf_path.stem
                    
                    # get subject group (from the parent folder because in the ICEBERG database subfolders were created per patient group)
                    sub_folder = edf_path.parent.name # get the parent folder of the subject file (path)
                    
                    # create df from signal info
                    df = pd.DataFrame(edf_header)
                        
                    # theoretical resolution (edf are 16bit files so the eeg signal can take 2^16 values within the dynamic range)
                    # the dynamic range is the distance between physical_min and physical_max
                    df['res_theoretical'] = (pd.to_numeric(df['physical_max']) - pd.to_numeric(df['physical_min'])).abs() / pow(2, 16)
                    # turn theoretical resolution to uV if dimension is mV (if no dimension, it is a mess)
                    df.loc[df['dimension'].str.contains('mv', case=False, na=False), 'res_theoretical'] *= 1000
                    
                    # get filtering info in different columns
                    df['lowpass']   = df['prefiltering'].apply(lambda x: extract_filter_value(x, 'LP'))
                    df['highpass']  = df['prefiltering'].apply(lambda x: extract_filter_value(x, 'HP'))
                    df['notch']  = df['prefiltering'].apply(lambda x: extract_filter_value(x, 'NOTCH'))
                    
                    # add subject info in the dataframe
                    df['subject'] = sub_name
                    df['sub_folder'] = sub_folder
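                    df['group'] = np.nan # initialize column 'group' with NaN
                    # get group from participants table if any (else group will be inferred from subfolder or filename component later)
                    if found_group:
                        # keep NaN if the participant is not listed in the table, instead of failing on this file
                        group_match = subj_table.loc[subj_table['participant_id'] == sub_name, 'group']
                        if not group_match.empty:
                            df['group'] = group_match.iloc[0]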
        
                    # extract filename component before and after subject number (so we assume subject name contains at least incrementing numbers that are at the beginning of the file name)  
                    #   ^       → start of string  
                    # (.*?)     → group 1: as few chars as possible, up to the first digit  
                    # (\d+)     → group 2: the number itself  
                    # (.*)      → group 3: the rest of the string  
                    # $         → end of string
                    pre_comp = sub_num = post_comp = np.nan
                    pattern = re.compile(r'^(.*?)(\d+)(.*)$')
                    m = pattern.match(sub_name)
                    if m:
                        pre_comp = m.group(1) or np.nan
                        sub_num = m.group(2) or np.nan
                        post_comp = m.group(3) or np.nan
                    df['pre_fn_comp'] = pre_comp
                    df['post_fn_comp'] = post_comp
                    df['sub_num'] = sub_num
                    
                    df['path'] = str(edf_path)
                    df['session'] = np.nan # session will be inferred later from file name component
                    
                    # select only the columns of interest
                    df = df[['subject', 'group', 'session', 'path', 'sub_folder', 'sub_num', 'pre_fn_comp', 'post_fn_comp', 'channel', 'transducer_type', 'dimension', 'sampling_frequency', 
                         'highpass', 'lowpass', 'notch', 'physical_min', 'physical_max', 'res_theoretical']]
                    
                    # store subject data
                    df_list.append(df)
            
                except UnicodeDecodeError as e:
                    err = f"⚠️ Encoding problem for {edf_path}\n"
                    output += err
                    clear_output(wait=True)
                    print_in_scrollable_box(output, font_size="12px")
                    failed_list.append((edf_path, 'encoding'))
                except Exception as e:
                    # tb = traceback.format_exc()
                    err = f"❌ Unexpected problem for {edf_path} : {e}\n"
                    output += err
                    clear_output(wait=True)
                    print_in_scrollable_box(output, font_size="12px")
                    failed_list.append((edf_path, 'other'))
           
        # concatenate dataframe into one and only
        with warnings.catch_warnings(): # this is to skip a warning not affecting our operation
            warnings.simplefilter("ignore", FutureWarning)
            df_full = pd.concat(df_list, ignore_index=True)
        
        # save the failed list if not empty:
        failed_df = pd.DataFrame(failed_list, columns=['path', 'reason'])
        if not failed_df.empty:
            failed_df.to_csv(f'{summary_path}/failed_edf_read.tsv', sep = '\t')
            print(f'\nSaving the list of files that could not be read to: \n{summary_path}/failed_edf_read.tsv')    
        #____________________________________________________________________________________________
        
        #%% 1.2 General info of the dataset___________________________________________________________
        section12 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3> Dataset general information (# participants, groups, recorded sensors)</h3>
        """)
        display(section12)
        
        # get group information, from participants.tsv file, sub_folder, or filename component
        print("Get group information:")
        if df_full['group'].isna().all():
            print("The column 'group' is empty, we will infer group from subfolder, if any...")
            if len(df_full['sub_folder'].unique()) > 1:
                df_full['group'] = df_full['sub_folder']
                print(">>> Group inferred from folders within the database <<<")
            else:
                print("There are no distinct folders for groups.")
                print("Trying to infer group from filename component...")
                # looping across subject number (and not subject filename) to test if there are multiple filename components per subject (to disentangle groups from session)  
                count_precomp = np.zeros(len(df_full['sub_num'].unique()))
                count_postcomp = np.zeros(len(df_full['sub_num'].unique()))
                for sn, sub_num in enumerate(df_full['sub_num'].unique()):
                    df_sub = df_full[df_full['sub_num'] == sub_num]
                    count_precomp[sn] = len(df_sub['pre_fn_comp'].unique())
                    count_postcomp[sn] = len(df_sub['post_fn_comp'].unique())
                # fn component is a group if within subject there is only one component, but there are multiple components between subject
                # 1st, try for component before the subject number, 2nd try for component after the subject number 
                if len(df_full['pre_fn_comp'].unique()) > 1 and count_precomp.mean() == 1:
                    df_full['group'] = df_full['pre_fn_comp']
                    print(">>> Group inferred from filename component (before subject number) <<<")
                elif len(df_full['post_fn_comp'].unique()) > 1 and count_postcomp.mean() == 1:
                    df_full['group'] = df_full['post_fn_comp']
                    print(">>> Group inferred from filename component (after subject number) <<<")
                else:
                    print("Could not identify groups from filename component.")
                    print("It seems that there is only one group in the study!")
        else:
            print(">>> Group information coming from participants.tsv <<<")
        
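        print("\nGet session information")
        # count the filename components per subject number here, so that the counts are defined
        # whichever way the group was obtained above (participants.tsv, subfolders or filename)
        count_precomp = df_full.groupby('sub_num')['pre_fn_comp'].nunique(dropna=False)
        count_postcomp = df_full.groupby('sub_num')['post_fn_comp'].nunique(dropna=False)
        if len(df_full['pre_fn_comp'].unique()) > 1 and count_precomp.mean() > 1:
            print(">>> Session inferred from filename component (before subject number) <<<")
            df_full['session'] = df_full['pre_fn_comp']
        elif len(df_full['post_fn_comp'].unique()) > 1 and count_postcomp.mean() > 1:
            print(">>> Session inferred from filename component (after subject number) <<<")
            df_full['session'] = df_full['post_fn_comp']
        else:
            print("It seems that there is only one session in the study")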
        
        # save summary table containing full info
        df_full.to_csv(f'{summary_path}/FULL_summary_table_edf.tsv', sep = '\t')
        print(f'\nSaving full information from the dataset to:\n{summary_path}/FULL_summary_table_edf.tsv')
        
        print("\n\nDataset information:")
        print(f"- Number of files: {len(df_full['subject'].unique())}")
        print(f"- Number of participants: {len(df_full['sub_num'].unique())}")
        print(f"- Number of groups: {len(df_full['group'].unique())}")
        print(f"- Number of sessions: {len(df_full['session'].unique())}")
        
        if len(df_full['group'].unique()) > 1:
            print("\nParticipants per groups:")
            print(df_full.drop_duplicates().groupby('group').agg(n_subjects=('subject', 'nunique')))
    
        print('\nFull recorded sensors configuration of your database (across participants): ')
        ch_output = '\n'.join(df_full['channel'].unique())
        print_in_scrollable_box(ch_output, height = 150)
        #____________________________________________________________________________________________

        #%% 2. Select only the EEGs__________________________________________________________________
        section2 = widgets.HTML("""
        <hr style="height:4px; background-color:black; border:none;">
        <h2>2. Inspect EEG</h2>
        """)
        display(section2)
        
        # select only EEG channels and return a warning if the number of participant is smaller/higher
        mask_ch = df_full['transducer_type'].str.contains(r'EEG|AGAGCL ELECTRODE', case = False, na=False) # create a mask that returns true for lines containing either EEG/AGAGCL ELECTRODE in the transducer_type column
        df_ch = df_full[mask_ch]
        # remove the emg channels that were captured with the AGAGCL ELECTRODE transducer type 
        df_ch = df_ch[~df_ch['channel'].str.contains(r'emg|ecg|eog', case=False, na=False)] # the ~ allows to not select the selection (like ! in matlab)
        
        # Check if the number of participants with EEG channels is the same as in df_full. 
        # If not, it might be because the transducer type was not correctly detected. 
        # One possibility is to add the type of transducer to the regular expression that defines mask_ch above.
        if len(df_full['subject'].unique()) > len(df_ch['subject'].unique()):
            # identify missing subjects
            missing_sub = set(df_full['subject'].unique()) - set(df_ch['subject'].unique())
            print('\n!!! There are fewer participants in the dataset with only EEGs !!!')
            print(f'Missing participants: {missing_sub}')
            print("\nEither these participants don't have EEGs.")
            print("Or the transducer type was not correctly detected.")
            # get df of missing sub to save and inspect
            df_miss = df_full[df_full['subject'].isin(missing_sub)]
            df_miss.to_csv(f'{summary_path}/EEG_missing_edf.tsv', sep = '\t')
            print(f'\nSaving information from missing participants to:\n{summary_path}/EEG_missing_edf.tsv')
            print('Please inspect the file, and specifically the column transducer_type')
        elif len(df_full['subject'].unique()) < len(df_ch['subject'].unique()):
            print('\n!!! There are more participants in the dataset with only EEGs !!!')
            print('This should not be the case.')
            print('Please inspect what is happening in a code editor (spyder..), or ask Yvan.')
            more_sub = set(df_ch['subject'].unique()) - set(df_full['subject'].unique())
            df_more = df_ch[df_ch['subject'].isin(more_sub)]
            df_more.to_csv(f'{summary_path}/EEG_suspect_edf.tsv', sep = '\t')
            print(f'\nSaving information from suspect participants to:\n{summary_path}/EEG_suspect_edf.tsv')
        
        # saving info from eeg
        df_ch.to_csv(f'{summary_path}/EEG_summary_table.tsv', sep = '\t')
        print(f'\nSaving information from EEGs to:\n{summary_path}/EEG_summary_table.tsv')
        #____________________________________________________________________________________________

        #%% 2.1 Inspect EEG configurations___________________________________________________________
        section21 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.1 Inspect EEG configurations</h3>
        <p>By EEG configurations, here, we mean the number of EEG channels and their labeling.
        <br>In practice, most EEG studies rely on a single configuration, since all data are typically recorded at the same location, with the same system, and by the same experimenter.
        <br>In multicentric datasets, there will likely be many EEG configurations (specific to each recording center).</p>
        <p>Knowing your channels' configurations will allow you to select the subset of channels for your analyses (and later re-harmonize the channel labeling if needed).</p>
        <p>In classic polysomnographic EEG, there should be at least 4 EEG channels (F3, C3, O1, and A2 used as the reference, or less frequently F4, C4, O2, and A1).</p>
        <p>Depending on your planned analyses, if a given configuration lacks those channels, you will either need to re-export the data (provided those channels were originally recorded) or exclude the participant.</p>
        <p>Below, we check how many EEG configurations your dataset contains.</p>
        """)
        display(section21)
        
        # get the EEG configuration per participant 
        ch_per_sub = df_ch.groupby('subject')['channel'].apply(lambda x: tuple(sorted(set(x))))
        
        # identify the channel configuration of each participant and store them in a dict to print per channel config
        ch_config_dict = {}
        for config in ch_per_sub.unique():
            sub = ch_per_sub[ch_per_sub == config].index.tolist()
            ch_config_dict[config] = sub
        
        if len(ch_config_dict) > 1:
            print('\n>>> There are multiple EEG configurations in your dataset! <<<')    
            print(f'\n\tNumber of different configurations: {len(ch_config_dict)}\n')
        else:
            print('\n>>> There is only one EEG configuration in your dataset! <<<\n')

        if len(ch_config_dict)>=1:
            # widget to select the configuration of interest
            config_ch_slider = mk_config_slider(value = 1, min = 1, max = len(ch_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=ch_config_dict, param="channels"), i=config_ch_slider);
        else: 
            print("No EEG configuration found")
        #____________________________________________________________________________________________

        #%% 2.2 Inspect sampling frequency___________________________________________________________
        section22 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.2 Inspect EEG sampling frequency</h3>
        <p>The sampling frequency is the number of recorded samples per second (expressed in Hz). It is set at acquisition.
        <br>Ideally, you expect to have only one sampling frequency for all the EEGs and participants.
        <br>In multicentric datasets, you might end up with different sampling frequencies across participants (specific to each recording center).</p> 
        <p>If you have multiple sampling frequencies across participants in your dataset, we recommend that you harmonize your dataset by downsampling your data to the lowest sampling frequency before your analyses.</p>
        <p>Below, we check how many different sampling frequencies your dataset contains.</p>
        <p><em>Side note: With multiple sampling frequencies within participants (which can happen for EEG and EOG), each EEG analysis software behaves differently. For example:
        <br>&#x2022; MNE python will automatically upsample channels to the highest sampling frequency (with .edf/.bdf/.gdf format)
        <br>&#x2022; Fieldtrip will load only a subset of channels (those with the most common sampling frequency)
        </em></p>
        """)
        display(section22)
        
        # the sampling frequency configuration
        sf_per_sub = df_ch.groupby('subject')['sampling_frequency'].apply(lambda x: tuple(sorted(set(x))))
        # identify the sampling frequency configuration of each participant and store them in a dict to print per sampling configuration config
        sf_config_dict = {}
        for config in sf_per_sub.unique():
            sub = sf_per_sub[sf_per_sub == config].index.tolist()
            sf_config_dict[config] = sub
        
        # print info per sf configuration (maybe print it only for multiple config)
        if len(sf_config_dict) > 1:
            print('\n>>> There are multiple sampling frequencies for EEGs in your dataset! <<<')    
            print(f'\n\tNumber of different sampling frequency configurations: {len(sf_config_dict)}\n')
            print('Quick overview of the EEGs associated with each sampling frequency:')
            for s, sf in enumerate(df_ch['sampling_frequency'].unique()):
                # select only rows with the current sf
                df_sf = df_ch[df_ch['sampling_frequency'] == sf].copy()
                print(f'\n{sf} Hz: {df_sf["channel"].unique()}\n')
        else:
            print(f'\n>>> There is only one sampling frequency for EEGs in your dataset: {df_ch["sampling_frequency"].unique()} <<<\n')

        if len(sf_config_dict)>=1:
            # widget to select the configuration of interest
            config_sf_slider = mk_config_slider(value = 1, min = 1, max = len(sf_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=sf_config_dict, param="sampling frequencies"), i=config_sf_slider);
        else:
            print("No EEG sampling frequency found")
        #____________________________________________________________________________________________

        #%% 2.3 Inspect EEG filters__________________________________________________________________
        section23 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.3 Inspect EEG filters</h3>
        <p>When EEG data are visualized, the signals are typically filtered by the software (e.g. Profusion from Compumedics).
        <br>For analyses, we typically apply high-pass (to remove very low frequencies), low-pass (to remove high frequencies), and notch (to remove electrical power-line noise) filters.
        <br>At export, you can specify whether the data should be filtered.
        <br>A good practice is to export the data without any filter, so that you can apply filters later according to your analyses.
        <br>However, for whole-night recordings, we recommend exporting the data with a 0.01 Hz high-pass filter to remove the slow drift that builds up over such long recordings.</p>
        <p>If you have multiple filter configurations, we recommend re-exporting the data without filters if possible.</p>
        <p>Below, we check which filters were applied and count how many different configurations were used when exporting your dataset.</p>
        """)
        display(section23)
        
        if len(df_ch['highpass'].unique())+len(df_ch['lowpass'].unique())+len(df_ch['notch'].unique()) == 3:
            print('\n>>> All EEGs have the same filters! <<<')
        elif len(df_ch['highpass'].unique())+len(df_ch['lowpass'].unique())+len(df_ch['notch'].unique()) > 3:
            print('\n>>> Filters are not fully consistent in EEGs across the dataset! <<<')
        else:
            print('\n>>> There may have been a problem in reading the filters. Here is the output: <<<')
        
        # Get the list of participants with different filtering parameters
        # First replace NaN with a placeholder, because groupby drops rows with NaN keys by default
        df_filt = df_ch.copy()
        df_filt[['lowpass', 'highpass', 'notch']] = df_filt[['lowpass', 'highpass', 'notch']].fillna('missing')
        
        config_filters = (
            df_filt.groupby(['lowpass', 'highpass', 'notch'])['subject']
            .apply(lambda x: sorted(set(x)))
            .reset_index(name = 'subjects')
        )
        
        # print filter configuration
        print(f'\n\tNumber of different EEG filter configurations: {len(config_filters)}\n')

        if len(config_filters)>=1:
            # widget to select the configuration of interest
            config_filter_slider = mk_config_slider(value = 1, min = 1, max = len(config_filters))
            
            # function to print the filter configurations
            def print_filters(config_slider):
                # get the info from the dataframe
                idx = config_slider - 1
                sID = config_filters.iloc[idx]['subjects']
                hpass = config_filters.iloc[idx]['highpass']
                lpass = config_filters.iloc[idx]['lowpass']
                notch = config_filters.iloc[idx]['notch']
                
                # print info
                print(f'Selected configuration: # {config_slider}')
                print(f'\tFilter configuration: highpass: {hpass}; lowpass: {lpass}; notch: {notch}')
                print(f'\t{len(sID)} participants: {sID}')
            
            widgets.interact(print_filters, config_slider = config_filter_slider);
        else:
            print("No EEG filters found")
        #____________________________________________________________________________________________

        #%% 2.4 Inspect units in the dataset_________________________________________________________
        section24 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.4 Inspect EEG units</h3>
        <p>At export, channels can be saved in different units.
        <br>Each analysis software handles units differently, so it helps to know which units your dataset contains.
        </p>
        <p>&#x2022; MNE-Python automatically detects the units and converts the data to volts. However, if the unit is not read correctly, the data will <b>not</b> be converted (e.g. "UV" is not interpreted as µV, so the data are not converted to volts).
        <br>&#x2022; FieldTrip loads the data in their original unit, so you might want to convert all channels to the same unit before your analysis.
        </p>
        <p>Below, we check how many different units your dataset contains.</p>
        """)
        display(section24)

        if len(df_ch['dimension'].unique()) == 1:
            print(f'\n>>> All EEGs have the same unit: {df_ch["dimension"].unique()} <<<\n')
        elif len(df_ch['dimension'].unique()) > 1:
            print('\n>>> Multiple units were found! <<<')
            # use double quotes inside the f-string: reusing the outer quote is a SyntaxError before Python 3.12
            print(f'\n\tNumber of different unit configurations: {len(df_ch["dimension"].unique())}\n')
            print('Quick overview of the EEGs associated with each unit:')
            for u, unit in enumerate(df_ch['dimension'].unique()):
                # select only rows with the current unit
                df_unit = df_ch[df_ch['dimension'] == unit].copy()
                print(f'\n{unit}: {df_unit["channel"].unique()}')
            print('\n')
            
        # print the different unit configurations
        # (one configuration = the set of units found across a participant's EEG channels)
        unit_per_sub = df_ch.groupby('subject')['dimension'].apply(lambda x: tuple(sorted(set(x))))
        ch_per_unit = df_ch.groupby('dimension')['channel'].apply(lambda x: tuple(sorted(set(x))))
        # identify the unit configuration of each participant and store the participants in a dict keyed by configuration
        unit_config_dict = {}
        for config in unit_per_sub.unique():
            sub = unit_per_sub[unit_per_sub == config].index.tolist()
            unit_config_dict[config] = sub

        if len(unit_config_dict)>=1:
            # widget to select the configuration of interest
            config_unit_slider = mk_config_slider(value = 1, min = 1, max = len(unit_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=unit_config_dict, param="Units"), i=config_unit_slider);
        else:
            print("No EEG unit found")
        #____________________________________________________________________________________________

        #%% 2.5 Inspect EEG signal inversion_________________________________________________________
        section25 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.5 Inspect EEG signal inversion</h3>
        <p>Some software (e.g. Profusion from Compumedics) allows you to invert the polarity of the exported data. This can be extremely confusing and can lead to wrong results.
        <br>Here, we check whether the signal is inverted by testing if the minimum physical boundary is higher than the maximum physical boundary.
        <br>For .edf files, the physical boundaries are set when exporting the data, by specifying the scale of the data.
        <br>In Profusion (from Compumedics), a scale of 1 mV leads to a minimum physical boundary of -500 µV and a maximum physical boundary of +500 µV.
        </p>
        <p>For other EEG formats and software, the dynamic range might be set before recording (e.g. specified in the montage) and cannot be changed at export.
        </p>
        <p> Below, we check, for each EEG channel, whether the minimum physical boundary is greater than the maximum physical boundary, and save a table containing the channels with inverted polarity.</p>
        """)
        display(section25)
        
        # select rows where the physical min is greater than the physical max
        df_inv = df_ch[df_ch['physical_min'] > df_ch['physical_max']]
        
        if not df_inv.empty:
            print('\n>>> Inverted polarity detected in EEGs! <<<')
            print(f'{df_inv.shape[0]} EEGs have an inverted polarity (from {df_ch.shape[0]} EEGs in {len(edf_files)} edf files)')
            print(df_inv[['subject', 'channel', 'dimension', 'physical_min', 'physical_max']])
        else:
            print('\n>>> No inverted polarity was detected in EEGs <<<')
        df_inv.to_csv(f'{summary_path}/EEG_inverted_polarity_edf.tsv', sep = '\t')
        print(f'\nSaving information about inverted-polarity EEGs to:\n{summary_path}/EEG_inverted_polarity_edf.tsv \n(will be empty if no inverted polarity)')
        #____________________________________________________________________________________________

        #%% 2.6 Inspect EEG dynamic range and resolution_____________________________________________
        section26 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>2.6 Inspect EEG dynamic range and resolution</h3>
        <p>The EDF format stores signals in 16 bits, meaning that each sample can take 65&nbsp;536 discrete values (2<sup>16</sup>).
        <br>In order to convert these values into real EEG amplitudes, a dynamic range (a minimum and maximum value) must be defined when exporting the data.
        <br>Each sample is then given a value between this minimum and maximum (among the 2<sup>16</sup> levels).
        </p>
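        <p>As a rough worked example (assuming the 65&nbsp;536 levels are spread evenly over the whole range): with a dynamic range of ± 500 µV, i.e. 1000 µV in total, one quantization step is about 1000 / 65&nbsp;536 ≈ 0.015 µV.
        <br>With a dynamic range of ± 5000 µV, the step becomes about 10&nbsp;000 / 65&nbsp;536 ≈ 0.15 µV, ten times coarser.
        </p>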
        <p>This choice of dynamic range can lead to two opposing problems:</p>
            <ul>
                <li>
                    <strong>Clipping:</strong><br>
                    If the dynamic range is too small, certain signal amplitudes exceed the limits.<br>
                    The exceeding values are then cut off (therefore “locked” at the min/max), and information is lost.<br>
                    Example of data with a dynamic range of ± 100 µV:<br>
                    <img src="images/clipped.png" width="250"/>
                </li>
                <li>
                    <strong>Loss of resolution:</strong><br>
                    If the dynamic range is too large, the 65&nbsp;536 levels are spread over too wide an amplitude range.<br>
                    Each quantization step then becomes too large, and small variations in signal amplitude are no longer visible with precision.<br>
                    Example of data with a resolution of 30 µV:<br>
                    <img src="images/low_resolution.png" width="250">
                </li>
            </ul>
            <p>Example of clean data (dynamic range = ± 500 µV; resolution ≈ 0.015 µV):<br>
            <img src="images/clean.png" width="250"/></p>
        """)
        display(section26)
        #---------------------------
        section261 = widgets.HTML("""
        <h4>Dynamic range</h4>
        <p>Good-quality physiological EEG typically stays within ± 500 µV.
        <br>Below, we check whether the dynamic range set by the physical boundaries is 500 µV or less (i.e. ± 250 µV).
        <br>You can change the dynamic range threshold with the widget.
        <br>Detected bad channels are saved to a summary table.</p>
        """)
        display(section261)
        
        dr_thres = widgets.BoundedFloatText(
            value=500,
            min=0,
            max=5000,
            step=0.1,
            style={'description_width': '200px'},  # widen the description label
            layout=widgets.Layout(width='270px'),   # adjust the overall widget width if needed
            description='Dynamic range threshold (µV):',
            disabled=False
        );
        
        
        def check_bad_dr(threshold):
            dr_mask = df_ch['res_theoretical']*pow(2,16) <= threshold
            bad_dr = df_ch[dr_mask]
            
            if not bad_dr.empty:
                print(f'\n>>> Dynamic range <= {threshold} µV detected in EEGs! <<<\n')
                print(f'{bad_dr.shape[0]} EEGs detected (from {df_ch.shape[0]} EEGs in {len(edf_files)} edf files)')
                print(bad_dr[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No EEG with a dynamic range <= {threshold} µV was detected! <<<')
            bad_dr.to_csv(f'{summary_path}/EEG_bad_dynamic_range_edf.tsv', sep = '\t')
            print(f'\nSaving information about bad dynamic range EEGs to:\n{summary_path}/EEG_bad_dynamic_range_edf.tsv \n(will be empty if no bad dynamic range)')
        
        widgets.interact(check_bad_dr, threshold = dr_thres);
        #---------------------------
        #---------------------------
        section262 = widgets.HTML("""
        <h4>Resolution</h4>
        <p>The theoretical resolution of an .edf file is the smallest amplitude difference that can be encoded between two samples (it depends on the dynamic range, as stated above).
        <br>Below, we detect EEG channels whose quantization step is 0.1 µV or larger, i.e. whose resolution is too coarse.
        <br>You can change the resolution threshold with the widget.
        <br>Channels whose resolution is as coarse as, or coarser than, the threshold are saved to a summary table.</p>
        """)
        display(section262)

        # res_theoretical has been converted to µV, but if the dimension was missing or not read from the headers, the conversion may not have worked. A more robust check may be needed.
        r_thres = widgets.BoundedFloatText(
            value=0.1,
            min=0,
            max=10.0,
            step=0.1,
            style={'description_width': '150px'},  # widen the description label
            layout=widgets.Layout(width='230px'),   # adjust the overall widget width if needed
            description='Resolution threshold (µV):',
            disabled=False
        );
        
        # define a function to interact with the widget
        def check_bad_res(threshold):
            r_mask = df_ch['res_theoretical'] >= threshold
            bad_res = df_ch[r_mask]
            
            if not bad_res.empty:
                print(f'\n>>> EEGs with a resolution >= {threshold} µV detected! <<<')
                print(f'{bad_res.shape[0]} EEGs detected (from {df_ch.shape[0]} EEGs in {len(edf_files)} edf files)')
                print(bad_res[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No EEG with a resolution >= {threshold} µV was detected! <<<')
            bad_res.to_csv(f'{summary_path}/EEG_bad_resolution_edf.tsv', sep = '\t')
            print(f'\nSaving information about bad resolution EEGs to:\n{summary_path}/EEG_bad_resolution_edf.tsv \n(will be empty if no bad resolution)')
        
        widgets.interact(check_bad_res, threshold=r_thres);
        #---------------------------
        #____________________________________________________________________________________________

        #%% 3. Select only the EOGs__________________________________________________________________
        section3 = widgets.HTML("""
        <hr style="height:4px; background-color:black; border:none;">
        <h2>3. Inspect EOG</h2>
        <p>This section follows the same structure, logic, and outputs as section 2. Inspect EEG.
        <br>Hence, the code cells are not commented (please refer to section 2. if you need a refresher).
        </p>
        """)
        display(section3)

        # select only EOGs and return a warning if the number of participant is smaller/higher
        mask_eog = df_full['channel'].str.contains(r'EOG', case = False, na=False) # mask that is True for rows whose channel label contains "EOG" (case-insensitive)
        df_eog = df_full[mask_eog]
        # remove the emg channels that were captured with the AGAGCL ELECTRODE transducer type 
        # df_eog = df_eog[~df_eog['channel'].str.contains(r'emg|ecg|eeg|a1|a2', case=False, na=False)] # the ~ allows to not select the selection (like ! in matlab)
        
        # Check if the number of participants with only EOG is the same as df_full. 
        # If not, it might be because the transducer type was not correctly detected. 
        # One possibility is to add the type of transducer to the condition line 2 of this cell.
        if len(df_full['subject'].unique()) > len(df_eog['subject'].unique()):
            # identify missing subjects
            missing_sub = set(df_full['subject'].unique()) - set(df_eog['subject'].unique())
            print('\n!!! There are fewer participants in the dataset with only EOGs !!!')
            print(f'Missing participants: {missing_sub}')
            print("\nEither these participants don't have EOGs.")
            print("Or the transducer type was not correctly detected.")
            # get df of missing sub to save and inspect
            df_eogmiss = df_full[df_full['subject'].isin(missing_sub)]
            df_eogmiss.to_csv(f'{summary_path}/EOG_missing_edf.tsv', sep = '\t')
            print(f'\nSaving information about missing participants to:\n{summary_path}/EOG_missing_edf.tsv')
            print('Please inspect the file, and specifically the column transducer_type')
        elif len(df_full['subject'].unique()) < len(df_eog['subject'].unique()):
            print('\n!!! There are more participants in the dataset with only EOGs !!!')
            print('This should not be the case.')
            print('Please inspect what is happening in a code editor (spyder..), or ask Yvan.')
            more_sub = set(df_eog['subject'].unique()) - set(df_full['subject'].unique())
            df_more = df_eog[df_eog['subject'].isin(more_sub)]
            # save with the .tsv extension so the file name matches the tab separator and the message below
            df_more.to_csv(f'{summary_path}/EOG_suspect_edf.tsv', sep = '\t')
            print(f'\nSaving information about suspect participants to:\n{summary_path}/EOG_suspect_edf.tsv')
        
        # saving info from EOG
        df_eog.to_csv(f'{summary_path}/EOG_summary_table.tsv', sep = '\t')
        print(f'\nSaving information about EOGs to:\n{summary_path}/EOG_summary_table.tsv')
        #____________________________________________________________________________________________

        #%% 3.1 Inspect EOG configurations___________________________________________________________
        section31 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.1 Inspect EOG configurations</h3>
        """)
        display(section31)

        # get the EOG configuration per participant 
        eog_per_sub = df_eog.groupby('subject')['channel'].apply(lambda x: tuple(sorted(set(x))))
        
        # identify the EOG configuration of each participant and store them in a dict to print per EOG config
        eog_config_dict = {}
        for config in eog_per_sub.unique():
            sub = eog_per_sub[eog_per_sub == config].index.tolist()
            eog_config_dict[config] = sub
        
        if len(eog_config_dict) > 1:
            print('\n>>> There are multiple EOG configurations in your dataset! <<<')
            print(f'\n\tNumber of different configurations: {len(eog_config_dict)}\n')
        else:
            print('\n>>> There is only one EOG configuration in your dataset! <<<')

        if len(eog_config_dict)>=1:
            # widget to select the configuration of interest
            config_eog_slider = mk_config_slider(value = 1, min = 1, max = len(eog_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=eog_config_dict, param="Channels"), i=config_eog_slider);
        else:
            print("No EOG configuration found")
        #____________________________________________________________________________________________
        
        #%% 3.2 Inspect EOG sampling frequencies_____________________________________________________
        section32 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.2 Inspect EOG sampling frequencies</h3>
        """)
        display(section32)

        # the sampling frequency configuration
        sfeog_per_sub = df_eog.groupby('subject')['sampling_frequency'].apply(lambda x: tuple(sorted(set(x))))
        # identify the sampling frequency configuration of each participant and store the participants in a dict keyed by configuration
        sfeog_config_dict = {}
        for config in sfeog_per_sub.unique():
            sub = sfeog_per_sub[sfeog_per_sub == config].index.tolist()
            sfeog_config_dict[config] = sub
        
        # print info per sf configuration (maybe print it only for multiple config)
        if len(sfeog_config_dict) > 1:
            print('\n>>> There are multiple sampling frequencies for EOGs in your dataset! <<<')
            print(f'\n\tNumber of different sampling frequency configurations: {len(sfeog_config_dict)}\n')
            print('Quick overview of the EOGs associated with each sampling frequency:')
            for s, sf in enumerate(df_eog['sampling_frequency'].unique()):
                # select only rows with the current sf
                df_sf = df_eog[df_eog['sampling_frequency'] == sf].copy()
                print(f'\n{sf} Hz: {df_sf["channel"].unique()}')
        else:
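            # use double quotes inside the f-string: reusing the outer quote is a SyntaxError before Python 3.12
            print(f'\n>>> There is only one sampling frequency for EOGs in your dataset: {df_eog["sampling_frequency"].unique()} <<<')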

        if len(sfeog_config_dict)>=1:
            # widget to select the configuration of interest
            config_sfeog_slider = mk_config_slider(value = 1, min = 1, max = len(sfeog_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=sfeog_config_dict, param="Sampling frequencies"), i=config_sfeog_slider);
        else:
            print("No EOG sampling frequency found")
        #____________________________________________________________________________________________
        
        #%% 3.3 Inspect EOG filters__________________________________________________________________
        section33 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.3 Inspect EOG filters</h3>
        """)
        display(section33)

        if len(df_eog['highpass'].unique())+len(df_eog['lowpass'].unique())+len(df_eog['notch'].unique()) == 3:
            print('\n>>> All EOGs have the same filters! <<<')
        elif len(df_eog['highpass'].unique())+len(df_eog['lowpass'].unique())+len(df_eog['notch'].unique()) > 3:
            print('\n>>> Filters are not fully consistent across the dataset! <<<')
        else:
            print('\n>>> There may have been a problem in reading the filters. Here is the output: <<<')
        
        # Get the list of participants with different filtering parameters
        # 1st replace NaN because groupby does not like NaN
        df_eogfilt = df_eog.copy()
        df_eogfilt[['lowpass', 'highpass', 'notch']] = df_eogfilt[['lowpass', 'highpass', 'notch']].fillna('missing')
        
        config_eogfilters = (
            df_eogfilt.groupby(['lowpass', 'highpass', 'notch'])['subject']
            .apply(lambda x: sorted(set(x)))
            .reset_index(name = 'subjects')
        )

        # print filter configuration
        print(f'\n\tNumber of different EOG filter configurations: {len(config_eogfilters)}\n')

        if len(config_eogfilters)>=1:
            # widget to select the configuration of interest
            config_eogfilter_slider = mk_config_slider(value = 1, min = 1, max = len(config_eogfilters))
            
            # function to print the filter configurations
            def print_eogfilters(config_slider):
                # get the info from the dataframe
                idx = config_slider - 1
                sID = config_eogfilters.iloc[idx]['subjects']
                hpass = config_eogfilters.iloc[idx]['highpass']
                lpass = config_eogfilters.iloc[idx]['lowpass']
                notch = config_eogfilters.iloc[idx]['notch']
                
                # print info
                print(f'Selected configuration: # {config_slider}')
                print(f'\tFilter configuration: highpass: {hpass}; lowpass: {lpass}; notch: {notch}')
                print(f'\t{len(sID)} participants: {sID}')
            
            widgets.interact(print_eogfilters, config_slider = config_eogfilter_slider);
        else:
            print("No EOG filters found")
        #____________________________________________________________________________________________
        
        #%% 3.4 Inspect EOG units____________________________________________________________________
        section34 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.4 Inspect EOG units</h3>
        """)
        display(section34)

        if len(df_eog['dimension'].unique()) == 1:
            print(f'\n>>> All EOGs have the same unit: {df_eog["dimension"].unique()} <<<\n')
        elif len(df_eog['dimension'].unique()) > 1:
            print('\n>>> Multiple units were found! <<<')
            # use double quotes inside the f-string: reusing the outer quote is a SyntaxError before Python 3.12
            print(f'\n\tNumber of different EOG unit configurations: {len(df_eog["dimension"].unique())}\n')
            print('Quick overview of the EOGs associated with each unit:')
            for u, unit in enumerate(df_eog['dimension'].unique()):
                # select only rows with the current unit
                df_unit = df_eog[df_eog['dimension'] == unit].copy()
                print(f'\n{unit}: {df_unit["channel"].unique()}')
            
        
        # print the different unit configurations
        # (one configuration = the set of units found across a participant's EOG channels)
        eogunit_per_sub = df_eog.groupby('subject')['dimension'].apply(lambda x: tuple(sorted(set(x))))
        eog_per_unit = df_eog.groupby('dimension')['channel'].apply(lambda x: tuple(sorted(set(x))))
        # identify the unit configuration of each participant and store the participants in a dict keyed by configuration
        eogunit_config_dict = {}
        for config in eogunit_per_sub.unique():
            sub = eogunit_per_sub[eogunit_per_sub == config].index.tolist()
            eogunit_config_dict[config] = sub

        if len(eogunit_config_dict)>=1:
            # widget to select the configuration of interest
            config_eogunit_slider = mk_config_slider(value = 1, min = 1, max = len(eogunit_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=eogunit_config_dict, param="Units"), i=config_eogunit_slider);
        else:
            print("No EOG unit found")
        #____________________________________________________________________________________________
        
        #%% 3.5 Inspect EOG signal inversion_________________________________________________________
        section35 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.5 Inspect EOG signal inversion</h3>
        """)
        display(section35)

        # select rows where the physical min is greater than the physical max
        df_eoginv = df_eog[df_eog['physical_min'] > df_eog['physical_max']]
        
        if not df_eoginv.empty:
            print('\n>>> Inverted polarity detected in EOGs! <<<')
            print(f'{df_eoginv.shape[0]} EOGs have an inverted polarity (from {df_eog.shape[0]} EOGs in {len(edf_files)} edf files)')
            print(df_eoginv[['subject', 'channel', 'dimension', 'physical_min', 'physical_max']])
        else:
            print('\n>>> No inverted polarity was detected in EOGs <<<')
        df_eoginv.to_csv(f'{summary_path}/EOG_inverted_polarity_edf.tsv', sep = '\t')
        print(f'\nSaving information about inverted-polarity EOGs to:\n{summary_path}/EOG_inverted_polarity_edf.tsv \n(will be empty if no inverted polarity)')
        #____________________________________________________________________________________________

        #%% 3.6 Inspect EOG dynamic range and resolution_____________________________________________
        section36 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>3.6 Inspect EOG dynamic range and resolution</h3>
        """)
        display(section36)
        #---------------------------
        section361 = widgets.HTML("""
        <h4>Dynamic range</h4>
        """)
        display(section361)

        dr_eogthres = widgets.BoundedFloatText(
            value=400,
            min=0,
            max=5000,
            step=0.1,
            style={'description_width': '200px'},  # widen the description label
            layout=widgets.Layout(width='270px'),   # adjust the overall widget width if needed
            description='Dynamic range threshold (µV):',
            disabled=False
        );

        def check_bad_eogdr(threshold):
            dr_mask = df_eog['res_theoretical']*pow(2,16) <= threshold
            bad_dr = df_eog[dr_mask]
            
            if not bad_dr.empty:
                print(f'\n>>> Dynamic range <= {threshold} µV detected in EOGs! <<<\n')
                print(f'{bad_dr.shape[0]} EOGs detected (from {df_eog.shape[0]} EOGs in {len(edf_files)} edf files)')
                print(bad_dr[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No EOG with a dynamic range <= {threshold} µV was detected! <<<')
            bad_dr.to_csv(f'{summary_path}/EOG_bad_dynamic_range_edf.tsv', sep = '\t')
            print(f'\nSaving information about bad dynamic range EOGs to:\n{summary_path}/EOG_bad_dynamic_range_edf.tsv \n(will be empty if no bad dynamic range)')
        
        widgets.interact(check_bad_eogdr, threshold = dr_eogthres);
        #---------------------------
        #---------------------------
        section362 = widgets.HTML("""
        <h4>Resolution</h4>
        """)
        display(section362)

        # res_theoretical has been converted to µV, but if the dimension was missing or not read from the headers, the conversion may not have worked. A more robust check may be needed.
        eogr_thres = widgets.BoundedFloatText(
            value=0.1,
            min=0,
            max=10.0,
            step=0.1,
            style={'description_width': '150px'},  # widen the description label
            layout=widgets.Layout(width='230px'),   # adjust the overall widget width if needed
            description='Resolution threshold (µV):',
            disabled=False
        );
        
        # define a function to interact with the widget
        def check_bad_eogres(threshold):
            r_mask = df_eog['res_theoretical'] >= threshold
            bad_res = df_eog[r_mask]
            
            if not bad_res.empty:
                print(f'\n>>> EOGs with a resolution >= {threshold} µV detected! <<<')
                print(f'{bad_res.shape[0]} EOGs detected (from {df_eog.shape[0]} EOGs in {len(edf_files)} edf files)')
                print(bad_res[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No EOG with a resolution >= {threshold} µV was detected! <<<')
            bad_res.to_csv(f'{summary_path}/EOG_bad_resolution_edf.tsv', sep = '\t')
            print(f'\nSaving information about bad resolution EOGs to:\n{summary_path}/EOG_bad_resolution_edf.tsv \n(will be empty if no bad resolution)')
        
        widgets.interact(check_bad_eogres, threshold=eogr_thres);
        #---------------------------
        #____________________________________________________________________________________________

        #%% 4. Select only the ECGs__________________________________________________________________
        #____________________________________________________________________________________________
        section4 = widgets.HTML("""
        <hr style="height:4px; background-color:black; border:none;">
        <h2>4. Inspect ECG</h2>
        <p>This section follows the same structure, logic, and outputs as section 2. Inspect EEG.
        <br>Hence, the code cells are not commented (please refer to section 2. if you need a refresher).
        </p>
        """)
        display(section4)

        # select only ECGs and print a warning if the number of participants differs from the full dataset
        mask_ecg = df_full['channel'].str.contains(r'ecg', case = False, na=False) # mask that is True for rows whose channel label contains "ecg" (case-insensitive)
        df_ecg = df_full[mask_ecg]
        
        # Check if the number of participants with ECGs is the same as in df_full. 
        # If not, it might be because the transducer type was not correctly detected. 
        # One possibility is to add the transducer type to the mask condition (line 2 of this cell).
        if len(df_full['subject'].unique()) > len(df_ecg['subject'].unique()):
            # identify missing subjects
            missing_sub = set(df_full['subject'].unique()) - set(df_ecg['subject'].unique())
            print('\n!!! There are fewer participants in the dataset with only ECGs !!!')
            print(f'Missing participants: {missing_sub}')
            print("\nEither these participants don't have ECGs,")
            print("or the transducer type was not correctly detected.")
            # get df of missing sub to save and inspect
            df_ecgmiss = df_full[df_full['subject'].isin(missing_sub)]
            df_ecgmiss.to_csv(f'{summary_path}/ECG_missing_edf.tsv', sep = '\t')
            print(f'\nSaving information from missing participants to:\n{summary_path}/ECG_missing_edf.tsv')
            print('Please inspect the file, and specifically the column transducer_type')
        elif len(df_full['subject'].unique()) < len(df_ecg['subject'].unique()):
            print('\n!!! There are more participants in the dataset with only ECGs !!!')
            print('This should not be the case.')
            print('Please inspect what is happening in a code editor (e.g. Spyder), or ask Yvan.')
            more_sub = set(df_ecg['subject'].unique()) - set(df_full['subject'].unique())
            df_more = df_ecg[df_ecg['subject'].isin(more_sub)]
            # save with the .tsv extension so the file name matches the message printed below
            df_more.to_csv(f'{summary_path}/ECG_suspect_edf.tsv', sep = '\t')
            print(f'\nSaving information from suspect participants to:\n{summary_path}/ECG_suspect_edf.tsv')
        
        # saving info from ECG
        df_ecg.to_csv(f'{summary_path}/ECG_summary_table.tsv', sep = '\t')
        print(f'\nSaving information from ECGs to:\n{summary_path}/ECG_summary_table.tsv')
        #___________________________________________________________________________________________
        
        #%% 4.1 Inspect ECG configurations___________________________________________________________
        section41 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.1 Inspect ECG configurations</h3>
        """)
        display(section41)

        # get the ECGs configuration per participant 
        ecg_per_sub = df_ecg.groupby('subject')['channel'].apply(lambda x: tuple(sorted(set(x))))
        
        # identify the ECG configuration of each participant and store them in a dict to print per ECG config
        ecg_config_dict = {}
        for config in ecg_per_sub.unique():
            sub = ecg_per_sub[ecg_per_sub == config].index.tolist()
            ecg_config_dict[config] = sub
        
        if len(ecg_config_dict) > 1:
            print('\n>>> There are multiple ECG configurations in your dataset! <<<')    
            print(f'\n\tNumber of different ECG configurations: {len(ecg_config_dict)}\n')
        else:
            print('\n>>> There is only one ECG configuration in your dataset! <<<')

        if len(ecg_config_dict)>=1:
            # widget to select the configuration of interest
            config_ecg_slider = mk_config_slider(value = 1, min = 1, max = len(ecg_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=ecg_config_dict, param="Channels"), i=config_ecg_slider);
        else:
            print("No ECG configuration found")
        #___________________________________________________________________________________________
        
        #%% 4.2 Inspect ECG sampling frequencies____________________________________________________
        section42 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.2 Inspect ECG sampling frequencies</h3>
        """)
        display(section42)
        
        # the sampling frequency configuration
        ecgsf_per_sub = df_ecg.groupby('subject')['sampling_frequency'].apply(lambda x: tuple(sorted(set(x))))
        # identify the sampling frequency configuration of each participant and store the participants in a dict keyed by configuration
        ecgsf_config_dict = {}
        for config in ecgsf_per_sub.unique():
            sub = ecgsf_per_sub[ecgsf_per_sub == config].index.tolist()
            ecgsf_config_dict[config] = sub
        
        # print info per sf configuration (maybe print it only for multiple config)
        if len(ecgsf_config_dict) > 1:
            print('\n>>> There are multiple sampling frequencies for ECGs in your dataset! <<<')    
            print(f'\n\tNumber of different sampling frequency configurations: {len(ecgsf_config_dict)}\n')
            print('Quick overview of the ECGs associated with each sampling frequency:')
            for sf in df_ecg['sampling_frequency'].unique():
                # select only rows with the current sf
                df_sf = df_ecg[df_ecg['sampling_frequency'] == sf]
                print(f'\n{sf} Hz: {df_sf["channel"].unique()}')
        else:
            # double quotes inside the f-string: reusing single quotes is a syntax error before Python 3.12
            print(f'\n>>> There is only one sampling frequency for ECGs in your dataset: {df_ecg["sampling_frequency"].unique()} <<<')

        if len(ecgsf_config_dict):
            # widget to select the configuration of interest
            config_ecgsf_slider = mk_config_slider(value = 1, min = 1, max = len(ecgsf_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=ecgsf_config_dict, param="Sampling frequencies"), i=config_ecgsf_slider);
        else:
            print("No ECG sampling frequency")
        #___________________________________________________________________________________________
        
        #%% 4.3 Inspect ECG filters_________________________________________________________________
        section43 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.3 Inspect ECG filters</h3>
        """)
        display(section43)

        # count the distinct values of each filter column once (NaN counts as a value, as with len(unique()))
        n_filter_values = sum(df_ecg[col].nunique(dropna=False) for col in ['highpass', 'lowpass', 'notch'])
        if n_filter_values == 3:
            print('\n>>> All ECGs have the same filters! <<<')
        elif n_filter_values > 3:
            print('\n>>> Filters are not fully consistent across the dataset! <<<')
        else:
            print('\n>>> There may have been a problem in reading the filters. Here is the output: <<<')
        
        # Get the list of participants with different filtering parameters
        # first replace NaN with a placeholder, because groupby drops rows with NaN keys by default
        df_ecgfilt = df_ecg.copy()
        df_ecgfilt[['lowpass', 'highpass', 'notch']] = df_ecgfilt[['lowpass', 'highpass', 'notch']].fillna('missing')
        
        config_ecgfilters = (
            df_ecgfilt.groupby(['lowpass', 'highpass', 'notch'])['subject']
            .apply(lambda x: sorted(set(x)))
            .reset_index(name = 'subjects')
        )

        # print filter configuration
        print(f'\n\tNumber of different ECG filter configurations: {len(config_ecgfilters)}\n')

        if len(config_ecgfilters)>=1:
            # widget to select the configuration of interest
            config_ecgfilter_slider = mk_config_slider(value = 1, min = 1, max = len(config_ecgfilters))
            
            # function to print the filter configurations
            def print_ecgfilters(config_slider):
                # get the info from the dataframe
                idx = config_slider - 1
                sID = config_ecgfilters.iloc[idx]['subjects']
                hpass = config_ecgfilters.iloc[idx]['highpass']
                lpass = config_ecgfilters.iloc[idx]['lowpass']
                notch = config_ecgfilters.iloc[idx]['notch']
                
                # print info
                print(f'Selected configuration: # {config_slider}')
                print(f'\tFilters configuration: highpass: {hpass}; lowpass: {lpass}; notch: {notch}')
                print(f'\t{len(sID)} participants: {sID}')
            
            widgets.interact(print_ecgfilters, config_slider = config_ecgfilter_slider);
        else:
            print("No ECG filters")
        #___________________________________________________________________________________________
        
        #%% 4.4 Inspect ECG units___________________________________________________________________
        section44 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.4 Inspect ECG units</h3>
        """)
        display(section44)

        if len(df_ecg['dimension'].unique()) == 1:
            print(f'\n>>> All ECGs have the same unit: {df_ecg["dimension"].unique()} <<<\n')
        elif len(df_ecg['dimension'].unique()) > 1:
            print('\n>>> Multiple units were found for ECGs! <<<')
            # double quotes inside the f-string: reusing single quotes is a syntax error before Python 3.12
            print(f'\n\tNumber of different units: {len(df_ecg["dimension"].unique())}\n')
            print('Quick overview of the ECGs associated with each unit:')
            for unit in df_ecg['dimension'].unique():
                # select only rows with the current unit
                df_unit = df_ecg[df_ecg['dimension'] == unit]
                print(f'\n{unit}: {df_unit["channel"].unique()}')
            
        
        # print the different unit configurations
        # (useful if details about each unit configuration are needed)
        ecgunit_per_sub = df_ecg.groupby('subject')['dimension'].apply(lambda x: tuple(sorted(set(x))))
        ecg_per_unit = df_ecg.groupby('dimension')['channel'].apply(lambda x: tuple(sorted(set(x))))
        # identify the unit configuration of each participant and store the participants in a dict keyed by configuration
        ecgunit_config_dict = {}
        for config in ecgunit_per_sub.unique():
            sub = ecgunit_per_sub[ecgunit_per_sub == config].index.tolist()
            ecgunit_config_dict[config] = sub

        if len(ecgunit_config_dict)>=1:
            # widget to select the configuration of interest
            config_ecgunit_slider = mk_config_slider(value = 1, min = 1, max = len(ecgunit_config_dict))
            
            # print the configuration selected
            # interact with the slider output through the printing function 
            widgets.interact(lambda i: print_config(i, config_dict=ecgunit_config_dict, param="Units"), i=config_ecgunit_slider);
        else:
            print("No ECG unit found")
        #___________________________________________________________________________________________
        
        #%% 4.5 Inspect ECG signal inversion________________________________________________________
        section45 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.5 Inspect ECG signal inversion</h3>
        """)
        display(section45)

        # select rows where the physical min is greater than the physical max
        df_ecginv = df_ecg[df_ecg['physical_min'] > df_ecg['physical_max']]
        
        if not df_ecginv.empty:
            print('\n>>> Inverted polarity detected in ECGs! <<<')
            print(f'{df_ecginv.shape[0]} ECGs have an inverted polarity (from {df_ecg.shape[0]} ECGs in {len(edf_files)} edf files)')
            print(df_ecginv[['subject', 'channel', 'dimension', 'physical_min', 'physical_max']])
        else:
            print('\n>>> No inverted polarity was detected in ECGs <<<')
        df_ecginv.to_csv(f'{summary_path}/ECG_inverted_polarity_edf.tsv', sep = '\t')
        print(f'\nSaving information from inverted polarity ECGs to:\n{summary_path}/ECG_inverted_polarity_edf.tsv \n(will be empty if no inverted polarity)')
        #____________________________________________________________________________________________

        #%% 4.6 Inspect ECG dynamic range and resolution_____________________________________________
        section46 = widgets.HTML("""
        <hr style="height:1px; background-color:black; border:none;">
        <h3>4.6 Inspect ECG dynamic range and resolution</h3>
        """)
        display(section46)
        #---------------------------
        section461 = widgets.HTML("""
        <h4>Dynamic range</h4>
        """)
        display(section461)

        ecgdr_thres = widgets.BoundedFloatText(
            value=400,
            min=0,
            max=5000,
            step=0.1,
            style={'description_width': '200px'},  # widen the description label
            layout=widgets.Layout(width='270px'),   # adjust the overall widget width if needed
            description='Dynamic range threshold (µV):',
            disabled=False
        );
        
        
        def check_bad_ecgdr(threshold):
            dr_mask = df_ecg['res_theoretical']*pow(2,16) <= threshold
            bad_dr = df_ecg[dr_mask]
            
            if not bad_dr.empty:
                print(f'\n>>> Dynamic range <= {threshold} µV detected in ECGs! <<<\n')
                print(f'{bad_dr.shape[0]} ECGs detected (from {df_ecg.shape[0]} ECGs in {len(edf_files)} edf files)')
                print(bad_dr[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No ECG with a dynamic range <= {threshold} µV was detected! <<<')
            bad_dr.to_csv(f'{summary_path}/ECG_bad_dynamic_range_edf.tsv', sep = '\t')
            print(f'\nSaving information from bad dynamic range ECGs to:\n{summary_path}/ECG_bad_dynamic_range_edf.tsv \n(will be empty if no bad dynamic range)')
        
        widgets.interact(check_bad_ecgdr, threshold = ecgdr_thres);
        #---------------------------
        #---------------------------
        section462 = widgets.HTML("""
        <h4>Resolution</h4>
        """)
        display(section462)

        # res_theoretical has been converted to µV; if the unit (dimension) is missing or unreadable in the header, the conversion may fail. A more robust check may be needed.
        ecgr_thres = widgets.BoundedFloatText(
            value=0.1,
            min=0,
            max=10.0,
            step=0.1,
            style={'description_width': '150px'},  # widen the description label
            layout=widgets.Layout(width='230px'),   # adjust the overall widget width if needed
            description='Resolution threshold (µV):',
            disabled=False
        );
        
        # define a function to interact with the widget
        def check_bad_ecgres(threshold):
            r_mask = df_ecg['res_theoretical'] >= threshold
            bad_res = df_ecg[r_mask]
            
            if not bad_res.empty:
                print(f'\n>>> ECGs with a resolution >= {threshold} µV detected! <<<')
                print(f'{bad_res.shape[0]} ECGs detected (from {df_ecg.shape[0]} ECGs in {len(edf_files)} edf files)')
                print(bad_res[['subject', 'channel', 'dimension', 'physical_min', 'physical_max', 'res_theoretical']])
            else:
                print(f'\n>>> No ECG with a resolution >= {threshold} µV was detected! <<<')
            bad_res.to_csv(f'{summary_path}/ECG_bad_resolution_edf.tsv', sep = '\t')
            print(f'\nSaving information from bad resolution ECGs to:\n{summary_path}/ECG_bad_resolution_edf.tsv \n(will be empty if no bad resolution)')
        
        widgets.interact(check_bad_ecgres, threshold=ecgr_thres);
        #____________________________________________________________________________________________

        #%% end of the code section__________________________________________________________________
        #____________________________________________________________________________________________

# Link run button to the main function
run_button.on_click(run_inspection)
# callback to run the function only when a folder is selected
chooser.register_callback(run_inspection)

# Display in voila
display(chooser, run_button, out)