# iEEG Pipeline Example

This notebook demonstrates how the core libraries for iEEG processing can be combined into a user-friendly pipeline.

The code presented in this example is in no way exhaustive. Personal code can be integrated into this pipeline; these functions are simply meant to help outline a project and to ensure that researchers use the same libraries when possible.

This code is not meant to be used in a CI/CD style pipeline. We have opted for a lightweight solution suited to individual research. If you require a larger, scalable solution for code deployment, please reach out to [Brian Prager](mailto:bjprager@seas.upenn.edu) or [Joshua Asuncion](mailto:asuncion@seas.upenn.edu) for help.

## Imports

In [1]:
# General imports
import sys
import argparse

In [4]:
# Pipeline imports
from pipeline_ieeg.data_pull import pipeline_datapull_ieeg as PDI
from pipeline_ieeg.data_quality import dataframe_properties_check as DPC
from pipeline_ieeg.preprocessing import pipeline_preprocessing_ieeg as PPI
from pipeline_ieeg.feature_selection import pipeline_feature_selection_ieeg as PFSI

The pipeline functions shown above have all been designed to reference the core libraries. This ensures that everyone is using the same core libraries when possible. It also allows for quick hot-swapping of analysis techniques. When appropriate, each function can take in a list of processing steps, or each step can be called individually. We demonstrate below what this means for analysis.
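
To make the difference between passing a list of steps and calling each step individually concrete, here is a minimal sketch. The step functions (remove_mean, rectify) and the run_steps helper are hypothetical stand-ins, not functions from pipeline_ieeg, and a plain list stands in for the dataframe.

```python
from typing import Callable, List

Signal = List[float]


def remove_mean(x: Signal) -> Signal:
    """Subtract the mean from every sample (hypothetical step)."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def rectify(x: Signal) -> Signal:
    """Take the absolute value of every sample (hypothetical step)."""
    return [abs(v) for v in x]


def run_steps(x: Signal, steps: List[Callable[[Signal], Signal]]) -> Signal:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        x = step(x)
    return x


data = [1.0, 2.0, 3.0]

# Option 1: hand the whole list of steps to a single call.
from_list = run_steps(data, [remove_mean, rectify])

# Option 2: call each step individually.
by_hand = rectify(remove_mean(data))

print(from_list == by_hand)
```

Swapping an analysis technique then amounts to replacing one entry in the list, which is the kind of hot-swapping described above.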

Generally, pipeline functions will follow the structure shown in the import statement. In the case of iEEG, that means the module is named pipeline_ieeg.

The directory underneath is broken up by processing step:
1. data pull
2. data quality check
3. preprocessing
4. feature selection
5. model
6. data reporting

Finally, each script is typically named pipeline_**processing step**_ieeg. (Data quality is linked to unit testing and, as of 02/21/2023, does not follow this format.)
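
As a rough illustration of this naming convention, the sketch below builds the dotted import path for a given processing step. The helper name script_path and its modality parameter are assumptions made for this example; they are not part of pipeline_ieeg.

```python
def script_path(step: str, modality: str = "ieeg") -> str:
    """Build the dotted import path implied by the naming convention."""
    package = f"pipeline_{modality}"          # e.g. pipeline_ieeg
    script = f"pipeline_{step}_{modality}"    # e.g. pipeline_preprocessing_ieeg
    return f"{package}.{step}.{script}"


print(script_path("preprocessing"))
print(script_path("feature_selection"))
```

Both results match the import statements shown earlier. The helper does not cover every case: the data pull script is spelled pipeline_datapull_ieeg in the imports, and data quality does not follow the format at all.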

**For more information on each of the pipeline functions or core libraries, please reference the relevant examples.**

In [6]:
def main():
    """
    Calls a series of commands for a simple iEEG pipeline.

    Returns
    -------
    Cleaned dataframe, sampling frequency, and a dictionary of features.

    """
    
    # Command line options needed to obtain data.
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', required=True, help='username')
    parser.add_argument('-p', '--password', help='password (will be prompted if omitted)')
    parser.add_argument('--dataset', help='dataset name')
    parser.add_argument('--start', type=int, help='start offset in usec')
    parser.add_argument('--duration', type=int, help='number of usec to request')
    parser.add_argument('--local_path', default=None, type=str, help='Path to local data to ingest manually. Default=None.')
    parser.add_argument('--silent', dest='verbose', default=True, action='store_false', help='Silence verbose output. Default: verbose output enabled.')
    parser.add_argument('--nchan', type=int, help='Number of channels')
    args = parser.parse_args()
    
    # Data ingestion
    DF,fs = PDI.main(args)
    
    # Data quality check
    qflag = DPC.main(DF,16,verbose=args.verbose)
    if qflag and args.verbose:
        print("All data quality checks came back True. Proceeding to next step.")
    
    # Data preprocessing
    DF = PPI.main(DF)
    
    # Feature selection
    feature_dict = PFSI.main(DF,fs)
    
    return DF,fs,feature_dict

This function (named main to provide a simple, consistent naming convention when building modules) runs an entire iEEG processing pipeline. For specific examples of how each function works, please see below.

### Command-line Arguments
At a high level, the code begins by requesting information from the user. Most of this information is to help identify the dataset the user wishes to analyze. Additional pipeline options can be added if specific information needs to be passed to the data reduction or modeling code.
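
When experimenting inside a notebook, calling parse_args() with no arguments reads sys.argv, which typically holds the notebook kernel's own arguments rather than pipeline options. One way around this, sketched below, is to pass an explicit list of arguments. The parser repeats a subset of the options defined in main; the user and dataset values are placeholders, and build_parser is a helper name introduced for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate a subset of the options defined in main()."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', required=True, help='username')
    parser.add_argument('--dataset', help='dataset name')
    parser.add_argument('--start', type=int, help='start offset in usec')
    parser.add_argument('--duration', type=int, help='number of usec to request')
    parser.add_argument('--silent', dest='verbose', default=True,
                        action='store_false', help='Silence verbose output.')
    return parser


# Placeholder values, for illustration only.
argv = ['-u', 'example_user', '--dataset', 'EXAMPLE_DATASET',
        '--start', '0', '--duration', '1000000']

# Passing a list makes argparse ignore sys.argv entirely.
args = build_parser().parse_args(argv)
print(args.user, args.dataset, args.start, args.duration, args.verbose)
```

Because --silent uses store_false with dest='verbose', args.verbose stays True unless the flag is given.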

### Data Ingestion
At the data ingestion step, the code performs a series of checks to prevent data duplication. It first checks the user_data folder of the repository for data that matches the provided criteria. Alternatively, it checks the file location provided by args.local_path. If the file is not found, it then checks Borel and Lief. Only if the data does not exist within our cache does it connect through the iEEG API to download the new data.
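
The lookup order described above can be sketched as a cascade that only downloads as a last resort. This is not the actual PDI implementation: the helper names, the file names, and the stand-in download function are all hypothetical, and a temporary directory plays the role of the real cache locations.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def find_cached(candidates: List[Path]) -> Optional[Path]:
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if path.exists():
            return path
    return None


def locate_data(candidates: List[Path], download: Callable[[], Path]) -> Path:
    """Prefer any existing copy; call download() only when none is found."""
    cached = find_cached(candidates)
    if cached is not None:
        return cached
    return download()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    local = root / "user_data" / "example.pkl"       # hypothetical, does not exist
    shared = root / "shared_cache" / "example.pkl"   # hypothetical, created below
    shared.parent.mkdir()
    shared.write_text("cached")

    download_calls = []

    def fake_download() -> Path:
        """Stand-in for an API download; records that it was called."""
        download_calls.append(1)
        return root / "downloaded.pkl"

    found = locate_data([local, shared], fake_download)
    print(found == shared, len(download_calls))
```

Ordering the candidates from cheapest to most expensive keeps the API call as the fallback, which is what prevents duplicate downloads.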

### Data Quality
This code checks the dataframe or array properties against expectation. This includes array shape, data types, and the presence of values such as infs or NaNs. This type of code also aligns closely with unit testing, and may be cross-referenced to the unit_test folder at the top of the repository. Not all unit tests are quality checks, so ask yourself which "issues" in your data you want to be alerted to before running the pipeline.
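
A minimal sketch of such checks on a NumPy array is shown below. It is not the DPC implementation: the function name quality_check, the (samples, channels) layout, and the expected channel count are assumptions made for illustration.

```python
import numpy as np


def quality_check(data: np.ndarray, n_channels: int) -> dict:
    """Return named pass/fail checks for an assumed (samples, channels) array."""
    is_2d = data.ndim == 2
    is_float = bool(np.issubdtype(data.dtype, np.floating))
    return {
        "is_2d": is_2d,
        "channel_count": is_2d and data.shape[1] == n_channels,
        "float_dtype": is_float,
        # np.isfinite is False for NaN, +inf and -inf.
        "all_finite": bool(is_float and np.isfinite(data).all()),
    }


good = np.zeros((100, 4))
bad = good.copy()
bad[10, 2] = np.nan  # inject a single NaN

print(all(quality_check(good, 4).values()))
print(quality_check(bad, 4))
```

Returning one named flag per check, rather than a single boolean, makes it easier to report which expectation failed.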

### Preprocessing
This part of the code performs preprocessing on the data. Specific examples of how to use this function are provided below.

### Feature Selection
This part of the code performs feature selection on the data. Specific examples of how to use this function are provided below. It returns a dictionary holding the result of each feature selection, where each key is the name of a feature and each value is the corresponding result.
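
The shape of that dictionary can be sketched as follows. The two features (line length and mean power) and the select_features helper are illustrative choices, not necessarily what PFSI computes, and a plain list stands in for one channel of data.

```python
from typing import Callable, Dict, List

Signal = List[float]


def line_length(x: Signal) -> float:
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def mean_power(x: Signal) -> float:
    """Mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


def select_features(x: Signal,
                    features: Dict[str, Callable[[Signal], float]]) -> Dict[str, float]:
    """Evaluate every feature and key each result by the feature's name."""
    return {name: func(x) for name, func in features.items()}


feature_dict = select_features(
    [0.0, 1.0, -1.0, 0.0],
    {"line_length": line_length, "mean_power": mean_power},
)
print(feature_dict)
```

Keying results by feature name lets downstream code request a specific feature without depending on the order in which features were computed.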

### Output
We then provide the output back to the user. If we were running a specific model, we could instead call it here, or pass the data to a third-party piece of software.