# Running the analysis pipeline on the EBRAINS collab

### Check config files

Before executing the pipeline, check whether the config files are set according to your needs. For test runs, most default settings should suffice.

In particular, the `DATA_SETS` parameter in _stage01_data_entry/configs/config_IDIBAPS.yaml_ and _.../config_LENS.yaml_ needs to contain the correct path to the datasets on your collab drive. The default is `/mnt/user/drive/My Libraries/My Library/datasets/...`, but this might need to be changed, e.g., when the folder names are in a different language.
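If you are unsure how the library folders are named on your drive, listing the folders under the drive mount point can help. The following is a minimal sketch; `list_subfolders` is a hypothetical helper (not part of the pipeline), and `/mnt/user/drive` is assumed to be the mount point because it is the prefix of the default path.

```python
import os

def list_subfolders(root):
    """Return the sorted folder names directly under `root` (empty list if `root` is missing)."""
    if not os.path.isdir(root):
        return []
    return sorted(name for name in os.listdir(root)
                  if os.path.isdir(os.path.join(root, name)))

# Assumed mount point, taken from the default DATA_SETS path.
print(list_subfolders('/mnt/user/drive'))
```

The printed names show what to use in place of `My Libraries` in the `DATA_SETS` path; calling the helper again on one of the listed folders shows the next level.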

### Build the environment
When running this notebook for the first time in your JupyterHub session, the dependencies of the pipeline scripts need to be installed first. On later executions of the notebook, this step can be skipped.

In [2]:
%%capture
%pip install snakemake==5.15.0
%pip install jinja2==2.10.3
%pip install pygments==2.4.2
# %pip install pygraphviz==1.5  # cannot be compiled on the system
%pip install git+https://github.com/NeuralEnsemble/elephant.git 
%pip install git+https://github.com/NeuralEnsemble/python-neo.git
%pip install nixio==1.5.0b4
%pip install pillow==7.0.0
%pip install matplotlib
%pip install seaborn
%pip install networkx
%pip install shapely==1.6.4.post2
%pip install scikit-learn==0.22.1
%pip install scikit-image==0.16.2
%pip install pandas==1.0.1

To update the kernel with the newly installed packages, __the kernel needs to be restarted (Kernel->Restart)__.
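After the restart, you may want to confirm that the pinned packages are visible to the kernel. This is a small sketch, assuming Python 3.8 or newer (for `importlib.metadata`); `installed_versions` is a hypothetical helper name, and the package list is only a subset of the install cell.

```python
from importlib import metadata  # standard library since Python 3.8

def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# A few of the packages installed in the cell above.
for name, version in installed_versions(['snakemake', 'pandas', 'scikit-learn', 'nixio']).items():
    print(name, version or 'NOT INSTALLED')
```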

### Navigate to the working directory

In [None]:
import os
import sys
os.chdir('./pipeline')
sys.path.append('./')
print(os.getcwd())

### Set output location
Here, you can specify where the output files of the pipelines should be stored.
A suggested location would be your personal drive storage, e.g.:

`/mnt/user/drive/My Libraries/My Library/results/`

In [None]:
def update_output_path(path):
    """Overwrite settings.py with a single line that sets `output_path`."""
    new_content = "output_path = '{}'".format(path)
    with open('settings.py', 'r') as f:
        prev_content = f.read()
    with open('settings.py', 'w') as f:
        f.write(new_content)
    print("Previous Content:\n", prev_content, "\n\n",
          "New Content:\n", new_content)

update_output_path('/mnt/user/drive/My Libraries/My Library/results/')
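
Note that the helper above replaces the whole content of `settings.py`. If your copy of `settings.py` contains other settings as well (an assumption, as the default file may hold only this one line), a variant that rewrites only the `output_path` line could look like the sketch below. `set_output_path` is a hypothetical name and not part of the pipeline.

```python
import os
import re

def set_output_path(path, settings_file='settings.py'):
    """Set `output_path` in `settings_file` and keep all other lines unchanged."""
    new_line = "output_path = {!r}".format(path)
    lines = []
    if os.path.exists(settings_file):
        with open(settings_file, 'r') as f:
            lines = f.read().splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Replace any existing assignment to output_path.
        if re.match(r'\s*output_path\s*=', line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    with open(settings_file, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return new_line
```

It would be called in the same way, e.g. `set_output_path('/mnt/user/drive/My Libraries/My Library/results/')`.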

### Execute the pipeline
The pipeline is run by calling the `snakemake` command, as in the cell below. When called in the top-level folder, the whole pipeline with all stages is executed. To run just a single stage, navigate to that subfolder (e.g. `os.chdir('stage03_trigger_detection')`) and call the `snakemake` command there, additionally specifying the config file explicitly (`--configfile='configs/config_IDIBAPS.yaml'`).

_Due to memory constraints, this example run is downscaled to the first 10s of the recording (`T_STOP = 10` in stage01_data_entry/config.yaml). Exceeding the available memory causes a `bus error`._
You can look at the results of the full dataset in the results folder in the collab drive.

In [None]:
print('Working directory: ', os.getcwd())
!snakemake --cores=1

To run another dataset, you can either edit the `PROFILE` parameter in the pipeline config file or set it directly on the command line (`!snakemake --config PROFILE=LENS --cores=1`).
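
The command-line variants mentioned in this section can also be assembled in Python, which may be convenient when switching between profiles or stages. This is only a sketch: `snakemake_command` is a hypothetical helper, and it uses nothing beyond the `--cores`, `--configfile` and `--config` options already shown above.

```python
def snakemake_command(profile=None, configfile=None, cores=1):
    """Build the snakemake call as an argument list."""
    cmd = ['snakemake', '--cores={}'.format(cores)]
    if configfile is not None:
        # Needed when running a single stage from its subfolder.
        cmd += ['--configfile', configfile]
    if profile is not None:
        # Overrides the PROFILE parameter of the pipeline config file.
        cmd += ['--config', 'PROFILE={}'.format(profile)]
    return cmd

print(' '.join(snakemake_command(profile='LENS')))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` instead of using the `!` shell escape.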

### Investigate Results

You can navigate to the output location you set above to inspect the results, using either the JupyterHub or the collab drive interface.

The precomputed results of the full datasets are stored in the collab drive under _results/_.

### Provide feedback, report bugs

If you encounter any difficulties, please report them on the project's GitHub site to help us further improve the analysis pipeline: https://github.com/INM-6/wavescalephant/issues