# KJM ECoG Fingerflex Dataset

Two of the datasets we have prepared for the workshop come from Kai Miller's ECoG data repository.
* [Miller, KJ. A library of human electrocorticographic data and analyses. Nature Human Behaviour, 2019](https://www.nature.com/articles/s41562-019-0678-3)
* [Direct Data Repository Link](https://searchworks.stanford.edu/view/zk881ps0522)

If you want the full raw data from the entire repository, use the scripts in this workshop's `data/kjm_ecog` folder.
If the lightly preprocessed data are sufficient, proceed below and the data will be downloaded on demand.

In this notebook we will explore the "Fingerflex" study data.
The data originally appeared in the manuscript
[Human Motor Cortical Activity Is Selectively Phase-Entrained on Underlying Rhythms](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002655), published in PLoS Computational Biology in 2012.

A portion of the dataset makes up [BCI Competition IV Dataset 4](http://www.bbci.de/competition/iv/#dataset4).


## Prepare the notebook
The next cell normalizes the setup across local environments and Google Colaboratory.
If you're running this on Colab, a widget will appear asking you to upload your [Kaggle API auth token](https://www.kaggle.com/docs/api#authentication).
The token is normally found in `~/.kaggle/kaggle.json`, where `~` is your home directory. You may need to show
hidden files and folders before you can find the file in your file explorer.
If you're running this locally, please make sure the `kaggle.json` file is in the expected location.
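
If you are unsure whether the token is where the Kaggle client expects it, a short check like the one below can help. This is a minimal sketch and not part of the workshop code: the helper name `find_kaggle_token` is hypothetical, and it only assumes that the token is a JSON file holding a `username` and a `key`.

```python
import json
from pathlib import Path


def find_kaggle_token(home=None):
    """Return the path to kaggle.json if it looks like a valid token, else None.

    `home` defaults to the current user's home directory (hypothetical helper).
    """
    home = Path.home() if home is None else Path(home)
    token_path = home / '.kaggle' / 'kaggle.json'
    if not token_path.is_file():
        return None
    try:
        token = json.loads(token_path.read_text())
    except (OSError, ValueError):
        # Unreadable file or invalid JSON.
        return None
    # A Kaggle API token is a JSON object with a username and a key.
    if not isinstance(token, dict) or not {'username', 'key'} <= set(token):
        return None
    return token_path


token_path = find_kaggle_token()
if token_path is None:
    print("No usable kaggle.json found in ~/.kaggle")
else:
    print(f"Found Kaggle token at {token_path}")
```

If the check reports that no token was found, download a new one from your Kaggle account page and place it in the `.kaggle` folder of your home directory.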

In [None]:
from pathlib import Path
import os
import matplotlib.pyplot as plt


try:
    from google.colab import files
    %tensorflow_version 2.x
    os.chdir('..')
    
    if not (Path.home() / '.kaggle').is_dir():
        # Configure kaggle
        uploaded = files.upload()  # Find the kaggle.json file in your ~/.kaggle directory.
        if 'kaggle.json' in uploaded.keys():
            !mkdir -p ~/.kaggle
            !mv kaggle.json ~/.kaggle/
            !chmod 600 ~/.kaggle/kaggle.json
            
    if Path.cwd().stem != 'IntracranialNeurophysDL':
        if not (Path.cwd() / 'IntracranialNeurophysDL').is_dir():
            # Download the workshop repo and change to its directory
            !git clone --recursive https://github.com/SachsLab/IntracranialNeurophysDL.git
        os.chdir('IntracranialNeurophysDL')
    
    !pip install -q kaggle
    plt.style.use('dark_background')
    IN_COLAB = True
    
except ModuleNotFoundError:
    IN_COLAB = False
    import sys
    if Path.cwd().stem == 'notebooks':
        os.chdir(Path.cwd().parent)
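    # Make sure the kaggle executable is on the PATH.
    # os.pathsep is ';' on Windows and ':' elsewhere; the 'Scripts' folder only exists on Windows.
    exe_dir = Path(sys.executable).parent
    os.environ['PATH'] = os.pathsep.join([os.environ['PATH'], str(exe_dir), str(exe_dir / 'Scripts')])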

# Try to clear any logs from previous runs
if (Path.cwd() / 'logs').is_dir():
    import shutil
    try:
        shutil.rmtree(str(Path.cwd() / 'logs'))
    except PermissionError:
        print("Unable to remove logs directory.")
        

import numpy as np
import tensorflow as tf
from indl import turbo_cmap, reset_keras
plt.rcParams.update({
    'axes.titlesize': 24,
    'axes.labelsize': 20,
    'lines.linewidth': 3,
    'lines.markersize': 10,
    'xtick.labelsize': 16,
    'ytick.labelsize': 16,
    'legend.fontsize': 18
})
%load_ext autoreload
%autoreload 2

## Download Data
Run the next cell to download and unzip the data. Compared to the original data from the data repository,
the files in the Kaggle Dataset have had some additional preprocessing.

The downloaded data contain 3 files for each of the 9 subjects:
* `subname_full.h5` - Bad channel removal, common average referencing, 60 Hz notch filter.
* `subname_segs.h5` - Same as `_full`, but the data have been segmented from -0.75 to +0.75 s around the onset of (human-coded) finger-flexion events.
* `subname_bp.h5` - `_full` with spectral whitening; a filterbank (6th-order Butterworth, applied with filtfilt) at alpha, beta, and broadband frequencies;
power in each band downsampled to 20 Hz, z-scored, and segmented as in `_segs`.
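
To make the steps behind `_full` and `_segs` more concrete, here is a minimal sketch of common average referencing, a 60 Hz notch filter, and segmentation around event onsets, run on synthetic data. It is not the code used to produce the Kaggle files: the function names, the notch quality factor, the 1000 Hz sampling rate, and the onset indices are all assumptions made for illustration.

```python
import numpy as np
from scipy import signal


def car_and_notch(data, fs, line_freq=60.0, q=30.0):
    """Common average reference, then a zero-phase notch at line_freq.

    data has shape (samples, channels). q is an assumed quality factor.
    """
    # Subtract the across-channel mean from every sample.
    referenced = data - data.mean(axis=1, keepdims=True)
    b, a = signal.iirnotch(line_freq, q, fs=fs)
    # filtfilt runs the filter forward and backward, so there is no phase shift.
    return signal.filtfilt(b, a, referenced, axis=0)


def segment(data, fs, onsets, t_min=-0.75, t_max=0.75):
    """Cut windows from t_min to t_max seconds around each onset (sample index).

    Returns an array of shape (events, window_samples, channels).
    Onsets whose window would run past either end of the data are dropped.
    """
    start = int(round(t_min * fs))
    stop = int(round(t_max * fs))
    keep = [o for o in onsets if o + start >= 0 and o + stop <= data.shape[0]]
    if not keep:
        return np.empty((0, stop - start, data.shape[1]))
    return np.stack([data[o + start:o + stop] for o in keep])


# Synthetic example: noise plus channel-specific 60 Hz contamination.
rng = np.random.default_rng(0)
fs = 1000.0  # assumed sampling rate for this example only
t = np.arange(int(10 * fs)) / fs
line_amp = np.array([0.5, 1.0, 1.5, 2.0])
raw = rng.standard_normal((t.size, line_amp.size))
raw += line_amp * np.sin(2 * np.pi * 60.0 * t)[:, None]

clean = car_and_notch(raw, fs)
# The first onset is too close to the start of the recording and is dropped.
segs = segment(clean, fs, onsets=[100, 1000, 2500, 4000])
print(clean.shape, segs.shape)
```

The segmented array has one entry per retained event, which is the same layout idea as the `_segs` files: each event contributes a fixed-length window of multichannel data.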

In [None]:
# Download and unzip data
datadir = Path.cwd() / 'data' / 'kjm_ecog'
if not (datadir / 'converted' / 'fingerflex').is_dir():
    !kaggle datasets download --unzip --path {str(datadir / 'converted' / 'fingerflex')} cboulay/kjm-ecog-fingerflex
    print("Finished downloading and extracting data.")
else:
    print("Data directory found. Skipping download.")

## Import one subject's data
The data have been stored as HDF5 files and thus can be imported directly into Python.
However, to make your life a little easier, we wrote a small function that makes importing slightly friendlier.
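
If you would rather look inside a file yourself, a generic HDF5 reader such as `h5py` can list what it contains. The sketch below is an illustration only: `describe_h5` is a hypothetical helper, `h5py` is an extra dependency, and the demo file it writes is made up because the internal layout of the workshop files is not described here.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_h5(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    contents = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            contents[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, 'r') as f:
        f.visititems(visit)
    return contents


# Demo on a small made-up file (not the workshop data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.h5')
    with h5py.File(demo_path, 'w') as f:
        f.create_dataset('example/signal', data=np.zeros((5, 2), dtype='float32'))
    demo_contents = describe_h5(demo_path)
    print(demo_contents)
```

Pointing `describe_h5` at one of the downloaded `.h5` files should show its datasets with their shapes and data types.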

In [None]:
from data.utils.fileio import load_fingerflex


SUB_ID = 'cc'
X, Y, ax_info, behav = load_fingerflex(datadir, SUB_ID, feature_set='full', event_set='Stim')
tvec = ax_info['timestamps']
srate = ax_info['fs']
print(f"Found {len(tvec)} timestamps ({tvec[0]} to {tvec[-1]} at {srate} Hz), {X.shape[-1]} chans")