Dask provides multi-core and distributed parallel execution on larger-than-memory datasets.

We can think of Dask at a high level and a low level:

*  **High-level collections:**  Dask provides high-level Array, Bag, and DataFrame
   collections that mimic NumPy, lists, and Pandas but can operate in parallel on
   datasets that don't fit into memory.  Dask's high-level collections are
   alternatives to NumPy and Pandas for large datasets, and in this role Dask is
   also an alternative to Spark and MapReduce.
   
*  **Low-level schedulers:** Dask provides dynamic task schedulers that
   execute task graphs in parallel.  These execution engines power the
   high-level collections mentioned above but can also power custom,
   user-defined workloads.  These schedulers are low-latency (around 1ms) and
   work hard to run computations in a small memory footprint.  Dask's
   schedulers are an alternative to direct use of `threading` or
   `multiprocessing` libraries in complex cases or other task scheduling
   systems like `Luigi` or `IPython parallel`.
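A minimal sketch of the high-level side follows. A NumPy array wrapped in a chunked `dask.array` supports the same `mean(axis=0)` call as NumPy. Instead of running immediately, the call builds a task graph that a scheduler then executes block by block. The array size and the `(250, 250)` chunk shape are arbitrary choices for illustration. For data that really does not fit into memory, the array would be created from on-disk storage rather than from an in-memory array.

```python
import numpy as np
import dask.array as da

# Small in-memory array for illustration; real use cases would exceed RAM
data = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# Split into 250 x 250 blocks; each block becomes a separate task
x = da.from_array(data, chunks=(250, 250))

# Same expression as in NumPy, but this only builds a lazy task graph
col_means = x.mean(axis=0)

# compute() hands the graph to a scheduler and returns a NumPy array
result = col_means.compute()
print(result[:3])
```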
  
<img src="collections-schedulers.png" align="center" width="60%">

We propose to use only the scheduler for now. In the future, Dask's MapReduce-style functionality could also be incorporated into MNE-Python.
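Using only the scheduler means wrapping ordinary Python functions in `dask.delayed`. This records each call in a task graph instead of running it. A scheduler is then chosen when calling `dask.compute`. In this minimal sketch, `load` and `summarize` are hypothetical placeholders for per-recording work. `"threads"` selects Dask's local threaded scheduler. `"processes"` or `"synchronous"` (handy for debugging) can be swapped in without changing the graph.

```python
import dask
from dask import delayed

def load(i):
    # Hypothetical placeholder for reading one recording
    return list(range(i * 10, i * 10 + 10))

def summarize(values):
    # Hypothetical placeholder for per-recording analysis
    return sum(values) / len(values)

# Nothing runs here: each call only adds a node to the task graph
lazy_results = [delayed(summarize)(delayed(load)(i)) for i in range(4)]

# The threaded scheduler executes the independent chains in parallel
results = dask.compute(*lazy_results, scheduler="threads")
print(results)
```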


## Dependencies

Install the following core libraries:

    conda install numpy pandas h5py Pillow matplotlib scipy toolz pytables fastparquet

Install Dask and the `distributed` scheduler:

    conda install dask distributed

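The following is useful for task graph visualization. Dask's `visualize` needs both the Graphviz binaries and the `graphviz` Python bindings:

    conda install graphviz python-graphviz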
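The sketch below shows how to inspect and draw a task graph. `visualize` only works when Graphviz is installed, so the call is wrapped in a guard. Either way, the underlying graph can be inspected as a plain mapping from task keys to tasks. The functions `inc` and `add` and the output filename `task_graph.png` are illustrative choices.

```python
from dask import delayed

def inc(x):
    return x + 1

def add(x, y):
    return x + y

a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# The task graph behaves like a dict: one key per task
graph = dict(total.__dask_graph__())
print(len(graph), "tasks")

try:
    # Writes an image of the graph; needs Graphviz and its Python bindings
    total.visualize(filename="task_graph.png")
except (ImportError, RuntimeError, OSError) as err:
    print("Graphviz not available:", err)

print(total.compute())
```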

`glob`, used below for reading directory contents, is part of the Python standard library, so it does not need to be installed.

In [6]:
%matplotlib inline
import mne
import glob
import timeit
import numpy as np
from dask import delayed
from mne.time_frequency import psd_multitaper  # older MNE API; newer releases provide Raw.compute_psd(method="multitaper")

In [7]:
# Reading directory structure: one resting-state file per subject directory
fif_files_path = ('/autofs/cluster/fusion/Sheraz/data/camcan/camcan47'
                  '/cc700/meg/pipeline/release004/data_nomovecomp'
                  '/aamod_meg_maxfilt_00001/*/rest/transdef_mf2pt2_rest_raw.fif')

files = glob.glob(fif_files_path)
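The sketch below illustrates how the `*` wildcard expands to one match per subject directory. It recreates the `<subject>/rest/transdef_mf2pt2_rest_raw.fif` layout in a temporary directory, using hypothetical subject IDs. It then recovers each subject ID from the matched paths. Sorting the matches is an addition not present in the cell above: `glob.glob` returns paths in a filesystem-dependent order. The result is stored in `matches` so that it does not overwrite the notebook's `files`.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical subject IDs standing in for the real CamCAN folders
    for subject in ["CC000003", "CC000001", "CC000002"]:
        rest_dir = os.path.join(root, subject, "rest")
        os.makedirs(rest_dir)
        # Empty placeholder file with the expected name
        open(os.path.join(rest_dir, "transdef_mf2pt2_rest_raw.fif"), "w").close()

    pattern = os.path.join(root, "*", "rest", "transdef_mf2pt2_rest_raw.fif")
    # glob order is filesystem-dependent, so sort for reproducibility
    matches = sorted(glob.glob(pattern))

    # The subject ID is the directory two levels above each file
    subjects = [os.path.basename(os.path.dirname(os.path.dirname(f)))
                for f in matches]

print(subjects)
```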

In [8]:
print(files)

['/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC710566/rest/transdef_mf2pt2_rest_raw.fif', '/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC620413/rest/transdef_mf2pt2_rest_raw.fif', '/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC620005/rest/transdef_mf2pt2_rest_raw.fif', '/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC210526/rest/transdef_mf2pt2_rest_raw.fif', '/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC320680/rest/transdef_mf2pt2_rest_raw.fif', '/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC410015/rest/transdef_mf2pt2_rest_raw.fif

In [62]:
print(len(files))

650


In [None]:
%%time
# Sequential code: process one recording at a time
psds = []
start_time = timeit.default_timer()
for file in files[0:2]:
    raw = mne.io.read_raw_fif(file, preload=True)
    raw.crop(50, 150)  # keep the segment from 50 s to 150 s
    # Magnetometer channels only
    picks = mne.pick_types(raw.info, meg='mag', eeg=False,
                           eog=False, stim=False)
    psd, freqs = psd_multitaper(raw, fmin=2, fmax=55,
                                n_jobs=1, picks=picks, normalization="full")
    psds.append(10 * np.log10(psd))  # power -> dB
# Average across recordings, keeping the (channels, freqs) shape
mean_psd = np.mean(psds, axis=0)
time_elapsed = timeit.default_timer() - start_time
print(time_elapsed)
    

Opening raw data file /autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC710566/rest/transdef_mf2pt2_rest_raw.fif...
    Range : 47000 ... 613999 =     47.000 ...   613.999 secs
Ready.
Current compensation grade : 0
Reading 0 ... 566999  =      0.000 ...   566.999 secs...
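The loop above processes one file at a time. Since the files are independent, the same structure can be expressed with `dask.delayed`, letting the scheduler run the per-file work in parallel. This is a minimal sketch with one significant assumption. To keep it self-contained and runnable without the CamCAN data, the hypothetical function `file_psd` stands in for the `read_raw_fif` / `crop` / `psd_multitaper` steps. Instead of those steps, it computes a plain periodogram of random data with NumPy. In the notebook, its body would be the loop body above and `inputs` would be the list of `.fif` paths.

```python
import numpy as np
from dask import delayed

def file_psd(seed, n_channels=4, n_times=2000, sfreq=200.0, fmin=2.0, fmax=55.0):
    """Hypothetical stand-in for reading one file and computing its PSD in dB."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_channels, n_times))
    # Plain periodogram: a cheap placeholder, NOT a multitaper estimate
    power = np.abs(np.fft.rfft(data, axis=-1)) ** 2 / (sfreq * n_times)
    freqs = np.fft.rfftfreq(n_times, d=1.0 / sfreq)
    mask = (freqs >= fmin) & (freqs <= fmax)
    return 10 * np.log10(power[:, mask])

inputs = range(8)  # in the notebook: files[0:2], files, ...

# Build the graph: one task per input, plus a final averaging task
lazy_psds = [delayed(file_psd)(seed) for seed in inputs]
lazy_mean = delayed(np.mean)(lazy_psds, axis=0)

# Run the independent tasks in parallel
mean_psd = lazy_mean.compute(scheduler="threads")

# Sequential reference computed the same way as the original loop
sequential = np.mean([file_psd(seed) for seed in inputs], axis=0)
print(mean_psd.shape, np.allclose(mean_psd, sequential))
```

Whether `"threads"` or `"processes"` is faster depends on how much of the per-file work releases Python's GIL. Timing both schedulers on a few files is the simplest way to decide.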




In [57]:
files[0]

'/autofs/cluster/fusion/Sheraz/data/camcan/camcan47/cc700/meg/pipeline/release004/data_nomovecomp/aamod_meg_maxfilt_00001/CC710566/rest/transdef_mf2pt2_rest_raw.fif'


In [15]:
psds[0].shape

(104, 30052)
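The second dimension of `psds[0]` is the number of frequency bins between `fmin=2` and `fmax=55`. The count can be checked directly under one assumption: that the multitaper estimate places its frequencies on the FFT grid `numpy.fft.rfftfreq(n_times, 1 / sfreq)`, whose spacing is `sfreq / n_times` (one bin per `1 / duration` Hz). For the full recording read above (567,000 samples at 1000 Hz), this gives 30052 bins, matching the displayed shape. The roughly 100 s kept by `crop(50, 150)` would give only about 5,300 bins. This suggests the displayed shape comes from a run without the `crop` call.

```python
import numpy as np

def n_freq_bins(n_times, sfreq, fmin, fmax):
    """Count rfft frequency bins in [fmin, fmax] for n_times samples."""
    freqs = np.fft.rfftfreq(n_times, d=1.0 / sfreq)
    return int(np.count_nonzero((freqs >= fmin) & (freqs <= fmax)))

sfreq = 1000.0  # Hz, from the "47000 ... = 47.000 secs" log line
full = n_freq_bins(567_000, sfreq, 2, 55)     # full 566.999 s recording
cropped = n_freq_bins(100_001, sfreq, 2, 55)  # ~100 s; assumes crop keeps both endpoints
print(full, cropped)
```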
