# Preprocessing pipeline: Spike preprocessing

Loads the extracted spike trains (in toposample format; output of step 1), computes filtered spike signals (required, e.g., for reliability) and firing rates, and stores the results in an .h5 data store.

__Preprocessing pipeline overview:__
1. Extract & cut spike trains [`run_spike_extraction.ipynb`]
2. Compute filtered spike signals [this notebook]
3. Compute firing rates [this notebook]


In [3]:
import sys
sys.path.append('../../../library')

In [4]:
# Settings
syn_class = 'EXC'  # 'EXC', 'INH', 'ALL'
num_sims = 10  # Number of simulations (extracted spike files) in working_dir
working_dir = '/path/to/working_dir'
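
Since the steps in this notebook can run for a long time, it may help to confirm beforehand that all expected spike files are present. A minimal sketch, assuming the `raw_spikes_<syn_class>_<idx>.npy` naming used in the code cells of this notebook; `find_missing_spike_files` is a hypothetical helper and not part of the `preprocess` module:

```python
from pathlib import Path


def find_missing_spike_files(working_dir, syn_class, num_sims):
    """Return the expected raw spike file names that are absent from working_dir."""
    # Same naming pattern as used for spike_file_names in this notebook
    expected = [f'raw_spikes_{syn_class.lower()}_{idx}.npy' for idx in range(num_sims)]
    return [name for name in expected if not (Path(working_dir) / name).is_file()]
```

Calling `find_missing_spike_files(working_dir, syn_class, num_sims)` with the settings above returns an empty list if every input file exists.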

### 2. Compute filtered spike signals

Runs the preprocessing (filtering, without mean-centering) of the (excitatory) spike trains [PARALLEL IMPLEMENTATION]
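
To illustrate what a filtered spike signal can look like: a common approach is to bin the spike times and convolve the spike counts with a Gaussian kernel of width `sigma`. The sketch below assumes exactly that (Gaussian kernel, 1 ms bins, spike times in ms). Whether `run_preprocessing` uses the same kernel, bin size, or units is not confirmed here, and `filter_spike_train` is a hypothetical helper, not a function of the `preprocess` module.

```python
import numpy as np


def filter_spike_train(spike_times, t_start, t_end, sigma=10.0, bin_size=1.0, mean_center=False):
    """Bin spike times (ms) and smooth the counts with a unit-sum Gaussian kernel."""
    # Bin edges covering [t_start, t_end] in steps of bin_size
    t_bins = np.arange(t_start, t_end + bin_size, bin_size)
    counts, _ = np.histogram(spike_times, bins=t_bins)

    # Gaussian kernel truncated at +/- 3 sigma and normalized to sum 1
    half_width = int(np.ceil(3 * sigma / bin_size))
    t_kernel = np.arange(-half_width, half_width + 1) * bin_size
    kernel = np.exp(-t_kernel**2 / (2 * sigma**2))
    kernel /= kernel.sum()

    # mode='same' keeps the signal aligned with the bins
    # (assumes there are more bins than kernel samples)
    signal = np.convolve(counts, kernel, mode='same')
    if mean_center:
        signal = signal - signal.mean()
    return t_bins[:-1], signal
```

For example, `t_bins, signal = filter_spike_train(spike_times, 0.0, 1000.0, sigma=10.0)` turns one spike train into a smooth signal with one value per bin. With `mean_center=False` (as in this step), the signal keeps its baseline instead of being shifted to zero mean.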

In [5]:
from preprocess import run_preprocessing, merge_into_h5_data_store

In [7]:
# Run preprocessing
spike_file_names = [f'raw_spikes_{syn_class.lower()}_{idx}.npy' for idx in range(num_sims)]

run_preprocessing(working_dir, spike_file_names, sigma=10.0, syn_class=syn_class, pool_size=10)

Finished preprocessing in 194.994s


In [8]:
# Merge individual preprocessed spike files into .h5 data store
#   split_by_gid==False ... Datasets per sim
#   split_by_gid==True  ... Datasets per sim & GID
tmp_file_names = [f'spike_signals_{syn_class.lower()}_{idx}__tmp__.npz' for idx in range(num_sims)]

h5_file = merge_into_h5_data_store(working_dir, tmp_file_names, data_store_name='processed_data_store', split_by_gid=False, syn_class=syn_class)

100%|██████████| 30/30 [22:24<00:00, 44.82s/it]


INFO: 30 files merged into "/gpfs/bbp.cscs.ch/data/scratch/proj9/bisimplices/bbp_workflow/ce776698-d3c9-468f-8714-92407570b292/working_dir_all/processed_data_store.h5"
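
The two `split_by_gid` layouts can be pictured with a small in-memory HDF5 file. This is a sketch only: the group and dataset names follow the `spike_signals_exc/sim_<SIM_IDX>[/gid_<GID>]` pattern from the loading example in this notebook, while the array shapes, the toy data, and the `build_toy_store` helper are assumptions made for illustration.

```python
import h5py
import numpy as np


def build_toy_store(split_by_gid, num_sims=2, gids=(101, 102, 103), num_bins=5):
    """Create an in-memory HDF5 file mimicking the assumed spike signal layout."""
    # driver='core' with backing_store=False keeps the file in memory only
    store = h5py.File(f'toy_store_{int(split_by_gid)}.h5', 'w',
                      driver='core', backing_store=False)
    store.create_dataset('gids', data=np.array(gids))
    grp = store.create_group('spike_signals_exc')
    for sim_idx in range(num_sims):
        signals = np.zeros((len(gids), num_bins))  # toy data: one row per GID
        if split_by_gid:
            # One group per sim, one dataset per GID
            sim_grp = grp.create_group(f'sim_{sim_idx}')
            for row, gid in enumerate(gids):
                sim_grp.create_dataset(f'gid_{gid}', data=signals[row])
        else:
            # One dataset per sim
            grp.create_dataset(f'sim_{sim_idx}', data=signals)
    return store


for flag in (False, True):
    store = build_toy_store(flag)
    names = []
    store.visit(names.append)  # collects the paths of all groups and datasets
    print(f'split_by_gid={flag}: {names}')
    store.close()
```

Listing the paths this way also works on a real data store opened in read mode, which can be a quick check of which layout a given file uses.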


### `OR` 2b. Create empty data store without filtered spike signals

In [6]:
from preprocess import create_empty_data_store

In [7]:
create_empty_data_store(working_dir, data_store_name='processed_data_store')

### 3. Compute firing rates

Runs the firing rate extraction, based on the mean inverse inter-spike interval of the (excitatory) spike trains [PARALLEL IMPLEMENTATION]
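
As an illustration of this rate measure: the mean inverse inter-spike interval (ISI) averages `1/ISI` over all consecutive spike pairs of a train. A minimal sketch, assuming spike times in ms (hence the factor 1000 to obtain Hz) and a rate of 0 for trains with fewer than two spikes. Both conventions, and the helper name `mean_inverse_isi_rate`, are assumptions and may differ from what `run_rate_extraction` does.

```python
import numpy as np


def mean_inverse_isi_rate(spike_times_ms):
    """Firing rate (Hz) as the mean of 1/ISI, with spike times given in ms."""
    spike_times_ms = np.sort(np.asarray(spike_times_ms, dtype=float))
    if spike_times_ms.size < 2:
        return 0.0  # assumption: no ISI available, so the rate is set to 0
    isi = np.diff(spike_times_ms)
    isi = isi[isi > 0]  # guard against duplicate spike times
    if isi.size == 0:
        return 0.0
    return float(1000.0 * np.mean(1.0 / isi))
```

For a perfectly regular train such as `[0, 100, 200, 300]` this gives 10 Hz (up to floating-point rounding), the same as spike count over duration. For irregular trains the measure weights short intervals more strongly, so it can exceed the count-based rate.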

In [8]:
from preprocess import run_rate_extraction, merge_rates_to_h5_data_store

In [None]:
# Run rate extraction
spike_file_names = [f'raw_spikes_{syn_class.lower()}_{idx}.npy' for idx in range(num_sims)]

run_rate_extraction(working_dir, spike_file_names, syn_class=syn_class, pool_size=30)

In [10]:
# Merge individual rate files into .h5 data store
tmp_file_names = [f'firing_rates_{syn_class.lower()}_{idx}__tmp__.npz' for idx in range(num_sims)]

h5_file = merge_rates_to_h5_data_store(working_dir, tmp_file_names, data_store_name='processed_data_store', do_overwrite=False)

100%|██████████| 30/30 [00:00<00:00, 564.38it/s]

INFO: 30 files merged and added to "/gpfs/bbp.cscs.ch/data/scratch/proj9/bisimplices/bbp_workflow/ce776698-d3c9-468f-8714-92407570b292/working_dir_all/processed_data_store.h5"





__HOW TO LOAD PROCESSED SPIKE SIGNALS, META-INFO, AND RATES FROM .H5 DATA STORE:__
~~~
import h5py
import numpy as np

sim_idx = 0           # Index of the simulation to load
split_by_gid = False  # Must match the value used in merge_into_h5_data_store

with h5py.File(h5_file, 'r') as h5_store:  # File is closed automatically on exit
    print(f'Groups/Datasets: {list(h5_store.keys())}')

    # Meta-info and firing rates
    t_bins = np.array(h5_store['t_bins'])
    gids = np.array(h5_store['gids'])
    firing_rates = np.array(h5_store['firing_rates'])
    sigma = np.array(h5_store['sigma']).tolist()
    if 'mean_centered' in h5_store:  # [Backward compatibility]
        mean_centered = np.array(h5_store['mean_centered']).tolist()
    else:
        mean_centered = False

    # Filtered spike signals
    print(f'Spike signals per sim: {list(h5_store["spike_signals_exc"].keys())}')
    sim_group = f'spike_signals_exc/sim_{sim_idx}'
    if split_by_gid:
        gid = gids[0]  # Example GID; any entry of gids should work
        print(f'Spike signals within sim {sim_idx}: {list(h5_store[sim_group].keys())}')
        spike_signal = np.array(h5_store[f'{sim_group}/gid_{gid}'])
    else:
        spike_signals = np.array(h5_store[sim_group])
~~~