# SpikeInterface Advanced Tutorial - April 2022

## PART1: SpikeInterface Hidden Features


### Topics

1. ProbeInterface: probe handling and more
2. Working with segments and recordings
3. Lazy processing explained
4. WaveformExtractor and WaveformExtractorExtension
5. Parallelization - working with job tools
6. Save formats: binary / memory / zarr

## 1) ProbeInterface: probe handling and more


[`ProbeInterface`](https://github.com/SpikeInterface/probeinterface) is a SpikeInterface twin project to abstract and describe probe objects (you can read more about it [here](https://doi.org/10.3389/fninf.2022.823056)).

It provides several ways to create, read, and download probe information, and it integrates directly with SpikeInterface.

Here's how:

In [None]:
import spikeinterface.full as si
import probeinterface as pi

import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

%matplotlib widget

### 1.1 Create a probe from scratch

In [None]:
num_channels = 32

positions = np.zeros((num_channels, 2))
radius = 100
for i in range(num_channels):
    theta = i / num_channels * 2*np.pi 
    x = np.cos(theta) * radius
    y = np.sin(theta) * radius
    positions[i] = [x, y]
    
probe = pi.Probe(ndim=2)

In [None]:
probe.set_contacts(positions, shapes="rect", shape_params={"width": 4, "height": 10})
probe.set_contact_ids([f"Ch{i}" for i in range(num_channels)])
probe.create_auto_shape()

In [None]:
pi.plotting.plot_probe(probe, with_contact_id=True)

In [None]:
probe.to_dataframe(complete=True)

### Load probe to recording

In [None]:
rec, _ = si.toy_example(num_channels=num_channels)

In [None]:
rec.set_probe(probe)

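Before the recording can use the probe, we have to add the wiring, that is, the mapping between probe contacts and device channels: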

In [None]:
probe.set_device_channel_indices(np.random.permutation(num_channels))

In [None]:
probe.to_dataframe(complete=True)

In [None]:
rec.set_probe(probe, in_place=True)

In [None]:
probe_loaded = rec.get_probe()
probe_loaded.to_dataframe(complete=True)

Note that the contacts of the loaded probe are internally sorted by `device_channel_indices`!

In [None]:
rec.get_channel_ids()
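
To make the reordering concrete, here is a minimal plain-Python sketch. It is not the SpikeInterface implementation; it only assumes that attaching a probe orders the contacts by the device channel they are wired to. The contact ids and wiring values are made up for illustration.

```python
# Hypothetical wiring for a 4-contact probe: contact i is connected to
# device (recording) channel device_channel_indices[i].
contact_ids = ["Ch0", "Ch1", "Ch2", "Ch3"]
device_channel_indices = [2, 0, 3, 1]

# Sort the contacts by the device channel they are wired to.
order = sorted(range(len(contact_ids)), key=lambda i: device_channel_indices[i])
sorted_contacts = [contact_ids[i] for i in order]
sorted_indices = [device_channel_indices[i] for i in order]

print(sorted_indices)   # [0, 1, 2, 3]
print(sorted_contacts)  # ['Ch1', 'Ch3', 'Ch0', 'Ch2']
```

After sorting, the k-th contact corresponds to the k-th device channel, which is why the dataframe of the loaded probe can look different from the one we created.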

In [None]:
si.plot_probe_map(rec, with_channel_ids=True)

Now when we slice the recording, the probe is also sliced accordingly:

In [None]:
rec_slice = rec.channel_slice(channel_ids=rec.channel_ids[::4])
print(rec_slice.channel_ids)

si.plot_probe_map(rec_slice, with_channel_ids=True)

### 1.2 Download probe from probe library

We can also download probes from the [probeinterface library](https://gin.g-node.org/spikeinterface/probeinterface_library).


First let's get the binary data from [Zenodo](https://zenodo.org/record/4657314) and load the recording:

In [None]:
# file path
recording_file = 'cambridge_data.bin'

# parameters to load the bin/dat format
num_channels = 64
sampling_frequency = 20000
gain_to_uV = 0.195
offset_to_uV = 0
dtype = "int16"
time_axis = 1

In [None]:
recording_cambridge = si.read_binary(recording_file, num_chan=num_channels, sampling_frequency=sampling_frequency,
                                     dtype=dtype, gain_to_uV=gain_to_uV, offset_to_uV=offset_to_uV, 
                                     time_axis=time_axis)


In [None]:
manufacturer = 'cambridgeneurotech'
probe_name = 'ASSY-156-P-1'

probe_cambridge = pi.get_probe(manufacturer, probe_name)
print(probe_cambridge)

For the wiring, we can use automatic pathways that describe several widely used connectors + headstages:

In [None]:
probe_cambridge.to_dataframe(complete=True)

In [None]:
pi.get_available_pathways()

In [None]:
probe_cambridge.wiring_to_device('ASSY-156>RHD2164')
probe_cambridge.to_dataframe(complete=True)

In [None]:
pi.plotting.plot_probe(probe_cambridge)

In [None]:
recording_cambridge.set_probe(probe_cambridge, group_mode="by_shank", in_place=True)

In [None]:
si.plot_probe_map(recording_cambridge, with_channel_ids=True)

We can now easily split the recording by groups:

In [None]:
rec_split = recording_cambridge.split_by("group")
print(rec_split)
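
Conceptually, splitting by a property collects the channels that share the same property value. The plain-Python sketch below illustrates that grouping, assuming a dict keyed by the property value is returned; the channel ids and group values are hypothetical.

```python
from collections import defaultdict

# Hypothetical channel ids and their "group" property (e.g. one group per shank).
channel_ids = ["c0", "c1", "c2", "c3", "c4", "c5"]
groups = [0, 0, 1, 1, 2, 2]

# Collect the channel ids that share the same property value.
channels_by_group = defaultdict(list)
for channel_id, group in zip(channel_ids, groups):
    channels_by_group[group].append(channel_id)

channels_by_group = dict(channels_by_group)
print(channels_by_group)  # {0: ['c0', 'c1'], 1: ['c2', 'c3'], 2: ['c4', 'c5']}
```

Each entry then plays the role of one sub-recording containing only the channels of that group.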

In [None]:
si.plot_probe_map(rec_split[0], with_channel_ids=True)

### 1.3 Automatically load the probe object


Several SpikeInterface `read_*` functions automatically use `probeinterface` to load the corresponding probe.

These include:

- SpikeGLX
- Open Ephys (only using Neuropix-PXI plugin)
- MEArec
- Maxwell
- 3Brain
- NWB

Let's see some examples from the [GIN ephy_testing_data](https://gin.g-node.org/NeuralEnsemble/ephy_testing_data) library:

In [None]:
local_ephy_data = Path("/home/alessio/Documents/data/gin/ephy_testing_data")

spikeglx1_folder = local_ephy_data / "spikeglx" / "Noise4Sam_g0"
spikeglx2_folder = Path("/home/alessio/Documents/data/spikeglx/np2/M136_2021_12_06")

oe_npix1_folder = Path("/home/alessio/Documents/data/allen/npix-open-ephys/605068_2022-02-28_17-00-14/")
oe_npix2_folder = Path("/home/alessio/Documents/data/allen/npix-open-ephys/595262_2022-02-21_15-18-07")


mearec_file = local_ephy_data / "mearec" / "mearec_test_10s.h5"

maxwell_file = local_ephy_data / "maxwell" / "MaxOne_data/Network/000010/data.raw.h5"

threebrain_file = local_ephy_data / "biocam" / "biocam_hw3.0_fw1.6.brw"

In [None]:
rec_spikeglx1 = si.read_spikeglx(spikeglx1_folder, stream_id="imec0.ap")
w = si.plot_probe_map(rec_spikeglx1)
w.ax.set_xlim(-100, 100)
w.ax.set_ylim(300, 600)
w.ax.set_title("SpikeGLX - NP1.0")



rec_spikeglx2 = si.read_spikeglx(spikeglx2_folder, stream_id="imec0.ap")
w = si.plot_probe_map(rec_spikeglx2)
w.ax.set_xlim(-100, 100)
w.ax.set_ylim(3000, 3300)
w.ax.set_title("SpikeGLX - NP2.0")


rec_oe1 = si.read_openephys(oe_npix1_folder, stream_id="0")
w = si.plot_probe_map(rec_oe1)
w.ax.set_xlim(-100, 100)
w.ax.set_ylim(300, 600)
w.ax.set_title("Open Ephys - NP1.0")


rec_oe2 = si.read_openephys(oe_npix2_folder, stream_id="0")
w = si.plot_probe_map(rec_oe2)
w.ax.set_xlim(-100, 100)
w.ax.set_ylim(300, 600)
w.ax.set_title("Open Ephys - NP2.0")


rec_mearec, _ = si.read_mearec(mearec_file)
w = si.plot_probe_map(rec_mearec)
w.ax.set_title("MEArec")


rec_maxwell = si.read_maxwell(maxwell_file)
w = si.plot_probe_map(rec_maxwell)
w.ax.set_title("Maxwell - MaxOne")

rec_biocam = si.read_biocam(threebrain_file)
w = si.plot_probe_map(rec_biocam)
w.ax.set_title("3Brain - Biocam")

## 2) Working with "segments" and "recordings"

A **RECORDING** represents an acquisition from a single session, which means that the underlying traces/spiking activity can be assumed to be stationary and that the channel ids, sampling frequency, and dtype are the same throughout.

A recording can be made of multiple segments. A **SEGMENT** represents a piece of continuous traces with a certain number of samples. For example, you get multiple segments when you hit play/pause several times for different conditions/experiments/trials.

In this section we'll quickly cover some utils to manipulate segments and recordings:


In [None]:
oe_folder1 = "/home/alessio/Documents/data/allen/npix-open-ephys/605068_2022-03-02_14-53-18"
oe_folder2 = "/home/alessio/Documents/data/allen/npix-open-ephys/605068_2022-02-28_17-00-14/"
oe_folder3 = "/home/alessio/Documents/data/allen/npix-open-ephys/605641_2022-03-10_15-53-45/"

In [None]:
rec_oe1 = si.read_openephys(folder_path=oe_folder1, stream_id="0")
rec_oe2 = si.read_openephys(folder_path=oe_folder2, stream_id="0")
rec_oe3 = si.read_openephys(folder_path=oe_folder3, stream_id="0")

print(rec_oe1)
print(rec_oe2)
print(rec_oe3)

Let's assume these recordings are from the same animal/same session, so we want to concatenate them:

In [None]:
# append_recordings() appends the segments (they stay separate segments)
rec_append = si.append_recordings([rec_oe1, rec_oe2, rec_oe3])
print(rec_append)

# concatenate_recordings() concatenates the segments into a single segment
rec_concat = si.concatenate_recordings([rec_oe1, rec_oe2, rec_oe3])
print(rec_concat)
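
The difference between the two operations can be sketched with plain Python lists. This is only a toy model (each recording is a list of segments and each segment is a list of samples), not how SpikeInterface stores data; the function names are hypothetical.

```python
# Each toy recording is a list of segments; each segment is a list of samples.
rec_a = [[1, 2, 3]]
rec_b = [[4, 5]]
rec_c = [[6, 7, 8, 9]]


def append_segments(recordings):
    """Keep every segment separate: one output segment per input segment."""
    return [segment for rec in recordings for segment in rec]


def concatenate_segments(recordings):
    """Join all segments end to end into a single segment."""
    return [[sample for rec in recordings for segment in rec for sample in segment]]


appended = append_segments([rec_a, rec_b, rec_c])
concatenated = concatenate_segments([rec_a, rec_b, rec_c])

print(len(appended), [len(s) for s in appended])          # 3 [3, 2, 4]
print(len(concatenated), [len(s) for s in concatenated])  # 1 [9]
```

Appending preserves the segment boundaries, while concatenating discards them and yields one long segment with the same total number of samples.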

**IMPORTANT**: many functions require a `segment_index` to be specified if a multi-segment object is passed!

In [None]:
print(rec_append.get_num_samples())

In [None]:
for segment_index in range(rec_append.get_num_segments()):
    print(f"Num samples segment {segment_index}: {rec_append.get_num_samples(segment_index=segment_index)}")
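
Why is the index required? With more than one segment, a call without `segment_index` is ambiguous. The toy class below (a hypothetical stand-in, not a SpikeInterface class) sketches this rule under the assumption that the index may only be omitted for single-segment objects.

```python
class ToyMultiSegment:
    """Toy stand-in for a multi-segment object (not a SpikeInterface class)."""

    def __init__(self, samples_per_segment):
        self.samples_per_segment = list(samples_per_segment)

    def get_num_segments(self):
        return len(self.samples_per_segment)

    def get_num_samples(self, segment_index=None):
        # With a single segment the index can be inferred; otherwise it is ambiguous.
        if segment_index is None:
            if self.get_num_segments() != 1:
                raise ValueError("Multi-segment object: segment_index is required")
            segment_index = 0
        return self.samples_per_segment[segment_index]


toy = ToyMultiSegment([30000, 45000, 60000])
try:
    toy.get_num_samples()
except ValueError as err:
    print(err)

# Looping over the segments is the safe pattern.
total = sum(toy.get_num_samples(segment_index=i) for i in range(toy.get_num_segments()))
print(total)  # 135000
```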

`SortingExtractor` objects can also have multiple segments. For example, let's create one with 5 segments:

In [None]:
_, sort_multi = si.toy_example(num_segments=5)
print(sort_multi)

In [None]:
sort_multi.get_unit_spike_train(unit_id=sort_multi.unit_ids[0])

In [None]:
for segment_index in range(sort_multi.get_num_segments()):
    print(f"Num spikes segment {segment_index} unit {sort_multi.unit_ids[0]}: "
          f"{len(sort_multi.get_unit_spike_train(unit_id=sort_multi.unit_ids[0], segment_index=segment_index))}")

To select specific segments from a recording, we can use `select_segment_recording()`:

In [None]:
rec_segments = si.select_segment_recording(rec_append, segment_indices=[1, 2])
print(rec_segments)

Moving forward, we want all objects to support multi-segment data, so keep this in mind when developing and always add tests for multi-segment objects!

## 3) Lazy processing explained

This is just to recap that all processing objects are **lazy** in SI, meaning that NO OPERATION is performed until the `get_traces()` method is called.

In [None]:
rec = rec_oe1
print(rec)

In [None]:
rec_f = si.bandpass_filter(rec)
print(rec_f)

rec_cmr = si.common_reference(rec_f, reference="local", operator="median")
print(rec_cmr)

When we plot, all these operations are performed only on the requested 10 seconds of traces (384 channels):

In [None]:
fig, axs = plt.subplots(nrows=3, figsize=(10, 7), sharex=True, sharey=True)

si.plot_timeseries(rec, segment_index=0, time_range=[100, 110], ax=axs[0])
si.plot_timeseries(rec_f, segment_index=0, time_range=[100, 110], ax=axs[1])
si.plot_timeseries(rec_cmr, segment_index=0, time_range=[100, 110], ax=axs[2])
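
The lazy pattern itself can be sketched in a few lines of plain Python. The classes below are hypothetical toy objects, not SpikeInterface code: each processing step only stores its parent and its parameters, and the work happens when `get_traces()` is called, on the requested samples only.

```python
class LazySource:
    """Toy recording: the traces are a list of numbers held in memory."""

    def __init__(self, data):
        self.data = data

    def get_traces(self, start, end):
        return self.data[start:end]


class LazyOffset:
    """Toy processing step: stores parent and parameter, computes on demand."""

    def __init__(self, parent, offset):
        self.parent = parent
        self.offset = offset
        self.num_calls = 0  # counts how many times work was actually done

    def get_traces(self, start, end):
        self.num_calls += 1
        return [x - self.offset for x in self.parent.get_traces(start, end)]


source = LazySource(list(range(1000)))
step1 = LazyOffset(source, offset=10)
step2 = LazyOffset(step1, offset=5)

print(step1.num_calls, step2.num_calls)  # 0 0: building the chain does no work
print(step2.get_traces(100, 103))        # [85, 86, 87]
print(step1.num_calls, step2.num_calls)  # 1 1: only the requested samples were processed
```

Because each step only keeps a reference to its parent and its parameters, the whole chain can also be described compactly, which is the idea behind serializing a processed recording to a dictionary.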

In [None]:
rec_cmr.to_dict()

More about these in the next tutorial!

## 4) WaveformExtractor and WaveformExtractorExtension

The `WaveformExtractor` object is the base object for all post-processing. It is used, among other things, to compute PCA projections, quality metrics, spike amplitudes, and more. These are all `WaveformExtractorExtension` objects.

Let's see how they work:

In [None]:
# load some simulated MEArec data
rec, sort = si.read_mearec("/home/alessio/Documents/data/mearec/recordings/recording_Neuropixels-128_900_int16.h5")

In [None]:
wt = si.plot_timeseries(rec, time_range=[100, 110], channel_ids=rec.channel_ids[:20], mode="line")
wr = si.plot_rasters(sort)
wr.ax.set_xlim(100, 110)

Waveforms can be extracted with the `si.extract_waveforms()` function, and they are persisted to disk. We have to specify the `folder` where they are stored:

In [None]:
si.extract_waveforms?

In [None]:
we = si.extract_waveforms(rec, sort, folder="mearec_wf", ms_before=1., ms_after=3.,
                          n_jobs=20, progress_bar=True, chunk_duration="1s")
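
To see what `ms_before` and `ms_after` mean in samples, here is a minimal single-channel sketch in plain Python. The helper `cut_waveform` and the sampling frequency are hypothetical; the real extraction also handles multiple channels, spikes near the borders, and parallel chunks.

```python
def cut_waveform(trace, spike_sample, sampling_frequency, ms_before, ms_after):
    """Return the samples around one spike, or None if the window leaves the trace."""
    nbefore = int(ms_before * sampling_frequency / 1000.0)
    nafter = int(ms_after * sampling_frequency / 1000.0)
    start, end = spike_sample - nbefore, spike_sample + nafter
    if start < 0 or end > len(trace):
        return None
    return trace[start:end]


sampling_frequency = 32000.0  # Hz, hypothetical value
trace = list(range(1000))     # toy trace where each value equals its sample index

waveform = cut_waveform(trace, spike_sample=500, sampling_frequency=sampling_frequency,
                        ms_before=1.0, ms_after=3.0)
print(len(waveform))              # 128 samples = 32 before + 96 after
print(waveform[0], waveform[-1])  # 468 595
```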

This is much faster than before: a few seconds for a 15-minute recording with 128 channels!

In [None]:
!ls mearec_wf/

In [None]:
!ls mearec_wf/waveforms/

Now let's compute spike amplitudes:

In [None]:
amplitudes = si.compute_spike_amplitudes(we, outputs="by_unit", n_jobs=20, progress_bar=True, chunk_duration="1s")

In [None]:
# the output is a list with one entry per segment (in this case one segment)
len(amplitudes)

In [None]:
plt.figure()
_ = plt.hist(amplitudes[0][sort.unit_ids[0]], bins=30)

In [None]:
!ls mearec_wf/

Now there is a new `spike_amplitudes` folder with the computed amplitudes.

In [None]:
qm = si.compute_quality_metrics(we)
qm

In [None]:
we.get_available_extension_names()

In [None]:
amp_object = we.load_extension("spike_amplitudes")
qm_object = we.load_extension("quality_metrics")

print(amp_object)
print(qm_object)

Some nice features of the extension classes:

- If we reload the `WaveformExtractor` from folder, extensions are loaded too
- If we `select_units()` from the `WaveformExtractor` object (e.g. after auto-curation), the extensions also copy only the data relative to the selected units
- Extensions add new widgets to the `spikeinterface-gui`

In [None]:
# reload waveforms and extension
we_loaded = si.WaveformExtractor.load_from_folder("mearec_wf/")
we_loaded.get_available_extension_names()

In [None]:
# select units from waveform extractor
we_selected = we_loaded.select_units(sort.unit_ids[::10], new_folder="mearec_wf_selected")
qm_selected = we_selected.load_extension("quality_metrics")
qm_selected.get_metrics()

In [1]:
# run si-gui
!sigui mearec_wf/




## 5) Parallelization - working with job tools

In [None]:
# job kwargs - ChunkRecordingExecutor

# find ptp of traces
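
As a rough illustration of the chunking idea only, here is a plain-Python sketch that computes the peak-to-peak (ptp) amplitude of each channel chunk by chunk. It does not use SpikeInterface's job tools: the helper names are hypothetical, `n_jobs` and `chunk_size` simply mimic the spirit of the job kwargs, and a thread pool stands in for the real executor.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(num_samples, chunk_size):
    """Yield (start, end) sample ranges that cover the whole recording."""
    for start in range(0, num_samples, chunk_size):
        yield start, min(start + chunk_size, num_samples)


def chunk_min_max(traces, start, end):
    """Per-channel (min, max) for one chunk; traces is a list of channels."""
    return [(min(ch[start:end]), max(ch[start:end])) for ch in traces]


def ptp_by_chunks(traces, chunk_size, n_jobs=2):
    """Compute the per-channel ptp by combining per-chunk minima and maxima."""
    bounds = list(chunk_bounds(len(traces[0]), chunk_size))
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(lambda b: chunk_min_max(traces, *b), bounds))
    ptp = []
    for ch in range(len(traces)):
        lowest = min(r[ch][0] for r in results)
        highest = max(r[ch][1] for r in results)
        ptp.append(highest - lowest)
    return ptp


# Two toy channels with 7 samples each.
traces = [[0, 3, -2, 5, 1, 4, -7], [1, 1, 2, 0, 9, 3, 3]]
print(ptp_by_chunks(traces, chunk_size=3))  # [12, 9]
```

Each chunk is processed independently and only small per-chunk results are combined at the end, so the full traces never need to be reduced in a single pass.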

## 6) Save formats: binary / memory / zarr