# Loading data from different formats

Kilosort 4 natively supports data in binary format (`.bin`). The simplest
way to save your data in this format is to load it into memory one chunk at a
time and write it to a `.bin` file using NumPy's `memmap`. However,
if you aren't comfortable with that process, the `SpikeInterface` package
can load most common electrophysiology formats in a standardized way that makes
it easy to extract the data.
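
As a rough illustration of the `memmap` approach, the sketch below writes synthetic `int16` data to a `.bin` file one chunk at a time. The `load_chunk` function, the array sizes, and the temporary output path are hypothetical stand-ins for your own data source; they are not part of Kilosort or SpikeInterface.

```python
import tempfile
from pathlib import Path

import numpy as np

n_samples, n_channels = 10_000, 4   # Hypothetical recording size
chunk_size = 3_000                  # Samples copied per iteration

def load_chunk(start, end):
    """Hypothetical loader: returns samples [start, end) as (samples, channels)."""
    rng = np.random.default_rng(start)
    return rng.integers(-500, 500, size=(end - start, n_channels), dtype=np.int16)

out_path = Path(tempfile.mkdtemp()) / 'data.bin'
# Samples along rows, channels along columns, stored as 16-bit integers.
out = np.memmap(out_path, dtype='int16', mode='w+', shape=(n_samples, n_channels))
for start in range(0, n_samples, chunk_size):
    end = min(start + chunk_size, n_samples)
    out[start:end, :] = load_chunk(start, end)
out.flush()   # Make sure everything is written to disk
del out       # Release the memmap
```

Only one chunk is held in memory at a time, so the same pattern works for recordings much larger than available RAM.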

To follow the steps in this notebook, you will first need to install
`SpikeInterface`:
```
    pip install spikeinterface[full]
```

For each data format, `SpikeInterface` has a `read_<format>` utility that loads
the data as a `RecordingExtractor` object, which we can use to extract the data
and relevant metadata like sampling frequency. The following example
shows the steps for the `Open Ephys` data format. At the bottom of the notebook,
there are notes on how to load several other common formats. After the first
cell, the steps are the same regardless of format.

1. Load the data

In [None]:
from pathlib import Path
import numpy as np
from spikeinterface.extractors import read_openephys

# Specify the path where the data will be copied to, and where Kilosort 4
# results will be saved.
DATA_DIRECTORY = Path('/home/example_path')  # NOTE: You should change this
DATA_PATH = DATA_DIRECTORY / 'data.bin'
# Create the directory if it doesn't exist
DATA_DIRECTORY.mkdir(parents=True, exist_ok=True)
# Specify path to your existing data
filepath = Path(".../Record_Node_101")       # NOTE: You must change this
# Load existing data with spikeinterface
# NOTE: Open Ephys data can have multiple streams, specify `stream_id` to
#       load different ones.
recording = read_openephys(filepath, stream_id='0')

2. Get information about channel count, sampling frequency, etc.

In [None]:
c = recording.get_num_channels()
s = recording.get_num_segments()
fs = recording.get_sampling_frequency()
N = recording.get_total_samples()
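
These values determine how large the binary file will be. As a quick sanity check before copying, the sketch below estimates the file size and the recording duration. The numbers are placeholders chosen for illustration (they are not read from a real recording), and the calculation assumes 16-bit samples, matching the `int16` dtype used in the next step.

```python
# Placeholder values; in the notebook these come from the `recording` object.
n_channels = 385             # Number of channels
sampling_rate = 30000.0      # Sampling frequency in Hz
n_total_samples = 18_000_000 # Total number of samples across all segments

bytes_per_sample = 2         # int16
file_size_gb = n_total_samples * n_channels * bytes_per_sample / 1e9
duration_min = n_total_samples / sampling_rate / 60

print(f'Expected file size: {file_size_gb:.2f} GB')
print(f'Recording duration: {duration_min:.1f} min')
```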

3. Create a new binary file and copy the data to it 60,000 samples at a time. Depending on your system's memory, you could increase or decrease the number of samples loaded on each iteration.

In [None]:
y = np.memmap(DATA_PATH, dtype='int16', mode='w+', shape=(N,c))
NT = 60000  # Number of samples to copy at a time

# Copy data to binary file, 60000 samples at a time
# (same as Kilosort's default batch size).
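offset = 0  # Position in the output file where the current segment starts
for k in range(s):
    n = recording.get_num_samples(segment_index=k)
    i = 0   # Sample index within the current segment
    while i < n:
        j = min(i + NT, n)
        t = recording.get_traces(segment_index=k, start_frame=i, end_frame=j)
        y[offset+i:offset+j,:] = t
        y.flush()
        i += NT
    offset += n   # The next segment is written after this one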
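
To pick a chunk size that fits your machine, it can help to see how much memory one chunk occupies and where the chunk boundaries fall. The sketch below is illustrative only: `chunk_bounds` is a hypothetical helper, and the channel and sample counts are placeholders.

```python
def chunk_bounds(n_samples, chunk_size):
    """Yield (start, end) sample indices covering [0, n_samples) in order."""
    for start in range(0, n_samples, chunk_size):
        yield start, min(start + chunk_size, n_samples)

n_channels = 385       # Placeholder channel count
chunk_size = 60000     # Same as NT above
chunk_mb = chunk_size * n_channels * 2 / 1e6   # 2 bytes per int16 sample

bounds = list(chunk_bounds(150_000, chunk_size))
print(f'{chunk_mb:.1f} MB per chunk')
print(bounds)  # The last chunk is shorter than the others
```

Doubling the chunk size doubles the memory needed per iteration and halves the number of reads, so the right value depends mostly on how much RAM you can spare.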

4. Verify that the data was copied correctly, once again one chunk at a time. If any chunk differs from the original beyond the default tolerances of `np.allclose`, this cell will raise an `AssertionError`.

In [None]:
new_y = np.memmap(DATA_PATH, dtype='int16', mode='r', shape=(N,c))
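offset = 0  # Position in the binary file where the current segment starts
for k in range(s):
    n = recording.get_num_samples(segment_index=k)
    i = 0   # Sample index within the current segment
    while i < n:
        j = min(i + NT, n)
        t = recording.get_traces(segment_index=k, start_frame=i, end_frame=j)
        assert np.allclose(t, new_y[offset+i:offset+j,:])
        i += NT
    offset += n   # The next segment starts after this one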
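
By default, `np.allclose` compares values using a relative tolerance (`rtol=1e-05`) plus an absolute tolerance (`atol=1e-08`). For `int16` data the combined tolerance stays below 1, so any differing sample is detected. The small example below illustrates this with made-up values; `np.array_equal` is an alternative if you prefer an explicitly exact comparison.

```python
import numpy as np

a = np.array([30000, -12, 7], dtype=np.int16)
b = a.copy()
b[0] -= 1   # Change one sample by the smallest possible amount

# Default tolerance is atol + rtol * |b| = 1e-8 + 1e-5 * 29999, about 0.3,
# which is smaller than the difference of 1.
print(np.allclose(a, a.copy()))   # True
print(np.allclose(a, b))          # False
print(np.array_equal(a, b))       # False

# With floating-point data, small differences can pass the default check.
print(np.allclose(1000.0, 1000.009))   # True
```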

At this point, it's a good idea to open the Kilosort GUI and check that the
data and probe appear to have been loaded correctly and that no settings need
to be tweaked. You will need to input the path to the binary data file, select
the folder where results should be saved, and select a probe file.

```python -m kilosort```

From there, you can either launch Kilosort using the GUI or run the
next notebook cell to run it through the API.

5. Run Kilosort (API)

Note that in this case, we don't actually need to specify a probe since it's
the same as the default Neuropixels 1 configuration. For handling different
probe layouts, provide your own .prb file and/or see the tutorial on creating a
new probe file from scratch.

In [None]:
from kilosort import run_kilosort, default_settings, PROBE_DIR, io
from kilosort.utils import download_probes

# TODO: explain difference between the two channel count settings.
settings = default_settings()
settings['fs'] = fs                      # Sampling rate
settings['NchanTOT'] = c                 # Number of channels
settings['n_chan_bin'] = c               # Number of channels in the binary file
settings['data_dir'] = DATA_DIRECTORY    # Directory containing binary file.

# NOTE: The only necessary step here is `probe = io.load_probe(probe_path)`.
#       The rest is specific to Kilosort's default probe files.
download_probes()
probe_name = 'neuropixPhase3B1_kilosortChanMap.mat'
probe_path = PROBE_DIR / probe_name
probe = io.load_probe(probe_path)

# This command will both run the spike-sorting analysis and save the results to
# `DATA_DIRECTORY`.
ops, st, clu, tF, Wall, similar_templates, is_ref, est_contam_rate = run_kilosort(
    settings=settings, probe=probe, filename=DATA_PATH
    )

# TODO: explain the variables that are returned
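
If your probe differs from the default, one option is to describe it as a Python dictionary instead of loading a file. The sketch below builds such a dictionary for a made-up 8-channel, single-column layout. The key names (`chanMap`, `xc`, `yc`, `kcoords`, `n_chan`) reflect our understanding of Kilosort 4's probe format and should be checked against the probe tutorial for your installed version; the geometry itself is purely hypothetical.

```python
import numpy as np

n_chan = 8   # Hypothetical channel count
probe = {
    # Index of each channel in the binary file
    'chanMap': np.arange(n_chan),
    # Contact positions in microns: one column, 20 um vertical spacing
    'xc': np.zeros(n_chan, dtype=np.float32),
    'yc': np.arange(n_chan, dtype=np.float32) * 20.0,
    # Shank (group) index for each channel; a single shank here
    'kcoords': np.zeros(n_chan),
    'n_chan': n_chan,
}
```

A dictionary like this would take the place of the `probe` loaded with `io.load_probe` in the cell above.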

Whether you used the GUI or the API, the results can now be browsed in Phy from a terminal with:

```phy template-gui <DATA_DIRECTORY>/kilosort4/params.py```

(replacing `<DATA_DIRECTORY>` with the appropriate path)

## Instructions for additional data formats

The following cells demonstrate how to load other data formats using `SpikeInterface`.
Use these code snippets to modify the first cell of this notebook to work with
different datasets.

See [SpikeInterface's documentation](https://spikeinterface.readthedocs.io/en/latest/modules/extractors.html) for additional details.

MEArec

In [None]:
from spikeinterface.extractors import read_mearec
# NOTE: need to `pip install MEArec`
# Provide path to HDF5 file.
filepath = Path(".../mearec_test_10s.h5")
recording, sorting_true = read_mearec(filepath)

SpikeGLX

In [None]:
# NOTE: You do not need to load SpikeGLX data this way. It is already saved in
#       binary format, so you should just point Kilosort 4 to the .bin file.
from spikeinterface.extractors import read_spikeglx
# Provide path to directory containing .bin file.
filepath = Path(".../TEST_20210920_0_g0/")
recording = read_spikeglx(filepath)

Blackrock

In [None]:
from spikeinterface.extractors import read_blackrock
# Provide path to nsX file, not nev file.
filepath = Path(".../file_spec_3_0.ns6")
recording = read_blackrock(filepath)

Neuralynx

In [None]:
from spikeinterface.extractors import read_neuralynx
# Provide path to directory containing .Ncs file(s).
filepath = Path("C:/code/ephy_testing_data/neuralynx/BML/original_data/")
recording = read_neuralynx(filepath)

NWB

In [None]:
from spikeinterface.extractors import read_nwb_recording
# NOTE: SpikeInterface appears to be picky about which NWB formats it will load.
#       If you encounter issues, please reach out for assistance.
filepath = Path(".../ecephys_tutorial_v2.5.0.nwb")
recording = read_nwb_recording(filepath)

Intan

In [None]:
from spikeinterface.extractors import read_intan
# NOTE: You will need to select the appropriate data stream. If you run without
#       specifying `stream_id`, you will get an error message explaining what
#       each stream corresponds to.
filepath = Path(".../intan_rhs_test_1.rhs")
recording = read_intan(filepath, stream_id='0')
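
If you work with several of these formats, the readers above could be collected into a small lookup table. This is only a convenience sketch: `READERS` and `get_reader` are hypothetical names, and the table lists only the reader functions shown in this notebook.

```python
import importlib

# Format name -> reader function in `spikeinterface.extractors`,
# using only the readers shown in this notebook.
READERS = {
    'openephys': 'read_openephys',
    'mearec': 'read_mearec',
    'spikeglx': 'read_spikeglx',
    'blackrock': 'read_blackrock',
    'neuralynx': 'read_neuralynx',
    'nwb': 'read_nwb_recording',
    'intan': 'read_intan',
}

def get_reader(data_format):
    """Return the SpikeInterface reader function for a format name."""
    try:
        name = READERS[data_format.lower()]
    except KeyError:
        raise ValueError(f'Unsupported format: {data_format!r}') from None
    # Import lazily so the table can be inspected without SpikeInterface.
    extractors = importlib.import_module('spikeinterface.extractors')
    return getattr(extractors, name)
```

For example, `get_reader('intan')(filepath, stream_id='0')` would correspond to the Intan cell above. Keep in mind that `read_mearec` is unpacked into a recording and a ground-truth sorting in its cell, so its result needs the same handling here.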