Each module can be run on its own using the following syntax:

```
python -m ecephys_spike_sorting.modules.<module name> --input_json <path to input json> --output_json <path to output json>
```
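As a sketch, a single-module invocation can be assembled programmatically before handing it to a process runner. The module name and JSON paths below are hypothetical stand-ins, not files that exist in the repository:

```python
import shlex

# Hypothetical example values -- substitute your own module name and paths.
module = "kilosort_helper"
input_json = "/data/json/session1_ks_input.json"
output_json = "/data/json/session1_ks_output.json"

# Same shape as the command-line syntax shown above.
command = [
    "python", "-m", f"ecephys_spike_sorting.modules.{module}",
    "--input_json", input_json,
    "--output_json", output_json,
]

print(shlex.join(command))
```

A list like this can be passed directly to `subprocess.check_call` once the package is installed in your environment.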
However, you'll typically want to run several modules in order, iterating over multiple sets of input files. The scripts in this directory provide examples of how to implement batch processing, as well as how to auto-generate the required JSON files containing module parameters.
The first thing you'll want to do is edit `create_input_json.py`. The input JSON file tells each module where to find the required files, where to save its outputs, and what parameters to use for processing. We've tried to avoid hard-coding any paths or file names within the modules themselves (with the exception of the names of files generated by Kilosort).
The `createInputJson` function has one required input argument: the location for writing a JSON file. After that, at least one of three directories must be specified:
- `npx_directory`: The directory containing NPX files saved by Open Ephys (if you're starting processing with the `extract_from_npx` module). If there are multiple NPX files, they will be concatenated together.

- `extracted_data_directory`: The top-level directory containing Neuropixels continuous files in `.bin` or `.dat` format (if you're using the `depth_estimation` or `median_subtraction` modules), e.g.:

  ```
  extracted_data_directory
  ├── continuous
  │   ├── Neuropix-3a-100.0
  │   │   └── continuous.dat (AP band file)
  │   └── Neuropix-3a-100.1
  │       └── continuous.dat (LFP band file)
  └── events
      └── ...
  ```

- `kilosort_output_directory`: The directory containing the AP band `.dat` or `.bin` file, and potentially the `.npy` files saved by Kilosort. This is required for running the `kilosort_helper`, `kilosort_postprocessing`, `noise_templates`, `mean_waveforms`, or `quality_metrics` modules.
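To make the idea concrete, here is a rough sketch of what an input JSON might contain for the Kilosort-related modules. The key names and values below are hypothetical illustrations only; the real parameter names are defined in each module's `_schemas.py` file:

```python
import json
import os
import tempfile

# Hypothetical sketch of an input JSON -- the actual keys are defined by each
# module's _schemas.py, not by this example.
params = {
    "directories": {
        "kilosort_output_directory": "/data/session1/continuous/Neuropix-3a-100.0",
    },
    "ephys_params": {
        "probe_type": "3B2",  # '3A', '3B1', or '3B2'
        "sample_rate": 30000.0,
    },
}

# Write the parameters to disk, as createInputJson does for its first argument.
json_path = os.path.join(tempfile.gettempdir(), "example_input.json")
with open(json_path, "w") as f:
    json.dump(params, f, indent=2)
```

Each module reads a file like this at startup, which is why the same JSON can be reused across several modules in a pipeline.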
You can also specify the Neuropixels `probe_type` ('3A', '3B1', or '3B2'), because the reference channels will differ depending on which one you're using.
`createInputJson` contains a dictionary entry for each module's parameters, as well as four entries for parameters that span modules. The default implementation contains many assumptions about file locations that are specific to the Allen Institute, so make sure that these match what's on your system. You only need to update the parameters for modules that you're actually going to use.
Documentation on input parameters can be found in the `_schemas.py` file for each module, as well as in `schemas.py` in the "common" directory.
Once you've updated the parameters dictionary, you can edit `batch_processing.py`. Here, you'll want to update the list of directories containing files to process and the location where JSON files can be saved. Finally, comment out the names of the modules you don't want to use.
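The core of such a batch script can be pictured as the sketch below. The directory names are hypothetical, and the real script calls `createInputJson` rather than just assembling commands; this only illustrates the iteration pattern:

```python
import os

# Hypothetical lists -- edit these for your system, as in batch_processing.py.
data_directories = ["/data/session1", "/data/session2"]
json_directory = "/data/json"

# Comment out modules you don't want to run; they execute in list order.
modules = [
    "kilosort_helper",
    "kilosort_postprocessing",
    "noise_templates",
    "mean_waveforms",
    "quality_metrics",
]

commands = []
for directory in data_directories:
    session = os.path.basename(directory)
    input_json = os.path.join(json_directory, session + "-input.json")
    # In the real script, createInputJson(input_json, ...) is called here
    # to generate the parameters file before the modules run.
    for module in modules:
        output_json = os.path.join(json_directory, session + "-" + module + "-output.json")
        commands.append([
            "python", "-m", f"ecephys_spike_sorting.modules.{module}",
            "--input_json", input_json,
            "--output_json", output_json,
        ])

print(len(commands))  # → 10, one command per (directory, module) pair
```

Running the commands sequentially per directory preserves the module ordering that later modules depend on.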
Then, you can run the script using `pipenv` (assuming you've already created a pipenv virtual environment based on the steps in the main README file):
```
$ pipenv shell
(ecephys_spike_sorting) $ python ecephys_spike_sorting/scripts/batch_processing.py
(ecephys_spike_sorting) $ exit
```

On Windows (where the virtual environment prompt may appear as `(.venv)`), use backslashes in the path:

```
$ pipenv shell
(.venv) $ python ecephys_spike_sorting\scripts\batch_processing.py
(.venv) $ exit
```
- `batch_processing.py` provides the basic framework for running multiple modules on a set of input files. We recommend starting with this one.
- `batch_processing_serial.py` has similar functionality, but also includes functions for backing up data.
- `batch_processing_parallel.py` is more complex, and makes it possible to run modules simultaneously on multiple datasets.
- `batch_processing_gui.py` is a work-in-progress UI for launching spike sorting after an experiment finishes. It still needs to be cleaned up and documented, so use it at your own risk.