
Batch Processing Scripts

Each module can be run on its own using the following syntax:

python -m ecephys_spike_sorting.modules.<module name> --input_json <path to input json> --output_json <path to output json>

However, you'll typically want to run several modules in order, iterating over multiple sets of input files. The scripts in this directory provide examples of how to implement batch processing, as well as how to auto-generate the required JSON files containing module parameters.
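For example, to run just the mean_waveforms module on its own (the JSON paths here are hypothetical placeholders):

python -m ecephys_spike_sorting.modules.mean_waveforms --input_json /data/session1/mean_waveforms_input.json --output_json /data/session1/mean_waveforms_output.json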

Getting Started

The first thing you'll want to do is edit create_input_json.py. The input JSON file tells each module where to find the required files, where to save its outputs, and what parameters to use for processing. We've tried to avoid hard-coding any paths or file names within the modules themselves (with the exception of the names of files generated by Kilosort).

The createInputJson function has one required input argument, the location for writing a JSON file. After that, at least one of three directories must be specified:

  1. npx_directory: The directory containing NPX files saved by Open Ephys (if you're starting processing with the extract_from_npx module). If there are multiple NPX files, they will be concatenated together.
  2. extracted_data_directory: The top-level directory containing Neuropixels continuous files in .bin or .dat format (if you're using the depth_estimation or median_subtraction modules), e.g.:
    extracted_data_directory
    |
    ├── continuous
    |   |   
    |   ├── Neuropix-3a-100.0
    |   |   └── continuous.dat (AP band file)
    |   |
    |   └── Neuropix-3a-100.1
    |       └── continuous.dat (LFP band file)
    |
    └── events
        └── ...
      
    
  3. kilosort_output_directory: The directory containing the AP band .dat or .bin file, and potentially the .npy files saved by Kilosort. This is required for running the kilosort_helper, kilosort_postprocessing, noise_templates, mean_waveforms, or quality_metrics modules.

You can also specify the Neuropixels probe_type ('3A', '3B1', or '3B2'), because the reference channels will differ depending on which one you're using.
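Putting this together, a minimal sketch of generating an input JSON might look like the following (the paths are hypothetical, and the import assumes you're running from the scripts directory):

    from create_input_json import createInputJson

    # Hypothetical locations -- update these to match your system.
    input_json = '/data/json_files/session1_input.json'

    # Writes the JSON file and returns the parameter dictionary.
    info = createInputJson(input_json,
                           kilosort_output_directory='/data/session1/kilosort_output',
                           probe_type='3B2')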

createInputJson contains a dictionary entry for each module's parameters, as well as four entries for parameters that span modules. The default implementation contains many assumptions about file locations that are specific to the Allen Institute, so make sure that these match what's on your system. You only need to update the parameters for modules that you're actually going to use.

Documentation on input parameters can be found in the _schemas.py file for each module, as well as in schemas.py in the "common" directory.

Once you've updated the parameters dictionary, you can edit batch_processing.py. Here, you'll want to update the list of directories containing files to process and the location where JSON files can be saved. Finally, comment out the names of the modules you don't want to use.
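As a rough sketch of what that workflow looks like (the directory names here are hypothetical, and the module list should be trimmed to match your pipeline):

    import os
    import subprocess

    from create_input_json import createInputJson  # assumes you're in the scripts directory

    # Hypothetical locations -- update these to match your system.
    directories = ['/data/session1', '/data/session2']
    json_directory = '/data/json_files'

    # Comment out any modules you don't want to run.
    modules = ['kilosort_helper',
               'kilosort_postprocessing',
               'noise_templates',
               'mean_waveforms',
               'quality_metrics']

    for directory in directories:
        session_id = os.path.basename(directory)

        for module in modules:
            # One input/output JSON pair per module, per session.
            input_json = os.path.join(json_directory, session_id + '_' + module + '-input.json')
            output_json = os.path.join(json_directory, session_id + '_' + module + '-output.json')

            createInputJson(input_json, kilosort_output_directory=directory)

            # Launch the module as a subprocess, just like the
            # command-line invocation shown above.
            subprocess.check_call(['python', '-m',
                                   'ecephys_spike_sorting.modules.' + module,
                                   '--input_json', input_json,
                                   '--output_json', output_json])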

Then, you can run the script using pipenv (assuming you've already created a pipenv virtual environment based on the steps in the main README file):

Linux / macOS

    $ pipenv shell
    (ecephys_spike_sorting) $ python ecephys_spike_sorting/scripts/batch_processing.py
    (ecephys_spike_sorting) $ exit

Windows

    $ pipenv shell
    (.venv) $ python ecephys_spike_sorting\scripts\batch_processing.py
    (.venv) $ exit

Available Scripts

batch_processing.py provides the basic framework for running multiple modules on a set of input files. We recommend starting with this one.

batch_processing_serial.py has similar functionality, but also includes functions for backing up data.

batch_processing_parallel.py is more complex, and makes it possible to run modules simultaneously on multiple datasets.

batch_processing_gui.py is a work-in-progress UI for launching spike sorting after an experiment finishes. It still needs to be cleaned up and documented, so use it at your own risk.