Description
Hi all,
We are sorting enormous datasets, for which the extra recording.dat binary copy becomes a serious issue, both in terms of speed and disk usage. I see here that this step can be bypassed under certain conditions. However, the recording.dat file is generated anyway when sorting a (single, non-concatenated) SpikeGLXRecordingExtractor. Is this really needed? My understanding was that it is common practice to run Kilosort directly on the output of SpikeGLX data preprocessed with CatGT (e.g. in Jennifer Colonell's pipeline).
Also, while we're at it: one of the conditions for bypassing recording.dat is that there is no file concatenation. Does anyone know whether there is a fundamental reason why Kilosort doesn't accept multiple .bin files to concatenate on the fly during preprocessing?
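To sketch what I mean by "concatenate on the fly" (this is not Kilosort's actual code, and all names here are hypothetical): a thin reader that memmaps each segment's .bin file and serves arbitrary frame ranges across file boundaries, so a merged copy never has to exist on disk.

```python
# Hypothetical sketch of on-the-fly concatenation of several .bin files.
# Not Kilosort or SpikeInterface code; just an illustration of the idea.
import os
import tempfile
import numpy as np

class ConcatBinReader:
    """Serve int16 frame ranges spanning multiple binary files."""

    def __init__(self, paths, n_channels):
        self.n_channels = n_channels
        # memmap each file lazily; nothing is copied to disk
        self.maps = [np.memmap(p, dtype=np.int16).reshape(-1, n_channels)
                     for p in paths]
        lengths = [m.shape[0] for m in self.maps]
        # cumulative frame offsets: file k covers [starts[k], starts[k+1])
        self.starts = np.concatenate([[0], np.cumsum(lengths)])
        self.n_frames = int(self.starts[-1])

    def read(self, i0, i1):
        """Return frames i0:i1 as one array, crossing file seams as needed."""
        chunks = []
        for k, m in enumerate(self.maps):
            a = max(i0, self.starts[k]) - self.starts[k]
            b = min(i1, self.starts[k + 1]) - self.starts[k]
            if a < b:
                chunks.append(np.asarray(m[a:b]))
        return np.concatenate(chunks, axis=0)

# demo: two files of 100 frames x 4 channels, read across the seam
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"seg{i}.bin")
    (np.arange(100 * 4, dtype=np.int16) + i * 1000).tofile(p)
    paths.append(p)

reader = ConcatBinReader(paths, n_channels=4)
block = reader.read(90, 110)          # 10 frames from seg0, 10 from seg1
print(reader.n_frames, block.shape)   # 200 (20, 4)
```

A preprocessing loop that already reads the data in chunks (as Kilosort's does) could in principle consume ranges from a reader like this instead of from a single concatenated file.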
To give a bit more context: currently we can end up with up to 4 copies of the same data at the same time:
1. the raw files
2. the CatGT-preprocessed copy of each contiguous segment of data
3. the intermediate recording.dat copy, which does pretty much nothing besides concatenation
4. temp_wh.dat with the preprocessed/drift-corrected data
This wasn't so much of an issue in terms of disk space, because copies 2 and 3 are deleted after sorting. But now that we'd like to sort longer recordings, having 4 copies of the data at the same time is becoming too much.
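To make the disk pressure concrete, here is a back-of-envelope calculation (the channel count, sample rate, and duration are my own illustrative assumptions, not figures from our actual recordings):

```python
# Illustrative disk-footprint arithmetic for four coexisting copies of a
# Neuropixels-style recording (assumed parameters, not from the post above).
n_channels = 384          # e.g. one Neuropixels probe
fs = 30_000               # sampling rate, Hz
bytes_per_sample = 2      # int16
hours = 12                # assumed recording length

one_copy = n_channels * fs * bytes_per_sample * hours * 3600
print(f"one copy:    {one_copy / 1e12:.2f} TB")       # one copy:    1.00 TB
print(f"four copies: {4 * one_copy / 1e12:.2f} TB")   # four copies: 3.98 TB
```

So with these assumptions, every extra simultaneous copy costs roughly a terabyte per half-day of recording, which is why removing the recording.dat step matters for long sessions.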
Thanks! Tom