Description
Hi all,
We are sorting enormous datasets, for which the extra recording.dat binary copy becomes a serious issue, both in terms of speed and disk usage. I see here that this step can be bypassed under certain conditions. However, the recording.dat file is generated anyway when sorting a (single, non-concatenated) SpikeGLXRecordingExtractor. Is this really needed? My understanding was that it is common practice to run Kilosort directly on the output of SpikeGLX data preprocessed with CatGT (e.g. in Jennifer Colonell's pipeline).
Also, while we're at it: one of the conditions for bypassing recording.dat is that there is no file concatenation. Does anyone know whether there is a fundamental reason why Kilosort doesn't accept multiple .bin files to concatenate on the fly during preprocessing?
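To sketch what I mean by "concatenate on the fly" (this is not Kilosort's actual code, and all names here are hypothetical): a thin reader that memmaps each segment's .bin file and serves arbitrary frame ranges across file boundaries, so a merged copy never has to exist on disk.

```python
# Hypothetical sketch of on-the-fly concatenation of several .bin files.
# Not Kilosort or SpikeInterface code; just an illustration of the idea.
import os
import tempfile
import numpy as np

class ConcatBinReader:
    """Serve int16 frame ranges spanning multiple binary files."""

    def __init__(self, paths, n_channels):
        self.n_channels = n_channels
        # memmap each file lazily; nothing is copied to disk
        self.maps = [np.memmap(p, dtype=np.int16).reshape(-1, n_channels)
                     for p in paths]
        lengths = [m.shape[0] for m in self.maps]
        # cumulative frame offsets: file k covers [starts[k], starts[k+1])
        self.starts = np.concatenate([[0], np.cumsum(lengths)])
        self.n_frames = int(self.starts[-1])

    def read(self, i0, i1):
        """Return frames i0:i1 as one array, crossing file seams as needed."""
        chunks = []
        for k, m in enumerate(self.maps):
            a = max(i0, self.starts[k]) - self.starts[k]
            b = min(i1, self.starts[k + 1]) - self.starts[k]
            if a < b:
                chunks.append(np.asarray(m[a:b]))
        return np.concatenate(chunks, axis=0)

# demo: two files of 100 frames x 4 channels, read across the seam
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"seg{i}.bin")
    (np.arange(100 * 4, dtype=np.int16) + i * 1000).tofile(p)
    paths.append(p)

reader = ConcatBinReader(paths, n_channels=4)
block = reader.read(90, 110)          # 10 frames from seg0, 10 from seg1
print(reader.n_frames, block.shape)   # 200 (20, 4)
```

A preprocessing loop that already reads the data in chunks (as Kilosort's does) could in principle consume ranges from a reader like this instead of from a single concatenated file.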
To give a bit more context: currently we can end up with up to 4 copies of the same data at the same time:
1. the raw files
2. the CatGT-preprocessed copy of each contiguous segment of data
3. the intermediate recording.dat copy, which does pretty much nothing besides concatenation
4. temp_wh.dat with the preprocessed/drift-corrected data
This wasn't so much of an issue in terms of disk space, because copies 2 and 3 are deleted after sorting. But now that we'd like to sort longer recordings, having 4 copies of the data at the same time is becoming too much.
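To make the disk pressure concrete, here is a back-of-envelope calculation (the channel count, sample rate, and duration are my own illustrative assumptions, not figures from our actual recordings):

```python
# Illustrative disk-footprint arithmetic for four coexisting copies of a
# Neuropixels-style recording (assumed parameters, not from the post above).
n_channels = 384          # e.g. one Neuropixels probe
fs = 30_000               # sampling rate, Hz
bytes_per_sample = 2      # int16
hours = 12                # assumed recording length

one_copy = n_channels * fs * bytes_per_sample * hours * 3600
print(f"one copy:    {one_copy / 1e12:.2f} TB")       # one copy:    1.00 TB
print(f"four copies: {4 * one_copy / 1e12:.2f} TB")   # four copies: 3.98 TB
```

So with these assumptions, every extra simultaneous copy costs roughly a terabyte per half-day of recording, which is why removing the recording.dat step matters for long sessions.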
Thanks! Tom