
How to run KS (any version) to extract data from very long recordings #405

Closed
brendonw1 opened this issue May 21, 2021 · 26 comments · Fixed by #595

@brendonw1

We would like to do multi-day recordings, but recordings of that length exceed our available memory. I am wondering if someone can explain how we might run KS on such long recordings and still extract spikes.
We raised this back in 2019 in issue #135. However, I now see in run_Templates.m that a new approach is taken:

% update: these recommendations no longer apply for the datashift version!

Can someone explain whether and how we can run spike detection on long-duration recordings (previously the idea was to detect spikes on a subset of the data and then find them in the full data)?

Thanks so much
Brendon Watson

@WeissShahaf

What exactly is failing? vRAM? System RAM? Storage capacity?
I've been running 17 hr recordings without issue.

@brendonw1

brendonw1 commented Jun 13, 2021 via email

@WeissShahaf

I haven't tried that. Why not chunk it into separate days?
What is the error that you get?

@brendonw1

Thank you so much. We are trying to track the same neurons, whenever possible, over days. If we chunk it, I assume we lose that ability. Am I right there? Does the algo somehow compensate for multiple separate sortings?

@grahamfindlay

Hi Brendon,

We (@TomBugnon) have successfully run KS2.5+ on 24h recordings. IIRC, it takes ~24h on a Tesla V100 32GB, but Tom may correct me on that. It is worth noting that this GPU, while very expensive (~$9k), is far from optimal as far as Kilosort is concerned. I expect that a (slightly) less expensive (~$6k) RTX would be better suited for sorting very long recordings in one go. Unfortunately, NVidia prevents you from putting these "consumer grade" GPUs into "commercial grade" rackmount servers, so you've got to build a roaring desktop PC and use that.

I do not know of an accepted method or tool for reconciling the outputs of two or more separate sortings, which would allow you to sort shorter recordings independently and combine the results post-hoc. I have always assumed that the longer you can sort in one shot the better, but I imagine that the marginal benefits of doing this (vs. post-hoc reconciliation of sorting outputs) decrease with recording length.

Currently, Kilosort's method of drift correction seems to perform poorly when applied to very long recordings, and @TomBugnon is working on fixing that. The recently published approach here seems very promising, but I do not think that their code is ready for use.

Hope that is helpful!

@TomBugnon

TomBugnon commented Jun 15, 2021

Hi @brendonw1
As far as I know there is no built-in way in Kilosort to combine results post-hoc. However, the Allen Institute's spike sorting package contains a module to do just that (here, or here for Jennifer Colonell's SpikeGLX-friendly fork).

It looks like these are not maintained, and I haven't tried either, but I would be curious to hear how they perform.

Also, I would guess that some kind of correction for day-to-day drift might be required. I reckon in the Neuropixels 2.0 paper they concatenated the raw data and then used this piece of code to register both days to each other.

Some integrated package to handle this would be super useful I think...
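
For illustration only (this is not the code from that repo, just a hedged sketch of the general idea): the vertical offset between two days can be estimated by cross-correlating their spike-count-by-depth profiles. profileDay1 and profileDay2 are assumed to be histograms of spike counts over the same depth bins, computed separately for each day.

maxLag = 50;                                 % depth bins to search over
[c, lags] = xcorr(profileDay1 - mean(profileDay1), ...
                  profileDay2 - mean(profileDay2), maxLag);
[~, imax] = max(c);
shiftBins = lags(imax);                      % estimated day-to-day drift, in depth bins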

@brendonw1

Awesome guys!
Thank you @TomBugnon and @grahamfindlay and @WeissShahaf for so much info.
I will try to dig into the methods out there, with or without KS. One idea is to just do multiple 12- or 24-hour sessions, run our various metrics on each one separately, and see how that goes. We are lucky that we are mostly looking for global population stats rather than truly single cells in our initial analyses. So we'll start with just that.

Thanks so much!

@TomBugnon

Sure! For the record, if anyone actually ends up working on tracking units across days (which will happen somewhere sometime, I guess) I'd be interested in getting in touch. Cheers

@brendonw1

Me too!

@achristensen56

achristensen56 commented Jul 26, 2021

Me too! I'd love to be kept in the loop on any updates or progress people make on this issue. @TomBugnon, have you tried using the ops.midpoint flag in the KS code? I saw it a while ago but it wasn't obvious to me how to use it. This might be a silly question, but do you know what units the midpoint flag is in?
The registration split is:

%register the first block
[imin1, F1] = align_block(F(:, :, 1:ops.midpoint));
% register the second block as usual
[imin2, F2] = align_block(F(:, :, ops.midpoint+1:end));

F is defined as:

F = zeros(dmax, 20, Nbatches);

where Nbatches is a KS default (I believe it's calculated as:)
Nbatch = ceil(bytes / (2 * NchanTOT * NT));

So in this case, would I keep track of how many batches are in each file and set the midpoint to the number of batches in the first file when I concatenate the files? Curious if you've tried this... I concatenate the files in MATLAB, so I could definitely set that flag after reading the first file...
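
Concretely, something like this (untested sketch; the file name and NchanTOT are placeholders, and the exact Nbatch formula may differ between KS versions):

NchanTOT = 64;                                    % total channels in the binary file
NT = ops.NT;                                      % samples per batch, from the KS config
d1 = dir('day1.bin');                             % hypothetical first file
Nbatch1 = ceil(d1.bytes / (2 * NchanTOT * NT));   % int16 = 2 bytes per sample
ops.midpoint = Nbatch1;                           % split align_block at the file boundary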

edit: I tried this, and it didn't qualitatively change my (poor) aligned results.

@brendonw1

brendonw1 commented Jul 28, 2021 via email

@alirezasaeedi1988

alirezasaeedi1988 commented Sep 20, 2021

Hello there,
Just wanted to update you on this.
I have 26 days (2 or 3 hours a day) of recording over 2 months with a 32-channel electrode. I've merged them into a big binary file (270 GB). I then fed it into Kilosort 2, and it worked with a little bit of playing around with the batch size. :)
I'm using an NVIDIA Quadro P4000, which has 8 GB of memory.

P.S. I've commented out the splitting part in the code!
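
In case it helps anyone, a minimal, untested sketch of how such a merge can be done chunk-by-chunk in MATLAB so the full recording never has to fit in RAM (file names are placeholders; this is not my exact script):

inFiles = {'day01.bin', 'day02.bin', 'day03.bin'};   % files to concatenate, in order
fidOut  = fopen('concatenated.bin', 'w');
chunk   = 1e8;                                       % int16 samples read per iteration
for k = 1:numel(inFiles)
    fidIn = fopen(inFiles{k}, 'r');
    while true
        buf = fread(fidIn, chunk, '*int16');
        if isempty(buf), break; end
        fwrite(fidOut, buf, 'int16');
    end
    fclose(fidIn);
end
fclose(fidOut);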

@brendonw1

brendonw1 commented Sep 24, 2021 via email

@TomBugnon

TomBugnon commented Sep 24, 2021

Hi Brendon, I think a convenient way to stitch recordings is to use spikeinterface with something like this (hasty copy-paste, for inspiration):

import spikeinterface.extractors as se
import spikeinterface.sorters as ss
from pathlib import Path

binpaths = ...
output_dir = Path(...)
sorter = 'kilosort2_5'
params = {}             # sorter-specific parameters passed through to Kilosort
clean_dat_file = True   # set False to keep the raw-data copy spikeinterface makes

rec_extractors = [
    se.SpikeGLXRecordingExtractor(binpath)
    for binpath in binpaths
]

if len(rec_extractors) == 1:
    recording = rec_extractors[0]
else:
    # Concatenate the recordings in time so they are sorted as one dataset
    recording = se.MultiRecordingTimeExtractor(rec_extractors)

# Must add chan locs
recording.set_channel_locations(...)

ss.run_sorter(
    sorter,
    recording,
    output_folder=output_dir,
    verbose=True,
    **params
)

# Delete useless `recording.dat` file copied by spikeinterface to the KS output dir
rec_path = output_dir / 'recording.dat'
if clean_dat_file and rec_path.exists():
    rec_path.unlink()
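
(Note that this snippet uses the older spikeinterface API; if I remember correctly, recent versions concatenate recordings with spikeinterface.concatenate_recordings rather than MultiRecordingTimeExtractor.)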

How to distinguish units that were "lost" by Kilosort because of drift, and should be merged, from those that are lost for physiological/mechanical reasons doesn't seem like an easy-to-settle question, and I'd be curious to hear thoughts on this.

@brendonw1

brendonw1 commented Sep 24, 2021 via email

@TomBugnon

With this approach Kilosort treats the concatenated data as a single dataset (and is ignorant of where the concatenation happens). This is likely not ideal, notably for the drift correction.

@brendonw1

brendonw1 commented Sep 24, 2021 via email

@TomBugnon

No no, sorry, I was unclear:
The data is concatenated before being passed to Kilosort. Kilosort runs on the concatenated data as if it were a single continuous dataset.

@brendonw1

brendonw1 commented Sep 24, 2021 via email

@alirezasaeedi1988

alirezasaeedi1988 commented Sep 24, 2021

Hi @brendonw1

In the way that I am doing it, I have one single concatenated data file, so for Kilosort it is a continuous dataset, as @TomBugnon just said. I also have a single-shank electrode and didn't break the data into sub-groups!
But the batch size (ops.NT) I mentioned in my first comment is a parameter in the Kilosort config file for handling RAM problems. You can increase or decrease that parameter to overcome them.
With concatenated data from different days, you can track the same neurons over those days (if the drift is not too large it is easier for Kilosort to keep track, but even if the drift is moderate you can still merge some clusters in the manual sorting stage using Phy).
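
For reference, this is the kind of line I mean in the config file (the default differs between Kilosort versions, so take the numbers as a placeholder):

ops.NT = 32*1024 + ops.ntbuff;   % batch size in samples; smaller values use less GPU/system RAM per batch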

best,
Alireza

@brendonw1

brendonw1 commented Sep 26, 2021 via email

@StringXD

StringXD commented Jun 4, 2023

Hi, I want to know what criteria ensure that a recorded single unit belongs to the same neuron throughout the recording (>4 hours); I believe there will be some drift during the recording. It would be great if there is a reference paper to follow. Thanks very much!

@papannon


Hey @alirezasaeedi1988,
Could you please elaborate a little bit on this? I have 3 recordings from my animals, without moving the probe, each about 35 min long, so after I combine the 3 sessions I still have a .dat file under 200 GB. KS2 should be able to handle this, but it breaks after the "main optimization" stage of the spike sorting. (Briefly, I am losing it after main optimization, during re-ordering of the batches, with a variety of error messages, e.g.: Error running kilosort! EIG did not converge at index=something, or Error running kilosort! Unrecognized field name "momentum".)
I would like to try commenting out the splitting part or playing around with the batch size, which might work for me as well, but I could not figure out how to do this. What did you do exactly to work with these concatenated files? By the way, I used Python's np.memmap() to merge my files; they actually look really nice in the KS2 GUI, so the file should not be corrupted.
Thanks a lot!

@brendonw1

brendonw1 commented Aug 10, 2023 via email

@alirezasaeedi1988

alirezasaeedi1988 commented Aug 16, 2023

Hi @StringXD

While Kilosort identifies and clusters waveforms across different days, that doesn't necessarily confirm the result always reflects reality. (I got a similar comment on my paper.)

Several prior studies have made similar assertions, including Tolias et al. in JNeurophysiol 2007, McMahon et al. in PNAS 2014, Okun et al. in PLoS ONE 2016, Schoonover et al. in Nature 2021, and Steinmetz et al. in Science 2021. These studies have made considerable effort to confirm that the same neurons are being tracked using various methods.

In summary, after spike sorting, you can extract spikes from different time windows (whose length can vary depending on the experiment) and compare the waveform properties of each cluster with those of all clusters (including itself) in the next time window, using some similarity indices. If the neuron has been tracked correctly, it should best match itself in successive time windows.
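
As a rough illustration of one possible similarity index (not the exact procedure from any of those papers): correlate each cluster's mean waveform between consecutive windows. meanWF1 and meanWF2 are assumed [nClusters x nSamples] mean waveforms on the peak channel, computed separately for each window.

nClu = size(meanWF1, 1);
S = zeros(nClu);                          % S(i,j): similarity of cluster i (window 1) to cluster j (window 2)
for i = 1:nClu
    for j = 1:nClu
        c = corrcoef(meanWF1(i, :), meanWF2(j, :));
        S(i, j) = c(1, 2);
    end
end
[~, bestMatch] = max(S, [], 2);           % best window-2 match for each window-1 cluster
tracked = (bestMatch == (1:nClu)');       % plausibly the same neuron if it best matches itself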

@alirezasaeedi1988



Hi @papannon

I am not familiar with np.memmap(), but if the concatenation is properly done, the rest should not be very challenging (considering the size of your .dat file).
Have you tried changing ops.NT in the config file?
What GPU do you have?
