How to run KS (any version) to extract data from very long recordings #405
Comments
What exactly is failing? vRAM? System RAM? Storage capacity? I've been running 17 hr recordings without issue. |
Thank you.
What about a multi-week recording? Does it run in a way that is able to
handle any length?
|
Haven't tried that. Why not chunk it into separate days? |
Thank you so much. We are trying to track the same neurons, whenever possible, over days. If we chunk it, I assume we lose that ability. Am I right there? Does the algo somehow compensate for multiple separate sortings? |
Hi Brendon, We (@TomBugnon) have successfully run KS2.5+ on 24h recordings. IIRC, it takes ~24h on a Tesla V100 32GB, but Tom may correct me on that. It is worth noting that this GPU, while very expensive (~$9k), is far from optimal as far as Kilosort is concerned. I expect that a (slightly) less expensive (~$6k) RTX would be optimal for sorting very long recordings in one go. Unfortunately, NVIDIA prevents you from putting these "consumer grade" GPUs into a "commercial grade" rackmount server, so you've got to build a roaring desktop PC and use that. I do not know of an accepted method or tool for reconciling the outputs of two or more separate sortings, which would allow you to sort shorter recordings independently and combine the results post hoc. I have always assumed that the longer you can sort in one shot the better, but I imagine that the marginal benefits of doing this (vs. post-hoc reconciliation of sorting outputs) decrease with recording length. Currently, Kilosort's method of drift correction seems to perform poorly when applied to very long recordings, and @TomBugnon is working on fixing that. The recently published approach here seems very promising, but I do not think that their code is ready for use. Hope that is helpful! |
Hi @brendonw1 It looks like these are not maintained, and I haven't tried either, but I would be curious to hear how it performs. Also, I would guess that some kind of correction for day-to-day drift might be required. I reckon in the Neuropixels 2.0 paper they concatenated the raw data and then used this piece of code to register both days to each other. Some integrated package to handle this would be super useful, I think... |
Awesome guys! Thanks so much! |
Sure! For the record, if anyone actually ends up working on tracking units across days (which will happen somewhere sometime, I guess) I'd be interested in getting in touch. Cheers |
Me too! |
Me too! I'd love to be kept in the loop on any updates or progress people make on this issue. @TomBugnon, have you tried using that flag in the KS code? I saw it a while ago, but it wasn't obvious to me how to use it. This might be a silly question, but do you know what units the midpoints flag is in? In %register the first block, F is defined as: F = zeros(dmax, 20, Nbatches); where Nbatches is a KS default (I believe it's calculated as:) — so in this case, would I keep track of how many batches are in each file and set the midpoint as the number of batches in the first file when I concatenate the files? Curious if you've tried this... I concatenate the files in Matlab, so I could definitely set that flag after reading the first file. Edit: I tried this, and it didn't qualitatively change my (poor) aligned results. |
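For anyone experimenting with that midpoint idea: below is a minimal Python sketch (not MATLAB, not part of Kilosort) of how per-file batch counts might be estimated from file sizes, under the assumption that the data are flat int16 binaries and that KS2 splits the recording into batches of roughly ops.NT samples (65600 is a commonly seen default, but check your own config). `batches_per_file` is a hypothetical helper introduced here for illustration:

```python
import os

def batches_per_file(binpaths, n_channels, nt=65600, dtype_bytes=2):
    """Estimate how many KS2-style batches each binary file contributes.

    Assumes flat binary files of interleaved int16 samples across
    `n_channels` channels, and a batch size of `nt` samples (cf. ops.NT
    in the Kilosort config). Returns one (ceiled) count per file.
    """
    counts = []
    for path in binpaths:
        n_samples = os.path.getsize(path) // (n_channels * dtype_bytes)
        counts.append(-(-n_samples // nt))  # ceil division
    return counts
```

If files are concatenated in order, the candidate midpoint at the first file boundary would then simply be the first entry of the returned list.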
yeah.
Some other groups not using Kilosort seem to have some ideas, but I've not
heard anything within Kilosort yet. It'd be nice if it were possible, but
we may need to migrate in the end
Brendon Watson, MD-PhD
Assistant Professor in Psychiatry
Biomedical Sciences Research Building, Room 5059
University of Michigan
109 Zina Pitcher Place
Ann Arbor, MI 48109-5720
Lab Website: http://watsonneurolab.org
Clinical phone: 734-764-0231
|
Hello there, Just wanted to update you on this. I have 26 days (2 or 3 hours a day) of recording over 2 months with a 32-channel electrode. I've merged them into a big binary file (270 GB). Then I've fed it into KiloSort 2, and it worked with a little bit of playing around with the batch size. :) I'm using an NVIDIA Quadro P4000, which has 8 GB of memory. PS. I've commented out the splitting part in the code! |
Oh thanks that's amazing!
I admit it's been a while since I have thought deeply about KS. Are you
saying you analyzed each chunk/batch separately? Or were all 26 days done
in a way that the same spikes could be classified across all of them (if
the same spikes were recorded)?... if this latter one, how did you get them
to be considered "one recording" by KS2?
Thanks so much for this update
|
Hi Brendon, I think a convenient way to stitch recordings is using spikeinterface with something like this (hasty copypaste for inspiration):
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
from pathlib import Path

binpaths = ...
output_dir = Path(...)
sorter = 'kilosort2_5'
params = {}  # sorter-specific parameters
clean_dat_file = True

rec_extractors = [
    se.SpikeGLXRecordingExtractor(binpath)
    for binpath in binpaths
]
if len(rec_extractors) == 1:
    recording = rec_extractors[0]
else:
    # Concatenate the recordings in time
    recording = se.MultiRecordingTimeExtractor(rec_extractors)

# Must add channel locations
recording.set_channel_locations(...)

ss.run_sorter(
    sorter,
    recording,
    output_folder=output_dir,
    verbose=True,
    **params,
)

# Delete the useless `recording.dat` file copied by spikeinterface to the KS output dir
rec_path = output_dir / 'recording.dat'
if clean_dat_file and rec_path.exists():
    rec_path.unlink()
How to distinguish units that were "lost" by Kilosort because of drift and should be merged from those that were lost for physiological/mechanical reasons doesn't seem like an easy-to-settle question, and I'd be curious to hear thoughts on this. |
Thanks so much!
I guess my question with this method is whether the template matching is
happening in a common computational space? Or is each chunk separately
getting "clustered" (template finding) and so has different spikes?
|
With this approach Kilosort treats the concatenated data as a single dataset (and is ignorant of where the concatenation happens). This is likely not ideal (notably for the drift correction) |
OK, so if I read this correctly:
- The data is split in chunks, the spike sorting is done separately for
each chunk
- The sorted spikes are not in any way "matched" after the joining; they
are just passively put together.
- So we can talk about population stats like "total firing rate" or EI
ratio, etc, but not single-cell properties over the entire span.
Right?
|
No no, sorry, I was unclear: the data is concatenated before being passed to Kilosort. Kilosort runs on the concatenated data as if it were a continuous dataset. |
Oh!
Great
When we have done similar, we run into RAM issues, but probably because we
use single linear 64-site shanks that can't be broken into sub-groups of
channels like tetrodes.
Or has KS found a way to remove the memory limitations?
Thanks again so much
|
Hi @brendonw1 In the way that I am doing it, I have one single concatenated data file, so for Kilosort it is a continuous dataset, as @TomBugnon just said. I also have a single-shank electrode and didn't break the data into sub-groups. But the batch size (ops.NT) I mentioned in my first comment is a parameter in the Kilosort config file to handle RAM problems; you can increase or decrease it to get around them. With concatenated data from different days, you can track the same neurons over those days (if the drift is not too large, it is easier for Kilosort to keep track, but even if the drift is moderate you can still merge some clusters in the manual sorting stage using Phy). best, Alireza |
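For anyone building the single concatenated file mentioned above: a minimal Python sketch of byte-level concatenation, under the assumption that all sessions share the same channel count, dtype, and channel order (otherwise the result is not a valid continuous dataset for Kilosort). `concatenate_dat_files` is an illustrative helper, not part of Kilosort or spikeinterface:

```python
import shutil

def concatenate_dat_files(binpaths, out_path, chunk_bytes=64 * 1024 * 1024):
    """Append raw binary recordings back-to-back into one .dat file.

    Copies in fixed-size chunks so multi-hundred-GB inputs never need
    to fit in RAM. All inputs must share channel count, dtype, and
    channel order for the output to be a valid continuous recording.
    """
    with open(out_path, "wb") as out:
        for path in binpaths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, length=chunk_bytes)
```

It is also worth recording the per-file sample counts at this point, since they are what let you assign sorted spikes back to their original sessions afterwards.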
Thank you guys very much!
|
Hi, I want to know the criteria for making sure a recorded single unit belongs to the same neuron throughout the recording (>4 hours); I believe there will be some drift during the recording. It would be great if there is a reference paper to follow. Thanks very much! |
Hey @alirezasaeedi1988, Could you please elaborate a little bit on this? I have 3 recordings from my animals without moving the probe anywhere, each taking about 35 min, so after I combine the 3 sessions I still have a .dat file under 200 GB. KS2 should be able to handle this, but it breaks after "main optimization" of the spike sorting. (Briefly, I am losing it after main optimization, during re-ordering of the batches, with a variety of error messages, e.g.: Error running kilosort! EIG did not converge at index=something, or Error running kilosort! Unrecognized field name "momentum".) I would like to try commenting out the splitting part or playing around with the batch size; maybe that would work for me as well, but I could not find out how to do this. What did you do exactly to work with these concatenated files? Btw, I used Python np.memmap() to merge my files; they actually look really nice in the KS2 GUI, so the file should not be corrupted. Thanks a lot! |
Thanks very much
We are working on alternative ideas too. Like sub-sampling
Hope to get something done soon
I appreciate the follow up
|
Hi @StringXD While Kilosort identifies and clusters waveforms across different days, that doesn't necessarily confirm that the tracking always reflects reality (I got a similar comment on my paper). Several prior studies have made similar assertions, including Tolias et al. (J Neurophysiol 2007), McMahon et al. (PNAS 2014), Okun et al. (PLoS ONE 2016), Schoonover et al. (Nature 2021), and Steinmetz et al. (Science 2021). These studies made considerable effort to confirm that the same neurons were being tracked, using various methods. In summary, after spike sorting you can extract spikes from different time windows (which can vary depending on the experiment) and compare the waveform properties of each cluster with all the clusters (including itself) in the next time window, by defining some similarity indices. If a neuron has been tracked correctly, it should best match itself in successive time windows. |
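The matching step described above can be sketched as follows: a toy similarity index (cosine similarity between mean waveforms) and a best-match lookup between consecutive time windows. The studies cited use richer criteria (amplitude, channel position, autocorrelograms), so treat this only as the skeleton of the idea; both function names are hypothetical:

```python
import numpy as np

def waveform_similarity(wf_a, wf_b):
    """Cosine similarity between two mean waveforms (arrays of shape
    n_channels x n_samples), flattened; 1.0 means identical shape."""
    a, b = wf_a.ravel(), wf_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_matches(wfs_win1, wfs_win2):
    """For each cluster's mean waveform in window 1, return the index of
    the most similar cluster in window 2. A correctly tracked unit
    should best match itself across successive windows."""
    sim = np.array([[waveform_similarity(a, b) for b in wfs_win2]
                    for a in wfs_win1])
    return sim.argmax(axis=1)
```

In practice one would also set a minimum similarity threshold, since a unit genuinely lost between windows should yield no acceptable match rather than a spurious one.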
Hi @papannon I am not familiar with np.memmap(), but if the concatenation is properly done, the rest should not be very challenging (considering the size of your .dat file). |
We would like to do multi-day recordings, but these recordings mean that we exceed available memory. I am wondering if someone can explain how we might run KS on such long recordings to still extract spikes.
In 2019, in issue 135 (#135), we raised this. However, now I see in run_Templates.m that there is a new approach taken:
% update: these recommendations no longer apply for the datashift version!
Can someone explain whether and how we can run spike detection on a long-duration recording (previously the idea was to detect on a subset of the data and then find those spikes in the full data)?
Thanks so much
Brendon Watson