Make a how to loading data into a Sorting manually#2944
Make a how to loading data into a Sorting manually#2944samuelgarcia merged 11 commits intoSpikeInterface:mainfrom
Conversation
|
Fixes #2912 |
|
Can you add this to the how to index to see how it looks and review it? |
|
Yes of course! |
| ^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| Finally since SpikeInterface is tightly integrated with the Neo project you can create | ||
| a sorting from :code:`Neo.SpikeTrain` objects. See Neo documentation for more information on |
There was a problem hiding this comment.
I forgot how to do the auto-link so if someone can just remind me in review we should link to the Neo docs.
There was a problem hiding this comment.
Works like:
Please read the :doc:`Neo documentation<neo:index>`.
(currently we've got neo and probeinterface registered to be used like this)
h-mayorquin
left a comment
There was a problem hiding this comment.
This is great. I am wondering if we could enlist @GaelleChapuis to give this a reading as she was the one who asked about this recently.
New eyes are very helpful for didactic things.
| load_matlab_data | ||
| combine_recordings | ||
| process_by_channel_group | ||
| make_a_sorting |
There was a problem hiding this comment.
As a title of the section "make a sorting" sounds to me like how to make a new sorting extractor programatically.
maybe "load your own sorting data to spikeinterface" or something else?
There was a problem hiding this comment.
Do you want that for the toctree? The actual title isn't influenced by the file name at all. Are you worried we won't know which one is which in the future? I could definitely change but didn't want to be over verbose just for making the index.
There was a problem hiding this comment.
Yes. My intention is that people know what the how to is about from the title alone.
If you are concerned about too verbose maybe something in between like "load your sorting data" or similar?
There was a problem hiding this comment.
Yes for me, I agree to change the title itself as I was not 100% sure what Make your own Sorting referred to. For the toctree / .rst filename make_a_sorting I am okay with an abbreviated version of the title.
| are typically stored in samples/frames rather than in seconds. So you should input the times | ||
| in samples/frames. The sampling_frequency allows for easily switching between samples and seconds. | ||
|
|
||
| There are 3 options (along with making a NumpySorting from another sorting which will not be covered here): |
There was a problem hiding this comment.
I don't think the detail in the parenthesis is needed in a how to. I feel we don't need to be thoroug.
| With lists of spike trains and spike labels | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| In this case we need a list or array (or lists of lists for multisegment) of spike times, |
There was a problem hiding this comment.
| In this case we need a list or array (or lists of lists for multisegment) of spike times, | |
| In this case we need a list or array (or lists of lists for multisegment) of spike frames, |
I think that times_list is a bad argument name as it is prone to confusion.
There was a problem hiding this comment.
I agree. I would think we should say
spike times (in frames) or spike times in frames instead. Saying spike times is common in the field to mean in frames or in seconds so I think it would be more standard in the field to say spike times and then specify the units. What do you think?
There was a problem hiding this comment.
I think that you are in a better position to make that call than me. I trust you in this one.
| With a unit dictionary | ||
| ^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| We can also use a dictionary where each unit is a key and its spike times are values. |
There was a problem hiding this comment.
| We can also use a dictionary where each unit is a key and its spike times are values. | |
| We can also use a dictionary where each unit is a key and a list of spike frames are passed as values. |
In the same spirit of time vs frames distinction above.
There was a problem hiding this comment.
See comment above and then we can decide!
Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
|
@GaelleChapuis would be more than welcome to comment as well! Note to self: add the :doc: sphinx before merge. |
| the spike times (i.e. when the neurons were actually firing) the unit labels (i.e. | ||
| who the spikes belong to. Also called cluster ids by some sorters), the unit ids (the unique | ||
| set of unit labels) and the sampling_frequency. To make your own :code:`Sorting` object you can | ||
| use :code:`NumpySorting`. It is important to note that in SpikeiInterface spike trains are handled internally in samples/frames rather than in seconds and we use the sampling frequency to ... |
There was a problem hiding this comment.
Unfinished sentence. Think you need to delete "are handled internally in samples/frames rather than in seconds and we use the sampling frequency to..." and then it connects to the next sentence.
| .. code-block:: python | ||
|
|
||
| from spikeinterface.core import NumpySorting | ||
|
|
There was a problem hiding this comment.
Black formatting would enhance the beauty.
There was a problem hiding this comment.
black doesn't run on rst unless you have a way. So I tried to mimic it....
|
|
||
| my_sorting = NumpySorting.from_unit_dict(units_dict_list={'0': [1,3], | ||
| '1': [2,4] | ||
| }, |
There was a problem hiding this comment.
Black formatting would enhance the beauty
There was a problem hiding this comment.
here also we should show an example with multi segment in mind.
|
|
||
| # neo_spiketrain is a Neo spiketrain object | ||
| my_sorting = NumpySorting.from_neo_spiketrain_list(neo_spiketrain, | ||
| sampling_frequency=30_000.0) |
There was a problem hiding this comment.
I'd add a line, just to wrap up the guide. Something like:
"
Now that you've created a Sorting object, you can combine it with a recording to make a :ref:Sorting Analyzer <sphx_glr_tutorials_core_plot_4_sorting_analyzer.py>, or start visualising by using the :py:func:~spikeinterface.widgets.plot_crosscorrelograms function.
"
But up to you!
|
@chrishalcrow thanks for all the feedback. I might get to it today, but otherwise I'm in the recording studio all week so hopefully I can incorporate all your fixes on Friday! |
| my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | ||
| labels_list = [0,1,0,1], | ||
| sampling_frequency = 30_000.0 | ||
| ) |
There was a problem hiding this comment.
I think we should use np.array for spike trains.
And more importantly we should have the multi segment cases directly here.
In short
| my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | |
| labels_list = [0,1,0,1], | |
| sampling_frequency = 30_000.0 | |
| ) | |
| my_sorting = NumpySorting.from_times_labels(times_list = [np.array( [1,2,3,4])], | |
| labels_list = [np.array([0,1,0,1])], | |
| sampling_frequency = 30_000.0 | |
| ) |
with a celar exlapnation of the multi segment story.
No ?
There was a problem hiding this comment.
For np.arrays I agree. That's fine.
I was purposely avoiding the multisegment because people said that is confusing for people working with mono segments. But I can add. Maybe one example lower down? That way we have an easy example and people can shut off their brains if they don't need the multisegment?
There was a problem hiding this comment.
OK for the mono segment without list but this must be explicit in the paragraph somewhere.
There was a problem hiding this comment.
Yep will do. My plan is to incorporate all changes on Friday when I can sit and do it carefully :)
There was a problem hiding this comment.
I also believe that in general is better to explain things in the mono segment and then we can mention the multi segment after. Is a complexity that is probably not necessary for most cases and concepts.
JoeZiminski
left a comment
There was a problem hiding this comment.
Hey @zm711 this is awesome! I did not know about this functionality, its really nice and this is a very useful explaination. I've added some suggestions!
| Make your own Sorting | ||
| ===================== | ||
|
|
||
| Why make a :code:`Sorting`? |
There was a problem hiding this comment.
Could the Sorting link to the API for sorting? I think this is possible in RTD hopefully as so. Maybe it is overkill for every reference, but maybe the first?
There was a problem hiding this comment.
That being said, this is not done elsewhere in the docs so maybe an issue / PR for another time.
There was a problem hiding this comment.
I'm bad at that for RTD. So if you have an idea definitely suggest. Linking is the bane of my existence, but I like it!
|
|
||
| Why make a :code:`Sorting`? | ||
|
|
||
| The :code:`Sorting` object is one of the core objects within the SpikeInterface library |
There was a problem hiding this comment.
This section is really nice, at the moment it leads with references to the Sorting object, my suggestion would be to start by motivating it from the perspective of the user and their problem. I think the Sorting object is important but it is more a means-to-an-end that interesting in its own right. e.g. (just an example of motivation / order rather than specific content):
SpikeInterface contains pre-build readers for the output of many common sorters. However, what if you have sorting output that is not in a standard format (e.g. old csv file)? If this is the case you can make your own Sorting object to load your data into SpikeInterface. This means you can still easily apply various downstream analyses to your results (e.g. building correlograms or for generating a SortingAnalyzer).
The Sorting object is a core object within SpikeInterface that acts as a convenient way to interface with sorting results, no matter which sorter was used to generate them. At a fundamental level it is a series of spike times and a series of labels for each spike along with some associated metadata. Below, we will show you have to take your existing data and load it as a SpikeInterface Sorting object.
There was a problem hiding this comment.
Maybe something in bold to state how easy this is (based on the really nice examples below, amazing you just need times, labels)!
All you need to load your own sorting output into spike interface is a list of spike times and associated unit IDs.
| Making a :code:`Sorting` | ||
| ------------------------ | ||
|
|
||
| For most formats the :code:`Sorting` is automatically generated. For example one could do |
There was a problem hiding this comment.
'formats' -> 'sorting output formats'?
|
|
||
| # For kilosort/phy files we can use either reader | ||
| ks_sorting = read_kilosort('path/to/folder') | ||
| phy_sorting = read_phy('path/to/folder') |
There was a problem hiding this comment.
maybe just ks_sorting example to keep the example focused? Maybe For example, if one had run sorting using Kilosort, you would load the sorting results into SpikeInterface with:
| phy_sorting = read_phy('path/to/folder') | ||
|
|
||
| This :code:`Sorting` contains important information about your spike trains including | ||
| the spike times (i.e. when the neurons were actually firing) the unit labels (i.e. |
There was a problem hiding this comment.
Maybe list these as bullet points?
There was a problem hiding this comment.
I would fold the discussion of frames into the definition of spike times. e.g.
The `Sorting object contains important information about your spike trains. You will need to provide the:
- spike times... Note these must be specified in samples. They will be converted to times under the hood using the provided sampling_frequency
- Unit its ...
- samplign frequency ...
| are typically stored in samples/frames rather than in seconds. So you should input the times | ||
| in samples/frames. The sampling_frequency allows for easily switching between samples and seconds. | ||
|
|
||
| There are 3 options (along with making a NumpySorting from another sorting which will not be covered here): |
There was a problem hiding this comment.
Could be slightly more specifc here e..g. 'There are three options for how you format the spike times, unit labels and sampling frequency that are passed to SpikeInterface. This can be as a list, a dictionary, or with Neo SpikeTrains. Below we will look at an example of each in turn."
Also, I wonder if there is another way to refer to 'spike times' because it is confusing as they are specified in samples. 'Spike sample indicies?' I'm not sure, spike times rolls of the toungue but it really implies these should be formatted in time units.
There was a problem hiding this comment.
This was the discussion Heberto and I had above. I would argue the field is used to the spike times term so I think we say your spike times in samples to be clearer.
| from spikeinterface.core import NumpySorting | ||
|
|
||
| # in this case we are making a monosegment sorting | ||
| my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], |
There was a problem hiding this comment.
Some more realistic times might make this clearer. Maybe above, you could have like "Say you had only four spikes in your dataset, at samples 1000, 12000, 15000, 22000 from two different units. With a sampling_frequency of XXX, the actual spike times when converted under the hood in spike interface would be XXXX."
Then, each below example can re-use the previously exaplined example?
There was a problem hiding this comment.
That's fair. I was being lazy :)
| # in this case we are making a monosegment sorting | ||
| my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4], | ||
| labels_list = [0,1,0,1], | ||
| sampling_frequency = 30_000.0 |
There was a problem hiding this comment.
is this underscore syntax correct?
There was a problem hiding this comment.
Yes. It is. You can put underscores into any number for spacing :)
|
I think I got most comments, but another review would be great. |
h-mayorquin
left a comment
There was a problem hiding this comment.
This is OK to me. Thanks a lot @zm711 . I am not approving only because you still have as draft.
| Load Your Own Data into a Sorting | ||
| ================================= | ||
|
|
||
| Why make a :code:`Sorting`? |
There was a problem hiding this comment.
As usual, your why sections are great!
There was a problem hiding this comment.
@JoeZiminski edited it so hats off for his assist.
|
|
||
| * spike times: the peaks of the extracellular potentials expressed in samples/frames these can | ||
| be converted to seconds under the hood using the sampling_frequency | ||
| * spike labels: the neuron id for each spike, can also be called cluster ids or unit ids |
There was a problem hiding this comment.
Comment to myself. I never thought about them this way. I always thought that the units have a label and a spike train but I think I had the "spike dictionary" representation too prominent in my mind. This makes more sense in the context of the spike vector.
Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
|
@chrishalcrow, we await your read! |
|
Hey @zm711 looks great!! 🚀 |
chrishalcrow
left a comment
There was a problem hiding this comment.
Amazing! Got some beauty requests and a link to update, but otherwise it's great.
|
@h-mayorquin just in case you didn't see this the profile imports on Mac took longer than 1.65 seconds. I re-reran and it went away. So this test has some randomness--I would assume based on the runner that picks it up, but I don't know. |
|
Thanks @zm711 . I was probably too ambitious with the test. I will set it higher : ) |
This is just a first draft. I expect a good cleanup!
@JoeZiminski, feel free to comment as well.