Make a how to loading data into a Sorting manually by zm711 · Pull Request #2944 · SpikeInterface/spikeinterface

zm711 · 2024-05-31T08:38:37Z

This is just a first draft. I expect a good cleanup!

@JoeZiminski, feel free to comment as well.

zm711 · 2024-05-31T08:48:58Z

Fixes #2912

h-mayorquin · 2024-05-31T12:28:16Z

Can you add this to the how to index to see how it looks and review it?

zm711 · 2024-05-31T12:30:53Z

Yes of course!

zm711 · 2024-05-31T12:58:38Z

+^^^^^^^^^^^^^^^^^^^^
+
+Finally since SpikeInterface is tightly integrated with the Neo project you can create
+a sorting from :code:`Neo.SpikeTrain` objects. See Neo documentation for more information on


I forgot how to do the auto-link so if someone can just remind me in review we should link to the Neo docs.

Works like:

Please read the :doc:`Neo documentation<neo:index>`.

(currently we've got neo and probeinterface registered to be used like this)
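
For context, a cross-project link like the one above relies on Sphinx's intersphinx extension. The fragment below is a minimal sketch of the `conf.py` entries that would make the `neo:` prefix resolve. The URL and the exact mapping are assumptions for illustration, not copied from SpikeInterface's configuration.

```python
# docs/conf.py (sketch): hypothetical fragment, not SpikeInterface's actual config
extensions = [
    "sphinx.ext.intersphinx",  # enables cross-project references
]

# Key = prefix used in roles such as :doc:`Neo documentation<neo:index>`;
# value = (base URL of the target docs, None to fetch the default objects.inv).
intersphinx_mapping = {
    "neo": ("https://neo.readthedocs.io/en/latest/", None),  # assumed URL
}
```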

h-mayorquin

This is great. I am wondering if we could enlist @GaelleChapuis to give this a reading as she was the one who asked about this recently.

New eyes are very helpful for didactic things.

h-mayorquin · 2024-06-01T10:04:31Z

    load_matlab_data
    combine_recordings
    process_by_channel_group
+    make_a_sorting


As a title of the section "make a sorting" sounds to me like how to make a new sorting extractor programmatically.

maybe "load your own sorting data to spikeinterface" or something else?

Do you want that for the toctree? The actual title isn't influenced by the file name at all. Are you worried we won't know which one is which in the future? I could definitely change but didn't want to be over verbose just for making the index.

Yes. My intention is that people know what the how to is about from the title alone.

If you are concerned about too verbose maybe something in between like "load your sorting data" or similar?

Yes for me, I agree to change the title itself as I was not 100% sure what Make your own Sorting referred to. For the toctree / .rst filename make_a_sorting I am okay with an abbreviated version of the title.

h-mayorquin · 2024-06-01T10:11:12Z

+are typically stored in samples/frames rather than in seconds. So you should input the times
+in samples/frames. The sampling_frequency allows for easily switching between samples and seconds.
+
+There are 3 options (along with making a NumpySorting from another sorting which will not be covered here):


I don't think the detail in the parentheses is needed in a how to. I feel we don't need to be thorough.

h-mayorquin · 2024-06-01T10:18:22Z

+With lists of spike trains and spike labels
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this case we need a list or array (or lists of lists for multisegment) of spike times,


Suggested change

-In this case we need a list or array (or lists of lists for multisegment) of spike times,
+In this case we need a list or array (or lists of lists for multisegment) of spike frames,

I think that times_list is a bad argument name as it is prone to confusion.

I agree. I think we should say "spike times (in frames)" or "spike times in frames" instead. Saying spike times is common in the field to mean in frames or in seconds, so I think it would be more standard to say spike times and then specify the units. What do you think?

I think that you are in a better position to make that call than me. I trust you in this one.
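
To make the units point concrete: if spike times are stored in seconds, they would need converting to sample indices before being handed to SpikeInterface. Below is a minimal sketch in plain Python. `seconds_to_frames` is a hypothetical helper, not part of SpikeInterface, and the values are made up.

```python
def seconds_to_frames(spike_times_s, sampling_frequency):
    """Convert spike times in seconds to integer sample (frame) indices."""
    # round() guards against floating-point error before casting to int
    return [int(round(t * sampling_frequency)) for t in spike_times_s]


sampling_frequency = 30_000.0  # Hz, hypothetical
spike_times_s = [0.1, 0.4, 0.5]  # seconds, hypothetical

spike_frames = seconds_to_frames(spike_times_s, sampling_frequency)
print(spike_frames)  # [3000, 12000, 15000]
```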

h-mayorquin · 2024-06-01T10:19:19Z

+With a unit dictionary
+^^^^^^^^^^^^^^^^^^^^^^
+
+We can also use a dictionary where each unit is a key and its spike times are values.


Suggested change

-We can also use a dictionary where each unit is a key and its spike times are values.
+We can also use a dictionary where each unit is a key and a list of spike frames are passed as values.

In the same spirit of time vs frames distinction above.

See comment above and then we can decide!
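
The list form and the dictionary form discussed in these two threads carry the same information. Below is a minimal sketch, in plain Python, of regrouping a flat list of spike times (in frames) and per-spike labels into a unit-keyed dictionary. `to_unit_dict` is a hypothetical helper and the values are illustrative.

```python
from collections import defaultdict


def to_unit_dict(spike_frames, spike_labels):
    """Group per-spike frames by their unit label."""
    unit_dict = defaultdict(list)
    for frame, label in zip(spike_frames, spike_labels):
        unit_dict[label].append(frame)
    return dict(unit_dict)


spike_frames = [1, 2, 3, 4]
spike_labels = ["0", "1", "0", "1"]

unit_dict = to_unit_dict(spike_frames, spike_labels)
print(unit_dict)  # {'0': [1, 3], '1': [2, 4]}
```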

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>

zm711 · 2024-06-01T19:14:58Z

@GaelleChapuis would be more than welcome to comment as well!

Note to self: add the :doc: sphinx before merge.

chrishalcrow

Beautiful! Thanks Zach!

chrishalcrow · 2024-06-03T08:52:19Z

+the spike times (i.e. when the neurons were actually firing) the unit labels (i.e.
+who the spikes belong to. Also called cluster ids by some sorters), the unit ids (the unique
+set of unit labels) and the sampling_frequency. To make your own :code:`Sorting` object you can
+use :code:`NumpySorting`. It is important to note that in SpikeInterface spike trains are handled internally in samples/frames rather than in seconds and we use the sampling frequency to ...


Unfinished sentence. Think you need to delete "are handled internally in samples/frames rather than in seconds and we use the sampling frequency to..." and then it connects to the next sentence.

chrishalcrow · 2024-06-03T08:52:59Z

+.. code-block:: python
+
+    from spikeinterface.core import NumpySorting
+


Black formatting would enhance the beauty.

black doesn't run on rst unless you have a way. So I tried to mimic it....

chrishalcrow · 2024-06-03T08:53:13Z

+
+    my_sorting = NumpySorting.from_unit_dict(units_dict_list={'0': [1,3],
+                                                              '1': [2,4]
+                                                              },


Black formatting would enhance the beauty

here also we should show an example with multi segment in mind.
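
A multi-segment version of this example might look like the sketch below. It assumes that `from_unit_dict` accepts a list with one dictionary per segment, which the `units_dict_list` argument name suggests but this thread does not confirm. The data are built in plain Python so the layout is visible, and the frames are illustrative.

```python
# One dictionary per segment; each maps unit id -> spike frames in that segment.
# Frames are assumed to be counted from the start of their own segment.
segment_0 = {"0": [1, 3], "1": [2, 4]}
segment_1 = {"0": [5, 9], "1": [7]}
units_dict_list = [segment_0, segment_1]

num_segments = len(units_dict_list)
unit_ids = sorted({unit for segment in units_dict_list for unit in segment})
print(num_segments, unit_ids)  # 2 ['0', '1']

# With SpikeInterface installed, the list would then be passed on (not run here):
# my_sorting = NumpySorting.from_unit_dict(units_dict_list, sampling_frequency=30_000.0)
```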

chrishalcrow · 2024-06-03T09:43:16Z

+
+    # neo_spiketrain is a Neo spiketrain object
+    my_sorting = NumpySorting.from_neo_spiketrain_list(neo_spiketrain,
+                                                       sampling_frequency=30_000.0)


I'd add a line, just to wrap up the guide. Something like:

"
Now that you've created a Sorting object, you can combine it with a recording to make a :ref:`Sorting Analyzer <sphx_glr_tutorials_core_plot_4_sorting_analyzer.py>`, or start visualising by using the :py:func:`~spikeinterface.widgets.plot_crosscorrelograms` function.
"

But up to you!

zm711 · 2024-06-03T11:29:03Z

@chrishalcrow thanks for all the feedback. I might get to it today, but otherwise I'm in the recording studio all week so hopefully I can incorporate all your fixes on Friday!

samuelgarcia · 2024-06-05T09:22:38Z

+    my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
+                                                labels_list = [0,1,0,1],
+                                                sampling_frequency = 30_000.0
+                                                )


I think we should use np.array for spike trains.
And more importantly we should have the multi segment cases directly here.

In short

Suggested change

-    my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
-                                                labels_list = [0,1,0,1],
-                                                sampling_frequency = 30_000.0
-                                                )
+    my_sorting = NumpySorting.from_times_labels(times_list = [np.array([1,2,3,4])],
+                                                labels_list = [np.array([0,1,0,1])],
+                                                sampling_frequency = 30_000.0
+                                                )

with a clear explanation of the multi-segment story.
No?

For np.arrays I agree. That's fine.

I was purposely avoiding the multisegment because people said that is confusing for people working with mono segments. But I can add. Maybe one example lower down? That way we have an easy example and people can shut off their brains if they don't need the multisegment?

OK for the mono segment without list but this must be explicit in the paragraph somewhere.

Yep will do. My plan is to incorporate all changes on Friday when I can sit and do it carefully :)

I also believe that in general it is better to explain things in the mono-segment case first and then mention the multi-segment case after. It is a complexity that is probably not necessary for most cases and concepts.
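
The mono-segment versus multi-segment layouts debated in this thread can be sketched in plain Python as follows. `as_segment_list` is a hypothetical helper, not a SpikeInterface function. It only shows that the mono-segment inputs are the multi-segment inputs with the outer list dropped.

```python
def as_segment_list(times, labels):
    """Wrap mono-segment inputs so both cases share the list-per-segment layout."""
    is_multi_segment = len(times) > 0 and isinstance(times[0], (list, tuple))
    if not is_multi_segment:
        times, labels = [times], [labels]
    if len(times) != len(labels):
        raise ValueError("times and labels must have the same number of segments")
    for seg_times, seg_labels in zip(times, labels):
        if len(seg_times) != len(seg_labels):
            raise ValueError("each spike needs exactly one label")
    return times, labels


# Mono-segment: one flat list of frames and one flat list of labels.
mono = as_segment_list([1, 2, 3, 4], [0, 1, 0, 1])
# Multi-segment: one inner list per segment.
multi = as_segment_list([[1, 2, 3, 4], [2, 6]], [[0, 1, 0, 1], [1, 0]])

print(len(mono[0]), len(multi[0]))  # 1 2
```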

JoeZiminski

Hey @zm711 this is awesome! I did not know about this functionality, it's really nice and this is a very useful explanation. I've added some suggestions!

JoeZiminski · 2024-06-05T15:05:34Z

+Make your own Sorting
+=====================
+
+Why make a :code:`Sorting`?


Could the Sorting link to the API for sorting? I think this is possible in RTD hopefully as so. Maybe it is overkill for every reference, but maybe the first?

That being said, this is not done elsewhere in the docs so maybe an issue / PR for another time.

I'm bad at that for RTD. So if you have an idea definitely suggest. Linking is the bane of my existence, but I like it!

JoeZiminski · 2024-06-05T15:14:31Z

+
+Why make a :code:`Sorting`?
+
+The :code:`Sorting` object is one of the core objects within the SpikeInterface library


This section is really nice. At the moment it leads with references to the Sorting object; my suggestion would be to start by motivating it from the perspective of the user and their problem. I think the Sorting object is important but it is more a means-to-an-end than interesting in its own right, e.g. (just an example of motivation / order rather than specific content):

SpikeInterface contains pre-built readers for the output of many common sorters. However, what if you have sorting output that is not in a standard format (e.g. an old CSV file)? If this is the case you can make your own Sorting object to load your data into SpikeInterface. This means you can still easily apply various downstream analyses to your results (e.g. building correlograms or generating a SortingAnalyzer).

The Sorting object is a core object within SpikeInterface that acts as a convenient way to interface with sorting results, no matter which sorter was used to generate them. At a fundamental level it is a series of spike times and a series of labels for each spike along with some associated metadata. Below, we will show you how to take your existing data and load it as a SpikeInterface Sorting object.

Maybe something in bold to state how easy this is (based on the really nice examples below, amazing you just need times, labels)!

All you need to load your own sorting output into SpikeInterface is a list of spike times and associated unit IDs.
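
As an illustration of the "old csv file" scenario, here is a minimal sketch that pulls spike times (in samples) and unit labels out of CSV text using only the standard library. The column names `spike_frame` and `unit_id` and the data are hypothetical. The two resulting lists are the kind of input the how-to passes to `NumpySorting`.

```python
import csv
import io

# Stand-in for an old sorting output file; column names are hypothetical.
csv_text = """spike_frame,unit_id
1000,0
12000,1
15000,0
22000,1
"""

spike_frames, spike_labels = [], []
for row in csv.DictReader(io.StringIO(csv_text)):
    spike_frames.append(int(row["spike_frame"]))
    spike_labels.append(int(row["unit_id"]))

print(spike_frames)  # [1000, 12000, 15000, 22000]
print(spike_labels)  # [0, 1, 0, 1]
```

For a file on disk, `io.StringIO(csv_text)` would be replaced by a handle from `open(path, newline="")`.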

JoeZiminski · 2024-06-05T15:20:26Z

+Making a :code:`Sorting`
+------------------------
+
+For most formats the :code:`Sorting` is automatically generated. For example one could do


'formats' -> 'sorting output formats'?

JoeZiminski · 2024-06-05T15:21:05Z

+
+    # For kilosort/phy files we can use either reader
+    ks_sorting = read_kilosort('path/to/folder')
+    phy_sorting = read_phy('path/to/folder')


Maybe just the ks_sorting example, to keep the example focused? Maybe: "For example, if one had run sorting using Kilosort, you would load the sorting results into SpikeInterface with:"

Fair. I prefer focused.

JoeZiminski · 2024-06-05T15:21:53Z

+    phy_sorting = read_phy('path/to/folder')
+
+This :code:`Sorting` contains important information about your spike trains including
+the spike times (i.e. when the neurons were actually firing) the unit labels (i.e.


Maybe list these as bullet points?

I would fold the discussion of frames into the definition of spike times. e.g.

The `Sorting` object contains important information about your spike trains. You will need to provide the:

  * spike times ... Note these must be specified in samples. They will be converted to times under the hood using the provided sampling_frequency
  * unit ids ...
  * sampling frequency ...

JoeZiminski · 2024-06-05T15:30:08Z

+are typically stored in samples/frames rather than in seconds. So you should input the times
+in samples/frames. The sampling_frequency allows for easily switching between samples and seconds.
+
+There are 3 options (along with making a NumpySorting from another sorting which will not be covered here):


Could be slightly more specific here, e.g. "There are three options for how you format the spike times, unit labels and sampling frequency that are passed to SpikeInterface. This can be as a list, a dictionary, or with Neo SpikeTrains. Below we will look at an example of each in turn."

Also, I wonder if there is another way to refer to "spike times" because it is confusing as they are specified in samples. "Spike sample indices"? I'm not sure; "spike times" rolls off the tongue but it really implies these should be formatted in time units.

This was the discussion Heberto and I had above. I would argue the field is used to the spike times term so I think we say your spike times in samples to be clearer.

JoeZiminski · 2024-06-05T15:32:22Z

+    from spikeinterface.core import NumpySorting
+
+    # in this case we are making a monosegment sorting
+    my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],


Some more realistic times might make this clearer. Maybe above, you could have like "Say you had only four spikes in your dataset, at samples 1000, 12000, 15000, 22000 from two different units. With a sampling_frequency of XXX, the actual spike times when converted under the hood in spike interface would be XXXX."

Then, each example below can re-use the previously explained example?

That's fair. I was being lazy :)
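
The worked example suggested above can be sketched as plain arithmetic. The comment leaves the sampling frequency as a placeholder, so 30 kHz is assumed here because it is the value used in the draft's code. The conversion is simply samples divided by sampling frequency.

```python
sampling_frequency = 30_000.0  # Hz, assumed (the reviewer left this unspecified)
spike_samples = [1000, 12000, 15000, 22000]  # sample indices from the comment

# seconds = samples / sampling_frequency
spike_times_s = [sample / sampling_frequency for sample in spike_samples]
print([round(t, 4) for t in spike_times_s])  # [0.0333, 0.4, 0.5, 0.7333]
```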

JoeZiminski · 2024-06-05T15:32:32Z

+    # in this case we are making a monosegment sorting
+    my_sorting = NumpySorting.from_times_labels(times_list = [1,2,3,4],
+                                                labels_list = [0,1,0,1],
+                                                sampling_frequency = 30_000.0


is this underscore syntax correct?

Yes. It is. You can put underscores into any number for spacing :)

🤯 cool!!

It is very nice, man!
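
For readers who have not met the underscore syntax discussed in this thread: Python (3.6 and later) ignores single underscores between digits in numeric literals, so they act purely as visual separators. A quick check:

```python
sampling_frequency = 30_000.0

# Underscores between digits do not change the value of the literal.
print(sampling_frequency == 30000.0)  # True
print(1_000_000 == 1000000)  # True
```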

zm711 · 2024-06-06T12:34:34Z

I think I got most comments, but another review would be great.

h-mayorquin

This is OK to me. Thanks a lot @zm711. I am not approving only because you still have it as a draft.

h-mayorquin · 2024-06-06T15:05:37Z

+Load Your Own Data into a Sorting
+=================================
+
+Why make a :code:`Sorting`?


As usual, your why sections are great!

@JoeZiminski edited it so hats off for his assist.

h-mayorquin · 2024-06-06T15:08:12Z

+
+  * spike times: the peaks of the extracellular potentials expressed in samples/frames these can
+    be converted to seconds under the hood using the sampling_frequency
+  * spike labels: the neuron id for each spike, can also be called cluster ids or unit ids


Comment to myself. I never thought about them this way. I always thought that the units have a label and a spike train but I think I had the "spike dictionary" representation too prominent in my mind. This makes more sense in the context of the spike vector.

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>

zm711 · 2024-06-06T16:14:32Z

@chrishalcrow, we await your read!

h-mayorquin

LGTM

JoeZiminski · 2024-06-06T17:01:15Z

Hey @zm711 looks great!! 🚀

chrishalcrow

Amazing! Got some beauty requests and a link to update, but otherwise it's great.

zm711 · 2024-06-07T11:11:55Z

@h-mayorquin just in case you didn't see this: the profile imports on Mac took longer than 1.65 seconds. I re-ran and it went away. So this test has some randomness, I would assume based on the runner that picks it up, but I don't know.

h-mayorquin · 2024-06-07T13:56:46Z

Thanks @zm711 . I was probably too ambitious with the test. I will set it higher : )

zm711 added 2 commits May 30, 2024 16:50

make a sorting wip

d816d0f

continued docs

f1e4685

zm711 requested review from chrishalcrow and h-mayorquin May 31, 2024 08:38

zm711 added the documentation Improvements or additions to documentation label May 31, 2024

zm711 linked an issue May 31, 2024 that may be closed by this pull request

Add "how to" for how to load your own data as a sorting object #2912

Closed

add to index

6b4d934

zm711 commented May 31, 2024

tidy the how-to

15854f7

h-mayorquin reviewed Jun 1, 2024

alejoe91 added the hackathon-24 Contributions during the SpikeInterface Hackathon May 24 label Jun 1, 2024

Heberto improvement

8f9ca83

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>

chrishalcrow requested changes Jun 3, 2024

samuelgarcia reviewed Jun 5, 2024

JoeZiminski requested changes Jun 5, 2024

everyone's feedback I hope

5914e62

zm711 changed the title ~~Make a how to for creating a Sorting~~ Make a how to loading data into a Sorting manually Jun 6, 2024

h-mayorquin reviewed Jun 6, 2024

zm711 and others added 3 commits June 6, 2024 12:12

Heberto improvements

42ae4b7

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>

another Heberto clarification

7e639e0

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>

Merge branch 'main' into make-a-sorting-doc

5131f57

zm711 marked this pull request as ready for review June 6, 2024 16:14

h-mayorquin approved these changes Jun 6, 2024

JoeZiminski approved these changes Jun 6, 2024

chrishalcrow requested changes Jun 7, 2024

zm711 added 2 commits June 7, 2024 07:00

spacing of examples

62ccf27

one more spacing

9edf693

zm711 requested a review from chrishalcrow June 7, 2024 11:03

chrishalcrow approved these changes Jun 7, 2024

samuelgarcia approved these changes Jun 7, 2024

samuelgarcia merged commit 2a4bc57 into SpikeInterface:main Jun 7, 2024

zm711 deleted the make-a-sorting-doc branch June 7, 2024 13:48
