Proposal of format to hold the manual curation information by remi-pr · Pull Request #2933 · SpikeInterface/spikeinterface

remi-pr · 2024-05-30T11:19:53Z

We propose a format to contain the information relative to manual curation.

Took 1 hour 17 minutes

for more information, see https://pre-commit.ci

magland · 2024-05-30T13:20:37Z

I have some small suggestions. Let's look at this example from your tests

merged_and_removed = {
    "unit_ids": [1, 2, 3, 6, 10, 14, 20, 31, 42],
    "labels_definition": {
        "quality": {"name": "quality", "labels": ["good", "noise", "MUA", "artifact"], "auto_eclusive": True},
        "experimental": {
            "name": "experimental",
            "labels": ["acute", "chronic", "headfixed", "freelymoving"],
            "auto_eclusive": False,
        },
    },
    "manual_labels": [
        {"unit_id": 1, "label_category_key": "quality", "label_category_value": "good"},
        {"unit_id": 2, "label_category_key": "quality", "label_category_value": "noise"},
        {"unit_id": 2, "label_category_key": "experimental", "label_category_value": ["chronic", "headfixed"]},
    ],
    "merged_unit_groups": [[3, 6], [10, 14, 20]],  # one cell goes into at most one list
    "removed_units": [3, 31, 42],  # Can not be  in the merged_units
}

labels_definition => label_definitions

auto_eclusive => auto_exclusive

label_definitions.quality.labels => label_definitions.quality.label_options

What exactly does "auto_exclusive" mean?

label_category_key => label_category

label_category_value => labels (make it a list and allow more than one)

Let me know what you think

Took 1 hour 7 minutes

remi-pr · 2024-05-30T13:48:18Z

Great comments, I will implement quickly.

Just two notes, which are actually just one:

auto_exclusive means that you can only pick one from the list of options . For example a unit can not be 'good' and 'noise'. It could be renamed to just exclusive. I would let that up to you.
label_category_value => labels (make it a list and allow more than one) : This is linked to the auto_exclusive parameter. If only one is allowed -> str if more than one -> list as you can see in the example "label_category_value": ["chronic", "headfixed"]. I added a check for that in the validation function.

Took 10 minutes

for more information, see https://pre-commit.ci

zm711 · 2024-05-30T14:13:39Z

I think I would be with Jeremy here in that auto_exclusive isn't super clear. exclusive by itself basically says the same thing. I would personally prefer more explicit like single_label or single_label_only, but I guess `exclusive isn't too bad.

label_category_value => labels (make it a list and allow more than one) :

Also I weakly fall on Jeremy's side for this too. You could have values always be a list and then if there is exclusivity you just validate that the list only contains one thing rather than a string. That way you are only working with the list and not switching between list and string. What is the benefit of your current implementation over doing just the list?

magland · 2024-05-30T14:20:29Z

auto_exclusive means that you can only pick one from the list of options . For example a unit can not be 'good' and 'noise'. It could be renamed to just exclusive. I would let that up to you.

I think I prefer exclusive

label_category_value => labels (make it a list and allow more than one) : This is linked to the auto_exclusive parameter. If only one is allowed -> str if more than one -> list as you can see in the example "label_category_value": ["chronic", "headfixed"]. I added a check for that in the validation function.

I'm concerned about having two different possible types here. I'm thinking about code that parses this... you'd need a lot of if statements... if exclusive... or if is str. And converting between exclusive=False and exclusive=True would be a pain. It also makes validation simpler to always have list. Curious about opinion from others.

EDIT: Ooops I just saw reply from @zm711

remi-pr · 2024-05-30T14:21:21Z

I think exclusive is a better name you are right.

labels being a list or a string: We had this conversation with @samuelgarcia and the conclusion was that it is likely that most labels will be exclusive and that sometimes they won't be. Thus by having a string will avoid having labels[0] everywhere it is exclusive. On the other hand we will have to handle the list case separately. I have no strong opinion on this and I am OK with both options.

zm711 · 2024-05-30T14:22:43Z

EDIT: Ooops I just saw reply from @zm711

When someone is agreeing with me I'm always happy to reread the opinions :)

zm711 · 2024-05-31T08:26:23Z

Sam and I talked about this more last night and I think his idea was that he would really prefer it to just accept string and make everything mutually exclusive, but that would require a slight reworking on the sotringview side. Then string only would make the most sense.

magland · 2024-05-31T13:44:41Z

I'm a bit concerned about the complexity of this format. Wouldn't this be more straightforward?

... label definitions goes here ...
'labels': [
  {
    'unit_id': '1',
    'manual': {'quality': 'good', 'experimental': 'chronic'}
  },
  ...
]

samuelgarcia · 2024-06-03T09:31:00Z

Hi Jeremy,
thanks for the feeback.
I am OK we can simplify we what you propose.
The ultimate stuff to discuss is the value being "list" (always) or scalar or both.
Note that Remy and what for "both" but having list with size one when mode exclusive is OK for me if this make your implementation easier.

I will update this PR this week.

for more information, see https://pre-commit.ci

samuelgarcia · 2024-06-05T12:10:35Z

@remi-pr @zm711 @magland
I push some more change with what we decided.
I also did a small doc in the rst : maybe some american eye to check the gramar could be nice :)

I need to do some more test for converting old format to this new one.

zm711

Just some comments on the rst to start with.

for more information, see https://pre-commit.ci

Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>

magland · 2024-06-05T13:11:51Z

@samuelgarcia I think this looks better.

But isn't this redundant in that "quality" is the key and also the name? Same for "putative_type"?

"label_definitions": {
        "quality": {"name": "quality", "label_options": ["good", "noise", "MUA", "artifact"], "exclusive": True},
        "putative_type": {
            "name": "putative_type",
            "label_options": ["excitatory", "inhibitory", "pyramidal", "mitral"],
            "exclusive": False,
        },
    },

Maybe "label_definitions" should be a list instead.

samuelgarcia · 2024-06-05T13:16:09Z

The "name" can be be removed, it was more here for title like "Putative cell type".
Sure in our example it is annoyingly redundant.
I propose to remove at the moment.
We choose dict instead instead list because it ensure unique keys.
Is this OK ?

magland · 2024-06-05T13:31:49Z

yeah that sounds good.

for more information, see https://pre-commit.ci

samuelgarcia · 2024-06-05T13:42:47Z

OK done
And I also did a curation_label_to_dataframe() to illustrate the exclusive=True/False.

Exclusive=False return boolean columns whereas exclusive=True are unique.

   quality excitatory inhibitory pyramidal mitral
1     good      False      False     False  False
2    noise       True      False      True  False
3               False       True     False  False
6               False      False     False  False
10              False      False     False  False
14              False      False     False  False
20              False      False     False  False
31              False      False     False  False
42              False      False     False  False

samuelgarcia · 2024-06-05T13:47:10Z

@alejoe91 : this is ready to review on my side.
The implementation of applying the format to a sorting will be done in another PR.
This PR include only machinery for handling/checking/converting the curtaion format.

apply_sortingview_curation() still need to be rewrite with this in mind using the convertion.

…nto curation_format

for more information, see https://pre-commit.ci

alejoe91

A few minor comments.

Aren't those 4 files missing?

Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>

samuelgarcia · 2024-06-14T19:15:00Z

Merci beacoup Rémi.

remi-pr · 2024-06-18T09:19:25Z

Great job guys. Let me know if you ever want help implementing the curation to a sorting.

alejoe91 · 2024-06-18T09:27:30Z

Thanks @remi-pr

I think that the next level for curation (also automated) would be to add a curation layer to the SortingAnalyzer.

We now have the analyzer.select_units and we can easily make a remove_units to discard rejected units.

The tricky but interesting part would be to implement an analyzer.merge(merge_unit_ids). This function will need to be implemented in each extension to apply/recompute the merge.

For example:

the templates extension could simply recompute the merged template as a weighted sum of the merged unit templates
the correlograms will only recompute correlograms for the merge
the quality metrics is more tricky: some of them will just need to be recomputed on the merged unit (e.g. all the spike train based), but others (like PCA metrics) will need a recompute on the updated unit neighborhood (e.g., isolation_distance, l-ratio, etc.)

This is a big change, but I think that it's currently the main bottleneck to a clean workflow since now one needs to recompute the analyzer and all extensions after a merge (or a split).

IMO, it makes sense to start with this now since it will be a super useful feature!

yger · 2024-06-18T09:44:45Z

+1, I agree this would be super useful, I can help also if needed, we could start a branch for that. Indeed, I'm currently starting to implement these function myself for the automated curation, but as standalone functions and not at the analyzer level

yger · 2024-06-19T06:55:08Z

Actually, I have a branch and i'll make a PR soon, to implement such a sorting_analyzer.merge_units() function in a proper manner, similarly to select_units(). Don't start anything, I should push that today hopefully to start the discussion

remi-pr · 2024-06-19T07:40:38Z

Perfect. I am looking forward to this. I am really looking forward to this new feature.

yger · 2024-06-19T08:21:55Z

See #3043

remi-pr and others added 2 commits May 30, 2024 13:17

WIP: Proposal of format to hold the manual curation information

5f8d49b

Took 1 hour 17 minutes

[pre-commit.ci] auto fixes from pre-commit.com hooks

fa2493b

for more information, see https://pre-commit.ci

alejoe91 added the curation Related to curation module label May 30, 2024

Feature: Conversion from sortingview format to the new proposed format

1021163

Took 1 hour 7 minutes

remi-pr and others added 2 commits May 30, 2024 15:54

Renaming curation dictionary keys

8094053

Took 10 minutes

[pre-commit.ci] auto fixes from pre-commit.com hooks

5983194

for more information, see https://pre-commit.ci

alejoe91 added the hackathon-24 Contributions during the SpikeInterface Hackathon May 24 label Jun 1, 2024

Update on the curation format.

64453c6

samuelgarcia changed the title ~~WIP: Proposal of format to hold the manual curation information~~ Proposal of format to hold the manual curation information Jun 5, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

5a4630b

for more information, see https://pre-commit.ci

zm711 reviewed Jun 5, 2024

View reviewed changes

samuelgarcia and others added 4 commits June 5, 2024 14:40

more tests for converting curation format

8ef0633

black is annoying

5c35370

[pre-commit.ci] auto fixes from pre-commit.com hooks

85dcea7

for more information, see https://pre-commit.ci

Merci Zach

1d10bdc

Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>

samuelgarcia added 2 commits June 5, 2024 15:40

curation_label_to_dataframe()

710eade

Black is annoying

62a5743

[pre-commit.ci] auto fixes from pre-commit.com hooks

52857cc

for more information, see https://pre-commit.ci

TODO for alessio in the code.

22ff94d

samuelgarcia and others added 3 commits June 5, 2024 15:50

oups

a9c5aad

Merge branch 'curation_format' of github.com:remi-pr/spikeinterface i…

57b51d4

…nto curation_format

[pre-commit.ci] auto fixes from pre-commit.com hooks

6eef836

for more information, see https://pre-commit.ci

samuelgarcia added this to the 0.101.0 milestone Jun 12, 2024

alejoe91 requested changes Jun 13, 2024

View reviewed changes

Apply suggestions from code review: typography et al

8111789

Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>

Merge branch 'main' into curation_format

1c57cfe

samuelgarcia requested a review from alejoe91 June 14, 2024 19:41

alejoe91 approved these changes Jun 18, 2024

View reviewed changes

samuelgarcia merged commit 2281dda into SpikeInterface:main Jun 18, 2024

Conversation

remi-pr commented May 30, 2024

Uh oh!

magland commented May 30, 2024

Uh oh!

remi-pr commented May 30, 2024

Uh oh!

zm711 commented May 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

magland commented May 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remi-pr commented May 30, 2024

Uh oh!

zm711 commented May 30, 2024

Uh oh!

zm711 commented May 31, 2024

Uh oh!

magland commented May 31, 2024

Uh oh!

samuelgarcia commented Jun 3, 2024

Uh oh!

samuelgarcia commented Jun 5, 2024

Uh oh!

zm711 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

magland commented Jun 5, 2024

Uh oh!

samuelgarcia commented Jun 5, 2024

Uh oh!

magland commented Jun 5, 2024

Uh oh!

samuelgarcia commented Jun 5, 2024

Uh oh!

samuelgarcia commented Jun 5, 2024

Uh oh!

alejoe91 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

samuelgarcia commented Jun 14, 2024

Uh oh!

remi-pr commented Jun 18, 2024

Uh oh!

alejoe91 commented Jun 18, 2024

Uh oh!

yger commented Jun 18, 2024

Uh oh!

yger commented Jun 19, 2024

Uh oh!

remi-pr commented Jun 19, 2024

Uh oh!

yger commented Jun 19, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

zm711 commented May 30, 2024 •

edited

Loading

magland commented May 30, 2024 •

edited

Loading