Skip to content

Proposal of format to hold the manual curation information#2933

Merged
samuelgarcia merged 20 commits intoSpikeInterface:mainfrom
remi-pr:curation_format
Jun 18, 2024
Merged

Proposal of format to hold the manual curation information#2933
samuelgarcia merged 20 commits intoSpikeInterface:mainfrom
remi-pr:curation_format

Conversation

@remi-pr
Copy link
Copy Markdown
Contributor

@remi-pr remi-pr commented May 30, 2024

We propose a format to contain the information relative to manual curation.

@magland
Copy link
Copy Markdown
Member

magland commented May 30, 2024

Thanks @remi-pr and @samuelgarcia

I have some small suggestions. Let's look at this example from your tests

merged_and_removed = {
    "unit_ids": [1, 2, 3, 6, 10, 14, 20, 31, 42],
    "labels_definition": {
        "quality": {"name": "quality", "labels": ["good", "noise", "MUA", "artifact"], "auto_eclusive": True},
        "experimental": {
            "name": "experimental",
            "labels": ["acute", "chronic", "headfixed", "freelymoving"],
            "auto_eclusive": False,
        },
    },
    "manual_labels": [
        {"unit_id": 1, "label_category_key": "quality", "label_category_value": "good"},
        {"unit_id": 2, "label_category_key": "quality", "label_category_value": "noise"},
        {"unit_id": 2, "label_category_key": "experimental", "label_category_value": ["chronic", "headfixed"]},
    ],
    "merged_unit_groups": [[3, 6], [10, 14, 20]],  # one cell goes into at most one list
    "removed_units": [3, 31, 42],  # Can not be  in the merged_units
}

labels_definition => label_definitions

auto_eclusive => auto_exclusive

label_definitions.quality.labels => label_definitions.quality.label_options

What exactly does "auto_exclusive" mean?

label_category_key => label_category

label_category_value => labels (make it a list and allow more than one)

Let me know what you think

@alejoe91 alejoe91 added the curation Related to curation module label May 30, 2024
@remi-pr
Copy link
Copy Markdown
Contributor Author

remi-pr commented May 30, 2024

Great comments, I will implement quickly.

Just two notes, which are actually just one:

  • auto_exclusive means that you can only pick one from the list of options . For example a unit can not be 'good' and 'noise'. It could be renamed to just exclusive. I would let that up to you.
  • label_category_value => labels (make it a list and allow more than one) : This is linked to the auto_exclusive parameter. If only one is allowed -> str if more than one -> list as you can see in the example "label_category_value": ["chronic", "headfixed"]. I added a check for that in the validation function.

@zm711
Copy link
Copy Markdown
Member

zm711 commented May 30, 2024

I think I would be with Jeremy here in that auto_exclusive isn't super clear. exclusive by itself basically says the same thing. I would personally prefer more explicit like single_label or single_label_only, but I guess `exclusive isn't too bad.

label_category_value => labels (make it a list and allow more than one) :

Also I weakly fall on Jeremy's side for this too. You could have values always be a list and then if there is exclusivity you just validate that the list only contains one thing rather than a string. That way you are only working with the list and not switching between list and string. What is the benefit of your current implementation over doing just the list?

@magland
Copy link
Copy Markdown
Member

magland commented May 30, 2024

  • auto_exclusive means that you can only pick one from the list of options . For example a unit can not be 'good' and 'noise'. It could be renamed to just exclusive. I would let that up to you.

I think I prefer exclusive

  • label_category_value => labels (make it a list and allow more than one) : This is linked to the auto_exclusive parameter. If only one is allowed -> str if more than one -> list as you can see in the example "label_category_value": ["chronic", "headfixed"]. I added a check for that in the validation function.

I'm concerned about having two different possible types here. I'm thinking about code that parses this... you'd need a lot of if statements... if exclusive... or if is str. And converting between exclusive=False and exclusive=True would be a pain. It also makes validation simpler to always have list. Curious about opinion from others.

EDIT: Ooops I just saw reply from @zm711

@remi-pr
Copy link
Copy Markdown
Contributor Author

remi-pr commented May 30, 2024

I think exclusive is a better name you are right.

labels being a list or a string: We had this conversation with @samuelgarcia and the conclusion was that it is likely that most labels will be exclusive and that sometimes they won't be. Thus by having a string will avoid having labels[0] everywhere it is exclusive. On the other hand we will have to handle the list case separately. I have no strong opinion on this and I am OK with both options.

@zm711
Copy link
Copy Markdown
Member

zm711 commented May 30, 2024

EDIT: Ooops I just saw reply from @zm711

When someone is agreeing with me I'm always happy to reread the opinions :)

@zm711
Copy link
Copy Markdown
Member

zm711 commented May 31, 2024

Sam and I talked about this more last night and I think his idea was that he would really prefer it to just accept string and make everything mutually exclusive, but that would require a slight reworking on the sotringview side. Then string only would make the most sense.

@magland
Copy link
Copy Markdown
Member

magland commented May 31, 2024

I'm a bit concerned about the complexity of this format. Wouldn't this be more straightforward?

... label definitions goes here ...
'labels': [
  {
    'unit_id': '1',
    'manual': {'quality': 'good', 'experimental': 'chronic'}
  },
  ...
]

@alejoe91 alejoe91 added the hackathon-24 Contributions during the SpikeInterface Hackathon May 24 label Jun 1, 2024
@samuelgarcia
Copy link
Copy Markdown
Member

Hi Jeremy,
thanks for the feeback.
I am OK we can simplify we what you propose.
The ultimate stuff to discuss is the value being "list" (always) or scalar or both.
Note that Remy and what for "both" but having list with size one when mode exclusive is OK for me if this make your implementation easier.

I will update this PR this week.

@samuelgarcia samuelgarcia changed the title WIP: Proposal of format to hold the manual curation information Proposal of format to hold the manual curation information Jun 5, 2024
@samuelgarcia
Copy link
Copy Markdown
Member

@remi-pr @zm711 @magland
I push some more change with what we decided.
I also did a small doc in the rst : maybe some american eye to check the gramar could be nice :)

I need to do some more test for converting old format to this new one.

Copy link
Copy Markdown
Member

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments on the rst to start with.

Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
Comment thread doc/modules/curation.rst Outdated
samuelgarcia and others added 4 commits June 5, 2024 14:40
@magland
Copy link
Copy Markdown
Member

magland commented Jun 5, 2024

@samuelgarcia I think this looks better.

But isn't this redundant in that "quality" is the key and also the name? Same for "putative_type"?

"label_definitions": {
        "quality": {"name": "quality", "label_options": ["good", "noise", "MUA", "artifact"], "exclusive": True},
        "putative_type": {
            "name": "putative_type",
            "label_options": ["excitatory", "inhibitory", "pyramidal", "mitral"],
            "exclusive": False,
        },
    },

Maybe "label_definitions" should be a list instead.

@samuelgarcia
Copy link
Copy Markdown
Member

The "name" can be be removed, it was more here for title like "Putative cell type".
Sure in our example it is annoyingly redundant.
I propose to remove at the moment.
We choose dict instead instead list because it ensure unique keys.
Is this OK ?

@magland
Copy link
Copy Markdown
Member

magland commented Jun 5, 2024

yeah that sounds good.

@samuelgarcia
Copy link
Copy Markdown
Member

OK done
And I also did a curation_label_to_dataframe() to illustrate the exclusive=True/False.

Exclusive=False return boolean columns whereas exclusive=True are unique.

   quality excitatory inhibitory pyramidal mitral
1     good      False      False     False  False
2    noise       True      False      True  False
3               False       True     False  False
6               False      False     False  False
10              False      False     False  False
14              False      False     False  False
20              False      False     False  False
31              False      False     False  False
42              False      False     False  False

@samuelgarcia
Copy link
Copy Markdown
Member

@alejoe91 : this is ready to review on my side.
The implementation of applying the format to a sorting will be done in another PR.
This PR include only machinery for handling/checking/converting the curtaion format.

apply_sortingview_curation() still need to be rewrite with this in mind using the convertion.

@samuelgarcia samuelgarcia added this to the 0.101.0 milestone Jun 12, 2024
Copy link
Copy Markdown
Member

@alejoe91 alejoe91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments.

Aren't those 4 files missing?

Comment thread src/spikeinterface/curation/curation_format.py Outdated
Comment thread src/spikeinterface/curation/tests/test_curation_format.py Outdated
Comment thread src/spikeinterface/curation/tests/test_curation_format.py Outdated
Comment thread src/spikeinterface/curation/tests/test_curation_format.py
Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>
@samuelgarcia
Copy link
Copy Markdown
Member

Merci beacoup Rémi.

@samuelgarcia samuelgarcia requested a review from alejoe91 June 14, 2024 19:41
@samuelgarcia samuelgarcia merged commit 2281dda into SpikeInterface:main Jun 18, 2024
@remi-pr
Copy link
Copy Markdown
Contributor Author

remi-pr commented Jun 18, 2024

Great job guys. Let me know if you ever want help implementing the curation to a sorting.

@alejoe91
Copy link
Copy Markdown
Member

Thanks @remi-pr

I think that the next level for curation (also automated) would be to add a curation layer to the SortingAnalyzer.

We now have the analyzer.select_units and we can easily make a remove_units to discard rejected units.

The tricky but interesting part would be to implement an analyzer.merge(merge_unit_ids). This function will need to be implemented in each extension to apply/recompute the merge.

For example:

  • the templates extension could simply recompute the merged template as a weighted sum of the merged unit templates
  • the correlograms will only recompute correlograms for the merge
  • the quality metrics is more tricky: some of them will just need to be recomputed on the merged unit (e.g. all the spike train based), but others (like PCA metrics) will need a recompute on the updated unit neighborhood (e.g., isolation_distance, l-ratio, etc.)

This is a big change, but I think that it's currently the main bottleneck to a clean workflow since now one needs to recompute the analyzer and all extensions after a merge (or a split).

IMO, it makes sense to start with this now since it will be a super useful feature!

@yger
Copy link
Copy Markdown
Collaborator

yger commented Jun 18, 2024

+1, I agree this would be super useful, I can help also if needed, we could start a branch for that. Indeed, I'm currently starting to implement these function myself for the automated curation, but as standalone functions and not at the analyzer level

@yger
Copy link
Copy Markdown
Collaborator

yger commented Jun 19, 2024

Actually, I have a branch and i'll make a PR soon, to implement such a sorting_analyzer.merge_units() function in a proper manner, similarly to select_units(). Don't start anything, I should push that today hopefully to start the discussion

@remi-pr
Copy link
Copy Markdown
Contributor Author

remi-pr commented Jun 19, 2024

Perfect. I am looking forward to this. I am really looking forward to this new feature.

@yger
Copy link
Copy Markdown
Collaborator

yger commented Jun 19, 2024

See #3043

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

curation Related to curation module hackathon-24 Contributions during the SpikeInterface Hackathon May 24

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants