Proposal of format to hold the manual curation information #2933
samuelgarcia merged 20 commits into SpikeInterface:main
for more information, see https://pre-commit.ci
|
Thanks @remi-pr and @samuelgarcia. I have some small suggestions. Let's look at this example from your tests:
merged_and_removed = {
"unit_ids": [1, 2, 3, 6, 10, 14, 20, 31, 42],
"labels_definition": {
"quality": {"name": "quality", "labels": ["good", "noise", "MUA", "artifact"], "auto_eclusive": True},
"experimental": {
"name": "experimental",
"labels": ["acute", "chronic", "headfixed", "freelymoving"],
"auto_eclusive": False,
},
},
"manual_labels": [
{"unit_id": 1, "label_category_key": "quality", "label_category_value": "good"},
{"unit_id": 2, "label_category_key": "quality", "label_category_value": "noise"},
{"unit_id": 2, "label_category_key": "experimental", "label_category_value": ["chronic", "headfixed"]},
],
"merged_unit_groups": [[3, 6], [10, 14, 20]], # one cell goes into at most one list
"removed_units": [3, 31, 42], # Can not be in the merged_units
}
- labels_definition => label_definitions
- auto_eclusive => auto_exclusive
- label_definitions.quality.labels => label_definitions.quality.label_options
- What exactly does "auto_exclusive" mean?
- label_category_key => label_category
- label_category_value => labels (make it a list and allow more than one)
Let me know what you think. |
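For reference, the two constraints noted in the comments of that test dict (a unit goes into at most one merge group, and a removed unit cannot also be merged) could be checked along these lines. This is only a minimal sketch: `check_merges_and_removals` is a hypothetical helper, not a function from this PR, and the dict is trimmed to the three keys the check needs.

```python
def check_merges_and_removals(curation):
    """Return a list of problems found in a curation dict (hypothetical helper)."""
    problems = []
    known = set(curation["unit_ids"])
    merged = set()

    # A unit may appear at most once across all merge groups.
    for group in curation["merged_unit_groups"]:
        for unit_id in group:
            if unit_id not in known:
                problems.append(f"merged unit {unit_id} is not in unit_ids")
            if unit_id in merged:
                problems.append(f"unit {unit_id} appears more than once in merged_unit_groups")
            merged.add(unit_id)

    # A removed unit must exist and must not also be part of a merge.
    for unit_id in curation["removed_units"]:
        if unit_id not in known:
            problems.append(f"removed unit {unit_id} is not in unit_ids")
        if unit_id in merged:
            problems.append(f"unit {unit_id} is both merged and removed")

    return problems


# Same ids as the test example above, trimmed to the keys used here.
merged_and_removed = {
    "unit_ids": [1, 2, 3, 6, 10, 14, 20, 31, 42],
    "merged_unit_groups": [[3, 6], [10, 14, 20]],
    "removed_units": [3, 31, 42],
}

problems = check_merges_and_removals(merged_and_removed)
print(problems)
```

With the example as written, unit 3 is in both a merge group and the removed list, so the check reports exactly one problem.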
|
Great comments, I will implement quickly. Just two notes, which are actually just one:
|
|
I think I would be with Jeremy here in that
Also I weakly fall on Jeremy's side for this too. You could have values always be a list and then if there is exclusivity you just validate that the list only contains one thing rather than a string. That way you are only working with the list and not switching between list and string. What is the benefit of your current implementation over doing just the list? |
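To make the "always a list" option concrete, here is a rough sketch of what the validation could look like. It assumes the renamed keys suggested earlier in the thread (`label_category`, `labels`, `label_options`, `exclusive`); `validate_labels` is a hypothetical name, not something in the PR.

```python
def validate_labels(label_definitions, manual_labels):
    """Check list-valued labels against their definitions (hypothetical helper)."""
    for entry in manual_labels:
        category = entry["label_category"]
        labels = entry["labels"]
        definition = label_definitions[category]

        # Labels are always a list, even for exclusive categories.
        if not isinstance(labels, list):
            raise TypeError(f"labels for unit {entry['unit_id']} must be a list")

        unknown = [label for label in labels if label not in definition["label_options"]]
        if unknown:
            raise ValueError(f"unknown labels {unknown} for category '{category}'")

        # Exclusivity is just a length check on the list.
        if definition["exclusive"] and len(labels) != 1:
            raise ValueError(f"category '{category}' is exclusive: expected exactly one label")


label_definitions = {
    "quality": {"label_options": ["good", "noise", "MUA", "artifact"], "exclusive": True},
    "experimental": {
        "label_options": ["acute", "chronic", "headfixed", "freelymoving"],
        "exclusive": False,
    },
}
manual_labels = [
    {"unit_id": 1, "label_category": "quality", "labels": ["good"]},
    {"unit_id": 2, "label_category": "experimental", "labels": ["chronic", "headfixed"]},
]

validate_labels(label_definitions, manual_labels)  # passes silently
```

The parsing code then only ever deals with lists, and the exclusive case costs one extra length check.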
I think I prefer exclusive
I'm concerned about having two different possible types here. I'm thinking about code that parses this... you'd need a lot of if statements... EDIT: Oops, I just saw the reply from @zm711 |
|
On labels being a list or a string: we had this conversation with @samuelgarcia, and the conclusion was that most labels will likely be exclusive and only sometimes won't be. Using a string avoids writing labels[0] everywhere a label is exclusive. On the other hand, we will have to handle the list case separately. I have no strong opinion on this and I am OK with both options. |
When someone is agreeing with me I'm always happy to reread the opinions :) |
|
Sam and I talked about this more last night and I think his idea was that he would really prefer it to just accept a string and make everything mutually exclusive, but that would require a slight reworking on the sortingview side. Then string only would make the most sense. |
|
I'm a bit concerned about the complexity of this format. Wouldn't this be more straightforward?
... label definitions goes here ...
'labels': [
{
'unit_id': '1',
'manual': {'quality': 'good', 'experimental': 'chronic'}
},
...
] |
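As a rough illustration of how the two layouts relate, the flat `manual_labels` list from the test example can be regrouped into the per-unit shape proposed above. This is only a sketch: `group_labels_by_unit` is a hypothetical helper, and unit ids are kept as integers as in the test example rather than the strings shown in the proposal.

```python
def group_labels_by_unit(manual_labels):
    """Regroup flat label entries into one entry per unit (hypothetical helper)."""
    grouped = {}
    for entry in manual_labels:
        # One dict of {category: value} per unit, created on first sight.
        per_unit = grouped.setdefault(entry["unit_id"], {})
        per_unit[entry["label_category_key"]] = entry["label_category_value"]
    return [{"unit_id": unit_id, "manual": labels} for unit_id, labels in grouped.items()]


# Flat entries, using the key names from the test example earlier in the thread.
manual_labels = [
    {"unit_id": 1, "label_category_key": "quality", "label_category_value": "good"},
    {"unit_id": 2, "label_category_key": "quality", "label_category_value": "noise"},
    {"unit_id": 2, "label_category_key": "experimental", "label_category_value": ["chronic", "headfixed"]},
]

labels = group_labels_by_unit(manual_labels)
print(labels)
```

Both layouts carry the same information; the per-unit one simply keeps everything about a unit in one place.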
|
Hi Jeremy, I will update this PR this week. |
zm711
left a comment
Just some comments on the rst to start with.
Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
|
@samuelgarcia I think this looks better. But isn't this redundant in that "quality" is the key and also the name? Same for "putative_type"?
"label_definitions": {
"quality": {"name": "quality", "label_options": ["good", "noise", "MUA", "artifact"], "exclusive": True},
"putative_type": {
"name": "putative_type",
"label_options": ["excitatory", "inhibitory", "pyramidal", "mitral"],
"exclusive": False,
},
},
Maybe "label_definitions" should be a list instead. |
|
The "name" can be removed; it was there more as a title, like "Putative cell type". |
|
yeah that sounds good. |
|
OK, done. Categories with exclusive=False return boolean columns, whereas categories with exclusive=True return unique values. |
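For readers following along, the distinction can be pictured with a small plain-Python sketch: an exclusive category becomes one column holding a single value per unit, and a non-exclusive category becomes one boolean column per label option. This is an assumption-laden illustration, not the code in this PR: `labels_to_columns` is a hypothetical helper, and the entry layout (`label_category`, `labels`) follows the renaming suggested earlier in the thread.

```python
def labels_to_columns(unit_ids, label_definitions, manual_labels):
    """Build {column_name: {unit_id: value}} from manual labels (hypothetical helper)."""
    columns = {}
    for category, definition in label_definitions.items():
        if definition["exclusive"]:
            # One column per exclusive category; None means "not labelled".
            columns[category] = {unit_id: None for unit_id in unit_ids}
        else:
            # One boolean column per option of a non-exclusive category.
            for option in definition["label_options"]:
                columns[option] = {unit_id: False for unit_id in unit_ids}

    for entry in manual_labels:
        unit_id, category = entry["unit_id"], entry["label_category"]
        if label_definitions[category]["exclusive"]:
            columns[category][unit_id] = entry["labels"][0]
        else:
            for label in entry["labels"]:
                columns[label][unit_id] = True
    return columns


label_definitions = {
    "quality": {"label_options": ["good", "noise", "MUA", "artifact"], "exclusive": True},
    "putative_type": {
        "label_options": ["excitatory", "inhibitory", "pyramidal", "mitral"],
        "exclusive": False,
    },
}
manual_labels = [
    {"unit_id": 1, "label_category": "quality", "labels": ["good"]},
    {"unit_id": 2, "label_category": "putative_type", "labels": ["excitatory", "pyramidal"]},
]

columns = labels_to_columns([1, 2, 3], label_definitions, manual_labels)
print(columns["quality"])
print(columns["pyramidal"])
```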
|
@alejoe91 : this is ready to review on my side.
|
…nto curation_format
alejoe91
left a comment
A few minor comments.
Aren't those 4 files missing?
Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>
|
Thank you very much, Rémi. |
|
Great job guys. Let me know if you ever want help implementing the curation to a sorting. |
|
Thanks @remi-pr I think that the next level for curation (also automated) would be to add a curation layer to the We now have the The tricky but interesting part would be to implement an For example:
This is a big change, but I think that it's currently the main bottleneck to a clean workflow since now one needs to recompute the analyzer and all extensions after a merge (or a split). IMO, it makes sense to start with this now since it will be a super useful feature! |
|
+1, I agree this would be super useful. I can also help if needed; we could start a branch for that. Indeed, I'm currently starting to implement these functions myself for the automated curation, but as standalone functions and not at the analyzer level |
|
Actually, I have a branch and I'll make a PR soon to implement such a sorting_analyzer.merge_units() function in a proper manner, similarly to select_units(). Don't start anything; I should push that today, hopefully, to start the discussion |
|
Perfect. I am looking forward to this. I am really looking forward to this new feature. |
|
See #3043 |
We propose a format to contain the information relative to manual curation.