Skip to content

Conversation

@chrishalcrow
Copy link
Member

@chrishalcrow chrishalcrow commented Feb 19, 2025

This PR would allow user to pass dicts of recordings to all preprocessing classes (for now: just common reference!)

This would allow for the following syntax:

import spikeinterface.full as si
rec, _ = si.generate_ground_truth_recording(num_channels=16)
rec.set_property(key="tetrode_number", values=[0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3])
pp_rec = si.common_reference(rec.split_by("tetrode_number"))
pp_rec
>>> {0: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 1: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 2: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB,
 3: GroundTruthRecording (CommonReferenceRecording): 4 channels - 25.0kHz - 1 segments 
                       250,000 samples - 10.00s - float32 dtype - 3.81 MiB}

which can be chained without problem. Also allows for users to take in two sessions from e.g. different days and ensure the same preprocessing steps are applied to them. Or helpful if you want to split your recording in half and sort seperately (UnitMatch almost do this).

This would certainly remove a few annoying for loops from our labs' pipeline.

We agreed to not touch the BaseRecording class. I made the define_function_or_dict_from_class , which looks through the args and kwargs for a recording or dict of recordings, then applies either the normal function (e.g. CommonReferenceRecording) or produces a dict, where each recording has had the normal function applied to it, keeping the original keys. I've copied the signature, doc and set a new name in the function, rather than calling another decorator (ala

@copy_signature(source_class)
) . I think this is simpler??

Also, note that this only works for the functions. So common_reference(dict_of_recs) is good but CommonReferenceRecording(dict_of_recs) does not work. This seems ok from the user point of view, but maybe there's some deeper issue with loading that I'm not thinking of??

@zm711
Copy link
Member

zm711 commented Feb 19, 2025

Also, note that this only works for the functions. So common_reference(dict_of_recs) is good but CommonReferenceRecording(dict_of_recs) does not work. This seems ok from the user point of view, but maybe there's some deeper issue with loading that I'm not thinking of??

Also I think this was Sam who liked keeping this around right @alejoe91 ? I only ever use the function version of all the classes. And I remember a discussion somewhere on the repo about removing classes from init and only exposing the functions moving forward.

@alejoe91 alejoe91 added core Changes to core module preprocessing Related to preprocessing module labels Feb 20, 2025
@chrishalcrow
Copy link
Member Author

Had another go and it feels more beautiful! Added the silence_period example because it has another arg (rather than kwarg). @alejoe91 @samuelgarcia, would appreciate feedback!

Copy link
Member

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple naming things to think about while you wait for Alessio and Sam.

return preprocessed_recordings_dict


def define_rec_or_dict_function(source_class, name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for symmetry maybe define_rec_or_dict_function_from_class so that we see that this function is the same as below function but to have a dict_function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: define_function_or_dict_from_class, which is the same as below, but I think is more accurate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. I guess the real full name would be define_function_or_dict_of_function_from_class but at some point the name is just too long :P.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep tho original name qnd explain what it does on the docstting for simplicity :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Zach is right: it's nice to have the same grammar as define_function_from_class. Also added docstrings to both new functions

@samuelgarcia
Copy link
Member

This looks good.
We should check the behavior when the input is a list of recordings for instance when concatenating.
What do we want here ?

@chrishalcrow
Copy link
Member Author

This looks good. We should check the behavior when the input is a list of recordings for instance when concatenating. What do we want here ?

I think we should limit it to recordings or dicts of recordings. It encourages the user to label groups of recordings.

Also, if we use lists, we'd probably add a key when we save to keep track of things. But when a user loads it, they'd expect a list. To me, it complexifies the saving/loading.

Anyone have a good use case for lists?

@samuelgarcia
Copy link
Member

samuelgarcia commented Mar 7, 2025

This is OK for me.
Do you want to make doc inside this PR ?
You should for instance change the process_by_channel_group.rst (or maybe make a gallery example) ?

Alessio did a renaming.

@chrishalcrow
Copy link
Member Author

chrishalcrow commented Mar 12, 2025

All looks good: I corrected the spelling on blank_saturation.

Thanks for adding it to all the preprocessors @alejoe91 !

Since there's a lot of new preprocessing stuff in the works, the plan is to do a major docs rewrite once some more features are added.

@alejoe91
Copy link
Member

Perfect! As part of this PR, can you modify this doc page to show both strategies?

@chrishalcrow
Copy link
Member Author

Oops!

@chrishalcrow
Copy link
Member Author

Perfect! As part of this PR, can you modify this doc page to show both strategies?

Hello, this shows how to split, preprocess then aggregate so the user can then sort. So I think we need to get something like #3767 merged before updating the docs and suggesting users use this.

@alejoe91
Copy link
Member

Oups! I merged the absolute imports and a bunch of small conflicts showed up...

@chrishalcrow
Copy link
Member Author

Ughhh, got a new machine and have made a mess... gimme a minute...

@chrishalcrow
Copy link
Member Author

I made too much of a git mess. See new PR here: #3773

@zm711
Copy link
Member

zm711 commented Mar 14, 2025

I did something bad

Favorite commit message maybe ever.

@chrishalcrow chrishalcrow deleted the group-splitting-in-pp branch July 30, 2025 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Changes to core module preprocessing Related to preprocessing module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants