
Conversation

@samuelgarcia (Member) commented Sep 3, 2024

  • Add machinery to compute noise levels in parallel
  • Add get_random_recording_slices() to implement more future strategies for random chunks (e.g. non-overlapping,
    regular, uniform, ...)

A very important change is that seed=None is now the default (instead of seed=0) in the function, which I think is the right way: the seed must be explicit, not implicit. The consequence is:

  • all tests that run get_random_data_chunk() twice (sometimes this is hidden) are no longer guaranteed
    to give the same results. The solution is to explicitly seed everywhere, which is good practice.
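
For illustration, a minimal sketch of what explicit seeding looks like with this machinery. The keyword names (num_chunks_per_segment, and passing the seed through random_slices_kwargs) are assumptions taken from the existing get_random_data_chunks() signature and from the docstring excerpt reviewed further down; treat the calls as a sketch, not the definitive API.

```python
from spikeinterface.core import (
    generate_recording,
    get_random_recording_slices,
    get_noise_levels,
)

# toy recording for the example; any BaseRecording works
recording = generate_recording(durations=[10.0], num_channels=4, seed=2205)

# Explicit seed -> the randomly drawn slices are reproducible across calls.
slices_a = get_random_recording_slices(recording, num_chunks_per_segment=20, seed=2205)
slices_b = get_random_recording_slices(recording, num_chunks_per_segment=20, seed=2205)
assert slices_a == slices_b

# With the new default seed=None this equality is no longer guaranteed,
# so tests (and any pipeline that caches results) should always pass a seed.
# The random_slices_kwargs name follows the docstring discussed below.
noise_levels = get_noise_levels(recording, random_slices_kwargs=dict(seed=2205))
```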

@yger @cwindolf: have a look at this, it is a first step toward a better noise level estimate in SI.

@cwindolf (Collaborator) commented Sep 3, 2024

Looks cool! @oliche's strategy could be implemented here now.

@h-mayorquin (Collaborator) left a comment

Two general questions:

  1. Will this not fail with formats that lock IO access to the same region if the chunks overlap?
  2. I am still confused about why computing noise requires so many samples. The methods we use assume normality (we use MAD to estimate the std), yet we sample far more than the convergence criteria of normal distributions would naively suggest. What gives? Is there some empirical work on this? Now that a lot of open data is available, estimating sampling requirements for a variety of neural data (species, areas, etc.) could be done. It appears to me that this could be a quick and informative paper that we could put out for the community if there is no previous work.
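
For context, the estimator under discussion is essentially the per-channel MAD rescaled to a standard deviation; a minimal sketch of the idea (an illustration, not the exact code in this PR):

```python
import numpy as np

def mad_std(chunks: np.ndarray) -> np.ndarray:
    """Noise std per channel from concatenated random chunks (samples x channels).

    The 1.4826 factor makes the MAD a consistent estimator of the standard
    deviation *under a Gaussian assumption*, which is exactly the assumption
    questioned here.
    """
    med = np.median(chunks, axis=0)
    mad = np.median(np.abs(chunks - med), axis=0)
    return 1.4826 * mad
```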

)


if worker_ctx["method"] == "mad":
Collaborator

Should pre-allocate to reduce memory footprint.

Member Author

We do not have a guarantee that chunks have the same size.
I think this will be handled more globally when adding out=... in get_traces().

force_recompute: bool = False,
**random_chunk_kwargs,
**kwargs,
# **random_chunk_kwargs,
@h-mayorquin (Collaborator) commented Sep 4, 2024

Why not just pass different dicts? This is more complicated to read and to document, and it has caused bugs before with the verbose and job_kwargs story.

Member Author

ok

Member Author

This was to keep backward compatibility, I guess.

Member Author

This is something we should discuss. I was in favor of what you propose, but this will break backward compatibility.

@samuelgarcia (Member Author)

  1. Will this not fail with formats that lock IO access to the same region if the chunks overlap?

Really good point! The access is read-only.
We will be able to add a non-overlapping option in the random slices.

  2. I am still confused about why computing noise requires so many samples. The methods we use assume normality (we use MAD to estimate the std), yet we sample far more than the convergence criteria of normal distributions would naively suggest. What gives? Is there some empirical work on this? Now that a lot of open data is available, estimating sampling requirements for a variety of neural data (species, areas, etc.) could be done. It appears to me that this could be a quick and informative paper that we could put out for the community if there is no previous work.

Honestly, I was pretty sure that the number of samples used was enough.
After discussion with @cwindolf I get the impression that we should have more...
Charlie, any comment?

recording_slices = []
low = margin_frames
size = num_chunks_per_segment
for segment_index in range(num_segments):
Collaborator

There could be an option to avoid overlapping chunks. This was not really necessary as long as we did not take too many chunks relative to the size of the recording, but if we are taking more, maybe it is worth considering.

Member Author

Yes, and this would be a new method.

Member Author

For a future PR.
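
For illustration, a sketch of what such a non-overlapping strategy could look like in that future PR. The helper name and arguments here are hypothetical, not part of this PR:

```python
import numpy as np

def non_overlapping_random_slices(num_frames, chunk_size, num_chunks, margin_frames=0, seed=None):
    """Draw num_chunks non-overlapping chunks of chunk_size frames within one
    segment, keeping margin_frames away from the borders (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    usable = num_frames - 2 * margin_frames
    max_chunks = usable // chunk_size
    if num_chunks > max_chunks:
        raise ValueError("segment too short for the requested number of non-overlapping chunks")
    # pick chunk positions on a grid of disjoint bins so chunks can never overlap
    bins = rng.choice(max_chunks, size=num_chunks, replace=False)
    starts = margin_frames + np.sort(bins) * chunk_size
    return [(int(start), int(start + chunk_size)) for start in starts]
```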

@alejoe91 added the core (Changes to core module) label on Sep 4, 2024
@cwindolf (Collaborator) commented Sep 4, 2024

Yeah... in my experience, more blocks help to stabilize the estimate (let's say we want numbers within x% of each other across runs with different seeds). The data is certainly not Gaussian: it has spikes, and spike activity can vary wildly across a recording. So with very few blocks, they will by chance land disproportionately in higher- or lower-activity regions (maybe in different ways across channels). You need a good number of blocks to reduce that effect -- for short or super consistent recordings, fewer blocks may be fine.

Also, it would be cool if si.zscore() and the other normalize_scale stuff could use these tools :)
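
A toy numpy experiment illustrating the stabilization effect described above (a synthetic trace with injected outliers standing in for spikes; all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 30_000
trace = rng.normal(0.0, 10.0, size=60 * fs)                # 60 s of Gaussian noise, std = 10
spike_idx = rng.choice(trace.size, size=6_000, replace=False)
trace[spike_idx] += rng.normal(-80.0, 20.0, size=6_000)    # sparse large deflections ("spikes")

def mad_std(x):
    return 1.4826 * np.median(np.abs(x - np.median(x)))

chunk_size = 10_000
for num_chunks in (5, 20, 100):
    estimates = []
    for seed in range(20):                                  # rerun with 20 different seeds
        starts = np.random.default_rng(seed).integers(0, trace.size - chunk_size, size=num_chunks)
        chunks = np.concatenate([trace[s : s + chunk_size] for s in starts])
        estimates.append(mad_std(chunks))
    spread = 100 * (max(estimates) - min(estimates)) / np.mean(estimates)
    print(f"{num_chunks:4d} chunks -> spread across seeds ~{spread:.1f}%")
```

In a real recording the effect is stronger than in this stationary toy example, because spike activity varies across time and channels.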

@h-mayorquin (Collaborator)

But if the data is not Gaussian, would that mean that using MAD to estimate the std is wrong? This assumes normality:

https://en.wikipedia.org/wiki/Median_absolute_deviation
(see the "Relation to standard deviation" section there)

Anyway, if your experience is that more samples stabilize the estimator, then I think that trumps these considerations.

@cwindolf (Collaborator) commented Sep 4, 2024

Yeah, it's wrong! But I don't have any better ideas. Ideally one would be able to subtract away all of the spikes and then MAD the residuals (which would ideally be only Gaussian noise, but even that is not 100% true...), but that requires sorting, which requires some kind of standardization...

@h-mayorquin (Collaborator)

Agree on the limitation. Thanks for answering my questions.

@alejoe91 (Member)

@samuelgarcia can you fix the tests? There are some concatenated ones that trigger some errors.

Comment on lines 764 to 767
random_slices_kwargs : dict
Options transmited to get_random_recording_slices(), please read documentation from this
function for more details.
**job_kwargs:
Member

I would actually copy/paste the kwargs into the docstring here, since it's a much higher-level API.

@alejoe91 (Member) left a comment

Super @samuelgarcia!!!

Just a few suggestions and failing tests to fix

@samuelgarcia merged commit 0df1160 into SpikeInterface:main on Oct 25, 2024
15 checks passed
@samuelgarcia deleted the improve_noise_level_machinery branch on July 29, 2025 13:43