The current noise generation machinery (both `NoiseGeneratorRecording` and the new `MockRecording`, name under discussion #4520) has a `strategy` argument that determines how noise is generated internally. `tile_pregenerated` was the first strategy, which I introduced in #1581, and `on_the_fly` was added later by @samuelgarcia in #1948 with the goal of reducing memory consumption even further by avoiding the upfront allocation of a noise block.
I think this argument should be removed. How the noise is generated internally is an implementation detail that we should not expose to the user. The choice of strategy does not affect the behavior of the class from the outside; we should just keep whichever one performs best. If `on_the_fly` consumed less memory without a significant cost to generation speed, we should use it. The point is that the internal generation method could change over time and we do not want to lock ourselves into supporting both as part of the public API. The central contract of the class is to behave like an array-like lazy generator of noise.
I profiled both strategies with memray to measure peak memory during `get_traces()`, which is the relevant metric for spikeinterface: in a preprocessing chain, each `get_traces` call's peak memory compounds through the stack. You can reproduce the results with `uv run profile_strategies.py` (gist).
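The memray runs in the gist are the authoritative numbers. As a rough illustration of the methodology, the peak allocation of a single call can also be captured with the stdlib `tracemalloc` (the helper below is a hypothetical sketch, not code from the gist; `tracemalloc` only sees allocations routed through Python's allocator, so it understates what memray reports):

```python
import tracemalloc

def peak_memory_mib(fn, *args, **kwargs):
    """Return (result, peak MiB allocated) for a single call.

    Sketch of the measurement idea: reset the peak counter,
    run the call, and read back the high-water mark.
    """
    tracemalloc.start()
    tracemalloc.reset_peak()
    result = fn(*args, **kwargs)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 2**20

# Example: peak memory of materializing a ~1 MB buffer.
buf, peak_mib = peak_memory_mib(bytearray, 1_000_000)
```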
### Peak memory during `get_traces()`

| Scenario | Output size | `tile_pregenerated` | `on_the_fly` | `on_the_fly` overhead |
| --- | --- | --- | --- | --- |
| 32ch, 1000 samples | 0.1 MB | 0.1 MB | 4.0 MB | 32x |
| 32ch, 30000 samples (1s) | 3.7 MB | 3.7 MB | 11.0 MB | 3x |
| 384ch, 1000 samples | 1.5 MB | 1.5 MB | 45.6 MB | 31x |
| 384ch, 30000 samples (1s) | 43.9 MB | 43.9 MB | 131.8 MB | 3x |
| 384ch, 90000 samples (3s) | 131.8 MB | 131.8 MB | 219.7 MB | 1.7x |
| 384ch, 1800000 samples (1min) | 2636.7 MB | 2636.7 MB | 2724.6 MB | 1.0x |
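As a sanity check on the table, the `tile_pregenerated` column is exactly the requested output size, channels × samples × itemsize, assuming the default float32 samples (the 4-byte constant below is that assumption, not something read from the code):

```python
BYTES_PER_SAMPLE = 4  # assumption: default float32 noise dtype

def output_size_mib(num_channels, num_samples):
    """Size in MiB of the array that get_traces() must return."""
    return num_channels * num_samples * BYTES_PER_SAMPLE / 2**20

# These reproduce the tile_pregenerated column above.
print(round(output_size_mib(384, 30_000), 1))     # 43.9
print(round(output_size_mib(384, 1_800_000), 1))  # 2636.7
```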
### Speed

| Scenario | `tile_pregenerated` | `on_the_fly` | Slowdown |
| --- | --- | --- | --- |
| 32ch, 1000 samples | 0.01 ms | 9.52 ms | 1087x |
| 32ch, 30000 samples (1s) | 0.57 ms | 17.63 ms | 31x |
| 384ch, 1000 samples | 0.09 ms | 111.43 ms | 1281x |
| 384ch, 30000 samples (1s) | 26.46 ms | 244.12 ms | 9x |
| 384ch, 90000 samples (3s) | 78.40 ms | 515.28 ms | 7x |
| 384ch, 1800000 samples (1min) | 1602.41 ms | 8144.07 ms | 5x |
`tile_pregenerated` is better overall. It allocates exactly the output array during `get_traces()`, nothing more, and is 5x to roughly 1300x faster depending on the scenario. There are two places where `on_the_fly` has an advantage:
- **Zero initialization cost.** `on_the_fly` allocates nothing at init, which is useful for serialization and dump/load. This has a simple solution: we can delay the generation of the tile until the first `get_traces()` call and cache it. That would give us zero init cost with the runtime performance of `tile_pregenerated`. We should do this regardless of which way we decide to go.
- **Non-repeating noise across blocks.** `on_the_fly` seeds each block with `(seed, block_index)`, so different blocks produce genuinely different noise. This might be necessary for simulated data. I don't see a simple fix: numpy's RNG doesn't support seeking, so reproducibility requires generating the full block even for small slices. The overhead is worst when the requested trace is small relative to the noise block (default 30000 samples, 1 second at 30kHz), which is the typical preprocessing chunk size (3x memory, 9x speed). For larger reads the overhead shrinks (see the 1-minute row) but the speed cost never disappears.
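The lazy-tile idea from the first point can be sketched as follows (class and attribute names are illustrative, not the actual `NoiseGeneratorRecording` internals):

```python
import numpy as np

class LazyTileNoise:
    """Sketch: defer tile generation to the first get_traces() call.

    Nothing is allocated at __init__, so construction and
    serialization stay free; the first read pays the one-time
    generation cost and later reads reuse the cached tile.
    """

    def __init__(self, num_channels, tile_samples=30_000, seed=0):
        self.num_channels = num_channels
        self.tile_samples = tile_samples
        self.seed = seed
        self._tile = None  # generated lazily, then cached

    @property
    def tile(self):
        if self._tile is None:
            rng = np.random.default_rng(self.seed)
            self._tile = rng.standard_normal(
                (self.tile_samples, self.num_channels), dtype="float32"
            )
        return self._tile

    def get_traces(self, start_frame, end_frame):
        # Wrap indices into the tile; the same block repeats forever.
        frames = np.arange(start_frame, end_frame) % self.tile_samples
        return self.tile[frames]
```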
Now that we are separating testing from simulation, tiling is the clear choice for the testing side. The open question is whether simulation needs non-repeating noise. @samuelgarcia @cwindolf @alejoe91 @chrishalcrow, does repeating the same noise block affect simulation quality or introduce artifacts in downstream analysis, or is a large enough block size indistinguishable from non-repeating noise in practice? I think we should leave the simulation side with `on_the_fly` as that is the current default.
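For reference in that discussion, the non-repeating behavior comes from deriving a fresh generator per block. A minimal sketch of that seeding scheme (not the actual `on_the_fly` code):

```python
import numpy as np

def block_noise(seed, block_index, num_samples, num_channels):
    """Reproducible, non-repeating noise for one block.

    Seeding with (seed, block_index) makes every block different
    while keeping the whole recording deterministic. The catch shown
    in the benchmarks: even a tiny slice inside a block forces the
    full block to be generated, because the RNG stream cannot seek.
    """
    rng = np.random.default_rng([seed, block_index])
    return rng.standard_normal((num_samples, num_channels), dtype="float32")
```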