
Instability with SC2 in dense recordings #4150

@b-grimaud


Sorry to bring this up again; I'm fairly sure this is only an issue on recordings with a few thousand units.

I hadn't kept up with new commits for a while, and when trying to run sorting again on the latest version, it systematically crashes.

The weird part is that the crash point is not always reproducible. Whether the script is started from a local session, over SSH, or in a tmux server also seems to have an influence.

The most common crash points are:

  • Creating the mask array in compute_similarity_with_templates_array (here); a rough size estimate follows this list
  • During the main loop of _compute_similarity_matrix_numba (here)
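
For scale, here is a rough back-of-envelope of why a dense pairwise array can reach this size. The shapes below (template count, channel count, shift count) are purely illustrative assumptions on my part, not the actual shapes used by compute_similarity_with_templates_array:

```python
# Hypothetical back-of-envelope: memory footprint of dense pairwise arrays.
# All shapes are illustrative assumptions, not SpikeInterface's actual layout.
num_templates = 3000  # "a few thousand units"
num_channels = 384    # e.g. a Neuropixels-like probe
num_shifts = 5        # lag steps considered when comparing templates

# A boolean mask over every (template, template, channel) triple:
mask_bytes = num_templates * num_templates * num_channels
print(f"bool mask:         {mask_bytes / 1e9:.1f} GB")  # ~3.5 GB

# A float32 intermediate of the same shape (e.g. per-channel distances):
dist_bytes = mask_bytes * 4
print(f"float32 distances: {dist_bytes / 1e9:.1f} GB")  # ~13.8 GB

# A handful of shifts lands in the observed 50-60 GB range:
print(f"with shifts:       {dist_bytes * num_shifts / 1e9:.1f} GB")  # ~69 GB
```

Even if the actual implementation avoids some of these intermediates, a single dense temporary of roughly this shape would match the spike I'm seeing.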

Looking at resource usage, the pattern is always the same: there are several Python processes (as many as n_jobs), each using a similar amount of memory, plus another Python process using a lot more (20-30 GB). That amount then rapidly climbs to ~50-60 GB and the session crashes. I'm not entirely sure this is an OOM error, because that amount still fits in memory. So either it rises so fast that the session crashes before the display has time to update, or something else is happening.
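
One way to check whether this is actually an OOM kill is to log per-process RSS at a high rate from a separate terminal, so the last sample survives the crash. A minimal sketch with psutil (the log path and polling interval are arbitrary choices):

```python
# Minimal memory logger: run in a separate terminal while sorting runs.
# Polls the RSS of every Python process; the log survives if the session dies.
import time
import psutil

LOG_PATH = "mem_log.txt"  # arbitrary choice

with open(LOG_PATH, "w") as log:
    while True:
        total = 0
        for proc in psutil.process_iter(["name", "memory_info"]):
            try:
                if "python" in (proc.info["name"] or "").lower():
                    total += proc.info["memory_info"].rss
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue  # process exited between listing and reading
        log.write(f"{time.time():.1f}\t{total / 1e9:.2f} GB\n")
        log.flush()  # make sure the last line hits disk before any crash
        time.sleep(0.2)  # fast enough to catch a rapid spike
```

If the kernel OOM killer is involved, something like dmesg | grep -i oom right after the crash should also show it.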

As a side note, _get_optimal_n_jobs works properly and n_jobs is scaled to memory appropriately. I've noticed it is only used when templates_from_svd is set to False, but I'm not quite sure what the difference is.
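
For illustration, this is the kind of memory-aware cap I would expect such a helper to apply; it is only a sketch of the idea with a made-up per_job_bytes estimate, not the actual _get_optimal_n_jobs implementation:

```python
# Hypothetical sketch of memory-aware n_jobs scaling; this shows the idea,
# not SpikeInterface's actual _get_optimal_n_jobs implementation.
import os
import psutil

def optimal_n_jobs(per_job_bytes, requested_n_jobs=None):
    """Cap n_jobs so the estimated total footprint fits in available RAM."""
    if requested_n_jobs is None:
        requested_n_jobs = os.cpu_count() or 1
    available = psutil.virtual_memory().available
    # Leave some headroom for the parent process and other consumers.
    fits = max(1, int(0.8 * available // per_job_bytes))
    return min(requested_n_jobs, fits)

# e.g. if each worker is expected to hold ~2 GB of templates:
print(optimal_n_jobs(per_job_bytes=2e9))
```

That said, since the large allocation here seems to happen in a single parent process rather than in the workers, capping n_jobs alone would presumably not prevent this crash.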
