Sorry to bring this up again, as I'm pretty sure this is only an issue on recordings with a few thousand units.
I did not keep up with new commits for a while, and when trying to run sorting again on the latest version, it systematically crashes.
The weird thing is that the point at which it crashes is not always reproducible. Starting the script from a local session, over SSH, or in a tmux server also seems to have an influence.
The most common crash points are:
- Creating the `mask` array in `compute_similarity_with_templates_array` (here)
- During the main loop of `_compute_similarity_matrix_numba` (here)
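For context on why "a few thousand units" matters here, a back-of-envelope estimate shows how quickly dense pairwise arrays grow quadratically with the template count. The shapes and dtypes below are illustrative assumptions, not SpikeInterface's actual internal layout:

```python
# Rough memory estimate for dense pairwise template arrays.
# All shapes here are hypothetical, chosen only to show the quadratic growth.
n_templates = 4000   # "a few thousand units"
n_channels = 384     # hypothetical probe channel count

# A float32 pairwise similarity matrix stays small:
pairwise_bytes = n_templates * n_templates * 4
print(f"pairwise float32: {pairwise_bytes / 1e9:.1f} GB")   # 0.1 GB

# But a boolean mask with an extra per-channel axis blows up fast:
mask_bytes = n_templates * n_templates * n_channels
print(f"per-channel bool mask: {mask_bytes / 1e9:.1f} GB")  # 6.1 GB
```

If the mask (or an intermediate buffer derived from it) also carries a per-sample or per-shift axis, the total would multiply again, which could plausibly produce the tens-of-GB spike described below.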
Looking at the resource usage, the pattern is always the same: there are several Python processes (as many as `n_jobs`) with an equal amount of memory usage each, and another Python process using a lot more memory (20-30 GB). That amount then rapidly climbs to ~50-60 GB, at which point the session crashes. I'm not entirely sure this is an OOM error, because that amount is still small enough to fit in memory. So either it rises so fast that the session crashes before the display has time to update, or something else is happening.
As a side note, `_get_optimal_n_jobs` works properly and `n_jobs` is scaled to the available memory appropriately. I've noticed that it is only used when `templates_from_svd` is set to `False`, but I'm not quite sure what the difference is.
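For reference, the kind of memory-aware worker scaling described above can be sketched roughly as follows. The function name, heuristic, and parameters here are illustrative assumptions, not SpikeInterface's actual `_get_optimal_n_jobs` implementation:

```python
def scale_n_jobs(requested_n_jobs: int, per_job_bytes: int, available_bytes: int) -> int:
    """Cap the worker count so that the combined per-job memory fits
    in what's available. A hypothetical sketch, not the library's code."""
    # How many jobs fit in memory, with a floor of one job.
    fit = max(1, int(available_bytes // per_job_bytes))
    return min(requested_n_jobs, fit)


# Example: 16 requested workers, 8 GB each, 32 GB free -> capped to 4.
print(scale_n_jobs(16, 8_000_000_000, 32_000_000_000))  # 4
```

The point of such a cap is exactly the failure mode in this report: if one code path bypasses the scaling (e.g. the `templates_from_svd=True` branch), the workers plus the parent process can together exceed available memory even though each process looks reasonable on its own.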