cuda error in extract_spikes #316

Closed
rajatsaxena opened this issue Feb 1, 2021 · 18 comments · Fixed by #595

Comments

@rajatsaxena
Contributor

rajatsaxena commented Feb 1, 2021

I get the following error:

799.31 sec, 5601 batches, 7512789 spikes 
813.56 sec, 5701 batches, 7529466 spikes 
827.84 sec, 5801 batches, 7541652 spikes 
842.04 sec, 5901 batches, 7551935 spikes 
856.31 sec, 6001 batches, 7560302 spikes 
870.63 sec, 6101 batches, 7567715 spikes 
884.90 sec, 6201 batches, 7574967 spikes 
899.16 sec, 6301 batches, 7582074 spikes 
913.43 sec, 6401 batches, 7588447 spikes 
927.68 sec, 6501 batches, 7594498 spikes 
Error using gpuArray/subsasgn
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid configuration argument

Error in extract_spikes (line 97)
    st(5,:) = cF;

Error in main_kilosort3_swil3 (line 40)
[rez, st3, tF]     = extract_spikes(rez);

It seems like in a random batch, the output variable st has the shape 4 x N rather than 6 x N as expected from spikedetector3PC. I will have to go through the mex code to understand why there is a shape mismatch. Let me know if you need any more information.

@marius10p
Contributor

That looks like an uninformative CUDA error. Does it always stop in the same place? Can you try to see what's different about that batch? You might also need to upgrade your Nvidia drivers. What is your GPU?

@rajatsaxena
Contributor Author

It does always stop at the same place. I couldn't find anything weird with the batch, except that the batch before it had variable st with size 6 x 0. I tried skipping this batch and running through a different set of batches and got the same error with the same pattern, i.e., the previous batch had variable st with size 6 x 0. The two erroneous batches have sizes 6 x 23 and 6 x 13.

I have updated the drivers. GPU = Nvidia GeForce 1070 Ti

@marius10p
Contributor

It's probably the batch with 0 spikes where it errored (CUDA errors come up asynchronously, on the next GPU operation). And it's probably 0 spikes because something went wrong inside the kernel, not because there are really 0 spikes (can you check, do you see spikes in that batch?).

@rajatsaxena
Contributor Author

yeah, I see spikes in that batch.

@AlexSonneborn

Hi @rajatsaxena, have you made any progress with this in the last couple of days? I just got Kilosort 3 and am also having this problem. It only occurs with certain recordings though, maybe because of a lack of spikes in a batch?

@shirquinn

Hi @rajatsaxena @AlexSonneborn, any news? I'm also encountering this error.

@DradeAW

DradeAW commented Feb 9, 2021

Hi,

I'm also having the same issue.
Here is my output:

>> main_kilosort3
Looking for data inside /users/nsr/wyngaard/Documents/cells_tracking/0153/session_1/rec 
Time   0s. Computing whitening matrix.. 
Getting channel whitening matrix... 
Channel-whitening matrix computed. 
Time   0s. Loading raw data and applying filters... 
Time  13s. Finished preprocessing 147 batches. 
vertical pitch size is 1.250000e+01 
horizontal pitch size is 250 
  -25.0000  104.1667  233.3333  362.5000  491.6667  620.8333  750.0000

    39

0.04 sec, 1 batches, 109 spikes 
2.80 sec, 101 batches, 10689 spikes 
4.05 sec, 147 batches, 36229 spikes 
time 10.70, Shifted up/down 147 batches. 
0.03 sec, 1 batches, 236 spikes 
Error using gpuArray/subsasgn
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid configuration argument

Error in extract_spikes (line 97)
    st(5,:) = cF;

Error in main_kilosort3 (line 40)
[rez, st3, tF]     = extract_spikes(rez);

@bryzgalovdm
Contributor

Hello,

I had the same error because the number of channels I specified did not match the actual number in the recording (I specified 68 channels for a 64-channel recording).
In case it helps someone: check that your inputs (nChan, groups) match your actual data.
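
A rough way to sanity-check the channel count before launching a sort is to see whether the file size is a whole number of frames. The sketch below is standalone Python, not part of Kilosort, and it assumes the recording is a raw int16 binary (2 bytes per sample) with no header; the function names are made up for illustration.

```python
def samples_per_channel(file_size_bytes, n_chan_tot, bytes_per_sample=2):
    """Return the number of time samples implied by the file size, or None
    if the size is not a whole number of (n_chan_tot x sample) frames."""
    frame_bytes = n_chan_tot * bytes_per_sample  # one time point across all channels
    if n_chan_tot <= 0 or file_size_bytes % frame_bytes != 0:
        return None
    return file_size_bytes // frame_bytes


def plausible_channel_counts(file_size_bytes, candidates, bytes_per_sample=2):
    """Keep only the candidate channel counts that divide the file evenly."""
    return [n for n in candidates
            if samples_per_channel(file_size_bytes, n, bytes_per_sample) is not None]


if __name__ == "__main__":
    # Hypothetical example: 30000 samples of a 64-channel int16 recording.
    # For a real file, use os.path.getsize(path) instead.
    size = 30000 * 64 * 2
    print(samples_per_channel(size, 64))  # 30000
    print(samples_per_channel(size, 68))  # None
    print(plausible_channel_counts(size, [32, 64, 68, 384, 385]))  # [32, 64, 384]
```

Passing this check is necessary but not sufficient: in the example, 32 and 384 also divide the file evenly, so the check can rule out a wrong value such as 68 but cannot confirm the right one.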

@DradeAW

DradeAW commented Feb 12, 2021

Ah yes, I forgot to change ops.NchanTOT (which was still at 385 instead of 64).

Thank you @bryzgalovdm !

@sujayane

I still have this problem with ops.NchanTOT=384 (i.e., the right number of channels for an NP probe). @rajatsaxena @AlexSonneborn - did you solve this by correcting the channel count?

@sujayane

My problem was also the number of channels after all. ops.NchanTOT should be 385 (384 neural channels + 1 sync channel), as it was originally in the config file.

@JoseGuzman
Contributor

JoseGuzman commented Mar 16, 2021

I also see this problem with a correct ops.NchanTOT. It happens when Kilosort3 and Kilosort2.5 find a long period without spikes (like blank periods at the beginning of the recording, see #358), but also in recordings where such silent periods occur for some other reason.

0.03 sec, 1 batches, 0 spikes 
Error using gpuArray/subsasgn
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid configuration argument

Error in extract_spikes (line 97)
    st(5,:) = cF;

Error in main_kilosort3 (line 49)
[rez, st3, tF]     = extract_spikes(rez);

I can get rid of it in Kilosort 2.5 if I add ops.nblocks=0 to my config file.

@gawygawy

Hi, I also find that if I turn off the registration I can avoid this error for certain recordings. Why is that?

@JoseGuzman
Contributor

JoseGuzman commented Mar 25, 2021

I'm not very sure about it but pull #288 suggests a solution that may work when you have zeros in some channels...

@shirquinn

Even with nblocks = 0, and even with the #288 solution, I keep getting this error with Kilosort 3.
I record with a 32-channel linear NeuroNexus electrode. If anyone has more suggestions, please share.

@RishiRajalingham

@gawygawy @shirquinn This is my guess: this error occurs when there are no spikes in a registration block in a batch. This could be because the batch contains a period of time when there are no spikes (e.g. paused streaming with zero padding), or because none of the electrodes in that block are measuring spiking activity (e.g. all channels in that block sitting out of cortex).

A fix for the former is to make your batch size bigger, or to remove those batches beforehand (as in #288).
A fix for the latter is to make the registration blocks bigger (make ops.nblocks smaller).
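
For the zero-padding case, one way to locate suspect batches ahead of time is to scan the raw file for batches in which every sample has the same value. This is a minimal standalone Python sketch, not Kilosort code. It assumes a raw int16 file with channels interleaved per time point and a platform where a C short is 2 bytes; `find_flat_batches` and its parameters are hypothetical names. It only catches constant stretches such as zero padding, not batches that contain noise but no spikes.

```python
import array


def find_flat_batches(path, n_chan, batch_samples):
    """Return indices of batches in which every int16 sample has the same value."""
    batch_values = n_chan * batch_samples  # int16 values in one full batch
    flat = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(batch_values * 2)  # 2 bytes per int16 sample
            if not chunk:
                break
            usable = len(chunk) - (len(chunk) % 2)  # ignore a trailing odd byte
            data = array.array("h")
            data.frombytes(chunk[:usable])
            # A batch is "flat" when its minimum equals its maximum.
            if len(data) > 0 and min(data) == max(data):
                flat.append(index)
            index += 1
    return flat
```

Calling `find_flat_batches("recording.bin", 385, 65600)` (file name and sizes are placeholders) returns a list of zero-based batch indices. The batch boundaries here are simple consecutive chunks and may not line up exactly with how Kilosort splits the data, so treat the indices as approximate.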

@jingjie-li

I also got this kind of error with my data, but setting nblocks = 0 did not solve it.
The problem is that I have a short batch with firing_rate=0. One way to solve this is to make the batches larger.
To do that, we set a breakpoint at line 15 of https://github.com/MouseLand/Kilosort/blob/main/preProcess/preprocessDataSub.m

and ran ops.NT = ops.NT*2.
This doubles the batch size, which avoided the error in my data.
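
The arithmetic behind the doubling can be sketched in standalone Python. This is an illustration, not Kilosort code. It assumes the stock config values of `ops.ntbuff = 64` and `ops.NT = 64*1024 + ops.ntbuff`, and the config comment that NT should be a multiple of 32 plus ntbuff; check both against your own config file. The 30 kHz sampling rate and the function names are hypothetical.

```python
def scaled_batch_size(nt, ntbuff, factor=2):
    """Scale the non-buffer part of the batch size, then add the buffer back."""
    core = nt - ntbuff
    if core <= 0 or core % 32 != 0:
        raise ValueError("expected nt to have the form 32*k + ntbuff")
    return factor * core + ntbuff


def batch_duration_s(nt, fs):
    """Length of one batch in seconds at sampling rate fs (Hz)."""
    return nt / fs


if __name__ == "__main__":
    ntbuff = 64                 # assumed stock value
    nt = 64 * 1024 + ntbuff     # assumed stock value, 65600 samples
    fs = 30000                  # hypothetical sampling rate
    print(nt * 2)                             # 131200, plain doubling as in the comment above
    print(scaled_batch_size(nt, ntbuff))      # 131136, doubles only the non-buffer part
    print(batch_duration_s(nt, fs), batch_duration_s(nt * 2, fs))
```

Under these assumptions both results keep the "multiple of 32 plus ntbuff" form, because ntbuff is itself a multiple of 32, so plain doubling is fine here. Larger batches also need more GPU memory.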

@Rubinsteinlab

@celelion, do you just add ops.NT = ops.NT*2 at line 15? Could you send me the code so I can see?
Thank you
