CUDA error was: invalid configuration argument #274

Closed
claysmyth opened this issue Nov 9, 2020 · 11 comments · Fixed by #595

Comments

@claysmyth

I'm having trouble tracking down the following error. I think it has something to do with an inappropriate ops.Nfilt value, so I will set that to 1024. Any insight would be appreciated.


Error using gpuArray/nan
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid configuration argument

Error in median (line 71)
y = nan(s,'like',x);

Error in learnTemplates (line 248)
toc, ibatch, niter, Nfilt, sum(nsp), median(mu), numel(st0), ndrop)

Error in learnAndSolve8b (line 35)
rez = learnTemplates(rez, rez.iorig);

Error in main_kilosort (line 46)
rez = learnAndSolve8b(rez, iseed);

Error in run (line 91)
evalin('caller', strcat(script, ';'));

@claysmyth
Author

I see that #231 reports the same problem, but I cannot use the same workaround of cutting out the sparse portions of the recording. For now I will comment out the print statement, since it does not seem crucial to the sort, but any insight would still be welcome.

@claysmyth
Author

Now I'm running into this error:

```
Error using gpuArray/subsref
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid configuration argument

Error in learnTemplates (line 228)
W(:,Nfilt + [1:size(dWU0,3)],:) = W0(:,ones(1,size(dWU0,3)),:);
% initialize temporal components of waveforms

Error in learnAndSolve8b (line 35)
rez = learnTemplates(rez, rez.iorig);

Error in main_kilosort (line 46)
rez = learnAndSolve8b(rez, iseed);

Error in run (line 91)
evalin('caller', strcat(script, ';'));
```

@marius10p
Contributor

Can you please confirm that your path points only to Kilosort 2.5 and that you compiled the mex files successfully? Also, please paste the output of "gpuDevice(1)" in MATLAB here.

@RobertoDF

Hi Marius,
I ran into exactly the same two errors. I compiled the mex files successfully, and Kilosort 2.0 is not on the path.

This is the output of gpuDevice():
      
                      Name: 'Quadro RTX 4000'
                     Index: 1
         ComputeCapability: '7.5'
            SupportsDouble: 1
             DriverVersion: 11
            ToolkitVersion: 10.1000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 8.5899e+09
           AvailableMemory: 6.9117e+09
       MultiprocessorCount: 36
              ClockRateKHz: 1545000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


This is the ops file:
                 rootZ: 'F:/ecephys_output/GG_M608__g0_t40,44/catgt_GG_M608__g0/GG_M608__g0_imec0/imec0_ks2'
              datafile: 'F:/ecephys_output/GG_M608__g0_t40,44/catgt_GG_M608__g0/GG_M608__g0_imec0/GG_M608__g0_tcat.imec0.ap.bin'
                ntbuff: 64
              AUCsplit: 0.9000
              nSkipCov: 25
                useRAM: 0
                fshigh: 300
                 minFR: 0
                 nskip: 25
                LTseed: 1
                trange: [0 Inf]
            nNeighbors: 32
             scaleproc: 200
                    NT: 65600
              momentum: [20 400]
               reorder: 1
               chanMap: 'chanMap.mat'
                    Th: [10 4]
                 fproc: 'F:/kilosort_datatemp/temp_wh.dat'
                 ThPre: 8
               CSBseed: 1
                   lam: 10
             sigmaMask: 30
          nfilt_factor: 4
        whiteningRange: 32
                  gain: 2.3438
    minfr_goodchannels: 0.0400
                  nPCs: 3
                   GPU: 1
                 spkTh: -6
                 Nchan: 385
              NchanTOT: 385
                    fs: 3.0000e+04
                   sig: 20
               nblocks: 5
               fbinary: 'F:\ecephys_output\GG_M608__g0_t40,44\catgt_GG_M608__g0\GG_M608__g0_imec0\GG_M608__g0_tcat.imec0.ap.bin'


@RobertoDF

RobertoDF commented Nov 24, 2020

After reading other related issues, I think the problem comes from the way ecephys_spike_sorting concatenates trials. Trials are concatenated with zero-padding, so if you concatenate two recordings separated by a 5-minute break, the concatenated file contains 5 minutes with no spiking in the middle, and that seems to cause the error. What would be the best workaround?

My concatenated file has breaks of at most 20 seconds.
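To check whether a concatenated file really contains zero-padded stretches, and roughly how many batches such a gap spans, something like the Python sketch below could help. It assumes the .bin file is raw int16 with NchanTOT = 385 interleaved channels (as in the ops shown earlier), that padded gaps are filled with exact zeros, and that batches advance by NT - ntbuff = 65536 samples; the batch step and all function names here are assumptions for illustration, not Kilosort's own code.

```python
from typing import BinaryIO, Iterable, Iterator, List


def iter_batches(f: BinaryIO, n_chan_tot: int, batch_step: int) -> Iterator[bytes]:
    """Yield consecutive raw chunks of batch_step samples x n_chan_tot int16 channels."""
    bytes_per_batch = n_chan_tot * batch_step * 2  # int16 = 2 bytes per sample
    # iter(callable, sentinel) keeps reading until f.read returns b"" (end of file)
    return iter(lambda: f.read(bytes_per_batch), b"")


def zero_batch_indices(batches: Iterable[bytes]) -> List[int]:
    """Indices of chunks in which every byte, hence every int16 sample, is zero."""
    return [i for i, buf in enumerate(batches) if buf.count(0) == len(buf)]


def find_zero_batches(path: str, n_chan_tot: int = 385, batch_step: int = 65536) -> List[int]:
    """Scan a raw binary file batch by batch without loading it all into memory."""
    with open(path, "rb") as f:
        return zero_batch_indices(iter_batches(f, n_chan_tot, batch_step))


def min_full_batches_in_gap(gap_seconds: float, fs: float = 30000.0,
                            batch_step: int = 65536) -> int:
    """Lower bound on whole, non-overlapping batches lying inside a gap.

    A gap of m * batch_step + r samples contains m or m - 1 whole
    batch-aligned windows, depending on where it starts.
    """
    gap_samples = int(gap_seconds * fs)
    return max(gap_samples // batch_step - 1, 0)
```

Under these assumptions, a 20-second gap at 30 kHz covers at least 8 whole batches (`min_full_batches_in_gap(20)`) and a 5-minute gap at least 136 (`min_full_batches_in_gap(300)`), so even the shorter breaks mentioned here leave several consecutive batches with no spikes at all.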

@claysmyth
Author

@marius10p I can confirm that mexGPUall.m ran successfully and that I am running Kilosort 2.5 (it is the only version I have downloaded):

```
Building with 'nvcc'.
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(65): warning: variable "C0" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(159): warning: variable "NchanUp" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(161): warning: variable "d2" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(230): warning: variable "nt0" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "nt0max" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "NchanMax" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(65): warning: variable "C0" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(159): warning: variable "NchanUp" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(161): warning: variable "d2" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(230): warning: variable "nt0" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "nt0max" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "NchanMax" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(65): warning: variable "C0" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(159): warning: variable "NchanUp" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(161): warning: variable "d2" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(230): warning: variable "nt0" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "nt0max" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "NchanMax" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(65): warning: variable "C0" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(159): warning: variable "NchanUp" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(161): warning: variable "d2" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(230): warning: variable "nt0" was set but never used
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "nt0max" was declared but never referenced
/home/csmyth/matlab_packages/Kilosort/CUDA/spikedetector3.cu(20): warning: variable "NchanMax" was declared but never referenced

MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc-dp'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
Building with 'nvcc'.
MEX completed successfully.
```

When I run gpuDevice(1):

```
CUDADevice with properties:

                  Name: 'TITAN Xp'
                 Index: 1
     ComputeCapability: '6.1'
        SupportsDouble: 1
         DriverVersion: 10.1000
        ToolkitVersion: 10
    MaxThreadsPerBlock: 1024
      MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
           MaxGridSize: [2.1475e+09 65535 65535]
             SIMDWidth: 32
           TotalMemory: 1.2788e+10
       AvailableMemory: 1.2521e+10
   MultiprocessorCount: 30
          ClockRateKHz: 1582000
           ComputeMode: 'Exclusive process'
  GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
      CanMapHostMemory: 1
       DeviceSupported: 1
        DeviceSelected: 1
```

@nmtimme

nmtimme commented Dec 30, 2020

I've run into a similar error, and I agree that it seems to be caused by periods of the recording with no spikes. From my investigation, if mexGetSpikes2 in learnTemplates doesn't find any spikes or templates in ibatch = 1, Nfilt ends up as 0. Something then goes wrong in the later mex functions, which results in a CUDA error when you try to write to W (as shown above) or, in my case, to nsp a few lines later. It seems that additional handling is needed for breaks in the data (e.g. #288) or for recordings with few good neurons, at least to tell the user that the problem is a lack of neurons rather than something else. As always, thanks @marius10p for all your hard work! Our lab really appreciates this resource!
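One plausible mechanism, consistent with this comment but not confirmed anywhere in the thread, is that an empty template set (Nfilt = 0) leads to a zero-sized GPU launch: CUDA rejects a kernel launch whose grid has zero blocks with exactly this "invalid configuration argument" error. The hypothetical Python sketch below shows the usual ceil-division launch arithmetic and the kind of early, readable check suggested above; `launch_config` and `check_templates` are illustrative names, not Kilosort functions.

```python
def launch_config(n_elements: int, threads_per_block: int = 1024) -> tuple:
    """Return (blocks, threads) for a 1-D CUDA-style launch over n_elements.

    Ceil-division yields 0 blocks when n_elements == 0, and CUDA rejects a
    zero-sized grid with "invalid configuration argument", so this sketch
    refuses that case explicitly instead of launching.
    """
    if n_elements <= 0:
        raise ValueError("nothing to process: refusing a zero-sized launch")
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, min(threads_per_block, n_elements)


def check_templates(nfilt: int, ibatch: int) -> None:
    """Hypothetical guard: fail early with a readable message when Nfilt == 0."""
    if nfilt == 0:
        raise RuntimeError(
            f"batch {ibatch}: no spikes or templates found (Nfilt = 0); "
            "the data here may be silent or zero-padded"
        )
```

The design choice is simply to check the count on the host before any GPU work is queued, so the user sees "no templates in this batch" instead of a generic CUDA launch failure.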

@ensorpalacios

Hi @marius10p, is this still an issue for anyone? I think I'm running into it too. I record a 15 to 20 minute baseline, apply a drug and wait ~15 to 20 minutes, then record another 15 to 20 minutes. I concatenate the two sessions with CatGT and then run Kilosort2.

  • The error in Kilosort2 is the following:
    Looking for data inside /media/bunaken/Ensor/npx/NNos/EP_NNos_210729_g0
    Time 0s. Determining good channels..
    found 142624 threshold crossings in 108.35 seconds of data
    found 94 bad channels
    Time 41s. Computing whitening matrix..
    Getting channel whitening matrix...
    Channel-whitening filters computed.
    Time 87s. Loading raw data and applying filters...
    Time 1530s. Finished preprocessing 3353 batches.
    Obtained 7 PC waveforms in 3.57 seconds
    time 0.37, pre clustered 1 / 3353 batches
    time 71.56, pre clustered 501 / 3353 batches
    time 142.69, pre clustered 1001 / 3353 batches
    time 177.87, pre clustered 1501 / 3353 batches
    time 204.67, pre clustered 2001 / 3353 batches
    time 254.38, pre clustered 2501 / 3353 batches
    time 319.69, pre clustered 3001 / 3353 batches
    time 0.11, compared 1 / 3353 batches
    time 34.50, compared 501 / 3353 batches
    time 68.45, compared 1001 / 3353 batches
    time 104.24, compared 1501 / 3353 batches
    time 138.71, compared 2001 / 3353 batches
    time 173.01, compared 2501 / 3353 batches
    time 207.65, compared 3001 / 3353 batches
    time 235.45, Re-ordered 3353 batches.
    Time 239s. Optimizing templates ...
    239.60 sec, 1 / 6707 batches, 40 units, nspks: 21.8025, mu: 23.9409, nst0: 419, merges: 0.0000, 0.0000
    273.07 sec, 101 / 6707 batches, 493 units, nspks: 5348.7165, mu: 20.4125, nst0: 6301, merges: 91.7763, 8.7845
    331.39 sec, 201 / 6707 batches, 551 units, nspks: 6910.8232, mu: 17.9800, nst0: 7099, merges: 110.9458, 11.9081
    375.51 sec, 301 / 6707 batches, 410 units, nspks: 1030.8911, mu: 17.2411, nst0: 0, merges: 64.2969, 4.2458
    385.21 sec, 401 / 6707 batches, 88 units, nspks: 4.3247, mu: 14.5654, nst0: 0, merges: 24.2403, 0.5162
    Error using gpuArray/nan
    An unexpected error occurred trying to launch a kernel. The CUDA error was:
    invalid configuration argument

Error in median (line 71)
y = nan(s,'like',x);

Error in learnAndSolve8b (line 264)
toc, ibatch, niter, Nfilt, sum(nsp), median(mu), numel(st0), ndrop)

Error in master_kilosortM (line 53)
rez = learnAndSolve8b(rez);

  • The output of gpuDevice() is:
    ans =
    CUDADevice with properties:
    Name: 'Quadro P4000'
    Index: 1
    ComputeCapability: '6.1'
    SupportsDouble: 1
    DriverVersion: 11.4000
    ToolkitVersion: 11
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 8.5055e+09
    AvailableMemory: 7.7485e+09
    MultiprocessorCount: 14
    ClockRateKHz: 1480000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceAvailable: 1
    DeviceSelected: 1

  • mexGPUall runs successfully, although I also get this:
    Building with 'nvcc'.
    nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    MEX completed successfully.

Does anyone have any tips? Thanks
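The fprintf arguments in the stack trace (toc, ibatch, niter, Nfilt, sum(nsp), median(mu), numel(st0), ndrop) line up with the fields of the progress lines in the log above, so "units" is Nfilt and "nst0" is numel(st0). In this log, Nfilt falls from 551 to 88 while nst0 drops to 0, just before median(mu) fails, which seems consistent with the silent-gap explanation. A small Python sketch for flagging such lines in a saved log is below; the regex is written against the line format shown here and is an assumption that may need adjusting for other Kilosort versions.

```python
import re
from typing import Iterable, List, Tuple

# Matches progress lines like:
# "375.51 sec, 301 / 6707 batches, 410 units, nspks: 1030.8911, mu: 17.2411, nst0: 0, ..."
PROGRESS = re.compile(
    r"(?P<t>[\d.]+) sec, (?P<ibatch>\d+) / (?P<nbatch>\d+) batches, "
    r"(?P<units>\d+) units, nspks: (?P<nspks>[\d.]+), mu: (?P<mu>[\d.]+), "
    r"nst0: (?P<nst0>\d+)"
)


def silent_batches(log_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (ibatch, units) for every progress line that reports nst0 == 0."""
    hits = []
    for line in log_lines:
        m = PROGRESS.search(line)
        if m and int(m.group("nst0")) == 0:
            hits.append((int(m.group("ibatch")), int(m.group("units"))))
    return hits
```

Run on the five "Optimizing templates" lines above, this would flag batches 301 (410 units) and 401 (88 units), i.e. the stretch where the template count collapses.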

@RobertoDF

Hi! You should take a look at issue #275. I have a fork of Kilosort that takes care of that problem. You can also shorten the "pause" with cat_gt: https://billkarsh.github.io/SpikeGLX/help/dmx_vs_gbl/dmx_vs_gbl/ That would also solve the problem, and it is simpler if you are already using the ecephys pipeline.

@ensorpalacios

Hi @RobertoDF, thanks a lot for the help. I'll try your fork of KS 2.5 then.

On a side note, I'm already using cat_gt, but I'm not sure how it would help. I'm running it with the following options:
@echo off
CatGT -dir=X:\Ensor\npx -run=EP_NNos_210805 -g=0 -t=0,1 ^
-prb_fld -t_miss_ok ^
-ap -prb=0 ^
-aphipass=300 -gbldmx -gfix=10,8 ^
-dest=X:\Ensor\npx\EP_NNos_210805_g0
echo done

Thanks
Ensor

@ensorpalacios

@RobertoDF I tried your fork (downloaded and extracted the zip file), but I'm running into the following problems:

  • If I run your fork of KS on data that KS2 handles fine (a continuous 40-minute recording), I get this error:
    Error using gpuArray/nan
    An unexpected error occurred trying to launch a kernel. The CUDA error was:
    invalid configuration argument

Error in median (line 71)
y = nan(s,'like',x);

Error in learnTemplates (line 248)
toc, ibatch, niter, Nfilt, sum(nsp), median(mu), numel(st0), ndrop)

Error in learnAndSolve8b (line 35)
rez = learnTemplates(rez, rez.iorig);

Error in main_kilosort (line 48)
rez = learnAndSolve8b(rez, iseed);

  • If instead I run your version on data with a pause in between (which KS2 fails on, as reported above), KS finishes in 2 minutes for a 40-minute recording and the results look odd: in the attached example from phy, all units look similar.

[Screenshot: phy WaveformView]

Am I doing something obviously wrong when I set up the new KS?

By the way, I'm running this on Ubuntu 20.04, and the KS2 folder is not on my MATLAB path.

Thanks
Ensor


5 participants