CUDA timeout in preprocessing, KS3 #323

Closed
frikyng opened this issue Feb 3, 2021 · 6 comments · Fixed by #595

Comments

@frikyng

frikyng commented Feb 3, 2021

Hi,

When I preprocess Neuropixel phase 3a or NP 1.0 data in KiloSort 3, I get a CUDA error in the KS GUI. The error appears after the screen has lagged for a few minutes, during which I can't do any other work in parallel.

CUDA_ERROR_LAUNCH_TIMEOUT

I looked up the error online, and it seems to occur because my graphics card (Nvidia Quadro M4000) has to serve my screen and KS at the same time. When a KS instruction to the GPU takes longer than 2 seconds to complete, a protocol is triggered that resets the graphics driver and cancels KS. It would be possible to remove the 2-second threshold in the Windows registry (regedit), but this would only alleviate the symptom and not solve the problem, while additionally making the screen response slow.
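For reference, here is a minimal sketch of how the current watchdog settings could be inspected from MATLAB before editing anything. It assumes Windows and assumes the timeout is controlled by the `TdrDelay` and `TdrLevel` values under the `GraphicsDrivers` registry key, as described in Microsoft's TDR documentation. It only reads the values and does not change them.

```matlab
% Sketch: read the Windows TDR (Timeout Detection and Recovery) settings.
% Windows-only. winqueryreg throws an error when a value does not exist,
% which is treated here as "not set", i.e. the OS default applies.
key   = 'SYSTEM\CurrentControlSet\Control\GraphicsDrivers';
names = {'TdrDelay', 'TdrLevel'};

for i = 1:numel(names)
    try
        val = winqueryreg('HKEY_LOCAL_MACHINE', key, names{i});
        fprintf('%s = %d\n', names{i}, val);
    catch
        fprintf('%s is not set (Windows default applies)\n', names{i});
    end
end
```

If `TdrDelay` is not set, the default timeout is in effect, which would match the 2-second limit described above.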

What I have tried so far:

  • run the mexGPUall script again
  • checked compatibility of compilers and CUDA (which all ran KS 2 well)
  • tried a small data set to check if it's due to the size of the input

I have seen that when KS 3 is preprocessing data, the graphics card is occupied only periodically. Although KS 3 has a new spike detection algorithm, it looks like it processes chunks like KS 2. A colleague of mine can run KS 3 fine with a Quadro P4000 without any frame rate drops, and that card is only slightly better than my M4000.

@marius10p
Contributor

Can you please provide the command line output? Something seems to be going wrong.

In the past we've had to disable that watchdog, but I think current Nvidia drivers automatically disable it or circumvent it somehow. Maybe it's different for Quadro cards. There shouldn't be a disadvantage to disabling the watchdog, and your screen response should not become slow.
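One way to see whether the watchdog applies to a given card is to query the device from MATLAB. This is a sketch, assuming the Parallel Computing Toolbox is installed. `KernelExecutionTimeout` is a property of the object returned by `gpuDevice`; the messages printed are illustrative.

```matlab
% Sketch: ask MATLAB whether the OS can cancel long-running kernels
% on the currently selected GPU.
d = gpuDevice;   % returns the currently selected GPU device
fprintf('GPU: %s\n', d.Name);

if d.KernelExecutionTimeout
    disp('Kernel execution timeout is active: long kernels can be cancelled by the OS.');
else
    disp('No kernel execution timeout reported for this device.');
end
```

A value of 1 would be consistent with the `CUDA_ERROR_LAUNCH_TIMEOUT` reported in this issue.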

@frikyng
Author

frikyng commented Feb 5, 2021

First it gives me this warning, which repeats until it fills nearly the entire command window:

CUDA_ERROR_LAUNCH_TIMEOUT 
> In standalone_detector (line 11)
  In datashift2 (line 40)
  In ksGUI/runPreproc (line 726)
  In ksGUI/runAll (line 627)
  In ksGUI>@(~,~)obj.runAll() (line 319) 

Then it throws this error:

Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
the launch timed out and was terminated

Error in ksFilter (line 15)
    dataRAW = gpuArray(buff);

Error in ksGUI/updateDataView (line 867)
                    datAllF = ksFilter(datAll, obj.ops);

Error in ksGUI/dataClickCB (line 1403)
            obj.updateDataView;

Error in ksGUI>@(f,k)obj.dataClickCB(f,k) (line 385)
            set(obj.H.dataAx, 'ButtonDownFcn', @(f,k)obj.dataClickCB(f, k));

Error using ksGUI/log (line 1588)

Error while evaluating Axes ButtonDownFcn.

My CUDA version is 10.2. My colleague who can run KS 3 without any issues has a Quadro P4000. Maybe mine is just a bit underpowered to handle the task?

@marius10p
Contributor

8GB of gpu RAM is more than enough.

I forgot to mention, but you also need to install the specific version of CUDA that your Matlab version requires: https://www.mathworks.com/help/parallel-computing/gpu-support-by-release.html;jsessionid=f68ff768914bd294d61356fc7d1d
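To compare an installation against that table, the versions MATLAB actually sees can be printed as below. This is a sketch, assuming the Parallel Computing Toolbox; it reports what `gpuDevice` exposes and does not itself decide whether the combination is supported.

```matlab
% Sketch: print the MATLAB release and the CUDA versions reported by gpuDevice.
d = gpuDevice;
fprintf('MATLAB release:           R%s\n', version('-release'));
fprintf('CUDA toolkit used by MATLAB: %g\n', d.ToolkitVersion);
fprintf('CUDA version of driver:      %g\n', d.DriverVersion);
fprintf('Compute capability:          %s\n', d.ComputeCapability);
```

The toolkit version printed here is the one to match when installing CUDA for compiling the mex files.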

@frikyng
Author

frikyng commented Feb 19, 2021

Yeah, like I mentioned before, colleagues of mine get by easily with a P4000, which has nearly the same specs. I noticed that there were 5 (!) versions of CUDA installed on the PC, so I removed all of them and left only CUDA 10.0. I am working with Matlab version 2019b, so it should be the right one.
Unfortunately, this hasn't alleviated the problem, and KS crashed with the same error when I tried it again.

I am attaching a screenshot from the task manager where KS 3 is preprocessing NP data. You can see how it ramps up to 100% on every chunk that is processed.

[Screenshot: KS3_newCuda]

@marius10p
Contributor

The memory usage is stable; that's just the utilization ramping up. Your GPU really is up to the task, but there must be something wrong with its configuration. Have you updated the Nvidia drivers? This is separate from CUDA. In cases like this I would just start over from scratch by uninstalling and reinstalling Visual Studio, CUDA and Matlab, in that order.
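A quick way to check the basic GPU setup independently of Kilosort is a small smoke test like the one below. This is a sketch, assuming the Parallel Computing Toolbox; the matrix size is arbitrary, and passing it only shows that plain `gpuArray` operations work, not that the compiled mex files are correct.

```matlab
% Sketch: minimal GPU smoke test that does not involve Kilosort.
g = gpuDevice;                         % currently selected GPU
x = rand(2000, 'single', 'gpuArray');  % allocate random data on the GPU
y = x * x;                             % run a matrix multiply on the GPU
wait(g);                               % block until the GPU has finished
fprintf('GPU matmul finished, checksum %g\n', gather(sum(y(:))));
```

If even this fails with a CUDA error, the problem likely lies in the driver or MATLAB configuration rather than in Kilosort.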

@frikyng
Author

frikyng commented Mar 4, 2021

I reinstalled CUDA and VisualStudio but still have the same issue. However, the throttling pattern of the CPU changed, which shows that something changed under the hood (screenshot attached).
Here is my current configuration:

  • Matlab 2019b (2020a is also installed)
  • CUDA 10.1 (tried to run KS3 with freshly installed 10.0 as well)
  • VisualStudio 2017 with the "windows" and three "other toolsets" workloads installed

Does that look alright?
[Screenshot: KS3_newCuda(10 1)+newVisuaStudio]
