CUDA error in Kilosort3: illegal address during spike sorting (template matching on binary file) #380

Closed
Tywang-720 opened this issue Apr 21, 2021 · 2 comments · Fixed by #595

Tywang-720 commented Apr 21, 2021

Hi, I consistently get the following error during spike sorting (while running template matching on the binary file from the Ks3 GUI).
I'm reporting it here in case it helps.

Error info:

In trackAndSort (line 3)
In ksGUI/runSpikesort (line 801)
In ksGUI>@(~,~)obj.runSpikesort() (line 339)
Error using +
Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS

Error in trackAndSort (line 173)
dWU1 = dWU1 + dWU0;

Error in ksGUI/runSpikesort (line 801)
[obj.rez, st3, tF] = trackAndSort(obj.rez);

Error in ksGUI>@(~,~)obj.runSpikesort() (line 339)
'Callback', @(~,~)obj.runSpikesort());

Error while evaluating UIControl Callback.

(The first five lines, from "In trackAndSort" to "CUDA_ERROR_ILLEGAL_ADDRESS", were actually repeated hundreds of times, so I cannot see the messages printed before them. The GUI status message says the program is running template matching on the binary file.)

Data info:

I'm getting this issue with a 20-minute, 1024-channel recording (~77 GB, ~1500 units).

Ks3 works normally on another 20-second, 1024-channel recording (~600 MB, 40 units) with the same channel map.
So I suspect this is related to data size (although I notice that in other issues people are using much larger datasets than these).
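As a rough consistency check on the data size, the sketch below estimates how many batches Kilosort2 would split this recording into. It assumes int16 samples on disk, the default batch size ops.NT = 64*1024 + ops.ntbuff with ops.ntbuff = 64, and that consecutive batches step forward by NT - ntbuff samples. The helper names are hypothetical, and "~77 GB" is read as exactly 77 GiB.

```python
import math

BYTES_PER_SAMPLE = 2   # assumption: raw binary stores int16 samples
NTBUFF = 64            # assumption: default ops.ntbuff

def batch_length(multiplier: int, ntbuff: int = NTBUFF) -> int:
    """Hypothetical helper: NT = multiplier * 1024 + ntbuff."""
    return multiplier * 1024 + ntbuff

def estimate_n_batches(file_bytes: int, n_chan: int, nt: int, ntbuff: int = NTBUFF) -> int:
    """Hypothetical helper: samples per channel divided by the batch step NT - ntbuff, rounded up."""
    n_samples = file_bytes // (n_chan * BYTES_PER_SAMPLE)
    return math.ceil(n_samples / (nt - ntbuff))

file_bytes = 77 * 1024 ** 3  # "~77 GB" read as exactly 77 GiB
print(estimate_n_batches(file_bytes, n_chan=1024, nt=batch_length(64)))  # 616
```

Under these assumptions the estimate is 616 batches, the same count the Kilosort2 log below reports after preprocessing.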

Data preprocessing completes normally.

System Info:

Windows 10 64-bit machine
RAM 64GB
CPU: Intel i9-10900K CPU@3.70GHz
GPU: GeForce RTX 3070

MATLAB R2021a
CUDA 11.0.2_451.48
Visual Studio 2019.

Behavior of other Kilosort versions on the same file:

Kilosort2.5: got an "out of Memory on device" error during preprocessing.

Kilosort2: a similar error during "Main optimization":

Time 0s. Determining good channels..
found 914834 threshold crossings in 288.64 seconds of data
found 0 bad channels
Time 74s. Computing whitening matrix..
Getting channel whitening matrix...
Channel-whitening matrix computed.
Time 83s. Loading raw data and applying filters...
Time 377s. Finished preprocessing 616 batches.
random seed for clusterSingleBatches: 1
Obtained 7 PC waveforms in 6.33 seconds
time 0.73, pre clustered 1 / 616 batches
time 321.08, pre clustered 501 / 616 batches
time 0.28, compared 1 / 616 batches
time 91.58, compared 501 / 616 batches
time 114.35, Re-ordered 616 batches.
Time 136s. Optimizing templates ...
136.94 sec, 1 / 616 batches, 107 units, nspks: 44.2454, mu: 10.0000, nst0: 748, merges: 0.4000, 0.0000

Warning: Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS

In learnTemplates (line 4)
In learnAndSolve8b (line 15)
In ksGUI/runSpikesort (line 789)
In ksGUI>@(~,~)obj.runSpikesort() (line 337)

Tywang-720 (author) commented Apr 22, 2021

Update 2021-04-22

Switching Visual Studio (from 2019 to 2015), updating to the newest NVIDIA driver, and setting the environment variable CUDA_CACHE_MAXSIZE to 4 GB did not help.

After that, I found that decreasing the batch size works (from ops.NT = 64*1024 + ops.ntbuff to 16*1024 + ops.ntbuff).

Kilosort2 now finishes the whole pipeline, though slightly slower than before.
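To put the batch-size change in numbers, here is a rough estimate of the memory taken by a single NT x 1024 batch array. It assumes ops.ntbuff = 64 and that the filtered batch is held in single precision (4 bytes per sample) on the GPU; it ignores every other allocation (templates, projections, CUDA workspace), so it sizes one array only, not the total GPU footprint. The helper names are hypothetical.

```python
NTBUFF = 64          # assumption: default ops.ntbuff
N_CHAN = 1024        # channel count of this recording
BYTES_FLOAT32 = 4    # assumption: batch held as single precision on the GPU
MIB = 1024 ** 2

def nt_for(multiplier: int, ntbuff: int = NTBUFF) -> int:
    """Hypothetical helper: NT = multiplier * 1024 + ntbuff."""
    return multiplier * 1024 + ntbuff

def batch_buffer_mib(nt: int, n_chan: int = N_CHAN) -> float:
    """Size in MiB of one NT x n_chan float32 array (other GPU allocations ignored)."""
    return nt * n_chan * BYTES_FLOAT32 / MIB

for mult in (64, 16):
    nt = nt_for(mult)
    print(f"NT = {mult}*1024 + {NTBUFF} = {nt}: {batch_buffer_mib(nt):.2f} MiB")
# NT = 64*1024 + 64 = 65600: 256.25 MiB
# NT = 16*1024 + 64 = 16448: 64.25 MiB
```

Going from 64*1024 to 16*1024 shrinks each batch array by roughly a factor of four, while the recording is then split into about four times as many batches.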

I haven't tested Kilosort3 yet, but I expect the problem can be solved the same way.

Tywang-720 (author) commented Apr 23, 2021

Update 2021-04-23

I can now confirm that the same fix solves this issue in Ks3.

I also noticed that, with the same parameters (as far as I can tell), Ks3 runs much faster than Ks2 (42 min vs 75 min on the 77 GB, 1024-channel recording mentioned above) and returns fewer good units (231 vs more than 358; inspecting the results, I found that Ks2 returns some noise clusters).

One last thing: I found that Kilosort sometimes discards many true spikes in order to produce good units. When I examine the results in Phy2, I often find that spikes in a good cluster look almost the same as spikes in another MUA unit on the same channel.
