During the calculations global synchronization points are needed, i.e., a kind of GPU-wide ...
But I agree that at least for some "not very time-critical" kernels (e.g. editing operations) not every ...
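A GPU-wide synchronization point does not by itself require a host-side wait: kernels launched into the same stream execute in order, so a kernel boundary already acts as a barrier across the whole grid. Below is a minimal sketch of that idea. The kernels `writePhase` and `readNeighborPhase` and the helper `runTwoPhases` are made up for illustration and are not taken from alien's code.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical phase 1: every thread writes one value.
__global__ void writePhase(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = i;
    }
}

// Hypothetical phase 2: every thread reads a value that was written by a
// thread which may live in a different block, so all of phase 1 has to be
// finished before any thread of phase 2 runs.
__global__ void readNeighborPhase(const int* data, int* result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        result[i] = data[i] + data[(i + n / 2) % n];
    }
}

// Runs both phases and returns the result on the host.
std::vector<int> runTwoPhases(int n)
{
    assert(n > 0);

    int* data = nullptr;
    int* result = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&result, n * sizeof(int));

    int const threads = 256;
    int const blocks = (n + threads - 1) / threads;

    // Both launches go to the same (default) stream, so the second kernel
    // starts only after every block of the first one has completed.
    // No cudaDeviceSynchronize() is needed between them.
    writePhase<<<blocks, threads>>>(data, n);
    readNeighborPhase<<<blocks, threads>>>(data, result, n);

    // A blocking cudaMemcpy waits for the preceding work in the default stream.
    std::vector<int> host(n);
    cudaMemcpy(host.data(), result, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(data);
    cudaFree(result);
    return host;
}
```

Calling `runTwoPhases(1000)` from host code returns a vector whose element `i` is `i + (i + 500) % 1000`. Error handling is omitted here to keep the ordering argument visible.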
-
I've now removed the usage of CDP. The performance has improved a bit :)
What I find quite bizarre though is (should be analyzed with a profiler):
alien/source/EngineGpuKernels/SimulationKernels.cu
Lines 108 to 114 in 4bb2783
-
What is the reason for using CUDA Dynamic Parallelism for the main simulation driver (`calcSimulationTimestepKernel`)?
alien/source/EngineGpuKernels/SimulationKernels.cuh
Lines 119 to 140 in 210fdd2
Also, somewhat related, every kernel launch is followed by a `cudaDeviceSynchronize()`. It seems like overkill in most places, or am I wrong?
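One common way to avoid a `cudaDeviceSynchronize()` after every launch while still catching errors is to always check `cudaGetLastError()` and to block the host only in debug builds. The sketch below assumes a hypothetical `SYNC_AFTER_LAUNCH` compile-time flag, a hypothetical `CHECK_LAUNCH()` macro and a toy `incrementAll` kernel; none of these names exist in alien's code.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a message if a CUDA call failed.
inline void checkCuda(cudaError_t err, char const* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Launch errors are always checked. The host blocks after each launch only
// when SYNC_AFTER_LAUNCH (an assumed flag) is defined, which makes
// asynchronous execution errors show up at the launch that caused them.
#ifdef SYNC_AFTER_LAUNCH
#define CHECK_LAUNCH() \
    do { \
        checkCuda(cudaGetLastError(), "kernel launch"); \
        checkCuda(cudaDeviceSynchronize(), "kernel execution"); \
    } while (0)
#else
#define CHECK_LAUNCH() checkCuda(cudaGetLastError(), "kernel launch")
#endif

// Toy kernel standing in for one simulation step.
__global__ void incrementAll(int* values, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        values[i] += 1;
    }
}

// Runs `steps` launches back to back and returns the last element.
int runSteps(int n, int steps)
{
    assert(n > 0 && steps >= 0);

    int* values = nullptr;
    checkCuda(cudaMalloc(&values, n * sizeof(int)), "cudaMalloc");
    checkCuda(cudaMemset(values, 0, n * sizeof(int)), "cudaMemset");

    int const threads = 256;
    int const blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        // Launches in the same stream are ordered, so step s + 1 sees the
        // writes of step s without any host-side synchronization.
        incrementAll<<<blocks, threads>>>(values, n);
        CHECK_LAUNCH();
    }

    // The blocking copy is the only point where the host has to wait.
    int last = 0;
    checkCuda(cudaMemcpy(&last, values + (n - 1), sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");
    checkCuda(cudaFree(values), "cudaFree");
    return last;
}
```

Building with `nvcc -DSYNC_AFTER_LAUNCH` would restore the synchronize-after-every-launch behavior for debugging, while a regular build lets the launches queue up asynchronously. Whether this gives a measurable speedup in a particular code base is something a profiler would have to confirm.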