
GPU under-utilized when waiting for CPU to launch kernel #8818

Closed
panyx0718 opened this issue Mar 7, 2018 · 7 comments

Comments

@panyx0718
Contributor


Currently we run Ops one by one, synchronously. For Ops that the GPU can finish quickly, the CPU is too slow at launching the GPU kernels, so in many cases the GPU is under-utilized.

To mitigate this, we should schedule Ops in parallel (based on dependency information), so that we can better utilize both CPUs and GPUs.
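A minimal sketch of what dependency-based parallel scheduling could look like, assuming a pre-built op graph. This is illustrative only: `Op`, `pending`, and `RunGraphInParallel` are hypothetical names, not PaddlePaddle's actual executor API. Ops whose predecessors have all finished go into a ready queue, and a pool of worker threads drains it:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical op node: the real executor would hold kernels and var deps.
struct Op {
  std::function<void()> run;    // e.g. launch this op's kernel
  std::vector<int> successors;  // ops that consume this op's outputs
  int pending = 0;              // count of unfinished predecessors
};

void RunGraphInParallel(std::vector<Op>& ops, int num_workers) {
  std::mutex mu;
  std::condition_variable cv;
  std::queue<int> ready;        // ops whose inputs are all available
  std::size_t finished = 0;

  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].pending == 0) ready.push(static_cast<int>(i));

  auto worker = [&] {
    for (;;) {
      int id;
      {
        std::unique_lock<std::mutex> lk(mu);
        cv.wait(lk, [&] { return !ready.empty() || finished == ops.size(); });
        if (finished == ops.size()) return;  // whole graph done
        id = ready.front();
        ready.pop();
      }
      ops[id].run();  // runs outside the lock, so independent ops overlap
      {
        std::lock_guard<std::mutex> lk(mu);
        ++finished;
        for (int s : ops[id].successors)
          if (--ops[s].pending == 0) ready.push(s);  // successor became ready
      }
      cv.notify_all();
    }
  };

  std::vector<std::thread> pool;
  for (int i = 0; i < num_workers; ++i) pool.emplace_back(worker);
  for (auto& t : pool) t.join();
}
```

With multiple workers, independent ops (e.g. the SGD updates of different parameters) can be launched concurrently instead of waiting behind each other on a single CPU thread.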

@tonyyang-svail

tonyyang-svail commented Mar 7, 2018

@panyx0718 I believe the GPU kernel is launched asynchronously.

SGDKernel<T><<<grid, block, 0, ctx.cuda_device_context().stream()>>>(
    grad_data, param_data, learning_rate->data<T>(), param->numel(),
    param_out_data);

For many compute-intensive Ops (conv, MatMul, etc.), I find it hard to believe that the CPU, which launches the kernel, is slower than the GPU, which executes it.

@panyx0718
Contributor Author

panyx0718 commented Mar 7, 2018

The GPU kernel is launched asynchronously, but the CPU is not launching them fast enough to keep the GPU full. As the timeline shows, SGD and elementwise_mul are much faster on the GPU than on the CPU.

@tonyyang-svail

tonyyang-svail commented Mar 7, 2018

"SGD and elementwise_mul are much faster on the GPU than on the CPU."

Sure. What percentage of the total time do these ops account for?

@panyx0718
Contributor Author

I don't have exact numbers. Looking at the timeline, I estimate we could gain ~20% if we kept the GPU busy all the time (not just for sgd and elementwise_mul, but for other ops as well).

@chengduoZH
Contributor

GPU under-utilized when waiting for CPU to launch kernel

I don't think launching a kernel costs much time. The GPU is under-utilized because the amount of data each CUDA kernel processes is so small.

The two pictures below illustrate this.

A simple description: assume the GPU has a task queue (in fact, there really is one). In Figure 1, because the amount of data is large, each kernel takes a long time; the GPU may not have finished kernel1 by the time kernel2 is added to the queue, so the GPU timeline is densely packed.

Figure 1:
[figure: CPU launch vs. GPU execution timeline, large kernels]

In Figure 2, however, because the amount of data is small, each kernel takes very little time, and the task queue is empty most of the time.

Figure 2:
[figure: CPU launch vs. GPU execution timeline, small kernels]
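The queue model above can be put in numbers with a back-of-the-envelope simulation. All values here are assumptions for illustration, not measurements: the CPU takes `launch_us` microseconds to launch each kernel, while the GPU pops kernels off its queue and runs each for `kernel_us` microseconds, idling whenever the queue is empty.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of the CPU-launch / GPU-queue interaction described above.
// Returns the fraction of total GPU wall time spent idle, waiting for the
// CPU to enqueue the next kernel.
double GpuIdleFraction(int num_kernels, double launch_us, double kernel_us) {
  double gpu_free = 0.0;  // time at which the GPU finishes its current kernel
  double idle = 0.0;
  for (int i = 0; i < num_kernels; ++i) {
    double enqueued = (i + 1) * launch_us;        // CPU done launching kernel i
    double start = std::max(gpu_free, enqueued);  // GPU may have to wait
    idle += start - gpu_free;
    gpu_free = start + kernel_us;
  }
  return idle / gpu_free;
}
```

With a 5 µs launch cost, a 50 µs kernel (Figure 1's regime) leaves the GPU idle only for the very first launch, while a 1 µs kernel (Figure 2's regime) leaves it idle most of the time even though each individual launch is cheap.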

@panyx0718
Contributor Author

I don't mean that the "launch kernel" call itself takes too long. I mean the GPU sits waiting for the CPU to finish all of its own computation and launch the next kernel before it has anything to do.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it today. If you still need to follow up after it is closed, feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!

Performance Tuning automation moved this from Doing to Done Aug 15, 2018