The sm_efficiency of this raytracing program is just 1.79% #1

WilliamWangPeng · 2021-09-03T07:57:29Z

Hi dear author,
Thank you for this project. I compiled your "raytracing" program successfully and used nvprof to measure sm_efficiency. The render kernel reaches only 1.79%, and the other kernels are similar:

==17595== Metric result:
Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "Tesla P100-PCIE-16GB (0)"
    Kernel: render_init(int, int, curandStateXORWOW*)
          1                             sm_efficiency                   Multiprocessor Activity       1.75%       1.75%       1.75%
    Kernel: rand_init(curandStateXORWOW*)
          1                             sm_efficiency                   Multiprocessor Activity       1.00%       1.00%       1.00%
    Kernel: render(Vec3*, int, int, int, Camera**, Entity**, curandStateXORWOW*)
          1                             sm_efficiency                   Multiprocessor Activity       1.79%       1.79%       1.79%
    Kernel: texture_init(unsigned char*, int, int, ImageTexture**)
          1                             sm_efficiency                   Multiprocessor Activity       1.60%       1.60%       1.60%
    Kernel: create_cornell_box(Entity**, Entity**, Camera**, int, int, ImageTexture**, curandStateXORWOW*)
          1                             sm_efficiency                   Multiprocessor Activity       1.78%       1.78%       1.78%

Thank you,
Best Regards
William


Belval · 2021-09-03T13:54:48Z

That's actually super interesting, but not very surprising. My code only uses a single SM, and the P100 has 56 of them. Some quick back-of-the-napkin math tells us that the one SM we are using is running at 100%, because 100 / 56 = 1.7857, or 1.79%.
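The napkin math can be written down as a tiny helper. This is only a sketch: it assumes nvprof's sm_efficiency is the share of time an SM has at least one active warp, averaged over every SM on the device, and the function name `max_sm_efficiency` is made up for illustration.

```cpp
#include <cassert>

// Upper bound on the device-wide sm_efficiency (in percent) when only
// `active_sms` out of `total_sms` multiprocessors ever receive work.
// Assumption: each busy SM is busy 100% of the time and the idle ones 0%.
double max_sm_efficiency(int active_sms, int total_sms) {
    assert(total_sms > 0);
    assert(active_sms >= 0 && active_sms <= total_sms);
    return 100.0 * active_sms / total_sms;
}
```

With `max_sm_efficiency(1, 56)` this gives about 1.7857, which matches the 1.79% reported for `render` on the 56-SM P100. The lower figures for the other kernels (1.00% to 1.78%) are consistent with one SM that is busy for less than the whole kernel duration.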

Now, an interesting question would be: how can we change the code to use all available SMs? To be clear, I don't have an answer, but as far as I could tell from Googling around, this is not something you want to do, as the CUDA scheduler is smart enough to figure out a good allocation for us. My intuition is that since every pixel takes a while to render (by default I run with a depth of 400), there is no need for multiple SMs because we already achieve max utilization.
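One thing that could be checked here is the launch configuration. A thread block is resident on exactly one SM, so a grid with fewer blocks than SMs cannot keep all of them busy, no matter how the scheduler distributes work. The launch parameters are not shown in this thread, so the sketch below is only an assumption-laden illustration: the image size, tile size, and the helper names `blocks_for_image` and `max_busy_sms` are all hypothetical.

```cpp
#include <cassert>

// Ceiling division for positive integers.
int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Number of thread blocks in a 2D grid that covers a width x height image
// with tile_x x tile_y threads per block (one thread per pixel).
// Hypothetical helper; the repository's real launch code may differ.
int blocks_for_image(int width, int height, int tile_x, int tile_y) {
    assert(width > 0 && height > 0 && tile_x > 0 && tile_y > 0);
    return ceil_div(width, tile_x) * ceil_div(height, tile_y);
}

// A block runs on exactly one SM, so at most min(blocks, sm_count)
// SMs can have work at the same time.
int max_busy_sms(int blocks, int sm_count) {
    return blocks < sm_count ? blocks : sm_count;
}
```

For example, a hypothetical 1200 x 800 image with 8 x 8 tiles gives 15000 blocks, enough to cover all 56 SMs, while a launch with a single block gives `max_busy_sms(1, 56) == 1`. If the kernels in this project turn out to be launched with one block, that alone would explain the numbers above.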

That being said, if you figure out a way to make the code faster I am always interested!
