Question about performance drop on the cluster #950
Comments
Can you check which part of the quickstart example is slower on your cluster? The quickstart example does not take long to render, so the GPU choice will not make much of a difference. The other parts of the script mostly use the CPU, which is usually weaker on a GPU cluster node than on your local machine. That might be the reason why it is slower.
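One way to follow this suggestion is to time each stage of the script separately so the slow part stands out. This is only a sketch; the stage functions in the comments are placeholders for whatever the quickstart script actually calls:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Wrap each pipeline stage, e.g. (hypothetical, matching the quickstart steps):
# timed("init", bproc.init)
# timed("load objects", bproc.loader.load_obj, "scene.obj")
# data = timed("render", bproc.renderer.render)
```

Comparing these per-stage timings between the local machine and the cluster would show whether the slowdown is in rendering or in the CPU-bound setup steps.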
Sure. The only bottleneck is within the rendering step. BTW, the cluster machines use the same CPU as my local one, so I do not think that would cause so much difference.

(Update) Here is the output with the verbose option enabled (CPU rendering):

From the local machine:
Device Quadro GP100 of type OPTIX found and used.
Finished rendering after 1.253 seconds

From the cluster machine:
Fra:0 Mem:8.75M (Peak 8.82M) | Time:00:00.02 | Mem:0.00M, Peak:0.00M | Scene, ViewLayer | Synchronizing object | Suzanne
Selecting render devices...
Hey @Uio96, it seems that you are using all 8 GPUs at once on the cluster machine. If you are using something like slurm, make sure to only reserve one GPU for your job. Alternatively, you can also limit the visible devices yourself. Using multiple GPUs at once can speed up rendering of big scenes, but for small scenes it usually creates more overhead. So could you try whether using only one GPU resolves your issue? Of course, it is still strange that the render time is so different in CPU-only mode...
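A common way to restrict a CUDA-based renderer to a single GPU is the `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch (the variable must be set before any CUDA context is created, i.e. before the BlenderProc pipeline initializes):

```python
import os

# Expose only GPU 0 to the process; all other GPUs become invisible
# to CUDA/OptiX. This must happen before the renderer starts up.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Hypothetical continuation of the pipeline:
# import blenderproc as bproc
# bproc.init()
```

Under slurm, reserving one GPU (e.g. `--gres=gpu:1`) typically sets this variable for you, which achieves the same effect.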
Thanks for the reply. But I already enabled CPU-only rendering ("Using only the CPU for rendering") by changing the render settings. I did try assigning a single GPU to the process before (e.g., one GPU visible per process), but it did not help. (The result with a single GPU is similar: around 5 s, still several times slower than the local machine.)
Hmm, this is really strange. It seems to be specific to your system.
There is not really anything I can do here.
Maybe the CPU is slowing it down on your system.
Thank you so much for the help. I do not think the issue comes from Blender or BlenderProc. I tried cluster machines with the exact same spec (interactive node and submit node), and they also showed the performance disparity. I will ask the administrator whether there is anything special about the setup, or I will just scale up the nodes to generate data.
Describe the issue
I have successfully configured the data generation pipeline on my local machine. Since I want to generate a larger dataset, I try to run the same script on the cluster with slurm.
The script works; however, there is a huge speed drop (more than 10x), even though the GPU on the cluster is supposed to be better than my local one.
I am not sure whether there is any other issue that may affect the performance so much. Has anyone had a similar experience before? Thanks a lot.
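Since the eventual goal is to scale generation out over slurm, one practical detail worth getting right is giving each job its own output directory so parallel tasks do not clobber each other's files. A minimal sketch using slurm's standard `SLURM_ARRAY_TASK_ID` environment variable (the `write_hdf5` call in the comment is only an assumed stand-in for whatever the generation script writes):

```python
import os
from pathlib import Path

# SLURM_ARRAY_TASK_ID is set by slurm inside `sbatch --array=...` jobs;
# fall back to 0 so the script also runs locally for debugging.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))

# One output directory per array task, e.g. output/task_0003
output_dir = Path("output") / f"task_{task_id:04d}"
output_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical: run the generation pipeline and write into output_dir, e.g.
# bproc.writer.write_hdf5(str(output_dir), data)
```

Submitting the same script as an array job then produces independent, non-overlapping dataset shards that can be merged afterwards.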
Minimal code example
Files required to run the code
No response
Expected behavior
The cluster should not take a lot more time to render the results compared to the local machine.
BlenderProc version
GitHub main branch