Question about performance drop on the cluster #950
Comments
Can you check which part of the quickstart example is slower on your cluster? The quickstart example does not take long to render, so the GPU choice will not make much of a difference. The other parts of the script mostly use the CPU, which is usually weaker on a GPU cluster node than on your local machine. That might be the reason why it is slower.
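One way to follow this suggestion is to time each stage of the script separately so the slow part stands out. This is only a sketch; the stage functions in the comments are placeholders for whatever the quickstart script actually calls:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

# Wrap each pipeline stage, e.g. (hypothetical, matching the quickstart steps):
# timed("init", bproc.init)
# timed("load objects", bproc.loader.load_obj, "scene.obj")
# data = timed("render", bproc.renderer.render)
```

Comparing these per-stage timings between the local machine and the cluster would show whether the slowdown is in rendering or in the CPU-bound setup steps.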
Sure. The only bottleneck is within the rendering step. BTW, the cluster machines use the same CPU as my local one, so I do not think that would cause so much difference.

(Update) Here is the output with the verbose option enabled (CPU rendering):

From the local machine:
Device Quadro GP100 of type OPTIX found and used.
Finished rendering after 1.253 seconds

From the cluster machine:
Fra:0 Mem:8.75M (Peak 8.82M) | Time:00:00.02 | Mem:0.00M, Peak:0.00M | Scene, ViewLayer | Synchronizing object | Suzanne
Selecting render devices...
Hey @Uio96, it seems that you are using all 8 GPUs at once on the cluster machine. If you are using something like slurm, make sure to only reserve one GPU for your job. Alternatively, you can also limit the visible devices yourself. Using multiple GPUs at once can speed up rendering of big scenes, but for small scenes it usually creates more overhead. So could you try whether using only one GPU resolves your issue? Of course, it is still strange that the render time is so different in CPU-only mode...
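A common way to restrict a CUDA-based renderer to a single GPU is the `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch (the variable must be set before any CUDA context is created, i.e. before the BlenderProc pipeline initializes):

```python
import os

# Expose only GPU 0 to the process; all other GPUs become invisible
# to CUDA/OptiX. This must happen before the renderer starts up.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Hypothetical continuation of the pipeline:
# import blenderproc as bproc
# bproc.init()
```

Under slurm, reserving one GPU (e.g. `--gres=gpu:1`) typically sets this variable for you, which achieves the same effect.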
Thanks for the reply. But I already enabled CPU-only rendering ("Using only the CPU for rendering") by changing the render settings. I did try assigning a single GPU to the process before (e.g., one GPU visible per process), but it did not help. (The result with a single GPU is similar: around 5 s, still several times slower than the local machine.)
Hmm, this is really strange. It seems to be specific to your system.
There is not really anything I can do here.
Maybe the CPU is slowing it down on your system.
Thank you so much for the help. I do not think the issue comes from Blender or BlenderProc. I tried cluster machines with the exact same spec (interactive node and submit node), and they also showed the performance disparity. I will ask the administrator whether there is anything special about the setup, or I will just scale up the nodes to generate data.
Describe the issue
I have successfully configured the data generation pipeline on my local machine. Since I want to generate a larger dataset, I try to run the same script on the cluster with slurm.
The script works; however, there is a huge speed drop (more than 10x), even though the GPU on the cluster is supposed to be better than my local one.
I am not sure whether there is any other issue that may affect the performance so much. Has anyone had a similar experience before? Thanks a lot.
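Since the eventual goal is to scale generation out over slurm, one practical detail worth getting right is giving each job its own output directory so parallel tasks do not clobber each other's files. A minimal sketch using slurm's standard `SLURM_ARRAY_TASK_ID` environment variable (the `write_hdf5` call in the comment is only an assumed stand-in for whatever the generation script writes):

```python
import os
from pathlib import Path

# SLURM_ARRAY_TASK_ID is set by slurm inside `sbatch --array=...` jobs;
# fall back to 0 so the script also runs locally for debugging.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))

# One output directory per array task, e.g. output/task_0003
output_dir = Path("output") / f"task_{task_id:04d}"
output_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical: run the generation pipeline and write into output_dir, e.g.
# bproc.writer.write_hdf5(str(output_dir), data)
```

Submitting the same script as an array job then produces independent, non-overlapping dataset shards that can be merged afterwards.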
Minimal code example
Files required to run the code
No response
Expected behavior
The cluster should not take a lot more time to render the results compared to the local machine.
BlenderProc version
GitHub main branch