
Check if we can decrease the RAM usage #151

Closed
3 tasks done
Theverat opened this issue May 5, 2018 · 8 comments

Theverat commented May 5, 2018

Compared to Cycles, our RAM usage is extremely high, most noticeably at high film resolutions.
We should find out where we can save RAM.
For example, I think each render pass currently needs three times as much RAM as strictly necessary:

  1. the render pass in Blender (unavoidable)
  2. a Python array we use to read the AOV from LuxCore before converting it to the Blender render pass
  3. the AOV in LuxCore (unavoidable)

Maybe we can save number 2 by directly converting from LuxCore AOV to Blender pass rect in the pyluxcoreforblender helper functions.
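The direct-conversion idea can be sketched as follows. This is a minimal Python illustration with stand-in buffers; the real conversion functions in pyluxcoreforblender are C++, and the function names here are hypothetical, not the actual API:

```python
# Sketch of the "skip the intermediate buffer" idea from this issue:
# instead of AOV -> intermediate Python array -> pass.rect, copy the
# AOV values straight into the preallocated destination buffer.
# `convert_via_intermediate` and `convert_direct` are hypothetical
# stand-ins for the real pyluxcoreforblender helpers.
import array

def convert_via_intermediate(src, width, height):
    # Old approach: an extra Python-side copy of the whole AOV
    # lives in memory while the destination is being filled.
    tmp = list(src)                  # intermediate buffer (extra RAM)
    dst = array.array('f', tmp)      # stand-in for the Blender pass rect
    return dst

def convert_direct(src, dst):
    # New approach: copy straight into the preallocated pass buffer,
    # analogous to std::copy-ing into pass.rect on the C++ side.
    dst[:] = src
    return dst

width, height = 4, 2
src = array.array('f', range(width * height * 3))          # fake AOV data
dst = array.array('f', bytes(len(src) * src.itemsize))     # preallocated rect
convert_direct(src, dst)
```

The memory saving is simply that the intermediate list in the first variant never exists in the second; for a 4000x3000 film that list alone is on the order of the film size again.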

Todos:

  • Optimize pyluxcoreforblender conversion functions (see above)
  • Only define film outputs for lightgroups that are actually used
  • Find out why the RAM usage is still high
@Theverat Theverat added the enhancement Additional feature label May 5, 2018
@Theverat Theverat changed the title Check if we can lower the RAM usage Check if we can decrease the RAM usage May 5, 2018
@Theverat Theverat self-assigned this May 24, 2018
@Theverat Theverat added this to the BlendLuxCore v2.1 milestone May 24, 2018

Theverat commented May 24, 2018

I have removed the need for Python buffers during the restructuring of the pyluxcoreforblender conversion functions.

I will also remove the intermediate float arrays in all functions where we simply std::copy the values (writing the data directly to pass.rect with GetOutput).
edit: done in LuxCoreRender/LuxCore@c6f242b


Even after the optimizations mentioned above, I'm a bit puzzled.
Attached is a simple "cube on plane" type scene, with render resolution set to 4000x3000.
ram_usage_test.blend.zip
Only one RGB_IMAGEPIPELINE is defined.
4000 * 3000 * 3 * sizeof(float) / (1024 * 1024) = 137.3 MiB
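Spelling out the arithmetic above for one RGB_IMAGEPIPELINE output at this resolution:

```python
# Back-of-the-envelope size of a single RGB float film buffer:
# 4000x3000 pixels, 3 float channels, 4 bytes per float.
width, height = 4000, 3000
channels = 3
float_size = 4  # sizeof(float)

film_bytes = width * height * channels * float_size
film_mib = film_bytes / (1024 * 1024)
print(f"{film_mib:.1f} MiB")  # 137.3 MiB
```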

Yet, this is the RAM usage I observe for the Blender process (includes LuxCore RAM usage):

  • After loading the scene: 330 MiB
  • During the render: 3.5 GiB
  • After ending the render: 400 MiB

Settings: PATHCPU, SOBOL, path depths total: 7, diffuse: 5, glossy: 5, specular: 6
Using TILEPATHCPU, it needs 1.9 GiB during the render (tile size 256, 1 AA sample), which is still a lot.
Using Cycles, the Blender process peaks at 400 MiB during the render (same resolution, same tile size).

I used 8 CPU threads for all tests.
When I use only 1 thread with PATHCPU, the Blender process needs 1.2 GiB.
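One possible explanation (an assumption at this point, not something these numbers prove) is a fixed base cost plus a per-thread film cost. Fitting that simple model to the two PATHCPU measurements above:

```python
# Hypothetical model: RAM = base + threads * per_thread_cost.
# Fitting it to the two observations above:
# 3.5 GiB with 8 threads, 1.2 GiB with 1 thread.
ram_8_threads = 3.5 * 1024  # MiB
ram_1_thread = 1.2 * 1024   # MiB

per_thread = (ram_8_threads - ram_1_thread) / (8 - 1)
base = ram_1_thread - per_thread
print(f"~{per_thread:.0f} MiB per extra thread, ~{base:.0f} MiB base")
```

A few hundred MiB per thread would be consistent with each rendering thread holding private film buffers, but this is only a guess from the scaling behaviour.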

Why does LuxCore need this much RAM?
Am I doing something wrong in BlendLuxCore?
This is my config:

[LuxCore][4.906] Configuration: 
[LuxCore][4.906]   path.pathdepth.total = 7
[LuxCore][4.906]   path.pathdepth.diffuse = 5
[LuxCore][4.906]   path.pathdepth.glossy = 5
[LuxCore][4.906]   path.pathdepth.specular = 6
[LuxCore][4.906]   sampler.sobol.adaptive.strength = 0.69999998807907104
[LuxCore][4.906]   sampler.random.adaptive.strength = 0.69999998807907104
[LuxCore][4.906]   sampler.metropolis.largesteprate = 0.40000000000000002
[LuxCore][4.906]   sampler.metropolis.maxconsecutivereject = 512
[LuxCore][4.906]   sampler.metropolis.imagemutationrate = 0.10000000000000001
[LuxCore][4.906]   film.filter.type = "BLACKMANHARRIS"
[LuxCore][4.906]   film.filter.width = 1.5
[LuxCore][4.906]   lightstrategy.type = "LOG_POWER"
[LuxCore][4.906]   renderengine.type = "PATHCPU"
[LuxCore][4.906]   film.height = 3000
[LuxCore][4.906]   film.width = 4000
[LuxCore][4.906]   scene.epsilon.min = 9.9999997473787516e-06
[LuxCore][4.906]   scene.epsilon.max = 0.10000000149011612
[LuxCore][4.906]   sampler.type = "SOBOL"
[LuxCore][4.906]   path.forceblackbackground.enable = 0
[LuxCore][4.906]   renderengine.seed = 1
[LuxCore][4.906]   film.outputs.0.index = 0
[LuxCore][4.907]   film.outputs.0.filename = "RGB_IMAGEPIPELINE_0.png"
[LuxCore][4.907]   film.outputs.0.type = "RGB_IMAGEPIPELINE"
[LuxCore][4.907]   film.imagepipelines.0.radiancescales.0.enabled = 1
[LuxCore][4.907]   film.imagepipelines.0.radiancescales.0.globalscale = 1
[LuxCore][4.907]   film.imagepipelines.0.radiancescales.0.rgbscale = 1 1 1
[LuxCore][4.907]   film.imagepipelines.0.0.type = "NOP"
[LuxCore][4.907]   film.imagepipelines.0.1.type = "TONEMAP_AUTOLINEAR"
[LuxCore][4.907]   film.imagepipelines.0.2.type = "TONEMAP_LINEAR"
[LuxCore][4.907]   film.imagepipelines.0.2.scale = 0.5
[LuxCore][4.907]   batch.haltthreshold = 0.0001
[LuxCore][4.907]   batch.haltthreshold.stoprendering.enable = 0
[LuxCore][4.907]   batch.haltspp = 0
[LuxCore][4.907]   batch.halttime = 0

Theverat commented:

luxcoreui needs 2.7 GiB of RAM.
luxcoreconsole needs 2.7 GiB of RAM.
.bcf attached:
test.zip


Kompwu commented Jun 24, 2018

@Theverat How's it going?


Theverat commented Jun 24, 2018

I'm waiting for Dade to look into this issue.
Unfortunately I have nearly no free time at the moment.


Theverat commented Aug 9, 2018

Here is a profiling run done with massif.
I was using the scene that is attached 3 posts up (in test.zip).

massif.zip

  • massif.out.1411 is the raw output generated by massif.
  • ms_print_1411.txt is the output of ms_print massif.out.1411 redirected to a file.
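For context, such a profile is produced with `valgrind --tool=massif <program>` and rendered with `ms_print massif.out.<pid>`. A small sketch for pulling the peak total heap size out of an ms_print snapshot table (the sample report below is fabricated for illustration; the column layout follows the massif manual):

```python
# Extract the peak of the total(B) column from an ms_print report.
# The sample text is made up; real reports have the same snapshot
# table layout (n, time(i), total(B), useful-heap(B), extra-heap(B),
# stacks(B)) with comma-grouped byte counts.
sample_report = """\
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  0              0                0                0             0            0
  1      1,000,000      144,000,000      143,900,000       100,000            0
  2      2,000,000    2,900,000,000    2,899,000,000     1,000,000            0
  3      3,000,000      400,000,000      399,000,000     1,000,000            0
"""

def peak_total_bytes(report):
    peak = 0
    for line in report.splitlines():
        fields = line.split()
        # Snapshot rows have 6 columns and start with the snapshot index.
        if len(fields) == 6 and fields[0].isdigit():
            total = int(fields[2].replace(",", ""))
            peak = max(peak, total)
    return peak

print(peak_total_bytes(sample_report) / 2**30)  # peak heap in GiB
```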


Theverat commented Aug 14, 2018

After Dade's latest changes as of commit LuxCoreRender/LuxCore@b3505a2, luxcoreconsole now needs 509 MiB on Linux (built without OpenCL) with 8 threads to render the testscene (previous RAM usage was 2.7 GiB).

The average samples/sec after 55 seconds rendertime:
Previous (one film per thread): 3.97M
New (shared film): 3.73M
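From the numbers above, the throughput cost of the shared film works out to roughly:

```python
# Relative slowdown of the shared film vs. one film per thread,
# using the averages reported after 55 seconds of rendering.
old, new = 3.97, 3.73  # Msamples/sec
slowdown_pct = (old - new) / old * 100
print(f"{slowdown_pct:.1f}% fewer samples/sec")
```

So roughly a 6% throughput cost in this test, against a RAM drop from 2.7 GiB to 509 MiB (about 5x).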


Dade916 commented Aug 14, 2018

The average samples/sec after 55 seconds rendertime:
Previous (one film per thread): 3.97M
New (shared film): 3.73M

The more complex the scene, the fewer samples/sec there are to splat, and the smaller the difference becomes. The difference also depends on the kind of CPU used (some have more modern/faster support for atomic ops).

A non-marginal benefit of a single film is that per-thread film merges are no longer needed (i.e. the film update is very fast and no longer takes 30+ seconds at very high resolutions).

Theverat commented:

Sounds good, thanks for your work, Dade!
