Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render on demand #2336

Closed
wants to merge 15 commits into from
Closed

Render on demand #2336

wants to merge 15 commits into from

Conversation

ffreyer
Copy link
Collaborator

@ffreyer ffreyer commented Oct 14, 2022

Description

This is an attempt at making a "render on demand" mode for GLMakie. The idea is have all renderobject input observables update a flag requires_update which is checked in the renderloop. Rendering only takes place if this flag is true for at least one (visible) renderobject.

The basic idea already works, but it's currently not catching all updates. For example changes to a mesh aren't tracked. I guess the main task will be to find all of those.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • Added an entry in NEWS.md (for new features and breaking changes)
  • Added or changed relevant sections in the documentation

@MakieBot
Copy link
Collaborator

MakieBot commented Oct 14, 2022

Compile Times benchmark

Note, that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time and negative percent mean faster than the base branch. Note, that GLMakie + WGLMakie run on an emulated GPU, so the runtime benchmark is much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(display(fig))
using create display create display
GLMakie 29.51s (29.24, 30.06) 0.36+- 18.63s (18.37, 19.32) 0.40+- 17.41s (17.01, 17.94) 0.42+- 16.57ms (16.11, 17.13) 0.33+- 49.14ms (48.12, 51.36) 1.25+-
master 29.74s (29.54, 30.37) 0.28+- 18.62s (18.52, 18.77) 0.09+- 16.57s (16.46, 16.70) 0.08+- 16.55ms (16.25, 17.17) 0.31+- 49.14ms (47.57, 52.08) 1.62+-
evaluation -0.77%, -0.23s invariant (-0.70d, 0.21p, 0.32std) +0.07%, 0.01s invariant (0.04d, 0.94p, 0.25std) +4.83%, 0.84s slower X (2.80d, 0.00p, 0.25std) +0.12%, 0.02ms invariant (0.06d, 0.91p, 0.32std) +0.00%, 0.0ms invariant (0.00d, 1.00p, 1.43std)
CairoMakie 26.63s (24.86, 27.90) 1.38+- 17.79s (16.48, 19.01) 0.94+- 2.57s (2.41, 2.77) 0.13+- 18.19ms (17.09, 18.99) 0.76+- 22.35ms (20.56, 23.47) 1.23+-
master 27.70s (25.41, 28.31) 1.02+- 18.29s (16.97, 19.00) 0.65+- 2.61s (2.39, 2.73) 0.11+- 18.79ms (17.06, 19.51) 0.83+- 22.62ms (20.73, 23.26) 0.85+-
evaluation -3.99%, -1.06s invariant (-0.87d, 0.13p, 1.20std) -2.84%, -0.5s invariant (-0.63d, 0.27p, 0.79std) -1.22%, -0.03s invariant (-0.27d, 0.63p, 0.12std) -3.27%, -0.6ms invariant (-0.75d, 0.19p, 0.79std) -1.19%, -0.27ms invariant (-0.25d, 0.65p, 1.04std)
WGLMakie 44.91s (44.26, 45.52) 0.51+- 39.25s (38.57, 39.95) 0.48+- 60.61s (59.93, 61.84) 0.66+- 43.76ms (41.72, 46.71) 2.27+- 154.23ms (148.06, 170.95) 8.08+-
master 44.58s (44.14, 45.25) 0.43+- 40.24s (39.75, 40.65) 0.33+- 59.48s (58.55, 61.61) 1.00+- 34.82ms (33.71, 36.35) 0.82+- 116.79ms (107.82, 127.61) 6.75+-
evaluation +0.73%, 0.33s invariant (0.70d, 0.22p, 0.47std) -2.54%, -1.0s faster ✓ (-2.42d, 0.00p, 0.41std) +1.87%, 1.13s slower X (1.34d, 0.03p, 0.83std) +20.43%, 8.94ms slower❌ (5.24d, 0.00p, 1.55std) +24.28%, 37.45ms slower❌ (5.03d, 0.00p, 7.41std)

@SimonDanisch
Copy link
Member

Cool, that might just work without too much effort :) Let me know if you have any questions, I could try to take a look as well...

@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 15, 2022

Texture, TextureBuffer, GLBuffer and VertexArray now track when they are updated and forward that to RenderObject. That's all the GPUArray subtypes other than GPUVector, which seems unused. I think with that the pr is basically done already.

I also went through every plot primitive to check if each uniform is triggering updates. I skipped the ones which I believe to be constant, irrelevant or inaccessible.

List of ignored attributes

General

  • doc_string (irrelevant)
  • objectid (constant?)
  • intensity (generally works if color is set to a float array first)
  • shading (irrelevant, requires shader recompilation)

scatter

  • image (only works if passed as "image")
  • gl_primitive

Lines, Linesegments

  • pattern (linestyle, doesn't work but I think that's indepedent of the pr)
  • pattern_length
  • gl_primitive

mesh

  • interpolate_in_fragment_shader (constant?)

surface

  • scale (constant?)
  • color (constant?)
  • vertices (constant)
  • faces (constant)

text

  • intensity (not accessable?)
  • shape (constant)
  • scale_primitive (irrelevant?)
  • color_norm (irrelevant w/o intesity)
  • image (irrelevant)
  • billboard
  • color_map (irrelevant w/o intensity)
  • gl_primitive (constant)

heatmap/image

  • stroke_color (doesn't work anyway?)

volume

  • absorption (constant?)
  • prerender (constant?)
  • postrender (constant?)

meshscatter

  • image (broken in general)
  • vertex_color (is this usable?)
  • interpolate_in_fragment_shader (constant?)

@ffreyer ffreyer marked this pull request as ready for review October 15, 2022 13:45
@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 15, 2022

I guess another thing to think about with this is whether all the on(robj.requires_update = true, uniform) significantly downgrade performance in other renderloops. Not sure how to test that though, maybe with a record?

@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 15, 2022

Should help with #678 (on any OS), maybe also #267 and #1683. Should also help with idle gpu utilization in #1427, but I don't think it'll help with #1412.

Whether this fixes any of those issues is kind of subjective I guess. This pr should make raw rendering a little bit slower through the update tracking, but cuts down idle gpu usage to 0. So in an interactive or animated situation things shouldn't be any better, but otherwise they should be a lot better.

For example arrows(rand(Point3f, 1_000_000), rand(Point3f, 1_000_000)) maxes out my GPU while interacting, but has 0% GPU utilization once I stop.

@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 16, 2022

I did a bit of benchmarking with

scene = Scene()
for _ in 1:100
    scatter!(scene, rand(10), marker = Rect)
end
display(scene, render_on_demand = false)

function update!(scene)
    for p in scene.plots
        p.ssao[] = !p.ssao[]
        p.overdraw[] = !p.overdraw[]
        p.color[] = :red
        p.markersize[] = 10f0
        p.marker[] = Rect
    end
end

@benchmark update!($scene)

That looks to be 3-8% slower with this pr. (Min , mean, median time going from (1.38, 1.96, 1.94) ms -> (1.43, 2.07, 2.03) ms)

@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 16, 2022

I added some code to disable update tracking when the other render loops are used. (Those can't be changed dynamically, right?)

I also noticed that I wasn't disabling render_on_demand correctly in my benchmark, but it doesn't change much. My timing were a bit worse across the board but the pr was still 5-10% slower. With the latest commit it's now on par with fps_renderloop.

# This is a lazy workaround for disabling updates of `requires_update` when
# not rendering on demand. A cleaner implementation should probably go
# through @gen_defaults! and adjust constructors instead.
track_updates = to_value(pop!(data, :track_updates, true))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gen_defaults is pretty much deprecated (without a new implementation yet, but I think it's pretty clear that it's pretty redundant), so I guess that's fair ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it works atm is that GPUArrays always attach the update trackers when they get created. When the renderobject is created it either removes the tracking (i.e. observer functions) or connects the remaining uniforms based on what would usually be interpreted as a uniform. Seems quite hacky to me 🤷

A clean implementation wouldn't connect (and maybe also not define) the tracking observables in GPUArrays. I think with how things are organized atm @gen_defaults would need to be adjusted to forward that information. But I don't really want to mess with that macro. I'd probably just break it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, @gen_defaults needs to be refactored / removed going forward, so whenever we do that, we can clean up the implementation of track_updates ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words: please don't waste a second on @gen_defaults ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think the pr is ready then

@@ -193,8 +193,10 @@ max_dim(t) = error("max_dim not implemented for: $(typeof(t)). This happen

function (::Type{T})(x::Observable; kw...) where T <: GPUArray
gpu_mem = T(x[]; kw...)
on(x-> update!(gpu_mem, x), x)
gpu_mem
# TODO merge these and handle update tracking during contruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this todo, and not just:

  on(data) do new_data
      update!(gpu_mem, new_data)
      gpu_mem.requires_update[] = true
  end

(I already have this locally, will push some changes like this later)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated them so the update tracking could be removed again when using fps_renderloop

@@ -289,11 +297,14 @@ mutable struct RenderObject{Pre}
prerenderfunction::Pre
postrenderfunction
id::UInt32
requires_update::Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consistently use the same type for requires_update, and I don't think we need an observable, so maybe use Base.RefValue consistently instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, some do need to be an observable..So maybe use Observable consistently?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For GPUArraywe either need an observable to forward to the information to the renderobject or we need to search through a bunch of stuff every potential frame. Or we need to make their updates aware of the renderobject but that might be a lot of work?

Otherwise I went with Bool since the structs are already mutable. I think that's a bit faster than a Refvalue in a mutable struct too...

end
on(_ -> robj.requires_update = true, vertexarray.requires_update)
else
# remove tracking from GPUArrays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why that's needed, could you elaborate the comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, classic case of not seing the whole picture in the github diff...sorry for the noise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it feels cleaner to me, to just let the GPUArrays track their updates, which should be very cheap, but then not connect them further if we don't track updates...

Copy link
Collaborator Author

@ffreyer ffreyer Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on(_ -> robj.requires_update = true, vertexarray.requires_update)

This? If all the buffers are also in uniforms it shouldn't be necessary. I added that early on when I was searching for things that need to be tracked

# remove tracking from GPUArrays

The way I added tracking it gets intialized for every renderloop. That can be a lot of extra useless callbacks and from my testing it can be noticeable (like 5-10% slower with fps_renderloop in what should be a bad case). I tried to restore performance again by removing tracking when it's not needed. That's what the code under the comment does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I actually removed all the tracking in one go so I'm not sure how much the GPUArrays contribute. The test case I used also didn't trigger updates there, but in simple observables since that should have a higher ratio of tracking cost / total update cost and I wanted to know how bad it can get.

I was planning to test having GPUArray updates wait for frames in the future, which would probably require something like requires_update too. So maybe this will end up as always on/tracking later anyway...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using BenchmarkTools, GLMakie

scene = Scene()
for _ in 1:100
    heatmap!(scene, rand(16, 16))
end
display(GLMakie.Screen(render_on_demand = false), scene)

function update!(scene)
    for p in scene.plots
        notify(p[3])
    end
end

@benchmark update!($scene)

With cleanup

BenchmarkTools.Trial: 9326 samples with 1 evaluation.
 Range (min … max):  381.502 μs …  32.987 ms  ┊ GC (min … max): 0.00% … 98.11%
 Time  (median):     502.491 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   532.654 μs ± 812.194 μs  ┊ GC (mean ± σ):  3.84% ±  2.49%

      ▁           ▃▄▇█▄▃▁▁▁▂▁▁                                   
  ▁▁▅▄█▅▄▄▄▃▄▄▄▄▄▆█████████████▇▇▇▆▇▆▆▅▅▅▅▅▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▂▁▁▁ ▄
  382 μs           Histogram: frequency by time          694 μs <

 Memory estimate: 121.88 KiB, allocs estimate: 900.

Without cleanup (the else branch commented out)

BenchmarkTools.Trial: 4238 samples with 1 evaluation.
 Range (min … max):  803.314 μs …  34.882 ms  ┊ GC (min … max): 0.00% … 96.12%
 Time  (median):       1.153 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.174 ms ± 745.947 μs  ┊ GC (mean ± σ):  1.33% ±  2.08%

          ▂▃▅▅▅█▃▃▃▄▃▃▃▃▄▄▄▃▃▄▃▃▇▃▅▇▄▃▄▄▃▃▁▂                     
  ▁▄▄▆▄▅▅▃███████████████████████████████████▆▇▆▅▅▄▄▃▄▄▃▂▂▃▃▂▂▂ ▅
  803 μs           Histogram: frequency by time          1.6 ms <

 Memory estimate: 128.12 KiB, allocs estimate: 1300.

Without cleanup & single observer function

BenchmarkTools.Trial: 4747 samples with 1 evaluation.
 Range (min … max):  757.975 μs …  31.849 ms  ┊ GC (min … max): 0.00% … 96.00%
 Time  (median):       1.006 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.048 ms ± 783.416 μs  ┊ GC (mean ± σ):  1.84% ±  2.42%

          ▃▄▇▅▅█▅▅▅▂▃ ▃▁▁▁▁ ▁▁▂▁▂▁▁  ▁                           
  ▅▃▆▅▄▆▅▆████████████████████████████████▆▆▆▄▄▅▆▄▄▄▃▃▃▃▂▂▂▂▁▂▂ ▅
  758 μs           Histogram: frequency by time         1.43 ms <

 Memory estimate: 128.12 KiB, allocs estimate: 1300.

That's much more than I thought it would be...


function requires_update(screen::Screen)
for (_, _, robj) in screen.renderlist
visible = Bool(to_value(get(robj.uniforms, :visible, true)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to use this PR to change the visible field into a well typed, mandatory field for RenderObject.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I did what you had in mind but I moved it to a field.

I also had the logic for re-rendering with visible wrong. I tried to set it up in a way that invisible renderobjects can't trigger updates, but the way I did it caused p.visible[] = false to not trigger a re-render. I fixed that by moving the check to the renderobject updater functions.

@ffreyer
Copy link
Collaborator Author

ffreyer commented Oct 25, 2022

I reran the benchmarks I posted before and now things are more equal. Maybe the things I have running in the background affect this, not sure. To summarize:

Min times on master:               754, 774, 759µs
Min times on pr w/ fps_renderloop: 743, 740, 745µs
Min times on pr w/ on_demand:      848, 844, 852µs

So in a workflow with lots of GPUArray updates, the pr is now does slightly better than master with fps_renderloop and ~15% worse with the on-demand renderloop. I restarted Julia once after switching branches just to be save.

@SimonDanisch SimonDanisch mentioned this pull request Nov 3, 2022
@SimonDanisch
Copy link
Member

Merged in #2397

@MariusDrulea
Copy link
Contributor

Should help with #678 (on any OS), maybe also #267 and #1683. Should also help with idle gpu utilization in #1427, but I don't think it'll help with #1412.

I did some tests and plotting a 3D volume plot(rand(1000, 1000, 1000)) works nice now, 0% GPU utilization when the volume is not manipulated. Plotting a 2D matrix plot(rand(2000, 2000)) however still keeps the GPU busy even there is no interaction on the image, just like mentioned above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants