Render on demand #2336

ffreyer · 2022-10-14T17:52:06Z

Description

This is an attempt at making a "render on demand" mode for GLMakie. The idea is to have all renderobject input observables update a flag, requires_update, which is checked in the renderloop. Rendering only takes place if this flag is true for at least one (visible) renderobject.
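Here is a minimal sketch of that check in plain Julia. It assumes a hypothetical `SketchRenderObject` type with mutable `visible` and `requires_update` fields. GLMakie's actual `RenderObject` and renderloop are more involved. `needs_render` and `render_step!` are illustrative names, not GLMakie functions. A frame is drawn only when a visible object is flagged, and all flags are reset afterwards.

```julia
# Hypothetical stand-in for a render object (not GLMakie's actual RenderObject).
mutable struct SketchRenderObject
    name::String
    visible::Bool
    requires_update::Bool
end

# True if at least one visible render object has been flagged as changed.
needs_render(renderlist) = any(r -> r.visible && r.requires_update, renderlist)

# One iteration of an on-demand render loop: draw only if something changed,
# then reset every flag. Returns whether a frame was drawn.
function render_step!(renderlist, draw)
    needs_render(renderlist) || return false
    draw(renderlist)
    foreach(r -> (r.requires_update = false), renderlist)
    return true
end

objs = [SketchRenderObject("scatter", true, false),
        SketchRenderObject("lines", false, false)]
frames = Ref(0)
draw(_) = (frames[] += 1)

render_step!(objs, draw)        # nothing changed: no frame
objs[2].requires_update = true  # only an invisible object changed: still no frame
render_step!(objs, draw)
objs[1].requires_update = true  # a visible object changed: one frame
render_step!(objs, draw)
```

When idle, each loop iteration only scans the flags, which is why GPU usage can drop to zero.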

The basic idea already works, but it's currently not catching all updates. ~~For example changes to a mesh aren't tracked.~~ I guess the main task will be to find all of those.

Type of change

New feature (non-breaking change which adds functionality)

Checklist

Added an entry in NEWS.md (for new features and breaking changes)
Added or changed relevant sections in the documentation

MakieBot · 2022-10-14T18:19:18Z

Compile Times benchmark

Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time, and a negative percentage means faster than the base branch. Note that GLMakie and WGLMakie run on an emulated GPU, so the runtime benchmark is much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(display(fig))

| | using | create (compile) | display (compile) | create (runtime) | display (runtime) |
|---|---|---|---|---|---|
| GLMakie | 29.51s (29.24, 30.06) 0.36+- | 18.63s (18.37, 19.32) 0.40+- | 17.41s (17.01, 17.94) 0.42+- | 16.57ms (16.11, 17.13) 0.33+- | 49.14ms (48.12, 51.36) 1.25+- |
| master | 29.74s (29.54, 30.37) 0.28+- | 18.62s (18.52, 18.77) 0.09+- | 16.57s (16.46, 16.70) 0.08+- | 16.55ms (16.25, 17.17) 0.31+- | 49.14ms (47.57, 52.08) 1.62+- |
| evaluation | -0.77%, -0.23s invariant (-0.70d, 0.21p, 0.32std) | +0.07%, 0.01s invariant (0.04d, 0.94p, 0.25std) | +4.83%, 0.84s slower X (2.80d, 0.00p, 0.25std) | +0.12%, 0.02ms invariant (0.06d, 0.91p, 0.32std) | +0.00%, 0.0ms invariant (0.00d, 1.00p, 1.43std) |
| CairoMakie | 26.63s (24.86, 27.90) 1.38+- | 17.79s (16.48, 19.01) 0.94+- | 2.57s (2.41, 2.77) 0.13+- | 18.19ms (17.09, 18.99) 0.76+- | 22.35ms (20.56, 23.47) 1.23+- |
| master | 27.70s (25.41, 28.31) 1.02+- | 18.29s (16.97, 19.00) 0.65+- | 2.61s (2.39, 2.73) 0.11+- | 18.79ms (17.06, 19.51) 0.83+- | 22.62ms (20.73, 23.26) 0.85+- |
| evaluation | -3.99%, -1.06s invariant (-0.87d, 0.13p, 1.20std) | -2.84%, -0.5s invariant (-0.63d, 0.27p, 0.79std) | -1.22%, -0.03s invariant (-0.27d, 0.63p, 0.12std) | -3.27%, -0.6ms invariant (-0.75d, 0.19p, 0.79std) | -1.19%, -0.27ms invariant (-0.25d, 0.65p, 1.04std) |
| WGLMakie | 44.91s (44.26, 45.52) 0.51+- | 39.25s (38.57, 39.95) 0.48+- | 60.61s (59.93, 61.84) 0.66+- | 43.76ms (41.72, 46.71) 2.27+- | 154.23ms (148.06, 170.95) 8.08+- |
| master | 44.58s (44.14, 45.25) 0.43+- | 40.24s (39.75, 40.65) 0.33+- | 59.48s (58.55, 61.61) 1.00+- | 34.82ms (33.71, 36.35) 0.82+- | 116.79ms (107.82, 127.61) 6.75+- |
| evaluation | +0.73%, 0.33s invariant (0.70d, 0.22p, 0.47std) | -2.54%, -1.0s faster ✓ (-2.42d, 0.00p, 0.41std) | +1.87%, 1.13s slower X (1.34d, 0.03p, 0.83std) | +20.43%, 8.94ms slower❌ (5.24d, 0.00p, 1.55std) | +24.28%, 37.45ms slower❌ (5.03d, 0.00p, 7.41std) |

SimonDanisch · 2022-10-15T13:22:42Z

Cool, that might just work without too much effort :) Let me know if you have any questions, I could try to take a look as well...

ffreyer · 2022-10-15T13:44:42Z

Texture, TextureBuffer, GLBuffer and VertexArray now track when they are updated and forward that to RenderObject. That's all the GPUArray subtypes other than GPUVector, which seems unused. I think with that the pr is basically done already.
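The forwarding can be sketched without any GPU code. `TrackedBuffer`, `BufferRenderObject`, `update_buffer!` and `connect_tracking!` are hypothetical names. They stand in for the GPUArray subtypes and `RenderObject`. The only point is that every buffer update runs a callback, and that callback sets the owning render object's flag.

```julia
# Hypothetical stand-in for a GPU array (Texture, GLBuffer, ...): the data plus
# callbacks that run after every update.
mutable struct TrackedBuffer{T}
    data::Vector{T}
    listeners::Vector{Function}
end
TrackedBuffer(data::Vector{T}) where {T} = TrackedBuffer{T}(data, Function[])

mutable struct BufferRenderObject
    buffers::Vector{TrackedBuffer}
    requires_update::Bool
end

# Replace the contents (a real backend would upload to the GPU here),
# then notify everyone listening.
function update_buffer!(buf::TrackedBuffer{T}, newdata::Vector{T}) where {T}
    buf.data = newdata
    foreach(f -> f(), buf.listeners)
    return buf
end

# Forward every buffer update to the render object that owns the buffers.
function connect_tracking!(robj::BufferRenderObject)
    for buf in robj.buffers
        push!(buf.listeners, () -> (robj.requires_update = true))
    end
    return robj
end

positions = TrackedBuffer(Float32[0, 1, 2])
robj = connect_tracking!(BufferRenderObject(TrackedBuffer[positions], false))
update_buffer!(positions, Float32[3, 4, 5])
```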

I also went through every plot primitive to check if each uniform is triggering updates. I skipped the ones which I believe to be constant, irrelevant or inaccessible.

List of ignored attributes

General

doc_string (irrelevant)
objectid (constant?)
intensity (generally works if color is set to a float array first)
shading (irrelevant, requires shader recompilation)

scatter

image (only works if passed as "image")
gl_primitive

Lines, Linesegments

pattern (linestyle, doesn't work but I think that's independent of the pr)
pattern_length
gl_primitive

mesh

interpolate_in_fragment_shader (constant?)

surface

scale (constant?)
color (constant?)
vertices (constant)
faces (constant)

text

intensity (not accessible?)
shape (constant)
scale_primitive (irrelevant?)
color_norm (irrelevant w/o intensity)
image (irrelevant)
billboard
color_map (irrelevant w/o intensity)
gl_primitive (constant)

heatmap/image

stroke_color (doesn't work anyway?)

volume

absorption (constant?)
prerender (constant?)
postrender (constant?)

meshscatter

image (broken in general)
vertex_color (is this usable?)
interpolate_in_fragment_shader (constant?)

ffreyer · 2022-10-15T14:00:06Z

I guess another thing to think about with this is whether all the on(_ -> robj.requires_update = true, uniform) callbacks significantly degrade performance in other renderloops. Not sure how to test that though, maybe with record?

ffreyer · 2022-10-15T14:23:39Z

Should help with #678 (on any OS), maybe also #267 and #1683. Should also help with idle gpu utilization in #1427, but I don't think it'll help with #1412.

Whether this fixes any of those issues is kind of subjective I guess. This pr should make raw rendering a little bit slower through the update tracking, but cuts down idle gpu usage to 0. So in an interactive or animated situation things shouldn't be any better, but otherwise they should be a lot better.

For example arrows(rand(Point3f, 1_000_000), rand(Point3f, 1_000_000)) maxes out my GPU while interacting, but has 0% GPU utilization once I stop.

ffreyer · 2022-10-16T07:43:25Z

I did a bit of benchmarking with

using BenchmarkTools, GLMakie

scene = Scene()
for _ in 1:100
    scatter!(scene, rand(10), marker = Rect)
end
display(scene, render_on_demand = false)

function update!(scene)
    for p in scene.plots
        p.ssao[] = !p.ssao[]
        p.overdraw[] = !p.overdraw[]
        p.color[] = :red
        p.markersize[] = 10f0
        p.marker[] = Rect
    end
end

@benchmark update!($scene)

That looks to be 3-8% slower with this pr (min, mean and median time going from (1.38, 1.96, 1.94) ms to (1.43, 2.07, 2.03) ms).
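The relative change can be recomputed from the quoted timings. This is plain arithmetic on the numbers above and involves no GLMakie:

```julia
# (min, mean, median) times in ms, as quoted above.
master_ms = (1.38, 1.96, 1.94)
pr_ms     = (1.43, 2.07, 2.03)

# Relative slowdown of the PR in percent, rounded to one decimal.
slowdown = map((a, b) -> round(100 * (b - a) / a; digits = 1), master_ms, pr_ms)
```

This gives roughly 3.6% (min), 5.6% (mean) and 4.6% (median) slower.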

ffreyer · 2022-10-16T18:00:28Z

I added some code to disable update tracking when the other render loops are used. (Those can't be changed dynamically, right?)
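Here is a sketch of attaching the tracking callbacks only when rendering on demand. It is modeled loosely on the `track_updates` entry that gets popped from the render object data in `GLTypes.jl`. `MiniObservable`, `setvalue!`, `TrackedRenderObject` and `build_render_object` are hypothetical stand-ins, not the Observables.jl or GLMakie API.

```julia
# Minimal, hypothetical observable: a value plus callbacks run on every change.
mutable struct MiniObservable{T}
    value::T
    listeners::Vector{Function}
end
MiniObservable(x::T) where {T} = MiniObservable{T}(x, Function[])

function setvalue!(obs::MiniObservable, x)
    obs.value = x
    foreach(f -> f(x), obs.listeners)
    return obs
end

mutable struct TrackedRenderObject
    uniforms::Dict{Symbol, Any}
    requires_update::Bool
end

# Pop a `:track_updates` entry from the data dict and only attach the
# flag-setting callbacks when it is true (i.e. when rendering on demand).
function build_render_object(data::Dict{Symbol, Any})
    track_updates = pop!(data, :track_updates, true)
    robj = TrackedRenderObject(data, false)
    if track_updates
        for value in values(data)
            value isa MiniObservable || continue
            push!(value.listeners, _ -> (robj.requires_update = true))
        end
    end
    return robj
end

color_a = MiniObservable(:red)
color_b = MiniObservable(:red)
on_demand = build_render_object(Dict{Symbol, Any}(:color => color_a, :track_updates => true))
fps_loop  = build_render_object(Dict{Symbol, Any}(:color => color_b, :track_updates => false))

setvalue!(color_a, :blue)
setvalue!(color_b, :blue)
```

With tracking off, no callbacks are attached at all. That avoids the per-update overhead measured above.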

I also noticed that I wasn't disabling render_on_demand correctly in my benchmark, but it doesn't change much. My timings were a bit worse across the board, but the pr was still 5-10% slower. With the latest commit it's now on par with fps_renderloop.

SimonDanisch · 2022-10-17T07:07:30Z

GLMakie/src/GLAbstraction/GLTypes.jl

+    # This is a lazy workaround for disabling updates of `requires_update` when
+    # not rendering on demand. A cleaner implementation should probably go 
+    # through @gen_defaults! and adjust constructors instead.
+    track_updates = to_value(pop!(data, :track_updates, true))


@gen_defaults is pretty much deprecated (without a new implementation yet, but I think it's pretty clear that it's pretty redundant), so I guess that's fair ;)

The way it works atm is that GPUArrays always attach the update trackers when they get created. When the renderobject is created it either removes the tracking (i.e. observer functions) or connects the remaining uniforms based on what would usually be interpreted as a uniform. Seems quite hacky to me 🤷

A clean implementation wouldn't connect (and maybe also not define) the tracking observables in GPUArrays. I think with how things are organized atm @gen_defaults would need to be adjusted to forward that information. But I don't really want to mess with that macro. I'd probably just break it.

I meant, @gen_defaults needs to be refactored / removed going forward, so whenever we do that, we can clean up the implementation of track_updates ;)

In other words: please don't waste a second on @gen_defaults ;)

Ok I think the pr is ready then

SimonDanisch · 2022-10-21T14:15:47Z

GLMakie/src/GLAbstraction/AbstractGPUArray.jl

@@ -193,8 +193,10 @@ max_dim(t)       = error("max_dim not implemented for: $(typeof(t)). This happen

 function (::Type{T})(x::Observable; kw...) where T <: GPUArray
    gpu_mem = T(x[]; kw...)
-    on(x-> update!(gpu_mem, x), x)
-    gpu_mem
+    # TODO merge these and handle update tracking during contruction


Why do we have this todo, and not just:

on(data) do new_data
    update!(gpu_mem, new_data)
    gpu_mem.requires_update[] = true
end

(I already have this locally, will push some changes like this later)

I separated them so the update tracking could be removed again when using fps_renderloop
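A small stand-in shows why two observers are kept. One callback keeps the array data in sync and the other only flags the update. Because the tracker is a separate object, it can be detached later without affecting the data sync. All types and functions here (`MiniObservable`, `MirroredArray`, `mirror`, `untrack!`) are hypothetical, not GLMakie's or Observables.jl's API.

```julia
# Minimal, hypothetical observable: a value plus callbacks run on every change.
mutable struct MiniObservable{T}
    value::T
    listeners::Vector{Function}
end
MiniObservable(x::T) where {T} = MiniObservable{T}(x, Function[])

function setvalue!(obs::MiniObservable, x)
    obs.value = x
    foreach(f -> f(x), obs.listeners)
    return obs
end

# Hypothetical GPU-array stand-in whose contents follow an observable.
mutable struct MirroredArray
    data::Vector{Float32}
    requires_update::MiniObservable{Bool}
end

# Two separate observers: one keeps the data in sync, one only flags the update.
# Returning the tracker lets the caller detach it later without touching the sync.
function mirror(x::MiniObservable{Vector{Float32}})
    arr = MirroredArray(copy(x.value), MiniObservable(false))
    push!(x.listeners, new_data -> (arr.data = copy(new_data)))
    tracker = _ -> setvalue!(arr.requires_update, true)
    push!(x.listeners, tracker)
    return arr, tracker
end

# Detach only the tracking callback, e.g. when a non-on-demand render loop is used.
untrack!(x::MiniObservable, tracker) = filter!(f -> f !== tracker, x.listeners)

source = MiniObservable(Float32[1, 2, 3])
arr, tracker = mirror(source)
untrack!(source, tracker)
setvalue!(source, Float32[4, 5, 6])
```

A single merged observer would be slightly cheaper. However, it could not be split apart again once the render loop turns out not to need tracking.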

SimonDanisch · 2022-10-21T14:17:33Z

GLMakie/src/GLAbstraction/GLTypes.jl

@@ -289,11 +297,14 @@ mutable struct RenderObject{Pre}
    prerenderfunction::Pre
    postrenderfunction
    id::UInt32
+    requires_update::Bool


I think we should consistently use the same type for requires_update, and I don't think we need an observable, so maybe use Base.RefValue consistently instead?

Ah, I see, some do need to be an observable... So maybe use Observable consistently?

For GPUArray we either need an observable to forward the information to the renderobject, or we need to search through a bunch of stuff every potential frame. Or we need to make their updates aware of the renderobject, but that might be a lot of work?

Otherwise I went with Bool since the structs are already mutable. I think that's a bit faster than a RefValue in a mutable struct too...

SimonDanisch · 2022-10-21T14:20:08Z

GLMakie/src/GLAbstraction/GLTypes.jl

+            end
+            on(_ -> robj.requires_update = true, vertexarray.requires_update)
+        else
+            # remove tracking from GPUArrays


not sure why that's needed, could you elaborate on the comment?

ugh, classic case of not seeing the whole picture in the github diff...sorry for the noise

Hm, it feels cleaner to me, to just let the GPUArrays track their updates, which should be very cheap, but then not connect them further if we don't track updates...

on(_ -> robj.requires_update = true, vertexarray.requires_update)

This? If all the buffers are also in uniforms it shouldn't be necessary. I added that early on when I was searching for things that need to be tracked

# remove tracking from GPUArrays

The way I added tracking, it gets initialized for every renderloop. That can be a lot of extra useless callbacks, and from my testing it can be noticeable (like 5-10% slower with fps_renderloop in what should be a bad case). I tried to restore performance again by removing tracking when it's not needed. That's what the code under the comment does.

I think I actually removed all the tracking in one go so I'm not sure how much the GPUArrays contribute. The test case I used also didn't trigger updates there, but in simple observables since that should have a higher ratio of tracking cost / total update cost and I wanted to know how bad it can get.

I was planning to test having GPUArray updates wait for frames in the future, which would probably require something like requires_update too. So maybe this will end up as always on/tracking later anyway...

using BenchmarkTools, GLMakie

scene = Scene()
for _ in 1:100
    heatmap!(scene, rand(16, 16))
end
display(GLMakie.Screen(render_on_demand = false), scene)

function update!(scene)
    for p in scene.plots
        notify(p[3])
    end
end

@benchmark update!($scene)

With cleanup

BenchmarkTools.Trial: 9326 samples with 1 evaluation.
 Range (min … max):  381.502 μs … 32.987 ms  ┊ GC (min … max): 0.00% … 98.11%
 Time  (median):     502.491 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   532.654 μs ± 812.194 μs ┊ GC (mean ± σ):  3.84% ± 2.49%
 Histogram: frequency by time, 382 μs … 694 μs
 Memory estimate: 121.88 KiB, allocs estimate: 900.

Without cleanup (the else branch commented out)

BenchmarkTools.Trial: 4238 samples with 1 evaluation.
 Range (min … max):  803.314 μs … 34.882 ms  ┊ GC (min … max): 0.00% … 96.12%
 Time  (median):     1.153 ms                ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.174 ms ± 745.947 μs   ┊ GC (mean ± σ):  1.33% ± 2.08%
 Histogram: frequency by time, 803 μs … 1.6 ms
 Memory estimate: 128.12 KiB, allocs estimate: 1300.

Without cleanup & single observer function

BenchmarkTools.Trial: 4747 samples with 1 evaluation.
 Range (min … max):  757.975 μs … 31.849 ms  ┊ GC (min … max): 0.00% … 96.00%
 Time  (median):     1.006 ms                ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.048 ms ± 783.416 μs   ┊ GC (mean ± σ):  1.84% ± 2.42%
 Histogram: frequency by time, 758 μs … 1.43 ms
 Memory estimate: 128.12 KiB, allocs estimate: 1300.

That's much more than I thought it would be...

SimonDanisch · 2022-10-21T14:21:31Z

GLMakie/src/screen.jl

+
+function requires_update(screen::Screen)
+    for (_, _, robj) in screen.renderlist
+        visible = Bool(to_value(get(robj.uniforms, :visible, true)))


I think we may want to use this PR to change the visible field into a well typed, mandatory field for RenderObject.

Sounds good.

Not sure if I did what you had in mind but I moved it to a field.

I also had the logic for re-rendering with visible wrong. I tried to set it up in a way that invisible renderobjects can't trigger updates, but the way I did it caused p.visible[] = false to not trigger a re-render. I fixed that by moving the check to the renderobject updater functions.
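Here is a sketch of the fixed visibility logic. It assumes hypothetical `MiniObservable` and `VisibleRenderObject` stand-ins, and `track_uniform!` / `track_visible!` are illustrative names, not GLMakie's updater functions. Ordinary uniform callbacks only set the flag while the object is visible. The visibility callback always sets it, so hiding an object still produces one final frame.

```julia
# Minimal, hypothetical observable: a value plus callbacks run on every change.
mutable struct MiniObservable{T}
    value::T
    listeners::Vector{Function}
end
MiniObservable(x::T) where {T} = MiniObservable{T}(x, Function[])

function setvalue!(obs::MiniObservable, x)
    obs.value = x
    foreach(f -> f(x), obs.listeners)
    return obs
end

mutable struct VisibleRenderObject
    visible::Bool
    requires_update::Bool
end

# Ordinary uniforms only request a frame while the object is visible.
function track_uniform!(robj::VisibleRenderObject, uniform::MiniObservable)
    callback = function (_)
        if robj.visible
            robj.requires_update = true
        end
    end
    push!(uniform.listeners, callback)
    return robj
end

# Visibility changes always request a frame, so hiding an object removes it from the screen.
function track_visible!(robj::VisibleRenderObject, visible::MiniObservable{Bool})
    push!(visible.listeners, v -> begin
        robj.visible = v
        robj.requires_update = true
    end)
    return robj
end

color = MiniObservable(:red)
visible = MiniObservable(true)
robj = VisibleRenderObject(true, false)
track_uniform!(robj, color)
track_visible!(robj, visible)

setvalue!(visible, false)       # hiding must still trigger one re-render
hid_flag = robj.requires_update
robj.requires_update = false    # pretend a frame was drawn
setvalue!(color, :blue)         # invisible: no re-render needed
```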

ffreyer · 2022-10-25T10:46:46Z

I reran the benchmarks I posted before and now things are more equal. Maybe the things I have running in the background affect this, not sure. To summarize:

Min times on master:               754, 774, 759µs
Min times on pr w/ fps_renderloop: 743, 740, 745µs
Min times on pr w/ on_demand:      848, 844, 852µs

So in a workflow with lots of GPUArray updates, the pr now does slightly better than master with fps_renderloop and ~15% worse with the on-demand renderloop. I restarted Julia once after switching branches just to be safe.
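The percentages can be checked against the minimum times above. This is simple arithmetic on the quoted numbers, and `relative` is just a helper for the check:

```julia
# Minimum times in µs from the three runs quoted above.
master_us    = [754, 774, 759]
pr_fps_us    = [743, 740, 745]
pr_demand_us = [848, 844, 852]

# Percent change from `a` to `b`, rounded to one decimal.
relative(a, b) = round.(100 .* (b .- a) ./ a; digits = 1)

fps_vs_master    = relative(master_us, pr_fps_us)     # PR with fps_renderloop vs master
demand_vs_fps    = relative(pr_fps_us, pr_demand_us)  # on-demand vs fps_renderloop, same branch
demand_vs_master = relative(master_us, pr_demand_us)  # on-demand vs master
```

On-demand comes out about 14% (14.1, 14.1, 14.4) slower than fps_renderloop on the same branch, and about 9 to 12.5% slower than master. fps_renderloop on the PR is 1.5 to 4.4% faster than master.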

SimonDanisch · 2022-11-10T19:15:49Z

Merged in #2397

MariusDrulea · 2022-11-22T23:08:43Z

Should help with #678 (on any OS), maybe also #267 and #1683. Should also help with idle gpu utilization in #1427, but I don't think it'll help with #1412.

I did some tests, and plotting a 3D volume with plot(rand(1000, 1000, 1000)) works nicely now, with 0% GPU utilization when the volume is not manipulated. Plotting a 2D matrix with plot(rand(2000, 2000)), however, still keeps the GPU busy even when there is no interaction with the image, just like mentioned above.

ffreyer added 2 commits October 14, 2022 19:36

prototyping

d2a6e63

track vertexarray buffers

8b51319

ffreyer added 4 commits October 15, 2022 13:00

track Texture updates

bd0f73d

remove prints

9017d3a

add new renderloop

fa2d179

cleanup

f1b70d8

ffreyer marked this pull request as ready for review October 15, 2022 13:45

ffreyer added 3 commits October 15, 2022 15:46

remove :inspectable from uniforms

ce00493

update NEWS

781e718

Merge branch 'master' into ff/render_on_demand

845a700

only track updates when render_on_demand = true

61a3163

SimonDanisch reviewed Oct 17, 2022


SimonDanisch reviewed Oct 21, 2022


ffreyer and others added 4 commits October 23, 2022 19:48

move visible to struct & fix rerender on visible

8d0bfd6

Merge branch 'master' into ff/render_on_demand

9ccc17b

fix visible initialization

63dc419

Merge branch 'master' into ff/render_on_demand

39c88a6

Merge branch 'master' into ff/render_on_demand

c5686c6

SimonDanisch mentioned this pull request Nov 3, 2022

GLMakie cleanups #2397

Merged

SimonDanisch closed this Nov 10, 2022

This was referenced Nov 23, 2022

Fix repeated rendering with invisible objects #2437

Merged

fix missing rerender on plot deletion #2447

Merged

SimonDanisch mentioned this pull request Jan 12, 2023

REPL can be extremely slow after plotting with GLMakie #1683

Closed

ffreyer mentioned this pull request Mar 17, 2023

Mouse responsivness drops with large plotting loads #2779

Closed

ffreyer deleted the ff/render_on_demand branch March 23, 2023 10:09

SimonDanisch mentioned this pull request Aug 1, 2023

Plotting 2000x2000 image is slow. A large amount of allocations is created. #1412

Closed
