
perf(rtplot): add parallel_rendering preference for RTTank UpdateThrottle#3766

Open
emilioheredia-source wants to merge 1 commit into ControlSystemStudio:master from
emilioheredia-source:rtplot/parallel-rendering-for-rttank

Conversation

@emilioheredia-source

Problem

RTTank serialises all rendering on UpdateThrottle.TIMER — a single
global background thread shared across every RTTank instance in the
application. On a display with many Tank or Progress Bar widgets each
render queues behind the previous one. With 100 widgets, a 50 ms dormant
period, and ~5–20 ms per render, the effective refresh rate collapses well
below the 4 Hz target even though each individual widget could keep up.

The bottleneck is thread contention, not rendering cost per widget.
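
The arithmetic behind that collapse can be sketched in a few lines. This is an illustrative calculation, not Phoebus code; the 10 ms per-render figure is an assumed mid-point of the ~5–20 ms range above:

```java
public class SerialRenderBudget {
    /** Time for one serialised pass over all widgets, in ms. */
    static double serialPassMs(int widgets, double renderMs) {
        return widgets * renderMs;
    }

    public static void main(String[] args) {
        // Scenario above: 100 widgets, 50 ms dormant period (20 Hz request),
        // ~5-20 ms per render. 10 ms is an assumed mid-point, not a measurement.
        double passMs = serialPassMs(100, 10.0);   // 1000 ms for one full pass
        double effectiveHz = 1000.0 / passMs;      // ~1 Hz on the shared thread
        System.out.printf("pass = %.0f ms, effective rate = %.1f Hz%n",
                          passMs, effectiveHz);
        // Even the 4 Hz (250 ms) target is missed once a pass exceeds 250 ms.
        System.out.println("meets 4 Hz target: " + (passMs <= 250.0));
    }
}
```

Spreading the same passes across the pool divides the pass time by the number of render threads, which is why the contention disappears while the total CPU cost stays the same.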

Fix

Add a parallel_rendering boolean preference to
org.csstudio.javafx.rtplot. When true, each RTTank is assigned to
the module's existing Activator.thread_pool (one thread per CPU core)
instead of UpdateThrottle.TIMER, so tanks on separate threads render
concurrently.

The default is false, preserving the original serialised behaviour.
Site-local settings.ini can enable it with one line:

org.csstudio.javafx.rtplot/parallel_rendering=true

Requires restart to take effect.
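
A minimal sketch of the selection logic this adds. The names `Activator.thread_pool`, `UpdateThrottle.TIMER`, and `parallel_rendering` follow the PR description; everything else here is a stand-in, not the actual Phoebus source:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class RenderExecutorSelection {
    // Stand-in for UpdateThrottle.TIMER: the single shared background thread.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor();

    // Stand-in for Activator.thread_pool: one thread per CPU core.
    static final ExecutorService THREAD_POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    /** Pick the executor each RTTank hands to its UpdateThrottle. */
    static Executor selectExecutor(boolean parallelRendering) {
        return parallelRendering ? THREAD_POOL : TIMER;
    }

    public static void main(String[] args) {
        // Default (false) keeps the original serialised behaviour.
        System.out.println(selectExecutor(false) == TIMER);        // true
        System.out.println(selectExecutor(true) == THREAD_POOL);   // true
        TIMER.shutdown();
        THREAD_POOL.shutdown();
    }
}
```

Because the choice is made once, in the constructor, flipping the preference at runtime has no effect on existing tanks, hence the restart requirement.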

Performance context

(screen recording to be added)

On a representative screen with 100 progress-bar widgets updating at 4 Hz:

  • parallel_rendering=false: visible widgets update far below 4 Hz; one
    CPU core saturated from queue backlog.
  • parallel_rendering=true: all widgets track the 4 Hz target; total CPU
    rises from ~3 % to ~6 % spread across cores; peak per-core load drops
    markedly.

Files changed

File                            Change
Activator.java                  @Preference boolean parallel_rendering field + javadoc
RTTank.java                     constructor uses thread_pool or TIMER per preference
rt_plot_preferences.properties  new entry, default false; docs note restart required

Notes

  • No dependency on any other open PR.
  • No automated test exists; manual validation: open a Tank-heavy display
    with both values of the preference and observe refresh rate.

perf(rtplot): add parallel_rendering preference for RTTank UpdateThrottle

Root cause: UpdateThrottle.TIMER is a single-thread shared executor.  On
displays with many RTTank-backed widgets (Tank, Meter) all renders serialise
on one thread.  With N widgets at 20 Hz this creates a queue depth of
N * 50 ms, causing visible lag proportional to widget count.

Fix: RTTank constructor now passes an explicit executor to UpdateThrottle.
When parallel_rendering=true the shared Activator.thread_pool (N-core pool)
is used; when false the original single-thread TIMER is used, preserving the
pre-fix behaviour for all existing installations.

New preference: org.csstudio.javafx.rtplot/parallel_rendering
  false (default) — original single-thread behaviour, safe on all machines
  true            — concurrent renders, recommended for dedicated OPI stations

Tested on CLS OPI workstation with 200 RTTank widgets: visible refresh lag
drops from ~5 s to <200 ms with parallel_rendering=true.
@sonarqubecloud

sonarqubecloud bot commented Apr 4, 2026

@emilioheredia-source
Author

emilioheredia-source commented Apr 4, 2026

Background and motivation

This PR grew out of performance testing done while developing an RTTank-based
rendering backend for the Progress Bar widget (to be submitted as a follow-up PR).
Test setup: 100 Progress Bar widgets connected to independent PVs updating at 20 Hz,
running on a 12-core Intel i7-12700 (20 logical CPUs) workstation that also hosts the
soft IOC generating the PVs.

What we found: stacking throttle layers

Phoebus applies independent rate-limiting at several layers that multiply together:

Setting                                                               Upstream default  Our "snappy" value
org.csstudio.display.builder.runtime/update_throttle                  250 ms (4 Hz)     50 ms (20 Hz)
org.csstudio.display.builder.representation/update_accumulation_time  20 ms             5 ms
org.csstudio.display.builder.representation/update_delay              100 ms            20 ms
org.csstudio.display.builder.representation/image_update_delay        250 ms            50 ms

Each of these is sensible in isolation, but they stack: with upstream defaults the minimum end-to-end cycle from PV update to repainted pixel is well over 500 ms. In practice, on a screen with 100 RTTank-based widgets (Tank, or the forthcoming RTTank-based ProgressBar) at these default delays, visible refresh rates of once every few seconds were observed.

The stock JFX ProgressBar is mostly unaffected — its updateChanges() call costs only microseconds, so even 100 bars update together within a single batch cycle. The slow refresh is specific to widgets whose updateChanges() triggers an off-screen Java2D render.
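
For reference, the tuned values above as they would appear in a site settings.ini. This assumes, as the 250 ms default suggests, that the delay values are specified in milliseconds; verify units against the shipped preference files before deploying:

```ini
# "Snappy" overrides for a dedicated OPI workstation (values from the table above)
org.csstudio.display.builder.runtime/update_throttle=50
org.csstudio.display.builder.representation/update_accumulation_time=5
org.csstudio.display.builder.representation/update_delay=20
org.csstudio.display.builder.representation/image_update_delay=50
# This PR's new preference (restart required)
org.csstudio.javafx.rtplot/parallel_rendering=true
```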

Relaxing those settings to the values above improved things considerably. However one
bottleneck remained: all RTTank instances share a single global background thread —
UpdateThrottle.TIMER — regardless of how many widgets are on screen or how many CPU
cores are available. RTTank rendering is Java2D-based (off-screen BufferedImage,
Graphics2D gradient paint, anti-aliased tick layout — not GPU-accelerated). With 100
widgets serialised on one thread, the math cannot keep up at 20 Hz no matter what the
delay settings say. That is the specific problem this PR addresses.

Measured CPU impact

With the settings above and parallel_rendering=true, running 100 bars at 20 Hz on
a 20-logical-CPU machine:

Condition                                      System idle  Phoebus CPU
Phoebus open, no active display (IOC running)  91.8 %       ~52 % of 1 core
100-bar display open, PVs at 20 Hz             86.7 %       ~83 % of 1 core
Incremental cost of the 100-bar display        −5.1 pp      +~31 % of 1 core

Key observations:

  • The incremental load of 100 widgets at 20 Hz is roughly a third of one core on
    a 12-core machine, with 87 % of the system idle throughout.
  • One core (the JavaFX Application Thread, responsible for blitting each rendered frame
    to screen) ran near-saturation during update bursts. This is a hard architectural
    limit of JavaFX — the scene graph can only be written from a single thread — and is
    not addressable within Phoebus.
  • Off-screen render threads from the RTPlot thread pool were visible spreading across
    separate cores at ~7 % each during the burst. That is parallel_rendering working as
    intended, distributing the Java2D work across available cores so no single render
    thread becomes the bottleneck.
  • 15+ of the 20 logical cores were essentially idle throughout the test.
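
The division of labour described above — Java2D renders spread across a pool, with a single thread doing the on-screen blit — can be sketched without JavaFX by standing in a single-thread executor for the FX Application Thread. All names here are illustrative, not Phoebus code:

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelRenderSketch {
    /** Render frames concurrently, blit each on one "UI" thread; return blit count. */
    static int renderAll(int widgets) throws InterruptedException {
        // Off-screen renders run concurrently, one thread per core...
        ExecutorService renderPool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // ...but every finished frame is handed to one thread, mirroring
        // the JavaFX single-writer scene-graph rule.
        ExecutorService uiThread = Executors.newSingleThreadExecutor();

        AtomicInteger blitted = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(widgets);
        for (int i = 0; i < widgets; i++) {
            renderPool.submit(() -> {
                int[] frame = new int[64 * 64];   // stand-in for an off-screen BufferedImage
                Arrays.fill(frame, 0xFF00FF00);   // "Java2D" work, runs in parallel
                uiThread.submit(() -> {           // stand-in for Platform.runLater
                    blitted.incrementAndGet();    // scene-graph write, single thread only
                    done.countDown();
                });
            });
        }
        done.await();
        renderPool.shutdown();
        uiThread.shutdown();
        return blitted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("frames blitted: " + renderAll(100)); // 100
    }
}
```

The render pool removes the contention this PR targets, while the single consumer illustrates why the FX Application Thread remains the hard ceiling regardless of core count.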

A note on the upstream defaults

This is offered as an observation, not a complaint — the conservative defaults are
clearly deliberate and make sense for machines running mixed workloads. For workstations
primarily dedicated to displaying control room screens, however, the trade-off between
saving CPU cycles and maintaining a responsive UI is likely to favour responsiveness for
most operators, and the headroom shown above suggests there is room to reconsider. Sites
can already tune these values via settings.ini; it may nonetheless be worth a second
look at whether the defaults strike the right balance for the common OPI workstation case.

@emilioheredia-source
Author

In summary: CPU-rendered widgets such as the Tank — which perform full off-screen Java2D compositing on every frame — can be just as responsive as the GPU-native JFX widgets (which currently have no axis or scale), provided the right settings and this PR are applied. The CPU cost increase is real but modest, as the measurements above show.

This also opens the door to reusing the Tank rendering backend for an enhanced Progress Bar widget that gains a numeric scale, alarm-limit lines, and tick formatting — a follow-up PR to be submitted shortly.

The four short videos below compare the two rendering backends (original JFX vs. RTTank) under both the upstream-default and the tuned "snappy" settings:

File                                             Shows
progressbar_original_jfx_default_settings.mp4    Stock JFX bar, upstream defaults — baseline (slow)
progressbar_original_jfx_snappy_settings.mp4     Stock JFX bar, tuned settings
progressbar_rttank_backend_default_settings.mp4  RTTank bar, upstream defaults — single-thread serialisation bottleneck
progressbar_rttank_backend_snappy_settings.mp4   RTTank bar, tuned settings + parallel_rendering=true — target behaviour

