latency-test, latency-histogram: warn when rtapi_app lacks RT privileges by grandixximo · Pull Request #4107 · LinuxCNC/linuxcnc

grandixximo · 2026-06-02T13:14:13Z

What

Both latency tools now warn, for a non-root user, when rtapi_app is neither setuid root nor carrying the cap_sys_nice file capability. In that state the realtime threads cannot get SCHED_FIFO scheduling or locked memory, so the reported latency is wildly inflated and unrepresentative.

Why

In #4044 a run-in-place build was used without sudo make setuid. rtapi_app ran unprivileged, the latency numbers blew up, and this was initially mistaken for a code regression. It was not; the setuid/setcap step had simply been skipped. A plain warning would have saved the back-and-forth. @rodw-au suggested a geteuid()/setuid check in the issue thread.

How

A small check runs before the test starts:

- root: silent (already has every privilege).
- locate rtapi_app; if not found, stay silent.
- setuid-root bit present (sudo make setuid): silent.
- cap_sys_nice file capability present (sudo make setcap): silent.
- otherwise: print a warning that the numbers may be skewed and point at the missing setuid/setcap step.

A pure euid check was deliberately avoided: a normal deb install runs the scripts as an unprivileged user yet works fine via setuid-root rtapi_app, so euid alone would false-warn on every correct install. The real signal is the privilege state of rtapi_app itself.
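
For illustration only, the check order described above could look roughly like the following POSIX shell sketch. The function name `rtapi_needs_warning`, its argument order, and the warning text are hypothetical and not taken from the PR; locating rtapi_app through `command -v` is also an assumption, since a run-in-place tree would find it differently.

```shell
#!/bin/sh
# Sketch of the pre-test check; names here are hypothetical, not from the PR.
# Usage: rtapi_needs_warning EUID RTAPI_APP_PATH
# Returns 0 when a warning is warranted, 1 when the check should stay silent.
rtapi_needs_warning() {
    euid=$1
    app=$2
    # root: silent (already has every privilege)
    [ "$euid" -eq 0 ] && return 1
    # rtapi_app not found: silent
    [ -n "$app" ] && [ -f "$app" ] || return 1
    # setuid bit set and file owned by root (sudo make setuid): silent
    # (stat -c is the GNU/busybox form; assumed available on Linux)
    if [ -u "$app" ] && [ "$(stat -c %u "$app")" -eq 0 ]; then
        return 1
    fi
    # cap_sys_nice file capability present (sudo make setcap): silent
    # getcap comes from the libcap tools and may not be installed
    if command -v getcap >/dev/null 2>&1 &&
       getcap "$app" 2>/dev/null | grep -q cap_sys_nice; then
        return 1
    fi
    return 0
}

# Assumed lookup via PATH; an empty result means "not found" and stays silent.
if rtapi_needs_warning "$(id -u)" "$(command -v rtapi_app 2>/dev/null)"; then
    echo "warning: rtapi_app lacks realtime privileges; latency numbers may be skewed." >&2
    echo "         Run 'sudo make setuid' or 'sudo make setcap' in the source tree." >&2
fi
```

Passing the euid in as an argument, rather than reading it inside the function, keeps the decision testable without actually being root.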

latency-test (bash) prints to stderr; latency-histogram (tcl) prints to stderr and, in GUI mode, also shows a tk_messageBox.

Closes #4044

Without 'sudo make setuid' (or 'sudo make setcap') rtapi_app runs unprivileged: no SCHED_FIFO, no locked memory, so latency readings are wildly inflated and easy to mistake for a code regression. Warn, for a non-root user, when rtapi_app is neither setuid root nor carries the cap_sys_nice capability. Closes LinuxCNC#4044

BsAtHome · 2026-06-02T14:02:23Z

These tests are moot when you run on a non-RT kernel (like I do in dev). I'm not sure the noise is really necessary in that case.

Only warn under PREEMPT_RT or RTAI; on a non-RT kernel the privileges do not matter, so the check would be noise.

grandixximo · 2026-06-02T14:32:17Z

silenced on non-rt
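
As a rough sketch of such a gate, the kernel flavour can be guessed from the `uname` strings. This assumes a PREEMPT_RT kernel advertises "PREEMPT_RT" (or "PREEMPT RT") in `uname -v` and that an RTAI kernel carries "rtai" in its release name; the function name `kernel_rt_kind` is hypothetical, and the actual commit may detect this differently.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel from its release and version strings.
# Usage: kernel_rt_kind RELEASE VERSION  -> prints preempt_rt, rtai, or none
kernel_rt_kind() {
    release=$1
    version=$2
    case "$version" in
        # assumption: RT-patched kernels name the preemption model here
        *PREEMPT_RT*|*"PREEMPT RT"*) echo preempt_rt; return 0 ;;
    esac
    case "$release" in
        # assumption: RTAI kernel packages include "rtai" in the release
        *rtai*) echo rtai; return 0 ;;
    esac
    echo none
}

kind=$(kernel_rt_kind "$(uname -r)" "$(uname -v)")
if [ "$kind" = none ]; then
    echo "non-RT kernel: privilege warning skipped" >&2
fi
```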

rodw-au · 2026-06-02T21:06:05Z

Great. Every little bit helps. It reduces user frustration and, more importantly, wastes less developer time on spurious issues.

hdiethelm · 2026-06-04T19:43:20Z

Hmm, in #4044 in the images in the background you clearly see:

So the new warning will not help that much. If desired, you can add it in the place where Note: Using POSIX non-realtime is printed, so at least it is always shown, not only in these test tools.

hdiethelm · 2026-06-04T19:47:05Z

Connected to this:
#4118

A general way for GUIs to show "You don't have realtime" warnings that you cannot overlook would help the most. When milling, I start linuxcnc with the link. So no console. If I accidentally start the wrong kernel, bad luck.

BsAtHome · 2026-06-04T20:05:55Z

On a production system, you may want to warn all the time when this is amiss. Maybe something with a background color turning from gray into gray-red tinted and do something similar consistently in all GUIs.

For dev builds you don't really want this because you want to see what the operator sees while you are working on stuff. I'd go for an opt-in choice by setting a value in the INI file. Maybe something like a boolean [DISPLAY]VISUAL_WARN_NONRT, defaulting to false. Then add the entry commented out in our configs and add a choice in pnconf and friends.

hdiethelm · 2026-06-04T20:18:51Z

I was just checking the code. @grandixximo You already added a better warning in the C code in your nonroot patch, so this was probably not even applied in the screenshot. Now there is a double warning in the console. The note from C++ and the Warning from this PR:

With latency-histogram there is also a pop-up. With latency-test, this doesn't work. And the most important app, linuxcnc, also just shows nothing if you don't start it in a console.

Does anyone have a good idea for how to check easily and globally for realtime capability?

There is already a function rtapi_is_realtime(). It is not 100% reliable: if harden_rt() fails, it still returns true even though the thread runs in SCHED_OTHER. But this can be fixed.

rtapi_is_realtime() is also linked to userspace apps but there it will not work, it checks if the userspace app has realtime... ;-)

I could add a halcmd that checks for realtime. Or a pin that is true when all is ok, false otherwise.

This could then be used in all GUIs for an opt-in or opt-out warning. But I am not that deep into all these various GUIs and how they communicate with the RT part.

hdiethelm · 2026-06-04T20:29:33Z

> On a production system, you may want to warn all the time when this is amiss. Maybe something with a background color turning from gray into gray-red tinted and do something similar consistently in all GUIs.
>
> For dev builds you don't really want this because you want to see what the operator sees while you are working on stuff. I'd go for an opt-in choice by setting a value in the INI file. Maybe something like a boolean [DISPLAY]VISUAL_WARN_NONRT, defaulting to false. Then add the entry commented out in our configs and add a choice in pnconf and friends.

Might be an option that is default on when you deploy and default off in RIP mode? But this is annoying to test.

Otherwise, I would tend toward default on: devs will manage to switch it off more easily than users will cope with not knowing that they don't have realtime enabled. Instead of the INI file, it might be an environment variable LINUXCNC_NO_RT_WARN. Devs can set it on their dev machine in .profile if they are annoyed, and it works for all test configs.
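
A minimal sketch of that environment-variable opt-out, assuming any non-empty value of LINUXCNC_NO_RT_WARN suppresses the warning (the exact semantics were not specified in the thread, and the helper name `should_warn_nonrt` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) when warnings are still enabled,
# i.e. LINUXCNC_NO_RT_WARN is unset or empty.
should_warn_nonrt() {
    [ -z "${LINUXCNC_NO_RT_WARN:-}" ]
}

if should_warn_nonrt; then
    echo "non-realtime warnings are enabled" >&2
fi
```

A developer would then put `export LINUXCNC_NO_RT_WARN=1` into `.profile` once, and every test config on that machine would stay quiet.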

BTW, just brainstorming options.

hdiethelm · 2026-06-04T21:16:00Z

Just a POC, if you think a halcmd getrt (or better name) would help I can create a PR. Was easy.
You can use that everywhere and it will return 1 if failed / 0 if good.

../bin/halcmd getrt ; echo Return value $?
<commandline>:0: exit value: 1
<commandline>:0: No realtime available
Return value 1

make setuid

../bin/halcmd getrt ; echo Return value $?
Realtime available
Return value 0

You find it on my fork:
https://github.com/hdiethelm/linuxcnc-fork/tree/halcmd_getrt
hdiethelm/linuxcnc-fork@master...hdiethelm:linuxcnc-fork:halcmd_getrt

hdiethelm · 2026-06-04T22:19:05Z

Meanwhile, I found also something that looks like it is exposed to the python code:

linuxcnc/src/hal/halmodule.cc

Line 2382 in 888cb94

PyModule_AddIntConstant(m, "is_rt", rtapi_is_realtime());

But this is broken: #4129

grandixximo · 2026-06-05T00:48:07Z

Thanks both, this is more useful than my original per-tool heuristic.

I have pivoted the PR to use @hdiethelm's halcmd getrt as the single source of truth instead of hand-rolling a setuid-bit / getcap probe in bash and tcl. Both scripts now just run halcmd getrt and warn only when it reports No realtime available. An rtai/non-uspace build, an older halcmd without getrt, or a working realtime setup all stay silent, so the check rides on the authoritative rtapi_is_realtime() path rather than guessing from file permissions. This also drops the weaker logic @hdiethelm rightly flagged.
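
The "warn only on an explicit negative answer" behaviour described above could be sketched as follows. The helper name `rt_warn_from_probe` and the warning text are hypothetical; `halcmd getrt` itself is the proof-of-concept command from the fork and is not assumed to exist in a released halcmd.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command and warn only when its output
# contains "No realtime available". A missing command, an older halcmd without
# getrt, or a positive answer all leave the helper silent.
# Returns 0 when it warned, 1 when it stayed silent.
rt_warn_from_probe() {
    out=$("$@" 2>&1) || true
    case "$out" in
        *"No realtime available"*)
            echo "warning: no realtime available; latency numbers may be skewed." >&2
            echo "         Check the 'sudo make setuid' / 'sudo make setcap' step." >&2
            return 0
            ;;
    esac
    return 1
}

# Assumed call site; silent when halcmd or its getrt command is absent.
rt_warn_from_probe halcmd getrt || :
```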

A few things I would like your input on, since they touch the broader direction in #4118:

- Console double-warning. rtapi already prints Note: Using POSIX non-realtime at the source (uspace_posix.cc). For a console tool like latency-test that note is arguably enough, and a second line from the script is the duplication you saw. I am inclined to keep the script warning only for its actionable hint (the make setuid / make setcap pointer) and let the GUI popup be the real value-add in latency-histogram. Happy to drop the latency-test console line entirely if you would rather the C note be the only console source.
- getrt invocation/cleanup. Since getrt goes through hal_systemv and a HAL init, calling it standalone before the test seems to bring up an rtapi instance. @hdiethelm, where do you intend callers to invoke it, and does it need a halrun -U afterward so it does not collide with the session the tool then starts? I did not want to bake in a cleanup that could disturb a running setup.
- Dev opt-out policy. I wired a LINUXCNC_NO_RT_WARN env opt-out per @hdiethelm's suggestion, which keeps @BsAtHome's dev boxes quiet without a per-kernel heuristic. If the consensus in #4118 (Feature: Properly warn if no realtime kernel is active) lands on an INI key like [DISPLAY]VISUAL_WARN_NONRT instead, I will switch to that. The env var is easy to set once for all test configs, which is why I started there.

This now depends on the getrt command landing. @hdiethelm, if you open that as its own PR I will rebase on top and reference it.

BsAtHome · 2026-06-05T06:40:55Z

Instantiating a HAL memory segment on invocation may be problematic. It will surely confuse because you have to remember to call halrun -U afterwards. That is not a good design.

Opt-out policies are generally designed to force you to do a thing, even if you do not want to. That is why they should be avoided.

Query realtime status with 'halcmd getrt' (hdiethelm's PR) rather than probing the setuid bit. latency-histogram asks inside its own running session, so no stray HAL segment is created. latency-test drops its check and relies on the existing "POSIX non-realtime" note.

hdiethelm · 2026-06-05T07:17:32Z

> Instantiating a HAL memory segment on invocation may be problematic. It will surely confuse because you have to remember to call halrun -U afterwards. That is not a good design.

For uspace, halcmd getrt works in the background by executing rtapi_app getrt. There are two possibilities:

- rtapi_app is already running (for example, you started latency_test in another terminal): the command is executed and the result returned.
- rtapi_app is not yet running: master starts, runs the command and exits again due to instance_count==0. So nothing stays behind. No halrun -U needed. However, this is kind of a low-probability race condition: if you manage to break realtime somehow between halcmd getrt and halrun lat.hal, then no error is reported.

You see that in the following test where I added a message when rtapi_app exits:

halcmd loadrt and2
#Note: Using POSIX realtime
halcmd getrt
#Realtime available
halcmd getrt
#Realtime available
pgrep rtapi_app
#3799
halrun -U
#exit master

vs

halcmd getrt
#Note: Using POSIX realtime
#exit master
#Realtime available
halcmd getrt
#Note: Using POSIX realtime
#exit master
#Realtime available
pgrep rtapi_app
#no process running

I don't see any big downside in doing it like that. But I also do not 100% like starting up rtapi_app just to exit right away. It would be better to run it with or after halrun in the script. Or there might be an approach using a signal / parameter.

Of course, RTAI, the docs and so on also need to be checked / updated before I will call that ready.

Better ideas are welcome. But I prefer a check that always executes the same code path for realtime detection over the earlier variant, where you most likely end up with diverging realtime checks spread across all possible apps.

grandixximo · 2026-06-05T07:19:22Z

@hdiethelm I have restructured to avoid the standalone HAL instantiation @BsAtHome flagged:

- latency-histogram now calls halcmd getrt from inside its own running session (right after hal start), so it attaches to the realtime already up rather than spinning up a segment that needs a separate halrun -U.
- latency-test drops its script-side check entirely and relies on the existing Note: Using POSIX non-realtime from rtapi, which already lands on the same console. That also removes the double-warning you saw.
- Dropped the LINUXCNC_NO_RT_WARN opt-out per @BsAtHome; the dev-suppression policy can be decided in #4118 (Feature: Properly warn if no realtime kernel is active) rather than baked in here.

That keeps the scripts honest, but @BsAtHome's deeper point lands on getrt itself: do_getrt_cmd goes through hal_systemv + a HAL init, so any standalone caller instantiates a segment. Is it worth making getrt probe rtapi_is_realtime() without a full hal_init, so a GUI can ask "is realtime available" cheaply before starting anything? That would let every GUI (including linuxcnc started from a launcher, the #4118 case) query it without session side effects. If you think that is the right shape, this PR can depend on that and I will rebase on top once your getrt lands as its own PR.

hdiethelm · 2026-06-05T07:29:25Z

@grandixximo Nice. I have to test it.

I created a PR, feel free to rebase:
https://github.com/LinuxCNC/linuxcnc/pull/4132/changes

@BsAtHome
Yes, rtapi_app has the annoying behavior that it always creates this memory segment and initializes RT. Even if you exit right away after. However, it looks like this doesn't hurt anything, you can start any app after rtapi_app getrt or other commands that do not increase the instance counter or do anything else than initializing this segment and exiting afterwards.

Any hints on what should be done in this case? Before my rtapi_app rework PR, even rtapi_app exit initialized a memory segment when it was not running. ;-)

grandixximo · 2026-06-05T07:34:50Z

Crossed posts, @hdiethelm. Good, your "run it with/after halrun in the script" is exactly what I did for latency-histogram (getrt after hal start, inside the session), so no stray rtapi_app and it also avoids the break-in-between race you noted. For latency-test I leaned on the existing rtapi note rather than a second getrt call; happy to switch it to getrt inside its HAL flow if you would rather every tool go through the one path. I will rebase on your getrt PR once it is up with the RTAI/doc bits.

hdiethelm · 2026-06-05T07:59:21Z

Hmm, just an idea:
Something like this for the test scripts? I find these huge popups a bit annoying.

@BsAtHome
What do you think about this for all GUI apps? TBD how to inhibit but there will be a way.
4803bb1
Somehow gmoccapy doesn't show this error. I guess there is a bug that startup errors are not shown?

BsAtHome · 2026-06-05T08:05:15Z

A (forced) popup is the equivalent of slapping someone in the face.

The error message added to the GUI is actually not an error. Not running RT on a production system may be considered an error.

If you look closely in AXIS' status bar you see "Kein Werkzeug". That is also the place where you want to warn the user. Add a status bar field that is obvious (light red background) yet not invasive.

hdiethelm · 2026-06-05T08:09:20Z

You have a point. I also get annoyed by all these popups when using the good old microslop... :-D

Now on the status bar: Good idea. How to do that? I am already somewhat deep in the C code, so I can add any needed support there, but for the GUIs someone else has to take over.

@grandixximo Can you do this based on something exposed by HAL? A halcmd / parameter / pin is easy for me to add.

grandixximo · 2026-06-05T08:38:55Z

@hdiethelm thanks, I will rebase this onto #4132 once it settles. @BsAtHome agreed the popup is too much; I will drop the tk_messageBox and warn non-invasively in the test tools instead.

On the cross-GUI status bar, that is the right shape and I am happy to do the AXIS side (a status-bar field with a light-red background, like the existing tool slot) once @hdiethelm's "realtime ok" signal from #4132 exists. The question is where it lives.

My preference is to keep #4107 narrow: rebased on getrt, popup gone, scoped to the latency tools. It can ship as soon as #4132 lands. The GUI-wide warning cannot be written until the HAL pin exists and touches AXIS, gmoccapy and qtvcp, so I would do it as a separate PR tracking #4118 rather than make this small change wait on the slowest part.

That said, if you would rather have one PR own the whole intent, I am fine rescoping #4107 to the GUI-wide warning and retitling it; it just becomes larger and slower. Either way works for me. Which do you prefer?

grandixximo · 2026-06-05T08:41:19Z

@hdiethelm to answer your "halcmd / parameter / pin" question directly: a bool pin is best for the GUIs. AXIS, gmoccapy and qtvcp already monitor HAL pins, so they can reflect realtime state live in the status bar without polling a command. I would steer away from a param given those are heading for deprecation, and a halcmd is the least convenient since a GUI would have to shell out to poll it.

BsAtHome · 2026-06-05T08:41:57Z

I think we first have to agree on the proper conceptual design of how to detect in the different scenarios and what to do with it.

grandixximo · 2026-06-05T08:46:25Z

@BsAtHome agreed, let me put a concrete proposal on the table.

Detection: one source of truth. @hdiethelm's rtapi_is_realtime() path, exposed as a single bool HAL pin ("realtime ok"). Every app reads the same signal, so there is no divergent per-app logic, which was the original concern.

What to do with it: per UI, not one mechanism. The right surface differs by app, so each owns its own rendering rather than forcing a single widget everywhere:

Console tools (latency-test): the existing Note: Using POSIX non-realtime already covers them.
GUIs (AXIS, gmoccapy, qtvcp): an in-window, non-invasive indicator. Obvious but not a slap, e.g. a light-red status-bar field, no forced popups.

One thing to rule out: coloring the window title bar / decoration is not reliable. Plenty of setups have no title bar at all (fullscreen/kiosk panels, some Wayland/WM configs), so the signal has to live inside the app window, not in the chrome.

Still open (defer): production-vs-dev suppression policy. That can ride with #4118 once the pin and the per-UI rendering exist.

Does that match how you see the scenarios?

hdiethelm · 2026-06-05T08:58:35Z

Sounds like a plan. I will create a new PR with a signal. Then we can test how this feels and continue from there.

I can do that tomorrow, right now I have other things to do.

About the title bar: The idea was to only use this for the two test apps. If this is cumbersome, we might just modify the text that is already displayed in them.

@grandixximo Can you mark this PR as a draft until we are done?

Sorry about the back-and-forth. If I don't have a good solution yet, this is often my way of brainstorming. Try things and discard until it is good. Hope this is ok for you.

BsAtHome · 2026-06-05T09:05:41Z

> Detection: one source of truth. @hdiethelm's rtapi_is_realtime() path, exposed as a single bool HAL pin ("realtime ok"). Every app reads the same signal, so there is no divergent per-app logic, which was the original concern.

That is only partly satisfactory because for this realtime needs to be running.

You want to know in advance whether your system will be capable of running RT without starting any of it. Then, when you are running, you want to know from various applications what the actual status is by using generic API call or/and HAL pin.

grandixximo · 2026-06-05T09:10:32Z

> Hope this is ok for you.

No problem, we will brainstorm it and come up with something that sticks.

hdiethelm · 2026-06-05T21:35:35Z

Hmm, like so often, more difficult than estimated. So basically two options are needed.

halcmd getrt is gone. halcmd always initializes the hal before anything else, so not suited for this. With uspace, it fully cleans up afterwards, so no real issue. But with rtai, it just fails if rtai is not yet started.

New concept:

Check with script in advance / during runtime:

realtime verify -> Returns 0 if realtime, 1 if not realtime
- If rtapi_app is not running, this goes through a fast path that doesn't start the hal
- If rtapi_app is running, this goes through the normal command path
- If rtai build, always realtime
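
A script could consume that exit status roughly as in the sketch below. `realtime verify` is the subcommand proposed here, not an existing interface; the helper name `rt_state_from_status` is hypothetical, and treating any other exit status as "unknown" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map the proposed exit status to a label.
#   0 -> realtime, 1 -> not realtime, anything else -> unknown
# Caveat: a 'realtime' script that does not know 'verify' may also exit
# non-zero, which this sketch cannot tell apart from a real "not realtime".
rt_state_from_status() {
    case "$1" in
        0) echo realtime ;;
        1) echo non-realtime ;;
        *) echo unknown ;;
    esac
}

status=0
realtime verify >/dev/null 2>&1 || status=$?
echo "realtime verify: $(rt_state_from_status "$status")"
```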

A pin that shows the status:

New persistent component hal_status that is created when the hal is started.
- hal_status.is-realtime 1 when realtime / 0 otherwise

Some quirks were needed:

Inhibit unload of hal_status -> Otherwise unload all will try to unload a .so / kernel module which doesn't exist.
The fast path in rtapi_app

Ideas:

hal_status.realtime-type: 0 for no, 1 for RTAI, 2 for posix PREEMPT_DYNAMIC, 3 for PREEMPT_RT and so on
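
If such a pin existed, a script could translate its value using the numbering floated above. This is only a sketch: hal_status.realtime-type is an idea from this comment, the numbering is as proposed ("and so on" leaves further values open), and the helper name `realtime_type_name` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate the proposed realtime-type value to a name.
realtime_type_name() {
    case "$1" in
        0) echo "none" ;;
        1) echo "RTAI" ;;
        2) echo "POSIX PREEMPT_DYNAMIC" ;;
        3) echo "PREEMPT_RT" ;;
        *) echo "unknown" ;;   # values beyond the ones listed in the proposal
    esac
}

# Assumed usage once the pin exists (not run here):
#   realtime_type_name "$(halcmd getp hal_status.realtime-type)"
realtime_type_name 3
```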

Tested on uspace / rtai, it works.

Alternatives:

Separate realtime check that checks rtapi_app from the outside. Big downside: I am sure this will become inconsistent soon, or already is at the time of implementation, and we miss an edge case.
Separate hal_status component: I started something. But it has also a downside: It must be loaded in each and every config to work. This is why I decided against it.
Add a pin to a normally loaded component like motion. This is a bit out of place.

Feedback?
Does anyone have an alternative idea for how to create a pin without having a component directly in hal_lib.c?

hdiethelm · 2026-06-05T21:46:16Z

Hmm, the tests really don't like my persistent hal_status component. Options:

Just create a separate component that has to be loaded in each and every sample config
Drop the idea with the pin and only use realtime verify
Fix the tests. A persistent hal_status component could be useful for many things
Pass the value through shm directly and add it to the HAL APIs

BsAtHome · 2026-06-05T22:26:54Z

> Hmm, like so often, more difficult than estimated. So basically two options are needed.

Yes, that is why we need a plan first :-)

There are three levels where you want to be able to query the RT state:

when nothing is started to see if the system is actually capable of running RT,
when starting the system,
when running.

These levels of interaction may use different methods to give you the result.

Using HAL while the system is up may not be an entirely bad idea. However, I do not think that most programs would or should use it. Most programs already interact with libraries and the information must be available through the C HAL library and python hal module.

If you want a "persistent" component, then it needs to be HAL itself who does the work and it must be created when the HAL library initializes the shared memory segment. For the namespace, I'd suggest either hal or linuxcnc. Not sure which is the better one. The status is-realtime and realtime-type should probably be R/O parameters.

All that said, if the data is available through hal_lib and halmodule, then why bother with a component?

grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 77c1e3e to 8274d2d Compare June 2, 2026 13:19

latency: skip RT-privilege warning on non-RT kernels

ecc6b29

Only warn under PREEMPT_RT or RTAI; on a non-RT kernel the privileges do not matter, so the check would be noise.

grandixximo force-pushed the fix/latency-setuid-warning-4044 branch from 228e6ef to a3df81c Compare June 5, 2026 07:19

hdiethelm mentioned this pull request Jun 5, 2026

WIP: New halcmd getrt #4132

Draft

grandixximo marked this pull request as draft June 5, 2026 09:09


4 participants
