linuxcncrsh: fixes & cleanup #2739

heeplr · 2023-11-07T18:55:09Z

This fixes the current linuxcncrsh segfault bug and a lot of other issues.
I also cleaned up and modernized parts of the code, added a "get update" command, and re-enabled the test (disabled in eca9685), which should pass now.
The individual commits provide verbose details.

andypugh · 2023-12-11T10:21:06Z

Thanks for giving linuxcncrsh some long overdue attention.
I have not gone through the code in detail, but I am a little concerned that you seem to have changed the name of one of the commands (from set_wait to wait_mode). My worry is that this will possibly break existing scripts in user configs.

heeplr · 2023-12-11T10:45:47Z

> Thanks for giving linuxcncrsh some long overdue attention.

I think it's a great way to automate linuxcnc and do testing. I also created a quick, uncleaned proof of concept for a rewrite using libcli that should provide a modern UX and cleaner code.

> you seem to have changed the name of one of the commands (from set_wait to wait_mode)

Yes. I couldn't find out what "set" refers to other than "set wait mode". I suspected it's an artefact from times when the set command didn't exist and someone forgot to remove the "set_" from the name. That resulted in the weird set set_wait ... construction.

> My worry is that this will possibly break existing scripts in user configs.

Indeed there is a possibility. But since linuxcncrsh was almost completely broken beyond sending mdi commands and set_wait, the chance should be pretty low. I'm quite sure that it should fail gracefully tho, unless users don't check for NAKs/errors. So mentioning it in the ChangeLog/Release notes should probably be fine.

I'd prefer to keep it if my assumption is right and the name is "just wrong". But I have no strong feelings about changing it back since it wouldn't exactly hurt to rename it later (e.g. on a new major release).

andypugh · 2023-12-11T11:12:58Z

As an interim measure, could the code respond to both the new and old commands identically? (possibly with a deprecation warning?)
We have a script to auto-update configs, but it only works with HAL and INI files. (Because you can find the HAL files by parsing the INI, and the INI is passed as a parameter to the linuxcnc startup script.)

heeplr · 2023-12-11T11:21:33Z

> As an interim measure, could the code respond to both the new and old commands identically? (possibly with a deprecation warning?)

That's a good solution. I'll add that.
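For illustration only, the "respond to both names, warn on the old one" idea can be sketched as a small lookup table. This is not the code from the PR (linuxcncrsh is written in C++ in emcrsh.cc); the table, the function name and the warning text below are all hypothetical.

```python
import sys

# Hypothetical alias table: deprecated command name -> current name.
DEPRECATED_ALIASES = {"set_wait": "wait_mode"}


def resolve_command(name, warn=None):
    """Return the current command name for `name`.

    If `name` is a deprecated alias, write a warning to `warn`
    (stderr by default) and return the name that replaces it.
    """
    if warn is None:
        warn = sys.stderr
    canonical = DEPRECATED_ALIASES.get(name)
    if canonical is None:
        # Not an alias: pass the name through unchanged.
        return name
    print(f'warning: "{name}" is deprecated, use "{canonical}" instead', file=warn)
    return canonical


if __name__ == "__main__":
    print(resolve_command("set_wait"))   # old name still works, with a warning
    print(resolve_command("wait_mode"))  # new name, no warning
```

With a table like this, both names reach the same handler, so existing user scripts keep working while the warning points them at the new name.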

> We have a script to auto-update configs

Autoupdate would be nice, but I'd argue that it's not a config value but an API change and updating user scripts/code/3rd party applications probably is out of scope of linuxcnc.

…ion parameters)
* rename pch -> s uniformly
* fix double-tokenize e.g. in setDebug()
* lots of cleanup for get/setSpindle(), get/setSpindleOverride() and get/setBrake()
* fixed segfault of "get spindle" without argument
* fixed snprintf() compiler warnings
* support multiple spindles
* make spindlenum the last input argument
* output NML errors after NML failures (untested)
* fix off-by-one in max-spindle test
* replace EMCMOT_MAX_SPINDLES with actual number of spindles
* add missing "zero input items parsed" check for sscanf()
* fix cppcheck warnings
* handle NML errors in setEstop()
* add "get update" command
* replace write() and sockWrite() with dprintf() where applicable
* eliminate sockWrite()
* fix all \n\r -> \r\n (remember "ReturN")
* pass string literals if there's no format string
* prefer using string literals
* use spindle 0 by default, not all spindles
* fix helptext
* handle errors in all send*() functions
* rename *_s -> *Str according to previous style
* fix missing argument documentation
* make getTeleopEnable() output ON/OFF instead of YES/NO according to setTeleopEnable() using checkOnOff(); fix help text
* rename set_wait -> wait_mode
* eliminate initMain()
* rename initSockets -> initSocket
* remove unneeded includes
* replace strncpy() with rtapi_strlcpy(); commenting
* use sizeof() to loop over array instead of last array element; prefer for over while
* introduce OUT() helper macro
* replace snprintf(context->outBuf,...) with OUT() macro; don't use string substitution excessively
* output TOOL_OFFSET as double
* make axisnumber() parse all possible notations
* treat set/getFeedOverride percent as double
* fix commandSet() returns
* rewrite getAbs/Rel*Pos, getJointPos, getPosOffset
* fix commandGet return values
* cleanup parseCommand
* formatting
* more error handling
* more error user feedback
* more input validation
* minor cleanups

heeplr · 2023-12-11T21:31:28Z

I've added the alias and encountered two race conditions when homing and probing in the tests.
The issue is likely unrelated to linuxcncrsh and I disabled the test for now.

andypugh · 2024-01-02T00:06:42Z

@heeplr Can you look at why linuxcnc builds on the buildbot are failing? It's the linuxcncrsh test. But it doesn't always fail, and I have not been able to reproduce it.
Example: https://github.com/LinuxCNC/linuxcnc/actions/runs/7380505624/job/20077951641?pr=2823
(It might be due to a bad merge on my part, but I don't think so)

heeplr · 2024-01-02T10:03:43Z

It's failing because of a race condition. The test does set teleop_enable on but a subsequent get teleop_enable still returns off.
The initial set wait_mode done should have prevented that if I understand correctly. But it seems that it's not working. I couldn't find a mechanism that would check when a command is actually done executing so the call could block until then.

I encountered several of those race conditions with probing and homing as well. I ended up disabling test commands to prevent the test from failing.
The root cause lies way deeper than linuxcncrsh, but I can't tell exactly where the culprit is. It has probably existed for a long time but was never tested.

Maybe just comment out failing commands for now?

PS: Is there another integration test that does something like "run a complete real-world job on simulated hardware"? I couldn't find any (but maybe missed it).

petterreinholdtsen · 2024-01-02T10:58:38Z

[heeplr]

> It's failing because of a race condition. The test does set teleop_enable on but a subsequent get teleop_enable still returns off.

What about adding code after the "set teleop_enable on" call to loop for a few seconds to wait for 'get teleop_enable' to return on, to avoid the race condition? It could warn about the issue in the process, but make the test more robust.

-- Happy hacking Petter Reinholdtsen
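A minimal sketch of such a bounded wait loop, assuming the test can call some function that sends "get teleop_enable" and returns the reply. The helper name `wait_for`, its parameters and the default timeout are hypothetical, and the actual linuxcncrsh test is not written in Python.

```python
import sys
import time


def wait_for(query, expected, timeout=5.0, interval=0.1):
    """Poll query() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected value was seen, False on timeout.
    A warning is printed when more than one poll was needed, so the
    underlying race stays visible even though the test passes.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if query() == expected:
            if attempts > 1:
                print(f"warning: needed {attempts} polls to see {expected!r}",
                      file=sys.stderr)
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a real query: the state flips to ON on the third poll.
    replies = iter(["OFF", "OFF", "ON"])
    print(wait_for(lambda: next(replies), "ON", timeout=2.0, interval=0.01))
```

Unlike a fixed sleep, the loop returns as soon as the state changes, so fast machines pay almost nothing and slow ones get up to the full timeout.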

heeplr · 2024-01-02T11:09:47Z

> to loop for a few seconds

Adding a few "sleep" commands came to my mind but with all the other racing commands, that would be a lot of "few seconds" and would probably increase total runtime of the test significantly. From a testing POV, this would certainly be better than disabling the tests completely.

Fixing the race conditions or removing the set_wait (now wait_mode) command - i.e. not giving guarantees to the user - would be best.

Note that the test isn't complete. I'm not familiar enough to create a representative "real world job" test and couldn't find a template to copy. But in theory, it should test a lot more different modes/operations/combinations.

petterreinholdtsen · 2024-01-02T11:32:31Z

[heeplr]

> Adding a few "sleep" commands came to my mind but with all the other racing commands, that would be a lot of "few seconds" and would probably increase total runtime of the test significantly. From a testing POV, this would certainly be better than disabling the tests completely.

Just adding sleep is quite unreliable, as the various machines running the tests have widely different performance. This is why I suggest a loop with an upper bound on the number of seconds sleeping.

…

-- Happy hacking Petter Reinholdtsen

heeplr · 2024-01-02T11:43:44Z

@petterreinholdtsen yes, such a loop would help. Feel free to add it. I'd rather spend my time hunting down the actual issue tho (edit: after #2760 is resolved), since there must already be some mechanism for "block-until-cmd-done", which is broken.

As can be seen in several test failures on github and discussed in LinuxCNC#2739, the linuxcncrsh test is unstable. It is believed to be caused by a race condition. Until the race condition is fixed, I believe it is best to skip the test.

petterreinholdtsen · 2024-01-02T12:14:34Z

[heeplr]

> @petterreinholdtsen yes, such a loop would help. Feel free to add it. I'd rather spend my time hunting down the actual issue tho, since there must be some mechanism for "block-until-cmd-done" already, which is broken.

While you hunt, I propose in #2824 to simply skip the test until it becomes stable, to avoid blocking other patches from getting a successful test.

…

-- Happy hacking Petter Reinholdtsen

heeplr · 2024-01-02T12:58:04Z

> to avoid blocking other patches from getting a successful test.

@petterreinholdtsen are we 100% sure that this isn't caused by some recent patch? The test ran fine for three weeks until now. Might be worth confirming before merging #2824 to silence the issue.

heeplr force-pushed the linuxcncrsh branch from 3484cc7 to b4f5d32 Compare November 7, 2023 20:31

heeplr changed the title ~~linuxcncrsh: fixes & partial cleanup~~ linuxcncrsh: fixes & cleanup Nov 7, 2023

heeplr force-pushed the linuxcncrsh branch from b4f5d32 to 5f4acce Compare November 16, 2023 15:09

heeplr force-pushed the linuxcncrsh branch 2 times, most recently from 89da543 to eafefff Compare December 11, 2023 14:45

daniel added 2 commits December 11, 2023 16:12

more extensive testing

8a905c0

add "set_wait" alias for "wait_mode" to deprecate more gracefully

8c7aa69

heeplr force-pushed the linuxcncrsh branch from eafefff to 8c7aa69 Compare December 11, 2023 15:13

andypugh merged commit c61d428 into LinuxCNC:master Dec 14, 2023
11 checks passed

heeplr mentioned this pull request Dec 15, 2023

/src/emc/usr_intf/emcrsh.cc function getToolOffset. Wrong param in snprintf. #2784

Open

petterreinholdtsen mentioned this pull request Jan 2, 2024

Skip test linuxcncrsh until it become more stable. #2824

Merged
