linuxcncrsh: fixes & cleanup #2739
Thanks for giving linuxcncrsh some long overdue attention.
I think it's a great way to automate linuxcnc and do testing. I also created a quick, uncleaned proof of concept for a rewrite using libcli that should provide a modern UX and cleaner code.
Yes. I couldn't find out what "set" refers to other than "set wait mode". I suspected it's an artefact from times when the
Indeed there is a possibility. But since linuxcncrsh was almost completely broken beyond sending MDI commands and set_wait, the chance should be pretty low. I'm quite sure that it should fail gracefully, though, unless users don't check for NAKs/errors. So mentioning it in the ChangeLog/release notes should probably be fine. I'd prefer to keep it if my assumption is right and the name is "just wrong". But I have no strong feelings about changing it back, since it wouldn't exactly hurt to rename it later (e.g. in a new major release).
As an interim measure, could the code respond to both the new and old commands identically? (Possibly with a deprecation warning?)
That's a good solution. I'll add that.
Autoupdate would be nice, but I'd argue that it's not a config value but an API change, and updating user scripts/code/3rd party applications is probably out of scope for linuxcnc.
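For illustration, the "respond to both names, warn on the old one" idea could look roughly like the sketch below. This is not the code from this PR: the `cmd_alias` table and `canonical_command()` are hypothetical names, and the sketch uses a case-sensitive `strcmp()`, which may differ from how linuxcncrsh actually matches command names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical alias table: deprecated command name -> current name. */
struct cmd_alias {
    const char *old_name;
    const char *new_name;
};

static const struct cmd_alias aliases[] = {
    { "set_wait", "wait_mode" },
};

/* Return the canonical name for `name`. If `name` is a deprecated alias,
 * *deprecated is set to 1 so the caller can emit a warning; otherwise 0.
 * Assumption: case-sensitive match, unlike what the real parser may do. */
static const char *canonical_command(const char *name, int *deprecated)
{
    size_t i;

    *deprecated = 0;
    for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
        if (strcmp(name, aliases[i].old_name) == 0) {
            *deprecated = 1;
            return aliases[i].new_name;
        }
    }
    return name;
}
```

A caller would resolve the name first, dispatch on the returned string, and print a deprecation notice to the client whenever `*deprecated` is set, so both spellings behave identically otherwise.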
…ion parameters)
* rename pch -> s uniformly
* fix double-tokenize e.g. in setDebug()
* lots of cleanup for get/setSpindle(), get/setSpindleOverride() and get/setBrake()
* fixed segfault of "get spindle" without argument
* fixed snprintf() compiler warnings
* support multiple spindles
* make spindlenum the last input argument
* output NML errors after NML failures (untested)
* fix off-by-one in max-spindle test
* replace EMCMOT_MAX_SPINDLES with actual number of spindles
* add missing "zero input items parsed" check for sscanf()
* fix cppcheck warnings
* handle NML errors in setEstop()
* add "get update" command
* replace write() and sockWrite() with dprintf() where applicable
* eliminate sockWrite()
* fix all \n\r -> \r\n (remember "ReturN")
* pass string literals if there's no format string
* prefer using string literals
* use spindle 0 by default, not all spindles
* fix helptext
* handle errors in all send*() functions
* rename *_s -> *Str according to previous style
* fix missing argument documentation
* make getTeleopEnable() output ON/OFF instead of YES/NO according to setTeleopEnable() using checkOnOff(); fix help text
* rename set_wait -> wait_mode
* eliminate initMain()
* rename initSockets -> initSocket
* remove unneeded includes
* replace strncpy() with rtapi_strlcpy(); commenting
* use sizeof() to loop over array instead of last array element; prefer for over while
* introduce OUT() helper macro
* replace snprintf(context->outBuf,...) with OUT() macro; don't use string substitution excessively
* output TOOL_OFFSET as double
* make axisnumber() parse all possible notations
* treat set/getFeedOverride percent as double
* fix commandSet() returns
* rewrite getAbs/Rel*Pos, getJointPos, getPosOffset
* fix commandGet return values
* cleanup parseCommand
* formatting
* more error handling
* more error user feedback
* more input validation
* minor cleanups
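As a rough illustration of the "zero input items parsed" item in the list above: `sscanf()` returns the number of items it assigned (or `EOF` if input ends before the first conversion), so a missing or non-numeric argument is only caught by comparing that return value against the expected count. The helper name `parse_spindle()` below is made up for this sketch and is not taken from the PR.

```c
#include <stdio.h>

/* Hypothetical helper: parse a spindle number from the argument string `s`.
 * Returns 0 and stores the value in *spindle on success.
 * Returns -1 if `s` is NULL or no integer could be parsed; *spindle is
 * left untouched in that case. */
static int parse_spindle(const char *s, int *spindle)
{
    int value;

    /* sscanf() yields 1 only if exactly one item was converted;
     * it yields 0 for "abc" and EOF for an empty string. */
    if (s == NULL || sscanf(s, "%d", &value) != 1)
        return -1;

    *spindle = value;
    return 0;
}
```

Checking for `!= 1` rather than `== 0` also covers the `EOF` case, which is what an empty argument produces.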
89da543 to eafefff
I've added the alias and encountered two race conditions when homing and probing in the tests.
@heeplr Can you look at why linuxcnc builds on the buildbot are failing? It's the linuxcncrsh test. But it doesn't always fail, and I have not been able to reproduce it.
It's failing because of a race condition. The test does set teleop_enable on but a subsequent get teleop_enable still returns off. I encountered multiple of those race conditions with probing and homing as well. I ended up disabling test commands to prevent the test from failing. Maybe just comment out failing commands for now? PS: Is there another integration test that does something like "run a complete real-world job on simulated hardware"? I couldn't find any (but maybe missed it).
[heeplr]
It's failing because of a race condition. The test does set
teleop_enable on but a subsequent get teleop_enable still returns off.
What about adding code after the "set teleop_enable on" call to loop for
a few seconds to wait for 'get teleop_enable' to return on, to avoid the
race condition? It could warn about the issue in the process, but make
the test more robust.
--
Happy hacking
Petter Reinholdtsen
Adding a few "sleep" commands came to my mind but with all the other racing commands, that would be a lot of "few seconds" and would probably increase total runtime of the test significantly. From a testing POV, this would certainly be better than disabling the tests completely. Fixing the race conditions or removing the set_wait (now wait_mode) command - i.e. not giving guarantees to the user - would be best. Note that the test isn't complete. I'm not familiar enough to create a representative "real world job" test and couldn't find a template to copy. But in theory, it should test a lot more different modes/operations/combinations.
[heeplr]
Adding a few "sleep" commands came to my mind but with all the other
racing commands, that would be a lot of "few seconds" and would
probably increase total runtime of the test significantly. From a
testing POV, this would certainly be better than disabling the tests
completely.
Just adding sleep is quite unreliable, as the various machines running
the tests have widely different performance. This is why I suggest a
loop with an upper bound on the number of seconds sleeping.
--
Happy hacking
Petter Reinholdtsen
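A loop with an upper bound, as suggested above, could be sketched as follows. This is only an illustration in C; the actual test would more likely do the equivalent in its own script by repeatedly issuing `get teleop_enable`. The names `poll_until()`, `poll_fn` and the stand-in predicate `countdown_done()` are invented for this example, and `clock_gettime()`/`nanosleep()` assume a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited state is seen,
 * e.g. once "get teleop_enable" would report ON. */
typedef bool (*poll_fn)(void *ctx);

/* Call check(ctx) every interval_ms until it returns true or until
 * timeout_ms have elapsed. Returns true on success, false on timeout.
 * A fast machine returns almost immediately; a slow one gets the full
 * timeout instead of a fixed sleep that may be too short. */
static bool poll_until(poll_fn check, void *ctx, long timeout_ms, long interval_ms)
{
    struct timespec start, now, pause;
    long elapsed_ms;

    pause.tv_sec = interval_ms / 1000;
    pause.tv_nsec = (interval_ms % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (check(ctx))
            return true;
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L
                   + (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_ms)
            return false;
        nanosleep(&pause, NULL);
    }
}

/* Stand-in predicate for the sketch: becomes true after *ctx calls. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}
```

The timeout only bounds the worst case, so the total test runtime grows noticeably only when a command really is slow or stuck.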
@petterreinholdtsen yes, such a loop would help. Feel free to add it. I'd rather spend my time hunting down the actual issue tho (edit: after #2760 is resolved), since there must be some mechanism for "block-until-cmd-done" already, which is broken.
As can be seen in several test failures on github and discussed in LinuxCNC#2739, the linuxcncrsh test is unstable. It is believed to be caused by a race condition. Until the race condition is fixed, I believe it is best to skip the test.
[heeplr]
@petterreinholdtsen yes, such a loop would help. Feel free to add
it. I'd rather spend my time hunting down the actual issue tho, since
there must be some mechanism for "block-until-cmd-done" already, which
is broken.
While you hunt, I propose to simply skip the test until it becomes stable
in <URL: #2824 >, to avoid
blocking other patches from getting a successful test.
--
Happy hacking
Petter Reinholdtsen
@petterreinholdtsen are we 100% sure that this isn't caused by some recent patch? The test ran fine for three weeks until now. Might be worth confirming before merging #2824 to silence the issue.
This fixes the current linuxcncrsh segfault bug and a lot of other issues.
I also cleaned/modernized parts of the code, added "get update" command and re-enabled the test (disabled in eca9685), which should pass now.
The individual commits provide verbose details.