
Gmoccapy MSG: Must be in MDI mode..... #2453

Closed
zz912 opened this issue Apr 27, 2023 · 80 comments

Comments

@zz912
Contributor

zz912 commented Apr 27, 2023

I use a RIP build of the LinuxCNC 2.9 branch.

To reproduce the problem:

  1. Edit /home/user/linuxcnc/linuxcnc-2.9/configs/sim/gmoccapy/gmoccapy.ini
    and add:
[HALUI]
MDI_COMMAND = M61 Q5
MDI_COMMAND = M61 Q2
  2. Run LinuxCNC.
    Gmoccapy shows the message "Must be in MDI mode".

  3. The error is random, so it may not appear the first time.
    Keep switching to MDI mode and setting/clearing the HAL pins halui.mdi-command-00 and halui.mdi-command-01.

MDI commands run fine, but sometimes this message pops up.

I have more sophisticated commands on the real machine. While tuning them I was in MDI mode the whole time and never had this problem. However, in normal use I want to be in JOG mode, and that is where the message appears.

@gmoccapy
Collaborator

There are three modes, MANUAL, MDI and AUTO, and gmoccapy behaves exactly as it should!

No MDI commands in MANUAL mode.
To execute commands from MANUAL mode, you could combine MDI commands: the first command switches to MDI, then you execute the MDI command you want, and then you switch back to MANUAL mode. You may use the corresponding halui commands to change modes.

Doing this may cause a short flicker of the screen, because gmoccapy changes the screen layout on each mode switch.

Norbert
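The switch-run-switch-back approach Norbert describes can be sketched with the LinuxCNC Python interface. This is a minimal sketch, not gmoccapy or halui code: `run_mdi_from_manual` is a hypothetical helper, `command` and `stat` are assumed to behave like `linuxcnc.command()` and `linuxcnc.stat()`, and the mode constants are passed in so the logic can be exercised without a running LinuxCNC. The value 1 for the idle interpreter state is taken from the debug output later in this thread.

```python
import time

# Interpreter state printed as linuxcnc.INTERP_IDLE later in this thread.
INTERP_IDLE = 1


def run_mdi_from_manual(command, stat, mdi_text, mode_mdi, mode_manual,
                        timeout=10.0, poll_interval=0.05):
    """Hypothetical helper: switch to MDI, run one MDI line, wait until the
    interpreter reports idle, then switch back to MANUAL.

    `command` and `stat` are assumed to behave like linuxcnc.command() and
    linuxcnc.stat(); on a real machine mode_mdi/mode_manual would be
    linuxcnc.MODE_MDI and linuxcnc.MODE_MANUAL.
    """
    command.mode(mode_mdi)
    command.wait_complete()

    stat.poll()  # refresh the status snapshot before checking the mode
    if stat.task_mode != mode_mdi:
        raise RuntimeError("task did not switch to MDI mode")

    command.mdi(mdi_text)
    command.wait_complete()

    # Do not leave MDI mode until the interpreter reports idle.
    deadline = time.monotonic() + timeout
    while True:
        stat.poll()
        if stat.interp_state == INTERP_IDLE:
            break
        if time.monotonic() > deadline:
            raise TimeoutError("MDI command %r did not finish" % mdi_text)
        time.sleep(poll_interval)

    command.mode(mode_manual)
    command.wait_complete()
```

On a machine this would be called roughly as `run_mdi_from_manual(linuxcnc.command(), linuxcnc.stat(), "M61 Q5", linuxcnc.MODE_MDI, linuxcnc.MODE_MANUAL)`. Note that it still trusts `interp_state` to report idle only once the command has really finished, which is exactly what is questioned further down this thread.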

@zz912
Contributor Author

zz912 commented Apr 28, 2023

Hello Norbert,

  1. Thanks for explaining the problem. Now I know how to solve it.

  2. I think Gmoccapy's behavior is not ideal. What bothers me most is that it acts randomly. Paradoxically, it more often lets the MDI command run without this message popping up, and the MDI commands are executed every time, even in MANUAL. It is not absolutely necessary, but it would be nice if the message appeared consistently and execution of MDI commands was really prohibited.

  3. Would it be possible to add a HAL pin to Gmoccapy that pauses redrawing of the Gmoccapy screen? Then I could make sure the screen does not flicker.
    Or would it be possible to add a HAL pin to Gmoccapy that allows MDI execution in every mode?
    Or is there any other way to prevent the short flicker of the screen?

Zdeněk

@zz912
Contributor Author

zz912 commented Apr 28, 2023

I see a conflict between Gmoccapy and halui.

Gmoccapy wants "No MDI commands in MANUAL mode."

But halui switches to MDI mode for an MDI_COMMAND event.
Look here:

(I hope I understood the halui code correctly.)

First we should define the correct LinuxCNC behavior when the halui.mdi-command-XX pin changes.

@hansu
Member

hansu commented Apr 29, 2023

[Norbert]

To execute commands from MANUAL mode, you could combine MDI commands: the first command switches to MDI, then you execute the MDI command you want, and then you switch back to MANUAL mode. You may use the corresponding halui commands to change modes.

It seems that halui is doing that already.

[Zdeněk]
I tested that several times and I don't get that message.
When I run the MDI command in manual mode, LinuxCNC switches to MDI mode, runs the command and then switches back to manual mode. There is indeed some flickering, especially if the on-screen keyboard is enabled.

@zz912
Contributor Author

zz912 commented Apr 29, 2023

[HansU]
Could you try a modified test?

[HALUI]
MDI_COMMAND = M61 Q5 G4 P0.5
MDI_COMMAND = M61 Q2 G4 P0.5

I have now tried the test without the G4 on a third PC and, like you, did not see the problem. With the G4 it does show, but only about once in twenty attempts.

@zz912
Contributor Author

zz912 commented Apr 30, 2023

I tested it with AXIS and there were no problems.
[Attachment: Axis-MDI_command]

@hansu
Member

hansu commented Apr 30, 2023

I still don't get that message. Does it appear when you set the execute command pin, or on startup?

@zz912
Contributor Author

zz912 commented Apr 30, 2023

Tonight I will make a screen recording showing how to reproduce the error.

@zz912
Contributor Author

zz912 commented Apr 30, 2023

Here is video:
https://user-images.githubusercontent.com/96618597/235367645-18399092-23a7-4bb3-8516-3df2d3b17171.mp4

The bug behaves very randomly. In this video the message occurs more often than usual; maybe the screen recorder makes the bug more likely. Even so, you can see the randomness in the video. Usually I have to click Set and Clr many more times to make it appear even once. I also found that adding more G and M codes to MDI_COMMAND makes the bug more likely to occur.

@hansu
Member

hansu commented Apr 30, 2023

The video file seems to be damaged. Please upload again.

@zz912
Contributor Author

zz912 commented Apr 30, 2023

The video is OK. It plays in some players but not in others. Could you try VLC?

I tested it on 2 PCs and it works. On my phone it does not.

@hansu
Member

hansu commented Apr 30, 2023

OK, it works if I download it; it just didn't play in the browser. By the way, GitHub can now embed videos directly.

@zz912
Contributor Author

zz912 commented May 3, 2023

Hi Hans,

were you able to reproduce this bug?

@hansu
Member

hansu commented May 4, 2023

No, I still couldn't reproduce it. But maybe that's because I am running it in a VM. I'll try again on a real machine...

@zz912
Contributor Author

zz912 commented May 4, 2023

That's weird. I can reproduce it on 2 PCs and 1 VM. Try reducing the CPU allocation of the VM; I have a feeling that the error is more frequent when the CPU is heavily loaded. Would it help if I gave you access to my PC?

@rene-dev
Collaborator

rene-dev commented May 4, 2023

The three modes, MANUAL, MDI and AUTO, are not a feature of gmoccapy but of LinuxCNC task. I believe you do not see this behavior in AXIS because it does not switch its UI when task changes modes.

@gmoccapy
Collaborator

gmoccapy commented May 4, 2023

Try a longer cycle time (INI setting).
The default is 100; try setting the value to 150 or 200.

I could reproduce the error, but only on one PC (poor CPU power); setting the cycle time to 150 solved the problem on that PC.

Norbert

@zz912
Contributor Author

zz912 commented May 5, 2023

[rene-dev]

The three modes, MANUAL, MDI and AUTO, are not a feature of gmoccapy but of LinuxCNC task. I believe you do not see this behavior in AXIS because it does not switch its UI when task changes modes.

You're right. #2453 (comment)

[Norbert]

Try a longer cycle time (INI setting). The default is 100; try setting the value to 150 or 200.

I could reproduce the error, but only on one PC (poor CPU power); setting the cycle time to 150 solved the problem on that PC.

Norbert

I tried 300ms and it did not help. :-(

I would like to show you one more thing about this bug. Sometimes Gmoccapy gets stuck in MDI mode. Watch this video: at 00:00 Gmoccapy is in MANUAL mode, and after executing MDI_COMMAND it stays in MDI mode.

[Video: MDI_screen_stuck.mp4]

In this video CYCLE_TIME = 300

@gmoccapy
Collaborator

gmoccapy commented May 5, 2023

Gmoccapy changes the screen layout according to the MODE selection button or in response to signals from LinuxCNC, and so far that works as it should.

The problem seems to be related to the MDI commands. If you try an MDI command with some movement, e.g. G91 G0 Z0.001, you will not be able to reproduce the error, because LinuxCNC emits the signal for the current mode; but if you use a command without a G-code break, the signal is not emitted.

I do not know a solution at the moment.
I will finish building my house in about 3 to 6 months; after that I will have more time to look at this kind of behavior.

Norbert

@zz912
Contributor Author

zz912 commented May 5, 2023

I tried:

[HALUI]
MDI_COMMAND = M61 Q5  G91 G0 Z-5.001
MDI_COMMAND = M61 Q2  G91 G0 Z-10.001

It did not help. I don't need this bug fixed immediately, but could I ask you to label it 2.9-must-fix, or add it to some list of things that must be fixed?

@zz912
Contributor Author

zz912 commented May 7, 2023

I tried the same with tag 2.8.4:

  1. Edit /home/zdenek/linuxcnc/linuxcnc-2.8.4/configs/sim/gmoccapy/gmoccapy.ini
    and add:
[HALUI]
MDI_COMMAND = M61 Q5  G91 G0 Z-5.001
MDI_COMMAND = M61 Q2  G91 G0 Z-10.001
  2. Edit /home/zdenek/linuxcnc/linuxcnc-2.8.4/configs/sim/gmoccapy/gmoccapy_postgui.hal
    and add:
net pokus-00  halui.mist.is-on halui.mdi-command-00
net pokus-01  halui.flood.is-on halui.mdi-command-01

In LinuxCNC 2.8, halshow does not have the SET and CLR buttons, so I used the MIST and FLOOD buttons instead.

  3. Run LinuxCNC.

The video shows that the error also appears in version 2.8.4.
https://user-images.githubusercontent.com/96618597/236683209-0ad5658a-c7f5-49a8-ae24-304b169c125c.mp4

@zz912
Contributor Author

zz912 commented May 7, 2023

I ran the test from the previous post on other versions:
linuxcnc-2.7.15 - works correctly, no "Must be in MDI mode to issue MDI command" message
linuxcnc-2.8.0 - bug
linuxcnc-2.8.2 - bug
linuxcnc-2.8.3 - bug
linuxcnc-2.8.4 - bug
linuxcnc-2.9 - bug

Finding this error is challenging because it appears randomly.

Here is the terminal output when I press the button and MDI_COMMAND is executed.

LCNC 2.7.15:

Emit interp-run
3 2
Emit interp-run

LCNC 2.8.0 (normal run):

MANUAL Mode
IDLE
hal status motion mode changed

LCNC 2.8.0 (run with the bug):

3 2
('MDI Mode', False)
RUN
hal status motion mode changed
Must be in MDI mode to issue MDI command
MANUAL Mode
IDLE
hal status motion mode changed

[HansU]
Have you tried lowering the CPU frequency in the VM to reproduce this bug? Norbert wrote that he managed to reproduce it on one PC with poor CPU power, so maybe that would help you see it. If you still can't reproduce it, I could send you my VM, but I won't have access to it until Tuesday.

@zz912
Contributor Author

zz912 commented May 9, 2023

I asked Fupe to look into this problem.

At first I was disappointed, because he reported that he could not reproduce the problem on either a physical PC or a virtual machine.

Finally we found out that he was not using Oracle VM VirtualBox but a different VM. So thanks to his help, we know that Oracle VM VirtualBox is needed to reproduce this bug.

Next, I verified that it is really necessary to reduce the CPU performance. At 100% CPU the error did not appear; at 30% it appears consistently.

[Image: vm]

@hansu
Member

hansu commented May 9, 2023

No, I still don't get that message even when limiting the CPU to 30 % :/

@zz912
Contributor Author

zz912 commented May 9, 2023

Thank you for the info.

@zz912
Contributor Author

zz912 commented May 10, 2023

Hello,

I am very unhappy that you cannot reproduce my bug. I made another attempt today.

I created a new virtual machine in Oracle VM VirtualBox and installed it from the linuxcnc-2.8.2-buster.iso file.

To install this ISO file, you must assign at least 2 processor cores; otherwise the installation fails.

After installing I updated:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

I started LinuxCNC and opened the Gmoccapy sim.

I closed LinuxCNC.

Edit /home/zdenek/linuxcnc/configs/sim.gmoccapy/gmoccapy.ini
and add:

[HALUI]
MDI_COMMAND = M61 Q5 G91 G0 Z-5.001
MDI_COMMAND = M61 Q2 G91 G0 Z-10.001

Edit /home/zdenek/linuxcnc/configs/sim.gmoccapy/gmoccapy_postgui.hal
and add:

net attempt-00 halui.mist.is-on halui.mdi-command-00
net attempt-01 halui.flood.is-on halui.mdi-command-01

Now when I tried to reproduce the bug, it did not appear.

Then I reduced the CPU performance and the bug appeared.

[hansu]
Could you run another test with lower CPU performance?

Here you can download the VM file for Oracle VM VirtualBox:
https://ulozto.cz/tamhle/cVRcwtDslND0#!ZGSvLmR2Awx5LmNjMJWwBGWyAQx3LGORGzkHFaqPEUEPEGZkLD==
login: zdenek
password: dedadeda

I did the installation twice. Once in Czech and once in English. There was no difference.

@hansu
Member

hansu commented May 10, 2023

Yes, I can experiment a bit more. But this might also depend on the power of the host machine. What CPU does your VM host have?

@zz912
Contributor Author

zz912 commented May 10, 2023

Does this answer your question?
[Image]

If you think of anything else I could try, let me know.

If you still can't reproduce it, I plan to try it on an old physical PC with bad latency and mail it to you.

@hansu
Member

hansu commented May 10, 2023

No, nothing :(
I went down to 2 cores and a 20% CPU limit, where the OS in the VM is barely responding, but got no error message.
I have an old PC I can install LinuxCNC on, and I can also try on my laptop, which only has a Core i5.
But I'm not sure when I'll have time for this.

@zz912
Contributor Author

zz912 commented May 11, 2023

Could you try changing the INI like this?

[HALUI]
MDI_COMMAND = M61 Q5  G91 G0 Z-5.001  G4 P5
MDI_COMMAND = M61 Q2  G91 G0 Z-10.001  G4 P5

Stay in the VM with low CPU.

I found that the more commands there are in MDI_COMMAND, the more often that message appears.

Interestingly, it sometimes ignores the G4 command and gives no message at all.

@zz912
Contributor Author

zz912 commented Jun 2, 2023

Thank you Norbert for your response.

Why do you think the interpreter state is the problem?

I know the interpreter state is related to this error.
I don't know whether the interpreter state is the result or the cause of this error.
Currently I assume that the interpreter state is the cause, and I am trying to confirm or disprove this assumption.

I made another attempt. I added these lines to gmoccapy.py:

    def _periodic(self):
        # we put the poll command in a try, so if the linuxcnc pid is killed
        # from an external command, we also quit the GUI
        try:
            self.stat.poll()
        except:
            raise SystemExit("gmoccapy can not poll linuxcnc status any more")

        if hal.get_value("halui.program.is-idle") == False or self.stat.interp_state !=1:
            print("halui.program.is-idle: %s" % hal.get_value("halui.program.is-idle"))
            print("self.stat.interp_state: %i" % self.stat.interp_state)

Result 1: (Look at the last line)

halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 2
............
............
halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 1

Result 2:

halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 2
............
............
halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 2
halui.program.is-idle: False
self.stat.interp_state: 2

These results were generated during the MDI_COMMAND run.
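Long runs of identical lines make it hard to see the exact cycle at which the two signals diverge. A minimal sketch of a change-only logger, assuming it is called once per `_periodic` cycle after `self.stat.poll()`; `ChangeLogger` and the `self.change_log` attribute in the usage note are hypothetical, not part of gmoccapy.

```python
class ChangeLogger:
    """Hypothetical debugging aid: record a sample only when it differs
    from the previous one, tagged with the cycle number it appeared on."""

    def __init__(self):
        self._last = None
        self._cycle = 0
        self.events = []  # list of (cycle, values) tuples

    def sample(self, **values):
        self._cycle += 1
        if values != self._last:
            self._last = dict(values)
            self.events.append((self._cycle, dict(values)))
            print("cycle %d: %s" % (self._cycle, values))
```

Inside `_periodic` this would replace the two `print` calls, e.g. `self.change_log.sample(halui_idle=hal.get_value("halui.program.is-idle"), interp_state=self.stat.interp_state)`, with `self.change_log = ChangeLogger()` created once at startup. A run like Result 1 would then shrink to a couple of lines showing the cycle where `interp_state` changed to 1 while the halui pin still read False.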

The results of this test can be explained by two theories.
Theory 1:
self.stat.interp_state works fine and halui.program.is-idle is just slower and I'm on the wrong track.
Theory 2:
halui.program.is-idle works correctly and self.stat.interp_state switches to IDLE earlier than it should.

My guess is that theory 2 is the correct one. I don't have proof for it, but a lot of other experiments suggest that it might be.

So if you want to try to go deeper, I would suggest looking at the halui HAL pin code.

I have already studied this:

The source code that detects the end of an MDI_COMMAND is here:

// determine when a MDI command actually finishes normally.
if (interp_list.len() == 0 &&
    emcTaskCommand == 0 &&
    emcStatus->task.execState == EMC_TASK_EXEC_DONE &&
    emcStatus->task.interpState != EMC_TASK_INTERP_IDLE &&
    emcStatus->motion.traj.queue == 0 &&
    emcStatus->io.status == RCS_DONE &&
    !mdi_execute_wait &&
    !mdi_execute_next) {
    // finished. Check for dequeuing of queued MDI command is done in emcTaskPlan().
    if (emc_debug & EMC_DEBUG_TASK_ISSUE)
        rcs_print("mdi_execute_hook: MDI command '%s' done (remaining: %d)\n",
                  emcStatus->task.command, mdi_input_queue.len());
    emcStatus->task.command[0] = 0;
    emcStatus->task.interpState = EMC_TASK_INTERP_IDLE;
}

The source code for the halui idle pin is here:

*(halui_data->program_is_idle) = emcStatus->task.interpState == EMC_TASK_INTERP_IDLE;
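Written out as plain predicates, the two excerpts can be compared side by side. This is a hypothetical Python transliteration for reasoning only: the field names mirror the C code above, `EXEC_DONE` and `RCS_DONE` are placeholder strings rather than the real enum values, and `INTERP_IDLE = 1` is the value printed in this thread. It shows that halui's `program.is-idle` pin is a direct copy of `task.interpState`; assuming halui and `linuxcnc.stat()` read the same status buffer, the two can only disagree because of when each one last read it.

```python
from dataclasses import dataclass

INTERP_IDLE = 1          # linuxcnc.INTERP_IDLE as printed in this thread
EXEC_DONE = "EXEC_DONE"  # placeholder for EMC_TASK_EXEC_DONE
RCS_DONE = "RCS_DONE"    # placeholder for RCS_DONE


@dataclass
class TaskStatus:
    """Hypothetical snapshot of the fields read by the two C excerpts."""
    interp_list_len: int
    task_command: int
    exec_state: str
    interp_state: int
    traj_queue: int
    io_status: str
    mdi_execute_wait: bool
    mdi_execute_next: bool


def mdi_command_finished(s: TaskStatus) -> bool:
    """Mirror of the task-side check that ends an MDI command."""
    return (s.interp_list_len == 0
            and s.task_command == 0
            and s.exec_state == EXEC_DONE
            and s.interp_state != INTERP_IDLE
            and s.traj_queue == 0
            and s.io_status == RCS_DONE
            and not s.mdi_execute_wait
            and not s.mdi_execute_next)


def halui_program_is_idle(s: TaskStatus) -> bool:
    """Mirror of halui's program.is-idle pin: a copy of interpState."""
    return s.interp_state == INTERP_IDLE
```

Once `mdi_command_finished` holds, the task sets `interp_state` to idle; from then on the halui pin reads True at halui's next update, and the finish check no longer fires.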

But now I would like to find the source code for the Python IDLE state. Unfortunately, I don't know where to look.

@zz912
Contributor Author

zz912 commented Jun 11, 2023

I bought a new, more powerful computer
(DELL 9010 SFF: Intel i5 / 16 GB RAM / 240 GB SSD)
to eliminate this problem.
I installed Bookworm on it.
Unfortunately, the problem still persists.

It is strange that you cannot reproduce the problem. On the other hand, I believe you, because this bug hid from me for two Sundays. It is insidious.

I have been working on this issue since April 27. Is there anything that could motivate you developers to fix the bug?

@Sigma1912
Contributor

Sigma1912 commented Jun 14, 2023

I can reproduce this on a 2.10pre build from March. This is a simulation machine without an RT kernel.
What I have noticed is that sometimes, while in the jogging screen and toggling the mdi-command-xx pins in halshow, it switches to the MDI screen but does not go back to the jogging screen as it usually does.

@zz912
Contributor Author

zz912 commented Jun 14, 2023

Hello everybody,

I again spent this evening looking for the source of this bug.

I am convinced that the source of this bug is incorrect behavior of EMC_TASK_INTERP. I spent tonight trying to prove that EMC_TASK_INTERP works incorrectly.

I would like to ask you to confirm or refute my theory.

Since I don't want to waste your precious time, I have prepared a test that does not depend on previous posts. If you are willing to help me, you only need to read this post.

In this part of the Gmoccapy code:

    def on_hal_status_interp_idle(self, widget):
        LOG.debug("IDLE")
        if self.load_tool:
            return
        widgetlist = ["ntb_jog", "btn_from_line",
                      "tbtn_flood", "tbtn_mist", "rbt_forward", "rbt_reverse", "rbt_stop",
                      "btn_load", "btn_edit", "tbtn_optional_blocks", "btn_reload"
                      ]
        if not self.widgets.rbt_hal_unlock.get_active() and not self.user_mode:
            widgetlist.append("tbtn_setup")
        if not self.widgets.tbtn_setup.get_active():
            widgetlist.append("rbt_manual")
        if self.all_homed or self.no_force_homing:
            if not self.widgets.tbtn_setup.get_active():
                widgetlist.append("rbt_mdi")
                widgetlist.append("rbt_auto")
            widgetlist.append("btn_index_tool")
            widgetlist.append("btn_change_tool")
            widgetlist.append("btn_select_tool_by_no")
            widgetlist.append("btn_tool_touchoff_x")
            widgetlist.append("btn_tool_touchoff_z")
            widgetlist.append("btn_touch")
        # This happen because hal_glib does emit the signals in the order that idle is emitted later that estop
        if self.stat.task_state == linuxcnc.STATE_ESTOP or self.stat.task_state == linuxcnc.STATE_OFF:
            self._sensitize_widgets(widgetlist, False)
        else:
            self._sensitize_widgets(widgetlist, True)
        for btn in self.macrobuttons:
            btn.set_sensitive(True)
        if self.onboard:
            self._change_kbd_image("img_macro_menu_keyboard")
        else:
            self._change_kbd_image("img_macro_menu_stop")
            self.macro_dic["keyboard"].set_sensitive(False)
        self.widgets.btn_run.set_sensitive(True)
        self.widgets.btn_stop.set_sensitive(False)
        if self.tool_change:
            self.command.mode(linuxcnc.MODE_MANUAL)
            self.command.wait_complete()
            self.tool_change = False
        self.halcomp["program.current-line"] = 0
        self.halcomp["program.progress"] = 0.0

I would assume that these commands are executed only when LinuxCNC is in IDLE.
QUESTION 1:
Is my assumption correct?

To verify that LCNC is in IDLE, I added the following lines to Gmoccapy's source code:

    def on_hal_status_interp_idle(self, widget):
        print("HAF HAF - linuxcnc.INTERP_IDLE = " + str(linuxcnc.INTERP_IDLE))
        print("HAF HAF - linuxcnc.INTERP_READING = " + str(linuxcnc.INTERP_READING))
        print("HAF HAF - linuxcnc.INTERP_PAUSED = " + str(linuxcnc.INTERP_PAUSED))
        print("HAF HAF - linuxcnc.INTERP_WAITING = " + str(linuxcnc.INTERP_WAITING))
        print("HAF HAF - now is " + str(self.stat.interp_state))

I would expect self.stat.interp_state to be equal to linuxcnc.INTERP_IDLE.
QUESTION 2:
Is my expectation correct?

Now let's watch the video:
[Video: Peek 2023-06-14 21-37]

In the video, LinuxCNC is mostly READING (2), and at the end of the video it is IDLE (1).
QUESTION 3:
Can LCNC READING (2) be considered a bug?
QUESTION 4:
Can the random alternation between IDLE (1) and READING (2) be considered a bug?

This video shows mostly state 2 and once 1. During longer testing, states 1 and 2 appear more randomly.

Please do not take this post of mine as offensive or sarcastic. I really just need help. I am very unhappy with this bug.

I phrased the questions so that they can be answered with a simple yes/no and will not take up your time.
QUESTION 1: yes/no
QUESTION 2: yes/no
QUESTION 3: yes/no
QUESTION 4: yes/no

@phillc54
Collaborator

Do you need to poll the status channel to ensure that it is up to date before reading it?
self.stat.poll()
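The attributes of a `linuxcnc.stat` object are a snapshot taken at the last `poll()`, so a signal handler that reads them without polling may see values from an earlier GUI cycle. A minimal sketch of a polling read for the debug prints above; `current_interp_state` is a hypothetical helper, and the state numbers are the ones printed in this thread.

```python
# Interpreter states as printed by the debug output in this thread.
INTERP_NAMES = {1: "IDLE", 2: "READING", 3: "PAUSED", 4: "WAITING"}


def current_interp_state(stat):
    """Refresh the status snapshot, then return (value, name).

    `stat` is assumed to behave like linuxcnc.stat(): its attributes only
    change when poll() is called.
    """
    stat.poll()
    state = stat.interp_state
    return state, INTERP_NAMES.get(state, "UNKNOWN")
```

In the handler this would be used as `print("HAF HAF - now is %d (%s)" % current_interp_state(self.stat))`.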

@zz912
Copy link
Contributor Author

zz912 commented Jun 15, 2023

Do you need to poll the status channel to ensure that it is up to date before reading it?
self.stat.poll()

Oh yeah, you're right. I'm an idiot :-(.

3 2
HAF HAF - self.stat.poll() was executed
HAF HAF - linuxcnc.INTERP_IDLE = 1
HAF HAF - linuxcnc.INTERP_READING = 2
HAF HAF - linuxcnc.INTERP_PAUSED = 3
HAF HAF - linuxcnc.INTERP_WAITING = 4
HAF HAF - now is 1
Must be in MDI mode to issue MDI command
3 2
Must be in MDI mode to issue MDI command
HAF HAF - self.stat.poll() was executed
HAF HAF - linuxcnc.INTERP_IDLE = 1
HAF HAF - linuxcnc.INTERP_READING = 2
HAF HAF - linuxcnc.INTERP_PAUSED = 3
HAF HAF - linuxcnc.INTERP_WAITING = 4
HAF HAF - now is 1
3 2
3 2
HAF HAF - self.stat.poll() was executed
HAF HAF - linuxcnc.INTERP_IDLE = 1
HAF HAF - linuxcnc.INTERP_READING = 2
HAF HAF - linuxcnc.INTERP_PAUSED = 3
HAF HAF - linuxcnc.INTERP_WAITING = 4
HAF HAF - now is 1
3 2
HAF HAF - self.stat.poll() was executed
HAF HAF - linuxcnc.INTERP_IDLE = 1
HAF HAF - linuxcnc.INTERP_READING = 2
HAF HAF - linuxcnc.INTERP_PAUSED = 3
HAF HAF - linuxcnc.INTERP_WAITING = 4
HAF HAF - now is 1


@zz912
Contributor Author

zz912 commented Jun 15, 2023

[Sigma1912]

I can reproduce this on a 2.10pre build from March. This is a simulation machine without an RT kernel. What I have noticed is that sometimes, while in the jogging screen and toggling the mdi-command-xx pins in halshow, it switches to the MDI screen but does not go back to the jogging screen as it usually does.

This bug may be the cause of other problems. If it is not resolved, we can't move on. For example, when this error occurs, nonsensical tool offsets are displayed.
[Image: Wrong_correction]

That's why I'm sorry there is no interest in fixing this bug, because it makes LinuxCNC very dangerous to use with an ATC.

@Sigma1912
Contributor

Sigma1912 commented Jun 15, 2023

The AXIS GUI does not seem to have this bug. Wouldn't that indicate that the problem is in gmoccapy rather than in Python stat or halui?

@zz912
Contributor Author

zz912 commented Jun 15, 2023

AXIS gui does not seem to have this bug,

This bug is really insidious; the fact that it did not show up in AXIS does not mean that it is not there.
I'm paranoid. I've been looking for that bug for a long time.

Assuming this bug is not in AXIS, we can rule out halui.

I think AXIS doesn't use Python stat. Is that so?
That's why I didn't rule out python stat.

@phillc54
Collaborator

AXIS does use linuxcnc.stat
L3540

@zz912
Contributor Author

zz912 commented Jun 15, 2023

AXIS does use linuxcnc.stat L3540

Thank you. I did not know that.

@Sigma1912
Contributor

I have not been able to reproduce this bug in AXIS despite toggling mdi-command-xx about a hundred times in jog mode: zero issues, while in GMOCCAPY it pops up almost right away on my machine.
Maybe comparing the relevant code of axis and gmoccapy could give an indication of why one is working while the other is not?

@zz912
Contributor Author

zz912 commented Jun 15, 2023

[Sigma1912]
Could I ask you to run a test?
Could you remove the lines from point 7 in
#2453 (comment)?

@Sigma1912
Contributor

I'm running a RIP install and assumed that I could just edit the Python files and restart the config to test, yet it seems it is not using the updated code. Surely I don't need to recompile for Python code?

@zz912
Contributor Author

zz912 commented Jun 15, 2023

When I change a Python file, I have to go to the linuxcnc/src folder in a terminal and run make.

@Sigma1912
Contributor

I see. I'll have to switch to another machine, since this one has issues that give me errors when running make.

@Sigma1912
Contributor

I just made a new RIP install on a different computer, but unfortunately I cannot reproduce the issue on that machine at all.

@zz912
Contributor Author

zz912 commented Jun 16, 2023

Try again after some time, after restarting the PC. Many times I've celebrated that some modification of mine fixed the bug, but it always came back.

@zz912
Contributor Author

zz912 commented Jun 16, 2023

Hello everybody,

I feel like I've solved it again. However, I don't want to claim here that I have a solution, so I will call it another theory about the cause of this bug.

This is theory number 156:

Here is where the end of an MDI_COMMAND is determined:

// determine when a MDI command actually finishes normally.
if (interp_list.len() == 0 &&
    emcTaskCommand == 0 &&
    emcStatus->task.execState == EMC_TASK_EXEC_DONE &&
    emcStatus->task.interpState != EMC_TASK_INTERP_IDLE &&
    emcStatus->motion.traj.queue == 0 &&
    emcStatus->io.status == RCS_DONE &&
    !mdi_execute_wait &&
    !mdi_execute_next) {
    // finished. Check for dequeuing of queued MDI command is done in emcTaskPlan().
    if (emc_debug & EMC_DEBUG_TASK_ISSUE)
        rcs_print("mdi_execute_hook: MDI command '%s' done (remaining: %d)\n",
                  emcStatus->task.command, mdi_input_queue.len());
    emcStatus->task.command[0] = 0;
    emcStatus->task.interpState = EMC_TASK_INTERP_IDLE;
}

I believe one condition is missing from the check that determines the end of an MDI_COMMAND:
the condition halui_sent_mdi == 0.

The halui_sent_mdi variable is defined here:

static int halui_sent_mdi = 0;

The following situation may arise:
1) emctaskmain.cc sets emcStatus->task.interpState = EMC_TASK_INTERP_IDLE,
but halui_sent_mdi in halui.cc is still 1.
2) When emcStatus->task.interpState == EMC_TASK_INTERP_IDLE, Gmoccapy wants to set G43 and therefore switches to MDI mode.
3) halui_sent_mdi in halui.cc is still 1, so halui starts restoring old_mode:

if (halui_sent_mdi) { // we have an ongoing MDI command
    if (emcStatus->status == 1) { //which seems to have finished
        halui_sent_mdi = 0;
        switch (halui_old_mode) {
        case EMC_TASK_MODE_MANUAL: sendManual();break;
        case EMC_TASK_MODE_MDI: break;
        case EMC_TASK_MODE_AUTO: sendAuto();break;
        default: sendManual();break;
        }
    }
}

4) If old_mode was MANUAL, the mode is now MANUAL, NOT MDI!
5) Gmoccapy executes the G43 command in MANUAL mode.
6) Task rejects it:

if (emcStatus->task.mode != EMC_TASK_MODE_MDI) {
    emcOperatorError(0, _("Must be in MDI mode to issue MDI command"));
    retval = -1;
    break;
}
I would like to ask for help in verifying this theory.
I would like to add the condition halui_sent_mdi == 0 to the check for the end of an MDI_COMMAND.

The problem is that halui_sent_mdi is in halui.cc, while the end-of-MDI_COMMAND check is in emctaskmain.cc.
I don't know how to share a variable between two files. I guess it has to be done through a header file somehow?
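The interleaving in steps 1 to 6 can be replayed in a toy, single-threaded model. Everything below is hypothetical: it is not LinuxCNC code, it collapses halui, task and gmoccapy into plain function calls, and it only shows that the theory is internally consistent, not that this is what actually happens. (halui and the task controller run as separate processes, so in the real system `halui_sent_mdi` is not directly visible to emctaskmain.cc.)

```python
MANUAL, MDI = "MANUAL", "MDI"
MDI_ERROR = "Must be in MDI mode to issue MDI command"


class ToyTask:
    """Stands in for the task controller: holds the mode, checks MDI."""

    def __init__(self, mode):
        self.mode = mode
        self.errors = []

    def set_mode(self, mode):
        self.mode = mode

    def mdi(self, text):
        # Same check as step 6: reject MDI unless the task is in MDI mode.
        if self.mode != MDI:
            self.errors.append(MDI_ERROR)
            return False
        return True


def halui_restore_mode(task, halui):
    """Steps 3-4: halui sees its MDI command finished and restores old_mode."""
    if halui["sent_mdi"]:
        halui["sent_mdi"] = False
        if halui["old_mode"] != MDI:
            task.set_mode(halui["old_mode"])


def replay(order):
    """Run the named steps in the given order and return the toy task."""
    # Step 1: the MDI_COMMAND has finished (interp idle), but halui has not
    # yet restored the mode it saved before switching to MDI.
    task = ToyTask(mode=MDI)
    halui = {"sent_mdi": True, "old_mode": MANUAL}
    steps = {
        "gui_set_mdi": lambda: task.set_mode(MDI),                 # step 2
        "halui_restore": lambda: halui_restore_mode(task, halui),  # steps 3-4
        "gui_mdi_g43": lambda: task.mdi("G43"),                    # step 5
    }
    for name in order:
        steps[name]()
    return task
```

With the order from the theory, `replay(["gui_set_mdi", "halui_restore", "gui_mdi_g43"])` ends in MANUAL with the error recorded; if halui restores the mode before gmoccapy switches to MDI, the G43 goes through.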

@zz912
Contributor Author

zz912 commented Jun 27, 2023

@gmoccapy
Collaborator

gmoccapy commented Oct 8, 2023

Sorry, I spent a whole day trying to reproduce the error, but I was not able to on my laptop (i7 with Linux Mint and current kernel 6.2.0-43). I have also been using MDI commands for a 24-position rack tool changer on a real machine for about 3 years and never had this problem.

It is very difficult to find a suspected misbehavior that only occurs randomly and only on very few computers.
I will not spend more time on this at the moment. ReneDEV is currently rebuilding the complete tool handling in LinuxCNC and will throw out all the IOCONTROL stuff, so io will be gone soon.

By the way, are you using io or iov2? Do not use iov2!

Closing this until reproducing it becomes possible.

@gmoccapy gmoccapy closed this as completed Oct 8, 2023
@zz912
Contributor Author

zz912 commented Oct 8, 2023

Hello Norbert,

It's a shame you can't reproduce it.

ReneDEV is currently rebuilding the complete tool handling in LinuxCNC

Does this change apply to version 2.9.0, which is coming out soon?

Will Rene-dev's rework cover this issue as well?
#2489

Please reopen this issue.

I would like to ask you to believe me that the problem is in these lines:

if "G43" in self.active_gcodes and self.stat.task_mode != linuxcnc.MODE_AUTO:
    self.command.mode(linuxcnc.MODE_MDI)
    self.command.wait_complete()
    self.command.mdi("G43")
    self.command.wait_complete()

These lines cause race conditions. I realize that they have been in Gmoccapy since the beginning and no one has complained. Trust me, I'm a bug magnet.

Once I removed these lines from gmoccapy.py, my problems with M61 and M6 disappeared, not only this one.

I have several issues open here on GitHub regarding M6 and M61. It is a bit of a mess, because I didn't know all these issues had the same cause.

In another issue, you wrote that it is LinuxCNC's fault.
I don't want to disagree with you, but I think there are two ways to look at it.

First way:
Your code is OK and the problem is deeper in LinuxCNC. I don't think I can solve that myself. I was waiting for the LinuxCNC Meetup in Stuttgart.

Second way:
We have to accept that LinuxCNC is imperfect, and we introduce a rule:
"def _update_toolinfo(self, tool):" must not run any Python interface command.
"def _update_toolinfo(self, tool):" is only for updating the GUI.
I was hoping that fixing RETAIN_G43 in the INI file might allow these lines to be deleted.

I think the Meetup in Stuttgart may not produce a solution to this problem, but there should be agreement on whether it is a bug in LinuxCNC or in Gmoccapy (Python interface).

Either way 1 or way 2 can be correct. We need to agree.

I'm sorry I couldn't make it to the Meetup in Stuttgart. My English is bad. I would like to meet you, but I would not understand you.

@gmoccapy
Collaborator

gmoccapy commented Oct 8, 2023

Rene's part will be in 2.10, not in 2.9!
The G43 problem can be reproduced: every time you enter a tool with a Z offset of zero, G43 is canceled, and that causes strange behavior. This will certainly be fixed, but most likely not in 2.9!

I know it is a shame not to be able to solve every problem, but unfortunately that is the fact. If disabling lines in the gmoccapy code works for you, that might be the way for you to go. I am sure those lines do not cause the problem you described, since you also reported the behavior with a lower cycle time.

@zz912
Contributor Author

zz912 commented Oct 8, 2023

Would it help if I somehow made my computer, on which the error can be reproduced, available to you? I don't know how to do that right now, but I can keep my computer on all the time.
