Threading causes loss of variable when 'Refresh All' #895

Open
brta-jc opened this issue May 14, 2024 · 9 comments

@brta-jc

brta-jc commented May 14, 2024

We've had a bug appear a few times which occurs when pressing the 'Refresh All' button in the file menu. 'Refresh libraries' works fine.
This happens reliably every time we hit refresh, even after closing and reopening the GUI and the SM. Vanilla new state machines are fine, so something seems to be corrupting the state machines we are authoring.

2024-05-14 11:51:48:  VERBOSE - rafcon.utils.timer:  Profiler: __init__ (args: (<rafcon.gui.models.state_machine.StateMachineModel object at 0x782e3c0388b0>, <rafcon.core.state_machine.StateMachine object at 0x782e3743e020>); kwargs: {}); duration: 0.00923s
2024-05-14 11:51:48:    DEBUG - rafcon.gui.controllers.state_machines_editor:  Create new graphical editor for state machine with id 8
2024-05-14 11:51:48:  VERBOSE - rafcon.gui.controllers.graphical_editor_gaphas:  Time spent in init 0.0002772808074951172 seconds for state machine 8
2024-05-14 11:51:48:  VERBOSE - rafcon.gui.models.state_machine_manager:  Number of created state models 25
2024-05-14 11:51:48:  VERBOSE - rafcon.gui.controllers.graphical_editor_gaphas:  start setup canvas
2024-05-14 11:51:48:  VERBOSE - rafcon.gui.controllers.graphical_editor_gaphas:  Time spent in setup canvas 0.004022359848022461 state machine 8
2024-05-14 11:51:49:    DEBUG - rafcon.core.state_machine_manager:  Remove state machine with id 1
2024-05-14 11:51:49:    DEBUG - rafcon.gui.models.state_machine_manager:  Delete state machine model for state machine with id 1
TypeError: MenuBarController.on_refresh_libraries_activate() takes 0 positional arguments but 1 was given
2024-05-14 11:51:55:    DEBUG - rafcon.gui.views.utils.editor:  The editor style 'rafcon-dark' is not supported. Using the default 'classic'
2024-05-14 11:51:55:    DEBUG - rafcon.gui.views.utils.editor:  The editor style 'rafcon-dark' is not supported. Using the default 'classic'
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Initializing LibraryManager: Loading libraries ... 
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'advanced_examples' does not exist: /home/jason/.config/rafcon/examples/functionality_examples
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'generic' does not exist: /home/jason/.config/rafcon/${RAFCON_LIB_PATH}/generic
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'ros' does not exist: /home/jason/.config/rafcon/examples/libraries/ros_libraries
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'turtle_libraries' does not exist: /home/jason/.config/rafcon/examples/libraries/turtle_libraries
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'tutorials' does not exist: /home/jason/.config/rafcon/examples/tutorials
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Adding library 'ros_libraries' from /home/jason/.local/lib/python3.10/site-packages/rafcon/share/rafcon/examples/libraries/ros_libraries
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Adding library 'bmt_cc_state' from /home/jason/ros2/src/bmt/cooperative-cell/ros/bmt_cc_state
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Adding library 'libraries' from /home/jason/.local/lib/python3.10/site-packages/rafcon/share/rafcon/libraries
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Adding library 'bmt_common_skills' from /home/jason/ros2/src/bmt/bmt_common_skills
2024-05-14 11:51:57:    DEBUG - rafcon.core.library_manager:  Initialization of LibraryManager done
2024-05-14 11:51:57:     INFO - rafcon.gui.controllers.library_tree:  Libraries have been updated
2024-05-14 11:51:57:     INFO - rafcon.gui.controllers.library_tree:  Libraries have been updated
2024-05-14 11:51:57:    DEBUG - rafcon.core.state_machine_manager:  Remove state machine with id 8
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loading state machine from path /home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core...
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Load state recursively: /home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core/rf_bmt_init_core_KBXCPG
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Load state recursively: /home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core/rf_bmt_init_core_KBXCPG/rf_init_ros_JXYWDD
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loading state machine from path /home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros...
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Load state recursively: /home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros/rf_init_ros_ZWPLGJ
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros) has 1 states. (Max hierarchy level 1)
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros) has 0 transitions.
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros) has 0 data flows.
2024-05-14 11:51:57:  VERBOSE - rafcon.utils.timer:  Profiler: load_state_machine_from_path (args: ('/home/jason/ros2/src/bmt/bmt_common_skills/states/common/rf_init_ros',); kwargs: {}); duration: 0.00101s
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core) has 2 states. (Max hierarchy level 2)
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core) has 2 transitions.
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  Loaded state machine (/home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core) has 1 data flows.
2024-05-14 11:51:57:  VERBOSE - rafcon.utils.timer:  Profiler: load_state_machine_from_path (args: ('/home/jason/ros2/src/bmt/bmt_common_skills/states/rf_bmt_init_core', 8); kwargs: {}); duration: 0.0031s
2024-05-14 11:51:57:    DEBUG - rafcon.core.state_machine_manager:  Add new state machine with id 8
2024-05-14 11:51:57:    DEBUG - rafcon.gui.models.state_machine_manager:  Delete state machine model for state machine with id 8
2024-05-14 11:51:57:    DEBUG - rafcon.gui.models.state_machine_manager:  Add new state machine model ... 
2024-05-14 11:51:57:    DEBUG - rafcon.gui.models.state_machine_manager:  Create new state machine model for state machine with id 8
2024-05-14 11:51:57:    DEBUG - rafcon.gui.models.auto_backup:  The auto-backup for state-machine 8 is ENABLED and set to 'dynamic interval mode'
2024-05-14 11:51:57:    DEBUG - rafcon.gui.models.auto_backup:  Performing auto backup of state machine 8 to temp folder
2024-05-14 11:51:57:    DEBUG - rafcon.core.storage.storage:  State machine with id 8 was saved at /tmp/rafcon-jason/15849/runtime_backup/home/jason/ros2/src/bmt/bmt_common_skills/bmt_common_skills/states/rf_bmt_init_core
2024-05-14 11:51:57:  VERBOSE - rafcon.utils.timer:  Profiler: __init__ (args: (<rafcon.gui.models.state_machine.StateMachineModel object at 0x782e3743e800>, <rafcon.core.state_machine.StateMachine object at 0x782e57bcc100>); kwargs: {}); duration: 0.00736s
2024-05-14 11:51:57:  VERBOSE - rafcon.gui.models.state_machine_manager:  Number of created state models 28
Traceback (most recent call last):
  File "/home/jason/.local/lib/python3.10/site-packages/rafcon/gui/controllers/menu_bar.py", line 439, in on_refresh_all_activate
    gui_helper_state_machine.refresh_all(force=force)
  File "/home/jason/.local/lib/python3.10/site-packages/rafcon/gui/helpers/state_machine.py", line 948, in refresh_all
    state_machines_editor_ctrl.refresh_all_state_machines()
  File "/home/jason/.local/lib/python3.10/site-packages/rafcon/gui/controllers/state_machines_editor.py", line 499, in refresh_all_state_machines
    self.refresh_state_machines(list(self.model.state_machine_manager.state_machines.keys()))
  File "/home/jason/.local/lib/python3.10/site-packages/rafcon/gui/controllers/state_machines_editor.py", line 481, in refresh_state_machines
    self.rearrange_state_machines(page_num_by_sm_id)
  File "/home/jason/.local/lib/python3.10/site-packages/rafcon/gui/controllers/state_machines_editor.py", line 173, in rearrange_state_machines
    set_tab_label_texts(tab_label, state_machine_m, state_machine_m.state_machine.marked_dirty)
AttributeError: 'NoneType' object has no attribute 'marked_dirty'
^CException ignored in: <module 'threading' from '/usr/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1567, in _shutdown
    lock.acquire()

Any help diagnosing and preventing further corruption of our SMs would be very helpful!
Many thanks

@JohannesErnst
Collaborator

JohannesErnst commented May 14, 2024

Hi @brta-jc,

thanks for the bug report! Looking at the console output, it seems that refreshing the SMs worked fine up to the point where RAFCON tries to rearrange the tabs you had open in the GUI into the same arrangement you had before pressing Refresh.
Apparently, it cannot grab one (or multiple) of the SMs from the self.tabs variable.

I noticed that, before that point, it shows some warnings about misconfigured library paths. I'm not sure whether this is connected, but it certainly would not hurt to set up these paths properly.

2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'advanced_examples' does not exist: /home/jason/.config/rafcon/examples/functionality_examples
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'generic' does not exist: /home/jason/.config/rafcon/${RAFCON_LIB_PATH}/generic
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'ros' does not exist: /home/jason/.config/rafcon/examples/libraries/ros_libraries
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'turtle_libraries' does not exist: /home/jason/.config/rafcon/examples/libraries/turtle_libraries
2024-05-14 11:51:57:  WARNING - rafcon.core.library_manager:  Configured path for library root key 'tutorials' does not exist: /home/jason/.config/rafcon/examples/tutorials

There is a small possibility this is the reason it cannot grab the SMs properly when refreshing (although I couldn't reproduce this right away). So likely you have to adjust your library paths in the config.yaml.
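
A quick way to see which configured library roots fail to resolve is a short script like the one below. It is only a sketch: `CONFIG_DIR` and `LIBRARY_PATHS` are hypothetical stand-ins that mirror two of the keys from the warnings above, they are not read from the real config.yaml, and the rule that relative paths resolve against the config directory is an assumption inferred from the warning messages.

```python
import os

# Hypothetical stand-ins; in practice these values would come from your config.yaml.
CONFIG_DIR = os.path.expanduser("~/.config/rafcon")
LIBRARY_PATHS = {
    "generic": "${RAFCON_LIB_PATH}/generic",
    "tutorials": "examples/tutorials",
}


def find_missing_library_paths(library_paths, config_dir):
    """Return {key: resolved_path} for every library root that does not exist."""
    missing = {}
    for key, raw_path in library_paths.items():
        # Expand ~ and environment variables; unset variables are left verbatim.
        path = os.path.expanduser(os.path.expandvars(raw_path))
        # Assumption: relative paths are resolved against the config directory.
        if not os.path.isabs(path):
            path = os.path.join(config_dir, path)
        if not os.path.isdir(path):
            missing[key] = path
    return missing


if __name__ == "__main__":
    for key, path in find_missing_library_paths(LIBRARY_PATHS, CONFIG_DIR).items():
        print(f"library root '{key}' does not exist: {path}")
```

When a variable is unset, `os.path.expandvars` leaves `${RAFCON_LIB_PATH}` in place, which produces a path of the same shape as the one in the 'generic' warning. It may therefore be worth checking that the variable is exported in the environment that launches RAFCON.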

If this doesn't help, maybe you can send a minimal example of one of your SMs that causes the problem so I can reproduce it locally and debug from there (also via DM if you prefer)?

@sillkjc

sillkjc commented May 14, 2024

Hi Johannes,
I think this minimal SM can reproduce the bug. Simply loading it and hitting refresh should trigger it. Hopefully there won't be any library mapping problems.
bug.zip

@JohannesErnst
Collaborator

Hi @sillkjc,

I tried to open it, but it uses a library called rf_init_ros, which apparently is located in your bmt_common_skills/states/common and which I don't have. If I replace it with a dummy state, refreshing works perfectly fine for me. So maybe the problem lies in rf_init_ros or in the path where it's located?

@brta-jc
Author

brta-jc commented May 14, 2024

rf_init_ros.zip
Oops, the internal state was a reference to this library (attached). I don't think the problem is exclusive to this state, as others in the SM also trigger the same failure when isolated in a new empty SM.

@JohannesErnst
Collaborator

JohannesErnst commented May 14, 2024

For me it all works fine. Opening and refreshing does not cause any problems (FYI, I placed rf_init_ros in the same directory as the bug SM). Did you check whether all library paths are specified correctly in your setup?

What version of RAFCON are you using?

@brta-hj

brta-hj commented May 16, 2024

Hi @JohannesErnst,
I think our problem stems from using threading. I've tried to create another example that recreates the problem we face.
bug_test.zip

See the main function in the Python script: if we use the multiprocessing code it works, if we use the threading code it fails.
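
For readers without the attachment, the general difference between the two launch styles can be sketched as follows. This is not the code from bug_test.zip; `blocking_launch` and `SHARED` are hypothetical names used only for illustration. A thread runs inside the same interpreter and shares objects with its caller, while a separate process works on its own copy, which may be related to why only the threading variant shows the failure.

```python
import multiprocessing
import threading

# Module-level state standing in for objects shared with the launcher (hypothetical).
SHARED = {"launched": False}


def blocking_launch():
    """Placeholder for a blocking startup call; here it only flips a flag."""
    SHARED["launched"] = True


def run_in_thread():
    SHARED["launched"] = False
    worker = threading.Thread(target=blocking_launch)
    worker.start()
    worker.join()
    return SHARED["launched"]  # the thread mutated the caller's own dict


def run_in_process():
    SHARED["launched"] = False
    worker = multiprocessing.Process(target=blocking_launch)
    worker.start()
    worker.join()
    return SHARED["launched"]  # the child changed only its own copy


if __name__ == "__main__":
    print("thread sees change:", run_in_thread())
    print("process sees change:", run_in_process())
```

With threading, anything the launched code does to shared objects is visible to, and can interfere with, the code that started it. With multiprocessing, the two sides are isolated unless data is passed explicitly.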

The failure in this example is a bit different from the one above, but I'm guessing it could be the cause of that failure too?

@JohannesErnst
Collaborator

Hi @brta-hj,

I quickly tested your setup and, as expected, ran into the same error. But, as you already mentioned, I think it's more of a threading problem than a RAFCON problem. As mentioned here #895 (comment), the problem seems to be that somehow the state_machine_m variable gets lost and this line then returns None:

state_machine_m = self.tabs[sm_id]['state_machine_m']

I recommend setting up a debug environment to find where the variable gets lost and to what extent this is connected to the threading. Unfortunately, I cannot be of more help here as I haven't tried working with multiprocessing or threading yet (see #893 (comment)).
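
One possible way to narrow this down is to record which thread clears the reference and from where. The sketch below uses a hypothetical `TracedModel` class as a stand-in; it is not RAFCON's model class, and it only helps if the reference is lost through an explicit assignment of None.

```python
import threading
import traceback


class TracedModel:
    """Hypothetical stand-in for a model object holding a `state_machine` reference."""

    def __init__(self, state_machine):
        self.events = []  # one entry per time the reference is cleared
        self._state_machine = state_machine

    @property
    def state_machine(self):
        return self._state_machine

    @state_machine.setter
    def state_machine(self, value):
        if value is None:
            # Record which thread cleared the reference and from where.
            self.events.append({
                "thread": threading.current_thread().name,
                "stack": "".join(traceback.format_stack(limit=5)),
            })
        self._state_machine = value


if __name__ == "__main__":
    model = TracedModel(state_machine=object())
    worker = threading.Thread(
        target=setattr, args=(model, "state_machine", None), name="refresh-worker"
    )
    worker.start()
    worker.join()
    for event in model.events:
        print(f"cleared in thread {event['thread']}\n{event['stack']}")
```

The same idea can be applied temporarily to the real class in a local checkout, for example with a breakpoint or log call in whatever code assigns the attribute.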

Is the only reason you use multiprocessing / threading to prevent your launch from blocking (as mentioned here #893 (comment))? In that case I would recommend rethinking the startup procedure. Or is there another reason why you want to use multiprocessing?

@brta-hj

brta-hj commented May 30, 2024

Hi @JohannesErnst,

Sorry for the delayed response. Yes, the main reason for now is to prevent the launch from blocking. However we are also looking into spawning parallel instance(s) of RAFCON from an already running state machine (not sure if this is possible via the API, but it looks like I can run more than one instance of the GUI and different state machines at the same time if launched from separate terminal windows).

With regard to the error from the previous post, I tried to debug it and it looks like state_machine_m exists but does not seem to contain a .state_machine variable, which is why it fails at the following line when trying to access state_machine_m.state_machine.marked_dirty:

set_tab_label_texts(tab_label, state_machine_m, state_machine_m.state_machine.marked_dirty)
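
To find out which tabs are affected before that line is reached, a small check along these lines could be used. It is a sketch under assumptions: the layout of `tabs` (a dict keyed by state machine id with a 'state_machine_m' entry) follows the line quoted earlier in this thread, while the helper name and the stand-in objects in the demo are hypothetical.

```python
from types import SimpleNamespace


def tabs_with_missing_core(tabs):
    """Return the ids of tabs whose model is missing or has no core state machine."""
    broken = []
    for sm_id, tab in tabs.items():
        model = tab.get("state_machine_m")
        if model is None or getattr(model, "state_machine", None) is None:
            broken.append(sm_id)
    return broken


if __name__ == "__main__":
    # Stand-in objects: tab 1 is healthy, tab 8 has lost its core reference.
    healthy = SimpleNamespace(state_machine=SimpleNamespace(marked_dirty=False))
    orphaned = SimpleNamespace(state_machine=None)
    tabs = {
        1: {"state_machine_m": healthy},
        8: {"state_machine_m": orphaned},
    }
    print("tabs without a core state machine:", tabs_with_missing_core(tabs))
```

Logging the affected ids right before the tabs are rearranged would show whether it is always the same state machine that loses its reference.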

For now we have just pushed ahead with threading as it allows us to use the state introspection functionality (wasn't working with multiprocessing), and we just try to avoid refreshing the state machine/libraries (relaunching RAFCON if we need to).

@JohannesErnst
Collaborator

JohannesErnst commented Jun 3, 2024

Thanks for the follow up. I'm glad you found a workaround for now!

To summarize: somehow, a variable (concretely state_machine_m.state_machine) gets lost or deleted, which becomes noticeable when doing "Refresh All", but only when using the Python threading library to start up the core process.

I hope to come back to this when I start working with multiprocessing / threading and RAFCON in the future, and I will keep you posted on any updates on the issue here. If you find any more clues, please post them here as well.

I will rename the issue to better describe the error.

@JohannesErnst JohannesErnst changed the title State machine corruption preventing 'Refresh All' Multiprocessing causes loss of variable when 'Refresh All' Jun 3, 2024
@JohannesErnst JohannesErnst changed the title Multiprocessing causes loss of variable when 'Refresh All' Threading causes loss of variable when 'Refresh All' Jun 3, 2024