Multiple Unity instances stops #4194

ProxyCausal · 2020-07-03T17:30:52Z

Unity environment worker stops almost immediately after running when using num-envs > 1.

Note:
Slightly unrelated issue, but I can't make a separate post about it since it only happens on my custom environment. When I created multiple training areas and start training, my agent won't move after pressing play. I am using a separate controller script instead of heuristics mode, and I can manually control with that. In my original scene with only one copy, the opposite is true: I can train but not manually control. When I press play the position moves, but it is not moving in editor and I can't control. Not sure why this is, but I suspect somehow it is still being controlled by ml-agents.

To Reproduce
Steps to reproduce the behavior:

CD into ml-agents directory, run mlagents-learn config/ppo/WallJump.yaml --run-id=WallJumper --num-envs=2

Console logs / stack traces

WARNING:tensorflow:From c:\users\gdev\anaconda3\envs\rl\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From c:\users\gdev\anaconda3\envs\rl\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-07-03 13:14:48 INFO [subprocess_env_manager.py:191] UnityEnvironment worker 1: environment stopping.
2020-07-03 13:14:48 INFO [environment.py:199] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2020-07-03 13:14:58 INFO [environment.py:108] Connected to Unity environment with package version 1.1.0-preview and communication version 1.0.0
2020-07-03 13:14:58 INFO [environment.py:265] Connected new brain:
SmallWallJump?team=0
2020-07-03 13:14:58 INFO [trainer_controller.py:234] Learning was interrupted. Please wait while the graph is generated.
2020-07-03 13:14:58 INFO [trainer_controller.py:108] Saved Model
Traceback (most recent call last):
  File "c:\users\gdev\anaconda3\envs\rl\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\gdev\anaconda3\envs\rl\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\gdev\anaconda3\envs\rl\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\learn.py", line 322, in main
    run_cli(parse_command_line())
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\learn.py", line 318, in run_cli
    run_training(run_seed, options)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\learn.py", line 163, in run_training
    tc.start_learning(env_manager)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\trainer_controller.py", line 243, in start_learning
    raise ex
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\trainer_controller.py", line 214, in start_learning
    self._reset_env(env_manager)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\trainer_controller.py", line 150, in _reset_env
    env.reset(config=sampled_reset_param)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\env_manager.py", line 67, in reset
    self.first_step_infos = self._reset_env(config)
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 295, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "c:\users\gdev\anaconda3\envs\rl\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 92, in recv
    raise env_exception
mlagents_envs.exception.UnityEnvironmentException: If the environment name is None, the worker-id must be 0 in order to connect with the Editor.

Environment:

Unity Version: 2019.3.5f1
OS + version: Windows 10
ML-Agents version: 0.17.9
TensorFlow version: 2.0.1
Environment: WallJump

The text was updated successfully, but these errors were encountered:

chriselion · 2020-07-03T18:31:39Z

mlagents-learn config/ppo/WallJump.yaml --run-id=WallJumper --num-envs=2

This isn't a valid combination of commandline parameters: you either need to omit --num-envs (if you want to connect to the editor) or add --env and point it to an executable.

We need to validate the arguments better and raise a clearer error message in this case.

Note that --num-envs refers to the number of instances of the executable to run, not the number of training areas in the scene.

ProxyCausal · 2020-07-03T18:44:01Z

We need to validate the arguments better and raise a clearer error message in this case.

Yes, it wasn't clear to me that I needed the executable from the training docs (I should've thought of it though).

Note that --num-envs refers to the number of instances of the executable to run, not the number of training areas in the scene.

I know, I only went the multiple instance way after the problems I was facing with multiple training areas as I noted in my post

chriselion · 2020-07-07T17:03:33Z

I logged the --num-envs bug as MLA-1145 in our internal tracker.

If you're still having problems with multiple areas, can you post it on the ML-Agents forum (https://forum.unity.com/forums/ml-agents.453/) ? That's a better venue for that type of questions.

github-actions · 2021-01-29T20:08:57Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ProxyCausal added the bug Issue describes a potential bug in ml-agents. label Jul 3, 2020

ProxyCausal changed the title ~~Multiple Unity Instances stops~~ Multiple Unity instances stops Jul 3, 2020

chriselion mentioned this issue Jul 8, 2020

don't allow --num-envs >1 with no --env #4203

Merged

10 tasks

vincentpierre assigned dongruoping Oct 16, 2020

dongruoping closed this as completed Oct 16, 2020

github-actions bot locked as resolved and limited conversation to collaborators Jan 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple Unity instances stops #4194

Multiple Unity instances stops #4194

ProxyCausal commented Jul 3, 2020

chriselion commented Jul 3, 2020

ProxyCausal commented Jul 3, 2020

chriselion commented Jul 7, 2020

github-actions bot commented Jan 29, 2021

Multiple Unity instances stops #4194

Multiple Unity instances stops #4194

Comments

ProxyCausal commented Jul 3, 2020

chriselion commented Jul 3, 2020

ProxyCausal commented Jul 3, 2020

chriselion commented Jul 7, 2020

github-actions bot commented Jan 29, 2021