
Conversation


@CubeMD CubeMD commented Nov 12, 2020

Proposed change(s)

I would like to suggest some changes to the hyperparameters for the Match3 environment. With the previous settings, the mean rewards for the greedy heuristic and for the trained vector-observation agent were both around 37-38, so the two were quite close in performance.

  • Batch and buffer sizes were lowered to increase the frequency of policy updates, since Match3 is a game where only a few steps are needed to evaluate a policy correctly

  • Beta has been increased to incentivize exploration.

  • The learning rate schedule has been changed to linear; however, I am not sure whether it helped.

  • The size of the model has been increased.

  • The time horizon has been lowered, since actions do not affect rewards that far into the future.

Using the proposed values I was able to achieve scores of 40-41 with the same max steps. I have added a trained ONNX file to the project folder.
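For context, here is a minimal sketch of how changes in this direction look in an ML-Agents trainer config for the Match3VectorObs behavior. The numeric values below are illustrative placeholders chosen to show the direction of each change, not the exact values from this PR:

```yaml
behaviors:
  Match3VectorObs:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64                  # lowered: more frequent policy updates (placeholder value)
      buffer_size: 12000              # lowered for the same reason (placeholder value)
      learning_rate: 0.0003
      beta: 0.01                      # raised entropy bonus to encourage exploration (placeholder value)
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear  # changed to linear (see notes above)
    network_settings:
      normalize: true
      hidden_units: 256               # larger model (placeholder value)
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    time_horizon: 128                 # shortened: actions don't affect rewards far into the future (placeholder)
    max_steps: 5000000                # unchanged from the original config (placeholder value)
    summary_freq: 10000
```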

TensorFlow graphs

[image: TensorFlow training graphs]

Types of change(s)

  • Bug fix

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

  • Changed some hyperparameters of the Match3VectorObs behavior
  • Added a correspondingly named ONNX file to the Match3 project folder

CLAassistant commented Nov 12, 2020

CLA assistant check
All committers have signed the CLA.

@shihzy shihzy requested a review from chriselion November 12, 2020 17:28
@chriselion
Contributor

This looks great! I hadn't gotten a chance to play around with different hyperparameters after my initial attempt.

Personally I prefer a constant learning rate, because if you see the reward flatten out, you can reduce the training steps to that point.

I'm going to retrain both vector and visual with these parameters over the weekend and will update the models on this branch (assuming I have push permissions).
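As a reference for the learning-rate point above, the schedule is a single setting in the same trainer config (a sketch, not the exact file on this branch): `linear` anneals the learning rate toward zero over `max_steps`, while `constant` keeps it fixed, so a run can simply be cut short once the reward plateaus.

```yaml
hyperparameters:
  learning_rate: 0.0003             # placeholder value
  learning_rate_schedule: constant  # vs. linear, which anneals the rate to 0 by max_steps
```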

@CubeMD
Author

CubeMD commented Nov 15, 2020

I tried to train an uncompressed visual-observation agent with these parameters but got this error every time at around 2 million steps.

Traceback (most recent call last):
  File "C:\Users\CubeMD\.conda\envs\ml-agents-repo\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 280, in main
    run_cli(parse_command_line())
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 276, in run_cli
    run_training(run_seed, options)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 153, in run_training
    tc.start_learning(env_manager)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\trainer_controller.py", line 234, in advance
    new_step_infos = env_manager.get_steps()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\env_manager.py", line 112, in get_steps
    new_step_infos = self._step()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 255, in _step
    self._queue_steps()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 248, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 348, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 234, in get_action
    run_out = self.evaluate(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 203, in evaluate
    action, log_probs, entropy, memories = self.sample_actions(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 146, in sample_actions
    action_list = self.actor_critic.sample_action(dists)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\torch\networks.py", line 306, in sample_action
    action = action_dist.sample()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\torch\distributions.py", line 100, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Could it be related to the low buffer size with the new sensor? I might be wrong, but isn't the policy clipping in #4649 related to it? Should I open a new issue for it? I'll train with the original hyperparameters to see whether I get this error again.

I allowed edits to this branch for maintainers, if that's what you mean by permissions. I'm a bit new to this.

@CubeMD
Author

CubeMD commented Nov 16, 2020

I tried a few more training runs and never had any problems with the original hyperparameters. I couldn't find any significant difference between the sensors in terms of how often this issue comes up. Also, I never reached the performance from my original post, so I might have just been lucky.

@chriselion
Contributor

Using the same parameters as in the PR (but with a constant learning rate), I hit the same exception after around 1.2M steps. Before that, both visual and vector were averaging about 39, which is already an improvement.

I'll dig into the exception some more; it's probably a bug in the torch trainers that we should get fixed ASAP.

@chriselion
Contributor

We're experimenting with a fix for the NaNs here: https://github.com/Unity-Technologies/ml-agents/pull/4664/files

@CubeMD
Author

CubeMD commented Nov 20, 2020

Thank you for letting me know. I saw that the issue was fixed in release 10. I will do more runs this weekend.

Base automatically changed from master to main February 25, 2021 19:16
@miguelalonsojr miguelalonsojr requested review from maryamhonari and miguelalonsojr and removed request for chriselion and maryamhonari January 18, 2022 22:47
@miguelalonsojr
Contributor

Closing PR as these hyperparameters have already been incorporated into the config.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 19, 2023
