
Conversation


@CubeMD CubeMD commented Nov 12, 2020

Proposed change(s)

I would like to suggest some changes to the hyperparameters for the Match3 environment. With the previous settings, the mean rewards for the greedy heuristic and for the trained vector-observation agent were both around 37-38, so the two were quite close in performance.

  • Batch and buffer sizes were lowered to increase the frequency of policy updates, since Match3 is a game where only a few steps are needed to evaluate a policy correctly

  • Beta has been increased to incentivize exploration.

  • The learning rate schedule has been changed to linear; however, I am not sure whether it helped.

  • The size of the model has been increased.

  • The time horizon has been lowered, since actions do not affect rewards that far into the future.

Using the proposed values I was able to achieve scores of 40-41 with the same max steps. I have added a trained ONNX file to the project folder.
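For context, here is a minimal sketch of how changes in this direction look in an ML-Agents trainer config for the Match3VectorObs behavior. The numeric values below are illustrative placeholders chosen to show the direction of each change, not the exact values from this PR:

```yaml
behaviors:
  Match3VectorObs:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64                  # lowered: more frequent policy updates (placeholder value)
      buffer_size: 12000              # lowered for the same reason (placeholder value)
      learning_rate: 0.0003
      beta: 0.01                      # raised entropy bonus to encourage exploration (placeholder value)
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear  # changed to linear (see notes above)
    network_settings:
      normalize: true
      hidden_units: 256               # larger model (placeholder value)
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    time_horizon: 128                 # shortened: actions don't affect rewards far into the future (placeholder)
    max_steps: 5000000                # unchanged from the original config (placeholder value)
    summary_freq: 10000
```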

TensorFlow graphs

[image: TensorFlow training graphs]

Types of change(s)

  • Bug fix

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

  • Changed some hyperparameters of the Match3VectorObs behavior
  • Added a correspondingly named ONNX file to the Match3 project folder

CLAassistant commented Nov 12, 2020

CLA assistant check
All committers have signed the CLA.

@shihzy shihzy requested a review from chriselion November 12, 2020 17:28
@chriselion
Contributor

This looks great! I hadn't gotten a chance to play around with different hyperparameters after my initial attempt.

Personally I prefer a constant learning rate, because if you see the reward flatten out, you can reduce the training steps to that point.

I'm going to retrain both vector and visual with these parameters over the weekend and will update the models on this branch (assuming I have push permissions).
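As a reference for the learning-rate point above, the schedule is a single setting in the same trainer config (a sketch, not the exact file on this branch): `linear` anneals the learning rate toward zero over `max_steps`, while `constant` keeps it fixed, so a run can simply be cut short once the reward plateaus.

```yaml
hyperparameters:
  learning_rate: 0.0003             # placeholder value
  learning_rate_schedule: constant  # vs. linear, which anneals the rate to 0 by max_steps
```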

@CubeMD
Author

CubeMD commented Nov 15, 2020

I tried to train an uncompressed visual-observation agent with these parameters but got this error every time at around 2 million steps.

Traceback (most recent call last):
  File "C:\Users\CubeMD\.conda\envs\ml-agents-repo\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 280, in main
    run_cli(parse_command_line())
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 276, in run_cli
    run_training(run_seed, options)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\learn.py", line 153, in run_training
    tc.start_learning(env_manager)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\trainer_controller.py", line 176, in start_learning
    n_steps = self.advance(env_manager)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\trainer_controller.py", line 234, in advance
    new_step_infos = env_manager.get_steps()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\env_manager.py", line 112, in get_steps
    new_step_infos = self._step()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 255, in _step
    self._queue_steps()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 248, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 348, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 234, in get_action
    run_out = self.evaluate(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 203, in evaluate
    action, log_probs, entropy, memories = self.sample_actions(
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\policy\torch_policy.py", line 146, in sample_actions
    action_list = self.actor_critic.sample_action(dists)
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\torch\networks.py", line 306, in sample_action
    action = action_dist.sample()
  File "c:\users\cubemd\desktop\ml-agents-repo\ml-agents\mlagents\trainers\torch\distributions.py", line 100, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Could it be related to the low buffer size with the new sensor? I might be wrong, but isn't the policy clipping in #4649 related to it? Should I open a new issue for it? I'll train with the original hyperparameters to see whether I get this error again.

I allowed edits to this branch for maintainers, if that's what you mean by permissions. I'm a bit new to this.

@CubeMD
Author

CubeMD commented Nov 16, 2020

I tried a few more training runs and never had any problems with the original hyperparameters. I couldn't find any significant difference between the sensors in terms of how often this issue comes up. Also, I never reached the performance from my original post, so I might have just been lucky.

@chriselion
Contributor

Using the same parameters as in the PR (but with a constant learning rate), I hit the same exception after around 1.2M steps. Before that, both visual and vector were averaging about 39, which is already an improvement.

I'll dig into the exception some more; it's probably a bug in the torch trainers that we should get fixed ASAP.

@chriselion
Contributor

We're experimenting with a fix for the NaNs here: https://github.com/Unity-Technologies/ml-agents/pull/4664/files

@CubeMD
Author

CubeMD commented Nov 20, 2020

Thank you for letting me know. I saw that the issue was fixed in release 10. I will do more runs this weekend.

Base automatically changed from master to main February 25, 2021 19:16
@miguelalonsojr miguelalonsojr requested review from maryamhonari and miguelalonsojr and removed request for chriselion and maryamhonari January 18, 2022 22:47
@miguelalonsojr
Contributor

Closing PR as these hyperparameters have already been incorporated into the config.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 19, 2023
