Match3 hyperparameters adjustments #4641
Conversation
- Changed some hyperparameters of the Match3VectorObs behavior
- Added an accordingly named onnx file in the match3 project folder
This looks great! I hadn't gotten a chance to play around with different hyperparameters after my initial attempt. Personally I prefer a constant learning rate, because if you see that the reward flattens out, you can reduce the training steps to that point. I'm going to retrain both vector and visual with these parameters over the weekend and will update the models on this branch (assuming I have push permissions).
I tried to train an uncompressed visual with these parameters but got this error every time around 2 million steps. Could it be related to the low buffer size with the new sensor? I might be wrong, but isn't the policy clipping in #4649 related to it? Should I make a new issue for it? I'll train with the original hyperparameters to see if I get this error again. I allowed edits to this branch for maintainers, if that's what you mean by permissions; I'm a bit new to this.
I tried a few more training runs and never had any problems with the original hyperparameters. I couldn't find any significant difference between the sensors in terms of how often this issue comes up. Also, I never reached the performance of my original post, so I might have just been lucky.
Using the same parameters as in the PR (but with a constant learning rate), I hit the same exception after around 1.2M steps. Before that, both visual and vector were averaging about 39, which is already an improvement. I'll dig into the exception some more; it's probably a bug in the torch trainers that we should fix ASAP.
We're experimenting with a fix for the NaNs here: https://github.com/Unity-Technologies/ml-agents/pull/4664/files
Thank you for letting me know. I saw that the issue was fixed in release 10. I will do more runs this weekend.
Closing this PR, as these hyperparameters have already been incorporated into the config.
Proposed change(s)
I would like to suggest some changes to the hyperparameters for the match3 environment. With the previous settings, mean rewards for the greedy heuristic and for vector observations were both around 37-38, so the learned policy performed barely better than the heuristic.
- Batch and buffer sizes were lowered to increase the frequency of policy updates, since match3 is a game where only a few steps are needed to evaluate a policy correctly.
- Beta has been increased to incentivize exploration.
- The learning rate schedule has been changed to linear; however, I am not sure whether it helped.
- The size of the model has been increased.
- The time horizon has been lowered, as actions do not affect rewards that far into the future.
Using the proposed values, I was able to achieve scores of 40-41 with the same max steps. I have added a trained onnx file to the project folder.
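The changes above can be sketched as an ML-Agents trainer config fragment. The numeric values below are illustrative assumptions chosen to match the direction of each change (lower batch/buffer, higher beta, linear schedule, larger network, shorter horizon), not the exact values from this PR:

```yaml
# Hypothetical sketch of the proposed hyperparameter changes;
# actual values in the PR's config may differ.
behaviors:
  Match3VectorObs:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64               # lowered: more frequent policy updates
      buffer_size: 1024            # lowered for the same reason
      beta: 0.01                   # increased to incentivize exploration
      learning_rate_schedule: linear
    network_settings:
      hidden_units: 256            # increased model size
      num_layers: 4
    time_horizon: 32               # shortened: rewards are near-term
```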
TensorFlow graphs
Types of change(s)
Checklist
Other comments