Closed
Labels: bug (Issue describes a potential bug in ml-agents.)
Description
Hi.
I tried to run the sample project "tennis learning", but the agents seem to train slowly.
The result below was obtained with the original hyperparameters.
[terminal output]
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 1000. Mean Reward: 0.009. Std of Reward: 0.038. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 10000. Mean Reward: 0.030. Std of Reward: 0.053. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 20000. Mean Reward: 0.048. Std of Reward: 0.057. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 25000. Mean Reward: 0.057. Std of Reward: 0.063. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 30000. Mean Reward: 0.085. Std of Reward: 0.112. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 40000. Mean Reward: 0.978. Std of Reward: 0.927. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 45000. Mean Reward: 1.332. Std of Reward: 1.000. Training.
I noticed that both agentRb and ballRb refer to the agent's Rigidbody in TennisAgent.cs.
Here is the relevant part of the original script (link):
Line 28 agentRb = GetComponent<Rigidbody>();
Line 29 ballRb = GetComponent<Rigidbody>();
Line 30 var canvas = GameObject.Find(CanvasName);
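Should line 29 be "ballRb = ball.GetComponent<Rigidbody>();" instead?
As written, ballRb holds the agent's own Rigidbody, but from the context of the script the agent needs the ball's velocity to decide its next action.
With the fixed script I got the results below, so training seems to be going well.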
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 1000. Mean Reward: 0.006. Std of Reward: 0.034. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 10000. Mean Reward: 0.052. Std of Reward: 0.076. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 20000. Mean Reward: 0.153. Std of Reward: 0.148. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 25000. Mean Reward: 0.309. Std of Reward: 0.331. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 30000. Mean Reward: 0.833. Std of Reward: 0.728. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 40000. Mean Reward: 1.315. Std of Reward: 0.988. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 45000. Mean Reward: 1.408. Std of Reward: 1.075. Training.
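To compare the two runs side by side, the log lines above can be parsed with a short Python script. This is only a sketch: the helper name parse_mean_rewards and the regular expression are my own assumptions, not part of ml-agents, and the log text is copied from the two outputs in this issue.

```python
import re

# Log lines copied from this issue: original script first, fixed script second.
ORIGINAL_LOG = """
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 1000. Mean Reward: 0.009. Std of Reward: 0.038. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 10000. Mean Reward: 0.030. Std of Reward: 0.053. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 20000. Mean Reward: 0.048. Std of Reward: 0.057. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 25000. Mean Reward: 0.057. Std of Reward: 0.063. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 30000. Mean Reward: 0.085. Std of Reward: 0.112. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 40000. Mean Reward: 0.978. Std of Reward: 0.927. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 45000. Mean Reward: 1.332. Std of Reward: 1.000. Training.
"""

FIXED_LOG = """
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 1000. Mean Reward: 0.006. Std of Reward: 0.034. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 10000. Mean Reward: 0.052. Std of Reward: 0.076. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 20000. Mean Reward: 0.153. Std of Reward: 0.148. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 25000. Mean Reward: 0.309. Std of Reward: 0.331. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 30000. Mean Reward: 0.833. Std of Reward: 0.728. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 40000. Mean Reward: 1.315. Std of Reward: 0.988. Training.
INFO:mlagents.trainers: tennis-0: TennisLearning: Step: 45000. Mean Reward: 1.408. Std of Reward: 1.075. Training.
"""

# Hypothetical pattern matching the "Step: N. Mean Reward: X." part of each line.
LINE_RE = re.compile(r"Step: (\d+)\. Mean Reward: (-?\d+\.\d+)\.")


def parse_mean_rewards(log_text):
    """Return a dict mapping training step to the reported mean reward."""
    return {int(step): float(reward) for step, reward in LINE_RE.findall(log_text)}


original = parse_mean_rewards(ORIGINAL_LOG)
fixed = parse_mean_rewards(FIXED_LOG)

# Print the mean reward of both runs at every step that appears in both logs.
for step in sorted(original.keys() & fixed.keys()):
    print(f"{step:>6}  original={original[step]:.3f}  fixed={fixed[step]:.3f}")
```

The gap is largest in the middle of training: at step 30000 the fixed script reports a mean reward of 0.833 against 0.085 for the original, while by step 45000 the two are close (1.408 and 1.332). Each log is a single run, so this comparison is suggestive rather than conclusive.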
Thank you.