Hey @colinskow, I have implemented ars.py for the bipedal problem. The score at 1500 iterations is around 330.
In each step of the training loop we explore once using the code below:
# Play an episode with the new weights and print the score
reward_evaluation = self.explore()
Now I have saved the theta from 1500 iterations, along with all the other parameters.
Next, I initialized theta with this pretrained theta while creating a new instance of the Policy() class and explored 10 times, but the score is around 6.23, nowhere near 330.
Can you tell me why this is happening?
Each call to explore() does self.env.reset(), which just restarts the environment, so why is the reward from explore() so different when it is called from inside the training loop versus when I call it manually?
Let me know if my query is not clear, thanks.
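For reference, a minimal sketch of what my save/restore looks like. The Normalizer attribute names (n, mean, mean_diff, var), the checkpoint filename, and the policy.theta attribute are from my own code, so treat them as assumptions about the general ars.py:

import numpy as np

# Save the policy weights together with the normalizer's running statistics.
np.savez("ars_checkpoint.npz",
         theta=policy.theta,
         n=normalizer.n,
         mean=normalizer.mean,
         mean_diff=normalizer.mean_diff,
         var=normalizer.var)

# Later, restore everything into the fresh instances before calling explore().
ckpt = np.load("ars_checkpoint.npz")
policy.theta = ckpt["theta"]
normalizer.n = ckpt["n"]
normalizer.mean = ckpt["mean"]
normalizer.mean_diff = ckpt["mean_diff"]
normalizer.var = ckpt["var"]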
When I try to use this in a "production env" it fails, and I did the same as you.
I saved all the params (n, mean, mean_diff, var) and theta, and loaded them into another instance of Policy, but I never get the reward that was reached during training.
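One thing I now suspect is the state normalizer: during training, explore() updates the running mean/variance on every step, so a freshly constructed Policy/Normalizer that does not get these statistics back (or that keeps updating them over only a handful of evaluation episodes) normalizes states very differently from training. A minimal evaluation loop that reuses the restored statistics without touching them might look like the sketch below; the linear theta.dot(state) policy and the attribute names follow basic ARS, so they are assumptions about this particular ars.py:

import gym
import numpy as np

env = gym.make("BipedalWalker-v3")  # environment id/version may differ

def evaluate(policy, normalizer, episodes=10):
    """Run deterministic rollouts: no exploration noise and no
    updates to the normalizer's running statistics."""
    scores = []
    for _ in range(episodes):
        state = env.reset()
        done, total_reward = False, 0.0
        while not done:
            # Normalize with the trained statistics; crucially, do NOT
            # update them (no observe()-style call) while evaluating.
            std = np.sqrt(normalizer.var.clip(min=1e-8))
            state = (state - normalizer.mean) / std
            action = policy.theta.dot(state)  # assumed linear ARS policy, no delta
            state, reward, done, _ = env.step(action)
            total_reward += reward
        scores.append(total_reward)
    return np.mean(scores)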