Hey @colinskow, I have implemented ars.py for the bipedal problem. The score at 1500 iterations is around 330.
In each step of the training loop we explore once using the code below:
# Play an episode with the new weights and print the score
reward_evaluation = self.explore()
Now I have saved the theta from 1500 iterations, along with all the other parameters.
Next, I initialized theta with this pretrained theta while creating a new instance of the Policy() class and explored 10 times, but the score is around 6.23, nowhere near 330.
Can you tell me why this is happening?
Each call to explore() does self.env.reset(), which just restarts the environment, so why is the reward from explore() so different when it is called from inside the training loop versus when I call it manually?
Let me know if my query is not clear, thanks.
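For reference, a minimal sketch of what my save/restore looks like. The Normalizer attribute names (n, mean, mean_diff, var), the checkpoint filename, and the policy.theta attribute are from my own code, so treat them as assumptions about the general ars.py:

import numpy as np

# Save the policy weights together with the normalizer's running statistics.
np.savez("ars_checkpoint.npz",
         theta=policy.theta,
         n=normalizer.n,
         mean=normalizer.mean,
         mean_diff=normalizer.mean_diff,
         var=normalizer.var)

# Later, restore everything into the fresh instances before calling explore().
ckpt = np.load("ars_checkpoint.npz")
policy.theta = ckpt["theta"]
normalizer.n = ckpt["n"]
normalizer.mean = ckpt["mean"]
normalizer.mean_diff = ckpt["mean_diff"]
normalizer.var = ckpt["var"]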
When I try to use this in a "production env" it fails, and I did the same as you.
I saved all the params (n, mean, mean_diff, var) and theta, and loaded them into another instance of Policy, but I never get the reward that was reached during training.
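One thing I now suspect is the state normalizer: during training, explore() updates the running mean/variance on every step, so a freshly constructed Policy/Normalizer that does not get these statistics back (or that keeps updating them over only a handful of evaluation episodes) normalizes states very differently from training. A minimal evaluation loop that reuses the restored statistics without touching them might look like the sketch below; the linear theta.dot(state) policy and the attribute names follow basic ARS, so they are assumptions about this particular ars.py:

import gym
import numpy as np

env = gym.make("BipedalWalker-v3")  # environment id/version may differ

def evaluate(policy, normalizer, episodes=10):
    """Run deterministic rollouts: no exploration noise and no
    updates to the normalizer's running statistics."""
    scores = []
    for _ in range(episodes):
        state = env.reset()
        done, total_reward = False, 0.0
        while not done:
            # Normalize with the trained statistics; crucially, do NOT
            # update them (no observe()-style call) while evaluating.
            std = np.sqrt(normalizer.var.clip(min=1e-8))
            state = (state - normalizer.mean) / std
            action = policy.theta.dot(state)  # assumed linear ARS policy, no delta
            state, reward, done, _ = env.step(action)
            total_reward += reward
        scores.append(total_reward)
    return np.mean(scores)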