Merge pull request #582 from marioyc/add-comment-env-difference
Add comment on environment version difference
muupan committed Nov 29, 2019
2 parents 71e04ee + 4a9ddd2 commit a9fff4f
Showing 2 changed files with 17 additions and 12 deletions.
13 changes: 7 additions & 6 deletions examples/mujoco/reproduction/ppo/README.md
@@ -28,6 +28,7 @@ To view the full list of options, either view the code or run the example with the `--help` option.
## Known differences

- While the original paper initialized weights with a normal distribution (https://github.com/Breakend/baselines/blob/50ffe01d254221db75cdb5c2ba0ab51a6da06b0a/baselines/ppo1/mlp_policy.py#L28), we use orthogonal initialization as the latest openai/baselines does (https://github.com/openai/baselines/blob/9b68103b737ac46bc201dfb3121cfa5df2127e53/baselines/a2c/utils.py#L61); see the sketch after this list.
- We used version v2 of the environments, whereas the original results were reported for version v1; however, this does not seem to introduce significant differences: https://github.com/openai/gym/pull/834
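
For reference, here is a minimal NumPy sketch of the idea behind the orthogonal initializer linked above: the linked baselines code replaces a random normal matrix with an orthonormal factor of its SVD. The `orthogonal_init` helper below is illustrative only and is not ChainerRL's or baselines' actual API.

```python
import numpy as np


def orthogonal_init(out_size, in_size, scale=1.0, rng=None):
    # Draw a random normal matrix and keep an orthonormal factor of its SVD.
    # Helper name and defaults are illustrative, not ChainerRL's API.
    rng = np.random.RandomState(0) if rng is None else rng
    a = rng.normal(size=(out_size, in_size))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == (out_size, in_size) else vt
    return scale * q


W = orthogonal_init(64, 11)  # e.g. an 11-dim observation mapped to 64 hidden units
print(np.allclose(W.T @ W, np.eye(11)))  # True: the columns are orthonormal
```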

## Results

@@ -41,12 +42,12 @@ ChainerRL scores are based on 20 trials using different random seeds, using the following command:
python train_ppo.py --gpu -1 --seed [0-19] --env [env]
```
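
As a rough illustration of how the seed and environment placeholders expand into 20 trials per environment, the runs could be launched sequentially with a loop like the one below. This is purely illustrative; the environment list and the `--gpu -1` (CPU) setting are assumptions to adapt to your setup.

```python
import itertools
import subprocess

# Illustrative only: run train_ppo.py once per (environment, seed) pair.
envs = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2", "Swimmer-v2"]
for env, seed in itertools.product(envs, range(20)):
    subprocess.run(
        ["python", "train_ppo.py", "--gpu", "-1", "--seed", str(seed), "--env", env],
        check=True,
    )
```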

-| Environment    | ChainerRL Score | Reported Score |
-| -------------- |:---------------:|:--------------:|
-| HalfCheetah-v2 | **2404**+/-185  | 2201+/-323     |
-| Hopper-v2      | 2719+/-67       | **2790**+/-62  |
-| Walker2d-v2    | 2994+/-113      | N/A            |
-| Swimmer-v2     | 111+/-4         | N/A            |
+| Environment | ChainerRL Score | Reported Score |
+| ----------- |:---------------:|:--------------:|
+| HalfCheetah | **2404**+/-185  | 2201+/-323     |
+| Hopper      | 2719+/-67       | **2790**+/-62  |
+| Walker2d    | 2994+/-113      | N/A            |
+| Swimmer     | 111+/-4         | N/A            |


### Training times
16 changes: 10 additions & 6 deletions examples/mujoco/reproduction/trpo/README.md
@@ -25,19 +25,23 @@ python train_trpo.py [options]

To view the full list of options, either view the code or run the example with the `--help` option.

## Known differences

- We used version v2 of the environments, whereas the original results were reported for version v1; however, this does not seem to introduce significant differences: https://github.com/openai/gym/pull/834

## Results

These scores are evaluated by average return +/- standard error of 100 evaluation episodes after 2M training steps.
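
For illustration, a "mean +/- standard error" entry like those in the table below can be computed from per-episode returns as in the sketch that follows. The random returns are stand-ins, and the exact aggregation used for the published numbers may differ.

```python
import numpy as np

returns = np.random.normal(loc=3000.0, scale=400.0, size=100)  # stand-in for 100 episode returns
mean = returns.mean()
stderr = returns.std(ddof=1) / np.sqrt(len(returns))  # standard error of the mean
print("{:.0f}+/-{:.0f}".format(mean, stderr))
```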

Reported scores are taken from Table 1 of [Deep Reinforcement Learning that Matters](https://arxiv.org/abs/1709.06560).
Here we try to reproduce the TRPO (Schulman et al. 2015) results in the (64, 64) column, which corresponds to the default settings.
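
For reference, "(64, 64)" denotes a network with two hidden layers of 64 units each. The Chainer sketch below is illustrative only: the tanh activations and plain `L.Linear` layers are assumptions, and the actual ChainerRL policy adds a distribution head and its own weight initialization.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class TwoHidden64MLP(chainer.Chain):
    """Illustrative (64, 64) network: two hidden layers of 64 units each."""

    def __init__(self, obs_size, out_size):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(obs_size, 64)
            self.l2 = L.Linear(64, 64)
            self.l3 = L.Linear(64, out_size)

    def __call__(self, x):
        h = F.tanh(self.l1(x))
        h = F.tanh(self.l2(h))
        return self.l3(h)
```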

-| Environment    | ChainerRL Score | Reported Score |
-| -------------- |:---------------:|:--------------:|
-| HalfCheetah-v2 | **1474**+/-112  | 205+/-256      |
-| Hopper-v2      | **3056**+/-44   | 2828+/-70      |
-| Walker2d-v2    | 3073+/-59       | N/A            |
-| Swimmer-v2     | 200+/-25        | N/A            |
+| Environment | ChainerRL Score | Reported Score |
+| ----------- |:---------------:|:--------------:|
+| HalfCheetah | **1474**+/-112  | 205+/-256      |
+| Hopper      | **3056**+/-44   | 2828+/-70      |
+| Walker2d    | 3073+/-59       | N/A            |
+| Swimmer     | 200+/-25        | N/A            |

### Learning Curves

