Merge pull request #582 from marioyc/add-comment-env-difference
Add comment on environment version difference
muupan committed Nov 29, 2019
2 parents 71e04ee + 4a9ddd2 commit a9fff4f
Showing 2 changed files with 17 additions and 12 deletions.
13 changes: 7 additions & 6 deletions examples/mujoco/reproduction/ppo/README.md
@@ -28,6 +28,7 @@ To view the full list of options, either view the code or run the example with the `--help` option.
## Known differences

- While the original paper initialized weights with a normal distribution (https://github.com/Breakend/baselines/blob/50ffe01d254221db75cdb5c2ba0ab51a6da06b0a/baselines/ppo1/mlp_policy.py#L28), we use orthogonal initialization as the latest openai/baselines does (https://github.com/openai/baselines/blob/9b68103b737ac46bc201dfb3121cfa5df2127e53/baselines/a2c/utils.py#L61); see the sketch after this list.
- We used version v2 of the environments, whereas the original results were reported for version v1; however, this does not seem to introduce significant differences: https://github.com/openai/gym/pull/834
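
For reference, here is a minimal NumPy sketch of the idea behind the orthogonal initializer linked above: the linked baselines code replaces a random normal matrix with an orthonormal factor of its SVD. The `orthogonal_init` helper below is illustrative only and is not ChainerRL's or baselines' actual API.

```python
import numpy as np


def orthogonal_init(out_size, in_size, scale=1.0, rng=None):
    # Draw a random normal matrix and keep an orthonormal factor of its SVD.
    # Helper name and defaults are illustrative, not ChainerRL's API.
    rng = np.random.RandomState(0) if rng is None else rng
    a = rng.normal(size=(out_size, in_size))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == (out_size, in_size) else vt
    return scale * q


W = orthogonal_init(64, 11)  # e.g. an 11-dim observation mapped to 64 hidden units
print(np.allclose(W.T @ W, np.eye(11)))  # True: the columns are orthonormal
```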

## Results

@@ -41,12 +42,12 @@ ChainerRL scores are based on 20 trials using different random seeds, using the following command:
python train_ppo.py --gpu -1 --seed [0-19] --env [env]
```
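
As a rough illustration of how the seed and environment placeholders expand into 20 trials per environment, the runs could be launched sequentially with a loop like the one below. This is purely illustrative; the environment list and the `--gpu -1` (CPU) setting are assumptions to adapt to your setup.

```python
import itertools
import subprocess

# Illustrative only: run train_ppo.py once per (environment, seed) pair.
envs = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2", "Swimmer-v2"]
for env, seed in itertools.product(envs, range(20)):
    subprocess.run(
        ["python", "train_ppo.py", "--gpu", "-1", "--seed", str(seed), "--env", env],
        check=True,
    )
```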

-| Environment    | ChainerRL Score | Reported Score |
-| -------------- |:---------------:|:--------------:|
-| HalfCheetah-v2 | **2404**+/-185  | 2201+/-323     |
-| Hopper-v2      | 2719+/-67       | **2790**+/-62  |
-| Walker2d-v2    | 2994+/-113      | N/A            |
-| Swimmer-v2     | 111+/-4         | N/A            |
+| Environment | ChainerRL Score | Reported Score |
+| ----------- |:---------------:|:--------------:|
+| HalfCheetah | **2404**+/-185  | 2201+/-323     |
+| Hopper      | 2719+/-67       | **2790**+/-62  |
+| Walker2d    | 2994+/-113      | N/A            |
+| Swimmer     | 111+/-4         | N/A            |


### Training times
16 changes: 10 additions & 6 deletions examples/mujoco/reproduction/trpo/README.md
@@ -25,19 +25,23 @@ python train_trpo.py [options]

To view the full list of options, either view the code or run the example with the `--help` option.

## Known differences

- We used version v2 of the environments, whereas the original results were reported for version v1; however, this does not seem to introduce significant differences: https://github.com/openai/gym/pull/834

## Results

These scores are evaluated by average return +/- standard error of 100 evaluation episodes after 2M training steps.
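
For illustration, a "mean +/- standard error" entry like those in the table below can be computed from per-episode returns as in the sketch that follows. The random returns are stand-ins, and the exact aggregation used for the published numbers may differ.

```python
import numpy as np

returns = np.random.normal(loc=3000.0, scale=400.0, size=100)  # stand-in for 100 episode returns
mean = returns.mean()
stderr = returns.std(ddof=1) / np.sqrt(len(returns))  # standard error of the mean
print("{:.0f}+/-{:.0f}".format(mean, stderr))
```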

Reported scores are taken from Table 1 of [Deep Reinforcement Learning that Matters](https://arxiv.org/abs/1709.06560).
Here we try to reproduce the TRPO (Schulman et al. 2015) results in the (64, 64) column, which corresponds to the default settings.
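
For reference, "(64, 64)" denotes a network with two hidden layers of 64 units each. The Chainer sketch below is illustrative only: the tanh activations and plain `L.Linear` layers are assumptions, and the actual ChainerRL policy adds a distribution head and its own weight initialization.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class TwoHidden64MLP(chainer.Chain):
    """Illustrative (64, 64) network: two hidden layers of 64 units each."""

    def __init__(self, obs_size, out_size):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(obs_size, 64)
            self.l2 = L.Linear(64, 64)
            self.l3 = L.Linear(64, out_size)

    def __call__(self, x):
        h = F.tanh(self.l1(x))
        h = F.tanh(self.l2(h))
        return self.l3(h)
```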

-| Environment    | ChainerRL Score | Reported Score |
-| -------------- |:---------------:|:--------------:|
-| HalfCheetah-v2 | **1474**+/-112  | 205+/-256      |
-| Hopper-v2      | **3056**+/-44   | 2828+/-70      |
-| Walker2d-v2    | 3073+/-59       | N/A            |
-| Swimmer-v2     | 200+/-25        | N/A            |
+| Environment | ChainerRL Score | Reported Score |
+| ----------- |:---------------:|:--------------:|
+| HalfCheetah | **1474**+/-112  | 205+/-256      |
+| Hopper      | **3056**+/-44   | 2828+/-70      |
+| Walker2d    | 3073+/-59       | N/A            |
+| Swimmer     | 200+/-25        | N/A            |

### Learning Curves

