Commit 901a627

authored Apr 23, 2022
Update Mujoco Benchmark's webpage (thu-ml#606)
1 parent aed60c9 commit 901a627

3 files changed: +79 -3 lines changed


Diff for: docs/spelling_wordlist.txt (+4)

@@ -150,3 +150,7 @@ ppo
 Jupyter
 Colab
 Colaboratory
+IPendulum
+Reacher
+Runtime
+Nvidia

Diff for: docs/tutorials/benchmark.rst (+74 -2)

@@ -5,9 +5,9 @@ Benchmark
 Mujoco Benchmark
 ----------------
 
-Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
+Tianshou's Mujoco benchmark contains state-of-the-art results.
 
-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+Every experiment is conducted under 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.
 
 .. raw:: html
 
@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
 <br>
 </center>
 
+The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. We use max average return in 1M timesteps as the reward metric. ~ means the result is approximated from the plots because quantitative results are not provided. - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.
+
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
++=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
+|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+
+Runtime averaged on 8 MuJoCo benchmark tasks is listed below. All results are obtained using a single Nvidia TITAN X GPU and
+up to 48 CPU cores (at most one CPU core for each thread).
+
+========= ========= ============ ============== ============ ============== ==========
+Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
+========= ========= ============ ============== ============ ============== ==========
+DDPG      1         2.9h         12.0           80.2         2.4            5.4
+TD3       1         3.3h         11.4           81.7         1.7            5.2
+SAC       1         5.2h         10.9           83.8         1.8            3.5
+REINFORCE 64        4min         84.9           1.8          12.5           0.8
+A2C       16        7min         62.5           28.0         6.6            2.9
+PPO       64        24min        11.4           85.3         3.2            0.2
+NPG       16        7min         65.1           24.9         9.5            0.6
+TRPO      16        7min         62.9           26.5         10.1           0.6
+========= ========= ============ ============== ============ ============== ==========
+
 
 Atari Benchmark
 ---------------
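
As a reading aid for the runtime table added to `docs/tutorials/benchmark.rst`, the percentage columns can be converted into approximate wall-clock minutes per phase. The sketch below copies the rows from that table; `RUNTIME`, `PHASES`, `to_minutes`, and `phase_minutes` are hypothetical helper names used only for illustration and are not part of Tianshou.

```python
# Rows copied from the runtime table: wall-clock time for 1M timesteps, then the
# percentage of that time spent collecting, updating, evaluating, and on other work.
RUNTIME = {
    "DDPG": ("2.9h", 12.0, 80.2, 2.4, 5.4),
    "TD3": ("3.3h", 11.4, 81.7, 1.7, 5.2),
    "SAC": ("5.2h", 10.9, 83.8, 1.8, 3.5),
    "REINFORCE": ("4min", 84.9, 1.8, 12.5, 0.8),
    "A2C": ("7min", 62.5, 28.0, 6.6, 2.9),
    "PPO": ("24min", 11.4, 85.3, 3.2, 0.2),
    "NPG": ("7min", 65.1, 24.9, 9.5, 0.6),
    "TRPO": ("7min", 62.9, 26.5, 10.1, 0.6),
}
PHASES = ("collecting", "updating", "evaluating", "others")


def to_minutes(text):
    """Convert a table entry such as '2.9h' or '24min' to minutes."""
    if text.endswith("min"):
        return float(text[:-3])
    if text.endswith("h"):
        return float(text[:-1]) * 60.0
    raise ValueError(f"unrecognised duration: {text!r}")


def phase_minutes(algorithm):
    """Split an algorithm's total runtime into minutes per phase."""
    total, *shares = RUNTIME[algorithm]
    minutes = to_minutes(total)
    return {phase: minutes * share / 100.0 for phase, share in zip(PHASES, shares)}


for name in RUNTIME:
    parts = phase_minutes(name)
    print(name, {phase: round(value, 1) for phase, value in parts.items()})
```

Read this way, DDPG spends roughly 140 of its 174 minutes on updating, while REINFORCE spends most of its 4 minutes on collecting.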

Diff for: examples/mujoco/README.md (+1 -1)

@@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai
 
 ### TRPO
 
-| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
+| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (Tensorflow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
 | :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
 | Ant | **2866.7±707.9** | ~0 | N | N | ~150 |
 | HalfCheetah | **4471.2±804.9** | ~400 | ~0 | ~1350 | ~850 |
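
The benchmark text in this commit states that every experiment uses 10 random seeds and that the reported metric is the max average return within 1M timesteps; README cells such as `2866.7±707.9` pair that value with a spread. Below is a minimal sketch of one way such a number could be computed. It assumes per-seed evaluation curves logged at shared checkpoints, and it assumes the value after ± is a population standard deviation across seeds; neither assumption is confirmed by the diff. The function name `max_average_return` and the data are hypothetical.

```python
from statistics import mean, pstdev


def max_average_return(steps, curves, budget=1_000_000):
    """Return (average, spread, step) at the best seed-averaged checkpoint.

    steps:  environment-step count of each evaluation checkpoint
    curves: one list of evaluation returns per seed, aligned with `steps`
    budget: only checkpoints with step <= budget are considered
    """
    best = None
    for i, step in enumerate(steps):
        if step > budget:
            continue  # outside the timestep budget
        values = [curve[i] for curve in curves]
        avg = mean(values)
        if best is None or avg > best[0]:
            best = (avg, pstdev(values), step)
    if best is None:
        raise ValueError("no checkpoint within the step budget")
    return best


# Synthetic data for three seeds (illustrative only, not benchmark results).
steps = [250_000, 500_000, 750_000, 1_000_000, 1_250_000]
curves = [
    [100, 300, 500, 450, 900],
    [120, 280, 460, 470, 950],
    [80, 320, 540, 430, 1000],
]
avg, spread, step = max_average_return(steps, curves)
summary = f"{avg:.1f}±{spread:.1f}"
print(summary, "at", step)
```

The last checkpoint has the highest average in the synthetic data but lies beyond the 1M budget, so it is ignored; this is why the budget matters when comparing against baselines reported at 1M timesteps.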
