@@ -5,9 +5,9 @@ Benchmark
Mujoco Benchmark
----------------
- Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
+ Tianshou's Mujoco benchmark contains state-of-the-art results.
- Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+ Every experiment is run with 10 random seeds for 1M to 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.
.. raw:: html
@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
<br >
</center >
+ The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. We use the maximum average return within 1M timesteps as the reward metric. ~ means the result is approximated from plots because quantitative results are not provided. / means results are not provided. The best-performing baseline on each task is highlighted in boldface. IPendulum and IDPendulum abbreviate InvertedPendulum and InvertedDoublePendulum. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.
+
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | Task                     |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+ +=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
+ | DDPG    |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | TD3     |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | SAC     |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | A2C     |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | PPO     |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ | TRPO    |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |ACKTR Paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |TD3 Paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+ +         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+ |         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+ +---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+
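
The reward metric used in the comparison table can be sketched as follows. This is only one plausible reading of "max average return in 1M timesteps": average the evaluation return across seeds at each evaluation point, then take the maximum over points within the step budget. The function name `max_average_return` and the sample data are hypothetical and are not taken from the Tianshou source.

```python
from statistics import mean


def max_average_return(timesteps, returns_per_seed, budget=1_000_000):
    """Return the highest seed-averaged return observed within `budget` steps.

    timesteps: environment-step count at each evaluation point.
    returns_per_seed: one list of evaluation returns per seed,
        aligned with `timesteps`.
    """
    # Keep only evaluation points that fall inside the step budget.
    kept = [i for i, t in enumerate(timesteps) if t <= budget]
    if not kept:
        raise ValueError("no evaluation points within the budget")
    # Average across seeds at each kept point, then take the maximum.
    mean_curve = [mean(seed[i] for seed in returns_per_seed) for i in kept]
    return max(mean_curve)


# Hypothetical evaluation data for two seeds (not real benchmark results).
timesteps = [250_000, 500_000, 1_000_000, 2_000_000]
returns_per_seed = [
    [100.0, 300.0, 250.0, 900.0],
    [200.0, 100.0, 350.0, 800.0],
]
score = max_average_return(timesteps, returns_per_seed)
print(score)
```

The last evaluation point lies beyond the 1M budget, so it is ignored even though it has the highest returns.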
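+ The table below lists the runtime averaged over 8 MuJoCo benchmark tasks. All results are obtained using a single Nvidia TITAN X GPU and
+ up to 48 CPU cores (at most one CPU core for each thread). The "1M timesteps" column is the wall-clock time needed for 1M timesteps; the remaining columns give each phase's share of that time.
+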
+ ========= ========= ============ ============== ============ ============== ==========
+ Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
+ ========= ========= ============ ============== ============ ============== ==========
+ DDPG      1         2.9h         12.0           80.2         2.4            5.4
+ TD3       1         3.3h         11.4           81.7         1.7            5.2
+ SAC       1         5.2h         10.9           83.8         1.8            3.5
+ REINFORCE 64        4min         84.9           1.8          12.5           0.8
+ A2C       16        7min         62.5           28.0         6.6            2.9
+ PPO       64        24min        11.4           85.3         3.2            0.2
+ NPG       16        7min         65.1           24.9         9.5            0.6
+ TRPO      16        7min         62.9           26.5         10.1           0.6
+ ========= ========= ============ ============== ============ ============== ==========
+
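
To read the runtime table in absolute terms, the percentage columns can be converted into hours per phase. The sketch below assumes the percentages are shares of the total wall-clock time for 1M timesteps; the helper `phase_hours` is hypothetical, and the numbers are the DDPG row of the table.

```python
def phase_hours(total_hours, shares_percent):
    """Split a total runtime into per-phase hours from percentage shares."""
    return {
        phase: total_hours * pct / 100.0
        for phase, pct in shares_percent.items()
    }


# DDPG row of the runtime table: 2.9h for 1M timesteps.
ddpg = phase_hours(
    2.9,
    {"collecting": 12.0, "updating": 80.2, "evaluating": 2.4, "others": 5.4},
)
for phase, hours in ddpg.items():
    print(f"{phase}: {hours:.2f}h")
```

Under that assumption, network updates account for roughly 2.3 of DDPG's 2.9 hours, which matches the pattern in the table: the off-policy algorithms running a single environment spend most of their time updating rather than collecting.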
Atari Benchmark
---------------