Initial commit.
danijar committed Jan 27, 2020
0 parents commit 02f0210
Showing 10 changed files with 1,973 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
__pycache__/
*.py[cod]
*.egg-info
./dist
./logdir
MUJOCO_LOG.TXT
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2020 Danijar Hafner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
63 changes: 63 additions & 0 deletions README.md
@@ -0,0 +1,63 @@
# Dream to Control

Fast and simple implementation of the Dreamer agent in TensorFlow 2.

<img width="100%" src="https://imgur.com/x4NUHXl.gif">

If you find this code useful, please cite it in your paper:

```
@article{hafner2019dreamer,
title={Dream to Control: Learning Behaviors by Latent Imagination},
author={Hafner, Danijar and Lillicrap, Timothy and Ba, Jimmy and Norouzi, Mohammad},
journal={arXiv preprint arXiv:1912.01603},
year={2019}
}
```

## Method

![Dreamer](https://imgur.com/JrXC4rh.png)

Dreamer learns a world model that predicts ahead in a compact feature space.
From imagined feature sequences, it learns a policy and state-value function.
The value gradients are backpropagated through the multi-step predictions to
efficiently learn a long-horizon policy.
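
To illustrate how rewards and values predicted along an imagined sequence can be combined into a single learning target, here is a minimal sketch of a λ-return, one common way to blend multi-step value estimates. It is not taken from this repository: the function name `lambda_return`, its arguments, and the default `discount` and `lam` values are assumptions made for illustration. It uses NumPy, so it computes no gradients. In an agent, the same recursion would be written with differentiable operations so that gradients can flow back through the predicted sequence.

```python
import numpy as np


def lambda_return(rewards, values, bootstrap, discount=0.99, lam=0.95):
    """Blend multi-step returns along one imagined trajectory (illustrative).

    rewards[t] and values[t] are the predicted reward and value at imagined
    step t; bootstrap is the value estimate after the final step. All names
    and defaults here are hypothetical, not the repository's API.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # Value of the state that follows each step; the last one is the bootstrap.
    next_values = np.append(values[1:], bootstrap)
    returns = np.zeros_like(rewards)
    carry = bootstrap
    # Work backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        carry = rewards[t] + discount * (
            (1 - lam) * next_values[t] + lam * carry)
        returns[t] = carry
    return returns


if __name__ == "__main__":
    print(lambda_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], bootstrap=0.0))
```

With `lam=0` each target reduces to a one-step estimate (reward plus discounted next value); with `lam=1` it becomes the discounted sum of all imagined rewards plus the bootstrap value.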

- [Project website][website]
- [Research paper][paper]
- [Official implementation][code] (TensorFlow 1)

[website]: https://danijar.com/dreamer
[paper]: https://arxiv.org/pdf/1912.01603.pdf
[code]: https://github.com/google-research/dreamer

## Instructions

Get dependencies:

```
pip3 install --user tensorflow-gpu==2.1.0
pip3 install --user tensorflow_probability
pip3 install --user git+https://github.com/deepmind/dm_control.git
pip3 install --user pandas
pip3 install --user matplotlib
```

Train the agent:

```
python3 dreamer.py --logdir ./logdir/dmc_walker_walk/dreamer/1 --task dmc_walker_walk
```
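
To launch several runs, the command above can be generated programmatically. The sketch below only builds command strings from the two flags shown in this README (`--logdir` and `--task`). It assumes that the trailing `1` in the log directory is a run index and that the directory layout is `<logdir>/<task>/dreamer/<run>`. The helper `train_command` is hypothetical and not part of the repository.

```python
import shlex


def train_command(task, run, logdir="./logdir"):
    """Build the training command for one task and run index (illustrative)."""
    args = [
        "python3", "dreamer.py",
        "--logdir", f"{logdir}/{task}/dreamer/{run}",
        "--task", task,
    ]
    # Quote each argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for run in (1, 2, 3):
        print(train_command("dmc_walker_walk", run))
```

Keeping each run in its own directory under the same `./logdir` root lets the plotting and TensorBoard commands below pick up all of them at once.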

Generate plots:

```
python3 plotting.py --indir ./logdir --outdir ./plots --xaxis step --yaxis test/return --bins 3e4
```

Graphs and GIFs:

```
tensorboard --logdir ./logdir
```
1 change: 1 addition & 0 deletions baselines/atari.json
@@ -0,0 +1 @@
{"atari_alien": {"dqn_sticky_2e8": 2078.4279476, "c51_sticky_2e8": 2693.6, "rainbow_sticky_2e8": 3231.34615385, "iqn_sticky_2e8": 3491.18012422}, "atari_crazy_climber": {"dqn_sticky_2e8": 97140.3225806, "c51_sticky_2e8": 138612.5, "rainbow_sticky_2e8": 142240.0, "iqn_sticky_2e8": 136757.142857}, "atari_air_raid": {"dqn_sticky_2e8": 7653.48101266, "c51_sticky_2e8": 8211.84210526, "rainbow_sticky_2e8": 9241.23376623, "iqn_sticky_2e8": 10841.3934426}, "atari_phoenix": {"dqn_sticky_2e8": 4974.69387755, "c51_sticky_2e8": 6417.7027027, "rainbow_sticky_2e8": 9991.78217822, "iqn_sticky_2e8": 4888.58064516}, "atari_tennis": {"dqn_sticky_2e8": -1.1, "c51_sticky_2e8": 21.3644859813, "rainbow_sticky_2e8": -0.9, "iqn_sticky_2e8": 22.3495934959}, "atari_assault": {"dqn_sticky_2e8": 1445.67916667, "c51_sticky_2e8": 1700.70149254, "rainbow_sticky_2e8": 3726.60747664, "iqn_sticky_2e8": 4817.38823529}, "atari_hero": {"dqn_sticky_2e8": 18404.7715736, "c51_sticky_2e8": 36199.2134831, "rainbow_sticky_2e8": 43307.9746835, "iqn_sticky_2e8": 34965.2232143}, "atari_name_this_game": {"dqn_sticky_2e8": 7656.2745098, "c51_sticky_2e8": 12820.2564103, "rainbow_sticky_2e8": 8926.66666667, "iqn_sticky_2e8": 6456.14035088}, "atari_pong": {"dqn_sticky_2e8": 18.976, "c51_sticky_2e8": 19.352, "rainbow_sticky_2e8": 19.8, "iqn_sticky_2e8": 19.8}, "atari_journey_escape": {"dqn_sticky_2e8": -3173.15634218, "c51_sticky_2e8": -2921.23893805, "rainbow_sticky_2e8": -995.575221239, "iqn_sticky_2e8": -1580.2359882}, "atari_skiing": {"dqn_sticky_2e8": -15546.2422907, "c51_sticky_2e8": -23051.4230769, "rainbow_sticky_2e8": -29766.5245902, "iqn_sticky_2e8": -11528.7634409}, "atari_ice_hockey": {"dqn_sticky_2e8": -5.44594594595, "c51_sticky_2e8": -5.14705882353, "rainbow_sticky_2e8": -0.763157894737, "iqn_sticky_2e8": -5.74285714286}, "atari_solaris": {"dqn_sticky_2e8": 1434.0, "c51_sticky_2e8": 2032.66666667, "rainbow_sticky_2e8": 2082.85714286, "iqn_sticky_2e8": 855.714285714}, "atari_double_dunk": 
{"dqn_sticky_2e8": -3.75, "c51_sticky_2e8": -12.0, "rainbow_sticky_2e8": 22.04, "iqn_sticky_2e8": 20.1782178218}, "atari_zaxxon": {"dqn_sticky_2e8": 3837.93103448, "c51_sticky_2e8": 6385.43046358, "rainbow_sticky_2e8": 15561.9047619, "iqn_sticky_2e8": 11234.9514563}, "atari_boxing": {"dqn_sticky_2e8": 80.0922509225, "c51_sticky_2e8": 84.0551948052, "rainbow_sticky_2e8": 97.9354207436, "iqn_sticky_2e8": 97.5274725275}, "atari_enduro": {"dqn_sticky_2e8": 577.696969697, "c51_sticky_2e8": 1227.88888889, "rainbow_sticky_2e8": 2252.9, "iqn_sticky_2e8": 2213.0}, "atari_freeway": {"dqn_sticky_2e8": 32.8861788618, "c51_sticky_2e8": 33.6097560976, "rainbow_sticky_2e8": 33.6666666667, "iqn_sticky_2e8": 33.6016260163}, "atari_pooyan": {"dqn_sticky_2e8": 3306.92307692, "c51_sticky_2e8": 3035.44444444, "rainbow_sticky_2e8": 4574.27835052, "iqn_sticky_2e8": 4300.0}, "atari_gravitar": {"dqn_sticky_2e8": 453.461538462, "c51_sticky_2e8": 763.55721393, "rainbow_sticky_2e8": 1266.75603217, "iqn_sticky_2e8": 1078.70619946}, "atari_space_invaders": {"dqn_sticky_2e8": 1795.94488189, "c51_sticky_2e8": 3832.03488372, "rainbow_sticky_2e8": 2490.0, "iqn_sticky_2e8": 4176.11764706}, "atari_seaquest": {"dqn_sticky_2e8": 3136.69064748, "c51_sticky_2e8": 10211.0526316, "rainbow_sticky_2e8": 5822.125, "iqn_sticky_2e8": 15299.5384615}, "atari_centipede": {"dqn_sticky_2e8": 2173.3, "c51_sticky_2e8": 7830.97577855, "rainbow_sticky_2e8": 6360.17207792, "iqn_sticky_2e8": 3613.81109185}, "atari_qbert": {"dqn_sticky_2e8": 9876.01010101, "c51_sticky_2e8": 10121.9512195, "rainbow_sticky_2e8": 15541.6666667, "iqn_sticky_2e8": 13305.3191489}, "atari_fishing_derby": {"dqn_sticky_2e8": 2.84285714286, "c51_sticky_2e8": 11.1578947368, "rainbow_sticky_2e8": 42.4462151394, "iqn_sticky_2e8": 42.9324894515}, "atari_yars_revenge": {"dqn_sticky_2e8": 24010.4618644, "c51_sticky_2e8": 10941.7933884, "rainbow_sticky_2e8": 48431.2134146, "iqn_sticky_2e8": 83314.4336283}, "atari_kung_fu_master": {"dqn_sticky_2e8": 
23842.0454545, "c51_sticky_2e8": 21218.6813187, "rainbow_sticky_2e8": 27548.1012658, "iqn_sticky_2e8": 32021.3333333}, "atari_breakout": {"dqn_sticky_2e8": 93.0327868852, "c51_sticky_2e8": 213.309210526, "rainbow_sticky_2e8": 107.497237569, "iqn_sticky_2e8": 76.9896907216}, "atari_demon_attack": {"dqn_sticky_2e8": 6125.05747126, "c51_sticky_2e8": 7160.43209877, "rainbow_sticky_2e8": 18187.9069767, "iqn_sticky_2e8": 13623.8461538}, "atari_video_pinball": {"dqn_sticky_2e8": 59460.1084337, "c51_sticky_2e8": 346053.818182, "rainbow_sticky_2e8": 462307.923077, "iqn_sticky_2e8": 639575.272727}, "atari_chopper_command": {"dqn_sticky_2e8": 2460.47904192, "c51_sticky_2e8": 6999.27007299, "rainbow_sticky_2e8": 12470.6730769, "iqn_sticky_2e8": 9482.66129032}, "atari_krull": {"dqn_sticky_2e8": 6184.05154639, "c51_sticky_2e8": 6670.77083333, "rainbow_sticky_2e8": 3868.23443223, "iqn_sticky_2e8": 8922.56756757}, "atari_ms_pacman": {"dqn_sticky_2e8": 3389.83516484, "c51_sticky_2e8": 3760.82417582, "rainbow_sticky_2e8": 4187.11656442, "iqn_sticky_2e8": 5422.88732394}, "atari_battle_zone": {"dqn_sticky_2e8": 18795.1807229, "c51_sticky_2e8": 25209.5238095, "rainbow_sticky_2e8": 37523.8095238, "iqn_sticky_2e8": 41405.7971014}, "atari_robotank": {"dqn_sticky_2e8": 60.1315789474, "c51_sticky_2e8": 62.1578947368, "rainbow_sticky_2e8": 64.2894736842, "iqn_sticky_2e8": 65.7179487179}, "atari_gopher": {"dqn_sticky_2e8": 5603.23809524, "c51_sticky_2e8": 7034.89795918, "rainbow_sticky_2e8": 11637.4647887, "iqn_sticky_2e8": 11493.3333333}, "atari_montezuma_revenge": {"dqn_sticky_2e8": 0.0, "c51_sticky_2e8": 0.0, "rainbow_sticky_2e8": 0.0, "iqn_sticky_2e8": 0.0}, "atari_elevator_action": {"dqn_sticky_2e8": 0.0, "c51_sticky_2e8": 73120.0, "rainbow_sticky_2e8": 72720.0, "iqn_sticky_2e8": 0.0}, "atari_bank_heist": {"dqn_sticky_2e8": 633.317073171, "c51_sticky_2e8": 734.395604396, "rainbow_sticky_2e8": 1104.63414634, "iqn_sticky_2e8": 1075.35714286}, "atari_bowling": {"dqn_sticky_2e8": 
27.2307692308, "c51_sticky_2e8": 30.6386554622, "rainbow_sticky_2e8": 41.5462184874, "iqn_sticky_2e8": 62.1355932203}, "atari_star_gunner": {"dqn_sticky_2e8": 42890.3846154, "c51_sticky_2e8": 27959.4405594, "rainbow_sticky_2e8": 60662.5, "iqn_sticky_2e8": 77425.3968254}, "atari_private_eye": {"dqn_sticky_2e8": 379.430107527, "c51_sticky_2e8": 3374.08602151, "rainbow_sticky_2e8": 5629.43010753, "iqn_sticky_2e8": 5501.02150538}, "atari_carnival": {"dqn_sticky_2e8": 4734.44021325, "c51_sticky_2e8": 4922.64680105, "rainbow_sticky_2e8": 4176.86774942, "iqn_sticky_2e8": 5957.27782975}, "atari_beam_rider": {"dqn_sticky_2e8": 6159.37142857, "c51_sticky_2e8": 5522.13157895, "rainbow_sticky_2e8": 6525.43661972, "iqn_sticky_2e8": 7359.33333333}, "atari_time_pilot": {"dqn_sticky_2e8": 3670.25862069, "c51_sticky_2e8": 8551.66666667, "rainbow_sticky_2e8": 11911.8881119, "iqn_sticky_2e8": 11303.9215686}, "atari_riverraid": {"dqn_sticky_2e8": 11659.4444444, "c51_sticky_2e8": 13930.4255319, "rainbow_sticky_2e8": 20872.2857143, "iqn_sticky_2e8": 16589.2647059}, "atari_tutankham": {"dqn_sticky_2e8": 97.9571428571, "c51_sticky_2e8": 242.971428571, "rainbow_sticky_2e8": 247.945945946, "iqn_sticky_2e8": 244.25}, "atari_venture": {"dqn_sticky_2e8": 54.4444444444, "c51_sticky_2e8": 1317.73049645, "rainbow_sticky_2e8": 1438.21656051, "iqn_sticky_2e8": 1311.51079137}, "atari_asteroids": {"dqn_sticky_2e8": 661.724137931, "c51_sticky_2e8": 949.259259259, "rainbow_sticky_2e8": 1424.78632479, "iqn_sticky_2e8": 1470.40441176}, "atari_road_runner": {"dqn_sticky_2e8": 38379.3859649, "c51_sticky_2e8": 46631.3084112, "rainbow_sticky_2e8": 53786.8421053, "iqn_sticky_2e8": 57207.7419355}, "atari_atlantis": {"dqn_sticky_2e8": 801080.0, "c51_sticky_2e8": 780540.0, "rainbow_sticky_2e8": 749030.0, "iqn_sticky_2e8": 991020.0}, "atari_asterix": {"dqn_sticky_2e8": 145.966709347, "c51_sticky_2e8": 14594.2567568, "rainbow_sticky_2e8": 19609.8360656, "iqn_sticky_2e8": 7256.79347826}, "atari_wizard_of_wor": 
{"dqn_sticky_2e8": 2248.01980198, "c51_sticky_2e8": 2858.82352941, "rainbow_sticky_2e8": 7239.85507246, "iqn_sticky_2e8": 4899.41520468}, "atari_berzerk": {"dqn_sticky_2e8": 465.540334855, "c51_sticky_2e8": 535.74433657, "rainbow_sticky_2e8": 860.931372549, "iqn_sticky_2e8": 672.121212121}, "atari_pitfall": {"dqn_sticky_2e8": -34.6203208556, "c51_sticky_2e8": -7.9, "rainbow_sticky_2e8": -6.2625, "iqn_sticky_2e8": -10.4166666667}, "atari_amidar": {"dqn_sticky_2e8": 1370.28712871, "c51_sticky_2e8": 1014.84251969, "rainbow_sticky_2e8": 2276.15942029, "iqn_sticky_2e8": 2001.025}, "atari_jamesbond": {"dqn_sticky_2e8": 572.027972028, "c51_sticky_2e8": 637.939698492, "rainbow_sticky_2e8": 759.473684211, "iqn_sticky_2e8": 2806.69291339}, "atari_frostbite": {"dqn_sticky_2e8": 267.225609756, "c51_sticky_2e8": 4830.87179487, "rainbow_sticky_2e8": 7702.56578947, "iqn_sticky_2e8": 8450.71005917}, "atari_kangaroo": {"dqn_sticky_2e8": 10314.9425287, "c51_sticky_2e8": 7468.85245902, "rainbow_sticky_2e8": 13013.6986301, "iqn_sticky_2e8": 13297.2972973}, "atari_up_n_down": {"dqn_sticky_2e8": 7505.23809524, "c51_sticky_2e8": 8749.02173913, "rainbow_sticky_2e8": 20907.75, "iqn_sticky_2e8": 55617.826087}}
1 change: 1 addition & 0 deletions baselines/dmc.json
@@ -0,0 +1 @@
{"dmc_acrobot_swingup": {"d4pg_100m": 91.7, "a3c_100m_proprio": 41.9}, "dmc_cartpole_balance": {"d4pg_100m": 992.8, "a3c_100m_proprio": 951.6}, "dmc_cartpole_swingup": {"d4pg_100m": 862.0, "planet_1e6": 821, "a3c_100m_proprio": 558.4}, "dmc_cartpole_balance_sparse": {"d4pg_100m": 1000.0, "a3c_100m_proprio": 857.4}, "dmc_cartpole_swingup_sparse": {"d4pg_100m": 482.0, "a3c_100m_proprio": 179.8}, "dmc_cheetah_run": {"slac_3e6": 880, "d4pg_100m": 523.8, "planet_1e6": 662, "a3c_100m_proprio": 213.9}, "dmc_cup_catch": {"slac_3e6": 970, "d4pg_100m": 980.5, "planet_1e6": 930, "a3c_100m_proprio": 104.7}, "dmc_finger_spin": {"slac_3e6": 950, "d4pg_100m": 985.7, "planet_1e6": 700, "a3c_100m_proprio": 129.4}, "dmc_finger_turn_easy": {"d4pg_100m": 971.4, "a3c_100m_proprio": 167.3}, "dmc_finger_turn_hard": {"d4pg_100m": 966.0, "a3c_100m_proprio": 88.7}, "dmc_hopper_hop": {"d4pg_100m": 242.0, "a3c_100m_proprio": 0.5}, "dmc_hopper_stand": {"d4pg_100m": 929.9, "a3c_100m_proprio": 27.9}, "dmc_reacher_easy": {"d4pg_100m": 967.4, "planet_1e6": 832, "a3c_100m_proprio": 95.6}, "dmc_reacher_hard": {"d4pg_100m": 957.1, "a3c_100m_proprio": 39.7}, "dmc_walker_stand": {"d4pg_100m": 985.2, "a3c_100m_proprio": 378.4}, "dmc_walker_walk": {"slac_3e6": 840, "d4pg_100m": 968.3, "planet_1e6": 951, "a3c_100m_proprio": 311.0}, "dmc_walker_run": {"d4pg_100m": 567.2, "a3c_100m_proprio": 191.8}, "dmc_pendulum_swingup": {"d4pg_100m": 680.9, "a3c_100m_proprio": 48.6}}
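
Both baseline files share the same shape: a JSON object that maps a task name to an object of method names and scores. As a minimal sketch of how such a file could be queried, the snippet below parses a two-task excerpt copied from `baselines/dmc.json` and reports the highest-scoring baseline per task. The helper `best_baseline` is hypothetical and not part of the repository. Note that the method names encode different training budgets (for example `1e6` versus `100m`), so the scores are not directly comparable across methods.

```python
import json

# Excerpt copied from baselines/dmc.json; the full file has more tasks.
DMC_EXCERPT = """
{"dmc_walker_walk": {"slac_3e6": 840, "d4pg_100m": 968.3,
                     "planet_1e6": 951, "a3c_100m_proprio": 311.0},
 "dmc_cheetah_run": {"slac_3e6": 880, "d4pg_100m": 523.8,
                     "planet_1e6": 662, "a3c_100m_proprio": 213.9}}
"""


def best_baseline(baselines, task):
    """Return (method, score) for the highest-scoring baseline on a task."""
    scores = baselines[task]
    method = max(scores, key=scores.get)
    return method, scores[method]


baselines = json.loads(DMC_EXCERPT)

if __name__ == "__main__":
    for task in baselines:
        print(task, best_baseline(baselines, task))
```

To query the real files, replace `json.loads(DMC_EXCERPT)` with `json.load` on an opened `baselines/dmc.json` or `baselines/atari.json`.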