Improve documentation and README (#40)

* Move acknowledgments out of README * Replace reacher gif * Improve organization
Farama-Foundation · Feb 13, 2023 · f2aa35f · f2aa35f
1 parent 1ca62e8
commit f2aa35f
Show file tree

Hide file tree

Showing 6 changed files with 55 additions and 46 deletions.
diff --git a/README.md b/README.md
@@ -15,30 +15,8 @@ The documentation website is at [mo-gymnasium.farama.org](https://mo-gymnasium.f
 
 ## Environments
 
-<!-- start environments -->
-MO-Gymnasium includes environments taken from the MORL literature, as well as multi-objective version of classical environments, such as Mujoco.
-
-| Env                                                                                                                                                                                                                                        | Obs/Action spaces                   | Objectives                                                    | Description                                                                                                                                                                                                                                                                                                                                         |
-|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|---------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [`deep-sea-treasure-v0`](https://mo-gymnasium.farama.org/environments/deep-sea-treasure/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/dst.png" width="200px">                          | Discrete / Discrete                 | `[treasure, time_penalty]`                                    | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf).                                                                                                                                                                   |
-| [`resource-gathering-v0`](https://mo-gymnasium.farama.org/environments/resource-gathering/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/resource-gathering.png" width="200px">         | Discrete / Discrete                 | `[enemy, gold, gem]`                                          | Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From [Barret & Narayanan 2008](https://dl.acm.org/doi/10.1145/1390156.1390162).                                                                                                                                                                                     |
-| [`fishwood-v0`](https://mo-gymnasium.farama.org/environments/fishwood/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/fishwood.png" width="200px">                                       | Discrete / Discrete                 | `[fish_amount, wood_amount]`                                  | ESR environment, the agent must collect fish and wood to light a fire and eat. From [Roijers et al. 2018](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return).                                                                                                        |
-| [`breakable-bottles-v0`](https://mo-gymnasium.farama.org/environments/breakable-bottles/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/breakable-bottles.jpg" width="200px">            | Discrete (Dictionary) / Discrete    | `[time_penalty, bottles_delivered, potential]`                | Gridworld with 5 cells. The agents must collect bottles from the source location and deliver to the destination. From [Vamplew et al. 2021](https://www.sciencedirect.com/science/article/pii/S0952197621000336).                                                                                                                                   |
-| [`four-room-v0`](https://mo-gymnasium.farama.org/environments/four-room/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/four-room.png" width="200px">                                    | Discrete / Discrete                 | `[item1, item2, item3]`                                       | Agent must collect three different types of items in the map and reach the goal. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html).                                                                                                                                                                                      |
-| [`mo-mountaincar-v0`](https://mo-gymnasium.farama.org/environments/mo-mountaincar/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/mo-mountaincar.png" width="200px">                     | Continuous / Discrete               | `[time_penalty, reverse_penalty, forward_penalty]`            | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From [Vamplew et al. 2011](https://www.researchgate.net/publication/220343783_Empirical_evaluation_methods_for_multiobjective_reinforcement_learning_algorithms).                                                                                           |
-| [`mo-mountaincarcontinuous-v0`](https://mo-gymnasium.farama.org/environments/mo-mountaincarcontinuous/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/mo-mountaincar.png" width="200px"> | Continuous / Continuous             | `[time_penalty, fuel_consumption_penalty]`                    | Continuous Mountain Car env, but with penalties for fuel consumption.                                                                                                                                                                                                                                                                               |
-| [`mo-lunar-lander-v2`](https://mo-gymnasium.farama.org/environments/mo-lunar-lander/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/lunarlander.png" width="200px">                      | Continuous / Discrete or Continuous | `[landed, shaped_reward, main_engine_fuel, side_engine_fuel]` | MO version of the `LunarLander-v2` [environment](https://gymnasium.farama.org/environments/box2d/lunar_lander/). Objectives defined similarly as in [Hung et al. 2022](https://openreview.net/forum?id=AwWaBXLIJE).                                                                                                                                 |
-| [`minecart-v0`](https://mo-gymnasium.farama.org/environments/minecart/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/minecart.png" width="200px">                                       | Continuous or Image / Discrete      | `[ore1, ore2, fuel]`                                          | Agent must collect two types of ores and minimize fuel consumption. From [Abels et al. 2019](https://arxiv.org/abs/1809.07803v2).                                                                                                                                                                                                                   |
-| [`mo-highway-v0`](https://mo-gymnasium.farama.org/environments/mo-highway/) and `mo-highway-fast-v0` <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/highway.png" width="200px">           | Continuous / Discrete               | `[speed, right_lane, collision]`                              | The agent's objective is to reach a high speed while avoiding collisions with neighbouring vehicles and staying on the rightest lane. From [highway-env](https://github.com/eleurent/highway-env).                                                                                                                                                  |
-| [`mo-reacher-v4`](https://mo-gymnasium.farama.org/environments/mo-reacher/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/reacher-mujoco.png" width="200px">                             | Continuous / Discrete               | `[target_1, target_2, target_3, target_4]`                    | Mujoco version of `mo-reacher-v0`, based on `Reacher-v4` [environment](https://gymnasium.farama.org/environments/mujoco/reacher/).                                                                                                                                                                                                                  |
-| [`mo-halfcheetah-v4`](https://mo-gymnasium.farama.org/environments/mo-halfcheetah/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/cheetah.png" width="200px">                            | Continuous / Continuous             | `[velocity, energy]`                                          | Multi-objective version of [HalfCheetah-v4](https://gymnasium.farama.org/environments/mujoco/half_cheetah/) env. Similar to [Xu et al. 2020](https://github.com/mit-gfx/PGMORL).                                                                                                                                                                    |
-| [`mo-hopper-v4`](https://mo-gymnasium.farama.org/environments/mo-hopper/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/hopper.png" width="200px">                                       | Continuous / Continuous             | `[velocity, height, energy]`                                  | Multi-objective version of [Hopper-v4](https://gymnasium.farama.org/environments/mujoco/hopper/) env.                                                                                                                                                                                                                                               |
-| [`water-reservoir-v0`](https://mo-gymnasium.farama.org/environments/water-reservoir/)                                                                                                                                                      | Continuous / Continuous             | `[cost_flooding, deficit_water]`                              | A Water reservoir environment. The agent executes a continuous action, corresponding to the amount of water released by the dam. From [Pianosi et al. 2013](https://iwaponline.com/jh/article/15/2/258/3425/Tree-based-fitted-Q-iteration-for-multi-objective).                                                                                     |
-| [`fruit-tree-v0`](https://mo-gymnasium.farama.org/environments/fruit-tree/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/fruit-tree.png" width="200px">                                 | Discrete / Discrete                 | `[nutri1, ..., nutri6]`                                       | Full binary tree of depth d=5,6 or 7. Every leaf contains a fruit with a value for the nutrients Protein, Carbs, Fats, Vitamins, Minerals and Water. From [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf).                                                                                                                                 |
-| [`mo-reacher-v0`]() <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/reacher.png" width="200px">                                                                                            | Continuous / Discrete               | `[target_1, target_2, target_3, target_4]`                    | [:warning: PyBullet support is limited.] Reacher robot from [PyBullet](https://github.com/benelot/pybullet-gym/blob/ec9e87459dd76d92fe3e59ee4417e5a665504f62/pybulletgym/envs/roboschool/robots/manipulators/reacher.py), but there are 4 different target positions. From [Alegre et al. 2022](https://proceedings.mlr.press/v162/alegre22a.html). |
-| [`mo-supermario-v0`](https://mo-gymnasium.farama.org/environments/mo-supermario/) <br><img src="https://raw.githubusercontent.com/Farama-Foundation/MO-Gymnasium/main/screenshots/mario.png" width="200px">                                | Image / Discrete                    | `[x_pos, time, death, coin, enemy]`                           | [:warning: SuperMarioBrosEnv support is limited.] Multi-objective version of [SuperMarioBrosEnv](https://github.com/Kautenja/gym-super-mario-bros). Objectives are defined similarly as in [Yang et al. 2019](https://arxiv.org/pdf/1908.08342.pdf).                                                                                                |
-
-<!-- end environments -->
+MO-Gymnasium includes environments taken from the MORL literature, as well as multi-objective version of classical environments, such as MuJoco.
+The full list of environments is available [here](https://mo-gymnasium.farama.org/environments/all-environments/).
 
 ## Installation
 <!-- start install -->
@@ -103,15 +81,3 @@ If you use this repository in your work, please cite:
 ```
 
 <!-- end citation -->
-
-## Acknowledgments
-
-<!-- start acknowledgments -->
-
-* The `minecart-v0` env is a refactor of https://github.com/axelabels/DynMORL.
-* The `deep-sea-treasure-v0`, `fruit-tree-v0` and `mo-supermario-v0` envs are based on https://github.com/RunzheYang/MORL.
-* The `four-room-v0` env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.
-* The `fishwood-v0` code was provided by Denis Steckelmacher and Conor F. Hayes.
-* The `water-reservoir-v0` code was provided by Mathieu Reymond.
-
-<!-- end acknowledgments -->
diff --git a/docs/_static/videos/mo-reacher.gif b/docs/_static/videos/mo-reacher.gif
diff --git a/docs/citing/citing.md b/docs/citing/citing.md
@@ -0,0 +1,17 @@
+---
+title: "Citing"
+---
+
+```{include} ../README.md
+:start-after: <!-- start citation -->
+:end-before: <!-- end citation -->
+```
+
+```{toctree}
+:hidden:
+:glob:
+:caption: List of Publications
+
+../examples/publications
+
+```
diff --git a/docs/community/community.md b/docs/community/community.md
@@ -6,7 +6,8 @@ If you want to help us out, reach us, or simply ask questions, you can join the
 
 Aside from the main contributors, some people have also contributed to the project in various ways. We would like to thank them all for their contributions.
 
-```{include} ../../README.md
-:start-after: <!-- start acknowledgments -->
-:end-before: <!-- end acknowledgments -->
-```
+* The `minecart-v0` env is a refactor of https://github.com/axelabels/DynMORL.
+* The `deep-sea-treasure-v0`, `fruit-tree-v0` and `mo-supermario-v0` envs are based on https://github.com/RunzheYang/MORL.
+* The `four-room-v0` env is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.
+* The `fishwood-v0` code was provided by Denis Steckelmacher and Conor F. Hayes.
+* The `water-reservoir-v0` code was provided by Mathieu Reymond.