Doc update (#15)
araffin committed May 11, 2020
1 parent 257a40e commit b02afd6
Showing 5 changed files with 38 additions and 12 deletions.
22 changes: 22 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,22 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# Build documentation with MkDocs
#mkdocs:
# configuration: mkdocs.yml

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -77,6 +77,7 @@ def __getattr__(cls, name):
    'sphinx.ext.ifconfig',
    'sphinx.ext.viewcode',
    # 'sphinx.ext.intersphinx',
    # 'sphinx.ext.doctest'
]

if enable_spell_check:
@@ -218,5 +219,4 @@ def setup(app):

# kornia's hack to get rtd builder to install latest pytorch
if on_rtd:
    os.system('pip install gym')
    os.system('pip install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html')
22 changes: 11 additions & 11 deletions docs/guide/developer.rst
@@ -4,7 +4,7 @@
Developer Guide
================

This guide is meant for those who wants to understand the internals and the design choices of Stable-Baselines3.
This guide is meant for those who want to understand the internals and the design choices of Stable-Baselines3.


At first, you should read the two issues where the design choices were discussed:
@@ -19,31 +19,31 @@ The library is not meant to be modular, although inheritance is used to reduce c
Algorithms Structure
====================

Each algorithm (on-policy and off-policy ones) follow a common structure.
Each algorithm (on-policy and off-policy ones) follows a common structure.
There is one folder per algorithm, and in that folder there is the algorithm and the policy definition (``policies.py``).

Each algorithm had two main methods:
Each algorithm has two main methods:

- ``.collect_rollouts()`` which defines how new samples are collected, usually inherited from the base class. Those samples are then stored in a ``RolloutBuffer`` (discarded after the gradient update) or ``ReplayBuffer``

- ``.train()`` which update the parameters using samples from the buffer
- ``.train()`` which updates the parameters using samples from the buffer
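
As a rough structural sketch of the split described above (hypothetical class and method bodies, not the actual SB3 code):

.. code-block:: python

    class SketchOffPolicyAlgorithm:
        """Structural sketch only: collect samples, then learn from the buffer."""

        def collect_rollouts(self, env, replay_buffer, n_steps):
            # step the (vectorized) environment and store transitions in the buffer
            ...

        def train(self, gradient_steps, batch_size):
            # sample mini-batches from the buffer and update the network parameters
            ...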


Where to start?
===============

The first thing you need to read and understand are the base classes in the ``common/`` folder:

- ``BaseRLModel`` in ``base_class.py`` which defines how an RL class should look like
it contains also all the "glue code" for saving/loading and the common operations (wrapping environments)
- ``BaseRLModel`` in ``base_class.py`` which defines how an RL class should look like.
It contains also all the "glue code" for saving/loading and the common operations (wrapping environments)

- ``BasePolicy`` in ``policies.py`` which defines how a policy class should look like
it contains also all the magic for the ``.predict()`` method, to handle as many cases as possible
- ``BasePolicy`` in ``policies.py`` which defines how a policy class should look like.
It contains also all the magic for the ``.predict()`` method, to handle as many cases as possible

- ``OffPolicyRLModel`` in ``base_class.py`` that contains the implementation of ``collect_rollouts()`` for the off-policy algorithms
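
The ``.predict()`` method mentioned above is the part most users touch first. A minimal sketch of a typical call, assuming SB3's public API (the algorithm and environment here are only examples):

.. code-block:: python

    import numpy as np

    from stable_baselines3 import A2C  # every algorithm exposes the same predict() API

    model = A2C("MlpPolicy", "CartPole-v1")
    # a single, unbatched observation: the policy takes care of adding the batch
    # dimension and converting to/from numpy arrays
    obs = np.zeros(4, dtype=np.float32)
    action, _state = model.predict(obs, deterministic=True)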


All the environments handled internally are assume to be ``VecEnv`` (``gym.Env`` are automatically wrapped).
All the environments handled internally are assumed to be ``VecEnv`` (``gym.Env`` are automatically wrapped).
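
A short sketch of that automatic wrapping, assuming ``DummyVecEnv`` is what the base class falls back to when it receives a plain ``gym.Env``:

.. code-block:: python

    import gym

    from stable_baselines3.common.vec_env import DummyVecEnv

    env = gym.make("CartPole-v1")
    # roughly what happens under the hood when a plain gym.Env is passed in
    vec_env = DummyVecEnv([lambda: env])
    obs = vec_env.reset()  # observations are now batched: shape (1, 4) for CartPole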


Pre-Processing
@@ -59,7 +59,7 @@ Policy Structure
================

When we refer to "policy" in Stable-Baselines3, this is usually an abuse of language compared to RL terminology.
In SB3, "Policy" refers to the class that handle all the networks useful for training,
In SB3, "policy" refers to the class that handles all the networks useful for training,
so not only the network used to predict actions (the "learned controller").

For instance, the ``TD3`` policy contains the actor, the critic and the target networks.
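
A quick way to see this from the user side (a sketch, assuming the attribute names used by the current ``TD3`` policy):

.. code-block:: python

    from stable_baselines3 import TD3

    model = TD3("MlpPolicy", "Pendulum-v0")
    policy = model.policy
    # the "policy" object bundles every network needed for training
    print(policy.actor)          # network used to predict actions
    print(policy.critic)         # Q-networks
    print(policy.critic_target)  # target Q-networks
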
@@ -69,7 +69,7 @@ Probability distributions

When needed, the policies handle the different probability distributions.
All distributions are located in ``common/distributions.py`` and follow the same interface.
Each distribution correspond to a type of action space (e.g. ``Categorical`` is the one used for discrete actions.
Each distribution corresponds to a type of action space (e.g. ``Categorical`` is the one used for discrete actions.
For continuous actions, we can use multiple distributions ("DiagGaussian", "SquashedGaussian" or "StateDependentDistribution")
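
As an illustration (assuming the ``make_proba_distribution`` helper in ``common/distributions.py`` keeps its current behaviour), the distribution returned depends only on the action space type:

.. code-block:: python

    from gym import spaces

    from stable_baselines3.common.distributions import make_proba_distribution

    print(make_proba_distribution(spaces.Discrete(4)))             # CategoricalDistribution
    print(make_proba_distribution(spaces.Box(-1, 1, shape=(2,))))  # DiagGaussianDistribution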

State-Dependent Exploration
1 change: 1 addition & 0 deletions docs/misc/changelog.rst
@@ -33,6 +33,7 @@ Others:
- Added Dockerfile
- Sync ``VecEnvs`` with Stable-Baselines
- Update requirement: ``gym>=0.17``
- Added ``.readthedocs.yml`` file

Documentation:
^^^^^^^^^^^^^^
3 changes: 3 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,3 @@
gym>=0.17
pandas
matplotlib
