
Commit

minor fixes (#372)
veds12 committed Oct 5, 2020
1 parent 608fc03 commit 52b0b4c
Showing 8 changed files with 8 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/usage/tutorials/Deep/DDPG.rst
@@ -65,7 +65,7 @@ DDPG makes use of target networks for the actor(policy) and the critic(value) ne
 .. math::
-    y_t = r(s_t, a_t) + \gamma Q_targ(s_{t+1}, \mu_targ(s_{t+1}) \vert \theta^{Q})
+    y_t = r(s_t, a_t) + \gamma Q_{targ}(s_{t+1}, \mu_{targ}(s_{t+1}) \vert \theta^{Q})
 Building on Deterministic Policy Gradients, the gradient of the policy can be determined using the action-value function as

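For context on the corrected target above, here is a minimal sketch of how the bootstrapped target :math:`y_t` could be computed for a batch of transitions. The tensor names and the ``target_actor``/``target_critic`` callables are illustrative assumptions, not GenRL's actual implementation.

.. code-block:: python

    import torch

    def ddpg_target(rewards, next_states, dones, target_actor, target_critic, gamma=0.99):
        """Sketch of y_t = r_t + gamma * Q_targ(s_{t+1}, mu_targ(s_{t+1})).

        target_actor and target_critic stand in for the slowly updated target
        copies of the policy and critic networks (hypothetical callables).
        """
        with torch.no_grad():
            next_actions = target_actor(next_states)                       # mu_targ(s_{t+1})
            next_q = target_critic(next_states, next_actions).squeeze(-1)  # Q_targ(s_{t+1}, a')
            # Zero out the bootstrap term at terminal transitions
            return rewards + gamma * (1.0 - dones) * next_q
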
2 changes: 1 addition & 1 deletion docs/source/usage/tutorials/Deep/NoisyNet_DQN.rst
@@ -29,7 +29,7 @@ A noisy parameter :math:`\theta` is defined as:

 .. math::
-    \theta \coloneqq \mu + \Sigma \odot \epsilon
+    \theta := \mu + \Sigma \odot \epsilon
 where :math:`\Sigma` and :math:`\mu` are vectors of trainable parameters and :math:`\epsilon` is a vector of zero-mean noise. Hence, the loss function is now defined with respect to :math:`\Sigma` and :math:`\mu`,
 and the optimization now takes place with respect to them. :math:`\epsilon` is sampled from factorised Gaussian noise.
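
To make the corrected definition concrete, below is a minimal sketch of a noisy linear layer that parameterises its weights as :math:`\mu + \Sigma \odot \epsilon` with factorised Gaussian noise. The class and attribute names are illustrative and do not mirror GenRL's internals.

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLinear(nn.Module):
        """Linear layer whose weights are mu + sigma * eps (illustrative sketch)."""

        def __init__(self, in_features, out_features, sigma_init=0.5):
            super().__init__()
            self.in_features, self.out_features = in_features, out_features
            # Trainable mean and scale for every weight and bias
            self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
            self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init / in_features ** 0.5))
            self.bias_mu = nn.Parameter(torch.zeros(out_features))
            self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init / in_features ** 0.5))

        @staticmethod
        def _scaled_noise(size):
            # f(x) = sign(x) * sqrt(|x|), as used for factorised noise
            eps = torch.randn(size)
            return eps.sign() * eps.abs().sqrt()

        def forward(self, x):
            # Factorised noise: one noise vector per input, one per output
            eps_in = self._scaled_noise(self.in_features)
            eps_out = self._scaled_noise(self.out_features)
            weight = self.weight_mu + self.weight_sigma * torch.outer(eps_out, eps_in)
            bias = self.bias_mu + self.bias_sigma * eps_out
            return F.linear(x, weight, bias)
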
2 changes: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ critic network act on this feature vector to select an action and estimate the v
 action value
 GenRL provides support to incorporate this decoder network in all of the actor critic agents through a ``shared_layers``
-parameter. ``shared_layers`` takes the sizes of the mlp layers o be used, and ``None`` if no decoder network is to be
+parameter. ``shared_layers`` takes the sizes of the mlp layers to be used, and ``None`` if no decoder network is to be
 used

 As an example - in A2C -
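
As a purely illustrative sketch of the ``shared_layers`` parameter described above, a hypothetical A2C invocation could look like the following; the import paths, trainer API, and layer sizes shown here are assumptions, not the tutorial's actual example.

.. code-block:: python

    from genrl.agents import A2C
    from genrl.environments import VectorEnv
    from genrl.trainers import OnPolicyTrainer

    env = VectorEnv("CartPole-v0")
    # shared_layers gives the sizes of the MLP decoder shared by the actor and
    # the critic; pass None to keep the two networks fully separate.
    agent = A2C("mlp", env, shared_layers=(32, 32))
    trainer = OnPolicyTrainer(agent, env, epochs=10)
    trainer.train()
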
1 change: 1 addition & 0 deletions genrl/agents/deep/a2c/a2c.py
@@ -29,6 +29,7 @@ class A2C(OnPolicyAgent):
     gamma (float): The discount factor for rewards
     layers (:obj:`tuple` of :obj:`int`): Layers in the Neural Network
         of the Q-value function
+    shared_layers(:obj:`tuple` of :obj:`int`): Sizes of shared layers in Actor Critic if using
     lr_policy (float): Learning rate for the policy/actor
     lr_value (float): Learning rate for the critic
     rollout_size (int): Capacity of the Rollout Buffer
1 change: 1 addition & 0 deletions genrl/agents/deep/ddpg/ddpg.py
@@ -23,6 +23,7 @@ class DDPG(OffPolicyAgentAC):
     gamma (float): The discount factor for rewards
     layers (:obj:`tuple` of :obj:`int`): Layers in the Neural Network
         of the Q-value function
+    shared_layers(:obj:`tuple` of :obj:`int`): Sizes of shared layers in Actor Critic if using
     lr_policy (float): Learning rate for the policy/actor
     lr_value (float): Learning rate for the critic
     replay_size (int): Capacity of the Replay Buffer
1 change: 1 addition & 0 deletions genrl/agents/deep/ppo1/ppo1.py
@@ -29,6 +29,7 @@ class PPO1(OnPolicyAgent):
     gamma (float): The discount factor for rewards
     layers (:obj:`tuple` of :obj:`int`): Layers in the Neural Network
         of the Q-value function
+    shared_layers(:obj:`tuple` of :obj:`int`): Sizes of shared layers in Actor Critic if using
     lr_policy (float): Learning rate for the policy/actor
     lr_value (float): Learning rate for the Q-value function
     rollout_size (int): Capacity of the Rollout Buffer
1 change: 1 addition & 0 deletions genrl/agents/deep/sac/sac.py
@@ -22,6 +22,7 @@ class SAC(OffPolicyAgentAC):
     gamma (float): The discount factor for rewards
     policy_layers (:obj:`tuple` of :obj:`int`): Neural network layer dimensions for the policy
     value_layers (:obj:`tuple` of :obj:`int`): Neural network layer dimensions for the critics
+    shared_layers(:obj:`tuple` of :obj:`int`): Sizes of shared layers in Actor Critic if using
     lr_policy (float): Learning rate for the policy/actor
     lr_value (float): Learning rate for the critic
     replay_size (int): Capacity of the Replay Buffer
1 change: 1 addition & 0 deletions genrl/agents/deep/td3/td3.py
@@ -22,6 +22,7 @@ class TD3(OffPolicyAgentAC):
     gamma (float): The discount factor for rewards
     policy_layers (:obj:`tuple` of :obj:`int`): Neural network layer dimensions for the policy
     value_layers (:obj:`tuple` of :obj:`int`): Neural network layer dimensions for the critics
+    shared_layers(:obj:`tuple` of :obj:`int`): Sizes of shared layers in Actor Critic if using
     lr_policy (float): Learning rate for the policy/actor
     lr_value (float): Learning rate for the critic
     replay_size (int): Capacity of the Replay Buffer
