skrl.agents.dqn.rst


Deep Q-Network (DQN)

DQN is a model-free, off-policy algorithm that learns control policies directly from high-dimensional sensory inputs, using a deep neural network as a function approximator to represent the Q-value function.

Paper: Playing Atari with Deep Reinforcement Learning

Algorithm implementation

Decision making (act(...))

$\epsilon \leftarrow \epsilon_{_{final}} + (\epsilon_{_{initial}} - \epsilon_{_{final}}) \; e^{-\frac{\text{timestep}}{\epsilon_{_{timesteps}}}}$

$a \leftarrow \begin{cases} a \in_R A & x < \epsilon \\ \underset{a}{\arg\max} \; Q_\phi(s) & x \geq \epsilon \end{cases} \qquad \text{for } x \leftarrow U(0, 1)$
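The epsilon-greedy decision rule above can be sketched in plain Python (a minimal sketch; the parameter names `epsilon_initial`, `epsilon_final` and `epsilon_timesteps` are illustrative, not skrl's exact configuration keys):

```python
import math
import random

def epsilon_greedy_action(q_values, timestep, epsilon_initial=1.0,
                          epsilon_final=0.04, epsilon_timesteps=25000):
    """Select a discrete action with exponentially decaying epsilon-greedy exploration.

    q_values: sequence of Q-values, one per discrete action.
    """
    # decay epsilon exponentially from epsilon_initial toward epsilon_final
    epsilon = epsilon_final + (epsilon_initial - epsilon_final) \
        * math.exp(-timestep / epsilon_timesteps)
    # with probability epsilon act randomly (a ∈_R A), otherwise act greedily
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

At timestep 0 the agent explores with probability `epsilon_initial`; as the timestep grows, epsilon approaches `epsilon_final` and the agent becomes mostly greedy.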

Learning algorithm (_update(...))

# sample a batch from memory
$[s, a, r, s', d] \leftarrow$ states, actions, rewards, next_states, dones of size batch_size
# gradient steps
FOR each gradient step up to gradient_steps DO
    # compute target values
    $Q' \leftarrow Q_{\phi_{target}}(s')$
    $Q_{_{target}} \leftarrow \underset{a}{\max} \; Q' \qquad$ # the only difference with DDQN
    $y \leftarrow r \; +$ discount_factor $\; \neg d \; Q_{_{target}}$
    # compute Q-network loss
    $Q \leftarrow Q_\phi(s)[a]$
    ${Loss}_{Q_\phi} \leftarrow \frac{1}{N} \sum_{i=1}^N (Q - y)^2$
    # optimize Q-network
    $\nabla_{\phi} {Loss}_{Q_\phi}$
    # update target network
    IF it's time to update target network THEN
        $\phi_{target} \leftarrow$ polyak $\phi \; + \;$ (1 − polyak) $\phi_{target}$
    # update learning rate
    IF there is a learning_rate_scheduler THEN
        step $scheduler_\phi$ ($optimizer_\phi$)
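The target and loss computations in the loop above can be sketched with NumPy (a minimal sketch of the math only; function and array names are illustrative, and the gradient/optimizer steps are left to the deep learning framework):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, discount_factor=0.99):
    """Compute y = r + discount_factor * ¬d * max_a Q'(s', a).

    next_q_values: array of shape (batch_size, num_actions) from the target network.
    """
    # Q_target <- max_a Q'  (vanilla DQN: max over the target network's own values)
    q_target = next_q_values.max(axis=1)
    # for terminal transitions (d = 1) the ¬d factor zeroes the bootstrap term
    return rewards + discount_factor * (1.0 - dones) * q_target

def dqn_loss(q_taken, targets):
    """Mean squared error between Q_phi(s)[a] and the targets y."""
    return float(np.mean((q_taken - targets) ** 2))
```

In the actual agent, `dqn_loss` would be computed on framework tensors so that its gradient with respect to the Q-network parameters can be backpropagated.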

Configuration and hyperparameters

The default configuration, with all hyperparameters and their default values, is defined in:

../../../skrl/agents/torch/dqn/dqn.py
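A typical set of overrides might look like the following sketch. The key names mirror skrl's `DQN_DEFAULT_CONFIG`, but the values shown are purely illustrative; consult the file above for the authoritative list and defaults:

```python
# illustrative hyperparameter overrides for the DQN agent (values are examples)
cfg_overrides = {
    "learning_rate": 1e-3,     # Q-network optimizer learning rate
    "batch_size": 64,          # transitions sampled from memory per gradient step
    "discount_factor": 0.99,   # the discount_factor in the update above
    "polyak": 0.005,           # soft-update coefficient for the target network
    "exploration": {           # epsilon-greedy schedule parameters
        "initial_epsilon": 1.0,
        "final_epsilon": 0.04,
        "timesteps": 25000,
    },
}
```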

Spaces and models

The implementation supports the following Gym spaces / Gymnasium spaces:

| Gym/Gymnasium spaces | Observation | Action |
|----------------------|-------------|--------|
| Discrete             | ✓           | ✓      |
| Box                  | ✓           | ✗      |
| Dict                 | ✓           | ✗      |

The implementation uses two deterministic function approximators. These function approximators (models) must be collected in a dictionary and passed to the constructor of the class under the argument models:

| Notation | Concept | Key | Input shape | Output shape | Type |
|----------|---------|-----|-------------|--------------|------|
| $Q_\phi(s, a)$ | Q-network | "q_network" | observation | action | Deterministic |
| $Q_{\phi_{target}}(s, a)$ | Target Q-network | "target_q_network" | observation | action | Deterministic |
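Assembling the dictionary looks like the following sketch. `QNetwork` here is a hypothetical stand-in for a real skrl Deterministic model class; only the two dictionary keys come from the table above:

```python
# QNetwork is a placeholder: a real model would map observations to
# per-action Q-values (a skrl Deterministic model)
class QNetwork:
    pass

# the two models required by the DQN constructor, under these exact keys
models = {
    "q_network": QNetwork(),         # Q_phi, trained by gradient descent
    "target_q_network": QNetwork(),  # Q_phi_target, updated by polyak averaging
}
```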

Support for advanced features is described in the next table

| Feature | Support and remarks |
|---------|---------------------|
| Shared model | - |
| RNN support | - |

API

skrl.agents.torch.dqn.dqn.DQN

__init__