DQN is a model-free, off-policy algorithm that learns control policies directly from high-dimensional sensory inputs, using a deep neural network as the function approximator for the Q-value function
Paper: Playing Atari with Deep Reinforcement Learning
Decision making (act(...))
$\epsilon \leftarrow \epsilon_{_{final}} + (\epsilon_{_{initial}} - \epsilon_{_{final}}) \; e^{-\frac{\text{timestep}}{\epsilon_{_{timesteps}}}}$
$a \leftarrow \begin{cases} a \in_R A & x < \epsilon \\ \underset{a}{\arg\max} \; Q_\phi(s) & x \geq \epsilon \end{cases} \qquad \text{for } x \leftarrow U(0, 1)$
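The decay schedule and ε-greedy rule above can be sketched in plain Python (the function names and default values here are illustrative, not part of skrl's API):

```python
import math
import random

def epsilon(timestep, eps_initial=1.0, eps_final=0.04, eps_timesteps=1000):
    """Exponentially decay epsilon from eps_initial toward eps_final."""
    return eps_final + (eps_initial - eps_final) * math.exp(-timestep / eps_timesteps)

def act(q_values, eps):
    """Epsilon-greedy action selection over a list of Q-values."""
    if random.random() < eps:
        return random.randrange(len(q_values))  # random action: a chosen uniformly from A
    return max(range(len(q_values)), key=lambda a: q_values[a])  # argmax_a Q(s)
```

At timestep 0 the schedule returns ε_initial; as the timestep grows it approaches ε_final, so exploration shrinks smoothly during training.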
Learning algorithm (_update(...))
# sample a batch from memory
[s, a, r, s′, d] ← states, actions, rewards, next_states, dones of size batch_size
# gradient steps
FOR each gradient step up to gradient_steps DO
    # compute target values
    Q′ ← Qϕ_target(s′)
    $Q_{_{target}} \leftarrow \underset{a}{\max} \; Q'$  # the only difference with DDQN
    y ← r + discount_factor ¬d Q_target
    # compute Q-network loss
    Q ← Qϕ(s)[a]
    ${Loss}_{Q_\phi} \leftarrow \frac{1}{N} \sum_{i=1}^N (Q_i - y_i)^2$
    # optimize Q-network
    ∇ϕ Loss_Qϕ
    # update target network
    IF it's time to update target network THEN
        ϕ_target ← polyak ϕ + (1 − polyak) ϕ_target
    # update learning rate
    IF there is a learning_rate_scheduler THEN
        step scheduler_ϕ(optimizer_ϕ)
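The target, loss, and soft-update computations above can be sketched with NumPy (a minimal illustration; the function names and array shapes are assumptions, not skrl's internals, and the real implementation operates on PyTorch tensors):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, discount_factor=0.99):
    """y = r + gamma * (1 - d) * max_a Q'(s', a), per batch element."""
    q_target = next_q_values.max(axis=1)  # max over the action dimension
    return rewards + discount_factor * (1.0 - dones) * q_target

def q_loss(q_values, actions, targets):
    """Mean squared error between Q(s)[a] and the targets y."""
    q = q_values[np.arange(len(actions)), actions]  # gather Q(s, a) for taken actions
    return np.mean((q - targets) ** 2)

def polyak_update(params, target_params, polyak=0.005):
    """Soft (polyak) update of the target-network parameters."""
    return polyak * params + (1.0 - polyak) * target_params
```

Note how the `(1 - d)` factor (the ¬d term in the pseudocode) zeroes the bootstrap term for terminal transitions, so the target reduces to the immediate reward.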
Implementation file: ../../../skrl/agents/torch/dqn/dqn.py
The implementation supports the following Gym / Gymnasium spaces

Gym/Gymnasium spaces | Observation | Action |
---|---|---|
Discrete | ▫ | ◼ |
Box | ◼ | ▫ |
Dict | ◼ | ▫ |
The implementation uses two deterministic function approximators. These function approximators (models) must be collected in a dictionary and passed to the constructor of the class under the argument models
Notation | Concept | Key | Input shape | Output shape | Type |
---|---|---|---|---|---|
Qϕ(s, a) | Q-network | "q_network" | observation | action | Deterministic |
Qϕtarget(s, a) | Target Q-network | "target_q_network" | observation | action | Deterministic |
Support for advanced features is described in the next table
Feature | Support and remarks |
---|---|
Shared model | - |
RNN support | - |
API reference: skrl.agents.torch.dqn.dqn.DQN (__init__)