Implemented preprocessors
Running standard scaler
Preprocessor usage is defined in each agent's configuration dictionary.
The preprocessor class is set under the "<variable>_preprocessor"
key, and its arguments are set under the "<variable>_preprocessor_kwargs"
key as a dictionary of keyword arguments. The following example shows how to set the preprocessors for an agent:
Running standard scaler
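# import the preprocessor class
from skrl.resources.preprocessors.torch import RunningStandardScaler

# DEFAULT_CONFIG stands for the agent's default configuration dictionary;
# env and device are assumed to be defined earlier in the script
cfg = DEFAULT_CONFIG.copy()
# standardize the states using running statistics
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
# standardize the (scalar) values using running statistics
cfg["value_preprocessor"] = RunningStandardScaler
cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}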
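The key convention described above can be illustrated with a small, library-independent sketch. The names `DummyScaler` and `build_preprocessor` are hypothetical and are not part of skrl; the sketch only shows how a class stored under "<variable>_preprocessor" might be instantiated with the dictionary stored under "<variable>_preprocessor_kwargs".

```python
class DummyScaler:
    """Hypothetical stand-in for a preprocessor class (not part of skrl)."""

    def __init__(self, size, device=None):
        self.size = size
        self.device = device


def build_preprocessor(cfg, variable):
    """Instantiate the preprocessor configured for `variable`, or return None."""
    cls = cfg.get(f"{variable}_preprocessor")
    if cls is None:
        return None  # no preprocessor configured for this variable
    kwargs = cfg.get(f"{variable}_preprocessor_kwargs", {})
    return cls(**kwargs)


# assumed, illustrative configuration following the key convention
cfg = {
    "state_preprocessor": DummyScaler,
    "state_preprocessor_kwargs": {"size": 4, "device": "cpu"},
    "value_preprocessor": None,
}

state_pre = build_preprocessor(cfg, "state")  # DummyScaler instance
value_pre = build_preprocessor(cfg, "value")  # None: nothing configured
```

Storing the class and its keyword arguments separately lets the configuration be copied and edited before any preprocessor object is created.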
Main notation/symbols:
- mean ($\bar{x}$), standard deviation ($\sigma$), variance ($\sigma^2$)
- running mean ($\bar{x}_t$), running variance ($\sigma^2_t$)
Standardization by centering and scaling
$\text{clip}((x - \bar{x}_t) / (\sqrt{\sigma^2_t} \;+ \text{epsilon}), -c, c) \qquad$ with c as clip_threshold
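As a minimal scalar sketch of the standardization formula above (not the library's implementation), the function below centers by the running mean, scales by the running standard deviation plus epsilon, and clips the result. The function name and the default values for `epsilon` and `clip_threshold` are assumptions chosen for illustration.

```python
import math


def standardize(x, running_mean, running_var, epsilon=1e-8, clip_threshold=5.0):
    """Center and scale x, then clip the result to [-clip_threshold, clip_threshold]."""
    scaled = (x - running_mean) / (math.sqrt(running_var) + epsilon)
    return max(-clip_threshold, min(clip_threshold, scaled))


# a value one standard deviation above the running mean
z_typical = standardize(12.0, running_mean=10.0, running_var=4.0)

# an extreme value is limited by the clip threshold
z_outlier = standardize(100.0, running_mean=10.0, running_var=4.0)
```

Here `z_typical` is approximately 1, while `z_outlier` is clipped to the threshold.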
Scale back the data to the original representation (inverse transform)
$\sqrt{\sigma^2_t} \; \text{clip}(x, -c, c) + \bar{x}_t \qquad$ with c as clip_threshold
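The inverse transform can be sketched in the same scalar setting. This is an illustrative sketch, not the library's code: the function names and default values are assumptions. Because the forward transform divides by the standard deviation plus epsilon while the inverse multiplies by the standard deviation alone, the round trip is only approximate, and values that were clipped cannot be recovered.

```python
import math


def standardize(x, mean, var, epsilon=1e-8, c=5.0):
    """Forward transform: center, scale and clip."""
    z = (x - mean) / (math.sqrt(var) + epsilon)
    return max(-c, min(c, z))


def inverse(z, mean, var, c=5.0):
    """Inverse transform: clip, then scale back to the original representation."""
    return math.sqrt(var) * max(-c, min(c, z)) + mean


mean, var = 10.0, 4.0

# round trip for a value inside the clipping range
x_back = inverse(standardize(13.0, mean, var), mean, var)

# a clipped value maps back to the edge of the range, not to the original value
x_lost = inverse(standardize(100.0, mean, var), mean, var)
```

In this example `x_back` is approximately 13, while `x_lost` is 20 (the running mean plus five standard deviations) rather than 100.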
Update the running mean and variance (See parallel algorithm)
$\delta \leftarrow \bar{x} - \bar{x}_t$
$n_T \leftarrow n_t + n$
$M2 \leftarrow (\sigma^2_t n_t) + (\sigma^2 n) + \delta^2 \dfrac{n_t n}{n_T}$
# update internal variables
$\bar{x}_t \leftarrow \bar{x}_t + \delta \dfrac{n}{n_T}$
$\sigma^2_t \leftarrow \dfrac{M2}{n_T}$
$n_t \leftarrow n_T$
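The update steps above can be sketched with the Python standard library alone. The function `parallel_update` is a hypothetical helper, and the sketch assumes population (biased) variances for both the running and the batch statistics, in which case merging batches reproduces the statistics of the concatenated data exactly.

```python
import statistics


def parallel_update(run_mean, run_var, run_count, batch):
    """Merge the statistics of `batch` into the running mean, variance and count."""
    n = len(batch)
    batch_mean = statistics.fmean(batch)
    batch_var = statistics.pvariance(batch)  # population variance (assumption)

    delta = batch_mean - run_mean
    total = run_count + n
    m2 = run_var * run_count + batch_var * n + delta ** 2 * run_count * n / total

    # updated running mean, running variance and sample count
    return run_mean + delta * n / total, m2 / total, total


first = [1.0, 2.0, 3.0, 4.0]
second = [10.0, 12.0, 14.0]

# initialize the running statistics from the first batch, then merge the second
mean, var, count = statistics.fmean(first), statistics.pvariance(first), len(first)
mean, var, count = parallel_update(mean, var, count, second)
```

After the merge, `mean` and `var` agree (up to floating-point error) with the mean and population variance computed over `first + second` in a single pass.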
skrl.resources.preprocessors.torch.running_standard_scaler.RunningStandardScaler
__init__