Values, including both state- and action-values;
Values for non-linear generalizations of the Bellman equations;
Return distributions, also known as distributional value functions;
General Value Functions, for cumulants other than the main reward;
Policies, via policy gradients in both continuous and discrete action spaces.

Value Learning

rlax

categorical_double_q_learning categorical_l2_project categorical_q_learning categorical_td_learning discounted_returns double_q_learning expected_sarsa general_off_policy_returns_from_action_values general_off_policy_returns_from_q_and_v lambda_returns leaky_vtrace leaky_vtrace_td_error_and_advantage n_step_bootstrapped_returns persistent_q_learning q_lambda q_learning quantile_expected_sarsa quantile_q_learning quantile_regression_loss qv_learning qv_max retrace retrace_continuous sarsa sarsa_lambda td_lambda td_learning transformed_general_off_policy_returns_from_action_values transformed_lambda_returns transformed_n_step_q_learning transformed_n_step_returns transformed_q_lambda transformed_retrace truncated_generalized_advantage_estimation vtrace vtrace_td_error_and_advantage

Categorical Double Q Learning

categorical_double_q_learning

Categorical L2 Project

categorical_l2_project

Categorical Q Learning

categorical_q_learning

Categorical TD Learning

categorical_td_learning

Discounted Returns

discounted_returns
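As a rough illustration of the quantity this entry refers to, the sketch below computes discounted returns with the standard backward recursion G_t = r_t + discount_t * G_{t+1}. It is a plain-Python sketch, not rlax's implementation; the function name `discounted_returns_sketch` and the `bootstrap_value` parameter are hypothetical and chosen only for this example.

```python
def discounted_returns_sketch(rewards, discounts, bootstrap_value=0.0):
    """Backward recursion G_t = r_t + discount_t * G_{t+1} (illustrative only)."""
    returns = [0.0] * len(rewards)
    g = bootstrap_value  # value assumed for the state after the last step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + discounts[t] * g
        returns[t] = g
    return returns


example_returns = discounted_returns_sketch([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Passing a non-zero `bootstrap_value` corresponds to truncating the trajectory and bootstrapping from a value estimate at the final step.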

Double Q Learning

double_q_learning

Expected SARSA

expected_sarsa

General Off Policy Returns From Action Values

general_off_policy_returns_from_action_values

General Off Policy Returns From Q and V

general_off_policy_returns_from_q_and_v

Lambda Returns

lambda_returns

Leaky VTrace

leaky_vtrace

N Step Bootstrapped Returns

n_step_bootstrapped_returns

Leaky VTrace TD Error and Advantage

leaky_vtrace_td_error_and_advantage

Persistent Q Learning

persistent_q_learning

Q-Lambda

q_lambda

Q Learning

q_learning
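For orientation, the textbook one-step Q-learning temporal-difference error is r_t + discount_t * max_a Q(s_t, a) - Q(s_{t-1}, a_{t-1}). The sketch below computes it for a single transition in plain Python. It is not rlax's code; the function name and argument names are assumptions made for this example.

```python
def q_learning_td_error_sketch(q_tm1, a_tm1, r_t, discount_t, q_t):
    """One-step Q-learning TD error for a single transition (illustrative only).

    q_tm1: action values at the previous state (list of floats)
    a_tm1: index of the action taken at the previous state
    r_t: reward received after taking a_tm1
    discount_t: discount applied to the bootstrapped value
    q_t: action values at the current state (list of floats)
    """
    target = r_t + discount_t * max(q_t)  # greedy bootstrap target
    return target - q_tm1[a_tm1]


td_error = q_learning_td_error_sketch([1.0, 2.0], 1, 1.0, 0.5, [4.0, 2.0])
```

Setting `discount_t` to zero on terminal transitions removes the bootstrap term, so the error reduces to `r_t - q_tm1[a_tm1]`.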

Quantile Expected SARSA

quantile_expected_sarsa

Quantile Q Learning

quantile_q_learning

QV Learning

qv_learning

QV Max

qv_max

Retrace

retrace

Retrace Continuous

retrace_continuous

SARSA

sarsa

SARSA Lambda

sarsa_lambda

TD Lambda

td_lambda

TD Learning

td_learning

Transformed General Off Policy Returns from Action Values

transformed_general_off_policy_returns_from_action_values

Transformed Lambda Returns

transformed_lambda_returns

Transformed N Step Q Learning

transformed_n_step_q_learning

Transformed N Step Returns

transformed_n_step_returns

Transformed Q Lambda

transformed_q_lambda

Transformed Retrace

transformed_retrace

Truncated Generalized Advantage Estimation

truncated_generalized_advantage_estimation

VTrace

vtrace

Policy Optimization

rlax

clipped_surrogate_pg_loss compute_parametric_kl_penalty_and_dual_loss constant_policy_targets dpg_loss entropy_loss mpo_loss mpo_compute_weights_and_temperature_loss policy_gradient_loss qpg_loss rm_loss rpg_loss sampled_policy_distillation_loss vmpo_compute_weights_and_temperature_loss vmpo_loss zero_policy_targets

Clipped Surrogate PG Loss

clipped_surrogate_pg_loss

Compute Parametric KL Penalty and Dual Loss

compute_parametric_kl_penalty_and_dual_loss

DPG Loss

dpg_loss

Entropy Loss

entropy_loss

MPO Compute Weights and Temperature Loss

mpo_compute_weights_and_temperature_loss

MPO Loss

mpo_loss

Policy Gradient Loss

policy_gradient_loss

QPG Loss

qpg_loss

RM Loss

rm_loss

RPG Loss

rpg_loss

Constant Policy Targets

constant_policy_targets

Zero Policy Targets

zero_policy_targets

Sampled Policy Distillation Loss

sampled_policy_distillation_loss

VMPO Compute Weights and Temperature Loss

vmpo_compute_weights_and_temperature_loss

VMPO Loss

vmpo_loss

Exploration

rlax

add_dirichlet_noise add_gaussian_noise add_ornstein_uhlenbeck_noise episodic_memory_intrinsic_rewards knn_query

Add Dirichlet Noise

add_dirichlet_noise

Add Gaussian Noise

add_gaussian_noise

Add Ornstein Uhlenbeck Noise

add_ornstein_uhlenbeck_noise

Episodic Memory Intrinsic Rewards

episodic_memory_intrinsic_rewards

KNN Query

knn_query

Utilities

rlax

AllSum batched_index clip_gradient create_ema fix_step_type_on_interruptions lhs_broadcast one_hot embed_oar replace_masked transpose_first_axis_to_last transpose_last_axis_to_first tree_fn tree_map_zipped tree_replace_masked tree_select tree_split_key tree_split_leaves conditional_update periodic_update

All Sum

AllSum

Batched Index

batched_index

Clip Gradient

clip_gradient

Create EMA

create_ema

LHS Broadcast

lhs_broadcast

One Hot

one_hot

Embed OAR

embed_oar

Fix Step Type on Interruptions

fix_step_type_on_interruptions

Transpose First Axis to Last

transpose_first_axis_to_last

Transpose Last Axis to First

transpose_last_axis_to_first

Replace Masked

replace_masked

Tree Map Zipped

tree_map_zipped

Tree Replace Masked

tree_replace_masked

Tree Select

tree_select

Tree Split Key

tree_split_key

Tree Split Leaves

tree_split_leaves

Conditional Update

conditional_update

Periodic Update

periodic_update

General Value Functions

rlax

pixel_control_rewards feature_control_rewards

Pixel Control Rewards

pixel_control_rewards

Feature Control Rewards

feature_control_rewards

Model Learning

rlax

extract_subsequences sample_start_indices

Extract model training data

extract_subsequences

sample_start_indices

Pop Art

rlax

art normalize pop popart unnormalize unnormalize_linear

Art

art

Normalize

normalize

Pop

pop

PopArt

popart

Unnormalize

unnormalize

Unnormalize Linear

unnormalize_linear

Transforms

rlax

compose_tx DISCOUNT_TRANSFORM_PAIR HYPERBOLIC_SIN_PAIR identity IDENTITY_PAIR logit muzero_pair power sigmoid signed_expm1 signed_hyperbolic SIGNED_HYPERBOLIC_PAIR signed_logp1 SIGNED_LOGP1_PAIR signed_parabolic transform_from_2hot transform_to_2hot twohot_pair TxPair unbiased_transform_pair

Identity

identity

Logit

logit

Power

power

Sigmoid

sigmoid

Signed Exponential

signed_expm1

Signed Hyperbolic

signed_hyperbolic
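A minimal sketch of a signed hyperbolic value transform, assuming the commonly used form h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x. The function name and the default `eps` below are assumptions for this example, and the code is not taken from rlax.

```python
import math


def signed_hyperbolic_sketch(x, eps=1e-3):
    """Compress large magnitudes while preserving sign (illustrative only)."""
    sign = math.copysign(1.0, x)
    return sign * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


squashed = signed_hyperbolic_sketch(3.0, eps=0.0)
```

The small linear term `eps * x` keeps the transform strictly increasing with a slope bounded away from zero, which is what makes it invertible in practice.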

Signed Logarithm

signed_logp1

Signed Parabolic

signed_parabolic

Transform from 2 Hot

transform_from_2hot

Transform to 2 Hot

transform_to_2hot

Losses

rlax

l2_loss likelihood log_loss huber_loss pixel_control_loss

L2 Loss

l2_loss

Likelihood

likelihood

Log Loss

log_loss

Huber Loss

huber_loss
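For reference, the standard Huber loss is quadratic for small errors and linear for large ones. The sketch below uses the usual definition with a threshold `delta`; the function and parameter names are hypothetical, and this is not rlax's implementation.

```python
def huber_loss_sketch(x, delta=1.0):
    """Standard Huber loss of a scalar error x (illustrative only)."""
    abs_x = abs(x)
    if abs_x <= delta:
        return 0.5 * x * x  # quadratic region
    return 0.5 * delta * delta + delta * (abs_x - delta)  # linear region


small_error_loss = huber_loss_sketch(0.5)
large_error_loss = huber_loss_sketch(3.0)
```

The two branches agree at `|x| == delta`, so the loss and its first derivative are continuous there.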

Pixel Control Loss

pixel_control_loss

Distributions

rlax

categorical_cross_entropy categorical_importance_sampling_ratios categorical_kl_divergence categorical_sample clipped_entropy_softmax epsilon_greedy gaussian_diagonal greedy multivariate_normal_kl_divergence softmax squashed_gaussian

Categorical Cross Entropy

categorical_cross_entropy

Categorical Importance Sampling Ratios

categorical_importance_sampling_ratios

Categorical KL Divergence

categorical_kl_divergence

Categorical Sample

categorical_sample

Clipped Entropy Softmax

clipped_entropy_softmax

Epsilon Greedy

epsilon_greedy
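As an illustration of the idea behind an epsilon-greedy policy, the sketch below turns a list of action preferences into action probabilities: with probability epsilon an action is drawn uniformly, otherwise a greedy action is chosen, with ties split evenly. The function name is hypothetical and the code does not reproduce rlax's distribution interface.

```python
def epsilon_greedy_probs_sketch(preferences, epsilon):
    """Action probabilities under an epsilon-greedy policy (illustrative only)."""
    n = len(preferences)
    best = max(preferences)
    # Mark every action tied for the highest preference as greedy.
    greedy = [1.0 if p == best else 0.0 for p in preferences]
    n_greedy = sum(greedy)
    return [epsilon / n + (1.0 - epsilon) * g / n_greedy for g in greedy]


probs = epsilon_greedy_probs_sketch([1.0, 3.0, 2.0, 0.0], epsilon=0.5)
```

With `epsilon=0.0` the result is a purely greedy policy, and with `epsilon=1.0` it is uniform over all actions.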

Gaussian Diagonal

gaussian_diagonal

Greedy

greedy

Multivariate Normal KL Divergence

multivariate_normal_kl_divergence

Softmax

softmax

Squashed Gaussian

squashed_gaussian
