  • Values, including both state and action-values;
  • Values for non-linear generalizations of the Bellman equations;
  • Return Distributions, aka distributional value functions;
  • General Value Functions, for cumulants other than the main reward;
  • Policies, via policy-gradients in both continuous and discrete action spaces.

Value Learning

rlax

categorical_double_q_learning categorical_l2_project categorical_q_learning categorical_td_learning discounted_returns double_q_learning expected_sarsa general_off_policy_returns_from_action_values general_off_policy_returns_from_q_and_v lambda_returns leaky_vtrace leaky_vtrace_td_error_and_advantage n_step_bootstrapped_returns persistent_q_learning q_lambda q_learning quantile_expected_sarsa quantile_q_learning quantile_regression_loss qv_learning qv_max retrace retrace_continuous sarsa sarsa_lambda td_lambda td_learning transformed_general_off_policy_returns_from_action_values transformed_lambda_returns transformed_n_step_q_learning transformed_n_step_returns transformed_q_lambda transformed_retrace vtrace vtrace_td_error_and_advantage

Categorical Double Q Learning

categorical_double_q_learning

Categorical L2 Project

categorical_l2_project

Categorical Q Learning

categorical_q_learning

Categorical TD Learning

categorical_td_learning

Discounted Returns

discounted_returns
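As an illustration of the quantity this entry refers to, the sketch below computes discounted returns with the backward recursion G_t = r_t + discount_t * G_{t+1}. It is a plain-Python sketch, not the rlax implementation; the argument names and the scalar `bootstrap_value` are assumptions made for the example.

```python
def discounted_returns_sketch(rewards, discounts, bootstrap_value=0.0):
    """Backward recursion G_t = r_t + discount_t * G_{t+1}.

    `bootstrap_value` (hypothetical name) seeds the recursion after the
    final step, e.g. a value estimate for the last observed state.
    """
    returns = [0.0] * len(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        g = rewards[t] + discounts[t] * g
        returns[t] = g
    return returns


example_returns = discounted_returns_sketch([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
```

Setting a discount to zero at a terminal step cuts the recursion there, which is why per-step discounts are convenient for sequences that cross episode boundaries.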

Double Q Learning

double_q_learning
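The idea behind double Q-learning can be sketched in plain Python: one set of action values selects the greedy action, and a second set evaluates it. This is an illustrative sketch for a single transition, not the library code; the function name and argument names are assumptions.

```python
def double_q_td_error_sketch(q_tm1, a_tm1, r_t, discount_t,
                             q_t_value, q_t_selector):
    """TD error where `q_t_selector` picks the action and `q_t_value` scores it."""
    # Greedy action according to the selector network's estimates.
    best_action = max(range(len(q_t_selector)), key=lambda a: q_t_selector[a])
    # Bootstrap target uses the *value* network's estimate for that action.
    target = r_t + discount_t * q_t_value[best_action]
    return target - q_tm1[a_tm1]


example_td = double_q_td_error_sketch(
    q_tm1=[1.0, 2.0], a_tm1=0, r_t=1.0, discount_t=0.5,
    q_t_value=[4.0, 10.0], q_t_selector=[3.0, 1.0])
```

Decoupling selection from evaluation is what reduces the over-estimation bias of the plain max operator.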

Expected SARSA

expected_sarsa

General Off Policy Returns From Action Values

general_off_policy_returns_from_action_values

General Off Policy Returns From Q and V

general_off_policy_returns_from_q_and_v

Lambda Returns

lambda_returns
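A minimal plain-Python sketch of lambda-returns follows, assuming the common recursion G_t = r_t + discount_t * ((1 - lambda) * v_t + lambda * G_{t+1}) seeded with the final value estimate. It is meant to illustrate the mixing of one-step and multi-step targets, not to reproduce the rlax function; the names are assumptions.

```python
def lambda_returns_sketch(rewards, discounts, values, lambda_=1.0):
    """Mix bootstrapped values and longer returns with weight `lambda_`.

    `values[t]` is the value estimate for the state reached after
    receiving `rewards[t]`.
    """
    g = values[-1]  # seed with the last value estimate
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + discounts[t] * (
            (1.0 - lambda_) * values[t] + lambda_ * g)
        returns[t] = g
    return returns


one_step = lambda_returns_sketch([1.0, 2.0], [0.5, 0.5], [4.0, 8.0], lambda_=0.0)
full = lambda_returns_sketch([1.0, 2.0], [0.5, 0.5], [4.0, 8.0], lambda_=1.0)
```

With `lambda_=0.0` every target is a one-step bootstrap; with `lambda_=1.0` the targets become discounted returns that bootstrap only from the final value.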

Leaky VTrace

leaky_vtrace

N Step Bootstrapped Returns

n_step_bootstrapped_returns

Leaky VTrace TD Error and Advantage

leaky_vtrace_td_error_and_advantage

Persistent Q Learning

persistent_q_learning

Q-Lambda

q_lambda

Q Learning

q_learning
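For orientation, the Q-learning temporal-difference error for a single transition can be sketched as below. This is a plain-Python illustration under assumed argument names, not the library implementation.

```python
def q_learning_td_error_sketch(q_tm1, a_tm1, r_t, discount_t, q_t):
    """TD error: r_t + discount_t * max_a Q(s_t, a) - Q(s_tm1, a_tm1)."""
    target = r_t + discount_t * max(q_t)
    return target - q_tm1[a_tm1]


example_td = q_learning_td_error_sketch(
    q_tm1=[1.0, 2.0], a_tm1=1, r_t=1.0, discount_t=0.5, q_t=[4.0, 2.0])
```

A loss is typically obtained by squaring this error (or passing it through a Huber loss) while treating the target as a constant.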

Quantile Expected SARSA

quantile_expected_sarsa

Quantile Q Learning

quantile_q_learning

QV Learning

qv_learning

QV Max

qv_max

Retrace

retrace

Retrace Continuous

retrace_continuous

SARSA

sarsa

SARSA Lambda

sarsa_lambda

TD Lambda

td_lambda

TD Learning

td_learning

Transformed General Off Policy Returns From Action Values

transformed_general_off_policy_returns_from_action_values

Transformed Lambda Returns

transformed_lambda_returns

Transformed N Step Q Learning

transformed_n_step_q_learning

Transformed N Step Returns

transformed_n_step_returns

Transformed Q Lambda

transformed_q_lambda

Transformed Retrace

transformed_retrace

Truncated Generalized Advantage Estimation

truncated_generalized_advantage_estimation
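The recursion usually associated with generalized advantage estimation can be sketched in plain Python as follows. The sketch assumes `values` holds one more entry than `rewards` (the final entry is the bootstrap value); the names are illustrative and the code is not taken from the library.

```python
def truncated_gae_sketch(rewards, discounts, lambda_, values):
    """A_t = delta_t + discount_t * lambda * A_{t+1}, truncated at the sequence end.

    delta_t = r_t + discount_t * V(s_{t+1}) - V(s_t).
    """
    assert len(values) == len(rewards) + 1
    advantage = 0.0
    advantages = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + discounts[t] * values[t + 1] - values[t]
        advantage = delta + discounts[t] * lambda_ * advantage
        advantages[t] = advantage
    return advantages


example_adv = truncated_gae_sketch([1.0, 1.0], [0.5, 0.5], 1.0, [0.0, 0.0, 0.0])
```

Lower values of `lambda_` lean on the value estimates (less variance, more bias); higher values lean on observed rewards.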

VTrace

vtrace

Policy Optimization

rlax

clipped_surrogate_pg_loss constant_policy_targets dpg_loss entropy_loss mpo_loss mpo_compute_weights_and_temperature_loss policy_gradient_loss qpg_loss rm_loss rpg_loss sampled_policy_distillation_loss zero_policy_targets

Clipped Surrogate PG Loss

clipped_surrogate_pg_loss

Compute Parametric KL Penalty and Dual Loss

compute_parametric_kl_penalty_and_dual_loss

DPG Loss

dpg_loss

Entropy Loss

entropy_loss

MPO Compute Weights and Temperature Loss

mpo_compute_weights_and_temperature_loss

MPO Loss

mpo_loss

Policy Gradient Loss

policy_gradient_loss

QPG Loss

qpg_loss

RM Loss

rm_loss

RPG Loss

rpg_loss

Constant Policy Targets

constant_policy_targets

Zero Policy Targets

zero_policy_targets

Sampled Policy Distillation Loss

sampled_policy_distillation_loss

VMPO Compute Weights and Temperature Loss

vmpo_compute_weights_and_temperature_loss

VMPO Loss

vmpo_loss

Exploration

rlax

add_dirichlet_noise add_gaussian_noise add_ornstein_uhlenbeck_noise episodic_memory_intrinsic_rewards knn_query

Add Dirichlet Noise

add_dirichlet_noise

Add Gaussian Noise

add_gaussian_noise

Add Ornstein Uhlenbeck Noise

add_ornstein_uhlenbeck_noise

Episodic Memory Intrinsic Rewards

episodic_memory_intrinsic_rewards

KNN Query

knn_query

Utilities

rlax

AllSum batched_index clip_gradient create_ema fix_step_type_on_interruptions lhs_broadcast one_hot embed_oar replace_masked transpose_first_axis_to_last transpose_last_axis_to_first tree_fn tree_map_zipped tree_replace_masked tree_select tree_split_key tree_split_leaves conditional_update periodic_update

All Sum

AllSum

Batched Index

batched_index
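Batched indexing, in the sense of picking one entry per row of a batch, can be sketched with plain Python lists. This is an assumption about the intended behaviour, written for illustration only.

```python
def batched_index_sketch(values, indices):
    """Select values[i][indices[i]] for every row i of the batch."""
    return [row[i] for row, i in zip(values, indices)]


# Pick the action value of the action taken in each batch element.
q_values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
actions = [2, 0]
selected = batched_index_sketch(q_values, actions)
```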

Clip Gradient

clip_gradient

Create EMA

create_ema

LHS Broadcast

lhs_broadcast

One Hot

one_hot

Embed OAR

embed_oar

Fix Step Type on Interruptions

fix_step_type_on_interruptions

Transpose First Axis to Last

transpose_first_axis_to_last

Transpose Last Axis to First

transpose_last_axis_to_first

Replace Masked

replace_masked

Tree Map Zipped

tree_map_zipped

Tree Replace Masked

tree_replace_masked

Tree Select

tree_select

Tree Split Key

tree_split_key

Tree Split Leaves

tree_split_leaves

Conditional Update

conditional_update

Periodic Update

periodic_update

General Value Functions

rlax

pixel_control_rewards feature_control_rewards

Pixel Control Rewards

pixel_control_rewards

Feature Control Rewards

feature_control_rewards

Model Learning

rlax

extract_subsequences sample_start_indices

Extract model training data

extract_subsequences

sample_start_indices

Pop Art

rlax

art normalize pop popart unnormalize unnormalize_linear

Art

art

Normalize

normalize

Pop

pop

PopArt

popart

Unnormalize

unnormalize

Unnormalize Linear

unnormalize_linear

Transforms

rlax

compose_tx DISCOUNT_TRANSFORM_PAIR HYPERBOLIC_SIN_PAIR identity IDENTITY_PAIR logit muzero_pair power sigmoid signed_expm1 signed_hyperbolic SIGNED_HYPERBOLIC_PAIR signed_logp1 SIGNED_LOGP1_PAIR signed_parabolic transform_from_2hot transform_to_2hot twohot_pair TxPair unbiased_transform_pair

Identity

identity

Logit

logit

Power

power

Sigmoid

sigmoid

Signed Exponential

signed_expm1

Signed Hyperbolic

signed_hyperbolic
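The signed hyperbolic transform is commonly written as h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x, and its inverse has a closed form. The sketch below implements both in plain Python so the round trip can be checked; the default `eps=1e-3` and the function names are assumptions for the example rather than a statement about the library defaults.

```python
import math


def signed_hyperbolic_sketch(x, eps=1e-3):
    """Compress large magnitudes: sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


def signed_parabolic_sketch(y, eps=1e-3):
    """Closed-form inverse of `signed_hyperbolic_sketch` for the same eps."""
    z = (math.sqrt(1.0 + 4.0 * eps * (eps + 1.0 + abs(y))) - 1.0) / (2.0 * eps)
    return math.copysign(1.0, y) * (z * z - 1.0)


squashed = signed_hyperbolic_sketch(3.0)
recovered = signed_parabolic_sketch(squashed)
```

Transforms of this kind are typically applied to value targets so that rewards of very different scales can be learned without clipping.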

Signed Logarithm

signed_logp1

Signed Parabolic

signed_parabolic

Transform from 2 Hot

transform_from_2hot

Transform to 2 Hot

transform_to_2hot

Losses

rlax

l2_loss likelihood log_loss huber_loss pixel_control_loss

L2 Loss

l2_loss

Likelihood

likelihood

Log Loss

log_loss

Huber Loss

huber_loss
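The Huber loss is quadratic for small errors and linear for large ones. A scalar plain-Python sketch, with `delta` as the assumed name of the switch-over threshold, looks like this:

```python
def huber_loss_sketch(x, delta=1.0):
    """0.5 * x**2 when |x| <= delta, else delta * (|x| - 0.5 * delta)."""
    abs_x = abs(x)
    quadratic = min(abs_x, delta)   # part of the error treated quadratically
    linear = abs_x - quadratic      # remainder treated linearly
    return 0.5 * quadratic ** 2 + delta * linear


small_error_loss = huber_loss_sketch(0.5)
large_error_loss = huber_loss_sketch(-3.0)
```

Because the gradient magnitude is capped at `delta`, outlier TD errors have a bounded effect on each update.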

Pixel Control Loss

pixel_control_loss

Distributions

rlax

categorical_cross_entropy categorical_importance_sampling_ratios categorical_kl_divergence categorical_sample clipped_entropy_softmax epsilon_greedy gaussian_diagonal greedy multivariate_normal_kl_divergence softmax squashed_gaussian

Categorical Cross Entropy

categorical_cross_entropy

Categorical Importance Sampling Ratios

categorical_importance_sampling_ratios

Categorical KL Divergence

categorical_kl_divergence

Categorical Sample

categorical_sample

Clipped Entropy Softmax

clipped_entropy_softmax

Epsilon Greedy

epsilon_greedy
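The action probabilities of an epsilon-greedy policy can be sketched as below. The sketch assumes that ties among maximal preferences share the greedy mass equally; that tie-breaking rule and the function name are assumptions for illustration.

```python
def epsilon_greedy_probs_sketch(preferences, epsilon):
    """Uniform mass `epsilon` over all actions plus 1 - epsilon on the greedy ones."""
    num_actions = len(preferences)
    best = max(preferences)
    greedy = [1.0 if p == best else 0.0 for p in preferences]
    num_greedy = sum(greedy)
    return [epsilon / num_actions + (1.0 - epsilon) * g / num_greedy
            for g in greedy]


probs = epsilon_greedy_probs_sketch([1.0, 3.0, 2.0, 3.0], epsilon=0.2)
```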

Gaussian Diagonal

gaussian_diagonal

Greedy

greedy

Multivariate Normal KL Divergence

multivariate_normal_kl_divergence

Softmax

softmax
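A softmax policy turns preferences into probabilities, optionally sharpened or flattened by a temperature. The following plain-Python sketch uses the usual max-subtraction trick for numerical stability; the `temperature` parameter name is an assumption.

```python
import math


def softmax_probs_sketch(logits, temperature=1.0):
    """Probabilities proportional to exp(logit / temperature)."""
    scaled = [l / temperature for l in logits]
    shift = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


uniform = softmax_probs_sketch([0.0, 0.0])
sharp = softmax_probs_sketch([1.0, 2.0], temperature=0.1)
soft = softmax_probs_sketch([1.0, 2.0], temperature=1.0)
```

Lowering the temperature moves the distribution towards the greedy action; raising it moves the distribution towards uniform.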

Squashed Gaussian

squashed_gaussian