Title: Distributional RL with Quantile Regression

Author: Dabney et al. (2017) (DeepMind)

General Content:

The previous C51 paper by Bellemare et al. derived its theoretical results using the maximal form of the Wasserstein metric, but the implementation instead used a projection onto a fixed support plus a KL loss to obtain gradients. Here the authors attempt to bridge that gap: instead of atoms with fixed locations and varying probabilities, they use variable locations with fixed uniform probabilities. To adjust the atom locations they make use of quantile regression. They formally prove that this yields a contraction mapping under the Wasserstein metric and obtain SOTA results on Atari.
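As a reminder of the paper's notation, the approximating family and the quantile midpoints at which the locations are fitted are:

```latex
% Quantile distribution: N atoms with fixed weight 1/N and learned locations \theta_i(x,a)
Z_\theta(x,a) = \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(x,a)}

% Quantile midpoints; the W_1-minimizing projection of a target Z sets \theta_i = F_Z^{-1}(\hat{\tau}_i)
\tau_i = \frac{i}{N}, \qquad \hat{\tau}_i = \frac{\tau_{i-1} + \tau_i}{2} = \frac{2i-1}{2N}
```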

Key take-away for own work:

Quantile regression is an idea from econometrics, where one is often interested in the actual values a variable takes rather than in hitting exact likelihoods. Cool idea to circumvent the problem of non-smooth (sub)gradients of the quantile loss by smoothing it with the Huber loss, as in the sketch below.
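A minimal NumPy sketch of the quantile Huber loss used to fit the atom locations; the array shapes, the function name and the `kappa` default are my own illustration choices:

```python
import numpy as np

def quantile_huber_loss(theta, target_samples, kappa=1.0):
    """Quantile Huber loss between N predicted quantile locations and target samples.

    theta          : (N,) predicted atom locations (each carries probability 1/N)
    target_samples : (M,) samples from the target (e.g. Bellman target) distribution
    kappa          : Huber threshold
    """
    N = theta.shape[0]
    # Quantile midpoints tau_hat_i = (2i - 1) / (2N), i = 1..N
    tau_hat = (2.0 * np.arange(1, N + 1) - 1.0) / (2.0 * N)

    # Pairwise errors u_ij = target_j - theta_i, shape (N, M)
    u = target_samples[None, :] - theta[:, None]

    # Huber loss: quadratic for |u| <= kappa, linear beyond
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))

    # Asymmetric quantile weighting |tau_hat - 1{u < 0}|
    weight = np.abs(tau_hat[:, None] - (u < 0.0).astype(float))

    # Sum over quantiles, average over target samples
    return np.sum(np.mean(weight * huber / kappa, axis=1))

# Example: 32 atoms initialized at 0 fitted against standard-normal targets
# quantile_huber_loss(np.zeros(32), np.random.randn(100))
```

Summing over quantiles while averaging over target samples mirrors the per-quantile expectation in the paper's loss; in the limit kappa -> 0 the plain (non-smooth) quantile regression loss is recovered.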

Keypoints:

  • Setting of distributional RL - interested in the full value distribution, not only the mean value function.

  • Benefits of the Wasserstein metric:

    • The p-Wasserstein metric = Lp metric on the inverse cumulative distribution functions; the 1-Wasserstein distance is the area between the two CDFs (see the numerical sketch after this list).
    • Does not suffer from the disjoint-support issue arising from Bellman updates. BUT: viewed as a loss, we cannot obtain unbiased (stochastic) gradients.
    • Respects the underlying metric distances between outcomes: a true probability metric considers both the probability of and the distance between outcome events.
    • Suited for the case where the underlying similarity of outcomes is more important than matching their exact likelihoods.
  • Here "transpose" problem - no longer fixed location atoms with variable weights but other way around. Then use quantile regression to stochastically adapt locations as to minimize the Wasserstein distance to a target distribution. In practice: Use smoothed version using Huber loss.

  • Theoretical result: prove that the quantile projection combined with the distributional Bellman operator is a contraction mapping under the Wasserstein metric (stated more precisely after this list).

  • Benefits of the quantile distribution:

    1. Not restricted to a prespecified support
    2. No projection step required
    3. Can actually minimize the Wasserstein loss (via quantile regression)
  • Look at the handwritten notes from the blog post for more details on the reweighting that yields the quantile regression loss and on the Huber smoothing - this leads to smooth gradient propagation.

  • In the limit N=1 this is essentially DQN - the distribution collapses to the mean.
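The contraction result mentioned above, written out (my paraphrase of the paper's proposition; Pi_{W_1} denotes the quantile projection and d̄_∞ the maximal ∞-Wasserstein metric):

```latex
% Combined quantile projection + distributional Bellman operator is a gamma-contraction
\bar{d}_\infty\!\left(\Pi_{W_1}\mathcal{T}^{\pi} Z_1,\; \Pi_{W_1}\mathcal{T}^{\pi} Z_2\right)
\;\le\; \gamma\, \bar{d}_\infty\!\left(Z_1, Z_2\right)
```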
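And a small numerical sketch (my own illustration, not from the paper) of the 1-Wasserstein distance as the L1 distance between inverse CDFs, i.e. the area between the two CDFs:

```python
import numpy as np

def wasserstein_1(samples_p, samples_q, num_taus=1000):
    """Approximate W_1(P, Q) = integral_0^1 |F_P^{-1}(tau) - F_Q^{-1}(tau)| d tau
    from empirical samples, via a Riemann sum over quantile levels tau."""
    taus = (np.arange(num_taus) + 0.5) / num_taus      # midpoints of [0, 1]
    inv_cdf_p = np.quantile(samples_p, taus)           # empirical inverse CDF of P
    inv_cdf_q = np.quantile(samples_q, taus)           # empirical inverse CDF of Q
    return np.mean(np.abs(inv_cdf_p - inv_cdf_q))

# Two Gaussians differing only in their means: W_1 equals the mean shift (~2.0 here)
p = np.random.normal(0.0, 1.0, size=10_000)
q = np.random.normal(2.0, 1.0, size=10_000)
print(wasserstein_1(p, q))
```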

Questions/Critical Observations:

  • Read the original paper by Tsitsiklis and Van Roy (1997) on the use of function approximation: the Bellman update projected onto the approximation space may no longer be a contraction!
  • Widely varying values of N are used across the experiments.