
Dual Averaging (with default setting suggested by NUTS paper) doesn't work well in transformed space #182

Closed
xukai92 opened this issue Apr 19, 2017 · 10 comments

Comments

@xukai92
Member

xukai92 commented Apr 19, 2017

The default setting for HMC with the Dual Averaging method does not seem to work well with constrained variables. 49b90e9 adds a file, test/hmcda_nb.jl, to reproduce the situation.

A simple way to see the situation is to change Line 110 of samplers/hmcda.jl to a normal print call, println("leapfrog for $τ steps with step size $ϵ"), and run the test with julia test/hmcda_nb.jl. This prints the step size before each leapfrog loop, i.e. the one in use right before the error. It can be seen that the first adapted step size is always around 2.7, which is controlled by the initial Dual Averaging parameters and the target accept rate of 0.95 (which is already high, so it should already be pushing the step size down).

@yebai
Member

yebai commented Apr 20, 2017

@xukai92 Is it possible to isolate the numerical issue for the transformed variables first? This would allow us to fix the transform itself first (i.e. the simplex transform here). I imagine this can be done by finding a value r2 such that

r = link(x, Dirichlet(...))  # apply transform

# perform some changes to `r`, e.g. simulating Hamiltonian dynamics on `r`
r2 = ...

x2 = invlink(r2, Dirichlet(...)) # some elements of `x2` are `NaN`
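
A concrete version of this check might look like the sketch below. It follows the invlink(d, y) argument order quoted later in this thread, assumes Turing.link mirrors that signature, and the shift applied to r is only an illustrative stand-in for simulating the dynamics:

using Turing, Distributions

d = Dirichlet([1/3, 1/3, 1/3])
x = [0.2, 0.3, 0.5]

r = Turing.link(d, x)          # map the simplex point into unconstrained space
r2 = r .- 1000.0               # illustrative extreme move, standing in for leapfrog steps
x2 = Turing.invlink(d, r2)     # map back to the simplex

any(isnan, x2), any(x2 .== 0)  # did the round trip break the probability vector?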

@xukai92
Member Author

xukai92 commented Apr 20, 2017

Yes. I guess if we solve the numerical problem here, then we are fine for the moment.

I looked further into the issue and found that the numerical problem only happens in the dual part of the dual numbers.

Below is the invlink for simplex variables.

function invlink(d::SimplexDistribution, y::Vector)
  K = length(y)
  T = typeof(y[1])
  # Stick-breaking: map each unconstrained coordinate back to a fraction in (0, 1).
  z = [invlogit(y[k] + log(1 / (K - k))) for k in 1:K-1]
  x = Vector{T}(K)
  # x[k] takes fraction z[k] of the stick left over after x[1:k-1].
  for k in 1:K-1
    x[k] = (1 - sum(x[1:k-1])) * z[k]
  end
  x[K] = 1 - sum(x[1:K-1])
  x
end

My guess is that invlogit(y[k] + log(1 / (K - k))) is not numerically safe and loses precision; when the dual parts are calculated, the chain rule of derivatives may make the situation even worse. I'm looking into StatsFuns' implementation of some functions to see if I can use them to improve the code here.
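
For example, a quick check of where the precision goes (the invlogit below is the same definition quoted further down in this thread):

# Turing's invlogit, as quoted later in this thread.
invlogit(x) = 1.0 ./ (exp(-x) + 1.0)

invlogit(40.0)         # 1.0  -- saturates to exactly 1.0, so 1 - invlogit(40.0) == 0.0
invlogit(-800.0)       # 0.0  -- underflows to exactly 0.0
log(invlogit(-800.0))  # -Inf, although the true value is roughly -800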

@xukai92
Member Author

xukai92 commented Apr 20, 2017

I have further located the problem: it should originate from gradient(). It seems that gradient() returns NaN when the numbers become extreme, and these NaNs are later used to update the variables.

@xukai92
Member Author

xukai92 commented Apr 20, 2017

LATEST

The original cause of this issue is that the constrained variable loses some precision between link and invlink, which can turn the probability vector into something like [1, 0, 0, 0]. That is actually invalid for Dirichlet, because there shouldn't be any 0s in the probability vector of a Dirichlet. When we call logpdf with this vector we get NaN (in both the real part and the dual part). Our AD then uses these NaNs in the leapfrog steps, which causes every dimension to become NaN.
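
To illustrate the last step: a single NaN in the gradient is enough to contaminate the whole state after a leapfrog update or two. This is only a minimal sketch of the update rule, not Turing's actual leapfrog code:

θ = [0.1, 0.2, 0.3]
p = randn(3)              # momentum
∇ = [NaN, 0.5, -0.2]      # gradient with a single NaN entry
ϵ = 0.1                   # step size

p = p .+ (ϵ / 2) .* ∇     # half-step on momentum: p[1] is now NaN
θ = θ .+ ϵ .* p           # full step on position: θ[1] is now NaN
p = p .+ (ϵ / 2) .* ∇
# From here on, the log-joint and its gradient are evaluated at a NaN position,
# so every dimension ends up NaN.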

@yebai
Member

yebai commented Apr 20, 2017

using Turing, Distributions
using StatsFuns  # for logsumexp below

# Our version of invlogit
invlogit(x) = 1.0 ./ (exp(-x) + 1.0)

# log invlogit
loginvlogit_unstable(x) = log(invlogit(x))

# Numerically stable version of log invlogit
#  See e.g. https://lingpipe-blog.com/2012/02/16/howprevent-overflow-underflow-logistic-regression/
loginvlogit(x) = -logsumexp([0, -x])

# julia> loginvlogit_unstable(-750)
# -Inf

# julia> loginvlogit(-750)
# -750.0

function loginvlink(d::Dirichlet, y::Vector)
  K = length(y)
  T = typeof(y[1])
  # Same stick-breaking construction as invlink above, but carried out in log space.
  z = [loginvlogit(y[k] - log(K - k)) for k in 1:K-1]
  x = Vector{T}(K)
  for k in 1:K-1
    x[k] = log(-expm1(logsumexp(x[1:k-1]))) + z[k]
  end
  x[K] = log(-expm1(logsumexp(x[1:K-1])))
  x
end

# loginvlink(Dirichlet([1/3, 1/3, 1/3]), [-1000.0, -1000, -1000])
# 3-element Array{Float64,1}:
# -1000.69
# -1000.0 
#    0.0 

# julia> log(Turing.invlink(Dirichlet([1/3, 1/3, 1/3]), [-1000.0, -1000, -1000]))
# 3-element Array{Float64,1}:
# -Inf  
# -Inf  
#   0.0

@xukai92
Member Author

xukai92 commented Apr 20, 2017

Looks great! But we still need to exp the result and pass it to the logpdf of Dirichlet, and that result still ends up at -Inf there, right?

@xukai92 xukai92 mentioned this issue Apr 20, 2017
@xukai92
Member Author

xukai92 commented Apr 20, 2017

NOTE

The logpdf of the Dirichlet distribution from the Distributions package:

function _logpdf{T<:Real}(d::Dirichlet, x::AbstractVector{T})
    a = d.alpha
    s = 0.
    for i in 1:length(a)
        # When a[i] == 1 and x[i] == 0 this term is 0 * -Inf, i.e. NaN.
        @inbounds s += (a[i] - 1.0) * log(x[i])
    end
    return s - d.lmnB
end

@yebai
Member

yebai commented Apr 21, 2017

It's OK to take the exp after loginvlink: the returned value will be 0, since -Inf < x <= 0 here. Similarly, the gradient through loginvlink will be 0 too. This prevents the underflow issue when taking gradients.

We might also need a special version of the logpdf function in order to prevent the underflow/overflow issue. We can discuss this when we meet.
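
One possible shape for such a special version, sketched here as a hypothetical helper that consumes the log-probabilities from loginvlink directly (the field names alpha and lmnB are the ones used in the Distributions source quoted above):

# Hypothetical helper: a Dirichlet logpdf that takes log-probabilities directly,
# so we never exp a log-space value only to take its log again inside logpdf.
function logpdf_from_logx(d::Dirichlet, logx::AbstractVector)
    a = d.alpha
    s = 0.0
    for i in 1:length(a)
        # With alpha[i] == 1 this term is 0 * logx[i] == 0 for any finite logx[i],
        # instead of 0 * log(0) == 0 * -Inf == NaN.
        s += (a[i] - 1.0) * logx[i]
    end
    return s - d.lmnB
end

# e.g. logpdf_from_logx(Dirichlet([1/3, 1/3, 1/3]), loginvlink(Dirichlet([1/3, 1/3, 1/3]), y))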

@xukai92
Member Author

xukai92 commented Apr 21, 2017

I see. I guess another reason we get the NaN here is that we use Dirichlet([1, 1, 1, 1]) in the Naive Bayes model.

If there is a 0 in the probability vector, then on the 5th line of the logpdf of Dirichlet, i.e.

s += (a[i] - 1.0) * log(x[i])

we will have (1 - 1.0) * log(exp(-1000)) => 0 * -Inf => NaN.
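
A quick check of both the value and its gradient (using ForwardDiff for the dual-number part; the tiny term function is only for illustration):

using ForwardDiff

(1 - 1.0) * log(exp(-1000.0))      # NaN: exp underflows to 0, log(0) == -Inf, 0 * -Inf == NaN

# The same thing happens in the dual part: an alpha == 1 term of the Dirichlet
# logpdf, differentiated at x == 0.
term(x) = (1.0 - 1.0) * log(x[1])
ForwardDiff.gradient(term, [0.0])  # [NaN]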

@yebai
Member

yebai commented Apr 28, 2017

Closed by PR #199

@yebai yebai closed this as completed Apr 28, 2017