DynamicHMC fails on LBA #45

Open
itsdfish opened this issue Aug 22, 2019 · 21 comments

@itsdfish
Collaborator

After correcting the prior distributions for the LBA, we are now encountering the non-finite density problem again:

  Got exception outside of a @test
  DomainError with [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]:
  Starting point has non-finite density.

Changing the following:

    chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, nsamples;
          q = zeros(n), p = ones(n), report=ReportSilent())

back to:

chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, nsamples; report=ReportSilent());

fixes the problem, but causes DynamicHMC to slow to a crawl.
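
For reference, here is a minimal check of why that fixed starting point can fail (a sketch, assuming the trans and p defined in the model script): the unconstrained zero vector maps through asℝ₊ to every parameter equal to exp(0) = 1, which lines up with the all-ones vector in the error above.

# Evaluate the log density at the point DynamicHMC is started from.
θ0 = trans(zeros(dimension(trans)))
p(θ0)   # likely -Inf, e.g. because tau = 1 exceeds the minimum RT allowed by the prior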

@goedman
Member

goedman commented Sep 18, 2019

Hi Chris,

Looking at the results produced by LBA, I noticed that v[1..3] == [A, k, tau]:

Summary Statistics
. Omitted printing of 1 columns
│ Row │ parameters │ mean     │ std       │ naive_se    │ mcse       │ ess     │
│     │ Symbol     │ Float64  │ Float64   │ Float64     │ Float64    │ Any     │
├─────┼────────────┼──────────┼───────────┼─────────────┼────────────┼─────────┤
│ 1   │ A          │ 0.288653 │ 0.157473  │ 0.00352121  │ 0.00305586 │ 1432.53 │
│ 2   │ k          │ 0.172597 │ 0.0851352 │ 0.00190368  │ 0.00222767 │ 1243.31 │
│ 3   │ tau        │ 0.448405 │ 0.0446614 │ 0.000998659 │ 0.00125684 │ 1052.54 │
│ 4   │ v[1]       │ 0.288653 │ 0.157473  │ 0.00352121  │ 0.00305586 │ 1432.53 │
│ 5   │ v[2]       │ 0.172597 │ 0.0851352 │ 0.00190368  │ 0.00222767 │ 1243.31 │
│ 6   │ v[3]       │ 0.448405 │ 0.0446614 │ 0.000998659 │ 0.00125684 │ 1052.54 │

This is already evident from the posterior samples:


julia> posterior
2000-element Array{NamedTuple{(:v, :A, :k, :tau),Tuple{Array{Float64,1},Float64,Float64,Float64}},1}:
 (v = [0.27919455171214147, 0.09033887035475256, 0.4490141935947707], A = 0.27919455171214147, k = 0.09033887035475256, tau = 0.4490141935947707)
 (v = [0.26779694844160534, 0.0878122639566501, 0.5055666913698694], A = 0.26779694844160534, k = 0.0878122639566501, tau = 0.5055666913698694)
 (v = [0.2638198899521974, 0.08628458345924102, 0.483451393205955], A = 0.263819889952197
...

Like Tamas, I don't really understand this model; I'm trying to figure it out using LBA. Is that a good description of the model?

@itsdfish
Collaborator Author

Good catch. It looks like something in my conversion to a Chain object broke. I'll look into that shortly.

The most comprehensive source on the model can be found here.

Equations 1, 2, and 3 comprise the model. Basically, the model boils down to the distribution of the minimum of several random variables. Each observation consists of a variable index and a minimum processing time. For example, equation 3 represents the density for the ith random variable having the minimum value t (first term) and all j != i being larger than t (second term).
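
In code form, equation 3 is essentially this (a sketch, not the package implementation; f and F stand for the pdfs and cdfs of the individual finishing-time distributions):

# Density that accumulator i finishes first at time t: its own density at t times the
# probability that every other accumulator is still running at t.
race_density(t, i, f, F) = f[i](t) * prod(1 - F[j](t) for j in eachindex(F) if j != i)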

Figure 3 has a fairly intuitive illustration. The substantive interpretation is that each random variable is the finishing time of an evidence accumulator for a decision option. So, for example, if a person is choosing which of two objects on a computer screen is brighter, the objectively brighter option will have a higher drift rate (evidence accumulation rate) and will tend to hit the evidence threshold first. If you tell people to favor accuracy over speed, the evidence threshold parameter usually increases. That is the gist of the model.

I'm not sure if that's helpful. I can do my best to answer specific questions.

@goedman
Member

goedman commented Sep 18, 2019

Thanks, I'll read that description of the model.

I don't think it is your (or my) conversion to a Chain object, as the posterior shows that in each iteration (sample) the v's are identical to the other 3 parameters.

@itsdfish
Collaborator Author

Yup. You are right. Sorry. I misunderstood.

This seems to be working correctly: p((v=fill(.5,Nc),A=.8,k=.2,tau=.4))

If you replace the problem function with this:

function (problem::LBAProb)(θ)
    @unpack data = problem
    @unpack v, A, k, tau = θ
    if typeof(v) != Array{Float64,1}
        x = map(x -> x.value, v)
        println(x, [A.value k.value tau.value])
    end
    d = LBA(ν=v, A=A, k=k, τ=tau)
    minRT = minimum(x -> x[2], data)
    logpdf(d, data) + sum(logpdf.(TruncatedNormal(0,3,0,Inf), v)) +
        logpdf(TruncatedNormal(.8,.4,0,Inf), A) + logpdf(TruncatedNormal(.2,.3,0,Inf), k) +
        logpdf(TruncatedNormal(.4,.1,0,minRT), tau)
end

This suggests that the values are wrong during sampling. I wonder whether there is a bug in the transformation step:

trans = as((v=as(Array,asℝ₊,Nc),A=asℝ₊,k=asℝ₊,tau=asℝ₊))
P = TransformedLogDensity(trans, p)

I think I have specified it correctly.
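
One way to sanity-check the transform itself (a sketch using the TransformVariables API, with the trans defined above):

using TransformVariables  # for dimension and inverse

# Map a few random unconstrained vectors into parameter space, check every component is
# positive, and check that inverse recovers the unconstrained point.
for _ in 1:5
    x = randn(dimension(trans))      # what DynamicHMC actually samples
    θ = trans(x)                     # (v = ..., A = ..., k = ..., tau = ...), all > 0
    @assert all(θ.v .> 0) && θ.A > 0 && θ.k > 0 && θ.tau > 0
    @assert inverse(trans, θ) ≈ x
end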

@goedman
Member

goedman commented Oct 1, 2019

Hi Chris,

After the latest update to TransformVariables.jl I get the results below for a simplified LBA model (2 choices, as in figure 3 of the reference):

DynamicHMC.Diagnostics.EBFMI(results.tree_statistics) = 0.828130026872887

DynamicHMC.Diagnostics.summarize_tree_statistics(results.tree_statistics) = Hamiltonian Monte Carlo sample of length 1000
  acceptance rate mean: 0.93, 5/25/50/75/95%: 0.7 0.92 0.97 0.99 1.0
  termination: divergence => 0%, max_depth => 0%, turning => 100%
  depth: 0 => 0%, 1 => 0%, 2 => 4%, 3 => 18%, 4 => 41%, 5 => 30%, 6 => 7%

2-element Array{ChainDataFrame,1}

Summary Statistics
. Omitted printing of 1 columns
│ Row │ parameters │ mean     │ std       │ naive_se   │ mcse       │ ess     │
│     │ Symbol     │ Float64  │ Float64   │ Float64    │ Float64    │ Any     │
├─────┼────────────┼──────────┼───────────┼────────────┼────────────┼─────────┤
│ 1   │ A          │ 0.972474 │ 0.220211  │ 0.00696369 │ 0.0193175  │ 253.429 │
│ 2   │ k          │ 0.227209 │ 0.117316  │ 0.00370985 │ 0.00781100 │ 231.505 │
│ 3   │ tau        │ 0.381555 │ 0.0346235 │ 0.00109489 │ 0.00235538 │ 225.933 │
│ 4   │ v[1]       │ 1.30945  │ 0.224873  │ 0.0071111  │ 0.0124083  │ 410.51  │
│ 5   │ v[2]       │ 1.67557  │ 0.241736  │ 0.00764437 │ 0.0172310  │ 369.044 │

v[1] and v[2] are no longer identical to A and k. The results look better as well.

As a comparison, generating the same data, I used lba1.jl from the LBA example in StanSample.jl:

Stan:

Summary Statistics
. Omitted printing of 1 columns
│ Row │ parameters │ mean     │ std       │ naive_se    │ mcse       │ ess     │
│     │ Symbol     │ Float64  │ Float64   │ Float64     │ Float64    │ Any     │
├─────┼────────────┼──────────┼───────────┼─────────────┼────────────┼─────────┤
│ 1   │ A          │ 0.992518 │ 0.300774  │ 0.00475565  │ 0.0156576  │ 363.458 │
│ 2   │ k          │ 0.290134 │ 0.202774  │ 0.00320613  │ 0.0122507  │ 149.192 │
│ 3   │ tau        │ 0.365730 │ 0.0538114 │ 0.000850832 │ 0.00299002 │ 261.44  │
│ 4   │ v.1        │ 1.41443  │ 0.223349  │ 0.00353145  │ 0.00804524 │ 934.487 │
│ 5   │ v.2        │ 1.77846  │ 0.243054  │ 0.00384303  │ 0.0082201  │ 943.819 │

The next step is to compare the functions in the Stan model with those in LBA_functions.jl (as used in the LBA benchmark in DynamicHMCModels.jl), and to figure out what exactly the error below
tells us if we don't control the random seed:

ERROR: LoadError: ArgumentError: Value and slope at step length = 0 must be finite.
Stacktrace:
 [1] (::LineSearches.HagerZhang{Float64,Base.RefValue{Bool}})(::Function, ::LineSearches.var"#ϕdϕ#6"{Optim.ManifoldObjective{NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}},Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Float64, ::Float64, ::Float64) at /Users/rob/.julia/packages/LineSearches/WrsMD/src/hagerzhang.jl:117

@itsdfish
Collaborator Author

itsdfish commented Oct 1, 2019

Thanks, Rob. Do you think the problem is with LBA_functions.jl or with DynamicHMC? My guess is that the problem is with DynamicHMC because it works with Turing.

@goedman
Member

goedman commented Oct 2, 2019

Hmm, I'm really not sure yet, but I was encouraged by the fact that the estimates are not identical but reasonably close. Maybe the above error comes from some aspect of the distribution's definition that is not handled properly here and is irrelevant (or not used) in Turing. My next step is to suppress the initial call to Optim that obtains the starting values, to see if that gets us past the above error.

@itsdfish
Collaborator Author

itsdfish commented Oct 3, 2019

Good point. Another difference that might be worth considering is that DynamicHMC uses different parameter values for NUTS (and this may also affect the initial values). Turing is now identical to Stan, and is very numerically stable. It also looks like both packages use different variable transformations.

@goedman
Member

goedman commented Oct 3, 2019

Next tiny step, but indeed if I suppress the Optim part as shown below

# Sample from the posterior.
results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P, 1000;
  warmup_stages = default_warmup_stages(local_optimization=nothing),
  #reporter = NoProgressReport()
  )

I get

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/benchmarks/LBA/lba_dhmc.jl")
ERROR: LoadError: DomainError with [1.05585, 0.566752, -0.242509, 1.19977, 0.109664]:
Starting point has non-finite density.
Stacktrace:
 [1] local_acceptance_ratio(::DynamicHMC.Hamiltonian{GaussianKineticEnergy{Diagonal{Flo...

Next step is to see what the density looks like for these values.
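
For that next step, something like this should do it (a sketch, assuming the trans and p names from the 2-choice script):

# Map the unconstrained vector from the error back to parameter space and evaluate the
# log density there; asℝ₊ exponentiates each component, so everything comes out positive.
x = [1.05585, 0.566752, -0.242509, 1.19977, 0.109664]
θ = trans(x)
p(θ)   # -Inf here would point at the prior/likelihood, e.g. tau exceeding the minimum RT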

@itsdfish
Collaborator Author

itsdfish commented Oct 3, 2019

What parameter has the value of -.24? That might be a problem because only the v parameters can be negative.

@goedman
Member

goedman commented Oct 3, 2019

Not sure, I'm assuming A?

julia> theta=(v=fill(.5,Nc),A=.8,k=.2,tau=.4)
(v = [0.5, 0.5], A = 0.8, k = 0.2, tau = 0.4)

julia> p(theta)
-155.2066231692378

julia> theta=(v=fill(.5,Nc),A=-0.24,k=.2,tau=.4)
(v = [0.5, 0.5], A = -0.24, k = 0.2, tau = 0.4)

julia> p(theta)
-Inf

I've certainly seen other parameters taking on negative values:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/benchmarks/LBA/lba_dhmc.jl")
ERROR: LoadError: DomainError with [0.652014, -0.41796, 1.54240, -0.556393, -0.752985]:
Starting point has non-finite density.
Stacktrace:
 [1] local_acceptance_ratio(::DynamicHMC.Hamiltonian{GaussianKineticEnergy{Diagonal{Float64,Array{

...

::DynamicHMC.PhasePoint{DynamicHMC.EvaluatedLogDensity{Array{Float64,1},Float64},Array{Float64,1}}) at /Users/rob/.julia/packages/DynamicHMC/pVyrS/src/stepsize.jl:180
 [2] warmup at /Users/rob/.julia/packages/DynamicHMC/pVyrS/src/mcmc.jl:160 [inlined]

@itsdfish
Collaborator Author

itsdfish commented Oct 3, 2019

Yeah. That is problematic. I suspect that the part of DynamicHMC that enforces the allowable parameter ranges is the source of the problem. For reference, here is the part of the model that specifies the ranges:

sum(logpdf.(TruncatedNormal(0,3,0,Inf), v)) +
logpdf(TruncatedNormal(.8,.4,0,Inf), A)+
logpdf(TruncatedNormal(.2,.3,0,Inf), k)+
logpdf(TruncatedNormal(.4,.1,0,minRT), tau)

@itsdfish
Collaborator Author

I'm not sure that the problem is due to numerical instability of the likelihood function. However, I want to make reference to a more numerically stable version of the likelihood that Tamas started. I will find some time to finish this and add it, as it should be more stable with larger sample sizes.

File

using Distributions, Parameters, DynamicHMC, LogDensityProblems, TransformVariables
using Random, StatsFuns
import Distributions: pdf,logpdf,rand

export LBA,pdf,logpdf,rand

Base.@kwdef struct LBA{T1,T2,T3,T4} <: ContinuousUnivariateDistribution
    ν::T1
    A::T2
    k::T3
    τ::T4
    σ::Float64 = 1.0
end

Base.broadcastable(x::LBA) = Ref(x)

###
### simulation
###

function selectWinner(dt)
    if any(x -> x > 0,dt)
        mi, mv = 0, Inf
        for (i, t) in enumerate(dt)
            if (t > 0) && (t < mv)
                mi = i
                mv = t
            end
        end
    else
        return 1,-1.0
    end
    return mi,mv
end

function sampleDriftRates(ν,σ)
    noPositive=true
    v = similar(ν)
    while noPositive
        v = [rand(Normal(d,σ)) for d in ν]
        any(x->x>0,v) ? noPositive=false : nothing
    end
    return v
end

function rand(d::LBA)
    @unpack τ,A,k,ν,σ = d
    b=A+k
    N = length(ν)
    v = sampleDriftRates(ν,σ)
    a = rand(Uniform(0,A),N)
    dt = @. (b-a)/v
    choice,mn = selectWinner(dt)
    rt = τ .+ mn
    return choice,rt
end

function rand(d::LBA,N::Int)
    choice = fill(0,N)
    rt = fill(0.0,N)
    for i in 1:N
        choice[i], rt[i] = rand(d)
    end
    return (choice = choice, rt = rt)
end

function simulateLBA(;Nd,v=[1.0,1.5,2.0],A=.8,k=.2,tau=.4,kwargs...)
    return (rand(LBA(ν=v,A=A,k=k,τ=tau),Nd)...,N=Nd,Nc=length(v))
end

###
### log densities
###

function logpdf(d::LBA,data::T) where {T<:NamedTuple}
    return sum(logpdf.(d,data...))
end

logpdf(dist::LBA,data::Array{<:Tuple,1}) = sum(d -> logpdf(dist, d), data)

function logpdf(d::LBA,c,rt)
    @unpack τ,A,k,ν,σ = d
    b = A + k
    logden = 0.0
    rt < τ && return -Inf
    for (i,v) in enumerate(ν)
        if c == i
            logden += logdens(d,v,rt)
        else
            logden += log_tail_cumulative(d,v,rt)
        end
    end
    logden - log1mexp(logpnegative(d))
end

logpdf(d::LBA,data::Tuple) = logpdf(d,data...)

function logdens(d::LBA, v, rt)
    @unpack τ,A,k,ν,σ = d
    dt = rt-τ; b=A+k
    n1 = (b-A-dt*v)/(dt*σ)
    n2 = (b-dt*v)/(dt*σ)
    # FIXME rewrite this part nicer
    Δcdfs = cdf(Normal(0,1),n2) - cdf(Normal(0,1),n1)
    Δpdfs = pdf(Normal(0,1),n1) - pdf(Normal(0,1),n2)
    -log(A) + logaddexp(log(σ) + log(Δpdfs), log(v) * log(Δcdfs))
end

function log_tail_cumulative(d::LBA,v,rt)
    @unpack τ,A,k,ν,σ = d
    dt = rt-τ; b=A+k
    n1 = (b-A-dt*v)/(dt*σ)
    n2 = (b-dt*v)/(dt*σ)
    log(-((b-A-dt*v)/A)*cdf(Normal(0,1),n1) +
          ((b-dt*v)/A)*cdf(Normal(0,1),n2) - ((dt*σ)/A)*pdf(Normal(0,1),n1) +
          ((dt*σ)/A)*pdf(Normal(0,1),n2))
end

function logpnegative(d::LBA)
    @unpack ν,σ=d
    sum(v -> logcdf(Normal(0,1),-v/σ), ν)
end

struct LBAProb{T}
    data::T
    N::Int
    Nc::Int
end

function (problem::LBAProb)(θ)
    @unpack data=problem
    @unpack v,A,k,tau=θ
    d = LBA(ν=v,A=A,k=k,τ=tau)
    minRT = minimum(last, data)
    logprior = (sum(logpdf.(TruncatedNormal(0,3,0,Inf), v)) +
                logpdf(TruncatedNormal(.8,.4,0,Inf),A) +
                logpdf(TruncatedNormal(.2,.3,0,Inf),k) +
                logpdf(TruncatedNormal(.4,.1,0,minRT), tau))
    loglikelihood = logpdf(d, data)
    logprior + loglikelihood
end

function sampleDHMC(choice,rt,N,Nc,nsamples)
    data = [(c,r) for (c,r) in zip(choice,rt)]
    return sampleDHMC(data,N,Nc,nsamples)
end

N = 10
data = simulateLBA(Nd = N)
p = LBAProb(collect(zip(data.choice, data.rt)), N, data.Nc)
p((v=fill(.5,data.Nc),A=.8,k=.2,tau=.4))
trans = as((v=as(Array,asℝ₊,data.Nc),A=asℝ₊,k=asℝ₊,tau=asℝ₊))
P = TransformedLogDensity(trans, p)
∇P = ADgradient(:ForwardDiff, P)
results = mcmc_with_warmup(Random.GLOBAL_RNG, ∇P, 1000)
posterior = trans.(results.chain)

@itsdfish
Collaborator Author

I found some translation errors in the new code for the likelihood function (file here). As far as I can tell, it is not possible to convert all of the terms in the likelihood function into logarithms with existing capabilities. The main problem is that many of the terms contain negative values/subtraction, making logsumexp inapplicable. There might be a way to extend logsumexp to accommodate negative values.
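
For what it's worth, this is the kind of extension I have in mind (a sketch; logsubexp is a hypothetical helper built on log1mexp from StatsFuns, valid only when the difference is positive):

# log(exp(a) - exp(b)) computed without leaving log space, assuming a > b.
logsubexp(a, b) = a + log1mexp(b - a)

# For example, the Δcdfs term could then stay in log space as
# logsubexp(logcdf(Normal(0,1), n2), logcdf(Normal(0,1), n1))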

Nonetheless, I was able to use logarithms at the level of functions, which did not solve the immediate problem. An undesired side effect of this translation was a 50% reduction in speed in the evaluation of the likelihood. I suspect this might be due in part to the fact that logarithms are more computationally intensive than elementary arithmetic operations.

I'll take a look at the values that produce the errors to see if I can get an idea about the cause.

@goedman
Member

goedman commented Oct 16, 2019

Hi Chris, I tried your updates, though without much success. Most runs fail in the Optim phase, unable to determine an acceptable starting point. If I disable that and find a random seed that works, the results show huge numbers.

I'm not sure I completely understand your point about logsumexp. Isn't preventing such conditions why transformations are applied?

I wonder if we can create a logpdf version in Stan and whether that provides more insight into what is happening.

I must say, after my last debacle (the one model in SR that didn't give similar answers in DynamicHMC and Stan), where I found in the end that I had made a modeling mistake, I think the problem here is very likely in the modeling as well.

@itsdfish
Collaborator Author

Sorry Rob, you can disregard my previous post (I misread the parentheses and misdiagnosed the error in Tamas's code).

If you recall, Tamas suspected that the errors were due to numerical under/overflow. He recommended rewriting the likelihood function in terms of logarithms because they are more robust to numerical problems. He made an attempt at such a rewrite; it turned out to be close, but it had an error. So I made my own version with fewer logs. As you pointed out, the problem still persisted.

However, after reviewing Tamas's code again, I realized I misread something and was able to identify the error:

-log(A) + logaddexp(log(σ) + log(Δpdfs), log(v) + log(Δcdfs))
was mistakenly written as

-log(A) + logaddexp(log(σ) + log(Δpdfs), log(v) * log(Δcdfs))

I incorporated this into my version of the logpdf from before. However, the problem persists with this version.

Sporadic numerical errors occur in Stan with the LBA. I suspect there are parts of the parameter space that have irregular properties, and this might be unavoidable. It might be worth letting DynamicHMC continue running after an error. If it behaves like Stan, the errors will be infrequent and won't affect the estimates. I'll see if it's still possible to let it run after errors.
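
If DynamicHMC itself aborts on such errors, one possible workaround (a sketch, not an existing DynamicHMC option; it assumes the p and trans from the script above) is to catch the error in the log density and return -Inf so the offending proposal is simply rejected:

# Wrap the log posterior so sporadic DomainErrors reject the proposal instead of
# terminating the run.
safe_p(θ) = try
    p(θ)
catch err
    err isa DomainError ? -Inf : rethrow()
end

# P = TransformedLogDensity(trans, safe_p)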

@itsdfish
Collaborator Author

Rob-

I encounter a domain error with this version of the LBA, which disables the initial optimization step.

DomainError with [0.257204, 0.237231, 0.800883, -0.965525, -0.892265, -0.175778]:
Starting point has non-finite density.

My running assumption is that this vector represents a set of sampled parameters, which should not be negative given the constraints: as((v=as(Array,asℝ₊,data.Nc),A=asℝ₊,k=asℝ₊,tau=asℝ₊))

Is my interpretation correct?
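
For reference, a quick way to check (a sketch, assuming the trans defined above): if the vector is reported in the unconstrained space that DynamicHMC samples, negative entries are expected there, and asℝ₊ maps them back to positive parameters before the density is evaluated.

x = [0.257204, 0.237231, 0.800883, -0.965525, -0.892265, -0.175778]
trans(x)   # e.g. A = exp(-0.965525) ≈ 0.38, positive despite the negative unconstrained entry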

@goedman
Member

goedman commented Nov 11, 2019

Hi Chris,

That's what I assumed. And I agree with your remark on the sporadic errors.

During and since my return from my trip I have had to reprioritize the work on my plate, given that I also need to spend more time on a FEM package for directional drilling for geothermal purposes. Unfortunately, as a consequence, the LBA problem (and Jags) ended up pretty far down my list.

Clearly the LBA problem is important and related to where I want to go with StatisticalRethinking v1, but I think we need Tamas' help to figure this one out, now that you have tried many things and feel pretty OK with the current formulation.

@itsdfish
Collaborator Author

Thanks for your input. I will contact Tamas to see if this is indeed a bug and whether he has any recommendations for any remaining issues.

@goedman
Member

goedman commented Nov 12, 2019

In the past I tried to get stresstest to work but clearly didn't do it right. Tamas' example shows it is a great tool! The back-transformed values below look decent to me, although tau is above minRT in the last sample:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/benchmarks/LBA/LBA4.jl")
3-element Array{Array{Float64,1},1}:
 [1.5115300042613513, 1.2159167956922652, 0.051268200203508106, -3.0029066946794587, 0.3423613237349495, -1.622809901573141]  
 [-1.4835829897363175, -1.5471526581329185, 0.11959692225562338, -1.3169517019160528, -0.9620074368405765, -1.324249629506459]
 [0.8018415326240196, -3.303629754386329, 2.2950617222581347, -0.42671349986127, -1.3247155956205998, -0.38005314098622633]   
3-element Array{NamedTuple{(:v, :A, :k, :tau),Tuple{Array{Float64,1},Float64,Float64,Float64}},1}:
 (v = [4.533662012764311, 3.3733853522246378, 1.0526051643482253], A = 0.04964256267983302, k = 1.4082690466287442, tau = 0.19734340374572215) 
 (v = [0.22682352432447578, 0.21285317757091918, 1.1270424742724912], A = 0.2679508523838598, k = 0.3821250236652167, tau = 0.2660024846362025)
 (v = [2.229643110710278, 0.036749533238897555, 9.925048589791379], A = 0.6526505099065516, k = 0.2658785653654175, tau = 0.6838250691082113)  

julia> minRT = minimum(data.rt)
0.521593812184431

(using your Random.seed value)

@goedman
Member

goedman commented Nov 12, 2019

With your suggested restriction on tau, stresstest ends up with 9 problematic samples and indeed points to line 79 in the test script (which makes sense). It seems to me we can't filter out the parameter-sample rows which, given the data, won't be acceptable here. I guess this can only be done deeper inside the sampling process. Is this indeed how Stan does it?
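
One way to push that constraint into the transform itself (a sketch, assuming minRT is computed from the data before the transform is built; this mirrors declaring real<lower=0, upper=minRT> tau in Stan):

minRT = minimum(data.rt)
trans = as((v = as(Array, asℝ₊, data.Nc), A = asℝ₊, k = asℝ₊,
            tau = as(Real, 0, minRT)))   # tau constrained to (0, minRT)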
