-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User-defined distributions example #42
Comments
Hi @gragusa and @bdeonovic. A known, unknown, or unnormalized distribution can be created and added to Mamba as one of the Distributions package subtypes ( In general, new user-defined and third-party package types and functions can be added with the extension approach in this example to make them available for Mamba model specification. using Mamba
## Define a new unknown Distribution type.
## The definition must be placed within an unevaluated quote block; however, it
## may be easiest to first develop and test the definition apart from the block.
extensions = quote
## Type definition and constructor
type UnknownDist <: ContinuousUnivariateDistribution
mu::Float64
sigma::Float64
UnknownDist(mu, sigma) = new(float64(mu), float64(sigma))
end
## Method functions minimum, maximum, insupport, and logpdf are required
## The minimum and maximum support values
minimum(d::UnknownDist) = -Inf
maximum(d::UnknownDist) = Inf
## A logical indicating whether x is in the support
insupport(d::UnknownDist, x::Real) = true
## The normalized or unnormalized log-density value
function logpdf(d::UnknownDist, x::Real)
-log(d.sigma) - 0.5 * ((x - d.mu) / d.sigma)^2
end
## Make the type available outside of Mamba (optional)
export UnknownDist
end
## Add the new type to Mamba
eval(Mamba, extensions)
## If exported, test its constructor and functions here (optional)
d = UnknownDist(1.0, 2.0)
minimum(d)
maximum(d)
insupport(d, 1.0)
logpdf(d, 1.0)
## Implement a Mamba model using the unknown distribution
model = Model(
y = Stochastic(1,
@modelexpr(mu, s2,
begin
sigma = sqrt(s2)
Distribution[
UnknownDist(mu[i], sigma)
for i in 1:length(mu)
]
end
),
false
),
mu = Logical(1,
@modelexpr(xmat, beta,
xmat * beta
),
false
),
beta = Stochastic(1,
:(MvNormal(2, sqrt(1000)))
),
s2 = Stochastic(
:(InverseGamma(0.001, 0.001))
)
)
## Sampling Scheme
scheme = [NUTS([:beta]),
Slice([:s2], [3.0])]
## Sampling Scheme Assignment
setsamplers!(model, scheme)
## Data
line = (Symbol => Any)[
:x => [1, 2, 3, 4, 5],
:y => [1, 3, 3, 3, 5]
]
line[:xmat] = [ones(5) line[:x]]
## Initial Values
inits = Dict{Symbol,Any}[
[:y => line[:y],
:beta => rand(Normal(0, 1), 2),
:s2 => rand(Gamma(1, 1))]
for i in 1:3]
## MCMC Simulation
sim = mcmc(model, line, inits, 10000, burnin=250, thin=2, chains=3)
describe(sim) |
@brian-j-smith this is a nice trick to get around using the |
Nice. Thank you. |
This kind of functionality is something that should definitely be mentioned in the docs; they currently imply that one must implement everything required to create a custom distribution in the Distribution package, which is way more difficult than what is apparently required (e.g. just a logpdf function). I almost didn't try Mamba because defining methods to randomly draw from a joint distribution (which the Distribution package says is required of custom distributions) is why I'm trying to use Mamba in the first place. Anyway, I'm also trying to familiarise myself with Mamba and have a question that's come about from trying to adapt brian's code. I'm just to simulate draws from a normal, that's it. Drawing from the normal isn't the end goal of course (I know I can do this using the already defined distributions), I just want to know how to work with custom density functions. Since I'm just wanting draws from a distribution, I don't have data. How can I achieve this? This is my code. Unfortunately I just seem to get an error saying there is no method matching.
What is the correct syntax for omitting inputs? Have I made a silly mistake here that I'm just not seeing? |
The inits = [(Symbol => Any)[:z => rand()]] After I changed this for your code I still got an error that you need to define |
Those fixes seem to do the trick for the slice sampler (thanks!). Other samplers don't seem to be working though. E.g. I get the error below when trying to use AMWG or AMM etc (wouldn't expect NUTS to work since I'm not defining the gradient.)
|
Strange, when I ran the code you provided it ran just fine with the AMWG sampler. What is your Mamba and julia version? Try starting julia up fresh and copy paste your code to see if it works. If I am not mistaken, Brian's implementation of NUTS can calculate gradient numerically, so you do not have to provide it. I tried running your code with NUTS and it also worked. |
Hmm, it does seem to work upon restart. I guess there was something defined earlier in the session that was messing things up. Thanks a lot for the help, I appreciate it. |
That's what I figured, glad it worked out! |
Thanks for all the feedback. It's helpful right now as I'm writing a documentation section on user-defined distributions for the next release. |
In that case, some feedback! Having used a bunch of this sort of software on various platforms, there are a few user-defined distributions topics that I think will save people a lot of time (e.g. the things I always stumble against when trying to learn them) if they are included: how to define a custom univariate distribution, how to define a custom multivariate distribution, how to input data to these custom distributions, how to input other parameters to these custom distributions (including when they are and are not part of the function signature that defines the logpdf etc). Thanks for all the great work. |
The latest documentation for Mamba now has sections (see links below) for creating new distributions. |
@brian-j-smith The new documentation is very helpful. Its worth noting that |
The issue of creating user-defined distributions is covered in the package documentation now. |
Hi all, I've worked thru this example and I have a very related open question that i can't seem to implement in mamba. (This problem is what motivated me to look at Mamba and Julia) Like the original poster, The data custom likelihood I want to use has a non-closed form 'kernel' I've tried to adapt this CLL() and fold it into mamba following the examples here (thru the Distributions package). However, I can't seem to figure out the proper way to do it. My mental confusion is, mamba says, the custom distribution requires the logpdf() to be specified. If i interpret the Distributions and the examples here correctly, this treats the parameter theta fixed, and evaluates each observation by cycling thru the dataset. Brian's example When contrasted with my CLL() implementation, my "loglikelihood" treats the dataset as fixed and cycles thru proposed thetas. This idea pops up again when specifying the logpdf(d::myNormal, x::Real). The data looks like it's treated as a one element scalar. Brian's example The normalized or unnormalized log-density valuefunction logpdf(d::myNormal, x::Real) When contrasted with my CLL() implementation, my CLL makes full use of an array of scalars. This is specific to the intermediate optimization routine. To sum up, is there a suggested way I can adapt my CLL(), or somehow convert a Distributions::loglikelihood() to Distributions::logpdf() that's useable by mamba? Here's my pseudo code that I'm pretty sure is wrong, for the second reason above (coded as scalar but really needs an array) function logpdf(d::NewUnivarDist, y::Real) toy_function(d.theta,y) toy example function that combines scalar parameter and vector of scalar data end When i 'evaluate my custom distribution' I get an unexpected value (that doesn't agree my with vanilla Julia implementation) eval(Testing, extensions) PS, on an unrelated note, I think I found a bug. This scheme works fine, the posterior values concentrate around the empirical mean These two particular schemes create weird behavior posterior == init problem in s2scheme_error2 = [NUTS([:mu]),NUTS([:s2])] posterior == init problem in mu and s2scheme_error1 = [NUTS([:mu, :s2])] |
Having to run optimization code for every call to logpdf is going to make the whole procedure quite slow. I'd be interested to hear how well something like this does! Posting your code would be very helpful to try and debug and issues. To clarify a few things before: Mamba can handle univariate (such as Brian's custom univariate normal example above) as well as multivarate custom distributions. In Mamba you can specify Stochastic nodes to be arrays of univariate distributions (this sounds like your case) for example look at the pumps example: http://mambajl.readthedocs.org/en/release-0.6/examples/pumps.html y = Stochastic(1,
@modelexpr(theta, t, N,
UnivariateDistribution[
begin
lambda = theta[i] * t[i]
Poisson(lambda)
end
for i in 1:N
]
),
false
) In this example the stochastic node For your pseudo-code you say you don't think it is correct because you think it should take in an array. So is your distribution a multivariate distribution? If so take a look at implementations of Multivariate distributions like the Multinomial: https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/multinomial.jl But if your data is such that |
I am trying to use
Mamba.jl
for a model whose likelihood cannot be expressed in terms of known distribution.For instance:
Think of this a a layer in a hierarchical structure --- but h is not a known kernel of a distribution.
How do I fit this into a model? In all the examples h is a proper distribution.
The text was updated successfully, but these errors were encountered: