Support pyhf #318
That's why we have the "BAT.jl now offer a different set of functionality [...] C++ predecessor." note in the docs. :-) That being said, we do in principle want to bring the RooFit interface back at some point. It would be an add-on package, due to its additional dependencies. It's not trivial at the moment, though, since Cxx.jl doesn't support current Julia versions (hopefully that will change, there are people working on it). BAT.jl does have experimental support for external likelihoods that run in a separate process and communicate with BAT.jl via pipes - that works without C++ calls from Julia. One of the main reasons we haven't prioritized this is that we haven't had any active use cases with RooFit. If you do plan to use BAT.jl with RooFit, though, we'll be happy to help!
Ok, thanks @oschulz! Can I ask an unrelated question (sorry to abuse the GH issue)? Can one read/write a BAT model from a file? I don't mean the posterior samples, but the model likelihood and prior itself?
Ok, talking with Philip, it sounds like maybe you can write the model to a file with built-in Julia serialization. Looking at https://github.com/JuliaIO/JLD.jl I see
Indeed, JLD2 can serialize most Julia structures natively, but it is not a long-term data-preservation format. BAT itself is intended to work with user-provided likelihoods that may contain arbitrary code and is not limited to a DSL, so it doesn't provide an inherently long-term-stable serialization format. But if there's interest, we could of course support loading specific formats for serialized likelihoods, for example via add-on packages to BAT.
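For comparison, Python's stdlib `pickle` has the same trade-off: it round-trips live objects easily, but the bytes are only loadable where the defining code is available, so it is likewise not a preservation format. A minimal sketch (`ToyModel` is a made-up stand-in, not a real BAT or pyhf type):

```python
import io
import pickle
from dataclasses import dataclass

@dataclass
class ToyModel:
    """Stand-in for a model object; a real likelihood would hold far more state."""
    mu: float
    gamma: float

# Serialize to an in-memory buffer (a file on disk works the same way).
buf = io.BytesIO()
pickle.dump(ToyModel(mu=3.0, gamma=4.0), buf)

# Deserializing requires the ToyModel class to be importable --
# the pickled bytes reference the class by name, not by value.
buf.seek(0)
restored = pickle.load(buf)
```

Same caveat as with JLD2: rename or move `ToyModel` and previously written files stop loading.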
I actually discussed just that with @lukasheinrich today. :-) Having a pyhf import (and possibly re-export from a pyhf DSL in Julia) capability would definitely be great. It would also require some manpower, though (e.g. coupled to a thesis or so), since pyhf is not a tiny spec. It wouldn't necessarily have to be BAT-specific/dependent; one could think about a standalone PyHF Julia package that implements (e.g.) the lightweight DensityInterface API (BAT will support DensityInterface very soon).
CC @mmikhasenko |
this is great.. one of the easiest pyhf models is the following joint model over two Poissons (with s, b, d being hyper-parameters). What would this look like in BAT?
also, as a cross-ref: we used to have a RooFit/Python bridge here: https://github.com/cranmer/roofit-python-wrapper but it could certainly be improved.
In Julia/BAT, it would look like this (for example):

```julia
using Distributions, UnPack, BAT, ValueShapes

likelihood = let n = 7, a = 4, s = 1, b = 2, d = 3
    function (v)
        @unpack μ, γ = v
        ll = logpdf(Poisson(μ*s + γ*b), n) + logpdf(Poisson(γ*d), a)
        LogDVal(ll)
    end
end

v = (μ = 3, γ = 4)
likelihood(v)
```

One could also write an explicit forward model function that returns a distribution over NamedTuples:

```julia
forward_model = let s = 1, b = 2, d = 3
    function (v)
        @unpack μ, γ = v
        NamedTupleDist(
            n = Poisson(μ*s + γ*b),
            a = Poisson(γ*d)
        )
    end
end

data = (n = 7, a = 4)

likelihood = let forward_model = forward_model, data = data
    v -> LogDVal(logpdf(forward_model(v), data))
end

likelihood(v)
```

Of course one would still need to define a prior for `μ` and `γ`.
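For readers following along without a Julia setup, the same joint log-likelihood can be sketched in plain Python (stdlib only; the closed-form Poisson log-pmf stands in for `Distributions.logpdf`):

```python
import math

def poisson_logpmf(k: int, lam: float) -> float:
    # log P(k | lam) = k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_likelihood(mu: float, gamma: float,
                   n: int = 7, a: int = 4,
                   s: float = 1.0, b: float = 2.0, d: float = 3.0) -> float:
    # Joint model over two Poissons, mirroring the Julia example above:
    # n ~ Poisson(mu*s + gamma*b), a ~ Poisson(gamma*d)
    return poisson_logpmf(n, mu * s + gamma * b) + poisson_logpmf(a, gamma * d)

ll = log_likelihood(mu=3.0, gamma=4.0)
```

This should agree numerically with `likelihood((μ = 3, γ = 4))` from the Julia snippet.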
ok, step 0 works :)
is there an easy way to get gradients of the likelihood?
Disclaimer: I'm not familiar with BAT.jl internals, so please do correct me. For gradients, an easy way seems to be
and then to use the reverse-mode AD package Zygote.jl.
To differentiate w.r.t. the parameters
or all together in one sweep
Package versions
awesome.. works nicely! For reference, here's our jax stuff in pyhf,
giving
Thanks for the nice write-up, @niklasschmitz! In addition to Zygote, one can of course also use ForwardDiff (faster with very few parameters, due to lower overhead), Enzyme (though it crashes here - it doesn't like something in the Poisson logpdf; Enzyme is still quite experimental), etc. ForwardDiff can only handle a single array as input, but BAT handles that internally via a ValueShapes shaped-to-flat-array transformation based on the prior. Note: I'm in the process of adding an alternative to the
```julia
using PyCall

pyhf = pyimport("pyhf")

model = pyhf.simplemodels.uncorrelated_background(
    signal=PyVector([12.0, 11.0]), bkg=PyVector([50.0, 52.0]), bkg_uncertainty=PyVector([3.0, 7.0])
)
data = [[51, 48]; model.config.auxdata]

test_mu = 1.0
CLs_obs, CLs_exp = pyhf.infer.hypotest(
    test_mu, data, model, test_stat="qtilde", return_expected=true
)
print("Observed: $CLs_obs, Expected: $CLs_exp")
# Observed: fill(0.052514974238085446), Expected: fill(0.06445320535890237)
```
btw, I think it's useful to have some set of "common examples" to show pyhf <-> BAT, even though not everything can be translated as easily.
Oh, sure! It might still be very useful to have a Julia implementation longer term, though, for deeper integration, use of Julia autodiff, and so on.
But you're right, we can use
the current model is that
so the priors that are needed are
we also provide APIs to only have
Does pyhf export an API for its autodiff pullback?
you can get the pullback via jax, for example, but for now this is backend-specific;
if you want the likelihood gradient, we have more backend-independent APIs
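To illustrate what such a pullback looks like on the jax side, here is a generic `jax.vjp` sketch with a made-up stand-in function (not pyhf's actual API):

```python
import jax
import jax.numpy as jnp

def neg_half_sq(x):
    # Stand-in for a scalar log-likelihood; its gradient is -x.
    return jnp.sum(-0.5 * x**2)

x = jnp.array([1.0, 2.0])
y, vjp_fun = jax.vjp(neg_half_sq, x)  # forward pass plus a pullback closure
(grad_x,) = vjp_fun(1.0)              # cotangent 1.0 recovers the plain gradient
```

Wrapping `vjp_fun` is exactly what an `rrule` on the Julia side would do.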
Neat! We can wrap that in a
@niklasschmitz has some thoughts on that
It'll basically just be something like this:

```julia
using DensityInterface, ChainRulesCore, PyCall

const jax = pyimport("jax")

struct PyHfLikelihood{LT}
    log_likelihood::LT
end

DensityInterface.logdensityof(d::PyHfLikelihood, x::AbstractVector{<:Real}) = d.log_likelihood(x)

function ChainRulesCore.rrule(::typeof(DensityInterface.logdensityof), d::PyHfLikelihood, x::AbstractVector{<:Real})
    y, pypullback = jax.vjp(d.log_likelihood, jax.numpy.array(x))
    function pyhf_pullback(dy)
        dx = @thunk first(pypullback(unthunk(dy)))
        return NoTangent(), NoTangent(), dx
    end
    return y, pyhf_pullback
end
```

Something along these lines should already be a BAT- and Zygote-compatible likelihood density.
re-using

```julia
using PyCall, BAT, Distributions

pyhf = pyimport("pyhf")

# spec is a JSON/Dict per the pyhf standard
pdf_pyhf = pyhf.Model(spec)
data_pyhf = [v_data; pdf_pyhf.config.auxdata]

function likelihood_pyhf(v)
    (;μ, θ) = v
    LogDVal(only(pdf_pyhf.logpdf([μ, θ], data_pyhf)))
end
```

then for BAT

```julia
prior = BAT.NamedTupleDist(
    μ = Uniform(0, 4),
    θ = Normal()
)

posterior = PosteriorDensity(likelihood_pyhf, prior)
best_fit = bat_findmode(posterior).result
```

just works. Of course in this case it's probably
nice! one thing that might be nice to try is to use @philippeller's
also note: the pyhf pdf comes with data-driven constraints that, in a native Bayesian approach, would perhaps be given as priors. I.e. the priors BAT should receive are "ur-priors"; with this more detailed pdf they are likely more uninformative, i.e. θ might rather be a wide normal (with sigma > 1) as the
Still need to figure out how to define the correct rrule; right now it would crash if you try to sample the posterior.
And we should make life easier for the user by looking into the pyhf model to get the order of the variables, so things won't go badly if they are in a different order in the prior.
Hello,
I believe the previous version of BAT had an interface for working with RooFit likelihoods (that can be stored in a RooFit workspace in a ROOT file). I am curious if there are plans to provide such a module, perhaps in a separate repository. I know there is a generic API for likelihoods.
If I am correct that this was supported in the previous version, it might be worth mentioning the current status of this type of interface in BAT.jl.
Thanks!