Support pyhf #318
That's why we have the "BAT.jl now offer a different set of functionality [...] C++ predecessor." note in the docs. :-) That being said, we do in principle want to bring the RooFit interface back at some point. It would be an add-on package, due to its additional dependencies. It's not trivial at the moment, though, since Cxx.jl doesn't support current Julia versions (hopefully that will change, there are people working on it). BAT.jl does have experimental support for external likelihoods that run in a separate process and communicate with BAT.jl via pipes - that works without C++ calls from Julia. One of the main reasons we haven't prioritized this is that we haven't had any active use cases with RooFit. If you do plan to use BAT.jl with RooFit, though, we'll be happy to help!
Ok, thanks @oschulz! Can I ask an unrelated question (sorry to abuse the GH issue)? Can one read/write a BAT model from a file? I don't mean the posterior samples, but the model likelihood and prior itself?
Ok, talking with Philip, it sounds like maybe you can write the model to a file with built-in Julia serialization. Looking at https://github.com/JuliaIO/JLD.jl I see
Indeed, JLD2 can serialize most Julia structures natively, but it is not a long-term data-preservation format. BAT itself is intended to work with user-provided likelihoods that may contain arbitrary code and is not limited to a DSL, so it doesn't provide an inherently long-term-stable serialization format. But if there's interest, we could of course support loading specific formats for serialized likelihoods, for example via add-on packages to BAT.
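For comparison, Python's stdlib `pickle` has the same trade-off: it round-trips live objects easily, but the bytes are only loadable where the defining code is available, so it is likewise not a preservation format. A minimal sketch (`ToyModel` is a made-up stand-in, not a real BAT or pyhf type):

```python
import io
import pickle
from dataclasses import dataclass

@dataclass
class ToyModel:
    """Stand-in for a model object; a real likelihood would hold far more state."""
    mu: float
    gamma: float

# Serialize to an in-memory buffer (a file on disk works the same way).
buf = io.BytesIO()
pickle.dump(ToyModel(mu=3.0, gamma=4.0), buf)

# Deserializing requires the ToyModel class to be importable --
# the pickled bytes reference the class by name, not by value.
buf.seek(0)
restored = pickle.load(buf)
```

Same caveat as with JLD2: rename or move `ToyModel` and previously written files stop loading.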
I actually discussed just that with @lukasheinrich today. :-) Having a pyhf import (and possibly re-export from a pyhf DSL in Julia) capability would definitely be great. It would also require some manpower, though (e.g. coupled to a thesis or so), since pyhf is not a tiny spec. It wouldn't necessarily have to be BAT-specific/dependent; one could think about a standalone PyHF Julia package that implements (e.g.) the lightweight DensityInterface API (BAT will support DensityInterface very soon).
CC @mmikhasenko |
this is great.. one of the easiest pyhf models is the following joint model over two Poissons (with s, b, d being hyper-parameters). What would this look like in BAT?
also, as a cross-ref: we used to have a RooFit/Python bridge here: https://github.com/cranmer/roofit-python-wrapper but it could certainly be improved.
In Julia/BAT, it would look like this (for example):

```julia
using Distributions, UnPack, BAT, ValueShapes

likelihood = let n = 7, a = 4, s = 1, b = 2, d = 3
    function (v)
        @unpack μ, γ = v
        ll = logpdf(Poisson(μ*s + γ*b), n) + logpdf(Poisson(γ*d), a)
        LogDVal(ll)
    end
end

v = (μ = 3, γ = 4)
likelihood(v)
```

One could also write an explicit forward model function that returns a distribution over NamedTuples:

```julia
forward_model = let s = 1, b = 2, d = 3
    function (v)
        @unpack μ, γ = v
        NamedTupleDist(
            n = Poisson(μ*s + γ*b),
            a = Poisson(γ*d)
        )
    end
end

data = (n = 7, a = 4)

likelihood = let forward_model = forward_model, data = data
    v -> LogDVal(logpdf(forward_model(v), data))
end

likelihood(v)
```

Of course one would still need to define a prior for `μ` and `γ`.
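For readers following along without a Julia setup, the same joint log-likelihood can be sketched in plain Python (stdlib only; the closed-form Poisson log-pmf stands in for `Distributions.logpdf`):

```python
import math

def poisson_logpmf(k: int, lam: float) -> float:
    # log P(k | lam) = k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_likelihood(mu: float, gamma: float,
                   n: int = 7, a: int = 4,
                   s: float = 1.0, b: float = 2.0, d: float = 3.0) -> float:
    # Joint model over two Poissons, mirroring the Julia example above:
    # n ~ Poisson(mu*s + gamma*b), a ~ Poisson(gamma*d)
    return poisson_logpmf(n, mu * s + gamma * b) + poisson_logpmf(a, gamma * d)

ll = log_likelihood(mu=3.0, gamma=4.0)
```

This should agree numerically with `likelihood((μ = 3, γ = 4))` from the Julia snippet.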
ok, step 0 works :)
is there an easy way to get gradients of the likelihood?
Disclaimer: I'm not familiar with BAT.jl internals, so please do correct me. For gradients, an easy way seems to be
and then to use the reverse-mode AD package Zygote.jl.
To differentiate w.r.t. the parameters
or all together in one sweep
Package versions
awesome.. works nicely! For reference, here's our jax stuff in pyhf,
giving
Thanks for the nice write-up, @niklasschmitz! In addition to Zygote, one can of course also use ForwardDiff (faster with very few parameters, due to lower overhead), Enzyme (though it crashes here - it doesn't like something in the Poisson logpdf; Enzyme is still quite experimental), etc. ForwardDiff can only handle a single array as input, but BAT handles that internally via a ValueShapes shaped-to-flat-array transformation based on the prior. Note: I'm in the process of adding an alternative to the
```julia
using PyCall

pyhf = pyimport("pyhf")

model = pyhf.simplemodels.uncorrelated_background(
    signal=PyVector([12.0, 11.0]), bkg=PyVector([50.0, 52.0]), bkg_uncertainty=PyVector([3.0, 7.0])
)
data = [[51, 48]; model.config.auxdata]

test_mu = 1.0
CLs_obs, CLs_exp = pyhf.infer.hypotest(
    test_mu, data, model, test_stat="qtilde", return_expected=true
)
print("Observed: $CLs_obs, Expected: $CLs_exp")
# Observed: fill(0.052514974238085446), Expected: fill(0.06445320535890237)
```
btw, I think it's useful to have some set of "common examples" to show pyhf <-> BAT, even though not everything can be translated as easily.
Oh, sure! It might still be very useful to have a Julia implementation longer term, though, for deeper integration, use of Julia autodiff, and so on.
But you're right, we can use
the current model is that
so the priors that are needed are
we also provide APIs to only have
Does pyhf export an API for its autodiff pullback?
you can get the pullback via jax, for example, but for now this is backend-specific;
if you want the likelihood gradient, we have more backend-independent APIs
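To illustrate what such a pullback looks like on the jax side, here is a generic `jax.vjp` sketch with a made-up stand-in function (not pyhf's actual API):

```python
import jax
import jax.numpy as jnp

def neg_half_sq(x):
    # Stand-in for a scalar log-likelihood; its gradient is -x.
    return jnp.sum(-0.5 * x**2)

x = jnp.array([1.0, 2.0])
y, vjp_fun = jax.vjp(neg_half_sq, x)  # forward pass plus a pullback closure
(grad_x,) = vjp_fun(1.0)              # cotangent 1.0 recovers the plain gradient
```

Wrapping `vjp_fun` is exactly what an `rrule` on the Julia side would do.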
Neat! We can wrap that in a
@niklasschmitz has some thoughts on that
It'll basically just be something like this:

```julia
using DensityInterface, ChainRulesCore, PyCall

const jax = pyimport("jax")

struct PyHfLikelihood{LT}
    log_likelihood::LT
end

DensityInterface.logdensityof(d::PyHfLikelihood, x::AbstractVector{<:Real}) = d.log_likelihood(x)

function ChainRulesCore.rrule(::typeof(DensityInterface.logdensityof), d::PyHfLikelihood, x::AbstractVector{<:Real})
    y, pypullback = jax.vjp(d.log_likelihood, jax.numpy.array(x))
    function pyhf_pullback(dy)
        dx = @thunk first(pypullback(unthunk(dy)))
        return NoTangent(), NoTangent(), dx
    end
    return y, pyhf_pullback
end
```

Something along these lines should already be a BAT- and Zygote-compatible likelihood density.
re-using

```julia
using PyCall, BAT, Distributions

pyhf = pyimport("pyhf")

# spec is a JSON/Dict per the pyhf standard
pdf_pyhf = pyhf.Model(spec)
data_pyhf = [v_data; pdf_pyhf.config.auxdata]

function likelihood_pyhf(v)
    (;μ, θ) = v
    LogDVal(only(pdf_pyhf.logpdf([μ, θ], data_pyhf)))
end
```

then for BAT

```julia
prior = BAT.NamedTupleDist(
    μ = Uniform(0, 4),
    θ = Normal()
)

posterior = PosteriorDensity(likelihood_pyhf, prior)
best_fit = bat_findmode(posterior).result
```

just works. Of course in this case it's probably
nice! one thing that might be nice to try is to use @philippeller's
also note: the pyhf pdf comes with data-driven constraints that, in a native Bayesian approach, would perhaps be given as priors. I.e. the priors BAT should receive are "ur-priors"; with this more detailed pdf they are likely more uninformative, i.e. θ might rather be a wide normal (with sigma > 1) as the
Still need to figure out how to define the correct rrule; right now it would crash if you try to sample the posterior.
And we should make life easier for the user by looking into the pyhf model to get the order of the variables, so things won't go badly if they are in a different order in the prior.
Hello,
I believe the previous version of BAT had an interface for working with RooFit likelihoods (that can be stored in a RooFit workspace in a ROOT file). I am curious if there are plans to provide such a module, perhaps in a separate repository. I know there is a generic API for likelihoods.
If I am correct that this was supported in the previous version, it might be worth mentioning the current status of this type of interface in BAT.jl.
Thanks!