pynfold
.fold
An interface for pythonic solutions to the inverse problem also known as unfolding.
pynfold.fold
(type, response, x_shape, y_shape = None,
data = None, data_hist = None, data_bins = None)
All parameters are optional but cannot return unless sufficient information is supplied.
-
Parameters:
- type:
- Naive (naive) - Matrix Inversion, Correction Factors
- Dimensionality - Truncated SVD, Richardson-Lucy/D'Agostini/Iterative/'Bayesian'
- Composite - Tikonov/damped LSQ, Fully Bayesian, RUN
- response
- a two dimensional mapping true:measured in lists of numpy arrays
- can also be set using
fold.set_response(xy)
- a response matrix can be supplied using
fold.set_response_matrix(A)
- a response function can be supplied using
fold.set_response_function(foo)
this is evaluated for a uniform distribution over the required range.
- x_shape - discretisation of function in x. In Nystrom methods these are the abscissas for the quadrature rule and in Galerkin methods the span of the basis vectors. This is the desired shape and should be of lower dimensionality then the
y_shape
. - y_shape - discretisation of function in x. In Nystrom methods these are the abscissas for the quadrature rule and in Galerkin methods the span of the basis vectors. If a histogram of data points is supplied then the shape is taken to match this..
- data - the distribution to unfold.
- data - a data vector which can be discretised by either Nystrom or Galerkin methods.
- data_hist - a vector of bin contents e.g. the outcome of a particle detector experiment with implicit Nystrom discretisation.
- data_bins - shape of data vector.
y_shape
is matched to this shape.
- type:
-
Methods:
- Fill(x, y), Miss(x) - set response manually for individual events (float) x and y. (RooUnfold Legacy mode)
- response_matrix()
- returns A - a matrix of probabilities of dim(m, n) = dim(y), dim(x)
- Unfold() - Performs deconvolution of data given the response according to specified algorithm.
- h_reco()
- Returns g - a histogram of shape
x_shape
estimated from theUnfold()
step
- Returns g - a histogram of shape
- b_reco()
- Returns g - a b-spline of with knots as
x_shape
estimated from theUnfold()
step.
- Returns g - a b-spline of with knots as
Prof. Dr. Volker Blobel compiled a list of rules of thumb to be used when approaching the inverse problem in high energy physics.
- Use at least 100 times more events in your response (MC) than you have data if possible.
- To avoid an underconstrained system you should use at least twice as many data points(bins) as you have estimators (truth bins).
- The response, though independent from the data, should be derived over a distribution as close to the target distribution as possible. (The uniform distribution used when a
response_function
is set to the fold is rarely optimal.)
Naive - Matrix Inversion
,Composite - damped LSQ
,Dimensionality - Richardson-Lucy
andDimensionality - Truncated SVD
all available using RooUnfold type settings.HistogramPdf
for Nystrom discretisation implemented but lacking functor+minimiser to fit a pseudo inverse.response_matrix
is currently the only available kernel. Weighted polynomials (as used in RUN) and kde under preparation.- Fully Bayesian Unfolding was available under
version 1.6
but has been temporarilly removed to avoid dependency issues.
Fit mode
- construct the likelihood for the estimators and retrieve full minos intervals.spline-mode
- discretise data and reconstructed functions using b-splies as basis functions in Galerkin expansion.Regularisation Optimisation
- Control/Visualise the effect of varying regularisation strengths.- Background estimates, Systematics and multidimensional distributions
A typical smearing function:
def smear(xt):
xeff = 0.3 + (1.0 - 0.3) / 20. * (xt + 10.0) # efficiency
x = np.random.rand()
if x > xeff:
return None
xsmear = np.random.normal(-2.5, 0.2)
return xt + xsmear
define data and response
x = np.random.normal(0.3, 2.5,1000)
data = np.array([smear(i) for i in x])
data_hist, bins = np.histogram(data[data != np.array(None)], 10)
now define a simple default fold and check each input in tern.
f = fold()
f.set_data_hist(data_hist, bins)
f.set_response(np.array([x,data]))
though the 'true' distribution isn't used in the Unfolding calculation, the optimised binning is useful.
true_hist, x_bins = np.histogram(x, 5)
f.set_x_shape(shape=x_bins)
our response is obtained as:
print f.response_matrix()
and we can test the output in forward mode.
print data_hist
print 'should equal'
print true_hist
print 'times'
print f.response_matrix()
perform calculation
A = f.response_matrix()
f = np.asarray(true_hist)
print 'let us try it!'
print f*A
And now in inverse/unfolded mode
reco, var = f.Unfold()
in this case the reco vector is the nystrom discretisation of weights for the 'true' function f(x) or the reconstructed histogram.
such that f
should equal reco
if the number of events are the same (e.g. 100 times less events means scaling f by 0.01)
Specifying regularisation strength for some algorithms the regularisation strength needs to be supplied e.g. RL - n iterations, tikonov - lagrange mulitplier etc) this is done dynamically.
f = fold('composite', response= np.array([x,data]), x_shape= x_bins, data_hist= data_hist, data_bins= bins)
reco, var = f.Unfold(.25)