As of coffea v0.7.16 one now receives a warning on import of `coffea.hist`.

This discussion is meant to collect tips for migrating from `coffea.hist` to `hist`, and I've started by noting changes that were needed in the original coffea histogram tutorial. Feel free to add replies with additional tips and I'll incorporate them here!

Construction
A quick guide to the name changes in the constructor:
| `coffea.hist.` | `hist.` | Notes |
| --- | --- | --- |
| `Hist(...)` | `Hist(...)` | A name or label can be specified as a keyword argument, e.g. `Hist(..., name="Counts")`, and later accessed using the corresponding attribute (e.g. `h.name`). 1D plots (e.g. `h.plot1d()`) do not seem to use the name as a y-axis label. |
| `Cat("name", "label")` | `axis.StrCategory([], growth=True, name="name", label="label")` | To use a fixed list of categories, pass them in the list and leave the `growth` argument at its default, False. The name and label appear at the end as (optional) keyword arguments. |
| `Bin("name", "label", 10, 0.0, 1.0)` | `axis.Regular(10, 0.0, 1.0, name="name", label="label")` | |
| `Bin("name", "label", [0.0, 1.0, 3.0])` | `axis.Variable([0.0, 1.0, 3.0], name="name", label="label")` | |
| `h1.compatible(h2)` | `h1.axes == h2.axes` | |
In coffea, `weight` was a protected keyword and was not usable as an axis name. In hist, both `weight` and `sample` are protected (the latter is used e.g. for `Mean()` accumulator types).

In coffea, `Cat` axes acted as sparse axes, whereas all axes in hist are dense. This means that if you have several categorical axes in your histogram where the number of combinations of values is much less than the full outer product of axis values, the storage requirements for hist may be significantly larger than for coffea. So it is advisable not to use e.g. dataset as an axis in hist, but rather to store a dictionary mapping dataset name to a replica of a hist object. Alternatively there are Stack objects.

Tip: Some space savings can be found by turning off flow bins in axes where they are not needed. For example, a neural network output might be bound between [0, 1] and so an appropriate binning might be `hist.axis.Regular(20, 0, 1, flow=False)`. The savings multiply as the number of dimensions grows.

Hist introduces a new "quick-construct" utility based on the method chaining idiom. For a histogram defined analogously to the one in the coffea histograms tutorial notebook:
we could also construct it with:
Of course, the character count savings are more pronounced for quick checks where axis names and labels are omitted.
Filling
Hist supports filling with identical parameters to coffea. The example in coffea histograms would be unchanged:
In addition, hist can also fill histograms by positional arguments, i.e. if you specify the arguments in the same order as in the constructor, you can fill without naming the axes:
Both libraries accept a `weight=` argument when filling. For hist, one has to opt in explicitly during construction by using the `weight` storage type (as opposed to the default `double`) in order to keep track of the sum of squared weights.

Transformation
Below are some possible transformations that can be done on the above-defined histogram `histo`. Many of the methods have direct analogs. Some are a bit more opaque as they utilize the Unified Histogram Indexing (UHI) syntax.

| coffea | hist | Notes |
| --- | --- | --- |
| `histo.sum("z")` | `histo[{"z": slice(0, 20, sum)}]` | The built-in `len` is a nice shorthand for specifying the last+1 bin index (`20` in this case, also `hist.overflow-1` works). One can also use positional arguments in the slicing, e.g. `histo[:, :, :, 0:len:sum]` or `histo[..., 0:20:sum]`. As an alternative to using `slice()` directly, one can define `s = hist.tag.Slicer()` and then use the usual slice syntax on the `s` object: `histo[{"z": s[0:len:sum]}]`. |
| `histo.sum("z", overflow='over')` | `histo[{"z": slice(0, hist.overflow, sum)}]` | |
| `histo.sum("z", overflow='all')` | `histo[{"z": sum}]` | In hist, `float("NaN")` entries will end up in the overflow bin. |
| `histo.sum("z", overflow='allnan')` | `histo[{"z": sum}]` | |
| `histo[:, 0:, 4:, 0:]` | `histo[:, hist.loc(0.0):, 4.0j:, 0.0j:]` | Slicing by axis value rather than by bin index requires the `hist.loc()` tag. A shorthand for `hist.loc` is to specify the bin edge as a pure-imaginary complex number using the python syntax `4.0j` where `j` is the imaginary unit. |
| `histo.integrate("y", slice(0, 10))` | `histo[{"y": slice(0.0j, 10.0j, sum)}]` | |
| `histo.project("y", "z", overflow="allnan")` | `histo.project("y", "z")` | Projection in hist always includes the flow bins, as with the coffea `allnan` option. The ordering of arguments allows one to permute the axis positions in hist, and they can also be specified by integer index instead of name. |
| `histo.rebin("z", hist.Bin("znew", "rebinned z value", [-10, -6, 6, 10]))` | | |
| `histo.rebin("z", 2)` | `histo[..., ::hist.rebin(2)]` | |
| `histo.scale(3.)` | `histo *= 3.0` | `histo*3.0` returns a scaled copy instead. |
| `histo.identifiers('samp')` | `histo.axes["samp"]` | e.g. `list(histo.axes["samp"]) == ['sample 1', 'sample 2']` |
| `histo.identifiers('x')` | `histo.axes["x"].edges` | To loop over bin boundaries, use `for lo, hi in zip(edges[:-1], edges[1:])`. The centers are also available. |
| `histo.group(...)` | | |
| `histo.values(overflow: str)` | `histo.values(flow: bool)` | In `hist` you can only choose either none or all of the overflow bins. Instead of a dictionary mapping identifiers on the sparse axes to the dense axes, in `hist` you get a rectangular numpy array, where all the axes are dense. You can use the slicing syntax to reduce the number of axes beforehand. Note that in `hist` there is also a `histo.view()` that provides read-write support so you can use it to update entries. |

Scaling a categorical axis in-place (e.g. for dataset luminosity normalization) changes from
to
where `histo.view(flow=True)` provides a read-write view into the histogram's storage with numpy semantics, similar to coffea `histo.values()`. Note that if the `samp` axis was not the first one, the numpy-style slice `histo.view()[i]` would have to change.

Saving
As was the case for coffea, hist histograms can be pickled. In addition, hist histograms are much more compatible with uproot4 writing. For example, one can straightaway write any histogram with between 1 and 3 axes as the corresponding TH* (TH1, TH2, TH3). So our above `histo` could be projected by sample and written as a TH3:

which prints
[('histo;1', <ReadOnlyDirectory '/histo' at 0x00011ff9c2e0>), ('histo/sample 1;1', <TH3D (version 4) at 0x00011ff233d0>), ('histo/sample 2;1', <TH3D (version 4) at 0x00011ff9df10>)]
Plotting
The mplhep package helps interface matplotlib with hist in a similar way as it does for coffea.hist. One can use `mplhep` directly or go through convenience functions like `plot1d`, etc. Some of them:

| coffea | hist | Notes |
| --- | --- | --- |
| `hist.plot1d(histo.sum("x", "y"), overlay='sample');` | | Plots the `z` distribution per sample without stacking or fill. |
| `hist.plot1d(histo.sum("x", "y"), overlay='sample', stack=True);` | `histo[{"x": sum, "y": sum}].plot1d(overlay="samp", histtype="fill", stack=True);` | |
| `hist.plot2d(histo.sum('x', 'sample'), xaxis='y');` | `histo[{"x": sum, "samp": sum}].plot2d();` | The axes can be swapped using `project()` with the appropriate index permutation, e.g. `histo[{"x": sum, "samp": sum}].project(1, 0).plot2d();` would put the z axis on the horizontal. |
| `hist.plotgrid(...)` | | |
| `hist.plotratio(...)` | | |
Styling
Many of the styling examples carry over directly, since they modify matplotlib attributes. It is worth noting that mplhep now has stylesheets for all four major LHC experiments.
Use within processors
To accommodate the fact that `hist.Hist` does not (and will not) subclass `AccumulatorABC`, a change to the accumulator semantics was introduced in coffea v0.7.2 to allow a more flexible definition. A full processor modernization guide will be put in a separate discussion, but it boils down to replacing any `defaultdict_accumulator({"myhist": coffea.hist.Hist(...)})` objects with simple dictionaries of hist objects, `{"myhist": hist.Hist(...)}`. As for the other types of accumulators, anything that natively supports `__add__` (i.e. floats, ints, etc.) is natively supported now. For example, a plain number stored in the dictionary returned by a `ProcessorABC.process` method is now sufficient to track the sum of weights.

As a consequence, `ProcessorABC.accumulator` and any use of `AccumulatorABC.identity()` are deprecated. To see more details of the new accumulator semantics, check out https://coffeateam.github.io/coffea/notebooks/accumulators.html#Accumulator-semantics