Introduce some processor framework ideas #59
Oh, and this also fixes a bug in the Clopper-Pearson error bars for ratio plots.
@areinsvo and I have just finished discussing weights. One point we agreed on is that we need a conceptual distinction between object scale factors (b-tag SFs, but also ID corrections or even trigger weights*) and event weights (which can be calculated starting from object SFs, like b-tag weights). Object scale factors would be attributes of objects, possibly retrievable as obj.scale_factor, and they would be jagged... *Trigger weights are usually computed as event weights, using the leading-pt object for the extraction. It would be interesting to calculate a trigger weight per object, treating that specific object as the one the event was triggered on. In this case we would have a weight per object, making this quantity jagged.
@mcremone @areinsvo So I'm not fully convinced that event weights and per-object weights need to be treated as separate special cases. The underlying distinction, to me, is whether you as the analyst are going to apply a reduction to the object weights to yield an event weight, or whether the weight needs no reduction. However, a per-object weight is already an event weight if you only select one object, and you need the reducer if you have more than one object. Therefore I claim that your definition contains an imprecision that can lead to confusion or improper application of weights. We should think of a better set of classifications.
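To make the "reduction" idea concrete, here is a minimal NumPy-only sketch, not code from this PR. It assumes the jagged per-object scale factors are stored as a flat array plus per-event counts, and it assumes a product is the reducer the analyst wants. The names `counts`, `flat_sf`, and `event_weight` are hypothetical.

```python
import numpy as np

# Assumed layout: 3 events holding 2, 0, and 1 selected objects.
counts = np.array([2, 0, 1])
# Per-object scale factors, flattened in event order (illustrative values).
flat_sf = np.array([0.9, 1.1, 0.95])

# Map every object back to the event it belongs to.
event_index = np.repeat(np.arange(len(counts)), counts)

# Reduce per-object weights to one weight per event with a product.
# Events with no objects keep the neutral weight 1.0.
event_weight = np.ones(len(counts))
np.multiply.at(event_weight, event_index, flat_sf)
```

With exactly one selected object per event, the reduction is a no-op and the per-object weight is already the event weight, which is the point made above.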
(There may be a better place to discuss this, but regardless...) @mcremone, I thought our conclusion was that we need to maintain the ability to do both per-object weights and per-event weights, where the level of jaggedness is set by the jaggedness of the args passed to the lookup table/evaluator. No classification or conceptual distinction is needed on the side of the tools. After returning the weights (whether jagged or non-jagged), we leave it up to the analyzer to decide what to do with them. @lgray I will make a PR later this weekend to add such a feature in place of the current exception in dense lookup: https://github.com/CoffeaTeam/fnal-column-analysis-tools/blob/master/fnal_column_analysis_tools/lookup_tools/dense_lookup.py#L33
@areinsvo That exception needs to be there, since we cannot pass jagged arrays to numpy.searchsorted. Do not change it. If a jagged array is passed in, the evaluator machinery handles flattening the inputs to numpy arrays and re-jaggifying the outputs based on the input structure. In particular, without that exception there is no guarantee that the higher layers doing the flattening are handing over numpy arrays that all share the same jagged shape. Jagged and non-jagged weights are already handled by all the lookup types.
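For readers following along, the flatten / look up / re-jaggify pattern described here can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the evaluator code in the repository: the jagged input is modelled as a list of per-event arrays, and `edges`, `values`, and `jagged_pt` are made-up names and numbers.

```python
import numpy as np

# Hypothetical 1D dense lookup: bin edges in pt and one value per bin.
edges = np.array([20.0, 50.0, 100.0, 200.0])
values = np.array([0.90, 0.95, 1.00])

# Jagged input modelled as one array per event (an assumption of this sketch).
jagged_pt = [np.array([30.0, 60.0]), np.array([]), np.array([120.0])]

# 1. Flatten, remembering how many objects each event had.
counts = np.array([len(event) for event in jagged_pt])
flat_pt = np.concatenate(jagged_pt)

# 2. The lookup itself only ever sees a flat numpy array.
bin_index = np.searchsorted(edges, flat_pt, side="right") - 1
bin_index = np.clip(bin_index, 0, len(values) - 1)  # clamp under/overflow
flat_out = values[bin_index]

# 3. Re-jaggify the output using the input structure.
jagged_out = np.split(flat_out, np.cumsum(counts)[:-1])
```

The output has the same per-event structure as the input, so whether the result is "per-object" or "per-event" follows from what was passed in.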
Hmm I think
@nsmith- This hack works for now, imho.
hmm actually this hack means that you can't have columns called
Pending the check that this works for spark, this can be merged.
And a stub for condor
It works in spark and produces the same file as before. For now I'll call this version 0.3.0 and make a new release!
Introduce some processor framework ideas
Introduces utility classes:
- `DataFrame` for lazy loading of uproot trees. This should be extended or abstracted for other data delivery mechanisms.
- `Weights` to assist with event weights and weight-based uncertainties
- `PackedSelection` to assist with evaluating event masks via cuts. This could be extended to awkward with some thought, utilizing MaskedArray.

Also introduces a completely non-functional ABC for a general processor. At some point I'll pull the 'python futures' setup out of the baconbits stuff and make it work with this style.
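As a rough illustration of evaluating event masks via cuts, the sketch below packs each named boolean cut into one bit of a per-event integer and then tests a required set of cuts in a single comparison. It is a hypothetical stand-in written for this note: the class name `TinyPackedSelection` and its `add` and `all` methods are assumptions, not the interface introduced by this PR.

```python
import numpy as np


class TinyPackedSelection:
    """Hypothetical sketch: one bit per named cut, one uint64 per event."""

    def __init__(self):
        self._names = []
        self._bits = None

    def add(self, name, mask):
        mask = np.asarray(mask, dtype=bool)
        if self._bits is None:
            self._bits = np.zeros(mask.shape, dtype=np.uint64)
        if mask.shape != self._bits.shape:
            raise ValueError("all cuts must have one entry per event")
        if len(self._names) >= 64:
            raise ValueError("this sketch only holds 64 cuts")
        # Shift the boolean into the bit position reserved for this cut.
        self._bits |= mask.astype(np.uint64) << np.uint64(len(self._names))
        self._names.append(name)

    def all(self, *names):
        # Build the bit pattern of the required cuts, then compare once.
        required = np.uint64(0)
        for name in names:
            required |= np.uint64(1) << np.uint64(self._names.index(name))
        return (self._bits & required) == required


sel = TinyPackedSelection()
sel.add("pt", [True, True, False, True])
sel.add("eta", [True, False, False, True])
sel.add("trigger", [False, True, True, True])

pass_kinematics = sel.all("pt", "eta")
pass_everything = sel.all("pt", "eta", "trigger")
```

Packing the cuts this way keeps every cut result available, so different cut combinations can be evaluated later without recomputing the individual masks.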