Introduce some processor framework ideas #59

Merged — 15 commits merged into master from processor on Apr 12, 2019
Conversation

nsmith- (Member) commented Mar 15, 2019

Introduces utility classes:

  • DataFrame for lazy loading of uproot trees. This should be extended or abstracted for other data delivery mechanisms.
  • Weights to assist with event weights and weight-based uncertainties
  • PackedSelection to assist with evaluating event masks via cuts. This could be extended to awkward with some thought, utilizing MaskedArray.

Also introduces a completely non-functional ABC for a general processor. At some point I'll pull the 'python futures' setup out of the baconbits stuff and make it work with this style.
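A rough sketch of how the Weights and PackedSelection utilities above might be used inside a processor. The import path shown is the modern coffea one (at the time of this PR these classes lived under fnal_column_analysis_tools), and the exact signatures should be treated as approximate:

```python
# Hedged usage sketch: import path and exact signatures are approximate;
# at the time of this PR these classes lived under fnal_column_analysis_tools.
import numpy as np
from coffea.analysis_tools import Weights, PackedSelection

nevents = 100
lead_jet_pt = np.random.exponential(50.0, nevents)

# Weights: accumulate multiplicative event weights plus up/down variations
weights = Weights(nevents)
weights.add(
    "pileup",
    np.ones(nevents),
    weightUp=np.full(nevents, 1.05),
    weightDown=np.full(nevents, 0.95),
)

# PackedSelection: store boolean cuts as bits of a packed integer array
selection = PackedSelection()
selection.add("pt_cut", lead_jet_pt > 30.0)
selection.add("trigger", np.ones(nevents, dtype=bool))

mask = selection.all("pt_cut", "trigger")      # events passing both cuts
nominal = weights.weight()[mask]               # nominal event weights
pileup_up = weights.weight("pileupUp")[mask]   # systematic variation
```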

nsmith- (Member Author) commented Mar 15, 2019

@mcremone @areinsvo some of these systematics tools might be useful for you, possibly with some small modifications to work well with awkward (currently I am using them on baconbits, which are rectangular trees).

nsmith- (Member Author) commented Mar 15, 2019

Oh, and this also fixes a bug in the Clopper-Pearson error bars for ratio plots.
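For reference, a minimal sketch of the standard Clopper-Pearson (exact binomial) interval for an efficiency k/n, which is the construction those error bars use; this is not the exact code changed in this PR:

```python
# Minimal sketch of the Clopper-Pearson interval for k passing out of n total;
# not the exact code touched by this PR, just the standard construction.
import numpy as np
from scipy.stats import beta

def clopper_pearson_interval(k, n, coverage=0.6827):
    """Exact binomial confidence interval for an efficiency k / n."""
    k, n = np.asarray(k, dtype=float), np.asarray(n, dtype=float)
    alpha = 1.0 - coverage
    lo = np.where(k == 0, 0.0, beta.ppf(alpha / 2, k, n - k + 1))
    hi = np.where(k == n, 1.0, beta.ppf(1 - alpha / 2, k + 1, n - k))
    return lo, hi
```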

mcremone (Collaborator) commented

@areinsvo and I have just finished discussing weights. One concept we agreed on is that we need a conceptual distinction between object scale factors (btag SFs, but also ID corrections or even trigger weights*) and event weights (which can be computed starting from object SFs, like btag weights).

Object scale factors would be attributes of objects, possibly retrievable as obj.scale_factor. And they would be jagged...

*Trigger weights are usually computed as event weights, extracted using the leading-pt object. It would be interesting to calculate a trigger weight per object, as if that specific object were the one the event triggered on. In that case we would have a weight per object, making this quantity jagged.
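A hedged illustration of the per-object scale factor idea, written with the modern awkward-array API (which postdates this discussion); the names jet_pt and btag_sf are purely illustrative:

```python
# Illustrative only: per-object scale factors share the jagged structure
# of the objects themselves (modern awkward API, hypothetical names).
import awkward as ak

jet_pt = ak.Array([[45.0, 32.0], [80.0], []])    # jets per event (jagged)
btag_sf = ak.Array([[0.98, 1.02], [0.95], []])   # one scale factor per jet

# a per-object weight has the same per-event counts as the objects
assert ak.num(btag_sf).tolist() == ak.num(jet_pt).tolist()
```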

lgray (Collaborator) commented Mar 15, 2019

@mcremone @areinsvo So I'm not fully convinced that event weights and per-object weights need special, separate treatment. The underlying distinction, to me, is whether you as the analyst will apply a reduction to the object weights to yield an event weight, or whether the weight needs no reduction. A per-object weight is already an event weight if you only select one object, and you need the reducer if you have more than one object. Therefore I claim there is an imprecision in that definition that can lead to confusion or improper application of weights.

We should think of a better set of classifications.
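A minimal sketch of the reduction point above: whether a per-object weight becomes an event weight is a choice the analyst makes by reducing over the objects (again using the modern awkward API purely for illustration):

```python
# Illustrative only: reducing per-object weights to an event weight.
import awkward as ak

btag_sf = ak.Array([[0.98, 1.02], [0.95], []])

# with several selected objects, an explicit reduction (here a product over
# the jets in each event) turns per-object weights into an event weight
event_weight = ak.prod(btag_sf, axis=1)   # [0.9996, 0.95, 1.0]

# with exactly one selected object per event the "reduction" is trivial and
# the per-object weight already is the event weight
```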

areinsvo (Collaborator) commented

(There may be a better place to discuss this, but regardless...) @mcremone, I thought our conclusion was that we need to maintain the ability to produce both per-object weights and per-event weights, where the level of jaggedness is set by the jaggedness of the arguments passed to the lookup table/evaluator. No classification or conceptual distinction is needed on the tools side. After returning the weights (whether jagged or not), we leave it up to the analyzer to decide what to do with them.

@lgray I will make a PR later this weekend to add such a feature in place of the current exception in dense_lookup: https://github.com/CoffeaTeam/fnal-column-analysis-tools/blob/master/fnal_column_analysis_tools/lookup_tools/dense_lookup.py#L33
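A hedged sketch of the dense_lookup usage under discussion; the import path reflects today's coffea layout rather than fnal_column_analysis_tools, and the constructor call is an assumption that may differ from the version linked above:

```python
# Assumed constructor/usage; may differ from the dense_lookup in this PR.
import numpy as np
from coffea.lookup_tools.dense_lookup import dense_lookup

pt_edges = np.array([0.0, 30.0, 60.0, 100.0])   # toy binning in pt
sf_values = np.array([0.90, 0.95, 0.98])        # one scale factor per bin

lookup = dense_lookup(sf_values, pt_edges)      # assumption: edges passed directly for 1D

flat_pt = np.array([25.0, 75.0, 150.0])
flat_sf = lookup(flat_pt)                       # one scale factor per input value
```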

lgray (Collaborator) commented Mar 16, 2019

@areinsvo That exception needs to be there since we cannot pass jagged arrays to numpy.searchsorted, do not change it. If a jagged array is passed in, the evaluator machinery handles flattening the inputs to numpy arrays and re-jaggifying the outputs based on the input structure.

In particular, without that exception there is no guarantee that the higher layers doing the flattening are handling numpy arrays that all share the same jagged structure.

Jagged and non-jagged weights are already handled by all the lookup types.
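A minimal sketch of the flatten / numpy lookup / re-jaggify pattern described above, written with the modern awkward API (the code in question used awkward0); the edges and values arrays are illustrative:

```python
# Illustrative flatten -> numpy.searchsorted -> re-jaggify pattern.
import numpy as np
import awkward as ak

edges = np.array([0.0, 30.0, 60.0, 100.0])
values = np.array([0.90, 0.95, 0.98])

jet_pt = ak.Array([[45.0, 32.0], [80.0], []])

counts = ak.num(jet_pt)                        # remember the jagged structure
flat_pt = ak.to_numpy(ak.flatten(jet_pt))      # searchsorted needs flat numpy input
idx = np.searchsorted(edges, flat_pt, side="right") - 1
idx = np.clip(idx, 0, len(values) - 1)         # clamp under/overflow to edge bins
flat_sf = values[idx]

jet_sf = ak.unflatten(flat_sf, counts)         # restore the per-event structure
```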

lgray mentioned this pull request Mar 16, 2019
lgray (Collaborator) commented Mar 16, 2019

@mcremone @areinsvo Let's continue this discussion over in #60.

lgray (Collaborator) commented Apr 11, 2019

@nsmith- is there a way to get ABCs into python 2 or should we just drop python 2 support?

@mcremone @areinsvo any preference here? I've been on python3 for a long time at this point.
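For reference, one standard way to define an ABC that works under both Python 2 and 3 is to instantiate ABCMeta directly; a hedged sketch, with method names that are illustrative rather than taken from this PR:

```python
# Hedged sketch: a py2/py3-compatible ABC without depending on six.
from abc import ABCMeta, abstractmethod

ABC = ABCMeta("ABC", (object,), {})  # abc.ABC exists only in Python 3

class ProcessorABC(ABC):
    @abstractmethod
    def process(self, df):
        raise NotImplementedError

    @abstractmethod
    def postprocess(self, accumulator):
        raise NotImplementedError
```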

nsmith- (Member Author) commented Apr 11, 2019

Hmm, I think DataFrame needs to be a base class, with spark and uproot having separate implementations. I'm also mixing camelCase and underscores, so it should probably become dataframe.
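A hedged sketch of that base-class split: an abstract dataframe interface with a lazy uproot-backed implementation (class and method names are illustrative, not the eventual design):

```python
# Illustrative only: abstract column-access interface with an uproot backend.
from abc import ABC, abstractmethod

class BaseDataFrame(ABC):
    """Minimal column-access interface; backends decide how columns load."""

    @abstractmethod
    def __getitem__(self, column):
        """Return the materialized array for a column."""

class UprootDataFrame(BaseDataFrame):
    """Lazily reads branches from an uproot tree on first access."""

    def __init__(self, tree):
        self._tree = tree
        self._cache = {}

    def __getitem__(self, column):
        if column not in self._cache:
            # read the branch only when it is first requested
            self._cache[column] = self._tree[column].array()
        return self._cache[column]

# a spark-backed subclass would implement the same interface over a
# pyspark DataFrame instead of an uproot tree
```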

lgray (Collaborator) commented Apr 11, 2019

@nsmith- This hack works for now, imho.

nsmith- (Member Author) commented Apr 11, 2019

Hmm, actually this hack means that you can't have columns called index, stride, etc. I'll split it.

nsmith- (Member Author) commented Apr 11, 2019

Pending the check that this works for spark, this can be merged.

lgray (Collaborator) commented Apr 12, 2019

It works in spark and produces the same file as before.
@nsmith- will do a more thorough check of the output histograms.

For now I'll call this version 0.3.0 and make a new release!

lgray merged commit 7874f05 into master on Apr 12, 2019
nsmith- deleted the processor branch on April 12, 2019 at 22:51
jpata pushed a commit to jpata/coffea that referenced this pull request Jul 22, 2019
Introduce some processor framework ideas