Introduce some processor framework ideas #59

Merged — 15 commits merged into master from processor on Apr 12, 2019
Conversation

nsmith- (Member) commented Mar 15, 2019

Introduces utility classes:

  • DataFrame for lazy loading of uproot trees. This should be extended or abstracted for other data delivery mechanisms.
  • Weights to assist with event weights and weight-based uncertainties
  • PackedSelection to assist with evaluating event masks via cuts. This could be extended to awkward with some thought, utilizing MaskedArray.

Also introduces a completely non-functional ABC for a general processor. At some point I'll pull the 'python futures' setup out of the baconbits stuff and make it work with this style.
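A rough sketch of how the Weights and PackedSelection utilities above might be used inside a processor. The import path shown is the modern coffea one (at the time of this PR these classes lived under fnal_column_analysis_tools), and the exact signatures should be treated as approximate:

```python
# Hedged usage sketch: import path and exact signatures are approximate;
# at the time of this PR these classes lived under fnal_column_analysis_tools.
import numpy as np
from coffea.analysis_tools import Weights, PackedSelection

nevents = 100
lead_jet_pt = np.random.exponential(50.0, nevents)

# Weights: accumulate multiplicative event weights plus up/down variations
weights = Weights(nevents)
weights.add(
    "pileup",
    np.ones(nevents),
    weightUp=np.full(nevents, 1.05),
    weightDown=np.full(nevents, 0.95),
)

# PackedSelection: store boolean cuts as bits of a packed integer array
selection = PackedSelection()
selection.add("pt_cut", lead_jet_pt > 30.0)
selection.add("trigger", np.ones(nevents, dtype=bool))

mask = selection.all("pt_cut", "trigger")      # events passing both cuts
nominal = weights.weight()[mask]               # nominal event weights
pileup_up = weights.weight("pileupUp")[mask]   # systematic variation
```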

nsmith- (Member Author) commented Mar 15, 2019

@mcremone @areinsvo some of these systematics tools might be useful for you, possibly with some small modifications to work well with awkward (currently I am using them on baconbits, which are rectangular trees).

nsmith- (Member Author) commented Mar 15, 2019

Oh, and this also fixes a bug in the Clopper-Pearson error bars for ratio plots.
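For reference, a minimal sketch of the standard Clopper-Pearson (exact binomial) interval for an efficiency k/n, which is the construction those error bars use; this is not the exact code changed in this PR:

```python
# Minimal sketch of the Clopper-Pearson interval for k passing out of n total;
# not the exact code touched by this PR, just the standard construction.
import numpy as np
from scipy.stats import beta

def clopper_pearson_interval(k, n, coverage=0.6827):
    """Exact binomial confidence interval for an efficiency k / n."""
    k, n = np.asarray(k, dtype=float), np.asarray(n, dtype=float)
    alpha = 1.0 - coverage
    lo = np.where(k == 0, 0.0, beta.ppf(alpha / 2, k, n - k + 1))
    hi = np.where(k == n, 1.0, beta.ppf(1 - alpha / 2, k + 1, n - k))
    return lo, hi
```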

mcremone (Collaborator) commented

@areinsvo and I have just finished discussing weights. One concept we agreed on is that we need a conceptual distinction between object scale factors (btag SFs, but also ID corrections or even trigger weights*) and event weights (which can be computed starting from object SFs, like btag weights).

Object scale factors would be attributes of objects, possibly retrievable as obj.scale_factor. And they would be jagged...

*Trigger weights are usually computed as event weights, extracted using the leading-pt object. It would be interesting to calculate a trigger weight per object, as if that specific object were the one the event triggered on. In that case we would have a weight per object, making this quantity jagged.
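A hedged illustration of the per-object scale factor idea, written with the modern awkward-array API (which postdates this discussion); the names jet_pt and btag_sf are purely illustrative:

```python
# Illustrative only: per-object scale factors share the jagged structure
# of the objects themselves (modern awkward API, hypothetical names).
import awkward as ak

jet_pt = ak.Array([[45.0, 32.0], [80.0], []])    # jets per event (jagged)
btag_sf = ak.Array([[0.98, 1.02], [0.95], []])   # one scale factor per jet

# a per-object weight has the same per-event counts as the objects
assert ak.num(btag_sf).tolist() == ak.num(jet_pt).tolist()
```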

lgray (Collaborator) commented Mar 15, 2019

@mcremone @areinsvo So I'm not fully convinced that event weights and per-object weights need special, separate treatment. The underlying distinction, to me, is whether you as the analyst will apply a reduction to the object weights to yield an event weight, or whether the weight needs no reduction. A per-object weight is already an event weight if you only select one object, and you need the reducer if you have more than one object. Therefore I claim there is an imprecision in that definition that can lead to confusion or improper application of weights.

We should think of a better set of classifications.
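A minimal sketch of the reduction point above: whether a per-object weight becomes an event weight is a choice the analyst makes by reducing over the objects (again using the modern awkward API purely for illustration):

```python
# Illustrative only: reducing per-object weights to an event weight.
import awkward as ak

btag_sf = ak.Array([[0.98, 1.02], [0.95], []])

# with several selected objects, an explicit reduction (here a product over
# the jets in each event) turns per-object weights into an event weight
event_weight = ak.prod(btag_sf, axis=1)   # [0.9996, 0.95, 1.0]

# with exactly one selected object per event the "reduction" is trivial and
# the per-object weight already is the event weight
```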

areinsvo (Collaborator) commented

(There may be a better place to discuss this, but regardless...) @mcremone, I thought our conclusion was that we need to maintain the ability to produce both per-object weights and per-event weights, where the level of jaggedness is set by the jaggedness of the arguments passed to the lookup table/evaluator. No classification or conceptual distinction is needed on the tools side. After returning the weights (whether jagged or not), we leave it up to the analyzer to decide what to do with them.

@lgray I will make a PR later this weekend to add such a feature in place of the current exception in dense_lookup: https://github.com/CoffeaTeam/fnal-column-analysis-tools/blob/master/fnal_column_analysis_tools/lookup_tools/dense_lookup.py#L33
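A hedged sketch of the dense_lookup usage under discussion; the import path reflects today's coffea layout rather than fnal_column_analysis_tools, and the constructor call is an assumption that may differ from the version linked above:

```python
# Assumed constructor/usage; may differ from the dense_lookup in this PR.
import numpy as np
from coffea.lookup_tools.dense_lookup import dense_lookup

pt_edges = np.array([0.0, 30.0, 60.0, 100.0])   # toy binning in pt
sf_values = np.array([0.90, 0.95, 0.98])        # one scale factor per bin

lookup = dense_lookup(sf_values, pt_edges)      # assumption: edges passed directly for 1D

flat_pt = np.array([25.0, 75.0, 150.0])
flat_sf = lookup(flat_pt)                       # one scale factor per input value
```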

lgray (Collaborator) commented Mar 16, 2019

@areinsvo That exception needs to be there since we cannot pass jagged arrays to numpy.searchsorted, do not change it. If a jagged array is passed in, the evaluator machinery handles flattening the inputs to numpy arrays and re-jaggifying the outputs based on the input structure.

In particular, without that exception there is no guarantee that the higher layers doing the flattening are handling numpy arrays that all share the same jagged structure.

Jagged and non-jagged weights are already handled by all the lookup types.
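A minimal sketch of the flatten / numpy lookup / re-jaggify pattern described above, written with the modern awkward API (the code in question used awkward0); the edges and values arrays are illustrative:

```python
# Illustrative flatten -> numpy.searchsorted -> re-jaggify pattern.
import numpy as np
import awkward as ak

edges = np.array([0.0, 30.0, 60.0, 100.0])
values = np.array([0.90, 0.95, 0.98])

jet_pt = ak.Array([[45.0, 32.0], [80.0], []])

counts = ak.num(jet_pt)                        # remember the jagged structure
flat_pt = ak.to_numpy(ak.flatten(jet_pt))      # searchsorted needs flat numpy input
idx = np.searchsorted(edges, flat_pt, side="right") - 1
idx = np.clip(idx, 0, len(values) - 1)         # clamp under/overflow to edge bins
flat_sf = values[idx]

jet_sf = ak.unflatten(flat_sf, counts)         # restore the per-event structure
```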

lgray mentioned this pull request Mar 16, 2019
lgray (Collaborator) commented Mar 16, 2019

@mcremone @areinsvo Let's continue this discussion over in #60.

lgray (Collaborator) commented Apr 11, 2019

@nsmith- is there a way to get ABCs into python 2 or should we just drop python 2 support?

@mcremone @areinsvo any preference here? I've been on python3 for a long time at this point.
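For reference, one standard way to define an ABC that works under both Python 2 and 3 is to instantiate ABCMeta directly; a hedged sketch, with method names that are illustrative rather than taken from this PR:

```python
# Hedged sketch: a py2/py3-compatible ABC without depending on six.
from abc import ABCMeta, abstractmethod

ABC = ABCMeta("ABC", (object,), {})  # abc.ABC exists only in Python 3

class ProcessorABC(ABC):
    @abstractmethod
    def process(self, df):
        raise NotImplementedError

    @abstractmethod
    def postprocess(self, accumulator):
        raise NotImplementedError
```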

nsmith- (Member Author) commented Apr 11, 2019

Hmm, I think DataFrame needs to be a base class, with spark and uproot having separate implementations. I'm also mixing camelCase and underscores, so it should probably become dataframe.
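A hedged sketch of that base-class split: an abstract dataframe interface with a lazy uproot-backed implementation (class and method names are illustrative, not the eventual design):

```python
# Illustrative only: abstract column-access interface with an uproot backend.
from abc import ABC, abstractmethod

class BaseDataFrame(ABC):
    """Minimal column-access interface; backends decide how columns load."""

    @abstractmethod
    def __getitem__(self, column):
        """Return the materialized array for a column."""

class UprootDataFrame(BaseDataFrame):
    """Lazily reads branches from an uproot tree on first access."""

    def __init__(self, tree):
        self._tree = tree
        self._cache = {}

    def __getitem__(self, column):
        if column not in self._cache:
            # read the branch only when it is first requested
            self._cache[column] = self._tree[column].array()
        return self._cache[column]

# a spark-backed subclass would implement the same interface over a
# pyspark DataFrame instead of an uproot tree
```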

lgray (Collaborator) commented Apr 11, 2019

@nsmith- This hack works for now, imho.

nsmith- (Member Author) commented Apr 11, 2019

Hmm, actually this hack means that you can't have columns called index, stride, etc. I'll split it.

nsmith- (Member Author) commented Apr 11, 2019

Pending the check that this works for spark, this can be merged.

lgray (Collaborator) commented Apr 12, 2019

It works in spark and produces the same file as before.
@nsmith- will do a more thorough check of the output histograms.

For now I'll call this version 0.3.0 and make a new release!

lgray merged commit 7874f05 into master on Apr 12, 2019
nsmith- deleted the processor branch on April 12, 2019 at 22:51
jpata pushed a commit to jpata/coffea that referenced this pull request Jul 22, 2019
Introduce some processor framework ideas