New accumulator semantics #472

nsmith- · 2021-03-25T14:42:53Z

Keeping the old accumulators intact but allowing new ones to be used in processor output

Hist and WeightStatistics need to stay as AccumulatorABC for backwards compatibility in user code for now.

lgray

One minor performance gripe and then one more important point on behavior. Otherwise the PR looks very good.

lgray · 2021-03-30T18:55:29Z

coffea/processor/accumulator.py

+    elif isinstance(a, MutableMapping) and isinstance(b, MutableMapping):
+        out = copy.copy(a)
+        out.clear()
+        for key in set(a) & set(b):


Not sure if it matters that much, but minimizing the number of times we iterate over the keys of a and b could be important for large mappings. The set operations themselves are all very fast, but the creation of set(a) and set(b) may not be.

a_keyset = set(a)
b_keyset = set(b)
for key in a_keyset & b_keyset:
    ...

If it's possible to non-hideously get rid of the walk over a from the instantiation of out, that would be good too.
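A minimal sketch of the suggestion above, assuming plain `+` is enough for values under shared keys (the PR's own code may well recurse instead). The name `add_mappings` is hypothetical; the point is only that each key set is built once and reused for all three set operations.

```python
import copy
from collections.abc import MutableMapping


def add_mappings(a: MutableMapping, b: MutableMapping) -> MutableMapping:
    """Hypothetical helper: merge two mappings, building each key set once."""
    a_keys = set(a)
    b_keys = set(b)
    # Start from an empty mapping of a's type. copy.copy followed by clear()
    # keeps attributes such as defaultdict.default_factory.
    out = copy.copy(a)
    out.clear()
    for key in a_keys & b_keys:
        # Assumption: values under shared keys support "+".
        out[key] = a[key] + b[key]
    for key in a_keys - b_keys:
        out[key] = copy.deepcopy(a[key])
    for key in b_keys - a_keys:
        out[key] = copy.deepcopy(b[key])
    return out


merged = add_mappings({"x": 1, "y": 2}, {"y": 3, "z": 4})
```

Note that `copy.copy(a)` still walks `a` once to build the shallow copy, so this sketch addresses only the repeated `set(...)` construction, not the second half of the comment.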

lgray · 2021-03-30T18:56:17Z

coffea/processor/accumulator.py

+    if isinstance(a, MutableSet) and isinstance(b, MutableSet):
+        return operator.ior(a, b)
+    elif isinstance(a, MutableMapping) and isinstance(b, MutableMapping):
+        for key in set(a) & set(b):


Same comment as for add about re-instantiating sets that are used over and over.
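The same idea applied to the in-place variant might look like the sketch below. It follows the shape of the diff hunk above, but the function name `iadd_sketch`, the recursion on shared keys, and the final fallback to `operator.iadd` are assumptions made for illustration, not the merged implementation.

```python
import copy
import operator
from collections.abc import MutableMapping, MutableSet


def iadd_sketch(a, b):
    """Hypothetical in-place accumulate: mutate a with the contents of b."""
    if isinstance(a, MutableSet) and isinstance(b, MutableSet):
        return operator.ior(a, b)
    elif isinstance(a, MutableMapping) and isinstance(b, MutableMapping):
        # Build each key set once and reuse it below.
        a_keys = set(a)
        b_keys = set(b)
        for key in a_keys & b_keys:
            a[key] = iadd_sketch(a[key], b[key])
        for key in b_keys - a_keys:
            # Copy so that later in-place updates to a do not touch b.
            a[key] = copy.deepcopy(b[key])
        return a
    # Assumption: anything else supports "+=" (raises TypeError otherwise).
    return operator.iadd(a, b)


lhs = {"n": 1, "seen": {1}}
rhs = {"n": 2, "seen": {2}, "extra": [1]}
out = iadd_sketch(lhs, rhs)
```

Here `out` is the same object as `lhs`, which is what distinguishes this from the copying `add` path.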

lgray · 2021-03-30T19:05:45Z

coffea/processor/accumulator.py

+            out[key] = copy.deepcopy(a[key])
+        for key in set(b) - set(a):
+            out[key] = copy.deepcopy(b[key])
+        return out


It makes more sense in iadd, but do we want add to always take the type of the first operand? I think the usual behavior would be to take the first common non-abstract type shared by a and b (counting back from the passed types)?

I'm fine with this being the rule so long as we are clear about it.

What about user-defined accumulables which are somehow not compatible with each other? We should detect that and throw an error, to keep people from getting confusing answers.

nsmith- · 2021-03-31

The eventual call to a + b would throw if they are not compatible.

nsmith- · 2021-03-31

There is a loss of the type of the mapping b here, but since I figure these are almost always just dictionaries, there is no need to work too hard to resolve the common type of a and b. It would be a challenging problem. I had originally thought I would convert everything to plain dict in the output, but it might be nice for users to have some chance of receiving what they put in (e.g. defaultdict).
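To illustrate the point about receiving what was put in: the `copy.copy` plus `clear()` pattern from the diff keeps the left operand's concrete type, including a `defaultdict`'s factory, while the right operand's type plays no role. The variable names below are illustrative only.

```python
import copy
from collections import defaultdict

a = defaultdict(int, {"x": 1})
b = {"y": 2}  # a plain dict; its type is not reflected in the output

# Same pattern as in the diff: shallow copy of a, then empty it.
out = copy.copy(a)
out.clear()

out_type = type(out)
out_factory = out.default_factory
```

After this, `out` is an empty `defaultdict` whose `default_factory` is still `int`, and `a` itself is unchanged.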

lgray · 2021-03-31

Yeah, for sure the common case is covered well by this. I'm only concerned with not-too-far-fetched uncommon cases from people who like tinkering.

It's also fine to simply say as a rule that for different input types the lhs operand's type is taken. That gives enough control to the user and just requires docs/comments.

nsmith- · 2021-03-31

Another option is to require type(a) == type(b), which in this context is how we would expect things to be used regardless. For some of our executors you can't really guarantee what ends up on the lhs in a reproducible way, since it depends on the order in which the jobs finish.
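A sketch of what the strict option could look like, assuming a standalone guard called before merging. The function name `check_same_type` and the choice of `ValueError` are hypothetical, and this is not the softer check that the later commits settled on.

```python
from collections import defaultdict


def check_same_type(a, b):
    """Hypothetical strict guard: refuse to combine different concrete types."""
    if type(a) is not type(b):
        raise ValueError(
            "Cannot add accumulators of different types: "
            f"{type(a).__name__} and {type(b).__name__}"
        )


check_same_type({"x": 1}, {"y": 2})  # same concrete type: passes silently

message = None
try:
    check_same_type(defaultdict(int), {"y": 2})
except ValueError as err:
    message = str(err)
```

A guard like this makes the result independent of which operand lands on the left, at the cost of rejecting mixed but otherwise compatible mappings.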

nsmith- · 2021-03-31

I went for a bit of a softer type check; take a look at the tests to see how it plays out.

lgray · 2021-03-31

That's also a fine solution. I think what we have right now is not confusing.

nsmith- added 3 commits March 25, 2021 09:40

New accumulator semantics

dcb2b62

Keeping the old accumulators intact but allowing new ones to be used in processor output

typing extensions

044a88a

get protocol the right way

ab4a0aa

nsmith- marked this pull request as draft March 25, 2021 19:00

nsmith- added 4 commits March 25, 2021 20:05

Migrate executors to use refactored accumulate

11f25da

Make spark work too

a636ced

Some barebones typing

87e0078

flake8

9c7a9cf

nsmith- marked this pull request as ready for review March 26, 2021 14:31

nsmith- requested a review from lgray March 26, 2021 14:32

nsmith- mentioned this pull request Mar 26, 2021

Pickle and dask adjustments #476

Merged

nsmith- linked an issue Mar 26, 2021 that may be closed by this pull request

Accumulator refactor #180

Closed

nsmith- added 2 commits March 26, 2021 16:01

Revert changes to Hist and WeightStats

4c174cf

They need to stay as AccumulatorABC for backwards compatibility in user code for now.

Make WeightStatistics actually work

178b644

lgray requested changes Mar 30, 2021

nsmith- added 4 commits March 31, 2021 12:26

Revise new accumulator type resolution a bit

2cb719c

Prioritize explicit addables over abstract collections

fc5a7c7

Actually use the old style correctly...

ef428dd

Merge branch 'master' into newaccumulator

6cb6229

lgray approved these changes Mar 31, 2021

nsmith- merged commit 24f7d57 into master Mar 31, 2021

nsmith- deleted the newaccumulator branch March 31, 2021 20:51
