# Accumulators

Accumulators are abstract objects that enable the reduce stage of the typical map-reduce scale-out used in Coffea. A histogram is one concrete example. An accumulator definition holds enough information to create an empty accumulator (`identity()`) and to add two compatible accumulators together (`add()`). The former is not strictly necessary, but it helps with bookkeeping. Below we show example usage of a few accumulator types. Dictionary accumulators can be nested to arbitrary depth, much like the way ROOT `hadd` handles directories.
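To make the `identity()` / `add()` idea concrete, here is a minimal pure-Python sketch. It does not use Coffea and is not how Coffea implements its accumulators; `SumAccumulator` and `merge_nested` are hypothetical names introduced only for illustration. It shows a toy accumulator and a recursive merge over nested dictionaries, similar in spirit to the nesting described above.

```python
class SumAccumulator:
    """Toy accumulator wrapping a single number (hypothetical, not a Coffea class)."""

    def __init__(self, value=0.0):
        self.value = value

    def identity(self):
        # An empty accumulator of the same kind
        return SumAccumulator(0.0)

    def add(self, other):
        # Merge a compatible accumulator into this one, in place
        self.value += other.value

    def __add__(self, other):
        # Non-mutating sum: start from the identity and add both operands
        out = self.identity()
        out.add(self)
        out.add(other)
        return out


def merge_nested(a, b):
    """Recursively merge two nested dicts whose leaves support `+`."""
    out = dict(a)
    for key, val in b.items():
        if key not in out:
            out[key] = val  # key only present on the right-hand side
        elif isinstance(val, dict):
            out[key] = merge_nested(out[key], val)  # descend one level
        else:
            out[key] = out[key] + val  # add compatible leaves
    return out


left = {"cutflow": {"all": SumAccumulator(10)}, "weights": SumAccumulator(1.5)}
right = {
    "cutflow": {"all": SumAccumulator(5), "pt>200": SumAccumulator(2)},
    "weights": SumAccumulator(0.5),
}
merged = merge_nested(left, right)
```

Keys present in only one input are carried over unchanged, and matching leaves are added, so neither input needs to know in advance which entries the other contains.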

In [1]:
from coffea.processor import dict_accumulator, column_accumulator, defaultdict_accumulator
from coffea.hist import Hist, Bin
import numpy as np

In [2]:
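# Accumulator definition: a dictionary holding one accumulator per output
adef = dict_accumulator({
    'cutflow': defaultdict_accumulator(int),  # named integer counters
    'pt': Hist("counts", Bin("pt", "$p_T$", 100, 0, 100)),  # 100 bins from 0 to 100
    'final_pt': column_accumulator(np.zeros(shape=(0,))),  # starts as an empty array
})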

Notice that the next two cells have no dependencies on each other and could execute in parallel.

In [3]:
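# Toy data: exponentially distributed pT values with mean 30
ptvals = np.random.exponential(scale=30, size=2000)
cut = ptvals > 200.

a0 = adef.identity()  # fresh, empty accumulator built from the definition
a0['cutflow']['pt>200'] += cut.sum()  # count the values passing the cut
a0['pt'].fill(pt=ptvals)
a0['final_pt'] += column_accumulator(ptvals[cut])  # keep the passing values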

In [4]:
ptvals = np.random.exponential(scale=30, size=2000)
cut = ptvals > 200.

a1 = adef.identity()
a1['cutflow']['pt>200'] += cut.sum()
a1['pt'].fill(pt=ptvals)
a1['final_pt'] += column_accumulator(ptvals[cut])

Having filled the two accumulators independently, we now reduce them to a single combined accumulator.

In [5]:
a = a0 + a1
a

{'cutflow': defaultdict_accumulator(int, {'pt>200': 4}),
 'pt': <Hist (pt) instance at 0x121097160>,
 'final_pt': column_accumulator(array([210.57565817, 209.35292604, 314.75790797, 219.19196919]))}
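The same pattern extends from two partial results to many. The sketch below uses only the Python standard library rather than Coffea, so it illustrates the map-reduce shape and is not Coffea's executor. `process_chunk` and `combine` are hypothetical helpers, `Counter` stands in for the cutflow accumulator, and a plain list stands in for the column accumulator. The map step runs in a thread pool, and the reduce step folds the partial results together pairwise.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import random


def process_chunk(values, threshold=200.0):
    """Map step: summarise one chunk of pT values (hypothetical helper)."""
    passed = [v for v in values if v > threshold]
    return {
        "cutflow": Counter({"all": len(values), "pt>200": len(passed)}),
        "final_pt": passed,
    }


def combine(a, b):
    """Reduce step: add two partial results key by key."""
    return {
        "cutflow": a["cutflow"] + b["cutflow"],     # counters add per key
        "final_pt": a["final_pt"] + b["final_pt"],  # lists concatenate
    }


# Toy input: four chunks of exponentially distributed values with mean 30
rng = random.Random(0)
chunks = [[rng.expovariate(1 / 30) for _ in range(2000)] for _ in range(4)]

# The chunks are independent, so the map step can run concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

# Fold all partial results into one combined result
total = reduce(combine, partials)
```

Because the partial results are simply added, the number of chunks and the number of workers affect only how the work is split, not the combined counts.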