# Numba and OAMap

Vectorization is hard; I just want a #$*& `for` loop

Remember how we selected pairs of muons to compute a mass?

In [None]:
import uproot
import numpy

# get data
tree = uproot.open("~/NanoAOD-DYJetsToLL.root")["Events"]
pt, eta, phi = tree.arrays(["Muon_pt", "Muon_eta", "Muon_phi"], outputtype=tuple)
starts, stops = pt.starts, pt.stops
pt, eta, phi = numpy.array(pt), numpy.array(eta), numpy.array(phi)

# manipulate arrays in a vectorized way
has2muons = stops - starts >= 2          # boolean mask filter
firsts = starts[has2muons]               # indexes of first particle in each event (that has two)
seconds = starts[has2muons] + 1          # indexes of second particle in each event (that has two)
pt1, pt2 = pt[firsts], pt[seconds]       # first and second muon pt per event
eta1, eta2 = eta[firsts], eta[seconds]   # first and second muon eta per event
phi1, phi2 = phi[firsts], phi[seconds]   # first and second muon phi per event

# vectorized calculation of masses from first and second muon per event
numpy.sqrt(2*pt1*pt2*(numpy.cosh(eta1 - eta2) - numpy.cos(phi1 - phi2)))

Wouldn't you rather write this?

In [None]:
import math
import oamap.backend.root.cmsnano

# get events that you can iterate over
events = oamap.backend.root.cmsnano.proxy("~/NanoAOD-DYJetsToLL.root")

for event in events[:100]:        # iteration (over the first hundred) with a for loop
    if len(event.Muon) >= 2:      # selection with an if statement
        mu1 = event.Muon[0]
        mu2 = event.Muon[1]
        mass = math.sqrt(2*mu1.pt*mu2.pt*(math.cosh(mu1.eta - mu2.eta) - math.cos(mu1.phi - mu2.phi)))
        print(mass)               # sequential calculation of each individual event

But: `for` loops! `if` statements! Bad for vectorization!

   * Vectorization can provide speedups of ~8× on modern CPUs and ~32× on GPUs.
   * Most of the factor of ~1000 we saw earlier came from avoiding Python `for` loops, not from hardware vectorization.

What we really need to avoid is stepping through Python. We need compiled code.

   * **External C++ code?** Works, but switching files obscures the logic of the scientific workflow.
   * **Inline C++ code?** Clearer for small excursions, but switching languages can still be confusing, and translating by hand takes extra effort.
   * **Cython?** Hybrid language between Python and C++. You can add C++ gradually, but it's a third language.
   * **Compile the Python code?** Easiest in most cases. (Only problem: the Python language wasn't designed to be compiled and lacks the ability to express advanced features, so you'd still have to defer to the above for complex cases.)

In [None]:
import numba

def mass_numpy(pt1, pt2, eta1, eta2, phi1, phi2, out):
    out[:] = numpy.sqrt(2*pt1*pt2*(numpy.cosh(eta1 - eta2) - numpy.cos(phi1 - phi2)))

def mass_python(pt1, pt2, eta1, eta2, phi1, phi2, out):
    for i in range(len(pt1)):
        out[i] = math.sqrt(2*pt1[i]*pt2[i]*(math.cosh(eta1[i] - eta2[i]) - math.cos(phi1[i] - phi2[i])))

@numba.jit(nopython=True)            # compile the following (nopython=True means we prefer errors to slow code)
def mass_numba(pt1, pt2, eta1, eta2, phi1, phi2, out):
    for i in range(len(pt1)):        # yay! for loop!
        out[i] = math.sqrt(2*pt1[i]*pt2[i]*(math.cosh(eta1[i] - eta2[i]) - math.cos(phi1[i] - phi2[i])))

out = numpy.empty(len(pt1))

In [None]:
%%timeit
mass_numpy(pt1, pt2, eta1, eta2, phi1, phi2, out)

In [None]:
%%timeit
mass_python(pt1, pt2, eta1, eta2, phi1, phi2, out)

In [None]:
%%timeit
mass_numba(pt1, pt2, eta1, eta2, phi1, phi2, out)

Numba can compile most Python control structures and infers data types from the arguments of the function the first time it's called. (Each call with a new combination of argument types triggers a new compilation. The compilation process is invisible: there are no commands to forget.)

Use it to speed up a slow loop without rewriting everything (or confusing reviewers of your analysis with indirection to other languages or files).

However:

   * It's compiled, so all types must be known at compile time (for Numba, the first call with a given set of argument types). Code that depends on changing the type of a variable won't compile (as in C++).
   * Numba doesn't recognize all data types in `nopython=True` mode. Without that option, it falls back to slower but general Python where necessary.
   * With `nopython=True`, it mostly just recognizes arrays and numbers, but those are the parts of your code that benefit most from speeding up.

# OAMap: object-array mapping

ROOT files (usually) store data as arrays of numbers. ROOT rebuilds objects for each event; uproot doesn't.

In [None]:
tree = uproot.open("~/NanoAOD-DYJetsToLL.root")["Events"]
print(repr(tree.array("Jet_pt")))
print(repr(tree.array("Jet_eta")))

In [None]:
print(repr(tree.array("Jet_pt").offsets))
print(repr(tree.array("Jet_pt").content))

Instead of rebuilding the whole event object, OAMap leaves data as arrays but rebuilds objects on demand (proxy objects).
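
To make "proxy object" concrete, here is a toy sketch in plain Python. The class names (`JetProxy`, `JetListProxy`) and the numbers are invented for illustration; this is not OAMap's implementation, only the general idea of an object that stores an index and looks up values when asked.

```python
import numpy

# toy columns (made-up numbers): three events holding 2, 0, and 3 jets
offsets = numpy.array([0, 2, 2, 5])
jet_pt = numpy.array([50.0, 30.0, 80.0, 40.0, 20.0])
jet_eta = numpy.array([0.1, -1.2, 2.0, 0.5, -0.3])

class JetProxy:
    """Looks like a jet object but only stores an index into the flat arrays."""
    def __init__(self, index):
        self._index = index

    @property
    def pt(self):
        return jet_pt[self._index]      # value is fetched only when asked for

    @property
    def eta(self):
        return jet_eta[self._index]

class JetListProxy:
    """Looks like the list of jets in one event; makes JetProxy objects on demand."""
    def __init__(self, event_index):
        self._start = offsets[event_index]
        self._stop = offsets[event_index + 1]

    def __len__(self):
        return int(self._stop - self._start)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return JetProxy(self._start + i)

jets = JetListProxy(2)                  # the third event
print(len(jets), jets[0].pt, jets[2].eta)
```

Nothing is copied when `JetListProxy(2)` is created; the arrays are only touched when `.pt` or `.eta` is read.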

In [None]:
events = oamap.backend.root.cmsnano.proxy("~/NanoAOD-DYJetsToLL.root")
events

In [None]:
events[1].Jet

In [None]:
events[1].Jet[0].pt

And if one of these proxies enters a compiled section of code, we don't have to make any objects at all. Data references like `events[1].Jet[0].pt` are translated into array lookups.

This optimization is implemented in OAMap as a Numba extension.
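
Roughly, a loop over event proxies reduces to a loop over offsets with plain array lookups. The sketch below writes that reduced form by hand on toy data; the function name `mass_flat` and the numbers are made up, and it is not the code OAMap actually generates.

```python
import math
import numpy

# toy muon columns (made-up numbers): four events holding 2, 1, 3, and 0 muons
offsets = numpy.array([0, 2, 3, 6, 6])
pt = numpy.array([40.0, 30.0, 25.0, 50.0, 45.0, 10.0])
eta = numpy.array([0.5, -0.5, 1.0, 0.2, -0.1, 2.0])
phi = numpy.array([0.0, 3.0, 1.0, -1.0, 2.0, 0.5])

def mass_flat(offsets, pt, eta, phi, out):
    n = 0
    for e in range(len(offsets) - 1):            # "for event in events"
        start, stop = offsets[e], offsets[e + 1]
        if stop - start >= 2:                    # "len(event.Muon) >= 2"
            i1, i2 = start, start + 1            # "event.Muon[0]", "event.Muon[1]"
            out[n] = math.sqrt(2*pt[i1]*pt[i2]*(math.cosh(eta[i1] - eta[i2]) - math.cos(phi[i1] - phi[i2])))
            n += 1
    return n                                     # number of masses written

out = numpy.empty(len(offsets) - 1)
n = mass_flat(offsets, pt, eta, phi, out)
print(out[:n])
```

No event or muon objects exist anywhere in this version, only integers and array elements, which is exactly the kind of code Numba handles well.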

In [None]:
@numba.jit(nopython=True)
def mass_oamap(events, out):
    i = 0
    for event in events:            # yay! for loop!
        if len(event.Muon) >= 2:    # yay! if statement!
            mu1 = event.Muon[0]
            mu2 = event.Muon[1]
            out[i] = math.sqrt(2*mu1.pt*mu2.pt*(math.cosh(mu1.eta - mu2.eta) - math.cos(mu1.phi - mu2.phi)))
            i += 1

In [None]:
%%timeit
mass_oamap(events, out)

In [None]:
def mass_equivalent_numpy(starts, stops, pt, eta, phi, out):
    # the _equivalent_ Numpy has to find the first and second item in each event
    has2muons = stops - starts >= 2
    firsts = starts[has2muons]
    seconds = starts[has2muons] + 1
    pt1, pt2 = pt[firsts], pt[seconds]
    eta1, eta2 = eta[firsts], eta[seconds]
    phi1, phi2 = phi[firsts], phi[seconds]
    out[:] = numpy.sqrt(2*pt1*pt2*(numpy.cosh(eta1 - eta2) - numpy.cos(phi1 - phi2)))

In [None]:
%%timeit
mass_equivalent_numpy(starts, stops, pt, eta, phi, out)

(As fast as the equivalent Numpy and 1070 times faster than pure Python.)
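
If the `starts`/`stops` trick in the Numpy version feels opaque, a toy check that it picks the same muons as the loop may help. The arrays below are made-up numbers, not data from the file.

```python
import numpy

# toy muon pt column (made-up numbers): four events holding 2, 1, 3, and 0 muons
starts = numpy.array([0, 2, 3, 6])
stops = numpy.array([2, 3, 6, 6])
pt = numpy.array([40.0, 30.0, 25.0, 50.0, 45.0, 10.0])

# vectorized selection, as in mass_equivalent_numpy
has2muons = stops - starts >= 2
firsts = starts[has2muons]
seconds = starts[has2muons] + 1
pairs_vectorized = list(zip(pt[firsts], pt[seconds]))

# the same selection written as a loop over events
pairs_loop = []
for start, stop in zip(starts, stops):
    if stop - start >= 2:
        pairs_loop.append((pt[start], pt[start + 1]))

print(pairs_vectorized)
print(pairs_vectorized == pairs_loop)
```

Both approaches select the first two muons of the first and third events and skip the others.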

Since OAMap data are in the form of arrays, with no room for insertion, they can't be modified in place like normal Python objects. You can, however, create new datasets from old datasets.

In [None]:
events_v2 = events.define("pz", lambda jet: jet.pt*math.sinh(jet.eta), at="Jet")
events_v2

In [None]:
events_v2[1].Jet[0].pz

In [None]:
events_v2.project("Jet").flatten().project("pz")

In [None]:
events_v3 = events_v2.filter(lambda event: len(event.Jet) >= 2)

len(events_v2), len(events_v3)

In [None]:
import pandas
pandas.DataFrame(events_v3.map(lambda event: (event.Jet[0].pz, event.Jet[1].pz)))

Many of these operations take a big dataset and return a big dataset; they're fast because they don't copy the data that's shared by input and output.
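
The no-copy idea can be sketched with a dataset modeled as a plain dict of columns. This is a simplification for illustration (the `define_pz` helper and the numbers are made up, and it says nothing about OAMap's internals): the new dataset reuses the old arrays by reference and only allocates the new column.

```python
import numpy

# a toy "dataset" as a dict of columns (made-up numbers)
events_old = {
    "Jet_pt": numpy.array([50.0, 30.0, 80.0]),
    "Jet_eta": numpy.array([0.0, 1.0, -1.0]),
}

def define_pz(dataset):
    # build a new dataset: keep the existing columns by reference, add one new column
    new = dict(dataset)     # shallow copy: copies the dict, not the arrays
    new["Jet_pz"] = dataset["Jet_pt"] * numpy.sinh(dataset["Jet_eta"])
    return new

events_new = define_pz(events_old)

shares_pt = numpy.shares_memory(events_new["Jet_pt"], events_old["Jet_pt"])
print(shares_pt, "Jet_pz" in events_old, events_new["Jet_pz"])
```

The old dataset is untouched and the `Jet_pt` memory is shared, so the cost of the operation is just the one new array.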

In this jet object,

In [None]:
jet = events_v3[0].Jet[1]
jet.pt, jet.pz

The `pt` is drawn from a ROOT file and the `pz` is in a Numpy array in memory.

To write or read these hybrid objects, we have to start talking databases.

(But a database can be as simple as a directory of files.)
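
As a sketch of what "a directory of files" can mean, the helpers below store one `.npy` file per column and read them back. The function names and the file layout are assumptions made for this example; they are not the layout that `NumpyFileDatabase` uses.

```python
import os
import tempfile
import numpy

def save_columns(directory, columns):
    # one .npy file per column; the file name is the column name
    os.makedirs(directory, exist_ok=True)
    for name, array in columns.items():
        numpy.save(os.path.join(directory, name + ".npy"), array)

def load_columns(directory):
    # rebuild the dict of columns from whatever .npy files are present
    columns = {}
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".npy"):
            name = filename[:-len(".npy")]
            columns[name] = numpy.load(os.path.join(directory, filename))
    return columns

with tempfile.TemporaryDirectory() as tmp:
    directory = os.path.join(tmp, "toydata")
    save_columns(directory, {"Jet_pt": numpy.array([50.0, 30.0]),
                             "Jet_pz": numpy.array([0.0, 35.3])})
    restored = load_columns(directory)

print(sorted(restored), restored["Jet_pt"])
```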

In [None]:
import oamap.backend.numpyfile
db = oamap.backend.numpyfile.NumpyFileDatabase("mydata")

In [None]:
db.data.events_v3 = events_v3

In [None]:
!tree mydata

Now switch to [4-numba-oamap-end.ipynb](4-numba-oamap-end.ipynb).