# Running inference tools

As machine learning (ML) becomes more popular in HEP analysis, `coffea` also
provides tools to assist with using ML tools within the coffea framework. For
training and validation, you will likely need custom data mangling tools to
convert HEP data formats ([NanoAOD][nanoaod], [PFNano][pfnano]) to a format that
best interfaces with the ML tool of choice, since at that stage you typically
want fine control over what computation is done. For more advanced use cases of
data mangling and data saving, refer to the [awkward array manual][datamangle]
and the [uproot][uproot_write]/[parquet][ak_parquet] write operations for saving
intermediate states. The helper tools provided in coffea focus on ML inference,
where the ML tool outputs are used as another variable in the event/object
selection chain.

[nanoaod]: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD
[pfnano]: https://github.com/cms-jet/PFNano
[datamangle]: https://awkward-array.org/doc/main/user-guide/how-to-restructure.html
[uproot_write]: https://uproot.readthedocs.io/en/latest/basic.html#writing-ttrees-to-a-file
[ak_parquet]: https://awkward-array.org/doc/main/reference/generated/ak.to_parquet.html


## Why these wrapper tools are needed

A typical use of ML inference tools in an awkward/coffea analysis involves
converting and padding awkward arrays into ML tool containers (usually something
that is `numpy`-compatible), running the inference, then converting and
truncating the results back into the awkward array format required for the
analysis chain to continue. With awkward arrays' laziness now being handled
entirely by [`dask`][dask_awkward], the conversion of awkward arrays to other
array types needs to be wrapped in a way that is understandable to `dask`. The
wrappers in the `ml_tools` package wrap the common tools used by the HEP
community with a common interface to reduce the verbosity of the code on the
analysis side.

[dask_awkward]: https://dask-awkward.readthedocs.io/en/stable/gs-limitations.html
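
The round trip described above can be illustrated without any HEP libraries.
The following is a minimal sketch, not coffea's implementation: plain Python
lists stand in for awkward arrays, `toy_model` is a hypothetical stand-in for an
ML model that only accepts rectangular `numpy` input, and the padding length of
4 is chosen only to keep the example small.

```python
import numpy as np

# Jagged input: 3 events with 2, 0 and 1 jets; each jet has a variable
# number of constituent pt values (plain lists stand in for awkward arrays).
events = [
    [[30.0, 12.0, 5.0], [8.0]],
    [],
    [[50.0, 20.0]],
]

MAX_CONSTITUENTS = 4  # illustrative value; the example in this notebook uses 100


def toy_model(x):
    """Hypothetical stand-in for an ML model: one score per row (jet)."""
    return x.sum(axis=1)


# 1. Flatten events -> jets, remembering how many jets each event had.
counts = [len(jets) for jets in events]
flat_jets = [jet for jets in events for jet in jets]

# 2. Pad (and clip) every jet to a fixed length so the result is rectangular.
padded = np.zeros((len(flat_jets), MAX_CONSTITUENTS), dtype=np.float32)
for i, jet in enumerate(flat_jets):
    n = min(len(jet), MAX_CONSTITUENTS)
    padded[i, :n] = jet[:n]

# 3. Run the inference on the rectangular array.
scores = toy_model(padded)

# 4. Fold the flat scores back into the per-event structure.
boundaries = np.cumsum(counts)[:-1]
scores_per_event = [chunk.tolist() for chunk in np.split(scores, boundaries)]
```

The wrappers take care of the equivalent of steps 3 and 4 for both eager and
lazy arrays, so the analysis code only has to describe the conversion in step 2.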



## Example using ParticleNet-like jet variable calculation using PyTorch

The example given in this notebook uses [`pytorch`][pytorch] to calculate a
jet-level discriminant from its constituent particles. An example of how to
construct such a `pytorch` network can be found in the docs file, but for
`ml_tools` in coffea, we only support loading models from
[TorchScript][pytorch_jit] format files, to ensure operability when scaling to
clusters. Let us start by downloading the example ParticleNet model file and a
small `PFNano`-compatible file, and defining a simple function to open the
`PFNano` file with and without dask.


[pytorch]: https://pytorch.org/
[pytorch_jit]: https://pytorch.org/tutorials/beginner/saving_loading_models.html#export-load-model-in-torchscript-format


In [None]:
!wget --quiet -O model.pt https://github.com/CoffeaTeam/coffea/raw/ml_tools/tests/samples/triton_models_test/pn_test/1/model.pt

!wget --quiet -O pfnano.root https://github.com/yimuchen/coffea/raw/ml_tools/tests/samples/pfnano.root


In [None]:
from coffea.nanoevents import NanoEventsFactory
from coffea.nanoevents.schemas import PFNanoAODSchema


def open_events(permit_dask=False):
    factory = NanoEventsFactory.from_root(
        "file:./pfnano.root",
        schemaclass=PFNanoAODSchema,
        permit_dask=permit_dask,
    )
    return factory.events()


Now we prepare a class to handle inference requests by extending the
`ml_tools.torch_wrapper` class. As the base class cannot know anything about the
data mangling required for the user's particular model, we need to overload at
least the method `prepare_awkward_to_numpy`:

- The input can be an arbitrary number of awkward arrays. In this example, we
  will be passing in the event array.
- The output should be a single tuple `a` plus a single dictionary `b`. This
  ensures that arbitrarily complicated inputs can be passed to the underlying
  `pytorch` model instance as `model(*a, **b)`. The contents of `a` and `b`
  should be `numpy`-compatible _awkward_ arrays (that is, awkward arrays that
  can be trivially converted to `numpy` arrays via an `ak.to_numpy` call). In
  this ParticleNet-like example, the model expects the following inputs:

  - A `N` jets x `2` coordinate x `100` constituents "points" array,
    representing the constituent coordinates.
  - A `N` jets x `5` feature x `100` constituents "features" array, representing
    the constituent features of interest to be used for inference.
  - A `N` jets x `1` mask x `100` constituent "mask" array, representing whether
    a constituent should be masked from the inference request.

  In this case, we will need to flatten the `E` events x `N` jets structure, as
  well as stack the constituent attributes of interest, into a single array.
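
As an illustration of the shapes listed above, the sketch below builds the three
inputs from randomly generated stand-in data using plain `numpy`; the class
defined later in this section performs the same steps with awkward operations.
The names `pad_rows`, `fake_variable` and `toy_model`, and all the values, are
hypothetical and only serve to show the `N x C x 100` layout and the
`model(*a, **b)` calling convention.

```python
import numpy as np

N_CONST = 100  # fixed number of constituents per jet expected by the model


def pad_rows(rows, length=N_CONST, fill=0.0):
    """Pad or clip each variable-length row to `length` entries."""
    out = np.full((len(rows), length), fill, dtype=np.float32)
    for i, row in enumerate(rows):
        n = min(len(row), length)
        out[i, :n] = row[:n]
    return out


rng = np.random.default_rng(seed=0)
n_constituents = [3, 120, 57]  # constituents in each of N = 3 flattened jets


def fake_variable():
    """Random stand-in for one per-constituent quantity, already padded."""
    return pad_rows([rng.normal(size=n) for n in n_constituents])


# Stack C variables along a new axis 1 to get N x C x 100 arrays.
points = np.stack([fake_variable() for _ in range(2)], axis=1)
features = np.stack([fake_variable() for _ in range(5)], axis=1)
# The mask is 1 for real constituents and 0 for padding.
mask = pad_rows([np.ones(n) for n in n_constituents])[:, np.newaxis, :]

# A tuple `a` of positional inputs plus a dictionary `b` of keyword inputs.
a, b = (), {"points": points, "features": features, "mask": mask}


def toy_model(points, features, mask):
    """Hypothetical model: masked mean of the first feature, one value per jet."""
    total = (features[:, 0, :] * mask[:, 0, :]).sum(axis=-1)
    return total / mask[:, 0, :].sum(axis=-1)


scores = toy_model(*a, **b)
```

Jets with more than 100 constituents are clipped and shorter ones are
zero-padded, which is why the mask is needed to tell real entries from padding.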

After defining this minimal class, we can attempt to run an inference using the
`__call__` method defined in the base class. Notice that overloading this single
method automatically allows the inference to be called on both awkward and
dask-awkward arrays.


In [None]:
from coffea.ml_tools.torch_wrapper import torch_wrapper
import awkward as ak
import dask_awkward as dak  # needed by the dak.* calls in later cells
import numpy as np


class ParticleNetExample1(torch_wrapper):
    def prepare_awkward_to_numpy(self, events):
        jets = ak.flatten(events.Jet)

        def pad(arr):
            return ak.fill_none(
                ak.pad_none(arr, 100, axis=1, clip=True),
                0.0,
            )

        # Human readable version of what the inputs are
        # Each array is a N jets x 100 constituent array
        imap = {
            "points": {
                "deta": pad(jets.eta - jets.constituents.pf.eta),
                "dphi": pad(jets.delta_phi(jets.constituents.pf)),
            },
            "features": {
                "dr": pad(jets.delta_r(jets.constituents.pf)),
                "lpt": pad(np.log(jets.constituents.pf.pt)),
                "lptf": pad(np.log(jets.constituents.pf.pt / jets.pt)),
                "f1": pad(np.log(np.abs(jets.constituents.pf.d0) + 1)),
                "f2": pad(np.log(np.abs(jets.constituents.pf.dz) + 1)),
            },
            "mask": {
                "mask": pad(ak.ones_like(jets.constituents.pf.pt)),
            },
        }

        # Compacting the array elements into the desired dimension using
        # ak.concatenate
        retmap = {
            k: ak.concatenate([x[:, np.newaxis, :] for x in imap[k].values()], axis=1)
            for k in imap.keys()
        }

        # Returning everything using a dictionary. Also take care of type
        # conversion here.
        return (), {
            "points": ak.values_astype(retmap["points"], "float32"),
            "features": ak.values_astype(retmap["features"], "float32"),
            "mask": ak.values_astype(retmap["mask"], "float16"),
        }


# Setting up the model container
pn_example1 = ParticleNetExample1("model.pt")

# Running on awkward arrays
ak_events = open_events(permit_dask=False)
ak_results = pn_example1(ak_events)
print("Awkward results:", ak_results)  # Runs fine!

# Running on dask_awkward array
dak_events = open_events(permit_dask=True)
dak_results = pn_example1(dak_events)
print("Dask awkward results:", dak_results.compute())  # Also runs fine!

# Checking that the results are identical
assert ak.all(dak_results.compute() == ak_results)


For each jet in the input to the `torch` model, the model returns a 2-tuple of
output values. Without additional specification, the `torch_wrapper` class
performs a trivial `ak.from_numpy` conversion of the torch model's output. We
can specify that we want to fold this back into a nested structure by
overloading the `numpy_to_awkward` method of the class.

For the ParticleNet example, we are going to perform additional computation
during the conversion back to awkward array formats:

- Calculate the `softmax` of the model output for each jet (commonly used as
  the single ML inference "score").
- Fold the computed `softmax` array back into a nested structure that is
  compatible with the original `events.Jet` array.
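
Both steps can be sketched in plain `numpy`. This is only an illustration with
made-up numbers: `raw` stands in for the model output and `jets_per_event` for
the jet multiplicities, and the folding uses `np.split` where the wrapper would
use `ak.unflatten`. Subtracting the row-wise maximum before exponentiating is an
optional addition that is not part of the notebook's implementation; it leaves
the result mathematically unchanged while avoiding overflow for large outputs.

```python
import numpy as np

# Made-up model output: one row of two raw values per jet (4 jets in total).
raw = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0], [800.0, 799.0]])
jets_per_event = [1, 0, 3]  # made-up jet counts for 3 events


def softmax_first_class(x):
    """Softmax probability of the first class for each row."""
    shifted = x - x.max(axis=-1, keepdims=True)  # guards against overflow
    exp = np.exp(shifted)
    return exp[:, 0] / exp.sum(axis=-1)


score = softmax_first_class(raw)

# Fold the flat per-jet scores back into an events x jets structure.
boundaries = np.cumsum(jets_per_event)[:-1]
score_per_event = [chunk.tolist() for chunk in np.split(score, boundaries)]
```

For a two-class output, the first-class softmax equals a logistic function of
the difference between the two raw values, which gives a quick sanity check.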

Notice that the inputs of the `numpy_to_awkward` method differ from those of
the `prepare_awkward_to_numpy` method only in that the first argument is the
`numpy` array returned by the model inference. If you overload this method, it
also needs to handle the typetracer arrays that dask passes in, as is done in
the example below, for the best results.


In [None]:
class ParticleNetExample2(ParticleNetExample1):
    def numpy_to_awkward(self, return_array, events):
        softmax = np.exp(return_array)[:, 0] / np.sum(np.exp(return_array), axis=-1)

        njets = ak.count(ak.typetracer.length_one_if_typetracer(events.Jet.pt), axis=-1)
        if ak.backend(events) == "typetracer":
            njets = ak.full_like(njets, 1)
        out = ak.unflatten(softmax, njets)
        if ak.backend(events) == "typetracer":
            out = ak.Array(
                out.layout.to_typetracer(forget_length=True), behavior=out.behavior
            )
        return out


pn_example2 = ParticleNetExample2("model.pt")

# Running on awkward
ak_events = open_events(permit_dask=False)
ak_jets = ak_events.Jet
ak_jets["MLresults"] = pn_example2(ak_events)
ak_events["Jet"] = ak_jets

# Running on dask awkward
dask_events = open_events(permit_dask=True)
dask_jets = dask_events.Jet
dask_jets["MLresults"] = pn_example2(dask_events)
dask_events["Jet"] = dask_jets

print(ak_events.Jet.MLresults)
assert ak.all(ak_events.Jet.MLresults == dask_events.Jet.MLresults.compute())


Of course, the implementation of the classes above can be written in a single
class. Here is a copy-and-paste implementation of the class with all the
functionality described in the cells above:

In [None]:
def jet_features_as_numpy(jets):
    def pad(arr):
        return ak.fill_none(
            ak.pad_none(arr, 100, axis=1, clip=True),
            0.0,
        )

    # Human readable version of what the inputs are
    # Each array is a N jets x 100 constituent array
    imap = {
        "points": {
            "deta": pad(jets.eta - jets.constituents.pf.eta),
            "dphi": pad(jets.delta_phi(jets.constituents.pf)),
        },
        "features": {
            "dr": pad(jets.delta_r(jets.constituents.pf)),
            "lpt": pad(np.log(jets.constituents.pf.pt)),
            "lptf": pad(np.log(jets.constituents.pf.pt / jets.pt)),
            "f1": pad(np.log(np.abs(jets.constituents.pf.d0) + 1)),
            "f2": pad(np.log(np.abs(jets.constituents.pf.dz) + 1)),
        },
        "mask": {
            "mask": pad(ak.ones_like(jets.constituents.pf.pt)),
        },
    }

    # Compacting the array elements into the desired dimension using
    # ak.concatenate
    retmap = {
        k: ak.concatenate([x[:, np.newaxis, :] for x in imap[k].values()], axis=1)
        for k in imap.keys()
    }

    # Returning everything using a dictionary. Also take care of type
    # conversion here.
    return (), {
        "points": ak.values_astype(retmap["points"], "float32"),
        "features": ak.values_astype(retmap["features"], "float32"),
        "mask": ak.values_astype(retmap["mask"], "float16"),
    }


class ParticleNetExample(torch_wrapper):
    def prepare_awkward_to_numpy(self, events):
        jets = ak.flatten(events.Jet)
        return jet_features_as_numpy(jets)

    def numpy_to_awkward(self, return_array, events):
        softmax = np.exp(return_array)[:, 0] / np.sum(np.exp(return_array), axis=-1)

        njets = ak.count(ak.typetracer.length_one_if_typetracer(events.Jet.pt), axis=-1)
        if ak.backend(events) == "typetracer":
            njets = ak.full_like(njets, 1)
        out = ak.unflatten(softmax, njets)
        if ak.backend(events) == "typetracer":
            out = ak.Array(
                out.layout.to_typetracer(forget_length=True), behavior=out.behavior
            )
        return out


pn_example = ParticleNetExample("model.pt")

# Running on awkward arrays
ak_events = open_events(permit_dask=False)
ak_jets = ak_events.Jet
ak_jets["MLresults"] = pn_example(ak_events)
ak_events["Jet"] = ak_jets

# Running on dask awkward arrays
dask_events = open_events(permit_dask=True)
dask_jets = dask_events.Jet
dask_jets["MLresults"] = pn_example(dask_events)
dask_events["Jet"] = dask_jets

# Checking that we get identical results
print(dask_events.Jet.MLresults.compute())
assert ak.all(dask_events.Jet.MLresults.compute() == ak_events.Jet.MLresults)

# Check which columns are loaded
print(dak.necessary_columns(dask_events.Jet.MLresults))


If you feel uncertain about the implementation of the `numpy_to_awkward` method,
you can also write the class so that the folding occurs outside the wrapper class:

In [None]:
class ParticleNetExample_Alt(torch_wrapper):
    def prepare_awkward_to_numpy(self, jets):
        return jet_features_as_numpy(jets)

    def numpy_to_awkward(self, return_array, jets):
        softmax = np.exp(return_array)[:, 0] / np.sum(np.exp(return_array), axis=-1)
        return ak.from_numpy(softmax)


pn_example_alt = ParticleNetExample_Alt("model.pt")

# Running on awkward arrays
ak_events = open_events(permit_dask=False)
ak_njets = ak.num(ak_events.Jet, axis=-1)
ak_jets = ak.flatten(ak_events.Jet)
ak_jets["MLresults"] = pn_example_alt(ak_jets)
ak_events["Jet"] = ak.unflatten(ak_jets, ak_njets)

# Running on dask awkward arrays
dask_events = open_events(permit_dask=True)
dask_njets = dak.num(dask_events.Jet, axis=-1)
dask_jets = dak.flatten(dask_events.Jet)
dask_jets["MLresults"] = pn_example_alt(dask_jets)
dask_events["Jet"] = dak.unflatten(dask_jets, dask_njets)

# Checking that we get identical results
print(dask_events.Jet.MLresults.compute())
assert ak.all(dask_events.Jet.MLresults.compute() == ak_events.Jet.MLresults)

# Check which columns are loaded
print(dak.necessary_columns(dask_events.Jet.MLresults))


## Comments about generalizing to other ML tools

All ML wrappers provided in the `coffea.ml_tools` module (`triton_wrapper` for
[triton][triton] server inference, `torch_wrapper` for pytorch, and
`xgboost_wrapper` for [xgboost][xgboost] inference) follow the same design:
analyzers are responsible for providing the model of interest, along with an
inherited class that overloads the following methods to handle data type
conversion:

- `prepare_awkward_to_numpy`: converts awkward arrays to `numpy` arrays. The
  output should be in the format of a tuple `a` and a dictionary `b`, which can
  be expanded into the input of the ML tool like `model(*a, **b)`. Notice that
  some additional trivial conversions (like converting to available kernels for
  `pytorch`, converting to a matrix format for `xgboost`, and slicing of arrays
  for `triton`) are handled automatically by the respective wrappers.
- `numpy_to_awkward` (optional): converts the numerical results back to awkward
  array format. If this is not provided, then a simple `ak.from_numpy`
  conversion takes place.
- `dask_columns` (optional but recommended): given the inputs to the
  `prepare_awkward_to_numpy` method, lists the branches required for the
  inference calculation. If not provided, the wrapper will attempt to load all
  branches recursively, which may incur a significant performance penalty.

If the ML tool of choice for your analysis has not been implemented in the
`coffea.ml_tools` module, consider constructing your own with the provided
`numpy_call_wrapper` base class in `coffea.ml_tools`. Aside from the methods
listed above, you will also need to provide the `numpy_call` method to perform
any additional data format conversions and call the ML tool of choice. If you
think your implementation is general, also consider submitting a PR to the
`coffea` repository!

[triton]: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
[xgboost]: https://xgboost.readthedocs.io/en/stable/
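
To make the division of responsibilities concrete, here is a toy sketch of the
pattern in pure Python and `numpy`. It is not the coffea implementation:
`ToyCallWrapper`, `ToySumWrapper` and all method bodies are hypothetical, the
input is a plain nested list rather than an awkward array, and none of the dask
handling is reproduced. It only shows how a base class can chain the
user-provided conversion, inference call and back-conversion.

```python
import numpy as np


class ToyCallWrapper:
    """Hypothetical base class: chains conversion, inference and back-conversion."""

    def prepare_awkward_to_numpy(self, *args):
        raise NotImplementedError("must return a (tuple, dict) pair of model inputs")

    def numpy_call(self, *args, **kwargs):
        raise NotImplementedError("must call the ML tool on numpy inputs")

    def numpy_to_awkward(self, result, *args):
        return result  # default: pass the model output through unchanged

    def __call__(self, *args):
        a, b = self.prepare_awkward_to_numpy(*args)
        result = self.numpy_call(*a, **b)
        return self.numpy_to_awkward(result, *args)


class ToySumWrapper(ToyCallWrapper):
    """Hypothetical 'model' that sums the features of each row."""

    def prepare_awkward_to_numpy(self, data):
        return (), {"features": np.asarray(data, dtype=np.float32)}

    def numpy_call(self, features):
        return features.sum(axis=-1)


wrapper = ToySumWrapper()
output = wrapper([[1.0, 2.0], [3.0, 4.0]])
```

Keeping the tool-specific call in a single method means the conversion methods
can stay the same when the inference backend is swapped.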
