# Running inference tools

As machine learning (ML) becomes more popular in HEP analysis, coffea also
provides tools to assist with using ML tools within the coffea framework. For
training and validation, you will likely need custom data mangling tools to
convert standard CMS data formats ([NanoAOD][nanoaod], [PFNano][pfnano]) to a
format that best interfaces with the ML tool of choice, so that you have fine
control over what is done. For more advanced use cases of data mangling and data
saving, refer to the [awkward array manual][datamangle] and
[uproot][uproot_write]/[parquet][ak_parquet] write operations for saving
intermediate states. This reference mainly focuses on the inference side of ML
tools, where ML tool outputs are used as another variable in the event/object
selection chain.

[nanoaod]: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD
[pfnano]: https://github.com/cms-jet/PFNano
[datamangle]: https://awkward-array.org/doc/main/user-guide/how-to-restructure.html
[uproot_write]: https://uproot.readthedocs.io/en/latest/basic.html#writing-ttrees-to-a-file
[ak_parquet]: https://awkward-array.org/doc/main/reference/generated/ak.to_parquet.html


## Why these wrapper tools are needed

The typical use of ML inference tools in an awkward/coffea analysis involves
converting and padding awkward arrays into ML tool containers (usually something
that is `numpy`-compatible), running the inference, then converting and
truncating the results back into the awkward array syntax required for the
analysis chain to continue. With awkward arrays' laziness now being handled
entirely by [`dask`][dask_awkward], the conversion of awkward arrays to other
array types needs to be wrapped in a way that is understandable to `dask`. The
wrappers in the `ml_tools` package attempt to wrap the common tools used by the
HEP community with a common interface to reduce the verbosity of the code on the
analysis side.

[dask_awkward]: https://dask-awkward.readthedocs.io/en/stable/gs-limitations.html
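
The pad, infer, and truncate round trip described above can be sketched without
any ML library. The following is a minimal illustration that assumes plain
Python lists as a stand-in for awkward arrays; `pad_and_clip`, `dummy_model` and
`unflatten` are hypothetical helper names and are not part of coffea or awkward.

```python
import numpy as np

# Jagged input: three "jets" with 2, 4 and 1 constituents (made-up values).
jagged = [[1.0, 2.0], [3.0, 4.0, 5.0, 6.0], [7.0]]
MAX_LEN = 3  # fixed length expected by the stand-in model


def pad_and_clip(rows, length, fill=0.0):
    """Convert variable-length rows to a rectangular (N, length) array."""
    out = np.full((len(rows), length), fill, dtype=np.float32)
    for i, row in enumerate(rows):
        n = min(len(row), length)  # clip rows that are too long
        out[i, :n] = row[:n]
    return out


def dummy_model(x):
    """Stand-in for an ML model: returns one number per input row."""
    return x.sum(axis=1)


def unflatten(flat, counts):
    """Regroup a flat per-jet array into per-event lists."""
    edges = np.cumsum(counts)[:-1]
    return [chunk.tolist() for chunk in np.split(flat, edges)]


padded = pad_and_clip(jagged, MAX_LEN)
scores = dummy_model(padded)
# Event 0 holds the first two jets, event 1 holds the last jet.
per_event = unflatten(scores, counts=[2, 1])
print(padded.shape, per_event)
```

With dask-awkward arrays, none of these steps can be run eagerly on the
collection itself, which is why the wrappers take care of scheduling them.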



## Example using ParticleNet-like jet variable calculation using PyTorch

The example given in this notebook will be using [`pytorch`][pytorch] to
calculate a jet-level discriminant using its constituent particles. An example
for how to construct such a `pytorch` network can be found in the docs file, but
for `ml_tools` in coffea, we only support loading models from
[TorchScript][pytorch_jit] format files, to ensure operability when scaling to
clusters. Let us first start by downloading the example ParticleNet model file,
as well as constructing a dummy Event-Jet-PFCandidate nested jagged array
(notice that this is a random array with no actual physical meaning, and is just
used as an example for array structure).


[pytorch]: https://pytorch.org/
[pytorch_jit]: https://pytorch.org/tutorials/beginner/saving_loading_models.html#export-load-model-in-torchscript-format


In [1]:
!wget --quiet -O model.pt https://github.com/CoffeaTeam/coffea/raw/ml_tools/tests/samples/triton_models_test/pn_test/1/model.pt


In [2]:
import numpy as np
import awkward as ak

NEVT = 10
NJETS = (NEVT, 4)
NCANDS = (*NJETS, 100)


def make_events_array():
    def make_ak(arr):
        return ak.from_regular(arr, axis=(arr.ndim - 1))

    # Creating randomized nested-structure
    events = ak.zip(
        {
            "HT": make_ak(np.random.exponential(scale=1000, size=NEVT)),
            "NJets": make_ak(np.random.randint(1, NJETS[-1], size=NEVT)),
        }
    )

    Jets = ak.zip(
        {
            "pt": make_ak(np.random.exponential(20, size=NJETS)),
            "eta": make_ak(np.random.normal(2, size=NJETS)),
            "phi": make_ak(np.random.uniform(-np.pi, np.pi, size=NJETS)),
            "NPFCands": make_ak(np.random.randint(1, NCANDS[-1], size=NJETS)),
        }
    )

    PFCands = ak.zip(
        {
            "pt": make_ak(np.random.exponential(1, size=NCANDS)),
            "eta": make_ak(np.random.normal(2, size=NCANDS)),
            "phi": make_ak(np.random.uniform(-np.pi, np.pi, size=NCANDS)),
            "feat1": make_ak(np.random.random(size=NCANDS)),
            "feat2": make_ak(np.random.random(size=NCANDS)),
            "feat3": make_ak(np.random.random(size=NCANDS)),
            "feat4": make_ak(np.random.random(size=NCANDS)),
        }
    )

    # Making nested jagged structure
    Jets["PFCands"] = PFCands[ak.local_index(PFCands.pt, axis=-1) < Jets.NPFCands]
    events["Jets"] = Jets[ak.local_index(Jets.pt, axis=-1) < events.NJets]
    return events


Now we prepare a class to handle inference requests. Here we import the base
`torch_wrapper` class and create a new class that inherits `torch_wrapper`. As
the class cannot know anything about the data mangling required, we will need to
overload at least the method `awkward_to_numpy`:

- The input can be an arbitrary number of awkward arrays. Here we will be
  passing in the events array constructed above.
- The output should be a single tuple `a` and a single dictionary `b`. This is
  to ensure that arbitrarily complicated arguments can be passed to the
  underlying `pytorch` model instance like `model(*a, **b)`. The contents of `a`
  and `b` will need to be determined by the model of interest. In our
  ParticleNet-like example, the model expects the following inputs:

  - A `N` jets x `2` coordinate x `100` constituents "points" array,
    representing the constituent coordinates.
  - A `N` jets x `5` feature x `100` constituents "features" array, representing
    the constituent features of interest to be used for inference.
  - A `N` jets x `1` mask x `100` constituent "mask" array, representing whether
    a constituent should be masked from the inference request.

  The user is also responsible for making sure the data format is compatible
  with the model of interest. Having defined this minimal class, we can attempt
  to run an inference using the `__call__` method of our defined class.
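
The `(a, b)` return convention and the expected input shapes can be sketched
with a stand-in model. This is only an illustration: `fake_particle_net` and
`awkward_to_numpy_like` are hypothetical names, the arrays are filled with
placeholder values, and the real model is the TorchScript file loaded by the
wrapper.

```python
import numpy as np

N_JETS, N_CANDS = 4, 100


def fake_particle_net(points, features, mask):
    """Hypothetical stand-in for the TorchScript model; only checks shapes."""
    assert points.shape == (N_JETS, 2, N_CANDS)
    assert features.shape == (N_JETS, 5, N_CANDS)
    assert mask.shape == (N_JETS, 1, N_CANDS)
    # Two scores per jet, matching the output format seen in this notebook.
    return np.zeros((N_JETS, 2), dtype=np.float32)


def awkward_to_numpy_like():
    """Mimics the return convention: a tuple `a` and a dictionary `b`."""
    a = ()  # no positional inputs in this example
    b = {
        "points": np.zeros((N_JETS, 2, N_CANDS), dtype=np.float32),
        "features": np.zeros((N_JETS, 5, N_CANDS), dtype=np.float32),
        "mask": np.ones((N_JETS, 1, N_CANDS), dtype=np.float16),
    }
    return a, b


a, b = awkward_to_numpy_like()
out = fake_particle_net(*a, **b)  # expanded exactly like model(*a, **b)
print(out.shape)
```

Returning keyword arguments keeps the mapping between array and model input
explicit, so the order of the dictionary entries does not matter.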



In [3]:
from coffea.ml_tools.torch_wrapper import torch_wrapper


class ParticleNetExample1(torch_wrapper):
    def awkward_to_numpy(self, events):
        jets = ak.flatten(events.Jets)

        def pad(arr):
            return ak.fill_none(
                ak.pad_none(arr, 100, axis=1, clip=True),
                0.0,
            )

        # Human readable version of what the inputs are
        # Each array is an N jets x 100 constituents array
        imap = {
            "points": {
                "deta": pad(jets.eta - jets.PFCands.eta),
                "dphi": pad(jets.phi - jets.PFCands.phi),
            },
            "features": {
                "dr": pad(
                    np.sqrt(
                        (jets.eta - jets.PFCands.eta) ** 2
                        + (jets.phi - jets.PFCands.phi) ** 2
                    )
                ),
                "lpt": pad(np.log(jets.PFCands.pt)),
                "lptf": pad(
                    np.log(jets.PFCands.pt / ak.sum(jets.PFCands.pt, axis=-1))
                ),
                "f1": pad(np.log(jets.PFCands.feat1 + 1)),
                "f2": pad(np.log(jets.PFCands.feat2 + 1)),
            },
            "mask": {
                "mask": pad(ak.ones_like(jets.PFCands.pt)),
            },
        }

        # Compacting the array elements into the desired dimension using
        # ak.concatenate
        retmap = {
            k: ak.concatenate(
                [x[:, np.newaxis, :] for x in imap[k].values()], axis=1
            ).to_numpy()
            for k in imap.keys()
        }
        
        # Returning everything using a dictionary. Also take care of type
        # conversion here.
        return (), {
            "points": retmap["points"].astype(np.float32),
            "features": retmap["features"].astype(np.float32),
            "mask": retmap["mask"].astype(np.float16),
        }


pn_example1 = ParticleNetExample1("model.pt")
events = make_events_array()
results = pn_example1(events)
print(results)
print(type(results))
print(results.__repr__)
print(ak.count(events.Jets.pt))




[[0.0511, -0.0321], [0.0543, -0.00448], ..., [0.0513, ...], [0.0511, -0.0354]]
<class 'awkward.highlevel.Array'>
<bound method Array.__repr__ of <Array [[0.0511, -0.0321], ..., [0.0511, -0.0354]] type='15 * 2 * float32'>>
15


For each jet in the input to the `torch` model, the model returns two scores.
Without additional specification, the `torch_wrapper` class performs a trivial
conversion of the torch outputs using `ak.from_numpy`. We can specify that we
want to fold this back into a nested structure by overloading the
`numpy_to_awkward` method of the class.

For this example we are going to perform additional computation for the
conversion back to awkward array formats:

- Calculate the `softmax` of the first output score (index 0) for each jet
  (commonly used for ML output "scores").
- Fold the computed softmax array back into a nested structure that is
  compatible with the original input events array.

Notice that the inputs of the `numpy_to_awkward` method differ from those of the
`awkward_to_numpy` method only in that the first argument is the numpy array
returned by the model inference.
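
The softmax-and-refold step can be checked in isolation with `numpy`. The sketch
below uses made-up scores and `np.split` as a stand-in for `ak.unflatten`. It
also subtracts the row-wise maximum before exponentiating, an optional variant
that gives the same result as the direct formula but avoids overflow for large
scores; `softmax_first_score` is a hypothetical helper name.

```python
import numpy as np


def softmax_first_score(scores):
    """Softmax probability of column 0 for an (N, n_classes) array."""
    # Shifting by the row-wise maximum does not change the softmax value
    # but keeps np.exp from overflowing.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp[:, 0] / exp.sum(axis=-1)


# Made-up model output for 3 jets with 2 scores each.
raw = np.array([[0.0, 0.0], [1.0, 0.0], [1000.0, 999.0]])
prob = softmax_first_score(raw)

# Regroup the per-jet values into two events holding 1 and 2 jets.
njets = np.array([1, 2])
per_event = np.split(prob, np.cumsum(njets)[:-1])
print(prob, [p.tolist() for p in per_event])
```

For a two-score output the result only depends on the difference between the two
scores, which is why the second and third jets above get the same value.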


In [4]:
class ParticleNetExample2(ParticleNetExample1):
    def numpy_to_awkward(self, return_array, events):
        softmax = np.exp(return_array)[:, 0] / np.sum(np.exp(return_array), axis=-1)

        njets = ak.count(events.Jets.pt, axis=-1)
        return ak.unflatten(softmax, njets)


pn_example2 = ParticleNetExample2("model.pt")
jets = events.Jets
jets["MLresults"] = pn_example2(events)
events["Jets"] = jets

print(events.Jets.MLresults)


[[0.521], [0.515, 0.522, 0.522], [0.52], ..., [0.521], [0.521], [0.52, 0.522]]


Now we have a per-jet variable we can use to continue the event/object selection
chain for our analysis! 

Notice that up till now, we have been working exclusively with plain awkward
arrays. But this class is already ready to be used with dask awkward! The
`__call__` method knows how to handle both array types.


In [5]:
import dask_awkward as dak

# Creating a lazy dask array from the same events array
ak.to_parquet(events, "events.parquet")
dask_events = dak.from_parquet("events.parquet")

# Syntax for dask arrays is identical to the plain awkward arrays!
dask_jets = dask_events.Jets
dask_jets["MLresults_dask"] = pn_example2(dask_events)
dask_events["Jets"] = dask_jets

# Checking that we get identical results
print(dask_events.Jets.MLresults_dask.compute())
print(
    ak.all(
        dask_events.Jets.MLresults_dask.compute()
        == dask_events.Jets.MLresults.compute()
    )
)

# Check which columns are loaded
print(dak.necessary_columns(dask_events.Jets.MLresults_dask))




[[0.521], [0.515, 0.522, 0.522], [0.52], ..., [0.521], [0.521], [0.52, 0.522]]
True
{'read-parquet-919fe6c2d5b954ae9b92e0d2895dc5f0': ['Jets.PFCands.eta', 'Jets.pt', 'Jets.PFCands.feat4', 'NJets', 'Jets.eta', 'Jets.PFCands.phi', 'Jets.PFCands.feat3', 'Jets.PFCands.pt', 'Jets.PFCands.feat2', 'Jets.phi', 'Jets.MLresults', 'Jets.NPFCands', 'HT', 'Jets.PFCands.feat1']}


The only remaining issue is that `dask` is currently loading all columns in the
array that is passed to the `awkward_to_numpy` method, even those that are not
required for the ML inference computation. While this behavior ensures that
everything is loaded, leaving less room for errors during the development of the
`awkward_to_numpy` method, leaving it unchanged can lead to excessive memory
usage, depending on how the event array is set up.

To properly solve this, one should further overload the `dask_columns` method to
return a list of the columns that are strictly needed by the inference tools.
The inputs should be identical to those of the `awkward_to_numpy` method defined
by the user.

*Note:* as of writing, there is a [bug][bug] that causes the kernel to crash
with a segmentation fault if the required branches are not included. If you are
running into issues, it might be that you need to allow the class you are using
to be slightly less lazy.

[bug]: https://github.com/dask-contrib/dask-awkward/issues/249


In [6]:
class ParticleNetExample3(ParticleNetExample2):
    def dask_columns(self, events):
        return [
            events.Jets.pt,
            events.Jets.eta,
            events.Jets.phi,
            events.Jets.NPFCands,
            events.Jets.PFCands,
        ]


pn_example3 = ParticleNetExample3("model.pt")

# Reloading a lazy instance of the array
ak.to_parquet(make_events_array(), "events2.parquet")
dask_events = dak.from_parquet("events2.parquet")

# Syntax for dask arrays is identical to the plain awkward arrays!
dask_jets = dask_events.Jets
dask_jets["MLresults_lazy"] = pn_example3(dask_events)
dask_events["Jets"] = dask_jets

# Computing the results
print(dask_events.Jets.MLresults_lazy.compute())

# Check which columns are loaded
print(dak.necessary_columns(dask_events.Jets.MLresults_lazy))


[[0.522, 0.518, 0.52], [0.52], ..., [0.521, ..., 0.521], [0.518, 0.521, 0.52]]
{'read-parquet-d2173ecf0056d81c95a83d0ff5c0fe1a': ['Jets.PFCands.eta', 'Jets.pt', 'Jets.PFCands.feat4', 'Jets.eta', 'Jets.PFCands.phi', 'Jets.PFCands.feat3', 'Jets.PFCands.pt', 'Jets.PFCands.feat2', 'Jets.phi', 'Jets.NPFCands', 'Jets.PFCands.feat1']}


Of course, the implementation of the classes above can be written in a single
class. Here is a copy-and-paste implementation of the class with all the
functionality described in the cells above:

In [7]:
class ParticleNetExample(torch_wrapper):
    def awkward_to_numpy(self, events):
        jets = ak.flatten(events.Jets)

        def pad(arr):
            return ak.fill_none(
                ak.pad_none(arr, 100, axis=1, clip=True),
                0.0,
            )

        # Human readable version of what the inputs are
        # Each array is an N jets x 100 constituents array
        imap = {
            "points": {
                "deta": pad(jets.eta - jets.PFCands.eta),
                "dphi": pad(jets.phi - jets.PFCands.phi),
            },
            "features": {
                "dr": pad(
                    np.sqrt(
                        (jets.eta - jets.PFCands.eta) ** 2
                        + (jets.phi - jets.PFCands.phi) ** 2
                    )
                ),
                "lpt": pad(np.log(jets.PFCands.pt)),
                "lptf": pad(np.log(jets.PFCands.pt / ak.sum(jets.PFCands.pt, axis=-1))),
                "f1": pad(np.log(jets.PFCands.feat1 + 1)),
                "f2": pad(np.log(jets.PFCands.feat2 + 1)),
            },
            "mask": {
                "mask": pad(ak.ones_like(jets.PFCands.pt)),
            },
        }

        # Compacting the array elements into the desired dimension using
        # ak.concatenate
        retmap = {
            k: ak.concatenate(
                [x[:, np.newaxis, :] for x in imap[k].values()], axis=1
            ).to_numpy()
            for k in imap.keys()
        }

        # Returning everything using a dictionary. Also take care of type
        # conversion here.
        return (), {
            "points": retmap["points"].astype(np.float32),
            "features": retmap["features"].astype(np.float32),
            "mask": retmap["mask"].astype(np.float16),
        }

    def numpy_to_awkward(self, return_array, events):
        softmax = np.exp(return_array)[:, 0] / np.sum(np.exp(return_array), axis=-1)

        njets = ak.count(events.Jets.pt, axis=-1)
        return ak.unflatten(softmax, njets)

    def dask_columns(self, events):
        return [
            events.Jets.pt,
            events.Jets.eta,
            events.Jets.phi,
            events.Jets.NPFCands,
            events.Jets.PFCands,
        ]


pn_example = ParticleNetExample("model.pt")

# Reloading a lazy instance of the array
ak.to_parquet(make_events_array(), "events.parquet")
dask_events = dak.from_parquet("events.parquet")

# Syntax for dask arrays is identical to the plain awkward arrays!
dask_jets = dask_events.Jets
dask_jets["MLresults"] = pn_example(dask_events)
dask_events["Jets"] = dask_jets

# Computing the results
print(dask_events.Jets.MLresults.compute())

# Check which columns are loaded
print(dak.necessary_columns(dask_events.Jets.MLresults))


[[0.519], [0.52, 0.519, 0.521], [...], ..., [0.518], [0.521, 0.521, 0.519]]
{'read-parquet-919fe6c2d5b954ae9b92e0d2895dc5f0': ['Jets.PFCands.eta', 'Jets.pt', 'Jets.PFCands.feat4', 'Jets.eta', 'Jets.PFCands.phi', 'Jets.PFCands.feat3', 'Jets.PFCands.pt', 'Jets.PFCands.feat2', 'Jets.phi', 'Jets.NPFCands', 'Jets.PFCands.feat1']}


## Comments about generalizing to other ML tools

All ML wrappers provided in the `coffea.ml_tools` module (`triton_wrapper` for
[triton][triton] server inference, `torch_wrapper` for pytorch, and
`xgboost_wrapper` for [xgboost][xgboost] inference) follow the same design:
analyzers are responsible for providing the model of interest, along with an
inherited class that overloads the following methods to perform data type
conversion:

- `awkward_to_numpy`: converts awkward arrays to `numpy` arrays. The output
  `numpy` arrays should be in the format of a tuple `a` and a dictionary `b`,
  which can be expanded out to the input of the ML tool like `model(*a, **b)`.
  Notice that some additional trivial conversions (like converting to available
  kernels for `pytorch`, converting to a matrix format for `xgboost`, and
  slicing of arrays for `triton`) are handled automatically by the respective
  wrappers.
- `numpy_to_awkward` (optional): converts the numpy results back to awkward
  array format. If this is not provided, then a simple `ak.from_numpy`
  conversion takes place.
- `dask_columns` (optional but recommended): given the inputs to the
  `awkward_to_numpy` method, lists the branches required for the inference
  calculation. If not provided, the wrapper will attempt to load all branches
  recursively, which may have significant performance penalties.

If the ML tool of choice for your analysis has not been implemented in the
`coffea.ml_tools` module, consider constructing your own with the provided
`numpy_call_wrapper` base class in `coffea.ml_tools`. Aside from the methods
listed above, you will also need to provide the `numpy_call` method to perform
any additional data format conversions and to call the ML tool of choice. If you
think your implementation is general, also consider submitting a PR to the
`coffea` repository!

[triton]: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
[xgboost]: https://xgboost.readthedocs.io/en/stable/
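
The division of labor between the three conversion methods and `numpy_call` can
be sketched with a toy base class. Note that `MiniWrapper` below is a
hypothetical stand-in written for illustration, not the coffea
`numpy_call_wrapper`, and it ignores the dask handling entirely. The "ML tool"
is a made-up dot product with fixed weights.

```python
import numpy as np


class MiniWrapper:
    """Hypothetical stand-in base class; NOT the coffea implementation."""

    def awkward_to_numpy(self, *args, **kwargs):
        raise NotImplementedError("subclasses must provide the input conversion")

    def numpy_call(self, *args, **kwargs):
        raise NotImplementedError("subclasses must call the ML tool here")

    def numpy_to_awkward(self, return_array, *args, **kwargs):
        # Default behavior in this sketch: hand the result back unchanged.
        return return_array

    def __call__(self, *args, **kwargs):
        a, b = self.awkward_to_numpy(*args, **kwargs)
        result = self.numpy_call(*a, **b)
        return self.numpy_to_awkward(result, *args, **kwargs)


class LinearToolWrapper(MiniWrapper):
    """Wraps a made-up 'ML tool': a dot product with fixed weights."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=np.float32)

    def awkward_to_numpy(self, rows):
        # One positional argument, no keyword arguments.
        return (np.asarray(rows, dtype=np.float32),), {}

    def numpy_call(self, x):
        return x @ self.weights


wrapper = LinearToolWrapper([1.0, 2.0])
out = wrapper([[1.0, 1.0], [0.0, 3.0]])
print(out)
```

Keeping the tool-specific call in a single method means the input and output
conversions can be reused unchanged if the backend is swapped.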
