In [1]:
# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the "License")

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Run ML Inference with Different Models Per Key

<table align="left">
  <td>
    <a target="_blank" href="https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb"><img src="https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb"><img src="https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png" />View source on GitHub</a>
  </td>
</table>


Users often want to run inference with many different models that perform the same task. This is helpful when you are comparing the performance of multiple models, or when you have models trained on different datasets and want to choose among them based on additional metadata.

In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with hundreds of models without managing memory yourself.

This notebook demonstrates how to use a `KeyedModelHandler` to run inference in a Beam pipeline with multiple different models on a per-key basis. It uses pretrained models pulled from Hugging Face. We recommend that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook.

## Install Dependencies

We will first install Beam and some dependencies needed by Hugging Face.

In [11]:
# Quote the requirement so the shell does not treat `>` as an output redirect.
!pip install 'apache_beam[gcp]>=2.51.0' --quiet
!pip install torch --quiet
!pip install transformers --quiet

# To use the newly installed versions, restart the runtime.
exit()



In [1]:
from typing import Dict
from typing import Iterable
from typing import Tuple

from transformers import pipeline

import apache_beam as beam
from apache_beam.ml.inference.base import KeyedModelHandler
from apache_beam.ml.inference.base import KeyModelMapping
from apache_beam.ml.inference.base import PredictionResult
from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler
from apache_beam.ml.inference.base import RunInference

## Define Configuration for our Models

A `ModelHandler` is how Beam defines the configuration needed to load and invoke a model. Because we want to use multiple models, we define two model handlers, one for each model in this example. Both models are Hugging Face pipelines, so we use `HuggingFacePipelineModelHandler`.

We will also load the models using Hugging Face and run them against an example. Note that they produce different outputs.

In [2]:
distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model="distilbert-base-uncased-finetuned-sst-2-english")
roberta_mh = HuggingFacePipelineModelHandler('text-classification', model="roberta-large-mnli")

distilbert_pipe = pipeline('text-classification', model="distilbert-base-uncased-finetuned-sst-2-english")
roberta_large_pipe = pipeline('text-classification', model="roberta-large-mnli")

In [3]:
distilbert_pipe("This restaurant is awesome")

[{'label': 'POSITIVE', 'score': 0.9998743534088135}]

In [4]:
roberta_large_pipe("This restaurant is awesome")


[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]

## Define our Examples

Next, we will define some examples that we can input into our pipeline, along with their correct classifications.

In [5]:
examples = [
    ("This restaurant is awesome", "positive"),
    ("This restaurant is bad", "negative"),
    ("I feel fine", "neutral"),
    ("I love chocolate", "positive"),
]

To feed our examples into `RunInference`, we need distinct keys that map easily to our models. In this case, we define keys of the form `<model_name>-<actual_sentiment>` so that we can later extract the actual sentiment of each example from its key.

In [6]:
class FormatExamples(beam.DoFn):
  """
  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').
  We will use these keys to map our elements to the correct models.
  """
  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:
    yield (f'distilbert-{element[1]}', element[0])
    yield (f'roberta-{element[1]}', element[0])
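
As a quick sanity check outside of Beam, the key format can be sketched in plain Python. The helpers `make_keyed_examples` and `split_key` below are hypothetical names introduced only for illustration; they mirror what `FormatExamples` emits and show how a key can be split back apart later.

```python
from typing import Iterable, List, Tuple

# Model prefixes used to build keys; these match the ones in FormatExamples.
MODEL_NAMES = ["distilbert", "roberta"]


def make_keyed_examples(
    examples: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
  """Emit one ('<model_name>-<actual_sentiment>', text) pair per model."""
  keyed = []
  for text, sentiment in examples:
    for model_name in MODEL_NAMES:
      keyed.append((f"{model_name}-{sentiment}", text))
  return keyed


def split_key(key: str) -> Tuple[str, str]:
  """Recover (model_name, actual_sentiment) from a key."""
  model_name, sentiment = key.split("-", 1)
  return model_name, sentiment


keyed = make_keyed_examples([("This restaurant is awesome", "positive")])
print(keyed)
print(split_key(keyed[0][0]))
```

Splitting on the first hyphen assumes that model names themselves contain no hyphen, which holds for the two prefixes used here.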

Using the formatted keys, we will define a `KeyedModelHandler` that maps each key to the model handler to use for it. `KeyedModelHandler` also accepts an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See the [keyed `ModelHandler` documentation](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler) for more information on managing memory.

In [7]:
per_key_mhs = [
    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),
    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)
]
mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)
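
Before running the pipeline, it can help to confirm that every key the examples will produce is covered by a mapping. The sketch below does this in plain Python, using a dictionary of key lists as a stand-in for the `KeyModelMapping` entries above. The names `key_groups` and `unmapped_keys` are hypothetical, and the check makes no assumption about how Beam itself treats an unmapped key.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Stand-in for the KeyModelMapping entries: handler label -> keys it serves.
key_groups: Dict[str, List[str]] = {
    "distilbert_mh": [
        "distilbert-positive", "distilbert-neutral", "distilbert-negative"],
    "roberta_mh": [
        "roberta-positive", "roberta-neutral", "roberta-negative"],
}


def unmapped_keys(
    examples: Iterable[Tuple[str, str]],
    groups: Dict[str, List[str]]) -> Set[str]:
  """Return the keys FormatExamples would emit that no group covers."""
  mapped = {key for keys in groups.values() for key in keys}
  emitted = set()
  for _, sentiment in examples:
    for model_name in ("distilbert", "roberta"):
      emitted.add(f"{model_name}-{sentiment}")
  return emitted - mapped


sample_examples = [
    ("This restaurant is awesome", "positive"),
    ("I feel fine", "neutral"),
]
# An empty set means every emitted key has a handler.
print(unmapped_keys(sample_examples, key_groups))
```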

## Postprocess our Results

The `RunInference` transform emits a tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example so that we can compare each model's prediction.

In [8]:
class ExtractResults(beam.DoFn):
  """
  Extract the data we care about from the PredictionResult object.
  """
  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[Tuple[str, Dict[str, str]]]:
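    # Keys have the form '<model_name>-<actual_sentiment>'.
    model, actual_sentiment = element[0].split('-', 1)
    result = element[1]
    example = result.example
    # The inference is a list of {'label', 'score'} dicts; use the first entry's label.
    predicted_sentiment = result.inference[0]['label']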

    yield (example, {'model': model, 'actual_sentiment': actual_sentiment, 'predicted_sentiment': predicted_sentiment})
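
To make the shape of the data concrete, here is a plain-Python sketch of the extract-then-group step. `FakePredictionResult` is a hypothetical stand-in with only the `example` and `inference` fields that `ExtractResults` reads, and the inference values are copied from the earlier cell outputs. Grouping with a dictionary only approximates what `GroupByKey` does for this small in-memory input.

```python
from collections import defaultdict
from typing import Any, Dict, List, NamedTuple, Tuple


class FakePredictionResult(NamedTuple):
  """Hypothetical stand-in exposing only the fields ExtractResults reads."""
  example: str
  inference: Any


def extract(
    element: Tuple[str, FakePredictionResult]) -> Tuple[str, Dict[str, str]]:
  """Turn (key, result) into (example, fields), as ExtractResults does."""
  model, actual_sentiment = element[0].split("-", 1)
  result = element[1]
  return (result.example, {
      "model": model,
      "actual_sentiment": actual_sentiment,
      "predicted_sentiment": result.inference[0]["label"],
  })


keyed_results = [
    ("distilbert-positive", FakePredictionResult(
        "This restaurant is awesome",
        [{"label": "POSITIVE", "score": 0.9998743534088135}])),
    ("roberta-positive", FakePredictionResult(
        "This restaurant is awesome",
        [{"label": "NEUTRAL", "score": 0.7313134670257568}])),
]

# Group the extracted fields by example text.
grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
for keyed_result in keyed_results:
  example, fields = extract(keyed_result)
  grouped[example].append(fields)

print(dict(grouped))
```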

Finally, we will print the results produced by each model.

In [9]:
class PrintResults(beam.DoFn):
  """
  Print the results produced by each model along with the actual sentiment.
  """
  def process(self, element: Tuple[str, Iterable[Dict[str, str]]]):
    example = element[0]
    # GroupByKey yields an iterable of values; materialize it before indexing.
    results = list(element[1])
    actual_sentiment = results[0]['actual_sentiment']
    predicted_sentiment_1 = results[0]['predicted_sentiment']
    model_1 = results[0]['model']
    predicted_sentiment_2 = results[1]['predicted_sentiment']
    model_2 = results[1]['model']

    if model_1 == 'distilbert':
      distilbert_prediction = predicted_sentiment_1
      roberta_prediction = predicted_sentiment_2
    else:
      roberta_prediction = predicted_sentiment_1
      distilbert_prediction = predicted_sentiment_2

    print(f'Example: {example}\nActual Sentiment: {actual_sentiment}\n'
          f'Distilbert Prediction: {distilbert_prediction}\n'
          f'Roberta Prediction: {roberta_prediction}\n------------')

## Run Your Pipeline

We're now ready to put together all of the pieces into a single Beam pipeline!

In [10]:
with beam.Pipeline() as beam_pipeline:

  formatted_examples = (
            beam_pipeline
            | "ReadExamples" >> beam.Create(examples)
            | "FormatExamples" >> beam.ParDo(FormatExamples()))
  inferences = (
            formatted_examples
            | "Run Inference" >> RunInference(mh)
            | "ExtractResults" >> beam.ParDo(ExtractResults())
            | "GroupByExample" >> beam.GroupByKey()
  )

  inferences | beam.ParDo(PrintResults())



Example: This restaurant is awesome
Actual Sentiment: positive
Distilbert Prediction: POSITIVE
Roberta Prediction: NEUTRAL
------------
Example: This restaurant is bad
Actual Sentiment: negative
Distilbert Prediction: NEGATIVE
Roberta Prediction: NEUTRAL
------------
Example: I love chocolate
Actual Sentiment: positive
Distilbert Prediction: POSITIVE
Roberta Prediction: NEUTRAL
------------
Example: I feel fine
Actual Sentiment: neutral
Distilbert Prediction: POSITIVE
Roberta Prediction: ENTAILMENT
------------
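
If you want a single number per model rather than a printout, the results above can be summarized in a few lines of plain Python. This is a minimal sketch and not part of the pipeline: the `rows` list is transcribed by hand from the output above, and `accuracy_by_model` is a hypothetical helper. It compares labels case-insensitively, because the models emit upper-case labels while the reference labels are lower-case.

```python
from typing import Dict, List, Tuple

# (actual, distilbert prediction, roberta prediction), copied from the output above.
rows: List[Tuple[str, str, str]] = [
    ("positive", "POSITIVE", "NEUTRAL"),
    ("negative", "NEGATIVE", "NEUTRAL"),
    ("positive", "POSITIVE", "NEUTRAL"),
    ("neutral", "POSITIVE", "ENTAILMENT"),
]


def accuracy_by_model(rows: List[Tuple[str, str, str]]) -> Dict[str, float]:
  """Fraction of rows where each model's label matches the actual label."""
  correct = {"distilbert": 0, "roberta": 0}
  for actual, distilbert_label, roberta_label in rows:
    if distilbert_label.lower() == actual:
      correct["distilbert"] += 1
    if roberta_label.lower() == actual:
      correct["roberta"] += 1
  return {model: count / len(rows) for model, count in correct.items()}


print(accuracy_by_model(rows))  # {'distilbert': 0.75, 'roberta': 0.0}
```

On these four examples the Roberta labels never match, since the model's outputs here are `NEUTRAL` and `ENTAILMENT` rather than `POSITIVE` or `NEGATIVE`.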
