In [4]:
# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the "License")

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Apache Beam RunInference with Hugging Face

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_huggingface.ipynb"><img src="https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_huggingface.ipynb"><img src="https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png" />View source on GitHub</a>
  </td>
</table>

This notebook shows how to use the Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) transform with [Hugging Face](https://huggingface.co/).
Apache Beam has built-in support for Hugging Face model handlers: [`HuggingFacePipelineModelHandler`](https://github.com/apache/beam/blob/926774dd02be5eacbe899ee5eceab23afb30abca/sdks/python/apache_beam/ml/inference/huggingface_inference.py#L567), [`HuggingFaceModelHandlerKeyedTensor`](https://github.com/apache/beam/blob/926774dd02be5eacbe899ee5eceab23afb30abca/sdks/python/apache_beam/ml/inference/huggingface_inference.py#L208), and [`HuggingFaceModelHandlerTensor`](https://github.com/apache/beam/blob/926774dd02be5eacbe899ee5eceab23afb30abca/sdks/python/apache_beam/ml/inference/huggingface_inference.py#L392).

Use the three handlers as follows:


1.   Use `HuggingFacePipelineModelHandler` to run inference with [Hugging Face Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines#pipelines).
2.   Use `HuggingFaceModelHandlerKeyedTensor` to run inference with models that use keyed tensors as inputs, for example, models for language modeling tasks.
3.   Use `HuggingFaceModelHandlerTensor` to run inference with models that use tensor inputs (`tf.Tensor` or `torch.Tensor`).


This notebook demonstrates how to use Hugging Face models and Hugging Face pipelines in a Beam pipeline with `RunInference`.

For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation.

## Install Dependencies

First install Beam and the required dependencies for Hugging Face.

In [None]:
!pip install torch --quiet
!pip install tensorflow --quiet
!pip install transformers==4.30.0 --quiet
!pip install "apache-beam[gcp]>=2.50" --quiet

In [2]:
from typing import Dict
from typing import Iterable
from typing import Tuple

import tensorflow as tf
import torch
from transformers import AutoTokenizer
from transformers import TFAutoModelForMaskedLM

import apache_beam as beam
from apache_beam.ml.inference.base import KeyedModelHandler
from apache_beam.ml.inference.base import PredictionResult
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler
from apache_beam.ml.inference.huggingface_inference import HuggingFaceModelHandlerKeyedTensor
from apache_beam.ml.inference.huggingface_inference import HuggingFaceModelHandlerTensor
from apache_beam.ml.inference.huggingface_inference import PipelineTask


## RunInference with [Hugging Face Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines#pipelines)

Hugging Face pipelines can be used with `RunInference` through `HuggingFacePipelineModelHandler`. As with Hugging Face pipelines, instantiating the model handler requires either the pipeline `task` or a `model` that defines the task. Pass any optional arguments for loading the pipeline with `load_pipeline_args`, and pass optional arguments for inference with `inference_args`.



Note: You can specify the pipeline task as a string, as you would for Hugging Face pipelines (for example, `"translation"`), or as a [`PipelineTask`](https://github.com/apache/beam/blob/ac936b0b89a92d836af59f3fc04f5733ad6819b3/sdks/python/apache_beam/ml/inference/huggingface_inference.py#L75) enum object defined in Beam (for example, `PipelineTask.Translation`).

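The split between `load_pipeline_args` and `inference_args` can be pictured with a small stand-in that uses no Beam or Hugging Face code. `FakePipeline` is a hypothetical class invented for this sketch. It assumes only what the text above states: one dictionary is applied when the pipeline is loaded, and the other is applied when inference runs.

```python
from typing import Any, Dict, List


class FakePipeline:
  """Hypothetical stand-in for a loaded pipeline; not a Hugging Face class."""

  def __init__(self, task: str, **load_args: Any):
    # Load-time options are applied once, when the pipeline is created.
    self.task = task
    self.load_args = load_args

  def __call__(
      self, batch: List[str], **inference_args: Any) -> List[Dict[str, Any]]:
    # Call-time options accompany every batch sent for inference.
    return [
        {'text': text, 'max_length': inference_args.get('max_length')}
        for text in batch
    ]


load_pipeline_args = {'framework': 'pt'}
inference_args = {'max_length': 200}

pipe = FakePipeline('translation', **load_pipeline_args)
outputs = pipe(['How are you doing?'], **inference_args)
print(pipe.load_args)
print(outputs)
```

Keeping the two dictionaries separate means load-time options such as the framework are set once, while generation options such as `max_length` can be reasoned about per call.
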
### Create a model handler

Let's do a text translation task from English to Spanish.

In [13]:
model_handler = HuggingFacePipelineModelHandler(
    task=PipelineTask.Translation_XX_to_YY,
    model="google/flan-t5-small",
    load_pipeline_args={'framework': 'pt'},
    inference_args={'max_length': 200}
)

### Define input examples

In [14]:
text = ["translate English to Spanish: How are you doing?",
        "translate English to Spanish: This is the Apache Beam project."]

### Postprocess results

The output from the `RunInference` transform is a `PredictionResult` object. Extract the inference from it and format the output.

In [15]:
class FormatOutput(beam.DoFn):
  """
  Extract the results from PredictionResult and print it.
  """
  def process(self, element):
    example = element.example
    translated_text = element.inference[0]['translation_text']
    print(f'Example: {example}')
    print(f'Translated text: {translated_text}')
    print('-' * 80)

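To see the shape of the data that `FormatOutput` reads without running a model, the following sketch uses a hypothetical `FakePredictionResult` tuple that exposes only the `example` and `inference` fields accessed above. The sample values are copied from this notebook's translation output. The assumption that the inference is a list of dicts keyed by `translation_text` comes from the indexing in `FormatOutput`.

```python
from typing import Any, NamedTuple


class FakePredictionResult(NamedTuple):
  """Hypothetical stand-in exposing the two fields FormatOutput reads."""
  example: Any
  inference: Any


def format_translation(element: FakePredictionResult) -> str:
  # The first entry of the inference list holds the translated text.
  translated_text = element.inference[0]['translation_text']
  return f'Example: {element.example}\nTranslated text: {translated_text}'


result = FakePredictionResult(
    example='translate English to Spanish: This is the Apache Beam project.',
    inference=[{'translation_text': 'Esto es el proyecto Apache Beam.'}])
print(format_translation(result))
```
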

### Run Pipeline

In [16]:
with beam.Pipeline() as beam_pipeline:
  examples = (
      beam_pipeline
      | "CreateExamples" >> beam.Create(text)
  )
  inferences = (
      examples
      | "RunInference" >> RunInference(model_handler)
      | "Print" >> beam.ParDo(FormatOutput())
  )

Example: translate English to Spanish: How are you doing?
Translated text: Cómo está acerca?
--------------------------------------------------------------------------------
Example: translate English to Spanish: This is the Apache Beam project.
Translated text: Esto es el proyecto Apache Beam.
--------------------------------------------------------------------------------


## RunInference with a pretrained model from the Hugging Face Hub


To use pretrained models directly from the Hugging Face Hub, use `HuggingFaceModelHandlerTensor` or `HuggingFaceModelHandlerKeyedTensor`, depending on the input type. Use `load_model_args` to provide optional arguments for loading the model and `inference_args` to provide optional arguments for inference. Also specify `framework='tf'` for TensorFlow models or `framework='pt'` for PyTorch models.


Let's do a language modeling task to predict the masked word in a sentence.

### Create a model handler

Masked language models take keyed tensors as input, so this example uses `HuggingFaceModelHandlerKeyedTensor`.

In [17]:
model_handler = HuggingFaceModelHandlerKeyedTensor(
    model_uri="stevhliu/my_awesome_eli5_mlm_model",
    model_class=TFAutoModelForMaskedLM,
    framework='tf',
    load_model_args={'from_pt': True},
    max_batch_size=1
)

### Define input examples

In [18]:
text = ['The capital of France is Paris .',
    'It is raining cats and dogs .',
    'He looked up and saw the sun and stars .',
    'Today is Monday and tomorrow is Tuesday .',
    'There are 5 coconuts on this palm tree .']

### Preprocess input

Replace the last word of each input sentence with `<mask>`, then tokenize the masked sentence for inference.

In [19]:
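def add_mask_to_last_word(text: str) -> Tuple[str, str]:
  """Replace the last word of the sentence with <mask> and return
  the original sentence and the masked sentence.

  Assumes the sentence ends with a space-separated period, as in the
  examples above, so the last word is the second-to-last token.
  """
  text_list = text.split()
  # Drop the last word and the period, then append '<mask>' plus the period.
  masked = ' '.join(text_list[:-2] + ['<mask>' + text_list[-1]])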
  return text, masked

tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_eli5_mlm_model")

def tokenize_sentence(
    text_and_mask: Tuple[str, str],
    tokenizer) -> Tuple[str, Dict[str, tf.Tensor]]:
  """Convert string examples to tensors."""
  text, masked_text = text_and_mask
  tokenized_sentence = tokenizer.encode_plus(
      masked_text, return_tensors="tf")

  # Workaround to manually remove batch dim until we have the feature to
  # add optional batching flag.
  # TODO(https://github.com/apache/beam/issues/21863): Remove once optional
  # batching flag added
  return text, {
      k: tf.squeeze(v)
      for k, v in dict(tokenized_sentence).items()
  }

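The masking step can be traced on its own with plain Python. The sketch below repeats the logic of `add_mask_to_last_word` under a different, hypothetical name (`mask_last_word`) so that it runs without the notebook's other cells. It also shows what happens when a sentence does not end with a separate ` .` token, which the example inputs always do.

```python
from typing import Tuple


def mask_last_word(text: str) -> Tuple[str, str]:
  """Same logic as add_mask_to_last_word, repeated to be self-contained."""
  tokens = text.split()
  # tokens[-1] is the trailing '.', tokens[-2] is the word to hide.
  masked = ' '.join(tokens[:-2] + ['<mask>' + tokens[-1]])
  return text, masked


original, masked = mask_last_word('The capital of France is Paris .')
print(masked)  # The capital of France is <mask>.

# Without the separate trailing '.', a different word is removed.
_, masked_no_period = mask_last_word('The capital of France is Paris')
print(masked_no_period)  # The capital of France <mask>Paris
```
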
### Postprocess results

Extract the result from the `PredictionResult` object, then print the original sentence and the predicted last word.

In [28]:
class PostProcessor(beam.DoFn):
  """Processes the PredictionResult to get the predicted word.

  The logits are the output of the masked language model. Taking the argmax
  of the logits at the mask position gives the word with the highest
  probability of being the masked word.
  """
  def __init__(self, tokenizer):
    super().__init__()
    self.tokenizer = tokenizer

  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[str]:
    text, prediction_result = element
    inputs = prediction_result.example
    logits = prediction_result.inference['logits']
    mask_token_index = tf.where(inputs["input_ids"] == self.tokenizer.mask_token_id)[0]
    predicted_token_id = tf.math.argmax(logits[mask_token_index[0]], axis=-1)
    decoded_word = self.tokenizer.decode(predicted_token_id)
    print(f"Actual Sentence: {text}\nPredicted last word: {decoded_word}")
    print('-' * 80)

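The lookup that `PostProcessor` performs, finding the mask position and then taking the argmax of the logits at that position, can be sketched with plain lists. The vocabulary, token ids, and scores below are toy values invented for illustration. They do not come from the model used in this notebook.

```python
from typing import List

# Toy vocabulary; a token's id is its index in this list.
vocab = ['<s>', '</s>', '<mask>', 'Paris', 'London', 'is']
mask_token_id = vocab.index('<mask>')

# Token ids for the toy sequence: <s> is <mask> </s>
input_ids = [0, 5, 2, 1]

# One row of scores per input position, one column per vocabulary entry.
logits = [
    [0.1, 0.0, 0.0, 0.2, 0.1, 0.3],
    [0.0, 0.1, 0.0, 0.1, 0.2, 0.9],
    [0.0, 0.0, 0.1, 2.5, 1.2, 0.3],
    [0.0, 0.8, 0.0, 0.1, 0.1, 0.1],
]


def predict_masked_word(
    input_ids: List[int],
    logits: List[List[float]],
    mask_token_id: int,
    vocab: List[str]) -> str:
  # Locate the first masked position in the input.
  mask_position = input_ids.index(mask_token_id)
  # Pick the vocabulary id with the highest score at that position.
  scores = logits[mask_position]
  predicted_id = max(range(len(scores)), key=scores.__getitem__)
  return vocab[predicted_id]


print(predict_masked_word(input_ids, logits, mask_token_id, vocab))  # Paris
```
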
### Run Pipeline

In [29]:
with beam.Pipeline() as beam_pipeline:
  tokenized_examples = (
      beam_pipeline
      | "CreateExamples" >> beam.Create(text)
      | 'AddMask' >> beam.Map(add_mask_to_last_word)
      | 'TokenizeSentence' >>
      beam.Map(lambda x: tokenize_sentence(x, tokenizer)))

  result = (
      tokenized_examples
      | "RunInference" >> RunInference(KeyedModelHandler(model_handler))
      | "PostProcess" >> beam.ParDo(PostProcessor(tokenizer))
  )

Actual Sentence: The capital of France is Paris .
Predicted last word:  Paris
--------------------------------------------------------------------------------
Actual Sentence: It is raining cats and dogs .
Predicted last word:  dogs
--------------------------------------------------------------------------------
Actual Sentence: He looked up and saw the sun and stars .
Predicted last word:  stars
--------------------------------------------------------------------------------
Actual Sentence: Today is Monday and tomorrow is Tuesday .
Predicted last word:  Tuesday
--------------------------------------------------------------------------------
Actual Sentence: There are 5 coconuts on this palm tree .
Predicted last word:  tree
--------------------------------------------------------------------------------

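In the pipeline above, each element entering `RunInference` is a `(sentence, tensors)` tuple, and `PostProcessor` receives a `(sentence, PredictionResult)` tuple, so the `KeyedModelHandler` wrapper carries the key alongside the prediction. The following pure-Python sketch illustrates that idea only. `run_keyed` and `fake_inference` are hypothetical names, and the code is not Beam's implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Example = Dict[str, List[int]]


def run_keyed(
    batch: Iterable[Tuple[str, Example]],
    run_unkeyed: Callable[[List[Example]], List[str]]
) -> List[Tuple[str, str]]:
  # Split keys from examples, run inference on the examples only,
  # then pair each result with its original key.
  keys, examples = zip(*batch)
  return list(zip(keys, run_unkeyed(list(examples))))


def fake_inference(examples: List[Example]) -> List[str]:
  # Stand-in for a model: report how many tokens each example has.
  return [f"{len(example['input_ids'])} tokens" for example in examples]


batch = [
    ('The capital of France is Paris .', {'input_ids': [0, 5, 2, 1]}),
    ('It is raining cats and dogs .', {'input_ids': [0, 7, 2]}),
]
keyed_results = run_keyed(batch, fake_inference)
print(keyed_results)
```

Note that "keyed" is used in two senses in this notebook: `HuggingFaceModelHandlerKeyedTensor` accepts a dictionary of named tensors as a single example, while `KeyedModelHandler` attaches a key (here, the original sentence) to each example.
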