Copyright (c) 2024 Habana Labs, Ltd. an Intel Company.
SPDX-License-Identifier: Apache-2.0


### Using Hugging Face Pipelines on Intel® Gaudi® 2 - Text Generation

This example shows how to use the Hugging Face Transformers pipeline API to run a text generation task on Intel Gaudi.

Pipelines are an easy way to use models for inference. They are objects that abstract away most of the library's complex code and offer a simple API dedicated to several tasks. The text generation pipeline is one of them.

#### Text Generation Pipeline Brief Introduction
The text generation pipeline works with any `ModelWithLMHead`. It predicts the words that will follow a specified text prompt.

This language generation pipeline can currently be loaded from pipeline() using the following task identifier: "text-generation".

The **models that this pipeline can use are models that have been trained with an autoregressive language modeling objective**, which includes the uni-directional models in the library (e.g. gpt2). See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=text-generation).
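
To make the autoregressive objective concrete, here is a minimal sketch in plain Python. It does not use GPT-2 or any Hugging Face API: the lookup table `NEXT_WORD` and the helper `generate` are hypothetical names invented for this illustration. It only shows the loop structure, in which each new word is predicted from the text produced so far and then appended to it.

```python
# Toy illustration of autoregressive (left-to-right) generation.
# NEXT_WORD is a made-up lookup table standing in for a real language model.
NEXT_WORD = {
    "once": "upon",
    "upon": "a",
    "a": "time",
}

def generate(prompt, max_new_words):
    """Greedily extend `prompt` one word at a time (hypothetical helper)."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        # A real model conditions on the whole prefix; this toy uses only the last word.
        next_word = NEXT_WORD.get(words[-1])
        if next_word is None:  # the toy table has no continuation, so stop early
            break
        words.append(next_word)
    return " ".join(words)

generated = generate("Once", max_new_words=5)
print(generated)
```

A real model replaces the table lookup with a forward pass that scores every token in its vocabulary, but the append-and-repeat loop is the same idea.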

#### Install the Hugging Face Optimum Habana Library

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Hugging_Face_pipelines
!pip install optimum-habana==1.16.0

#### Import all necessary dependencies

In [None]:
# Enable lazy mode. Set this before importing any habana_frameworks module.
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

import torch
from transformers import pipeline

from habana_frameworks.torch.hpu import wrap_in_hpu_graph
# Importing htcore loads the Intel Gaudi (HPU) backend for PyTorch.
import habana_frameworks.torch.core as htcore

The call below may be needed to replace the existing Hugging Face model classes with their Intel Gaudi specific versions.

In [None]:
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
adapt_transformers_to_gaudi()

#### Prepare the input
We define `prompts` as a list of three prompts so that the pipeline can process all of them in one call.

In [3]:
prompts = [
    "Once upon a time, in a land far, far away,",
    "In the beginning, there was darkness.",
    "The quick brown fox jumps over the lazy dog."
]

#### Set up the pipeline
To set up the Hugging Face pipeline, we do the following:

* Choose the Hugging Face task: `"text-generation"`.
* Set the device to `"hpu"`, which allows the pipeline to run on Intel Gaudi.
* Choose the model `"gpt2"` and set the data type to bf16.
* Finally, use `wrap_in_hpu_graph` to wrap the module's forward function with HPU Graphs. This wrapper captures, caches and replays the graph. More info [here](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html).

Running the next cell builds the pipeline on Intel Gaudi.

In [None]:
generator = pipeline('text-generation', model='gpt2', trust_remote_code=True, torch_dtype=torch.bfloat16, device="hpu")

In [5]:
generator.model = wrap_in_hpu_graph(generator.model)
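
The "captures, caches and replays" behaviour mentioned above can be pictured with a small plain-Python sketch. This is not how `wrap_in_hpu_graph` is implemented. The class `GraphCachingWrapper`, the choice to key the cache on input length, and the two counters are all assumptions made purely for illustration.

```python
# Conceptual sketch only: mimics "capture once, replay afterwards" with a dict cache.
class GraphCachingWrapper:
    """Hypothetical stand-in for a graph-caching wrapper around a forward function."""

    def __init__(self, forward):
        self.forward = forward
        self.cache = {}      # maps an input signature to a "captured" callable
        self.captures = 0    # how many times a new entry was recorded
        self.replays = 0     # how many times a cached entry was reused

    def __call__(self, inputs):
        signature = len(inputs)  # assumption: entries are keyed by input shape
        if signature not in self.cache:
            self.cache[signature] = self.forward  # "capture" on first sight of this shape
            self.captures += 1
        else:
            self.replays += 1
        return self.cache[signature](inputs)

def double_all(values):
    return [2 * v for v in values]

wrapped = GraphCachingWrapper(double_all)
first = wrapped([1, 2, 3])    # new shape: captured
second = wrapped([4, 5, 6])   # same shape: replayed from the cache
third = wrapped([7, 8])       # different shape: captured again
print(first, second, third, wrapped.captures, wrapped.replays)
```

If the real wrapper behaves along these lines, repeated calls with same-shaped inputs are the ones that would benefit from replay; the linked HPU Graphs documentation is the authoritative description.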

#### Execute the pipeline and output the results
The input is the list of three prompts, passed to the pipeline in a single call.
The output is a list of three items, one per prompt.

In [None]:
output = generator(prompts, max_length=100)
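
In Transformers generation, `max_length` limits the total sequence length, that is, the prompt tokens plus the newly generated tokens. The arithmetic sketch below shows what that implies for the number of new tokens. The helper `new_token_budget` and the prompt token count of 12 are hypothetical; real counts come from the model's tokenizer.

```python
# Arithmetic sketch: how many new tokens a total-length cap leaves for a prompt.
def new_token_budget(prompt_token_count, max_length):
    """Hypothetical helper: tokens left for generation under a total-length cap."""
    return max(0, max_length - prompt_token_count)

# Assumed token count, for illustration only.
budget = new_token_budget(prompt_token_count=12, max_length=100)
print(budget)
```

To cap only the generated part regardless of prompt length, `max_new_tokens` can be passed instead of `max_length`.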

Extract the `generated_text` field from each output and display it.

In [None]:
# Each prompt yields a list of sequences; print the first sequence for each prompt.
for i, result in enumerate(output):
    print(f"=== output {i} ===:\n  {result[0]['generated_text']}\n")
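
The indexing `output[i][0]['generated_text']` reflects a nested structure: one list per prompt, each holding one dictionary per returned sequence. The sketch below reproduces that shape with a hand-written `mock_output`, so it runs without a model or an Intel Gaudi device. The strings are placeholders, not real GPT-2 output, and `first_texts` is a hypothetical helper.

```python
# mock_output imitates the shape of the pipeline result for three prompts;
# the strings are placeholders, not text produced by a model.
mock_output = [
    [{"generated_text": "placeholder text for prompt 0"}],
    [{"generated_text": "placeholder text for prompt 1"}],
    [{"generated_text": "placeholder text for prompt 2"}],
]

def first_texts(results):
    """Return the first generated text for each prompt (hypothetical helper)."""
    return [sequences[0]["generated_text"] for sequences in results]

texts = first_texts(mock_output)
print(texts)
```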

In [None]:
# Exit the kernel process, which releases the Intel Gaudi device it holds.
exit()