Copyright (c) 2024 Habana Labs, Ltd. an Intel Company.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

### Using Hugging Face Pipelines on Intel® Gaudi® 2 - Text Generation

This example shows how to use the Hugging Face Transformers pipeline API to run a text generation task on Intel Gaudi.

Pipelines are an easy way to use models for inference. They are objects that abstract most of the complex code in the library behind a simple API dedicated to specific tasks. The text generation pipeline is one of them.

#### Brief Introduction to the Text Generation Pipeline
The text generation pipeline works with any `ModelWithLMHead`. It predicts the words that follow a given text prompt.

It can be loaded from `pipeline()` using the task identifier `"text-generation"`.

The **models that this pipeline can use are models that have been trained with an autoregressive language modeling objective**, which includes the uni-directional models in the library (e.g. gpt2). See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=text-generation).
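
To make "autoregressive" concrete, here is a minimal sketch in plain Python. It is not GPT-2 and does not use Transformers: it is a toy bigram model, and the names `train_bigram`, `generate`, and the tiny corpus are illustrative assumptions. It only shows the loop that autoregressive models share, which is to predict the next token from what came before, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows: dict, prompt: str, max_length: int) -> str:
    """Greedy autoregressive decoding: keep appending the most frequent next word.

    max_length counts the prompt words too, so it caps the total length.
    """
    words = prompt.split()
    if not words:
        return ""
    while len(words) < max_length:
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the toy model has never seen a continuation for this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Toy corpus (an assumption for illustration only)
model = train_bigram("the cat sat on the mat and the cat sat on the rug")
print(generate(model, "the", max_length=5))  # prints "the cat sat on the"
```

A real language model replaces the word counts with a neural network over subword tokens and usually samples from the predicted distribution, but the generation loop has the same shape.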

#### Install the Hugging Face Optimum Habana Library

In [1]:
%cd ~/Gaudi-tutorials/PyTorch/Hugging_Face_pipelines
!pip install optimum-habana==1.11.1


#### Import all necessary dependencies

In [2]:
import torch
import requests
from transformers import pipeline

from habana_frameworks.torch.hpu import wrap_in_hpu_graph
# Importing habana_frameworks.torch.core loads the Intel Gaudi (HPU) backend for PyTorch
import habana_frameworks.torch.core as htcore



#### Prepare the input
We define the input as a list of three prompts, so the pipeline can process all of them in a single call.

In [3]:
prompts = [
    "Once upon a time, in a land far, far away,",
    "In the beginning, there was darkness.",
    "The quick brown fox jumps over the lazy dog."
]

#### Set up the pipeline
To set up the Hugging Face pipeline, we do the following:

* Choose the Hugging Face task: `"text-generation"`.
* Set the device to `"hpu"`, which allows the pipeline to run on Intel Gaudi.
* Choose the model `"gpt2"` and set the data type to bf16 (`torch.bfloat16`).
* Finally, use `wrap_in_hpu_graph` to wrap the module's forward function with HPU Graphs. This wrapper captures, caches and replays the graph. See the Intel Gaudi documentation on HPU Graphs for more information.

When the pipeline is created, the Intel Gaudi software initializes the device and prints its environment settings and system configuration:

In [4]:
generator = pipeline('text-generation', model='gpt2', trust_remote_code=True, torch_dtype=torch.bfloat16, device="hpu")

 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH = 
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG = 
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 10
CPU RAM       : 102493984 KB
------------------------------------------------------------------------------
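
The call above hard-codes `device="hpu"`, so it fails on a machine without the Intel Gaudi software stack. As a hedged sketch, the helper below picks `"hpu"` only when the `habana_frameworks` package can be found and otherwise falls back to `"cpu"`. The names `pick_device` and `pipeline_kwargs` are hypothetical helpers, not part of Transformers or Optimum Habana, and finding the package does not guarantee that a Gaudi device is actually present.

```python
import importlib.util


def pick_device() -> str:
    """Return "hpu" if the Habana PyTorch bridge is installed, else "cpu"."""
    # find_spec only checks that the package can be located; it does not import it
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


def pipeline_kwargs(model_name: str = "gpt2") -> dict:
    """Build keyword arguments for transformers.pipeline (hypothetical helper)."""
    return {
        "task": "text-generation",
        "model": model_name,
        "device": pick_device(),
    }


kwargs = pipeline_kwargs()
print(kwargs["device"])

# Possible usage, assuming torch and transformers are imported as in this notebook:
# generator = pipeline(**kwargs, torch_dtype=torch.bfloat16)
```

On a CPU fallback, the `wrap_in_hpu_graph` step in the next cell would need to be skipped, since it is specific to HPU.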


In [5]:
generator.model = wrap_in_hpu_graph(generator.model)

#### Execute the pipeline and output the results
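Here the input is the list of three prompts, and the output is a list of three items, one per prompt. `max_length=100` caps the total length of each result (prompt plus generated tokens) at 100 tokens.
Note that the pipeline's default `batch_size` is 1, so unless `batch_size` is passed the three prompts are generated one after another, which is why the `pad_token_id` message appears three times in the output.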

In [6]:
output = generator(prompts, max_length=100)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Extract the `generated_text` field from each output and display it.

In [7]:
print(f"=== output 0 ===: \n  {output[0][0]['generated_text']} \n")
print(f"=== output 1 ===: \n  {output[1][0]['generated_text']} \n")
print(f"=== output 2 ===: \n {output[2][0]['generated_text']}  \n")

=== output 0 ===: 
  Once upon a time, in a land far, far away, from the Earth, a very beautiful sun had spread forth.

The sun that appeared before him had shone brighter than the sun that had eclipsed him. As the sun had risen from the earth, so had the moon risen from the earth, and the stars were risen from the earth. The earth shook, and the sun struck down upon the city of Sodom, and the stars fell upon Sodom, and the city of 

=== output 1 ===: 
  In the beginning, there was darkness. The night was full of snow, which made the man's mouth water with blood.

Quran: There were many in the city. People were dancing from head to toe. They were in need of a place to hide. The police came to their rescue and helped them. They thought they could kill the man and get away with it. But when the police arrived, they did nothing.

Quran: Allah and his messenger said: " 

=== output 2 ===: 
 The quick brown fox jumps over the lazy dog. 'Have a look,' he says. 'He loves to find new things.' 
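
The three `print` calls above index the result by hand. As a sketch of the same extraction for any number of prompts, the helper below walks the nested structure the pipeline returns for a list input: one list per prompt, each holding one dict per generated sequence. `extract_texts` is a hypothetical helper, and `sample_output` is placeholder data standing in for the real `output` so that the snippet runs without a model.

```python
from typing import Dict, List


def extract_texts(batch_output: List[List[Dict[str, str]]]) -> List[str]:
    """Return the first generated sequence for every prompt."""
    return [candidates[0]["generated_text"] for candidates in batch_output]


# Placeholder data with the same shape as the pipeline result (not real model output)
sample_output = [
    [{"generated_text": "first prompt plus its continuation"}],
    [{"generated_text": "second prompt plus its continuation"}],
]

for i, text in enumerate(extract_texts(sample_output)):
    print(f"=== output {i} ===\n{text}\n")
```

In the notebook, `extract_texts(output)` would return the three generated texts in prompt order.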

In [None]:
exit()