Copyright (c) 2024 Habana Labs, Ltd. an Intel Company.
SPDX-License-Identifier: Apache-2.0


## Translation using Hugging Face Pipelines on the Intel&reg; Gaudi&reg; 2 AI Accelerator
This tutorial shows how to run translation tasks using Hugging Face pipelines. We start with a simple inference example and then move to a more complex one, in which we fine-tune the T5 model on a specific dataset.

In [None]:
# Start with the `exit()` command to restart the Python kernel so that no other processes are holding the Intel Gaudi accelerator when you run this notebook.
# You will see a warning that the kernel has died; this is expected.
exit()

#### Installation and Setup
Install the Hugging Face Optimum for Intel® Gaudi® Accelerators library and examples

In [None]:
%cd ~
!git clone -b v1.16.0 https://github.com/huggingface/optimum-habana.git
!pip install --quiet optimum-habana==1.16.0

Install DeepSpeed for faster training

In [None]:
!pip install --quiet git+https://github.com/HabanaAI/DeepSpeed.git@1.21.0

In this case, we'll be using the "Translation" task example from the Hugging Face examples directory, so we'll go to that directory, install its specific requirements, and create the directory that will hold the fine-tuned model.  For this example, the fine tuning has already been performed.

In [None]:
%cd ~/optimum-habana/examples/translation
!pip install -r requirements.txt
!pip install pickleshare
!mkdir -p finetune_model_output

#### Simple Example using the Hugging Face Pipeline on Intel Gaudi
The example below shows the simplest setup of the Hugging Face pipeline for the translation task and runs inference only.  Note that the pipeline sets `device="hpu"` so that inference runs on the Intel Gaudi AI Accelerator.

In [None]:
# Enable lazy mode (PT_HPU_LAZY_MODE=1) before importing habana_frameworks.torch
import os
os.environ['PT_HPU_LAZY_MODE'] = '1'

import torch
import habana_frameworks.torch

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

text = "translate English to French: Good Morning, I'd like to run to the store to get some milk."
translator_pipe = pipeline("translation_xx_to_yy", model="t5-small", device="hpu", torch_dtype=torch.bfloat16, max_length=200)
translator_pipe(text)
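
The task prefix in the prompt and the shape of the pipeline result are easy to miss in the cell above. As a minimal sketch that runs without an accelerator, the helpers below build a T5-style task prefix and pull the translated string out of a translation pipeline result, which is a list of dictionaries with a `translation_text` key. The names `build_t5_prompt` and `first_translation` are hypothetical helpers introduced for illustration, and `sample_output` is a hand-written stand-in, not real model output.

```python
def build_t5_prompt(sentence, source="English", target="French"):
    """Prepend the T5 task prefix, e.g. 'translate English to French: '."""
    return f"translate {source} to {target}: {sentence.strip()}"


def first_translation(pipeline_output):
    """Return the translated text from a translation pipeline result.

    The pipeline returns a list with one dict per input; each dict
    stores the translated string under the 'translation_text' key.
    """
    return pipeline_output[0]["translation_text"]


prompt = build_t5_prompt("  Good morning.  ")

# Hand-written stand-in for what translator_pipe(prompt) would return.
sample_output = [{"translation_text": "Bonjour."}]

print(prompt)
print(first_translation(sample_output))
```

In the notebook, `first_translation(translator_pipe(prompt))` would replace the stand-in.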

#### Fine Tuning with DeepSpeed
We now fine-tune the T5 model on the English-German dataset `stas/wmt14-en-de-pre-processed` and then use the resulting model for inference.  To accelerate fine-tuning, we'll use DeepSpeed across multiple Intel Gaudi accelerators; the command below sets `--world_size 4`, so it uses four.  Note the Intel Gaudi specific arguments:  
    --use_habana  
    --use_lazy_mode  
    --use_hpu_graphs_for_training  
    
For more information, refer to the [Hugging Face Translation example](https://github.com/huggingface/optimum-habana/tree/main/examples/translation).

In [None]:
!PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py \
    --world_size 4 --use_deepspeed run_translation.py \
    --model_name_or_path t5-small \
    --do_train \
    --do_eval \
    --source_lang en \
    --target_lang de \
    --source_prefix "translate English to German: " \
    --dataset_name stas/wmt14-en-de-pre-processed \
    --output_dir ./finetune_model_output \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --overwrite_output_dir \
    --save_steps=20000 \
    --save_total_limit=3 \
    --predict_with_generate \
    --use_habana \
    --use_lazy_mode \
    --report_to none \
    --use_hpu_graphs_for_training \
    --gaudi_config_name Habana/t5 \
    --ignore_pad_token_for_loss False \
    --pad_to_max_length \
    --throughput_warmup_steps 3 \
    --bf16 
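
With data-parallel training, each worker processes its own mini-batch, so the number of samples consumed per optimizer step is roughly the per-device batch size times the number of workers times the gradient accumulation steps. The sketch below works that out for the values in the command above, assuming gradient accumulation is left at 1 (the command does not set it). The function name `global_batch_size` is hypothetical.

```python
def global_batch_size(per_device_batch_size, world_size, grad_accum_steps=1):
    """Samples consumed per optimizer step across all workers."""
    return per_device_batch_size * world_size * grad_accum_steps


# Values from the command: --per_device_train_batch_size 4, --world_size 4.
# grad_accum_steps=1 is an assumption, since the command does not set it.
print(global_batch_size(4, 4))  # 16
```

Changing `--world_size` therefore changes the effective batch size unless the per-device batch size is adjusted to compensate.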


Now that the model is fine-tuned, you can see the updated model in the `./finetune_model_output` directory.

In [None]:
%cd finetune_model_output
%ls -al
%cd ..

#### Inference
Now we'll enter a prompt and run it through the same simple Hugging Face translation pipeline setup, this time using the fine-tuned model.  If you want to skip the fine-tuning, set `path_to_local_model = "t5-small"` in the cell below.  

In [None]:
prompt = input("Enter a sentence for translation from English to German: ")

In [None]:
import torch
import habana_frameworks.torch

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

# Point to the location of the fine-tuned model. To skip the fine-tuning step and run the T5 model directly, comment out the first line and uncomment the second line:
path_to_local_model = "./finetune_model_output"
#path_to_local_model ="t5-small"

# Load the tokenizer and model from the specified local directory
tokenizer = AutoTokenizer.from_pretrained(path_to_local_model)
model = AutoModelForSeq2SeqLM.from_pretrained(path_to_local_model)

# Create the Hugging Face pipeline with the input prompt
text = f"translate English to German: {prompt}"
translator_pipeline = pipeline("translation_xx_to_yy", model=path_to_local_model, device="hpu", torch_dtype=torch.bfloat16, max_length=150)
output = translator_pipeline(text)

# Print the results:
print(f"English: {prompt}")
print(f"German: {output[0]['translation_text']}")


#### Simple Gradio Front End for Translation
In this final example, we'll move the Hugging Face pipeline into a Gradio user interface to make ongoing translation easier.

In [2]:
!pip install "gradio>=4.31.5"
%load_ext gradio

In [None]:
import gradio as gr
import torch
import habana_frameworks.torch

from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer

# Point to the location of the fine-tuned model. To skip the fine-tuning step and run the T5 model directly, comment out the first line and uncomment the second line:
path_to_local_model = "./finetune_model_output"
#path_to_local_model ="t5-small"

# Load the tokenizer and model from the specified local directory
tokenizer = AutoTokenizer.from_pretrained(path_to_local_model)
model = AutoModelForSeq2SeqLM.from_pretrained(path_to_local_model)

# Create the translation pipeline
translator_pipeline = pipeline("translation_xx_to_yy", model=path_to_local_model, device="hpu", tokenizer=tokenizer, torch_dtype=torch.bfloat16, max_length=500)

def text_gen(inputs):
    # Format the input text for translation
    text = f"translate English to German: {inputs}"
    outputs = translator_pipeline(text)

    # Extract and return the translation result
    return outputs[0]['translation_text']

inputs = gr.Textbox(label="Prompt", value="I'd like to order a hamburger and a cold glass of beer")
outputs = gr.Markdown(label="Response")

demo = gr.Interface(
        fn=text_gen,
        inputs=inputs,
        outputs=outputs,
        title="Translation on Intel&reg; Gaudi&reg; 2", 
        description="Translate English to German on Intel Gaudi",
)

demo.launch()
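
A user can submit the form with an empty textbox, in which case the handler above would send only the task prefix to the model. The sketch below shows one way to guard against that, under stated assumptions: `make_text_gen` is a hypothetical factory that wraps any callable returning the pipeline's list-of-dicts result, and `fake_translator` is a stand-in so the logic can be checked without an accelerator.

```python
def make_text_gen(translator, prefix="translate English to German: "):
    """Build a Gradio handler around a translation callable."""

    def text_gen(inputs):
        # Treat None and whitespace-only input as empty.
        sentence = (inputs or "").strip()
        if not sentence:
            return "Please enter a sentence to translate."
        outputs = translator(prefix + sentence)
        return outputs[0]["translation_text"]

    return text_gen


def fake_translator(text):
    """Stand-in for translator_pipeline: upper-cases the text it receives."""
    return [{"translation_text": text.upper()}]


text_gen = make_text_gen(fake_translator)
print(text_gen("  hello  "))
print(text_gen(""))
```

In the notebook, `make_text_gen(translator_pipeline)` would produce the function passed as `fn` to `gr.Interface`.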

In [1]:
exit()