### Multi-node distributed inference of *Hugging Face* models with *Transformers Pipeline*, *Pandas UDF*, and *DeepSpeed*

##### Here we run inference with an NLP model from [*Hugging Face*](https://huggingface.co/) after fine-tuning it. We use [*Transformers Pipeline*](https://huggingface.co/docs/transformers/main_classes/pipelines) to perform the inference task and [*Pandas user-defined functions*](https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/udf-python-pandas) on *Spark* to parallelize it across the cluster. [*DeepSpeed*](https://www.microsoft.com/en-us/research/project/deepspeed/) provides the infrastructure for [optimized model inference](https://deepspeed.readthedocs.io/en/latest/inference-init.html).

Import the necessary packages.

In [0]:
import os
import time

import pandas as pd

from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import FloatType, IntegerType, StructType, StructField

import torch
from torch.utils.data import DataLoader, SequentialSampler

from transformers import pipeline
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

import deepspeed



Define the number of workers in the cluster and the folder in the Databricks File System to load the prepared data from.

In [0]:
num_workers = 8

test_data_folder = '/test_data'

Read the testing data into a *Spark DataFrame* and repartition it by the number of workers in the cluster. Because the data is relatively small, repartitioning makes sure that every worker receives a share of it, which also illustrates the parallel processing with *Pandas UDF* below.

In [0]:
sdf_test = spark.read.parquet(os.path.join(test_data_folder, 'test_data')).repartition(num_workers)
display(sdf_test.select(sdf_test.columns).limit(10))

text,label,id
"the santa clause 2 is a barely adequate babysitter for older kids , but i've got to give it thumbs down .",0,8589934595
"forages for audience sympathy like a temperamental child begging for attention , giving audiences no reason to truly care for its decrepit freaks beyond the promise of a reprieve from their incessant whining .",0,8589934640
"a dreadful day in irish history is given passionate , if somewhat flawed , treatment .",1,8589934628
"what a dumb , fun , curiously adolescent movie this is .",1,8589934620
"girlfriends are bad , wives are worse and babies are the kiss of death in this bitter italian comedy .",0,8589934615
"the film boasts dry humor and jarring shocks , plus moments of breathtaking mystery .",1,8589934693
"the movie's downfall is to substitute plot for personality . it doesn't really know or care about the characters , and uses them as markers for a series of preordained events .",0,8589934663
an energetic and engaging film that never pretends to be something it isn't .,1,8589934680
"spinning a web of dazzling entertainment may be overstating it , but "" spider-man "" certainly delivers the goods .",1,8589934650
"devotees of star trek ii : the wrath of khan will feel a nagging sense of deja vu , and the grandeur of the best next generation episodes is lacking .",1,8589934662
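The effect of repartitioning by the number of workers can be pictured with a simplified model. The sketch below is plain Python, not Spark: the helper `round_robin_partition` and the 100 row ids are hypothetical, and Spark's actual `repartition` differs in detail. It only shows that the rows end up spread almost evenly, so each worker has a batch to process.

```python
def round_robin_partition(rows, num_partitions):
    """Distribute rows over num_partitions lists, one row at a time."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


num_workers = 8
row_ids = list(range(100))  # hypothetical row ids, standing in for the test set

partitions = round_robin_partition(row_ids, num_workers)
sizes = [len(p) for p in partitions]

# Every partition is non-empty and the sizes differ by at most one row.
print(sizes)
```

With fewer partitions than workers, some workers would sit idle during inference, which is why the partition count is tied to `num_workers` here.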


Here we load the tokenizer and configuration of the pre-trained model from Hugging Face, then instantiate the model with the fine-tuned weights stored in the model folder.

In [0]:
model_type = 'microsoft/deberta-v3-base'
max_length = 128
hidden_dropout_prob = 0.
attention_probs_dropout_prob = 0.
num_labels = 2

model_folder = '/dbfs/model_outputs_deepspeed'
tokenizer = AutoTokenizer.from_pretrained(model_type)
config = AutoConfig.from_pretrained(model_type, hidden_dropout_prob=hidden_dropout_prob,
                                    attention_probs_dropout_prob=attention_probs_dropout_prob, num_labels=num_labels)
model = AutoModelForSequenceClassification.from_pretrained(model_folder, config=config)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


We create a [*TextClassificationPipeline*](https://huggingface.co/docs/transformers/v4.21.3/en/main_classes/pipelines#transformers.TextClassificationPipeline) object, which allows us to make model predictions directly from the input text. In this case, the *tokenization* is performed on the fly. The argument `device=0` places the pipeline on the first GPU.

We then wrap the model with [*DeepSpeed*, for optimized inference](https://deepspeed.readthedocs.io/en/latest/inference-init.html).

In [0]:
pipe = pipeline('text-classification', tokenizer=tokenizer, model=model, device=0)
pipe.model = deepspeed.init_inference(model=pipe.model, mp_size=1, dtype=torch.half, replace_method='auto', replace_with_kernel_inject=True)

[2022-09-14 17:50:29,682] [INFO] [logging.py:68:log_dist] [Rank -1] DeepSpeed info: version=0.7.2, git-hash=unknown, git-branch=unknown
[2022-09-14 17:50:29,682] [INFO] [logging.py:68:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
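The argument `dtype=torch.half` asks DeepSpeed to run the model in 16-bit floating point. As a rough illustration of what that precision means numerically, the sketch below round-trips a value through the IEEE 754 half-precision format using Python's standard `struct` module. It does not use DeepSpeed or PyTorch, the helper `to_half_and_back` is hypothetical, and it does not model how rounding accumulates inside a network.

```python
import struct


def to_half_and_back(x: float) -> float:
    """Round x to IEEE 754 half precision and convert it back to a Python float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


score = 0.9462997  # a value of the same kind as the pipeline scores
half_score = to_half_and_back(score)

# For values in [0.5, 1), neighbouring half-precision numbers are 2**-11 apart.
print(score, half_score, abs(score - half_score))
```

Scores close to each other can therefore collapse to the same half-precision value, which is usually acceptable for classification but worth keeping in mind when comparing scores.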


Here is where we define the function that performs model inference over the testing data, using the *pipeline* object defined above.

The model inference is performed in parallel, over the Spark DataFrame, using Spark's [*Pandas UDF*](https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/udf-python-pandas) functionality.

In [0]:
schema = StructType([
  StructField('label', IntegerType()),
  StructField('score', FloatType())
])

@pandas_udf(schema)
def predict(text: pd.Series) -> pd.DataFrame:
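    # Run the pipeline on the whole batch; it returns one {'label', 'score'} dict per text.
    pred = pipe(text.to_list())
    res = pd.DataFrame(pred)
    # Map the string labels to integers in one step, avoiding chained assignment.
    res['label'] = res['label'].map({'LABEL_0': 0, 'LABEL_1': 1}).astype(int)
    return res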
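The body of the function above can be tried without Spark or a GPU by replacing the pipeline with a stand-in. In the sketch below, `fake_pipe` and `predict_batch` are hypothetical names, and the keyword rule inside `fake_pipe` is made up for illustration. The only assumption carried over from the real pipeline is the output shape: one `{'label': ..., 'score': ...}` dictionary per input text.

```python
import pandas as pd


def fake_pipe(texts):
    """Hypothetical stand-in for the Transformers pipeline (not a real model)."""
    return [{'label': 'LABEL_1' if 'good' in t else 'LABEL_0', 'score': 0.9}
            for t in texts]


def predict_batch(text: pd.Series) -> pd.DataFrame:
    """Same logic as the UDF body, applied to one batch of texts."""
    res = pd.DataFrame(fake_pipe(text.to_list()))
    # Convert the string labels to the integer labels used in the test data.
    res['label'] = res['label'].map({'LABEL_0': 0, 'LABEL_1': 1}).astype(int)
    return res


batch = pd.Series(['a good film', 'a dull film'])
out = predict_batch(batch)
print(out)
```

The returned frame has one row per input text and the columns `label` and `score`, matching the fields of the `schema` passed to `pandas_udf`.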

The model inference function defined above is then executed through the *select* function from the *DataFrame* API. It returns a new column with the predicted labels and corresponding scores, given by the *pipeline* execution.

We then split the returned column into two separate columns, `predicted_label` and `score`, corresponding to its components.

In [0]:
sdf_test = sdf_test.select('id', 'text', 'label', predict('text').alias('predictions'))

sdf_test = sdf_test.withColumns({'predicted_label': sdf_test['predictions'].getItem('label'),
                                 'score': sdf_test['predictions'].getItem('score')}).drop('predictions')

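Display the testing data with the predicted values. Spark evaluates transformations lazily, so the model inference actually runs when the rows are displayed, and the measured duration includes it.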

In [0]:
start_time = time.time()
display(sdf_test.select(sdf_test.columns).limit(10))
end_time = time.time()

print('\nTotal Duration: %f' % (round(end_time-start_time, 3)))

id,text,label,predicted_label,score
25769803801,"the film has a laundry list of minor shortcomings , but the numerous scenes of gory mayhem are worth the price of admission . . . if "" gory mayhem "" is your idea of a good time .",1,1,0.9462997
25769803864,"certainly beautiful to look at , but its not very informative about its titular character and no more challenging than your average television biopic .",0,0,0.9870297
25769803849,"rarely has skin looked as beautiful , desirable , even delectable , as it does in trouble every day .",1,1,0.99444515
25769803887,"i walked away not really know who "" they "" were , what "" they "" looked like . why "" they "" were here and what "" they "" wanted and quite honestly , i didn't care .",0,0,0.64288104
25769803790,"though overall an overwhelmingly positive portrayal , the film doesn't ignore the more problematic aspects of brown's life .",1,1,0.98292196
25769803798,"even if the naipaul original remains the real masterpiece , the movie possesses its own languorous charm .",1,1,0.9904959
25769803786,"if s&m seems like a strange route to true love , maybe it is , but it's to this film's ( and its makers' ) credit that we believe that that's exactly what these two people need to find each other -- and themselves .",1,1,0.9892855
25769803791,tells ( the story ) with such atmospheric ballast that shrugging off the plot's persnickety problems is simply a matter of ( being ) in a shrugging mood .,1,1,0.9807473
25769803876,fresnadillo has something serious to say about the ways in which extravagant chance can distort our perspective and throw us off the path of good sense .,1,1,0.9926399
25769803874,the closest thing to the experience of space travel,1,1,0.9012708



Total Duration: 18.267000
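The timing above wraps the `display` call rather than the `select` call because of lazy evaluation. A minimal plain-Python analogy, using a generator in place of a Spark DataFrame, is sketched below. The names `slow_predict` and `lazy_predictions` are hypothetical, and the stand-in function does no real inference.

```python
import time

calls = []


def slow_predict(text):
    """Stand-in for model inference; records that it actually ran."""
    calls.append(text)
    return len(text) % 2


texts = ['first review', 'second review', 'third review']

# Building the generator is comparable to calling select(): nothing runs yet.
lazy_predictions = (slow_predict(t) for t in texts)
calls_before = len(calls)

# Consuming the generator is comparable to display(): the work happens here.
start = time.perf_counter()
results = list(lazy_predictions)
elapsed = time.perf_counter() - start

print(f'calls before consuming: {calls_before}, after: {len(calls)}')
print(f'elapsed: {elapsed:.6f} s')
```

Timing only the step that defines the computation would report almost nothing, whatever the cost of the model.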
