# PySpark Huggingface Inferencing
## Conditional generation

From: https://huggingface.co/docs/transformers/model_doc/t5

### Using PyTorch

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

max_source_length = 512
max_target_length = 128

task_prefix = "translate English to German: "

lines = [
    "The house is wonderful",
    "Welcome to NYC",
    "HuggingFace is a company"
]

input_sequences = [task_prefix + l for l in lines]


In [2]:
input_ids = tokenizer(input_sequences,
                      padding="longest",
                      max_length=max_source_length,
                      truncation=True,  # enforce max_length explicitly
                      return_tensors="pt").input_ids
outputs = model.generate(input_ids)



In [3]:
[tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

['Das Haus ist wunderbar',
 'Willkommen in NYC',
 'HuggingFace ist ein Unternehmen']

In [4]:
model.framework

'pt'

### Using TensorFlow

In [5]:
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

In [6]:
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = TFT5ForConditionalGeneration.from_pretrained("t5-small")

max_source_length = 512
max_target_length = 128

task_prefix = "translate English to German: "

lines = [
    "The house is wonderful",
    "Welcome to NYC",
    "HuggingFace is a company"
]

input_sequences = [task_prefix + l for l in lines]

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [7]:
input_ids = tokenizer(input_sequences,
                      padding="longest",
                      max_length=max_source_length,
                      truncation=True,  # enforce max_length explicitly
                      return_tensors="tf").input_ids
outputs = model.generate(input_ids)



In [8]:
[tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

['Das Haus ist wunderbar',
 'Willkommen in NYC',
 'HuggingFace ist ein Unternehmen']

In [9]:
model.framework

'tf'

## PySpark

In [10]:
import os
from pathlib import Path
from torchtext.datasets import IMDB

In [11]:
# load IMDB reviews (test) dataset
data = IMDB(split='test')

In [12]:
# convert to a nested list of strings (one single-column row per review) for PySpark
lines = []
for label, text in data:
    # keep the full review text; the first sentence is extracted later in preprocess()
    lines.append([text])
len(lines)

25000

### Create PySpark DataFrame

In [13]:
from pyspark.sql.types import *

In [14]:
df = spark.createDataFrame(lines, ['lines']).repartition(10)
df.schema

StructType([StructField('lines', StringType(), True)])

In [15]:
df.take(1)

23/05/19 19:03:36 WARN TaskSetManager: Stage 0 contains a task of very large size (3858 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

[Row(lines='Now and again, a film comes around purely by accident that makes you doubt your sanity. We just finished studying the novel, "Northanger Abbey", at school and decided to refresh our memory of this unexciting piece of humourless garbage with the BBC adaptation.<br /><br />The funny thing about Northanger Abbey is that it actually makes you want to kill yourself. The film is NOTHING like the book, for example, the subtly evil characters seem to have been turned into transparent stereotypes. John Thorpe looks like a leprechaun on acid while Isabella plays the role of slut. Catherine, the main character, is the most depressingly stupid and irritating actress on god\'s earth (she looks like a coffee addict, her eyes are like basketballs) whilst Mr Tilney looks and acts like a retired porno stunt double. The plot goes completely off the rails at certain points of the film, I don\'t know what the hell the director was thinking when for no reason at all, a 7 year old black kid who 

### Save the test dataset as parquet files

In [16]:
df.write.mode("overwrite").parquet("imdb_test")

23/05/19 19:03:39 WARN TaskSetManager: Stage 3 contains a task of very large size (3858 KiB). The maximum recommended task size is 1000 KiB.
                                                                                

### Check arrow memory configuration

In [17]:
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "512")
# This line will fail if the vectorized reader runs out of memory
assert len(df.head()) > 0, "`df` should not be empty"

23/05/19 19:03:40 WARN TaskSetManager: Stage 6 contains a task of very large size (3858 KiB). The maximum recommended task size is 1000 KiB.


## Inference using Spark DL API (PyTorch)
Note: you can restart the kernel and run from this point to simulate running in a different node or environment.

In [18]:
import pandas as pd
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import col, pandas_udf, struct
from pyspark.sql.types import StringType

In [19]:
# keep only the text before the first period and prepend the task prefix
# `text` is a Spark Column; the inner pandas UDF receives each batch as a pd.Series
def preprocess(text, prefix: str = ""):
    @pandas_udf("string")
    def _preprocess(text: pd.Series) -> pd.Series:
        return pd.Series([prefix + s.split(".")[0] for s in text])
    return _preprocess(text)

In [20]:
# only use first N examples, since this is slow
df = spark.read.parquet("imdb_test").limit(100)
df.show(truncate=120)
df.count()

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   lines|
+------------------------------------------------------------------------------------------------------------------------+
|...But not this one! I always wanted to know "what happened" next. We will never know for sure what happened because ...|
|Hard up, No proper jobs going down at the pit, why not rent your kids! DIY pimp story without the gratuitous sex scen...|
|I watched this movie to see the direction one of the most promising young talents in movies was going. Unfortunately,...|
|This movie makes you wish imdb would let you vote a zero. One of the two movies I've ever walked out of. It's very ha...|
|I never want to see this movie again!<br /><br />Not only is it dreadfully bad, but I can't stand seeing my hero Stan...|
|(As a note, I'd

100

In [21]:
# only use first 100 rows, since generation takes a while
df1 = df.withColumn("input", preprocess(col("lines"), "Translate English to German: ")).select("input").limit(100).cache()

In [22]:
df1.count()

                                                                                

100

In [23]:
df1.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|                                                                                           Translate English to German: |
|Translate English to German: Hard up, No proper jobs going down at the pit, why not rent your kids! DIY pimp story wi...|
|Translate English to German: I watched this movie to see the direction one of the most promising young talents in mov...|
|                                   Translate English to German: This movie makes you wish imdb would let you vote a zero|
|Translate English to German: I never want to see this movie again!<br /><br />Not only is it dreadfully bad, but I ca...|
|Translate Engli
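
The first row of the table above contains only the prefix. That review begins with `...But not this one!`, so splitting on `.` yields an empty first element. The pure-Python sketch below reproduces the list-comprehension logic inside `_preprocess` on an excerpt of that row and shows one possible alternative. `first_sentence_skip_empty` is a hypothetical helper, not part of this notebook.

```python
def first_sentence(text: str, prefix: str = "") -> str:
    # Same logic as the body of the notebook's pandas UDF: text before the first period.
    return prefix + text.split(".")[0]


def first_sentence_skip_empty(text: str, prefix: str = "") -> str:
    # Hypothetical alternative: use the first non-empty piece between periods.
    pieces = [p.strip() for p in text.split(".") if p.strip()]
    return prefix + (pieces[0] if pieces else "")


prefix = "Translate English to German: "
# Excerpt of the first review shown in the table above.
review = '...But not this one! I always wanted to know "what happened" next. We will never know'

print(repr(first_sentence(review, prefix)))
print(repr(first_sentence_skip_empty(review, prefix)))
```

Whether skipping empty pieces is desirable depends on the data; the notebook itself keeps the simpler split.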

In [24]:
def predict_batch_fn():
    import numpy as np
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    tokenizer = T5Tokenizer.from_pretrained("t5-small")

    def predict(inputs):
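        # flatten the (N,) or (N, 1) array of strings into a Python list; reshape(-1) also handles a batch of one row
        flattened = np.asarray(inputs).reshape(-1).tolist()
        input_ids = tokenizer(flattened,
                              padding="longest",
                              max_length=128,
                              truncation=True,  # enforce max_length explicitly
                              return_tensors="pt").input_ids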
        output_ids = model.generate(input_ids)
        string_outputs = np.array([tokenizer.decode(o, skip_special_tokens=True) for o in output_ids])
        print("predict: {}".format(len(flattened)))
        return string_outputs
    
    return predict

In [25]:
generate = predict_batch_udf(predict_batch_fn,
                             return_type=StringType(),
                             batch_size=10)
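
With `batch_size=10`, `predict` is called on chunks of at most 10 rows rather than on a whole partition at once. The sketch below imitates that chunking in plain Python to show how many calls 100 rows would produce. `iter_batches`, `run_in_batches`, and `fake_predict` are hypothetical stand-ins. Spark's actual batching also depends on Arrow record batches and partitioning, so treat this as an illustration only.

```python
from typing import Callable, Iterator, List


def iter_batches(rows: List[str], batch_size: int) -> Iterator[List[str]]:
    # Yield consecutive slices of at most `batch_size` rows.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def run_in_batches(rows: List[str], batch_size: int,
                   predict: Callable[[List[str]], List[str]]) -> List[str]:
    # Call `predict` once per batch and concatenate the results in order.
    outputs: List[str] = []
    for batch in iter_batches(rows, batch_size):
        result = predict(batch)
        # Each call must return exactly one output per input row.
        assert len(result) == len(batch)
        outputs.extend(result)
    return outputs


calls: List[int] = []


def fake_predict(batch: List[str]) -> List[str]:
    # Stand-in for tokenize + generate + decode; records the batch size it saw.
    calls.append(len(batch))
    return [s.upper() for s in batch]


rows = ["row {}".format(i) for i in range(100)]
all_outputs = run_in_batches(rows, batch_size=10, predict=fake_predict)
print(len(all_outputs), len(calls))
```

The one-output-per-input check mirrors the requirement that the array returned by `predict` lines up with the rows it was given.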

In [26]:
%%time
# first pass caches model/fn
preds = df1.withColumn("preds", generate(struct("input")))
results = preds.collect()

CPU times: user 23.1 ms, sys: 4.47 ms, total: 27.6 ms
Wall time: 22.6 s


                                                                                

In [27]:
%%time
preds = df1.withColumn("preds", generate("input"))
results = preds.collect()

CPU times: user 15.7 ms, sys: 2.96 ms, total: 18.7 ms
Wall time: 16 s


                                                                                

In [28]:
%%time
preds = df1.withColumn("preds", generate(col("input")))
results = preds.collect()

CPU times: user 9.34 ms, sys: 4.6 ms, total: 13.9 ms
Wall time: 16 s


                                                                                

In [29]:
preds.show(truncate=60)

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|                               Translate English to German: |                                    Übersetzen Sie Englisch.|
|Translate English to German: Hard up, No proper jobs goin...|                              Warum nicht die Kinder mieten?|
|Translate English to German: I watched this movie to see ...|Ich habe diesen Film gesehen, um zu sehen, in welche Rich...|
|Translate English to German: This movie makes you wish im...|Dieser Film macht Sie sich wünschen, dass imdb Sie es Ihn...|
|Translate English to German: I never want to see this mov...|             Ich möchte diesen Film nie wieder sehen!br />br|
|Transla

                                                                                

In [30]:
# only use first 100 rows, since generation takes a while
df2 = df.withColumn("input", preprocess(col("lines"), "Translate English to French: ")).select("input").limit(100).cache()

In [31]:
df2.show(truncate=120)



+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|                                                                                           Translate English to French: |
|Translate English to French: Hard up, No proper jobs going down at the pit, why not rent your kids! DIY pimp story wi...|
|Translate English to French: I watched this movie to see the direction one of the most promising young talents in mov...|
|                                   Translate English to French: This movie makes you wish imdb would let you vote a zero|
|Translate English to French: I never want to see this movie again!<br /><br />Not only is it dreadfully bad, but I ca...|
|Translate Engli

                                                                                

In [32]:
%%time
# first pass caches model/fn
preds = df2.withColumn("preds", generate(struct("input")))
result = preds.collect()

CPU times: user 20.3 ms, sys: 0 ns, total: 20.3 ms
Wall time: 22 s


                                                                                

In [33]:
%%time
preds = df2.withColumn("preds", generate("input"))
result = preds.collect()

CPU times: user 16.8 ms, sys: 0 ns, total: 16.8 ms
Wall time: 15.4 s


                                                                                

In [34]:
%%time
preds = df2.withColumn("preds", generate(col("input")))
result = preds.collect()

CPU times: user 12.6 ms, sys: 791 µs, total: 13.3 ms
Wall time: 15.4 s


                                                                                

In [35]:
preds.show(truncate=60)

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|                               Translate English to French: |                                                           :|
|Translate English to French: Hard up, No proper jobs goin...|                       Vous ne pouvez pas louer vos enfants!|
|Translate English to French: I watched this movie to see ...|J’ai regardé ce film pour voir la direction d’un des jeun...|
|Translate English to French: This movie makes you wish im...|Ce film vous fait envie de voir imdb vous laisser voter zéro|
|Translate English to French: I never want to see this mov...|               Je ne veux jamais voir ce film à nouveau!br /|
|Transla

                                                                                

### Using Triton Inference Server

Note: you can restart the kernel and run from this point to simulate running in a different node or environment.

This notebook uses the [Python backend with a custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments), using a conda-pack environment created as follows:
```
conda create -n huggingface -c conda-forge python=3.8
conda activate huggingface

export PYTHONUSERSITE=True
pip install conda-pack sentencepiece sentence_transformers transformers

conda-pack  # huggingface.tar.gz
```

In [36]:
import numpy as np
import pandas as pd
import os
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import col, struct, pandas_udf
from pyspark.sql.types import FloatType, StringType, StructField, StructType

In [37]:
%%bash
# copy custom model to expected layout for Triton
rm -rf models
mkdir -p models
cp -r models_config/hf_generation models

# add custom execution environment
cp huggingface.tar.gz models
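
The contents of `models_config/hf_generation` are not shown in this notebook. As a rough guide, Triton's Python backend generally expects each model directory to hold a `config.pbtxt` and a numeric version subdirectory containing `model.py`. The sketch below checks for that layout in a throwaway directory. `check_python_backend_layout` is a hypothetical helper, and the file names are assumptions about what `hf_generation` contains.

```python
import tempfile
from pathlib import Path
from typing import List


def check_python_backend_layout(repo: Path, model_name: str) -> List[str]:
    # Return a list of problems; an empty list means the assumed layout is present.
    problems = []
    model_dir = repo / model_name
    if not (model_dir / "config.pbtxt").is_file():
        problems.append("missing config.pbtxt")
    versions = [d for d in model_dir.glob("*") if d.is_dir() and d.name.isdigit()]
    if not versions:
        problems.append("missing numeric version directory")
    elif not any((v / "model.py").is_file() for v in versions):
        problems.append("missing model.py in version directory")
    return problems


# Build a throwaway repository that follows the assumed layout and check it.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "models"
    (repo / "hf_generation" / "1").mkdir(parents=True)
    (repo / "hf_generation" / "config.pbtxt").write_text("# placeholder\n")
    problems_before = check_python_backend_layout(repo, "hf_generation")
    (repo / "hf_generation" / "1" / "model.py").write_text("# placeholder\n")
    problems_after = check_python_backend_layout(repo, "hf_generation")

print(problems_before, problems_after)
```

A check like this could be run against the real `models` directory before starting the server, so a missing file shows up before the container launches.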

#### Start Triton Server on each executor

In [38]:
num_executors = 1
triton_models_dir = "{}/models".format(os.getcwd())
huggingface_cache_dir = "{}/.cache/huggingface".format(os.path.expanduser('~'))
nodeRDD = sc.parallelize(list(range(num_executors)), num_executors)

def start_triton(it):
    import docker
    import time
    import tritonclient.grpc as grpcclient
    
    client=docker.from_env()
    containers=client.containers.list(filters={"name": "spark-triton"})
    if containers:
        print(">>>> containers: {}".format([c.short_id for c in containers]))
    else:
        container=client.containers.run(
            "nvcr.io/nvidia/tritonserver:23.04-py3", "tritonserver --model-repository=/models",
            detach=True,
            device_requests=[docker.types.DeviceRequest(device_ids=["0"], capabilities=[['gpu']])],
            environment=[
                "TRANSFORMERS_CACHE=/cache"
            ],
            name="spark-triton",
            network_mode="host",
            remove=True,
            shm_size="256M",
            volumes={
                triton_models_dir: {"bind": "/models", "mode": "ro"},
                huggingface_cache_dir: {"bind": "/cache", "mode": "rw"}
            }
        )
        print(">>>> starting triton: {}".format(container.short_id))

        # wait for triton to be running
        time.sleep(15)
        client = grpcclient.InferenceServerClient("localhost:8001")
        ready = False
        while not ready:
            try:
                ready = client.is_server_ready()
            except Exception as e:
                time.sleep(5)

    return [True]

nodeRDD.barrier().mapPartitions(start_triton).collect()

                                                                                

[True]
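
The readiness loop in `start_triton` retries indefinitely, so a server that never becomes ready would keep the barrier stage waiting. One option is to bound the wait with a deadline, as in this sketch. `wait_until_ready` and its parameters are hypothetical, and `is_ready` stands in for a call such as `client.is_server_ready()`.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 120.0,
                     poll_interval_s: float = 5.0) -> bool:
    # Poll `is_ready` until it returns True or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if is_ready():
                return True
        except Exception:
            # The server may refuse connections while it is still starting.
            pass
        time.sleep(poll_interval_s)
    return False


# Simulated server: raises twice, then reports ready.
attempts = {"n": 0}


def fake_is_ready() -> bool:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("not up yet")
    return True


ok = wait_until_ready(fake_is_ready, timeout_s=2.0, poll_interval_s=0.01)
timed_out = wait_until_ready(lambda: False, timeout_s=0.05, poll_interval_s=0.01)
print(ok, timed_out)
```

Returning `False` instead of looping forever lets the caller decide whether to raise, retry, or stop the container.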

#### Run inference

In [39]:
import pandas as pd
from functools import partial
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import col, pandas_udf, struct
from pyspark.sql.types import StringType

In [40]:
# only use first N examples, since this is slow
df = spark.read.parquet("imdb_test").limit(100).cache()

In [41]:
# keep only the text before the first period and prepend the task prefix
# `text` is a Spark Column; the inner pandas UDF receives each batch as a pd.Series
def preprocess(text, prefix: str = ""):
    @pandas_udf("string")
    def _preprocess(text: pd.Series) -> pd.Series:
        return pd.Series([prefix + s.split(".")[0] for s in text])
    return _preprocess(text)

In [42]:
# only use first 100 rows, since generation takes a while
df1 = df.withColumn("input", preprocess(col("lines"), "Translate English to German: ")).select("input").limit(100)

In [43]:
df1.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|                                                                                           Translate English to German: |
|Translate English to German: Hard up, No proper jobs going down at the pit, why not rent your kids! DIY pimp story wi...|
|Translate English to German: I watched this movie to see the direction one of the most promising young talents in mov...|
|                                   Translate English to German: This movie makes you wish imdb would let you vote a zero|
|Translate English to German: I never want to see this movie again!<br /><br />Not only is it dreadfully bad, but I ca...|
|Translate Engli

In [44]:
def triton_fn(triton_uri, model_name):
    import numpy as np
    import tritonclient.grpc as grpcclient
    
    # map Triton datatype names to NumPy dtypes
    np_types = {
      "BOOL": np.dtype(np.bool_),
      "INT8": np.dtype(np.int8),
      "INT16": np.dtype(np.int16),
      "INT32": np.dtype(np.int32),
      "INT64": np.dtype(np.int64),
      "FP16": np.dtype(np.float16),
      "FP32": np.dtype(np.float32),
      "FP64": np.dtype(np.float64),
      "BYTES": np.dtype(object)
    }

    client = grpcclient.InferenceServerClient(triton_uri)
    model_meta = client.get_model_metadata(model_name)
    
    def predict(inputs):
        if isinstance(inputs, np.ndarray):
            # single ndarray input
            request = [grpcclient.InferInput(model_meta.inputs[0].name, inputs.shape, model_meta.inputs[0].datatype)]
            request[0].set_data_from_numpy(inputs.astype(np_types[model_meta.inputs[0].datatype]))
        else:
            # dict of multiple ndarray inputs
            request = [grpcclient.InferInput(i.name, inputs[i.name].shape, i.datatype) for i in model_meta.inputs]
            for i in request:
                i.set_data_from_numpy(inputs[i.name()].astype(np_types[i.datatype()]))
        
        response = client.infer(model_name, inputs=request)
        
        if len(model_meta.outputs) > 1:
            # return dictionary of numpy arrays
            return {o.name: response.as_numpy(o.name) for o in model_meta.outputs}
        else:
            # return single numpy array
            return response.as_numpy(model_meta.outputs[0].name)
        
    return predict

In [45]:
generate = predict_batch_udf(partial(triton_fn, triton_uri="localhost:8001", model_name="hf_generation"),
                             return_type=StringType(),
                             input_tensor_shapes=[[1]],
                             batch_size=100)

In [46]:
%%time
# first pass caches model/fn
preds = df1.withColumn("preds", generate(struct("input")))
results = preds.collect()

CPU times: user 22.3 ms, sys: 4.66 ms, total: 27 ms
Wall time: 4.47 s


                                                                                

In [47]:
%%time
preds = df1.withColumn("preds", generate("input"))
results = preds.collect()

CPU times: user 9.54 ms, sys: 4.29 ms, total: 13.8 ms
Wall time: 4.31 s


                                                                                

In [48]:
%%time
preds = df1.withColumn("preds", generate(col("input")))
results = preds.collect()

CPU times: user 11.7 ms, sys: 9.72 ms, total: 21.4 ms
Wall time: 4.22 s
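
Comparing the wall times reported for the German translation runs gives a rough sense of the difference between the in-process PyTorch UDF and the Triton-backed UDF. The numbers below are copied from the `%%time` outputs in this notebook. They come from a single run on 100 rows, and the two setups also differ in batch size (10 vs. 100) and in that the Triton container is given a GPU device, so the ratio is only indicative.

```python
from statistics import mean

# Wall times in seconds, copied from the %%time outputs in this notebook.
pytorch_udf = [22.6, 16.0, 16.0]   # first pass also caches the model/fn
triton_udf = [4.47, 4.31, 4.22]


def steady_state(times):
    # Ignore the first (warm-up) pass and average the remaining ones.
    return mean(times[1:])


speedup = steady_state(pytorch_udf) / steady_state(triton_udf)
print("steady-state speedup: {:.1f}x".format(speedup))
```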


                                                                                

In [49]:
preds.show(truncate=60)

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|                               Translate English to German: |                                    Übersetzen Sie Englisch.|
|Translate English to German: Hard up, No proper jobs goin...|                              Warum nicht die Kinder mieten?|
|Translate English to German: I watched this movie to see ...|Ich habe diesen Film gesehen, um zu sehen, in welche Rich...|
|Translate English to German: This movie makes you wish im...|Dieser Film macht Sie sich wünschen, dass imdb Sie es Ihn...|
|Translate English to German: I never want to see this mov...|             Ich möchte diesen Film nie wieder sehen!br />br|
|Transla

In [50]:
# only use first 100 rows, since generation takes a while
df2 = df.withColumn("input", preprocess(col("lines"), "Translate English to French: ")).select("input").limit(100).cache()

23/05/19 19:06:28 WARN CacheManager: Asked to cache already cached data.


In [51]:
df2.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|                                                                                           Translate English to French: |
|Translate English to French: Hard up, No proper jobs going down at the pit, why not rent your kids! DIY pimp story wi...|
|Translate English to French: I watched this movie to see the direction one of the most promising young talents in mov...|
|                                   Translate English to French: This movie makes you wish imdb would let you vote a zero|
|Translate English to French: I never want to see this movie again!<br /><br />Not only is it dreadfully bad, but I ca...|
|Translate Engli

In [52]:
%%time
preds = df2.withColumn("preds", generate(struct("input")))
results = preds.collect()

CPU times: user 8.14 ms, sys: 12.6 ms, total: 20.8 ms
Wall time: 4.75 s


                                                                                

In [53]:
%%time
preds = df2.withColumn("preds", generate("input"))
results = preds.collect()

CPU times: user 11.6 ms, sys: 3 ms, total: 14.6 ms
Wall time: 3.87 s


                                                                                

In [54]:
%%time
preds = df2.withColumn("preds", generate(col("input")))
results = preds.collect()

CPU times: user 13.1 ms, sys: 4.3 ms, total: 17.4 ms
Wall time: 3.9 s


                                                                                

In [55]:
preds.show(truncate=60)

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|                               Translate English to French: |                                                           :|
|Translate English to French: Hard up, No proper jobs goin...|                       Vous ne pouvez pas louer vos enfants!|
|Translate English to French: I watched this movie to see ...|J’ai regardé ce film pour voir la direction d’un des jeun...|
|Translate English to French: This movie makes you wish im...|Ce film vous fait envie de voir imdb vous laisser voter zéro|
|Translate English to French: I never want to see this mov...|               Je ne veux jamais voir ce film à nouveau!br /|
|Transla

                                                                                

#### Stop Triton Server on each executor

In [56]:
def stop_triton(it):
    import docker
    import time
    
    client=docker.from_env()
    containers=client.containers.list(filters={"name": "spark-triton"})
    print(">>>> stopping containers: {}".format([c.short_id for c in containers]))
    if containers:
        container=containers[0]
        container.stop(timeout=120)

    return [True]

nodeRDD.barrier().mapPartitions(stop_triton).collect()

                                                                                

[True]

In [57]:
spark.stop()