# PySpark Hugging Face Inferencing
## Conditional generation with TensorFlow

From: https://huggingface.co/docs/transformers/model_doc/t5

### Using TensorFlow
Note that cuFFT/cuDNN/cuBLAS registration errors are expected with `tf=2.17.0` and do not affect behavior, as noted in [this issue](https://github.com/tensorflow/tensorflow/issues/62075).  
This notebook does not demonstrate inference with TensorRT, as [TF-TRT](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#tensorrt-10) does not yet support `tf=2.17.0`. See the `pytorch` notebooks for TensorRT demos.

In [1]:
from transformers import AutoTokenizer, TFT5ForConditionalGeneration

2024-10-11 00:16:59.451769: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-11 00:16:59.459246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-11 00:16:59.467162: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-11 00:16:59.469569: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-11 00:16:59.475888: I tensorflow/core/platform/cpu_feature_guar

Enable Hugging Face tokenizer parallelism explicitly so that it is not automatically disabled when Python parallelism is used. See [this thread](https://github.com/huggingface/transformers/issues/5486) for more info.

In [2]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"
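A slightly more defensive variant of the cell above is sketched here. It assumes you want to keep any value that was already set outside the notebook; the helper name `configure_tokenizer_parallelism` is hypothetical and not part of any library.

```python
import os
from typing import MutableMapping


def configure_tokenizer_parallelism(env: MutableMapping[str, str], enabled: bool = True) -> str:
    """Set TOKENIZERS_PARALLELISM only if no value has been chosen yet.

    Returns the value that is in effect afterwards.
    """
    return env.setdefault("TOKENIZERS_PARALLELISM", "true" if enabled else "false")


# Apply to the real process environment before any tokenizer is created.
configure_tokenizer_parallelism(os.environ)
```

Using `setdefault` rather than plain assignment means a value exported in the shell takes precedence over the notebook default.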

In [3]:
import tensorflow as tf

# Enable GPU memory growth
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
        
print(tf.__version__)

2.17.0


In [4]:
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")

task_prefix = "translate English to German: "

lines = [
    "The house is wonderful",
    "Welcome to NYC",
    "HuggingFace is a company"
]

input_sequences = [task_prefix + l for l in lines]

2024-10-11 00:17:00.886565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 46024 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:01:00.0, compute capability: 8.6
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [5]:
input_ids = tokenizer(input_sequences, 
                      padding="longest", 
                      max_length=512,
                      truncation=True,
                      return_tensors="tf").input_ids
outputs = model.generate(input_ids, max_length=20)

I0000 00:00:1728605822.106234  276792 service.cc:146] XLA service 0x7f53a8003630 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1728605822.106259  276792 service.cc:154]   StreamExecutor device (0): NVIDIA RTX A6000, Compute Capability 8.6
2024-10-11 00:17:02.108842: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-10-11 00:17:02.117215: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8907
I0000 00:00:1728605822.137920  276792 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
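To make `padding="longest"` concrete, here is a minimal pure-Python sketch of what padding a batch to its longest sequence amounts to. The token ids are made up for illustration, the helper `pad_to_longest` is hypothetical, and the pad id of 0 matches T5 but should be treated as an assumption for other models.

```python
from typing import List, Tuple

PAD_ID = 0  # T5's pad token id; an assumption for any other model


def pad_to_longest(
    batch: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one and build the matching mask."""
    longest = max(len(ids) for ids in batch)
    padded = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return padded, mask


# Made-up token ids, not real tokenizer output.
padded, mask = pad_to_longest([[11, 12, 13, 1], [21, 1]])
print(padded)  # [[11, 12, 13, 1], [21, 1, 0, 0]]
print(mask)    # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

The tokenizer returns the equivalent mask as `attention_mask` next to `input_ids`, and `generate` accepts it as a keyword argument if you want padded positions to be ignored explicitly.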


In [6]:
[tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

['Das Haus ist wunderbar',
 'Willkommen in NYC',
 'HuggingFace ist ein Unternehmen']

In [7]:
model.framework

'tf'

## PySpark

In [8]:
import os
from pathlib import Path
from datasets import load_dataset

In [9]:
from pyspark.sql.types import *
from pyspark.sql import SparkSession
from pyspark import SparkConf
import socket

In [10]:
conda_env = os.environ.get("CONDA_PREFIX")
hostname = socket.gethostname()

conf = SparkConf()
if 'spark' not in globals():
    # If Spark is not already started with Jupyter, attach to Spark Standalone
    conf.setMaster(f"spark://{hostname}:7077") # assuming Master is on default port 7077
conf.set("spark.task.maxFailures", "1")
conf.set("spark.driver.memory", "8g")
conf.set("spark.executor.memory", "8g")
conf.set("spark.pyspark.python", f"{conda_env}/bin/python")
conf.set("spark.pyspark.driver.python", f"{conda_env}/bin/python")
conf.set("spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled", "false")
conf.set("spark.sql.pyspark.jvmStacktrace.enabled", "true")
conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "512")
conf.set("spark.python.worker.reuse", "true")
# Create Spark Session
spark = SparkSession.builder.appName("spark-dl-examples").config(conf=conf).getOrCreate()
sc = spark.sparkContext

24/10/11 00:17:03 WARN Utils: Your hostname, cb4ae00-lcedt resolves to a loopback address: 127.0.1.1; using 10.110.47.100 instead (on interface eno1)
24/10/11 00:17:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/10/11 00:17:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
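The configuration cell builds the interpreter path from `CONDA_PREFIX`, which is unset outside a conda environment and would then yield the string `None/bin/python`. The sketch below shows one possible fallback; `resolve_python` is a hypothetical helper, and falling back to `sys.executable` is an assumption about what you would want in that case.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_python(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the conda environment's Python if available, else the current interpreter."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        return f"{prefix}/bin/python"
    # No conda environment active: reuse the interpreter running this code.
    return sys.executable


python_path = resolve_python()
```

The result could then be passed to `spark.pyspark.python` and `spark.pyspark.driver.python` in place of the f-strings used above.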


In [11]:
# load IMDB reviews (test) dataset
data = load_dataset("imdb", split="test")

In [12]:
lines = []
for example in data:
    lines.append([example["text"].split(".")[0]])

len(lines)

25000

### Create PySpark DataFrame

In [13]:
df = spark.createDataFrame(lines, ['lines']).repartition(8)
df.schema

StructType([StructField('lines', StringType(), True)])

In [14]:
df.take(1)

                                                                                

[Row(lines='(Some Spoilers) Dull as dishwater slasher flick that has this deranged homeless man Harry, Darwyn Swalve, out murdering real-estate agent all over the city of L')]
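The row above ends abruptly in "city of L", presumably because `split(".")[0]` cuts at the first period even when that period belongs to an abbreviation. The sketch below reproduces the effect on a made-up sentence and compares it with a simple heuristic that only splits at a period followed by whitespace. Both function names are hypothetical, and the heuristic is still imperfect.

```python
import re


def first_sentence_naive(text: str) -> str:
    """Same logic as the notebook: everything before the first period."""
    return text.split(".")[0]


def first_sentence_heuristic(text: str) -> str:
    """Split only at a period that is followed by whitespace."""
    return re.split(r"\.\s", text, maxsplit=1)[0]


sample = "Set in L.A. in 1990. Great cast."  # made-up example text
print(first_sentence_naive(sample))      # Set in L
print(first_sentence_heuristic(sample))  # Set in L.A
```

For this demo the naive split is good enough, since the goal is only to produce short inputs for generation.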

### Save the test dataset as parquet files

In [15]:
df.write.mode("overwrite").parquet("imdb_test")

### Check arrow memory configuration

In [16]:
if int(spark.conf.get("spark.sql.execution.arrow.maxRecordsPerBatch")) > 512:
    print("Decreasing `spark.sql.execution.arrow.maxRecordsPerBatch` to ensure the vectorized reader won't run out of memory")
    spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "512")
assert len(df.head()) > 0, "`df` should not be empty"
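To see what the 512-record cap means in practice, the sketch below splits a partition's rows into batches of at most that size. It is plain arithmetic with a hypothetical helper, `chunk_sizes`, and it assumes each batch is filled up to the cap before the next one starts.

```python
from typing import List


def chunk_sizes(n_rows: int, cap: int) -> List[int]:
    """Sizes of consecutive batches when n_rows are split into chunks of at most `cap`."""
    full, rest = divmod(n_rows, cap)
    return [cap] * full + ([rest] if rest else [])


# 25000 rows over 8 partitions is roughly 3125 rows per partition.
print(chunk_sizes(3125, 512))  # [512, 512, 512, 512, 512, 512, 53]
# A 100-row sample fits into a single batch.
print(chunk_sizes(100, 512))   # [100]
```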

## Inference using Spark DL API
Note: you can restart the kernel and run from this point to simulate running on a different node or in a different environment.

In [17]:
import pandas as pd
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import col, pandas_udf, struct
from pyspark.sql.types import StringType

In [18]:
# only use first sentence and add prefix for conditional generation
# `text` is a Spark Column; the inner pandas UDF receives its values as a pd.Series
def preprocess(text, prefix: str = ""):
    @pandas_udf("string")
    def _preprocess(text: pd.Series) -> pd.Series:
        return pd.Series([prefix + s.split(".")[0] for s in text])
    return _preprocess(text)

In [19]:
# only use first N examples, since this is slow
df = spark.read.parquet("imdb_test").limit(100)
df.show(truncate=120)
df.count()

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   lines|
+------------------------------------------------------------------------------------------------------------------------+
|                                       This is so overly clichéd you'll want to switch it off after the first 45 minutes|
|                                                                      I am a big fan of The ABC Movies of the Week genre|
|In the early 1990's "Step-by-Step" came as a tedious combination of the ultra-cheesy "Full House" and the long-defunc...|
|When The Spirits Within was released, all you heard from Final Fantasy fans was how awful the movie was because it di...|
|                                                                    I like to think of myself as a bad movie connoisseur|
|This film did w

100

In [20]:
# only use first 100 rows, since generation takes a while
df1 = df.withColumn("input", preprocess(col("lines"), "Translate English to German: ")).select("input").limit(100).cache()

In [21]:
df1.count()

100

In [22]:
df1.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|          Translate English to German: This is so overly clichéd you'll want to switch it off after the first 45 minutes|
|                                         Translate English to German: I am a big fan of The ABC Movies of the Week genre|
|Translate English to German: In the early 1990's "Step-by-Step" came as a tedious combination of the ultra-cheesy "Fu...|
|Translate English to German: When The Spirits Within was released, all you heard from Final Fantasy fans was how awfu...|
|                                       Translate English to German: I like to think of myself as a bad movie connoisseur|
|Translate Engli

In [23]:
def predict_batch_fn():
    import tensorflow as tf
    import numpy as np
    from transformers import TFT5ForConditionalGeneration, AutoTokenizer

    # Enable GPU memory growth
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as e:
            print(e)

    model = TFT5ForConditionalGeneration.from_pretrained("google-t5/t5-small")
    tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")

    def predict(inputs):
        flattened = np.reshape(inputs, -1).tolist()   # convert 2d numpy array of strings into a flat python list (also safe for a batch of one)
        input_ids = tokenizer(flattened, 
                              padding="longest", 
                              max_length=512,
                              truncation=True,
                              return_tensors="tf").input_ids
        output_ids = model.generate(input_ids, max_length=20)
        string_outputs = np.array([tokenizer.decode(o, skip_special_tokens=True) for o in output_ids])
        print("predict: {}".format(len(flattened)))

        return string_outputs
    
    return predict

In [24]:
generate = predict_batch_udf(predict_batch_fn,
                             return_type=StringType(),
                             batch_size=10)
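The shape of `predict_batch_fn` matters: the outer function performs the expensive setup once, and the inner `predict` closure is then called once per batch. The sketch below imitates that pattern in plain Python with a placeholder "model" (`str.upper`) and a hypothetical driver loop, `run_in_batches`. It illustrates the idea only and is not how `predict_batch_udf` is implemented.

```python
from typing import Callable, List

load_calls = 0  # counts how often the expensive setup runs


def make_predict_fn() -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in for predict_batch_fn: set up once, return a closure."""
    global load_calls
    load_calls += 1
    model = str.upper  # placeholder for an expensive model load

    def predict(batch: List[str]) -> List[str]:
        return [model(s) for s in batch]

    return predict


def run_in_batches(rows: List[str], batch_size: int, predict) -> List[str]:
    """Feed rows to predict in fixed-size chunks, like a batched UDF would."""
    out: List[str] = []
    for start in range(0, len(rows), batch_size):
        out.extend(predict(rows[start:start + batch_size]))
    return out


predict = make_predict_fn()
rows = [f"row {i}" for i in range(25)]
results = run_in_batches(rows, 10, predict)
print(load_calls, len(results))  # 1 25
```

Three batches are processed here, but the setup runs only once, which is why later runs of the same UDF in this notebook tend to be faster than the first.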

In [25]:
%%time
# first pass caches model/fn
preds = df1.withColumn("preds", generate(struct("input")))
results = preds.collect()

[Stage 21:>                                                         (0 + 1) / 1]

CPU times: user 9.39 ms, sys: 2.14 ms, total: 11.5 ms
Wall time: 11.4 s


                                                                                

In [26]:
%%time
preds = df1.withColumn("preds", generate("input"))
results = preds.collect()

[Stage 23:>                                                         (0 + 1) / 1]

CPU times: user 3.62 ms, sys: 4.01 ms, total: 7.64 ms
Wall time: 8.53 s


                                                                                

In [27]:
%%time
preds = df1.withColumn("preds", generate(col("input")))
results = preds.collect()

[Stage 25:>                                                         (0 + 1) / 1]

CPU times: user 5.37 ms, sys: 2.51 ms, total: 7.88 ms
Wall time: 8.52 s


                                                                                

In [28]:
preds.show(truncate=60)

[Stage 27:>                                                         (0 + 1) / 1]

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|Translate English to German: This is so overly clichéd yo...|   Das ist so übertrieben klischeehaft, dass Sie es nach den|
|Translate English to German: I am a big fan of The ABC Mo...|       Ich bin ein großer Fan von The ABC Movies of the Week|
|Translate English to German: In the early 1990's "Step-by...|          Anfang der 1990er Jahre kam "Step-by-Step" als müh|
|Translate English to German: When The Spirits Within was ...|Als The Spirits Within veröffentlicht wurde, hörten Sie v...|
|Translate English to German: I like to think of myself as...|           Ich halte mich gerne als schlechter Filmliebhaber|
|Transla

                                                                                

In [29]:
# only use first 100 rows, since generation takes a while
df2 = df.withColumn("input", preprocess(col("lines"), "Translate English to French: ")).select("input").limit(100).cache()

In [30]:
df2.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|          Translate English to French: This is so overly clichéd you'll want to switch it off after the first 45 minutes|
|                                         Translate English to French: I am a big fan of The ABC Movies of the Week genre|
|Translate English to French: In the early 1990's "Step-by-Step" came as a tedious combination of the ultra-cheesy "Fu...|
|Translate English to French: When The Spirits Within was released, all you heard from Final Fantasy fans was how awfu...|
|                                       Translate English to French: I like to think of myself as a bad movie connoisseur|
|Translate Engli

In [31]:
%%time
# first pass caches model/fn
preds = df2.withColumn("preds", generate(struct("input")))
result = preds.collect()

[Stage 33:>                                                         (0 + 1) / 1]

CPU times: user 2.9 ms, sys: 5.97 ms, total: 8.87 ms
Wall time: 11.7 s


                                                                                

In [32]:
%%time
preds = df2.withColumn("preds", generate("input"))
result = preds.collect()

[Stage 35:>                                                         (0 + 1) / 1]

CPU times: user 4.41 ms, sys: 1.59 ms, total: 5.99 ms
Wall time: 8.23 s


                                                                                

In [33]:
%%time
preds = df2.withColumn("preds", generate(col("input")))
result = preds.collect()

[Stage 37:>                                                         (0 + 1) / 1]

CPU times: user 5.46 ms, sys: 1.17 ms, total: 6.63 ms
Wall time: 8.08 s


                                                                                

In [34]:
preds.show(truncate=60)

[Stage 39:>                                                         (0 + 1) / 1]

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|Translate English to French: This is so overly clichéd yo...|                 Vous ne pouvez pas en tirer d'un tel cliché|
|Translate English to French: I am a big fan of The ABC Mo...|    Je suis un grand fan du genre The ABC Movies of the Week|
|Translate English to French: In the early 1990's "Step-by...|          Au début des années 1990, «Step-by-Step» a été une|
|Translate English to French: When The Spirits Within was ...|Lorsque The Spirits Within a été publié, tout ce que vous...|
|Translate English to French: I like to think of myself as...|       Je me considère comme un mauvais réalisateur de films|
|Transla

                                                                                

### Using Triton Inference Server

Note: you can restart the kernel and run from this point to simulate running on a different node or in a different environment.  

This notebook uses the [Python backend with a custom execution environment](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments) with versions of Python and NumPy that are compatible with Triton 24.08, using a conda-pack environment created as follows:
```
conda create -n huggingface-tf -c conda-forge python=3.10.0
conda activate huggingface-tf

export PYTHONNOUSERSITE=True
pip install numpy==1.26.4 tensorflow[and-cuda] tf-keras transformers conda-pack 

conda-pack  # huggingface-tf.tar.gz
```

In [35]:
import os

In [36]:
%%bash
# copy custom model to expected layout for Triton
rm -rf models
mkdir -p models
cp -r models_config/hf_generation_tf models

# add custom execution environment
cp huggingface-tf.tar.gz models

#### Start Triton Server on each executor

In [37]:
num_executors = 1
triton_models_dir = "{}/models".format(os.getcwd())
huggingface_cache_dir = "{}/.cache/huggingface".format(os.path.expanduser('~'))
nodeRDD = sc.parallelize(list(range(num_executors)), num_executors)

def start_triton(it):
    import docker
    import time
    import tritonclient.grpc as grpcclient
    
    client=docker.from_env()
    containers=client.containers.list(filters={"name": "spark-triton"})
    if containers:
        print(">>>> containers: {}".format([c.short_id for c in containers]))
    else:
        try:
            container=client.containers.run(
                "nvcr.io/nvidia/tritonserver:24.08-py3", "tritonserver --model-repository=/models",
                detach=True,
                device_requests=[docker.types.DeviceRequest(device_ids=["0"], capabilities=[['gpu']])],
                environment=[
                    "TRANSFORMERS_CACHE=/cache"
                ],
                name="spark-triton",
                network_mode="host",
                remove=True,
                shm_size="1G",
                volumes={
                    triton_models_dir: {"bind": "/models", "mode": "ro"},
                    huggingface_cache_dir: {"bind": "/cache", "mode": "rw"}
                }
            )
            print(">>>> starting triton: {}".format(container.short_id))
        except Exception as e:
            print(">>>> failed to start triton: {}".format(e))
        # wait for triton to be running
        time.sleep(15)
        client = grpcclient.InferenceServerClient("localhost:8001")
        ready = False
        while not ready:
            try:
                ready = client.is_server_ready()
            except Exception as e:
                time.sleep(5)

    return [True]

nodeRDD.barrier().mapPartitions(start_triton).collect()

                                                                                

[True]
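The readiness loop in `start_triton` retries forever, so a container that fails to start would leave the Spark task hanging. A bounded variant is sketched below. `wait_until_ready` is a hypothetical helper that takes any zero-argument callable, for example the client's `is_server_ready` method, and the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll is_ready until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if is_ready():
                return True
        except Exception:
            # Server not reachable yet; keep polling until the deadline.
            pass
        time.sleep(interval_s)
    return False
```

Inside `start_triton`, the call could look like `wait_until_ready(client.is_server_ready)`, with the task raising an error when the result is `False`.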

#### Run inference

In [38]:
import pandas as pd
from functools import partial
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.functions import col, pandas_udf, struct
from pyspark.sql.types import StringType

In [39]:
# only use first N examples, since this is slow
df = spark.read.parquet("imdb_test").limit(100).cache()

In [40]:
# only use first sentence and add prefix for conditional generation
# `text` is a Spark Column; the inner pandas UDF receives its values as a pd.Series
def preprocess(text, prefix: str = ""):
    @pandas_udf("string")
    def _preprocess(text: pd.Series) -> pd.Series:
        return pd.Series([prefix + s.split(".")[0] for s in text])
    return _preprocess(text)

In [41]:
# only use first 100 rows, since generation takes a while
df1 = df.withColumn("input", preprocess(col("lines"), "Translate English to German: ")).select("input").limit(100)

In [42]:
df1.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|          Translate English to German: This is so overly clichéd you'll want to switch it off after the first 45 minutes|
|                                         Translate English to German: I am a big fan of The ABC Movies of the Week genre|
|Translate English to German: In the early 1990's "Step-by-Step" came as a tedious combination of the ultra-cheesy "Fu...|
|Translate English to German: When The Spirits Within was released, all you heard from Final Fantasy fans was how awfu...|
|                                       Translate English to German: I like to think of myself as a bad movie connoisseur|
|Translate Engli

In [43]:
def triton_fn(triton_uri, model_name):
    import numpy as np
    import tritonclient.grpc as grpcclient
    
    # map Triton datatype names to numpy dtypes
    np_types = {
      "BOOL": np.dtype(np.bool_),
      "INT8": np.dtype(np.int8),
      "INT16": np.dtype(np.int16),
      "INT32": np.dtype(np.int32),
      "INT64": np.dtype(np.int64),
      "FP16": np.dtype(np.float16),
      "FP32": np.dtype(np.float32),
      "FP64": np.dtype(np.float64),
      "BYTES": np.dtype(object)
    }

    client = grpcclient.InferenceServerClient(triton_uri)
    model_meta = client.get_model_metadata(model_name)
    
    def predict(inputs):
        if isinstance(inputs, np.ndarray):
            # single ndarray input
            request = [grpcclient.InferInput(model_meta.inputs[0].name, inputs.shape, model_meta.inputs[0].datatype)]
            request[0].set_data_from_numpy(inputs.astype(np_types[model_meta.inputs[0].datatype]))
        else:
            # dict of multiple ndarray inputs
            request = [grpcclient.InferInput(i.name, inputs[i.name].shape, i.datatype) for i in model_meta.inputs]
            for i in request:
                i.set_data_from_numpy(inputs[i.name()].astype(np_types[i.datatype()]))
        
        response = client.infer(model_name, inputs=request)
        
        if len(model_meta.outputs) > 1:
            # return dictionary of numpy arrays
            return {o.name: response.as_numpy(o.name) for o in model_meta.outputs}
        else:
            # return single numpy array
            return response.as_numpy(model_meta.outputs[0].name)
        
    return predict

In [44]:
generate = predict_batch_udf(partial(triton_fn, triton_uri="localhost:8001", model_name="hf_generation_tf"),
                             return_type=StringType(),
                             input_tensor_shapes=[[1]],
                             batch_size=100)

In [45]:
%%time
# first pass caches model/fn
preds = df1.withColumn("preds", generate(struct("input")))
results = preds.collect()

[Stage 45:>                                                         (0 + 1) / 1]

CPU times: user 5.88 ms, sys: 3.96 ms, total: 9.84 ms
Wall time: 2.66 s


                                                                                

In [46]:
%%time
preds = df1.withColumn("preds", generate("input"))
results = preds.collect()

[Stage 47:>                                                         (0 + 1) / 1]

CPU times: user 2.82 ms, sys: 1.05 ms, total: 3.87 ms
Wall time: 1.03 s


                                                                                

In [47]:
%%time
preds = df1.withColumn("preds", generate(col("input")))
results = preds.collect()

[Stage 49:>                                                         (0 + 1) / 1]

CPU times: user 1.55 ms, sys: 2.49 ms, total: 4.03 ms
Wall time: 967 ms


                                                                                

In [48]:
preds.show(truncate=60)

[Stage 51:>                                                         (0 + 1) / 1]

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|Translate English to German: This is so overly clichéd yo...|   Das ist so übertrieben klischeehaft, dass Sie es nach den|
|Translate English to German: I am a big fan of The ABC Mo...|       Ich bin ein großer Fan von The ABC Movies of the Week|
|Translate English to German: In the early 1990's "Step-by...|          Anfang der 1990er Jahre kam "Step-by-Step" als müh|
|Translate English to German: When The Spirits Within was ...|Als The Spirits Within veröffentlicht wurde, hörten Sie v...|
|Translate English to German: I like to think of myself as...|           Ich halte mich gerne als schlechter Filmliebhaber|
|Transla

                                                                                

In [49]:
# only use first 100 rows, since generation takes a while
df2 = df.withColumn("input", preprocess(col("lines"), "Translate English to French: ")).select("input").limit(100).cache()

24/10/11 00:18:52 WARN CacheManager: Asked to cache already cached data.


In [50]:
df2.show(truncate=120)

+------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                   input|
+------------------------------------------------------------------------------------------------------------------------+
|          Translate English to French: This is so overly clichéd you'll want to switch it off after the first 45 minutes|
|                                         Translate English to French: I am a big fan of The ABC Movies of the Week genre|
|Translate English to French: In the early 1990's "Step-by-Step" came as a tedious combination of the ultra-cheesy "Fu...|
|Translate English to French: When The Spirits Within was released, all you heard from Final Fantasy fans was how awfu...|
|                                       Translate English to French: I like to think of myself as a bad movie connoisseur|
|Translate Engli

In [51]:
%%time
preds = df2.withColumn("preds", generate(struct("input")))
results = preds.collect()

[Stage 55:>                                                         (0 + 1) / 1]

CPU times: user 3.91 ms, sys: 1.34 ms, total: 5.25 ms
Wall time: 1.27 s


                                                                                

In [52]:
%%time
preds = df2.withColumn("preds", generate("input"))
results = preds.collect()

[Stage 57:>                                                         (0 + 1) / 1]

CPU times: user 4.31 ms, sys: 0 ns, total: 4.31 ms
Wall time: 1 s


                                                                                

In [53]:
%%time
preds = df2.withColumn("preds", generate(col("input")))
results = preds.collect()

[Stage 59:>                                                         (0 + 1) / 1]

CPU times: user 2.84 ms, sys: 1.31 ms, total: 4.15 ms
Wall time: 990 ms


                                                                                

In [54]:
preds.show(truncate=60)

[Stage 61:>                                                         (0 + 1) / 1]

+------------------------------------------------------------+------------------------------------------------------------+
|                                                       input|                                                       preds|
+------------------------------------------------------------+------------------------------------------------------------+
|Translate English to French: This is so overly clichéd yo...|                 Vous ne pouvez pas en tirer d'un tel cliché|
|Translate English to French: I am a big fan of The ABC Mo...|    Je suis un grand fan du genre The ABC Movies of the Week|
|Translate English to French: In the early 1990's "Step-by...|          Au début des années 1990, «Step-by-Step» a été une|
|Translate English to French: When The Spirits Within was ...|Lorsque The Spirits Within a été publié, tout ce que vous...|
|Translate English to French: I like to think of myself as...|       Je me considère comme un mauvais réalisateur de films|
|Transla

                                                                                

#### Stop Triton Server on each executor

In [55]:
def stop_triton(it):
    import docker
    
    client=docker.from_env()
    containers=client.containers.list(filters={"name": "spark-triton"})
    print(">>>> stopping containers: {}".format([c.short_id for c in containers]))
    if containers:
        container=containers[0]
        container.stop(timeout=120)

    return [True]

nodeRDD.barrier().mapPartitions(stop_triton).collect()

                                                                                

[True]

In [56]:
spark.stop()