This notebook adapts the idea from [this notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/pytorch_text_classification_using_vertex_sdk_and_gcloud/pytorch-text-classification-vertex-ai-train-tune-deploy.ipynb): we create a custom container for the prediction step.

- The original notebook uses a different model, which does not need `pipeline()` to make predictions.
- In our customization, we call `pipeline()` in the prediction step of the container.

__Warning__: In this notebook, the same variable names are reused in different sections, sometimes with different values. Make sure each variable holds the value you expect before running a cell.

__TL;DR__:

The final model is the one created at step "Option 2: Using `tokenizer` to encode and then decode to text to be used in `inference` with `pipeline()`". It uses:

- `predictor/custom_handler_3.py` 
- `Dockerfile.3`

# Setting up

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os

In [3]:
# # The Google Cloud Notebook product has specific requirements
# IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# # Google Cloud Notebook requires dependencies to be installed with '--user'
# USER_FLAG = ""
# if IS_GOOGLE_CLOUD_NOTEBOOK:
#     USER_FLAG = "--user"

In [4]:
# # Install Vertex AI SDK
# !pip -q install {USER_FLAG} --upgrade google-cloud-aiplatform

In [5]:
# To disable the warning:
# huggingface/tokenizers: The current process just got forked, after parallelism has already been used.
# Disabling parallelism to avoid deadlocks...
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [6]:
import torch
import transformers
from google.cloud import aiplatform
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

In [7]:
import base64
import re
import json
import requests

In [8]:
print(f"Notebook runtime: {'GPU' if torch.cuda.is_available() else 'CPU'}")
print(f"PyTorch version : {torch.__version__}")
print(f"Transformers version : {transformers.__version__}")

Notebook runtime: CPU
PyTorch version : 1.11.0+cu102
Transformers version : 4.18.0


In [10]:
# Link with google storage
! gsutil ls -al

gs://artifacts.ideta-ml-thi.appspot.com/
gs://ideta-sentiment-analysis/
gs://torch-text-class-testing/


## Variables

In [9]:
BUCKET_NAME = 'gs://ideta-sentiment-analysis/'
PROJECT_ID = 'ideta-ml-thi'
REGION = 'europe-west1'
src_dir = "../../.." # change this if your notebook is in another place
data_loc = src_dir + "/model"
APP_NAME = 'pt-xlm-roberta-large-xnli'
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/pytorch_predict_{APP_NAME}"
data_loc

'../../../model'

In [10]:
pt_model_dir = data_loc + '/' + APP_NAME
pt_model_dir

'../../../model/pt-xlm-roberta-large-xnli'

# From and to Torch format

Run this once. If you have already saved the model, skip to the next section to load it.

In [11]:
pt_model_dir

'../../../model/pt-xlm-roberta-large-xnli'

In [None]:
tokenizer = AutoTokenizer.from_pretrained("joeddav/xlm-roberta-large-xnli")

In [11]:
pt_model = AutoModelForSequenceClassification.from_pretrained("joeddav/xlm-roberta-large-xnli")

Downloading:   0%|          | 0.00/2.09G [00:00<?, ?B/s]

Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Save to disk

In [13]:
tokenizer.save_pretrained(pt_model_dir)

('../../../model/pt-xlm-roberta-large-xnli/tokenizer_config.json',
 '../../../model/pt-xlm-roberta-large-xnli/special_tokens_map.json',
 '../../../model/pt-xlm-roberta-large-xnli/sentencepiece.bpe.model',
 '../../../model/pt-xlm-roberta-large-xnli/added_tokens.json',
 '../../../model/pt-xlm-roberta-large-xnli/tokenizer.json')

In [14]:
pt_model.save_pretrained(pt_model_dir)

In [8]:
! ls {pt_model_dir}

config.json	   sentencepiece.bpe.model  tokenizer.json
pytorch_model.bin  special_tokens_map.json  tokenizer_config.json


# Load from saved model

In [10]:
pt_model_dir

'../../../model/pt-xlm-roberta-large-xnli'

In [11]:
saved_tokenizer = AutoTokenizer.from_pretrained(pt_model_dir)
saved_model = AutoModelForSequenceClassification.from_pretrained(pt_model_dir)

## Use `pipeline` to test

In [15]:
pipe = pipeline(task='zero-shot-classification', model=saved_model, tokenizer=saved_tokenizer)

In [16]:
pipe(
    'I\'m really disapointed with what are you saying and your service, give my money back!!!!',
    candidate_labels=["positive", "negative", "neutral"],
)

{'sequence': "I'm really disapointed with what are you saying and your service, give my money back!!!!",
 'labels': ['negative', 'neutral', 'positive'],
 'scores': [0.9862303137779236, 0.00949057936668396, 0.004279125016182661]}

# Create a custom image for prediction

https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements

We use `TorchServe`! => https://pytorch.org/serve/

## Play with functions in TorchServe before making the `custom_handler.py` file

Source code without using `pipeline()`: https://github.com/dinhanhthi/google-vertex-ai/tree/main/VIEW_ONLY_pytorch_text_classification_using_vertex_sdk_and_gcloud/predictor

Reference: https://pytorch.org/serve/api/ts.torch_handler.html#ts.torch_handler.base_handler.BaseHandler.preprocess

In [59]:
def preprocess(tokenizer, data):
    """ Preprocessing input request by tokenizing
        Extend with your own preprocessing steps as needed
    """
    text = data[0].get("data")
    if text is None:
        text = data[0].get("body")
    sentences = text.decode('utf-8')
    print("Received text: '%s'", sentences)

    # Tokenize the texts
    tokenizer_args = ((sentences,))
    inputs = tokenizer(*tokenizer_args,
                        padding='max_length',
                        max_length=128,
                        truncation=True,
                        return_tensors = "pt")
    return inputs

In [60]:
import base64
import re

In [61]:
test_data = [
    {
        "data": b"Jaw dropping visual affects and action! One of the best I have seen to date."
    }
]

What is `b`? => https://stackoverflow.com/questions/6269765/what-does-the-b-character-do-in-front-of-a-string-literal
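As a quick illustration (a minimal sketch, not part of the original notebook), the `b` prefix creates a `bytes` object rather than a `str`. This is why `preprocess()` calls `.decode('utf-8')` on the request text before using it:

```python
# A bytes literal, mimicking the "data" field of the test request above
raw = b"Jaw dropping visual affects and action!"

# bytes -> str, the same conversion preprocess() performs
text = raw.decode("utf-8")
print(type(raw).__name__, type(text).__name__)

# Encoding the str again gives back the original bytes
round_trip = text.encode("utf-8")
print(round_trip == raw)
```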

In [62]:
inputs = preprocess(saved_tokenizer, test_data)
inputs

Received text: '%s' Jaw dropping visual affects and action! One of the best I have seen to date.


{'input_ids': tensor([[    0,   823,   434, 36069, 10366, 21176, 52490,     7,   136, 22631,
            38,  6561,   111,    70,  2965,    87,   765, 51592,    47,  5622,
             5,     2,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,  

In [64]:
inputs["input_ids"][0]

tensor([    0,   823,   434, 36069, 10366, 21176, 52490,     7,   136, 22631,
           38,  6561,   111,    70,  2965,    87,   765, 51592,    47,  5622,
            5,     2,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1])

In [66]:
saved_tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)

'Jaw dropping visual affects and action! One of the best I have seen to date.'

Reference: https://pytorch.org/serve/api/ts.torch_handler.html#ts.torch_handler.base_handler.BaseHandler.inference

In [66]:
device = "cpu"
def inference(model, inputs):
    """ Predict the class of a text using a trained transformer model.
    """
    prediction = model(inputs['input_ids'].to(device))[0].argmax().item()
        
    print("Model predicted: '%s'", prediction)
    return [prediction]

In [67]:
inference(saved_model, inputs)

Model predicted: '%s' 1


[1]

In [68]:
saved_model(inputs['input_ids'].to(device))

SequenceClassifierOutput(loss=None, logits=tensor([[-2.0831,  2.5675, -0.9757]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

❗ The `inference()` function above takes its input as a Torch tensor, but `pipeline()` expects the sentence as a `str`. We therefore need a way to convert from a Torch tensor back to a `str`.

https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline

We use this pipeline: https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.ZeroShotClassificationPipeline

In [34]:
pipe = pipeline(task='zero-shot-classification', model=saved_model, tokenizer=saved_tokenizer)

In [47]:
inputs[0]

'Jaw dropping visual affects and action! One of the best I have seen to date.'

In [49]:
pipe(
    inputs[0],
    candidate_labels=["positive", "negative", "neutral"]
)

{'sequence': 'Jaw dropping visual affects and action! One of the best I have seen to date.',
 'labels': ['positive', 'negative', 'neutral'],
 'scores': [0.9401667714118958, 0.034984465688467026, 0.024848783388733864]}

👉 Try this idea: https://github.com/cceyda/lit-NER/blob/master/lit_ner/serve.py

In [17]:
device = "cpu"
saved_model = AutoModelForSequenceClassification.from_pretrained(pt_model_dir)
saved_model.to(device)
saved_model.eval();

In [70]:
def preprocess(tokenizer, data):
    """ Preprocessing input request by tokenizing
        Extend with your own preprocessing steps as needed
    """
    text = data[0].get("data")
    if text is None:
        text = data[0].get("body")
    sentences = text.decode('utf-8')
    print("Received text: '%s'", sentences)
    
    # Below: https://github.com/cceyda/lit-NER/blob/master/lit_ner/serve.py
    processed_sentences = []
    num_separated = [s.strip() for s in re.split(r"(\d+)", sentences)]  # raw string avoids the invalid "\d" escape warning
    digit_processed = " ".join(num_separated)
    processed_sentences.append(digit_processed)
    
    return processed_sentences

inputs = preprocess(saved_tokenizer, test_data)

device = "cpu"

def inference(model, tokenizer, inputs):
    """ Predict the class of a text using a trained transformer model.
    """
    pipe = pipeline(task='zero-shot-classification', model=model, tokenizer=tokenizer)
    prediction = pipe(inputs[0], candidate_labels=["positive", "negative", "neutral"])
        
    return [prediction]

Received text: '%s' Jaw dropping visual affects and action! One of the best I have seen to date.


In [74]:
inference(saved_model, saved_tokenizer, inputs)

[{'sequence': 'Jaw dropping visual affects and action! One of the best I have seen to date.',
  'labels': ['positive', 'negative', 'neutral'],
  'scores': [0.9401667714118958, 0.034984465688467026, 0.024848783388733864]}]

## Create `custom_handler.py`

In [None]:
! mkdir predictor

In [76]:
%%writefile predictor/custom_handler.py

import os
import logging
import re

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class TransformersClassifierHandler(BaseHandler):
    """
    The handler takes an input string and returns the classification text 
    based on the serialized transformers checkpoint.
    """
    def __init__(self):
        super(TransformersClassifierHandler, self).__init__()
        self.initialized = False

    def initialize(self, ctx):
        """ Loads the model.pt file and initialized the model object.
        Instantiates Tokenizer for preprocessor to use
        Loads labels to name mapping file for post-processing inference response
        """
        self.manifest = ctx.manifest

        properties = ctx.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read model serialize/pt file
        serialized_file = self.manifest["model"]["serializedFile"]
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt or pytorch_model.bin file")
        
        # Load model
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
        
        # Ensure to use the same tokenizer used during training
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)

        # pipeline()
        self.pipe = pipeline(task='zero-shot-classification', model=self.model, tokenizer=self.tokenizer)

        self.initialized = True

    def preprocess(self, data):
        """ Preprocessing input request by tokenizing
            Extend with your own preprocessing steps as needed
        """
        text = data[0].get("data")
        if text is None:
            text = data[0].get("body")
        sentences = text.decode('utf-8')
        logger.info("Received text: '%s'", sentences)

        # Below: https://github.com/cceyda/lit-NER/blob/master/lit_ner/serve.py
        processed_sentences = []
        num_separated = [s.strip() for s in re.split(r"(\d+)", sentences)]
        digit_processed = " ".join(num_separated)
        processed_sentences.append(digit_processed)

        return processed_sentences

    def inference(self, inputs):
        """ Predict the class of a text using a trained transformer model.
        """
        prediction = self.pipe(inputs[0], candidate_labels=["negative", "neutral", "positive"])
        if len(inputs) == 1:
            prediction = [prediction]
        return prediction

    def postprocess(self, inference_output):
        return inference_output

Writing predictor/custom_handler.py


In [None]:
# Copy model to /predictor/ (`-r` is needed because the source is a directory)
! cp -r {src_dir}/model/pt-xlm-roberta-large-xnli {src_dir}/ideta-logos/playground/vertex-ai/predictor/model

### Make some tests to understand the last steps of `preprocess()`

In [79]:
# instance = b"I am really disapointed with your service!"
# text = base64.b64encode(instance)
# sentences = text.decode('utf-8')

# print("Received text: '%s'", sentences)

sentences = "I am not 1 happy."
re.split(r"(\d+)", sentences)


# # Below: https://github.com/cceyda/lit-NER/blob/master/lit_ner/serve.py
# processed_sentences = []
# num_separated = [s.strip() for s in re.split(r"(\d+)", sentences)]
# digit_processed = " ".join(num_separated)
# processed_sentences.append(digit_processed)
# print(f"processed_sentences: {processed_sentences}")

['I am not ', '1', ' happy.']
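To make the effect of these last steps easier to see, here is a minimal standalone sketch of the digit-separation logic. The function name `separate_digits` is hypothetical (it does not exist in the handler); the body mirrors the split, strip, and join steps of `preprocess()`.

```python
import re


def separate_digits(sentence: str) -> str:
    """Put a single space around every run of digits, as preprocess() does."""
    # The capturing group keeps the digit runs in the split result
    parts = [s.strip() for s in re.split(r"(\d+)", sentence)]
    return " ".join(parts)


print(separate_digits("I am not 1 happy."))
print(separate_digits("room42b"))
```

Note that a sentence starting or ending with digits yields an empty string in the split result, so the joined text gets an extra leading or trailing space.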

## Create `Dockerfile` file

This was done manually, but you can use the shell cell below to generate the file instead.

In [None]:
%%bash -s $APP_NAME

APP_NAME=$1

cat << EOF > ./predictor/Dockerfile

FROM pytorch/torchserve:latest-cpu

# install dependencies
RUN python3 -m pip install --upgrade pip
RUN pip3 install transformers

USER model-server

# copy model artifacts, custom handler and other dependencies
COPY ./custom_handler.py /home/model-server/
COPY ./model/$APP_NAME/ /home/model-server/

# create torchserve configuration file
USER root
RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
RUN printf "\ninference_address=http://0.0.0.0:7080" >> /home/model-server/config.properties
RUN printf "\nmanagement_address=http://0.0.0.0:7081" >> /home/model-server/config.properties
USER model-server

# expose health and prediction listener ports from the image
EXPOSE 7080
EXPOSE 7081

# create model archive file packaging model artifacts and dependencies
RUN torch-model-archiver -f \
  --model-name=$APP_NAME \
  --version=1.0 \
  --serialized-file=/home/model-server/pytorch_model.bin \
  --handler=/home/model-server/custom_handler.py \
  --extra-files "/home/model-server/config.json,/home/model-server/tokenizer.json,/home/model-server/tokenizer_config.json,/home/model-server/special_tokens_map.json" \
  --export-path=/home/model-server/model-store

# run Torchserve HTTP serve to respond to prediction requests
CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "$APP_NAME=$APP_NAME.mar", \
     "--model-store", \
     "/home/model-server/model-store"]
EOF

echo "Writing ./predictor/Dockerfile"

## Create custom prediction image

In [103]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/pytorch_predict_{APP_NAME}"
CUSTOM_PREDICTOR_IMAGE_URI

'gcr.io/ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli'

In [104]:
# Build based on predictor/custom_handler.py and predictor/Dockerfile
!docker build \
  --tag=$CUSTOM_PREDICTOR_IMAGE_URI \
  ./predictor

Sending build context to Docker daemon  2.262GB
Step 1/15 : FROM pytorch/torchserve:latest-cpu
 ---> a1d88b873573
Step 2/15 : RUN python3 -m pip install --upgrade pip
 ---> Using cache
 ---> 9cd7983a39ee
Step 3/15 : RUN pip3 install transformers
 ---> Using cache
 ---> 05d7d7d7e863
Step 4/15 : USER model-server
 ---> Using cache
 ---> c989f33b4974
Step 5/15 : COPY ./custom_handler.py /home/model-server/
 ---> adcbbbd31662
Step 6/15 : COPY ./model/pt-xlm-roberta-large-xnli/ /home/model-server/
 ---> 193b21a0082a
Step 7/15 : USER root
 ---> Running in 6135f43a34d3
Removing intermediate container 6135f43a34d3
 ---> 00b417261432
Step 8/15 : RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
 ---> Running in ea3b76e25975
Removing intermediate container ea3b76e25975
 ---> f0323076c9ea
Step 9/15 : RUN printf "\ninference_address=http://0.0.0.0:7080" >> /home/model-server/config.properties
 ---> Running in df8de4368c21
Removing intermediate container df8de4368c21
 ---

# Run the predict container LOCALLY before pushing to vertex registry

To run the container image as a container locally, run the following command:

In [105]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/pytorch_predict_{APP_NAME}"
print("CUSTOM_PREDICTOR_IMAGE_URI: ", CUSTOM_PREDICTOR_IMAGE_URI)
PREDICT_CONTAINER_NAME = "local_xlm_roberta_large_xnli"

CUSTOM_PREDICTOR_IMAGE_URI:  gcr.io/ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli


In [51]:
!docker stop $PREDICT_CONTAINER_NAME
!docker run -t -d --rm -p 7080:7080 --name=$PREDICT_CONTAINER_NAME $CUSTOM_PREDICTOR_IMAGE_URI
!sleep 20

Error response from daemon: No such container: local_xlm_roberta_large_xnli
0711c1ad3b6c92ed42fadea3921e06ee7442cf6567c1ee7abe35b9b05bc3a677


In [91]:
# To send the container's server a health check, run the following command:
!curl http://localhost:7080/ping

{
  "status": "Healthy"
}
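The fixed `sleep 20` above is a guess at how long TorchServe needs to load the model. As an alternative, here is a hedged sketch that polls the health route until it answers. The helper names `wait_until` and `ping_ok` are hypothetical, and the URL assumes the local port mapping `7080:7080` used above.

```python
import time
import urllib.request
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused/reset: the server is not ready yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def ping_ok(url: str = "http://localhost:7080/ping") -> bool:
    """Return True if the health route answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200
```

With the container running, `wait_until(ping_ok)` returns `True` as soon as the server reports healthy, or `False` after the timeout.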


To send the container's server a prediction request, run the following commands:

In [91]:
instance = b"You aren't kind, i hate you."
b64_encoded = base64.b64encode(instance)
b64_encoded

b'WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=='

In [54]:
%%bash -s $APP_NAME

APP_NAME=$1

cat > ./predictor/instances.json <<END
{
   "instances": [
     {
       "data": {
         "b64": "WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=="
       }
     }
   ]
}
END

curl -s -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./predictor/instances.json \
  http://localhost:7080/predictions/$APP_NAME/

{"predictions": [{"sequence": "You aren't kind, i hate you.", "labels": ["negative", "neutral", "positive"], "scores": [0.9942014217376709, 0.0030435377266258, 0.0027550666127353907]}]}

In [99]:
# Other way
instance = b"You aren't kind, i hate you."
b64_encoded = base64.b64encode(instance)
print("b64_encoded: ", b64_encoded)
print("b64_encoded.decode: ", b64_encoded.decode('utf-8'))
test_instance = {
    "instances": [
        {
            "data": {
                "b64": b64_encoded.decode('utf-8')
            }
        }
    ]
}

print("test_instance: ", test_instance)
print("json.dumps(test_instance): ", json.dumps(test_instance))

b64_encoded:  b'WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=='
b64_encoded.decode:  WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg==
test_instance:  {'instances': [{'data': {'b64': 'WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=='}}]}
json.dumps(test_instance):  {"instances": [{"data": {"b64": "WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=="}}]}


In [90]:
import requests

In [100]:
# payload = '{"instances":[{"data": {"b64": "WW91IGFyZW4ndCBraW5kLCBpIGhhdGUgeW91Lg=="}}]}'
payload = json.dumps(test_instance)
r = requests.post(
    f"http://localhost:7080/predictions/{APP_NAME}/",
    headers={"Content-Type": "application/json; charset=utf-8"},  # charset is a parameter of Content-Type, not a separate header
    data=payload
)

r.json()

{'predictions': [{'sequence': "You aren't kind, i hate you.",
   'labels': ['negative', 'neutral', 'positive'],
   'scores': [0.9942014217376709, 0.0030435377266258, 0.0027550666127353907]}]}

In [102]:
# Stop the container
!docker stop $PREDICT_CONTAINER_NAME

local_xlm_roberta_large_xnli


In [34]:
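# Start the container (this fails here: the container was launched with `--rm`, so `docker stop` removed it; use `docker run` again)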
!docker start $PREDICT_CONTAINER_NAME

Error response from daemon: No such container: local_xlm_roberta_large_xnli
Error: failed to start containers: local_xlm_roberta_large_xnli


E0513 14:17:41.946436689   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


# Deploying the serving container to Vertex AI Predictions

In [14]:
CUSTOM_PREDICTOR_IMAGE_URI

'gcr.io/ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli'

In [18]:
!docker push $CUSTOM_PREDICTOR_IMAGE_URI

Using default tag: latest
The push refers to repository [gcr.io/ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli]

ee34fa11: Preparing
dd45180b: Preparing
22f0ef5f: Preparing
69d9d12d: Preparing
a78796d0: Preparing
7ae06b2c: Preparing
7f4f2867: Preparing
80252ad2: Preparing
bf18a086: Preparing
6e366763: Preparing
fbe2a476: Preparing
4d4eeb09: Preparing
3c9adfa3: Preparing
28ad53a6: Preparing
6b650202: Preparing
f2b9b970: Preparing
78796d0: Pushed   2.262GB/2.262GB

In [21]:
# Validate the custom container image in Container Registry
!gcloud container images describe $CUSTOM_PREDICTOR_IMAGE_URI

image_summary:
  digest: sha256:a14187f385b95490e44fdaae4950dc1e1e145c3b83e12a16f5eec58ac2298ac7
  fully_qualified_digest: gcr.io/ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli@sha256:a14187f385b95490e44fdaae4950dc1e1e145c3b83e12a16f5eec58ac2298ac7
  registry: gcr.io
  repository: ideta-ml-thi/pytorch_predict_pt-xlm-roberta-large-xnli


In [17]:
print(f"PROJECT_ID: {PROJECT_ID}")
print(f"BUCKET_NAME: {BUCKET_NAME}")
print(f"REGION: {REGION}")

PROJECT_ID: ideta-ml-thi
BUCKET_NAME: gs://ideta-sentiment-analysis/
REGION: europe-west1


In [18]:
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME, location=REGION)

# Create a Model resource with custom serving container

In [20]:
VERSION = 1
model_display_name = f"{APP_NAME}-v{VERSION}"
model_description = "PyTorch based sentiment analysis with 3 labels"

MODEL_NAME = APP_NAME
health_route = "/ping"
predict_route = f"/predictions/{MODEL_NAME}"
serving_container_ports = [7080]

In [21]:
model = aiplatform.Model.upload(
    display_name=model_display_name,
    description=model_description,
    serving_container_image_uri=CUSTOM_PREDICTOR_IMAGE_URI,
    serving_container_predict_route=predict_route,
    serving_container_health_route=health_route,
    serving_container_ports=serving_container_ports,
)

model.wait()

print(model.display_name)
print(model.resource_name)

Creating Model
Create Model backing LRO: projects/211294546736/locations/europe-west1/models/7738046176938688512/operations/4262652649259663360
Model created. Resource name: projects/211294546736/locations/europe-west1/models/7738046176938688512
To use this Model in another session:
model = aiplatform.Model('projects/211294546736/locations/europe-west1/models/7738046176938688512')
pt-xlm-roberta-large-xnli-v1
projects/211294546736/locations/europe-west1/models/7738046176938688512


# Create an Endpoint for Model with Custom Container

In [22]:
endpoint_display_name = f"{APP_NAME}-endpoint"
endpoint_display_name

'pt-xlm-roberta-large-xnli-endpoint'

In [23]:
endpoint = aiplatform.Endpoint.create(display_name=endpoint_display_name)

Creating Endpoint
Create Endpoint backing LRO: projects/211294546736/locations/europe-west1/endpoints/4667664354420719616/operations/4070123765189574656
Endpoint created. Resource name: projects/211294546736/locations/europe-west1/endpoints/4667664354420719616
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/211294546736/locations/europe-west1/endpoints/4667664354420719616')


# Deploy the model to endpoint

__Note__: This step takes a few minutes to deploy the resources.

In [25]:
model_display_name

'pt-xlm-roberta-large-xnli-v1'

In [26]:
traffic_percentage = 100
machine_type = "n1-standard-4"
deployed_model_display_name = model_display_name
min_replica_count = 1  # NOTE: defined but not passed to deploy() below
max_replica_count = 3  # NOTE: defined but not passed to deploy() below
sync = True

model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=deployed_model_display_name,
    machine_type=machine_type,
    traffic_percentage=traffic_percentage,
    sync=sync,
)

Deploying model to Endpoint : projects/211294546736/locations/europe-west1/endpoints/4667664354420719616
Deploy Endpoint model backing LRO: projects/211294546736/locations/europe-west1/endpoints/4667664354420719616/operations/7171978008541003776
Endpoint model deployed. Resource name: projects/211294546736/locations/europe-west1/endpoints/4667664354420719616


<google.cloud.aiplatform.models.Endpoint object at 0x7fb95f272a50> 
resource name: projects/211294546736/locations/europe-west1/endpoints/4667664354420719616

# Invoking the Endpoint with deployed Model using Vertex AI SDK to make predictions

In [27]:
# Get the list of endpoints
filter = f'display_name="{endpoint_display_name}"'

for endpoint_info in aiplatform.Endpoint.list(filter=filter):
    print(
        f"Endpoint display name = {endpoint_info.display_name} resource id ={endpoint_info.resource_name} "
    )

endpoint = aiplatform.Endpoint(endpoint_info.resource_name)

Endpoint display name = pt-xlm-roberta-large-xnli-endpoint resource id =projects/211294546736/locations/europe-west1/endpoints/4667664354420719616 


In [28]:
# List models wrt. this endpoint
endpoint.list_models()

[id: "1498871843170287616"
model: "projects/211294546736/locations/europe-west1/models/7738046176938688512"
display_name: "pt-xlm-roberta-large-xnli-v1"
create_time {
  seconds: 1652449695
  nanos: 672793000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
]

## Make some tests

In [56]:
# Single sentence
instance = b"I am really disapointed with your service!"
b64_encoded = base64.b64encode(instance)
test_instance = [{"data": {"b64": b64_encoded.decode('utf-8')}}]
print(f"test_instance: {test_instance}")
prediction = endpoint.predict(instances=test_instance)
print(f"Prediction response: \n\t{prediction}")

test_instance: [{'data': {'b64': 'SSBhbSByZWFsbHkgZGlzYXBvaW50ZWQgd2l0aCB5b3VyIHNlcnZpY2Uh'}}]


ResourceExhausted: 429 Rate of traffic exceeds capacity. Ramp your traffic up more slowly. endpoint_id: 4667664354420719616, deployed_model_id: 1498871843170287616.

You can use this site to encode text in Base64: https://www.base64encode.org/
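The same encoding can also be done locally with the standard library. This is a small sketch: the helper names `to_instance` and `from_instance` are hypothetical, and the payload shape follows the `{"data": {"b64": ...}}` instances used in the cells above.

```python
import base64
import json


def to_instance(text: str) -> dict:
    """Wrap a sentence in the {"data": {"b64": ...}} shape used above."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
    return {"data": {"b64": encoded}}


def from_instance(instance: dict) -> str:
    """Decode the Base64 payload back to text (handy for debugging)."""
    return base64.b64decode(instance["data"]["b64"]).decode("utf-8")


test_instance = [to_instance("I am happy")]
print(json.dumps(test_instance))
```

The resulting list can be passed as `instances=` to `endpoint.predict()`.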

In [55]:
# "I am happy"
test_instance = [{"data": {"b64": "SSBhbSBoYXBweQ=="}}]
prediction = endpoint.predict(instances=test_instance)
print(f"Prediction response: \n\t{prediction}")

Prediction response: 
	Prediction(predictions=[{'scores': [0.9699267745018005, 0.02488462254405022, 0.005188622046262026], 'sequence': 'I am happy', 'labels': ['positive', 'neutral', 'negative']}], deployed_model_id='1498871843170287616', explanations=None)
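Each prediction is a dict with parallel `labels` and `scores` lists. As a hedged sketch, the winning label can be extracted as follows; the helper `top_label` is hypothetical, and `sample` is copied from the response above. The recorded outputs appear sorted by score, but the sketch does not rely on that.

```python
def top_label(prediction: dict) -> tuple:
    """Return (label, score) for the highest-scoring label."""
    pairs = zip(prediction["labels"], prediction["scores"])
    return max(pairs, key=lambda pair: pair[1])


# One element of the `predictions` list, copied from the response above
sample = {
    "sequence": "I am happy",
    "labels": ["positive", "neutral", "negative"],
    "scores": [0.9699267745018005, 0.02488462254405022, 0.005188622046262026],
}

label, score = top_label(sample)
print(label, round(score, 4))
```

With the SDK response object, this would be applied to `prediction.predictions[0]`.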


# (Fix 429) Try with `custom_handler_2.py` and `Dockerfile.2`

## Option 2: Using `tokenizer` to encode and then decode to text to be used in `inference` with `pipeline()`

Load the model first (see the section "Load from saved model").

__Result__: 

- The problem is still there!
- __Solution__: When creating a new endpoint, set "Maximum number of compute nodes" to a number (don't leave it empty) and also choose a more powerful "Machine type".
- This section contains the useful (and final) code for `Dockerfile` and `custom_handler.py`, which are used to build the image.
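The solution above refers to console settings. A rough SDK equivalent, as a sketch under assumptions, is to pass the replica counts and a larger machine type to `Model.deploy()`. The helper `build_deploy_kwargs` is hypothetical, and `n1-standard-8` is only an example of a machine type larger than the `n1-standard-4` used earlier.

```python
def build_deploy_kwargs(display_name: str,
                        machine_type: str = "n1-standard-8",  # assumption: example of a larger type
                        min_replica_count: int = 1,
                        max_replica_count: int = 3) -> dict:
    """Collect the keyword arguments for Model.deploy() in one place."""
    if max_replica_count < min_replica_count:
        raise ValueError("max_replica_count must be >= min_replica_count")
    return {
        "deployed_model_display_name": display_name,
        "machine_type": machine_type,
        "min_replica_count": min_replica_count,
        "max_replica_count": max_replica_count,
        "traffic_percentage": 100,
        "sync": True,
    }


deploy_kwargs = build_deploy_kwargs("pt-xlm-roberta-large-xnli-v1")
print(deploy_kwargs)
# With the SDK objects from this notebook, this would be used as:
# model.deploy(endpoint=endpoint, **deploy_kwargs)
```

Keeping the arguments in one dict makes it harder to define a value such as `max_replica_count` and then forget to pass it to `deploy()`.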

### Encode and decode without `return_tensors`

In [19]:
inputs_2 = saved_tokenizer("I\'m really disapointed with what are you saying and your service, give my money back!!!!")
inputs_2

{'input_ids': [0, 87, 25, 39, 6183, 6392, 38496, 297, 678, 2367, 621, 398, 54433, 136, 935, 4516, 4, 8337, 759, 17265, 4420, 11305, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [20]:
type(inputs)

transformers.tokenization_utils_base.BatchEncoding

In [41]:
saved_tokenizer.decode(inputs_2["input_ids"], skip_special_tokens=True)

"I'm really disapointed with what are you saying and your service, give my money back!!!!"

### Encode and decode with `return_tensors`

In [47]:
sentence = "I\'m really disapointed with what are you saying and your service, give my money back!!!!"

In [55]:
inputs = saved_tokenizer(sentence, return_tensors="pt")
inputs

{'input_ids': tensor([[    0,    87,    25,    39,  6183,  6392, 38496,   297,   678,  2367,
           621,   398, 54433,   136,   935,  4516,     4,  8337,   759, 17265,
          4420, 11305,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [49]:
saved_tokenizer.decode(inputs["input_ids"], skip_special_tokens=True)

TypeError: 'list' object cannot be interpreted as an integer

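__Why?__ With `return_tensors="pt"`, `inputs["input_ids"]` is a 2-D tensor of shape `(1, sequence_length)` (a batch containing one sequence), while `decode()` expects a single 1-D sequence of token ids.

Try converting manually to a tensor with `torch.tensor` instead of using `return_tensors="pt"`: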

In [50]:
inputs_3 = saved_tokenizer("I\'m really disapointed with what are you saying and your service, give my money back!!!!")
inputs_3

{'input_ids': [0, 87, 25, 39, 6183, 6392, 38496, 297, 678, 2367, 621, 398, 54433, 136, 935, 4516, 4, 8337, 759, 17265, 4420, 11305, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

In [53]:
new_inputs = torch.tensor(inputs_3["input_ids"])
new_inputs

tensor([    0,    87,    25,    39,  6183,  6392, 38496,   297,   678,  2367,
          621,   398, 54433,   136,   935,  4516,     4,  8337,   759, 17265,
         4420, 11305,     2])

In [54]:
saved_tokenizer.decode(new_inputs, skip_special_tokens=True)

"I'm really disapointed with what are you saying and your service, give my money back!!!!"

Try indexing into `inputs["input_ids"]` to take its first (and only) row => this bypasses the dimension problem

In [58]:
saved_tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)

"I'm really disapointed with what are you saying and your service, give my money back!!!!"
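
The shape mismatch above can be reproduced without loading the model. The following is a minimal sketch, not the tokenizer's real implementation: `TOY_VOCAB` and `toy_decode` are made-up names, and the three-token vocabulary is invented for illustration. It only shows why a function that expects one flat sequence of integer ids rejects a batch (a list of lists) and why indexing with `[0]` fixes it.

```python
import operator

# Hypothetical three-token vocabulary, for illustration only.
TOY_VOCAB = {0: "<s>", 87: "I", 2: "</s>"}


def toy_decode(token_ids):
    """Decode ONE sequence given as a flat iterable of integer ids."""
    # operator.index() accepts integers only, so a nested list is rejected.
    return " ".join(TOY_VOCAB[operator.index(t)] for t in token_ids)


flat = [0, 87, 2]        # one sequence, like the output without return_tensors
batched = [[0, 87, 2]]   # a batch holding one sequence, like return_tensors="pt"

print(toy_decode(flat))        # decodes the single sequence
print(toy_decode(batched[0]))  # take the first row of the batch, then decode

try:
    toy_decode(batched)        # passing the whole batch fails
except TypeError as err:
    print("TypeError:", err)
```

For real batches, the Hugging Face tokenizers also provide `batch_decode()`, which takes a sequence of sequences and returns one string per row.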

### Build another image

👉 The only difference is the return value of `preprocess()` in `custom_handler_3.py`.

In [11]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/{APP_NAME}_3"
CUSTOM_PREDICTOR_IMAGE_URI

'gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3'

In [12]:
!docker build --tag=$CUSTOM_PREDICTOR_IMAGE_URI ./predictor -f ./predictor/Dockerfile.3

Sending build context to Docker daemon  2.262GB
Step 1/15 : FROM pytorch/torchserve:latest-cpu
 ---> a1d88b873573
Step 2/15 : RUN python3 -m pip install --upgrade pip
 ---> Using cache
 ---> 9cd7983a39ee
Step 3/15 : RUN pip3 install transformers
 ---> Using cache
 ---> 05d7d7d7e863
Step 4/15 : USER model-server
 ---> Using cache
 ---> c989f33b4974
Step 5/15 : COPY ./custom_handler_3.py /home/model-server/
 ---> c467df415b43
Step 6/15 : COPY ./model/pt-xlm-roberta-large-xnli/ /home/model-server/
 ---> d3b7e1fd9f66
Step 7/15 : USER root
 ---> Running in 0b97f1e0a77c
Removing intermediate container 0b97f1e0a77c
 ---> f6ec2600c937
Step 8/15 : RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
 ---> Running in 08324f39afdf
Removing intermediate container 08324f39afdf
 ---> 172e3c2d18de
Step 9/15 : RUN printf "\ninference_address=http://0.0.0.0:7080" >> /home/model-server/config.properties
 ---> Running in 099b27fc529f
Removing intermediate container 099b27fc529f
 -

### Create another container

In [13]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/{APP_NAME}_3"
PREDICT_CONTAINER_NAME = "local_xlm_roberta_large_xnli_3"
!docker stop $PREDICT_CONTAINER_NAME
!docker run -t -d --rm -p 7080:7080 --name=$PREDICT_CONTAINER_NAME $CUSTOM_PREDICTOR_IMAGE_URI
!sleep 20

Error response from daemon: No such container: local_xlm_roberta_large_xnli_3
9aa2ff237197b1a3bd3e90eb823ddd78baba7e509d432ccfe706755a0094c7d8


### Check health

In [14]:
!curl http://localhost:7080/ping

{
  "status": "Healthy"
}
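
The fixed `sleep 20` used when starting the container is a guess at how long the server needs to load the model. One alternative is to poll the `/ping` route until it reports `Healthy`. The sketch below assumes the same local URL as above; `ping` and `wait_until_healthy` are hypothetical helper names, and the timeout values are arbitrary.

```python
import json
import time
import urllib.request


def ping(url="http://localhost:7080/ping", timeout_s=2.0):
    """Return True if the ping route answers with status 'Healthy'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON answer: not ready yet.
        return False
    return isinstance(body, dict) and body.get("status") == "Healthy"


def wait_until_healthy(probe=ping, max_wait_s=60.0, interval_s=2.0):
    """Call probe() repeatedly until it returns True or max_wait_s elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_healthy()` right after `docker run` would then replace the fixed sleep and return `False` if the container never becomes healthy within the limit.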


### Make a prediction locally

In [15]:
instance = b"You aren't kind, i hate you."
b64_encoded = base64.b64encode(instance)
test_instance = {
    "instances": [
        {
            "data": {
                "b64": b64_encoded.decode('utf-8')
            }
        }
    ]
}

payload = json.dumps(test_instance)
r = requests.post(
    f"http://localhost:7080/predictions/{APP_NAME}/",
    headers={"Content-Type": "application/json; charset=utf-8"},
    data=payload
)

r.json()

{'predictions': [{'sequence': "You aren't kind, i hate you.",
   'labels': ['negative', 'neutral', 'positive'],
   'scores': [0.9942014217376709, 0.0030435377266258, 0.0027550666127353907]}]}
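
The request body above can be produced by a small helper, which keeps the base64 step in one place. This is a sketch under the assumption that the request format stays exactly as shown (`instances` -> `data` -> `b64`); `build_payload` is a hypothetical name, and whether the handler accepts more than one instance per request is not confirmed in this notebook.

```python
import base64
import json


def build_payload(texts):
    """Wrap each text as a base64-encoded instance, mirroring the request body above."""
    instances = [
        {"data": {"b64": base64.b64encode(text.encode("utf-8")).decode("utf-8")}}
        for text in texts
    ]
    return json.dumps({"instances": instances})


payload = build_payload(["You aren't kind, i hate you."])
print(payload)
```

The resulting string can be passed as `data=payload` to `requests.post`, exactly as in the cell above.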

### Deploy to Vertex AI

In [16]:
CUSTOM_PREDICTOR_IMAGE_URI

'gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3'

In [17]:
!docker push $CUSTOM_PREDICTOR_IMAGE_URI

Using default tag: latest
The push refers to repository [gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3]

In [18]:
# Validate the custom container image in Container Registry
!gcloud container images describe $CUSTOM_PREDICTOR_IMAGE_URI

image_summary:
  digest: sha256:d1a589c433fb561d9bd6ed5141f82a424e62375e3d11c9165f0d2020c049d473
  fully_qualified_digest: gcr.io/ideta-ml-thi/pt-xlm-roberta-large-xnli_3@sha256:d1a589c433fb561d9bd6ed5141f82a424e62375e3d11c9165f0d2020c049d473
  registry: gcr.io
  repository: ideta-ml-thi/pt-xlm-roberta-large-xnli_3


In [19]:
print(f"PROJECT_ID: {PROJECT_ID}")
print(f"BUCKET_NAME: {BUCKET_NAME}")
print(f"REGION: {REGION}")

PROJECT_ID: ideta-ml-thi
BUCKET_NAME: gs://ideta-sentiment-analysis/
REGION: europe-west1


In [20]:
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME, location=REGION)

In [22]:
VERSION = 2
model_display_name = f"{APP_NAME}-v{VERSION}"
print("model_display_name: ", model_display_name)
model_description = "PyTorch based sentiment analysis with 3 labels"

MODEL_NAME = APP_NAME
health_route = "/ping"
predict_route = f"/predictions/{MODEL_NAME}"
serving_container_ports = [7080]

model_display_name:  pt-xlm-roberta-large-xnli-v2


In [23]:
model = aiplatform.Model.upload(
    display_name=model_display_name,
    description=model_description,
    serving_container_image_uri=CUSTOM_PREDICTOR_IMAGE_URI,
    serving_container_predict_route=predict_route,
    serving_container_health_route=health_route,
    serving_container_ports=serving_container_ports,
)

model.wait()

print(model.display_name)
print(model.resource_name)

Creating Model
Create Model backing LRO: projects/211294546736/locations/europe-west1/models/8348283926447390720/operations/5826815691910545408
Model created. Resource name: projects/211294546736/locations/europe-west1/models/8348283926447390720
To use this Model in another session:
model = aiplatform.Model('projects/211294546736/locations/europe-west1/models/8348283926447390720')
pt-xlm-roberta-large-xnli-v2
projects/211294546736/locations/europe-west1/models/8348283926447390720


## Option 1: The return of `preprocess` = original input sentences

👉 The only differences are the functions `preprocess()` and `inference()` in `custom_handler_2.py`.

```python
def preprocess(self, data):
    """ Preprocessing input request by tokenizing
        Extend with your own preprocessing steps as needed
    """
    text = data[0].get("data")
    if text is None:
        text = data[0].get("body")
    sentences = text.decode('utf-8')
    logger.info("Received text: '%s'", sentences)

    return sentences

def inference(self, inputs):
    """ Predict the class of a text using a trained transformer model.
    """
    prediction = self.pipe(inputs[0], candidate_labels=["negative", "neutral", "positive"])
    if len(inputs) == 1:
        prediction = [prediction]
    return prediction
```

👉 __Result__: failed. See the error log at the end of this section.

In [59]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/pytorch_predict_{APP_NAME}_2"
!docker build --tag=$CUSTOM_PREDICTOR_IMAGE_URI ./predictor -f ./predictor/Dockerfile.2

E0513 21:04:57.926380585   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


Sending build context to Docker daemon  2.262GB
Step 1/15 : FROM pytorch/torchserve:latest-cpu
 ---> a1d88b873573
Step 2/15 : RUN python3 -m pip install --upgrade pip
 ---> Using cache
 ---> 9cd7983a39ee
Step 3/15 : RUN pip3 install transformers
 ---> Using cache
 ---> 05d7d7d7e863
Step 4/15 : USER model-server
 ---> Using cache
 ---> c989f33b4974
Step 5/15 : COPY ./custom_handler_2.py /home/model-server/
 ---> e961e600e47c
Step 6/15 : COPY ./model/pt-xlm-roberta-large-xnli/ /home/model-server/
 ---> cb6519ae252b
Step 7/15 : USER root
 ---> Running in 5e725a9ffdf1
Removing intermediate container 5e725a9ffdf1
 ---> 751cdd0d5851
Step 8/15 : RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
 ---> Running in 2ba3ae6f9932
Removing intermediate container 2ba3ae6f9932
 ---> 880a5c82453f
Step 9/15 : RUN printf "\ninference_address=http://0.0.0.0:7080" >> /home/model-server/config.properties
 ---> Running in cd4f32c0f355
Removing intermediate container cd4f32c0f355
 -

In [60]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/pytorch_predict_{APP_NAME}_2"
PREDICT_CONTAINER_NAME = "local_xlm_roberta_large_xnli_2"
!docker stop $PREDICT_CONTAINER_NAME
!docker run -t -d --rm -p 7080:7080 --name=$PREDICT_CONTAINER_NAME $CUSTOM_PREDICTOR_IMAGE_URI
!sleep 20

Error response from daemon: No such container: local_xlm_roberta_large_xnli_2


E0513 21:15:07.296953339   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
E0513 21:15:07.511717375   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


feac50383d5ac0d5d2c7174827112a25fb6aa6fa64cc4e9319fb89b22432d354


E0513 21:15:08.552110203   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


In [61]:
!curl http://localhost:7080/ping

E0513 21:16:08.459317118   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


{
  "status": "Healthy"
}


In [63]:
APP_NAME

'pt-xlm-roberta-large-xnli'

In [65]:
%%bash -s $APP_NAME

APP_NAME=$1

cat > ./predictor/instances.json <<END
{
   "instances": [
     {
       "data": {
         "b64": "$(echo 'I am happy.' | base64 --wrap=0)"
       }
     }
   ]
}
END

curl -s -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./predictor/instances.json \
  http://localhost:7080/predictions/$APP_NAME/

E0513 21:22:23.872905385   14264 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


{
  "code": 503,
  "type": "InternalServerException",
  "message": "Prediction failed"
}


Use `docker attach <container_id>` to see the logs of what is going on.

Error inside the container:

```
2022-05-13T21:22:24,590 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 668
2022-05-13T21:22:24,589 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/service.py", line 102, in predict
2022-05-13T21:22:24,590 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0 ACCESS_LOG - /172.17.0.1:39050 "POST /predictions/pt-xlm-roberta-large-xnli/ HTTP/1.1" 503 670
2022-05-13T21:22:24,591 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2022-05-13T21:22:24,591 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0 TS_METRICS - Requests5XX.Count:1|#Level:Host|#hostname:feac50383d5a,timestamp:null
2022-05-13T21:22:24,592 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/torch_handler/request_envelope/base.py", line 31, in handle
2022-05-13T21:22:24,592 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -     results = self.format_output(results)
2022-05-13T21:22:24,592 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/torch_handler/request_envelope/json.py", line 24, in format_output
2022-05-13T21:22:24,593 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -     return self._batch_to_json(data, self._lengths)
2022-05-13T21:22:24,593 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -   File "/home/venv/lib/python3.8/site-packages/ts/torch_handler/request_envelope/json.py", line 60, in _batch_to_json
2022-05-13T21:22:24,593 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG -     mini_batch = batch[cursor:cursor_end]
2022-05-13T21:22:24,594 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_LOG - TypeError: unhashable type: 'slice'
2022-05-13T21:22:24,594 [INFO ] W-9001-pt-xlm-roberta-large-xnli_1.0-stdout MODEL_METRICS - HandlerTime.Milliseconds:664.13|#ModelName:pt-xlm-roberta-large-xnli,Level:Model|#hostname:feac50383d5a,requestID:ab8f0b03-3b22-4a1c-aeca-6ccecf435ffd,timestamp:1652476944
```
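
The traceback ends at `mini_batch = batch[cursor:cursor_end]`, where the JSON envelope slices whatever `inference()` returned. A plausible reading (an assumption, not confirmed in this notebook) is that `preprocess()` returns a plain `str`, so in `inference()` the expression `inputs[0]` is only the first character and `len(inputs)` is the character count. The pipeline result is then a single `dict` that never gets wrapped in a list, and a `dict` cannot be sliced. The sketch below reproduces only that last step; `take_mini_batch` is a made-up stand-in for the slicing line and the scores are invented.

```python
# Stand-in for the dict a zero-shot pipeline returns for one sentence
# (the scores here are made up for illustration).
prediction = {
    "sequence": "I am happy.",
    "labels": ["positive", "neutral", "negative"],
    "scores": [0.9, 0.06, 0.04],
}


def take_mini_batch(batch, start, end):
    """Mimic the slicing step shown in the traceback."""
    return batch[start:end]


slicing_error = None
try:
    take_mini_batch(prediction, 0, 1)  # a bare dict, as Option 1 returns it
except (TypeError, KeyError) as err:
    # Python 3.8 (used in the container) raises TypeError: unhashable type: 'slice';
    # interpreters where slices are hashable raise KeyError instead.
    slicing_error = err
print(type(slicing_error).__name__, slicing_error)

# Wrapping the result in a list gives the envelope something it can slice.
print(take_mini_batch([prediction], 0, 1))
```

Under this reading, returning a list from `inference()` (one entry per request in the batch) is what the envelope expects.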