# Triton Inference Server Custom Python Backend Setup

The Triton Inference Server Python backend lets you customize the server with additional dependencies (for example, transformers). The usual approach is to package a conda-pack environment alongside the model, as shown in this blog: https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/. Those artifacts can become large, so this sample instead installs the dependencies at the container level.

### Setting
This example runs in a SageMaker Classic Notebook Instance (g5.4xlarge). Increase the notebook volume as needed, since the conda-pack and Docker operations are heavy.

### Credits/Reference
- SageMaker Triton BYOC Example: https://github.com/aws-samples/sagemaker-hosting/tree/main/Inference-Serving-Options/SageMaker-Triton/Triton-BYOC
- SageMaker Python Backend Blog: https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/

## Setup

In [None]:
!pip install nvidia-pyindex
!pip install tritonclient[http]
!pip install -U sagemaker numpy transformers torch

## Building Docker Image & Setting Up Triton

For the Python backend, we extend the Triton Server Docker image (the 25.10 release in this sample) and install the necessary dependencies (torch and transformers in this case).

### Build Docker Image

In [None]:
%%writefile Dockerfile
FROM nvcr.io/nvidia/tritonserver:25.10-py3

# install what your python model needs
RUN pip install --no-cache-dir transformers torch

#### Docker Build
Run the following command in the Terminal/CLI:

```
docker build -t custom-triton .
```

To confirm that the image was built, use the following command:

```
docker images
```

## Local Model Inference

For this sample we use a BERT sentiment model from the transformers library. The following code runs inference with it locally in Python:

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch
import numpy as np
model = pipeline("text-classification", model="nlptown/bert-base-multilingual-uncased-sentiment")
output = model("I am super happy")
res = np.array(output)
print(res)

## Triton Artifact Creation

Ensure the following structure is present within the repository:

```
- nlp
    - sentiment (Triton model name)
        - 1
            - model.py (inference logic)
        - config.pbtxt (input/output specs)
```
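
The contents of `config.pbtxt` and `model.py` are not shown in this notebook, so the following is only a minimal sketch of what they might look like. It assumes the tensor names used by the client code later in this notebook (input `text`, output `sent_arr`, both BYTES with shape `[1]`), assumes batching is disabled (`max_batch_size: 0`), and assumes the output carries just the predicted label. The helper `write_model_repository` is hypothetical and only writes the files; `triton_python_backend_utils` is available inside the Triton container, not in the notebook kernel.

```python
from pathlib import Path

# Assumed model configuration: names and shapes mirror the client code in this notebook.
CONFIG_PBTXT = '''name: "sentiment"
backend: "python"
max_batch_size: 0
input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "sent_arr"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
'''

# Assumed inference logic: wraps the same pipeline used for local inference.
MODEL_PY = '''import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import pipeline


class TritonPythonModel:
    def initialize(self, args):
        # Load the pipeline once when Triton loads the model.
        self.pipe = pipeline(
            "text-classification",
            model="nlptown/bert-base-multilingual-uncased-sentiment",
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # BYTES tensors arrive as a numpy object array of bytes.
            raw = pb_utils.get_input_tensor_by_name(request, "text").as_numpy()
            text = raw.flatten()[0].decode("utf-8")
            label = self.pipe(text)[0]["label"]
            out = pb_utils.Tensor(
                "sent_arr", np.array([label.encode("utf-8")], dtype=np.object_)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
'''


def write_model_repository(root: Path) -> Path:
    """Create <root>/sentiment/{config.pbtxt, 1/model.py} and return the model directory."""
    model_dir = Path(root) / "sentiment"
    version_dir = model_dir / "1"
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    (version_dir / "model.py").write_text(MODEL_PY)
    return model_dir
```

Calling `write_model_repository(Path("nlp"))` from the notebook directory would produce the layout above. Note that it overwrites any existing files with the same names.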

## Docker Container Startup

To start the Docker container, run this command from the terminal or a shell script:

```
docker run --gpus=all --shm-size=4G --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ec2-user/SageMaker/nlp:/model_repository custom-triton:latest tritonserver --model-repository=/model_repository --exit-on-error=false --log-verbose=1
```

Run this command from the same root directory as the notebook. The mounted path above is the SageMaker Notebook Instance home directory; adjust it to match your environment. The command also uses the `custom-triton` image built earlier; change the image name if you tagged yours differently.
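
Loading the model can take a while after the container starts, so it can help to wait until the server reports ready before sending requests. The sketch below polls the standard readiness endpoints Triton exposes over HTTP (`/v2/health/ready` and `/v2/models/<name>/ready`) using only the standard library. The function names, the timeout values, and the injectable `probe` and `sleep` parameters are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def http_ready(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as "not ready".
        return False


def wait_for_triton(
    base_url: str = "http://localhost:8000",
    model_name: str = "sentiment",
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    probe=http_ready,
    sleep=time.sleep,
) -> bool:
    """Poll server and model readiness until both pass or the timeout expires."""
    urls = [
        f"{base_url}/v2/health/ready",
        f"{base_url}/v2/models/{model_name}/ready",
    ]
    deadline = time.monotonic() + timeout_s
    while True:
        if all(probe(url) for url in urls):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

With the container running, `wait_for_triton()` returns `True` once both the server and the `sentiment` model are ready, or `False` if the timeout expires first.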

## Sample Inference

We can use the Triton Client library or Python requests library for sample inference.

In [None]:
import tritonclient.http as http_client
triton_client = http_client.InferenceServerClient(url="localhost:8000", verbose=True)

In [None]:
import numpy as np
# Create inputs to send to Triton
model_name = "sentiment"
text_inputs = ["I am super happy right now"]

# structure inputs
inputs = []
inputs.append(http_client.InferInput("text", [1], "BYTES"))
input0_real = np.array(text_inputs, dtype=np.object_)
inputs[0].set_data_from_numpy(input0_real)

# structure outputs
outputs = []
outputs.append(http_client.InferRequestedOutput("sent_arr"))

# sample inference
results = triton_client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
sentiment = results.as_numpy('sent_arr')
print(sentiment)
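
The same request could also be sent with the Python requests library instead of the Triton client. The sketch below builds a JSON body for Triton's HTTP inference endpoint (`POST /v2/models/<name>/infer`) with the same tensor names used above. The helper names are hypothetical, and `requests` is imported lazily so the payload builder itself has no third-party dependency.

```python
def build_infer_payload(texts, input_name="text", output_name="sent_arr"):
    """Build a JSON-serializable inference request body for a BYTES input."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": list(texts),
            }
        ],
        "outputs": [{"name": output_name}],
    }


def infer_with_requests(texts, base_url="http://localhost:8000", model_name="sentiment"):
    """Send the payload to a running Triton server and return the output data."""
    import requests  # third-party; only needed when actually calling the server

    resp = requests.post(
        f"{base_url}/v2/models/{model_name}/infer",
        json=build_infer_payload(texts),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["outputs"][0]["data"]
```

With the container running, `infer_with_requests(["I am super happy right now"])` would return the contents of the `sent_arr` output as a Python list.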

In [None]:
%%time
for i in range(100):
    results = triton_client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    sentiment = results.as_numpy('sent_arr')
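
`%%time` reports only the total wall time for the loop. If per-request latency is of interest, a small helper along the following lines could record each call and summarize it. This is a sketch: the function name, the nearest-rank percentile method, and the injectable `clock` parameter are assumptions made for illustration.

```python
import math
import statistics
import time


def measure_latency_ms(call, n=100, clock=time.perf_counter):
    """Invoke `call` n times and return mean, p50, p95, and max latency in milliseconds."""
    latencies = []
    for _ in range(n):
        start = clock()
        call()
        latencies.append((clock() - start) * 1000.0)
    latencies.sort()

    def percentile(pct):
        # Nearest-rank percentile over the sorted samples.
        rank = max(1, math.ceil(pct * n / 100))
        return latencies[rank - 1]

    return {
        "mean": statistics.fmean(latencies),
        "p50": percentile(50),
        "p95": percentile(95),
        "max": latencies[-1],
    }
```

For the loop above, this could be called as `measure_latency_ms(lambda: triton_client.infer(model_name=model_name, inputs=inputs, outputs=outputs))`.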