## Ensemble Model Inference with Triton Inference Server

### Steps

* [1. Set up client](#client)
* [2. Set up a model repository](#setup_model)
* [3. Set up the ensemble model](#setup_ensemble)
* [4. Set up the ensemble scheduler](#setup_scheduler)
* [5. Run Triton Inference Server](#run_server)
* [6. Request image classification](#request)



<a id='client'></a>
### 1. Set up client

#### a) Build docker image for client

In [10]:
%%bash 
git clone https://github.com/NVIDIA/triton-inference-server
cd triton-inference-server
git checkout r20.03


Cloning into 'triton-inference-server'...
Branch r20.03 set up to track remote branch r20.03 from origin.



The client Dockerfile has to be modified to work around this issue (https://github.com/NVIDIA/triton-inference-server/issues/1453), so the next cell overwrites `Dockerfile.client`.

In [4]:
%%writefile triton-inference-server/Dockerfile.client
# Copyright (c) 2019-2020, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# Default setting is building on nvidia/cuda:10.1-devel-ubuntu18.04
ARG BASE_IMAGE=nvidia/cuda:10.1-devel-ubuntu18.04

FROM ${BASE_IMAGE}

# Ensure apt-get won't prompt for selecting options
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
            software-properties-common \
            autoconf \
            automake \
            build-essential \
            cmake \
            curl \
            git \
            libopencv-dev \
            libopencv-core-dev \
            libssl-dev \
            libtool \
            pkg-config \
            python3 \
            python3-pip \
            python3-dev \
            rapidjson-dev && \
    pip3 install --upgrade wheel setuptools && \
    pip3 install --upgrade grpcio-tools

# Build expects "python" executable (not python3).
RUN rm -f /usr/bin/python && \
    ln -s /usr/bin/python3 /usr/bin/python

# Build the client library and examples
WORKDIR /workspace
COPY VERSION .
COPY build build
COPY src/clients src/clients
COPY src/core src/core

RUN cd build && \
    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=/workspace/install && \
    make -j16 trtis-clients
RUN cd install && \
    export VERSION=`cat /workspace/VERSION` && \
    tar zcf /workspace/v$VERSION.clients.tar.gz *

# For CI testing need to install a test script.
COPY qa/L0_client_tar/test.sh /tmp/test.sh

# Install an image needed by the quickstart and other documentation.
COPY qa/images/mug.jpg images/mug.jpg

# Install the dependencies needed to run the client examples. These
# are not needed for building but including them allows this image to
# be used to run the client examples.
RUN pip3 install --upgrade install/python/tensorrtserver-*.whl numpy pillow

ENV PATH /workspace/install/bin:${PATH}
ENV LD_LIBRARY_PATH /workspace/install/lib:${LD_LIBRARY_PATH}


Overwriting triton-inference-server/Dockerfile.client


In [6]:
%%bash 
cd triton-inference-server
docker build -t tritonserver_client -f Dockerfile.client .

Sending build context to Docker daemon  11.96MB
Step 1/17 : ARG BASE_IMAGE=nvidia/cuda:10.1-devel-ubuntu18.04
Step 2/17 : FROM ${BASE_IMAGE}
 ---> 9e47e9dfcb9a
Step 3/17 : ENV DEBIAN_FRONTEND=noninteractive
 ---> Using cache
 ---> 8725e4621ff1
Step 4/17 : RUN apt-get update &&     apt-get install -y --no-install-recommends             software-properties-common             autoconf             automake             build-essential             cmake             curl             git             libopencv-dev             libopencv-core-dev             libssl-dev             libtool             pkg-config             python3             python3-pip             python3-dev             rapidjson-dev &&     pip3 install --upgrade wheel setuptools &&     pip3 install --upgrade grpcio-tools
 ---> Running in 25217b266d2b
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:3 http://security.ubuntu.com/ub

#### b) Implement client for ensemble model

In [50]:
%%writefile ensemble_image_client.py
#!/usr/bin/env python
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse
import numpy as np
import os
import sys
from builtins import range
from PIL import Image
from functools import partial
from tensorrtserver.api import *
import tensorrtserver.api.model_config_pb2 as model_config

if sys.version_info >= (3, 0):
  import queue
else:
  import Queue as queue

class UserData:
    def __init__(self):
        self._completed_requests = queue.Queue()

# Callback function used for async_run()
def completion_callback(input_filenames, user_data, infer_ctx, request_id):
    user_data._completed_requests.put((request_id, input_filenames))

FLAGS = None

def model_dtype_to_np(model_dtype):
    if model_dtype == model_config.TYPE_BOOL:
        return np.bool_
    elif model_dtype == model_config.TYPE_INT8:
        return np.int8
    elif model_dtype == model_config.TYPE_INT16:
        return np.int16
    elif model_dtype == model_config.TYPE_INT32:
        return np.int32
    elif model_dtype == model_config.TYPE_INT64:
        return np.int64
    elif model_dtype == model_config.TYPE_UINT8:
        return np.uint8
    elif model_dtype == model_config.TYPE_UINT16:
        return np.uint16
    elif model_dtype == model_config.TYPE_FP16:
        return np.float16
    elif model_dtype == model_config.TYPE_FP32:
        return np.float32
    elif model_dtype == model_config.TYPE_FP64:
        return np.float64
    elif model_dtype == model_config.TYPE_STRING:
        return np.dtype(object)
    return None

def parse_model(url, protocol, model_name, batch_size, verbose=False):
    """
    Check the configuration of a model to make sure it meets the
    requirements for an image classification network (as expected by
    this client)
    """
    ctx = ServerStatusContext(url, protocol, model_name, verbose)
    server_status = ctx.get_server_status()

    if model_name not in server_status.model_status:
        raise Exception("unable to get status for '" + model_name + "'")

    status = server_status.model_status[model_name]
    config = status.config

    if len(config.input) != 2:
        raise Exception("expecting 2 inputs, got {}".format(len(config.input)))
    if len(config.output) != 1:
        raise Exception("expecting 1 output, got {}".format(len(config.output)))

    input_0 = config.input[0]
    input_1 = config.input[1]

    output = config.output[0]

    if output.data_type != model_config.TYPE_FP32:
        raise Exception("expecting output datatype to be TYPE_FP32, model '" +
                        model_name + "' output type is " +
                        model_config.DataType.Name(output.data_type))

    # Output is expected to be a vector. But allow any number of
    # dimensions as long as all but 1 is size 1 (e.g. { 10 }, { 1, 10
    # }, { 10, 1, 1 } are all ok). Variable-size dimensions are not
    # currently supported.
    non_one_cnt = 0
    for dim in output.dims:
        if dim == -1:
            raise Exception("variable-size dimension in model output not supported")
        if dim > 1:
            non_one_cnt += 1
            if non_one_cnt > 1:
                raise Exception("expecting model output to be a vector")

    # Model specifying maximum batch size of 0 indicates that batching
    # is not supported and so the input tensors do not expect an "N"
    # dimension (and 'batch_size' should be 1 so that only a single
    # image instance is inferred at a time).
    max_batch_size = config.max_batch_size
    if max_batch_size == 0:
        if batch_size != 1:
            raise Exception("batching not supported for model '" + model_name + "'")
    else: # max_batch_size > 0
        if batch_size > max_batch_size:
            raise Exception("expecting batch size <= {} for model {}".format(max_batch_size, model_name))

    # Model input must have 3 dims, either CHW or HWC
    if len(input_0.dims) != 3:
        raise Exception(
            "expecting input to have 3 dimensions, model '{}' input has {}".format(
                model_name, len(input_0.dims)))

    # Variable-size dimensions are not currently supported.
    for dim in input_0.dims:
        if dim == -1:
            raise Exception("variable-size dimension in model input not supported")

    if ((input_0.format != model_config.ModelInput.FORMAT_NCHW) and
        (input_0.format != model_config.ModelInput.FORMAT_NHWC)):
        raise Exception("unexpected input format " + model_config.ModelInput.Format.Name(input_0.format) +
                        ", expecting " +
                        model_config.ModelInput.Format.Name(model_config.ModelInput.FORMAT_NCHW) +
                        " or " +
                        model_config.ModelInput.Format.Name(model_config.ModelInput.FORMAT_NHWC))

    if input_0.format == model_config.ModelInput.FORMAT_NHWC:
        h = input_0.dims[0]
        w = input_0.dims[1]
        c = input_0.dims[2]
    else:
        c = input_0.dims[0]
        h = input_0.dims[1]
        w = input_0.dims[2]

    return (input_0.name, input_1.name, output.name, c, h, w, input_0.format, model_dtype_to_np(input_0.data_type))

def preprocess(img, format, dtype, c, h, w, scaling):
    """
    Pre-process an image to meet the size, type and format
    requirements specified by the parameters.
    """
    #np.set_printoptions(threshold='nan')

    if c == 1:
        sample_img = img.convert('L')
    else:
        sample_img = img.convert('RGB')

    resized_img = sample_img.resize((w, h), Image.BILINEAR)
    resized = np.array(resized_img)
    if resized.ndim == 2:
        resized = resized[:,:,np.newaxis]

    typed = resized.astype(dtype)

    if scaling == 'INCEPTION':
        scaled = (typed / 128) - 1
    elif scaling == 'VGG':
        if c == 1:
            scaled = typed - np.asarray((128,), dtype=dtype)
        else:
            scaled = typed - np.asarray((123, 117, 104), dtype=dtype)
    else:
        scaled = typed

    # Swap to CHW if necessary
    if format == model_config.ModelInput.FORMAT_NCHW:
        ordered = np.transpose(scaled, (2, 0, 1))
    else:
        ordered = scaled

    # Channels are in RGB order. Currently model configuration data
    # doesn't provide any information as to other channel orderings
    # (like BGR) so we just assume RGB.
    return ordered

def postprocess(results, filenames, batch_size):
    """
    Post-process results to show classifications.
    """
    if len(results) != 1:
        raise Exception("expected 1 result, got {}".format(len(results)))

    batched_result = list(results.values())[0]
    if len(batched_result) != batch_size:
        raise Exception("expected {} results, got {}".format(batch_size, len(batched_result)))
    if len(filenames) != batch_size:
        raise Exception("expected {} filenames, got {}".format(batch_size, len(filenames)))

    for (index, result) in enumerate(batched_result):
        print("Image '{}':".format(filenames[index]))
        for cls in result:
            print("    {} ({}) = {}".format(cls[0], cls[2], cls[1]))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--verbose', action="store_true", required=False, default=False,
                        help='Enable verbose output')
    parser.add_argument('-a', '--async', dest="async_set", action="store_true", required=False,
                        default=False, help='Use asynchronous inference API')
    parser.add_argument('--streaming', action="store_true", required=False, default=False,
                        help='Use streaming inference API. ' +
                        'The flag is only available with gRPC protocol.')
    parser.add_argument('-m', '--model-name', type=str, required=True,
                        help='Name of model')
    parser.add_argument('-x', '--model-version', type=int, required=False,
                        help='Version of model. Default is to use latest version.')
    parser.add_argument('-b', '--batch-size', type=int, required=False, default=1,
                        help='Batch size. Default is 1.')
    parser.add_argument('-c', '--classes', type=int, required=False, default=1,
                        help='Number of class results to report. Default is 1.')
    parser.add_argument('-s', '--scaling', type=str, choices=['NONE', 'INCEPTION', 'VGG'],
                        required=False, default='NONE',
                        help='Type of scaling to apply to image pixels. Default is NONE.')
    parser.add_argument('-u', '--url', type=str, required=False, default='localhost:8000',
                        help='Inference server URL. Default is localhost:8000.')
    parser.add_argument('-i', '--protocol', type=str, required=False, default='HTTP',
                        help='Protocol (HTTP/gRPC) used to ' +
                        'communicate with inference service. Default is HTTP.')
    parser.add_argument('image_filename', type=str, nargs='?', default=None,
                        help='Input image / Input folder.')
    FLAGS = parser.parse_args()

    protocol = ProtocolType.from_str(FLAGS.protocol)

    if FLAGS.streaming and protocol != ProtocolType.GRPC:
        raise Exception("Streaming is only allowed with gRPC protocol")

    # Make sure the model matches our requirements, and get some
    # properties of the model that we need for preprocessing
    input_0_name, input_1_name, output_name, c, h, w, format, dtype = parse_model(
        FLAGS.url, protocol, FLAGS.model_name,
        FLAGS.batch_size, FLAGS.verbose)

    ctx = InferContext(FLAGS.url, protocol, FLAGS.model_name,
                       FLAGS.model_version, FLAGS.verbose, 0, FLAGS.streaming)

    filenames = []
    if os.path.isdir(FLAGS.image_filename):
        filenames = [os.path.join(FLAGS.image_filename, f)
                     for f in os.listdir(FLAGS.image_filename)
                     if os.path.isfile(os.path.join(FLAGS.image_filename, f))]
    else:
        filenames = [FLAGS.image_filename,]

    filenames.sort()

    # Preprocess the images into input data according to model
    # requirements
    image_data = []
    for filename in filenames:
        img = Image.open(filename)
        image_data.append(preprocess(img, format, dtype, c, h, w, FLAGS.scaling))

    # Send requests of FLAGS.batch_size images. If the number of
    # images isn't an exact multiple of FLAGS.batch_size then just
    # start over with the first images until the batch is filled.
    results = []
    result_filenames = []
    request_ids = []
    image_idx = 0
    last_request = False
    user_data = UserData()
    sent_count = 0
    while not last_request:
        input_filenames = []
        input_batch = []
        for idx in range(FLAGS.batch_size):
            input_filenames.append(filenames[image_idx])
            input_batch.append(image_data[image_idx])
            image_idx = (image_idx + 1) % len(image_data)
            if image_idx == 0:
                last_request = True

        # Send request
        if not FLAGS.async_set:
            results.append(ctx.run(
                { input_0_name : input_batch, input_1_name: input_batch },
                { output_name : (InferContext.ResultFormat.CLASS, FLAGS.classes) },
                FLAGS.batch_size))
            result_filenames.append(input_filenames)
        else:
            ctx.async_run(partial(completion_callback, input_filenames, user_data),
                            { input_0_name : input_batch, input_1_name: input_batch },
                            { output_name : (InferContext.ResultFormat.CLASS, FLAGS.classes) },
                            FLAGS.batch_size)
            sent_count += 1

    # For async, retrieve results in the order the requests complete
    if FLAGS.async_set:
        processed_count = 0
        while processed_count < sent_count:
            (request_id, input_filenames) = user_data._completed_requests.get()
            results.append(ctx.get_async_run_results(request_id))
            result_filenames.append(input_filenames)
            processed_count += 1

    for idx in range(len(results)):
        print("Request {}, batch size {}".format(idx, FLAGS.batch_size))
        postprocess(results[idx], result_filenames[idx], FLAGS.batch_size)


Writing ensemble_image_client.py
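
The `preprocess` function in the client applies one of three pixel scalings (`NONE`, `INCEPTION`, `VGG`) before the image is sent. The sketch below reproduces only that arithmetic on single pixel values so the resulting ranges are easy to see. `scale_pixel` is a hypothetical helper written for illustration; it is not part of the client library.

```python
def scale_pixel(value, scaling, channel_mean=0.0):
    """Scale one pixel value with the same arithmetic preprocess() uses."""
    if scaling == "INCEPTION":
        # Maps the 0..255 range to roughly -1..1.
        return (value / 128) - 1
    if scaling == "VGG":
        # Subtracts a per-channel mean; the range stays in pixel units.
        return value - channel_mean
    # 'NONE': the value is passed through unchanged.
    return value


# Per-channel means used by the client for RGB images.
VGG_MEANS = (123, 117, 104)

for v in (0, 128, 255):
    print(v, scale_pixel(v, "INCEPTION"), scale_pixel(v, "VGG", VGG_MEANS[0]))
```

The request at the end of this notebook passes `-s INCEPTION`, so both models in the ensemble receive inputs in the roughly -1..1 range.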


<a id='setup_model'></a>
### 2. Set up a model repository 

#### a) Download the pre-trained models 

In [11]:
%%bash
cd triton-inference-server
cd docs/examples
./fetch_models.sh

+ mkdir -p model_repository/resnet50_netdef/1
+ wget -O model_repository/resnet50_netdef/1/model.netdef http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/predict_net.pb
--2020-05-18 17:37:29--  http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/predict_net.pb
Resolving download.caffe2.ai.s3.amazonaws.com (download.caffe2.ai.s3.amazonaws.com)... 52.216.112.251
Connecting to download.caffe2.ai.s3.amazonaws.com (download.caffe2.ai.s3.amazonaws.com)|52.216.112.251|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31649 (31K) [binary/octet-stream]
Saving to: ‘model_repository/resnet50_netdef/1/model.netdef’

     0K .......... .......... ..........                      100%  432K=0.07s

2020-05-18 17:37:29 (432 KB/s) - ‘model_repository/resnet50_netdef/1/model.netdef’ saved [31649/31649]

+ wget -O model_repository/resnet50_netdef/1/init_model.netdef http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/init_net.pb
--2020-05-18 17:37:29--  http:/

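#### b) Implement an ensemble model and export the model to ONNX

The ensemble is a model-averaging ensemble: it applies softmax to the first input (raw fully connected output), then averages the result element-wise with the second input (already softmax probabilities).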
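
To make the averaging concrete, here is a minimal pure-Python sketch of the same computation on a toy 3-class example. The function names and the sample values are assumptions chosen for illustration; the actual model in the next cell does this with PyTorch on 1000-class tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_ensemble(logits_a, probs_b):
    """Average softmax(logits_a) with probs_b, element by element."""
    probs_a = softmax(logits_a)
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


# Toy example with 3 classes instead of 1000 (made-up values).
logits_a = [2.0, 1.0, 0.1]   # raw scores from the first model
probs_b = [0.7, 0.2, 0.1]    # probabilities from the second model
averaged = average_ensemble(logits_a, probs_b)
print(averaged)
```

Because both terms are probability distributions, their average also sums to 1, so the output can be read directly as class probabilities.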

In [12]:
%%writefile ensemble_model.py
import torch
import torch.nn as nn

class EnsembleModel(nn.Module):
    def __init__(self):
        super(EnsembleModel,self).__init__()
        self.act1 = nn.Softmax(dim=1)
    def forward(self, x1, x2):
        # x1: raw fully connected (FC) output, x2: softmax probabilities
        return (self.act1(x1) + x2) / 2.0

def onnx_export(model, x1 ,x2, onnx_path = 'model.onnx'):
    torch.onnx.export(model,
                        (x1,x2),
                        onnx_path,
                        opset_version=10,       
                        do_constant_folding=True,
                        input_names = ['data_0', 'data_1'],
                        output_names = ['output'])
if __name__ == "__main__":
    model = EnsembleModel().cuda()
    model.eval()
    batch = 1
    x1 = torch.randn(batch,1000, 1, 1, device= torch.device('cuda'))
    x2 = torch.randn(batch,1000, 1, 1, device= torch.device('cuda'))
    out = model(x1,x2)
    onnx_export(model,x1,x2)
    print("export onnx file")

Overwriting ensemble_model.py


#### c) Run the script

In [13]:
!docker run --name pytorch --rm --runtime=nvidia  -v $(pwd):/workspace nvcr.io/nvidia/pytorch:20.03-py3 python ensemble_model.py


== PyTorch ==

NVIDIA Release 20.03 (build 11122848)
PyTorch Version 1.5.0a0+8f84ded

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.

Copyright (c) 2014-2019 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered

<a id='setup_ensemble'></a>
### 3. Set up the ensemble model

In [14]:
!mkdir -p triton-inference-server/docs/examples/model_repository/PostModel_onnx/1

In [15]:
!mv model.onnx triton-inference-server/docs/examples/model_repository/PostModel_onnx/1

In [16]:
!ls triton-inference-server/docs/examples/model_repository/PostModel_onnx/1

model.onnx


In [17]:
!cp triton-inference-server/docs/examples/model_repository/resnet50_netdef/resnet50_labels.txt triton-inference-server/docs/examples/model_repository/PostModel_onnx/

In [18]:
!ls triton-inference-server/docs/examples/model_repository/PostModel_onnx/

1  resnet50_labels.txt
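
At this point the model directory holds a numeric version directory and the label file, but no `config.pbtxt` yet. The sketch below is one way to check that layout from Python. `check_model_dir` is a hypothetical helper, and the rules it checks (a `config.pbtxt` plus at least one numeric version directory) are assumptions that mirror the layout this notebook builds, not a complete validation of what the server accepts.

```python
import tempfile
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of layout problems found in one model directory."""
    problems = []
    if not (model_dir / "config.pbtxt").is_file():
        problems.append("missing config.pbtxt")
    versions = [p for p in model_dir.iterdir()
                if p.is_dir() and p.name.isdigit()]
    if not versions:
        problems.append("no numeric version directory")
    return problems


# Rebuild the PostModel_onnx layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "PostModel_onnx"
    (model_dir / "1").mkdir(parents=True)
    (model_dir / "1" / "model.onnx").write_bytes(b"")   # placeholder file
    (model_dir / "resnet50_labels.txt").write_text("")  # placeholder file
    before = check_model_dir(model_dir)

    # Adding the configuration file completes the layout.
    (model_dir / "config.pbtxt").write_text('name: "PostModel_onnx"\n')
    after = check_model_dir(model_dir)

print(before)
print(after)
```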


#### Model configuration

In [19]:
%%writefile triton-inference-server/docs/examples/model_repository/PostModel_onnx/config.pbtxt
name: "PostModel_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
  },
  {
    name: "data_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
  }
  
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "resnet50_labels.txt"
  }
]


Writing triton-inference-server/docs/examples/model_repository/PostModel_onnx/config.pbtxt


<a id='setup_scheduler'></a>
### 4. Set up the ensemble scheduler


<img src="images/Ensemble_scheduler.png" width="300" height="300">


In [21]:
!mkdir -p triton-inference-server/docs/examples/model_repository/Ensemble_model/1

In [22]:
%%writefile triton-inference-server/docs/examples/model_repository/Ensemble_model/config.pbtxt
name: "Ensemble_model"
platform: "ensemble"
max_batch_size: 1
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  },
  {
    name: "INPUT1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 1000]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "densenet_onnx"
      model_version: -1
      input_map {
        key: "data_0"
        value: "INPUT0"
      }
      output_map {
        key: "fc6_1"
        value: "dense_out"
      }
    },
    {
      model_name: "resnet50_netdef"
      model_version: -1
      input_map {
        key: "gpu_0/data"
        value: "INPUT1"
      }
      output_map {
        key: "gpu_0/softmax"
        value: "resnet_out"
      }
    },
    {
      model_name: "PostModel_onnx"
      model_version: -1
      input_map {
        key: "data_0"
        value: "dense_out"
      }
      input_map {
        key: "data_1"
        value: "resnet_out"
      }
      output_map {
        key: "output"
        value: "OUTPUT"
      }
    }
    
  ]
}

Writing triton-inference-server/docs/examples/model_repository/Ensemble_model/config.pbtxt
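
In each step, the `key` of an `input_map` or `output_map` entry is the tensor name inside the composing model and the `value` is the tensor name inside the ensemble. A quick way to catch typos is to confirm that every ensemble tensor a step consumes is either an ensemble input or the output of some step. The sketch below mirrors the configuration above as plain Python data; `unresolved_tensors` is a hypothetical helper, not something the server provides.

```python
ENSEMBLE_INPUTS = {"INPUT0", "INPUT1"}
ENSEMBLE_OUTPUTS = {"OUTPUT"}

# Same mappings as in config.pbtxt: model tensor name -> ensemble tensor name.
STEPS = [
    {"model": "densenet_onnx",
     "input_map": {"data_0": "INPUT0"},
     "output_map": {"fc6_1": "dense_out"}},
    {"model": "resnet50_netdef",
     "input_map": {"gpu_0/data": "INPUT1"},
     "output_map": {"gpu_0/softmax": "resnet_out"}},
    {"model": "PostModel_onnx",
     "input_map": {"data_0": "dense_out", "data_1": "resnet_out"},
     "output_map": {"output": "OUTPUT"}},
]


def unresolved_tensors(steps, ensemble_inputs, ensemble_outputs):
    """Return ensemble tensor names that are consumed but never produced."""
    produced = set(ensemble_inputs)
    for step in steps:
        produced.update(step["output_map"].values())
    consumed = set(ensemble_outputs)
    for step in steps:
        consumed.update(step["input_map"].values())
    return sorted(consumed - produced)


print(unresolved_tensors(STEPS, ENSEMBLE_INPUTS, ENSEMBLE_OUTPUTS))
```

An empty list means every tensor is wired up. Dropping the `resnet50_netdef` step, for example, would leave `resnet_out` unresolved.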


<a id='run_server'></a>
### 5. Run Triton Inference Server

In [34]:
!nvidia-docker run -d --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd)/triton-inference-server/docs/examples/model_repository/:/models nvcr.io/nvidia/tritonserver:20.03-py3  trtserver --model-repository=/models

bf35bf4498614665c057c4b65c222ea0766e04901ad3459bb82fcb982d7a3770


Wait until the models have finished loading, then query the server status.
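
Rather than waiting by hand, the check can be polled. The following is a minimal sketch using only the standard library. `http_ok` and `wait_until` are hypothetical helpers, and the readiness URL in the usage comment is an assumption about this server version's HTTP API, so adjust it if your server exposes a different endpoint.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `check` until it returns True or `attempts` run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage (assumed endpoint, not run here):
# wait_until(lambda: http_ok("http://localhost:8000/api/health/ready"))
```

Passing the check as a callable keeps the retry loop independent of the endpoint, so the same loop works with any other condition you prefer to poll.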

In [41]:
!curl localhost:8000/api/status

id: "inference:0"
version: "1.12.0"
uptime_ns: 32444249693
model_status {
  key: "Ensemble_model"
  value {
    config {
      name: "Ensemble_model"
      platform: "ensemble"
      version_policy {
        latest {
          num_versions: 1
        }
      }
      max_batch_size: 1
      input {
        name: "INPUT0"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
      }
      input {
        name: "INPUT1"
        data_type: TYPE_FP32
        format: FORMAT_NCHW
        dims: 3
        dims: 224
        dims: 224
      }
      output {
        name: "OUTPUT"
        data_type: TYPE_FP32
        dims: 1000
      }
      ensemble_scheduling {
        step {
          model_name: "densenet_onnx"
          model_version: -1
          input_map {
            key: "data_0"
            value: "INPUT0"
          }
          output_map {
            key: "fc6_1"
            value: "de

<a id='request'></a>
### 6. Request image classification

In [55]:
!docker run --rm -t --net=host -v $(pwd):/workspace/client --name client tritonserver_client python client/ensemble_image_client.py -m Ensemble_model -s INCEPTION images/mug.jpg

Request 0, batch size 1
Image 'images/mug.jpg':
    504 (COFFEE MUG) = 0.8049629926681519
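
The line above is the top-1 class for the averaged probabilities: class index, label, and probability. In this notebook the client asks the server for class results directly, so no ranking happens on the client side. For readers who want to see what that ranking amounts to, here is a small sketch of picking the top `k` entries from a probability vector. `top_k` is a hypothetical helper, and the labels and probabilities are made-up toy values, not output of the models.

```python
def top_k(probabilities, labels, k=1):
    """Return the k (index, label, probability) triples with the highest probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(i, labels[i], probabilities[i]) for i in ranked[:k]]


# Toy data with 3 classes (illustrative values only).
labels = ["TEAPOT", "COFFEE MUG", "CUP"]
probabilities = [0.05, 0.80, 0.15]

for index, label, prob in top_k(probabilities, labels, k=2):
    print("    {} ({}) = {}".format(index, label, prob))
```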
