# TensorFlow model serving with Konduit Serving

This notebook illustrates a simple client-server interaction to perform inference on a TensorFlow model using the Python SDK for Konduit Serving.  

This tutorial is split into two parts: 

1. Configuration 
2. Running the server

In [1]:
from konduit import ParallelInferenceConfig, ServingConfig, TensorFlowConfig, ModelConfigType
from konduit import TensorDataTypesConfig, ModelStep, InferenceConfiguration
from konduit.server import Server
from konduit.client import Client

import time
import numpy as np

## Configuration

Konduit Serving works by defining a series of "pipeline steps". A pipeline can include steps such as:
1. Pre- or post-processing steps
2. One or more machine learning models
3. Steps that transform the output into a form that humans can understand

If deploying your model requires neither pre- nor post-processing, only one pipeline step is needed: the machine learning model itself. This configuration is defined using a single `ModelStep`. 
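To make the idea of chained pipeline steps concrete, here is a minimal plain-Python sketch. It does not use the Konduit Serving API: `run_pipeline` and the three step functions are hypothetical names invented for illustration, and the "model" is only a stand-in function.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]

def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data

# Hypothetical steps: pre-processing, a stand-in "model", post-processing.
def preprocess(text: str) -> List[str]:
    return text.lower().split()

def toy_model(tokens: List[str]) -> int:
    return len(tokens)  # stand-in for real inference

def postprocess(count: int) -> str:
    return f"{count} tokens"

result = run_pipeline([preprocess, toy_model, postprocess], "Hello Konduit Serving")
print(result)
```

A pipeline with a single step, as used in the rest of this notebook, corresponds to calling `run_pipeline` with a one-element list.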

Before running this notebook, you should run the `build_jar.py` script and copy the JAR (`konduit.jar`) to this folder. Refer to the [Python SDK README](https://github.com/KonduitAI/konduit-serving/blob/master/python/README.md) for details. 

Start by downloading the model weights to the `data` folder. 

In [2]:
from urllib.request import urlretrieve
from zipfile import ZipFile

# Download the zipped model into the data folder.
urlretrieve("https://deeplearning4jblob.blob.core.windows.net/testresources/bert_mrpc_frozen_v1.zip", "../data/bert.zip")

# Extract into the data folder so the files sit next to the archive.
with ZipFile('../data/bert.zip', 'r') as zipObj:
    zipObj.extractall('../data')
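Before pointing the server at the model, it can help to confirm that the archive actually contains a `.pb` file. The sketch below uses only the standard library. `find_members` is a hypothetical helper, and the demonstration builds a small throwaway archive with made-up file names rather than assuming anything about the contents of `bert.zip`.

```python
import os
import tempfile
from typing import List
from zipfile import ZipFile

def find_members(zip_path: str, suffix: str = ".pb") -> List[str]:
    """Return the names of archive members ending with `suffix`."""
    with ZipFile(zip_path, "r") as archive:
        return [name for name in archive.namelist() if name.endswith(suffix)]

# Demonstration on a throwaway archive (file names here are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo.zip")
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("model.pb", b"not a real model")
        archive.writestr("README.txt", b"placeholder")
    pb_files = find_members(demo_zip)

print(pb_files)
```

With the real download, `find_members('../data/bert.zip')` would list the `.pb` member(s) to reference in `model_loading_path`.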

### Configuring `ModelStep` 

Define the TensorFlow configuration as a `TensorFlowConfig` object. 

- `tensor_data_types_config`: This argument requires a `TensorDataTypesConfig` object, which in turn takes a dictionary `input_data_types`. Its keys should be the input names, and its values should be data types given as strings, e.g. `"INT32"`. See the [`TensorDataType` source](https://github.com/KonduitAI/konduit-serving/blob/master/konduit-serving-api/src/main/java/ai/konduit/serving/model/TensorDataType.java) for a list of supported data types. 
- `model_config_type`: This argument requires a `ModelConfigType` object. Set `model_type` to `TENSORFLOW`, and point `model_loading_path` to the location of the TensorFlow weights saved in the PB file format.


In [3]:
input_data_types = {'IteratorGetNext:0': 'INT32',
                    'IteratorGetNext:1': 'INT32',
                    'IteratorGetNext:4': 'INT32'}

tensorflow_config = TensorFlowConfig(
    tensor_data_types_config = TensorDataTypesConfig(input_data_types = input_data_types),
    model_config_type = ModelConfigType(model_type = 'TENSORFLOW',
                                        model_loading_path = '../data/bert_mrpc_frozen.pb')
)

Now that we have a `TensorFlowConfig` defined, we can define a `ModelStep`. The following parameters are specified: 
- `model_config`: pass the TensorFlowConfig object here 
- `parallel_inference_config`: specify the number of workers to run in parallel. Here, we specify `workers = 1`.
- `input_names`:  names for the input data  
- `output_names`: names for the output data

In [4]:
input_names = list(input_data_types.keys())
output_names = ["loss/Softmax"]

tf_pipeline_step = ModelStep(model_config = tensorflow_config,
                             parallel_inference_config = ParallelInferenceConfig(workers = 1),
                             input_names = input_names,
                             output_names = output_names)

### Configuring the server

Specify the following:
- `http_port`: the port the server listens on. Here, a random port is selected.
- `input_data_type`, `output_data_type`: Specify input and output data types as strings. 

<div class="alert alert-info">
ℹ Accepted input and output data types are as follows: 
    <ul>
        <li> Input: JSON, ARROW, IMAGE, ND4J (not yet implemented) and NUMPY. </li>
        <li> Output: NUMPY, JSON, ND4J (not yet implemented) and ARROW.</li>
    </ul>
</div>

In [5]:
port = np.random.randint(1000, 65535)
serving_config = ServingConfig(http_port = port,
                               input_data_type = 'NUMPY',
                               output_data_type = 'NUMPY')
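A randomly drawn port may already be in use. One alternative, sketched below with the standard library only, is to ask the operating system for a free port by binding to port 0. `find_free_port` is a hypothetical helper and not part of the Konduit SDK. There is still a small window in which another process could claim the port before the server binds it.

```python
import socket

def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]

free_port = find_free_port()
print(free_port)
```

The result could then be passed as `http_port` in place of the randomly drawn `port`.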

The `ServingConfig` has to be passed to `InferenceConfiguration`, in addition to the pipeline steps as a Python list. In this case, there is a single pipeline step: `tf_pipeline_step`. 

In [6]:
inference_config = InferenceConfiguration(serving_config = serving_config,
                                          pipeline_steps = [tf_pipeline_step])
server = Server(inference_config = inference_config)

By default, `Server()` looks for the Konduit Serving JAR `konduit.jar` in the directory the script is run in. To change this default, use the `jar_path` argument.

The server's configuration can be inspected as a dictionary using the `as_dict()` method: 

In [7]:
server.config.as_dict()

{'@type': 'InferenceConfiguration',
 'pipelineSteps': [{'@type': 'ModelStep',
   'inputNames': ['IteratorGetNext:0',
    'IteratorGetNext:1',
    'IteratorGetNext:4'],
   'outputNames': ['loss/Softmax'],
   'modelConfig': {'@type': 'TensorFlowConfig',
    'tensorDataTypesConfig': {'@type': 'TensorDataTypesConfig',
     'inputDataTypes': {'IteratorGetNext:0': 'INT32',
      'IteratorGetNext:1': 'INT32',
      'IteratorGetNext:4': 'INT32'}},
    'modelConfigType': {'@type': 'ModelConfigType',
     'modelType': 'TENSORFLOW',
     'modelLoadingPath': '../data/bert_mrpc_frozen.pb'}},
   'parallelInferenceConfig': {'@type': 'ParallelInferenceConfig',
    'workers': 1}}],
 'servingConfig': {'@type': 'ServingConfig',
  'httpPort': 41551,
  'inputDataType': 'NUMPY',
  'outputDataType': 'NUMPY'}}
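Because the configuration is available as a plain dictionary, it can be serialized with the standard `json` module. The sketch below uses a copy of the `servingConfig` entry from the output above. The file name `serving_config_demo.json` is an arbitrary choice for illustration, and nothing here relies on the Konduit SDK.

```python
import json
import os
import tempfile

# Copied from the `servingConfig` entry shown above.
serving_config_dict = {
    "@type": "ServingConfig",
    "httpPort": 41551,
    "inputDataType": "NUMPY",
    "outputDataType": "NUMPY",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "serving_config_demo.json")
    # Write the dictionary to disk as JSON, then read it back.
    with open(path, "w") as f:
        json.dump(serving_config_dict, f, indent=2)
    with open(path) as f:
        restored = json.load(f)

print(restored == serving_config_dict)
```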

### Configuring the client 

To configure the client, create a Client object with the following arguments: 
- `input_names`: names of the input data
- `output_names`: names of the output data
- `input_type`: data type passed to the server for inference
- `endpoint_output_type`: data type returned by the server endpoint 
- `return_output_type`: data type returned to the client. This argument can be used to convert the server's output into a different format before it reaches the client, e.g. NUMPY to JSON.


<div class="alert alert-warning">
    ⚠ Future versions of the Python SDK may remove the <code>input_names</code> and <code>output_names</code> arguments in <code>Client()</code>, since these are already specified in <code>ModelStep()</code>. 
</div>

In [8]:
client = Client(input_names = input_names,
                output_names = output_names,
                input_type = 'NUMPY',
                endpoint_output_type = 'NUMPY',
                return_output_type = "NUMPY",
                url = 'http://localhost:' + str(port))

## Running the server 

Load some sample data from NumPy files. Each file contains a NumPy array with shape (4, 128): 

In [9]:
data_input = {
    'IteratorGetNext:0': np.load('../data/input-0.npy'),
    'IteratorGetNext:1': np.load('../data/input-1.npy'),
    'IteratorGetNext:4': np.load('../data/input-4.npy')
}
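The keys of the input dictionary are expected to match the input names given to the model step. A small check such as the following can catch a mismatch before a request is sent. `check_input_names` is a hypothetical helper, and placeholder values stand in for the NumPy arrays so that the sketch runs on its own.

```python
from typing import Dict, Iterable, List

def check_input_names(expected: Iterable[str], data: Dict[str, object]) -> List[str]:
    """Return a list of mismatches (missing names first); empty if the names match."""
    expected_set, actual_set = set(expected), set(data)
    problems = [f"missing input: {name}" for name in sorted(expected_set - actual_set)]
    problems += [f"unexpected input: {name}" for name in sorted(actual_set - expected_set)]
    return problems

expected_names = ["IteratorGetNext:0", "IteratorGetNext:1", "IteratorGetNext:4"]
# Placeholder values stand in for the arrays loaded in the notebook.
# The third key is deliberately wrong to show what a mismatch looks like.
demo_input = {"IteratorGetNext:0": None, "IteratorGetNext:1": None, "IteratorGetNext:2": None}

problems = check_input_names(expected_names, demo_input)
print(problems)
```

In the notebook, `check_input_names(input_names, data_input)` should return an empty list.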

Start the server and wait 60 seconds for it to start up. The client then sends `data_input` to the server and requests a prediction. 

In [10]:
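server.start()
time.sleep(60)  # give the server time to start up

try:
    predicted = client.predict(data_input)
    print(predicted)
finally:
    # Stop the server even if the prediction request fails.
    server.stop()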

Wrote config.json to path C:\Users\Skymind AI Berhad\Documents\pk_konduit-serving\python\examples\config.json
Running with args
java -cp konduit.jar ai.konduit.serving.configprovider.KonduitServingMain --configPath C:\Users\Skymind AI Berhad\Documents\pk_konduit-serving\python\examples\config.json --verticleClassName ai.konduit.serving.verticles.inference.InferenceVerticle

[[0.996409   0.00359104]
 [0.97321105 0.02678899]
 [0.9955929  0.00440712]
 [0.9962774  0.00372254]]
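
The fixed 60-second wait used above can be longer than necessary on a fast machine and too short on a slow one. One alternative is to poll the port until it accepts TCP connections. This is a sketch using only the standard library. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and retry.
            time.sleep(interval)
    return False

# Self-contained demonstration against a local listening socket.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    ready = wait_for_port("127.0.0.1", demo_port, timeout=5.0)

print(ready)
```

In the notebook, a call such as `wait_for_port('localhost', port)` could stand in for `time.sleep(60)`.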
