# Konduit Serving Model Runtime with YAML Configuration

Konduit Serving supports specifying server configurations as YAML files. A configuration written this way can be served using:
1. the Konduit Python CLI, and   
2. the `konduit.load` module. 

The YAMLs on this page can be used as boilerplate code for your model serving use cases. 

Some resources on the YAML format are as follows: 
- https://gettaurus.org/docs/YAMLTutorial/
- https://docs.saltstack.com/en/latest/topics/yaml/
- http://jessenoller.com/blog/2009/04/13/yaml-aint-markup-language-completely-different

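```python
import os
from urllib.request import urlretrieve
from zipfile import ZipFile

# Download the BERT test resources once, then extract them into the current folder.
dl_path = "../data/bert/bert.zip"
if not os.path.isfile(dl_path):
    # Create the target folder if it does not exist yet.
    os.makedirs(os.path.dirname(dl_path), exist_ok=True)
    urlretrieve("https://deeplearning4jblob.blob.core.windows.net/testresources/bert_mrpc_frozen_v1.zip",
                dl_path)
with ZipFile(dl_path, 'r') as zipObj:
    zipObj.extractall()
```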

A Konduit Serving YAML configuration file has three top-level entities: 
1. `serving`
2. `steps`
3. `client`

The following is a sample YAML file for serving a Python script located at `simple.py`, which takes a NumPy array `first` as input and returns a NumPy array `second` as output:

```yaml
serving:
  http_port: 1337
  input_data_format: NUMPY
  output_data_format: NUMPY
  log_timings: True
  extra_start_args: -Xmx8g
steps:
  python_step:
    type: PYTHON
    python_path: .
    python_code_path: ./simple.py
    python_inputs:
      first: NDARRAY
    python_outputs:
      second: NDARRAY
client:
    port: 1337
```


## Serving 

The server configuration takes the following arguments: 

- `http_port`: specify the port number 
- `input_data_format` and `output_data_format`: specify one of the following: JSON, NUMPY, ARROW, IMAGE
- `log_timings`: specify True to log timings 
- `extra_start_args`: Java Virtual Machine (JVM) arguments. In this case, `-Xmx8g` specifies that the maximum memory allocation for the JVM is 8GB. 


Refer to the [Server](https://serving.oss.konduit.ai/server/inference) documentation for details. 


## Client 

Refer to the [Client](https://serving.oss.konduit.ai/client/python-client) documentation for details. The client configuration takes the following arguments:

- `input_names`, `output_names`: names of the first and final nodes of the Konduit Serving pipeline configuration defined in the Server. These arguments are typically inherited from the Server when initialized. 
- `input_data_format`, `output_data_format`, `return_output_data_format`: One of the following: JSON, NUMPY, ARROW, IMAGE. `input_data_format` and `output_data_format` refer to the format of the server's input and output, whereas `return_output_data_format` specifies the data format returned to the client. 
- `port`: specify the same HTTP port as the Server. 
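The constraints described above (data formats drawn from a fixed set, and a client port that matches the server port) can be checked before starting a server. The following is a minimal sketch, assuming the YAML file has already been parsed into a Python dict by a YAML library of your choice. `check_config` is a hypothetical helper written for illustration and is not part of Konduit Serving.

```python
# Allowed values taken from the argument lists above.
DATA_FORMATS = {"JSON", "NUMPY", "ARROW", "IMAGE"}


def check_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    serving = config.get("serving", {})
    client = config.get("client", {})

    # Both server-side formats, when present, must be one of the documented values.
    for key in ("input_data_format", "output_data_format"):
        value = serving.get(key)
        if value is not None and value not in DATA_FORMATS:
            problems.append(f"serving.{key}: unknown format {value!r}")

    # The client must use the same port the server listens on.
    if client.get("port") != serving.get("http_port"):
        problems.append("client.port does not match serving.http_port")
    return problems


# Hypothetical dict mirroring the sample YAML at the top of this page.
sample = {
    "serving": {"http_port": 1337, "input_data_format": "NUMPY",
                "output_data_format": "NUMPY"},
    "client": {"port": 1337},
}
print(check_config(sample))  # prints []
```

An empty list means that neither check failed. The sketch does not validate any other key.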

## Running YAML configurations 

Assume all commands are run from this folder (`notebooks`). 

The CLI provides a handy command `predict-numpy` that returns predictions from a model server, if the input name is `default` and a **NumPy array** is supplied as input. To initialize the server, run the following command: 

```bash 
konduit serve --config ../yaml/konduit.yaml
```

Once the server has started, run `predict-numpy` to obtain the predicted output given the location of the NumPy array saved as a [NumPy `.npy` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.format.html): 

```bash
konduit predict-numpy --config ../yaml/konduit.yaml --numpy_data ../data/bert/input-0.npy
```

Finally, to stop the server, run the `stop-server` command:

```bash
konduit stop-server --config ../yaml/konduit.yaml
```
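The three commands above can also be driven from a Python script. The sketch below only uses the subcommands and flags shown on this page. `konduit_commands` and `run_all` are hypothetical helper names, and `run_all` assumes that `konduit serve` returns once the server has started and that the `konduit` CLI is on your `PATH`.

```python
import subprocess


def konduit_commands(config_path, numpy_path):
    """Build the serve, predict-numpy and stop-server commands as argument lists."""
    return [
        ["konduit", "serve", "--config", config_path],
        ["konduit", "predict-numpy", "--config", config_path,
         "--numpy_data", numpy_path],
        ["konduit", "stop-server", "--config", config_path],
    ]


def run_all(config_path, numpy_path):
    """Start the server, request a prediction, then always stop the server.

    Assumption: `konduit serve` returns after the server is up.
    """
    serve, predict, stop = konduit_commands(config_path, numpy_path)
    subprocess.run(serve, check=True)
    try:
        subprocess.run(predict, check=True)
    finally:
        # Stop the server even if the prediction request failed.
        subprocess.run(stop, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without Konduit installed.
    for command in konduit_commands("../yaml/konduit.yaml", "../data/bert/input-0.npy"):
        print(" ".join(command))
```

Passing argument lists to `subprocess.run` avoids shell quoting issues with paths that contain spaces.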

# PythonStep

Python steps can take any argument that can be passed to `PythonConfig`.  
Specify a Python step as follows: 

```yaml
steps: 
  python_step: 
    type: PYTHON
    python_code_path: simple.py
```

- `type`: specify this as PYTHON
- `python_code`: Python code written directly in your YAML file, as an alternative to `python_code_path`. The following [documentation](http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html) may be helpful for specifying multi-line Python code, specifically the section on literal block scalars.
- `python_code_path`: specify the path of a Python `.py` script. 
- `python_inputs`: name-value pairs specifying the data types for each of the inputs referenced in the script 
- `python_outputs`: name-value pairs specifying the data types for each of the outputs referenced in the script
- `python_path`: location of the Python modules. Generally, if your script only requires NumPy, setting a custom `python_path` is not necessary. Refer to the [Python modules](https://serving.oss.konduit.ai/python#python-modules-and-the-pythonpath-argument) documentation on setting a custom Python path with additional modules. 

The names referenced in `python_inputs` and `python_outputs` correspond with `inputColumnNames` and `outputColumnNames`.  Modifying `python_inputs` and `python_outputs` does not modify the input and output name of the step. `input_names` and `output_names` are arguments to `PythonStep` which cannot be accessed through the YAML configuration, and default to the name `default`. 
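To illustrate the `python_code` option, the sketch below renders a `steps` section in which the Python code is embedded as a YAML literal block scalar (`|`). `python_step_yaml` is a hypothetical helper, and the one-line script body is an assumption made for illustration, not the actual content of `simple.py`.

```python
import textwrap


def python_step_yaml(code, inputs, outputs):
    """Render a `steps` section whose `python_code` is a YAML literal block."""
    lines = ["steps:", "  python_step:", "    type: PYTHON", "    python_code: |"]
    # The content of a literal block must be indented deeper than its key.
    lines.append(textwrap.indent(code.strip("\n"), " " * 6))
    for key, mapping in (("python_inputs", inputs), ("python_outputs", outputs)):
        lines.append(f"    {key}:")
        lines.extend(f"      {name}: {dtype}" for name, dtype in mapping.items())
    return "\n".join(lines) + "\n"


# Hypothetical script body; `first` and `second` match the names used on this page.
code = "second = first + 2\n"
print(python_step_yaml(code, {"first": "NDARRAY"}, {"second": "NDARRAY"}))
```

A literal block keeps line breaks and indentation exactly as written, which matters for Python code, where indentation is significant.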

## Example 1: Array operation

```yaml
serving:
  http_port: 1337
  input_data_format: NUMPY
  output_data_format: NUMPY
  log_timings: True
  extra_start_args: -Xmx8g
steps:
  python_step:
    type: PYTHON
    python_code_path: ./simple.py
    python_inputs:
      first: NDARRAY
    python_outputs:
      second: NDARRAY
client:
    port: 1337
```

Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/simple.yaml
konduit predict-numpy --config ../yaml/simple.yaml --numpy_data ../data/simple/input_arr.npy
konduit stop-server --config ../yaml/simple.yaml
```

## Example 2: PyTorch and ONNX Runtime

The following is a sample YAML for serving a PyTorch model using a Python script.

```yaml
serving:
  http_port: 1337
  input_data_format: NUMPY
  output_data_format: NUMPY
  log_timings: True
  extra_start_args: -Xmx8g
steps:
  python_step:
    type: PYTHON
    python_path: .
    python_code_path: ../python/pytorch.py
    python_inputs:
      image: NDARRAY
    python_outputs:
      img_out_y: NDARRAY
client:
    port: 1337
```

Note the following: 
- The `python_path` is set to `.` as a placeholder. Replace it following the instructions in the [Python modules documentation](https://serving.oss.konduit.ai/python#python-modules-and-the-pythonpath-argument), making sure that ONNX and PyTorch are installed in that Python environment. Refer to the [PyTorch quickstart](https://pytorch.org/) for installation instructions.
- The YAML file referenced below shows how to use YAML literal blocks to embed the Python code within your YAML file. 

Refer to the [ONNX Runtime](https://serving.oss.konduit.ai/examples/onnx) page for complete documentation.

Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/pytorch.yaml
konduit predict-numpy --config ../yaml/pytorch.yaml --numpy_data ../data/pytorch/im.npy
konduit stop-server --config ../yaml/pytorch.yaml
```

# ModelStep

## Deeplearning4j
A Deeplearning4j model step can be specified as follows: 

```yaml
steps:
  dl4j_mln_step:
    type: MULTI_LAYER_NETWORK
    model_loading_path: ../data/multilayernetwork/SimpleCNN.zip
    input_names: 
    - image_array
    output_names: 
    - output
    input_data_types:
      image_array: FLOAT
```

Depending on the type of model, specify a `dl4j_mln_step` or a `dl4j_cg_step` for MultiLayerNetwork and ComputationGraph models respectively. 


- `type`: `MULTI_LAYER_NETWORK` or `COMPUTATION_GRAPH`
- `model_loading_path`: location of model weights 
- `input_names` and `output_names`: names of the input and output nodes. See [here](https://serving.oss.konduit.ai/examples/dl4j#configuring-modelstep) for details on obtaining the names of input and output nodes. 
- `input_data_types`: maps input nodes to data types. A list of accepted data types is available [here](https://github.com/KonduitAI/konduit-serving/blob/master/konduit-serving-api/src/main/java/ai/konduit/serving/model/TensorDataType.java).
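Since `input_data_types` is keyed by input node name, a mismatch between it and `input_names` is easy to introduce by typo. The following is a minimal sketch of a consistency check, assuming the step has been parsed into a Python dict and that every input node is meant to have a data type entry. `missing_data_types` is a hypothetical helper, not part of Konduit Serving.

```python
def missing_data_types(step):
    """Return the input names that have no entry in `input_data_types`."""
    declared = step.get("input_data_types", {})
    return [name for name in step.get("input_names", []) if name not in declared]


# Dict form of the `dl4j_mln_step` shown above.
dl4j_step = {
    "type": "MULTI_LAYER_NETWORK",
    "model_loading_path": "../data/multilayernetwork/SimpleCNN.zip",
    "input_names": ["image_array"],
    "output_names": ["output"],
    "input_data_types": {"image_array": "FLOAT"},
}
print(missing_data_types(dl4j_step))  # prints []
```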



The following is a sample YAML file for serving a Deeplearning4j model:

```yaml
serving:
  http_port: 1337
  input_data_format: NUMPY
  output_data_format: NUMPY
  log_timings: True
  extra_start_args: -Xmx8g
steps:
  dl4j_mln_step:
    type: MULTI_LAYER_NETWORK
    model_loading_path: ../data/multilayernetwork/SimpleCNN.zip
    input_names: 
    - image_array
    output_names: 
    - output
    input_data_types:
      image_array: FLOAT
client:
    port: 1337
```

Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/deeplearning4j.yaml
konduit predict-numpy --config ../yaml/deeplearning4j.yaml --numpy_data ../data/multilayernetwork/image_array.npy --input_names image_array
konduit stop-server --config ../yaml/deeplearning4j.yaml
```

## TensorFlow Graph ('frozen model')

Konduit Serving supports loading models saved in the TensorFlow Graph format. See the [relevant documentation](https://serving.oss.konduit.ai/examples/tensorflow-model-serving/tf-mnist) on how to save models in the TensorFlow Graph format. 

Declare a TensorFlow step in your YAML file as follows: 

- `type`: set `type` as `TENSORFLOW`
- `model_loading_path`: location of the model weights 
- `input_names`, `output_names`: a list of the input and output nodes 
- `input_data_types`: maps input nodes to the corresponding [data type](https://github.com/KonduitAI/konduit-serving/blob/master/konduit-serving-api/src/main/java/ai/konduit/serving/model/TensorDataType.java)
- `parallel_inference_config`: specify the number of workers to run in parallel 

### Example 1: MNIST classifier
```yaml
steps:
  tensorflow_step:
    type: TENSORFLOW
    model_loading_path: ../data/mnist/mnist_2.0.0.pb
    input_names:
      - input_layer
    output_names:
      - output_layer/Softmax
    input_data_types:
      input_layer: FLOAT
    parallel_inference_config:
      workers: 1
```



Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/tensorflow-mnist.yaml
konduit predict-numpy --config ../yaml/tensorflow-mnist.yaml --numpy_data ../data/mnist/input_layer.npy --input_names input_layer
konduit stop-server --config ../yaml/tensorflow-mnist.yaml
```

### Example 2: Multiple input nodes 

The following is a sample YAML step for serving a TensorFlow Graph model with multiple input nodes:

```yaml 
steps:
  tensorflow_step:
    type: TENSORFLOW
    model_loading_path: bert_mrpc_frozen.pb
    input_names:
      - IteratorGetNext:0
      - IteratorGetNext:1
      - IteratorGetNext:4
    output_names:
      - loss/Softmax
    input_data_types:
      IteratorGetNext:0: INT32
      IteratorGetNext:1: INT32
      IteratorGetNext:4: INT32
    parallel_inference_config:
      workers: 1
```

Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/tensorflow-bert.yaml
konduit predict-numpy --config ../yaml/tensorflow-bert.yaml --numpy_data "../data/bert/input-0.npy,../data/bert/input-1.npy,../data/bert/input-4.npy" --input_names "IteratorGetNext:0,IteratorGetNext:1,IteratorGetNext:4"
konduit stop-server --config ../yaml/tensorflow-bert.yaml
```
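With several input nodes, the `--numpy_data` and `--input_names` values are comma-separated lists that must stay in the same order. The sketch below builds both from a single mapping so that they cannot drift apart. `predict_numpy_command` is a hypothetical helper that only uses the flags shown in the command above.

```python
def predict_numpy_command(config_path, inputs):
    """Build a `predict-numpy` command from an ordered {input_name: npy_path} mapping."""
    # Dicts preserve insertion order, so names and paths stay aligned.
    names = ",".join(inputs)
    paths = ",".join(inputs.values())
    return ["konduit", "predict-numpy", "--config", config_path,
            "--numpy_data", paths, "--input_names", names]


bert_inputs = {
    "IteratorGetNext:0": "../data/bert/input-0.npy",
    "IteratorGetNext:1": "../data/bert/input-1.npy",
    "IteratorGetNext:4": "../data/bert/input-4.npy",
}
command = predict_numpy_command("../yaml/tensorflow-bert.yaml", bert_inputs)
print(command)
```

The resulting list can be passed to `subprocess.run` once the server is running.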

## Keras 

Konduit Serving supports Keras HDF5 models via Deeplearning4j model import. A Keras step takes the following arguments: 

- `type`: specify this as `KERAS`
- `model_loading_path`: location of the model weights 
- `input_names`, `output_names`: names for the input and output nodes, as lists  

Input and output names can be obtained by visualizing the graph in [Netron](https://github.com/lutzroeder/netron). The following is a sample YAML step for serving a Keras model: 

```yaml
steps:
  keras_step:
    type: KERAS
    model_loading_path: ../data/keras/embedding_lstm_tensorflow_2.h5
    input_names:
    - input 
    output_names:
    - lstm_1

```



Run the following commands in the command line: 

```bash
konduit serve --config ../yaml/keras.yaml
konduit predict-numpy --config ../yaml/keras.yaml --numpy_data ../data/keras/input.npy --input_names input
konduit stop-server --config ../yaml/keras.yaml
```