# Inferencing Iris XGBoost PMML Model using AI-Serving

PMML stands for Predictive Model Markup Language. It is the de facto standard for representing classic machine learning models. With PMML, you can develop a model on one system with one application and deploy it on another system with a different application.

In this tutorial, we use PMML to show how to deploy the well-known Iris classifier with AI-Serving.

# <a id="contents"></a>Contents
This notebook contains the following parts:

**[Setup](#setup)**<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Prerequisites to run the notebook](#prerequisites)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Additional information about `ai_serving_pb2.py`](#additional)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Install dependencies](#dependencies)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Import dependent modules](#import)<br />
**[Validate server](#validate)**<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Define the base HTTP URL](#httpurl)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Test the server availability](#testserver)<br />
**[Deploy the PMML model](#deploy)**<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Deploy the Iris XGBoost model into AI-Serving](#deploy-pmml)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Retrieve metadata of the deployed model](#metadata)<br />
**[Make predictions](#predictions)**<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Prepare the testing records](#test-data)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[HTTP request formats](#request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[JSON requests](#json-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Construct JSON requests](#construct-json-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Make the HTTP requests with JSON data](#make-json-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Consume the HTTP response with JSON data](#consume-json-response)<br />
&nbsp;&nbsp;&nbsp;&nbsp;[Binary requests](#binary-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Construct binary requests](#construct-binary-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Make the HTTP requests with binary data](#make-binary-request)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[Consume the HTTP responses with binary data](#consume-binary-response)<br />
**[Next steps](#next-steps)**<br />

# <a id="setup"></a>Setup

## <a id="prerequisites"></a>Prerequisites to run the notebook

Run an AI-Serving Docker container. Port `9090` serves the HTTP endpoint and port `9091` serves gRPC. If you see an error like `Bind for 0.0.0.0:9090 failed: port is already allocated`, run the command again with a different host port in the first part of the mapping, for example `-p ${NEW_PORT}:9090`, and remember that this port is the one to use in the HTTP endpoint URL. The command also pulls the latest AI-Serving image from Docker Hub if it hasn't been downloaded yet. Please refer to [Docker Containers for AI-Serving](https://github.com/autodeployai/ai-serving/tree/master/dockerfiles) for more Docker images.

```bash
docker run --rm -it -v "$(pwd)":/opt/ai-serving -p 9090:9090 -p 9091:9091 autodeployai/ai-serving
```
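If you are unsure whether port `9090` is already taken on the host, a quick local check can save a failed `docker run`. The following is a minimal sketch that uses only Python's standard `socket` module. It tries to bind the port, which roughly mirrors what Docker needs to do when publishing `-p 9090:9090`. The helper names `is_port_free` and `first_free_port` and the candidate ports are hypothetical, not part of AI-Serving. The result is only a snapshot, because another process can grab the port right after the check.

```python
import socket
from typing import Iterable, Optional


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Usually EADDRINUSE: another process (e.g. a running container) holds the port.
            return False
    return True


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port that is free, or None if all are taken."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port([9090, 9092, 9094])
    if port is None:
        print("None of the candidate ports is free")
    else:
        print(f"Publish the HTTP endpoint with: -p {port}:9090")
```

Whichever port this reports goes into the `-p` mapping and, later, into the notebook's `port` variable.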

## <a id="additional"></a>Additional information about `ai_serving_pb2.py`
The current directory contains a Python file, `ai_serving_pb2.py`, which was generated by compiling [ai-serving.proto](https://github.com/autodeployai/ai-serving/tree/master/src/main/protobuf/ai-serving.proto) with [protoc](https://developers.google.com/protocol-buffers/docs/pythontutorial), for example with the following command:

```bash
protoc -I=$SRC_DIR --python_out=. ai-serving.proto
```

## <a id="dependencies"></a>Install dependencies
Install the Python libraries used for HTTP requests and data manipulation:

In [None]:
!pip install requests
!pip install numpy

## <a id="import"></a>Import dependent modules
Import the modules needed to run the Iris XGBoost model.

In [10]:
import os
import numpy as np
import requests
from pprint import pprint

import ai_serving_pb2

# <a id="validate"></a>Validate server

## <a id="httpurl"></a>Define the base HTTP URL
Change the port number `9090` to the appropriate one if you mapped a different host port when starting the AI-Serving Docker container.

In [13]:
port = 9090
base_url = 'http://localhost:' + str(port)

## <a id="testserver"></a>Test the server availability
Use the endpoint `http://host:port/up` to check whether the server has been initialized and is ready to accept requests. An `OK` response means the server is available.

In [15]:
test_url = base_url + '/up'
response = requests.get(test_url)
print('The status of the server: ', response.text)

The status of the server:  OK
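Right after `docker run`, the container may need a few seconds before `/up` answers, so a single request can fail with a connection error. The following is a minimal polling sketch that uses the standard library's `urllib.request` instead of `requests`. The helper names `fetch_text` and `wait_until_up`, and the default timeout and interval, are hypothetical choices. The optional `fetch` parameter exists only so that the function can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_text(url: str, timeout: float = 2.0) -> str:
    """GET the URL and return the stripped body; raises OSError on connection problems."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def wait_until_up(base_url: str, timeout: float = 30.0, interval: float = 1.0,
                  fetch: Optional[Callable[[str], str]] = None) -> bool:
    """Poll `<base_url>/up` until it answers "OK" or `timeout` seconds elapse."""
    fetch = fetch or fetch_text
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(base_url + "/up") == "OK":
                return True
        except OSError:
            # URLError, HTTPError and connection refused/reset are all OSError subclasses:
            # treat them as "not ready yet" while the container is starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the notebook, `wait_until_up(base_url)` could be called before deploying the model, in place of the single `requests.get(test_url)`.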


# <a id="deploy"></a>Deploy the PMML model

## <a id="deploy-pmml"></a>Deploy the Iris XGBoost model into AI-Serving
First, deploy the PMML model `xgb-iris.pmml` into AI-Serving, which can serve multiple models, or multiple versions of a named model, at the same time. The PMML model was generated by the notebook [`Training an Iris classifier using XGBoost`](https://github.com/autodeployai/ai-serving/blob/master/examples/IrisXGBoost.ipynb).

When constructing the HTTP request to deploy a PMML model, you must set one of the following content types:
 * application/xml
 * text/xml

In [18]:
# The specified servable name
model_name = 'iris'
deployment_url = base_url + '/v1/models/' + model_name

# The specified content type for the model:
headers = {'Content-Type': 'application/xml'}

model_path = os.path.join('models', 'xgb-iris.pmml')
with open(model_path, 'rb') as file:
    deployment_response = requests.put(deployment_url, headers=headers, data=file)

# The response is a JSON object containing the specified servable name and the deployed model version
deployment_response_info = deployment_response.json()
print('The deployment response: ', deployment_response_info)

The deployment response:  {'name': 'iris', 'version': 1}


## <a id="metadata"></a>Retrieve metadata of the deployed model
The metadata contains the model inputs and outputs, which are needed to construct a request and to consume the response.

In [20]:
model_version = deployment_response_info['version']
metadata_url = base_url + '/v1/models/' + model_name + '/versions/' + str(model_version)
metadata_response = requests.get(metadata_url)
metadata_response_json = metadata_response.json()

# Model info of the specified version
model_info = metadata_response_json['versions'][0]

# Extract some key values: the input and output names
inputs = [x['name'] for x in model_info['inputs']]
outputs = [x['name'] for x in model_info['outputs']]

# Show the metadata result in json
print('The model metadata response:')
pprint(metadata_response.json())

The model metadata response:
{'createdAt': '2024-10-09T20:54:01',
 'id': '02ae4650-0e17-472d-a919-95edb1bbde21',
 'latestVersion': 1,
 'name': 'iris',
 'updateAt': '2024-10-09T20:54:01',
 'versions': [{'algorithm': 'MiningModel',
               'app': 'Nyoka',
               'appVersion': '5.5.0',
               'copyright': 'Copyright (c) 2021 Software AG',
               'createdAt': '2024-10-09T20:54:01',
               'description': 'Default description',
               'formatVersion': '4.4.1',
               'functionName': 'classification',
               'hash': 'cedae9d98bced8fe67de18318d31b209',
               'inputs': [{'name': 'sepal length (cm)',
                           'optype': 'continuous',
                           'type': 'double'},
                          {'name': 'sepal width (cm)',
                           'optype': 'continuous',
                           'type': 'double'},
                          {'name': 'petal length (cm)',
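The extraction of input and output names can also be wrapped in a small function that fails clearly when the response lists no versions. This is a sketch under two assumptions taken from the cell above. First, a version-specific metadata request returns that version as the first entry of `versions`. Second, every output entry carries a `name` key, as the inputs shown do. `io_names` is a hypothetical helper, and the `sample` dictionary in the usage note is a trimmed, hand-written stand-in for the real response.

```python
from typing import Any, Dict, List, Tuple


def io_names(metadata: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (input names, output names) of the first version in a metadata response.

    As in the notebook cell, this assumes a version-specific request lists that
    version first under "versions".
    """
    versions = metadata.get("versions") or []
    if not versions:
        raise ValueError("the metadata response lists no model versions")
    info = versions[0]
    inputs = [field["name"] for field in info.get("inputs", [])]
    outputs = [field["name"] for field in info.get("outputs", [])]
    return inputs, outputs
```

Usage: `inputs, outputs = io_names(metadata_response_json)`.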
                         

# <a id="predictions"></a>Make predictions

## <a id="test-data"></a>Prepare the testing records
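We will use the following two test records in two layouts: `list_records` holds the values in the model's input order, and `map_records` maps each input name to its value.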

In [23]:
list_records = [[5.7, 4.4, 1.5, 0.4], [6.4, 2.8, 5.6, 2.1]]
map_records = [{'sepal length (cm)': 5.7, 
                'sepal width (cm)': 4.4,
                'petal length (cm)': 1.5,
                'petal width (cm)': 0.4}, 
               {'sepal length (cm)': 6.4, 
                'sepal width (cm)': 2.8,
                'petal length (cm)': 5.6,
                'petal width (cm)': 2.1},]
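The two layouts carry the same data. Zipping each positional row with the input names yields the map layout. The sketch below assumes the column order matches the model inputs reported in the metadata (sepal length, sepal width, petal length, petal width), which is how `list_records` is laid out. `to_map_records` is a hypothetical helper that also rejects rows of the wrong width instead of silently truncating them, as `zip` would.

```python
from typing import Dict, List, Sequence


def to_map_records(rows: Sequence[Sequence[float]],
                   columns: Sequence[str]) -> List[Dict[str, float]]:
    """Turn positional rows into {column name: value} dicts, checking each row's width."""
    records = []
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records
```

With the notebook's variables, `to_map_records(list_records, inputs)` should reproduce `map_records`.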

## <a id="request"></a>HTTP request formats for AI-Serving
A request to AI-Serving can use one of two formats: JSON or binary. The `Content-Type` HTTP header tells the server which format to handle, so it is required for every request. The binary payload has lower latency, especially for large tensor values in ONNX models, while the JSON format is easier for humans to read.

- Content-Type: application/json. The request body must be a JSON object formatted as described [here](https://github.com/autodeployai/ai-serving#4-predict-api).


- Content-Type: application/octet-stream, application/vnd.google.protobuf or application/x-protobuf. The request body must be the protobuf message `PredictRequest`. In addition to common scalar values, it can carry standard `onnx.TensorProto` values directly.
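On the client side, the rule above amounts to a lookup from the `Content-Type` value to a payload format. The following is a minimal sketch of that lookup. It is not AI-Serving's own code. `payload_format` is a hypothetical helper, and ignoring parameters such as `; charset=utf-8` and letter case is our assumption about what counts as the same media type.

```python
JSON_CONTENT_TYPES = {"application/json"}
BINARY_CONTENT_TYPES = {
    "application/octet-stream",
    "application/vnd.google.protobuf",
    "application/x-protobuf",
}


def payload_format(content_type: str) -> str:
    """Classify a Content-Type header value as 'json' or 'binary'."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in JSON_CONTENT_TYPES:
        return "json"
    if media_type in BINARY_CONTENT_TYPES:
        return "binary"
    raise ValueError(f"unsupported Content-Type for a predict request: {content_type!r}")
```

A JSON body is sent with `application/json`; a serialized `PredictRequest` goes with any of the three binary types.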

## <a id="json-request"></a>JSON requests

### <a id="construct-json-request"></a>Construct JSON requests
We will create two JSON objects: one uses the `records` format and the other uses the `split` format. Both contain the two test records, and the `split` request also sets a filter so that only `predicted_species` is returned.

In [27]:
# Create a JSON object using `records` that contains the two test records.
request_json_recoreds = {
    'X': map_records
}

# Create a JSON object using `split` that contains the two test records, with a filter
# so that only the output `predicted_species` is returned.
request_json_split = {
    'X': {
        'columns': inputs,
        'data': list_records
    },
    'filter': ['predicted_species']
}
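Both payloads share the same shape. The records go under `X`, either as a list of dicts (`records`) or as `columns` plus `data` (`split`), and an optional `filter` lists the outputs to keep. When no filter is set, all outputs are returned, as the `records` response further down shows. The following is a sketch of small builders for the two layouts. The function names are hypothetical, and `filter_` has a trailing underscore only to avoid shadowing Python's built-in `filter`.

```python
import json
from typing import Any, Dict, Optional, Sequence


def _with_filter(payload: Dict[str, Any], filter_: Optional[Sequence[str]]) -> Dict[str, Any]:
    # Add "filter" only when specific outputs are requested.
    if filter_:
        payload["filter"] = list(filter_)
    return payload


def records_payload(records: Sequence[Dict[str, Any]],
                    filter_: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """`records` layout: one {input name: value} dict per record."""
    return _with_filter({"X": [dict(r) for r in records]}, filter_)


def split_payload(columns: Sequence[str], rows: Sequence[Sequence[Any]],
                  filter_: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """`split` layout: the column names once, then one value list per record."""
    return _with_filter({"X": {"columns": list(columns), "data": [list(r) for r in rows]}},
                        filter_)


def to_body(payload: Dict[str, Any]) -> bytes:
    """Serialize a payload as the UTF-8 JSON body of an application/json request."""
    return json.dumps(payload).encode("utf-8")
```

For instance, `split_payload(inputs, list_records, ['predicted_species'])` builds the same object as `request_json_split`.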

### <a id="make-json-request"></a>Make the HTTP requests with JSON data
Make predictions with AI-Serving. The content type of a request with JSON data must be `application/json`.

In [29]:
# When version is omitted, the latest version is used.
prediction_url = base_url + '/v1/models/' + model_name

# requests sets `Content-Type: application/json` automatically when `json=` is used instead of `data=`
prediction_json_response_records = requests.post(prediction_url, json=request_json_recoreds)
prediction_json_response_split = requests.post(prediction_url, json=request_json_split)
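The same `POST` can be assembled with the standard library. This also makes the optional version explicit, because the cell above relies on the latest version being used when none is given. In the sketch below, the versioned path `/v1/models/<name>/versions/<version>` mirrors the metadata URL used earlier. That the predict endpoint accepts this path is an assumption to check against the AI-Serving README. `prediction_url` and `build_json_predict_request` are hypothetical helpers, and the request is only built here. `urllib.request.urlopen(req)` would send it.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote
from urllib.request import Request


def prediction_url(base_url: str, model_name: str, version: Optional[int] = None) -> str:
    """URL of the predict endpoint; without a version, the server uses the latest one."""
    url = f"{base_url}/v1/models/{quote(model_name, safe='')}"
    if version is not None:
        url += f"/versions/{version}"
    return url


def build_json_predict_request(url: str, payload: Dict[str, Any]) -> Request:
    """Build (but do not send) a POST that carries a JSON predict payload."""
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")
```

Usage: `build_json_predict_request(prediction_url(base_url, model_name, model_version), request_json_split)`.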

### <a id="consume-json-response"></a>Consume the HTTP response with JSON data
Once the server responds, parse the JSON body of each response to inspect the results. **NOTE: The data format of the output response is always the same as the input request.**

In [31]:
print('JSON prediction response of the request using `records`:')
pprint(prediction_json_response_records.json())

print('-'*120)

# Only the prediction column `predicted_species` is expected.
print('JSON prediction response of the request using `split` with a filter:')
pprint(prediction_json_response_split.json())

JSON prediction response of the request using `records`:
{'result': [{'predicted_species': 0,
             'species_probability_0': 0.5506691017471723,
             'species_probability_1': 0.22587241877164385,
             'species_probability_2': 0.22345847948118383},
            {'predicted_species': 2,
             'species_probability_0': 0.22347717521202046,
             'species_probability_1': 0.22720021905738896,
             'species_probability_2': 0.5493226057305906}]}
------------------------------------------------------------------------------------------------------------------------
JSON prediction response of the request using `split` with a filter:
{'result': {'columns': ['predicted_species'], 'data': [[0], [2]]}}
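Since the response mirrors the request layout, downstream code has to handle a list of dicts (`records`) and a `columns`/`data` object (`split`). The sketch below normalizes both into one dict per record. It also adds a consistency check that the predicted class is the one with the highest `species_probability_<k>`, which holds for both records shown above. The helper names are hypothetical, and the field names are taken from the printed output.

```python
from typing import Any, Dict, List, Union


def result_rows(result: Union[List[Dict[str, Any]], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return prediction rows as dicts for both `records` and `split` results."""
    if isinstance(result, list):
        # `records`: already one dict per row.
        return [dict(row) for row in result]
    # `split`: column names plus one value list per row.
    columns = result["columns"]
    return [dict(zip(columns, values)) for values in result["data"]]


def argmax_class(row: Dict[str, Any], prefix: str = "species_probability_") -> int:
    """Class index k whose `<prefix><k>` probability is largest in the row."""
    probs = {int(key[len(prefix):]): value for key, value in row.items() if key.startswith(prefix)}
    return max(probs, key=probs.get)
```

Usage: `result_rows(prediction_json_response_split.json()['result'])` gives `[{'predicted_species': 0}, {'predicted_species': 2}]` for the output above.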


## <a id="binary-request"></a>Binary requests

### <a id="construct-binary-request"></a>Construct binary requests
We will create two instances of `PredictRequest`: one uses the `records` format and the other uses the `split` format. Both contain the two test records, and the `split` request also sets a filter so that only `predicted_species` is returned.

In [34]:
from ai_serving_pb2 import RecordSpec, Record, PredictRequest, ListValue, Value

# Create a PredictRequest using `records` that contains the two test records.
request_message_records = PredictRequest(X=RecordSpec(
    records=[Record(fields={name: Value(number_value=value) for name, value in x.items()}) for x in map_records]))

# Create a PredictRequest using `split` that contains the two test records, with a filter
# so that only the output `predicted_species` is returned.
request_message_split = PredictRequest(
    X=RecordSpec(
        columns = inputs,
        data = [ ListValue(values=[Value(number_value=y) for y in x]) for x in list_records ]),
    filter=['predicted_species'])
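`Value(number_value=...)` holds a double, so a missing input or a non-numeric value is easier to catch before the records are wrapped in protobuf messages than after the server rejects the request. The following is a minimal pure-Python sketch of such a pre-check. `coerce_records` is a hypothetical helper. It keeps only the model inputs, so extra fields are dropped, and it converts values with `float()`, which means numeric strings such as `"5.7"` are accepted. The commented lines show how its output could feed the notebook's `Record`/`Value` construction, which needs `ai_serving_pb2`.

```python
from typing import Any, Dict, List, Sequence


def coerce_records(records: Sequence[Dict[str, Any]],
                   inputs: Sequence[str]) -> List[Dict[str, float]]:
    """Return copies of the records holding only the model inputs, as floats.

    Raises ValueError if an input is missing or a value cannot be read as a number.
    """
    cleaned: List[Dict[str, float]] = []
    for i, record in enumerate(records):
        missing = [name for name in inputs if name not in record]
        if missing:
            raise ValueError(f"record {i} is missing inputs: {missing}")
        row: Dict[str, float] = {}
        for name in inputs:
            try:
                # float() accepts ints and numeric strings such as "5.7".
                row[name] = float(record[name])
            except (TypeError, ValueError) as exc:
                raise ValueError(f"record {i}: {name!r} is not numeric: {record[name]!r}") from exc
        cleaned.append(row)
    return cleaned


# Hypothetical usage with the notebook's names (requires ai_serving_pb2):
# rows = coerce_records(map_records, inputs)
# records = [Record(fields={k: Value(number_value=v) for k, v in row.items()}) for row in rows]
```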

### <a id="make-binary-request"></a>Make the HTTP requests with binary data
Make predictions with AI-Serving. The content type of a request with binary data must be one of the three binary content types listed above.

In [36]:
headers = {'Content-Type': 'application/x-protobuf'}

# When version is omitted, the latest version is used.
prediction_url = base_url + '/v1/models/' + model_name

# Make prediction for the `records` request message.
prediction_response_records = requests.post(prediction_url,
                                            headers=headers,
                                            data=request_message_records.SerializeToString())

# Make prediction for the `split` request message.
prediction_response_split = requests.post(prediction_url,
                                          headers=headers,
                                          data=request_message_split.SerializeToString())

### <a id="consume-binary-response"></a>Consume the HTTP response with binary data
Once the server responds, parse each serialized `PredictResponse` message to inspect the results. **NOTE: The data format of the output response is always the same as the input request.**

In [38]:
# Parse the response message from the `records` request.
response_message = ai_serving_pb2.PredictResponse()
response_message.ParseFromString(prediction_response_records.content)
print('Binary prediction response of the request using `records`:')
print(response_message)

print('-'*120)

# Parse the response message from the `split` request.
response_message = ai_serving_pb2.PredictResponse()
response_message.ParseFromString(prediction_response_split.content)
print('Binary prediction response of the request using `split` with a filter:')
print(response_message)

Binary prediction response of the request using `records`:
result {
  records {
    fields {
      key: "predicted_species"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "species_probability_0"
      value {
        number_value: 0.5506691017471723
      }
    }
    fields {
      key: "species_probability_1"
      value {
        number_value: 0.22587241877164385
      }
    }
    fields {
      key: "species_probability_2"
      value {
        number_value: 0.22345847948118383
      }
    }
  }
  records {
    fields {
      key: "predicted_species"
      value {
        number_value: 2.0
      }
    }
    fields {
      key: "species_probability_0"
      value {
        number_value: 0.22347717521202046
      }
    }
    fields {
      key: "species_probability_1"
      value {
        number_value: 0.22720021905738896
      }
    }
    fields {
      key: "species_probability_2"
      value {
        number_value: 0.5493226057305906
      }
    }
  

## <a id="next-steps"></a>Next steps

I hope this tutorial helps you learn how to use AI-Serving. If you have any questions, please open an issue on this repository. Feedback and contributions of any kind are always welcome.

**Star the [AI-Serving](https://github.com/autodeployai/ai-serving) project if it's helpful for you!!!**