# Inference Schema
In this example, we use the [Inference Schema](https://github.com/Azure/InferenceSchema) package to facilitate automatic Swagger generation and parameter casting for Managed Online Endpoints.

## Background

[Inference Schema](https://github.com/Azure/InferenceSchema) is an open-source library, maintained by Travis Angevine [@trangevi](https://github.com/trangevi), that streamlines schema design and development for ML applications. It offers parameter type definition, automatic type conversion, and schema/Swagger file generation. With Inference Schema, users define parameter types and associate them with functions through input and output decorators.

Inference Schema integrates directly with AzureML endpoints. A user `run` function with Inference Schema decorators can take an arbitrary number of arguments, and its Swagger file is generated automatically and served at `/swagger.json`.

## Function Decorators
The decorators `input_schema` and `output_schema` are used to define the schema. The `input_schema` decorator can be stacked multiple times as in [score-standard](code/score-standard.py) to correspond to multiple function arguments in the run function. 
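
The stacking behaviour can be pictured with a toy decorator. The sketch below is not Inference Schema's implementation: `toy_input_schema` and the `_toy_inputs` attribute are hypothetical names, used only to show how each stacked decorator can contribute one named argument to a shared description of the function.

```python
def toy_input_schema(param_name, sample):
    """Toy stand-in (hypothetical): record a sample value for one named argument."""
    def decorator(func):
        # Copy any registry left by a decorator applied earlier, then add this entry.
        registry = dict(getattr(func, "_toy_inputs", {}))
        registry[param_name] = sample
        func._toy_inputs = registry
        return func
    return decorator


# Decorators are applied bottom-up, so "sepal_width" is recorded first.
@toy_input_schema("sepal_length", [7.2])
@toy_input_schema("sepal_width", [3.2])
def run(sepal_length, sepal_width):
    return {"rows": len(sepal_length)}


declared = run._toy_inputs  # one entry per stacked decorator
```

Each decorator returns the same function object, which is why any number of them can be stacked without changing how `run` is called.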

## Parameter Types
There are 4 core parameter types available:
- StandardPythonParameterType
- PandasParameterType
- NumpyParameterType
- SparkParameterType

It is possible to nest parameter types by wrapping them in a list or dict inside a `StandardPythonParameterType`.

## Swagger
The automatically-generated Swagger can be retrieved by default at `/swagger.json` with an optional `version` HTTP parameter. See the [online-endpoints-openapi](online-endpoints-openapi.ipynb) example for more details about generating and accessing Swagger files.
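
As a minimal sketch of how the Swagger URL could be derived from a scoring URI, the helper below swaps the path for `/swagger.json` and appends the optional `version` query parameter. The function name `swagger_url`, the example host, and the value `3` are assumptions for illustration; which version values the server accepts is not stated here.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def swagger_url(scoring_uri, version=None):
    """Replace the path of a scoring URI with /swagger.json, optionally adding ?version=."""
    parts = urlsplit(scoring_uri)
    # Only emit a query string when a version was requested.
    query = urlencode({"version": version}) if version is not None else ""
    return urlunsplit((parts.scheme, parts.netloc, "/swagger.json", query, ""))


# Hypothetical host, used only to exercise the helper.
default_url = swagger_url("https://my-endpoint.example.com/score")
versioned_url = swagger_url("https://my-endpoint.example.com/score", version=3)
```

In this notebook the same location is read directly from `endpoint.openapi_uri`, so a helper like this is only needed when working from a raw scoring URI.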


# 1. Configure parameters, assets, and clients

## 1.1 Set workspace details

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

## 1.2 Set asset details

In [None]:
import random

rand = random.randint(0, 10000)

endpoint_name = f"infsrv-{rand}"
model_name = "infsrv"
model_version = str(rand)

## 1.3 Import required packages

In [None]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    CodeConfiguration,
    Environment,
)
from azure.identity import DefaultAzureCredential

## 1.4 Create an MLClient instance

In [None]:
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)

# 2. Create endpoint and model

### 2.1 Define and create the endpoint

In [None]:
endpoint = ManagedOnlineEndpoint(name=endpoint_name)

In [None]:
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

### 2.2 Get endpoint key

In [None]:
key = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

### 2.3 Register the model

In [None]:
model = Model(name=model_name, version=model_version, path="inference-schema/models")
model = ml_client.models.create_or_update(model)

## 3. Create Deployment: Numpy

Each deployment in this example uses the same model and produces the same output, but each accepts input in a different format.

For the first deployment, we will use the [score-numpy.py](../../../../../cli/endpoints/online/managed/inference-schema/code/score-numpy.py) scoring script, which declares a single input parameter `iris` as a `NumpyParameterType` and a nested `StandardPythonParameterType` as output.

```python
@input_schema(
    param_name="iris",
    param_type=NumpyParameterType(np.array([[7.2, 3.2, 6.0, 1.8]]))
)
@output_schema(
    output_type=StandardPythonParameterType({
        "Category" : ["Virginica"]
    })
)
``` 
When the following [sample input](../../../../../cli/endpoints/online/managed/inference-schema/sample-inputs/numpy.json) is sent, it is automatically validated and converted to a Numpy array. Validation can be toggled using the `enforce_column_type` and `enforce_shape` arguments to `NumpyParameterType`.

```json
{"iris": [[7.2, 3.2, 6.0, 1.8], [4.2, 3.5, 1.0, 3.0]]}
```
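
Before sending a request, it can help to check the payload on the client side. The sketch below is a plain-Python pre-check, not the validation Inference Schema performs on the server: `build_iris_payload` is a hypothetical helper that assumes every row must have the same number of values as the sample row in the decorator.

```python
import json

# Same sample row as the one passed to NumpyParameterType in the decorator.
SAMPLE_ROW = [7.2, 3.2, 6.0, 1.8]


def build_iris_payload(rows):
    """Check row widths against the sample row, then serialise under the "iris" key."""
    width = len(SAMPLE_ROW)
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} values, expected {width}")
    return json.dumps({"iris": [[float(v) for v in row] for row in rows]})


payload = build_iris_payload([[7.2, 3.2, 6.0, 1.8], [4.2, 3.5, 1.0, 3.0]])
```

The top-level key must match the `param_name` declared in `input_schema`, which is `iris` for this script.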


### 3.1 Define the deployment

In [None]:
deployment = ManagedOnlineDeployment(
    name="infsrv-numpy",
    endpoint_name=endpoint_name,
    model=f"azureml:{model.name}:{model.version}",
    code_configuration=CodeConfiguration(
        code="inference-schema/code", scoring_script="score-numpy.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference",
        conda_file="inference-schema/env.yml",
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

### 3.2 Create the deployment

In [None]:
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

### 3.4 Set traffic to 100%

In [None]:
endpoint.traffic = {"infsrv-numpy": 100}
endpoint = ml_client.begin_create_or_update(endpoint).result()

### 3.5 Test the endpoint

In [None]:
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="inference-schema/sample-inputs/numpy.json",
)

### 3.6 Get swagger

In [None]:
import requests

res = requests.get(url=endpoint.openapi_uri, headers={"Authorization": f"Bearer {key}"})
print(res.content)
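
The raw response is a JSON document, so it can be easier to read once parsed. The sketch below lists the operations declared under the standard `paths` field of a Swagger/OpenAPI document. `list_operations` is a hypothetical helper, and `sample_doc` is a hand-written, heavily trimmed stand-in rather than real output from this endpoint.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head"}


def list_operations(swagger_doc):
    """Return sorted (METHOD, path) pairs found under the document's "paths" field."""
    operations = []
    for path, item in swagger_doc.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hypothetical document, used only to exercise the helper.
sample_doc = json.loads(
    '{"swagger": "2.0", "paths": {"/score": {"post": {}}, "/": {"get": {}}}}'
)
operations = list_operations(sample_doc)
```

With the response from the cell above, the same helper could be called as `list_operations(res.json())`.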

## 4. Create Deployment: Standard / Multiple Parameters

This deployment uses a different scoring script - [score-standard.py](../../../../../cli/endpoints/online/managed/inference-schema/code/score-standard.py) - in which the run function accepts multiple list arguments. This is done by using `StandardPythonParameterType` and applying the `input_schema` decorator multiple times.

```python
@input_schema(
    param_name="sepal_length",
    param_type=StandardPythonParameterType([7.2])
)
@input_schema(
    param_name="sepal_width",
    param_type=StandardPythonParameterType([3.2])
)
@input_schema(
    param_name="petal_length",
    param_type=StandardPythonParameterType([6.0])
)
@input_schema(
    param_name="petal_width",
    param_type=StandardPythonParameterType([1.8])
)
@output_schema(
    output_type=StandardPythonParameterType({
        "Category" : ["Virginica"]
    })
)
def run(sepal_length, sepal_width, petal_length, petal_width):
``` 
When sent the following [sample input](../../../../../cli/endpoints/online/managed/inference-schema/sample-inputs/standard.json), this script produces the same output as before.

```json
{
    "sepal_length": [7.2, 4.2],
    "sepal_width": [3.2, 3.5],
    "petal_length": [6.0, 1.0],
    "petal_width": [1.8, 3.0]
}
```

### 4.1 Define the deployment

In [None]:
deployment = ManagedOnlineDeployment(
    name="infsrv-standard",
    endpoint_name=endpoint_name,
    model=f"azureml:{model.name}:{model.version}",
    code_configuration=CodeConfiguration(
        code="inference-schema/code/", scoring_script="score-standard.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference",
        conda_file="inference-schema/env.yml",
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

### 4.2 Create the deployment

In [None]:
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

### 4.3 Set traffic to 100%

In [None]:
endpoint.traffic = {"infsrv-standard": 100}
endpoint = ml_client.begin_create_or_update(endpoint).result()

### 4.4 Test the endpoint

In [None]:
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="inference-schema/sample-inputs/standard.json",
)

### 4.5 Get swagger

In [None]:
res = requests.get(url=endpoint.openapi_uri, headers={"Authorization": f"Bearer {key}"})
print(res.content)

## 5. Create Deployment: Pandas

This scoring script - [score-pandas.py](../../../../../cli/endpoints/online/managed/inference-schema/code/score-pandas.py) - accepts a single Pandas dataframe by using the following decorators:
```python
@input_schema(
    param_name="iris",
    param_type=PandasParameterType(pd.DataFrame({
        "sepal_length": [7.2],
        "sepal_width": [3.2],
        "petal_length": [6.0],
        "petal_width": [1.8]})
    )
)
@output_schema(
    output_type=StandardPythonParameterType({
        "Category" : ["Virginica"]
    })
)
``` 
When sent the following [sample input](../../../../../cli/endpoints/online/managed/inference-schema/sample-inputs/pandas.json), this script produces the same output as before.

```json
{
    "iris": {
        "sepal_length": {
            "0": 7.2,
            "1": 4.2
        },
        "sepal_width": {
            "0": 3.2,
            "1": 3.5
        },
        "petal_length": {
            "0": 6.0,
            "1": 1.0
        },
        "petal_width": {
            "0": 1.8,
            "1": 3.0
        }
    }
}
```
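
The sample input is column-oriented: each column name maps to a dict of row index (as a string) to value, which is the layout pandas `DataFrame.to_json()` emits with its default `orient="columns"`. The sketch below builds that structure from row-wise data using only the standard library. `rows_to_column_payload` is a hypothetical helper, and the column order is assumed to follow the sample DataFrame in the decorator.

```python
import json

# Column order assumed to match the sample DataFrame in the decorator.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def rows_to_column_payload(rows, param_name="iris"):
    """Turn row-wise values into {param_name: {column: {"row index": value}}}."""
    frame = {
        column: {str(index): row[position] for index, row in enumerate(rows)}
        for position, column in enumerate(COLUMNS)
    }
    return {param_name: frame}


payload = rows_to_column_payload([[7.2, 3.2, 6.0, 1.8], [4.2, 3.5, 1.0, 3.0]])
body = json.dumps(payload)  # JSON text ready to send as the request body
```

Keeping the row index as a string key matters because JSON object keys are always strings.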


### 5.1 Define the deployment

In [None]:
deployment = ManagedOnlineDeployment(
    name="infsrv-pandas",
    endpoint_name=endpoint_name,
    model=f"azureml:{model.name}:{model.version}",
    code_configuration=CodeConfiguration(
        code="inference-schema/code", scoring_script="score-pandas.py"
    ),
    environment=Environment(
        image="mcr.microsoft.com/azureml/minimal-ubuntu20.04-py38-cpu-inference",
        conda_file="inference-schema/env.yml",
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

### 5.2 Create the deployment

In [None]:
deployment = ml_client.online_deployments.begin_create_or_update(deployment).result()

### 5.3 Set traffic to 100%

In [None]:
endpoint.traffic = {"infsrv-pandas": 100}
endpoint = ml_client.begin_create_or_update(endpoint).result()

### 5.4 Test the endpoint

In [None]:
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="inference-schema/sample-inputs/pandas.json",
)

### 5.5 Get swagger

In [None]:
res = requests.get(url=endpoint.openapi_uri, headers={"Authorization": f"Bearer {key}"})
print(res.content)

## 6. Delete assets

### 6.1 Delete the endpoint

In [None]:
ml_client.online_endpoints.begin_delete(name=endpoint_name)