# Deploying models with MLflow

Azure ML supports no-code deployment of models created and logged with MLflow. This means that you don't have to provide a scoring script or an environment. Those models can be deployed to ACI (Azure Container Instances), AKS (Azure Kubernetes Service), or our managed inference service (usually referred to as MIR).

For no-code deployment, Azure Machine Learning:

- Dynamically installs the Python packages listed in the `conda.yaml` file. This means the dependencies are installed at container runtime.
- Uses `mcr.microsoft.com/azureml/mlflow-ubuntu18.04-py37-cpu-inference` as the base container image, or `AzureML-mlflow-ubuntu18.04-py37-cpu-inference` as the curated environment, for dynamic installation.

The following table shows the target support for MLflow models in Azure ML:


| Feature | ACI | AKS | MIR |
| :- | :-: | :-: | :-: |
| Deploying models logged with MLflow to real time inference | ☑️* | ☑️* | ☑️* |
| Deploying models logged with MLflow to batch inference | ☐** | ☐** | ☑️ |
| Deploying models with ColSpec signatures | ☑️**** | ☑️**** | ☑️**** |
| Deploying models with TensorSpec signatures | ☑️ | ☑️ | ☑️ |
| Run models logged with MLflow in your local compute with Azure ML CLI v2 | ☑️ | ☑️ | ☐*** |
| Debug online endpoints locally in Visual Studio Code (preview) | ☐ | ☐ | ☐ |

**Notes:**
- (*) The Spark flavor is not supported at the moment.
- (**) We suggest using Azure Machine Learning Pipelines with Parallel Run Step.
- (***) To deploy MLflow models locally, use the command `mlflow models serve -m <MODEL_NAME>`. Set the environment variable `MLFLOW_TRACKING_URI` to the URL of your tracking server.
- (****) Data type `mlflow.types.DataType.Binary` is not supported as a column type. For models that work with images, we suggest encoding the image as Base64 and using a `mlflow.types.DataType.String` column type. Base64 is commonly used when binary data needs to be stored and transferred over text-based media.
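
As a minimal sketch of the Base64 suggestion in the last note, the snippet below encodes image bytes as a string and places it in a single string column. The column name `image` and the stand-in bytes are hypothetical; the payload shape follows the `input_data` format described later in this document.

```python
import base64
import json


def encode_image_bytes(raw: bytes) -> str:
    """Return the Base64 text representation of raw image bytes."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_bytes(encoded: str) -> bytes:
    """Reverse of encode_image_bytes; this is what the model would do on its side."""
    return base64.b64decode(encoded)


# Stand-in bytes for illustration. In practice you would read a real file,
# for example: raw_image = open("picture.png", "rb").read()
raw_image = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
encoded = encode_image_bytes(raw_image)

# "image" is a hypothetical column name declared as a string column
# in the model signature.
payload = json.dumps(
    {"input_data": {"columns": ["image"], "index": [0], "data": [[encoded]]}}
)
```

The model itself is then responsible for decoding the string back into bytes before processing the image.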



There are two workflows for deploying models to Azure ML:

- Using the MLflow plugin [azureml-mlflow](https://pypi.org/project/azureml-mlflow/).
- Using Azure ML CLI/SDK v2.

If you are familiar with MLflow or you wish to continue using the same set of methods, use the MLflow plugin. If, on the other hand, you are more familiar with the Azure ML CLI, you want to automate deployments using CI/CD pipelines, or you want to keep deployment configurations in a git repository, we recommend using the Azure ML CLI v2.

## Deploying models using Azure ML plugin for MLflow

The MLflow plugin `azureml-mlflow` can deploy models to Azure ML for real-time serving, targeting Azure Kubernetes Service (AKS), Azure Container Instances (ACI), or Managed Inference Service (MIR).

> Note that deploying to Managed Inference Service batch endpoints is not supported in the MLflow plugin at the moment.

### Prerequisites

- Install the `azureml-mlflow` package.
- Configure the MLflow tracking URI or MLflow's registry URI to point to the workspace you are working on. See [MLflow Tracking URI to connect with Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow) for more details.

### Deploying models to ACI/AKS

Deployments can be created using either the Python API for MLflow or the MLflow CLI. In both cases, you can provide a JSON configuration file with the details of the deployment you want to achieve. If you don't provide one, a default deployment is created on Azure Container Instances (ACI) with a minimal configuration. The full specification of this configuration file for ACI and AKS is available at [Deployment configuration schema](https://docs.microsoft.com/en-us/azure/machine-learning/reference-azure-machine-learning-cli#deployment-configuration-schema).

#### Configuration example for ACI deployment

```json
{
  "computeType": "aci",
  "containerResourceRequirements":
  {
    "cpu": 1,
    "memoryInGB": 1
  },
  "location": "eastus2"
}
```

Remarks:
- If `containerResourceRequirements` is not indicated, a deployment with minimal compute configuration is applied (cpu: 0.1 and memory: 0.5).
- If `location` is not indicated, it defaults to the location of the workspace.
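
The defaults described in these remarks can be pictured with a small sketch. This is only an illustration of the documented behavior, not what Azure ML runs internally; the helper name `effective_aci_config` is hypothetical, and it assumes that "memory" in the remark corresponds to the `memoryInGB` key used in the example above.

```python
import json

# Defaults as described in the remarks above (assumption: "memory" maps
# to the `memoryInGB` key shown in the configuration example).
DEFAULT_RESOURCES = {"cpu": 0.1, "memoryInGB": 0.5}


def effective_aci_config(config: dict, workspace_location: str) -> dict:
    """Return a copy of `config` with the documented defaults filled in.

    Hypothetical helper for illustration only. The workspace location is
    passed in explicitly because it depends on your workspace.
    """
    resolved = dict(config)
    resolved.setdefault("containerResourceRequirements", dict(DEFAULT_RESOURCES))
    resolved.setdefault("location", workspace_location)
    return resolved


minimal = effective_aci_config({"computeType": "aci"}, workspace_location="eastus2")
print(json.dumps(minimal, indent=2))
```

Values given explicitly in the configuration are left untouched; only missing keys receive a default.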

#### Configuration example for an AKS deployment

```json
{
  "computeType": "aks",
  "computeTargetName": "aks-mlflow"
}
```

Remarks:
- In the example above, `aks-mlflow` is the name of an Azure Kubernetes Service cluster registered or created in Azure Machine Learning.

The following sample creates a deployment on ACI:

```python
import json
from mlflow.deployments import get_deploy_client

# Create the deployment configuration.
# If no deployment configuration is provided, the deployment happens on ACI.
deploy_config = {"computeType": "aci"}

# Write the deployment configuration into a file.
deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

# Create the deployment client using the tracking URI of the workspace.
client = get_deploy_client("<azureml-mlflow-tracking-url>")

# MLflow requires the deployment configuration to be passed as a dictionary.
config = {"deploy-config-file": deployment_config_path}
model_name = "mymodel"
model_version = 1

# Define the model URI. The `name` parameter is used as the service name.
# If the model is not registered, it is registered automatically and its
# name is generated from the `name` parameter below.
client.create_deployment(
    model_uri=f"models:/{model_name}/{model_version}",
    config=config,
    name="mymodel-aci-deployment",
)
```

### Deploying models to Managed Inference

Deployments can be created using either the Python API for MLflow or the MLflow CLI. In both cases, you need to provide a JSON configuration file with the details of the deployment you want to achieve. The full specification of this configuration can be found at [Managed online deployment schema (v2)](https://docs.microsoft.com/en-us/azure/machine-learning/reference-yaml-deployment-managed-online).

#### Configuration example for a Managed Inference Service deployment (real time)

```json
{
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1
}
```

Remarks:
- We recommend setting `instance_count` to at least 3 to ensure high availability.

The following sample deploys a model to a real time Managed Inference Endpoint:

```python
import json
from mlflow.deployments import get_deploy_client

# Create the deployment configuration.
deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

# Write the deployment configuration into a file.
deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

# Create the deployment client using the tracking URI of the workspace.
client = get_deploy_client("<azureml-mlflow-tracking-url>")

# MLflow requires the deployment configuration to be passed as a dictionary.
config = {"deploy-config-file": deployment_config_path}
model_name = "mymodel"
model_version = 1

# Define the model URI. The `name` parameter is used as the service name.
# If the model is not registered, it is registered automatically and its
# name is generated from the `name` parameter below.
client.create_deployment(
    model_uri=f"models:/{model_name}/{model_version}",
    config=config,
    name="mymodel-mir-deployment",
)
```

Remarks:
- Managed Inference endpoints support multiple deployments on the same endpoint.

## Deploying models using Azure ML CLI v2 to Managed Inference

You can use the Azure ML CLI v2 to deploy models trained and logged with MLflow to Managed Inference. Deploying an MLflow model with the Azure ML CLI v2 is a no-code deployment, so you don't have to provide a scoring script or an environment, although you can if needed.

### Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

- The [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/) and the `ml` extension to the Azure CLI. For more information, see [Install, set up, and use the CLI (v2) (preview)](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-cli).
- An MLflow model. You can deploy models inside a run, models registered in the Azure Machine Learning registry, or an MLflow model folder stored in the local file system.

### Deploying models to Managed Inference

To deploy models using the Azure ML CLI v2:

1. Create a YAML configuration file for your endpoint, for example `mlflow-endpoint.yaml`. The following example configures the name and authentication mode of the endpoint:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: mymodel-mir-deployment
auth_mode: key
```

2. To create a new endpoint using the YAML configuration, use the following command:

```bash
ENDPOINT_NAME="mymodel-mir-deployment"
az ml online-endpoint create -n $ENDPOINT_NAME -f mlflow-endpoint.yaml
```

3. Create a YAML configuration file for the deployment, for example `mlflow-deployment.yaml`:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: mymodel-mir-deployment
model: azureml:model_name:1
instance_type: Standard_DS2_v2
instance_count: 1
```

4. To create the deployment using the YAML configuration, use the following command:

```bash
az ml online-deployment create --endpoint $ENDPOINT_NAME -f mlflow-deployment.yaml --all-traffic
```
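
The endpoint name appears in the endpoint file, in the deployment file, and in the `ENDPOINT_NAME` variable, so one way to keep them in sync is to render both files from a single value. The following is a sketch under that assumption; the helper `render_configs` and the templates are hypothetical conveniences, not part of the Azure ML CLI.

```python
ENDPOINT_TEMPLATE = """\
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: {endpoint_name}
auth_mode: key
"""

DEPLOYMENT_TEMPLATE = """\
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: {endpoint_name}
model: azureml:{model_name}:{model_version}
instance_type: Standard_DS2_v2
instance_count: 1
"""


def render_configs(endpoint_name: str, model_name: str, model_version: int) -> dict:
    """Return a mapping of file name to YAML text for the two files used above."""
    return {
        "mlflow-endpoint.yaml": ENDPOINT_TEMPLATE.format(endpoint_name=endpoint_name),
        "mlflow-deployment.yaml": DEPLOYMENT_TEMPLATE.format(
            endpoint_name=endpoint_name,
            model_name=model_name,
            model_version=model_version,
        ),
    }


configs = render_configs("mymodel-mir-deployment", "model_name", 1)
for filename, content in configs.items():
    print(f"--- {filename} ---")
    print(content)
```

Writing each entry to disk (for example with `pathlib.Path(filename).write_text(content)`) produces the files that the `az ml` commands above expect.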

## Considerations when deploying to real time inference

When deploying to any of the real time services, take the following into consideration:


### Input format

The following input types are supported in Azure ML. See the *Notes* at the bottom of the table for additional considerations.

| Input type | Support in MLflow models (serve) | Support in Azure ML|
| :- | :-: | :-: |
| JSON-serialized pandas DataFrames in the split orientation | ☑️ | ☑️ |
| JSON-serialized pandas DataFrames in the records orientation | ☑️ | ☐* |
| CSV-serialized pandas DataFrames | ☑️ | ☐** |
| Tensor input format as JSON-serialized lists (tensors) and dictionary of lists (named tensors) | ☐ | ☑️ |
| Tensor input formatted as in TF Serving’s API | ☑️ | ☐ |

Notes:

- (*) We suggest using the split orientation instead. The records orientation doesn't guarantee that column ordering is preserved.
- (**) We suggest exploring batch inference for processing files.

Your inputs should be submitted inside a JSON payload containing a dictionary with the key `input_data`. The following is a valid example for the heart classifier model we were working on, using a JSON-serialized pandas DataFrame in the split orientation:

```json
{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}
```
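
A payload of this shape can be assembled with the standard library alone. The sketch below is one possible way to do it; the helper name `to_split_payload` is hypothetical, and it numbers the index from 0. If you already have a pandas DataFrame, `DataFrame.to_dict(orient="split")` returns a dictionary with the same `columns`, `index`, and `data` keys.

```python
import json


def to_split_payload(rows: list) -> dict:
    """Build an `input_data` payload in split orientation.

    `rows` is a non-empty list of dictionaries that all share the same keys.
    Column order is taken from the first row, so it is explicit in the payload.
    """
    if not rows:
        raise ValueError("at least one row is required")
    columns = list(rows[0].keys())
    data = []
    for row in rows:
        if set(row) != set(columns):
            raise ValueError("all rows must have the same columns")
        data.append([row[column] for column in columns])
    return {
        "input_data": {
            "columns": columns,
            "index": list(range(len(rows))),
            "data": data,
        }
    }


# Illustrative rows with a subset of the columns used above.
rows = [{"age": 63, "sex": 1, "chol": 233}]
body = json.dumps(to_split_payload(rows))
print(body)
```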

### Using your own scoring script or environment with MLflow models

Although Azure ML supports deployment without a scoring script or an environment, you can still provide them in the deployment configuration when deploying models using the Azure ML CLI v2:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: main
endpoint_name: heart-classifier-endpoint
model: azureml:heart-classifier:1
code_configuration:
  code:
    local_path: score.py
instance_type: Standard_DS3_v2
instance_count: 1
```

The corresponding `score.py` file would look as follows:

```python
"""
Scoring routine
"""
import logging
import json
import os
import pandas as pd
import numpy as np
import mlflow
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

# Sample input used to describe the expected schema of a request.
input_sample = pd.DataFrame(data=[{
    "age": 63,
    "sex": 1,
    "cp": 1,
    "trestbps": 145,
    "chol": 233,
    "fbs": 1,
    "restecg": 2,
    "thalach": 150,
    "exang": 0,
    "oldpeak": 2.3,
    "slope": 3,
    "ca": 0,
    "thal": 2
}])

# Sample output used to describe the shape of the response.
output_sample = np.ndarray([1,])

MODEL = None


def init():
    """Load the model once, when the container starts."""
    global MODEL

    model_path = os.getenv("AZUREML_MODEL_DIR")
    logging.info(f"[INFO] Loading model from package {model_path}")

    # You can use the specific flavor (`xgboost`) or the generic one `pyfunc`
    MODEL = mlflow.pyfunc.load_model(model_path)


@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    """Score a request and return the predictions as a JSON string."""
    logging.info("Request received")

    try:
        results = MODEL.predict(data)
        if isinstance(results, pd.DataFrame):
            results = results.values
        return json.dumps({"result": results.tolist()})

    except RuntimeError as error:
        logging.error(f'[ERR] Exception happened: {str(error)}')
        return f'Input {str(data)}. Exception was: {str(error)}'
```


## Considerations when deploying to batch inference

Azure ML supports no-code deployment for batch inference in the Managed Inference service. This is a convenient way to deploy models that need to process large amounts of data in batches.

### How work is distributed on workers

Work is distributed at the file level, for both structured and unstructured data. As a consequence, only [file datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets#filedataset) or [URI folders](https://docs.microsoft.com/en-us/azure/machine-learning/reference-yaml-data) are supported for this feature. Each worker processes batches of `Mini batch size` files at a time. Further parallelism can be achieved if `Max concurrency per instance` is increased. 
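
To make the file-level distribution concrete, here is a conceptual sketch of how a list of files splits into mini-batches of a given size. It only illustrates the grouping described above; the actual scheduling across workers is handled by the service and may differ in order and assignment. The function name `mini_batches` and the file names are hypothetical.

```python
from typing import Iterator, List, Sequence


def mini_batches(files: Sequence[str], mini_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `mini_batch_size` files."""
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    for start in range(0, len(files), mini_batch_size):
        yield list(files[start:start + mini_batch_size])


# Five hypothetical input files processed two at a time.
files = [f"part-{i}.csv" for i in range(5)]
for batch in mini_batches(files, mini_batch_size=2):
    print(batch)
```

With five files and a mini-batch size of 2, this produces three groups, the last of which holds a single file.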

> Nested folder structures are not explored during inference. If you are partitioning your data using folders, make sure to flatten the structure beforehand.
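
One way to flatten a partitioned folder before submitting it is sketched below. The helper `flatten_folder` is hypothetical; it copies every file into a single target folder and joins the relative path parts with `__` to avoid name collisions. It assumes the target folder lies outside the source folder.

```python
import shutil
from pathlib import Path


def flatten_folder(source: Path, target: Path) -> list:
    """Copy every file under `source` (recursively) into `target`, without subfolders.

    A file at `source/2023/01/data.csv` is copied to `target/2023__01__data.csv`.
    Returns the list of paths that were created.
    """
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the listing before any file is copied.
    for path in sorted(source.rglob("*")):
        if path.is_file():
            flat_name = "__".join(path.relative_to(source).parts)
            destination = target / flat_name
            shutil.copy2(path, destination)
            copied.append(destination)
    return copied
```

For example, `flatten_folder(Path("data/partitioned"), Path("data/flat"))` would leave the original files in place and produce a flat copy that can be used as the input folder.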


### Supported file types

The following file types are supported for batch inference.

| File extension | Type passed as the model's input | Signature requirement |
| :- | :- | :- |
| `.csv` | `pd.DataFrame` | `ColSpec`. If not provided, column typing is not enforced. |
| `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.gif` | `np.ndarray` | `TensorSpec`. Input is reshaped to match the tensor's shape if available. If no signature is available, tensors of type `np.uint8` are inferred. |
