# Use batch deployments for image file processing with MLflow

The following notebook demonstrates how to use batch endpoints to deploy models that work with images. In particular, we are going to deploy a TensorFlow model for the popular ImageNet classification problem using MLflow.

This notebook requires:

- `tensorflow`
- `tensorflow_hub`
- `pillow`
- `azure-ai-ml`
- `azureml-mlflow`
- `pandas`
- `scipy`

## 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

### 1.1. Import the required libraries

In [1]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.identity import DefaultAzureCredential

### 1.2. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need its identifying parameters: a subscription, a resource group, and a workspace name. We will pass these details to `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [2]:
# enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [3]:
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
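
Before relying on the client, it can help to confirm that the placeholder values above were actually replaced, because a leftover `<SUBSCRIPTION_ID>` may only surface as an error on the first call to the service. The helper below is a minimal sketch in plain Python; `find_unset_placeholders` is a hypothetical name and is not part of `azure-ai-ml`.

```python
import re

# Matches values that still look like "<SOME_PLACEHOLDER>".
_PLACEHOLDER = re.compile(r"^<[A-Z0-9_]+>$")


def find_unset_placeholders(**settings: str) -> list:
    """Return the names of settings that are empty or still a placeholder."""
    return [
        name
        for name, value in settings.items()
        if not value or _PLACEHOLDER.match(value)
    ]


# Example with the values exactly as they appear in the cell above.
missing = find_unset_placeholders(
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    workspace="<AML_WORKSPACE_NAME>",
)
if missing:
    print("Please fill in:", ", ".join(missing))
```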

## 2. Using MLflow with images

When working with MLflow models that process images, keep in mind that you won't be providing a scoring script. Hence, any data transformation required before running the classifier has to happen inside the model itself. Fortunately, you can design models that perform these transformations.

### 2.1 Creating an MLflow model for image classification

The following example shows how to create a TensorFlow model that takes images of any size and preprocesses them using Keras layers:

In [4]:
import tensorflow_hub as hub
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.layers.Resizing(
            244, 244, interpolation="bilinear", crop_to_aspect_ratio=False
        ),
        tf.keras.layers.Rescaling(1 / 255.0),
        hub.KerasLayer(
            "https://tfhub.dev/google/imagenet/resnet_v2_101/classification/5"
        ),
        tf.keras.layers.Softmax(axis=-1),
    ]
)
model.build([None, None, None, 3])

Let's save this model in a local folder:

In [5]:
model_local_path = "imagenet-classifier-mlflow/model"
model.save(model_local_path)





INFO:tensorflow:Assets written to: imagenet-classifier-mlflow/model/assets




### 2.2 Adding labels to the model predictions

We are going to include the labels for the predicted classes in the model directory so we can use them for inference:

In [13]:
!wget https://azuremlexampledata.blob.core.windows.net/data/imagenet/ImageNetLabels.txt -P imagenet-classifier-mlflow/model

--2022-11-29 13:58:54--  https://azuremlexampledata.blob.core.windows.net/data/imagenet/ImageNetLabels.txt
Resolving azuremlexampledata.blob.core.windows.net (azuremlexampledata.blob.core.windows.net)... 20.209.0.229
Connecting to azuremlexampledata.blob.core.windows.net (azuremlexampledata.blob.core.windows.net)|20.209.0.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10484 (10K) [text/plain]
Saving to: ‘imagenet-classifier-mlflow/model/ImageNetLabels.txt’


2022-11-29 13:58:54 (439 KB/s) - ‘imagenet-classifier-mlflow/model/ImageNetLabels.txt’ saved [10484/10484]
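
The `!wget` cell above relies on a shell tool that is not available in every environment. A Python-only alternative is sketched below using the standard library; `download_to_dir` is a hypothetical helper name, and the call that reaches the network is left commented out.

```python
import urllib.request
from pathlib import Path

LABELS_URL = (
    "https://azuremlexampledata.blob.core.windows.net/data/imagenet/ImageNetLabels.txt"
)


def download_to_dir(url: str, target_dir: str) -> Path:
    """Download `url` into `target_dir`, keeping the file name from the URL."""
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    destination = directory / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(destination))
    return destination


# Usage (performs a network request, so it is not executed here):
# download_to_dir(LABELS_URL, "imagenet-classifier-mlflow/model")
```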



### 2.3 Creating a custom model loader for MLflow

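Let's create a custom loader for the MLflow model. MLflow calls the module's `_load_pyfunc` function with the path to the model's data folder and expects back an object that exposes a `predict` method: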

In [6]:
%%writefile imagenet-classifier-mlflow/code/module_loader.py

import pandas as pd
import tensorflow as tf


class TfClassifier:
    def __init__(self, model_path: str, labels_path: str):
        import numpy as np
        from tensorflow.keras.models import load_model

        self.model = load_model(model_path)
        # One label per line; the line index is assumed to match the class index.
        with open(labels_path) as labels_file:
            self.imagenet_labels = np.array(labels_file.read().splitlines())

    def predict(self, data):
        # The model ends in a softmax layer, so `preds` holds class probabilities.
        preds = self.model.predict(data)

        # Convert to NumPy so pandas receives plain arrays rather than tensors.
        pred_prob = tf.reduce_max(preds, axis=-1).numpy()
        pred_class = tf.argmax(preds, axis=-1).numpy()
        pred_label = self.imagenet_labels[pred_class]

        return pd.DataFrame(
            {"class": pred_class, "probability": pred_prob, "label": pred_label}
        )


def _load_pyfunc(data_path: str):
    # Entry point that MLflow calls for models saved with `loader_module`.
    import os

    model_path = os.path.abspath(data_path)
    labels_path = os.path.join(model_path, "ImageNetLabels.txt")

    return TfClassifier(model_path, labels_path)

Overwriting imagenet-classifier-mlflow/code/module_loader.py
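
To make the post-processing in `predict` concrete, the following sketch applies the same three steps (highest probability, its index, and the label lookup) to a hand-made array of probabilities. It uses NumPy instead of TensorFlow, and the three labels and the probability values are made up for illustration; the real model produces one column per ImageNet label.

```python
import numpy as np

# Hypothetical labels and softmax outputs for two images (each row sums to 1).
labels = np.array(["cat", "dog", "bird"])
preds = np.array(
    [
        [0.1, 0.7, 0.2],
        [0.6, 0.3, 0.1],
    ]
)

pred_prob = preds.max(axis=-1)      # highest probability per image
pred_class = preds.argmax(axis=-1)  # index of that probability
pred_label = labels[pred_class]     # label stored at that index

for cls, prob, label in zip(pred_class, pred_prob, pred_label):
    print(cls, prob, label)
```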


### 2.4 Adding a model signature for images

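Let's indicate a signature for the model. The input is a batch of RGB images of any height and width, expressed as a `uint8` tensor of shape `(-1, -1, -1, 3)`: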

In [8]:
import numpy as np
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec

input_schema = Schema(
    [
        TensorSpec(np.dtype(np.uint8), (-1, -1, -1, 3)),
    ]
)
signature = ModelSignature(inputs=input_schema)
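
In the tensor shape above, `-1` marks a dimension of variable size, so the signature accepts any batch size, height, and width as long as the last dimension holds 3 channels. The sketch below illustrates that rule in plain Python; `shape_matches` is a hypothetical helper written for this explanation and is not how MLflow enforces schemas internally.

```python
from typing import Sequence


def shape_matches(actual: Sequence[int], expected: Sequence[int]) -> bool:
    """Check `actual` against `expected`, where -1 accepts any size."""
    if len(actual) != len(expected):
        return False
    return all(e == -1 or a == e for a, e in zip(actual, expected))


signature_shape = (-1, -1, -1, 3)

print(shape_matches((8, 300, 400, 3), signature_shape))  # batch of RGB images
print(shape_matches((8, 300, 400, 1), signature_shape))  # single-channel batch
print(shape_matches((300, 400, 3), signature_shape))     # missing batch axis
```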

Creating the dependencies:

In [9]:
from mlflow.utils.environment import _mlflow_conda_env

custom_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["tensorflow"],
    additional_conda_channels=None,
)



Saving the model in MLflow format:

In [14]:
mlflow_model_path = "mlflow-model"
mlflow.pyfunc.save_model(
    mlflow_model_path,
    data_path="imagenet-classifier-mlflow/model",
    code_path=["imagenet-classifier-mlflow/code/module_loader.py"],
    loader_module="module_loader",
    conda_env=custom_env,
    signature=signature,
)

<mlflow.models.model.Model at 0x7f61248b2d00>
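
Before registering the model, it may be worth loading it back and scoring a dummy batch locally. The sketch below assumes the `mlflow-model` folder saved above and that `mlflow` and `tensorflow` are installed; `make_dummy_batch` and `smoke_test` are hypothetical helper names. Random pixels only confirm that the model loads and returns predictions, not that the predictions are meaningful.

```python
import numpy as np


def make_dummy_batch(n_images: int, height: int, width: int) -> np.ndarray:
    """Build a random uint8 batch with shape (n_images, height, width, 3)."""
    rng = np.random.default_rng(seed=0)
    return rng.integers(0, 256, size=(n_images, height, width, 3), dtype=np.uint8)


def smoke_test(model_dir: str = "mlflow-model"):
    """Load the saved pyfunc model and score a small dummy batch."""
    import mlflow.pyfunc  # imported lazily so the helper above works without MLflow

    loaded = mlflow.pyfunc.load_model(model_dir)
    return loaded.predict(make_dummy_batch(2, 300, 400))


# Building the batch needs only NumPy; call smoke_test() to score it.
batch = make_dummy_batch(2, 300, 400)
print(batch.shape, batch.dtype)
```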

### 2.5 Registering the new model

In [9]:
model_name = "imagenet-classifier"
mlflow_model_name = f"{model_name}-mlflow"
ml_client.models.create_or_update(
    Model(
        name=mlflow_model_name,
        path=mlflow_model_path,
        type=AssetTypes.MLFLOW_MODEL,
    )
)

Your file exceeds 100 MB. If you experience low upload speeds or latency, we recommend using the AzCopy tool for this file transfer. See https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10 for more information.
Uploading mlflow-model (182.74 MBs): 100%

Model({'job_name': None, 'is_anonymous': False, 'auto_increment_version': False, 'name': 'imagenet-classifier-mlflow', 'description': None, 'tags': {}, 'properties': {}, 'id': '/subscriptions/18522758-626e-4d88-92ac-dc9c7a5c26d4/resourceGroups/Analytics.Aml.Experiments.Workspaces/providers/Microsoft.MachineLearningServices/workspaces/aa-ml-aml-workspace/models/imagenet-classifier-mlflow/versions/6', 'Resource__source_path': None, 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/santiagxf-workbench/code/repos/azureml-examples/sdk/python/endpoints/batch', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7f8131e8ab20>, 'serialize': <msrest.serialization.Serializer object at 0x7f8131e8ab80>, 'version': '6', 'latest_version': None, 'path': 'azureml://subscriptions/18522758-626e-4d88-92ac-dc9c7a5c26d4/resourceGroups/Analytics.Aml.Experiments.Workspaces/workspaces/aa-ml-aml-workspace/datastores/workspaceblobstore/paths/LocalUpload/9ed9a7b32dd145511708

This new model can be used for batch scoring using batch deployments.

In [10]:
model = ml_client.models.get(name=mlflow_model_name, label="latest")

## 3. Create a batch endpoint

Batch endpoints are endpoints that are used for batch inferencing on large volumes of data over a period of time. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs in a data store for further analysis.

To create a batch endpoint we will use `BatchEndpoint`. This class allows users to configure the following key aspects:
- `name` - Name of the endpoint. It must be unique within the Azure region.
- `auth_mode` - The authentication method for the endpoint. Currently only Azure Active Directory (Azure AD) token-based (`aad_token`) authentication is supported.
- `defaults` - Default settings for the endpoint.
   - `deployment_name` - Name of the deployment that will serve as the default deployment for the endpoint.
- `description` - Description of the endpoint.

### 3.1 Configure the endpoint

First, let's create the endpoint that is going to host the batch deployments. To ensure that our endpoint name is unique, let's create a random suffix to append to it.

> In general, you won't need this technique because you will choose more meaningful names. If that is your case, skip the following cell and set `endpoint_name` yourself:

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "imagenet-classifier-" + endpoint_suffix

Let's configure the endpoint:

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch service to perform ImageNet image classification",
)

### 3.2 Create the endpoint

Using the `MLClient` created earlier, we will now create the endpoint in the workspace. `begin_create_or_update` starts the endpoint creation and returns a poller; calling `.result()` on it waits until the operation completes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

## 4. Create a batch deployment

### 4.1 Creating the compute

Batch deployments can run on any Azure Machine Learning compute that already exists in the workspace, which means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an Azure Machine Learning compute cluster called `cpu-cluster`. Let's verify that the compute exists in the workspace and create it otherwise.

In [None]:
compute_name = "cpu-cluster"
if not any(filter(lambda m: m.name == compute_name, ml_client.compute.list())):
    print(f"Compute {compute_name} is not created. Creating...")
    compute_cluster = AmlCompute(
        name=compute_name, description="amlcompute", min_instances=0, max_instances=5
    )
    ml_client.begin_create_or_update(compute_cluster)

Compute may take time to be created. Let's wait for it:

In [None]:
from time import sleep

print("Waiting for compute", end="")
while ml_client.compute.get(name=compute_name).provisioning_state == "Creating":
    sleep(1)
    print(".", end="")

print(" [DONE]")
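
The polling loops in this notebook keep running until the observed state changes, with no upper bound on the wait. A bounded variant is sketched below using only the standard library; `wait_until` is a hypothetical helper, and the default timeout and polling interval are arbitrary values chosen for illustration.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_seconds: float = 600.0,
    poll_seconds: float = 5.0,
) -> bool:
    """Poll `condition` until it returns True or the timeout elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_seconds
    while not condition():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


# Usage with the objects defined in this notebook (not executed here):
# ready = wait_until(
#     lambda: ml_client.compute.get(name=compute_name).provisioning_state != "Creating"
# )
```

Returning a boolean instead of raising leaves it to the caller to decide whether a timeout should stop the notebook or only print a warning.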

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `BatchDeployment` class.

In [12]:
mlflow_deployment = BatchDeployment(
    name="imagenet-classifier-rnet-mlflow",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch with MLflow",
    endpoint_name=endpoint.name,
    model=model,
    compute=compute_name,
    instance_count=2,
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)
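
As a rough way to reason about these settings: for file inputs, `mini_batch_size` is the number of files sent to the model in each call, and the deployment can work on up to `instance_count * max_concurrency_per_instance` mini-batches at the same time. The sketch below works through that arithmetic for a hypothetical folder of 1,000 images. `plan_batch_job` is an illustrative helper, and it assumes evenly sized work with no retries or scheduling overhead, so treat the result as an estimate of the shape of the job rather than a timing prediction.

```python
import math


def plan_batch_job(
    n_files: int,
    mini_batch_size: int,
    instance_count: int,
    max_concurrency_per_instance: int,
) -> dict:
    """Estimate how the input files are split across parallel workers."""
    mini_batches = math.ceil(n_files / mini_batch_size)
    workers = instance_count * max_concurrency_per_instance
    return {
        "mini_batches": mini_batches,
        "parallel_workers": workers,
        # Sequential rounds needed if each worker takes one mini-batch at a time.
        "rounds": math.ceil(mini_batches / workers),
    }


# Values from the deployment above, with a hypothetical 1,000 input images.
plan = plan_batch_job(
    n_files=1000,
    mini_batch_size=10,
    instance_count=2,
    max_concurrency_per_instance=1,
)
print(plan)
```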

### 4.2 Create the deployment

In [13]:
ml_client.batch_deployments.begin_create_or_update(mlflow_deployment)

<azure.core.polling._poller.LROPoller at 0x7f811ae95850>

Let's wait for the deployment to complete:

In [14]:
from time import sleep

print(f"Waiting for batch deployment {mlflow_deployment.name}", end="")
while not any(
    filter(
        lambda m: m.name == mlflow_deployment.name,
        ml_client.batch_deployments.list(endpoint_name),
    )
):
    sleep(1)
    print(".", end="")

print(" [DONE]")

Waiting for batch deployment imagenet-classifier-rnet-mlflow [DONE]
