# Deploy MLflow model to online endpoints
Learn how to deploy your [MLflow](https://www.mlflow.org/) model to an [online endpoint](https://docs.microsoft.com/azure/machine-learning/concept-endpoints). When you deploy an MLflow model to an online endpoint, you can use no-code deployment: you don't need to provide a scoring script or an environment.

### Requirements
To follow this tutorial, you will need:
- A basic understanding of online endpoints; for more information, see [What are Azure Machine Learning endpoints?](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../jobs/configuration.ipynb)
- The Azure Machine Learning Python SDK v2, installed as described in the getting started section of the [install instructions](../../README.md)

# 1. Connect to Azure Machine Learning Workspace
The [workspace](https://docs.microsoft.com/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the endpoint will be deployed.

## 1.1 Import the required libraries

In [None]:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
)
from azure.identity import DefaultAzureCredential
from azure.ai.ml.constants import AssetTypes

# Import additional libraries for improved logging
import logging
import json
import time
from IPython.display import display, JSON

In [None]:
# Set up logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('azure-ml-notebook')
logger.info('Notebook execution started')

## 1.2 Configure workspace details and get a handle to the workspace

To connect to a workspace, we need three identifier parameters: a subscription ID, a resource group and a workspace name. We will pass these details to the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
# enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [None]:
# get a handle to the workspace
try:
    logger.info(f"Connecting to workspace: {workspace} in resource group: {resource_group}")
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )
    logger.info(f"Successfully connected to workspace: {workspace}")
    # Display workspace information
    print(f"Workspace information:")
    print(f"  Name: {ml_client.workspace_name}")
    print(f"  Resource Group: {ml_client.resource_group_name}")
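    # MLClient has no location attribute; fetch the workspace to read its region
    print(f"  Location: {ml_client.workspaces.get(ml_client.workspace_name).location}")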
except Exception as e:
    logger.error(f"Error connecting to workspace: {str(e)}")
    raise

Or if you are working in a compute instance in Azure Machine Learning:

In [None]:
# If working in a compute instance in Azure Machine Learning
try:
    logger.info("Connecting to workspace from compute instance configuration")
    ml_client = MLClient.from_config(DefaultAzureCredential())
    logger.info(f"Successfully connected to workspace: {ml_client.workspace_name}")
except Exception as e:
    logger.error(f"Error connecting from compute instance: {str(e)}")
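`MLClient.from_config` reads the workspace details from a `config.json` file instead of taking them as arguments. As a rough sketch, that file carries the same three identifiers used earlier. The helper `load_workspace_config` below is hypothetical (it is not part of the SDK) and only reads a local file; it never contacts Azure.

```python
import json
import tempfile
from pathlib import Path

# The three identifiers a workspace config file is expected to carry.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Hypothetical helper: read a config file and check the identifiers are present."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config is missing: {', '.join(missing)}")
    return {key: config[key] for key in REQUIRED_KEYS}


# Write a throwaway config with placeholder values and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(
        json.dumps(
            {
                "subscription_id": "<SUBSCRIPTION_ID>",
                "resource_group": "<RESOURCE_GROUP>",
                "workspace_name": "<AML_WORKSPACE_NAME>",
            }
        )
    )
    workspace_config = load_workspace_config(config_path)

print(workspace_config)
```

A check like this can surface a missing identifier before any credential lookup happens.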

# 2. Create Online Endpoint

Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.

To create an online endpoint we will use `ManagedOnlineEndpoint`. This class allows users to configure the following key aspects:

- `name` - Name of the endpoint. Needs to be unique at the Azure region level.
- `auth_mode` - The authentication method for the endpoint. Key-based authentication and Azure ML token-based authentication are supported. Key-based authentication doesn't expire but Azure ML token-based authentication does. Possible values are `key` or `aml_token`.
- `identity` - The managed identity configuration for accessing Azure resources for endpoint provisioning and inference.
    - `type` - The type of managed identity. Azure Machine Learning supports `system_assigned` or `user_assigned` identity.
    - `user_assigned_identities` - List (array) of fully qualified resource IDs of the user-assigned identities. This property is required if `identity.type` is `user_assigned`.
- `description` - Description of the endpoint.
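Because the endpoint name has to be unique within a region, this notebook appends a timestamp to a fixed prefix. The sketch below checks such a name locally before it is sent to the service. The rule encoded in `ENDPOINT_NAME_PATTERN` (3 to 32 characters, starting with a letter, then letters, digits or hyphens) is an assumption based on the documented managed online endpoint limits, and `is_valid_endpoint_name` is a hypothetical helper, so verify the rule against the current limits before relying on it.

```python
import datetime
import re

# Assumed rule: 3-32 characters, starts with a letter, then letters, digits or hyphens.
ENDPOINT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,31}")


def is_valid_endpoint_name(name):
    """Hypothetical local check; the service remains the final authority."""
    return ENDPOINT_NAME_PATTERN.fullmatch(name) is not None


# Same naming scheme as the notebook: fixed prefix plus a timestamp.
candidate = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")
print(candidate, is_valid_endpoint_name(candidate))
```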

## 2.1 Configure the endpoint

In [None]:
# Creating a unique endpoint name with current datetime to avoid conflicts
import datetime

online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")
logger.info(f"Creating online endpoint with name: {online_endpoint_name}")

# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is a sample online endpoint for mlflow model",
    auth_mode="key",
    tags={"foo": "bar"},
)

# Print endpoint configuration for better visibility
print(f"Endpoint configuration:")
print(f"  Name: {endpoint.name}")
print(f"  Auth Mode: {endpoint.auth_mode}")
print(f"  Tags: {endpoint.tags}")

## 2.2 Create the endpoint
Using the `MLClient` created earlier, we will now create the endpoint in the workspace. `begin_create_or_update` starts the endpoint creation and returns a poller; calling `.result()` on that poller blocks until the creation has finished.

In [None]:
# Create the endpoint with progress updates
try:
    logger.info("Starting endpoint creation...")
    start_time = time.time()
    result = ml_client.begin_create_or_update(endpoint).result()
    end_time = time.time()
    logger.info(f"Endpoint creation completed in {end_time - start_time:.2f} seconds")
    
    # Display the endpoint information in a structured format
    print(f"\nEndpoint created successfully:")
    print(f"  Name: {result.name}")
    print(f"  Provisioning State: {result.provisioning_state}")
    print(f"  Created: {result.creation_context.created_on}")
    print(f"  Scoring URI: {result.scoring_uri}")
except Exception as e:
    logger.error(f"Error creating endpoint: {str(e)}")
    raise
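The `begin_*` methods follow a long-running operation pattern: the call returns a handle right away, and `.result()` waits for the outcome. The sketch below imitates that pattern with a standard-library `Future` so it can run without Azure. The `provision` function is a hypothetical stand-in for the service-side work, and the `Future` is only an analogy for the SDK's poller, not the same class.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision(name):
    """Hypothetical stand-in for a slow service-side operation."""
    time.sleep(0.2)
    return {"name": name, "provisioning_state": "Succeeded"}


executor = ThreadPoolExecutor(max_workers=1)

# Submitting returns immediately with a handle, much like a begin_* call.
operation = executor.submit(provision, "endpoint-demo")
print("finished right after submit:", operation.done())

# result() blocks until the work has finished, then returns its outcome.
outcome = operation.result()
executor.shutdown()
print(outcome)
```

Keeping the handle instead of calling `.result()` immediately leaves room to do other work while the operation runs.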

# 3. Create a blue deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ManagedOnlineDeployment` class. This class allows users to configure the following key aspects:

- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](https://docs.microsoft.com/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).
- `instance_count` - The number of instances to use for the deployment

### No code deployment
For MLflow no-code deployment (NCD) to work, the model's `type` must be set to `mlflow_model` (`AssetTypes.MLFLOW_MODEL` in the SDK).
When you deploy an MLflow model to a managed online endpoint, the scoring script and environment are generated for you.

Azure ML models consist of the binary file(s) that represent a machine learning model and any corresponding metadata. Models can be created from a local file or directory. The created model will be tracked in the workspace under the specified name and version.

The Model class can be used to create a model. It accepts the following parameters:

- `name` - Name of the model.
- `version` - Version of the model. If omitted, Azure ML will autogenerate a version.
- `path` - Local path to the model file(s). This can point to either a file or a directory.
- `type` - Storage format of the model. Applicable for no-code deployment scenarios. Allowed values are `custom_model`, `mlflow_model` and `triton_model`, available as constants on `AssetTypes`.
- `description` - Description of the model.
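When `path` points to a directory for an MLflow model, that directory is expected to contain an `MLmodel` descriptor file alongside the serialized model. The sketch below is a local sanity check that can run before registering the model. `looks_like_mlflow_model` is a hypothetical helper, and the temporary directories only stand in for a real model folder.

```python
import tempfile
from pathlib import Path


def looks_like_mlflow_model(model_dir):
    """Hypothetical check: True when the directory contains an MLmodel descriptor."""
    return (Path(model_dir) / "MLmodel").is_file()


with tempfile.TemporaryDirectory() as tmp_dir:
    # A directory without the descriptor file.
    empty_dir = Path(tmp_dir) / "empty"
    empty_dir.mkdir()

    # A directory with a placeholder descriptor file.
    model_dir = Path(tmp_dir) / "model"
    model_dir.mkdir()
    (model_dir / "MLmodel").write_text("flavors: {}\n")

    empty_ok = looks_like_mlflow_model(empty_dir)
    model_ok = looks_like_mlflow_model(model_dir)

print(empty_ok, model_ok)
```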

## 3.1 Configure the deployment

In [None]:
# create a blue deployment
logger.info("Configuring model deployment")
try:
    model = Model(
        path="sklearn-diabetes/model",
        type=AssetTypes.MLFLOW_MODEL,
        description="my sample mlflow model",
    )
    logger.info(f"Model configured: {model.path}")
    
    blue_deployment = ManagedOnlineDeployment(
        name="blue",
        endpoint_name=online_endpoint_name,
        model=model,
        instance_type="Standard_F4s_v2",
        instance_count=1,
    )
    
    # Print deployment configuration details
    print(f"Deployment configuration:")
    print(f"  Name: {blue_deployment.name}")
    print(f"  Endpoint: {blue_deployment.endpoint_name}")
    print(f"  Model: {model.path} (Type: {model.type})")
    print(f"  VM Size: {blue_deployment.instance_type}")
    print(f"  Instance Count: {blue_deployment.instance_count}")
except Exception as e:
    logger.error(f"Error configuring deployment: {str(e)}")
    raise

## 3.2 Create the deployment

Using the `MLClient` created earlier, we will now create the deployment in the workspace. `begin_create_or_update` starts the deployment creation and returns a poller; calling `.result()` on that poller blocks until the creation has finished, which can take several minutes.

In [None]:
# Create the deployment with progress tracking
try:
    logger.info("Starting blue deployment creation...")
    print("Creating deployment (this may take several minutes)...")
    start_time = time.time()
    deployment_result = ml_client.online_deployments.begin_create_or_update(blue_deployment).result()
    end_time = time.time()
    deployment_time = end_time - start_time
    logger.info(f"Deployment completed in {deployment_time:.2f} seconds")
    
    # Display detailed deployment information
    print(f"\nDeployment created successfully:")
    print(f"  Name: {deployment_result.name}")
    print(f"  Provisioning State: {deployment_result.provisioning_state}")
    print(f"  Deployment Time: {deployment_time:.2f} seconds")
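    # The service may return the model as an ID string rather than a Model object
    print(f"  Model: {getattr(deployment_result.model, 'name', deployment_result.model)}")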
except Exception as e:
    logger.error(f"Error creating deployment: {str(e)}")
    print(f"Deployment failed: {str(e)}")
    raise

In [None]:
# Update traffic allocation with better error handling
try:
    logger.info("Updating traffic allocation to 100% for blue deployment")
    endpoint.traffic = {"blue": 100}
    traffic_update = ml_client.begin_create_or_update(endpoint).result()
    logger.info("Traffic allocation updated successfully")
    
    # Print traffic allocation details
    print(f"Traffic allocation updated:")
    print(f"  Deployment 'blue': {endpoint.traffic['blue']}%")
except Exception as e:
    logger.error(f"Error updating traffic allocation: {str(e)}")
    raise
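The `traffic` attribute maps deployment names to the percentage of requests each one receives. The sketch below validates such a mapping before it is sent to the service. It assumes the percentages must be whole numbers that add up to either 0 or 100; `validate_traffic` is a hypothetical helper, and the `green` deployment is illustrative only, since this notebook creates just `blue`.

```python
def validate_traffic(traffic):
    """Hypothetical check, assuming percentages must total 0 or 100."""
    for deployment_name, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"{deployment_name}: {percent!r} is not a whole percentage")
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"traffic adds up to {total}, expected 0 or 100")
    return traffic


# All traffic to one deployment, as in this notebook.
print(validate_traffic({"blue": 100}))

# A gradual rollout to a second, hypothetical deployment.
print(validate_traffic({"blue": 90, "green": 10}))
```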

# 4. Test the deployment

Using the `MLClient` created earlier, we will get a handle to the endpoint. The endpoint can be invoked using the invoke command with the following parameters:

- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint

We will send a sample request using a [sample-request-sklearn.json](sample-request-sklearn.json) file.
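For an MLflow model deployed without a scoring script, the request body is typically a JSON document with an `input_data` object holding `columns`, `index` and `data`. The sketch below builds such a body. The column names are those of the scikit-learn diabetes dataset and are assumed to match the sample model; the numeric values are made up for illustration, and `build_request` is a hypothetical helper.

```python
import json

# Feature names of the scikit-learn diabetes dataset (assumed to match the sample model).
COLUMNS = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]


def build_request(rows):
    """Hypothetical helper: wrap rows of feature values in an input_data payload."""
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values per row, got {len(row)}")
    return {
        "input_data": {
            "columns": COLUMNS,
            "index": list(range(len(rows))),
            "data": rows,
        }
    }


# Illustrative values only; real requests carry actual feature values.
rows = [[0.0] * 10, [0.1] * 10]
request_body = json.dumps(build_request(rows), indent=2)
print(request_body)
```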

In [None]:
# Test the deployment with improved visualization
try:
    logger.info(f"Invoking endpoint {online_endpoint_name} with deployment 'blue'")
    print("Sending test request to endpoint...")
    
    # Display the input data for reference
    with open("sample-request-sklearn.json", "r") as f:
        request_data = json.load(f)
    print("\nRequest data:")
    display(JSON(request_data))
    
    # Invoke the endpoint
    response = ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name="blue",
        request_file="sample-request-sklearn.json",
    )
    
    # Display the response
    print("\nResponse:")
    try:
        response_json = json.loads(response)
        display(JSON(response_json))
    except json.JSONDecodeError:
        print(response)
        
    logger.info("Endpoint invocation completed successfully")
except Exception as e:
    logger.error(f"Error invoking endpoint: {str(e)}")
    print(f"Endpoint invocation failed: {str(e)}")
    raise
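Outside the SDK, a client can call the endpoint's scoring URI over HTTPS, passing the key or token as a bearer token. The sketch below only builds the request and prints its parts; it does not send anything. The URI and key are placeholders, `build_scoring_request` is a hypothetical helper, and the `azureml-model-deployment` header is used on the assumption that it routes the request to one specific deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Hypothetical helper: prepare a POST request for a scoring URI."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name:
        # Assumed header for targeting a single deployment behind the endpoint.
        headers["azureml-model-deployment"] = deployment_name
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder, not a real scoring URI
    "<ENDPOINT_KEY>",
    {"input_data": {"columns": [], "index": [], "data": []}},
    deployment_name="blue",
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
# Sending is left out on purpose: urllib.request.urlopen(request) would make the call.
```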

# 5. Get endpoint details

In [None]:
# Get the endpoint details with better visualization
try:
    logger.info(f"Retrieving details for endpoint: {online_endpoint_name}")
    endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
    
    # Print comprehensive endpoint details
    print(f"\nEndpoint Details:")
    print(f"  Name: {endpoint.name}")
    print(f"  Scoring URI: {endpoint.scoring_uri}")
    print(f"  Traffic Configuration: {endpoint.traffic}")
    print(f"  Auth Mode: {endpoint.auth_mode}")
    print(f"  Provisioning State: {endpoint.provisioning_state}")
    print(f"  Created On: {endpoint.creation_context.created_on}")
    print(f"  Created By: {endpoint.creation_context.created_by}")
    
    # Get and display deployment details
    print("\nDeployment Details:")
    deployment = ml_client.online_deployments.get(name="blue", endpoint_name=endpoint.name)
    print(f"  Name: {deployment.name}")
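    # The service may return the model as an ID string rather than a Model object
    print(f"  Model: {getattr(deployment.model, 'name', deployment.model)}")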
    print(f"  Instance Type: {deployment.instance_type}")
    print(f"  Instance Count: {deployment.instance_count}")
    print(f"  Provisioning State: {deployment.provisioning_state}")
    
    logger.info("Successfully retrieved endpoint and deployment details")
except Exception as e:
    logger.error(f"Error retrieving endpoint details: {str(e)}")
    raise

# 6. Delete the deployment and endpoint

In [None]:
# Delete the endpoint with confirmation and status updates
try:
    confirm = input(f"Type 'yes' to confirm deletion of endpoint '{online_endpoint_name}': ")
    if confirm.lower() == 'yes':
        logger.info(f"Starting deletion of endpoint {online_endpoint_name}")
        print("Deleting endpoint (this may take a few minutes)...")
        start_time = time.time()
        operation = ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
        result = operation.result()
        end_time = time.time()
        logger.info(f"Endpoint deletion completed in {end_time - start_time:.2f} seconds")
        print(f"Endpoint '{online_endpoint_name}' deleted successfully")
    else:
        print("Deletion cancelled")
        logger.info("Endpoint deletion cancelled by user")
except Exception as e:
    logger.error(f"Error deleting endpoint: {str(e)}")
    print(f"Error deleting endpoint: {str(e)}")

In [None]:
# Display notebook execution summary
logger.info("Notebook execution completed")
print("\nNotebook Execution Summary:")
print(f"  Endpoint Name: {online_endpoint_name}")
print(f"  Model Type: MLflow model (sklearn-diabetes)")
print(f"  Deployment Name: blue")
print("\nIf you confirmed the deletion above, the endpoint and its deployments have been removed.")