# Deploy MLflow model to online endpoints with a custom environment and scoring script

Learn how to deploy your [MLflow](https://www.mlflow.org/) model to an [online endpoint](https://docs.microsoft.com/azure/machine-learning/concept-endpoints) using a custom environment and scoring script.

### Requirements

To follow this tutorial, you will need:
- Familiarity with online endpoints, which this sample notebook assumes you're using; for more information, see [What are Azure Machine Learning endpoints?](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../jobs/configuration.ipynb)
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../README.md) - check the getting started section

# 1. Connect to Azure Machine Learning Workspace
The [workspace](https://docs.microsoft.com/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the endpoint and deployment will be created.

## 1.1 Import the required libraries

In [None]:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential
from azure.ai.ml.constants import AssetTypes

# Additional imports for logging
import logging
import os
import json
import time
import sys
from IPython.display import display, JSON

In [None]:
# Set up basic logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s | %(levelname)s | %(message)s',
    handlers=[logging.StreamHandler(sys.stdout)]
)
logger = logging.getLogger('mlflow-deploy')

# Check if running in CI environment
is_ci = os.environ.get('CI', 'false').lower() == 'true'

# Simple wrapper to log section headers
def log_section(title):
    logger.info(f"\n{'=' * 30}\n{title}\n{'=' * 30}")

logger.info("Starting MLflow model deployment notebook")

## 1.2 Configure workspace details and get a handle to the workspace

To connect to a workspace, we need three identifier parameters: a subscription ID, a resource group, and a workspace name. We pass these details to `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. This tutorial uses the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python). Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
# enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

# Use environment variables in CI environments
if os.environ.get('CI', 'false').lower() == 'true':
    subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", subscription_id)
    resource_group = os.environ.get("AZURE_RESOURCE_GROUP", resource_group)
    workspace = os.environ.get("AZURE_ML_WORKSPACE", workspace)
    logger.info("Using environment variables for workspace configuration")

logger.info(f"Workspace configuration: {workspace} in {resource_group}")
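
The placeholder values above have to be replaced before the client can reach a workspace. A small guard such as the following sketch can catch a forgotten placeholder early. The helper name `find_unset_placeholders` is hypothetical (it is not part of the SDK), and the check assumes unset values keep the `<...>` form used in this notebook.

```python
import re

# Assumption: an unset value still looks like "<SUBSCRIPTION_ID>".
_PLACEHOLDER = re.compile(r"^<[A-Z_]+>$")


def find_unset_placeholders(settings):
    """Return the names of settings that are empty or still hold a <PLACEHOLDER>."""
    return [
        name
        for name, value in settings.items()
        if not value or _PLACEHOLDER.match(value)
    ]


# Example: one filled-in value and two untouched placeholders.
unset = find_unset_placeholders(
    {
        "subscription_id": "00000000-0000-0000-0000-000000000000",
        "resource_group": "<RESOURCE_GROUP>",
        "workspace": "<AML_WORKSPACE_NAME>",
    }
)
print(unset)
```

Raising an error when the returned list is non-empty would stop the notebook before any authentication attempt is made.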

In [None]:
# get a handle to the workspace
try:
    log_section("Connecting to Azure ML Workspace")
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )
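    # MLClient is created lazily, so fetch the workspace to confirm the connection;
    # the region is a property of the workspace, not of the client
    ws = ml_client.workspaces.get(ml_client.workspace_name)
    logger.info(f"Successfully connected to {ws.name} in {ws.location}")
    
    # Print workspace information for user
    print(f"Workspace: {ws.name}")
    print(f"Resource Group: {ml_client.resource_group_name}")
    print(f"Location: {ws.location}")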
except Exception as e:
    logger.error(f"Failed to connect to workspace: {str(e)}")
    if is_ci:
        print(f"::error::Workspace connection failed: {str(e)}")
    raise

Or if you are working in a compute instance in Azure Machine Learning:

In [None]:
# If working in a compute instance in Azure Machine Learning
try:
    if 'AZUREML_SERVICE_ENDPOINT' in os.environ:
        logger.info("Using compute instance connection method")
        ml_client = MLClient.from_config(DefaultAzureCredential())
        logger.info(f"Connected to workspace: {ml_client.workspace_name}")
except Exception as e:
    logger.warning(f"Note: Compute instance connection attempt failed: {str(e)}")

# 2. Create Online Endpoint

Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.

To create an online endpoint we will use `ManagedOnlineEndpoint`. This class lets you configure the following key aspects:

- `name` - Name of the endpoint. Needs to be unique at the Azure region level
- `auth_mode` - The authentication method for the endpoint. Key-based authentication and Azure ML token-based authentication are supported. Key-based authentication doesn't expire but Azure ML token-based authentication does. Possible values are `key` or `aml_token`.
- `identity`- The managed identity configuration for accessing Azure resources for endpoint provisioning and inference.
    - `type`- The type of managed identity. Azure Machine Learning supports `system_assigned` or `user_assigned`.
    - `user_assigned_identities` - List (array) of fully qualified resource IDs of the user-assigned identities. This property is required if `identity.type` is `user_assigned`.
- `description`- Description of the endpoint.
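
Because the endpoint name must be unique within a region and is generated programmatically later in this notebook, it can help to sanity-check a candidate name locally before sending it to the service. The sketch below assumes the naming rule is "starts with a letter, then letters, digits, or hyphens, 3 to 32 characters in total"; treat that rule as an assumption and confirm it against the current Azure Machine Learning limits. The function name `is_plausible_endpoint_name` is hypothetical.

```python
import re

# Assumed rule: a leading letter followed by letters, digits, or hyphens,
# for a total length of 3 to 32 characters.
_ENDPOINT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,31}")


def is_plausible_endpoint_name(name):
    """Return True if the name satisfies the assumed naming rule."""
    return _ENDPOINT_NAME.fullmatch(name) is not None


print(is_plausible_endpoint_name("diabetes-01021530-ab1cd"))
print(is_plausible_endpoint_name("1-starts-with-digit"))
print(is_plausible_endpoint_name("x" * 40))
```

A local check only covers the format; uniqueness within the region can only be confirmed by the service when the endpoint is created.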

## 2.1 Configure the endpoint

In [None]:
import random
import string
import datetime

log_section("Creating Online Endpoint")

# Create unique name with timestamp and random suffix
timestamp = datetime.datetime.now().strftime("%m%d%H%M")
suffix = ''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(5))
endpoint_name = f"diabetes-{timestamp}-{suffix}"

# Add CI prefix if running in CI environment
if is_ci:
    endpoint_name = f"ci-{endpoint_name}"

logger.info(f"Endpoint name: {endpoint_name}")

In [None]:
# create an online endpoint
try:
    logger.info("Configuring endpoint")
    
    # Add CI-specific tags if in CI environment
    tags = {"foo": "bar"} 
    if is_ci:
        tags.update({
            "ci_run": "true",
            "run_id": os.environ.get("GITHUB_RUN_ID", "unknown")
        })
    
    endpoint = ManagedOnlineEndpoint(
        name=endpoint_name,
        description="Online endpoint for diabetes prediction using MLflow model",
        auth_mode="key",
        tags=tags,
    )
    
    logger.info("Endpoint configuration completed")
    print(f"Endpoint: {endpoint.name}")
    print(f"Auth Mode: {endpoint.auth_mode}")
except Exception as e:
    logger.error(f"Error configuring endpoint: {str(e)}")
    if is_ci:
        print(f"::error::Endpoint configuration failed: {str(e)}")
    raise

## 2.2 Create the endpoint
Using the `MLClient` created earlier, we will now create the endpoint in the workspace. `begin_create_or_update` starts the creation and returns a poller right away; calling `.result()` on that poller, as the cell below does, blocks until provisioning has finished.
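
If you would rather see progress while waiting, the poller can be checked in a loop instead of blocking on it directly. The following is a minimal sketch: `wait_with_progress` is a hypothetical helper that relies only on the `done()` and `result()` methods of the returned poller, and `FakePoller` is a stand-in so the example runs without an Azure connection.

```python
import time


def wait_with_progress(poller, poll_interval=10.0, report=print):
    """Block until a long-running operation finishes, reporting elapsed time."""
    start = time.monotonic()
    while not poller.done():
        report(f"still provisioning after {time.monotonic() - start:.0f}s")
        time.sleep(poll_interval)
    return poller.result()


class FakePoller:
    """Hypothetical stand-in for the SDK poller so the sketch runs offline."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def done(self):
        # Report "not done" for the first `polls_until_done` checks.
        self._remaining -= 1
        return self._remaining < 0

    def result(self):
        return "Succeeded"


messages = []
state = wait_with_progress(FakePoller(2), poll_interval=0.01, report=messages.append)
print(state, len(messages))
```

With a real client, the call would look like `wait_with_progress(ml_client.online_endpoints.begin_create_or_update(endpoint))`.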

In [None]:
# create the endpoint in the workspace
try:
    logger.info("Creating endpoint - this may take several minutes...")
    start_time = time.time()
    
    # Create the endpoint
    result = ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    
    # Log completion info
    duration = time.time() - start_time
    logger.info(f"Endpoint created successfully in {duration:.1f} seconds")
    logger.info(f"Scoring URI: {result.scoring_uri}")
    
    # Display details for user
    print(f"\nEndpoint created:")
    print(f"  Name: {result.name}")
    print(f"  Scoring URI: {result.scoring_uri}")
    print(f"  State: {result.provisioning_state}")
except Exception as e:
    logger.error(f"Failed to create endpoint: {str(e)}")
    if is_ci:
        print(f"::error::Endpoint creation failed: {str(e)}")
    raise

# 3. Create a blue deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ManagedOnlineDeployment` class. This class lets you configure the following key aspects:

- `name` - Name of the deployment.
- `endpoint_name` - Name of the endpoint to create the deployment under.
- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.
- `environment` - The environment where the model will run.
- `code_configuration` - The scoring script used to serve the model.
- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](https://docs.microsoft.com/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).
- `instance_count` - The number of instances to use for the deployment

## 3.1 Configure the deployment

Registering the model:

In [None]:
# Register the model
try:
    log_section("Registering Model")
    logger.info("Configuring model")
    
    # Use consistent naming for the model
    model_name = "sklearn-diabetes"
    model_path = "sklearn-diabetes/model"
    
    # Log model information 
    logger.info(f"Model path: {model_path}")
    
    # Verify model path exists
    if not os.path.exists(model_path):
        logger.error(f"Model path not found: {model_path}")
        raise FileNotFoundError(f"Model path not found: {model_path}")
    
    model = Model(
        name=model_name,  # register under the name defined above
        path=model_path,
        type=AssetTypes.MLFLOW_MODEL,
        description="MLflow model for diabetes prediction",
    )
    logger.info(f"Model configured: {model.path} (Type: {model.type})")
except Exception as e:
    logger.error(f"Error configuring model: {str(e)}")
    if is_ci:
        print(f"::error::Model configuration failed: {str(e)}")
    raise

Creating an environment to perform inference:

In [None]:
# Create custom environment for model deployment
try:
    log_section("Creating Environment")
    logger.info("Configuring environment")
    
    conda_file = "sklearn-diabetes/environment/conda.yaml"
    
    # Verify conda file exists
    if not os.path.exists(conda_file):
        logger.error(f"Conda file not found: {conda_file}")
        raise FileNotFoundError(f"Conda file not found: {conda_file}")
    
    environment = Environment(
        conda_file=conda_file,
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    )
    
    logger.info(f"Environment configured with conda file: {conda_file}")
except Exception as e:
    logger.error(f"Error configuring environment: {str(e)}")
    if is_ci:
        print(f"::error::Environment configuration failed: {str(e)}")
    raise

Creating the deployment:

In [None]:
# Configure the deployment
try:
    log_section("Configuring Deployment")
    logger.info("Setting up deployment configuration")
    
    # Validate code path and scoring script
    code_path = "sklearn-diabetes/src"
    scoring_script = "score.py"
    full_script_path = os.path.join(code_path, scoring_script)
    
    if not os.path.exists(code_path):
        logger.error(f"Code directory not found: {code_path}")
        raise FileNotFoundError(f"Code directory not found: {code_path}")
        
    if not os.path.exists(full_script_path):
        logger.error(f"Scoring script not found: {full_script_path}")
        raise FileNotFoundError(f"Scoring script not found: {full_script_path}")
    
    logger.info(f"Using scoring script: {full_script_path}")
    
    # Create deployment config
    blue_deployment = ManagedOnlineDeployment(
        name="blue",
        endpoint_name=endpoint_name,
        model=model,
        environment=environment,
        code_configuration=CodeConfiguration(
            code=code_path, scoring_script=scoring_script
        ),
        instance_type="Standard_F4s_v2",
        instance_count=1,
    )
    
    logger.info(f"Configured deployment '{blue_deployment.name}' with {blue_deployment.instance_count}x {blue_deployment.instance_type}")
except Exception as e:
    logger.error(f"Error configuring deployment: {str(e)}")
    if is_ci:
        print(f"::error::Deployment configuration failed: {str(e)}")
    raise

## 3.2 Create the deployment

Using the `MLClient` created earlier, we will now create the deployment in the workspace. `begin_create_or_update` starts the deployment and returns a poller right away; calling `.result()` on that poller, as the cell below does, blocks until the deployment has finished.

In [None]:
# create the deployment 
try:
    log_section("Creating Deployment")
    logger.info("Starting deployment - this may take 5-10 minutes...")
    print("Creating deployment (this will take several minutes)...")
    
    # Track deployment time
    start_time = time.time()
    
    # Create the deployment
    result = ml_client.online_deployments.begin_create_or_update(blue_deployment).result()
    
    # Log completion information
    duration = time.time() - start_time
    logger.info(f"Deployment completed in {duration/60:.1f} minutes")
    
    # Display deployment details
    print(f"\nDeployment created:")
    print(f"  Name: {result.name}")
    print(f"  State: {result.provisioning_state}")
    # The returned deployment references the model by its ID string
    print(f"  Model: {result.model}")
except Exception as e:
    logger.error(f"Deployment failed: {str(e)}")
    if is_ci:
        print(f"::error::Deployment creation failed: {str(e)}")
    print(f"Error: {str(e)}")
    raise

In [None]:
# Update endpoint traffic to use the blue deployment
try:
    log_section("Updating Traffic Allocation")
    logger.info("Setting traffic allocation to blue deployment")
    
    # Update the traffic allocation
    endpoint.traffic = {"blue": 100}
    ml_client.begin_create_or_update(endpoint).result()
    
    logger.info("Traffic successfully allocated to blue deployment")
    print("All traffic is now directed to the 'blue' deployment")
except Exception as e:
    logger.error(f"Failed to update traffic allocation: {str(e)}")
    if is_ci:
        print(f"::error::Traffic allocation failed: {str(e)}")
    raise
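
The `traffic` property maps deployment names to percentages. Before sending an update, the mapping can be checked locally. This sketch assumes that each percentage is an integer from 0 to 100 and that the values must total either 0 or 100; verify that rule against the service documentation. The helper `validate_traffic` and the `green` deployment in the example are hypothetical.

```python
def validate_traffic(traffic):
    """Raise ValueError unless percentages are 0-100 integers totalling 0 or 100."""
    for name, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(
                f"{name!r}: expected an integer from 0 to 100, got {percent!r}"
            )
    total = sum(traffic.values())
    if total not in (0, 100):
        raise ValueError(f"traffic must total 0 or 100, got {total}")
    return traffic


# All traffic to blue, as in this notebook.
validate_traffic({"blue": 100})

# A hypothetical split for a gradual rollout to a second deployment.
validate_traffic({"blue": 90, "green": 10})
```

Calling `validate_traffic` just before assigning `endpoint.traffic` turns a typo in the percentages into an immediate local error instead of a failed update.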

# 4. Test the deployment

Using the `MLClient` created earlier, we will get a handle to the endpoint. The endpoint can be invoked with the `invoke` method of `ml_client.online_endpoints`, which takes the following parameters:

- `endpoint_name` - Name of the endpoint
- `request_file` - File with request data
- `deployment_name` - Name of the specific deployment to test in an endpoint

We will send a sample request using a [sample-request-sklearn-custom.json](sample-request-sklearn-custom.json) file.

In [None]:
# test the blue deployment
try:
    log_section("Testing Deployment")
    logger.info("Sending test request to the endpoint")
    
    # Check if request file exists
    request_file = "sample-request-sklearn-custom.json"
    if not os.path.exists(request_file):
        logger.error(f"Request file not found: {request_file}")
        raise FileNotFoundError(f"Request file not found: {request_file}")
    
    # Show request data for reference
    with open(request_file, "r") as f:
        request_data = json.load(f)
        print("Request data:")
        display(JSON(request_data))
    
    # Invoke the endpoint
    start_time = time.time()
    response = ml_client.online_endpoints.invoke(
        endpoint_name=endpoint_name,
        deployment_name="blue",
        request_file=request_file,
    )
    duration = time.time() - start_time
    
    # Log and display the response
    logger.info(f"Request completed in {duration:.2f} seconds")
    print("\nResponse:")
    
    # Try to parse and display as JSON
    try:
        response_json = json.loads(response)
        display(JSON(response_json))
        logger.info("Response received successfully")
    except json.JSONDecodeError:
        # Fall back to plain text display
        print(response)
        logger.info("Response is not JSON format")
except Exception as e:
    logger.error(f"Error testing endpoint: {str(e)}")
    if is_ci:
        print(f"::error::Endpoint testing failed: {str(e)}")
    raise

# 5. Get endpoint details

In [None]:
# Get the details for online endpoint
try:
    log_section("Getting Endpoint Details")
    logger.info(f"Retrieving endpoint details for {endpoint_name}")
    
    # Get endpoint details
    endpoint = ml_client.online_endpoints.get(name=endpoint_name)
    
    # Log key information
    logger.info(f"Endpoint state: {endpoint.provisioning_state}")
    logger.info(f"Scoring URI: {endpoint.scoring_uri}")
    
    # Display details for user
    print(f"Endpoint: {endpoint.name}")
    print(f"Scoring URI: {endpoint.scoring_uri}")
    print(f"Auth mode: {endpoint.auth_mode}")
    
    # Get and display deployment details
    deployment = ml_client.online_deployments.get(name="blue", endpoint_name=endpoint_name)
    print(f"\nDeployment: {deployment.name}")
    # The retrieved deployment references the model by its ID string
    print(f"Model: {deployment.model}")
    print(f"VM Size: {deployment.instance_type}")
    print(f"Instance count: {deployment.instance_count}")
    
    # Output variables for CI/CD pipelines
    if is_ci:
        print(f"::set-output name=endpoint_name::{endpoint.name}")
        print(f"::set-output name=scoring_uri::{endpoint.scoring_uri}")
except Exception as e:
    logger.error(f"Error retrieving endpoint details: {str(e)}")
    if is_ci:
        print(f"::error::Getting endpoint details failed: {str(e)}")
    raise
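
With the scoring URI and a key in hand, the endpoint can also be called over plain HTTPS from outside the SDK. The sketch below only builds the request and does not send it. It assumes key-based authentication (as configured with `auth_mode="key"`), where the key is passed as a bearer token, and it uses the `azureml-model-deployment` header to target a specific deployment. The URL, key, and payload are placeholders, and `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name:
        # Routes the call to one deployment instead of following the traffic split.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder for endpoint.scoring_uri
    "<ENDPOINT_KEY>",  # placeholder for the endpoint key
    {"data": [[0.1, 0.2]]},  # hypothetical payload; match your scoring script
    deployment_name="blue",
)
print(request.get_method(), request.full_url)
```

The key can be read with `ml_client.online_endpoints.get_keys(name=endpoint_name).primary_key`, and the request can then be sent with `urllib.request.urlopen(request)`.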

# 6. Delete the deployment and endpoint

In [None]:
# Delete the endpoint
try:
    log_section("Cleaning Up Resources")
    
    # Determine if deletion should happen
    should_delete = False
    if is_ci:
        logger.info("CI environment detected - will clean up automatically")
        should_delete = True
    else:
        # Ask for confirmation in interactive mode
        confirm = input(f"Do you want to delete the endpoint '{endpoint_name}'? (yes/no): ")
        should_delete = confirm.lower() in ['y', 'yes']
        if not should_delete:
            logger.info("Deletion cancelled by user")
            print(f"⚠️  Remember to delete the endpoint '{endpoint_name}' when no longer needed")
    
    # Perform deletion if confirmed
    if should_delete:
        logger.info(f"Deleting endpoint {endpoint_name}")
        print("Deleting endpoint...")
        
        start_time = time.time()
        ml_client.online_endpoints.begin_delete(name=endpoint_name).result()
        duration = time.time() - start_time
        
        logger.info(f"Endpoint deleted in {duration:.1f} seconds")
        print(f"Endpoint '{endpoint_name}' deleted successfully")
        
        # Set output for CI pipelines
        if is_ci:
            print("::set-output name=cleanup_status::success")
except Exception as e:
    logger.error(f"Error deleting endpoint: {str(e)}")
    if is_ci:
        print(f"::error::Endpoint deletion failed: {str(e)}")
        print("::set-output name=cleanup_status::failed")
    raise

In [None]:
# Summary of notebook execution
log_section("Notebook Summary")
logger.info("Notebook execution completed")

print("\nNotebook Execution Summary:")
print(f"  Endpoint: {endpoint_name}")
print(f"  Deployment: blue (Standard_F4s_v2)")
print(f"  Model: MLflow diabetes model")

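# Verify resources were cleaned up
from azure.core.exceptions import ResourceNotFoundError

try:
    # Try to get the endpoint - this raises ResourceNotFoundError if it was deleted
    ml_client.online_endpoints.get(name=endpoint_name)
    print(f"\n⚠️  Warning: Endpoint '{endpoint_name}' still exists and may incur costs")
except ResourceNotFoundError:
    print("\n✅ All resources have been cleaned up")
except Exception as e:
    # Any other failure (network, auth) says nothing about whether the endpoint exists
    logger.warning(f"Could not verify cleanup: {str(e)}")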