# Internal notes
- descriptions should be refined a lot
- not tested for work

# Open Questions
- Do we include two deployments (blue/green)?
- Do we check the status of the deployment after testing (sending sample data)?
    - described in [Azure ML Tutorial Work Sheet_MIR.docx](https://microsoft-my.sharepoint.com/:w:/p/saoh/EaMP222FLapPjqcMot-zWCwBBUy0RyCXGDOolxsMmhSwAg?e=zfOqvf&ovuser=72f988bf-86f1-41af-91ab-2d7cd011db47%2Cshnagata%40microsoft.com&clickparams=eyJBcHBOYW1lIjoiVGVhbXMtRGVza3RvcCIsIkFwcFZlcnNpb24iOiIyNy8yMjExMjkxNzQwMCIsIkhhc0ZlZGVyYXRlZFVzZXIiOmZhbHNlfQ%3D%3D)
    - I think checking the status before testing (right after deploying) is better (shohei)

# Deploy the model as an online endpoint

Now deploy your machine learning model as a web service in the Azure cloud, known as an [`online endpoint`](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).

To deploy a machine learning service, you usually need:

* The model assets (file, metadata) that you want to deploy. You've already registered these assets in your training job.
* Some code to run as a service. The code executes the model on a given input request. This entry script receives data submitted to the deployed web service, passes it to the model, and returns the model's response to the client. The script is specific to your model: it must understand the data that the model expects and returns. With an MLflow model, as in this tutorial, this script is created for you automatically. For sample scoring scripts, see the [azureml-examples repository](https://github.com/Azure/azureml-examples/tree/sdk-preview/sdk/endpoints/online).
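
To make the role of the entry script concrete, here is a minimal sketch of the shape such a script usually takes: an `init()` function that runs once at startup and a `run()` function that runs for each request. The placeholder model and the `"data"` key are assumptions for illustration only; the script that is generated for the MLflow model in this tutorial may look different.

```python
import json

model = None  # populated once by init()


def _placeholder_model(rows):
    # Hypothetical stand-in for a trained model: predicts 1 when a row sums to > 0.
    return [int(sum(row) > 0) for row in rows]


def init():
    """Runs once when the service starts: load the model into memory."""
    global model
    # A real script would load the registered model files here instead
    # (for example, from the directory named by AZUREML_MODEL_DIR).
    model = _placeholder_model


def run(raw_data):
    """Runs for each request: parse the JSON body, score it, return the result."""
    payload = json.loads(raw_data)
    rows = payload["data"]  # assumed request layout for this sketch
    return {"predictions": model(rows)}
```

Calling `init()` and then `run('{"data": [[1, 2], [-3, 1]]}')` exercises the same two-phase flow that the service follows.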

## Prerequisites
If you already completed tutorial 5, you have everything you need.

If you're starting from here, do the following first:

- Create a workspace and a compute resource.
- Configure the development environment as described in tutorial 2. Skip this step if you've already done it. We probably want to make explicit which additional environment configuration is needed for inferencing.
- We'll provide the model files ~~(they'll register later)~~.
- Make sure that you have enough quota for the compute resources (VM SKUs).

## Connect to the workspace (configure auth)
Before you dive into the code, you'll need to connect to your Azure ML workspace. The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

We're using `DefaultAzureCredential` to get access to the workspace.
`DefaultAzureCredential` handles most Azure SDK authentication scenarios.

If it doesn't work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
# Handle to the workspace
from azure.ai.ml import MLClient

# Authentication package
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="f66b853e-91bc-4852-9a4e-506ac873520a",
    resource_group_name="rg-azureml",
    workspace_name="ml-lab",
)
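
If you prefer not to hard-code the workspace details in the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the helper `workspace_settings` and the variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_ML_WORKSPACE` are hypothetical choices, not names that the SDK requires.

```python
import os

# Maps MLClient keyword arguments to (hypothetical) environment variable names.
REQUIRED_SETTINGS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def workspace_settings(environ=None):
    """Return the MLClient keyword arguments, or fail listing what is missing."""
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_SETTINGS.values() if not environ.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: environ[var] for arg, var in REQUIRED_SETTINGS.items()}
```

With the variables set, the client could then be created with `MLClient(credential=credential, **workspace_settings())`.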

## Import required AzureML libraries for model deployment

In [None]:
# import libraries
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
)

## Check what you need
Again, these resources must be specified in the definition of the online deployment:

- Model:
    - Must be registered.
- Environment (specific to the model):
    - A conda.yaml file and a Docker image (you don't need to install Docker; you only need a link to the image).
    - Generated automatically for MLflow models, so you don't need to specify it this time.
- Compute resource:
    - The VM SKU must be specified.
- Scoring script:
    - Takes input requests and returns the scored results.
    - Generated automatically for MLflow models, so you don't need to specify it this time.

## Check if the model is registered
If you already completed tutorial 5, your model was registered in the training script.
We recommend registering the model as a best practice.

You can check the **Models** page in Azure ML studio to identify the latest version of your registered model. Alternatively, the code below retrieves the latest version number for you to use.

In [None]:
registered_model_name = "credit_defaults_model"

# Let's pick the latest version of the model
latest_model_version = max(
    [int(m.version) for m in ml_client.models.list(name=registered_model_name)]
)
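
Note that `max()` raises a `ValueError` on an empty sequence, which is what happens if no model with that name has been registered. The sketch below wraps the same logic in a hypothetical helper, `latest_version`, that fails with a clearer message. It also shows why the versions are converted with `int()`: compared as strings, `"2"` would rank above `"10"`.

```python
from types import SimpleNamespace


def latest_version(models, model_name):
    """Return the highest numeric version among the given model entries."""
    versions = [int(m.version) for m in models]
    if not versions:
        raise LookupError(f'No registered versions found for model "{model_name}"')
    return max(versions)


# Stand-in objects with a `version` attribute. In the notebook you would pass
# ml_client.models.list(name=registered_model_name) instead.
example_models = [
    SimpleNamespace(version="1"),
    SimpleNamespace(version="10"),
    SimpleNamespace(version="2"),
]
print(latest_version(example_models, "credit_defaults_model"))  # 10
```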

## Create a new online endpoint

Now that you have a registered model and an inference script (auto-generated this time), it's time to create your online endpoint. The endpoint name needs to be unique in the entire Azure region. For this tutorial, you'll create a unique name using [`UUID`](https://en.wikipedia.org/wiki/Universally_unique_identifier).

> [!NOTE]
> Expect the endpoint creation to take approximately 6 to 8 minutes.

In [None]:
import uuid

# Creating a unique name for the endpoint
online_endpoint_name = "credit-endpoint-" + str(uuid.uuid4())[:8]
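
Endpoint names are also subject to naming rules. The sketch below generates the name the same way and checks it before any request is sent to Azure. The rule encoded in `ENDPOINT_NAME_PATTERN` (3 to 32 characters, starting with a letter, then letters, digits, or hyphens) is an assumption to verify against the current Azure ML documentation, and `make_endpoint_name` is a hypothetical helper.

```python
import re
import uuid

# Assumed naming rule; check the current Azure ML documentation before relying on it.
ENDPOINT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,31}")


def make_endpoint_name(prefix="credit-endpoint-"):
    """Build a name from the prefix plus 8 characters of a random UUID."""
    name = prefix + str(uuid.uuid4())[:8]
    if not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Endpoint name {name!r} does not match the assumed naming rule")
    return name
```

Validating locally gives a faster and more readable failure than waiting for the service to reject the name.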

In [None]:
# define an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="this is an online endpoint",
    auth_mode="key",
    tags={
        "training_dataset": "credit_defaults",
        "model_type": "sklearn.GradientBoostingClassifier",
    },
)

# create the online endpoint
endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

print(f"Endpoint {endpoint.name} provisioning state: {endpoint.provisioning_state}")

Once you've created an endpoint, you can retrieve it as follows:

In [None]:
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)

print(
    f'Endpoint "{endpoint.name}" with provisioning state "{endpoint.provisioning_state}" is retrieved'
)

## Deploy the model to the endpoint

Once the endpoint is created, deploy the model with the entry script (generated automatically for the MLflow model in this tutorial). Each endpoint can have multiple deployments, and you can specify rules that direct traffic to them. Here you'll create a single deployment that handles 100% of the incoming traffic. The deployment name is arbitrary; this tutorial uses color names, for example, *blue*, *green*, and *red*.

In [None]:
# picking the model to deploy. Here we use the latest version of our registered model
model = ml_client.models.get(name=registered_model_name, version=latest_model_version)


# define an online deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

# create the online deployment
blue_deployment = ml_client.online_deployments.begin_create_or_update(
    blue_deployment
).result()
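
The deployment definition above doesn't assign any traffic by itself. In the Azure ML Python SDK v2, traffic is set on the endpoint through its `traffic` attribute and applied by updating the endpoint. The sketch below shows that step; the helper name `route_all_traffic` is hypothetical, and the code assumes the `ml_client` and `endpoint` objects created earlier.

```python
def route_all_traffic(ml_client, endpoint, deployment_name):
    """Send 100% of the endpoint's traffic to one deployment and apply the change."""
    endpoint.traffic = {deployment_name: 100}
    # Updating the endpoint is a long-running operation; result() waits for it.
    return ml_client.online_endpoints.begin_create_or_update(endpoint).result()


# In the notebook, after the deployment has been created:
# endpoint = route_all_traffic(ml_client, endpoint, "blue")
```

The same `traffic` dictionary can hold several deployment names with percentages when traffic is split between deployments.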

## Check the status of the endpoint
You can check the status to see whether the model was deployed without error:

In [None]:
endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)
print(f"Name: {endpoint.name}\nStatus: {endpoint.provisioning_state}\nDescription: {endpoint.description}")
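
Printing the state requires you to read it yourself. As a sketch, the hypothetical helper below stops the notebook with an error when a resource isn't in the `Succeeded` state. It only assumes that the object has `name` and `provisioning_state` attributes, so it can be used for both the endpoint and the deployment.

```python
def ensure_succeeded(resource, kind):
    """Raise if the resource's provisioning state is anything but "Succeeded"."""
    state = resource.provisioning_state
    if state != "Succeeded":
        raise RuntimeError(
            f'{kind} "{resource.name}" is in state "{state}", expected "Succeeded"'
        )
    return state


# In the notebook:
# ensure_succeeded(ml_client.online_endpoints.get(name=online_endpoint_name), "Endpoint")
# ensure_succeeded(
#     ml_client.online_deployments.get(name="blue", endpoint_name=online_endpoint_name),
#     "Deployment",
# )
```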

## Test with a sample query (Manually send test data)

Now that the model is deployed to the endpoint, you can run inference with it.

Create a sample request file that follows the format expected by the `run` method of the scoring script. First, create a directory to hold the file:

In [None]:
# Create a directory to store a sample request file
import os
deploy_dir = "./deploy"
os.makedirs(deploy_dir, exist_ok=True)

Then write the sample request to a JSON file in that directory:

In [None]:
%%writefile {deploy_dir}/sample-request.json
{
  "input_data": {
    "columns": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
    "index": [0, 1],
    "data": [
            [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],
            [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]
        ]
  }
}
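
As an alternative to the `%%writefile` magic, the same file can be produced from Python, which also works outside a notebook. The sketch below builds the `input_data` structure shown above from a list of rows and checks that every row has the same number of values. The helper name `write_sample_request` is hypothetical.

```python
import json
import os


def write_sample_request(path, rows):
    """Write rows to `path` in the input_data layout used above."""
    if not rows:
        raise ValueError("rows must contain at least one record")
    n_columns = len(rows[0])
    if any(len(row) != n_columns for row in rows):
        raise ValueError("All rows must have the same number of values")
    payload = {
        "input_data": {
            "columns": list(range(n_columns)),  # 0, 1, 2, ... as in the file above
            "index": list(range(len(rows))),
            "data": rows,
        }
    }
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return payload
```

For example, `write_sample_request("./deploy/sample-request.json", rows)` with the two 23-value rows above recreates the same request.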

In [None]:
# test the blue deployment with some sample data
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="./deploy/sample-request.json",
    deployment_name="blue",
)
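
`invoke` returns the response body as a string. The sketch below turns it into a Python list, assuming that the endpoint answers with a JSON array containing one prediction per input row. That response shape is an assumption and depends on the deployed model, and `parse_predictions` is a hypothetical helper.

```python
import json


def parse_predictions(raw_response):
    """Decode the response string and check that it is a JSON list."""
    predictions = json.loads(raw_response)
    if not isinstance(predictions, list):
        raise ValueError(f"Expected a JSON list of predictions, got: {predictions!r}")
    return predictions


# Example with a made-up response string for two input rows:
print(parse_predictions("[1, 0]"))  # [1, 0]
```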

## Get logs of the deployment
Check the logs to see whether the endpoint and deployment were invoked successfully.
If you encounter errors, see [Troubleshooting online endpoints deployment](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-online-endpoints?tabs=cli).

In [None]:
logs = ml_client.online_deployments.get_logs(name="blue", endpoint_name=online_endpoint_name, lines=50)
print(logs)
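
When the log output is long, it can help to look only at the lines that hint at a problem. The sketch below is a simple keyword filter over the log text. The helper name `find_problem_lines` and the keyword list are illustrative choices, not part of the SDK.

```python
def find_problem_lines(logs, keywords=("error", "exception", "traceback")):
    """Return the log lines that contain any of the keywords (case-insensitive)."""
    return [
        line
        for line in logs.splitlines()
        if any(keyword in line.lower() for keyword in keywords)
    ]


# In the notebook:
# for line in find_problem_lines(logs):
#     print(line)
```

An empty result doesn't prove that the deployment is healthy; it only means that none of the keywords appeared in the retrieved lines.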

## Create a new deployment

## (scale)
Show manual, mention (link) auto-scale. Do this for model #2 only. 

## (split traffic)
Split production traffic between deployments

## Delete the old deployment