
Online Endpoints Model Profiler

Overview

Inferencing machine learning models is a time- and compute-intensive process. Quantifying inference performance is vital to making the best use of compute resources and reducing cost while meeting the desired performance SLA (e.g. latency, throughput).

Online Endpoints Model Profiler (Preview) provides a fully managed experience that makes it easy to benchmark the performance of models served through Online Endpoints.

  • Use the benchmarking tool of your choice.

  • Easy-to-use CLI experience.

  • Support for CI/CD MLOps pipelines to automate profiling.

  • Thorough performance report containing latency percentiles and resource utilization metrics.

A brief introduction to benchmarking tools

The Online Endpoints Model Profiler currently supports three benchmarking tools: wrk, wrk2, and labench.

  • wrk: wrk is a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue. For detailed info please refer to this link: https://github.com/wg/wrk.

  • wrk2: wrk2 is wrk modified to produce a constant-throughput load and accurate latency details up to the high 9s (i.e. it can report 99.9999th-percentile latency if run long enough). In addition to wrk's arguments, wrk2 takes a throughput argument (in total requests per second) via the --rate or -R parameter (default is 1000). For detailed info please refer to this link: https://github.com/giltene/wrk2.

  • labench: LaBench (for LAtency BENCHmark) is a tool that measures latency percentiles of HTTP GET or POST requests under very even and steady load. For detailed info please refer to this link: https://github.com/microsoft/LaBench.
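For intuition about the knobs these tools expose (several of which the profiler later maps onto environment variables such as DURATION and CONNECTIONS), here is what a manual run might look like. Nothing below is specific to the profiler: the endpoint URL is a placeholder, and since wrk/wrk2 default to GET, a small Lua script is needed to send POST scoring requests.

```shell
# Generate a minimal Lua script so wrk/wrk2 can send POST requests
# (without a script, wrk only issues GETs).
cat > post.lua <<'EOF'
wrk.method = "POST"
wrk.body   = '{"data": [[1,2,3,4,5,6,7,8,9,10]]}'
wrk.headers["Content-Type"] = "application/json"
EOF

# Illustrative invocations (echoed, not executed -- the URL is a placeholder):
# wrk:  2 threads, 64 connections, 60 s of open-loop load
echo 'wrk -t2 -c64 -d60s -s post.lua https://<endpoint>/score'
# wrk2: same shape, but pinned to a constant 100 requests/second
echo 'wrk2 -t2 -c64 -d60s -R100 -s post.lua https://<endpoint>/score'
```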

Prerequisites

Get started

Follow this example to get started with the model profiling experience.

Create an online endpoint

Follow the example in this tutorial to deploy a model using an online endpoint.

  • Replace the instance_type in deployment yaml file with your desired Azure VM SKU. VM SKUs vary in terms of computing power, price and availability in different Azure regions.

  • Tune request_settings.max_concurrent_requests_per_instance, which defines the concurrency level. The higher this setting, the higher the throughput the endpoint can achieve. If it is set higher than the endpoint can actually handle, inference requests may end up waiting in a queue, resulting in longer end-to-end latency.

  • If you plan to profile using multiple instance_type and request_settings.max_concurrent_requests_per_instance, please create one online deployment for each pair. You can attach all online deployments under the same online endpoint.
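One way to script the one-deployment-per-pair advice above is to stamp out a deployment YAML per (instance_type, max_concurrent_requests_per_instance) pair from a template. The template, pair list, and naming scheme below are illustrative; the actual `az ml online-deployment create` call is left commented out since it needs a live workspace.

```shell
# Illustrative trimmed template; placeholders are substituted per pair.
cat > deployment-template.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: INSTANCE_NAME
endpoint_name: my-endpoint
instance_type: INSTANCE_TYPE
instance_count: 1
request_settings:
  max_concurrent_requests_per_instance: MAX_CONCURRENT
EOF

# (SKU, max_concurrent_requests_per_instance) pairs to profile.
for pair in "Standard_F2s_v2 1024" "Standard_F4s_v2 2048"; do
  set -- $pair
  sku=$1; conc=$2
  # Deployment names must be lowercase alphanumerics and hyphens.
  name=$(echo "deploy-${sku}-${conc}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
  sed -e "s/INSTANCE_NAME/${name}/" \
      -e "s/INSTANCE_TYPE/${sku}/" \
      -e "s/MAX_CONCURRENT/${conc}/" \
      deployment-template.yml > "${name}.yml"
  # az ml online-deployment create -f "${name}.yml"   # requires a live workspace
done
```

All the generated deployments can then be attached under the same online endpoint, as noted above.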

Below is a sample YAML file that defines an online deployment.

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/sklearn_regression_model.pkl
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment: 
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1
request_settings:
  request_timeout_ms: 3000
  max_concurrent_requests_per_instance: 1024
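With the deployment YAML saved (as blue-deployment.yml, say), the endpoint and deployment can be created with CLI v2. This is a sketch: it assumes you are logged in, have the ml extension installed, and have defaults set for subscription, resource group, and workspace.

```shell
# Create the endpoint first, then the deployment defined by the YAML above.
# Assumes: az login done, ml extension installed, defaults configured for
# subscription / resource group / workspace.
az ml online-endpoint create --name my-endpoint
az ml online-deployment create -f blue-deployment.yml --all-traffic
```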

Create a compute to host the profiler

You will need a compute to host the profiler; it sends requests to the online endpoint and generates the performance report.

  • This compute is NOT the same one that you used above to deploy your model. Choose a compute SKU with adequate network bandwidth (considering the inference request payload size and the profiling traffic, we recommend Standard_F4s_v2) in the same region as the online endpoint.

    az ml compute create --name $PROFILER_COMPUTE_NAME --size $PROFILER_COMPUTE_SIZE --identity-type SystemAssigned --type amlcompute
  • Create proper role assignment for accessing online endpoint resources. The compute needs to have contributor role to the machine learning workspace. For more information, see Assign Azure roles using Azure CLI.

    compute_info=`az ml compute show --name $PROFILER_COMPUTE_NAME --query '{"id": id, "identity_object_id": identity.principal_id}' -o json`
    workspace_resource_id=`echo $compute_info | jq -r '.id' | sed 's/\(.*\)\/computes\/.*/\1/'`
    identity_object_id=`echo $compute_info | jq -r '.identity_object_id'`
    az role assignment create --role Contributor --assignee-object-id $identity_object_id --scope $workspace_resource_id
    if [[ $? -ne 0 ]]; then echo "Failed to create role assignment for compute $PROFILER_COMPUTE_NAME" && exit 1; fi

Create a profiling job

Understand a profiling job

A profiling job simulates how an online endpoint serves live requests: it generates load against the online endpoint and produces a performance report.

Below is a template yaml file that defines a profiling job.

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: >
  python -m online_endpoints_model_profiler --payload_path ${{inputs.payload}}
experiment_name: profiling-job
display_name: <% SKU_CONNECTION_PAIR %>
environment:
  image: mcr.microsoft.com/azureml/online-endpoints-model-profiler:latest
environment_variables:
  ONLINE_ENDPOINT: "<% ENDPOINT_NAME %>"
  DEPLOYMENT: "<% DEPLOYMENT_NAME %>"
  PROFILING_TOOL: "<% PROFILING_TOOL %>"
  DURATION: "<% DURATION %>"
  CONNECTIONS: "<% CONNECTIONS %>"
  TARGET_RPS: "<% TARGET_RPS %>"
  CLIENTS: "<% CLIENTS %>"
  TIMEOUT: "<% TIMEOUT %>"
  THREAD: "<% THREAD %>"
compute: "azureml:<% COMPUTE_NAME %>"
inputs:
  payload:
    type: uri_file
    path: azureml://datastores/workspaceblobstore/paths/profiling_payloads/<% ENDPOINT_NAME %>_payload.txt
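The `<% ... %>` placeholders in the template can be filled with a simple sed pass before submitting the job. The values below are illustrative; the snippet operates on a trimmed copy of the template above so it is self-contained.

```shell
# A trimmed copy of the job template above, for illustration:
cat > profiling-job-template.yml <<'EOF'
display_name: <% SKU_CONNECTION_PAIR %>
environment_variables:
  ONLINE_ENDPOINT: "<% ENDPOINT_NAME %>"
  DEPLOYMENT: "<% DEPLOYMENT_NAME %>"
  PROFILING_TOOL: "<% PROFILING_TOOL %>"
  DURATION: "<% DURATION %>"
  CONNECTIONS: "<% CONNECTIONS %>"
compute: "azureml:<% COMPUTE_NAME %>"
EOF

# Substitute the placeholders (values are illustrative).
sed -e 's/<% ENDPOINT_NAME %>/my-endpoint/' \
    -e 's/<% DEPLOYMENT_NAME %>/blue/' \
    -e 's/<% PROFILING_TOOL %>/wrk/' \
    -e 's/<% DURATION %>/300s/' \
    -e 's/<% CONNECTIONS %>/64/' \
    -e 's/<% SKU_CONNECTION_PAIR %>/Standard_F2s_v2-c64/' \
    -e 's/<% COMPUTE_NAME %>/profiler-compute/' \
    profiling-job-template.yml > profiling-job.yml
```

The resulting profiling-job.yml is what you pass to `az ml job create` later in this guide.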

YAML syntax

| Key | Type | Description | Allowed values | Default value |
| --- | --- | --- | --- | --- |
| command | string | The command for running the profiling job. | python -m online_endpoints_model_profiler --payload_path ${{inputs.payload}} | - |
| experiment_name | string | The experiment name of the profiling job. An experiment is a group of jobs. | - | - |
| display_name | string | The profiling job name. | - | A random string, such as willing_needle_wrzk3lt7j5 |
| environment.image | string | An Azure Machine Learning curated image containing benchmarking tools and profiling scripts. | mcr.microsoft.com/azureml/online-endpoints-model-profiler:latest | - |
| environment_variables | string | Environment variables for the profiling job. See the profiling-related and benchmarking-tool-related environment variable tables below. | - | - |
| compute | string | The AML compute for running the profiling job. | - | - |
| inputs.payload | string | Payload file that is stored in an AML registered datastore. See the example payload file content below. | - | - |

Profiling-related environment_variables

| Key | Description | Default value |
| --- | --- | --- |
| SUBSCRIPTION | Used together with RESOURCE_GROUP, WORKSPACE, ONLINE_ENDPOINT, DEPLOYMENT to form the profiling target. | Subscription of the profiling job |
| RESOURCE_GROUP | Used together with SUBSCRIPTION, WORKSPACE, ONLINE_ENDPOINT, DEPLOYMENT to form the profiling target. | Resource group of the profiling job |
| WORKSPACE | Used together with SUBSCRIPTION, RESOURCE_GROUP, ONLINE_ENDPOINT, DEPLOYMENT to form the profiling target. | AML workspace of the profiling job |
| ONLINE_ENDPOINT | Used together with SUBSCRIPTION, RESOURCE_GROUP, WORKSPACE, DEPLOYMENT to form the profiling target.<br>If not provided, SCORING_URI will be used as the profiling target.<br>If neither ONLINE_ENDPOINT/DEPLOYMENT nor SCORING_URI is provided, an error will be thrown. | - |
| DEPLOYMENT | Used together with SUBSCRIPTION, RESOURCE_GROUP, WORKSPACE, ONLINE_ENDPOINT to form the profiling target.<br>If not provided, SCORING_URI will be used as the profiling target.<br>If neither ONLINE_ENDPOINT/DEPLOYMENT nor SCORING_URI is provided, an error will be thrown. | - |
| IDENTITY_ACCESS_TOKEN | An optional AAD token for retrieving the online endpoint scoring_uri, access key, and resource usage metrics. Not necessary when the AML compute that runs the profiling job has contributor access to the workspace of the online endpoint.<br>Assigning appropriate permissions to the AML compute is recommended over providing this token, since the token might expire while the profiling job is running. | - |
| SCORING_URI | May be provided instead of the SUBSCRIPTION/RESOURCE_GROUP/WORKSPACE/ONLINE_ENDPOINT/DEPLOYMENT combination to define the profiling target. However, missing ONLINE_ENDPOINT/DEPLOYMENT info will lead to missing resource usage metrics in the final report. | - |
| SCORING_HEADERS | Any special headers necessary when invoking the profiling target, as a JSON object. | {"Content-Type": "application/json", "Authorization": "Bearer ${ONLINE_ENDPOINT_ACCESS_KEY}", "azureml-model-deployment": "${DEPLOYMENT}"} |
| PROFILING_TOOL | The name of the benchmarking tool. Currently supported: wrk, wrk2, labench. | wrk |
| PAYLOAD | A single string-format payload for invoking the profiling target, e.g. {"data": [[1,2,3,4,5,6,7,8,9,10], [10,9,8,7,6,5,4,3,2,1]]}.<br>If inputs.payload is provided in the profiling job YAML file, this env var is ignored. | - |
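For the inputs.payload route, the exact file format is not spelled out here; the sketch below assumes one JSON request body per line, consistent with the single-body PAYLOAD example above, and uses the sklearn example model's input schema. Treat both the format and the upload path as assumptions to verify against your own setup.

```shell
# Illustrative payload file: one request body per line (assumed format),
# matching the sklearn example model's input schema.
cat > my-endpoint_payload.txt <<'EOF'
{"data": [[1,2,3,4,5,6,7,8,9,10]]}
{"data": [[10,9,8,7,6,5,4,3,2,1]]}
EOF
# Upload hint (needs a live workspace): place the file at the
# workspaceblobstore path referenced by inputs.payload, e.g. with
# `az storage blob upload` or by registering it as a data asset.
```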

Benchmarking-tool-related environment_variables

| Key | Description | Default value | wrk | wrk2 | labench |
| --- | --- | --- | --- | --- | --- |
| DURATION | Period of time for running the benchmarking tool. | 300s | ✔️ | ✔️ | ✔️ |
| CONNECTIONS | Number of connections for the benchmarking tool. | Value of max_concurrent_requests_per_instance | ✔️ | ✔️ | |
| THREAD | Number of threads allocated for the benchmarking tool. | 1 | ✔️ | ✔️ | |
| TARGET_RPS | Target requests per second for the benchmarking tool. | 50 | | ✔️ | ✔️ |
| CLIENTS | Number of clients for the benchmarking tool. | Value of max_concurrent_requests_per_instance | | | ✔️ |
| TIMEOUT | Timeout in seconds for each request. | 10s | | | ✔️ |
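When picking TARGET_RPS together with CONNECTIONS or CLIENTS, Little's Law (concurrency ≈ arrival rate × per-request latency) gives a reasonable starting point. This is general queueing reasoning, not something the profiler computes for you; the numbers below are illustrative.

```shell
# Back-of-envelope sizing via Little's Law: L = lambda * W,
# i.e. concurrency = target RPS * expected per-request latency.
target_rps=50
latency_s=0.2    # expected per-request latency in seconds (illustrative)
awk -v rps="$target_rps" -v lat="$latency_s" \
    'BEGIN { printf "suggested connections: %.0f\n", rps * lat }'
```

If the suggested concurrency exceeds what the deployment can serve (see max_concurrent_requests_per_instance above), expect queuing and inflated tail latencies.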

Create a profiling job with the Azure CLI and ML extension

Update the profiling job yaml template with your own values and create a profiling job.

az ml job create -f ${PROFILING_JOB_YAML_FILE_PATH}

Read the performance report

  • Users may find profiling job info in the AML workspace studio, under the "Experiments" tab.

  • Job metrics are shown within each individual job page, under the "Metrics" tab.

  • The job report is available within each individual job page, under the "Outputs + logs" tab, as the file "outputs/report.json".

  • Users may also use the following CLI command to download all job output files.

    az ml job download --name $JOB_ID --download-path $JOB_LOCAL_PATH
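Once downloaded, the report can be sliced with jq (already used earlier in this guide). The field names below are assumptions for illustration; check the actual outputs/report.json produced by your job for the real schema.

```shell
# Illustrative report.json -- field names here are assumptions; inspect
# the real outputs/report.json from your job for the actual schema.
cat > report.json <<'EOF'
{"latency_ms": {"p50": 12.1, "p90": 30.4, "p99": 55.0}, "throughput_rps": 48.7}
EOF

# Pull headline numbers out with jq:
jq '{p50: .latency_ms.p50, p99: .latency_ms.p99, rps: .throughput_rps}' report.json
```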

Cleanup

Please use az ml online-endpoint delete to delete the test online endpoint and its deployments after completing profiling.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Contact us

For questions, bug reports, and feature requests, please contact us at miroptprof@microsoft.com.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
