Inferencing machine learning models is a time- and compute-intensive process. It is vital to quantify model inferencing performance so that you make the best use of compute resources and reduce the cost of reaching your desired performance SLA (e.g., latency, throughput).
Online Endpoints Model Profiler (Preview) provides a fully managed experience that makes it easy to benchmark the performance of models served through Online Endpoints.
- Use the benchmarking tool of your choice.
- Easy-to-use CLI experience.
- Support for CI/CD MLOps pipelines to automate profiling.
- Thorough performance report containing latency percentiles and resource utilization metrics.
The online endpoints model profiler currently supports three benchmarking tools: wrk, wrk2, and labench.
- **wrk**: a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue. For details, see https://github.com/wg/wrk.
- **wrk2**: wrk modified to produce a constant-throughput load and accurate latency details up to the high 9s (i.e., it can report 99.9999th-percentile latency if run long enough). In addition to wrk's arguments, wrk2 takes a throughput argument (in total requests per second) via the --rate or -R parameter (default is 1000). For details, see https://github.com/giltene/wrk2.
- **labench**: LaBench (for LAtency BENCHmark) is a tool that measures latency percentiles of HTTP GET or POST requests under very even and steady load. For details, see https://github.com/microsoft/LaBench.
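As a rough point of reference, a standalone wrk2 run against a scoring endpoint looks like the fragment below. The URI and numbers are placeholders, and a real run against an online endpoint would also need an Authorization header and a POST body supplied via a Lua script; the profiler drives the selected tool for you, so you normally won't invoke it by hand.

```shell
# Illustrative standalone wrk2 invocation: 4 threads, 64 connections, 60s,
# constant 50 requests/sec, with latency statistics printed at the end.
wrk2 -t4 -c64 -d60s -R50 --latency https://my-endpoint.westus2.inference.ml.azure.com/score
```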
- Azure subscription. If you don't have an Azure subscription, sign up to try the free or paid version of Azure Machine Learning today.
- Azure CLI and ML extension. For more information, see Install, set up, and use the CLI (v2) (preview).
Follow this example to get started with the model profiling experience.
Follow the example in this tutorial to deploy a model using an online endpoint.
- Replace the `instance_type` in the deployment yaml file with your desired Azure VM SKU. VM SKUs vary in computing power, price, and availability across Azure regions.
- Tune `request_settings.max_concurrent_requests_per_instance`, which defines the concurrency level. The higher this setting, the higher the throughput the endpoint gets. If it is set higher than the online endpoint can handle, inference requests may end up waiting in a queue, which eventually results in longer end-to-end latency.
- If you plan to profile multiple `instance_type` and `request_settings.max_concurrent_requests_per_instance` pairs, create one online deployment for each pair. You can attach all online deployments to the same online endpoint.
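For example, creating one deployment per SKU/concurrency pair under a single endpoint might look like the sketch below; the yaml filenames, deployment yaml contents, and endpoint name are illustrative.

```shell
# Hypothetical sketch: one deployment per SKU/concurrency pair, all attached to
# the same endpoint. The yaml files would differ only in instance_type and
# request_settings.max_concurrent_requests_per_instance.
az ml online-deployment create -f deployment-f2s-c64.yml --endpoint-name my-endpoint
az ml online-deployment create -f deployment-f4s-c128.yml --endpoint-name my-endpoint
```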
Below is a sample yaml file that defines an online deployment.
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: ../../model-1/model/sklearn_regression_model.pkl
code_configuration:
  code: ../../model-1/onlinescoring/
  scoring_script: score.py
environment:
  conda_file: ../../model-1/environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1
request_settings:
  request_timeout_ms: 3000
  max_concurrent_requests_per_instance: 1024
```
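Before profiling, it can be worth a quick sanity check that the deployment answers requests; `sample-request.json` below is a placeholder for a request body your scoring script accepts.

```shell
# Hypothetical smoke test of the deployment created from the yaml above.
az ml online-endpoint invoke --name my-endpoint --deployment-name blue --request-file sample-request.json
```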
You will need a compute to host the profiler, send requests to the online endpoint, and generate the performance report.
- This compute is NOT the same one that you used above to deploy your model. Choose a compute SKU with adequate network bandwidth (considering the inference request payload size and profiling traffic, we recommend Standard_F4s_v2) in the same region as the online endpoint.

  ```bash
  az ml compute create --name $PROFILER_COMPUTE_NAME --size $PROFILER_COMPUTE_SIZE --identity-type SystemAssigned --type amlcompute
  ```
- Create the proper role assignment for accessing online endpoint resources. The compute needs the Contributor role on the machine learning workspace. For more information, see Assign Azure roles using Azure CLI.

  ```bash
  compute_info=`az ml compute show --name $PROFILER_COMPUTE_NAME --query '{"id": id, "identity_object_id": identity.principal_id}' -o json`
  workspace_resource_id=`echo $compute_info | jq -r '.id' | sed 's/\(.*\)\/computes\/.*/\1/'`
  identity_object_id=`echo $compute_info | jq -r '.identity_object_id'`
  az role assignment create --role Contributor --assignee-object-id $identity_object_id --scope $workspace_resource_id
  if [[ $? -ne 0 ]]; then echo "Failed to create role assignment for compute $PROFILER_COMPUTE_NAME" && exit 1; fi
  ```
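If you want to confirm the assignment took effect, a check along these lines (reusing the variables from the snippet above) should list Contributor:

```shell
# Assumes $identity_object_id and $workspace_resource_id from the previous snippet.
az role assignment list --assignee $identity_object_id --scope $workspace_resource_id \
  --query "[].roleDefinitionName" -o tsv
```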
A profiling job simulates how an online endpoint serves live requests. It produces a throughput load against the online endpoint and generates a performance report.
Below is a template yaml file that defines a profiling job.
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: >
  python -m online_endpoints_model_profiler --payload_path ${{inputs.payload}}
experiment_name: profiling-job
display_name: <% SKU_CONNECTION_PAIR %>
environment:
  image: mcr.microsoft.com/azureml/online-endpoints-model-profiler:latest
environment_variables:
  ONLINE_ENDPOINT: "<% ENDPOINT_NAME %>"
  DEPLOYMENT: "<% DEPLOYMENT_NAME %>"
  PROFILING_TOOL: "<% PROFILING_TOOL %>"
  DURATION: "<% DURATION %>"
  CONNECTIONS: "<% CONNECTIONS %>"
  TARGET_RPS: "<% TARGET_RPS %>"
  CLIENTS: "<% CLIENTS %>"
  TIMEOUT: "<% TIMEOUT %>"
  THREAD: "<% THREAD %>"
compute: "azureml:<% COMPUTE_NAME %>"
inputs:
  payload:
    type: uri_file
    path: azureml://datastores/workspaceblobstore/paths/profiling_payloads/<% ENDPOINT_NAME %>_payload.txt
```
Key | Type | Description | Allowed values | Default value |
---|---|---|---|---|
`command` | string | The command for running the profiling job. | `python -m online_endpoints_model_profiler --payload_path ${{inputs.payload}}` | - |
`experiment_name` | string | The experiment name of the profiling job. An experiment is a group of jobs. | - | - |
`display_name` | string | The profiling job name. | - | An auto-generated random name, such as `willing_needle_wrzk3lt7j5` |
`environment.image` | string | An Azure Machine Learning curated image containing benchmarking tools and profiling scripts. | `mcr.microsoft.com/azureml/online-endpoints-model-profiler:latest` | - |
`environment_variables` | string | Environment variables for the profiling job. | Profiling-related environment variables and benchmarking-tool-related environment variables (see the tables below) | - |
`compute` | string | The AML compute for running the profiling job. | - | - |
`inputs.payload` | string | Payload file that is stored in an AML registered datastore. | Example payload file content | - |
Key | Description | Default value |
---|---|---|
`SUBSCRIPTION` | Used together with `RESOURCE_GROUP`, `WORKSPACE`, `ONLINE_ENDPOINT`, `DEPLOYMENT` to form the profiling target. | Subscription of the profiling job |
`RESOURCE_GROUP` | Used together with `SUBSCRIPTION`, `WORKSPACE`, `ONLINE_ENDPOINT`, `DEPLOYMENT` to form the profiling target. | Resource group of the profiling job |
`WORKSPACE` | Used together with `SUBSCRIPTION`, `RESOURCE_GROUP`, `ONLINE_ENDPOINT`, `DEPLOYMENT` to form the profiling target. | AML workspace of the profiling job |
`ONLINE_ENDPOINT` | Used together with `SUBSCRIPTION`, `RESOURCE_GROUP`, `WORKSPACE`, `DEPLOYMENT` to form the profiling target. If not provided, `SCORING_URI` will be used as the profiling target. If neither `ONLINE_ENDPOINT`/`DEPLOYMENT` nor `SCORING_URI` is provided, an error will be thrown. | - |
`DEPLOYMENT` | Used together with `SUBSCRIPTION`, `RESOURCE_GROUP`, `WORKSPACE`, `ONLINE_ENDPOINT` to form the profiling target. If not provided, `SCORING_URI` will be used as the profiling target. If neither `ONLINE_ENDPOINT`/`DEPLOYMENT` nor `SCORING_URI` is provided, an error will be thrown. | - |
`IDENTITY_ACCESS_TOKEN` | An optional AAD token for retrieving the online endpoint scoring_uri, access key, and resource usage metrics. Not necessary when the AML compute running the profiling job has contributor access to the workspace of the online endpoint. Assigning appropriate permissions to the AML compute is recommended over providing this token, since the token may expire while the profiling job runs. | - |
`SCORING_URI` | May be provided instead of the `SUBSCRIPTION`/`RESOURCE_GROUP`/`WORKSPACE`/`ONLINE_ENDPOINT`/`DEPLOYMENT` combination to define the profiling target. However, missing `ONLINE_ENDPOINT`/`DEPLOYMENT` info will lead to missing resource usage metrics in the final report. | - |
`SCORING_HEADERS` | Use this to provide any special headers necessary when invoking the profiling target. | `{"Content-Type": "application/json", "Authorization": "Bearer ${ONLINE_ENDPOINT_ACCESS_KEY}", "azureml-model-deployment": "${DEPLOYMENT}"}` |
`PROFILING_TOOL` | The name of the benchmarking tool. Currently supported: `wrk`, `wrk2`, `labench`. | `wrk` |
`PAYLOAD` | A single string-format payload for invoking the profiling target, for example `{"data": [[1,2,3,4,5,6,7,8,9,10], [10,9,8,7,6,5,4,3,2,1]]}`. If `inputs.payload` is provided in the profiling job yaml file, this env var is ignored. | - |
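The payload file itself is plain text; assuming one JSON request body per line (an assumption, matching the single-payload example above), you could build one locally like this before placing it at the `workspaceblobstore` path referenced by `inputs.payload`:

```shell
# Sketch: build a payload file with one JSON request body per line (assumed
# format). The upload step to workspaceblobstore is environment-specific, so it
# is only indicated in the trailing comment.
cat > my-endpoint_payload.txt <<'EOF'
{"data": [[1,2,3,4,5,6,7,8,9,10]]}
{"data": [[10,9,8,7,6,5,4,3,2,1]]}
EOF
wc -l < my-endpoint_payload.txt
# Then copy it to: azureml://datastores/workspaceblobstore/paths/profiling_payloads/my-endpoint_payload.txt
```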
Key | Description | Default value | wrk | wrk2 | labench |
---|---|---|---|---|---|
`DURATION` | Period of time for running the benchmarking tool. | 300s | ✔️ | ✔️ | ✔️ |
`CONNECTIONS` | Number of connections for the benchmarking tool. The default value will be set to the value of `max_concurrent_requests_per_instance`. | 1 | ✔️ | ✔️ | ❌ |
`THREAD` | Number of threads allocated for the benchmarking tool. | 1 | ✔️ | ✔️ | ❌ |
`TARGET_RPS` | Target requests per second for the benchmarking tool. | 50 | ❌ | ✔️ | ✔️ |
`CLIENTS` | Number of clients for the benchmarking tool. The default value will be set to the value of `max_concurrent_requests_per_instance`. | 1 | ❌ | ❌ | ✔️ |
`TIMEOUT` | Timeout in seconds for each request. | 10s | ❌ | ❌ | ✔️ |
Update the profiling job yaml template with your own values and create a profiling job.
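Filling the `<% ... %>` placeholders can be scripted; below is a minimal `sed` sketch, where the heredoc stands in for the full job template shown earlier and the substituted values are illustrative, not part of the product.

```shell
# Minimal placeholder-substitution sketch. Real templates contain more
# placeholders than this trimmed snippet.
cat > profiling_job_tmpl.yml <<'EOF'
display_name: <% SKU_CONNECTION_PAIR %>
environment_variables:
  ONLINE_ENDPOINT: "<% ENDPOINT_NAME %>"
  DEPLOYMENT: "<% DEPLOYMENT_NAME %>"
compute: "azureml:<% COMPUTE_NAME %>"
EOF
sed -e 's|<% SKU_CONNECTION_PAIR %>|Standard_F2s_v2-c64|g' \
    -e 's|<% ENDPOINT_NAME %>|my-endpoint|g' \
    -e 's|<% DEPLOYMENT_NAME %>|blue|g' \
    -e 's|<% COMPUTE_NAME %>|profiler-compute|g' \
    profiling_job_tmpl.yml > profiling_job.yml
cat profiling_job.yml
```

After substitution, submit the rendered file with `az ml job create -f profiling_job.yml`.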
```bash
az ml job create -f ${PROFILING_JOB_YAML_FILE_PATH}
```
- Find profiling job info in the AML workspace studio, under the "Experiments" tab.
- Find job metrics on each individual job page, under the "Metrics" tab.
- Find the job report file on each individual job page, under the "Outputs + logs" tab, in the file "outputs/report.json".
- Use the following CLI command to download all job output files:

  ```bash
  az ml job download --name $JOB_ID --download-path $JOB_LOCAL_PATH
  ```
Please use `az ml online-endpoint delete` to delete the test online endpoints and online deployments after completing profiling.
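For example (the endpoint name is illustrative; deleting an endpoint also removes the deployments under it):

```shell
# --yes skips the confirmation prompt.
az ml online-endpoint delete --name my-endpoint --yes
```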
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
For any questions, bugs, or requests for new features, please contact us at miroptprof@microsoft.com.
Trademarks: This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.