
how-to-manage-compute-instance.md

title: Manage a compute instance
titleSuffix: Azure Machine Learning
description: Learn how to manage an Azure Machine Learning compute instance. Use as your development environment, or as compute target for dev/test purposes.
services: machine-learning
ms.service: machine-learning
ms.subservice: compute
ms.custom: devx-track-azurecli
ms.topic: how-to
ms.author: sgilley
author: sdgilley
ms.reviewer: vijetaj
ms.date: 05/03/2024

Manage an Azure Machine Learning compute instance

[!INCLUDE dev v2]

Learn how to manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance can't be shared with other users in your workspace.

In this article, you learn how to start, stop, restart, and delete a compute instance. To learn how to create a compute instance, see Create an Azure Machine Learning compute instance.

Note

This article shows CLI v2 in the sections below. If you are still using CLI v1, see Create an Azure Machine Learning compute cluster CLI v1.

Prerequisites

Select the appropriate tab for the rest of the prerequisites based on your preferred method of managing your compute instance.

Python SDK

  • If you're not running your code on a compute instance, install the Azure Machine Learning Python SDK. This SDK is already installed for you on a compute instance.

  • Attach to the workspace in your Python script:

    [!INCLUDE connect ws v2]

Azure CLI

  • If you're not running these commands on a compute instance, install the Azure CLI extension for Machine Learning service (v2). This extension is already installed for you on a compute instance.

  • Authenticate and set the default workspace and resource group. Leave the terminal open to run the rest of the commands in this article.

    [!INCLUDE cli first steps]

Studio

Start at Azure Machine Learning studio.


Manage

Start, stop, restart, and delete a compute instance. A compute instance doesn't always automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll still be billed for disk, public IP, and standard load balancer.

You can enable automatic shutdown to automatically stop the compute instance after a specified time.

You can also create a schedule for the compute instance to automatically start and stop based on a time and day of week.

Tip

The compute instance has a 120-GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Don't stop the compute instance by issuing sudo shutdown from the terminal. The temp disk size on a compute instance depends on the VM size chosen, and the temp disk is mounted on /mnt.
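
As a rough illustration of the disk-space check in the tip above, the following sketch reports free space from a terminal or notebook on the compute instance. The function names and the 2-GiB default threshold are assumptions chosen for this example; they aren't part of any Azure Machine Learning API.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def free_space_gib(path: str = "/") -> float:
    """Return the free space, in GiB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def safe_to_stop(path: str = "/", min_free_gib: float = 2.0) -> bool:
    """Return True when at least `min_free_gib` GiB are free on the disk holding `path`."""
    return free_space_gib(path) >= min_free_gib


if __name__ == "__main__":
    # "/" is assumed to be on the OS disk; pass "/mnt" to inspect the temp disk instead.
    print(f"Free space on /: {free_space_gib('/'):.1f} GiB")
    print("OK to stop or restart" if safe_to_stop("/") else "Clear some disk space first")
```

If the check fails, free up space before you stop or restart the instance.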

[!INCLUDE sdk v2]

In these examples, the name of the compute instance is stored in the variable ci_basic_name.

  • Get status

    [!notebook-python]

  • Stop

    [!notebook-python]

  • Start

    [!notebook-python]

  • Restart

    [!notebook-python]

  • Delete

    [!notebook-python]
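
The notebook cells referenced above aren't reproduced here. As a minimal sketch of what those operations might look like, the following helpers call the `get`, `begin_stop`, `begin_start`, `begin_restart`, and `begin_delete` methods of `ml_client.compute` from the Python SDK v2. The helper names `get_state` and `run_action` are hypothetical, and `ml_client` is assumed to be the `MLClient` created when you attached to the workspace.

```python
from typing import Any

# Maps a friendly action name to the ml_client.compute method that performs it.
_ACTIONS = {
    "stop": "begin_stop",
    "start": "begin_start",
    "restart": "begin_restart",
    "delete": "begin_delete",
}


def get_state(ml_client: Any, name: str) -> str:
    """Return the current state of the compute instance, as reported by the service."""
    return ml_client.compute.get(name).state


def run_action(ml_client: Any, name: str, action: str, wait: bool = True) -> None:
    """Start a long-running operation on the instance and optionally wait for it."""
    if action not in _ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    # Each begin_* method returns a poller for the long-running operation.
    poller = getattr(ml_client.compute, _ACTIONS[action])(name)
    if wait:
        poller.wait()
```

For example, `run_action(ml_client, ci_basic_name, "stop")` would stop the instance and return after the operation finishes, and `get_state(ml_client, ci_basic_name)` would then report its state.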

[!INCLUDE cli v2]

In these examples, the name of the compute instance is instance.

  • Stop

    az ml compute stop --name instance 
    
  • Start

    az ml compute start --name instance 
    
  • Restart

    az ml compute restart --name instance 
    
  • Delete

    az ml compute delete --name instance 
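
If you script these commands, for example to stop several instances at the end of the day, a thin wrapper can help. The following sketch only builds and runs the same `az ml compute <action> --name <name>` commands shown above. The helper names are hypothetical, and it assumes the `az` executable is on your PATH and that the default workspace and resource group are already set.

```python
import subprocess
from typing import List

ALLOWED_ACTIONS = ("stop", "start", "restart", "delete")


def build_command(action: str, name: str) -> List[str]:
    """Return the argument list for `az ml compute <action> --name <name>`."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    return ["az", "ml", "compute", action, "--name", name]


def run(action: str, name: str) -> int:
    """Run the command and return the exit code of the Azure CLI."""
    return subprocess.run(build_command(action, name), check=False).returncode
```

For example, `run("stop", "instance")` runs the same stop command as the one in the list above.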
    

In your workspace in Azure Machine Learning studio, select Compute, and then select compute instance at the top.

You can perform the following actions:

  • Create a new compute instance.
  • Refresh the compute instances tab.
  • Start, stop, and restart a compute instance. You pay for the instance whenever it's running, so stop the compute instance when you aren't using it to reduce cost. Stopping a compute instance deallocates it. Start it again when you need it. You can also schedule a time for the compute instance to start and stop.
  • Delete a compute instance.
  • Filter the list of compute instances to show only ones you created.

For each compute instance in a workspace that you created (or that was created for you), you can:

  • Access Jupyter, JupyterLab, and RStudio on the compute instance.

  • SSH into the compute instance. SSH access is disabled by default but can be enabled when you create the compute instance. SSH access uses a public/private key mechanism. The tab gives you details for the SSH connection, such as IP address, username, and port number. In a virtual network deployment, disabling SSH prevents SSH access from the public internet. You can still SSH from within the virtual network by using the private IP address of the compute instance node and port 22.

    [!TIP] If the compute instance is in a managed virtual network and the public IP address is disabled, use the az ml compute connect-ssh command to connect to the compute instance.

  • Select the compute name to:

    • View details about a specific compute instance, such as IP address and region.
    • Create or modify the schedule for starting and stopping the compute instance. Scroll down to the bottom of the page to edit the schedule.

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users with the workspace Contributor or Owner role can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user it was assigned to if it was created on their behalf, can access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user, who has root access and can use Jupyter, JupyterLab, and RStudio running on the instance. The compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and attribution of experiment jobs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

To create a compute instance, you need permissions for the following actions:

  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action
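
To illustrate how the actions listed above might be combined, the following sketch builds a custom role definition that lets a user view, start, stop, and restart compute instances but not create or delete them. The role name, description, and helper name are hypothetical. The dictionary follows the general JSON shape that Azure custom role definitions use, and the subscription ID is a placeholder that you supply.

```python
import json
from typing import Dict

PREFIX = "Microsoft.MachineLearningServices/workspaces/computes"


def build_operator_role(subscription_id: str) -> Dict[str, object]:
    """Return a custom role definition limited to read, start, stop, and restart."""
    return {
        "Name": "Compute Instance Operator (example)",  # hypothetical role name
        "IsCustom": True,
        "Description": "Can view, start, stop, and restart compute instances.",
        "Actions": [
            f"{PREFIX}/read",
            f"{PREFIX}/start/action",
            f"{PREFIX}/stop/action",
            f"{PREFIX}/restart/action",
        ],
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Print the definition so that it can be saved to a file and reviewed.
    print(json.dumps(build_operator_role("<subscription-id>"), indent=2))
```

Because the write, delete, and checkComputeNameAvailability actions are left out, a user who holds only this role couldn't create or delete a compute instance.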

Audit and observe compute instance version

Once a compute instance is deployed, it doesn't get automatically updated. Microsoft releases new VM images on a monthly basis. To understand your options for keeping current with the latest version, see vulnerability management.

To keep track of whether an instance's operating system version is current, you can query its version by using the CLI, SDK, or studio UI.

[!INCLUDE sdk v2]

from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Display operating system version.
# ml_client is the MLClient created when you connected to the workspace.
instance = ml_client.compute.get("myci")
print(instance.os_image_metadata)

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

[!INCLUDE cli v2]

az ml compute show --name "myci"

# Query outdated compute instances. The single quotes keep the shell from interpreting the JMESPath backticks.
az ml compute list --query '[?os_image_metadata.is_latest_os_image_version == `false`].name'
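
The same filter can be sketched with the Python SDK. The following helper is a hypothetical example that assumes `ml_client` is an `MLClient` connected to your workspace. It relies on the `os_image_metadata` and `is_latest_os_image_version` names used elsewhere in this section, and it skips compute resources that don't report image metadata.

```python
from typing import Any, List


def find_outdated_instances(ml_client: Any) -> List[str]:
    """Return the names of compute instances whose OS image isn't the latest version."""
    outdated = []
    for compute in ml_client.compute.list():
        # Compute types other than compute instances might not expose this attribute.
        metadata = getattr(compute, "os_image_metadata", None)
        if metadata is not None and metadata.is_latest_os_image_version is False:
            outdated.append(compute.name)
    return outdated
```

You could run this helper on a schedule and review the returned names to decide which instances to update.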

In your workspace in Azure Machine Learning studio, select Compute, and then select compute instance at the top. To see a compute instance's properties, including the current operating system, select its compute name.


IT administrators can use Azure Policy to monitor the inventory of instances across workspaces in the Azure Policy compliance portal. Assign the built-in policy Audit Azure Machine Learning Compute Instances with an outdated operating system at an Azure subscription or Azure management group scope.

Next steps