title: Attach and manage a Synapse Spark pool in Azure Machine Learning
titleSuffix: Azure Machine Learning
description: Learn how to attach and manage Spark pools with Azure Synapse.
author: ynpandey
ms.author: yogipandey
ms.reviewer: franksolomon
ms.service: machine-learning
ms.subservice: mldata
ms.topic: how-to
ms.date: 04/12/2024
ms.custom: template-how-to, devx-track-azurecli

Attach and manage a Synapse Spark pool in Azure Machine Learning

[!INCLUDE dev v2]

In this article, you'll learn how to attach a Synapse Spark Pool in Azure Machine Learning. You can attach a Synapse Spark Pool in Azure Machine Learning in one of these ways:

  • Using Azure Machine Learning studio UI
  • Using Azure Machine Learning CLI
  • Using Azure Machine Learning Python SDK

Prerequisites

[!INCLUDE cli v2]

[!INCLUDE sdk v2]


Attach a Synapse Spark pool in Azure Machine Learning

Azure Machine Learning offers different ways to attach and manage a Synapse Spark pool.

To attach a Synapse Spark Pool with the Studio Compute tab:

:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png":::

  1. In the Manage section of the left pane, select Compute.
  2. Select Attached computes.
  3. On the Attached computes screen, select New, to see the options for attaching different types of computes.
  4. Select Synapse Spark pool.

The Attach Synapse Spark pool panel opens on the right side of the screen. In this panel:

  1. Enter a Name, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning resource.

  2. Select an Azure Subscription from the dropdown menu.

  3. Select a Synapse workspace from the dropdown menu.

  4. Select a Spark Pool from the dropdown menu.

  5. Toggle the Assign a managed identity option, to enable it.

  6. Select a managed Identity type to use with this attached Synapse Spark Pool.

  7. Select Update, to complete the Synapse Spark Pool attach process.

[!INCLUDE cli v2]

With the Azure Machine Learning CLI, we can use intuitive YAML syntax and commands from the command line interface, to attach and manage a Synapse Spark pool.

To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:

  • name – name of the attached Synapse Spark pool.

  • type – set this property to synapsespark.

  • resource_id – this property should provide the resource ID value of the Synapse Spark pool created in the Azure Synapse Analytics workspace. The Azure resource ID includes

    • Azure Subscription ID,

    • resource Group Name,

    • Azure Synapse Analytics Workspace Name, and

    • name of the Synapse Spark Pool.

      name: <ATTACHED_SPARK_POOL_NAME>
      
      type: synapsespark
      
      resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
  • identity – this property defines the identity type to assign to the attached Synapse Spark pool. It can take one of these values:

    • system_assigned

    • user_assigned

      name: <ATTACHED_SPARK_POOL_NAME>
      
      type: synapsespark
      
      resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
      
      identity:
        type: system_assigned
  • For the identity type user_assigned, you should also provide a list of user_assigned_identities values. Each user-assigned identity should be declared as an element of the list, by using the resource_id value of the user-assigned identity. The first user-assigned identity in the list is used to submit a job by default.

    name: <ATTACHED_SPARK_POOL_NAME>
    
    type: synapsespark
    
    resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
    
    identity:
      type: user_assigned
      user_assigned_identities:
        - resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>
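The properties above can also be assembled programmatically. The following is a minimal sketch, using only the Python standard library, that builds the Synapse Spark pool resource ID from its four components and renders a YAML specification like the ones shown. The helper names `spark_pool_resource_id` and `attach_spec_yaml` are hypothetical and aren't part of the Azure CLI or SDK, and the sample values are placeholders.

```python
# Hypothetical helpers for illustration (not part of the Azure CLI or SDK).

SPARK_POOL_ID_TEMPLATE = (
    "/subscriptions/{subscription_id}"
    "/resourceGroups/{resource_group}"
    "/providers/Microsoft.Synapse/workspaces/{synapse_workspace}"
    "/bigDataPools/{spark_pool}"
)


def spark_pool_resource_id(subscription_id, resource_group, synapse_workspace, spark_pool):
    """Build the Azure resource ID of a Synapse Spark pool from its four parts."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "synapse_workspace": synapse_workspace,
        "spark_pool": spark_pool,
    }
    for key, value in parts.items():
        # Each part is a single path segment, so it must be non-empty and slash-free.
        if not value or "/" in value:
            raise ValueError(f"invalid {key}: {value!r}")
    return SPARK_POOL_ID_TEMPLATE.format(**parts)


def attach_spec_yaml(name, resource_id, identity_type=None, user_assigned_ids=()):
    """Render the YAML specification for an attached Synapse Spark pool."""
    lines = [f"name: {name}", "type: synapsespark", f"resource_id: {resource_id}"]
    if identity_type is not None:
        if identity_type not in ("system_assigned", "user_assigned"):
            raise ValueError(f"unsupported identity type: {identity_type!r}")
        lines += ["identity:", f"  type: {identity_type}"]
        if identity_type == "user_assigned":
            if not user_assigned_ids:
                raise ValueError("user_assigned requires at least one identity resource ID")
            lines.append("  user_assigned_identities:")
            lines += [f"    - resource_id: {uid}" for uid in user_assigned_ids]
    return "\n".join(lines) + "\n"


# Placeholder values, chosen only for the example.
pool_id = spark_pool_resource_id("0000-sub", "my-rg", "my-synapse-ws", "mypool")
print(attach_spec_yaml("my-attached-pool", pool_id, identity_type="system_assigned"))
```

Generating the file this way keeps the `identity` block correctly nested, which is easy to get wrong when editing the YAML by hand.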

The YAML files above can be used in the az ml compute attach command as the --file parameter. A Synapse Spark pool can be attached to an Azure Machine Learning workspace, in a specified resource group of a subscription, with the az ml compute attach command as shown here:

az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>

This sample shows the expected output of the above command:

Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please visit https://aka.ms/azuremlexperimental for more information.

{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 19:01:05.109840+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}
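When the command runs from a script, the experimental-class notice and the JSON document might arrive in the same captured text. The following minimal sketch assumes the output has the shape shown above: it skips any leading text and checks the two fields that indicate a successful attach. The function names `parse_compute_output` and `is_attached_ok` are hypothetical, and the sample output is abbreviated.

```python
import json


def parse_compute_output(raw):
    """Return the first JSON object found in CLI output, skipping leading notices."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in output")
    # raw_decode parses one JSON value and ignores anything that follows it.
    obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    return obj


def is_attached_ok(compute):
    """Check the fields that indicate a successfully attached Synapse Spark pool."""
    return (
        compute.get("type") == "synapsespark"
        and compute.get("provisioning_state") == "Succeeded"
    )


# Abbreviated sample, modeled on the expected output shown above.
sample_output = """Class SynapseSparkCompute: This is an experimental class, and may change at any time.
{
    "name": "my-attached-pool",
    "provisioning_state": "Succeeded",
    "type": "synapsespark"
}
"""

compute = parse_compute_output(sample_output)
print(compute["name"], is_attached_ok(compute))
```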

If an attached Synapse Spark pool with the name specified in the YAML specification file already exists in the workspace, the az ml compute attach command updates the existing pool with the information in that file. You can update these values through the YAML specification file:

  • identity type
  • user-assigned identities
  • tags

To display details of an attached Synapse Spark pool, execute the az ml compute show command. Pass the name of the attached Synapse Spark pool with the --name parameter, as shown:

az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>

This sample shows the expected output of the above command:

<ATTACHED_SPARK_POOL_NAME>
{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 19:01:05.109840+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}

To see a list of all computes in a workspace, including the attached Synapse Spark pools, use the az ml compute list command. Use the --workspace-name parameter to pass the name of the workspace, as shown:

az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>

This sample shows the expected output of the above command:

[
    {
        "auto_pause_settings": {
            "auto_pause_enabled": true,
            "delay_in_minutes": 15
        },
        "created_on": "2022-09-09 21:28:54.871251+00:00",
        "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
        "identity": {
            "principal_id": "<PRINCIPAL_ID>",
            "tenant_id": "<TENANT_ID>",
            "type": "system_assigned"
        },
        "location": "eastus2",
        "name": "<ATTACHED_SPARK_POOL_NAME>",
        "node_count": 5,
        "node_family": "MemoryOptimized",
        "node_size": "Small",
        "provisioning_state": "Succeeded",
        "resourceGroup": "<RESOURCE_GROUP>",
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
        "scale_settings": {
            "auto_scale_enabled": false,
            "max_node_count": 0,
            "min_node_count": 0
        },
        "spark_version": "3.2",
        "type": "synapsespark"
    },
    ...
]
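Because the list contains every compute in the workspace, a script usually needs to filter it. The following is a minimal sketch, assuming the JSON output shown above has been captured as text. The helper `attached_spark_pools` is hypothetical, and the two sample entries are invented for illustration; the `amlcompute` type value is an assumption, used only as a non-matching entry.

```python
import json


def attached_spark_pools(computes):
    """Return the names of attached Synapse Spark pools in a parsed compute list."""
    return [c["name"] for c in computes if c.get("type") == "synapsespark"]


# Abbreviated, invented sample in the shape of the `az ml compute list` output.
sample = json.loads("""
[
    {"name": "my-attached-pool", "type": "synapsespark", "provisioning_state": "Succeeded"},
    {"name": "cpu-cluster", "type": "amlcompute", "provisioning_state": "Succeeded"}
]
""")

print(attached_spark_pools(sample))
```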

[!INCLUDE sdk v2]

The Azure Machine Learning Python SDK provides convenient functions for attaching and managing a Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.

To attach a Synapse Spark pool with the Python SDK, first create an instance of the azure.ai.ml.MLClient class, which provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses azure.identity.DefaultAzureCredential to connect to a workspace in the resource group of a specified Azure subscription. It then defines the SynapseSparkCompute with these parameters:

  • name - user-defined name of the new attached Synapse Spark pool.
  • resource_id - resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace

An azure.ai.ml.MLClient.begin_create_or_update() function call attaches the defined Synapse Spark pool to the Azure Machine Learning workspace.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"

synapse_comp = SynapseSparkCompute(name=synapse_name, resource_id=synapse_resource)
ml_client.begin_create_or_update(synapse_comp)

To attach a Synapse Spark pool that uses system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity:

# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)

A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. For the managed identity definition used in this way, set the type to UserAssigned. In addition, pass a user_assigned_identities parameter. The parameter user_assigned_identities is a list of objects of the UserAssignedIdentity class. The resource_id of the user-assigned identity populates each UserAssignedIdentity class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:

# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    SynapseSparkCompute,
    IdentityConfiguration,
    UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
    type="UserAssigned",
    user_assigned_identities=[
        UserAssignedIdentity(
            resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
        )
    ],
)

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)

Note

The azure.ai.ml.MLClient.begin_create_or_update() function attaches a new Synapse Spark pool, if a pool with the specified name does not already exist in the workspace. However, if a Synapse Spark pool with that specified name is already attached to the workspace, a call to the azure.ai.ml.MLClient.begin_create_or_update() function will update the existing attached pool with the new identity or identities.


Add role assignments in Azure Synapse Analytics

To ensure that the attached Synapse Spark Pool works properly, assign the Synapse Administrator role to it from the Azure Synapse Analytics studio UI. These steps show how to do it:

  1. Open your Synapse Workspace in Azure portal.

  2. In the left pane, select Overview.

    :::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio." lightbox= "media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png":::

  3. Select Open Synapse Studio.

  4. In the Azure Synapse Analytics studio, select Manage in the left pane.

  5. Select Access Control in the Security section of the left pane, second from the left.

  6. Select Add.

  7. The Add role assignment panel will open on the right side of the screen. In this panel:

    1. Select Workspace item for Scope.

    2. In the Item type dropdown menu, select Apache Spark pool.

    3. In the Item dropdown menu, select your Apache Spark pool.

    4. In Role dropdown menu, select Synapse Administrator.

    5. In the Select user search box, start typing the name of your Azure Machine Learning Workspace. It shows you a list of attached Synapse Spark pools. Select your desired Synapse Spark pool from the list.

    6. Select Apply.

      :::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment." lightbox= "media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png":::

Update the Synapse Spark Pool

You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should create a user-assigned managed identity in Azure portal, before you assign it to a Synapse Spark pool.

To update managed identity for the attached Synapse Spark pool:

:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png" alt-text="Screenshot showing Synapse Spark Pool managed identity update." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png":::

  1. Open the Details page for the Synapse Spark pool in the Azure Machine Learning studio.

  2. Find the edit icon, located on the right side of the Managed identity section.

  3. To assign a managed identity for the first time, toggle Assign a managed identity to enable it.

  4. To assign a system-assigned managed identity:

    1. Select System-assigned as the Identity type.
    2. Select Update.
  5. To assign a user-assigned managed identity:

    1. Select User-assigned as the Identity type.
    2. Select an Azure Subscription from the dropdown menu.
    3. Type the first few letters of the name of user-assigned managed identity in the box that shows the text Search by name. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
    4. Select Update.

[!INCLUDE cli v2]

To update the identity associated with an attached Synapse Spark pool, execute the az ml compute update command with appropriate parameters. To assign a system-assigned identity, set the --identity parameter in the command to SystemAssigned, as shown:

az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>

This sample shows the expected output of the above command:

Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 20:02:15.746490+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "identity": {
        "principal_id": "<PRINCIPAL_ID>",
        "tenant_id": "<TENANT_ID>",
        "type": "system_assigned"
    },
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}

To assign a user-assigned identity, set the parameter --identity in the command to UserAssigned. Additionally, you should use the --user-assigned-identities parameter to pass the resource ID for the user-assigned identity, as shown:

az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>

This sample shows the expected output of the above command:

Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
  "auto_pause_settings": {
    "auto_pause_enabled": true,
    "delay_in_minutes": 15
  },
  "created_on": "2022-09-13 20:02:15.746490+00:00",
  "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
  "identity": {
    "type": "user_assigned",
    "user_assigned_identities": [
      {
        "client_id": "<CLIENT_ID>",
        "principal_id": "<PRINCIPAL_ID>",
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
      }
    ]
  },
  "location": "eastus2",
  "name": "<ATTACHED_SPARK_POOL_NAME>",
  "node_count": 5,
  "node_family": "MemoryOptimized",
  "node_size": "Small",
  "provisioning_state": "Succeeded",
  "resourceGroup": "<RESOURCE_GROUP>",
  "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
  "scale_settings": {
    "auto_scale_enabled": false,
    "max_node_count": 0,
    "min_node_count": 0
  },
  "spark_version": "3.2",
  "type": "synapsespark"
}

Note

The parameter --user-assigned-identities can take a list of resource IDs and assign multiple user-assigned identities to an attached Synapse Spark pool. The first user-assigned identity in the list is used to submit a job by default.

[!INCLUDE sdk v2]

To use system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet updates a Synapse Spark pool to use a system-assigned identity:

# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)

A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. For the managed identity definition used in this way, set the type to UserAssigned. In addition, pass a user_assigned_identities parameter. The parameter user_assigned_identities is a list of objects of the UserAssignedIdentity class. The resource_id of the user-assigned identity populates each UserAssignedIdentity class object. This code snippet updates a Synapse Spark pool to use a user-assigned identity:

# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    SynapseSparkCompute,
    IdentityConfiguration,
    UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
    type="UserAssigned",
    user_assigned_identities=[
        UserAssignedIdentity(
            resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
        )
    ],
)

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)

Note

If a pool with the specified name does not already exist in the workspace, the azure.ai.ml.MLClient.begin_create_or_update() function will attach a new Synapse Spark pool. However, if a Synapse Spark pool, with the specified name, is already attached to the workspace, an azure.ai.ml.MLClient.begin_create_or_update() function call will update the existing attached pool, with the new identity or identities.


Detach the Synapse Spark pool

We might want to detach an attached Synapse Spark pool, to clean up a workspace.


The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this, follow these steps:

  1. Open the Details page for the Synapse Spark pool, in the Azure Machine Learning studio.

  2. Select Detach, to detach the attached Synapse Spark pool.

[!INCLUDE cli v2]

An attached Synapse Spark pool can be detached by executing the az ml compute detach command with the name of the pool passed, using the --name parameter, as shown here:

az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>

This sample shows the expected output of the above command:

Are you sure you want to perform this operation? (y/n): y

[!INCLUDE sdk v2]

To detach the pool with the Python SDK, use an MLClient.compute.begin_delete() function call. Pass the name of the attached Synapse Spark pool, along with the action Detach, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:

# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
ml_client.compute.begin_delete(name=synapse_name, action="Detach")

Serverless Spark compute in Azure Machine Learning

Some user scenarios might require access to a serverless Spark compute resource, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. Learn more about the serverless Spark compute experience.

Next steps