title | titleSuffix | description | author | ms.author | ms.reviewer | ms.service | ms.subservice | ms.topic | ms.date | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|
Attach and manage a Synapse Spark pool in Azure Machine Learning | Azure Machine Learning | Learn how to attach and manage Spark pools with Azure Synapse. | ynpandey | yogipandey | franksolomon | machine-learning | mldata | how-to | 04/12/2024 | template-how-to, devx-track-azurecli |
[!INCLUDE dev v2]
In this article, you'll learn how to attach a Synapse Spark Pool in Azure Machine Learning. You can attach a Synapse Spark Pool in Azure Machine Learning in one of these ways:
- Using Azure Machine Learning studio UI
- Using Azure Machine Learning CLI
- Using Azure Machine Learning Python SDK
- An Azure subscription; if you don't have an Azure subscription, create a free account before you begin.
- An Azure Machine Learning workspace. See Create workspace resources.
- Create an Azure Synapse Analytics workspace in Azure portal.
- Create an Apache Spark pool using the Azure portal.
[!INCLUDE cli v2]
- An Azure subscription; if you don't have an Azure subscription, create a free account before you begin.
- An Azure Machine Learning workspace. See Create workspace resources.
- Create an Azure Synapse Analytics workspace in Azure portal.
- Create an Apache Spark pool using the Azure portal.
- Create an Azure Machine Learning compute instance.
- Install Azure Machine Learning CLI.
[!INCLUDE sdk v2]
- An Azure subscription; if you don't have an Azure subscription, create a free account before you begin.
- An Azure Machine Learning workspace. See Create workspace resources.
- Create an Azure Synapse Analytics workspace in Azure portal.
- Create an Apache Spark pool using the Azure portal.
- Configure your development environment, or create an Azure Machine Learning compute instance.
- Install the Azure Machine Learning SDK for Python.
Azure Machine Learning offers different ways to attach and manage a Synapse Spark pool.
To attach a Synapse Spark Pool with the Studio Compute tab:
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png":::
- In the Manage section of the left pane, select Compute.
- Select Attached computes.
- On the Attached computes screen, select New, to see the options for attaching different types of computes.
- Select Synapse Spark pool.
The Attach Synapse Spark pool panel opens on the right side of the screen. In this panel:
- Enter a Name, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning resource.
- Select an Azure Subscription from the dropdown menu.
- Select a Synapse workspace from the dropdown menu.
- Select a Spark Pool from the dropdown menu.
- Toggle the Assign a managed identity option, to enable it.
- Select a managed Identity type to use with this attached Synapse Spark Pool.
- Select Update, to complete the Synapse Spark Pool attach process.
[!INCLUDE cli v2]
With the Azure Machine Learning CLI, we can use intuitive YAML syntax and commands from the command line interface, to attach and manage a Synapse Spark pool.
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
- name – name of the attached Synapse Spark pool.
- type – set this property to synapsespark.
- resource_id – this property should provide the resource ID value of the Synapse Spark pool created in the Azure Synapse Analytics workspace. The Azure resource ID includes:
  - Azure Subscription ID,
  - Resource Group Name,
  - Azure Synapse Analytics Workspace Name, and
  - name of the Synapse Spark Pool.

  name: <ATTACHED_SPARK_POOL_NAME>
  type: synapsespark
  resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>

- identity – this property defines the identity type to assign to the attached Synapse Spark pool. It can take one of these values:
  - system_assigned
  - user_assigned

  name: <ATTACHED_SPARK_POOL_NAME>
  type: synapsespark
  resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
  identity:
    type: system_assigned

- For the identity type user_assigned, you should also provide a list of user_assigned_identities values. Each user-assigned identity should be declared as an element of the list, by using the resource_id value of the user-assigned identity. The first user-assigned identity in the list is used to submit a job by default.

  name: <ATTACHED_SPARK_POOL_NAME>
  type: synapsespark
  resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
  identity:
    type: user_assigned
    user_assigned_identities:
      - resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>
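The resource ID values used in the YAML specification follow a fixed pattern, so they can be assembled from their parts instead of typed by hand. The following is a minimal sketch of that idea. The function names are hypothetical helpers introduced here for illustration; they aren't part of the Azure Machine Learning CLI or SDK.

```python
def synapse_spark_pool_resource_id(
    subscription_id, resource_group, synapse_workspace, spark_pool
):
    """Build the resource ID of a Synapse Spark pool (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{synapse_workspace}"
        f"/bigDataPools/{spark_pool}"
    )


def user_assigned_identity_resource_id(
    subscription_id, resource_group, identity_name
):
    """Build the resource ID of a user-assigned managed identity (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identity_name}"
    )
```

The returned strings can be pasted into the resource_id fields of the YAML specification file.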
The YAML files above can be used in the az ml compute attach command as the --file parameter. A Synapse Spark pool can be attached to an Azure Machine Learning workspace, in a specified resource group of a subscription, with the az ml compute attach command as shown here:
az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
This sample shows the expected output of the above command:
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please visit https://aka.ms/azuremlexperimental for more information.
{
"auto_pause_settings": {
"auto_pause_enabled": true,
"delay_in_minutes": 15
},
"created_on": "2022-09-13 19:01:05.109840+00:00",
"id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
"location": "eastus2",
"name": "<ATTACHED_SPARK_POOL_NAME>",
"node_count": 5,
"node_family": "MemoryOptimized",
"node_size": "Small",
"provisioning_state": "Succeeded",
"resourceGroup": "<RESOURCE_GROUP>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
"scale_settings": {
"auto_scale_enabled": false,
"max_node_count": 0,
"min_node_count": 0
},
"spark_version": "3.2",
"type": "synapsespark"
}
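If you capture this output in a script, you might want to confirm that the attach succeeded before you submit jobs. The following sketch assumes the captured text might begin with a non-JSON line (such as the experimental-class warning shown above), followed by one JSON object. The function names are hypothetical and aren't part of the CLI.

```python
import json


def parse_compute_output(cli_output: str) -> dict:
    """Parse the JSON object in captured CLI text, skipping any leading non-JSON lines."""
    start = cli_output.find("{")
    if start == -1:
        raise ValueError("No JSON object found in the CLI output.")
    return json.loads(cli_output[start:])


def is_attached(compute: dict) -> bool:
    """Return True if the compute is a Synapse Spark pool whose provisioning succeeded."""
    return (
        compute.get("type") == "synapsespark"
        and compute.get("provisioning_state") == "Succeeded"
    )
```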
If the attached Synapse Spark pool, with the name specified in the YAML specification file, already exists in the workspace, then the az ml compute attach command updates the existing pool with the information provided in the YAML specification file. You can update these values through the YAML specification file:
- identity type
- user-assigned identities
- tags
To display details of an attached Synapse Spark pool, execute the az ml compute show command. Pass the name of the attached Synapse Spark pool with the --name parameter, as shown:
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
This sample shows the expected output of the above command:
<ATTACHED_SPARK_POOL_NAME>
{
"auto_pause_settings": {
"auto_pause_enabled": true,
"delay_in_minutes": 15
},
"created_on": "2022-09-13 19:01:05.109840+00:00",
"id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
"location": "eastus2",
"name": "<ATTACHED_SPARK_POOL_NAME>",
"node_count": 5,
"node_family": "MemoryOptimized",
"node_size": "Small",
"provisioning_state": "Succeeded",
"resourceGroup": "<RESOURCE_GROUP>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
"scale_settings": {
"auto_scale_enabled": false,
"max_node_count": 0,
"min_node_count": 0
},
"spark_version": "3.2",
"type": "synapsespark"
}
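The same command can be run from a Python script. This sketch assumes that the az executable is on the PATH and that you're already signed in. The show_compute function and its runner parameter are hypothetical names introduced here, and the runner exists only so that the call can be replaced in tests. On Windows, where the CLI is installed as az.cmd, the subprocess call might need adjustment.

```python
import json
import subprocess


def show_compute(name, subscription_id, resource_group, workspace_name,
                 runner=subprocess.run):
    """Run 'az ml compute show' and return the parsed JSON details (sketch)."""
    command = [
        "az", "ml", "compute", "show",
        "--name", name,
        "--subscription", subscription_id,
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
    ]
    completed = runner(command, capture_output=True, text=True, check=True)
    # Skip any text printed before the JSON object, such as the pool name.
    start = completed.stdout.find("{")
    if start == -1:
        raise ValueError("No JSON object found in the command output.")
    return json.loads(completed.stdout[start:])
```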
To see a list of all computes, including the attached Synapse Spark pools in a workspace, use the az ml compute list command. Use the --workspace-name parameter to pass the name of the workspace, as shown:
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
This sample shows the expected output of the above command:
[
{
"auto_pause_settings": {
"auto_pause_enabled": true,
"delay_in_minutes": 15
},
"created_on": "2022-09-09 21:28:54.871251+00:00",
"id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
"identity": {
"principal_id": "<PRINCIPAL_ID>",
"tenant_id": "<TENANT_ID>",
"type": "system_assigned"
},
"location": "eastus2",
"name": "<ATTACHED_SPARK_POOL_NAME>",
"node_count": 5,
"node_family": "MemoryOptimized",
"node_size": "Small",
"provisioning_state": "Succeeded",
"resourceGroup": "<RESOURCE_GROUP>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
"scale_settings": {
"auto_scale_enabled": false,
"max_node_count": 0,
"min_node_count": 0
},
"spark_version": "3.2",
"type": "synapsespark"
},
...
]
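Because the list contains every compute in the workspace, a script usually has to filter it. The following sketch keeps only the entries whose type is synapsespark, as in the sample output above. It assumes the command output is a complete JSON array (the "..." in the sample stands for more entries), and the function name is hypothetical.

```python
import json


def attached_spark_pools(list_output: str) -> list:
    """Return the entries of 'az ml compute list' output that are Synapse Spark pools."""
    computes = json.loads(list_output)
    return [c for c in computes if c.get("type") == "synapsespark"]
```

For example, `[c["name"] for c in attached_spark_pools(output)]` gives the names of the attached Synapse Spark pools.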
[!INCLUDE sdk v2]
The Azure Machine Learning Python SDK provides convenient functions for attaching and managing a Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.
To attach a Synapse Compute using the Python SDK, first create an instance of the azure.ai.ml.MLClient class. This class provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses azure.identity.DefaultAzureCredential to connect to a workspace in the resource group of a specified Azure subscription. In the following code sample, define the SynapseSparkCompute with these parameters:
- name - user-defined name of the new attached Synapse Spark pool.
- resource_id - resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace.
An azure.ai.ml.MLClient.begin_create_or_update() function call attaches the defined Synapse Spark pool to the Azure Machine Learning workspace.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_comp = SynapseSparkCompute(name=synapse_name, resource_id=synapse_resource)
ml_client.begin_create_or_update(synapse_comp)
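The begin_create_or_update() call starts a long-running operation and returns a poller object. If later code depends on the attached pool, it can help to wait for that operation to finish. The following is a minimal sketch that assumes ml_client and the compute definition are created as in the sample above. The attach_and_wait name is a hypothetical helper, not an SDK function.

```python
def attach_and_wait(ml_client, compute, timeout=None):
    """Start the attach operation and block until it completes (sketch).

    Returns whatever the poller's result() returns, which is expected to be
    the attached compute as stored in the workspace.
    """
    poller = ml_client.begin_create_or_update(compute)
    # result() blocks until the operation finishes, or until the optional
    # timeout (in seconds) elapses.
    return poller.result(timeout=timeout)
```

A call such as `attached = attach_and_wait(ml_client, synapse_comp)` could then replace the last line of the sample.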
To attach a Synapse Spark pool that uses system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")
synapse_comp = SynapseSparkCompute(
name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. For the managed identity definition used in this way, set the type to UserAssigned. In addition, pass a user_assigned_identities parameter. The parameter user_assigned_identities is a list of objects of the UserAssignedIdentity class. The resource_id of the user-assigned identity populates each UserAssignedIdentity class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
SynapseSparkCompute,
IdentityConfiguration,
UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
type="UserAssigned",
user_assigned_identities=[
UserAssignedIdentity(
resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
)
],
)
synapse_comp = SynapseSparkCompute(
name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
Note
The azure.ai.ml.MLClient.begin_create_or_update() function attaches a new Synapse Spark pool, if a pool with the specified name does not already exist in the workspace. However, if a Synapse Spark pool with that specified name is already attached to the workspace, a call to the azure.ai.ml.MLClient.begin_create_or_update() function will update the existing attached pool with the new identity or identities.
To ensure that the attached Synapse Spark Pool works properly, assign the Administrator Role to it, from the Azure Synapse Analytics studio UI. These steps show how to do it:
- Open your Synapse Workspace in Azure portal.
- In the left pane, select Overview.
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio." lightbox= "media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png":::
- Select Open Synapse Studio.
- In the Azure Synapse Analytics studio, select Manage in the left pane.
- Select Access Control in the Security section of the left pane, second from the left.
- Select Add.
- The Add role assignment panel will open on the right side of the screen. In this panel:
  - Select Workspace item for Scope.
  - In the Item type dropdown menu, select Apache Spark pool.
  - In the Item dropdown menu, select your Apache Spark pool.
  - In the Role dropdown menu, select Synapse Administrator.
  - In the Select user search box, start typing the name of your Azure Machine Learning Workspace. It shows you a list of attached Synapse Spark pools. Select your desired Synapse Spark pool from the list.
  - Select Apply.
:::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment." lightbox= "media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png":::
You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should create a user-assigned managed identity in Azure portal, before you assign it to a Synapse Spark pool.
To update managed identity for the attached Synapse Spark pool:
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png" alt-text="Screenshot showing Synapse Spark Pool managed identity update." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png":::
- Open the Details page for the Synapse Spark pool in the Azure Machine Learning studio.
- Find the edit icon, located on the right side of the Managed identity section.
- To assign a managed identity for the first time, toggle Assign a managed identity to enable it.
- To assign a system-assigned managed identity:
  - Select System-assigned as the Identity type.
  - Select Update.
- To assign a user-assigned managed identity:
  - Select User-assigned as the Identity type.
  - Select an Azure Subscription from the dropdown menu.
  - Type the first few letters of the name of the user-assigned managed identity in the box that shows the text Search by name. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
  - Select Update.
[!INCLUDE cli v2]
To update the identity associated with an attached Synapse Spark pool, execute the az ml compute update command with appropriate parameters. To assign a system-assigned identity, set the --identity parameter in the command to SystemAssigned, as shown:
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
This sample shows the expected output of the above command:
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
"auto_pause_settings": {
"auto_pause_enabled": true,
"delay_in_minutes": 15
},
"created_on": "2022-09-13 20:02:15.746490+00:00",
"id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
"identity": {
"principal_id": "<PRINCIPAL_ID>",
"tenant_id": "<TENANT_ID>",
"type": "system_assigned"
},
"location": "eastus2",
"name": "<ATTACHED_SPARK_POOL_NAME>",
"node_count": 5,
"node_family": "MemoryOptimized",
"node_size": "Small",
"provisioning_state": "Succeeded",
"resourceGroup": "<RESOURCE_GROUP>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
"scale_settings": {
"auto_scale_enabled": false,
"max_node_count": 0,
"min_node_count": 0
},
"spark_version": "3.2",
"type": "synapsespark"
}
To assign a user-assigned identity, set the parameter --identity in the command to UserAssigned. Additionally, you should use the --user-assigned-identities parameter to pass the resource ID for the user-assigned identity, as shown:
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
This sample shows the expected output of the above command:
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
"auto_pause_settings": {
"auto_pause_enabled": true,
"delay_in_minutes": 15
},
"created_on": "2022-09-13 20:02:15.746490+00:00",
"id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
"identity": {
"type": "user_assigned",
"user_assigned_identities": [
{
"client_id": "<CLIENT_ID>",
"principal_id": "<PRINCIPAL_ID>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
}
]
},
"location": "eastus2",
"name": "<ATTACHED_SPARK_POOL_NAME>",
"node_count": 5,
"node_family": "MemoryOptimized",
"node_size": "Small",
"provisioning_state": "Succeeded",
"resourceGroup": "<RESOURCE_GROUP>",
"resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
"scale_settings": {
"auto_scale_enabled": false,
"max_node_count": 0,
"min_node_count": 0
},
"spark_version": "3.2",
"type": "synapsespark"
}
Note
The parameter --user-assigned-identities can take a list of resource IDs and assign multiple user-assigned identities to an attached Synapse Spark pool. The first user-assigned identity in the list will be used for submitting a job by default.
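Because the first identity in the list becomes the default for job submission, the order of the list matters. The following sketch shows one way to build an ordered list that puts a chosen default first and drops duplicates. The order_identities name is a hypothetical helper, and the case-insensitive comparison reflects the assumption that Azure resource IDs differing only in letter case refer to the same resource.

```python
def order_identities(default_identity, other_identities=()):
    """Return identity resource IDs with the default first and duplicates removed."""
    ordered = [default_identity]
    seen = {default_identity.lower()}
    for resource_id in other_identities:
        key = resource_id.lower()
        if key not in seen:
            seen.add(key)
            ordered.append(resource_id)
    return ordered
```

The resulting list can supply the values for the user-assigned identities, in order.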
[!INCLUDE sdk v2]
To use system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet updates a Synapse Spark pool to use a system-assigned identity:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")
synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. For the managed identity definition used in this way, set the type to UserAssigned. In addition, pass a user_assigned_identities parameter. The parameter user_assigned_identities is a list of objects of the UserAssignedIdentity class. The resource_id of the user-assigned identity populates each UserAssignedIdentity class object. This code snippet updates a Synapse Spark pool to use a user-assigned identity:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
SynapseSparkCompute,
IdentityConfiguration,
UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
type="UserAssigned",
user_assigned_identities=[
UserAssignedIdentity(
resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
)
],
)
synapse_comp = SynapseSparkCompute(
name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
Note
If a pool with the specified name does not already exist in the workspace, the azure.ai.ml.MLClient.begin_create_or_update() function will attach a new Synapse Spark pool. However, if a Synapse Spark pool with the specified name is already attached to the workspace, an azure.ai.ml.MLClient.begin_create_or_update() function call will update the existing attached pool with the new identity or identities.
We might want to detach an attached Synapse Spark pool, to clean up a workspace.
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this, follow these steps:
- Open the Details page for the Synapse Spark pool, in the Azure Machine Learning studio.
- Select Detach, to detach the attached Synapse Spark pool.
[!INCLUDE cli v2]
An attached Synapse Spark pool can be detached by executing the az ml compute detach command with the name of the pool passed, using the --name parameter, as shown here:
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
This sample shows the expected output of the above command:
Are you sure you want to perform this operation? (y/n): y
[!INCLUDE sdk v2]
We'll use an MLClient.compute.begin_delete() function call. Pass the name of the attached Synapse Spark pool, along with the action Detach, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute
from azure.identity import DefaultAzureCredential
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
ml_client = MLClient(
DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)
synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
ml_client.compute.begin_delete(name=synapse_name, action="Detach")
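Like the attach call, begin_delete() returns a poller for a long-running operation. A cleanup script might want to wait until the detach finishes before it continues. This sketch assumes an ml_client created as in the snippet above; detach_and_wait is a hypothetical helper name.

```python
def detach_and_wait(ml_client, name, timeout=None):
    """Detach an attached Synapse Spark pool and block until the operation completes."""
    poller = ml_client.compute.begin_delete(name=name, action="Detach")
    # result() blocks until the detach finishes, or until the optional
    # timeout (in seconds) elapses.
    poller.result(timeout=timeout)
```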
Some user scenarios might require access to a serverless Spark compute resource, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. Learn more about the serverless Spark compute experience.