title | titleSuffix | description | services | ms.service | ms.subservice | ms.topic | ms.author | author | ms.reviewer | ms.date | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|---|
Create connections to external data sources (preview) |
Azure Machine Learning |
Learn how to use connections to connect to External data sources for training with Azure Machine Learning. |
machine-learning |
machine-learning |
mldata |
how-to |
franksolomon |
fbsolo-ms1 |
ambadal |
06/19/2023 |
data4ml, devx-track-azurecli |
[!INCLUDE dev v2]
In this article, you learn how to connect to data sources located outside of Azure, so that the data becomes available to Azure Machine Learning services. Azure Machine Learning connections serve as key vault proxies: an interaction with a connection is a direct interaction with an Azure key vault. A connection stores the username and password securely, as secrets, in a key vault, and the key vault RBAC controls access to them. Azure supports connections to these external sources:
- Snowflake DB
- Amazon S3
- Azure SQL DB
[!INCLUDE machine-learning-preview-generic-disclaimer]
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
- An Azure Machine Learning workspace.
Important
An Azure Machine Learning connection securely stores the credentials passed during connection creation in the Workspace Azure Key Vault. A connection references the credentials from the key vault storage location for further use. You won't need to directly deal with the credentials after they are stored in the key vault. You have the option to store the credentials in the YAML file. A CLI command or SDK can override them. We recommend that you avoid credential storage in a YAML file, because a security breach could lead to a credential leak.
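One way to act on this recommendation is to scan connection YAML files for filled-in credential values before you commit or share them. The following sketch is a minimal, line-based check and not a YAML parser. The function name `find_inline_credentials` and the set of sensitive keys are assumptions, chosen to match the credential fields used in the YAML examples in this article.

```python
from __future__ import annotations

# Assumption: these are the credential keys that appear in this article's YAML examples.
SENSITIVE_KEYS = {"username", "password", "pat", "access_key_id", "secret_access_key"}


def find_inline_credentials(yaml_text: str) -> list[str]:
    """Return the credential keys that carry a real-looking value.

    Empty values and placeholders such as <username> are not flagged.
    """
    flagged = []
    for raw_line in yaml_text.splitlines():
        # Drop a trailing comment, then surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        value = value.strip().strip("\"'")
        if key not in SENSITIVE_KEYS or not value:
            continue
        if value.startswith("<") and value.endswith(">"):
            continue  # still a placeholder
        flagged.append(key)
    return flagged
```

A non-empty result suggests that the file contains credentials that would be safer to pass at creation time through the CLI or SDK instead.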
Note
For a successful data import, verify that you have installed the latest azure-ai-ml package (version 1.5.0 or later) for the SDK, and the ml extension (version 2.15.1 or later) for the CLI.
If you have an older SDK package or CLI extension, remove the old one and install the new one. Use these commands for the CLI and the SDK:
az extension remove -n ml
az extension add -n ml --yes
az extension show -n ml #(the version value needs to be 2.15.1 or later)
pip uninstall azure-ai-ml
pip install azure-ai-ml
pip show azure-ai-ml #(the version value needs to be 1.5.0 or later)
Not available in studio.
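The SDK version requirement can also be checked from Python. The following sketch assumes a simple comparison of dotted numeric versions is enough; it treats a pre-release suffix such as `b1` as the end of the version number, which is a simplification. The helper names `parse_version`, `meets_minimum`, and `check_sdk` are illustrative and not part of the SDK.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.5.0' into (1, 5, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_sdk(minimum: str = "1.5.0") -> str:
    """Report whether the installed azure-ai-ml package is new enough."""
    try:
        installed = metadata.version("azure-ai-ml")
    except metadata.PackageNotFoundError:
        return "azure-ai-ml is not installed"
    if meets_minimum(installed, minimum):
        return f"azure-ai-ml {installed} meets the minimum {minimum}"
    return f"azure-ai-ml {installed} is older than {minimum}; reinstall it"
```

Calling `print(check_sdk())` in the environment that runs your import jobs shows which case applies.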
This YAML file creates a Snowflake DB connection. Be sure to update the appropriate values:
# my_snowflakedb_connection.yaml
$schema: http://azureml/sdk-2-0/Connection.json
type: snowflake
name: my-sf-db-connection # add your connection name here
target: jdbc:snowflake://<myaccount>.snowflakecomputing.com/?db=<mydb>&warehouse=<mywarehouse>&role=<myrole>
# add the Snowflake account, database, warehouse name and role name here. If no role name provided it will default to PUBLIC
credentials:
type: username_password
username: <username> # add the Snowflake database user name here or leave this blank and type in CLI command line
password: <password> # add the Snowflake database password here or leave this blank and type in CLI command line
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file my_snowflakedb_connection.yaml
az ml connection create --file my_snowflakedb_connection.yaml --set credentials.username="XXXXX" credentials.password="XXXXX"
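If you script the CLI call, you can assemble the same command as an argument list so that the credentials come from variables instead of the YAML file. This is a sketch that only builds the arguments; the function name `build_create_command` is hypothetical, and the `--file` and `--set` options are the ones shown in the preceding commands.

```python
import shlex
from typing import List, Optional


def build_create_command(
    yaml_file: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Build the 'az ml connection create' argument list.

    Credential overrides are added with --set only when they are provided.
    """
    command = ["az", "ml", "connection", "create", "--file", yaml_file]
    overrides = []
    if username is not None:
        overrides.append(f"credentials.username={username}")
    if password is not None:
        overrides.append(f"credentials.password={password}")
    if overrides:
        command += ["--set", *overrides]
    return command


def as_shell_string(command: List[str]) -> str:
    """Quote the argument list for display or logging."""
    return shlex.join(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Avoid logging the output of `as_shell_string` when it includes a real password.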
from azure.ai.ml import MLClient, load_workspace_connection
ml_client = MLClient.from_config()
wps_connection = load_workspace_connection(source="./my_snowflakedb_connection.yaml")
wps_connection.credentials.username="XXXXX"
wps_connection.credentials.password="XXXXXXXX"
ml_client.connections.create_or_update(workspace_connection=wps_connection)
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import UsernamePasswordConfiguration

ml_client = MLClient.from_config()

# add the Snowflake account, database, warehouse name and role name here. If no role name is provided, it defaults to PUBLIC
target = "jdbc:snowflake://<myaccount>.snowflakecomputing.com/?db=<mydb>&warehouse=<mywarehouse>&role=<myrole>"
name = "<my_snowflake_connection>"  # name of the connection
wps_connection = WorkspaceConnection(
    name=name,
    type="snowflake",
    target=target,
    credentials=UsernamePasswordConfiguration(username="XXXXX", password="XXXXXX"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
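The Snowflake target string follows a fixed pattern, so it can be built from its parts instead of edited by hand. The following sketch assumes that URL-encoding the query values is acceptable for your account, database, warehouse, and role names; the function name `snowflake_target` is illustrative only.

```python
from typing import Optional
from urllib.parse import urlencode


def snowflake_target(
    account: str,
    database: str,
    warehouse: str,
    role: Optional[str] = None,
) -> str:
    """Build a target string in the format used by the examples above.

    The role parameter is left out when no role is given, in which case
    the connection falls back to the default role (PUBLIC).
    """
    params = {"db": database, "warehouse": warehouse}
    if role:
        params["role"] = role
    return f"jdbc:snowflake://{account}.snowflakecomputing.com/?{urlencode(params)}"
```

The returned string can be used as the `target` value in the YAML file or in the `WorkspaceConnection` constructor.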
- Navigate to the Azure Machine Learning studio.
- Under Assets in the left navigation, select Data. Next, select the Data Connection tab. Then select Create as shown in this screenshot:
:::image type="content" source="media/how-to-connection/create-new-data-connection.png" lightbox="media/how-to-connection/create-new-data-connection.png" alt-text="Screenshot showing the start of a new data connection in Azure Machine Learning studio UI.":::
- In the Create connection pane, fill in the values as shown in the screenshot. Choose Snowflake for the category, and Username password for the Authentication type. Be sure to specify the Target textbox value in this format, filling in your specific values between the < > characters:
jdbc:snowflake://<myaccount>.snowflakecomputing.com/?db=<mydb>&warehouse=<mywarehouse>&role=<myrole>
:::image type="content" source="media/how-to-connection/create-snowflake-connection.png" lightbox="media/how-to-connection/create-snowflake-connection.png" alt-text="Screenshot showing creation of a new Snowflake connection in Azure Machine Learning studio UI.":::
- Select Save to securely store the credentials in the key vault associated with the relevant workspace. This connection is used when running a data import job.
This YAML file creates an Azure SQL DB connection. Be sure to update the appropriate values:
# my_sqldb_connection.yaml
$schema: http://azureml/sdk-2-0/Connection.json
type: azure_sql_db
name: my-sqldb-connection
target: Server=tcp:<myservername>,<port>;Database=<mydatabase>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30
# add the SQL server name, port, and database
credentials:
type: sql_auth
username: <username> # add the sql database user name here or leave this blank and type in CLI command line
password: <password> # add the sql database password here or leave this blank and type in CLI command line
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file my_sqldb_connection.yaml
az ml connection create --file my_sqldb_connection.yaml --set credentials.username="XXXXX" credentials.password="XXXXX"
from azure.ai.ml import MLClient, load_workspace_connection
ml_client = MLClient.from_config()
wps_connection = load_workspace_connection(source="./my_sqldb_connection.yaml")
wps_connection.credentials.username="XXXXXX"
wps_connection.credentials.password="XXXXXxXXX"
ml_client.connections.create_or_update(workspace_connection=wps_connection)
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import UsernamePasswordConfiguration

ml_client = MLClient.from_config()

# add the SQL server name, port, and database
target = "Server=tcp:<myservername>,<port>;Database=<mydatabase>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
name = "<my_sql_connection>"  # name of the connection
wps_connection = WorkspaceConnection(
    name=name,
    type="azure_sql_db",
    target=target,
    credentials=UsernamePasswordConfiguration(username="XXXXX", password="XXXXXX"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
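The Azure SQL DB target is a semicolon-separated connection string, so a stray semicolon in a server or database name would change its meaning. As a sketch, the hypothetical helper `azure_sql_target` below assembles the string in the format used above and rejects such values; the fixed options mirror the example and might need adjusting for your server.

```python
def azure_sql_target(
    server: str,
    port: int,
    database: str,
    timeout_seconds: int = 30,
) -> str:
    """Build a target string in the format used by the examples above."""
    for label, value in (("server", server), ("database", database)):
        if not value or ";" in value:
            raise ValueError(f"{label} must be non-empty and must not contain ';'")
    return (
        f"Server=tcp:{server},{port};Database={database};"
        f"Trusted_Connection=False;Encrypt=True;Connection Timeout={timeout_seconds}"
    )
```

For example, `azure_sql_target("<myservername>", 1433, "<mydatabase>")` produces the same shape as the target in the YAML file, with 1433 being the default SQL Server port.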
- Navigate to the Azure Machine Learning studio.
- Under Assets in the left navigation, select Data. Next, select the Data Connection tab. Then select Create as shown in this screenshot:
:::image type="content" source="media/how-to-connection/create-new-data-connection.png" lightbox="media/how-to-connection/create-new-data-connection.png" alt-text="Screenshot showing the start of a new data connection in Azure Machine Learning studio UI.":::
- In the Create connection pane, fill in the values as shown in the screenshot. Choose AzureSqlDb for the category, and Username password for the Authentication type. Be sure to specify the Target textbox value in this format, filling in your specific values between the < > characters:
Server=tcp:<myservername>,<port>;Database=<mydatabase>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30
:::image type="content" source="media/how-to-connection/how-to-create-azuredb-connection.png" lightbox="media/how-to-connection/how-to-create-azuredb-connection.png" alt-text="Screenshot showing creation of a new Azure DB connection in Azure Machine Learning studio UI.":::
Create an Amazon S3 connection with the following YAML file. Be sure to update the appropriate values:
# my_s3_connection.yaml
$schema: http://azureml/sdk-2-0/Connection.json
type: s3
name: my_s3_connection
target: <mybucket> # add the s3 bucket details
credentials:
type: access_key
access_key_id: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX # add access key id
secret_access_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX # add access key secret
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file my_s3_connection.yaml
from azure.ai.ml import MLClient, load_workspace_connection
ml_client = MLClient.from_config()
wps_connection = load_workspace_connection(source="./my_s3_connection.yaml")
ml_client.connections.create_or_update(workspace_connection=wps_connection)
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import AccessKeyConfiguration

ml_client = MLClient.from_config()

target = "<mybucket>"  # add the s3 bucket details
name = "<my_s3_connection>"  # name of the connection
wps_connection = WorkspaceConnection(
    name=name,
    type="s3",
    target=target,
    credentials=AccessKeyConfiguration(access_key_id="XXXXXX", secret_access_key="XXXXXXXX"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
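To keep the access key out of source files, the values can be read from the environment at run time. The following sketch assumes the keys are exported under `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, the variable names that AWS tools conventionally use; the function name `s3_connection_settings` and the returned dictionary layout are illustrative and mirror the fields of the YAML file.

```python
import os
from typing import Mapping


def s3_connection_settings(
    name: str,
    bucket: str,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Collect S3 connection settings, taking the access key from env."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [variable for variable in required if not env.get(variable)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "name": name,
        "type": "s3",
        "target": bucket,
        "credentials": {
            "type": "access_key",
            "access_key_id": env["AWS_ACCESS_KEY_ID"],
            "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        },
    }
```

The `access_key_id` and `secret_access_key` values can then be passed to `AccessKeyConfiguration`, and `name`, `type`, and `target` to `WorkspaceConnection`.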
- Navigate to the Azure Machine Learning studio.
- Under Assets in the left navigation, select Data. Next, select the Data Connection tab. Then select Create as shown in this screenshot:
:::image type="content" source="media/how-to-connection/create-new-data-connection.png" lightbox="media/how-to-connection/create-new-data-connection.png" alt-text="Screenshot showing the start of a new data connection in Azure Machine Learning studio UI.":::
- In the Create connection pane, fill in the values as shown in the screenshot. Choose S3 for the category, and Username password for the Authentication type. Be sure to specify the Target textbox value in this format, filling in your specific values between the < > characters:
<target>
:::image type="content" source="media/how-to-connection/how-to-create-amazon-s3-connection.png" lightbox="media/how-to-connection/how-to-create-amazon-s3-connection.png" alt-text="Screenshot showing creation of a new Amazon S3 connection in Azure Machine Learning studio UI.":::
You can also create connections to Git, a Python feed, Azure Container Registry, and services that authenticate with an API key. These connections aren't data connections; they connect to external services for use in your code.
Create a Git connection with one of the following YAML files. Be sure to update the appropriate values:
- Connect using a personal access token (PAT):

      # connection.yaml
      name: test_ws_conn_git_pat
      type: git
      target: https://github.com/contoso/contosorepo
      credentials:
        type: pat
        pat: dummy_pat
- Connect to a public repo (no credentials):

      # connection.yaml
      name: git_no_cred_conn
      type: git
      target: https://github.com/contoso/contosorepo
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file connection.yaml
The following example creates a Git connection to a GitHub repo. This connection is authenticated with a Personal Access Token (PAT):
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import PatTokenConfiguration

ml_client = MLClient.from_config()

name = "my_git_conn"
target = "https://github.com/myaccount/myrepo"
wps_connection = WorkspaceConnection(
    name=name,
    type="git",
    target=target,
    credentials=PatTokenConfiguration(pat="XXXXXXXXX"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
You can't create a Git connection in studio.
Create a connection to a Python feed with one of the following YAML files. Be sure to update the appropriate values:
- Connect using a personal access token (PAT):

      # connection.yaml
      name: test_ws_conn_python_pat
      type: python_feed
      target: https://test-feed.com
      credentials:
        type: pat
        pat: dummy_pat
- Connect using a username and password:

      name: test_ws_conn_python_user_pass
      type: python_feed
      target: https://test-feed.com
      credentials:
        type: username_password
        username: john
        password: pass
- Connect to a public feed (no credentials):

      name: test_ws_conn_python_no_cred
      type: python_feed
      target: https://test-feed.com3
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file connection.yaml
The following example creates a Python feed connection. This connection is authenticated with a personal access token (PAT) or a username and password:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import UsernamePasswordConfiguration, PatTokenConfiguration

ml_client = MLClient.from_config()

name = "my_pfeed_conn"
target = "https://XXXXXXXXX.core.windows.net/mycontainer"
wps_connection = WorkspaceConnection(
    name=name,
    type="python_feed",
    target=target,
    # Use exactly one of the following credential options:
    # credentials=UsernamePasswordConfiguration(username="xxxxx", password="xxxxx"),
    credentials=PatTokenConfiguration(pat="XXXXXXXXX"),
    # credentials=None,  # public feed
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
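A Python feed connection accepts a PAT, a username and password, or no credentials at all, so a script needs to pick one. The sketch below shows one possible selection rule, preferring a PAT when one is available. The function name `choose_feed_credentials` is hypothetical, and the returned dictionary mirrors the `credentials` section of the YAML files rather than any SDK type.

```python
from typing import Optional


def choose_feed_credentials(
    pat: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Optional[dict]:
    """Pick the credentials section for a Python feed connection.

    Returns None for a public feed that needs no credentials.
    """
    if pat:
        return {"type": "pat", "pat": pat}
    if username and password:
        return {
            "type": "username_password",
            "username": username,
            "password": password,
        }
    if username or password:
        raise ValueError("username and password must be provided together")
    return None
```

The `type` value in the result indicates which SDK configuration class to construct: `PatTokenConfiguration` for `pat`, `UsernamePasswordConfiguration` for `username_password`, and `credentials=None` when the function returns `None`.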
You can't create a Python feed connection in studio.
Create a connection to an Azure Container Registry with one of the following YAML files. Be sure to update the appropriate values:
- Connect using Microsoft Entra ID authentication:

      name: test_ws_conn_cr_managed
      type: container_registry
      target: https://test-feed.com
      credentials:
        type: managed_identity
        client_id: client_id
        resource_id: resource_id
- Connect using a username and password:

      name: test_ws_conn_cr_user_pass
      type: container_registry
      target: https://test-feed.com2
      credentials:
        type: username_password
        username: contoso
        password: pass
Create the Azure Machine Learning connection in the CLI:
az ml connection create --file connection.yaml
The following example creates an Azure Container Registry connection. This connection is authenticated using a managed identity:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import ManagedIdentityConfiguration

ml_client = MLClient.from_config()

name = "my_acr_conn"
target = "https://XXXXXXXXX.core.windows.net/mycontainer"
wps_connection = WorkspaceConnection(
    name=name,
    type="container_registry",
    target=target,
    credentials=ManagedIdentityConfiguration(client_id="xxxxx", resource_id="xxxxx"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
You can't create an Azure Container Registry connection in studio.
The following example creates an API key connection:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import WorkspaceConnection
from azure.ai.ml.entities import ApiKeyConfiguration

ml_client = MLClient.from_config()

name = "my_api_key"
target = "https://XXXXXXXXX.core.windows.net/mycontainer"
wps_connection = WorkspaceConnection(
    name=name,
    type="apikey",
    target=target,
    credentials=ApiKeyConfiguration(key="XXXXXXXXX"),
)
ml_client.connections.create_or_update(workspace_connection=wps_connection)
If you are using a data connection (Snowflake DB, Amazon S3, or Azure SQL DB), see these articles for more information: