# Basic AzureML SDK V2
Azure Machine Learning provides multiple ways to work with the ML model life cycle. In this article, you'll learn how to work with several Azure Machine Learning resources and assets. These resources and assets are needed to run any job (for example, to train your model). We will use the following method to work with the resources and assets:

 - Python SDK v2 for Azure Machine Learning.


## Create Workspace
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. The workspace keeps a history of all jobs, including logs, metrics, output, and a snapshot of your scripts. The workspace stores references to resources like datastores and compute. It also holds all assets like models, environments, components, and data assets.


This [article](https://learn.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-v2?tabs=sdk) shows more ways to create an Azure ML workspace using SDK v2.

### Use MLClient
To create the workspace, you first need to create an `MLClient` object. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python), our subscription ID, and the resource group to create the `MLClient` object.

In [None]:
# Enter details of your subscription
subscription_id = "xxx"
resource_group = "xxx"

# get a handle to the subscription

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group)

In [None]:
from azure.ai.ml.entities import Workspace
ws_basic = Workspace(
    name="xxx",
    location="Westus2", # Azure region (location) of workspace
    display_name="Meer Alam workspace-example",
    description="This example shows how to create a basic workspace"
)
ml_client.workspaces.begin_create(ws_basic) # use MLClient to connect to the subscription and resource group and create workspace

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient


credential = DefaultAzureCredential()
ml_client = None
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<Subscription ID>"
    resource_group = "<ResourceGroup Name>"
    workspace = "<WorkspaceName>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
print(ml_client)

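## Save the Workspace Config
The documentation points to `Workspace.write_config` for saving the workspace configuration, but no definition for it could be found in SDK v2; the method appears to belong to the SDK v1 `azureml.core.Workspace` class.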
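
`MLClient.from_config` looks for a `config.json` file that holds the subscription ID, resource group, and workspace name, so one possible workaround is to write that file by hand. The sketch below is only an illustration: the helper name `write_workspace_config` is hypothetical, and it assumes the three keys `subscription_id`, `resource_group`, and `workspace_name` that the workspace config file downloaded from the Azure portal uses.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name, folder="."):
    """Write a config.json file (hypothetical helper, not part of the SDK)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Pretty-print so the file stays easy to edit by hand
    path.write_text(json.dumps(config, indent=4))
    return path
```

Calling `write_workspace_config("<Subscription ID>", "<ResourceGroup Name>", "<WorkspaceName>")` in the notebook folder should let a later `MLClient.from_config(credential)` call find the file.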

If the workspace already exists, we can connect to it directly (get a handle to it).

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = None

try:
    # Use a local config.json if one is available
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "xxxx"
    resource_group_name = "xxx"
    workspace_name = "xxx"

    # get a handle to the workspace
    ml_client = MLClient(
        credential,  # InteractiveBrowserCredential() is an alternative
        subscription_id,
        resource_group_name,
        workspace_name,
    )

# Computes
We can review the compute targets we already have and create one if we do not have one yet.

In [None]:
#List Compute targets in the Workspace
for compute in ml_client.compute.list():
    print(f"Compute {compute.name} is a {compute.type}")

Let us create a compute cluster to which we can submit a job.

In [None]:
from azure.ai.ml.entities import AmlCompute

cpu_compute_target = AmlCompute(
    name="cpu-cluster",
    type="amlcompute",
    size="STANDARD_DS3_v2",
    location="westus2",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,
)
ml_client.begin_create_or_update(cpu_compute_target)

Alternatively, we can reuse an existing cluster and create it only when it is missing:

In [None]:
from azure.ai.ml.entities import AmlCompute

# provision a small compute cluster
cpu_compute_target = "cpu-cluster"

try:
    cluster = ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4
    )
    # begin_create_or_update returns a poller; .result() waits for the cluster itself
    cluster = ml_client.compute.begin_create_or_update(compute).result()

print(f"Got reference to cluster: {cluster.name}, Type: {cluster.type}")

For more examples on compute, see: https://github.com/Azure/azureml-examples/blob/main/sdk/resources/compute/compute.ipynb

# Datastores

In [None]:
# List existing datastores in the workspace and select the default store

print([dstore.name for dstore in ml_client.datastores.list()])

dstore_name = "workspaceblobstore"

# Fetch the datastore by name ...
dstore = ml_client.datastores.get(dstore_name)
# ... or ask the workspace for its default datastore
dstore = ml_client.datastores.get_default()

print(f"Default Datastore name: {dstore.name}, Type: {dstore.type}")

This [Jupyter notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/resources/datastores/datastore.ipynb) shows more ways to create datastores using SDK v2.

In [None]:
# import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml import command, Input
from azure.ai.ml.entities import (
    AzureBlobDatastore,
    AzureFileDatastore,
    AzureDataLakeGen1Datastore,
    AzureDataLakeGen2Datastore,
)
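# Note: this private module path comes from a preview release. Recent azure-ai-ml releases
# expose AccountKeyConfiguration, SasTokenConfiguration, and ServicePrincipalConfiguration
# from azure.ai.ml.entities instead.
from azure.ai.ml.entities._datastore.credentials import (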
    AccountKeyCredentials,
    SasTokenCredentials,
    ServicePrincipalCredentials,
)
from azure.ai.ml.entities import Environment

In [None]:
#Create a Datastore
'''
blob_datastore1 = AzureBlobDatastore(
    name="blob-example",
    description="Datastore pointing to a blob container.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials={
        "account_key": "XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    },
)
ml_client.create_or_update(blob_datastore1)'''

In [None]:
# Example 1 - Data store with account key
blob_datastore1 = AzureBlobDatastore(
    name="blob_example",
    description="Datastore pointing to a blob container.",
    account_name="amlwalktstorage1190e2c28",
    container_name="azureml-blobstore-be972235-dac7-456b-bba5-5d39412d89e7",
    credentials=AccountKeyCredentials(
        # Never commit a real account key; substitute your own key at run time
        account_key="<storage-account-key>"
    ),
)
ml_client.create_or_update(blob_datastore1)
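
Hard-coding an account key in a notebook risks leaking it when the notebook is shared. One option is to read the key from an environment variable instead. This is a minimal sketch; the helper name `get_account_key` and the variable name `AZURE_STORAGE_ACCOUNT_KEY` are assumptions chosen for illustration, not names defined by the SDK.

```python
import os


def get_account_key(env_var="AZURE_STORAGE_ACCOUNT_KEY"):
    """Return the storage account key from the environment, or fail loudly.

    The environment variable name is a hypothetical convention.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the datastore."
        )
    return key
```

The datastore definition could then use `credentials=AccountKeyCredentials(account_key=get_account_key())`.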


For more examples on datastores, see: https://github.com/Azure/azureml-examples/blob/main/sdk/resources/datastores/datastore.ipynb

# Data Assets

In [None]:
#Data Assets
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Supported paths include:
# local: './<path>'
# blob:  'https://<account_name>.blob.core.windows.net/<container_name>/<path>'
# ADLS gen2: 'abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>/'
# Datastore: 'azureml://datastores/<data_store_name>/paths/<path>'

# Datastore example (the file needs to exist at this location):
# data_uri_path = 'azureml://datastores/workspaceblobstore/paths/samples/diabetes/v1/diabetes_raw_data.csv'

# Blob/http location
data_uri_path = 'https://azuremlexamples.blob.core.windows.net/datasets/diabetes.csv'

file_data_asset = Data(
    path=data_uri_path,
    type=AssetTypes.URI_FILE,  # one of URI_FILE / URI_FOLDER / MLTABLE
    description="Diabetes Sample Dataset",
    name="diabetes-dataset-uri-file",
    # version='1'
)
datastore_data_uri_path = "azureml://datastores/blob_example/paths/diabetes-data/diabetes.csv"
# https://amlwalktstorage1190e2c28.blob.core.windows.net/azureml-blobstore-be972235-dac7-456b-bba5-5d39412d89e7/diabetes-data/diabetes.csv
# Data Store URI: azureml://subscriptions/b30d9dbd-c0f7-405f-902c-3eabd080eb00/resourcegroups/aml-walkthrough-rg/workspaces/aml-walkthrough-ws/datastores/blob_example/paths/diabetes-data/

# Note: despite the variable name, this asset is registered as a URI_FILE, not an MLTable
tabular_data_asset = Data(
    path=datastore_data_uri_path,
    type=AssetTypes.URI_FILE,  # one of URI_FILE / URI_FOLDER / MLTABLE
    description="Diabetes Sample Dataset",
    name="diabetes-dataset-uri-file",
    # version='1'
)

# Create and register the data asset in the workspace (uncomment one of these on the first run)
# ml_client.data.create_or_update(file_data_asset)
# ml_client.data.create_or_update(tabular_data_asset)

# Retrieve the registered asset; this assumes version 1 already exists
diabetes_uri_data = ml_client.data.get("diabetes-dataset-uri-file", version="1")
print(diabetes_uri_data.id, diabetes_uri_data.path)
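
The datastore paths used in this notebook come in two shapes: a short form that is relative to the current workspace, and a long form that spells out the subscription, resource group, and workspace. As a rough illustration, the two hypothetical helpers below (their names are not part of the SDK) assemble both forms from their parts, following the patterns listed in the comments of the cell above.

```python
def short_datastore_uri(datastore, path):
    """Build 'azureml://datastores/<data_store_name>/paths/<path>'."""
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"


def full_datastore_uri(subscription_id, resource_group, workspace, datastore, path):
    """Build the long form that also names the subscription, resource group, and workspace."""
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}/paths/{path.lstrip('/')}"
    )


# Reproduces the short datastore path used for the diabetes file in this notebook
uri = short_datastore_uri("blob_example", "diabetes-data/diabetes.csv")
```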

Use the data in an experiment/job

In [None]:
%pip show pandas

In [None]:
import pandas as pd

# Reading an azureml:// URI with pandas relies on the azureml-fsspec package being installed
df = pd.read_csv("azureml://subscriptions/b30d9dbd-c0f7-405f-902c-3eabd080eb00/resourcegroups/aml-walkthrough-rg/workspaces/aml-walkthrough-ws/datastores/blob_example/paths/diabetes-data/diabetes.csv")
df.head()

# MLTable
`mltable` is a way to abstract the schema definition for tabular data, so that consumers of the data can more easily materialize the table into a Pandas/Dask/Spark dataframe.
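
To make the idea of a schema definition concrete, the sketch below writes a small definition file next to a CSV file. It is an illustration under assumptions: the helper name `write_mltable_definition` is hypothetical, the data file name `diabetes.csv` is assumed, the definition file is assumed to be named `MLTable` as the `mltable` package documentation describes, and the `read_delimited` options shown may need adjusting for your data.

```python
from pathlib import Path

# Assumed contents of an MLTable definition for a single comma-separated file
MLTABLE_YAML = """\
paths:
  - file: ./diabetes.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: ascii
      header: all_files_same_headers
"""


def write_mltable_definition(folder):
    """Write the definition file into `folder` (hypothetical helper)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "MLTable"
    target.write_text(MLTABLE_YAML)
    return target
```

The folder that contains both the definition file and the CSV file is what gets uploaded and registered as an asset of type `MLTABLE`.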

In [None]:
# Register an MLTable data asset
# Upload the sample MLTable.yml file and the .csv file from the 'sample_data' folder to this blob store location first

mltable_path = 'azureml://datastores/workspaceblobstore/paths/samples/diabetes_mltable/'


mltable_asset = Data(
    path=mltable_path,
    type=AssetTypes.MLTABLE,
    description="Sklearn, Diabetes Sample Dataset",
    name="diabetes-mltable",
    #version='1'
)

diabetes_mltable = ml_client.data.create_or_update(mltable_asset)

print(diabetes_mltable.id, diabetes_mltable.path)

In [None]:
%pip install mltable

In [None]:
# MLTable examples
from mltable import load

# load mltable from a local folder (the raw string keeps the backslashes literal)
tbl = load(r'.\samples\mltable_sample')

# load mltable from an azureml datastore URI
tbl = load(
    'azureml://subscriptions/<subscription-id>/'
    'resourcegroups/<resourcegroup-name>/workspaces/<workspace-name>/'
    'datastores/<datastore-name>/paths/<mltable-path-on-datastore>/'
)

# load mltable from an azureml data asset URI
tbl = load(
    'azureml://subscriptions/<subscription-id>/'
    'resourcegroups/<resourcegroup-name>/providers/Microsoft.MachineLearningServices/'
    'workspaces/<workspace-name>/data/<data-asset-name>/versions/<data-asset-version>/'
)

In [None]:
#Reading MLTable Data using mltable package 
# https://learn.microsoft.com/en-us/python/api/mltable/mltable?source=recommendations&view=azure-ml-py

from mltable import load

tbl = load("azureml://subscriptions/b30d9dbd-c0f7-405f-902c-3eabd080eb00/resourcegroups/aml-walkthrough-rg/workspaces/aml-walkthrough-ws/datastores/blob_example/paths/diabetes-data")
df = tbl.to_pandas_dataframe()
df.head()

For more on handling data assets, see: https://github.com/Azure/azureml-examples/blob/main/sdk/assets/data/data.ipynb

### Read an MLTable in a job

#### Create an environment

First, create an environment that contains the `mltable` Python library:

In [None]:
from azure.ai.ml.entities import Environment

env_docker_conda = Environment(
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    conda_file="env-mltable.yml",
    name="mltable",
    description="Environment created for consuming MLTable.",
)

ml_client.environments.create_or_update(env_docker_conda)
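
The environment above refers to a conda file named `env-mltable.yml` that is not shown in this notebook. The sketch below writes one possible version of it. The package list, the Python version, and the environment name inside the file are assumptions for illustration; the helper name `write_conda_file` is hypothetical.

```python
from pathlib import Path

# Assumed conda specification; adjust the versions and packages to your needs
CONDA_YAML = """\
name: mltable-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - mltable
      - pandas
"""


def write_conda_file(path="env-mltable.yml"):
    """Write the conda specification to `path` (hypothetical helper)."""
    target = Path(path)
    target.write_text(CONDA_YAML)
    return target
```

Running `write_conda_file()` before creating the environment places the file where `conda_file="env-mltable.yml"` expects it.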

#### Create a job

In [None]:
from azure.ai.ml import command
from azure.ai.ml.entities import Data
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

inputs = {"input_data": Input(type=AssetTypes.MLTABLE, path="./sample_data")}

job = command(
    code="./src",  # local path where the code is stored
    command="python read_mltable.py --input_data ${{inputs.input_data}}",
    inputs=inputs,
    environment=env_docker_conda,
    compute="cpu-cluster",
)

# submit the command
returned_job = ml_client.jobs.create_or_update(job)
# get a URL for the status of the job
returned_job.services["Studio"].endpoint
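
The job command runs `read_mltable.py` from the `./src` folder, which is not shown in this notebook. A minimal sketch of what the core of such a script might look like follows, assuming it only needs to load the table and print the first rows. The function names are illustrative, while `load` and `to_pandas_dataframe` are the same `mltable` calls used earlier.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input_data argument that the job command passes in."""
    parser = argparse.ArgumentParser(description="Read an MLTable input")
    parser.add_argument(
        "--input_data",
        type=str,
        required=True,
        help="Path to the folder that contains the MLTable definition",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)

    # Imported here so that argument parsing works even where mltable is not installed
    from mltable import load

    tbl = load(args.input_data)     # build the table from the MLTable definition
    df = tbl.to_pandas_dataframe()  # materialize it as a pandas DataFrame
    print(df.head())
```

In `src/read_mltable.py` itself, the file would end with a `main()` call under the usual `if __name__ == "__main__":` guard.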