In [1]:
! pip install -r requirements.txt



## Connect to Azure Workspace
1. Connect to the Azure Machine Learning workspace. The workspace is the top-level resource for the service and provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

In [10]:
# Handle to the workspace
from azure.ai.ml import MLClient

# Authentication and (optional) Key Vault access
from azure.identity import DefaultAzureCredential, CredentialUnavailableError
from azure.keyvault.secrets import SecretClient

import os
import dotenv

# Load settings from a local .env file into the process environment
dotenv.load_dotenv()
SUBSCRIPTION_ID = os.getenv("SUBSCRIPTION_ID")
RESOURCE_GROUP_NAME = os.getenv("RESOURCE_GROUP_NAME")
WORKSPACE_NAME = os.getenv("WORKSPACE_NAME")
VAULT_URL = os.getenv("AZURE_VAULT_URL")

# DefaultAzureCredential tries several credential sources in order
# (for example environment variables, managed identity, Azure CLI login)
credential = DefaultAzureCredential()

# Optional: read a secret from Key Vault
# secret_name = "MLKEY"
# secret_client = SecretClient(vault_url=VAULT_URL, credential=credential)
# secret = secret_client.get_secret(secret_name)


from azure.ai.ml import command, Input

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    CodeConfiguration,
)

# Blob Storage client class (imported here; no client is created yet)
from azure.storage.blob import BlobServiceClient


2. Get a handle to the workspace by providing the subscription ID, resource group name, and workspace name.

In [11]:
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION_ID,
    resource_group_name=RESOURCE_GROUP_NAME,
    workspace_name=WORKSPACE_NAME,
)
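Because the three identifiers are read with `os.getenv`, any variable missing from the `.env` file silently becomes `None`, and the problem only surfaces on the first call to the service. A minimal sketch of a pre-flight check that could run before the cell above, assuming the same variable names (`missing_settings` and `REQUIRED_SETTINGS` are illustrative names, not part of the Azure SDK):

```python
import os
from typing import Mapping, Optional, Sequence

# Same variable names as the cell that reads the .env file
REQUIRED_SETTINGS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP_NAME", "WORKSPACE_NAME")


def missing_settings(
    names: Sequence[str] = REQUIRED_SETTINGS,
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_settings()
if missing:
    # Report the gap instead of letting MLClient receive None values
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.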


## Create a compute Resource
1. Azure Machine Learning needs a compute resource to run a job. This resource can be a single-node or multi-node machine running Linux or Windows.
2. This example uses Standard_NC4as_T4_v3, which has 4 cores, 28 GB RAM, and 176 GB of storage.
3. More resources are listed at https://azure.microsoft.com/en-us/pricing/details/machine-learning/#pricing
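The core count matters because the size of a cluster is capped by the subscription's vCPU quota. A rough sketch of the arithmetic, assuming 4 vCPUs per node as stated above (the function names are illustrative, and real quota is tracked per VM family and region, which this ignores):

```python
def required_vcpus(node_count: int, vcpus_per_node: int = 4) -> int:
    """Total vCPUs a cluster needs when node_count nodes are running."""
    if node_count < 0 or vcpus_per_node <= 0:
        raise ValueError("node_count must be >= 0 and vcpus_per_node must be > 0")
    return node_count * vcpus_per_node


def fits_quota(node_count: int, quota_vcpus: int, vcpus_per_node: int = 4) -> bool:
    """True if the requested nodes fit within the available vCPU quota."""
    return required_vcpus(node_count, vcpus_per_node) <= quota_vcpus


# A cluster that may scale up to 4 nodes needs quota for 16 vCPUs
print(required_vcpus(4))
# With a quota of 0, even a single node does not fit
print(fits_quota(1, quota_vcpus=0))
```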

In [14]:
from azure.ai.ml.entities import AmlCompute

gpu_compute_target = "gpu-cluster"

try:
    # Check if the compute target already exists
    gpu_cluster = ml_client.compute.get(gpu_compute_target)
    print(f"GPU Cluster already present: {gpu_compute_target}")
except Exception:
    print("Creating new gpu compute target..")
    # Create the Azure ML compute object with the intended parameters
    gpu_cluster = AmlCompute(
        # Name assigned to the compute cluster
        name=gpu_compute_target,
        # Azure ML Compute is the on-demand VM service
        type="amlcompute",
        # VM size
        size="Standard_NC4as_T4_v3",
        # Minimum number of nodes kept running when no job is running
        min_instances=0,
        # Maximum number of nodes in the cluster
        max_instances=4,
        # Seconds a node stays idle after a job finishes before it is scaled down
        idle_time_before_scale_down=180,
        # Dedicated or LowPriority
        tier="LowPriority",
    )
    # Pass the object to MLClient's begin_create_or_update method and wait for the result
    gpu_cluster = ml_client.begin_create_or_update(gpu_cluster).result()
    print(f"AMLCompute with name {gpu_cluster.name} created, the compute size is {gpu_cluster.size}")




Creating new gpu compute target..


HttpResponseError: (BadRequest) {"id":"https://resourceprovider.batchai-eastus2.svc/subscriptions/bbf5eb8c-3046-4876-8eb1-5210a3e6750f/providers/Microsoft.BatchAI/locations/eastus2/operationresults/d8ccb84f-6da5-40e2-8333-8a6c367f904d","name":"d8ccb84f-6da5-40e2-8333-8a6c367f904d","status":"Failed","startTime":"2024-08-17T19:11:43.306Z","endTime":"2024-08-17T19:11:48.69Z","error":{"code":"ClusterMinNodesExceedCoreQuota","message":"The specified subscription has a total vCPU quota of 0 and cannot accomodate for at least 1 requested managed compute node which maps to 4 vCPUs. Talk to your Subscription Admin or refer to https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas#request-quota-increases to increase the total quota"}}
Code: BadRequest
Message: {"id":"https://resourceprovider.batchai-eastus2.svc/subscriptions/bbf5eb8c-3046-4876-8eb1-5210a3e6750f/providers/Microsoft.BatchAI/locations/eastus2/operationresults/d8ccb84f-6da5-40e2-8333-8a6c367f904d","name":"d8ccb84f-6da5-40e2-8333-8a6c367f904d","status":"Failed","startTime":"2024-08-17T19:11:43.306Z","endTime":"2024-08-17T19:11:48.69Z","error":{"code":"ClusterMinNodesExceedCoreQuota","message":"The specified subscription has a total vCPU quota of 0 and cannot accomodate for at least 1 requested managed compute node which maps to 4 vCPUs. Talk to your Subscription Admin or refer to https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas#request-quota-increases to increase the total quota"}}
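The failure above is a quota problem rather than a bug in the cell: the payload's `error.code` is `ClusterMinNodesExceedCoreQuota`, and the message reports a total vCPU quota of 0. As a minimal sketch, assuming the message body is a JSON object shaped like the one shown, the code and message can be pulled out with the standard library (`extract_error` is an illustrative helper, not part of the Azure SDK, and the sample payload is abbreviated from the output above):

```python
import json


def extract_error(payload: str) -> tuple:
    """Return (code, message) from a JSON error payload shaped like the one above."""
    data = json.loads(payload)
    error = data.get("error", {})
    return error.get("code"), error.get("message")


# Abbreviated copy of the payload from the failed request
sample = json.dumps({
    "status": "Failed",
    "error": {
        "code": "ClusterMinNodesExceedCoreQuota",
        "message": "The specified subscription has a total vCPU quota of 0 ...",
    },
})

code, message = extract_error(sample)
if code == "ClusterMinNodesExceedCoreQuota":
    print("Quota is too low for the requested VM size; request an increase before retrying.")
```

The error message itself links to the quota documentation, which describes how to request an increase.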

## Create a job environment
1. An Azure Machine Learning environment encapsulates the dependencies needed to run a machine learning training script on the created compute resource. It is similar to a Python venv on a local machine.
2. Azure Machine Learning lets us either use a curated environment, which is useful for common training and inference scenarios, or create a custom environment from a Docker image or a Conda configuration.
3. In this scenario we will use a curated Azure Machine Learning environment: "AzureML-tensorflow-2.16-cuda11".

In [None]:
curated_env_name = "AzureML-tensorflow-2.16-cuda11@latest"
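The `@latest` suffix asks for whichever version of the environment carries the `latest` label; a pinned version is written as `name:version` instead. A small sketch of that naming convention, using an illustrative helper (`split_env_reference` is not an Azure SDK function, and the version number `1` below is a placeholder):

```python
def split_env_reference(ref: str) -> dict:
    """Split 'name@label' or 'name:version' into its parts."""
    if "@" in ref:
        name, _, label = ref.rpartition("@")
        return {"name": name, "label": label, "version": None}
    if ":" in ref:
        name, _, version = ref.rpartition(":")
        return {"name": name, "label": None, "version": version}
    # No suffix: only a name was given
    return {"name": ref, "label": None, "version": None}


# Label-based reference, as used in the cell above
print(split_env_reference("AzureML-tensorflow-2.16-cuda11@latest"))
# Version-pinned reference (placeholder version)
print(split_env_reference("AzureML-tensorflow-2.16-cuda11:1"))
```

Pinning a version keeps runs reproducible, while `@latest` picks up updates to the curated image automatically.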