# 5. Provisioning Azure ML resources  

This section programmatically creates the **Azure infrastructure** needed for later training runs in the cloud, or retrieves it if it already exists:

1. **Resource Group** – the logical container for every other Azure asset.  
2. **Azure ML Workspace** – the control plane that tracks experiments, models, and datasets.  
3. *(Later notebook cells add compute clusters and custom environments.)*

> **Prerequisites**  
> * You are logged in via `az login` **or** have environment variables that `DefaultAzureCredential` can pick up (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`).  
> * A `config.yaml` file at the project root containing the subscription ID, region, and resource names (see the expected keys below). The notebook reads it as `../config.yaml`.  
> * Your user/service-principal has *Contributor* rights on the subscription or resource group.


### `config.yaml` expected keys

For example:

```yaml
azure:
  location: westeurope
  subscription_id: "00000000-1111-2222-3333-444444444444"
  resource_group_name: "rg-yolo11-demo"
  workspace_name: "mlw-yolo11-demo"
  environment_name: "yolo11-env"
  training_gpu_cluster: "gpu-cluster"
  compute_name: "cpu-cluster"
```


In [1]:
import yaml

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute, BuildContext, Environment, Workspace
from azure.core.exceptions import ResourceNotFoundError
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

## 1) Load project-specific settings from config.yaml

In [2]:
# Load configuration from the YAML file
with open("../config.yaml", "r") as file:
    config = yaml.safe_load(file)

In [3]:
location = config["azure"]["location"]
subscription_id = config["azure"]["subscription_id"]
resource_group_name = config["azure"]["resource_group_name"]
workspace_name = config["azure"]["workspace_name"]
environment_name = config["azure"]["environment_name"]
training_gpu_cluster = config["azure"]["training_gpu_cluster"]
compute_name = config["azure"]["compute_name"]
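
The direct dictionary lookups above raise a bare `KeyError` on the first missing key. A minimal sketch of an up-front check, assuming the key list from the example `config.yaml`; the helper name `missing_azure_keys` is hypothetical and not part of any Azure SDK:

```python
# Keys the notebook reads from the "azure" section of config.yaml.
REQUIRED_AZURE_KEYS = (
    "location",
    "subscription_id",
    "resource_group_name",
    "workspace_name",
    "environment_name",
    "training_gpu_cluster",
    "compute_name",
)


def missing_azure_keys(config):
    """Return the required keys that are absent or empty in config["azure"]."""
    azure_section = (config or {}).get("azure") or {}
    return [key for key in REQUIRED_AZURE_KEYS if not azure_section.get(key)]


# Example with an in-memory dict standing in for the parsed YAML file.
example_config = {"azure": {"location": "westeurope", "workspace_name": "mlw-yolo11-demo"}}
print(missing_azure_keys(example_config))
```

Calling the helper right after `yaml.safe_load` would report every missing key at once instead of stopping at the first failed lookup.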

## 2) Authenticate – uses Azure CLI, environment vars, MSI, etc.

In [4]:
# Initialize DefaultAzureCredential
credential = DefaultAzureCredential()
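
`DefaultAzureCredential` tries several sources in turn, and the service-principal route only applies when all three environment variables from the prerequisites are set. A small sketch for checking that before running the cell; the helper name `missing_service_principal_vars` is hypothetical, and the check only inspects the environment without contacting Azure:

```python
import os

# Variables named in the prerequisites for service-principal authentication.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(environ=None):
    """Return the service-principal variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in SERVICE_PRINCIPAL_VARS if not environ.get(name)]


missing = missing_service_principal_vars()
if missing:
    print(f"Not set: {', '.join(missing)}; another source such as `az login` is needed.")
else:
    print("Service-principal variables are set.")
```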

## 3) RESOURCE GROUP

In [5]:
# Initialize the Resource Management client
resource_client = ResourceManagementClient(credential, subscription_id)

In [6]:
def create_resource_group(resource_client, resource_group_name, location):
    """
    Creates a resource group in Azure if it does not already exist.

    Parameters:
        resource_client: The client instance used to interact with Azure resource groups.
        resource_group_name (str): The name of the resource group.
        location (str): The Azure region where the resource group should be created.

    Returns:
        The resource group object if successful, or None if an error occurs.
    """
    try:
        # Try to get the resource group
        resource_group = resource_client.resource_groups.get(resource_group_name)
        print(f"Resource Group '{resource_group_name}' already exists in '{resource_group.location}'.")
    except ResourceNotFoundError:
        # If the resource group does not exist, create it
        resource_group_params = {"location": location}
        resource_group = resource_client.resource_groups.create_or_update(
            resource_group_name,
            resource_group_params
        )
        print(f"Resource Group '{resource_group_name}' created in '{resource_group.location}'.")
    except Exception as e:
        # Handle other exceptions
        print(f"An error occurred: {e}")
        return None
    return resource_group


In [None]:
# Call the function to create the Resource Group
resource_group = create_resource_group(resource_client, resource_group_name, location)
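
The get-then-create flow in `create_resource_group` can be illustrated without touching Azure. The sketch below uses an in-memory stand-in (`FakeGroups`, hypothetical) and the built-in `KeyError` in place of `ResourceNotFoundError`, to show why a second call finds the resource instead of creating it again:

```python
class FakeGroups:
    """In-memory stand-in for a resource-group API; not an Azure SDK class."""

    def __init__(self):
        self._groups = {}
        self.create_calls = 0

    def get(self, name):
        # Raises KeyError when the group is unknown, mimicking a "not found" error.
        return self._groups[name]

    def create_or_update(self, name, params):
        self.create_calls += 1
        self._groups[name] = {"name": name, **params}
        return self._groups[name]


def get_or_create(groups, name, location):
    """Return the existing group, creating it only when the lookup fails."""
    try:
        return groups.get(name)
    except KeyError:
        return groups.create_or_update(name, {"location": location})


groups = FakeGroups()
first = get_or_create(groups, "rg-yolo11-demo", "westeurope")
second = get_or_create(groups, "rg-yolo11-demo", "northeurope")
print(groups.create_calls, second["location"])
```

The second call returns the group created by the first, so the requested location is ignored for an existing group. The real function behaves the same way, which is why it prints `resource_group.location` and not the `location` argument.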

## 4) WORKSPACE

In [12]:
ml_client = MLClient(credential, subscription_id, resource_group_name)



In [13]:
def create_workspace(ml_client, workspace_name, location):
    """
    Creates or retrieves an Azure ML workspace.

    This function first attempts to retrieve an existing workspace with the provided name.
    If the workspace does not exist (raising a ResourceNotFoundError), it creates a new workspace
    in the specified location. If any other exception occurs, it prints the error and returns None.

    Parameters:
        ml_client: MLClient
            An instance of MLClient used to interact with Azure ML workspaces.
        workspace_name (str):
            The name of the workspace.
        location (str):
            The Azure region where the workspace should be located.

    Returns:
        The workspace object if successful, or None if an error occurs.
    """
    try:
        # Try to get the existing Workspace
        workspace = ml_client.workspaces.get(workspace_name)
        print(f"Workspace '{workspace_name}' already exists in '{workspace.location}'.")
        return workspace
    except ResourceNotFoundError:
        # If the Workspace does not exist, start the create operation (returns a poller)
        workspace_poller = ml_client.workspaces.begin_create(
            Workspace(
                name=workspace_name,
                location=location  # Use the 'location' variable
            )
        )
        workspace = workspace_poller.result()  # Wait for the operation to complete
        print(f"Workspace '{workspace_name}' created in '{workspace.location}'.")
        return workspace
    except Exception as e:
        # Handle other exceptions
        print(f"An error occurred: {e}")
        return None



In [None]:
workspace = create_workspace(ml_client, workspace_name, location)
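
Both helper functions return `None` when an unexpected error occurs, so later cells can end up running against a missing resource. One hedged option is to stop early; `require_resource` below is a hypothetical helper, not part of the Azure SDK:

```python
def require_resource(resource, label):
    """Return the resource unchanged, or raise if provisioning returned None."""
    if resource is None:
        raise RuntimeError(
            f"{label} could not be created or retrieved; see the error printed above."
        )
    return resource


# Example with a placeholder dict standing in for a real workspace object.
checked = require_resource({"name": "mlw-yolo11-demo"}, "Workspace")
print(checked["name"])
```

In the notebook this could wrap the call, for example `workspace = require_resource(create_workspace(ml_client, workspace_name, location), "Workspace")`.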

### Next steps

* Later notebook cells register a **custom Docker Environment** and spin up compute targets (`AmlCompute`).  
* To use a different subscription, adjust `config.yaml` and rerun the notebook from the top. The tenant is determined by the credential (for example `az login --tenant` or `AZURE_TENANT_ID`), not by `config.yaml`.  
* Both functions are idempotent: running a cell twice detects the existing resource instead of failing.