# Azure ML Workspace infrastructure setup

Before we can start generating synthetic data and fine-tuning a student model, we need an Azure AI Studio Hub (the same thing as an Azure ML Workspace). You can either use an existing one or create a new one for this recipe. You can do this manually in [Azure ML Studio](https://aka.ms/raft-llama-31-learn-deploy-405b) or [Azure AI Studio](https://aka.ms/raft-llama-31-learn-deploy-405b-ai-studio), or automatically by running this notebook, which uses the Azure Management and Azure ML Python SDKs.

The typical way to configure an Azure AI Studio Hub, which is the same as an Azure ML Workspace, is with a `config.json` file. This notebook reads each setting from an existing `config.json` file if one is available, and otherwise from environment variables. This makes it easy to run the notebook as part of automated tests in a GitHub Workflow CI/CD pipeline configured through environment variables.
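Each setting in the cells below is resolved in the same order: a value from `config.json` wins, then an environment variable, then (for most settings) a hard-coded default. A minimal sketch of that precedence rule as a standalone helper; the `resolve_setting` name is hypothetical and does not appear in the notebook's own cells:

```python
import os

def resolve_setting(config: dict, key: str, env_var: str, default=None):
    """Return config[key], else the environment variable, else the default.

    Hypothetical helper mirroring the `config.get(...) or os.getenv(...) or ...`
    chains used in this notebook. Because it uses `or`, an empty string in
    config.json or in the environment also falls through to the next source.
    """
    return config.get(key) or os.getenv(env_var) or default

# No config.json entry and no RESOURCE_GROUP variable, so the default is used
os.environ.pop("RESOURCE_GROUP", None)
print(resolve_setting({}, "resource_group", "RESOURCE_GROUP", "rg-raft-distillation"))
```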

Rather than assuming that the resource group and workspace already exist whenever a `config.json` file is present, this notebook checks for each one and provisions the resource group, then the Hub, if it is missing.

**Install the dependencies by running the cell below. This step is required when running in a new environment.**

In [None]:
%pip install azure-ai-ml
%pip install azure-identity
%pip install python-dotenv

%pip install mlflow
%pip install azureml-mlflow
%pip install azureml-core
%pip install msrestazure
%pip install azure-mgmt-resource
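To confirm the installs succeeded (for example after a kernel restart), the standard library's `importlib.util.find_spec` can report which modules are still missing without importing them fully. A minimal sketch; the pip-name-to-module-name mapping in `REQUIRED_MODULES` is an assumption about how these packages are imported, and `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Assumed import names for the packages installed above (pip name in the comment)
REQUIRED_MODULES = [
    "azure.ai.ml",          # azure-ai-ml
    "azure.identity",       # azure-identity
    "dotenv",               # python-dotenv
    "mlflow",               # mlflow
    "azureml.mlflow",       # azureml-mlflow
    "azureml.core",         # azureml-core
    "msrestazure",          # msrestazure
    "azure.mgmt.resource",  # azure-mgmt-resource
]

def missing_modules(names):
    """Return the subset of `names` that cannot be located on the current path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first;
            # a missing parent raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```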

## Authenticate

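Run `az login` in a terminal first. `DefaultAzureCredential` in the next cell picks up the Azure CLI login; if no non-interactive credential works, the cell falls back to an interactive browser login.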

### Get Azure credentials

In [None]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential,
)

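try:
    # Try non-interactive sources first (environment variables, managed identity, Azure CLI login, ...)
    credential = DefaultAzureCredential()
    # Request a token up front so that a missing login fails here rather than in a later cell
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to an interactive browser login
    credential = InteractiveBrowserCredential()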

## Load env vars

In [None]:
from dotenv import load_dotenv
import os

load_dotenv(".env")
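The cells below read four environment variables: `AZURE_SUBSCRIPTION_ID`, `WORKSPACE_REGION`, `RESOURCE_GROUP`, and `WORKSPACE_NAME`. A minimal sketch that renders a starter `.env` file with those keys in the `KEY=value` form python-dotenv reads; the subscription ID is a placeholder, the other values are the notebook's own defaults, and `render_env_template` is a hypothetical helper:

```python
# Environment variables read later in this notebook.
# The subscription ID is a dummy placeholder; the rest are the notebook's defaults.
ENV_DEFAULTS = {
    "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "WORKSPACE_REGION": "westus3",
    "RESOURCE_GROUP": "rg-raft-distillation",
    "WORKSPACE_NAME": "raft-distillation",
}

def render_env_template(values: dict) -> str:
    """Render one KEY=value line per entry, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in values.items())

print(render_env_template(ENV_DEFAULTS), end="")
```

Save the output as `.env` next to the notebook, replace the placeholder subscription ID with your own, and re-run the `load_dotenv` cell above.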

## Load config.json if it exists

In [None]:
import json
from pathlib import Path

config = {}
if Path("config.json").exists():
    print("Loading config.json")
    with open("config.json") as f:
        config = json.load(f)

## Get subscription

Read the subscription ID from `config.json` if available, otherwise from the `AZURE_SUBSCRIPTION_ID` environment variable. If neither is set, fall back to the subscription visible to the credentials, provided there is exactly one.

This cell fails if no subscription ID is specified and the credentials can see either no subscriptions or more than one.

In [None]:
subscription_id = config.get("subscription_id") or os.getenv("AZURE_SUBSCRIPTION_ID")
if not subscription_id:
    from azure.mgmt.resource import SubscriptionClient

    # List the subscriptions visible to the credential
    sub_client = SubscriptionClient(credential)
    subs = list(sub_client.subscriptions.list())
    if len(subs) == 0:
        raise Exception("No subscriptions found")
    if len(subs) > 1:
        raise Exception("Multiple subscriptions found, please set AZURE_SUBSCRIPTION_ID in .env file")
    print(f"Only one subscription found '{subs[0].display_name}' so selecting it")
    subscription_id = subs[0].subscription_id
else:
    print(f"Using subscription '{subscription_id}'")
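The selection rule in the cell above can be pulled out into a pure function, which makes it testable without Azure credentials. A minimal sketch; `Subscription` is a hypothetical stand-in exposing only the two attributes the cell reads (`subscription_id` and `display_name`), and this version raises `ValueError` rather than a bare `Exception`:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Subscription:
    """Hypothetical stand-in for the objects returned by SubscriptionClient."""
    subscription_id: str
    display_name: str

def pick_subscription(explicit_id: Optional[str], available: Sequence[Subscription]) -> str:
    """Prefer an explicit ID; otherwise require exactly one visible subscription."""
    if explicit_id:
        return explicit_id
    if len(available) == 0:
        raise ValueError("No subscriptions found")
    if len(available) > 1:
        raise ValueError("Multiple subscriptions found, please set AZURE_SUBSCRIPTION_ID")
    return available[0].subscription_id

example_subs = [Subscription("1111", "Dev")]
print(pick_subscription(None, example_subs))  # prints 1111
```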

## Get resource group

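Get the existing resource group, or create it if it doesn't exist. The location is read from `config.json`, then the `WORKSPACE_REGION` environment variable, and defaults to `westus3`; it is only used when a new resource group or workspace has to be created.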

In [None]:
region = config.get("location") or os.getenv("WORKSPACE_REGION") or "westus3"
print(f"Using location '{region}'")

In [None]:
from azure.mgmt.resource import ResourceManagementClient
from azure.core.exceptions import ResourceNotFoundError

resource_group = config.get("resource_group") or os.getenv("RESOURCE_GROUP") or "rg-raft-distillation"

# Obtain the management object for resources.
resource_client = ResourceManagementClient(credential, subscription_id)

# Get the resource group, or create it in `region` if it is missing.
try:
    rg_result = resource_client.resource_groups.get(resource_group)
    print(f"Found existing resource group '{rg_result.name}' in location '{rg_result.location}'")
except ResourceNotFoundError:
    print("Resource group does not exist, creating...")
    rg_result = resource_client.resource_groups.create_or_update(resource_group, {"location": region})
    print(f"Created resource group '{rg_result.name}' in location '{rg_result.location}'")
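Both the resource group cell above and the workspace cell below follow the same get-or-create pattern: try a `get`, treat a not-found error as the signal to create, and let any other error propagate. A minimal sketch of that pattern against a hypothetical in-memory store, with `KeyError` standing in for `azure.core.exceptions.ResourceNotFoundError`; `get_or_create`, `fake_groups`, and `create_fake_group` are illustrative names only:

```python
from typing import Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")

def get_or_create(get: Callable[[], T], create: Callable[[], T],
                  not_found: Type[Exception] = KeyError) -> Tuple[T, bool]:
    """Return (resource, created). Only a `not_found` error triggers creation;
    any other exception (auth failure, throttling, ...) propagates unchanged."""
    try:
        return get(), False
    except not_found:
        return create(), True

# Hypothetical in-memory stand-in for the resource group API
fake_groups: Dict[str, dict] = {}

def create_fake_group(name: str, location: str) -> dict:
    fake_groups[name] = {"name": name, "location": location}
    return fake_groups[name]

for _ in range(2):
    group, created = get_or_create(
        lambda: fake_groups["rg-demo"],
        lambda: create_fake_group("rg-demo", "westus3"),
    )
    print(f"{group['name']}: created={created}")
# The first iteration prints created=True, the second created=False
```

In the notebook's cells, `get` corresponds to `resource_client.resource_groups.get(...)` or `ml_client.workspaces.get(...)`, and `not_found` to `ResourceNotFoundError`. Catching only the not-found error matters: a permissions error should stop the notebook, not trigger an attempt to create a duplicate resource.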

## Get MLClient

MLClient can be created at different scopes: resource group level, workspace level, registry level, ...

Here we create a resource-group-level MLClient and use it to check whether the workspace exists or needs to be provisioned.

In [None]:
# Reuse the credential selected above (which may be the interactive fallback)
ml_client = MLClient(credential, subscription_id, resource_group)

## Get ML workspace

Get the ML workspace if it exists, or provision it if it doesn't.

In [None]:
from azure.core.exceptions import ResourceNotFoundError
workspace_name = config.get("workspace_name") or os.getenv("WORKSPACE_NAME") or "raft-distillation"
try:
    ml_client.workspaces.get(workspace_name)
    print(f"Found existing workspace '{workspace_name}'")
except ResourceNotFoundError:
    print("Workspace does not exist, creating...")

    from azure.ai.ml.entities import Workspace
    ws_basic = Workspace(
        name=workspace_name,
        location=region,
        display_name="RAFT Distillation workspace",
        description="RAFT Distillation workspace",
        hbi_workspace=False,
        tags=dict(purpose="demo"),
    )

    # begin_create returns a poller; result() blocks until provisioning finishes
    ws = ml_client.workspaces.begin_create(ws_basic).result()
    print(f"Created workspace '{ws.name}'")

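## Save config.json

Write the resolved subscription ID, resource group, and workspace name back to `config.json` so that later runs of this notebook load them directly.

In [None]:
import json

config["subscription_id"] = subscription_id
config["resource_group"] = resource_group
config["workspace_name"] = workspace_name

with open("config.json", "w") as outfile:
    json.dump(config, outfile, indent=4)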
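Because this cell writes the same keys (`subscription_id`, `resource_group`, `workspace_name`) that the earlier cells read, the next run of the notebook resolves them from `config.json` first; this is also the layout that `MLClient.from_config` expects. A minimal sketch that checks the round trip in a temporary directory, reusing the loading logic from the "Load config.json if it exists" cell; `load_config` and `save_config` are hypothetical names:

```python
import json
import tempfile
from pathlib import Path

def load_config(path: Path) -> dict:
    """Return the parsed JSON at `path`, or {} if the file does not exist."""
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return {}

def save_config(path: Path, values: dict) -> None:
    """Write `values` as indented JSON, overwriting any existing file."""
    with open(path, "w") as f:
        json.dump(values, f, indent=4)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    print(load_config(demo_path))  # prints {} because the file does not exist yet
    demo = {"subscription_id": "0000", "resource_group": "rg-demo", "workspace_name": "ws-demo"}
    save_config(demo_path, demo)
    print(load_config(demo_path) == demo)  # prints True
```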