# Setup Environment

Follow this notebook to deploy the Azure resources needed to run the project. This notebook will:
- Create a resource group
- Create an Azure Purview Account under the resource group
- Create an Azure ML Managed Feature Store under the resource group
- Create a Service Principal
- Assign the Service Principal the following roles:
  - `AzureML Data Scientist` role to the Azure ML Managed Feature Store
  - `Contributor` role to the Azure Purview Account
    - *Note: This notebook doesn't assign Purview's `Data Curator` role to the Service Principal. To assign this role manually, refer to the [Microsoft documentation](https://learn.microsoft.com/purview/how-to-create-and-manage-collections#add-role-assignments)*.

The notebook uses the Azure SDK for Python and the Azure CLI to create the resources. Make sure you have the permissions required to install these packages in your notebook environment.

### Prerequisites

In [None]:
# Decide the target subscription and the tenant where the resources will be deployed
subscription_id = ""
tenant_id = ""

In [None]:
# Decide the prefix for the environment name. Keep it short and UNIQUE.
#   The prefix is used to name the resources, e.g.:
#   the resource group will be named <prefix>rg
#   the storage account will be named <prefix>sa
#   the Purview account will be named <prefix>pv
#   etc.
prefix = ""

# Decide the location of the resources.
location = "westeurope"

In [None]:
resource_group_name = f"{prefix}rg" # name of the resource group
featurestore_name = f"{prefix}fs" # name of feature store
storage_account_name = f"{prefix}sa" # name of the storage account
purview_name = f"{prefix}pv" # name of the Purview account; it must be globally unique
sp_name = f"{prefix}sp" # name of the service principal
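
The prefix ends up in every resource name, and the storage account name is the most restrictive of them: Azure storage account names must be 3 to 24 characters long and contain only lowercase letters and digits. Below is a minimal sketch of a check you could run before continuing. `check_prefix` is a hypothetical helper, not part of any Azure SDK, and it only covers the storage account rule for the `<prefix>sa` naming used here:

```python
import re

def check_prefix(prefix):
    """Return a list of problems with `prefix`; an empty list means it looks usable."""
    problems = []
    # Storage account names allow lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]+", prefix):
        problems.append("use only lowercase letters and digits")
    # Storage account names are 3-24 characters long; this notebook appends "sa".
    if not 1 <= len(prefix) <= 22:
        problems.append("use 1 to 22 characters so that '<prefix>sa' fits in 3-24")
    return problems
```

Calling `check_prefix(prefix)` and stopping when the list is non-empty catches a bad prefix before any resource is created. It does not tell you whether the resulting names are still available.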

In [None]:
# Install the necessary packages. Skip those you have already installed.
!pip install azure-cli
!pip install azure-identity
!pip install azure-mgmt-purview
!pip install azureml-featurestore
!pip install azure-mgmt-resource
!pip install azure-mgmt-storage

### Obtain Azure Credential

The following cell obtains an Azure credential for your AAD account:

In [None]:
# Obtain a credential.
# This method opens a browser window and asks you to sign in to your Azure account.
# (The window does not open when you run this cell, but the first time a new token is requested.)
from azure.identity import InteractiveBrowserCredential
default_credential = InteractiveBrowserCredential(tenant_id=tenant_id)

In [None]:
# [Optional] Another way of obtaining a credential
# If the authentication tab does not open correctly in your browser, authenticate this way instead.
from azure.identity import DeviceCodeCredential
default_credential = DeviceCodeCredential(tenant_id=tenant_id)

### Create a Resource Group

The following cell defines a Python function that creates the resource group if it does not exist yet:

In [None]:
from azure.core.exceptions import ResourceNotFoundError
from azure.mgmt.resource import ResourceManagementClient

# Create the resource group `resource_group_name` in `location` under `subscription_id` if it does not exist yet
def check_or_create_resource_group(subscription_id, resource_group_name, location):

    # Initialize the ResourceManagementClient
    resource_client = ResourceManagementClient(default_credential, subscription_id)

    # Check if the resource group already exists
    try:
        resource_group = resource_client.resource_groups.get(resource_group_name)
        print(f"Resource group '{resource_group_name}' already exists.")
    except ResourceNotFoundError:
        # Only "not found" means the group is missing; other errors (e.g. authentication) propagate
        print(f"Resource group '{resource_group_name}' does not exist. Creating...")
        resource_group_params = {'location': location}
        resource_group = resource_client.resource_groups.create_or_update(
            resource_group_name, resource_group_params
        )
        print(f"Resource group '{resource_group_name}' created.")

Run the function to create the resource group:

In [None]:
# create the resource group
# Note: this may open your browser to sign in to Azure. Follow the instructions to sign in.
check_or_create_resource_group(subscription_id, resource_group_name, location)

### Create an Azure Purview Account

The following cell will create an Azure Purview Account under the resource group created above:

In [None]:
import time

from azure.mgmt.purview import PurviewManagementClient
from azure.mgmt.purview.models import Account, AccountSku, Identity

purview_client = PurviewManagementClient(default_credential, subscription_id)

# Create a Purview account.
# Note: if you hit error 2005 (quota limit), try a different location.
identity = Identity(type="SystemAssigned")
sku = AccountSku(name="Standard", capacity=4)
purview_resource = Account(identity=identity, sku=sku, location=location)

try:
    pa = purview_client.accounts.begin_create_or_update(resource_group_name, purview_name, purview_resource).result()
    print("location:", pa.location, " Microsoft Purview Account Name: ", purview_name, " Id: ", pa.id, " tags: ", pa.tags)
except Exception as e:
    print(f"Error in submitting job to create account: {e}")
    raise  # stop here: `pa` is undefined, so the polling loop below cannot run

# Poll until provisioning has finished
while pa.provisioning_state != "Succeeded":
    pa = purview_client.accounts.get(resource_group_name, purview_name)
    print(pa.provisioning_state)
    if pa.provisioning_state == "Failed":
        print("Error in creating Microsoft Purview account")
        break
    time.sleep(30)
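
The polling loop above waits indefinitely if the account never reaches `Succeeded` or `Failed`. The following is a sketch of the same idea with an upper time limit. `wait_for_state` is a hypothetical helper, and the 30-second interval and 30-minute timeout are assumed values rather than anything Purview prescribes:

```python
import time

def wait_for_state(fetch_state, target="Succeeded", failed=("Failed",),
                   interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Poll fetch_state() until it returns `target`; raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state == target:
            return state
        if state in failed:
            raise RuntimeError(f"Provisioning ended in state '{state}'")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{state}' after {timeout} seconds")
        sleep(interval)
```

In this notebook it could be called as `wait_for_state(lambda: purview_client.accounts.get(resource_group_name, purview_name).provisioning_state)`. Passing the state getter as a function keeps the helper independent of the Purview client, so it can be tried out without touching Azure.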

### Create an Azure ML Managed Feature Store

The following cell will create an Azure ML Managed Feature Store under the resource group created above:

In [None]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStore

fs_client = MLClient(
    default_credential,
    subscription_id,
    resource_group_name,
    featurestore_name,
)

fs = FeatureStore(name=featurestore_name, location=location)
# wait for featurestore creation
fs_poller = fs_client.feature_stores.begin_create(fs, update_dependent_resources=True)
print(fs_poller.result())

### Create a Service Principal

In [None]:
sp_name=f"{prefix}sp"

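The following cell runs an Azure CLI command to create a Service Principal named `sp_name`. The command uses the tenant that the Azure CLI is signed in to, so make sure that is `tenant_id` (for example, by running `az login --tenant <tenant_id>` beforehand).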

In [None]:
# create the service principal
sp_creation_output = !az ad sp create-for-rbac --name $sp_name

**Notice**: Make a note of the output of the following cell. The `password` in it is the `client_secret` of the Service Principal. You will need it when setting up the data pipeline parameters in the Fabric workspace.

In [None]:
# analyze the output to get the service principal information
import json
import re

sp_creation_output_str = ''.join(sp_creation_output)

match = re.search(r'\{.*\}', sp_creation_output_str)

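if match:
    sp_dict = json.loads(match.group())
    print(sp_dict)
else:
    raise ValueError("No JSON found in the `az ad sp create-for-rbac` output; check that the Azure CLI is signed in.")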

In [None]:
# app_id/client_id of the service principal
app_id = sp_dict['appId']
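
The captured CLI output can mix warning lines with the JSON object, and printing it shows the secret in the notebook output. Below is a sketch of a parser that tolerates surrounding text and of a masked copy for display. `parse_sp_output` and `redact` are hypothetical helpers, the sample lines are made up, and the sketch assumes the warnings themselves contain no braces:

```python
import json

def parse_sp_output(lines):
    """Extract the JSON object from the captured lines of the CLI output."""
    text = "\n".join(lines)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the CLI output")
    return json.loads(text[start:end + 1])

def redact(sp_info, secret_keys=("password",)):
    """Return a copy that is safe to display, with secret values masked."""
    return {k: ("***" if k in secret_keys else v) for k, v in sp_info.items()}

# Made-up sample in the shape of the `az ad sp create-for-rbac` output.
sample_lines = [
    "WARNING: The output includes credentials that you must protect.",
    "{",
    '  "appId": "00000000-0000-0000-0000-000000000000",',
    '  "displayName": "demosp",',
    '  "password": "not-a-real-secret",',
    '  "tenant": "11111111-1111-1111-1111-111111111111"',
    "}",
]
sample_info = parse_sp_output(sample_lines)
print(redact(sample_info))
```

With the real output, `parse_sp_output(sp_creation_output)` would take the place of the regular expression. Record the unmasked `password` somewhere safe before relying on the redacted copy.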

### Assign Roles

Allow the Service Principal to access the feature store. It needs the `AzureML Data Scientist` role so that it can register feature sets in the store and retrieve them from it.

In [None]:
featurestore_arm_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.MachineLearningServices/workspaces/{featurestore_name}"

In [None]:
!az role assignment create \
    --assignee $app_id  \
    --role "AzureML Data Scientist" \
    --scope $featurestore_arm_id

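Allow the Service Principal to access Purview. It needs the `Contributor` and `Data Curator` roles so that it can register and scan data assets. The cell below assigns only `Contributor`; `Data Curator` has to be assigned manually in Purview (see the note at the top of this notebook).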

In [None]:
purview_arm_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Purview/accounts/{purview_name}"

In [None]:
!az role assignment create \
    --assignee $app_id  \
    --role "Contributor" \
    --scope $purview_arm_id
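
Both scopes above follow the same ARM resource ID layout, and an empty `subscription_id` or `prefix` silently produces an invalid scope. Here is a small sketch that builds the ID and rejects empty parts. `arm_resource_id` is a hypothetical helper, not an Azure SDK function:

```python
def arm_resource_id(subscription_id, resource_group, provider, resource_type, name):
    """Build an ARM resource ID, refusing empty components."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "provider": provider,
        "resource_type": resource_type,
        "name": name,
    }
    missing = [key for key, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Empty value for: {', '.join(missing)}")
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/{provider}/{resource_type}/{name}"
    )
```

For example, `arm_resource_id(subscription_id, resource_group_name, "Microsoft.Purview", "accounts", purview_name)` yields the same string as `purview_arm_id` once all values are set.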

### Output the Resources Details for Later Use

You will later need to set up the Fabric workspace Environment by uploading [src/environment/sparkProperties.yml](./../environment/sparkProperties.yml) in one of the steps. The `sparkProperties.yml` file requires the details of the resources created above. Run the following cell to get most of the values.

*Note: You still need to provide the `<fabric-tenant-name>` manually.*

In [None]:
env_props = f"""
runtime_version: '1.1'
spark_conf:
  - spark.fsd.client_id: {sp_dict['appId']}
  - spark.fsd.tenant_id: {sp_dict['tenant']}
  - spark.fsd.subscription_id: {subscription_id}
  - spark.fsd.rg_name: {resource_group_name}
  - spark.fsd.name: {featurestore_name}
  - spark.fsd.fabric.tenant: <fabric-tenant-name> # Fetch from Fabric base URL, like https://<fabric-tenant-name>.powerbi.com/
  - spark.fsd.purview.account: {purview_name}
"""

print(env_props)
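
Before copying the printed text into `sparkProperties.yml`, it can help to check that no entry is empty or still a `<...>` placeholder. The following is a sketch using plain string handling. `find_unfilled` is a hypothetical helper, and it assumes the `- key: value` layout printed above rather than arbitrary YAML:

```python
import re

def find_unfilled(props_text):
    """Return (key, reason) pairs for spark_conf entries that still need a value."""
    issues = []
    for line in props_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"-\s*([\w.]+):\s*(.*)", line)
        if not match:
            continue  # not a "- key: value" entry
        key, value = match.group(1), match.group(2).strip()
        if not value:
            issues.append((key, "empty"))
        elif re.fullmatch(r"<.*>", value):
            issues.append((key, "placeholder"))
    return issues
```

Running `find_unfilled(env_props)` right after the cell above should report `spark.fsd.fabric.tenant` as a placeholder until you fill it in, plus any value that was left empty earlier in the notebook.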