# Tutorial: Different Approaches for Provisioning a Managed Feature Store

In this tutorial you will use various approaches to provision a managed feature store.
- Create a feature store using CLI/SDK
  - Default experience of creating a managed feature store.
  - Create a managed feature store using a provided storage account as the default blob storage for the feature store.
  - Create a managed feature store using a provided storage container as the offline store.
  - Create a managed feature store using a provided Redis cluster as the online store.
  - Create a managed feature store using a provided user-assigned managed identity (UAI) as the managed identity for feature store materialization jobs.
- Create a managed feature store using an Azure Resource Manager (ARM) template.


## Prerequisites and Setup

- [Create a compute instance](https://learn.microsoft.com/azure/machine-learning/how-to-create-compute-instance).
- Run this notebook on the compute instance you created, and choose the `Python 3.8 - AzureML` kernel.

### Set up the root directory for the samples
This code cell sets up the root directory for the samples.

In [None]:
import os

root_dir = "../../"

if os.path.isdir(root_dir):
    print("The folder exists.")
else:
    print("The folder does not exist. Please create or fix the path")

### Set up subscription and resource group
Set the subscription ID, resource group, and location for the feature store.

In [None]:
subscription_id = "your-subscription-id"
resource_group_name = "your-resource-group-name"
location = "eastus2"
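
Before these values are passed to `MLClient` or the CLI, a quick sanity check can catch placeholders that were never replaced. The following is a minimal sketch using only the standard library; the helper name `find_setting_problems` is hypothetical and not part of the Azure SDK, and it only checks that the subscription ID parses as a UUID and that the other values are filled in.

```python
import uuid


def find_setting_problems(subscription_id, resource_group_name, location):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    try:
        # Azure subscription IDs are GUIDs, so an unparsable value is suspicious.
        uuid.UUID(subscription_id)
    except ValueError:
        problems.append("subscription_id does not parse as a UUID")
    if not resource_group_name or resource_group_name == "your-resource-group-name":
        problems.append("resource_group_name is empty or still the placeholder")
    if not location:
        problems.append("location is empty")
    return problems


# With the placeholder values from the cell above, two problems are reported.
print(find_setting_problems("your-subscription-id", "your-resource-group-name", "eastus2"))
```

This check does not confirm that the subscription or resource group actually exists; that is only verified when a request reaches Azure.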

Initialize `MLClient` to perform create, read, update, and delete (CRUD) operations at the resource group scope.

In [None]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStore
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
)

#### Set up the CLI

1. Install the AzureML CLI extension and the `azure-ai-ml` package
1. Authenticate
1. Set the default subscription

In [None]:
# Install AzureML CLI extension
!az extension add --name ml

In [None]:
# Install AzureML SDK
!pip install azure-ai-ml

In [None]:
# Authenticate
!az login

In [None]:
# Set default subscription
!az account set -s $subscription_id

## 1. Create a managed feature store using CLI/SDK
### 1.1 Default managed feature store creation experience

With the default managed feature store provisioning experience, you will:

- Provide the subscription ID, resource group name, and location where you want to provision your feature store.
- Provide a feature store name.

Then, the system will provision the following resources:

  - A storage account with storage type as `Azure Data Lake Storage Gen2` (ADLS Gen2) that will be used as workspace default storage account.

  - A container in the above ADLS Gen2 storage account as an offline store.

  - A user-assigned managed identity (UAI) to be used for feature store materialization jobs.

  - The RBAC for UAI on the feature store and on the offline store.

    Below are the details of RBAC permissions granted by the system:

| Scope  | Action/Role |
| ------------- | ------------- |
| Feature store  | `AzureML Data Scientist` role  |
| ADLS Gen2 storage container as the feature store offline store  | `Storage Blob Data Contributor` role  |
| ADLS Gen2 storage account as the workspace default storage account  | `Storage Blob Data Contributor` role  |

  - An offline store connection to the feature store workspace.

  - Other resources for the workspace, such as Key Vault.

In [None]:
featurestore_name = "myfs-sdk"

fs = FeatureStore(
    name=featurestore_name,
    location=location,
)

# wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)

print(fs_poller.result())

Alternatively, you can dump the feature store configuration to a YAML file and provision the feature store using CLI.

In [None]:
featurestore_name = "myfs-cli"

yaml_path = root_dir + "/featurestore/featurestore_default.yaml"
fs.dump(yaml_path)

!az ml feature-store create --name $featurestore_name --subscription $subscription_id --resource-group $resource_group_name --file $yaml_path
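
The YAML paths in this tutorial are built by concatenating `root_dir` with a string that starts with `/`. Because `root_dir` already ends with a slash, the result contains a doubled separator, which most systems tolerate but which is easy to avoid. The sketch below shows one way to build the same path with `os.path`; the helper name `featurestore_yaml_path` is hypothetical.

```python
import os


def featurestore_yaml_path(root_dir, file_name):
    """Join root_dir, the featurestore folder, and file_name into a normalized path."""
    return os.path.normpath(os.path.join(root_dir, "featurestore", file_name))


print(featurestore_yaml_path("../../", "featurestore_default.yaml"))
```

The returned string can be passed to `fs.dump(...)` and to the `--file` argument in place of the concatenated path.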

### 1.2 Create a managed feature store using a provided storage as the default blob storage for the feature store

Create a storage account that will be used as the workspace default storage account.

In [None]:
featurestore_name = "myfswithblob"

In [None]:
storage_account_name = featurestore_name + "storage1"
!az storage account create --name $storage_account_name --enable-hierarchical-namespace true --resource-group $resource_group_name --location $location --subscription $subscription_id
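
Storage account names must be 3 to 24 characters long and may contain only lowercase letters and digits. Since this tutorial derives the account name from the feature store name, a name such as `myfs-sdk` would produce an invalid account name. A minimal check, with the hypothetical helper name `is_valid_storage_account_name`, might look like this:

```python
import re

# Storage account names: 3 to 24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name):
    """Return True if the name satisfies the length and character rules."""
    return _STORAGE_NAME.fullmatch(name) is not None


print(is_valid_storage_account_name("myfswithblob" + "storage1"))
print(is_valid_storage_account_name("myfs-sdk" + "storage1"))
```

The check covers only the naming rules. Storage account names must also be globally unique, which can only be verified against Azure.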

Now create a managed feature store using the above storage account as the default blob storage.

In [None]:
# Specify storage account
storage_account = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Storage/storageAccounts/{storage_account_name}"

fs = FeatureStore(
    name=featurestore_name,
    location=location,
    storage_account=storage_account,
)

# wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())

Alternatively, you can dump the feature store configuration to a YAML file and provision the feature store using CLI.

In [None]:
yaml_path = root_dir + "/featurestore/featurestore_with_blob.yaml"
fs.dump(yaml_path)

featurestore_name = "myfswithblobcli"

!az ml feature-store create --name $featurestore_name --subscription $subscription_id --resource-group $resource_group_name --file $yaml_path

### 1.3 Create a managed feature store using a provided storage as offline store
You can bring your own storage container as the offline store for the feature store.

In [None]:
featurestore_name = "myfswithoffline"

Create the ADLS Gen2 storage account where the offline store container will be created.

In [None]:
offline_storage_account_name = featurestore_name + "offline1"

!az storage account create --name $offline_storage_account_name --enable-hierarchical-namespace true --resource-group $resource_group_name --location $location --subscription $subscription_id

Create the ADLS Gen2 container to be used as the offline store.

In [None]:
offline_store_container_name = "offlinestore"
connection_string = "your connection string to your storage"

!az storage fs create --name $offline_store_container_name --account-name $offline_storage_account_name --subscription $subscription_id --connection-string "$connection_string"

Now create a managed feature store using the above storage container as the offline store.

In [None]:
from azure.ai.ml.entities import MaterializationStore

gen2_container_arm_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Storage/storageAccounts/{offline_storage_account_name}/blobServices/default/containers/{offline_store_container_name}"

offline_store = MaterializationStore(
    type="azure_data_lake_gen2", target=gen2_container_arm_id
)

fs = FeatureStore(
    name=featurestore_name, location=location, offline_store=offline_store
)
# wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())
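
This tutorial builds several ARM resource IDs with f-strings (storage account, storage container, Redis cache, and managed identity). They all share the same layout, so a small helper can reduce copy-and-paste mistakes. The following is a sketch only; `build_arm_id` is a hypothetical name and not an Azure SDK function.

```python
def build_arm_id(subscription_id, resource_group_name, provider, *segments):
    """Build an ARM resource ID; segments alternate resource type and resource name."""
    if len(segments) < 2 or len(segments) % 2 != 0:
        raise ValueError("segments must be (type, name) pairs")
    parts = [
        "subscriptions", subscription_id,
        "resourceGroups", resource_group_name,
        "providers", provider,
        *segments,
    ]
    return "/" + "/".join(parts)


# Same layout as gen2_container_arm_id above, with placeholder values.
print(
    build_arm_id(
        "my-subscription-id", "my-resource-group", "Microsoft.Storage",
        "storageAccounts", "myfswithofflineoffline1",
        "blobServices", "default",
        "containers", "offlinestore",
    )
)
```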

Alternatively, you can dump the feature store configuration to a YAML file and provision the feature store using CLI.

In [None]:
yaml_path = root_dir + "/featurestore/featurestore_with_offline.yaml"
fs.dump(yaml_path)

featurestore_name = "myfswithofflinecli"

!az ml feature-store create --name $featurestore_name --subscription $subscription_id --resource-group $resource_group_name --file $yaml_path

### 1.4 Create a managed feature store using a provided Redis cluster as online store

In the following code cell, define the name of the Azure Cache for Redis that you want to create or reuse. Optionally, you can also override other default settings.

In [None]:
redis_name = "your-redis-cluster-name"
sku = "premium"
size = "P2"

You can select the Azure Cache for Redis cache tier (basic, standard, or premium). Choose a SKU family that is available for the selected cache tier. To learn more, see [how selecting different tiers may affect cache performance](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-best-practices-performance) and [pricing for different SKU tiers and families of Azure Cache for Redis](https://azure.microsoft.com/en-us/pricing/details/cache/).

Execute the following code cell to create an Azure Cache for Redis with premium tier, SKU family P and cache capacity 2. It may take approximately 5-10 minutes to provision the Redis instance.

In [None]:
!az redis create --name $redis_name --resource-group $resource_group_name --location $location --sku $sku --vm-size $size
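
The `size` value combines the SKU family and the cache capacity: `P2` means family P with capacity 2. Basic and standard tiers use C sizes, and the premium tier uses P sizes. The sketch below checks that a tier and size agree before you call `az redis create`; the helper names are hypothetical, and the check does not validate which capacities exist within each family.

```python
# Family letter expected for each tier: C for basic/standard, P for premium.
FAMILY_BY_TIER = {"basic": "C", "standard": "C", "premium": "P"}


def split_redis_size(size):
    """Split a size such as "P2" into (family, capacity)."""
    family, capacity = size[:1].upper(), size[1:]
    if family not in ("C", "P") or not capacity.isdigit():
        raise ValueError(f"unrecognized Redis size: {size!r}")
    return family, int(capacity)


def tier_matches_size(sku, size):
    """Return True if the size family is the one used by the given tier."""
    family, _ = split_redis_size(size)
    return FAMILY_BY_TIER.get(sku.lower()) == family


print(split_redis_size("P2"))
print(tier_matches_size("premium", "P2"))
```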

In [None]:
featurestore_name = "myfs-redis"

Now create a managed feature store using the above Azure Cache for Redis instance as the online store.

In [None]:
# Specify online store
redis_arm_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Cache/Redis/{redis_name}"
online_store = MaterializationStore(type="redis", target=redis_arm_id)

fs = FeatureStore(
    name=featurestore_name,
    location=location,
    online_store=online_store,
)

# wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())

Alternatively, you can dump the feature store configuration to a YAML file and provision the feature store using CLI.

In [None]:
yaml_path = root_dir + "/featurestore/featurestore_with_online.yaml"
fs.dump(yaml_path)

featurestore_name = "myfs-redis-cli"

!az ml feature-store create --name $featurestore_name --subscription $subscription_id --resource-group $resource_group_name --file $yaml_path

### 1.5 Create a managed feature store using provided user-assigned identity (UAI) as the managed identity

In the following code cell, provide a name for the user-assigned managed identity that you would like to create or reuse.

In [None]:
uai_name = "myfsuai"

Execute the following cell to create the UAI. Skip this step if you are reusing a UAI.

In [None]:
!az identity create --subscription $subscription_id --resource-group $resource_group_name --location $location --name $uai_name

#### Retrieve UAI properties
The following code retrieves principal ID, client ID, and ARM ID property values for the created UAI.

In [None]:
from azure.mgmt.msi import ManagedServiceIdentityClient
from azure.identity import DefaultAzureCredential

msi_client = ManagedServiceIdentityClient(DefaultAzureCredential(), subscription_id)
managed_identity = msi_client.user_assigned_identities.get(
    resource_name=uai_name, resource_group_name=resource_group_name
)

uai_principal_id = managed_identity.principal_id
uai_client_id = managed_identity.client_id
uai_arm_id = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{uai_name}"

Now create a managed feature store using the above UAI as the managed identity for the feature store materialization jobs.

In [None]:
from azure.ai.ml.entities import ManagedIdentityConfiguration

featurestore_name = "myfs-uai"

materialization_identity = ManagedIdentityConfiguration(
    client_id=uai_client_id, principal_id=uai_principal_id, resource_id=uai_arm_id
)

fs = FeatureStore(
    name=featurestore_name,
    location=location,
    materialization_identity=materialization_identity,
)

# wait for feature store creation
fs_poller = ml_client.feature_stores.begin_create(fs)
print(fs_poller.result())

Alternatively, you can dump the feature store configuration to a YAML file and provision the feature store using CLI.

In [None]:
yaml_path = root_dir + "/featurestore/featurestore_with_uai.yaml"
fs.dump(yaml_path)

featurestore_name = "myfs-uai-cli"

!az ml feature-store create --name $featurestore_name --subscription $subscription_id --resource-group $resource_group_name --file $yaml_path

## 2. Create a managed feature store using Azure Resource Manager (ARM) template

Provisioning a managed feature store involves several resources. We recommend that you first provision a managed feature store with the CLI or SDK, using a configuration from one of the paths described above that fits your business scenario. Then download the deployment template, update the parameters, and deploy the ARM template directly.

- Provision a managed feature store, `my-featurestore`, using CLI.
- The CLI command outputs the deployment URL, and the deployment is named `my-featurestore-xxxxxxx`. Click on the deployment link.
- From the deployment link page, click on the `Template` tab from the left navigation panel.
- From the `Template` page, click on the `Download` button from the top navigation menu.
- You will get two JSON files, `template.json` and `parameters.json`.
- Update the `parameters.json` file with appropriate parameter values.
- Run this command to start deployment: `az deployment group create --name {your-deployment-name} --resource-group {your-resource-group} --template-file template.json --parameters parameters.json`
- Run this command to check deployment status: `az deployment group show --name {your-deployment-name} --resource-group {your-resource-group}`

Under the `featurestore/arm-template` folder, you will find an example ARM template. This is the template generated by provisioning a default feature store. The following sections go through the `parameters.json` file to explain the parameters for the resources that matter most to a managed feature store.
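
Editing `parameters.json` by hand works, but the update step can also be scripted. The sketch below assumes the downloaded file follows the usual ARM parameter file layout, where each entry under `parameters` is an object with a `value` key. The helper name `apply_overrides` and the trimmed-down sample document are illustrative only.

```python
import copy
import json


def apply_overrides(parameters_doc, overrides):
    """Return a copy of an ARM parameters document with selected values replaced."""
    updated = copy.deepcopy(parameters_doc)
    for name, value in overrides.items():
        # Refuse unknown names so that typos do not silently add parameters.
        if name not in updated["parameters"]:
            raise KeyError(f"unknown parameter: {name}")
        updated["parameters"][name]["value"] = value
    return updated


# Hypothetical, trimmed-down stand-in for the downloaded parameters.json.
sample = {
    "parameters": {
        "workspaceName": {"value": "my-featurestore"},
        "location": {"value": "eastus2"},
    }
}

updated = apply_overrides(sample, {"workspaceName": "my-second-featurestore"})
print(json.dumps(updated, indent=4))
```

With a real file, you would read the document with `json.load`, apply the overrides, and write the result back with `json.dump` before running the deployment command.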

### 2.1 Feature store section

This section describes the feature store name, resource group, and location. Update it for your managed feature store. Note that a feature store is a workspace of kind `featurestore`.
```json
"parameters": {
        "workspaceName": {
            "value": "my-featurestore"
        },
        "description": {
            "value": "my-featurestore"
        },
        "friendlyName": {
            "value": "my-featurestore"
        },
        "kind": {
            "value": "featurestore"
        },
        "location": {
            "value": "eastus2"
        },
        "resourceGroupName": {
            "value": "my-resource-group"
        }
}
```

### 2.2 Default storage section

This section describes the default blob storage used for the feature store workspace. Here the `storageAccountOption` value is `new`, which means the system provisions a new storage account with the name, resource group, and other settings that you provide.

You can also reuse an existing storage account. In this case, you need to set `storageAccountOption` to `existing`, and provide the storage account name and resource group.

```json
"parameters": {
        "storageAccountOption": {
            "value": "new"
        },
        "storageAccountName": {
            "value": "myfeaturestorestorage4116d3e469"
        },
        "storageAccountType": {
            "value": "Standard_LRS"
        },
        "storageAccountBehindVNet": {
            "value": "false"
        },
        "storageAccountResourceGroupName": {
            "value": "my-resource-group"
        },
        "storageAccountLocation": {
            "value": "eastus2"
        }
}
```

### 2.3 Key Vault section

The system will provision a new Key Vault, and the name is usually `feature store name + keyvault + a random string`. Note that the `keyVaultName` value cannot contain any special characters.
```json
"parameters": {
        "keyVaultOption": {
            "value": "new"
        },
        "keyVaultName": {
            "value": "myfeaturestorekeyvaultf7a63f0f7"
        },
        "keyVaultBehindVNet": {
            "value": "false"
        },
        "keyVaultResourceGroupName": {
            "value": "my-resource-group"
        },
        "keyVaultLocation": {
            "value": "eastus2"
        }
}
```

### 2.4 Application Insights section
The system will provision a new Application Insights resource for this feature store. You can replace `myfeaturestore` with the name of your managed feature store, and use a random string as a suffix for the resource name as shown below.

```json
"parameters": {
        "applicationInsightsOption": {
            "value": "new"
        },
        "logAnalyticsName": {
            "value": "myfeaturestorelogalyti09a8f09c8"
        },
        "logAnalyticsArmId": {
            "value": "/subscriptions/my-subscription-id/resourceGroups/my-resource-group/providers/Microsoft.OperationalInsights/workspaces/myfeaturestorelogalyti09a8f09c8"
        },
        "applicationInsightsName": {
            "value": "myfeaturestoreinsights4684a3029"
        },
        "applicationInsightsResourceGroupName": {
            "value": "my-resource-group"
        },
        "applicationInsightsLocation": {
            "value": "eastus2"
        }
}
```


### 2.5 Offline store section

In this example, `offline_store_storage_account_name` is the same as the default blob storage account. The system will create a container named `myfeaturestorecontaine8ebdffb55` inside this storage account to use as the offline store. You also need to set `offline_store_connection_name`, the last parameter, to the name of the offline store connection.


```json
"parameters": {
        "offlineStoreStorageAccountOption": {
            "value": "new"
        },
        "offline_store_storage_account_name": {
            "value": "myfeaturestorestorage4116d3e469"
        },
        "offline_store_container_name": {
            "value": "myfeaturestorecontaine8ebdffb55"
        },
        "offline_store_resource_group_name": {
            "value": "my-resource-group"
        },
        "offline_store_subscription_id": {
            "value": "my-subscription-id"
        },
        "offline_store_connection_name": {
            "value": "OfflineStoreConnectionName-b4bb8712"
        }
}
```

### 2.6 Managed identity section

The system will provision a user-assigned managed identity to be used for feature store materialization jobs. To grant this managed identity the necessary permissions on the feature store and offline store, set `grant_materialization_permissions` to `true`.

```json
"parameters": {
        "materializationIdentityOption": {
            "value": "new"
        },
        "materialization_identity_name": {
            "value": "materialization-uai-dbebceefeb2c331aab47f9c1a4716666"
        },
        "materialization_identity_subscription_id": {
            "value": "my-subscription-id"
        },
        "materialization_identity_resource_group_name": {
            "value": "my-resource-group"
        },
        "grant_materialization_permissions": {
            "value": "true"
        }
}
```

### 2.7 Online store section

If you want to configure an online store, you can provide the Redis cluster information as follows. This example assumes you have a Redis cluster named `my-redis-name`.

```json
"parameters": {
        "online_store_resource_id": {
            "value": "/subscriptions/my-subscription-id/resourceGroups/my-resource-group/providers/Microsoft.Cache/Redis/my-redis-name"
        },
        "online_store_resource_group_name": {
            "value": "my-resource-group"
        },
        "online_store_subscription_id": {
            "value": "my-subscription-id"
        },
        "online_store_connection_name": {
            "value": "OnlineStoreConnectionName-42ace817"
        }
}
```
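
The subscription ID and resource group in this block repeat information that is already part of `online_store_resource_id`. To keep them in sync, the entries can be derived from the resource ID. The following sketch assumes the Redis resource ID has the standard eight-segment layout shown above; `online_store_parameters` is a hypothetical helper name.

```python
def online_store_parameters(redis_arm_id, connection_name):
    """Derive the online store parameter entries from a Redis ARM resource ID."""
    parts = redis_arm_id.strip("/").split("/")
    # Expected: subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Cache/Redis/<name>
    if (
        len(parts) != 8
        or parts[0].lower() != "subscriptions"
        or parts[2].lower() != "resourcegroups"
    ):
        raise ValueError(f"unexpected resource ID layout: {redis_arm_id!r}")
    return {
        "online_store_resource_id": {"value": redis_arm_id},
        "online_store_resource_group_name": {"value": parts[3]},
        "online_store_subscription_id": {"value": parts[1]},
        "online_store_connection_name": {"value": connection_name},
    }


params = online_store_parameters(
    "/subscriptions/my-subscription-id/resourceGroups/my-resource-group"
    "/providers/Microsoft.Cache/Redis/my-redis-name",
    "OnlineStoreConnectionName-42ace817",
)
print(params["online_store_resource_group_name"])
```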

### 2.7 Feature store Spark section

This section is specific to the feature store workspace. You can define a [supported Spark runtime version](https://learn.microsoft.com/azure/machine-learning/apache-spark-azure-ml-concepts#session-level-conda-packages). 

```json
"parameters": {
        "spark_runtime_version": {
            "value": "3.2.0"
        }
}
```

### 2.8 Default storage vs offline store storage account

The managed feature store workspace has a default storage account, which holds artifacts such as models and logs. It also has an offline store, which holds materialized feature values. You can use the same storage account for both. In the example above, which comes from the default provisioning experience, the system creates a new ADLS Gen2 storage account as the default blob storage and also creates an ADLS Gen2 container in it to hold materialized feature values.

Note that:
- If you choose to provide an existing storage account as the default blob storage for the feature store, the system will create a new ADLS Gen2 storage and a container to be used as the offline store.

- If you choose to provide an existing ADLS Gen2 container as the offline store, the system will create a new storage for use as the default blob storage (unless you also provide an existing storage for it).
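
The combinations in the following subsections come down to two switches, `storageAccountOption` and `offlineStoreStorageAccountOption`, each set to `new` or `existing`. As a rough sketch, the mapping can be written as a small function; `storage_options` is a hypothetical name, and the remaining name and resource group parameters still need to be filled in separately.

```python
def storage_options(existing_default_storage, existing_offline_container):
    """Map the two bring-your-own choices to the ARM option parameters."""
    return {
        "storageAccountOption": {
            "value": "existing" if existing_default_storage else "new"
        },
        "offlineStoreStorageAccountOption": {
            "value": "existing" if existing_offline_container else "new"
        },
    }


# Existing default storage account, system-created offline store.
print(storage_options(True, False))
```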


#### 2.8.1 Provide an existing ADLS Gen2 storage account as the default blob storage

```json
"parameters": {
        "storageAccountOption": {
            "value": "existing"
        },
        "storageAccountName": {
            "value": "myexistingstorage4116d3e469"
        },
        "storageAccountType": {
            "value": "Standard_LRS"
        },
        "storageAccountBehindVNet": {
            "value": "false"
        },
        "storageAccountResourceGroupName": {
            "value": "my-resource-group"
        },
        "storageAccountLocation": {
            "value": "eastus2"
        },
        "offlineStoreStorageAccountOption": {
            "value": "new"
        },
        "offline_store_storage_account_name": {
            "value": "myfeaturestorestorage4116d3e469"
        },
        "offline_store_container_name": {
            "value": "myfeaturestorecontaine8ebdffb55"
        },
        "offline_store_resource_group_name": {
            "value": "my-resource-group"
        },
        "offline_store_subscription_id": {
            "value": "my-subscription-id"
        },
        "offline_store_connection_name": {
            "value": "OfflineStoreConnectionName-b4bb8712"
        }
}
```

#### 2.8.2 Provide an existing ADLS Gen2 storage container as the offline store

```json
"parameters": {
        "storageAccountOption": {
            "value": "new"
        },
        "storageAccountName": {
            "value": "myfeaturestorestorage4116d3e469"
        },
        "storageAccountType": {
            "value": "Standard_LRS"
        },
        "storageAccountBehindVNet": {
            "value": "false"
        },
        "storageAccountResourceGroupName": {
            "value": "my-resource-group"
        },
        "storageAccountLocation": {
            "value": "eastus2"
        },
        "offlineStoreStorageAccountOption": {
            "value": "existing"
        },
        "offline_store_storage_account_name": {
            "value": "myexistingstorage4116d3e469"
        },
        "offline_store_container_name": {
            "value": "myexistingcontaine8ebdffb55"
        },
        "offline_store_resource_group_name": {
            "value": "my-resource-group"
        },
        "offline_store_subscription_id": {
            "value": "my-subscription-id"
        },
        "offline_store_connection_name": {
            "value": "OfflineStoreConnectionName-b4bb8712"
        }
}
```

#### 2.8.3 Provide an existing ADLS Gen2 storage account for the default blob storage and an existing container for the offline store

```json
"parameters": {
        "storageAccountOption": {
            "value": "existing"
        },
        "storageAccountName": {
            "value": "myexistingstorage1"
        },
        "storageAccountType": {
            "value": "Standard_LRS"
        },
        "storageAccountBehindVNet": {
            "value": "false"
        },
        "storageAccountResourceGroupName": {
            "value": "my-resource-group"
        },
        "storageAccountLocation": {
            "value": "eastus2"
        },
        "offlineStoreStorageAccountOption": {
            "value": "existing"
        },
        "offline_store_storage_account_name": {
            "value": "myexistingstorage2"
        },
        "offline_store_container_name": {
            "value": "myexistingcontaine8ebdffb55"
        },
        "offline_store_resource_group_name": {
            "value": "my-resource-group"
        },
        "offline_store_subscription_id": {
            "value": "my-subscription-id"
        },
        "offline_store_connection_name": {
            "value": "OfflineStoreConnectionName-b4bb8712"
        }
}
```

## Clean up

Use the following command to delete the feature stores that you have provisioned. Note that this command uses `--all-resources` to clean up all related resources that were provisioned as part of the managed feature store provisioning process.

In [None]:
# Replace with the name of a feature store you created in this tutorial
featurestore_name = "myfs-sdk"
!az ml feature-store delete --subscription $subscription_id --resource-group $resource_group_name --name $featurestore_name --all-resources -y
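
This tutorial creates several feature stores, so the delete command has to be repeated for each one. The sketch below only builds the command strings, using the same flags as the cell above, so you can review them before running anything. The helper name `delete_commands` is hypothetical, and the list of names should match the feature stores you actually provisioned.

```python
import shlex


def delete_commands(subscription_id, resource_group_name, featurestore_names):
    """Build one `az ml feature-store delete` command string per feature store."""
    commands = []
    for name in featurestore_names:
        commands.append(
            shlex.join([
                "az", "ml", "feature-store", "delete",
                "--subscription", subscription_id,
                "--resource-group", resource_group_name,
                "--name", name,
                "--all-resources", "-y",
            ])
        )
    return commands


# Names used for the SDK and CLI variants in section 1.1.
for command in delete_commands("my-subscription-id", "my-resource-group", ["myfs-sdk", "myfs-cli"]):
    print(command)
```

`shlex.join` quotes any argument that contains shell-sensitive characters, which keeps the printed commands safe to paste into a terminal.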