# Azure AutoML Demo
An MLRun function for working with Azure AutoML. It includes the following handlers:
1. `init_experiment` -     Initialize workspace and experiment in Azure ML.
2. `init_compute` -        Initialize Azure ML compute target to run experiment.
3. `register_dataset` -    Register dataset object (can also be an Iguazio FeatureVector) in Azure ML.
4. `download_model` -      Download trained model from Azure ML to local filesystem.
5. `upload_model` -        Upload pre-trained model from local filesystem to Azure ML.
6. `submit_training_job` - Submit training job to Azure AutoML and download trained model when completed.
7. `automl_train` -        Whole training flow for Azure AutoML:
   - Initializes the workspace and experiment in Azure ML
   - Registers the dataset/feature vector
   - Submits the training job
   - Downloads the trained model

## 1. Setup MLRun Project

Create the MLRun project:

In [1]:
import os
import json
from os import path
import pandas as pd
import mlrun
import mlrun.feature_store as fstore
from mlrun import code_to_function, auto_mount
from mlrun.runtimes.utils import generate_function_image_name
# Set the base project name
project_name_base = 'azure-automl-iris-demo'

# Initialize the MLRun project object
project = mlrun.get_or_create_project(project_name_base, context="./", user_project=True)

# Display the current project name
project_name = project.metadata.name
print(f'Project name: {project_name}')

> 2021-11-24 15:18:37,573 [info] loaded project azure-automl-iris-demo from MLRun DB
Project name: azure-automl-iris-demo-yoni2


## 2. Preparing Dataset (Iris)

- Load the Iris dataset and split it into train and test sets.

- Prepare the training URI for the MLRun function.

In [2]:
DATA_URL = "https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv"
iris_uri = 'v3io:///users/yoni2/azure-automl/iris.csv'
label_column_name = 'label' # target label

# Create Iris DataFrame:
iris_dataset = mlrun.get_dataitem(DATA_URL).as_df()

# Split to train, test:
train_data = iris_dataset.sample(frac=0.8, random_state=42)
test_data = iris_dataset.drop(train_data.index)
test_data_no_target = test_data.drop(columns=[label_column_name])

train_file = "iris.csv"
train_data.to_csv(train_file, index=False, header=True)

# Show the DataFrame stored at iris_uri (assumes the training CSV is already available at that path):
mlrun.get_dataitem(iris_uri).show()

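| Unnamed: 0 | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | label |
|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 |
| 4 | 6.8 | 2.8 | 4.8 | 1.4 | 1 |
| ... | ... | ... | ... | ... | ... |
| 115 | 6.9 | 3.1 | 5.4 | 2.1 | 2 |
| 116 | 5.9 | 3.0 | 4.2 | 1.5 | 1 |
| 117 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 118 | 5.7 | 2.6 | 3.5 | 1.0 | 1 |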
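
The split used in this section can be tried in isolation. The following is a minimal sketch on a small synthetic DataFrame (the values are made up and only stand in for the Iris data); it shows that `sample(frac=0.8)` followed by `drop(train.index)` yields two disjoint sets.

```python
import pandas as pd

# Small synthetic frame standing in for the Iris data (values are made up).
df = pd.DataFrame({
    "feature": [float(i) for i in range(10)],
    "label": [i % 3 for i in range(10)],
})

# 80% of the rows, sampled reproducibly, form the training set.
train = df.sample(frac=0.8, random_state=42)
# Dropping the sampled index labels leaves the remaining 20% as the test set.
test = df.drop(train.index)
# Features only, as they would be sent to a model for prediction.
test_no_target = test.drop(columns=["label"])

split_sizes = (len(train), len(test))
print(split_sizes)
```

Because the test set is derived from the index labels of the training sample, this pattern relies on the DataFrame having a unique index.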


## 3. Submit Azure AutoML Training Job

### Submit Azure Secrets
For more information about working with secrets, see [MLRun docs: Working with secrets](https://docs.mlrun.org/en/latest/secrets.html).

Fill in your Azure secrets and run the block below **once**, with `secrets_uploaded = False`, to pass them securely to MLRun.

After running this block, delete your secrets from the notebook and set `secrets_uploaded = True`.

In [3]:
# Do once and remove for security:
secrets_uploaded = True
if not secrets_uploaded: 
    # Fill Azure secrets:
    secrets = {"AZURE_TENANT_ID": "",
               "AZURE_SERVICE_PRINCIPAL_ID": "",
               "AZURE_SERVICE_PRINCIPAL_PASSWORD": "",
               "AZURE_SUBSCRIPTION_ID": "",
               "AZURE_RESOURCE_GROUP": "",
               "AZURE_WORKSPACE_NAME": "",
               "AZURE_STORAGE_CONNECTION_STRING": ""}
    
    
    # Upload secrets to MLRun:
    mlrun.get_run_db().create_project_secrets(
        project_name,
        provider=mlrun.api.schemas.SecretProviderName.kubernetes,
        secrets=secrets
    )

In [4]:
secrets_spec = mlrun.new_task().with_secrets('kubernetes', ['AZURE_TENANT_ID',
                                                            'AZURE_SERVICE_PRINCIPAL_ID',
                                                            'AZURE_SERVICE_PRINCIPAL_PASSWORD',
                                                            'AZURE_SUBSCRIPTION_ID',
                                                            'AZURE_RESOURCE_GROUP',
                                                            'AZURE_WORKSPACE_NAME',
                                                            'AZURE_STORAGE_CONNECTION_STRING'])
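
Before uploading, it can help to confirm that no secret was left empty. The sketch below is an illustrative pre-check only; `find_missing_secrets` and `REQUIRED_SECRETS` are hypothetical names introduced here and are not part of MLRun or of this function.

```python
# Secret names used in this notebook.
REQUIRED_SECRETS = [
    "AZURE_TENANT_ID",
    "AZURE_SERVICE_PRINCIPAL_ID",
    "AZURE_SERVICE_PRINCIPAL_PASSWORD",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_WORKSPACE_NAME",
    "AZURE_STORAGE_CONNECTION_STRING",
]


def find_missing_secrets(secrets):
    """Return the required keys that are absent or empty in `secrets`."""
    return [key for key in REQUIRED_SECRETS if not secrets.get(key)]


# Example: only one secret filled in (placeholder value, not a real ID).
example = {key: "" for key in REQUIRED_SECRETS}
example["AZURE_TENANT_ID"] = "placeholder-tenant"

missing = find_missing_secrets(example)
print(f"{len(missing)} secrets still empty: {missing}")
```

Running such a check before the upload call avoids storing empty values that would only surface later as authentication errors.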

### Import azureml_utils from marketplace



In [5]:
marketplace = False

if marketplace:
    # Import the azureml_utils function from the marketplace:
    azureml_fn = mlrun.import_function('hub://azureml_utils')

else:
    azureml_fn = code_to_function(
        name="azure",
        project=project_name,
        filename="azure_automl.py",
        kind="job",
        image="mlrun/mlrun",
        requirements="requirements.txt",
        with_doc=False
    ).apply(auto_mount())
    
    # Build Docker image (if not already built)
    azureml_fn.deploy(skip_deployed=True)
    azureml_fn.spec.image = generate_function_image_name(azureml_fn)
    azureml_fn.export()
    azureml_fn = mlrun.import_function('function.yaml')


> 2021-11-24 15:18:38,387 [info] function spec saved to path: function.yaml


### AutoML configuration & run parameters

- The `automl_settings` object holds the Azure AutoML configuration: the `task` type, the number of models to train (`iterations`), the target metric (`primary_metric`), the allowed model types (`allowed_models`), and more.

- The `params` are the parameters for the MLRun function, such as the experiment name (`experiment_name`) and CPU cluster name (`cpu_cluster_name`) in AzureML, the dataset properties for registration, the target label for training (`label_column_name`), the number of models to download (`save_n_models`), and more.

In [6]:
# Configure automl settings:
automl_settings = json.dumps({
    "task": 'classification',
    "debug_log": 'automl_errors.log',
    # "experiment_exit_score": 0.9,
    "enable_early_stopping": False,
    "allowed_models": ['LogisticRegression', 'SGD', 'SVM'],
    "iterations": 2,
    "iteration_timeout_minutes": 2,
    "max_concurrent_iterations": 2,
    "max_cores_per_iteration": -1,
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    "featurization": 'off',
    "model_explainability": False,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
})

# Set the parameters for the MLRun function:
params = {
    "experiment_name": 'azure-automl-test',
    "cpu_cluster_name": 'azureml-cpu',
    "dataset_name": 'iris',
    "dataset_description": 'iris training data',
    "label_column_name": label_column_name,
    "create_new_version": True,
    "register_model_name": "iris-model",
    "save_n_models": 2,
    "automl_settings": automl_settings
}
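
Note that `automl_settings` is passed as a JSON string rather than as a dictionary. The sketch below shows the round trip with a shortened copy of the settings; that the handler decodes the string with `json.loads` is an assumption here, since the code of `azure_automl.py` is not shown in this notebook.

```python
import json

# Shortened copy of the settings above, for illustration only.
settings = {
    "task": "classification",
    "iterations": 2,
    "primary_metric": "accuracy",
    "allowed_models": ["LogisticRegression", "SGD", "SVM"],
    "enable_early_stopping": False,
}

# Encode to a string so the value can travel as a plain run parameter.
encoded = json.dumps(settings)

# Decode on the receiving side (assumed behavior of the handler).
decoded = json.loads(encoded)

print(type(encoded).__name__)
print(decoded == settings)
```

Python booleans are written as JSON `false`/`true` and come back as `False`/`True`, so the decoded dictionary matches the original.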

### Run Azure AutoML train:

This MLRun function performs the following steps:
- Initializes the workspace and experiment in your AzureML account.
- Registers the dataset/feature vector to Iguazio and to AzureML.
- Submits the training job to AzureML and prints the live training results for each model.
- Generates the top trained models.

In [None]:
azureml_run = azureml_fn.run(
    runspec=secrets_spec,
    handler="automl_train",
    inputs={"training_data_uri": DATA_URL},
    params=params,
)


> 2021-11-24 15:18:38,432 [info] starting run azure-automl_train uid=2741532296df46729bb48bedb571ef3e DB=http://mlrun-api:8080
> 2021-11-24 15:18:38,649 [info] Job is running in the background, pod: azure-automl-train-7wmzr
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (azure-identity 1.7.1 (/usr/local/lib/python3.7/site-packages), Requirement.parse('azure-identity<1.5.0,>=1.2.0'), {'azureml-dataprep'}).
> 2021-11-24 15:18:45,365 [info] Loading AzureML Workspace
> 2021-11-24 15:18:47,768 [info] Initializing AzureML experiment
> 2021-11-24 15:18:49,025 [info] Initializing AzureML compute target
> 2021-11-24 15:18:49,230 [info] Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
> 2021-11-24 15:18:49,353 [info] Setting up experiment parameters
> 2021-11-24 15:18:53,743 [info] Submitting and running ex

## 4. Deploying the Model-Serving Function

### Importing `v2_model_server` function from marketplace for serving the model

First, collect the model paths from the run object and take the best model.

Then import the serving function from the marketplace and add the best model to it.

In [None]:
# Get trained models:
model_paths = [azureml_run.outputs[key] for key in azureml_run.outputs.keys() if "model" in key]
best_model_path = model_paths[0]

# Data for testing:
data_to_test = test_data_no_target.sample(5).values.tolist()
my_data = {'inputs': data_to_test}

model_name = 'best_model'

# Importing serving function from marketplace:
serving_fn = mlrun.import_function('hub://v2_model_server')
serving_fn.add_model(model_name, model_path=best_model_path)
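
The model selection above filters the run outputs by key. A minimal sketch of that filter on a plain dictionary follows; the output keys and paths are hypothetical placeholders, and whether the first matching entry is really the best model depends on how the handler orders its outputs, which is not shown here.

```python
def select_model_paths(outputs):
    """Return the values whose key contains 'model', in insertion order."""
    return [path for key, path in outputs.items() if "model" in key]


# Hypothetical run outputs; real keys and paths come from the training run.
example_outputs = {
    "dataset": "path/to/dataset",
    "model_1": "path/to/model_1",
    "model_2": "path/to/model_2",
}

model_paths = select_model_paths(example_outputs)
if not model_paths:
    raise ValueError("no model outputs found in the run")

# Takes the first match; assumes the outputs are ordered best-first.
best_model_path = model_paths[0]
print(best_model_path)
```

Checking for an empty list before indexing gives a clearer error than an `IndexError` if the training run produced no models.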

### Building and Deploying the Serving Function

In [None]:
function_address = serving_fn.deploy()

## 5. Using the Live Model-Serving Function

In [None]:
print(f'The address for the function is {function_address} \n')

!curl $function_address

After deploying the serving function with the required model, we can make predictions:

In [None]:
serving_fn.invoke(f'/v2/models/{model_name}/infer', my_data)
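
The request used above consists of a path containing the model name and a JSON body with an `inputs` list of feature rows. The sketch below builds both without contacting a server; `build_infer_request` is a hypothetical helper, and the feature values are made up.

```python
import json


def build_infer_request(model_name, rows):
    """Build the path and body for an inference call like the one above."""
    path = f"/v2/models/{model_name}/infer"
    # Each row is one sample; values are cast to float so the body is JSON-safe.
    body = {"inputs": [[float(value) for value in row] for row in rows]}
    return path, body


# Two made-up samples with four features each.
sample_rows = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
path, body = build_infer_request("best_model", sample_rows)

print(path)
print(json.dumps(body))
```

Casting to `float` is a precaution for values that come from NumPy or pandas, whose numeric types are not always serializable by the standard `json` module.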

## 6. Clean up

For cleaning up AzureML resources see:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-auto-train-models#clean-up-resources