**<center><h1>Introduction</h1></center>**


If you train models on Azure Databricks and track your work with MLflow, you can integrate with Azure Machine Learning to store model training metrics and artifacts and keep a clear overview of your work. Using Azure Machine Learning as the backend for MLflow experiments that run on Azure Databricks compute gives you a centralized, scalable workspace from which you can run and review experiments and access all your assets. In this module, you will learn how Azure Databricks, MLflow, and Azure Machine Learning integrate and how you can manage your work from the Azure Machine Learning workspace.

**<h2>Learning Objectives</h2>**

After completing this module, you'll be able to:

- Describe Azure Machine Learning.
- Run an experiment.
- Log metrics with MLflow.
- Run a pipeline step on Databricks compute.

<hr>

**<center><h1>Describe Azure Machine Learning</h1></center>**

Azure Machine Learning is a platform for operating machine learning workloads in the cloud.

<img src="images/04-01-01-what-azure-machine-learning.jpg" />

Built on the Microsoft Azure cloud platform, Azure Machine Learning enables you to manage:

- Scalable on-demand compute for machine learning workloads.
- Data storage and connectivity to ingest data from a wide range of sources.
- Machine learning workflow orchestration to automate model training, deployment, and management processes.
- Model registration and management, so you can track multiple versions of models and the data on which they were trained.
- Metrics and monitoring for training experiments, datasets, and published services.
- Model deployment for real-time and batch inferencing.





<hr>

**<center><h1>Run Azure Databricks experiments in Azure Machine Learning</h1></center>**

[MLflow](https://www.mlflow.org/) is an open-source library for managing the life cycle of your machine learning experiments. [MLflow Tracking](https://mlflow.org/docs/latest/quickstart.html#using-the-tracking-api) is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter your experiment's environment.

A recommended approach for running Azure Machine Learning (AML) experiments on an Azure Databricks cluster is to use MLflow Tracking and connect Azure Machine Learning as the backend for MLflow experiments.

The following diagram illustrates that with MLflow Tracking, you track an experiment's run metrics and store model artifacts in your Azure Machine Learning workspace.

<img src="images/04-01-02-mlflow-diagram.png" />

**<h2>Track AML Experiments in Azure Databricks</h2>**

When running AML experiments in Azure Databricks, there are three key steps:

1. Configure the MLflow tracking URI to use AML.
2. Configure an MLflow experiment.
3. Run your experiment.



**<h3>1. Configure MLflow tracking URI to use AML</h3>**

To configure MLflow Tracking and connect Azure Machine Learning as the backend for MLflow experiments, follow these steps, as shown in the code snippet:

- Get your AML workspace object.
- From your AML workspace object, get the unique tracking URI address.
- Set up the MLflow tracking URI to point to the AML workspace.

```
import mlflow
from azureml.core import Workspace

# Get your AML workspace
ws = Workspace.from_config()

# Get the unique tracking URI address to the AML workspace
tracking_uri = ws.get_mlflow_tracking_uri()

# Set up MLflow tracking URI to point to AML workspace
mlflow.set_tracking_uri(tracking_uri)
```
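As a quick sanity check after the configuration above, it can help to confirm that the tracking URI points at Azure Machine Learning rather than the default tracking store. The sketch below assumes that Azure Machine Learning tracking URIs use the `azureml` scheme, which is what the SDK typically returns. The helper name `points_to_azureml` is hypothetical, and the sample URI is made up for illustration.

```python
from urllib.parse import urlparse


def points_to_azureml(tracking_uri: str) -> bool:
    """Return True if the URI uses the azureml scheme (assumed convention)."""
    return urlparse(tracking_uri).scheme == "azureml"


# Illustrative values only; in a notebook you would pass mlflow.get_tracking_uri().
example_uri = "azureml://example-region.api.azureml.ms/mlflow/v1.0/placeholder"
print(points_to_azureml(example_uri))           # Azure ML style URI
print(points_to_azureml("file:///tmp/mlruns"))  # local file store
```

If the check fails, the most likely cause is that `mlflow.set_tracking_uri()` was not called in the current notebook session.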

**<h3>2. Configure an MLflow experiment</h3>**

Provide the name for the MLflow experiment as shown below. Note that the same experiment name will appear in Azure Machine Learning.
```
experiment_name = 'MLflow-AML-Exercise'
mlflow.set_experiment(experiment_name)
```
**<h3>3. Run your experiment</h3>**

Once the experiment is set up, you can start your training run with ```start_run()``` as shown below:
```
with mlflow.start_run() as run:
    ...
    ...
```
Place your model training and logging code inside the ```with``` block. The run ends automatically when the block exits.







<hr>

**<center><h1>Log metrics in Azure Machine Learning with MLflow</h1></center>**

In the previous unit, we discussed how to set up Azure Machine Learning as the backend for MLflow experiments. We also looked at how to start your model training on Azure Databricks as an MLflow experiment. In this section, we will look at how to log model metrics and artifacts with the MLflow logging API. These logged metrics and artifacts are then captured in an Azure Machine Learning workspace, which provides a centralized, secure, and scalable location to store training metrics and artifacts.

In your MLflow experiment, once you train and evaluate your model, you can use the MLflow logging API, ```mlflow.log_metric()```, to start logging your model metrics as shown below:

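```
import math

import mlflow
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

with mlflow.start_run() as run: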
    ...
    ...
    # Make predictions on hold-out data
    y_predict = clf.predict(X_test)
    y_actual = y_test.values.flatten().tolist()

    # Evaluate and log model metrics on hold-out data
    rmse = math.sqrt(mean_squared_error(y_actual, y_predict))
    mlflow.log_metric('rmse', rmse)
    mae = mean_absolute_error(y_actual, y_predict)
    mlflow.log_metric('mae', mae)
    r2 = r2_score(y_actual, y_predict)
    mlflow.log_metric('R2 score', r2)
```
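To make it clearer what the three logged values represent, here is a minimal pure-Python sketch of the same metrics on a tiny made-up dataset. The function name `regression_metrics` and the sample numbers are assumptions for illustration; in practice you would keep using the scikit-learn functions shown above.

```python
import math


def regression_metrics(y_actual, y_predict):
    """Compute RMSE, MAE, and R2 from two equal-length lists of numbers."""
    n = len(y_actual)
    errors = [a - p for a, p in zip(y_actual, y_predict)]

    # Mean squared error and mean absolute error
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - (residual sum of squares / total sum of squares)
    # Note: this sketch does not handle constant targets (ss_tot == 0).
    mean_actual = sum(y_actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in y_actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values standing in for hold-out targets and predictions
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

A dictionary like this can also be logged in a single call with `mlflow.log_metrics(metrics)` inside the run's `with` block, instead of one `mlflow.log_metric()` call per value.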


Next, you can use MLflow’s ```log_artifact()``` API to save model artifacts, such as a plot of predicted versus true values, as shown:


```
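import os

import matplotlib.pyplot as plt
import mlflow

with mlflow.start_run() as run:
    ...
    ...
    # Plot predicted vs. true values and save the figure locally
    os.makedirs("./outputs", exist_ok=True)  # savefig fails if the folder is missing
    plt.scatter(y_actual, y_predict)
    plt.savefig("./outputs/results.png")
    # Upload the saved figure to the run's artifact store
    mlflow.log_artifact("./outputs/results.png")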
```

**<h2>Reviewing experiment metrics and artifacts in Azure ML Studio</h2>**

Since Azure Machine Learning is set up as the backend for MLflow experiments, you can review all the training metrics and artifacts from within the Azure Machine Learning Studio. From within the studio, navigate to the ```Experiments``` tab, and open the experiment run that corresponds to the MLflow experiment. In the ```Metrics``` tab of the run, you will observe the model metrics that were logged via MLflow tracking APIs.

<img src="images/04-01-03-01-azure-machine-learning-metrics.png" />

Next, when you open the ```Outputs + logs``` tab you will observe the model artifacts that were logged via MLflow tracking APIs.

<img src="images/04-01-03-01-azure-machine-learning-artifacts.png" />

In summary, by integrating MLflow with Azure Machine Learning, you can run experiments in Azure Databricks and use the Azure Machine Learning workspace as a centralized, secure, and scalable place to store model training metrics and artifacts.



<hr>

**<center><h1>Run Azure Machine Learning pipelines on Azure Databricks compute</h1></center>**

Azure Machine Learning supports multiple types of compute for experimentation and training. Specifically, you can run an **Azure Machine Learning pipeline** on Databricks compute.

**<h2>What is an Azure Machine Learning pipeline?</h2>**

In Azure Machine Learning, a pipeline is a workflow of machine learning tasks in which each task is implemented as a step. Steps can be arranged sequentially or in parallel, enabling you to build sophisticated flow logic to orchestrate machine learning operations. Each step can be run on a specific compute target, making it possible to combine different types of processing as required to achieve an overall goal.
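To make the idea of sequential and parallel steps concrete, here is a small plain-Python sketch. It is not the Azure Machine Learning API: it only models a hypothetical pipeline as a dependency graph and groups the steps into stages, where steps in the same stage could run in parallel. The step names are invented for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, each mapped to the set of steps it depends on
dependencies = {
    "prepare_data": set(),
    "train_model_a": {"prepare_data"},
    "train_model_b": {"prepare_data"},
    "compare_models": {"train_model_a", "train_model_b"},
}


def execution_stages(graph):
    """Group steps into stages; steps within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # Every step whose dependencies are all finished is ready to run now
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for number, stage in enumerate(stages, start=1):
    print(f"Stage {number}: {', '.join(stage)}")
```

In this toy graph the two training steps land in the same stage, which mirrors how independent pipeline steps can run in parallel while dependent steps wait for their inputs.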

**<h2>Running pipeline step on Databricks Compute</h2>**

Azure Machine Learning supports a specialized pipeline step called DatabricksStep, with which you can run a notebook, script, or compiled JAR on an Azure Databricks cluster. To run a pipeline step on a Databricks cluster, complete the following steps:

1. Attach Azure Databricks compute to the Azure Machine Learning workspace.
2. Define a DatabricksStep in a pipeline.
3. Submit the pipeline.

**<h3>Attaching Azure Databricks Compute</h3>**

The following code example attaches an existing Azure Databricks workspace to Azure Machine Learning as a compute target:
```
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DatabricksCompute

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Specify a name for the compute (unique within the workspace)
compute_name = 'db_cluster'

# Define configuration for the existing Azure Databricks workspace
db_workspace_name = 'db_workspace'
db_resource_group = 'db_resource_group'
# Get the access token from the Databricks workspace
db_access_token = '1234-abc-5678-defg-90...' 
db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
                                                   workspace_name=db_workspace_name,
                                                   access_token=db_access_token)

# Create the compute
databricks_compute = ComputeTarget.attach(ws, compute_name, db_config)
databricks_compute.wait_for_completion(True)
```
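The example above passes the access token as a string literal, which is easy to leak through source control. One possible alternative, sketched below, is to read the token from an environment variable. The variable name `DATABRICKS_ACCESS_TOKEN` and the helper `get_databricks_token` are assumptions made for this illustration, not names defined by either SDK; a Databricks secret scope is another common option.

```python
import os


def get_databricks_token(env_var: str = "DATABRICKS_ACCESS_TOKEN") -> str:
    """Read the access token from an environment variable (the name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before attaching compute."
        )
    return token


# Usage (uncomment once the environment variable is set):
# db_access_token = get_databricks_token()
```

Failing early with a clear message avoids a less obvious authentication error later, when the attach operation is already in progress.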

**<h3>Defining DatabricksStep in a pipeline</h3>**

To create a pipeline, you must first define each step and then create a pipeline that includes the steps. The specific configuration of each step depends on the step type. For example, the following code defines a **DatabricksStep** step to run a Python script, ```process_data.py```, on the attached Databricks compute.
```
from azureml.core.databricks import PyPiLibrary
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep

script_directory = "./scripts"
script_name = "process_data.py"

dataset_name = "nyc-taxi-dataset"

spark_conf = {"spark.databricks.delta.preview.enabled": "true"}

databricksStep = DatabricksStep(name = "process_data", 
                                run_name = "process_data", 
                                python_script_params=["--dataset_name", dataset_name],  
                                spark_version = "7.3.x-scala2.12", 
                                node_type = "Standard_DS3_v2", 
                                spark_conf = spark_conf, 
                                num_workers = 1, 
                                python_script_name = script_name, 
                                source_directory = script_directory,
                                pypi_libraries = [PyPiLibrary(package = 'scikit-learn'), 
                                                  PyPiLibrary(package = 'scipy'), 
                                                  PyPiLibrary(package = 'azureml-sdk'), 
                                                  PyPiLibrary(package = 'azureml-dataprep[pandas]')], 
                                compute_target = databricks_compute, 
                                allow_reuse = False
                               )
```
This step configuration creates a new Databricks job cluster to run the Python script. The cluster is created on demand and deleted after the step finishes.
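The contents of ```process_data.py``` are not shown in this unit. As a rough idea of how such a script could receive the ```--dataset_name``` value passed through ```python_script_params```, here is a hypothetical skeleton. The function name `parse_args` and the printed message are assumptions, and the sketch uses `parse_known_args` on the assumption that the pipeline runtime may append extra arguments of its own.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments that the pipeline step passes to the script."""
    parser = argparse.ArgumentParser(
        description="Hypothetical skeleton for process_data.py"
    )
    parser.add_argument(
        "--dataset_name",
        type=str,
        required=True,
        help="Name of the dataset to process",
    )
    # parse_known_args ignores any extra arguments instead of failing on them
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments from the step definition; a real script would call
# parse_args() with no arguments so that it reads sys.argv.
args = parse_args(["--dataset_name", "nyc-taxi-dataset"])
print(f"Processing dataset: {args.dataset_name}")
```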



**<h3>Submit the pipeline</h3>**

After defining the step, you can assign it to a pipeline, and run it as an experiment:
```
from azureml.pipeline.core import Pipeline
from azureml.core import Experiment

# Construct the pipeline
pipeline = Pipeline(workspace = ws, steps = [databricksStep])

# Create an experiment and run the pipeline
experiment = Experiment(workspace = ws, name = "process-data-pipeline")
pipeline_run = experiment.submit(pipeline)
```
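Submitting the pipeline returns immediately, so a notebook often needs to wait for the run to finish before using its outputs. The helper below is a minimal sketch under that assumption. `wait_for_pipeline` is a hypothetical name; it relies only on the `wait_for_completion()` method of the returned run object and on the `"Finished"` status string for a successful run.

```python
def wait_for_pipeline(pipeline_run, show_output: bool = True) -> str:
    """Block until the run finishes and return its final status string."""
    status = pipeline_run.wait_for_completion(show_output=show_output)
    if status != "Finished":
        raise RuntimeError(f"Pipeline run ended with status: {status}")
    return status


# Usage with the run object returned by experiment.submit(pipeline):
# final_status = wait_for_pipeline(pipeline_run)
```

Because the helper only depends on one method, it can be exercised with a simple stand-in object before running it against a real pipeline.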




<hr>

**<center><h1>Exercise - Use Azure Databricks with Azure Machine Learning</h1></center>**

Now, you will run experiments in Azure Machine Learning from Azure Databricks.

In this exercise, you will:

- Run an Azure ML experiment on Databricks.
- Review experiment metrics in Azure ML Studio.


**<h2>Instructions</h2>**

Follow these instructions to complete the exercise:

1. Open the exercise instructions at https://aka.ms/mslearn-dp090.
2. Complete the **Running experiments in Azure Machine Learning** exercises.




<hr>