**<center><h1>Introduction</h1></center>**

When doing machine learning tasks in Azure Databricks, you can use MLflow to track and review your work. In this module, you will learn what MLflow is and how you can use its various features.


**<h2>Learning Objectives</h2>**

After completing this module, you’ll be able to:

- Describe the capabilities of MLflow.
- Describe MLflow terms.
- Start a run in MLflow.

<hr>

**<center><h1>Understand capabilities of MLflow</h1></center>**

**MLflow** is an open-source product designed to manage the Machine Learning development lifecycle. That is, MLflow allows data scientists to train models, register those models, deploy the models to a web server, and manage model updates.


**<h2>The importance of MLflow</h2>**

MLflow is an important part of machine learning with Azure Databricks, as it integrates key operational processes with the Azure Databricks interface. MLflow makes it easy for data scientists to train models and make them available without writing a great deal of code.

As a side note, MLflow also works with workloads outside of Azure Databricks. The examples in this module all use Azure Databricks, but this is not a requirement.

**<h2>MLflow product components</h2>**

There are four components to MLflow:

- MLflow Tracking
- MLflow Projects
- MLflow Models
- MLflow Model Registry

**<h2>MLflow Tracking</h2>**

MLflow Tracking allows data scientists to work with experiments. For each run in an experiment, a data scientist may log parameters, versions of libraries used, evaluation metrics, and generated output files when training machine learning models.

MLflow Tracking provides the ability to audit the results of prior model training executions.


<img src="images/03-01-01-parameters.png" />


**<h2>MLflow Projects</h2>**

An MLflow Project is a way of packaging code that allows for consistent deployment and reproducible results. MLflow supports several environments for projects, including Conda, Docker, and the local system.


**<h2>MLflow Models</h2>**

MLflow offers a standardized format for packaging models for distribution. This standardized model format allows MLflow to work with models generated from several popular libraries, including ```scikit-learn```, ```Keras```, ```MLlib```, ```ONNX```, and more. Review the [MLflow Models documentation](https://mlflow.org/docs/latest/models.html) for information on the full set of supported model flavors.

**<h2>MLflow Model Registry</h2>**

The MLflow Model Registry allows data scientists to register models in a registry.

<img src="images/03-01-01-registry.png" />

From there, MLflow Models and MLflow Projects combine with the MLflow Model Registry to allow operations team members to deploy models in the registry, serving them either through a REST API or as part of a batch inference solution using Azure Databricks.

<img src="images/03-01-01-deploy.png" />






<hr>

**<center><h1>Use MLflow terminology</h1></center>**

Several terms are important to understand when working with MLflow. Most of these terms are fairly common in the data science space. Other products, such as Azure Machine Learning, use very similar terminology so that skills transfer easily between products. The following sections include key terms and concepts for each MLflow component.

**<h2>MLflow Tracking</h2>**

MLflow Tracking is built around **runs**, that is, executions of code for a data science task. Each run contains several key attributes, including:

- **Parameters:** Key-value pairs that represent inputs. Use parameters to track **hyperparameters**, that is, inputs to functions that affect the machine learning process.
- **Metrics:** Key-value pairs that represent how the model is performing. Metrics can include evaluation measures such as Root Mean Square Error, and they can be updated throughout the course of a run. This allows a data scientist, for example, to track Root Mean Square Error for each epoch of a neural network.
- **Artifacts:** Output files. Artifacts may be stored in any format and can include models, images, log files, data files, or anything else that might be important for model analysis and understanding.

Runs are grouped into **experiments**, which collect and organize related runs. For example, a data scientist may create an experiment to train a classifier against a particular data set. Each run might try a different algorithm or a different set of inputs. The data scientist can then review the individual runs to determine which run generated the best model.

**<h2>MLflow Projects</h2>**

A project in MLflow is a method of packaging data science code. This allows other data scientists or automated processes to use the code in a consistent manner.

Each project includes at least one **entry point**, which is a file (either **.py** or **.sh**) that is intended to act as the starting point for project use. Projects also specify details about the **environment**. This includes the specific packages (and versions of packages) used in developing the project, as new versions of packages may include breaking changes.

**<h2>MLflow Models</h2>**

A **model** in MLflow is a directory containing an arbitrary set of files along with an MLmodel file in the root of the directory.

A model is saved in a particular **flavor**, which describes the tool or library that generated it. This allows MLflow to work with a wide variety of modeling libraries, such as ```scikit-learn```, ```Keras```, ```MLlib```, ```ONNX```, and many more. A model may also include a **signature**, which describes the expected inputs and outputs for the model.

**<h2>MLflow Model Registry</h2>**

The MLflow Model Registry allows a data scientist to keep track of a **model** from MLflow Models. In other words, the data scientist **registers** a model with the Model Registry, storing details such as the name of the model. Each registered model may have multiple **versions**, which allow a data scientist to keep track of model changes over time.

It is also possible to **stage** models. Each model version may be in one stage, such as **Staging, Production, or Archived**. Data scientists and administrators may **transition** a model version from one stage to the next.



<hr>

**<center><h1>Run experiments</h1></center>**

An MLflow **experiment** is a collection of training runs. Grouping runs this way is useful for comparing changes over time or comparing the relative performance of models with different hyperparameter values.

Azure Databricks creates an experiment automatically when you start a run in a notebook. Here is an example of starting a run in MLflow, logging two parameters, and logging one metric:

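```
import mlflow

# Placeholder values for illustration; replace them with your own inputs.
input1 = 0.5
input2 = 10

with mlflow.start_run():
    mlflow.log_param("input1", input1)
    mlflow.log_param("input2", input2)
    # Perform operations here, such as model training.
    rmse = 0.0  # Placeholder; compute the real RMSE from your model.
    mlflow.log_metric("rmse", rmse)
```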


In this case, the experiment's name will be the name of the notebook. To use a different name, set the ```MLFLOW_EXPERIMENT_NAME``` environment variable.


**<h2>Reviewing Experiments</h2>**

Inside a notebook, the **Experiment** menu option displays a context bar that includes information on runs of the current experiment.

<img src="images/03-01-03-experiment.png" />

Selecting the External Link icon in the experiment run provides additional details on that run.

<img src="images/03-01-03-external-link.png" />

This link shows the information that MLflow Tracking logged, including notes, parameters, metrics, tags, and artifacts.

<img src="images/03-01-03-run.png" />




<hr>

**<center><h1>Exercise - Use MLflow to track an experiment</h1></center>**

Now, it's your chance to use Azure Databricks and MLflow to run an experiment and track the results of different experimental tests.

In this exercise, you will:

- Run an experiment.
- Review experiment metrics.

**<h2>Instructions</h2>**

Follow these instructions to complete the exercise:

1. Open the exercise instructions at https://aka.ms/mslearn-dp090.
2. Complete the **Using MLflow to Track Experiments** exercise.




<hr>