# MLflow Experiment Tracking 
MLflow tracking is a powerful tool for logging and organizing machine learning experiments. It provides a centralized repository to log parameters, metrics, artifacts, and code versions. Here are some key concepts:

- **Experiment**: A named group of runs, typically representing one machine learning task or workflow.
- **Run**: A single execution of a script or piece of code within an experiment.
- **Parameters**: Input values to a run, such as hyperparameters.
- **Metrics**: Output values or performance indicators logged during a run.
- **Artifacts**: Output files, such as models or plots, logged during a run.

By using MLflow, teams can effectively track and reproduce experiments, facilitating collaboration and model reproducibility.

## Exercise Overview
In this exercise, we'll explore how to use MLflow to log and organize metrics, parameters, and artifacts in machine learning workflows. We also look at how trained models can be automatically saved as MLflow models. Such models can be registered in the MLflow Model Registry and retrieved from it by reference when needed. Finally, we look at how to log the datasets used for a run. If DVC is used, the currently checked-out Git commit of the DVC repo can also be logged as a tag.

## 1) Logging Metrics and Parameters with MLflow
> *Note:* The web UI of the tracking server can be reached in your browser at `http://localhost:5001`. The code in this notebook connects to the server at `http://mlflow:5001`.

In this exercise, we will practice using MLflow to log metrics and parameters in a machine learning workflow.
We will use the same functions as in the Dagster ops job exercise, with minor adjustments.

As in the Dagster exercise, the places in the code where something needs to be added are marked with `#...`.

### Part 1: Create an experiment and start a run
Before we can start the exercise, we need to import some packages. The package `mlflow` is particularly important for tracking experiments with MLflow.

In [None]:
import time
from typing import Tuple

import matplotlib.pyplot as plt
import mlflow
import numpy as np
import pandas as pd
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    classification_report,
    confusion_matrix,
    precision_score,
)
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from xgboost.callback import TrainingCallback

You can also use MLflow without a running tracking server. In that case, MLflow stores all data in a local folder in the current working directory.
For our exercise, however, we use a tracking server. 
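
Where the tracking URI comes from can be made configurable. The sketch below assumes the `MLFLOW_TRACKING_URI` environment variable, which MLflow itself also reads, and falls back to the server address used in this exercise. The helper name `resolve_tracking_uri` is hypothetical.

```python
import os

# Fallback taken from this exercise's setup; adjust it for other environments.
DEFAULT_TRACKING_URI = "http://mlflow:5001"


def resolve_tracking_uri(default: str = DEFAULT_TRACKING_URI) -> str:
    """Return the tracking URI from the environment, or `default` if unset or empty."""
    return os.environ.get("MLFLOW_TRACKING_URI") or default


print(resolve_tracking_uri())
```

The returned value could then be passed to `mlflow.set_tracking_uri`.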

Please set the variable `mlflow_tracking_uri` as the tracking URI to be used by MLflow. 

In [None]:
mlflow_tracking_uri = "http://mlflow:5001"
mlflow.set_tracking_uri(mlflow_tracking_uri)

Before you can start logging metrics and parameters, you must first create an MLflow experiment. An experiment is identified by its unique name: creating a second experiment with an existing name raises an exception. The following code block catches this exception and looks up the existing experiment instead. From here on, we refer to the experiment by its experiment ID (`exp_id`).

Please create an experiment with the name `Spotify genre classification`. 

In [None]:
try:
    exp_id = mlflow.create_experiment(
        name="Spotify genre classification"
    )
except mlflow.exceptions.RestException:
    exp_id = mlflow.get_experiment_by_name(
        name="Spotify genre classification"
    ).experiment_id
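
The same get-or-create behavior can also be written as a lookup followed by a create, which avoids depending on the exception type. This is only a sketch: `get_or_create_experiment` and `InMemoryTracking` are hypothetical names, and the stand-in class exists solely so the example runs without a server. The two methods it mimics, `get_experiment_by_name` and `create_experiment`, are the ones used in the cell above.

```python
from types import SimpleNamespace
from typing import Any, Dict


def get_or_create_experiment(tracking: Any, name: str) -> str:
    """Return the ID of the experiment called `name`, creating it if missing."""
    experiment = tracking.get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return tracking.create_experiment(name)


class InMemoryTracking:
    """Hypothetical stand-in for the tracking API, for demonstration only."""

    def __init__(self) -> None:
        self._experiments: Dict[str, SimpleNamespace] = {}

    def get_experiment_by_name(self, name: str):
        # Mirrors the lookup: returns None when no such experiment exists.
        return self._experiments.get(name)

    def create_experiment(self, name: str) -> str:
        experiment_id = str(len(self._experiments) + 1)
        self._experiments[name] = SimpleNamespace(experiment_id=experiment_id)
        return experiment_id


tracking = InMemoryTracking()
first_id = get_or_create_experiment(tracking, "Spotify genre classification")
second_id = get_or_create_experiment(tracking, "Spotify genre classification")
print(first_id == second_id)  # the second call reuses the existing experiment
```

In the notebook, the `mlflow` module itself could be passed as `tracking`, since it exposes both functions.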

There are several ways in which a run can be started and ended.
Please start a run with the previously created experiment ID (`exp_id`) and then end the run again.

In [None]:
_ = mlflow.start_run(experiment_id=exp_id)

In [None]:
mlflow.end_run()

The `with` statement can be used to start a run that is automatically ended as soon as the body of the `with` statement has been executed. Please start a run using the `with` statement.

In [None]:
with mlflow.start_run(experiment_id=exp_id):
    time.sleep(2)

Now that we have created an experiment and performed two runs, open the web UI of the [MLflow tracking server](http://localhost:5001) and take a look at the experiment and the runs. Admittedly, it is still relatively empty at the moment, but this will change shortly.

### Part 2: Log parameters and metrics
Now that we have familiarized ourselves with the tracking server and starting runs, it is time to log the first parameters and metrics. A parameter can be anything that characterizes the current run, for example hyperparameters that you want to optimize over time. Logging parameters makes it possible to compare runs over time and to determine the causes of better or worse model performance.

Metrics, in turn, help us gain a better understanding of the performance of the models created in the runs. This includes metrics and information collected during or after model training.

> *Note*: Please complete the tasks in this part and then run the cells one by one. 

In [None]:
_ = mlflow.start_run(experiment_id=exp_id)

We have identified the size of the test set as an interesting parameter. Please log the size of the test set (`test_size`) as a parameter under the name (`key`) "Test size".

In [None]:
def split_data() -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series, pd.Series]:
    data = pd.read_csv("./data/genres_standardized.csv", sep=";")
    columns = list(data.columns)
    columns.remove("genre")
    data["genre"] = data["genre"].astype("category")
    data["target"] = data["genre"].cat.codes
    test_size = 0.2
    mlflow.log_param(key="Test size", value=test_size)
    X_train, X_test, y_train, y_test = train_test_split(
        data[columns], data["target"], test_size=test_size
    )
    return X_train, X_test, y_train, y_test, data["genre"]

In [None]:
input_train, input_test, target_train, target_test, target_names = split_data()

The following `TrainingCallback` complements the code known from the Dagster exercise. This callback is used to log a metric during the training of the classifier. 
Please log the metric `metric_value` with the key `metric_name`. Set the step parameter of the `log_metric` function to `epoch`.

Then execute the code from `Part 2`.

In [None]:
class MlflowCallback(TrainingCallback):
    def after_iteration(self, model, epoch, evals_log) -> bool:
        for data, metric in evals_log.items():
            for metric_name, log in metric.items():
                metric_value = sum(log) / len(log)
                mlflow.log_metric(
                    key=metric_name, value=metric_value, step=epoch
                )
        return False
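
To make the callback's arithmetic concrete: `evals_log` maps each evaluation set to its metrics, and each metric to the list of values recorded so far, one per finished iteration. `sum(log) / len(log)` is therefore the running mean over all iterations so far, not the value of the latest one. The sketch below compares both on made-up numbers. `summarize_evals_log` is a hypothetical helper, and `validation_0` is the name XGBoost's scikit-learn interface gives to the first entry of `eval_set`.

```python
from typing import Dict, List, Tuple

EvalsLog = Dict[str, Dict[str, List[float]]]


def summarize_evals_log(evals_log: EvalsLog) -> Dict[str, Tuple[float, float]]:
    """Map '<dataset>-<metric>' to (running mean, latest value)."""
    summary: Dict[str, Tuple[float, float]] = {}
    for dataset_name, metrics in evals_log.items():
        for metric_name, history in metrics.items():
            running_mean = sum(history) / len(history)  # what the callback logs
            latest = history[-1]  # value of the most recent iteration
            summary[f"{dataset_name}-{metric_name}"] = (running_mean, latest)
    return summary


# Made-up loss values after four boosting iterations.
example_log = {"validation_0": {"mlogloss": [1.0, 0.5, 0.25, 0.25]}}
print(summarize_evals_log(example_log))
# {'validation_0-mlogloss': (0.5, 0.25)}
```

Prefixing the key with the evaluation-set name keeps metrics apart when `eval_set` contains more than one dataset; with the bare `metric_name` as key, they would all be logged under the same name.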

In [None]:
def train_classifier(
    input_train: pd.DataFrame, target_train: pd.Series
) -> XGBClassifier:
    number_of_estimators: int = 100
    learning_rate: float = 0.1
    max_depth: int = 8
    min_child_weight: float = 1.0
    gamma: float = 0
    number_of_jobs: int = 4

    model = XGBClassifier(
        learning_rate=learning_rate,
        n_estimators=number_of_estimators,
        max_depth=max_depth,
        min_child_weight=min_child_weight,
        gamma=gamma,
        n_jobs=number_of_jobs,
        callbacks=[MlflowCallback()],
    )
    model.fit(
        input_train, target_train, eval_set=[(input_train, target_train)], verbose=False
    )

    return model

In [None]:
classifier = train_classifier(input_train, target_train)
mlflow.end_run()

Open the web UI of the [MLflow Tracking Server](http://localhost:5001) again and view the runs.
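
The hyperparameters inside `train_classifier` are local variables and are not logged, so they do not show up next to `Test size` in the UI. One possible way to change that is sketched below, under the assumption that a dataclass holds the values. The hypothetical `TrainingConfig` uses the `XGBClassifier` argument names as field names, so the same dictionary could be handed to the model and to `mlflow.log_params`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters of the exercise, named after the XGBClassifier arguments."""

    learning_rate: float = 0.1
    n_estimators: int = 100
    max_depth: int = 8
    min_child_weight: float = 1.0
    gamma: float = 0.0
    n_jobs: int = 4


# Override a single value and turn the configuration into a plain dictionary.
params = asdict(TrainingConfig(max_depth=6))
print(params)

# Inside an active run, the dictionary could then be used twice:
#   mlflow.log_params(params)
#   model = XGBClassifier(**params, callbacks=[MlflowCallback()])
```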

## 2) Log models, artifacts and datasets
Now that we can create experiments and have performed our first runs with logged parameters and metrics, it's time to log larger and more complex data: models, artifacts, and datasets. In general, any type of file can be saved as an artifact, including models and datasets. However, there are advantages to logging models and datasets explicitly as such.

### Part 1: Log models
We start by logging a trained classifier as a model. MLflow offers a variety of model flavors that can be used to log models as MLflow models and store them in a registry:

* `Python Function (python_function)`
* `R Function (crate)`
* `H2O (h2o)`
* `Keras (keras)`
* `MLeap (mleap)`
* `PyTorch (pytorch)`
* `Scikit-learn (sklearn)`
* `Spark MLlib (spark)`
* `TensorFlow (tensorflow)`
* `ONNX (onnx)`
* `MXNet Gluon (gluon)`
* `XGBoost (xgboost)`
* `LightGBM (lightgbm)`
* `CatBoost (catboost)`
* `spaCy (spacy)`
* `fastai (fastai)`
* `Statsmodels (statsmodels)`
* `Prophet (prophet)`
* `Pmdarima (pmdarima)`
* `OpenAI (openai) (Experimental)`
* `LangChain (langchain) (Experimental)`
* `John Snow Labs (johnsnowlabs) (Experimental)`
* `Diviner (diviner)`
* `Transformers (transformers) (Experimental)`
* `SentenceTransformers (sentence_transformers) (Experimental)`

The classifier that we have trained is an `XGBClassifier`. To log our classifier, we use the function `log_model` from the module `mlflow.xgboost`.

Please execute the cells from `Part 1` to start a run where the trained classifier is logged.

In [None]:
_ = mlflow.start_run(experiment_id=exp_id)

In [None]:
classifier = train_classifier(input_train, target_train)
mlflow.xgboost.log_model(classifier, "spotify_genre_classifier")

In [None]:
mlflow.end_run()

Open the web UI of the [MLflow Tracking Server](http://localhost:5001) again and view the runs.
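
A logged model can later be addressed by a URI that combines the run ID with the artifact path passed to `log_model`. The helper below is a hypothetical sketch that only builds such a `runs:/` URI; the run ID in the example is a placeholder, not a real run.

```python
def build_run_model_uri(
    run_id: str, artifact_path: str = "spotify_genre_classifier"
) -> str:
    """Build a `runs:/<run_id>/<artifact_path>` URI for a model logged in a run."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Placeholder run ID for illustration; real IDs are shown in the tracking UI.
model_uri = build_run_model_uri("0123456789abcdef")
print(model_uri)

# With a real run ID, the model could be loaded again, for example with:
#   model = mlflow.xgboost.load_model(model_uri)
```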

### Part 2: Log artifacts
As already mentioned, any data can be logged as artifacts for a run. In the following example, we want to save the confusion matrix and the classification report as files and then log both as artifacts.

Please complete the code so that both files are logged as artifacts. Then execute the code from `Part 2` cell by cell. 

In [None]:
_ = mlflow.start_run(experiment_id=exp_id)

In [None]:
def predict(classifier: XGBClassifier, input_test: pd.DataFrame) -> np.ndarray:
    predictions = classifier.predict(input_test)
    return predictions

In [None]:
predictions = predict(classifier=classifier, input_test=input_test)

In [None]:
def analyze(
    target_test: pd.Series,
    predictions: np.ndarray,
    target_names: pd.Series,
):
    category_labels = target_names.cat.categories
    fig, ax = plt.subplots(figsize=(10, 10))
    ConfusionMatrixDisplay.from_predictions(
        target_test, predictions, ax=ax, display_labels=category_labels
    )
    ax.tick_params(axis="x", labelrotation=70, labelbottom=True)
    fig.savefig("./data/confusion_materix.png", pad_inches=20)
    report = classification_report(target_test, predictions, output_dict=True)
    df_classification_report = pd.DataFrame(report).transpose()
    df_classification_report.to_csv("./data/classification_report.csv")
    mlflow.log_artifact("./data/classification_report.csv")
    mlflow.log_artifact("./data/confusion_materix.png")

In [None]:
analyze(target_test=target_test, predictions=predictions, target_names=target_names)
mlflow.end_run()

Open the web UI of the [MLflow Tracking Server](http://localhost:5001) again and view the runs.
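
Artifacts do not have to be written into `./data` first. As a hedged sketch, the hypothetical helper below writes a report into a temporary directory and hands the path to a logging callable while the file still exists. In the notebook that callable would be `mlflow.log_artifact`; here a stand-in that reads the file back is used so the example runs without a server. The report layout and numbers are invented and only loosely follow `classification_report(..., output_dict=True)`.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def log_report_via_tempdir(
    report: Dict[str, object], log_artifact: Callable[[str], None]
) -> None:
    """Write per-label scores to a temporary CSV file and pass its path on."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "classification_report.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["label", "precision", "recall"])
            for label, scores in report.items():
                # Scalar entries such as "accuracy" have no per-label scores.
                if isinstance(scores, dict):
                    writer.writerow([label, scores["precision"], scores["recall"]])
        # The directory is deleted when the block exits, so log the file first.
        log_artifact(str(path))


logged: List[str] = []


def read_back(path: str) -> None:
    """Stand-in for mlflow.log_artifact that records the file content."""
    logged.append(Path(path).read_text())


example_report = {"rock": {"precision": 0.5, "recall": 0.25}, "accuracy": 0.4}
log_report_via_tempdir(example_report, read_back)
print(logged[0])
```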

### Part 3: Log datasets
Last but not least, we want to log datasets for our runs. To do this, we first load the dataset, which is available as a CSV file, into a pandas DataFrame. MLflow can create an MLflow dataset from a DataFrame. The target column can be specified via the `targets` parameter; in our case, this is `genre`. Please create an MLflow dataset and log it as an input for a run.

In [None]:
_ = mlflow.start_run(experiment_id=exp_id)
data = pd.read_csv("./data/genres_standardized.csv", sep=";")
dataset = mlflow.data.from_pandas(data, targets="genre")
mlflow.log_input(dataset)
mlflow.end_run()

Open the web UI of the [MLflow Tracking Server](http://localhost:5001) again and view the runs.
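
To tie a run to an exact data version, the Git commit of the DVC repository can be attached to the run as a tag, as mentioned in the exercise overview. The sketch below is one possible way to read the commit, assuming `git` is installed and the notebook runs inside the repository. `current_git_commit` and the tag name `dvc_git_commit` are hypothetical.

```python
import subprocess
from typing import Optional


def current_git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the checked-out commit hash of `repo_dir`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a repository.
        return None
    return result.stdout.strip()


commit = current_git_commit()
print(commit)

# Inside an active run, the hash could then be stored as a tag:
#   if commit is not None:
#       mlflow.set_tag("dvc_git_commit", commit)
```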

### Part 4: All together
To complete this exercise, we would like to combine the different functions and implementations and execute them as one big run in which everything is logged. Please paste the code to log the dataset and classifier into the cell below, run the code, and view the result in the [MLflow Tracking Server](http://localhost:5001) web UI.

In [None]:
with mlflow.start_run(experiment_id=exp_id):
    data = pd.read_csv("./data/genres_standardized.csv", sep=";")
    dataset = mlflow.data.from_pandas(data, targets="genre")
    mlflow.log_input(dataset)
    input_train, input_test, target_train, target_test, target_names = split_data()
    classifier = train_classifier(input_train, target_train)
    mlflow.xgboost.log_model(classifier, "spotify_genre_classifier")
    predictions = predict(classifier=classifier, input_test=input_test)
    analyze(target_test=target_test, predictions=predictions, target_names=target_names)


## 3) Combine MLflow with Dagster
We have prepared another notebook (`/notebooks/dagster/dagster_exercise_ops_job_mlflow.ipynb`) in which you use Dagster's MLflow integration to automatically create an MLflow run for the Dagster pipelines.
To do this, you need to add `required_resource_keys={"mlflow"}` to the `op` decorator of every op that uses MLflow for logging. This ensures that the Dagster pipeline is only executed when an MLflow resource is available for the Dagster job. You do not need to create an experiment or start a run yourself; Dagster's MLflow integration does this for you.

The entry `resource_defs={"mlflow": mlflow_tracking}` must be added to the `job` decorator of the `spotify_genre_classification` job. This makes MLflow available to the job as a resource that can be used during execution. Finally, add the `@end_mlflow_on_run_finished` decorator to the job. This ends the MLflow run as soon as the Dagster job is finished. **Save the notebook `dagster_exercise_ops_job_mlflow`.**

If you now open the [Dagster UI](http://localhost:3000), update the code location, and open the Launchpad of the `spotify_genre_classification` job in the dagster mlflow code location, Dagster displays an error message saying that the configuration is incomplete. Let Dagster add the missing configuration.

Adjust the configuration so that it looks like this: 

``` yaml
ops:
  analyze:
    config:
      confusion_matrix_path: ./data/confusion_materix.png
      report_path: ./data/classification_report.csv
  split_data:
    config:
      data_path: ./data/genres_standardized.csv
      seperator: ;
      target_column: genre
      test_set_size: 0.2
  train_classifier:
    config:
      gamma: 0
      learning_rate: 0.1
      max_depth: 10
      min_child_weight: 1
      number_of_estimators: 100
      number_of_jobs: 4
resources:
  mlflow:
    config:
      experiment_name: Spotify genre classification mlflow
      mlflow_tracking_uri: http://mlflow:5001
```