## Experiment tracking

> __Tracking is an API and UI that lets us log experiment data and visualize it later__

Using it we can log:
- model parameters
- code versions (git commit hashes)
- metrics
- generated artifacts

__`mlflow` tracking is organized around runs, each of which is a single execution of our program__.

Each run is recorded by `mlflow` to one of:
- local files
- a SQLAlchemy-compatible database
- a remote tracking server

The destination is selected with the [`mlflow.set_tracking_uri()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri) function.

For more information about storage, [check out the relevant part of the documentation](https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded).

> __With `MLflow` we can track and version every stage of an experiment, from ETL through to deployment__

There are a few main concepts to keep in mind when using it:
- __experiment__ - a named group of runs; [`mlflow.set_experiment(UNIQUE_NAME_OF_EXPERIMENT)`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment) sets the current experiment, creating it if it doesn't exist.
- __run__ - a single execution; an experiment can consist of many of them. Started with the context manager [`mlflow.start_run()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run).
- __logging__ - recording data from a run; the related functions are:
    - [`mlflow.log_param(key, value)`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_param) - logs a hyperparameter or other settable parameter under the current run
    - [`mlflow.log_metric(key, value)`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metric) - logs a numeric metric under the current run (optionally with a `step`)
    - [`mlflow.log_artifact(local_path)`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifact) - logs a local file (e.g. a model, generated text etc.) under the current run
    
Given the above, let's see how to run and log a __non-flavored__ (i.e. without framework-specific integrations) dummy experiment:

In [14]:
import mlflow


def create_dummy_file():
    features = "rooms, zipcode, median_price, school_rating, transport"
    with open("features.txt", "w") as f:
        f.write(features)


create_dummy_file()

# Set the active experiment, creating it if needed (runs are stored under ./mlruns by default)
mlflow.set_experiment("Dummy Experiments")

# The experiment set above is used by default
with mlflow.start_run():
    mlflow.log_artifact("features.txt")
    mlflow.log_param("learning_rate", 0.01)
    for i in range(10):
        mlflow.log_metric("Iteration", i, step=i)

To visualize and explore the saved data we can use the `mlflow ui` command and open a web browser at [`http://localhost:5000`](http://localhost:5000) (__the data is saved inside `./mlruns`__)

Run `mlflow ui` in the terminal; the (commented-out) cell below prints the available options:

In [8]:
# !mlflow ui --help

After navigating to the experiment, we can see the `Iteration` metric being logged like below:

![](images/mlflow_ui.png)

## Model format

> MLflow provides a standard format for saving machine learning models (from various libraries), which makes them easier to use downstream (e.g. inference via REST API, in the cloud etc.)

MLflow models consist of:
- a directory with arbitrary files (defined by the model)
- an `MLmodel` file (written in YAML) which specifies what is contained within the model

Let's see how to save our model (in this case `sklearn`) in Python...

In [9]:
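import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# `model` can be any fitted sklearn estimator; a tiny placeholder is fitted here
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
mlflow.sklearn.save_model(model, "my_model")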

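which creates a directory like the following in our `cwd` (depending on the MLflow version, additional environment files such as `conda.yaml` may appear alongside):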

```bash
my_model/
├── MLmodel
└── model.pkl
```

The contents of the `MLmodel` file are equally easy to grasp:

```yml
---
time_created: 2021-04-03T17:28:53.35

flavors:
  sklearn:
    sklearn_version: 0.24.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
```

### Model signature

In order to deploy a model (and sometimes even to run it, as in `tensorflow`) we need to specify a __model signature__.

> __Model signature specifies the type and shape of inputs going through the model__

- Standard casting rules apply (upcasting is fine, downcasting raises an error)
- Helps with reading inputs when those are sent as JSON via a REST API or similar
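The difference between upcasting and downcasting can be illustrated with NumPy's safe-casting check. This is only an analogy for the rule above; MLflow's own enforcement may differ in detail.

```python
import numpy as np

# Upcasting: every int32 value fits into int64 / float64, so the cast is safe
int_to_long = np.can_cast(np.int32, np.int64)
int_to_double = np.can_cast(np.int32, np.float64)

# Downcasting: float64 -> float32 can lose precision, so it is not a safe cast
double_to_float = np.can_cast(np.float64, np.float32)

print(int_to_long, int_to_double, double_to_float)  # True True False
```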

We can add it to the `MLmodel` file; the two options for doing so are below:

#### Column signature

> Specify the input signature by describing each possible input column

This mode is supported by all flavors (frameworks), although it might not be the most convenient one in every case.

Example for `iris` dataset:

```yaml
signature:
    inputs: '[{"name": "sepal length (cm)", "type": "double"}, {"name": "sepal width
      (cm)", "type": "double"}, {"name": "petal length (cm)", "type": "double"}, {"name":
      "petal width (cm)", "type": "double"}]'
    outputs: '[{"type": "integer"}]'
```

#### Tensor signature

> Specify the input for deep learning models (e.g. images) via tensor shape

Image-oriented example:

```yaml
signature:
    inputs: '[{"name": "images", "dtype": "uint8", "shape": [-1, 28, 28, 1]}]'
    outputs: '[{"shape": [-1, 10], "dtype": "float32"}]'
```
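In these shapes `-1` marks a variable-sized dimension (typically the batch size). The helper below, `shape_matches`, is a hypothetical illustration of that convention and not part of MLflow:

```python
from typing import Sequence


def shape_matches(expected: Sequence[int], actual: Sequence[int]) -> bool:
    """Return True if `actual` fits `expected`, treating -1 as "any size"."""
    if len(expected) != len(actual):
        return False
    return all(e == -1 or e == a for e, a in zip(expected, actual))


# Shape taken from the `images` input in the signature above
images_shape = (-1, 28, 28, 1)

print(shape_matches(images_shape, (64, 28, 28, 1)))  # True: batch of 64 images
print(shape_matches(images_shape, (64, 32, 32, 1)))  # False: wrong height/width
print(shape_matches(images_shape, (28, 28, 1)))      # False: batch dimension missing
```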

#### Inferring input shapes

Often it is easier (and less error-prone) to infer `dtype` and shape through our code. One can easily do this via [`mlflow.models.infer_signature`](https://mlflow.org/docs/latest/python_api/mlflow.models.html#mlflow.models.infer_signature).

Check out the code below for an example:

In [11]:
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

iris = datasets.load_iris()
iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = RandomForestClassifier(max_depth=7, random_state=0)
clf.fit(iris_train, iris.target)
signature = infer_signature(iris_train, clf.predict(iris_train))

mlflow.sklearn.log_model(clf, "iris_rf", signature=signature)

`infer_signature` is really simple:
- Pass the input data (usually `pd.DataFrame`, `np.ndarray` or other standard types; framework tensors such as `torch.Tensor` may first need converting to `np.ndarray`)
- Pass the model's output for that data as the second argument - this creates `outputs` automatically


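`mlflow.sklearn.log_model` logs the model as an artifact of the current run, under the artifact path `iris_rf` and with the signature we specified (a run is started automatically if none is active).
We can later load it back through a `runs:/<run_id>/iris_rf` URI (__the loader has to match the flavor we saved it in!__):

In [None]:
# Load the sklearn model back from the run it was logged to
run_id = mlflow.active_run().info.run_id
sklearn_model = mlflow.sklearn.load_model(f"runs:/{run_id}/iris_rf")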

In summary, we've seen how MLflow can be used to track experiments and to save models in a standard format.