# MLflow for Model Registry

## Introduction
> A __model registry__ is a central location for storing (registering) models.

MLflow offers a popular model registry.

> The MLflow model registry manages the lifecycle of an `MLModel`: its lineage (the experiment and run that produced it), version, stage and annotations.

The MLflow model registry can be set up as shown below.

```
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 127.0.0.1
```

> __Note:__ To connect to an MLflow server, the tracking URI must be set, typically via an environment variable called `MLFLOW_TRACKING_URI`.

If the tracking URI is not set, registering a model can fail with the following error:
```
INVALID_PARAMETER_VALUE:  Model registry functionality is unavailable; got unsupported URI './mlruns' for model registry data storage. Supported URI schemes are: ['postgresql', 'mysql', 'sqlite', 'mssql']
```

To fix this, set `MLFLOW_TRACKING_URI` to the address of the server, as follows:

```
export MLFLOW_TRACKING_URI=http://localhost:5000
```

Note that this environment variable is only set in the terminal session in which the command is run. If that terminal is closed, the command must be run again in any new terminal. To avoid this, add the line above to your `.bashrc` file (or the equivalent file for your shell), which runs each time you open a new terminal.

> __Note:__ The tracking URI can also be set from Python with `mlflow.set_tracking_uri(...)`. This works whether or not the `MLFLOW_TRACKING_URI` environment variable is set; if it is set, the value passed to `mlflow.set_tracking_uri(...)` takes precedence.

Check out the docs [here](https://www.mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers).
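
As a minimal sketch of the configuration described above, the helper below reads `MLFLOW_TRACKING_URI` from the environment and falls back to a default. The helper name `resolve_tracking_uri` and the fallback address are assumptions made for illustration (the address assumes a local server on port 5000, as in the `export` command above); they are not part of MLflow.

```python
import os

# Assumption: a local MLflow server listening on port 5000
DEFAULT_TRACKING_URI = "http://127.0.0.1:5000"


def resolve_tracking_uri(default=DEFAULT_TRACKING_URI):
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, else `default`."""
    return os.environ.get("MLFLOW_TRACKING_URI") or default


print(resolve_tracking_uri())

# Typical use in a notebook or script, once mlflow is installed:
#   import mlflow
#   mlflow.set_tracking_uri(resolve_tracking_uri())
```

Resolving the URI in one place keeps notebooks and scripts pointed at the same server, whichever way the value was supplied.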

## Important Concepts
- __Registered model__: The `MLModel` that was explicitly registered with the metadata mentioned above.
- __Model version__: The first registered model receives `1` as the version number, which increases by one for each new registration under the same name.
- __Model stage__: Each model version can be assigned a stage, similar to environments in `git` workflows or data warehouses:
    - `Staging`
    - `Production`
    - `Archived`
- __Model annotations__: Documentation of the model, written in Markdown (algorithm description, the dataset used, etc.).

All of the above are set __after saving a model via `log_model`,__ and this can be done via an __API or a UI__.

For this lesson, we will use an API. Check [here](https://www.mlflow.org/docs/latest/model-registry.html#ui-workflow) for information on how to carry out the tasks via the UI.

There are three programmatic approaches for registering a model:
- `log_model`'s `registered_model_name` argument (__when the user intends to register a model from each experiment__).
- [`mlflow.register_model`](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.register_model) (__explicit registration of the chosen model, with the run `id` specified in the model URI__).
- [`MlflowClient`'s `create_registered_model`](https://www.mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.MlflowClient.create_registered_model) (__creates an empty registered model with no versions; a version is then added with `create_model_version`, which takes the model source and the run `id`__).

In [1]:
# Version one

from random import random, randint
from sklearn.ensemble import RandomForestRegressor

import mlflow
import mlflow.sklearn

with mlflow.start_run(run_name="YOUR_RUN_NAME") as run:
    params = {"n_estimators": 5, "random_state": 42}
    sk_learn_rfr = RandomForestRegressor(**params)

    # Log parameters and metrics using the MLflow APIs
    mlflow.log_params(params)
    mlflow.log_param("param_1", randint(0, 100))
    mlflow.log_metrics({"metric_1": random(), "metric_2": random() + 1})

    # Log the sklearn model and register as version 1
    mlflow.sklearn.log_model(
        sk_model=sk_learn_rfr,
        artifact_path="sklearn-model",
        registered_model_name="sk-learn-random-forest-reg-model"
    )

Successfully registered model 'sk-learn-random-forest-reg-model'.
2021/07/04 19:40:05 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sk-learn-random-forest-reg-model, version 1
Created version '1' of model 'sk-learn-random-forest-reg-model'.


In [None]:
# Version two

# The run ID below is an example; replace it with the ID of one of your own runs
result = mlflow.register_model(
    "runs:/d16076a3ec534311817565e6527539c0/sklearn-model",
    "sk-learn-random-forest-reg"
)

In [None]:
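# Version three

from mlflow.tracking import MlflowClient

# Creates an empty registered model (no versions yet).
# This raises an error if a model with this name is already registered.
client = MlflowClient()
client.create_registered_model("sk-learn-random-forest-reg-model")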
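
Since `create_registered_model` only creates the empty container, a version still has to be attached to it. The sketch below shows the two calls together. The wrapper `register_run_model` is a hypothetical helper, and its `client` argument is assumed to behave like `MlflowClient`; the `name`, `source` and `run_id` values in the usage comment are placeholders to be filled in with your own.

```python
def register_run_model(client, name, source, run_id):
    """Create an empty registered model, then attach one version to it.

    `client` is expected to behave like mlflow.tracking.MlflowClient;
    `source` is the location of the logged model artifacts.
    """
    # Step 1: the registered model starts out empty (no versions).
    client.create_registered_model(name)
    # Step 2: a version links the registered model to one run's artifacts.
    return client.create_model_version(name=name, source=source, run_id=run_id)


# Hypothetical usage against a running MLflow server:
#   from mlflow.tracking import MlflowClient
#   version = register_run_model(
#       MlflowClient(),
#       name="my-new-model",
#       source="<artifact location of the logged model>",
#       run_id="<your run id>",
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real server.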

## Experiments vs Models

> Run experiments to evaluate many models; thereafter, log only the best ones to the model registry.

We encourage you to carry out many experiments to aid your development; however, avoid cluttering the central model store.
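
One way to follow this advice is to compare runs on a single metric and register only the winner. The sketch below works on plain dictionaries rather than on MLflow objects: the run records, the `rmse` metric and the `pick_best_run` helper are all hypothetical, chosen only to illustrate the selection step.

```python
def pick_best_run(runs, metric, higher_is_better=False):
    """Return the run record with the best value of `metric`."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run reports metric {metric!r}")
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda r: r["metrics"][metric])


# Hypothetical run records, e.g. collected from several experiments
runs = [
    {"run_id": "run-a", "metrics": {"rmse": 0.82}},
    {"run_id": "run-b", "metrics": {"rmse": 0.74}},
    {"run_id": "run-c", "metrics": {"rmse": 0.91}},
]

best = pick_best_run(runs, "rmse")
# Only this URI would then be passed to mlflow.register_model(...)
model_uri = f"runs:/{best['run_id']}/sklearn-model"
print(model_uri)  # runs:/run-b/sklearn-model
```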

## Retrieving the Model

As shown previously, `mlflow.<framework>.load_model` can be used to load a registered model by
- specifying the model version (`models:/<model name>/<version>`).
- specifying the stage (`models:/<model name>/<stage>`).

In [None]:
# loading by version

import mlflow.pyfunc

model_name = "sk-learn-random-forest-reg-model"
model_version = 1

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{model_version}"
)

# `data` is the input to score (e.g. a pandas DataFrame); define it before running this cell
model.predict(data)


In [None]:
# loading by stage

import mlflow.pyfunc

model_name = "sk-learn-random-forest-reg-model"
stage = 'Staging'

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{stage}"
)

# `data` is the input to score (e.g. a pandas DataFrame); define it before running this cell
model.predict(data)
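
The two cells above differ only in the last segment of the model URI. A small helper can make that explicit and catch a misspelt stage before the URI reaches `load_model`. This is a sketch: `registry_model_uri` and `VALID_STAGES` are hypothetical names, not part of MLflow.

```python
# Stage names as used in this lesson
VALID_STAGES = ("Staging", "Production", "Archived")


def registry_model_uri(name, version=None, stage=None):
    """Build a `models:/` URI from a model name and a version OR a stage."""
    if (version is None) == (stage is None):
        raise ValueError("pass exactly one of `version` or `stage`")
    if stage is not None:
        if stage not in VALID_STAGES:
            raise ValueError(f"unknown stage {stage!r}; expected one of {VALID_STAGES}")
        return f"models:/{name}/{stage}"
    return f"models:/{name}/{int(version)}"


print(registry_model_uri("sk-learn-random-forest-reg-model", version=1))
# models:/sk-learn-random-forest-reg-model/1
print(registry_model_uri("sk-learn-random-forest-reg-model", stage="Staging"))
# models:/sk-learn-random-forest-reg-model/Staging
```

The returned string can be passed as `model_uri` to `mlflow.pyfunc.load_model`.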



## Things to Note

We can also
- update a model.
- change its description.
- transition it from one stage to another (allowed values are `Staging`, `Production`, `Archived` or `None`, each passed as a string).
- search for [registered models](https://www.mlflow.org/docs/latest/model-registry.html#listing-and-searching-mlflow-models).
- delete [registered models](https://www.mlflow.org/docs/latest/model-registry.html#deleting-mlflow-models), although archiving is preferable in order to keep track of experiments.
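
Two of the operations listed above, changing a description and transitioning a stage, can be sketched with `MlflowClient`'s `update_model_version` and `transition_model_version_stage` methods. The wrapper `document_and_promote` is a hypothetical helper, and `client` is assumed to behave like `MlflowClient`. Recent MLflow releases mark stages as deprecated, so check the documentation for the version you have installed.

```python
def document_and_promote(client, name, version, description, stage="Staging"):
    """Set a model version's description, then move it to `stage`.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    # Annotate the version (Markdown is allowed in the description).
    client.update_model_version(name=name, version=version, description=description)
    # Move the version to the requested stage.
    return client.transition_model_version_stage(name=name, version=version, stage=stage)


# Hypothetical usage against a running MLflow server:
#   from mlflow.tracking import MlflowClient
#   document_and_promote(
#       MlflowClient(),
#       name="sk-learn-random-forest-reg-model",
#       version=1,
#       description="Random forest with 5 trees.",
#       stage="Production",
#   )
```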

## Example

Here, using what we have learnt thus far, we will create __an MLflow project__ (refer to [this official tutorial](https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html) for help on getting started).
- Create a new, private GitHub repository, and assign a name to it (e.g. `example-experiment`).
- Choose an example dataset from [`scikit-learn`](https://scikit-learn.org/stable/).
- Create a simple command-line parser with [`argparse`](https://docs.python.org/3/library/argparse.html), and a few experiment parameters.
- Run experiments to determine the best algorithm in `sklearn` for your task (choose up to `5`).
- Commit everything, and attempt to replicate the results directly from GitHub.
- Save the model in MLflow's `model` format.
- Deploy the model locally.

Below is a small snippet using `argparse`. Save it locally in a file called `foo.py`, and run `python foo.py --help` (or `python3 foo.py --help`). Refer to the documentation for more information.

In [None]:
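import argparse

# Build a parser that accepts one or more integers and an optional --sum flag
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

# Parse the command line, then apply the chosen function (sum or max)
args = parser.parse_args()
print(args.accumulate(args.integers))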
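
For the exercise itself, the parser would expose experiment parameters rather than integers to sum. The sketch below is one possible starting point. The flag names (`--n-estimators`, `--max-depth`, `--random-state`) and their defaults are assumptions chosen to suit a random forest, not requirements of the exercise.

```python
import argparse


def build_parser():
    """Command-line parser for a few (hypothetical) experiment parameters."""
    parser = argparse.ArgumentParser(description="Run one training experiment.")
    parser.add_argument("--n-estimators", type=int, default=100,
                        help="number of trees in the forest")
    parser.add_argument("--max-depth", type=int, default=None,
                        help="maximum tree depth (default: unlimited)")
    parser.add_argument("--random-state", type=int, default=42,
                        help="seed for reproducibility")
    return parser


# An explicit argument list is passed here so the snippet also runs in a
# notebook; in a script, call parse_args() with no arguments instead.
args = build_parser().parse_args(["--n-estimators", "50", "--max-depth", "4"])
params = vars(args)
print(params)
# {'n_estimators': 50, 'max_depth': 4, 'random_state': 42}
```

The resulting `params` dictionary can be passed to `mlflow.log_params` and unpacked into the model constructor, as in the first registration cell above.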

In summary, MLflow can be used to register models on the MLflow server, which makes it easier to share models across a company.

## Conclusion
At this point, you should have a good understanding of
- the MLflow model registry and its important associated concepts.
- the difference between experiments and models.
- how to retrieve a model.