# MLflow for Model Registry

> A __model registry__ is a central place to store (register) models.

MLflow provides a popular model registry.

> The MLflow model registry manages the lifecycle of an `MLModel`: its lineage (which experiment and run produced it), version, stage, and annotations.

The registry needs a database-backed store, so the MLflow server is started with one, as shown below.

```
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 127.0.0.1
```

> COMMON ISSUE: To connect to an MLflow server, you need to have an environment variable called `MLFLOW_TRACKING_URI` set.

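If it is not set, MLflow falls back to the local `./mlruns` directory, which cannot back the model registry, and you will see an error like the following: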
```
INVALID_PARAMETER_VALUE:  Model registry functionality is unavailable; got unsupported URI './mlruns' for model registry data storage. Supported URI schemes are: ['postgresql', 'mysql', 'sqlite', 'mssql']
```

To fix this, set `MLFLOW_TRACKING_URI` as shown below:

```
export MLFLOW_TRACKING_URI=http://localhost:5000
```

Remember that this environment variable is only set in the terminal where you run the command. If you close that terminal, you will need to run the command again in the next one. To avoid this, add the line to your `.bashrc` file (or the corresponding file for your shell), which runs every time you open a new terminal.

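> COMMON MISTAKE: The Python API method `mlflow.set_tracking_uri(...)` only affects the Python process that calls it. Within that process it takes precedence over the `MLFLOW_TRACKING_URI` environment variable, but other processes and terminals (for example the `mlflow` command line tool) still need the variable to be set.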

See the [MLflow tracking server docs](https://www.mlflow.org/docs/latest/tracking.html#mlflow-tracking-servers) for details.
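If you would rather not depend on the shell configuration, the variable can also be set from Python before the first MLflow call. The sketch below is only an illustration: the helper name `ensure_tracking_uri` is hypothetical, and the default URL assumes the server from the command above is listening on port 5000, as in the `export` example.

```python
import os

# Assumption: the tracking server from the command above runs on port 5000.
DEFAULT_URI = "http://localhost:5000"


def ensure_tracking_uri(default: str = DEFAULT_URI) -> str:
    """Set MLFLOW_TRACKING_URI for this process unless it is already set."""
    # setdefault keeps an existing value and only fills in a missing one.
    return os.environ.setdefault("MLFLOW_TRACKING_URI", default)


uri = ensure_tracking_uri()
print(f"Tracking URI for this process: {uri}")
```

Like `export`, this only affects the current process (here, the notebook kernel or script), so a value already exported in the shell is left untouched.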

A few new concepts are introduced:
- __Registered model__ - an `MLModel` that was explicitly registered (with the metadata mentioned above)
- __Model version__ - the first model registered under a name gets version `1`, which is incremented with each new registration under that name
- __Model stage__ - where a version sits in its lifecycle (similar to `git` or data warehouses):
    - `Staging`
    - `Production`
    - `Archived`
- __Model annotations__ - markdown documenting the model (algorithm description, dataset used etc.)

All of the above has to be done __after saving the model via `log_model`__ and can be done via the __API or UI__.

We will use the API; to see how to do it via the UI, look [here](https://www.mlflow.org/docs/latest/model-registry.html#ui-workflow).

There are three programmatic ways to register the model:
- Via `log_model`'s `registered_model_name` argument; __when we want to register a model from each experiment__
- [`mlflow.register_model`](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.register_model); __explicit registration of a chosen model, with the run `id` specified__
- [`MlflowClient`'s `create_registered_model`](https://www.mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.MlflowClient.create_registered_model); __creates an empty registered model by name; model versions are then added to it with `create_model_version`__

In [1]:
# Version one

from random import random, randint
from sklearn.ensemble import RandomForestRegressor

import mlflow
import mlflow.sklearn

with mlflow.start_run(run_name="YOUR_RUN_NAME") as run:
    params = {"n_estimators": 5, "random_state": 42}
    sk_learn_rfr = RandomForestRegressor(**params)

    # Log parameters and metrics using the MLflow APIs
    mlflow.log_params(params)
    mlflow.log_param("param_1", randint(0, 100))
    mlflow.log_metrics({"metric_1": random(), "metric_2": random() + 1})

    # Log the sklearn model and register as version 1
    mlflow.sklearn.log_model(
        sk_model=sk_learn_rfr,
        artifact_path="sklearn-model",
        registered_model_name="sk-learn-random-forest-reg-model"
    )

Successfully registered model 'sk-learn-random-forest-reg-model'.
2021/07/04 19:40:05 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sk-learn-random-forest-reg-model, version 1
Created version '1' of model 'sk-learn-random-forest-reg-model'.


In [None]:
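# Version two

# The run ID below is a placeholder; replace it with the ID of one of your own runs.
# Note that this name differs from the one used in version one,
# so it creates a separate registered model.
result = mlflow.register_model(
    "runs:/d16076a3ec534311817565e6527539c0/sklearn-model",
    "sk-learn-random-forest-reg"
)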

In [None]:
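# Version three

from mlflow.tracking import MlflowClient

client = MlflowClient()
# Creates an empty registered model (no versions yet).
# This raises an error if a model with this name already exists,
# e.g. if you have already run "Version one" above.
client.create_registered_model("sk-learn-random-forest-reg-model")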
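Creating a registered model through the client does not by itself attach any model version to it. A minimal sketch of the follow-up step is shown below, assuming the registered model already exists and the model was logged under the artifact path `sklearn-model` as in version one. The helper names `model_source` and `add_version` and the example path are hypothetical; `get_run` and `create_model_version` are `MlflowClient` methods.

```python
def model_source(run_artifact_uri: str, artifact_path: str) -> str:
    """Join a run's artifact root and the path used in `log_model`."""
    return f"{run_artifact_uri.rstrip('/')}/{artifact_path.lstrip('/')}"


def add_version(name: str, run_id: str, artifact_path: str = "sklearn-model"):
    """Attach the model logged in `run_id` to the registered model `name`."""
    # Imported here so the helper above can be used without MLflow installed.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    run = client.get_run(run_id)  # look up where the run stored its artifacts
    return client.create_model_version(
        name=name,
        source=model_source(run.info.artifact_uri, artifact_path),
        run_id=run_id,
    )


# Hypothetical artifact root, only to show the shape of the resulting source URI.
print(model_source("./artifacts/0/abc123/artifacts", "sklearn-model"))
# ./artifacts/0/abc123/artifacts/sklearn-model
```

With a running server, `add_version("sk-learn-random-forest-reg-model", "<your run id>")` would then create the next version of that registered model.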

## Experiments vs Models

> Run experiments to evaluate many models, and then log only the best ones to the model registry

It's fine to try out lots of experiments when you are developing, but you don't want to clutter up the central model store.
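One way to apply this is to compare the runs first and register only the winner. The sketch below uses plain dictionaries with made-up run IDs and metric values as stand-ins for your own experiment results; the helper `best_run` is hypothetical, and the final `mlflow.register_model` call is left commented out because it needs a real run ID and a running server.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run record with the best value for `metric`."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r["metrics"][metric])


# Made-up results standing in for your own experiment runs.
runs = [
    {"run_id": "run-a", "metrics": {"r2": 0.71}},
    {"run_id": "run-b", "metrics": {"r2": 0.84}},
    {"run_id": "run-c", "metrics": {"r2": 0.79}},
]

winner = best_run(runs, "r2")
print(winner["run_id"])  # run-b

# Only the winner goes to the registry:
# mlflow.register_model(
#     f"runs:/{winner['run_id']}/sklearn-model",
#     "sk-learn-random-forest-reg-model",
# )
```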

## Retrieving the model

As before, we can use `mlflow.<framework>.load_model` (or the generic `mlflow.pyfunc.load_model`, used below) to load a registered model by:
- specifying the model version
- specifying the stage

In [None]:
# loading by version

import mlflow.pyfunc

model_name = "sk-learn-random-forest-reg-model"
model_version = 1

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{model_version}"
)

# `data` is a placeholder for your own input (e.g. a pandas DataFrame or NumPy array)
model.predict(data)


In [None]:
# loading by stage

import mlflow.pyfunc

model_name = "sk-learn-random-forest-reg-model"
stage = 'Staging'

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{stage}"
)

# `data` is a placeholder for your own input (e.g. a pandas DataFrame or NumPy array)
model.predict(data)



### Other things

We can also:
- update a registered model
- change its description
- transition a version from one stage to another (allowed values are `Staging`, `Production`, `Archived` or `None` (as a Python variable))
- [search registered models](https://www.mlflow.org/docs/latest/model-registry.html#listing-and-searching-mlflow-models)
- [delete registered models](https://www.mlflow.org/docs/latest/model-registry.html#deleting-mlflow-models) (although it's better to use `Archived` to keep track of experiments)
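A rough sketch of two of these operations through `MlflowClient` is given below. The helpers `canonical_stage` and `promote` are hypothetical names introduced for illustration, while `update_model_version` and `transition_model_version_stage` are client methods. Newer MLflow releases deprecate stages in favour of aliases, so check the documentation for the version you have installed.

```python
ALLOWED_STAGES = ("None", "Staging", "Production", "Archived")


def canonical_stage(stage):
    """Map input such as 'production' or None to one of the allowed spellings."""
    text = "None" if stage is None else str(stage).strip()
    for allowed in ALLOWED_STAGES:
        if text.lower() == allowed.lower():
            return allowed
    raise ValueError(f"unknown stage {stage!r}; expected one of {ALLOWED_STAGES}")


def promote(name, version, stage, description=None):
    """Optionally update a version's description, then move it to `stage`."""
    # Imported here because these calls need MLflow and a reachable registry.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    if description is not None:
        client.update_model_version(
            name=name, version=version, description=description
        )
    return client.transition_model_version_stage(
        name=name, version=version, stage=canonical_stage(stage)
    )


print(canonical_stage("production"))  # Production
```

For example, `promote("sk-learn-random-forest-reg-model", 1, "Staging")` would move version 1 of the model registered earlier to `Staging`, provided the server is running.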

## Exercise

Now that we know a little bit about `MLflow`, we will __create an `MLflow` project__ (you can look at [this official tutorial](https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html) to help you get started):
- Create a new private GitHub repository (e.g. `example-experiment`)
- Choose an example dataset from [`scikit-learn`](https://scikit-learn.org/stable/)
- Create a simple command line parser with [`argparse`](https://docs.python.org/3/library/argparse.html) and a few experiment parameters
- Run experiments to find the best algorithm in `sklearn` for your task (choose up to `5`)
- Commit everything and try to replicate the results directly from GitHub
- Save the model in MLflow's `model` format
- Deploy the model locally

Below is a small snippet using `argparse`; refer to the documentation for more info. To try it, save it locally in a file called `foo.py` and run `python<3> foo.py --help`.

In [None]:
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')

args = parser.parse_args()
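For the exercise itself, the parser would expose experiment parameters rather than integers to sum. The following is one possible starting point, not a reference solution: the flag names and defaults are assumptions chosen to match the random forest example earlier in this notebook. Passing an explicit list to `parse_args` lets it run inside a notebook as well as from the command line.

```python
import argparse


def build_parser():
    """Build a parser for a few (assumed) experiment parameters."""
    parser = argparse.ArgumentParser(description="Train a model and log it to MLflow.")
    parser.add_argument("--n-estimators", type=int, default=100,
                        help="number of trees in the forest")
    parser.add_argument("--max-depth", type=int, default=None,
                        help="maximum tree depth (default: unlimited)")
    parser.add_argument("--random-state", type=int, default=42,
                        help="seed for reproducibility")
    parser.add_argument("--register", action="store_true",
                        help="also register the logged model")
    return parser


# An explicit argument list stands in for the command line here.
args = build_parser().parse_args(["--n-estimators", "5", "--register"])

# Everything except the --register switch is a model parameter,
# ready to be passed to mlflow.log_params(params).
params = {k: v for k, v in vars(args).items() if k != "register"}
print(params)
```

In a script, calling `build_parser().parse_args()` with no arguments reads the real command line instead.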

To summarise this notebook: MLflow can be used to register models on an MLflow server, which makes them easy to share across the company.