## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifacts store: S3 bucket.

The experiments can be explored by opening the remote server's address in a browser (port 5000 in this example).

The example uses AWS to host the remote server. To run it you'll need an AWS account. Follow the steps described in the file `mlflow_on_aws.md` to create a new AWS account and launch the tracking server.
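As a rough illustration of how the three pieces of the setup fit together, the sketch below assembles the `mlflow server` command line for a PostgreSQL backend store and an S3 artifact root. The helper name `build_mlflow_server_command` and all `DB_*` / `S3_BUCKET_NAME` values are placeholders introduced here, not taken from `mlflow_on_aws.md`; that file remains the reference for the actual launch steps.

```python
import shlex


def build_mlflow_server_command(
    backend_store_uri: str,
    artifact_root: str,
    host: str = "0.0.0.0",
    port: int = 5000,
) -> list:
    """Return the argv list for launching an MLflow tracking server."""
    return [
        "mlflow", "server",
        "-h", host,                                  # listen on all interfaces
        "-p", str(port),                             # port used by the tracking URI
        "--backend-store-uri", backend_store_uri,    # where runs and params are stored
        "--default-artifact-root", artifact_root,    # where artifacts are stored
    ]


# Placeholder values (assumptions): substitute your own database and bucket.
command = build_mlflow_server_command(
    backend_store_uri="postgresql://DB_USER:DB_PASSWORD@DB_ENDPOINT:5432/DB_NAME",
    artifact_root="s3://S3_BUCKET_NAME",
)
print(shlex.join(command))
```

The printed command would be run on the EC2 instance itself; the notebook cells that follow only act as a client of that server.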

In [66]:
import mlflow
import os
import tempfile
from pathlib import Path

os.environ["AWS_PROFILE"] = "mlflow" # fill in with your AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials

TRACKING_SERVER_HOST = "18.133.27.208" # fill in with the public DNS name or IP address of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

In [24]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://18.133.27.208:5000'


In [25]:
# Note: list_experiments() was removed in MLflow 2.0; use mlflow.search_experiments() there.
mlflow.list_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-3/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote-3/1', experiment_id='1', lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='./artifacts/2', experiment_id='2', lifecycle_stage='active', name='my-experiment-2', tags={}>,
 <Experiment: artifact_location='./artifacts/3', experiment_id='3', lifecycle_stage='active', name='my-experiment-3', tags={}>,
 <Experiment: artifact_location='mlflow-artifacts:/4', experiment_id='4', lifecycle_stage='active', name='my-experiment-4', tags={}>,
 <Experiment: artifact_location='./artifacts/5', experiment_id='5', lifecycle_stage='active', name='my-experiment-5', tags={}>,
 <Experiment: artifact_location='mlflow-artifacts:/6', experiment_id='6', lifecycle_stage='active', name='my-experiment-6', tags={}>,
 <Experiment: artifact_location='./artifacts/7', experiment_id

In [26]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

n = 10
mlflow.set_experiment(f"my-experiment-{n}")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")

2022/05/30 10:57:57 INFO mlflow.tracking.fluent: Experiment with name 'my-experiment-10' does not exist. Creating a new experiment.


default artifacts URI: 's3://mlflow-artifacts-remote-3/10/9a6eba387dfa49b1ad9b89542fba8cf8/artifacts'


In [37]:
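# Look the experiment up by name rather than relying on the order of list_experiments().
experiment = mlflow.get_experiment_by_name(f"my-experiment-{n}")
# The experiment was just created and holds a single run, so [-1] selects that run.
last_run = mlflow.list_run_infos(experiment_id=experiment.experiment_id)[-1]
print(last_run.artifact_uri)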

s3://mlflow-artifacts-remote-3/10/9a6eba387dfa49b1ad9b89542fba8cf8/artifacts
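The artifact URI printed above appears to follow the layout `s3://<bucket>/<experiment_id>/<run_id>/artifacts`. The sketch below splits such a URI into its parts using only the standard library. The helper `split_artifact_uri` is hypothetical, and the layout is inferred from this notebook's output; it would not hold for experiments with a custom artifact location or a bucket prefix.

```python
from urllib.parse import urlparse


def split_artifact_uri(artifact_uri: str) -> dict:
    """Split s3://<bucket>/<experiment_id>/<run_id>/artifacts into its parts."""
    parsed = urlparse(artifact_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got: {artifact_uri!r}")
    # Path looks like "/<experiment_id>/<run_id>/artifacts" (assumed layout).
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3 or parts[2] != "artifacts":
        raise ValueError(f"unexpected artifact path layout: {parsed.path!r}")
    experiment_id, run_id, _ = parts
    return {
        "bucket": parsed.netloc,
        "experiment_id": experiment_id,
        "run_id": run_id,
    }


uri_parts = split_artifact_uri(
    "s3://mlflow-artifacts-remote-3/10/9a6eba387dfa49b1ad9b89542fba8cf8/artifacts"
)
print(uri_parts)
```

Checking the parsed experiment and run IDs against the values reported by the tracking server is a quick way to confirm that a run's artifacts went to the S3 bucket rather than to a local `./artifacts` folder.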


### Interacting with the model registry

In [38]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [39]:
client.list_registered_models()

[<RegisteredModel: creation_timestamp=1653845643389, description='', last_updated_timestamp=1653845643494, latest_versions=[<ModelVersion: creation_timestamp=1653845643494, current_stage='None', description='', last_updated_timestamp=1653845643494, name='iris-classifier', run_id='584391ae4c7942959be21d8529f3ea7a', run_link='', source='s3://mlflow-artifacts-remote-3/1/584391ae4c7942959be21d8529f3ea7a/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>], name='iris-classifier', tags={}>]

In [58]:
# Register the model logged by the experiment's run under a fixed registry name.
run_id = client.list_run_infos(experiment_id=experiment.experiment_id)[-1].run_id
name = "iris-classifier"
mv = mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name=name,
)

Registered model 'iris-classifier' already exists. Creating a new version of this model...
2022/05/30 12:38:07 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: iris-classifier, version 4
Created version '4' of model 'iris-classifier'.
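The registry cells rely on two MLflow URI schemes: `runs:/<run_id>/<artifact_path>` points at artifacts logged by a run, and `models:/<name>/<version-or-stage>` points at a registered model version. The helpers below, `runs_uri` and `models_uri`, are hypothetical names introduced for illustration; they only format strings and do not contact the tracking server.

```python
from typing import Optional


def runs_uri(run_id: str, artifact_path: str) -> str:
    """URI of an artifact logged under a run, e.g. the 'models' folder."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def models_uri(
    name: str,
    version: Optional[int] = None,
    stage: Optional[str] = None,
) -> str:
    """URI of a registered model, addressed by version number or by stage."""
    if (version is None) == (stage is None):
        raise ValueError("pass exactly one of version or stage")
    suffix = str(version) if version is not None else stage
    return f"models:/{name}/{suffix}"


run_model_uri = runs_uri("9a6eba387dfa49b1ad9b89542fba8cf8", "models")
by_version = models_uri("iris-classifier", version=4)
by_stage = models_uri("iris-classifier", stage="None")
print(run_model_uri, by_version, by_stage, sep="\n")
```

Addressing a model by version pins an exact artifact, while addressing it by stage follows whichever version currently sits in that stage.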


### Download artifacts

In [96]:
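with tempfile.TemporaryDirectory() as tmpdirname:
    # "None" is the stage of model versions that have not been moved to Staging or Production.
    mlflow.artifacts.download_artifacts(f"models:/{name}/None", dst_path=tmpdirname)
    path = Path(tmpdirname)
    for item in path.iterdir():
        if not item.is_file():
            continue
        try:
            with item.open() as f:
                content = f.read()
        except UnicodeDecodeError:
            # Skip binary files such as the pickled model.
            continue
        print("\nFile:", item.name)
        print(content)
    # Load the model while the temporary directory still exists.
    sk_model = mlflow.sklearn.load_model(tmpdirname)
    model = mlflow.pyfunc.load_model(tmpdirname)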


File: python_env.yaml
python: 3.9.7
build_dependencies:
- pip==21.2.4
- setuptools==61.2.0
- wheel==0.37.1
dependencies:
- -r requirements.txt


File: requirements.txt
mlflow
cloudpickle==2.1.0
scikit-learn==1.0.2
typing-extensions==4.1.1

File: MLmodel
artifact_path: models
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.9.7
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.0.2
mlflow_version: 1.26.0
model_uuid: 980ae8b83a4e4cb7b33c0c3c4a2d8367
run_id: 9a6eba387dfa49b1ad9b89542fba8cf8
utc_time_created: '2022-05-30 08:57:58.186435'


File: conda.yaml
channels:
- conda-forge
dependencies:
- python=3.9.7
- pip<=21.2.4
- pip:
  - mlflow
  - cloudpickle==2.1.0
  - scikit-learn==1.0.2
  - typing-extensions==4.1.1
name: mlflow-env
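The downloaded `requirements.txt` pins the package versions the model was trained with. A minimal sketch of how those pins could be compared against the current environment before loading the model is shown below. The helpers `parse_requirements` and `find_mismatches` are hypothetical, and the parser only understands plain names and `==` pins, which is all that the file above contains.

```python
from importlib import metadata


def parse_requirements(text: str) -> dict:
    """Map package name -> pinned version (None when the package is unpinned)."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


def find_mismatches(pins: dict) -> dict:
    """Return {package: (pinned, installed)} for missing or differing packages."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or (pinned is not None and installed != pinned):
            mismatches[name] = (pinned, installed)
    return mismatches


requirements_text = """\
mlflow
cloudpickle==2.1.0
scikit-learn==1.0.2
typing-extensions==4.1.1
"""
pins = parse_requirements(requirements_text)
print(pins)
print(find_mismatches(pins))
```

An empty result from `find_mismatches` suggests the environment matches the pins; any entry lists the pinned version next to the installed one (or `None` if the package is missing).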

