## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifacts store: S3 bucket.

The experiments can be explored by accessing the remote server.

The example uses AWS to host a remote server. To run it, you'll need an AWS account. Follow the steps in the file `mlflow_on_aws.md` to create a new AWS account and launch the tracking server.

In [1]:
import mlflow
import os

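# Option 1: use a named AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials
#os.environ["AWS_PROFILE"] = "" # fill in with your AWS profile
# Option 2: set the access keys explicitly. Never commit real keys to a notebook or repository.
os.environ["AWS_ACCESS_KEY_ID"] = "" # fill in with your access key ID
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # fill in with your secret access key
# Only needed for an S3-compatible store that is not AWS S3 (here: Yandex Object Storage)
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://storage.yandexcloud.net"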
TRACKING_SERVER_HOST = "51.250.111.121" # fill in with the public DNS of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5001")
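
Before pointing MLflow at the remote server, it can help to check that the credentials are actually available in the environment. The following is a minimal sketch, not part of the original notebook: the helper names `missing_credentials` and `tracking_uri` are hypothetical, and the check assumes key-based authentication (it does not apply if you rely on `AWS_PROFILE` instead).

```python
import os

# Variables the S3 client needs when authenticating with explicit keys
# (assumption: key-based auth rather than a named AWS profile).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def tracking_uri(host, port=5001, scheme="http"):
    """Build the tracking URI in the same shape the notebook uses."""
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print(f"missing credentials: {missing}")
    # "tracking.example.com" is a placeholder host, not a real server.
    print(tracking_uri("tracking.example.com"))
```

The result of `tracking_uri(...)` can then be passed to `mlflow.set_tracking_uri`, keeping the keys themselves out of the notebook.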

In [2]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://51.250.111.121:5001'


In [3]:
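# Note: `list_experiments` was removed in MLflow 2.0; on newer versions use `mlflow.search_experiments()`.
mlflow.list_experiments()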

[<Experiment: artifact_location='s3://mlflow-artifacts-remote/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/1', experiment_id='1', lifecycle_stage='active', name='my-experiment-1', tags={}>]

In [14]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")

default artifacts URI: 's3://mlflow-artifacts-remote/1/2efa6c40390e418d80989730b4922bed/artifacts'
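
The artifact URI printed above appears to follow the pattern `s3://<bucket>/<experiment_id>/<run_id>/artifacts`. The sketch below splits such a URI into its parts using only the standard library. The function name `split_artifact_uri` is hypothetical, and the layout is an assumption based on the output shown here; a server configured with a different artifact root may produce a different structure.

```python
from urllib.parse import urlparse


def split_artifact_uri(uri):
    """Split an artifact URI of the form
    <scheme>://<bucket>/<experiment_id>/<run_id>/<artifact root...>
    into its components. Assumes the layout seen in this notebook."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"unexpected artifact URI layout: {uri!r}")
    return {
        "scheme": parsed.scheme,
        "bucket": parsed.netloc,
        "experiment_id": parts[0],
        "run_id": parts[1],
        "artifact_root": "/".join(parts[2:]),
    }


if __name__ == "__main__":
    uri = "s3://mlflow-artifacts-remote/1/2efa6c40390e418d80989730b4922bed/artifacts"
    print(split_artifact_uri(uri))
```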


In [6]:
mlflow.list_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/1', experiment_id='1', lifecycle_stage='active', name='my-experiment-1', tags={}>]

### Interacting with the model registry

In [7]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5001")

In [8]:
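# Note: `list_registered_models` was removed in MLflow 2.0; on newer versions use `client.search_registered_models()`.
client.list_registered_models()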

[<RegisteredModel: creation_timestamp=1658839296332, description='', last_updated_timestamp=1658839296332, latest_versions=[], name='test', tags={}>]
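
The `creation_timestamp` and `last_updated_timestamp` fields in the output above are Unix timestamps in milliseconds. A small sketch for turning them into readable UTC datetimes follows; the helper name `ms_to_utc` is hypothetical and not part of MLflow.

```python
from datetime import datetime, timedelta, timezone


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime.
    Integer arithmetic avoids floating-point rounding of the milliseconds."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


if __name__ == "__main__":
    # creation_timestamp of the registered model 'test' shown above
    print(ms_to_utc(1658839296332).isoformat())
```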

In [13]:
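# Take the run ID of the first run returned for experiment '1'.
# Note: `list_run_infos` was removed in MLflow 2.0; on newer versions use
# `client.search_runs(experiment_ids=['1'])[0].info.run_id`.
run_id = client.list_run_infos(experiment_id='1')[0].run_id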
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='test'
)

Registered model 'test' already exists. Creating a new version of this model...
2022/07/26 21:34:20 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: test, version 1
Created version '1' of model 'test'.


<ModelVersion: creation_timestamp=1658871260220, current_stage='None', description='', last_updated_timestamp=1658871260220, name='test', run_id='b985abafd05544cbbac77a547e367035', run_link='', source='s3://mlflow-artifacts-remote/1/b985abafd05544cbbac77a547e367035/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>
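
The registration call above refers to the logged model with a `runs:/<run_id>/<artifact_path>` URI. Once a version exists in the registry, it can also be addressed as `models:/<name>/<version>`. The sketch below builds both forms; the helper names `run_model_uri` and `registry_model_uri` are hypothetical and only format strings.

```python
def run_model_uri(run_id, artifact_path="models"):
    """URI of a model logged under a run, as passed to mlflow.register_model."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


if __name__ == "__main__":
    # Values taken from the outputs shown in this notebook.
    print(run_model_uri("b985abafd05544cbbac77a547e367035"))
    print(registry_model_uri("test", 1))
```

Either URI could then be passed to `mlflow.pyfunc.load_model`, assuming the tracking URI and S3 credentials are configured as in the first cell.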

In [15]:
run_id

'b985abafd05544cbbac77a547e367035'

In [9]:
run_infos = client.list_run_infos(experiment_id='1')  # full list of run infos; kept separate from `run_id`

In [12]:
client.list_run_infos(experiment_id='1')[0].run_id

'b985abafd05544cbbac77a547e367035'