## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifacts store: S3 bucket.

The experiments can be explored by accessing the remote server.

The example uses AWS to host a remote server. To run it you'll need an AWS account. Follow the steps described in the file `mlflow_on_aws.md` to create a new AWS account and launch the tracking server.

The tracking server is started on the EC2 instance with a command like the following:

mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri postgresql://mlflow:DtTzSH76KQ5E9FKnrawX@mlflow-backend-db.cxcgoicaojyy.us-east-2.rds.amazonaws.com:5432/mlflow_db --default-artifact-root s3://mlflow-artifacts-remote-rv9vs
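
The `mlflow server` command embeds the database password directly in the backend store URI. As a minimal sketch, the same command line could be assembled from separate pieces so that the password is percent-encoded and can come from somewhere other than the notebook. The helper names, the placeholder host, bucket and password, and the `MLFLOW_DB_PASSWORD` variable mentioned in the comment are all hypothetical and not part of MLflow:

```python
import shlex
from urllib.parse import quote


def backend_store_uri(user, password, host, db_name, port=5432):
    """Build a PostgreSQL URI, percent-encoding the user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db_name}"
    )


def mlflow_server_command(store_uri, artifact_root, host="0.0.0.0", port=5000):
    """Return the `mlflow server` invocation as one shell-quoted string."""
    args = [
        "mlflow", "server",
        "-h", host,
        "-p", str(port),
        "--backend-store-uri", store_uri,
        "--default-artifact-root", artifact_root,
    ]
    return shlex.join(args)


# Placeholder values for illustration only. In practice the password could be
# read from the environment, e.g. os.environ["MLFLOW_DB_PASSWORD"]
# (a hypothetical variable name).
uri = backend_store_uri("mlflow", "p@ss/word", "db.example.com", "mlflow_db")
command = mlflow_server_command(uri, "s3://my-artifact-bucket")
print(command)
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as part of the URI structure.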

In [24]:
import mlflow
import os

os.environ["AWS_PROFILE"] = "default" # fill in with your AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials

TRACKING_SERVER_HOST = "ec2-18-220-176-72.us-east-2.compute.amazonaws.com" # fill in with the public DNS of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

Personal note: The instructions for connecting to the AWS server were written for a different Linux distribution from the one I am using. I am using Ubuntu, where the `yum` command is not available, so I must use `apt` instead.

If you are on Ubuntu, do not use the `yum` command. `apt` is the package manager on Debian-based systems such as Ubuntu, just as `yum` is on RHEL-based systems.
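
If you are unsure which package manager a machine provides, a small check like the one below can tell you before you follow distribution-specific instructions. This is only a sketch: `detect_package_manager` is a hypothetical helper, and it looks only for `apt` and `yum` on the `PATH`.

```python
import shutil


def detect_package_manager(candidates=("apt", "yum"), which=shutil.which):
    """Return the first package manager found on PATH, or None."""
    for name in candidates:
        if which(name):
            return name
    return None


manager = detect_package_manager()
print(f"package manager: {manager}")
```

The `which` parameter exists only so the lookup can be replaced in tests; the default uses the real `PATH`.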

In [18]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://ec2-18-220-176-72.us-east-2.compute.amazonaws.com:5000'
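
Setting the tracking URI does not contact the server, so a wrong host name only shows up on the first real request. The sketch below builds the URI from host and port and offers an optional reachability check. It assumes the tracking server answers `GET /health` with HTTP 200, which is worth confirming for your MLflow version; `tracking_uri` and `server_is_up` are hypothetical helper names, and `example.com` is a placeholder host.

```python
import urllib.request


def tracking_uri(host, port=5000, scheme="http"):
    """Build the tracking URI from the server's public DNS name and port."""
    return f"{scheme}://{host}:{port}"


def server_is_up(uri, timeout=5.0):
    """Return True if GET <uri>/health answers with HTTP 200 (assumed endpoint)."""
    try:
        with urllib.request.urlopen(f"{uri}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


uri = tracking_uri("example.com")  # placeholder host
print(uri)
# server_is_up(uri) performs a network request, so it is not called here.
```

If the check fails, the usual suspects are the EC2 security group not allowing inbound traffic on port 5000 or the public DNS name having changed after an instance restart.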


In [15]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-rv9vs/1', creation_time=1717714566110, experiment_id='1', last_update_time=1717714566110, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote-rv9vs/0', creation_time=1717714180664, experiment_id='0', last_update_time=1717714180664, lifecycle_stage='active', name='Default', tags={}>]

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-rv9vs/0', creation_time=1717714180664, experiment_id='0', last_update_time=1717714180664, lifecycle_stage='active', name='Default', tags={}>]
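
The `creation_time` and `last_update_time` fields in this output are hard to read as raw integers. Assuming they are milliseconds since the Unix epoch, a small conversion makes them comparable with the timestamps in the log output; `ms_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# creation_time of 'my-experiment-1' as it appears in this notebook's output
created = ms_to_utc(1717714566110)
print(created.isoformat())
```

Adding a `timedelta` to the epoch avoids the floating-point rounding that can occur when dividing by 1000 and calling `datetime.fromtimestamp`.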

In [25]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")



default artifacts URI: 's3://mlflow-artifacts-remote-rv9vs/1/64fa379a58a24fb3890d226368613364/artifacts'
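
The artifact URI printed here follows the pattern `s3://<bucket>/<experiment_id>/<run_id>/artifacts`. As a sketch, and assuming the artifact root is the bucket itself with no extra prefix (as in this setup), the pieces can be recovered like this; `split_artifact_uri` is a hypothetical helper, not an MLflow function.

```python
from urllib.parse import urlparse


def split_artifact_uri(uri):
    """Split s3://<bucket>/<experiment_id>/<run_id>/artifacts into its parts."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "s3" or len(parts) < 3 or parts[2] != "artifacts":
        raise ValueError(f"unexpected artifact URI layout: {uri}")
    return {
        "bucket": parsed.netloc,
        "experiment_id": parts[0],
        "run_id": parts[1],
    }


info = split_artifact_uri(
    "s3://mlflow-artifacts-remote-rv9vs/1/64fa379a58a24fb3890d226368613364/artifacts"
)
print(info)
```

This can be handy for locating a run's files directly in the S3 console.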


In [26]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote-rv9vs/1', creation_time=1717714566110, experiment_id='1', last_update_time=1717714566110, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote-rv9vs/0', creation_time=1717714180664, experiment_id='0', last_update_time=1717714180664, lifecycle_stage='active', name='Default', tags={}>]

### Interacting with the model registry

In [27]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [28]:
client.search_registered_models()

[]

In [29]:
# client.list_run_infos() is not available in recent MLflow versions; use search_runs() instead.
# search_runs() expects a list of experiment IDs.
run_id = client.search_runs(experiment_ids=["1"])[0].info.run_id
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='iris-classifier'
)

Successfully registered model 'iris-classifier'.
2024/06/06 23:09:35 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris-classifier, version 1
Created version '1' of model 'iris-classifier'.


<ModelVersion: aliases=[], creation_timestamp=1717715375553, current_stage='None', description='', last_updated_timestamp=1717715375553, name='iris-classifier', run_id='64fa379a58a24fb3890d226368613364', run_link='', source='s3://mlflow-artifacts-remote-rv9vs/1/64fa379a58a24fb3890d226368613364/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>
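
Two URI schemes appear in this workflow: `runs:/<run_id>/<artifact_path>` points at a model logged in a run, and `models:/<name>/<version>` points at a registered model version. The sketch below only builds the strings; the helper names are hypothetical, and the values are taken from the output in this notebook.

```python
def run_model_uri(run_id, artifact_path):
    """URI of a model logged under a run, as passed to mlflow.register_model."""
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name, version):
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


source_uri = run_model_uri("64fa379a58a24fb3890d226368613364", "models")
registry_uri = registered_model_uri("iris-classifier", 1)
print(source_uri)
print(registry_uri)
```

With the tracking URI set as above, a URI of the second form can be passed to `mlflow.pyfunc.load_model` to load the registered version.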

In [30]:
client.search_registered_models()

[<RegisteredModel: aliases={}, creation_timestamp=1717715375518, description='', last_updated_timestamp=1717715375553, latest_versions=[<ModelVersion: aliases=[], creation_timestamp=1717715375553, current_stage='None', description='', last_updated_timestamp=1717715375553, name='iris-classifier', run_id='64fa379a58a24fb3890d226368613364', run_link='', source='s3://mlflow-artifacts-remote-rv9vs/1/64fa379a58a24fb3890d226368613364/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>], name='iris-classifier', tags={}>]