## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifacts store: S3 bucket.

The experiments can be explored by accessing the remote server.

The example uses AWS to host the remote server, so you'll need an AWS account to run it. Follow the steps in `mlflow_on_aws.md` to create an AWS account and launch the tracking server.

In [15]:
import mlflow
import os

os.environ["AWS_PROFILE"] = "default" # fill in with your AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials

TRACKING_SERVER_HOST = "52.214.249.103" # fill in with the public DNS name or public IP of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

In [16]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://52.214.249.103:5000'
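The host is hard-coded in the cell above, which is fine for a demo but awkward when several data scientists share the notebook. A minimal sketch of one alternative follows, assuming the host is supplied through an environment variable. The variable name `TRACKING_SERVER_HOST` and the helper `build_tracking_uri` are hypothetical names chosen for this example; they are not part of MLflow.

```python
import os


def build_tracking_uri(host: str, port: int = 5000, scheme: str = "http") -> str:
    """Assemble the tracking URI for a remote MLflow server."""
    host = host.strip()
    if not host:
        raise ValueError("host must not be empty")
    return f"{scheme}://{host}:{port}"


# Hypothetical environment variable; falls back to the address used in this notebook.
host = os.environ.get("TRACKING_SERVER_HOST", "52.214.249.103")
tracking_uri = build_tracking_uri(host)
print(f"tracking URI: '{tracking_uri}'")
```

The resulting string can be passed to `mlflow.set_tracking_uri(...)` exactly as in the cell above.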


In [17]:
mlflow.search_experiments() # list_experiments() has been removed from MLflow; use search_experiments() instead

[<Experiment: artifact_location='s3://mlflow_abiodun/2', creation_time=1723650656591, experiment_id='2', last_update_time=1723650656591, lifecycle_stage='active', name='green-taxi-duration', tags={}>,
 <Experiment: artifact_location='s3://mlflow_abiodun/1', creation_time=1723627040587, experiment_id='1', last_update_time=1723627040587, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow_abiodun/0', creation_time=1723623907473, experiment_id='0', last_update_time=1723623907473, lifecycle_stage='active', name='Default', tags={}>]
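The raw `Experiment` reprs are hard to scan once a team has created more than a handful of experiments. The sketch below condenses them into one row per experiment. `summarize_experiments` is a hypothetical helper written for this example, and the `SimpleNamespace` objects are stand-ins that only mirror the attributes visible in the output above, so the snippet runs without a tracking server.

```python
from types import SimpleNamespace


def summarize_experiments(experiments):
    """Return (experiment_id, name, artifact_location) tuples sorted by numeric ID."""
    rows = [(e.experiment_id, e.name, e.artifact_location) for e in experiments]
    return sorted(rows, key=lambda row: int(row[0]))


# Stand-ins for the objects returned by mlflow.search_experiments()
sample = [
    SimpleNamespace(experiment_id="2", name="green-taxi-duration",
                    artifact_location="s3://mlflow_abiodun/2"),
    SimpleNamespace(experiment_id="1", name="my-experiment-1",
                    artifact_location="s3://mlflow_abiodun/1"),
    SimpleNamespace(experiment_id="0", name="Default",
                    artifact_location="s3://mlflow_abiodun/0"),
]

for exp_id, name, location in summarize_experiments(sample):
    print(f"{exp_id:>3}  {name:<22} {location}")
```

In the notebook itself you would call `summarize_experiments(mlflow.search_experiments())` instead of passing `sample`.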

In [18]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

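# Create the experiment if it does not exist yet and make it the active one
mlflow.set_experiment("my-experiment-12")
# Close any run left open by a previous cell (a no-op if none is active)
mlflow.end_run()

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    # Accuracy is measured on the training data, so treat it as a sanity check only
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    # Store the fitted model under the run's "models" artifact directory
    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")
# The `with` block ends the run on exit, so no explicit mlflow.end_run() is needed here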

2024/08/14 20:09:10 INFO mlflow.tracking.fluent: Experiment with name 'my-experiment-12' does not exist. Creating a new experiment.
2024/08/14 20:09:12 INFO mlflow.tracking._tracking_service.client: 🏃 View run burly-shad-212 at: http://52.214.249.103:5000/#/experiments/3/runs/6cc004bfd51146f09e12977fb093ee84.
2024/08/14 20:09:12 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://52.214.249.103:5000/#/experiments/3.


default artifacts URI: 's3://mlflow-abiodun/3/6cc004bfd51146f09e12977fb093ee84/artifacts'
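The URI printed above appears to follow the pattern `s3://<bucket>/<experiment_id>/<run_id>/artifacts`. That layout is inferred from this output only and may differ if an experiment is created with a custom artifact location. Under that assumption, a small sketch for pulling the pieces apart looks like this; `parse_artifact_uri` is a hypothetical helper, not an MLflow function.

```python
from urllib.parse import urlparse


def parse_artifact_uri(uri: str) -> dict:
    """Split an s3://<bucket>/<experiment_id>/<run_id>/artifacts URI into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got: {uri}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 3 or parts[2] != "artifacts":
        raise ValueError(f"unexpected artifact path layout: {parsed.path}")
    return {
        "bucket": parsed.netloc,
        "experiment_id": parts[0],
        "run_id": parts[1],
    }


info = parse_artifact_uri(
    "s3://mlflow-abiodun/3/6cc004bfd51146f09e12977fb093ee84/artifacts"
)
print(info)
```

This can be handy for checking that a run's artifacts landed in the bucket and experiment you expected.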


In [19]:
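# There is no active run outside a `with mlflow.start_run()` block, so this call starts a new run
# implicitly; that is why the run ID in the URI below differs from the one printed by the previous cell.
print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")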

default artifacts URI: 's3://mlflow-abiodun/3/15b40c57d8184efeb04f61eab842dcc1/artifacts'


In [20]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-abiodun/3', creation_time=1723666150699, experiment_id='3', last_update_time=1723666150699, lifecycle_stage='active', name='my-experiment-12', tags={}>,
 <Experiment: artifact_location='s3://mlflow_abiodun/2', creation_time=1723650656591, experiment_id='2', last_update_time=1723650656591, lifecycle_stage='active', name='green-taxi-duration', tags={}>,
 <Experiment: artifact_location='s3://mlflow_abiodun/1', creation_time=1723627040587, experiment_id='1', last_update_time=1723627040587, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow_abiodun/0', creation_time=1723623907473, experiment_id='0', last_update_time=1723623907473, lifecycle_stage='active', name='Default', tags={}>]

### Interacting with the model registry

In [21]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [39]:
# search_runs returns the most recent run first by default; read its ID directly
run = client.search_runs(experiment_ids=["3"])[0]
run_id = run.info.run_id

In [48]:
# Print the metadata (artifact URI, status, timestamps, ...) of the most recent run
print(client.search_runs(experiment_ids=["3"])[0].info)

<RunInfo: artifact_uri='s3://mlflow-abiodun/3/15b40c57d8184efeb04f61eab842dcc1/artifacts', end_time=None, experiment_id='3', lifecycle_stage='active', run_id='15b40c57d8184efeb04f61eab842dcc1', run_name='polite-squid-499', run_uuid='15b40c57d8184efeb04f61eab842dcc1', start_time=1723666167322, status='RUNNING', user_id='codespace'>
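Note that the run shown above still has `status='RUNNING'`, so taking the first search result does not guarantee a completed training run. The sketch below shows one way to ask the server for the finished run with the highest logged accuracy instead. `best_finished_run_id` is a hypothetical helper; it assumes the metric was logged under the name `accuracy`, as in the training cell, and that `client` is an `MlflowClient` connected to the tracking server.

```python
def best_finished_run_id(client, experiment_id: str, metric: str = "accuracy"):
    """Return the ID of the finished run with the highest value of `metric`, or None."""
    runs = client.search_runs(
        experiment_ids=[experiment_id],
        # Skip runs that are still open or that failed
        filter_string="attributes.status = 'FINISHED'",
        # Highest metric value first
        order_by=[f"metrics.{metric} DESC"],
        max_results=1,
    )
    return runs[0].info.run_id if runs else None
```

With the client created earlier, `best_finished_run_id(client, "3")` would return the ID to use in a `runs:/<run_id>/models` URI.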


In [49]:
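# NOTE: this is the ID of the run started implicitly in In [19]. The model itself was logged
# under run 6cc004bfd51146f09e12977fb093ee84 in In [18]; use that ID to register the logged model.
run_id = '15b40c57d8184efeb04f61eab842dcc1'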
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='iris-classifier'
)

Successfully registered model 'iris-classifier'.
2024/08/14 20:35:23 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris-classifier, version 1
Created version '1' of model 'iris-classifier'.


<ModelVersion: aliases=[], creation_timestamp=1723667723179, current_stage='None', description='', last_updated_timestamp=1723667723179, name='iris-classifier', run_id='15b40c57d8184efeb04f61eab842dcc1', run_link='', source='s3://mlflow-abiodun/3/15b40c57d8184efeb04f61eab842dcc1/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>
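Once a version is registered, other team members can fetch it by name and version rather than by run ID. The sketch below assumes the registered version's `source` points at a run that actually contains the logged model artifacts. `registered_model_uri` and `load_registered_model` are hypothetical helpers written for this example; `mlflow.pyfunc.load_model` is the MLflow call doing the work.

```python
def registered_model_uri(name: str, version: int) -> str:
    """Build a registry URI of the form models:/<name>/<version>."""
    if version < 1:
        raise ValueError("model versions start at 1")
    return f"models:/{name}/{version}"


def load_registered_model(name: str, version: int, tracking_uri: str):
    """Fetch a registered model version from the remote registry as a pyfunc model."""
    # Imported lazily so the URI helper above works without MLflow installed
    import mlflow
    import mlflow.pyfunc

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.pyfunc.load_model(registered_model_uri(name, version))


print(registered_model_uri("iris-classifier", 1))
```

In this notebook you could call `load_registered_model("iris-classifier", 1, f"http://{TRACKING_SERVER_HOST}:5000")` and then run `predict` on a few rows of `X`.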
