## Train a model 

- Load and prepare the Iris dataset for modeling.

- Train a Logistic Regression model and evaluate its performance.

- Prepare the model hyperparameters and calculate metrics for logging.

In [8]:
import mlflow
from mlflow.models import infer_signature

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the model hyperparameters
params = {
    "solver": "lbfgs",
    "max_iter": 1000,
    "multi_class": "auto",
    "random_state": 8888,
}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
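
The cell above imports `precision_score`, `recall_score`, and `f1_score` but only computes accuracy. A minimal sketch of how the remaining metrics could be computed is shown below. It assumes macro averaging (`average="macro"`), which the original does not specify, and it omits `multi_class` because recent scikit-learn releases deprecate that argument.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Same data and split as in the cell above
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same hyperparameters as above, minus the deprecated multi_class argument
lr = LogisticRegression(solver="lbfgs", max_iter=1000, random_state=8888)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Iris has three classes, so precision, recall, and F1 need an averaging
# strategy. Macro averaging is an assumption made for this sketch.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro"),
    "recall": recall_score(y_test, y_pred, average="macro"),
    "f1": f1_score(y_test, y_pred, average="macro"),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Collecting the values in a dictionary keeps them ready for logging in a single call later on.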


### Log the model and its metadata to MLflow


In this step, we log three things to MLflow: the model that we trained, the hyperparameters that we specified for the model's fit, and the accuracy metric that we calculated by evaluating the model on the test data.

The steps that we will take are:

- Initiate an MLflow run context to start a new run that we will log the model and metadata to.

- Log model parameters and performance metrics.

- Tag the run for easy retrieval.

- Register the model in the MLflow Model Registry while logging (saving) the model.

In [17]:
# Set the tracking server URI for logging
mlflow.set_tracking_uri(uri="http://127.0.0.1:8800")

# Create a new MLflow experiment (or reuse it if it already exists)
mlflow.set_experiment("MlFLOW Basic")

# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature from the training inputs and predictions
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model and register it in the Model Registry
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-Quickstart",
    )

2024/04/05 09:33:20 INFO mlflow.tracking.fluent: Experiment with name 'MlFLOW Basic' does not exist. Creating a new experiment.


Successfully registered model 'tracking-Quickstart'.
2024/04/05 09:33:23 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: tracking-Quickstart, version 1
Created version '1' of model 'tracking-Quickstart'.


In [19]:
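# Load the logged model back as a generic Python function (pyfunc) model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Predict on the test set with the loaded model
predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

# Compare the actual and predicted classes side by side
result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions
result[:4]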

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | actual_class | predicted_class |
|---|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 | 1 |


### Querying Runs Programmatically


You can also access all of the functions in the Tracking UI programmatically with MlflowClient.

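For example, the following code snippet searches for the run with the highest accuracy among all runs in the experiment.

In [29]:
client = mlflow.tracking.MlflowClient()
# ID of the experiment created above; replace it with the ID from your own tracking server
experiment_id = "190520218521630719"
best_run = client.search_runs(
    experiment_id,
    max_results=1,
    # Sort in descending order so the run with the highest accuracy comes first
    order_by=["metrics.accuracy DESC"],
)[0]
print(best_run.info)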

<RunInfo: artifact_uri='mlflow-artifacts:/190520218521630719/7e6613428c3c4bebbe9c83edf1f1a82b/artifacts', end_time=1712289803769, experiment_id='190520218521630719', lifecycle_stage='active', run_id='7e6613428c3c4bebbe9c83edf1f1a82b', run_name='receptive-wren-984', run_uuid='7e6613428c3c4bebbe9c83edf1f1a82b', start_time=1712289800256, status='FINISHED', user_id='ayush'>


### Creating Child Runs
You can also create multiple runs inside a single run. This is useful for scenarios such as hyperparameter tuning or cross-validation folds, where you need another level of organization within an experiment. You can create child runs by passing parent_run_id to the mlflow.start_run() function. For example: