Installing MLflow

In [None]:
!pip install mlflow

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset and hold out 40% of the samples for testing
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.4)

In [None]:
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Turn on automatic logging of parameters, metrics, and the fitted model
mlflow.sklearn.autolog()
with mlflow.start_run():
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

The mlflow.sklearn.autolog() call enables automatic logging of the run to a 
local directory. It captures the parameters and metrics produced by the 
underlying ML library in use. MLflow Tracking is the module responsible for 
handling metrics and logs. By default, the metadata of an MLflow run is stored in 
the local filesystem. 

The mlruns folder is created in the same directory as your notebook and contains all 
the experiments executed by your code in the current context.
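
To make the folder structure concrete, the following sketch walks an mlruns folder and lists the run directories it finds. It assumes the file-based layout mlruns/<experiment_id>/<run_id>/meta.yaml, which may differ between MLflow versions. The helper name list_run_dirs and the skipped folder names are illustrative choices, not part of MLflow.

```python
from pathlib import Path

# Assumption: these top-level folders do not hold experiments.
SKIPPED_DIRS = {".trash", "models"}


def list_run_dirs(mlruns_root="mlruns"):
    """Return the run directories under an MLflow file store, sorted by path.

    Assumes the layout mlruns/<experiment_id>/<run_id>/meta.yaml.
    """
    root = Path(mlruns_root)
    if not root.is_dir():
        return []
    runs = []
    for exp_dir in root.iterdir():
        if not exp_dir.is_dir() or exp_dir.name in SKIPPED_DIRS:
            continue
        for run_dir in exp_dir.iterdir():
            # A run directory is a folder that holds its own meta.yaml.
            if run_dir.is_dir() and (run_dir / "meta.yaml").is_file():
                runs.append(run_dir)
    return sorted(runs)


# Print "<experiment_id> <run_id>" for each run found next to the notebook.
for run_dir in list_run_dirs():
    print(run_dir.parent.name, run_dir.name)
```

If no mlruns folder exists in the working directory, the function returns an empty list and nothing is printed.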

In the preceding sample, the run is identified by the UUID 
46dc6db17fb5471a9a23d45407da680f. At the root of the run directory, you have a 
YAML file named meta.yaml.

This file holds the basic metadata of the run, including the start time, end time, 
run identifiers (run_id and run_uuid), the lifecycle stage, and the user who executed 
the experiment. The values are mostly the defaults for a run, but they provide valuable, 
readable information about your experiment.
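
As a rough illustration of how this metadata can be read back, the sketch below parses flat "key: value" lines using only the standard library. It is not a full YAML parser, and parse_flat_yaml is a hypothetical helper. In the sample text only the run ID comes from the example above; the timestamps, lifecycle stage, and user are placeholder values.

```python
def parse_flat_yaml(text):
    """Parse simple 'key: value' lines into a dict of strings.

    Handles only a flat mapping, which is what a run's meta.yaml is
    assumed to contain here. Nested YAML is not supported.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon so values such as file:///path stay whole.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


# Placeholder content: only the run ID is taken from the example above.
sample = """\
run_id: 46dc6db17fb5471a9a23d45407da680f
run_uuid: 46dc6db17fb5471a9a23d45407da680f
lifecycle_stage: active
start_time: 1600000000000
end_time: 1600000001000
user_id: example_user
"""

meta = parse_flat_yaml(sample)
# MLflow records start and end times in milliseconds since the epoch.
duration_ms = int(meta["end_time"]) - int(meta["start_time"])
print(meta["run_id"], meta["lifecycle_stage"], duration_ms)
```

For a real file, a proper YAML library is the safer choice; the point here is only that the fields are plain, human-readable text.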

The model.pkl file contains a serialized version of the model. For a scikit-learn model, 
this is a binary (pickled) copy of the fitted Python object. With autologging, the metrics 
are taken from the underlying machine learning library in use. The default packaging 
strategy is based on a conda.yaml file, which lists the dependencies required to serialize 
and load the model.
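
The sketch below shows one way a pickled file such as model.pkl could be loaded back into Python. It assumes the file is a standard pickle and that scikit-learn is installed in the environment doing the loading. The function name load_pickled_model and the path in the comment are hypothetical, and the location of model.pkl may vary with the MLflow version.

```python
import pickle
from pathlib import Path


def load_pickled_model(model_path):
    """Deserialize a pickled object from model_path.

    Only unpickle files you trust: loading a pickle can run arbitrary code.
    """
    with Path(model_path).open("rb") as handle:
        return pickle.load(handle)


# Hypothetical usage; replace the placeholders with your own IDs:
# model = load_pickled_model(
#     "mlruns/<experiment_id>/<run_id>/artifacts/model/model.pkl")
# print(model.predict(X_test[:5]))
```

In practice, loading through MLflow's own model-loading functions is usually preferable, because they also read the MLmodel file stored next to the pickle.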

The MLmodel file is the main definition of the model in the MLflow format, with 
information on how to run inference on the current model.

The metrics folder contains the training score of this particular run, which can be used 
as a baseline for benchmarking further model improvements down the line.
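
To show how such a stored score could be read back for benchmarking, the following sketch parses a single metric file. It assumes each file in the metrics folder is named after a metric and holds one line per logged value, in the form "timestamp value" optionally followed by a step number. The helper name read_metric_file is illustrative.

```python
from pathlib import Path


def read_metric_file(path):
    """Return a list of (timestamp_ms, value, step) tuples from a metric file.

    Assumes whitespace-separated lines: 'timestamp value' with an optional
    third 'step' field. Lines with fewer than two fields are skipped.
    """
    points = []
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        step = int(parts[2]) if len(parts) > 2 else 0
        points.append((int(parts[0]), float(parts[1]), step))
    return points


# Hypothetical usage; the metric file name depends on what was logged:
# history = read_metric_file("mlruns/<experiment_id>/<run_id>/metrics/<metric_name>")
# latest_value = history[-1][1]
```

Comparing the latest value across runs gives a simple baseline check when you change the model later.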

The params folder, shown in the first folder listing, contains the parameters of the 
logistic regression model. Because the model was created with default settings, these are 
the default values, recorded transparently and stored automatically.
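
The stored parameters can be collected into a dictionary with a few lines of standard-library code. This sketch assumes each file in the params folder is named after a parameter and contains its value as plain text. The helper name read_params is hypothetical.

```python
from pathlib import Path


def read_params(params_dir):
    """Return {parameter_name: value_as_string} for each file in params_dir."""
    params = {}
    for entry in sorted(Path(params_dir).iterdir()):
        if entry.is_file():
            # Values are kept as strings; convert them as needed.
            params[entry.name] = entry.read_text().strip()
    return params


# Hypothetical usage; replace the placeholders with your own IDs:
# params = read_params("mlruns/<experiment_id>/<run_id>/params")
# for name, value in params.items():
#     print(name, "=", value)
```

Keeping the values as strings mirrors how they sit on disk and avoids guessing each parameter's type.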