## MLflow 5-Minute Tracking Quickstart

This notebook demonstrates how to use a local MLflow Tracking Server to log and register a model, and then load it as a generic Python Function (pyfunc) to perform inference on a Pandas DataFrame.

Throughout this notebook, we use the MLflow fluent API for all interactions with the MLflow Tracking Server.

```python
import mlflow
from mlflow.models import infer_signature

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
```

### Set the MLflow Tracking URI

How you connect to the MLflow Tracking Server depends on where you are running this notebook.

For this example, we use a locally running tracking server, but other options are available. The easiest is the free managed service in [Databricks Community Edition](https://community.cloud.databricks.com/).

See [the guide to running notebooks](https://www.mlflow.org/docs/latest/getting-started/running-notebooks/index.html) for more information on setting the tracking server URI and configuring access to managed or self-managed MLflow tracking servers.

```python
# NOTE: review the links mentioned above for guidance on connecting to a managed tracking server, such as the free Databricks Community Edition

mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")
```
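Rather than hard-coding the URI, a notebook can honor an `MLFLOW_TRACKING_URI` environment variable when one is already set, and check that a server is actually listening before the first API call. The sketch below is a minimal, stdlib-only illustration: `resolve_tracking_uri` and `server_is_reachable` are hypothetical helper names, and the reachability check assumes the server answers `GET /health`, as a self-hosted `mlflow server` does. Non-HTTP URIs (such as `databricks`) are simply reported as unreachable.

```python
import os
import urllib.error
import urllib.request

DEFAULT_URI = "http://127.0.0.1:8080"


def resolve_tracking_uri(default: str = DEFAULT_URI) -> str:
    """Prefer an MLFLOW_TRACKING_URI already exported in the environment."""
    return os.environ.get("MLFLOW_TRACKING_URI", default)


def server_is_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP(S) tracking server answers on its /health route.

    Assumption: the server exposes /health (as `mlflow server` does).
    URIs such as "databricks" or local paths are not probed and return False.
    """
    if not uri.startswith(("http://", "https://")):
        return False
    try:
        with urllib.request.urlopen(uri.rstrip("/") + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status
        return False
```

With these helpers, you would pass `resolve_tracking_uri()` to `mlflow.set_tracking_uri()` only when `server_is_reachable()` returns `True`, and otherwise start a server first (for example with `mlflow server --host 127.0.0.1 --port 8080`).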

## Load training data and train a simple model

For this quickstart, we use the familiar iris dataset that ships with scikit-learn. After splitting the data, we train a simple logistic regression classifier on the training set and calculate its accuracy on the holdout test set.

Note that the only MLflow-related step in this part is that we supply the model's hyperparameters through a `params` dictionary; this makes the settings easy to log when we are ready to log the model and its associated metadata.

```python
# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model hyperparameters
# ("multi_class" is omitted: it is deprecated in recent scikit-learn, and lbfgs
# already fits a multinomial model for multiclass targets such as iris)
params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 8888}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate accuracy on the holdout test set; this is the metric logged to MLflow below
accuracy = accuracy_score(y_test, y_pred)
```
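The training cell computes only accuracy, although it also imports `precision_score`, `recall_score`, and `f1_score`. The sketch below shows one way to use them: the same `params` dictionary configures the estimator, and all metrics are collected into a single dictionary. Macro averaging and the metric key names (`precision_macro` and so on) are assumptions, since iris has three classes and the notebook does not specify an averaging choice.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One dict drives the estimator and can later be passed to mlflow.log_params(params)
params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 8888}
lr = LogisticRegression(**params).fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Multiclass precision/recall/F1 need an averaging mode; "macro"
# (unweighted mean over the three classes) is an assumption here.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision_macro": precision_score(y_test, y_pred, average="macro"),
    "recall_macro": recall_score(y_test, y_pred, average="macro"),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
}
```

Inside an MLflow run, `mlflow.log_params(params)` and `mlflow.log_metrics(metrics)` would then record each dictionary in a single call.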

## Define an MLflow Experiment

To group the distinct runs of a particular project or idea, we can define an Experiment that collects each iteration (run) together.
Choosing a unique name that describes what we are working on keeps things organized and makes our runs easier to find later.

```python
mlflow.set_experiment("MLflow Quickstart")
```

If no tracking server is listening at `http://127.0.0.1:8080`, this call fails with `MlflowException: API request to http://127.0.0.1:8080/api/2.0/mlflow/experiments/get-by-name failed ... [Errno 111] Connection refused`. In that case, start a local server first (for example, `mlflow server --host 127.0.0.1 --port 8080`) and rerun the cells above.

## Log the model, hyperparameters, and metrics to MLflow

To record the model, the hyperparameters used to fit it, and the metrics from validating it on the holdout data, we start a run context, as shown below. Any fluent API call made within that context (such as `mlflow.log_params()` or `mlflow.sklearn.log_model()`) is associated with, and logged to, the same run.

```python
# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the loss metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )
```

```
Registered model 'tracking-quickstart' already exists. Creating a new version of this model...
2023/11/07 12:17:01 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: tracking-quickstart, version 3
Created version '3' of model 'tracking-quickstart'.
```


## Load our saved model as a Python Function

Although we can load our model back in its native scikit-learn flavor with `mlflow.sklearn.load_model()`, below we load it as a generic Python Function, which is how the model would be loaded for online model serving. The `pyfunc` representation also works for batch use cases, as shown below.

```python
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
```


## Use our model to predict the iris class type on a Pandas DataFrame

```python
predictions = loaded_model.predict(X_test)

iris_feature_names = datasets.load_iris().feature_names

# Convert X_test validation feature data to a Pandas DataFrame
result = pd.DataFrame(X_test, columns=iris_feature_names)

# Add the actual classes to the DataFrame
result["actual_class"] = y_test

# Add the model predictions to the DataFrame
result["predicted_class"] = predictions

result[:4]
```

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | actual_class | predicted_class |
|---|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 | 1 |
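A frame like `result` makes it easy to inspect predictions further. The sketch below rebuilds a frame of the same shape using the scikit-learn model directly (so it runs without a tracking server), maps the integer labels to species names using the dataset's `target_names`, and computes the fraction of correct predictions. The extra column names `actual_species` and `predicted_species` are illustrative.

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
lr = LogisticRegression(max_iter=1000, random_state=8888).fit(X_train, y_train)

# Same layout as the notebook's result frame
result = pd.DataFrame(X_test, columns=iris.feature_names)
result["actual_class"] = y_test
result["predicted_class"] = lr.predict(X_test)

# Map integer labels (0, 1, 2) to species names
species = dict(enumerate(iris.target_names))
result["actual_species"] = result["actual_class"].map(species)
result["predicted_species"] = result["predicted_class"].map(species)

# Fraction of rows predicted correctly, and the rows that were not
match_rate = (result["actual_class"] == result["predicted_class"]).mean()
mismatches = result.loc[result["actual_class"] != result["predicted_class"]]

print(f"match rate: {match_rate:.3f}, mismatched rows: {len(mismatches)}")
```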
