<div style = "10px">
<img src="https://mlflow.org/docs/latest/images/logo-light.svg" height="50px">
</div>

*Streamline*: to make something more efficient, more effective, or simpler.

**MLflow** is a tool for managing the machine learning lifecycle.

- It is an open-source platform.
- It is purpose-built to help ML teams handle the complexities of the ML process.
- It covers the full lifecycle of machine learning projects, ensuring that each phase is manageable, traceable, and reproducible.

1. Start a Tracking Server (local): 
   ```bash
   mlflow server --host 127.0.0.1 --port 8080
   ```
2. Set the Tracking Server URI if not using Databricks
   ```python
   import mlflow
   mlflow.set_tracking_uri(uri="http://<host>:<port>")
   ``` 
3. Train a model and prepare the data for logging 
   ```python
   import mlflow
   from mlflow.models import infer_signature
   import pandas as pd
   from sklearn import datasets
   from sklearn.model_selection import train_test_split
   from sklearn.linear_model import LogisticRegression
   from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


   # Load the Iris dataset
   X, y = datasets.load_iris(return_X_y=True)

   # Split the data into training and test sets
   X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.2, random_state=42
   )

   # Define the model hyperparameters
   params = {
      "solver": "lbfgs",
      "max_iter": 1000,
      "multi_class": "auto",
      "random_state": 8888,
   }

   # Train the model
   lr = LogisticRegression(**params)
   lr.fit(X_train, y_train)

   # Predict on the test set
   y_pred = lr.predict(X_test)

   # Calculate metrics
   accuracy = accuracy_score(y_test, y_pred)
   ```
4. Log the model and its metadata to MLflow. This records the *model*, its *performance metrics*, and its *parameters*:

   ```python
   # Create a new MLflow Experiment
   mlflow.set_experiment("MLflow Quickstart")

   # Start an MLflow run
   with mlflow.start_run():
      # Log the hyperparameters
      mlflow.log_params(params)

      # Log the accuracy metric
      mlflow.log_metric("accuracy", accuracy)

      # Set a tag that we can use to remind ourselves what this run was for
      mlflow.set_tag("Training Info", "Basic LR model for iris data")

      # Infer the model signature
      # used to define the input and output schema of a machine learning model
      signature = infer_signature(X_train, lr.predict(X_train))

      # Log the model
      model_info = mlflow.sklearn.log_model(
         sk_model=lr,
         artifact_path="models/iris_model",
         signature=signature,
         input_example=X_train,
         registered_model_name="tracking-quickstart",
      )
   ```
5. Load the model as a Python Function (pyfunc) and use it for inference:

   ```python
   # Load the model back for predictions as a generic Python Function model
   loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

   predictions = loaded_model.predict(X_test)

   iris_feature_names = datasets.load_iris().feature_names

   result = pd.DataFrame(X_test, columns=iris_feature_names)
   result["actual_class"] = y_test
   result["predicted_class"] = predictions

   result[:4]
   ```

Note that the training code sits outside the `mlflow.start_run()` block. If the training code has a problem, we can fix it before logging the model.

## Four official ways to interact with MLflow

MLflow provides at least four official interfaces:

| Method            | Description            | Use Case                              |
| ----------------- | ---------------------- | ------------------------------------- |
| **Fluent API**    | High-level Python API  | Logging runs, metrics, models         |
| **MLflow Client** | Lower-level Python API | Managing runs, experiments, artifacts |
| **REST API**      | HTTP interface         | Custom apps, non-Python clients       |
| **CLI**           | Command-line interface | Quick tasks or scripting pipelines    |
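To make the REST row a little more concrete, here is a minimal sketch that only builds (and does not send) a request for the experiment search endpoint, using the standard library. The base URL reuses the local server address from these notes, and the `max_results` value and the helper name `build_search_experiments_request` are assumptions chosen for illustration.

```python
import json
import urllib.request

# Assumption: a tracking server is listening at this address
BASE_URL = "http://127.0.0.1:8080"


def build_search_experiments_request(max_results=10):
    """Build a POST request for the REST endpoint that searches experiments."""
    payload = json.dumps({"max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/2.0/mlflow/experiments/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_experiments_request()
print(request.get_method(), request.full_url)

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     experiments = json.loads(response.read()).get("experiments", [])
```

Any HTTP client in any language can issue the same call, which is why the REST interface suits non-Python clients.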


Steps

1. Start the MLflow Tracking Server (or create a client). The server should be listening.
2. Create an experiment.
3. Create a run (use nested runs when tuning hyperparameters).
4. Train the model using the current hyperparameters.
5. Log the parameters (a whole dictionary with the *Fluent API*, one (key, value) pair per call with the Client).
6. Log the metrics (a whole dictionary with the *Fluent API*, one (key, value) pair per call with the Client).
7. Log the model.

### MLflow Client 

In [135]:
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestRegressor 
from sklearn.model_selection import TimeSeriesSplit
import time
from dataset import generate_apple_sales_data_with_promo_adjustment
from sklearn.model_selection import train_test_split
from itertools import product
import numpy as np
import matplotlib.pyplot as plt
import os
import pickle
from mlflow.models import infer_signature
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [122]:
# 0. Get data
data = generate_apple_sales_data_with_promo_adjustment()

data['trend'] = data.date.dt.year/data.date.dt.year.min()

# drop irrelevant date field and target field
X = data.drop(columns=["date", "demand"])
X = X.astype('float64')
y = data["demand"]

Configuring the MLflow Tracking Client

Creating an `MlflowClient` gives us a client interface to the tracking server that can both send data to it and retrieve data from it.

In [None]:
# 1. Initialize client
client = MlflowClient(tracking_uri="http://127.0.0.1:8080")

# 2. Create or get experiment
experiment_name = "Client-API-Simulation"
experiment = client.get_experiment_by_name(experiment_name)

# If the experiment does not exist, create it;
# otherwise reuse its ID
if experiment is None:
    experiment_id = client.create_experiment(experiment_name)
else:
    experiment_id = experiment.experiment_id

# 3. Start a new run
run = client.create_run(experiment_id)

In [136]:
# 4. Set parameters
params = {
    "n_estimators": 100,
    "max_depth": 6,
    "min_samples_split": 10,
    "min_samples_leaf": 4,
    "bootstrap": True,
    "oob_score": False,
    "random_state": 888,
}

In [None]:
# 5. Log parameters
for name_param, value_param in params.items():
    client.log_param(run.info.run_id, 
                     name_param, value_param)

# 6. Split into training and validation folds
time_series_split = TimeSeriesSplit(n_splits=5)

for step, (idx_train, idx_test) in enumerate(
    time_series_split.split(X, y)):
    
    # Split the data into training and validation sets
    X_train, X_val = X.iloc[idx_train], X.iloc[idx_test]
    y_train, y_val = y.iloc[idx_train], y.iloc[idx_test]

    # Train the RandomForestRegressor
    rf = RandomForestRegressor(**params)
    
    # Fit the model on the training data
    rf.fit(X_train, y_train)

    # Predict on the validation set
    y_pred = rf.predict(X_val)


    # Calculate error metrics
    mae = mean_absolute_error(y_val, y_pred)
    mse = mean_squared_error(y_val, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_val, y_pred)

    # Assemble the metrics we're going to write into a collection
    metrics = {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
    
    # 7. Log metrics for this fold
    for name_metric, value_metric in metrics.items():
        client.log_metric(
            run.info.run_id, 
            name_metric, 
            value_metric,
            step=step)
    
    # Save the model locally
    local_model_path = f"model/model_{step}.pkl"
    
    os.makedirs("model", exist_ok=True)

    with open(local_model_path, "wb") as f:
        pickle.dump(rf, f)
    
    # Log the model using the MLflow Client API
    artifact_path = "model"  # Subdirectory in the artifact repository
    model_uri = f"runs:/{run.info.run_id}/{artifact_path}"  # URI for the logged model

    # Log the model
    client.log_artifact(run.info.run_id, local_model_path, artifact_path)

    
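    # Optionally register the model and create a new version.
    # create_registered_model raises an error if the name is already
    # taken, so only create the registered model when it does not exist yet.
    registered_model_name = "RandomForestModel"
    if not client.search_registered_models(
        filter_string=f"name='{registered_model_name}'"
    ):
        client.create_registered_model(registered_model_name)

    # Create a new version of the model in the registry
    client.create_model_version(
        name=registered_model_name,
        source=model_uri,
        run_id=run.info.run_id
    )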


In [126]:
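# Mark the run as finished; runs created with client.create_run stay
# in the RUNNING state until they are explicitly terminated
client.set_terminated(run.info.run_id)

# Output run info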
print("Run ID:", run.info.run_id)
print("Experiment ID:", experiment_id)
print("Run completed and logged using Client API.")

Run ID: 0cd7e450f4954b8aaa6c485de2fbd3c8
Experiment ID: 424303544545129281
Run completed and logged using Client API.


**More About the MLflow Client**

By default, the *MLflow Tracking Server* includes a *Default Experiment*, which stores every run that is logged without declaring an experiment.

This is useful when we forget to create a new experiment before using MLflow's tracking capabilities (logging and so on).

**Creating Experiments**




Tags and experiments

Runs that use the same input dataset logically belong to the same experiment, and the metadata about that dataset is stored in tags.

<center>
<img src="https://mlflow.org/docs/latest/assets/images/tag-exp-run-relationship-fc898eccc4bb05fe59f41372ab5f6b50.svg" height="300">
</center>

In [None]:
# experiment description
experiment_description = (
    "This is the grocery forecasting project."
    "This experiment contains the produce models for apples."
)

# Provide searchable tags that define characteristics of
# Runs that will be included in this Experiment
experiment_tags = {
    "project_name":"grocery-forecasting",
    "store_dept":"produce",
    "team":"stores-ml",
    "project_quarter":"Q3-2023",
    "mlflow.note.content":experiment_description
}

# Create the Experiment, providing a unique name
produce_apples_experiment = client.create_experiment(
    name='Apples_Models', 
    tags=experiment_tags
) 

**Search Experiments**

We can search for experiments that share the same `project_name` tag.

In [128]:
# Search all experiments
all_experiments = client.search_experiments()

# Search experiments based on `project_name`
client.search_experiments(
    filter_string="tags.`project_name`='grocery-forecasting'"
)

[<Experiment: artifact_location='mlflow-artifacts:/446874737954528824', creation_time=1746984158981, experiment_id='446874737954528824', last_update_time=1746998852686, lifecycle_stage='active', name='Apples_Models', tags={'mlflow.note.content': 'This is the grocery forecasting project.This '
                         'experiment contains the produce models for apples.',
  'project_name': 'grocery-forecasting',
  'project_quarter': 'Q3-2023',
  'store_dept': 'produce',
  'team': 'stores-ml'}>]

**Delete Experiments**

Deleting an experiment is a soft delete: the experiment is not permanently removed from the backend store. It is marked as deleted and becomes hidden in the MLflow UI.

In [129]:
# Delete experiment
client.delete_experiment("446874737954528824")

# Show all deleted experiments
from mlflow.entities import ViewType

deleted_experiments = client.search_experiments(view_type=ViewType.DELETED_ONLY)

for deleted_experiment in deleted_experiments:
    print("Experiment: {}".format(deleted_experiment.experiment_id))

# We can restore the experiment using restore_experiment
client.restore_experiment("446874737954528824")

# We can modify the value of an experiment tag.
# If the key does not exist, it will be created.
client.set_experiment_tag("446874737954528824", 'project_name', 'grocery-forecasting')

Experiment: 614683981825209504
Experiment: 347573977917312028
Experiment: 446874737954528824


### Fluent API

**Logging our first runs with MLflow**

The code below is the final experiment after many attempts (all older experiments were deleted).

We are going to use the `Fluent` API. The `fluent` APIs rely on a globally referenced state that holds the MLflow tracking server's URI.

This global state lets us use these higher-level (simpler) APIs to perform every action that we could otherwise perform with the `MlflowClient`.


### Four main components

- **MLflow Tracking**: Logs parameters, metrics, and artifacts for experiments.
  - It lets us log and *query* experiments. If MLflow runs on a local server, all metadata and artifacts are stored locally.
- **MLflow Projects**: Packages code and dependencies for reproducibility.
- **MLflow Models**: Standardizes model packaging and deployment.
- **MLflow Model Registry**: Manages model versions and stages (e.g. development, production).

>**Artifacts** are files or directories associated with a run, such as trained models, serialized objects, visualizations, datasets, or any other outputs generated during an experiment.

Steps

1. Start the MLflow Tracking Server (or create a client). The server should be listening.
2. Create an experiment.
3. Create a run (use nested runs when tuning hyperparameters).
4. Train the model using the current hyperparameters.
5. Log the parameters (a whole dictionary with the *Fluent API*, one (key, value) pair per call with the Client).
6. Log the metrics (a whole dictionary with the *Fluent API*, one (key, value) pair per call with the Client).
7. Log the model.

In [None]:
import mlflow
from sklearn.model_selection import train_test_split
from itertools import product
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [42]:
mlflow.set_tracking_uri("http://127.0.0.1:8080")

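If we start the MLflow Tracking Server without specifying a backend store or artifact root:
- *Metadata*: Stored in `./mlruns` on the machine running the server.
- *Artifacts*: Stored in `./mlartifacts` on the machine running the server when the server proxies artifacts (artifact locations that start with `mlflow-artifacts:/`, as in the outputs in these notes). Setups without artifact proxying write them under `./mlruns`.

When we set the tracking URI to `http://127.0.0.1:8080`, all experiment metadata and artifacts are sent to that server and stored in the locations above.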

A custom backend store and artifact root can be configured like this:

```bash
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://my-bucket/mlflow \
    --host 127.0.0.1 \
    --port 8080
```

- **Metadata**: Stored in the `mlflow.db` SQLite file on the machine running the server.
- **Artifacts**: Stored under `s3://my-bucket/mlflow`.

In [None]:
# Sets the current active experiment to the "Apples_Models" experiment and
# returns the Experiment metadata
apple_experiment = mlflow.set_experiment("Apples_Models")

# Define a run name for this iteration of training.
# If this is not set, a unique name will be auto-generated for your run.
run_name = "apples_rf_test"

# Define an artifact path that the outputs of the experiments (models, datasets, ...) will be saved to.
artifact_path = "model/rf_apples"

In [57]:
mlflow.search_experiments()

[<Experiment: artifact_location='mlflow-artifacts:/446874737954528824', creation_time=1746984158981, experiment_id='446874737954528824', last_update_time=1746998852686, lifecycle_stage='active', name='Apples_Models', tags={'mlflow.note.content': 'This is the grocery forecasting project.This '
                         'experiment contains the produce models for apples.',
  'project_name': 'grocery-forecasting',
  'project_quarter': 'Q3-2023',
  'store_dept': 'produce',
  'team': 'stores-ml'}>,
 <Experiment: artifact_location='mlflow-artifacts:/0', creation_time=1746920722099, experiment_id='0', last_update_time=1746920722099, lifecycle_stage='active', name='Default', tags={}>]

In [None]:
X_train, X_val, y_train, y_val = train_test_split(X, y,test_size=0.1)

params = {
    "n_estimators": 100,
    "max_depth": 6,
    "min_samples_split": 10,
    "min_samples_leaf": 4,
    "bootstrap": True,
    "oob_score": False,
    "random_state": 888,
}

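signature = infer_signature(X_train, y_train)

# Train on this split and compute validation metrics so that the run
# below logs a model and metrics that match these parameters
rf = RandomForestRegressor(**params)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_val)
mse = mean_squared_error(y_val, y_pred)
metrics = {
    "mae": mean_absolute_error(y_val, y_pred),
    "mse": mse,
    "rmse": np.sqrt(mse),
    "r2": r2_score(y_val, y_pred),
}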

In [None]:
# Initiate the MLflow run context
with mlflow.start_run(run_name=run_name) as run:
    # Log the parameters used for the model fit
    mlflow.log_params(params)

    # Log the error metrics that were calculated during validation
    mlflow.log_metrics(metrics)

    # Log an instance of the trained model for later use
    mlflow.sklearn.log_model(
        sk_model=rf, 
        input_example=X_val, 
        artifact_path=artifact_path,
        signature=signature
    )

🏃 View run apples_rf_test at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/80df25d043e143618c2532bd8b6ffdd2
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824


**MLflow Nested Runs**

Calling `mlflow.start_run(nested=True)` while another run is active starts a child run under it. Nested runs are useful for organizing hyperparameter tuning experiments, because they group the individual runs under a parent run.
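As a small illustration of how a tuning grid turns into individual child runs, the sketch below expands a dictionary of candidate values into one parameter dictionary per combination, using only the standard library. The grid is a shortened, illustrative version of the one used in this section.

```python
from itertools import product

# Shortened, illustrative grid of candidate values
hyperparameters = {
    "n_estimators": [100, 200],
    "max_depth": [6, 10, 15],
    "min_samples_split": [10],
}

# product() yields one tuple per combination; zip() restores the names
combinations = [
    dict(zip(hyperparameters.keys(), values))
    for values in product(*hyperparameters.values())
]

print(len(combinations))
for params in combinations:
    print(params)
```

Each dictionary would become one child run. With two values for `n_estimators`, three for `max_depth`, and one for `min_samples_split`, that gives six runs.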

In [None]:
def train(params):
    # Train the RandomForestRegressor
    rf = RandomForestRegressor(**params)

    # Fit the model on the training data
    rf.fit(X_train, y_train)

    # Predict on the validation set
    y_pred = rf.predict(X_val)

    # Calculate error metrics
    mae = mean_absolute_error(y_val, y_pred)
    mse = mean_squared_error(y_val, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_val, y_pred)

    # Assemble the metrics we're going to write into a collection
    metrics = {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
    
    return rf, metrics, X_train, y_train


hyperparameters = {
    "n_estimators": [100, 200],
    "max_depth": [6, 10, 15],
    "min_samples_split": [10],
    "min_samples_leaf": [4],
    "bootstrap": [True],
    "oob_score": [False],
    "random_state": [888]}
# Open a parent run so that each combination is logged as a child run
# (without an active parent, nested=True has nothing to nest under)
with mlflow.start_run():
    for value_params in product(*hyperparameters.values()):
        # Initiate the nested MLflow run context
        with mlflow.start_run(nested=True):
            params = dict(zip(hyperparameters.keys(), value_params))
            rf, metrics, X_train, y_train = train(params)

            # Get the schema of the data
            signature = infer_signature(X_train, y_train)

            # Log the parameters used for the model fit
            mlflow.log_params(params)

            # Log the error metrics that were calculated during validation
            mlflow.log_metrics(metrics)

            # Log an instance of the trained model for later use
            model_info = mlflow.sklearn.log_model(
                sk_model=rf,
                input_example=X_train,
                artifact_path=artifact_path,
                signature=signature
            )



🏃 View run efficient-sow-531 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/90f2c3ba83a84ae3b8622fcae7f07f1f
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824




🏃 View run blushing-mouse-769 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/c463b38a1da346f6a1296fc2efdb7800
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824




🏃 View run shivering-frog-724 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/b9086db451ea4d658e01133a4af0160b
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824




🏃 View run rogue-shoat-513 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/ca4c1546743a4c38bcdff88193087806
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824




🏃 View run luminous-fowl-963 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/cd1e9e85420848f2a5ceb78aa69935d0
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824




🏃 View run thundering-fowl-902 at: http://127.0.0.1:8080/#/experiments/446874737954528824/runs/ccf44ca9ddc64574893c0a6e6a91c08b
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/446874737954528824


In [85]:
# Search all runs related to experiment id
runs = mlflow.search_runs(
    experiment_ids=[apple_experiment.experiment_id], 
    output_format='pandas'
    )

In [87]:
id_model = runs.loc[(runs.status == 'FINISHED') & 
                    (runs['metrics.mae'] == runs['metrics.mae'].min()),
                    "run_id"].iloc[0]
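The selection above compares each finished run against the minimum MAE over all runs, so it can come back empty if the lowest MAE belongs to a run that did not finish. A slightly more defensive variant, sketched here on a small hypothetical DataFrame shaped like the output of `mlflow.search_runs`, filters first and then takes the minimum. The run IDs and metric values are made up.

```python
import pandas as pd

# Hypothetical rows shaped like mlflow.search_runs(output_format="pandas")
runs = pd.DataFrame(
    {
        "run_id": ["a1", "b2", "c3", "d4"],
        "status": ["FINISHED", "FINISHED", "FAILED", "FINISHED"],
        "metrics.mae": [3.2, 2.7, 1.0, 2.9],
    }
)

# Keep finished runs only, then take the row with the lowest MAE
finished = runs[runs["status"] == "FINISHED"]
best_run_id = finished.loc[finished["metrics.mae"].idxmin(), "run_id"]
print(best_run_id)
```

Here the failed run `c3` has the lowest MAE overall, but it is excluded before the minimum is taken.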

In [93]:
model_uri = 'runs:/{}/{}'.format(id_model, artifact_path)

In [95]:
# mlflow.sklearn.load_model returns the native scikit-learn estimator
sklearn_pyfunc = mlflow.sklearn.load_model(model_uri)

In [99]:
predict = sklearn_pyfunc.predict(X_val)
