# Machine Learning Experimentation with MLflow, TensorFlow, and CatBoost

In this tutorial, we demonstrate a complete machine learning workflow using MLflow for experiment tracking, TensorFlow for building neural network models, and CatBoost for gradient boosting models. Our objective is to predict the median house value of California districts (the `MedHouseVal` target of the California housing dataset) from features such as median income, median house age, and geographical location.

## Experiment Setup with MLflow

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It includes features for experiment tracking, model versioning, and deployment. MLflow helps in comparing different models and tracking experiments to ensure reproducibility.

```python
import mlflow
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("income")
```


# MLflow UI Overview

MLflow UI is an integral part of the MLflow platform, designed to simplify the management and analysis of machine learning experiments. It offers a comprehensive interface for tracking experiments, comparing model performances, and versioning models. Here's a closer look at the functionalities provided by the MLflow UI:

## Experiment Tracking

- **Log Parameters and Metrics**: Users can log various parameters (e.g., hyperparameters, feature sets) and metrics (e.g., accuracy, RMSE) for each run, facilitating detailed performance analysis.

- **Visual Comparisons**: The UI allows for side-by-side comparisons of different runs, making it easy to identify the best-performing models based on their metrics.

- **Artifact Storage**: Artifacts such as model binaries, plots, and additional files generated during the runs can be stored and accessed through the UI. This feature supports in-depth analysis and review of model outputs and diagnostics.

## Model Management

- **Model Versioning**: MLflow UI aids in managing different versions of models, tracking their performance over time, and selecting the best version for deployment.

- **Run History**: It provides a detailed history of runs, including start and end times, parameters, metrics, and tags. This historical view helps in understanding model improvements and iterations over time.

## Visualization and Analysis

- **Metric Plots**: The UI offers plotting capabilities for metrics over time, allowing users to visually assess model training dynamics, overfitting, or convergence.

- **Custom Queries**: Users can query and filter runs based on specific metrics, parameters, or tags, enabling targeted analysis of experiments.

## Collaboration and Sharing

- **Experiment Sharing**: The platform supports sharing experiments and results with team members, facilitating collaboration on machine learning projects.

- **Annotation and Tagging**: Runs can be annotated with tags or comments, providing additional context or insights for future reference.

The MLflow UI is accessible via a web browser, making it a user-friendly tool for data scientists and engineers to monitor and analyze their machine learning experiments effectively. Its integration with the broader MLflow platform ensures a seamless workflow for experiment tracking, model tuning, and deployment.



# Conclusion

At the end of the experiment, you will encounter an interface in the MLflow UI similar to the one depicted below. This interface provides a comprehensive overview of your experiment's results, including metrics, parameters, and artifacts, enabling you to analyze the performance of various models at a glance.

![MLflow UI Experiment Overview](./images/Screenshot.png)

This visual representation in the MLflow UI simplifies the comparison between different runs, helping you to identify the most effective model configurations and make informed decisions about which models to further develop or deploy.


In [12]:
import mlflow
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
import tensorflow_addons as tfa

from tensorflow.keras.callbacks import EarlyStopping

from tabtransformertf.models.fttransformer import FTTransformerEncoder, FTTransformer
from tabtransformertf.utils.preprocessing import df_to_dataset

import catboost as cb
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import seaborn as sns

from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.losses import MeanSquaredError

In [13]:
%matplotlib inline
plt.rcParams["figure.figsize"] = (20,10)
plt.rcParams.update({'font.size': 15})

The command `mlflow ui --backend-store-uri sqlite:///mlflow.db` launches the MLflow user interface (UI) locally. Note the three slashes: `sqlite:///mlflow.db` points to a database file relative to the current working directory, and it must match the tracking URI passed to `mlflow.set_tracking_uri`.

Here's what each part of the command does:

- `mlflow ui`: This part of the command launches the MLflow UI, which provides a graphical interface for users to interact with MLflow components such as experiments, runs, parameters, metrics, and artifacts.

- `--backend-store-uri sqlite:///mlflow.db`: This part specifies the backend store URI for MLflow. In this case, it's using SQLite as the backend store, and `mlflow.db` is the SQLite database file where MLflow will store metadata such as experiment and run information.

So, when you run this command, MLflow will start a local server hosting its UI (by default at http://127.0.0.1:5000), and it will use SQLite as the backend store to persist metadata related to experiments and runs. It is a shell command, so run it from a terminal in the directory that contains `mlflow.db`, or prefix it with `!` in a notebook cell; the cell will then block for as long as the server is running.


In [None]:
!mlflow ui --backend-store-uri sqlite:///mlflow.db

In [14]:
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("income")

<Experiment: artifact_location='file:///d:/Mlflow/tutorials-main/mlflow/tutorials/mlflow/mlruns/1', creation_time=1707252810494, experiment_id='1', last_update_time=1707252810494, lifecycle_stage='active', name='income', tags={}>

## Load Data

In [15]:
dset = fetch_california_housing()
data = dset['data']
y = dset['target']
LABEL = dset['target_names'][0]

NUMERIC_FEATURES = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Longitude', 'Latitude']
FEATURES = NUMERIC_FEATURES

data = pd.DataFrame(data, columns=dset['feature_names'])
data[LABEL] = y

data.head()

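|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |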


In [16]:
train_data, test_data = train_test_split(data, test_size=0.2)
print(f"Train dataset shape: {train_data.shape}")
print(f"Test dataset shape: {test_data.shape}")

Train dataset shape: (16512, 9)
Test dataset shape: (4128, 9)


## Data Processing

### Feature Scaling

Before training our models, it's crucial to apply feature scaling to ensure all numeric features contribute equally to the model's performance. This is especially important for models that are sensitive to the scale of the input features, such as neural networks and models that use distance measures.
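
To make the scaling step concrete, here is a small sketch of what standardization computes, written with NumPy on synthetic data (the arrays and their sizes are made up for illustration). It mirrors the behaviour of scikit-learn's `StandardScaler`: the mean and standard deviation are estimated on the training split only and then reused for every other split.

```python
import numpy as np

# Synthetic stand-ins for a training and a test split (illustrative only).
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(1000, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(200, 2))

# "Fit": estimate per-feature statistics on the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# "Transform": apply the same training statistics to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Reusing the training statistics for the validation and test splits avoids leaking information from held-out data into the preprocessing step.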




In [17]:
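X_train, X_val = train_test_split(train_data, test_size=0.2)
# Work on explicit copies so the in-place scaling below does not trigger SettingWithCopyWarning
X_train, X_val, test_data = X_train.copy(), X_val.copy(), test_data.copy()

# Fit the scaler on the training split only, then apply it to validation and test data
sc = StandardScaler()
X_train.loc[:, NUMERIC_FEATURES] = sc.fit_transform(X_train[NUMERIC_FEATURES])
X_val.loc[:, NUMERIC_FEATURES] = sc.transform(X_val[NUMERIC_FEATURES])
test_data.loc[:, NUMERIC_FEATURES] = sc.transform(test_data[NUMERIC_FEATURES])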

## Baseline
### Model with RandomForestRegressor
The RandomForestRegressor serves as our baseline model. It is an ensemble method that trains many decision trees and averages their predictions, which usually gives a more accurate and robust result than any single tree. It needs little tuning, which makes it a convenient reference point for the other models.
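
For regression, a random forest combines its trees by averaging their individual predictions. The sketch below checks this on synthetic data generated with `make_regression`; the dataset and hyperparameters are illustrative assumptions and are unrelated to the housing data used in this tutorial.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
forest = RandomForestRegressor(n_estimators=10, max_depth=5, random_state=0).fit(X, y)

# Predict with every fitted tree, then average across trees.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's own prediction matches the manual average.
forest_prediction = forest.predict(X)
print(np.allclose(manual_average, forest_prediction))
```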





In [18]:
mlflow.sklearn.autolog(disable=True)

with mlflow.start_run(run_name='rf_baseline'):
    params = {
        "n_estimators": 100,
        "max_depth": 20
    }

    mlflow.set_tag("model_name", "RF")
    mlflow.log_params(params)

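    # Build the model from the same dictionary that was logged, so the two cannot drift apart
    rf = RandomForestRegressor(**params)
    rf.fit(X_train[FEATURES], X_train[LABEL])

    rf_preds = rf.predict(test_data[FEATURES])
    # RMSE as the square root of the MSE; `squared=False` is deprecated in recent scikit-learn releases
    rf_rms = mean_squared_error(test_data[LABEL], rf_preds) ** 0.5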

    mlflow.log_metric("test_rmse", rf_rms)
    mlflow.sklearn.log_model(rf, "sk_models")



### CatBoost Model
Next, we use CatBoost, a gradient boosting library that excels in dealing with categorical features directly, without the need for extensive preprocessing. CatBoost is designed to provide state-of-the-art results with minimal tuning and is particularly user-friendly for both regression and classification tasks.


In [19]:
catb_train_dataset = cb.Pool(X_train[FEATURES], X_train[LABEL]) 
catb_val_dataset = cb.Pool(X_val[FEATURES], X_val[LABEL]) 
catb_test_dataset = cb.Pool(test_data[FEATURES], test_data[LABEL])

In [None]:
with mlflow.start_run(run_name="catboost"):
    mlflow.set_tag("model_name", "CatBoost")
    catb = cb.CatBoostRegressor()
    catb.fit(catb_train_dataset, eval_set=catb_val_dataset, early_stopping_rounds=50)
    catb_preds = catb.predict(catb_test_dataset)
    # RMSE as the square root of the MSE; `squared=False` is deprecated in recent scikit-learn releases
    catb_rms = mean_squared_error(test_data[LABEL], catb_preds) ** 0.5

    mlflow.log_metric("test_rmse", catb_rms)
    mlflow.catboost.log_model(catb, "cb_models")

### Multi-layer Perceptron (MLP) Model
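Finally, we explore a Multi-layer Perceptron (MLP), a type of neural network that consists of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer. MLPs can capture complex relationships in data by adjusting weights through backpropagation. In this context, we train an MLP with two hidden layers and dropout on the scaled numeric features, and log each configuration as a separate MLflow run.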

In [47]:
import mlflow
import tensorflow as tf
import tensorflow_addons as tfa
from sklearn.metrics import mean_squared_error
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.models import Sequential

def build_mlp(params):
    """
    Builds a Multi-layer Perceptron model based on provided parameters.
    
    :param params: A dictionary containing model configuration like layer sizes and dropout rate.
    :return: A compiled Keras Sequential model.
    """
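    # Define the MLP model architecture: two hidden layers, each followed by dropout.
    # Note: only layer1_size and layer2_size are used; a "layer3_size" entry in params is ignored.
    mlp = Sequential([
        Dense(params["layer1_size"], activation=params['activation']),
        Dropout(params['dropout_rate']),
        Dense(params["layer2_size"], activation=params['activation']),
        Dropout(params['dropout_rate']),
        Dense(1, activation='relu')  # Single output unit; ReLU keeps predictions non-negative
    ])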
    return mlp

def train_mlp(mlp, train_params, train_dataset, val_dataset):
    """
    Compiles and trains the MLP model using provided training parameters and datasets.
    
    :param mlp: The MLP model to be trained.
    :param train_params: A dictionary with training configuration such as learning rate and weight decay.
    :param train_dataset: The dataset for training the model.
    :param val_dataset: The dataset for validating the model performance.
    :return: The trained MLP model.
    """
    # Configure the optimizer with weight decay
    optimizer = tfa.optimizers.AdamW(
        learning_rate=train_params["learning_rate"],
        weight_decay=train_params["weight_decay"],
    )

    # Compile the model with loss and metrics
    mlp.compile(
        optimizer=optimizer,
        loss=MeanSquaredError(name="mse"),
        metrics=[tf.keras.metrics.RootMeanSquaredError(name="rmse")]
    )

    # Early stopping to prevent overfitting
    early_stopping = EarlyStopping(
        monitor="val_rmse",
        mode="min",
        patience=train_params["early_stop_patience"],
        restore_best_weights=True,
    )

    # Train the model
    mlp.fit(
        train_dataset,
        epochs=train_params["num_epochs"],
        validation_data=val_dataset,
        callbacks=[early_stopping],
    )
    return mlp

def mlp_mlflow_run(name, mlp_params, train_params, train_dataset, val_dataset, test_dataset, y_test):
    """
    Wrapper function to train the MLP model and log the experiment with MLflow.
    
    :param name: The name of the MLflow run.
    :param mlp_params: Model parameters for building the MLP.
    :param train_params: Training parameters.
    :param train_dataset: Training dataset.
    :param val_dataset: Validation dataset.
    :param test_dataset: Test dataset for final evaluation.
    :param y_test: True labels for the test dataset.
    """
    with mlflow.start_run(run_name=name):
        mlflow.log_params(mlp_params)
        mlflow.log_params(train_params)
        mlflow.set_tag("model_name", "MLP")

        # Build, train, and evaluate the model
        mlp = build_mlp(mlp_params)
        mlp = train_mlp(mlp, train_params, train_dataset, val_dataset)

        test_preds = mlp.predict(test_dataset)
        # RMSE as the square root of the MSE; `squared=False` is deprecated in recent scikit-learn releases
        test_rms = mean_squared_error(y_test, test_preds.ravel()) ** 0.5

        mlflow.log_metric("test_rmse", test_rms)
        mlflow.tensorflow.log_model(mlp, "tf_models")

In [None]:
# To TF Dataset
mlp_train_ds = tf.data.Dataset.from_tensor_slices((X_train[FEATURES], X_train[LABEL])).batch(512).shuffle(512*4).prefetch(512)
mlp_val_ds = tf.data.Dataset.from_tensor_slices((X_val[FEATURES], X_val[LABEL])).batch(512).shuffle(512*4).prefetch(512)
mlp_test_ds = tf.data.Dataset.from_tensor_slices(test_data[FEATURES]).batch(512).prefetch(512)

mlp_params = {
    "layer1_size": 512,
    "layer2_size": 128,
    "layer3_size": 64,
    "dropout_rate": 0.3,
    "activation": 'relu'

}
train_params = dict(
    learning_rate=0.008, weight_decay=0.00001, early_stop_patience=10, num_epochs=1000
)

mlp_mlflow_run(
    "mlp_base",
    mlp_params,
    train_params,
    mlp_train_ds,
    mlp_val_ds,
    mlp_test_ds,
    test_data[LABEL],
)

In [None]:

mlp_params = {
    "layer1_size": 512,
    "layer2_size": 264,
    "layer3_size": 64,
    "dropout_rate": 0.1,
    "activation": 'relu'

}
train_params = dict(
    learning_rate=0.001, weight_decay=0.00001, early_stop_patience=30, num_epochs=1000
)

mlp_mlflow_run(
    "mlp_base_5",
    mlp_params,
    train_params,
    mlp_train_ds,
    mlp_val_ds,
    mlp_test_ds,
    test_data[LABEL],
)


## FT Transformers

In [54]:
# To TF Dataset
import tensorflow as tf
import numpy as np

def df_to_dataset(dataframe, target=None, shuffle=True, batch_size=32):
    df = dataframe.copy()
    if target:
        labels = df.pop(target).values
        # Convert dataframe to a dictionary of series, then to a dictionary of numpy arrays
        dataset = {key: value.values[:, np.newaxis] for key, value in df.items()}  # Adjusted line
        dataset = tf.data.Dataset.from_tensor_slices((dataset, labels))
    else:
        dataset = {key: value.values[:, np.newaxis] for key, value in df.items()}  # Adjusted line
        dataset = tf.data.Dataset.from_tensor_slices(dataset)
    
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(dataframe))
    dataset = dataset.batch(batch_size)
    return dataset

train_dataset = df_to_dataset(X_train[FEATURES + [LABEL]], LABEL, shuffle=True)
val_dataset = df_to_dataset(X_val[FEATURES + [LABEL]], LABEL, shuffle=False)  # No shuffle
test_dataset = df_to_dataset(test_data[FEATURES], shuffle=False) # No target, no shuffle

In [55]:
def build_fttransformer(
    params_to_log, params_to_skip, out_dim=1, out_activation="relu"
):
    """
    Builds an FTTransformer model with specified parameters.

    Parameters:
    - params_to_log: Dictionary of parameters that will be logged and used in the FTTransformerEncoder.
    - params_to_skip: Dictionary of parameters to skip during logging but used in the FTTransformerEncoder.
    - out_dim: Output dimension of the final layer. Default is 1.
    - out_activation: Activation function for the output layer. Default is "relu".

    Returns:
    - An FTTransformer model ready for training.
    """
    # Define encoder
    ft_encoder = FTTransformerEncoder(
        **params_to_log,
        **params_to_skip
    )
    # Add prediction head to the encoder
    ft_transformer = FTTransformer(
        encoder=ft_encoder,
        out_dim=out_dim,
        out_activation=out_activation,
    )

    return ft_transformer


def train_model(model, train_params, train_dataset, val_dataset):
    """
    Compiles and trains the given model using specified parameters and datasets.

    Parameters:
    - model: The model to train.
    - train_params: Training parameters including learning rate and weight decay.
    - train_dataset: The dataset for training the model.
    - val_dataset: The dataset for validating the model performance.

    Returns:
    - The trained model.
    """
    # Configure the optimizer with weight decay
    optimizer = tfa.optimizers.AdamW(
        learning_rate=train_params["learning_rate"],
        weight_decay=train_params["weight_decay"],
    )

    model.compile(
        optimizer=optimizer,
        loss={
            "output": tf.keras.losses.MeanSquaredError(name="mse"),
            "importances": None,
        },
        metrics={
            "output": [tf.keras.metrics.RootMeanSquaredError(name="rmse")],
            "importances": None,
        },
    )

    early = EarlyStopping(
        monitor="val_output_loss",
        mode="min",
        patience=train_params["early_stop_patience"],
        restore_best_weights=True,
    )
    callback_list = [early]

    hist = model.fit(
        train_dataset,
        epochs=train_params["num_epochs"],
        validation_data=val_dataset,
        callbacks=callback_list,
    )
    return model


In [56]:
import mlflow
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error

# Disable TensorFlow autologging so that only the values logged explicitly below are recorded
mlflow.tensorflow.autolog(disable=True)

# build_fttransformer and train_model are defined in the previous cell

def fttransformer_mlflow_run(
    name: str,
    encoder_params: dict,
    train_params: dict,
    params_to_skip: dict,
    train_dataset: tf.data.Dataset,
    val_dataset: tf.data.Dataset,
    test_dataset: tf.data.Dataset,
    y_test: np.ndarray,
):
    """
    Trains an FTTransformer model and logs the experiment using MLflow.

    Parameters:
    - name: Name of the MLflow run.
    - encoder_params: Parameters for the FTTransformer encoder.
    - train_params: Training parameters.
    - params_to_skip: Parameters to exclude from logging but include in model building.
    - train_dataset: Dataset for training.
    - val_dataset: Dataset for validation.
    - test_dataset: Dataset for testing.
    - y_test: Actual labels for the test dataset.

    This function logs the encoder and training parameters, model architecture, and test performance metrics in MLflow.
    """
    with mlflow.start_run(run_name=name):
        mlflow.set_tag("model_name", "FTTransformer")

        # Log encoder and training parameters
        mlflow.log_params(encoder_params)
        mlflow.log_params(train_params)

        # Disable automatic logging to customize what gets logged
        mlflow.tensorflow.autolog(disable=True)

        # Build and train the FTTransformer model
        ft_transformer = build_fttransformer(
            encoder_params,
            params_to_skip,
            out_dim=1,
            out_activation="relu",
        )
        ft_transformer = train_model(
            ft_transformer, train_params, train_dataset, val_dataset
        )

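        # Evaluate the model on the test dataset.
        # With explainable=True the model is expected to return a dict with "output" and
        # "importances" (the same keys used in compile), so pick the predictions explicitly.
        test_preds = ft_transformer.predict(test_dataset)
        if isinstance(test_preds, dict):
            test_preds = test_preds["output"]
        # RMSE as the square root of the MSE; `squared=False` is deprecated in recent scikit-learn releases
        test_rms = mean_squared_error(y_test, test_preds.ravel()) ** 0.5

        # Log the test RMSE
        mlflow.log_metric("test_rmse", test_rms)

        # Log the FTTransformer model and register it under the name "FTTransformer".
        # This call relies on MLflow's default save format; set it explicitly if
        # compatibility requires the 'tf' (SavedModel) format.
        mlflow.tensorflow.log_model(ft_transformer, "tf_models", registered_model_name="FTTransformer")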




In [57]:
train_params = dict(
    learning_rate=0.001, weight_decay=0.00001, early_stop_patience=10, num_epochs=1000
)

params_to_skip = dict(
    numerical_data=X_train[NUMERIC_FEATURES].values,
    categorical_data=None,
    y=X_train[LABEL].values
)

### Linear Embeddings

In [None]:
linear_embeddings_params = dict(
    numerical_features=NUMERIC_FEATURES,
    categorical_features=[],
    numerical_embedding_type="linear",
    embedding_dim=64,
    depth=3,
    heads=6,
    attn_dropout=0.3,
    ff_dropout=0.3,
    explainable=True,
)

fttransformer_mlflow_run(
    name='linear',
    encoder_params=linear_embeddings_params,
    train_params=train_params,
    params_to_skip=params_to_skip,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    y_test=test_data[LABEL],
)

### Periodic

In [None]:
periodic_params_to_log = dict(
    numerical_features=NUMERIC_FEATURES,
    categorical_features=[],
    numerical_embedding_type='periodic',
    numerical_bins=128,
    embedding_dim=64,
    depth=3,
    heads=6,
    attn_dropout=0.3,
    ff_dropout=0.3,
    explainable=True,
)

fttransformer_mlflow_run(
    name='periodic',
    encoder_params=periodic_params_to_log,
    train_params=train_params,
    params_to_skip=params_to_skip,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    y_test=test_data[LABEL],
)

### PLE - Quantile Binning

In [None]:
pleq_params_to_log = dict(
    numerical_features=NUMERIC_FEATURES,
    categorical_features=[],
    numerical_embedding_type='ple',
    numerical_bins=128,
    embedding_dim=64,
    depth=3,
    heads=6,
    attn_dropout=0.3,
    ff_dropout=0.3,
    explainable=True,
)

pleq_params_to_skip = params_to_skip.copy()
pleq_params_to_skip['y'] = None

fttransformer_mlflow_run(
    name='ple_quantile',
    encoder_params=pleq_params_to_log,
    train_params=train_params,
    params_to_skip=pleq_params_to_skip,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    y_test=test_data[LABEL],
)

### PLE - Target Binning

In [None]:
plet_params_to_log = dict(
    numerical_features=NUMERIC_FEATURES,
    categorical_features=[],
    numerical_embedding_type='ple',
    numerical_bins=128,
    embedding_dim=64,
    depth=3,
    heads=6,
    attn_dropout=0.3,
    ff_dropout=0.3,
    explainable=True,
    task='regression',
    ple_tree_params = {
        "min_samples_leaf": 20,
    }
)


fttransformer_mlflow_run(
    name='ple_target',
    encoder_params=plet_params_to_log,
    train_params=train_params,
    params_to_skip=params_to_skip,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    y_test=test_data[LABEL],
)

In [63]:
model_id = "runs:/1dedda07d5b74951bee1226cdffdfdb0/tf_models"  # replace the run ID with one from your own MLflow UI
loaded_ft = mlflow.tensorflow.load_model(model_id)



# Model Comparison in MLflow

MLflow provides an interface for comparing different models within the same experiment. You can compare any logged metric across runs; in this tutorial that is the `test_rmse` logged for the random forest, CatBoost, MLP, and FT-Transformer runs. With the comparison tool you can quickly identify which model performs best on that metric, where a lower RMSE is better.

The comparison view can be accessed from the experiment page, where you can select multiple runs and click on the "Compare" button. This brings up a detailed comparison view that highlights differences in parameters, metrics, and provides visualizations for quick insights.

Below is an example of what the model comparison interface looks like in MLflow UI:

![MLflow UI Model Comparison](./images/Screenshot2.png)

This interface simplifies the task of evaluating different model versions, making it easier to make informed decisions about which model to deploy or further refine.


In [64]:
final_model_prediction = loaded_ft.predict(test_dataset)

