# MLflow Integration Example

This notebook demonstrates how to use ml-assert's MLflow integration to track assertion results and model metrics.

In [1]:
from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from ml_assert.core.base import AssertionResult
from ml_assert.core.dsl import ModelAssertion
from ml_assert.integrations.mlflow import MLflowLogger

## 1. Basic Setup

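First, create a synthetic binary classification dataset and train a random forest on it. The label depends only on `feature1` and `feature2`, so `feature3` is pure noise.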

In [2]:
# Create sample data
np.random.seed(42)
X = pd.DataFrame(
    {
        "feature1": np.random.normal(0, 1, 1000),
        "feature2": np.random.normal(0, 1, 1000),
        "feature3": np.random.normal(0, 1, 1000),
    }
)
y = (X["feature1"] + X["feature2"] > 0).astype(int)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]  # Positive-class probabilities, needed for ROC AUC

## 2. MLflow Integration

Next, create an `MLflowLogger` for a named experiment and run, then start the run. The assertions themselves follow in the next section.

In [3]:
# Initialize MLflow logger
mlflow_logger = MLflowLogger(
    experiment_name="ml-assert-example", run_name="model-validation-run"
)

# Start a new run
mlflow_logger.start_run()

## 3. Running Assertions and Logging Results

Chain threshold assertions for accuracy, precision, recall, F1 and ROC AUC, call `validate()`, and log the result to MLflow under a step name.

In [4]:
# Run model performance assertions
model_assertion = ModelAssertion(y_test, y_pred)
# roc_auc() needs probability scores; this example sets them directly on the
# private _y_scores attribute
model_assertion._y_scores = y_scores
result = (
    model_assertion.accuracy(threshold=0.8)
    .precision(threshold=0.75)
    .recall(threshold=0.75)
    .f1(threshold=0.75)
    .roc_auc(threshold=0.8)
    .validate()
)

# Log the assertion result
mlflow_logger.log_assertion_result_mlassert(
    result=result, step_name="model_performance"
)
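
To make the thresholds above concrete, here is a minimal plain-Python sketch of what a threshold check on classification metrics amounts to. It is not ml-assert's implementation: `binary_metrics` and `check_thresholds` are hypothetical helpers, the labels are made-up toy data, and treating a check as passed when the metric is greater than or equal to its threshold is an assumption.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def check_thresholds(metrics, thresholds):
    """Map each metric name to True if it meets its minimum (assumed >=)."""
    return {name: metrics[name] >= minimum for name, minimum in thresholds.items()}


# Toy labels, not taken from the model above
toy_true = [1, 0, 1, 1, 0, 1, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 1, 1, 0]

toy_metrics = binary_metrics(toy_true, toy_pred)
toy_checks = check_thresholds(
    toy_metrics,
    {"accuracy": 0.8, "precision": 0.75, "recall": 0.75, "f1": 0.75},
)
all_passed = all(toy_checks.values())
```

With this toy data every metric comes out at 0.75, so only the accuracy check (threshold 0.8) fails and `all_passed` is `False`.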

## 4. Logging Multiple Assertions

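You can also construct `AssertionResult` objects by hand and log each one under its own step name. The values below are hard-coded for illustration and are not computed from the model.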

In [5]:
# Create assertion results for different metrics
accuracy_result = AssertionResult(
    success=True,
    message="Accuracy check passed",
    timestamp=datetime.now(),
    metadata={"accuracy": 0.85},
)

fairness_result = AssertionResult(
    success=False,
    message="Fairness check failed",
    timestamp=datetime.now(),
    metadata={"demographic_parity": 0.15},
)

# Log multiple assertion results
mlflow_logger.log_assertion_result_mlassert(accuracy_result, step_name="accuracy_check")
mlflow_logger.log_assertion_result_mlassert(fairness_result, step_name="fairness_check")
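
This notebook does not show how `MLflowLogger` maps an `AssertionResult` onto MLflow metrics and tags. As one plausible illustration, the sketch below flattens a result into a metrics dict and a tags dict keyed by step name. `SimpleResult`, `to_mlflow_payload`, the key naming scheme and the `group` metadata entry are all hypothetical; only the four field names come from the cell above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from numbers import Real


@dataclass
class SimpleResult:
    """Hypothetical stand-in with the same four fields used in the notebook."""

    success: bool
    message: str
    timestamp: datetime
    metadata: dict = field(default_factory=dict)


def to_mlflow_payload(result, step_name):
    """Split a result into numeric metrics and string tags, prefixed by step."""
    metrics = {f"{step_name}_success": 1.0 if result.success else 0.0}
    tags = {
        f"{step_name}_message": result.message,
        f"{step_name}_timestamp": result.timestamp.isoformat(),
    }
    for key, value in result.metadata.items():
        # Numeric metadata becomes a metric; everything else becomes a tag.
        # bool is excluded explicitly because it is a subclass of int.
        if isinstance(value, Real) and not isinstance(value, bool):
            metrics[f"{step_name}_{key}"] = float(value)
        else:
            tags[f"{step_name}_{key}"] = str(value)
    return {"metrics": metrics, "tags": tags}


fairness = SimpleResult(
    success=False,
    message="Fairness check failed",
    timestamp=datetime(2024, 1, 1, 12, 0, 0),  # fixed value for reproducibility
    metadata={"demographic_parity": 0.15, "group": "age"},
)
payload = to_mlflow_payload(fairness, "fairness_check")
```

A payload shaped like this could then be handed to MLflow's own `mlflow.log_metrics` and `mlflow.set_tags` calls inside an active run.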

## 5. Using Context Manager

`MLflowLogger` can also be used as a context manager through `run()`, which keeps the logging calls scoped to a `with` block.

In [6]:
# Using context manager
with mlflow_logger.run() as logger:
    # Run some assertions
    result = AssertionResult(
        success=True,
        message="Feature importance check passed",
        timestamp=datetime.now(),
        metadata={
            "feature1_importance": 0.4,
            "feature2_importance": 0.35,
            "feature3_importance": 0.25,
        },
    )
    logger.log_assertion_result_mlassert(result, step_name="feature_importance")
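
The general pattern behind a run-scoped context manager can be sketched with `contextlib`. This is an illustration only: `RunTracker` is a hypothetical stand-in, not `MLflowLogger`, and it records events in a list rather than talking to MLflow. What it shows is that a `finally` clause closes the run even if code inside the block raises.

```python
from contextlib import contextmanager


class RunTracker:
    """Hypothetical tracker that records run lifecycle events in memory."""

    def __init__(self):
        self.events = []
        self.active = False

    def start_run(self):
        self.active = True
        self.events.append("start")

    def end_run(self):
        self.active = False
        self.events.append("end")

    def log(self, step_name):
        if not self.active:
            raise RuntimeError("no active run")
        self.events.append(f"log:{step_name}")

    @contextmanager
    def run(self):
        """Start a run, yield the tracker, and always end the run on exit."""
        self.start_run()
        try:
            yield self
        finally:
            self.end_run()


# Normal use: start, log, end
tracker = RunTracker()
with tracker.run() as active_tracker:
    active_tracker.log("feature_importance")

# The run is still closed when the body raises
failing = RunTracker()
try:
    with failing.run():
        raise ValueError("assertion blew up")
except ValueError:
    pass
```

After this runs, `tracker.events` is `["start", "log:feature_importance", "end"]`, and `failing.events` is `["start", "end"]` even though the block raised.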

## 6. End the Run

Finally, let's end the MLflow run.

In [7]:
mlflow_logger.end_run()

## 7. Viewing Results in MLflow UI

To view the logged results, start the MLflow UI, which listens on http://127.0.0.1:5000 by default:
```bash
mlflow ui
```

The UI will show:
- All assertion results with their success/failure status
- Metrics and parameters logged for each assertion
- Timestamps and messages for each assertion
- Metadata associated with each assertion