# MLflow Example Notebook v02

by David Cochran

**New**
- In-notebook execution and logging
- Custom test scores are now logged successfully
- Once the custom metrics are logged, they can be added to the logged model's Metrics view under "Select Metrics" in the left-hand sidebar.

**Notes** 
- Use MLflow methods to track the training process and register the model

- See this article for MLflow implementation in Azure notebooks:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow-cli-runs 

- See this for an explanation of why custom sklearn metrics must be imported AFTER autologging is enabled:
https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog

# Setup

## Connect to Azure Resources

In [None]:
# Connect to Azure Resources
from azureml.core import Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace information from the config file for use below
ws = Workspace.from_config()

# Authenticate
credential = DefaultAzureCredential()

# Read the identifiers that MLClient needs from the Workspace object `ws`
SUBSCRIPTION = ws.subscription_id
RESOURCE_GROUP = ws.resource_group
WS_NAME = ws.name

# Create a handle to this workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)

# # Print the workspace information (if desired)
# print(credential)
# print(SUBSCRIPTION)
# print(RESOURCE_GROUP)
# print(WS_NAME)
# print(ws.location)

## Data Setup

- Line up the data source

- Store as variable `df`


In [None]:
import pandas as pd

# Pull in data: cleaned data, ready for ML

# Microsoft Learn shows how to fetch online data with !wget in the lower part of this exercise:
# https://learn.microsoft.com/en-us/training/modules/explore-analyze-data-with-python/3-exercise-explore-data


# Provide the URL for the RAW version of the dataset in GitHub
# Ensure the GitHub URL starts with "https://raw.githubusercontent.com/"
# !wget https://raw.githubusercontent.com/drcochran-newman/datasets/main/churn_modeling/churn_cleaned.csv

# Read the dataset that is now saved locally in your current Azure ML directory
# Use the same file name as in the URL (repeated wget runs save copies as .1, .2, ...)
df = pd.read_csv('churn_cleaned.csv')

df.head()
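
Repeated `!wget` runs do not overwrite an existing file. Each new copy is saved under a numbered name (`churn_cleaned.csv.1`, `churn_cleaned.csv.2`, ...), so the name passed to `pd.read_csv` can drift away from the name in the URL. One way to avoid hard-coding the suffix is sketched below. The helper `latest_download` is a hypothetical name, not part of pandas or wget, and it assumes that the copy with the highest suffix is the one you want.

```python
from pathlib import Path


def latest_download(base_name, directory="."):
    """Return the most recent wget copy of base_name (highest numeric suffix).

    Hypothetical helper: assumes wget's default naming for repeat downloads,
    i.e. base_name, base_name.1, base_name.2, ...
    """
    folder = Path(directory)
    best_path, best_index = None, -1
    for path in folder.glob(base_name + "*"):
        suffix = path.name[len(base_name):]
        if suffix == "":
            index = 0  # the original, unsuffixed download
        elif suffix.startswith(".") and suffix[1:].isdigit():
            index = int(suffix[1:])
        else:
            continue  # unrelated file that only shares the prefix
        if index > best_index:
            best_path, best_index = path, index
    if best_path is None:
        raise FileNotFoundError(f"No download of {base_name} found in {folder}")
    return best_path
```

In the data cell this could be used as `df = pd.read_csv(latest_download('churn_cleaned.csv'))`.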

# ML Process

In [None]:
# Imports

# Train/Test
from sklearn.model_selection import train_test_split

# Algorithm
# from sklearn.linear_model import LogisticRegression
# from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# from sklearn.ensemble import GradientBoostingClassifier

# MLflow
import mlflow
import mlflow.sklearn

In [None]:
########################################################

# ML Variables

# Target Variable
target = "Exited"

# Train-Test Split
split = 0.2

# Random Seed
seed = 42

########################################################
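
The `split` and `seed` values above are passed to `train_test_split` later in the notebook: `split = 0.2` holds out 20% of the rows for testing, and a fixed `seed` makes the shuffle repeatable, so reruns compare models on the same rows. The idea can be illustrated with the standard library alone. This is a simplified sketch, not scikit-learn's implementation; `seeded_split` is a hypothetical name and the row indices are toy data.

```python
import math
import random


def seeded_split(n_rows, test_size, seed):
    """Shuffle row indices with a fixed seed, then hold out a test fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same order on every run
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)


# Two calls with the same seed produce identical splits
train_a, test_a = seeded_split(n_rows=10, test_size=0.2, seed=42)
train_b, test_b = seeded_split(n_rows=10, test_size=0.2, seed=42)
```

Changing `seed` changes which rows land in the test set, which is why the notebook keeps it fixed while the algorithm and hyperparameters vary between runs.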

In [None]:
########################################################

# Specify details for this training run

experiment_name = 'churn_variations_v2'

algorithm = 'RandomForestClassifier'

training_iteration = '.16'

registered_model_name = algorithm + training_iteration

run_name = registered_model_name

model = RandomForestClassifier(
    random_state=seed,
    max_depth=12,
    n_estimators=7,
)

########################################################

In [None]:
########################################################

# ML Train, Test, Track and Log

########################################################

# Associate with experiment
mlflow.set_experiment(experiment_name)

# Start Logging
mlflow.start_run(run_name=run_name)

# Enable autologging
mlflow.sklearn.autolog()

# Autologging must be enabled before scikit-learn metric APIs are imported from sklearn.metrics.
# Metric APIs imported before autologging is enabled do not log metrics to MLflow runs.
# See: https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog

# Import Metrics
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, log_loss

# Define Features — all columns except target variable
features = df.drop(target, axis=1)

# Define Labels — only the target variable column
labels = df[target]

# Create Train and Test Splits
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=split, random_state=seed
)

# Train and test the model
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

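# Positive-class probabilities, needed by the score-based metrics (AUC, log loss)
y_proba = model.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, y_pred) * 100
precision = precision_score(y_test, y_pred) * 100
recall = recall_score(y_test, y_pred) * 100
# log_loss_score = log_loss(y_test, y_proba)  # Log loss expects probabilities, not hard 0/1 predictions
f1 = f1_score(y_test, y_pred) * 100
auc = roc_auc_score(y_test, y_proba)  # AUC is computed from scores rather than hard predictions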

# Print test score results so we can see them immediately at the end of the run.
print()
print("Model:")
print(f"  {model}")
print("\nResults:")
print(f"  Accuracy: {accuracy}%")
print(f"  Precision: {precision}%")
print(f"  Recall: {recall}%")
# print(f"  Log Loss: {log_loss_score}")  # Uncomment together with log_loss_score above
print(f"  F1: {f1}%")
print(f"  AUC: {auc}")
print()

# Log the model to the run and register it in the workspace
print("Registering the model via MLflow")
mlflow.sklearn.log_model(
    sk_model=model,
    registered_model_name=registered_model_name,
    artifact_path=registered_model_name,
)

# Save the model to a local directory (save_model raises an error if the path already exists and is not empty)
mlflow.sklearn.save_model(
    sk_model=model,
    path=registered_model_name
)

# Stop Logging
mlflow.end_run()
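
ROC AUC measures how well a model ranks positives above negatives, so it is normally computed from scores, such as the positive-class column of `model.predict_proba(X_test)`, rather than from the thresholded output of `model.predict`. The sketch below shows the difference on toy data, using a pairwise definition of AUC written in plain Python. `pairwise_auc` and the sample values are illustrative assumptions, not scikit-learn's code or results from the churn dataset.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.45]
y_hard = [1 if p >= 0.5 else 0 for p in y_prob]  # what a predict()-style threshold returns

auc_from_probabilities = pairwise_auc(y_true, y_prob)
auc_from_hard_labels = pairwise_auc(y_true, y_hard)
```

On this toy data the probability-based AUC is 8/9, while the hard-label version drops to 2/3 because thresholding turns most pairs into ties. Comparing runs on AUC is therefore more informative when the metric is fed probabilities.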