# Run AutoML and register the best model


## Accelerating churn model creation using Databricks AutoML
### A glass-box solution that empowers data teams without taking away control

Databricks simplifies model creation and MLOps. However, bootstrapping new ML projects can still be slow and inefficient.

Instead of writing the same boilerplate for each new project, you can let Databricks AutoML automatically generate state-of-the-art models for classification, regression, and forecasting.

You can deploy the resulting models directly, or use the generated notebooks to bootstrap projects with best practices, saving you weeks of effort.

<img width="1000" src="https://github.com/QuentinAmbard/databricks-demo/raw/main/retail/resources/images/auto-ml-full.png"/>


<br>

### Using Databricks AutoML with our churn dataset

<br>

<img style="float: right" width="600" src="https://github.com/QuentinAmbard/databricks-demo/raw/main/retail/resources/images/churn-auto-ml.png"/>

<br>

AutoML is available under **Machine Learning - Experiments**. All we have to do is create a new AutoML experiment, select the table containing the ground-truth labels, and join it with the features in the feature table.

Our prediction target is the `churn` column.

Click on **Start**, and Databricks will do the rest.

While this walkthrough uses the UI, you can also use the [Python API](https://docs.databricks.com/applications/machine-learning/automl.html#automl-python-api-1).

<br>

#### Join or use features directly from the Feature Store via the [UI](https://docs.databricks.com/machine-learning/automl/train-ml-model-automl-ui.html#use-existing-feature-tables-from-databricks-feature-store) or the Python API
* Select the table containing the ground-truth labels (e.g., `dbdemos.schema.churn_label_table`)
* Join the remaining features from the feature table (e.g., `dbdemos.schema.churn_feature_table`)
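
As a rough sketch of the Python API route for the two steps above: `databricks.automl.classify` accepts a `feature_store_lookups` argument, a list of dictionaries with `table_name`, `lookup_key`, and (for time series feature tables) `timestamp_lookup_key`. The helper name `build_automl_feature_lookups` is hypothetical, and the `customer_id` and `transaction_ts` column names are assumptions based on the lookup keys used later in this notebook. The AutoML call itself is left commented out because it only runs on a Databricks ML runtime.

```python
def build_automl_feature_lookups(feature_table, lookup_key, timestamp_lookup_key=None):
    """Build the list-of-dicts structure passed to AutoML as feature_store_lookups.

    Hypothetical helper: it only assembles the dictionaries, it does not call AutoML.
    """
    lookup = {"table_name": feature_table, "lookup_key": lookup_key}
    if timestamp_lookup_key is not None:
        # Needed when the feature table is a time series feature table.
        lookup["timestamp_lookup_key"] = timestamp_lookup_key
    return [lookup]


lookups = build_automl_feature_lookups(
    feature_table="dbdemos.schema.churn_feature_table",
    lookup_key="customer_id",               # assumed key column
    timestamp_lookup_key="transaction_ts",  # assumed timestamp column
)
print(lookups)

# On a Databricks ML runtime, the lookups could then be passed to AutoML, e.g.:
# from databricks import automl
# summary = automl.classify(
#     dataset=spark.table("dbdemos.schema.churn_label_table"),
#     target_col="churn",
#     feature_store_lookups=lookups,
#     timeout_minutes=15,
# )
```

This notebook takes a different route below: it joins the features itself with `create_training_set` and passes the resulting DataFrame to AutoML.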

Please take a look at the __Quickstart__ version of this demo for an example of AutoML in action.

In [0]:
%pip install --quiet mlflow==2.19 databricks-feature-engineering==0.8.0

In [0]:
%load_ext autoreload
%autoreload 2

In [0]:
dbutils.library.restartPython()

In [0]:
import os
notebook_path =  '/Workspace/' + os.path.dirname(dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get())
%cd $notebook_path
%cd ../features

# Define Variables

In [0]:
# Label table containing the ground-truth churn labels.
dbutils.widgets.text(
    "advanced_churn_label_table",
    "dev.koeppen_dabs_demo.advanced_churn_label_table",
    label="Label Table",
)

# Feature table to store the computed features.
dbutils.widgets.text(
    "advanced_churn_feature_table",
    "dev.koeppen_dabs_demo.advanced_churn_feature_table",
    label="Feature Table",
)

# Feature function (UDF) that computes the average price increase.
dbutils.widgets.text(
    "avg_price_increase",
    "dev.koeppen_dabs_demo.avg_price_increase",
    label="Avg Price Increase Function",
)

# MLflow experiment name used for the AutoML run.
dbutils.widgets.text(
    "experiment_name",
    "advanced_mlops_churn_experiment",
    label="Experiment Name",
)

# Name under which the best model is registered.
dbutils.widgets.text(
    "model_name",
    "dev.koeppen_dabs_demo.advanced_mlops_churn_model",
    label="Model Name",
)

# Three-level name for the features derived from the registered AutoML model.
dbutils.widgets.text(
    "features_from_registered_automl_model",
    "dev.koeppen_dabs_demo.features_from_registered_automl_model",
    label="features_from_registered_automl_model",
)

In [0]:
advanced_churn_label_table = dbutils.widgets.get("advanced_churn_label_table")
advanced_churn_feature_table = dbutils.widgets.get("advanced_churn_feature_table")
avg_price_increase = dbutils.widgets.get("avg_price_increase")
experiment_name = dbutils.widgets.get("experiment_name")
model_name = dbutils.widgets.get("model_name")
features_from_registered_automl_model = dbutils.widgets.get("features_from_registered_automl_model")

In [0]:
# The feature table name has the form catalog.schema.table; use its catalog and schema as defaults.
output_catalog, output_schema = advanced_churn_feature_table.split(".")[:2]
spark.sql(f"USE CATALOG {output_catalog}")
spark.sql(f"USE SCHEMA {output_schema}")
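
The cell above assumes the widget value is a well-formed three-level name (`catalog.schema.table`). A minimal sketch of a validating helper follows; `split_three_level_name` is a hypothetical name, and it deliberately does not handle backtick-quoted identifiers that contain dots.

```python
def split_three_level_name(full_name):
    """Split 'catalog.schema.object' into its three parts, or raise ValueError."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected catalog.schema.object, got {full_name!r}")
    catalog, schema, obj = parts
    return catalog, schema, obj


catalog, schema, table = split_three_level_name(
    "dev.koeppen_dabs_demo.advanced_churn_feature_table"
)
print(catalog, schema, table)
```

Failing early on a malformed name gives a clearer error than a `USE CATALOG` statement that points at the wrong object.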

### labels_df has our customer_id, transaction_ts, churn, and split values

In [0]:
labels_df = spark.table(advanced_churn_label_table)

### Our advanced_churn_feature_table has all the features we extracted from the CSV and preprocessed in Data Preparation (it does not contain the churn label)

In [0]:
from databricks.feature_engineering import FeatureFunction, FeatureLookup

feature_lookups = [
    FeatureLookup(
      table_name= advanced_churn_feature_table,
      lookup_key=["customer_id"],
      timestamp_lookup_key="transaction_ts"
    ),
    FeatureFunction(
      udf_name=avg_price_increase,
      input_bindings={
        "monthly_charges_in" : "monthly_charges",
        "tenure_in" : "tenure",
        "total_charges_in" : "total_charges"
      },
      output_name="avg_price_increase"
    )
]

# Step 1: Read features
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()

# Create Feature specifications object
training_set_specs = fe.create_training_set(
  df=labels_df, # DataFrame with lookup keys and label/target (+ any other input)
  label="churn",
  feature_lookups=feature_lookups,
  exclude_columns=["customer_id", "transaction_ts", 'split']
)
training_df = training_set_specs.load_df()


In [0]:
display(training_df)

In [0]:
import mlflow
import databricks.automl
from databricks.feature_engineering import FeatureEngineeringClient
from pyspark.sql import functions as F

def start_automl_run(dataset, target_col, experiment_name=None, timeout_minutes=15):
    return databricks.automl.classify(
        dataset=dataset,
        target_col=target_col,
        timeout_minutes=timeout_minutes,
        experiment_name=experiment_name
    )


In [0]:
automl_result = start_automl_run(
    dataset=training_df,         
    target_col="churn",
    timeout_minutes=15,
    experiment_name=experiment_name
)
best_model_uri = automl_result.best_trial.model_path
best_run_id = automl_result.best_trial.mlflow_run_id
print(f"Best model run ID: {best_run_id}")
print(f"Best model URI: {best_model_uri}")
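
Besides `best_trial`, the AutoML summary exposes all trials through `automl_result.trials`, each with a `metrics` dictionary and an `mlflow_run_id`. The sketch below ranks trials by a chosen metric. The helper `rank_trials` is hypothetical, the metric name `val_f1_score` is an assumption about what the run logged, and the stand-in trials use placeholder scores only so that the snippet runs outside Databricks.

```python
from types import SimpleNamespace


def rank_trials(trials, metric="val_f1_score", top_n=3):
    """Return up to top_n trials that logged `metric`, highest value first."""
    scored = [t for t in trials if metric in t.metrics]
    return sorted(scored, key=lambda t: float(t.metrics[metric]), reverse=True)[:top_n]


# Stand-in objects with placeholder scores; in the notebook, pass automl_result.trials.
example_trials = [
    SimpleNamespace(mlflow_run_id="run_a", metrics={"val_f1_score": 0.61}),
    SimpleNamespace(mlflow_run_id="run_b", metrics={"val_f1_score": 0.74}),
    SimpleNamespace(mlflow_run_id="run_c", metrics={}),  # metric missing, so skipped
]

top_trials = rank_trials(example_trials, top_n=2)
for trial in top_trials:
    print(trial.mlflow_run_id, trial.metrics["val_f1_score"])
```

Looking at the runner-up trials can be useful when the best trial is only marginally ahead on the validation metric.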


In [0]:
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register the best AutoML model under the configured model name.
registration = mlflow.register_model(
    model_uri=best_model_uri,
    name=model_name
)

version = registration.version
run_id = registration.run_id
print("Model version:", version)
print("Run ID:", run_id)

# Log the model again through the Feature Engineering client so that feature
# lineage is recorded. log_model needs the model object and the TrainingSet
# returned by create_training_set, not a URI and a DataFrame. Resuming the
# best AutoML run keeps the new model version tied to best_run_id.
fe = FeatureEngineeringClient()
with mlflow.start_run(run_id=best_run_id):
    fe.log_model(
        model=mlflow.sklearn.load_model(best_model_uri),
        artifact_path="automl_model",
        flavor=mlflow.sklearn,
        training_set=training_set_specs,
        registered_model_name=model_name
    )

# search_model_versions returns a list, so filter by run and keep the newest version.
versions = [
    v for v in client.search_model_versions(f"name='{model_name}'")
    if v.run_id == best_run_id
]
model_version_details = max(versions, key=lambda v: int(v.version))

run_id = model_version_details.run_id

print("Model version:", model_version_details.version)
print("Run ID:", run_id)


In [0]:
from mlflow.tracking import MlflowClient
client = MlflowClient()

# Assign an alias: the first registered version becomes the champion,
# later versions become challengers. The version may be a string or an int,
# so it is converted before the comparison.
alias = "champion" if int(registration.version) == 1 else "challenger"
client.set_registered_model_alias(name=model_name,
                                  alias=alias,
                                  version=registration.version)

print(f"Assigned alias {alias} to version {registration.version}")
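
The alias rule above can also be written as a small pure function, which makes it easy to check outside a workspace. This is only a sketch: `choose_alias` and `alias_uri` are hypothetical names. The `models:/<name>@<alias>` form is the MLflow URI syntax for loading a registered model by alias.

```python
def choose_alias(version):
    """Return 'champion' for the first registered version, 'challenger' otherwise."""
    return "champion" if int(version) == 1 else "challenger"


def alias_uri(model_name, alias):
    """Build an MLflow model URI that resolves a registered model by alias."""
    return f"models:/{model_name}@{alias}"


example_uri = alias_uri(
    "dev.koeppen_dabs_demo.advanced_mlops_churn_model", choose_alias("1")
)
print(example_uri)
```

Converting the version with `int()` means the rule behaves the same whether the registry returns the version as a string or as an integer.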

In [0]:
import mlflow.sklearn
# Load the model
pipeline_model = mlflow.sklearn.load_model(best_model_uri)
# Get the last step (the actual estimator)
estimator = pipeline_model.steps[-1][1]  

In [0]:
client = MlflowClient()

# Tag the model version with the estimator type for visibility
client.set_model_version_tag(
  name=model_name,
  version=version,
  key="model_type",
  value=f"{type(estimator).__name__}"
)

# Tag the model version with the modeling method
client.set_model_version_tag(
  name=model_name,
  version=version,
  key="modeling_method",
  value="AutoML"
)

# Tag the model version with the ID of the best AutoML run
client.set_model_version_tag(
  name=model_name,
  version=version,
  key="best_run_id",
  value=best_run_id
)
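
The three tagging calls above share the same shape, so they could be driven from a single dictionary. The following is a sketch under that assumption: `build_version_tags` and `apply_version_tags` are hypothetical helpers, and `RecordingClient` and `DummyEstimator` are stand-ins used here only so the snippet runs without a workspace. In the notebook you would pass the real `MlflowClient` and the estimator loaded earlier.

```python
def build_version_tags(estimator, best_run_id, modeling_method="AutoML"):
    """Collect the tags set on the model version into one dictionary."""
    return {
        "model_type": type(estimator).__name__,
        "modeling_method": modeling_method,
        "best_run_id": str(best_run_id),
    }


def apply_version_tags(client, name, version, tags):
    """Set every tag on the given model version through the client."""
    for key, value in tags.items():
        client.set_model_version_tag(name=name, version=version, key=key, value=value)


class RecordingClient:
    """Stand-in for MlflowClient that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def set_model_version_tag(self, name, version, key, value):
        self.calls.append((name, version, key, value))


class DummyEstimator:
    """Placeholder for the estimator extracted from the pipeline."""


recorder = RecordingClient()
tags = build_version_tags(DummyEstimator(), best_run_id="abc123")
apply_version_tags(recorder, "catalog.schema.model", "1", tags)
print(recorder.calls)
```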

In [0]:
dbutils.notebook.exit(0)