## Monthly AutoML Retrain

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/mlops-end2end-flow-7.png" width="1200">

<!-- [metadata={"description":"MLOps end2end workflow: Batch to automatically retrain model on a monthly basis.",
 "authors":["quentin.ambard@databricks.com"],
 "db_resources":{},
  "search_tags":{"vertical": "retail", "step": "Model testing", "components": ["mlflow"]},
                 "canonicalUrl": {"AWS": "", "Azure": "", "GCP": ""}}] -->

### A cluster has been created for this demo
To run this demo, just select the cluster `dbdemos-mlops-end2end-shawnzou2020` from the dropdown menu ([open cluster configuration](https://dbc-abdbb8e0-f50f.cloud.databricks.com/#setting/clusters/0410-014028-ndqe9et5/configuration)). <br />
*Note: If the cluster was deleted after 30 days, you can re-create it with `dbdemos.create_cluster('mlops-end2end')` or re-install the demo: `dbdemos.install('mlops-end2end')`*

## Monthly training job

We can programmatically schedule a job to retrain our model on a fixed cadence, or trigger a retrain from an event when we detect that the model no longer behaves as expected.

This notebook is meant to be run as a job. It calls the Databricks AutoML API, picks the best model, and requests a transition to Staging.
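
As a rough illustration of the scheduling step, the sketch below builds a request body for the Databricks Jobs API 2.1 `jobs/create` endpoint with a Quartz cron expression that fires at 03:00 on the first day of each month. The job name, task key, notebook path, and cluster ID are hypothetical placeholders, not values from this demo, and the payload is only printed, not submitted.

```python
import json


def build_monthly_retrain_job(notebook_path, cluster_id,
                              cron="0 0 3 1 * ?", timezone="UTC"):
    """Build a Jobs API 2.1 `jobs/create` payload that runs a notebook on a cron schedule.

    Quartz cron fields: seconds minutes hours day-of-month month day-of-week.
    The default fires at 03:00:00 on day 1 of every month.
    """
    return {
        "name": "mlops-churn-monthly-retrain",  # hypothetical job name
        "tasks": [
            {
                "task_key": "retrain_churn_automl",  # hypothetical task key
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
    }


payload = build_monthly_retrain_job(
    notebook_path="/Repos/demo/mlops-end2end/07_retrain_churn_automl",  # hypothetical path
    cluster_id="<your-cluster-id>",  # placeholder, replace with a real cluster ID
)
print(json.dumps(payload, indent=2))
```

The resulting JSON could be sent to the workspace with any HTTP client or the Databricks CLI; a job cluster (`new_cluster`) may be preferable to `existing_cluster_id` for scheduled production runs.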

In [0]:
%run ./_resources/00-setup $reset_all_data=false $catalog="hive_metastore"



USE CATALOG `hive_metastore`
using cloud_storage_path /Users/quentin.ambard@databricks.com/demos/retail
using catalog.database `hive_metastore`.`retail_quentin_ambard`


In [0]:
fs = FeatureStoreClient()
features = fs.read_table(f'{dbName}.dbdemos_mlops_churn_features')

In [0]:
import databricks.automl

# Run AutoML classification on the feature table; the returned summary exposes the best trial
model = databricks.automl.classify(features, target_col="churn", data_dir="dbfs:/tmp/", timeout_minutes=5)

2023/06/21 12:59:34 INFO databricks.automl.client.manager: AutoML will optimize for F1 score metric, which is tracked as val_f1_score in the MLflow experiment.
2023/06/21 12:59:35 INFO databricks.automl.shared.databricks_utils: No host name to create absolute URL
2023/06/21 12:59:35 INFO databricks.automl.client.manager: MLflow Experiment ID: 402627122880526
2023/06/21 12:59:35 INFO databricks.automl.client.manager: MLflow Experiment: #mlflow/experiments/402627122880526
2023/06/21 13:00:30 INFO databricks.automl.shared.databricks_utils: No host name to create absolute URL
2023/06/21 13:00:30 INFO databricks.automl.client.manager: Data exploration notebook: #notebook/402627122880544
2023/06/21 13:05:32 INFO databricks.automl.client.manager: AutoML experiment completed successfully.


| Metric | Train | Validation | Test |
|---|---|---|---|
| f1_score | 0.6 | 0.614 | 0.629 |
| false_negatives | 286.0 | 109.0 | 105.0 |
| score | 0.755 | 0.742 | 0.745 |
| example_count | 4229.0 | 1382.0 | 1432.0 |
| recall_score | 0.73 | 0.723 | 0.747 |
| true_negatives | 2419.0 | 741.0 | 757.0 |
| false_positives | 749.0 | 248.0 | 260.0 |
| true_positives | 775.0 | 284.0 | 310.0 |
| accuracy_score | 0.755 | 0.742 | 0.745 |
| precision_recall_auc | 0.537 | 0.546 | 0.57 |
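
To see how the reported scores relate to the confusion-matrix counts above, the following sketch recomputes recall, F1, and accuracy from the true/false positive and negative counts of each split. The function and variable names are illustrative and not part of AutoML; precision is also computed, although AutoML's summary above does not list it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": round(precision, 3),
        "recall": round(recall, 3),
        "f1": round(f1, 3),
        "accuracy": round(accuracy, 3),
    }


# Counts copied from the AutoML summary above
counts = {
    "Train": dict(tp=775, fp=749, fn=286, tn=2419),
    "Validation": dict(tp=284, fp=248, fn=109, tn=741),
    "Test": dict(tp=310, fp=260, fn=105, tn=757),
}

metrics = {split: classification_metrics(**c) for split, c in counts.items()}
for split, values in metrics.items():
    print(split, values)
```

The recomputed recall, F1, and accuracy agree with the `recall_score`, `f1_score`, and `accuracy_score` rows to three decimals, which is a quick sanity check that the summary is internally consistent.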


In [0]:
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# MLflow run ID of the best trial found by AutoML
run_id = model.best_trial.mlflow_run_id
model_name = "dbdemos_mlops_churn"
model_uri = f"runs:/{run_id}/model"

# Record the source feature table and the demographic variables as run tags
client.set_tag(run_id, key='db_table', value=f'{dbName}.dbdemos_mlops_churn_features')
client.set_tag(run_id, key='demographic_vars', value='seniorCitizen,gender_Female')

# Register the best model as a new version in the Model Registry
model_details = mlflow.register_model(model_uri, model_name)

Registered model 'dbdemos_mlops_churn' already exists. Creating a new version of this model...
Created version '57' of model 'dbdemos_mlops_churn'.


In [0]:
model_version_details = client.get_model_version(name=model_name, version=model_details.version)

client.update_model_version(
  name=model_details.name,
  version=model_details.version,
  description="This model version was built using autoML and automatically getting the best model."
)

Out[24]: <ModelVersion: creation_timestamp=1687352733531, current_stage='None', description=('This model version was built using autoML and automatically getting the best '
 'model.'), last_updated_timestamp=1687352739860, name='dbdemos_mlops_churn', run_id='01ef78769730451c8143c8c02f47f6e7', run_link='', source='dbfs:/databricks/mlflow-tracking/402627122880526/01ef78769730451c8143c8c02f47f6e7/artifacts/model', status='READY', status_message='', tags={}, user_id='7644138420879474', version='57'>

In [0]:
import json

# Request a transition to Staging; mlflow_call_endpoint is a helper expected to come from the setup notebook
staging_request = {'name': model_name, 'version': model_details.version, 'stage': 'Staging', 'archive_existing_versions': 'true'}
mlflow_call_endpoint('transition-requests/create', 'POST', json.dumps(staging_request))

Out[25]: {'request': {'creation_timestamp': 1687352740008,
  'user_id': 'quentin.ambard@databricks.com',
  'activity_type': 'REQUESTED_TRANSITION',
  'comment': '',
  'to_stage': 'Staging'}}
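
The `mlflow_call_endpoint` helper is defined outside this notebook, so its implementation is not shown here. As a hedged sketch of what such a call might look like, the snippet below builds (without sending) a `POST` request to the Databricks `transition-requests/create` REST endpoint using only the standard library. The host, token, and function name are hypothetical, and the real helper may differ.

```python
import json
import urllib.request


def build_transition_request(host, token, name, version, stage="Staging", comment=""):
    """Build a POST request asking to move a model version to `stage`.

    Assumes the workspace REST prefix /api/2.0/mlflow/ and bearer-token auth.
    """
    body = {"name": name, "version": str(version), "stage": stage, "comment": comment}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/mlflow/transition-requests/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_transition_request(
    host="https://example.cloud.databricks.com",  # hypothetical workspace URL
    token="<personal-access-token>",              # placeholder, never hard-code real tokens
    name="dbdemos_mlops_churn",
    version=57,
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would submit it; in a job, the token would normally come from a secret scope instead of a literal string.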


## Next: Building a dashboard with Customer Churn information

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/mlops-end2end-flow-dashboard.png" width="600px" style="float:right"/>

We now have all our data ready, including customer churn predictions.

The Churn table, which contains the analysis and the churn predictions, can be shared with the Analyst and Marketing teams.

With Databricks SQL, we can build a Customer Churn monitoring dashboard and start tracking the effect of our marketing campaigns.

For a complete Customer Churn example & dashboards, run `dbdemos.install('lakehouse-retail-churn')`.