# End-to-End MLOps demo with MLflow, AutoML and Models in Unity Catalog

## Challenges moving ML projects into production

Moving an ML project from a standalone notebook to a production-grade data pipeline is complex and requires multiple competencies.

Having a model up and running in a notebook isn't enough. We need to cover the end-to-end ML project life cycle and solve the following challenges:

* How to keep data up to date over time (production-grade ingestion pipeline)
* How to save, share and re-use ML features in the organization
* How to ensure a new model version meets quality standards and won't break the pipeline
* Model governance: what is deployed, how was it trained, by whom, and on which data?
* How to monitor and re-train the model...

In addition, these projects typically involve multiple teams, creating friction and potential silos:

* Data Engineers, in charge of ingesting, preparing and exposing the data
* Data Scientists, experts in data analysis who build the ML models
* ML Engineers, who set up the ML infrastructure and pipelines (similar to DevOps)

This has a real impact on the business, slowing projects down and preventing them from being deployed to production and delivering ROI.

## What's MLOps?

MLOps is a set of standards, tools, processes and methodologies that aims to optimize time, efficiency and quality while ensuring governance in ML projects.

MLOps orchestrates a project life cycle and adds the glue required between the components and teams to smoothly implement such ML pipelines.

Databricks is uniquely positioned to solve this challenge with the Lakehouse pattern. Not only do we bring Data Engineers, Data Scientists and ML Engineers together on a single platform, but we also provide tools to orchestrate ML projects and accelerate the path to production.

## MLOps process walkthrough

In this quickstart demo, we'll walk through a few common steps in the MLOps process. The end result of this process is a model used to power a dashboard for downstream business stakeholders. The steps are:
* preparing features
* training a model for deployment
* registering the model so that its use can be governed
* validating the model in a champion-challenger analysis
* invoking a trained ML model as a PySpark UDF


<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/mlops/mlops-uc-end2end-0.png?raw=true" width="1200">
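
The champion-challenger validation step listed above can be illustrated with a minimal, library-free sketch. It is not the demo's actual validation code: the `ModelCandidate` class, the `pick_champion` function, the F1 metric and the `min_improvement` threshold are all hypothetical names and choices introduced here for illustration. The idea is simply that a new model version (the challenger) replaces the current one (the champion) only if it scores better on the same held-out data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCandidate:
    """Hypothetical container for a model version and its validation metric."""
    name: str
    version: int
    f1_score: float  # assumed to be computed on a shared held-out validation set


def pick_champion(
    champion: Optional[ModelCandidate],
    challenger: ModelCandidate,
    min_improvement: float = 0.0,
) -> ModelCandidate:
    """Return the candidate that should hold the champion role.

    The challenger is promoted only if it beats the current champion by more
    than `min_improvement`. With no existing champion, the challenger wins.
    """
    if champion is None:
        return challenger
    if challenger.f1_score - champion.f1_score > min_improvement:
        return challenger
    return champion


# Illustrative values only, not results from the demo dataset.
current = ModelCandidate("churn_model", version=1, f1_score=0.60)
candidate = ModelCandidate("churn_model", version=2, f1_score=0.75)

winner = pick_champion(current, candidate, min_improvement=0.01)
print(f"Champion: {winner.name} v{winner.version}")  # Champion: churn_model v2
```

In a real pipeline the comparison would typically cover several checks (metrics on business-relevant slices, schema and signature checks, and so on), and the outcome would be recorded against the registered model rather than printed.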


### A cluster has been created for this demo
To run this demo, just select the cluster `dbdemos-mlops-end2end-edgar_aguilerarod` from the dropdown menu ([open cluster configuration](https://dbc-07122dbb-1c85.cloud.databricks.com/#setting/clusters/0102-173414-9ev1v92w/configuration)). <br />
*Note: If the cluster was deleted after 30 days, you can re-create it with `dbdemos.create_cluster('mlops-end2end')` or re-install the demo: `dbdemos.install('mlops-end2end')`*

In this first quickstart, we'll cover the foundations of MLOps.

The advanced section will go into more detail, including:
- Model serving
- Real-time feature serving with Online Tables
- A/B testing 
- Automated re-training
- Infra setup and hooks with Databricks MLOps Stack
- ...

In [0]:
%restart_python

In [0]:
%run ../_resources/00-setup

## Customer churn detection

To explore MLOps, we'll be implementing a customer churn model.

Our marketing team asked us to create a Dashboard tracking Churn risk evolution. In addition, we need to provide our renewal team with a daily list of customers at Churn risk to increase our final revenue.

Our Data Engineering team provided us with a dataset collecting information on our customer base, including churn information. That's where our implementation starts.

Let's see how we can implement such a model, but also provide our marketing and renewal team with Dashboards to track and analyze our Churn prediction.

Ultimately, you'll be able to build a complete DBSQL Churn Dashboard containing all our customer & churn information, and also start a Genie space to ask any question in plain English!

In [0]:
# Load the customer dataset (including churn information) provided by the Data Engineering team
telcoDF = spark.table("mlops_churn_bronze_customers")
display(telcoDF)

## Feature Engineering
Our first job is to analyze the data and prepare a set of features.


Next: [Analyze the data and prepare features]($./01_feature_engineering)