# New Model deployment with A/B testing


<img style="float: right;" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/fsi/fraud-detection/model-serving-ab-testing.png" width="800px" />

Our new model is now saved in our Registry.

Our next step is to deploy it while making sure it behaves as expected. We want to roll out the new version behind the REST API while:

* Avoiding any production outage
* Gradually routing requests to the new model
* Supporting auto-scaling and potential traffic bursts
* Running A/B tests to confirm the new model delivers better outcomes
* Monitoring model outcomes and technical metrics (CPU, load, etc.)

Databricks simplifies this process with Serverless Model Serving endpoints.



In [0]:
%pip install databricks-sdk==0.36.0 mlflow==2.19.0
dbutils.library.restartPython()

In [0]:
%run ../_resources/00-setup $reset_all_data=false


## Routing our Model Serving endpoint to multiple models
<img style="float: right; margin-left: 10px" width="700px" src="https://cms.databricks.com/sites/default/files/inline-images/db-498-blog-imgs-1.png" />

Databricks Model Serving endpoints allow you to serve different models and dynamically redirect a subset of the traffic to a given model.

Open your <a href="#mlflow/endpoints/dbdemos_fsi_fraud_endpoint" target="_blank"> Model Serving Endpoint</a>, edit the configuration, and add our second model.

Select the share of traffic you want to send to the new model (for example, 10%) and save. Databricks will handle the rest for you.

Your endpoint will automatically bundle the new model and start routing a subset of your queries to it.

Let's see how this can be done using the API.

In [0]:
from databricks.sdk.service.serving import ServedEntityInput, TrafficConfig, Route
from databricks.sdk import WorkspaceClient
from mlflow import MlflowClient
from datetime import timedelta

model_name = f"{catalog}.{db}.dbdemos_fsi_fraud"
serving_endpoint_name = "dbdemos_fsi_fraud_endpoint"

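w = WorkspaceClient()
mlflow_client = MlflowClient(registry_uri="databricks-uc")
# Serve both model versions behind the same endpoint, resolving each version from its registry alias
served_entities=[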
        ServedEntityInput(
            name="prod_model",
            entity_name=model_name,
            entity_version=mlflow_client.get_model_version_by_alias(model_name, "prod").version,
            scale_to_zero_enabled=True,
            workload_size="Small"
        ),
        ServedEntityInput(
            name="candidate_model",
            entity_name=model_name,
            entity_version=mlflow_client.get_model_version_by_alias(model_name, "candidate").version,
            scale_to_zero_enabled=True,
            workload_size="Small"
        )
    ]
# Keep 90% of the requests on the current model and send 10% to the candidate
traffic_config=TrafficConfig(routes=[
        Route(
            served_model_name="prod_model",
            traffic_percentage=90
        ),
        Route(
            served_model_name="candidate_model",
            traffic_percentage=10
        )
    ])

print('Updating the endpoint, this can take several minutes, please wait...')
w.serving_endpoints.update_config_and_wait(name=serving_endpoint_name, served_entities=served_entities, traffic_config=traffic_config, timeout=timedelta(minutes=30))


## Our new model is now serving 10% of our requests

Open your <a href="#mlflow/endpoints/dbdemos_fsi_fraud_endpoint" target="_blank"> Model Serving Endpoint</a> to view the changes and track the performance of both models.
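Once the candidate looks healthy, its share of traffic can be raised in steps rather than all at once. The following is a minimal sketch of such a ramp-up plan in plain Python. The helper `ramp_schedule` and the step values are hypothetical and not part of the demo. It only computes the percentage pairs and does not call the serving API.

```python
def ramp_schedule(candidate_steps):
    """Return (prod_pct, candidate_pct) pairs, one per rollout step."""
    schedule = []
    previous = 0
    for candidate_pct in candidate_steps:
        if not 0 <= candidate_pct <= 100:
            raise ValueError(f"traffic percentage out of range: {candidate_pct}")
        if candidate_pct < previous:
            raise ValueError("candidate share should not decrease during a ramp-up")
        # The two routes must always add up to 100%
        schedule.append((100 - candidate_pct, candidate_pct))
        previous = candidate_pct
    return schedule

# Hypothetical steps: 10% (the current split), then 25%, 50% and a full cut-over
for prod_pct, candidate_pct in ramp_schedule([10, 25, 50, 100]):
    print(f"prod_model={prod_pct}%  candidate_model={candidate_pct}%")
```

Each pair could then be used as the `traffic_percentage` of the two `Route` objects defined earlier, re-running `update_config_and_wait` after checking the metrics at each step.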

In [0]:
import mlflow
from mlflow.store.artifact.models_artifact_repo import ModelsArtifactRepository
from mlflow.models.model import Model

mlflow.set_registry_uri('databricks-uc')
p = ModelsArtifactRepository(f"models:/{model_name}@prod").download_artifacts("")
dataset =  {"dataframe_split": Model.load(p).load_input_example(p).to_dict(orient='split')}

In [0]:
from mlflow import deployments
client = mlflow.deployments.get_deploy_client("databricks")
# Call the endpoint several times so the results are tracked in the model endpoint inference table
for i in range(10):
    predictions = client.predict(endpoint=serving_endpoint_name, inputs=dataset)
    print(predictions)


## Model monitoring and A/B testing analysis

Because Model Serving runs within our Lakehouse, Databricks will automatically save and track all our Model Endpoint results as a Delta Table.

We can then easily plug in a feedback loop to start analyzing the revenue (in $) each model generates.

All these metrics, including A/B testing validation (p-values, etc.), can then be plugged into a Model Monitoring Dashboard. Alerts can be sent on errors, potentially triggering model retraining or programmatically updating the Endpoint routes to fall back to another model.
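As an illustration of the p-value validation mentioned above, here is a minimal sketch of a two-sided two-proportion z-test using only the standard library. It assumes you have already counted, for each model, how many served requests had a "successful" outcome (for example, a correctly flagged transaction once labels are available). The function name and the counts below are hypothetical and do not come from the demo data.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for the difference between two success rates."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one observation")
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the null hypothesis that both models perform the same
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("no variance in outcomes, the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: A = prod_model, B = candidate_model
z, p_value = two_proportion_z_test(success_a=420, total_a=900, success_b=60, total_b=100)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

With a 90/10 split the candidate group fills up slowly, so it can take a while before the test has enough observations to separate the two models.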


<img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/fsi/fraud-detection/model-serving-monitoring.png" width="1200px" />
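The fallback idea described above can be reduced to a small decision rule. The sketch below is an assumption for illustration only: the function `fallback_routes`, the 5% error threshold and the default 10% candidate share are hypothetical, and the error rate would have to come from your own monitoring query. It returns the desired split as a plain dictionary and does not update the endpoint itself.

```python
def fallback_routes(candidate_error_rate, max_error_rate=0.05, candidate_pct=10):
    """Return the desired traffic split, keyed by served model name."""
    if not 0 <= candidate_pct <= 100:
        raise ValueError(f"traffic percentage out of range: {candidate_pct}")
    if candidate_error_rate > max_error_rate:
        # Candidate is misbehaving: send all traffic back to the current model
        return {"prod_model": 100, "candidate_model": 0}
    # Candidate is healthy: keep the configured A/B split
    return {"prod_model": 100 - candidate_pct, "candidate_model": candidate_pct}

# Hypothetical error rates observed for the candidate model
print(fallback_routes(candidate_error_rate=0.12))
print(fallback_routes(candidate_error_rate=0.01))
```

The resulting percentages could be turned into `Route` objects and applied with the same `update_config_and_wait` call used earlier, for example from a scheduled job that reads the monitoring metrics.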


## Conclusion: the power of the Lakehouse

In this demo, we've seen an end-to-end flow with the Lakehouse:

- Data ingestion made simple with Delta Live Tables
- Leveraging a Databricks SQL warehouse to analyze existing fraud
- Model training with AutoML for citizen data scientists
- Ability to tune our model for better results, improving our revenue
- Ultimately, the ability to deploy and track our models in real time, made possible with the full Lakehouse capabilities.

[Go back to the introduction]($../00-FSI-fraud-detection-introduction-lakehouse) or discover how to use Databricks Workflows to orchestrate these tasks: [05-Workflow-orchestration-fsi-fraud]($../05-Workflow-orchestration/05-Workflow-orchestration-fsi-fraud)