
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>



# Real-time Deployment with Model Serving (Offline Features)

This demo focuses on **real-time model deployment** using Databricks Model Serving with **offline feature tables** stored in Unity Catalog (Delta tables with a primary key). You’ll train on offline features, deploy two model versions to one endpoint, and validate **A/B testing** traffic splits.

**Learning Objectives**

By the end you will be able to:
- Use a **UC Delta feature table (offline)** for model training and evaluation.
- Register models to Unity Catalog and manage **Champion/Challenger** aliases.
- Create a **Model Serving** endpoint with **two versions** and configure a **traffic split** for A/B testing.
- **Query** the endpoint (batching requests) and **visualize** prediction distribution by served model.
- *(Optional)* Enable **inference table auto-capture** for request/response logging and observability.


## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.
   
   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lesson:

* To run this notebook, you need to use one of the following Databricks runtime(s): **17.3.x-cpu-ml-scala2.13**

* Online Tables must be enabled for the workspace.


## Classroom Setup

Before starting the demo, run the provided classroom setup script. This script will define configuration variables necessary for the demo. Execute the following cell:

In [0]:
%run ../Includes/Classroom-Setup-4.1

**Other Conventions:**

Throughout this demo, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"User DB Location:  {DA.paths.datasets}")

## Offline Feature Tables for Real-Time Inferencing

Before deploying a model for real-time inference, it’s important to understand the role of **feature tables** in Model Serving.

A **feature table** is simply a **materialized Delta table in Unity Catalog** with a defined **primary key**. These tables store curated feature values used during both model training and inference.

In this demo, we’ll focus entirely on **offline feature tables**, which:
- Reside in Unity Catalog as Delta tables.  
- Are typically refreshed on a schedule (for example, daily or hourly batch updates).  
- Can be used directly by Model Serving endpoints without any additional configuration.  

> In contrast to real-time or streaming “online” feature stores (now deprecated as standalone “online tables”), offline tables are suitable for most production workloads where features are precomputed and served from Delta Lake.
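
To make the primary-key requirement concrete, here is a minimal sketch in plain Python that checks toy rows for missing or duplicated keys. The helper `primary_key_violations` and the toy values are hypothetical and are not part of the Feature Engineering API; on a real Delta table the constraint is declared in Unity Catalog rather than checked by hand.

```python
from collections import Counter


def primary_key_violations(rows, key):
    """Return key values that are missing (None) or appear more than once."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(
        (k for k, n in counts.items() if k is None or n > 1),
        key=lambda k: (k is not None, str(k)),  # list None first, then by name
    )


# Toy rows standing in for a feature table (values are made up).
toy_rows = [
    {"customerID": "0001", "tenure": 12},
    {"customerID": "0002", "tenure": 3},
    {"customerID": "0002", "tenure": 4},  # duplicate key
    {"customerID": None, "tenure": 7},    # missing key
]

print(primary_key_violations(toy_rows, "customerID"))  # [None, '0002']
```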

## Real-time Deployment With Offline Feature Tables
Here we consider a scenario where you have already completed the development process (data preparation and model development) and are ready to deploy a model with offline features. We will deploy two models that were created as part of the classroom setup: a champion model and a challenger model, registered with the aliases `champion` and `challenger`, respectively.

We will serve the two models using a 50/50 traffic split for A/B testing. First, let's read in our data and explore its lineage.


### Step 1: Inspect the Offline Feature Table and Model Versions

For this demonstration, we will use a fictional dataset from a telecom company. The dataset covers **customer demographics** and internet subscription details, such as subscription plans, monthly charges, and payment methods.

As part of the classroom setup for this course, a feature table called **features** was created that **did _not_ include feature lookups.** This is the table we will read in the next step.

#### Lineage Inspection
- Navigate to the catalog and schema used with this Vocareum environment (see the output from the previous cell).
- Find the table called `features` and model called `ml_model`. 
  - Click on Lineage. 
  - Click on **See lineage graph** and inspect it. This will show the footprint of how the catalog assets were made.

### Step 2: Read in Features and Response Variable from Feature Store

Here we will read in our dataset and split between features and response variables. We will show how this can be performed with the Databricks SDK using the Feature Engineering Client.

> #### What's the difference between `fe.read_table()` and `spark.read.table()`?
> We use `fe.read_table()` when working specifically with feature tables stored in Feature Store, and `spark.read.table()` for general-purpose reads. `fe.read_table()` is part of the Databricks Feature Engineering API and integrates with other Feature Store APIs, such as those for logging models (see **Part 2: Real-Time Deployment with Online Feature Tables**). `spark.read.table()` is a broader Spark SQL method for reading any table available to the Spark session.

In [0]:
from databricks.feature_engineering import FeatureEngineeringClient

# Initialize Feature Engineering Client
fe = FeatureEngineeringClient()

# Define primary key 
primary_key = "customerID"

# Read in feature table
feature_table_name = f"{DA.catalog_name}.{DA.schema_name}.features"
X_train_df = fe.read_table(name=feature_table_name)
X_train_pdf = X_train_df.drop(primary_key).toPandas()

# Read in response table 
response_table_name = f"{DA.catalog_name}.{DA.schema_name}.response"
Y_train_df = spark.read.table(response_table_name)
Y_train_pdf = Y_train_df.drop(primary_key).toPandas()
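
The cell above drops the primary key from each table independently, which implicitly relies on both tables returning rows in the same order. As a hedged illustration of a safer pattern, the sketch below pairs features with labels by key before the key is dropped. It uses plain Python on toy rows; the helper `align_by_key` and the column names `MonthlyCharges` and `Churn` are assumptions made for the example, not names confirmed by the course dataset.

```python
def align_by_key(feature_rows, response_rows, key):
    """Pair each feature row with its label by primary key, then drop the key."""
    labels = {row[key]: row for row in response_rows}
    X, y = [], []
    for row in feature_rows:
        if row[key] not in labels:
            continue  # skip feature rows that have no matching label
        X.append({k: v for k, v in row.items() if k != key})
        y.append({k: v for k, v in labels[row[key]].items() if k != key})
    return X, y


toy_features = [
    {"customerID": "A", "MonthlyCharges": 70.0},
    {"customerID": "B", "MonthlyCharges": 20.0},
]
# Same customers, deliberately listed in a different order.
toy_response = [
    {"customerID": "B", "Churn": "No"},
    {"customerID": "A", "Churn": "Yes"},
]

X, y = align_by_key(toy_features, toy_response, "customerID")
print(X)  # [{'MonthlyCharges': 70.0}, {'MonthlyCharges': 20.0}]
print(y)  # [{'Churn': 'Yes'}, {'Churn': 'No'}]
```

In the notebook itself, a comparable effect could likely be achieved by joining the two Spark DataFrames on `primary_key` before calling `toPandas()`.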

### Step 3: Real-time A/B Testing with Model Serving

Let's serve the two models registered during the classroom setup using Model Serving. Model Serving supports endpoint management via both the UI and the API.

Instructions for the UI are included below; it is the simpler method of the two. **In this demo, we will use the API to configure and create the endpoint**.

**Both the UI and the API support querying created endpoints in real time**. We will use the API to query the endpoint with a sample of the dataset.


> #### What is A/B Testing? 
> A/B testing is a method to compare two versions of a model or system by splitting user traffic and measuring performance metrics to determine which version delivers better results. 
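
As a rough illustration of traffic splitting, the sketch below simulates a router that assigns each request to a served model in proportion to its traffic percentage. This is a simulation using Python's `random` module, not how Model Serving routes requests internally; the helper `route_requests` and the route names `model-A` and `model-B` are hypothetical.

```python
import random
from collections import Counter


def route_requests(routes, n_requests, seed=None):
    """Assign each request to a served model in proportion to its traffic percentage."""
    rng = random.Random(seed)
    names = [r["served_model_name"] for r in routes]
    weights = [r["traffic_percentage"] for r in routes]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


# Hypothetical route names; the percentages mirror a 50/50 split.
toy_routes = [
    {"served_model_name": "model-A", "traffic_percentage": 50},
    {"served_model_name": "model-B", "traffic_percentage": 50},
]

# Small request counts can drift far from 50/50; large counts settle closer to it.
for n in (10, 1000):
    print(n, dict(route_requests(toy_routes, n, seed=0)))
```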

In [0]:
endpoint_name = f"ML_AS_03_Demo4_{DA.unique_name('_')}"
print(f"Endpoint name: {endpoint_name}")

### Option 1: Serve model(s) using UI

After registering the model (or new versions of it) to the model registry, you can provision a serving endpoint via the UI by following the steps below.

1. In the left sidebar, click **Serving**.

2. To create a new serving endpoint, click **Create serving endpoint**.   
  
    a. In the **Name** field, enter the name printed above.  
  
    b. Click in the **Entity** field. A dialog appears. Go to **My models**, and then select **ml_model** from the drop-down menu.

    c. Click **Confirm**.
  
    d. In the **Version** drop-down menu, select **version 1**.
  
    e. Make sure the **Compute Scale-out** field is set to Custom.
  
    f. *[OPTIONAL]* To deploy another model (e.g., for A/B testing):
    - Click on **+Add served entity**.
    - Enter the same details as above, but use **version 2**.
    - Set the traffic split to 50% for each model.
  
    g. Click **Create**. The endpoint page opens and the endpoint creation process starts.   
  
See the Databricks documentation for details ([AWS](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html#ui-workflow)|[Azure](https://learn.microsoft.com/azure/databricks/machine-learning/model-serving/create-manage-serving-endpoints#--ui-workflow)).

### Option 2: Serve Model(s) Using the Databricks Python SDK


#### Get Models to Serve

To serve the model, we will initialize the MLflow client with `MlflowClient` and the workspace client with `WorkspaceClient`. We will configure the MLflow client to point to Unity Catalog instead of the workspace registry with `set_registry_uri("databricks-uc")`. The workspace client will be used to create the model serving endpoint.

In [0]:
import mlflow
from mlflow.tracking import MlflowClient
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointTag

# Point to UC model registry
mlflow.set_registry_uri("databricks-uc")
# Initialize MLflow client
client = MlflowClient()
# Initialize workspace client
w = WorkspaceClient()

Define the variables used to configure the endpoint, such as `model_name`. The output of the next cell will show version 1 of our model registered as the champion and version 2 as the challenger.

In [0]:
# Define model name
model_name = f"{DA.catalog_name}.{DA.schema_name}.ml_model"
# Parse model name from UC namespace
served_model_name =  model_name.split('.')[-1]
# Define the endpoint name
endpoint_name = f"ML_AS_03_Demo4_{DA.unique_name('_')}"

# Get version of our model registered to UC as a part of the classroom setup
model_version_champion = client.get_model_version_by_alias(name=model_name, alias="Champion").version # Get champion version
model_version_challenger = client.get_model_version_by_alias(name=model_name, alias="Challenger").version # Get challenger version


print(f"Model version Champion: {model_version_champion}")
print(f"Model version Challenger: {model_version_challenger}")

#### Configure

Define our model serving endpoint with `endpoint_config`. The configuration below deploys two versions of the same model (`model_version_champion` and `model_version_challenger`) and specifies how traffic is split between them at inference time.

In [0]:
from databricks.sdk.service.serving import EndpointCoreConfigInput

endpoint_config_dict = {
    "served_models": [
        {
            "model_name": model_name,
            "model_version": model_version_champion,
            "scale_to_zero_enabled": True,
            "workload_size": "Small"
        },
        {
            "model_name": model_name,
            "model_version": model_version_challenger,
            "scale_to_zero_enabled": True,
            "workload_size": "Small"
        },
    ],
    "traffic_config": {
        "routes": [
            {"served_model_name": f"{served_model_name}-{model_version_champion}", "traffic_percentage": 50},
            {"served_model_name": f"{served_model_name}-{model_version_challenger}", "traffic_percentage": 50},
        ]
    },
    "auto_capture_config": {
        "catalog_name": DA.catalog_name,
        "schema_name": DA.schema_name,
        "table_name_prefix": "db_academy"  # Prefix for the inference table name
    }
}


endpoint_config = EndpointCoreConfigInput.from_dict(endpoint_config_dict)
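
Before submitting a configuration like the one above, it can help to sanity-check it locally. The sketch below is a hypothetical helper, `check_traffic_config`, that works on a plain dict. It assumes that route names follow the `<model>-<version>` pattern used in the configuration above and that the service expects traffic percentages to sum to 100; neither rule is taken from the SDK itself, and the toy catalog and schema names are made up.

```python
def check_traffic_config(config):
    """Return a list of problems found in a served_models/traffic_config dict."""
    problems = []
    # Expected route names, derived as "<model>-<version>" from each served model.
    expected = {
        f"{m['model_name'].split('.')[-1]}-{m['model_version']}"
        for m in config["served_models"]
    }
    routes = config["traffic_config"]["routes"]
    routed = {r["served_model_name"] for r in routes}
    if routed != expected:
        problems.append(
            f"route names {sorted(routed)} != served models {sorted(expected)}"
        )
    total = sum(r["traffic_percentage"] for r in routes)
    if total != 100:
        problems.append(f"traffic percentages sum to {total}, expected 100")
    return problems


# Toy configuration with a deliberate mistake (50 + 40 = 90).
toy_config = {
    "served_models": [
        {"model_name": "cat.sch.ml_model", "model_version": "1"},
        {"model_name": "cat.sch.ml_model", "model_version": "2"},
    ],
    "traffic_config": {
        "routes": [
            {"served_model_name": "ml_model-1", "traffic_percentage": 50},
            {"served_model_name": "ml_model-2", "traffic_percentage": 40},
        ]
    },
}

print(check_traffic_config(toy_config))
# ['traffic percentages sum to 90, expected 100']
```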

#### Serve the endpoint
Use the configuration just created to serve the model.
> Creating the model serving endpoint is expected to take less than 1 minute.

In [0]:
try:
  w.serving_endpoints.create(
    name=endpoint_name,
    config=endpoint_config,
    tags=[EndpointTag.from_dict({"key": "db_academy", "value": "serve_fs_model_example"})]
  )
  print(f"Creating endpoint {endpoint_name} with models {model_name} versions {model_version_champion} & {model_version_challenger}")

except Exception as e:
  # Tolerate re-runs: the endpoint may already exist from a previous execution
  if "already exists" in str(e):
    print(f"Endpoint with name {endpoint_name} already exists")

  else:
    raise

#### Verify Endpoint Creation

Let's verify that the endpoint is created and ready to be used for inference using the `assert` command, which is used to check whether a given condition is true.

In [0]:
endpoint = w.serving_endpoints.wait_get_serving_endpoint_not_updating(endpoint_name)

assert endpoint.state.config_update.value == "NOT_UPDATING" and endpoint.state.ready.value == "READY" , "Endpoint not ready or failed"
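
Conceptually, waiting for an endpoint amounts to polling its state until it is ready or a timeout expires. The sketch below shows that general pattern with a fake state source; it is not the SDK's implementation, and `wait_until_ready`, its parameters, and the default timeout are hypothetical choices made for the example.

```python
import time


def wait_until_ready(get_state, timeout_s=1200.0, poll_s=10.0, sleep=time.sleep):
    """Poll get_state() until it returns 'READY'; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "READY":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        sleep(poll_s)


# Fake state source standing in for an endpoint status call.
fake_states = iter(["NOT_READY", "NOT_READY", "READY"])
print(wait_until_ready(lambda: next(fake_states), poll_s=0.0))  # READY
```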

#### Query the Endpoint and Visualize

Here we will use the training dataset to query our endpoint.

1. Define the dataset to sample from.
1. Query by batch to highlight model-split traffic. 

In [0]:
dataframe_records = X_train_pdf.iloc[:1000].to_dict(orient='records') #1k sample records

Here we query in batches of 100 rows so we can see how traffic is split across requests (the full dataset has around 2,000 rows; we use the first 1,000).

To help visualize the A/B testing output, create a visual using the UI (you only need to do this once; rerunning the cell will update the visualization).
1. After running the next cell, select the + sign on the second table and select **Visualization**. 
1. The default visual should represent the Yes/No split per model.

> Since the dataset we're working with is not very large, you might have to run the cell a few times to get a fairly close 50/50 split. 
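
To see why a few reruns may be needed, note that 1,000 records in batches of 100 means only 10 requests are routed. Assuming each request is routed independently with probability 0.5 (an assumption about the router, not something stated by the service), the number of requests reaching one model follows a binomial distribution. The hypothetical helper `prob_within` below computes how likely a near-even split is.

```python
from math import comb


def prob_within(n_requests, tolerance, p=0.5):
    """P(|k - n*p| <= tolerance) where k ~ Binomial(n_requests, p)."""
    center = n_requests * p
    return sum(
        comb(n_requests, k) * p**k * (1 - p) ** (n_requests - k)
        for k in range(n_requests + 1)
        if abs(k - center) <= tolerance
    )


# 10 batch requests: chance of an exact 5/5 split, then of 4/6 or closer.
print(round(prob_within(10, 0), 3))  # 0.246
print(round(prob_within(10, 1), 3))  # 0.656
```

Under these assumptions, an exact 50/50 split occurs in roughly one run out of four, so uneven splits on a single run are expected.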

In [0]:
import pandas as pd

print("Inference results:")

batch_size = 100  # Number of records per batch
num_batches = (len(dataframe_records) + batch_size - 1) // batch_size  # Total number of batches

all_predictions = []
all_models = []

# Process data in batches
for i in range(num_batches):
    batch_records = dataframe_records[i * batch_size:(i + 1) * batch_size]  # Slice batch

    # Query the model serving endpoint
    query_response = w.serving_endpoints.query(name=endpoint_name, dataframe_records=batch_records)

    # Collect predictions and model served details
    all_predictions.extend(query_response.predictions)
    all_models.extend([query_response.served_model_name] * len(query_response.predictions))  # Duplicate model name per prediction

# Convert to DataFrame
results_df = pd.DataFrame({
    "prediction": all_predictions,
    "model_served": all_models
})

# Count occurrences of predictions
count_results = results_df['prediction'].value_counts().reset_index()
count_results.columns = ['prediction', 'count']

# Display aggregated count of predictions
display(count_results)

# Aggregate count of predictions per model
model_count_results = results_df.groupby(["model_served", "prediction"]).size().reset_index(name="count")

# Display results grouped by model and prediction type
display(model_count_results)
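
The batch slicing in the cell above can also be written as a small generator, which avoids computing the number of batches by hand. This is an alternative sketch, not the notebook's own code; `iter_batches` and the toy records are hypothetical.

```python
def iter_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


toy_records = [{"row": i} for i in range(250)]
sizes = [len(batch) for batch in iter_batches(toy_records, 100)]
print(sizes)  # [100, 100, 50]
```

The generator yields the same number of batches as the ceiling division `(len(records) + batch_size - 1) // batch_size` used above, with the final batch holding any remainder.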


## Conclusion

This demonstration showed how to deploy and serve machine learning models in real time using **Databricks Model Serving** with **offline feature tables** stored in Unity Catalog.  
You learned how to:

- Register and manage models with **Champion/Challenger** aliases.  
- Configure a **Model Serving endpoint** with multiple model versions for **A/B testing**.  
- Send inference requests and visualize traffic distribution between model versions.  

This workflow highlights how Databricks simplifies real-time deployment and model experimentation using reliable, batch-refreshed **offline Delta feature tables**.


&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>