# 04d. Deploy Model as Serving Endpoint

**Purpose**: Create or update a Databricks serving endpoint that serves the production model

**Prerequisites**:
* A registered model version carries the 'production' alias (set in 04c_set_production_alias.ipynb)

**Outputs**:
* Live serving endpoint
* Endpoint URL for real-time predictions

In [0]:
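# Install dependencies
# Quote version specifiers so ">=" is not treated as a shell redirect,
# and use %pip (not !pip) so packages are installed for this notebook's environment
%pip install --upgrade "typing_extensions>=4.6.0" "pydantic>=2.0.0" --quiet
%pip install databricks-sdk --quiet
%pip install -r /Workspace/Users/ashish.kamboj@tigeranalytics.com/home-credit-hyperpersonalization/requirements.txt --quiet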

dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
import sys
import os
from mlflow.tracking import MlflowClient
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

# Add project root to path
project_root = os.path.dirname(os.getcwd()) if os.getcwd().endswith('notebooks') else os.getcwd()
if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils.common_utils import load_config, setup_logging, print_section_header

print("✅ Imports successful")

✅ Imports successful


In [0]:
config = load_config('../config/config.yaml')
setup_logging(config)

print_section_header("Deploy Serving Endpoint")

model_name = config['mlflow']['databricks']['registered_model_name']
endpoint_name = config['deployment']['endpoint_name']

print(f"Model: {model_name}")
print(f"Endpoint: {endpoint_name}")


                            Deploy Serving Endpoint                             

Model: datafabric_catalog.customer_hc_silver.next_best_product_model
Endpoint: next-best-product-endpoint
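
The cell above reads two nested keys from the loaded config: `mlflow.databricks.registered_model_name` and `deployment.endpoint_name`. As a minimal sketch of how a missing key could be caught early with a readable message, the snippet below validates a plain dictionary. The helpers `get_nested` and `validate_config` are hypothetical (they are not part of `utils.common_utils`), and `example_config` only mirrors the two values printed above; the real `config.yaml` presumably contains more.

```python
from typing import Any, Tuple

# Key paths the deployment notebook reads from the config
REQUIRED_KEYS = [
    ("mlflow", "databricks", "registered_model_name"),
    ("deployment", "endpoint_name"),
]


def get_nested(config: dict, path: Tuple[str, ...]) -> Any:
    """Walk a nested dict along `path`, raising a descriptive KeyError if a key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"Missing config key: {'.'.join(path)}")
        node = node[key]
    return node


def validate_config(config: dict) -> None:
    """Check that every required key path is present (hypothetical helper)."""
    for path in REQUIRED_KEYS:
        get_nested(config, path)


# Illustrative config containing only the two values shown in the output above
example_config = {
    "mlflow": {
        "databricks": {
            "registered_model_name": "datafabric_catalog.customer_hc_silver.next_best_product_model"
        }
    },
    "deployment": {"endpoint_name": "next-best-product-endpoint"},
}

validate_config(example_config)
print(get_nested(example_config, ("deployment", "endpoint_name")))
```

Calling `validate_config(config)` right after `load_config(...)` would surface a misnamed key before any MLflow or serving call is made.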


In [0]:
# Get model version with 'production' alias
client = MlflowClient()

print("Looking for model with 'production' alias...")

try:
    production_version = client.get_model_version_by_alias(model_name, 'production')
    model_version = production_version.version
    print(f"✅ Found Version {model_version} with 'production' alias")
    print(f"Model URI: models:/{model_name}@production")
    
except Exception as e:
    print("❌ No model found with 'production' alias")
    print("\nAvailable versions:")
    all_versions = client.search_model_versions(f"name='{model_name}'")
    for v in sorted(all_versions, key=lambda x: int(x.version)):
        # Fetch each version individually to read its aliases
        version_details = client.get_model_version(model_name, v.version)
        aliases = getattr(version_details, 'aliases', [])
        print(f"  Version {v.version}: Aliases={aliases}")
    # Chain the original exception so the root cause stays visible in the traceback
    raise ValueError("Please run 04c_set_production_alias.ipynb first") from e

Looking for model with 'production' alias...
✅ Found Version 5 with 'production' alias
Model URI: models:/datafabric_catalog.customer_hc_silver.next_best_product_model@production


In [0]:
# Create or update serving endpoint
from databricks.sdk.errors import NotFound

print(f"\nDeploying endpoint: {endpoint_name}")
print(f"Model: {model_name}")
print(f"Version: {model_version}")
print("\n⚠️ Make sure you've retrained the model with pyarrow==20.0.0 (see cell 6)")

w = WorkspaceClient()

# The create and update paths serve the same entity configuration
served_entities = [
    ServedEntityInput(
        entity_name=model_name,
        entity_version=str(model_version),
        scale_to_zero_enabled=True,
        workload_size="Small"
    )
]

# Check if endpoint already exists; only a "not found" error means we should create it
try:
    w.serving_endpoints.get(endpoint_name)
    endpoint_exists = True
except NotFound:
    endpoint_exists = False

try:
    if endpoint_exists:
        print(f"\n⚠️ Endpoint '{endpoint_name}' already exists. Updating...")
        w.serving_endpoints.update_config_and_wait(
            name=endpoint_name,
            served_entities=served_entities
        )
        print(f"✅ Endpoint updated with version {model_version}")
    else:
        print(f"\nCreating new endpoint '{endpoint_name}'...")
        w.serving_endpoints.create_and_wait(
            name=endpoint_name,
            config=EndpointCoreConfigInput(served_entities=served_entities)
        )
        print("✅ Endpoint created successfully!")

    print("\n✅ Deployment successful!")

except Exception as e:
    # Report the failure, then re-raise so later cells do not run against a failed deployment
    print(f"❌ Error: {e}")
    raise


Deploying endpoint: next-best-product-endpoint
Model: datafabric_catalog.customer_hc_silver.next_best_product_model
Version: 5

⚠️ Make sure you've retrained the model with pyarrow==20.0.0 (see cell 6)

⚠️ Endpoint 'next-best-product-endpoint' already exists. Updating...
✅ Endpoint updated with version 5

✅ Deployment successful!


In [0]:
# Get endpoint details and URL
try:
    endpoint = w.serving_endpoints.get(endpoint_name)
    # w.config.host already includes the scheme (https://), so do not prepend it again
    endpoint_url = f"{w.config.host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    
    print("\n" + "=" * 80)
    print("SERVING ENDPOINT DETAILS")
    print("=" * 80)
    
    print(f"\nEndpoint Name: {endpoint.name}")
    print(f"Endpoint ID: {endpoint.id}")
    print(f"State: {endpoint.state.ready if endpoint.state else 'Unknown'}")
    
    if endpoint.config and endpoint.config.served_entities:
        print(f"\nServed Model:")
        for entity in endpoint.config.served_entities:
            print(f"  Model: {entity.entity_name}")
            print(f"  Version: {entity.entity_version}")
            print(f"  Workload Size: {entity.workload_size}")
            print(f"  Scale to Zero: {entity.scale_to_zero_enabled}")
    
    print(f"\n🔗 Endpoint URL: {endpoint_url}")
    
except Exception as e:
    print(f"❌ Error getting endpoint details: {str(e)}")


SERVING ENDPOINT DETAILS

Endpoint Name: next-best-product-endpoint
Endpoint ID: da1c193a3bb74a7081a5ab42d9ce4d6d
State: EndpointStateReady.READY

Served Model:
  Model: datafabric_catalog.customer_hc_silver.next_best_product_model
  Version: 5
  Workload Size: Small
  Scale to Zero: True

🔗 Endpoint URL: https://adb-1364099644588382.2.azuredatabricks.net/serving-endpoints/next-best-product-endpoint/invocations
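
The invocation URL is the workspace host followed by `/serving-endpoints/<endpoint name>/invocations`. Since a host value may or may not already carry a scheme, a small normalizing helper can keep the result well-formed either way. This is a minimal sketch; `build_invocation_url` is a hypothetical helper and not part of the Databricks SDK.

```python
def build_invocation_url(host: str, endpoint_name: str) -> str:
    """Return the invocations URL for a serving endpoint (hypothetical helper)."""
    host = host.strip().rstrip("/")
    # Prepend a scheme only when the host does not already carry one
    if not host.startswith(("https://", "http://")):
        host = f"https://{host}"
    return f"{host}/serving-endpoints/{endpoint_name}/invocations"


# Both forms of the host yield the same URL
print(build_invocation_url("adb-1364099644588382.2.azuredatabricks.net", "next-best-product-endpoint"))
print(build_invocation_url("https://adb-1364099644588382.2.azuredatabricks.net/", "next-best-product-endpoint"))
```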


In [0]:
print_section_header("Endpoint Deployment Summary")

print(f"""
✅ Serving Endpoint Deployed!

Endpoint Name: {endpoint_name}
Model: {model_name}
Version: {model_version}
Alias: production

🔗 Endpoint URL: {endpoint_url}

📝 How to Use:

1. **Databricks UI**: 
   Serving → {endpoint_name} → Query Endpoint

2. **REST API**:
   POST {endpoint_url}
   Headers: Authorization: Bearer <token>
   Body: {{"dataframe_records": [{{...features...}}]}}

3. **Python SDK**:
   See 07_databricks_realtime_inference.ipynb

👉 Next Steps:
1. Test endpoint using 07_databricks_realtime_inference.ipynb
2. Run batch inference using 05_batch_inference.ipynb
3. Set up monitoring using 06_model_monitoring.ipynb
""")


                          Endpoint Deployment Summary                           


✅ Serving Endpoint Deployed!

Endpoint Name: next-best-product-endpoint
Model: datafabric_catalog.customer_hc_silver.next_best_product_model
Version: 5
Alias: production

🔗 Endpoint URL: https://adb-1364099644588382.2.azuredatabricks.net/serving-endpoints/next-best-product-endpoint/invocations

📝 How to Use:

1. **Databricks UI**: 
   Serving → next-best-product-endpoint → Query Endpoint

2. **REST API**:
   POST https://adb-1364099644588382.2.azuredatabricks.net/serving-endpoints/next-best-product-endpoint/invocations
   Headers: Authorization: Bearer <token>
   Body: {"dataframe_records": [{...features...}]}

3. **Python SDK**:
   See 07_databricks_realtime_inference.ipynb

👉 Next Steps:
1. Test endpoint using 07_databricks_realtime_inference.ipynb
2. Run batch inference using 05_batch_inference.ipynb
3. Set up monitoring using 06_model_monitoring.ipynb
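
The REST API option in the summary can be sketched with the standard library alone. The snippet below only builds the POST request (URL, bearer-token header, and a JSON body with `dataframe_records`); it does not send it. Everything concrete here is a placeholder: `build_invocation_request` is a hypothetical helper, the host `example-workspace.azuredatabricks.net` and the token are stand-ins, and `feature_a` / `feature_b` are invented feature names, since the model's real input schema is not shown in this notebook.

```python
import json
import urllib.request


def build_invocation_request(endpoint_url: str, token: str, records: list) -> urllib.request.Request:
    """Build (but do not send) a scoring request for a serving endpoint."""
    # The body must be valid JSON, so keys and strings use double quotes after serialization
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only
request = build_invocation_request(
    "https://example-workspace.azuredatabricks.net/serving-endpoints/next-best-product-endpoint/invocations",
    token="<token>",
    records=[{"feature_a": 1.0, "feature_b": "x"}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually score, the request would be passed to `urllib.request.urlopen(request)` with a real workspace URL and token; 07_databricks_realtime_inference.ipynb remains the reference for the project's own client code.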

