# 🚀 Deployment — **Example Notebook** (SageMaker-ready)

This notebook **continues after** `evaluation_example.ipynb` and `validation_example.ipynb` to **deploy** a validated candidate to production using **Amazon SageMaker**.

### 🎯 Goals of this step
- **Serve predictions reliably**: create a secure, scalable, observable inference endpoint (real-time or serverless).
- **Safe rollout**: canary / blue‑green traffic shifting, plus a **rollback helper** for reverting to an earlier EndpointConfig.
- **Production guardrails**: data capture, CloudWatch logs, and autoscaling, with configuration hooks for alarms and **Model Monitor** schedules.
- **Governance**: source the **Approved** model from the **Model Registry** and tag artifacts for lineage.
- **Idempotency**: with a fixed `ENDPOINT_NAME`, re‑runs update the existing endpoint rather than creating a duplicate.

> Places you must customize are marked with **`# <- TODO ✏️`**.


## 🧰 Prerequisites
Uncomment the line below if your kernel is missing packages (Studio images usually include most of them):

In [None]:
# %pip install boto3 sagemaker pandas numpy s3fs pyarrow

## 🏆 Champion Auto‑Selection & Lineage

This notebook **automatically locates the champion** model in the **SageMaker Model Registry** based on a tag such as **`candidate_run_id`** emitted by `validation_example.ipynb`.  
It then propagates **lineage tags** (e.g., `candidate_run_id`, `data_version`, `eval_mean_recall_at_target`) to:
- Model (created from the ModelPackage or from an artifact container)
- EndpointConfig
- Endpoint

> Customize the keys you emit from validation and the keys you want to tag onto the deployed resources. Look for **`# <- TODO ✏️`** markers below.

## 🚪 SageMaker Studio Bootstrap

In [None]:
import os, json, time
from pathlib import Path

import boto3

# Fallbacks from the environment so later cells still find these names outside SageMaker
region = os.getenv("AWS_REGION", "")
role = os.getenv("SAGEMAKER_ROLE", "")
bucket = os.getenv("SM_DEFAULT_BUCKET", "")
try:
    import sagemaker
    sm_sess = sagemaker.Session()
    region = boto3.Session().region_name or region
    try:
        role = sagemaker.get_execution_role()
    except Exception:
        pass  # keep SAGEMAKER_ROLE from the environment
    bucket = sm_sess.default_bucket()
    print("✅ SageMaker context")
    print(" Region:", region)
    print(" Role:  ", role)
    print(" Bucket:", bucket)
    # Export the resolved values; later cells read them through os.getenv
    for key, value in (("AWS_REGION", region), ("SAGEMAKER_ROLE", role), ("SM_DEFAULT_BUCKET", bucket)):
        if value:
            os.environ.setdefault(key, value)
except Exception as e:
    print("ℹ️ Running without SageMaker context. Reason:", e)

## ⚙️ Configuration — **edit here**

In [None]:
from datetime import datetime, timezone

# datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
TS = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

CONFIG = {
    "lineage": {
        "candidate_run_id": os.getenv("CANDIDATE_RUN_ID",""),  # <- TODO ✏️ If set, we will look for a model package tagged with this run id
        "require_champion_tag": True,                           # <- TODO ✏️ If True, prefer packages tagged is_champion=true
        # Map tags -> keys you expect from validation time (either Tags or CustomerMetadataProperties on ModelPackage)
        "expected_keys": [                                      # <- TODO ✏️ customize your lineage keys
            "candidate_run_id", "data_version", "feature_schema_path",
            "splits_path", "eval_mean_recall_at_target", "eval_roc_auc"
        ]
    },

    "source": {
        "type": os.getenv("MODEL_SOURCE","registry"),                       # 'registry' | 'artifact'  # <- TODO ✏️
        "package_group": os.getenv("MODEL_PACKAGE_GROUP","churn-model-group"), # <- TODO ✏️ (if using registry)
        "approval_status": os.getenv("MODEL_APPROVAL_STATUS","Approved"),      # Approved | PendingManualApproval
        "specific_package_arn": os.getenv("MODEL_PACKAGE_ARN",""),             # optional precise ARN
        # For 'artifact' source (BYOC or framework container):
        "model_tar": os.getenv("MODEL_TAR","model.tar.gz"),                 # <- TODO ✏️ ensure file exists
        "inference_image_uri": os.getenv("INFERENCE_IMAGE",""),             # <- TODO ✏️ e.g. prebuilt xgboost image
        "env": {                                                           # optional container env
            "SAGEMAKER_PROGRAM": os.getenv("SAGEMAKER_PROGRAM","inference.py"), # <- TODO ✏️ entrypoint inside tar
            "SAGEMAKER_SUBMIT_DIRECTORY": os.getenv("SAGEMAKER_SUBMIT_DIRECTORY","model.tar.gz"),
            "SAGEMAKER_REQUIREMENTS": os.getenv("SAGEMAKER_REQUIREMENTS","requirements.txt"),
        },
    },
    "deployment": {
        "endpoint_name": os.getenv("ENDPOINT_NAME", f"churn-endpoint-{TS}"), # <- TODO ✏️
        "instance_type": os.getenv("INSTANCE_TYPE","ml.m5.large"),            # <- TODO ✏️
        "initial_instance_count": int(os.getenv("INITIAL_INSTANCE_COUNT","2")),
        "serverless": {                                                       # serverless optional
            "enabled": os.getenv("SERVERLESS","false").lower()=="true",       # <- TODO ✏️ true to use serverless
            "memory_size_in_mb": int(os.getenv("SVL_MEMORY","4096")),
            "max_concurrency": int(os.getenv("SVL_MAX_CONCURRENCY","10")),
        },
        "vpc": {                                                              # optional VPC config
            "enable": os.getenv("VPC_ENABLE","false").lower()=="true",        # <- TODO ✏️
            "subnets": os.getenv("VPC_SUBNETS","").split(",") if os.getenv("VPC_SUBNETS") else [],
            "security_group_ids": os.getenv("VPC_SGS","").split(",") if os.getenv("VPC_SGS") else [],
        },
        "tags": [ {"Key":"project","Value":"lp-mlops"}, {"Key":"stage","Value":"prod"} ],  # <- TODO ✏️
        "data_capture": {
            "enable": os.getenv("DATA_CAPTURE","true").lower()=="true",       # <- TODO ✏️
            "sampling_percentage": int(os.getenv("CAPTURE_PCT","50")),
            "s3_prefix": os.getenv("CAPTURE_PREFIX", f"s3://{os.getenv('SM_DEFAULT_BUCKET','')}/data-capture/{TS}"),
            "capture_content_type_header": {"CsvContentTypes": ["text/csv"], "JsonContentTypes": ["application/json"]},
            "enable_inference_input": True,
        },
        "rollout": {
            "strategy": os.getenv("ROLLOUT","canary"),                         # 'all-at-once' | 'canary' | 'blue-green'  # <- TODO ✏️
            "canary_percent": int(os.getenv("CANARY_PERCENT","10")),           # percentage for new variant
            "bake_minutes": int(os.getenv("BAKE_MIN","15")),                   # bake time before full shift
        }
    },
    "autoscaling": {
        "enable": os.getenv("AUTOSCALING","true").lower()=="true",            # <- TODO ✏️
        "min_capacity": int(os.getenv("AS_MIN","2")),
        "max_capacity": int(os.getenv("AS_MAX","6")),
        "target_invocations_per_min": int(os.getenv("AS_TARGET","600")),      # ~10 RPS per instance
        "scale_in_cooldown": int(os.getenv("AS_IN_COOLDOWN","120")),
        "scale_out_cooldown": int(os.getenv("AS_OUT_COOLDOWN","60")),
    },
    "monitoring": {
        "enable": os.getenv("MONITORING","true").lower()=="true",             # <- TODO ✏️
        "baseline_dataset_uri": os.getenv("BASELINE_S3",""),                  # <- TODO ✏️ (optional) S3 with baseline data
        "schedule_cron": os.getenv("MON_CRON","cron(0 * * * ? *)"),           # hourly
        "instance_type": os.getenv("MON_INSTANCE","ml.m5.large"),
        "volume_size_gb": int(os.getenv("MON_VOL","30")),
        "max_runtime_seconds": int(os.getenv("MON_MAXRUN","3600")),
    }
}

CONFIG
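Several of these settings constrain each other (canary percentage, autoscaling bounds, serverless memory, VPC lists). The following is a minimal pre-flight sketch, not part of the original notebook: `validate_config` and `SERVERLESS_MEMORY_MB` are hypothetical names, and the memory list assumes the 1 GB steps from 1024 to 6144 MB that SageMaker Serverless Inference documents.

```python
from typing import List

# Assumption: serverless memory must be one of these sizes (MB)
SERVERLESS_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}

def validate_config(cfg: dict) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    dep, scale = cfg["deployment"], cfg["autoscaling"]
    rollout, svl, vpc = dep["rollout"], dep["serverless"], dep["vpc"]

    if cfg["source"]["type"] not in ("registry", "artifact"):
        problems.append("source.type must be 'registry' or 'artifact'")
    if rollout["strategy"] not in ("all-at-once", "canary", "blue-green"):
        problems.append("rollout.strategy must be all-at-once, canary or blue-green")
    elif rollout["strategy"] != "all-at-once" and not 0 < rollout["canary_percent"] < 100:
        problems.append("rollout.canary_percent must be between 1 and 99")
    if svl["enabled"] and svl["memory_size_in_mb"] not in SERVERLESS_MEMORY_MB:
        problems.append("serverless.memory_size_in_mb is not a supported size")
    if vpc["enable"] and not (vpc["subnets"] and vpc["security_group_ids"]):
        problems.append("vpc.enable needs both subnets and security_group_ids")
    if scale["enable"] and scale["min_capacity"] > scale["max_capacity"]:
        problems.append("autoscaling.min_capacity exceeds max_capacity")
    if not 0 <= dep["data_capture"]["sampling_percentage"] <= 100:
        problems.append("data_capture.sampling_percentage must be 0..100")
    return problems

# Trimmed-down stand-in for CONFIG with two deliberate mistakes
demo_cfg = {
    "source": {"type": "registry"},
    "deployment": {
        "rollout": {"strategy": "canary", "canary_percent": 100},
        "serverless": {"enabled": True, "memory_size_in_mb": 4000},
        "vpc": {"enable": False, "subnets": [], "security_group_ids": []},
        "data_capture": {"sampling_percentage": 50},
    },
    "autoscaling": {"enable": True, "min_capacity": 2, "max_capacity": 6},
}
for problem in validate_config(demo_cfg):
    print("-", problem)
```

Calling `validate_config(CONFIG)` right after the configuration cell and stopping on a non-empty result would surface these mistakes before any AWS resource is created.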

## 🧱 Utilities

In [None]:
import botocore
sm = boto3.client("sagemaker")
rt = boto3.client("sagemaker-runtime")
appscaling = boto3.client("application-autoscaling")

def ensure_model_from_registry(cfg):
    # choose latest Approved (unless specific ARN provided)
    if cfg["source"]["specific_package_arn"]:
        arn = cfg["source"]["specific_package_arn"]
        print("Using specified package:", arn)
        return arn
    status = cfg["source"]["approval_status"]
    # Filter server-side so a matching package is found no matter how many newer ones exist
    res = sm.list_model_packages(
        ModelPackageGroupName=cfg["source"]["package_group"],
        ModelApprovalStatus=status,
        SortBy="CreationTime", SortOrder="Descending", MaxResults=1,
    )
    packages = res.get("ModelPackageSummaryList", [])
    if not packages:
        raise RuntimeError(f"No {status} package found. Adjust approval status or package group.")
    arn = packages[0]["ModelPackageArn"]
    print(f"Selected {status} package:", arn)
    return arn

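def ensure_model_from_artifact(cfg, model_name):
    image = cfg["source"]["inference_image_uri"]
    assert image, "inference_image_uri is required for 'artifact' source"
    model_data = str(cfg["source"]["model_tar"])
    # CreateModel only accepts an S3 URI, so a local tarball is uploaded first
    if not model_data.startswith("s3://"):
        assert Path(model_data).exists(), f"Missing model tar: {model_data}"
        bucket_name = os.getenv("SM_DEFAULT_BUCKET", "")
        assert bucket_name, "Set SM_DEFAULT_BUCKET to upload a local model tar"
        key = f"models/{model_name}/{Path(model_data).name}"
        boto3.client("s3").upload_file(model_data, bucket_name, key)
        model_data = f"s3://{bucket_name}/{key}"
    execution_role = os.getenv("SAGEMAKER_ROLE") or os.getenv("ROLE_ARN") or globals().get("role", "")
    assert execution_role, "Set SAGEMAKER_ROLE (or ROLE_ARN) to the model execution role ARN"
    create_args = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image,
            "ModelDataUrl": model_data,
            "Mode": "SingleModel",
            "Environment": cfg["source"]["env"],
        },
        "ExecutionRoleArn": execution_role,
        "Tags": cfg["deployment"]["tags"],
    }
    # Pass VpcConfig only when enabled; an empty dict fails parameter validation
    if cfg["deployment"]["vpc"]["enable"]:
        create_args["VpcConfig"] = {
            "Subnets": cfg["deployment"]["vpc"]["subnets"],
            "SecurityGroupIds": cfg["deployment"]["vpc"]["security_group_ids"],
        }
    try:
        sm.describe_model(ModelName=model_name)
        print("Model already exists:", model_name)
    except sm.exceptions.ClientError:
        sm.create_model(**create_args)
        print("Created Model:", model_name)
    return model_name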

def current_endpoint_config_name(endpoint_name):
    try:
        desc = sm.describe_endpoint(EndpointName=endpoint_name)
        return desc.get("EndpointConfigName"), desc.get("EndpointStatus")
    except sm.exceptions.ClientError:
        return None, None

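def wait_endpoint(endpoint_name, poll_seconds=30, timeout_seconds=3600):
    print("⏳ Waiting for endpoint:", endpoint_name)
    deadline = time.time() + timeout_seconds
    while True:
        desc = sm.describe_endpoint(EndpointName=endpoint_name)
        st = desc["EndpointStatus"]
        print("  status:", st)
        if st in ["InService", "Failed"]:
            break
        if time.time() > deadline:
            # Avoid polling forever if the endpoint is stuck in a transitional state
            raise TimeoutError(f"Endpoint {endpoint_name} still {st} after {timeout_seconds}s")
        time.sleep(poll_seconds)
    if st == "Failed":
        print("❌ Endpoint failed:", desc.get("FailureReason"))
        raise RuntimeError(desc.get("FailureReason"))
    print("✅ Endpoint InService.")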


### 🔎 Champion resolver (Registry)
Searches for a **Model Package** in the specified **Model Package Group** with:
1) a matching `candidate_run_id` (if provided), else
2) a tag `is_champion=true` (if `require_champion_tag=True`), else
3) the **latest Approved** package.

It also collects lineage keys from **Tags** and **CustomerMetadataProperties**.
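The three-step priority can be checked without touching AWS by separating the decision from the registry calls. This is a minimal sketch under stated assumptions: `pick_champion` is a hypothetical helper, the ARNs are placeholders, and the input is assumed to be newest-first and already filtered to the wanted approval status.

```python
from typing import Dict, List, Optional, Tuple

def pick_champion(
    packages: List[Tuple[str, Dict[str, str]]],
    candidate_run_id: str = "",
    require_champion_tag: bool = True,
) -> Optional[str]:
    """Pick an ARN from (arn, tags) pairs listed newest first."""
    champion = None
    for arn, tags in packages:
        # 1) an exact run-id match wins, even over a newer champion
        if candidate_run_id and tags.get("candidate_run_id") == candidate_run_id:
            return arn
        # 2) remember the newest champion but keep scanning for a run-id match
        if champion is None and require_champion_tag and tags.get("is_champion", "").lower() == "true":
            champion = arn
    if champion is not None:
        return champion
    # 3) otherwise fall back to the newest package, if any
    return packages[0][0] if packages else None

demo_packages = [
    ("arn:placeholder:v3", {}),
    ("arn:placeholder:v2", {"is_champion": "true"}),
    ("arn:placeholder:v1", {"candidate_run_id": "run-7"}),
]
print(pick_champion(demo_packages, candidate_run_id="run-7"))
print(pick_champion(demo_packages))
print(pick_champion(demo_packages, require_champion_tag=False))
```

Keeping the rule in a pure function makes the ordering explicit: a run-id match on an older package is preferred to a champion tag on a newer one.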


In [None]:
from typing import Dict, List

def _tags_to_dict(tags: List[dict]) -> Dict[str, str]:
    out = {}
    for t in tags or []:
        k, v = t.get("Key"), t.get("Value")
        if k is not None and v is not None:
            out[str(k)] = str(v)
    return out

def collect_lineage_from_package(model_package_arn: str, expected_keys: List[str]) -> Dict[str, str]:
    d = sm.describe_model_package(ModelPackageName=model_package_arn)
    tag_map = _tags_to_dict(sm.list_tags(ResourceArn=model_package_arn).get("Tags", []))
    # Pull supplemental keys from CustomerMetadataProperties if present
    cmp = d.get("CustomerMetadataProperties") or {}
    lineage = {}
    for k in expected_keys:
        if k in tag_map:
            lineage[k] = tag_map[k]
        elif k in cmp:
            lineage[k] = cmp[k]
    # Also try to surface a couple of quality metrics if present in ModelMetrics
    mm = d.get("ModelMetrics") or {}
    if "ModelQuality" in mm and "Statistics" in mm["ModelQuality"] and "S3Uri" in mm["ModelQuality"]["Statistics"]:
        lineage.setdefault("model_quality_stats_s3", mm["ModelQuality"]["Statistics"]["S3Uri"])
    return lineage

def resolve_champion_package(cfg: dict) -> dict:
    group = cfg["source"]["package_group"]
    approval = cfg["source"]["approval_status"]
    candidate_run_id = cfg["lineage"]["candidate_run_id"]
    need_champion = cfg["lineage"]["require_champion_tag"]
    expected_keys = cfg["lineage"]["expected_keys"]
    champion_arn = latest_arn = None
    # Scan the registry newest first, filtered server-side by approval status
    paginator = sm.get_paginator("list_model_packages")
    pages = paginator.paginate(ModelPackageGroupName=group, ModelApprovalStatus=approval,
                               SortBy="CreationTime", SortOrder="Descending")
    for page in pages:
        for summary in page.get("ModelPackageSummaryList", []):
            arn = summary["ModelPackageArn"]
            latest_arn = latest_arn or arn
            tags = _tags_to_dict(sm.list_tags(ResourceArn=arn).get("Tags", []))
            # 1) candidate_run_id exact match wins, even over a newer champion
            if candidate_run_id and tags.get("candidate_run_id") == candidate_run_id:
                print("✅ Found package with candidate_run_id:", candidate_run_id)
                return {"arn": arn, "lineage": collect_lineage_from_package(arn, expected_keys)}
            # 2) remember the newest champion; keep scanning only if a run id was requested
            if need_champion and champion_arn is None and tags.get("is_champion", "").lower() == "true":
                champion_arn = arn
                if not candidate_run_id:
                    break
        if champion_arn and not candidate_run_id:
            break
    if champion_arn:
        print("✅ Found package tagged as champion")
        return {"arn": champion_arn, "lineage": collect_lineage_from_package(champion_arn, expected_keys)}
    # 3) Fallback: latest package with the requested approval status
    if latest_arn:
        print(f"ℹ️ Falling back to latest {approval} package.")
        return {"arn": latest_arn, "lineage": collect_lineage_from_package(latest_arn, expected_keys)}
    raise RuntimeError(f"No {approval} model package found to deploy.")


## 🧭 Plan the Deployment

In [None]:
source_type = CONFIG["source"]["type"]
endpoint_name = CONFIG["deployment"]["endpoint_name"]
model_name_new = f"{endpoint_name}-model-{TS}"
endpoint_config_name_new = f"{endpoint_name}-cfg-{TS}"

common_tags = CONFIG["deployment"]["tags"].copy()

if source_type == "registry":
    resolved = resolve_champion_package(CONFIG)
    package_arn = resolved["arn"]
    lineage_tags = [{"Key": k, "Value": str(v)} for k, v in resolved["lineage"].items()]
    # Keep these visible across resources
    common_tags += lineage_tags + [
        {"Key":"deployed_from","Value":"model-registry"},
        {"Key":"model_package_arn","Value":package_arn},
    ]
    # CreateModel containers reference a package through ModelPackageName (a name or an ARN)
    model_package_container = {"ModelPackageName": package_arn}
    print("Will deploy from Model Package (champion):", package_arn)
else:
    # Artifact route: you can also load lineage from a local JSON if produced in validation
    # e.g., with keys in CONFIG['lineage']['expected_keys']  # <- TODO ✏️
    lineage_tags = []
    model_name = ensure_model_from_artifact(CONFIG, model_name_new)
    common_tags += lineage_tags + [
        {"Key":"deployed_from","Value":"artifact"},
        {"Key":"model_artifact","Value":CONFIG["source"]["model_tar"]},
    ]
    print("Will deploy from local artifact as model:", model_name)
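Names built from `TS` change on every run, so re-running this cell produces a new Model and EndpointConfig name each time. One possible alternative, sketched here under assumptions, is to derive the suffix from the inputs that define the resource. `stable_name` is a hypothetical helper, the ARN is a placeholder, and the 63-character cap reflects the length limit SageMaker applies to model and endpoint-config names.

```python
import hashlib
import json

def stable_name(prefix: str, *parts, max_len: int = 63) -> str:
    """Build '<prefix>-<12 hex chars>' where the suffix depends only on `parts`."""
    payload = json.dumps(parts, sort_keys=True, default=str).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    # Leave room for '-' plus the digest, and avoid a trailing hyphen in the prefix
    head = prefix[: max_len - len(digest) - 1].rstrip("-")
    return f"{head}-{digest}"

package = "arn:placeholder:model-package/churn-model-group/7"
name_a = stable_name("churn-endpoint-model", package, {"instance_type": "ml.m5.large"})
name_b = stable_name("churn-endpoint-model", package, {"instance_type": "ml.m5.large"})
name_c = stable_name("churn-endpoint-model", package, {"instance_type": "ml.m5.xlarge"})
print(name_a == name_b, name_a != name_c, len(name_a))
```

With names like these, a describe-then-create check finds the resource from an earlier identical run and skips creation, while any change to the inputs yields a fresh name.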


## 🧩 Create EndpointConfig (single / canary / blue‑green)

In [None]:
data_capture_cfg = None
if CONFIG["deployment"]["data_capture"]["enable"]:
    data_capture_cfg = {
        "EnableCapture": True,
        "InitialSamplingPercentage": CONFIG["deployment"]["data_capture"]["sampling_percentage"],
        "DestinationS3Uri": CONFIG["deployment"]["data_capture"]["s3_prefix"],
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        "CaptureContentTypeHeader": CONFIG["deployment"]["data_capture"]["capture_content_type_header"]
    }

# Register the Model that the new variant will serve
if CONFIG["source"]["type"] == "registry":
    # A ModelPackage is deployable only through a Model that references it
    model_name = model_name_new
    execution_role = os.getenv("SAGEMAKER_ROLE") or os.getenv("ROLE_ARN") or globals().get("role", "")
    try:
        sm.describe_model(ModelName=model_name)
        print("Model already exists:", model_name)
    except sm.exceptions.ClientError:
        sm.create_model(
            ModelName=model_name,
            Containers=[{"ModelPackageName": package_arn}],
            ExecutionRoleArn=execution_role,
            Tags=common_tags,
        )
        print("Created Model from package:", model_name)
# The artifact route already created `model_name` in the planning cell

# Every production variant must name the Model it serves
variant_new = {
    "VariantName": "variant-new",
    "ModelName": model_name,
    "InitialVariantWeight": 1.0,
}

serverless = CONFIG["deployment"]["serverless"]
if serverless["enabled"]:
    variant_new["ServerlessConfig"] = {
        "MemorySizeInMB": serverless["memory_size_in_mb"],
        "MaxConcurrency": serverless["max_concurrency"]
    }
else:
    variant_new.update({
        "InitialInstanceCount": CONFIG["deployment"]["initial_instance_count"],
        "InstanceType": CONFIG["deployment"]["instance_type"],
    })

# Determine rollout strategy
strategy = CONFIG["deployment"]["rollout"]["strategy"]
prev_cfg_name, ep_status = current_endpoint_config_name(endpoint_name)

if strategy == "all-at-once" or prev_cfg_name is None:
    # Single variant
    production_variants = [variant_new]
else:
    # Canary / blue‑green: keep the live variant and route only canary_pct of traffic to the new one
    canary_pct = CONFIG["deployment"]["rollout"]["canary_percent"] / 100.0
    variant_new["InitialVariantWeight"] = canary_pct
    # Copy the model and capacity settings of the variant that currently carries the most traffic
    prev_cfg = sm.describe_endpoint_config(EndpointConfigName=prev_cfg_name)
    prev_variant = max(prev_cfg["ProductionVariants"], key=lambda v: v.get("InitialVariantWeight", 1.0))
    variant_old = {
        "VariantName": "variant-old",
        "ModelName": prev_variant["ModelName"],
        "InitialVariantWeight": max(0.0, 1.0 - canary_pct),
    }
    for key in ("InstanceType", "InitialInstanceCount", "ServerlessConfig"):
        if key in prev_variant:
            variant_old[key] = prev_variant[key]

    production_variants = [variant_old, variant_new]

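# Create EndpointConfig (skipped when a config with this name already exists)
create_cfg_args = {
    "EndpointConfigName": endpoint_config_name_new,
    "ProductionVariants": production_variants,
    "Tags": common_tags,
}
# boto3 rejects None for DataCaptureConfig, so pass it only when capture is enabled
if data_capture_cfg:
    create_cfg_args["DataCaptureConfig"] = data_capture_cfg
try:
    sm.describe_endpoint_config(EndpointConfigName=endpoint_config_name_new)
    print("EndpointConfig exists:", endpoint_config_name_new)
except sm.exceptions.ClientError:
    sm.create_endpoint_config(**create_cfg_args)
    print("Created EndpointConfig:", endpoint_config_name_new)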
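SageMaker routes traffic to each variant in proportion to its weight divided by the sum of all weights, so the weights have to be chosen as a pair. The sketch below makes that arithmetic explicit; `traffic_shares` and `canary_weights` are hypothetical helpers, not SageMaker APIs.

```python
from typing import Dict, List

def traffic_shares(variants: List[dict]) -> Dict[str, float]:
    """Fraction of requests each variant receives: weight / sum(weights)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

def canary_weights(canary_percent: float) -> Dict[str, float]:
    """Weights that give the new variant exactly canary_percent of traffic."""
    new = canary_percent / 100.0
    return {"variant-old": 1.0 - new, "variant-new": new}

# Old variant at 0.9 with the new variant left at 1.0 is not a 10% canary
skewed = traffic_shares([
    {"VariantName": "variant-old", "InitialVariantWeight": 0.9},
    {"VariantName": "variant-new", "InitialVariantWeight": 1.0},
])
print(round(skewed["variant-new"], 3))

weights = canary_weights(10)
balanced = traffic_shares(
    [{"VariantName": name, "InitialVariantWeight": w} for name, w in weights.items()]
)
print(round(balanced["variant-new"], 3))
```

Checking `traffic_shares(production_variants)` before creating the EndpointConfig is a cheap way to confirm the split matches `canary_percent`.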

## 🚀 Create / Update Endpoint

In [None]:
# Create the endpoint if it does not exist, else update it.
# The existence check is separate so that a failed update is not mistaken for a missing endpoint.
existing_cfg_name, _ = current_endpoint_config_name(endpoint_name)
if existing_cfg_name is None:
    print("Creating endpoint:", endpoint_name)
    sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name_new, Tags=common_tags)
else:
    print("Updating endpoint:", endpoint_name)
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name_new)

wait_endpoint(endpoint_name)

if strategy in ["canary","blue-green"]:
    print("🧪 Canary bake period (minutes):", CONFIG["deployment"]["rollout"]["bake_minutes"])
    time.sleep(1)  # keep short for example; in real life, wait bake_minutes * 60
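A real bake period usually polls a health signal for the whole window and stops at the first failure. The sketch below shows one way to structure that loop; `bake` and `check_healthy` are hypothetical names, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable

def bake(
    check_healthy: Callable[[], bool],
    bake_minutes: float,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if every health check in the bake window passed, False on the first failure."""
    deadline = now() + bake_minutes * 60
    while True:
        if not check_healthy():
            return False
        remaining = deadline - now()
        if remaining <= 0:
            return True
        # Never sleep past the end of the window
        sleep(min(poll_seconds, remaining))

# Dry run with a fake clock: a 2-minute bake polled every 60 s performs 3 checks
fake_clock = {"t": 0.0}
checks = []
passed = bake(
    check_healthy=lambda: checks.append(fake_clock["t"]) or True,
    bake_minutes=2,
    poll_seconds=60,
    sleep=lambda s: fake_clock.__setitem__("t", fake_clock["t"] + s),
    now=lambda: fake_clock["t"],
)
print(passed, checks)
```

In the notebook, `check_healthy` could read an error-rate metric for `variant-new`, and a `False` result could trigger `rollback_to_previous_config(endpoint_name)` instead of the traffic shift.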

In [None]:
# Ensure endpoint carries lineage tags (update_endpoint doesn't accept Tags)
try:
    ep = sm.describe_endpoint(EndpointName=endpoint_name)
    ep_arn = ep["EndpointArn"]
    sm.add_tags(ResourceArn=ep_arn, Tags=common_tags)
    print("✅ Applied lineage tags to Endpoint")
except Exception as e:
    print("⚠️ Could not tag endpoint:", e)
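Lineage values come from validation output, so they can be empty, overly long, or collide with a base tag key. A small merge step, sketched here, keeps the tag list within the standard AWS limits of 128-character keys, 256-character values, and 50 tags per resource. `merge_tags` is a hypothetical helper, and silently truncating long values is an assumption you may prefer to replace with an error.

```python
from typing import Dict, List

MAX_KEY_LEN, MAX_VALUE_LEN, MAX_TAGS = 128, 256, 50

def merge_tags(base: List[dict], lineage: Dict[str, object]) -> List[dict]:
    """Merge lineage into base tags: later values win, empty values are skipped."""
    merged = {t["Key"]: str(t["Value"]) for t in base}
    for key, value in lineage.items():
        if value is None or value == "":
            continue
        merged[str(key)[:MAX_KEY_LEN]] = str(value)[:MAX_VALUE_LEN]
    if len(merged) > MAX_TAGS:
        raise ValueError(f"{len(merged)} tags exceed the limit of {MAX_TAGS}")
    return [{"Key": k, "Value": v} for k, v in merged.items()]

demo_tags = merge_tags(
    [{"Key": "project", "Value": "lp-mlops"}, {"Key": "stage", "Value": "prod"}],
    {"data_version": "v3", "eval_roc_auc": 0.91, "splits_path": "", "stage": "canary"},
)
print(demo_tags)
```

Because duplicate keys are collapsed before the call, the result can be passed to `add_tags` without a duplicate-key validation error.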

## 📈 (Optional) Autoscaling

In [None]:
if CONFIG["autoscaling"]["enable"] and not CONFIG["deployment"]["serverless"]["enabled"]:
    resource_id = f"endpoint/{endpoint_name}/variant/variant-new"
    ns = "sagemaker"
    try:
        appscaling.register_scalable_target(
            ServiceNamespace=ns,
            ResourceId=resource_id,
            ScalableDimension="sagemaker:variant:DesiredInstanceCount",
            MinCapacity=CONFIG["autoscaling"]["min_capacity"],
            MaxCapacity=CONFIG["autoscaling"]["max_capacity"]
        )
        appscaling.put_scaling_policy(
            PolicyName=f"invocations-per-target-{endpoint_name}",
            ServiceNamespace=ns,
            ResourceId=resource_id,
            ScalableDimension="sagemaker:variant:DesiredInstanceCount",
            PolicyType="TargetTrackingScaling",
            TargetTrackingScalingPolicyConfiguration={
                "TargetValue": CONFIG["autoscaling"]["target_invocations_per_min"],
                "PredefinedMetricSpecification": {"PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"},
                "ScaleInCooldown": CONFIG["autoscaling"]["scale_in_cooldown"],
                "ScaleOutCooldown": CONFIG["autoscaling"]["scale_out_cooldown"]
            }
        )
        print("✅ Autoscaling configured for", resource_id)
    except Exception as e:
        print("⚠️ Autoscaling setup failed:", e)
else:
    print("Autoscaling disabled or using Serverless (auto-managed).")
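The `SageMakerVariantInvocationsPerInstance` target is expressed in invocations per minute per instance, which is why 600 corresponds to roughly 10 requests per second. The sketch below shows the conversion plus a rough capacity estimate; both helpers are hypothetical, and the headroom factor is an assumption to tune from load tests.

```python
import math

def invocations_per_minute_target(rps_per_instance: float, headroom: float = 0.0) -> float:
    """Convert a sustainable per-instance RPS into a per-minute scaling target.

    headroom=0.3 scales out at 70% of the measured capacity.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return rps_per_instance * 60 * (1 - headroom)

def instances_for(peak_rps: float, rps_per_instance: float) -> int:
    """Smallest instance count that covers peak_rps at the given per-instance rate."""
    return max(1, math.ceil(peak_rps / rps_per_instance))

print(invocations_per_minute_target(10))
print(instances_for(peak_rps=45, rps_per_instance=10))
```

Comparing `instances_for(...)` at your expected peak against `AS_MAX` shows whether the configured ceiling can absorb that load.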

## 🫖 Smoke Test — sample prediction

In [None]:
import json
# Build a minimal JSON sample — replace with a real row from your schema
# <- TODO ✏️ adjust payload format to your inference script or algorithm
sample = [{"age": 45, "tenure_months": 12, "monthly_charges": 39.9, "contract_type": "month-to-month", "country": "PT"}]

try:
    resp = rt.invoke_endpoint(EndpointName=endpoint_name, ContentType="application/json", Body=json.dumps(sample).encode("utf-8"))
    body = resp["Body"].read().decode("utf-8")
    print("Response:", body[:500])
except Exception as e:
    print("⚠️ Inference failed:", e)
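Whether the endpoint expects JSON or CSV depends on the inference script or algorithm container, and the `ContentType` header must match the body. A minimal sketch of a serializer covering the two content types listed in the data-capture settings follows; `to_payload` is a hypothetical helper, and the CSV column order is an assumption that must match the feature order used in training.

```python
import csv
import io
import json
from typing import List, Optional

def to_payload(rows: List[dict], content_type: str, columns: Optional[List[str]] = None) -> bytes:
    """Serialize rows for invoke_endpoint as JSON or header-less CSV."""
    if content_type == "application/json":
        return json.dumps(rows).encode("utf-8")
    if content_type == "text/csv":
        columns = columns or list(rows[0].keys())
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for row in rows:
            writer.writerow([row[c] for c in columns])
        return buf.getvalue().encode("utf-8")
    raise ValueError(f"unsupported content type: {content_type}")

rows = [{"age": 45, "tenure_months": 12, "monthly_charges": 39.9,
         "contract_type": "month-to-month", "country": "PT"}]
print(to_payload(rows, "text/csv"))
print(to_payload(rows, "application/json"))
```

The returned bytes go into `Body=` and the same `content_type` string into `ContentType=` of `rt.invoke_endpoint`.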

## 🔀 Traffic Shift to 100% (after bake)

In [None]:
if CONFIG["deployment"]["rollout"]["strategy"] in ["canary","blue-green"]:
    try:
        # Read the live config so the new variant keeps its ModelName and capacity settings
        live_cfg_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
        live_variants = sm.describe_endpoint_config(EndpointConfigName=live_cfg_name)["ProductionVariants"]
        new_only = [dict(v, InitialVariantWeight=1.0) for v in live_variants if v["VariantName"] == "variant-new"]
        if len(live_variants) == 1:
            print("Endpoint already serves a single variant; nothing to shift.")
        elif not new_only:
            raise RuntimeError("variant-new not found in the live EndpointConfig")
        else:
            # Rebuild the EndpointConfig with only the new variant at 100% weight
            endpoint_config_full = f"{endpoint_name}-cfg-full-{TS}"
            full_args = {
                "EndpointConfigName": endpoint_config_full,
                "ProductionVariants": new_only,
                "Tags": common_tags,
            }
            # boto3 rejects None for DataCaptureConfig, so pass it only when capture is enabled
            if data_capture_cfg:
                full_args["DataCaptureConfig"] = data_capture_cfg
            sm.create_endpoint_config(**full_args)
            sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_full)
            wait_endpoint(endpoint_name)
            print("✅ Shifted traffic to 100% new variant")
    except Exception as e:
        print("⚠️ Traffic shift failed:", e)
else:
    print("Not a canary/blue-green rollout; full traffic already on new variant.")

## ⏪ Rollback Helper

In [None]:
def rollback_to_previous_config(endpoint_name):
    desc = sm.describe_endpoint(EndpointName=endpoint_name)
    current = desc["EndpointConfigName"]
    # list_endpoint_configs is account-wide, so restrict it to configs named after this endpoint.
    # Note: right after a canary run the newest other config is the canary split, not the old production config.
    hist = sm.list_endpoint_configs(
        NameContains=f"{endpoint_name}-cfg-",
        SortBy="CreationTime", SortOrder="Descending", MaxResults=10,
    )
    for ec in hist.get("EndpointConfigs", []):
        if ec["EndpointConfigName"] != current:
            sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=ec["EndpointConfigName"])
            wait_endpoint(endpoint_name)
            print("Rolled back to:", ec["EndpointConfigName"])
            return ec["EndpointConfigName"]
    print("No previous config found.")
    return None

# Example (disabled):
# rollback_to_previous_config(CONFIG['deployment']['endpoint_name'])
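Choosing which config to roll back to is the delicate part: the listing can include configs of other endpoints, and after a canary run the newest other config is the canary split itself. The sketch below isolates that choice in a pure function. `pick_previous_config` and its `exclude_suffix` parameter are hypothetical, and the config names follow the `<endpoint>-cfg-…` pattern used in this notebook with made-up timestamps.

```python
from typing import List, Optional

def pick_previous_config(
    config_names: List[str],
    current: str,
    endpoint_name: str,
    exclude_suffix: str = "",
) -> Optional[str]:
    """Pick the newest config of this endpoint that is not current.

    config_names must be ordered newest first. exclude_suffix (for example the
    run timestamp) skips configs created by the same rollout.
    """
    prefix = f"{endpoint_name}-cfg-"
    for name in config_names:
        if name == current or not name.startswith(prefix):
            continue
        if exclude_suffix and name.endswith(exclude_suffix):
            continue
        return name
    return None

names = [
    "other-endpoint-cfg-20240103T000000Z",
    "churn-endpoint-cfg-full-20240102T000000Z",
    "churn-endpoint-cfg-20240102T000000Z",
    "churn-endpoint-cfg-full-20240101T000000Z",
]
current_name = "churn-endpoint-cfg-full-20240102T000000Z"
print(pick_previous_config(names, current_name, "churn-endpoint"))
print(pick_previous_config(names, current_name, "churn-endpoint", exclude_suffix="20240102T000000Z"))
```

Passing the current run's `TS` as `exclude_suffix` would return the last full-traffic config from the previous rollout rather than the canary split.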

## 🧹 (Optional) Cleanup — **Danger zone**

In [None]:
# Uncomment to delete endpoint after testing
# sm.delete_endpoint(EndpointName=CONFIG["deployment"]["endpoint_name"])
# print("Deleted endpoint:", CONFIG["deployment"]["endpoint_name"])

## ✅ Best Practices Recap
- **Governance**: pull the **Approved** candidate from **Model Registry**; keep lineage via tags & metadata.
- **Security**: enable VPC, KMS encryption, proper IAM, and private subnets for endpoints.  # <- TODO ✏️
- **Observability**: enable **data capture** and **CloudWatch** logs; define **alarms** for latency & errors.  # <- TODO ✏️
- **Scalability**: use **Autoscaling** for instance endpoints or **Serverless Inference** for spiky traffic.
- **Safety**: roll out with **canary**/**blue‑green** and a **bake period**; keep rollback script handy.
- **Cost control**: right‑size instances; use serverless for intermittent traffic; clean up test endpoints.
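For the alarm item above, SageMaker publishes endpoint metrics such as `Invocation5XXErrors` and `ModelLatency` (in microseconds) to the `AWS/SageMaker` CloudWatch namespace, keyed by `EndpointName` and `VariantName`. The sketch below only builds the keyword arguments for CloudWatch's `put_metric_alarm`; `endpoint_alarm_args` is a hypothetical helper, and the thresholds are placeholders to tune.

```python
def endpoint_alarm_args(
    endpoint_name: str,
    variant_name: str,
    metric: str,
    threshold: float,
    statistic: str = "Sum",
    period: int = 60,
    evaluation_periods: int = 5,
) -> dict:
    """Keyword arguments for cloudwatch.put_metric_alarm on one endpoint variant."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-{metric}",
        "Namespace": "AWS/SageMaker",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": statistic,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # No traffic means no data points; do not treat that as a breach
        "TreatMissingData": "notBreaching",
    }

errors_alarm = endpoint_alarm_args("churn-endpoint", "variant-new", "Invocation5XXErrors", threshold=5)
latency_alarm = endpoint_alarm_args(
    "churn-endpoint", "variant-new", "ModelLatency",
    threshold=500_000,  # placeholder: 500 ms expressed in microseconds
    statistic="Average",
)
print(errors_alarm["AlarmName"], latency_alarm["Threshold"])
# To create them: boto3.client("cloudwatch").put_metric_alarm(**errors_alarm)
```

Adding `AlarmActions` with an SNS topic ARN to the returned dict is the usual way to route a breach to on-call notifications.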
