# Model Deployment & Monitoring

**Goal:** Deploy the best XGBoost binary classifier from the hyperparameter tuning job
to a persistent SageMaker real-time endpoint with **data capture**, create a
**Model Monitor** baseline and hourly schedule for data-drift detection, and build
a **CloudWatch dashboard** for operational visibility.

**Pre-requisite:** Run `1_train_xgboost_binary.ipynb` first — this notebook reads
the saved evaluation metrics from S3 to locate the best model artifact.

**Sections:**
1. Setup & load model artifact
2. Deploy endpoint with data capture
3. Generate Model Monitor baseline
4. Schedule hourly monitoring
5. Build CloudWatch dashboard
6. Send test traffic
7. **Cleanup** — delete all billable resources

## 1  Setup

In [43]:
import boto3
import sagemaker
import pandas as pd
import numpy as np
import json, io, time
from datetime import datetime, timezone

from sagemaker import get_execution_role, image_uris
from sagemaker.model import Model
from sagemaker.model_monitor import (
    DataCaptureConfig,
    DefaultModelMonitor,
    CronExpressionGenerator,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat

sess   = sagemaker.Session()
region = boto3.Session().region_name
role   = get_execution_role()
bucket = sess.default_bucket()
sm     = boto3.client("sagemaker")
s3     = boto3.client("s3")
cw     = boto3.client("cloudwatch")

s3_prefix = "aai540/model/xgboost-binary"

print(f"Region : {region}")
print(f"Bucket : {bucket}")

Region : us-east-1
Bucket : sagemaker-us-east-1-776673915827


In [44]:
# ---------- Locate best model artifact from training notebook ----------
metrics_key = f"{s3_prefix}/evaluation/test_metrics.json"
obj = s3.get_object(Bucket=bucket, Key=metrics_key)
metrics = json.loads(obj["Body"].read())

best_job = metrics["best_training_job"]
best_model_s3 = f"s3://{bucket}/{s3_prefix}/output/{best_job}/output/model.tar.gz"

print(f"Best training job : {best_job}")
print(f"Model artifact    : {best_model_s3}")
print(f"Test AUC-ROC      : {metrics['test_auc_roc']:.4f}")

Best training job : xgb-binary-tune-260216-1635-002-30826604
Model artifact    : s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/output/xgb-binary-tune-260216-1635-002-30826604/output/model.tar.gz
Test AUC-ROC      : 0.9199


In [45]:
# ---------- Constants (must match training notebook) ----------
FEATURE_COLS = [
    "duration", "pkt_total", "bytes_total",
    "pkt_fwd", "pkt_bwd", "bytes_fwd", "bytes_bwd",
    "pkt_rate", "byte_rate", "bytes_per_pkt",
    "pkt_ratio", "byte_ratio",
]
LABEL_COL = "label"

ENDPOINT_NAME     = "ids-xgboost-binary-monitor"
MONITOR_SCHEDULE  = "ids-xgboost-binary-monitor-schedule"
DASHBOARD_NAME    = "IDS-XGBoost-Monitoring"

# S3 prefixes for monitoring artefacts
data_capture_prefix = f"s3://{bucket}/{s3_prefix}/data-capture"
baseline_prefix     = f"s3://{bucket}/{s3_prefix}/monitor/baseline"
monitor_reports     = f"s3://{bucket}/{s3_prefix}/monitor/reports"

print(f"Endpoint name    : {ENDPOINT_NAME}")
print(f"Data capture     : {data_capture_prefix}")
print(f"Baseline prefix  : {baseline_prefix}")
print(f"Monitor reports  : {monitor_reports}")

Endpoint name    : ids-xgboost-binary-monitor
Data capture     : s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/data-capture
Baseline prefix  : s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/monitor/baseline
Monitor reports  : s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/monitor/reports


## 2  Deploy Endpoint with Data Capture

We create a SageMaker `Model` from the saved artifact and deploy it to a
real-time endpoint with `DataCaptureConfig` enabled.  This captures **100 %**
of inference requests and responses to S3 so Model Monitor can analyse them.

In [48]:
# Check if endpoint already exists and delete it if necessary
try:
    existing = sm.describe_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"⚠️  Endpoint '{ENDPOINT_NAME}' already exists (Status: {existing['EndpointStatus']})")
    print("Deleting existing endpoint and config...")
    
    # Delete endpoint
    sm.delete_endpoint(EndpointName=ENDPOINT_NAME)
    print(f"  ✓ Endpoint deleted")
    
    # Delete endpoint config
    config_name = existing['EndpointConfigName']
    sm.delete_endpoint_config(EndpointConfigName=config_name)
    print(f"  ✓ Endpoint config deleted")
    
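    # Block until the endpoint is actually gone. A fixed 10-second sleep can be
    # too short, and redeploying under the same name would then fail.
    print("  Waiting for deletion to complete...")
    sm.get_waiter("endpoint_deleted").wait(EndpointName=ENDPOINT_NAME)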
    
except sm.exceptions.ClientError as e:
    if 'Could not find endpoint' in str(e):
        print(f"✓ No existing endpoint '{ENDPOINT_NAME}' — ready to deploy")
    else:
        raise

⚠️  Endpoint 'ids-xgboost-binary-monitor' already exists (Status: InService)
Deleting existing endpoint and config...
  ✓ Endpoint deleted
  ✓ Endpoint config deleted
  Waiting for deletion to complete...


In [49]:
xgb_image = image_uris.retrieve("xgboost", region, version="1.5-1")

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=data_capture_prefix,
    capture_options=["Input", "Output"],
    csv_content_types=["text/csv"],
)

model = Model(
    image_uri=xgb_image,
    model_data=best_model_s3,
    role=role,
    sagemaker_session=sess,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=ENDPOINT_NAME,
    data_capture_config=data_capture_config,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.CSVDeserializer(),
    wait=True,
)

print(f"\n✓ Endpoint deployed: {ENDPOINT_NAME}")

------!
✓ Endpoint deployed: ids-xgboost-binary-monitor


## 3  Generate Model Monitor Baseline

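The baseline job analyses the **training data** to compute per-feature
statistics (mean, std-dev, min, max, distribution) and constraints
(data types, completeness).  Model Monitor will compare live inference
data against these baselines to detect drift.  The training CSV has no
header, so columns are reported as `_c0`, `_c1`, and so on.  The file appears
to carry the label in its first column, which would explain why 13 columns
are tracked for the 12 entries in `FEATURE_COLS`.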

> **Note:** `suggest_baseline()` launches a SageMaker Processing job that
> typically takes **5 – 10 minutes** to complete.
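
To make the baseline output easier to read, here is a minimal local sketch of the kind of per-column profile described above (mean, std-dev, min, max, completeness). It is not Model Monitor's implementation: the function name `column_profile`, the sample data, and the choice of the sample standard deviation are assumptions made for illustration. Only the `_c0`, `_c1`, ... naming is taken from the baseline output in this notebook.

```python
import csv
import io
import math


def column_profile(csv_text):
    """Per-column mean, std-dev, min, max and completeness for a header-less CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    n_cols = max(len(r) for r in rows)
    profile = []
    for i in range(n_cols):
        # Treat missing or empty cells as absent values.
        cells = [r[i] if i < len(r) else "" for r in rows]
        values = [float(c) for c in cells if c.strip() != ""]
        if not values:
            profile.append({"name": f"_c{i}", "completeness": 0.0})
            continue
        mean = sum(values) / len(values)
        # Sample variance (n - 1 denominator); an assumption, see the lead-in.
        var = (
            sum((v - mean) ** 2 for v in values) / (len(values) - 1)
            if len(values) > 1
            else 0.0
        )
        profile.append({
            "name": f"_c{i}",
            "mean": mean,
            "std_dev": math.sqrt(var),
            "min": min(values),
            "max": max(values),
            "completeness": len(values) / len(rows),
        })
    return profile


# Illustrative data only: label first, then two features, one cell missing.
sample = "1,0.5,10\n0,1.5,20\n1,,30\n1,2.5,40\n"
profile = column_profile(sample)
for col in profile:
    print(col)
```

A column with a missing cell reports a completeness below 1.0, which is the kind of figure the suggested constraints record per feature.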

In [50]:
# Path to training CSV already in S3 (uploaded by training notebook)
s3_train_uri = f"s3://{bucket}/{s3_prefix}/train/data.csv"
print(f"Training data: {s3_train_uri}")

# Verify it exists
head = s3.head_object(Bucket=bucket, Key=f"{s3_prefix}/train/data.csv")
print(f"Size: {head['ContentLength'] / 1e6:.1f} MB")

Training data: s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/train/data.csv
Size: 12.8 MB


In [51]:
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=10,
    max_runtime_in_seconds=1800,
    sagemaker_session=sess,
)

print("Launching baseline job (this takes ~5-10 min) …")
monitor.suggest_baseline(
    baseline_dataset=s3_train_uri,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=baseline_prefix,
    wait=True,
    logs=False,
)
print("✓ Baseline job complete.")

INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2026-02-16-17-24-31-924


Launching baseline job (this takes ~5-10 min) …
...........................................................!✓ Baseline job complete.


In [52]:
# Inspect baseline outputs
baseline_job = monitor.latest_baselining_job

print("=" * 60)
print("  Baseline Statistics (first 5 features)")
print("=" * 60)
schema = baseline_job.baseline_statistics().body_dict["features"]
for feat in schema[:5]:
    name = feat["name"]
    num  = feat.get("numerical_statistics", {})
    print(f"  {name:>20s}  mean={num.get('mean', 'N/A'):>12}  "
          f"stddev={num.get('std_dev', 'N/A'):>12}")

print(f"\n  Total features tracked: {len(schema)}")

print("\n" + "=" * 60)
print("  Baseline Constraints (first 5 features)")
print("=" * 60)
constraints = baseline_job.suggested_constraints().body_dict["features"]
for feat in constraints[:5]:
    print(f"  {feat['name']:>20s}  type={feat['inferred_type']}  "
          f"completeness={feat.get('completeness', 'N/A')}")

  Baseline Statistics (first 5 features)
                   _c0  mean=0.7736095604764217  stddev=0.4184946934142613
                   _c1  mean=0.000399019465402325  stddev=0.0015258257248348041
                   _c2  mean=3.0537201047366813  stddev=4.609643980111558
                   _c3  mean=220328.14460370783  stddev=20075147.993305646
                   _c4  mean=1.6787407566874772  stddev=2.394221759371462

  Total features tracked: 13

  Baseline Constraints (first 5 features)
                   _c0  type=Integral  completeness=1.0
                   _c1  type=Fractional  completeness=1.0
                   _c2  type=Integral  completeness=1.0
                   _c3  type=Integral  completeness=1.0
                   _c4  type=Integral  completeness=1.0


## 4  Schedule Hourly Monitoring

The monitoring schedule runs every hour.  Each execution compares the data
captured by the endpoint against the baseline statistics and constraints,
then writes a violation report to S3 and emits CloudWatch metrics.
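
As a rough mental model of what "comparing captured data against the baseline" means, the sketch below flags a feature when a batch is less complete than the baseline or when its mean moves far from the baseline mean. This is a deliberately simplified stand-in, not the checks Model Monitor actually runs. The function `check_batch`, the `max_z` cutoff, and all numbers are hypothetical.

```python
import math


def check_batch(baseline, batch, max_z=3.0):
    """Return a list of violation strings for one captured batch.

    baseline: {feature: {"mean": float, "std_dev": float, "completeness": float}}
    batch:    {feature: list of float or None}
    """
    violations = []
    for name, ref in baseline.items():
        cells = batch.get(name)
        if not cells:
            violations.append(f"{name}: missing_column")
            continue
        present = [v for v in cells if v is not None]
        completeness = len(present) / len(cells)
        if completeness < ref["completeness"]:
            violations.append(
                f"{name}: completeness {completeness:.2f} < {ref['completeness']:.2f}"
            )
        if present and ref["std_dev"] > 0:
            mean = sum(present) / len(present)
            # Distance of the batch mean from the baseline mean, in standard errors.
            z = abs(mean - ref["mean"]) / (ref["std_dev"] / math.sqrt(len(present)))
            if z > max_z:
                violations.append(f"{name}: mean shift z={z:.1f}")
    return violations


# Illustrative values only; not taken from the baseline job above.
baseline = {
    "pkt_total": {"mean": 3.0, "std_dev": 4.0, "completeness": 1.0},
    "pkt_rate": {"mean": 10.0, "std_dev": 2.0, "completeness": 1.0},
}
batch = {
    "pkt_total": [3.0, 2.0, 4.0, 3.0],
    "pkt_rate": [20.0, 22.0, None, 21.0],
}
violations = check_batch(baseline, batch)
for v in violations:
    print(v)
```

Here `pkt_total` passes both checks, while `pkt_rate` is flagged twice: once for the missing value and once for the shifted mean.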

In [53]:
monitor.create_monitoring_schedule(
    monitor_schedule_name=MONITOR_SCHEDULE,
    endpoint_input=ENDPOINT_NAME,
    output_s3_uri=monitor_reports,
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

print(f"✓ Monitoring schedule created: {MONITOR_SCHEDULE}")
print(f"  Cron    : {CronExpressionGenerator.hourly()}")
print(f"  Reports : {monitor_reports}")

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: ids-xgboost-binary-monitor-schedule


✓ Monitoring schedule created: ids-xgboost-binary-monitor-schedule
  Cron    : cron(0 * ? * * *)
  Reports : s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/monitor/reports


In [54]:
# Verify schedule status
desc = sm.describe_monitoring_schedule(MonitoringScheduleName=MONITOR_SCHEDULE)
print(f"Schedule status: {desc['MonitoringScheduleStatus']}")

Schedule status: Pending


## 5  CloudWatch Dashboard

A CloudWatch dashboard provides a single-pane view of endpoint health.
The dashboard includes:
- **Header**: a text widget naming the endpoint, the monitor schedule, and the region (no drift metrics are plotted)
- **Invocation metrics**: request count, invocations per instance, and latency (avg / p99, which CloudWatch reports in microseconds)
- **Error rates**: 4xx and 5xx responses
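
The metric widgets in this dashboard all share one shape, so a small helper can build them. The following is a sketch only: `metric_widget` is a hypothetical helper, not part of boto3 or the SageMaker SDK, and it adds a `label` to every series, which the cell below does only for the latency widgets.

```python
import json


def metric_widget(title, metric_name, x, y, endpoint_name, region,
                  stats=("Sum",), variant="AllTraffic", period=60):
    """Build one CloudWatch dashboard metric widget for a SageMaker endpoint metric."""
    metrics = [
        ["AWS/SageMaker", metric_name,
         "EndpointName", endpoint_name,
         "VariantName", variant,
         {"stat": stat, "period": period, "label": stat}]
        for stat in stats  # one series per requested statistic
    ]
    return {
        "type": "metric",
        "x": x, "y": y, "width": 8, "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,
            "view": "timeSeries",
            "region": region,
            "period": period,
        },
    }


widgets = [
    metric_widget("Invocations", "Invocations", 0, 2,
                  "ids-xgboost-binary-monitor", "us-east-1"),
    metric_widget("Model Latency", "ModelLatency", 8, 2,
                  "ids-xgboost-binary-monitor", "us-east-1",
                  stats=("Average", "p99")),
]
body = json.dumps({"widgets": widgets})
print(len(widgets), "widgets,", len(body), "bytes")
```

The resulting `body` string has the same structure as the `DashboardBody` passed to `put_dashboard`, so adding a metric becomes a single call instead of a copied dictionary.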

In [55]:
dashboard_body = {
    "widgets": [
        {
            "type": "text",
            "x": 0, "y": 0, "width": 24, "height": 2,
            "properties": {
                "markdown": (
                    f"# IDS XGBoost Binary — Endpoint Monitoring\n"
                    f"**Endpoint:** `{ENDPOINT_NAME}` &nbsp; | &nbsp; "
                    f"**Monitor schedule:** `{MONITOR_SCHEDULE}` &nbsp; | &nbsp; "
                    f"**Region:** `{region}`"
                )
            },
        },
        {
            "type": "metric",
            "x": 0, "y": 2, "width": 8, "height": 6,
            "properties": {
                "title": "Invocations",
                "metrics": [
                    ["AWS/SageMaker", "Invocations",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Sum", "period": 60}]
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
        {
            "type": "metric",
            "x": 8, "y": 2, "width": 8, "height": 6,
            "properties": {
                "title": "Model Latency (microseconds)",
                "metrics": [
                    ["AWS/SageMaker", "ModelLatency",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Average", "period": 60,
                      "label": "Avg"}],
                    ["AWS/SageMaker", "ModelLatency",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "p99", "period": 60,
                      "label": "p99"}],
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
        {
            "type": "metric",
            "x": 16, "y": 2, "width": 8, "height": 6,
            "properties": {
                "title": "Overhead Latency (microseconds)",
                "metrics": [
                    ["AWS/SageMaker", "OverheadLatency",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Average", "period": 60,
                      "label": "Avg"}],
                    ["AWS/SageMaker", "OverheadLatency",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "p99", "period": 60,
                      "label": "p99"}],
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
        {
            "type": "metric",
            "x": 0, "y": 8, "width": 8, "height": 6,
            "properties": {
                "title": "4xx Errors",
                "metrics": [
                    ["AWS/SageMaker", "Invocation4XXErrors",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Sum", "period": 60}]
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
        {
            "type": "metric",
            "x": 8, "y": 8, "width": 8, "height": 6,
            "properties": {
                "title": "5xx Errors",
                "metrics": [
                    ["AWS/SageMaker", "Invocation5XXErrors",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Sum", "period": 60}]
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
        {
            "type": "metric",
            "x": 16, "y": 8, "width": 8, "height": 6,
            "properties": {
                "title": "Invocations Per Instance",
                "metrics": [
                    ["AWS/SageMaker", "InvocationsPerInstance",
                     "EndpointName", ENDPOINT_NAME,
                     "VariantName", "AllTraffic",
                     {"stat": "Sum", "period": 60}]
                ],
                "view": "timeSeries",
                "region": region,
                "period": 60,
            },
        },
    ]
}

cw.put_dashboard(
    DashboardName=DASHBOARD_NAME,
    DashboardBody=json.dumps(dashboard_body),
)

console_url = (
    f"https://{region}.console.aws.amazon.com/cloudwatch/home"
    f"?region={region}#dashboards/dashboard/{DASHBOARD_NAME}"
)
print(f"✓ CloudWatch dashboard created: {DASHBOARD_NAME}")
print(f"  Console URL: {console_url}")

✓ CloudWatch dashboard created: IDS-XGBoost-Monitoring
  Console URL: https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/IDS-XGBoost-Monitoring


## 6  Send Test Traffic

Send a batch of test-set samples through the endpoint so that
data capture files are written and CloudWatch metrics begin to populate.
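
For intuition about what travels over the wire, the sketch below slices rows into fixed-size batches and renders each batch as a `text/csv` payload with one line per row. It only approximates what a CSV serializer produces for a 2-D array. `iter_csv_batches` is a hypothetical helper and the rows are made up.

```python
def iter_csv_batches(rows, batch_size=500):
    """Yield text/csv payloads, one line per row, for successive slices of rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield "\n".join(",".join(str(v) for v in row) for row in chunk)


# Illustrative rows only (three features each).
rows = [[0.1, 2, 300], [0.2, 3, 450], [0.3, 1, 120]]
payloads = list(iter_csv_batches(rows, batch_size=2))

print(len(payloads))   # 2
print(payloads[0])     # two lines: 0.1,2,300 and 0.2,3,450
```

Each payload carries features only, in the same column order the model was trained on. The label is never sent to the endpoint.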

In [58]:
# Model.deploy() returns None unless the Model was given a predictor_cls,
# so attach a Predictor to the deployed endpoint explicitly
from sagemaker.predictor import Predictor

if predictor is None:
    print("Creating predictor from existing endpoint...")
    predictor = Predictor(
        endpoint_name=ENDPOINT_NAME,
        sagemaker_session=sess,
        serializer=sagemaker.serializers.CSVSerializer(),
        deserializer=sagemaker.deserializers.CSVDeserializer(),
    )
    print(f"✓ Predictor attached to endpoint: {ENDPOINT_NAME}")
else:
    print(f"✓ Predictor already exists for endpoint: {predictor.endpoint_name}")

Creating predictor from existing endpoint...
✓ Predictor attached to endpoint: ids-xgboost-binary-monitor


In [59]:
# Load test data from Athena (same query as training notebook)
from sqlalchemy import create_engine, text

database_name = "aai540_eda"
engine = create_engine(
    f"awsathena+rest://@athena.{region}.amazonaws.com:443/{database_name}",
    connect_args={
        "s3_staging_dir": f"s3://{bucket}/athena/staging/",
        "region_name": region,
    },
)

columns = ", ".join([LABEL_COL] + FEATURE_COLS)
query = f"""
SELECT {columns}
FROM {database_name}.dataset_split
WHERE data_split = 'test'
"""
df_test = pd.read_sql(query, engine)
print(f"Test rows loaded: {len(df_test):,}")

Test rows loaded: 49,920


In [60]:
# Send predictions in batches
BATCH_SIZE = 500
X_test = df_test[FEATURE_COLS].values
y_true = df_test[LABEL_COL].values
y_prob = []

print(f"Sending {len(X_test):,} samples to endpoint in batches of {BATCH_SIZE} …")
for start in range(0, len(X_test), BATCH_SIZE):
    batch = X_test[start : start + BATCH_SIZE]
    response = predictor.predict(batch)
    y_prob.extend([float(row[0]) for row in response])

y_prob = np.array(y_prob)
y_pred = (y_prob >= 0.5).astype(int)

from sklearn.metrics import accuracy_score, roc_auc_score
print(f"\n✓ Predictions complete.")
print(f"  Accuracy : {accuracy_score(y_true, y_pred):.4f}")
print(f"  AUC-ROC  : {roc_auc_score(y_true, y_prob):.4f}")
print(f"\nData capture files will appear in S3 within ~2 minutes.")
print(f"CloudWatch metrics will populate within ~5 minutes.")

Sending 49,920 samples to endpoint in batches of 500 …

✓ Predictions complete.
  Accuracy : 0.2414
  AUC-ROC  : 0.9199

Data capture files will appear in S3 within ~2 minutes.
CloudWatch metrics will populate within ~5 minutes.
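
The accuracy of 0.2414 next to an AUC-ROC of 0.9199 deserves a second look. AUC-ROC does not depend on a threshold, while accuracy does, so one possible explanation is that the fixed 0.5 cutoff does not suit the model's score distribution. The sketch below shows the effect on synthetic scores and a simple way to search for a better cutoff. The helpers `accuracy_at` and `best_threshold` are hypothetical, and in practice the cutoff should be chosen on a validation split, not on the test set.

```python
def accuracy_at(y_true, y_prob, threshold):
    """Fraction of samples where (score >= threshold) matches the label."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def best_threshold(y_true, y_prob):
    """Try every distinct score as a cutoff and keep the most accurate one."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda th: accuracy_at(y_true, y_prob, th))


# Synthetic scores, not taken from the endpoint: positives rank above
# negatives (perfect ranking), but every score sits below 0.5.
y_true = [0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.25, 0.30, 0.40]

acc_default = accuracy_at(y_true, y_prob, 0.5)
th = best_threshold(y_true, y_prob)
acc_tuned = accuracy_at(y_true, y_prob, th)
print(f"accuracy at 0.5  : {acc_default:.3f}")
print(f"best threshold   : {th}")
print(f"accuracy at best : {acc_tuned:.3f}")
```

With these scores the ranking is perfect, yet the 0.5 cutoff labels every sample negative. Comparing `y_prob` from the endpoint against the threshold used in the training notebook would show whether the same effect is at work here.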


In [61]:
# Verify data capture files exist
import time
print("Waiting 120 seconds for data capture delivery …")
time.sleep(120)

capture_prefix_key = f"{s3_prefix}/data-capture/{ENDPOINT_NAME}"
result = s3.list_objects_v2(Bucket=bucket, Prefix=capture_prefix_key, MaxKeys=5)

if "Contents" in result:
    print(f"✓ Data capture active — {len(result['Contents'])} file(s) found:")
    for obj in result["Contents"][:5]:
        print(f"  s3://{bucket}/{obj['Key']}")
else:
    print("⏳ No capture files yet — they may take a few more minutes to appear.")

Waiting 120 seconds for data capture delivery …
✓ Data capture active — 1 file(s) found:
  s3://sagemaker-us-east-1-776673915827/aai540/model/xgboost-binary/data-capture/ids-xgboost-binary-monitor/AllTraffic/2026/02/16/17/31-50-626-e4a1d55d-a872-4a02-b184-a6e26fc103db.jsonl
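
Each line of a capture `.jsonl` file is one JSON record holding the request and the response. The sketch below parses such a line into feature rows and scores. The sample record is hand-written for illustration, following the documented capture layout (`captureData`, `endpointInput`, `endpointOutput`, `eventMetadata`). It is not copied from the file listed above, and `parse_capture_line` is a hypothetical helper that assumes CSV-encoded payloads, which is what `csv_content_types=["text/csv"]` requests.

```python
import json


def parse_capture_line(line):
    """Return (inference_time, feature_rows, scores) for one capture record."""
    record = json.loads(line)
    data = record["captureData"]
    inp, out = data["endpointInput"], data["endpointOutput"]
    if inp["encoding"] != "CSV" or out["encoding"] != "CSV":
        raise ValueError("this sketch only handles CSV-encoded capture data")
    features = [
        [float(v) for v in row.split(",")]
        for row in inp["data"].splitlines()
        if row
    ]
    # Accept scores separated by newlines or by commas.
    scores = [float(v) for v in out["data"].replace(",", "\n").split()]
    return record["eventMetadata"]["inferenceTime"], features, scores


# Hand-written example record (all values are illustrative).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "0.1,2,300\n0.2,3,450", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv; charset=utf-8",
                           "mode": "OUTPUT", "data": "0.03\n0.91",
                           "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-event-id",
                      "inferenceTime": "2026-02-16T17:31:50Z"},
    "eventVersion": "0",
})

when, features, scores = parse_capture_line(sample_line)
print(when, len(features), "rows,", scores)
```

Because every request in this notebook carries up to 500 rows, a single capture record can hold an entire batch, so the number of capture records does not equal the number of samples scored.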


## 7  Cleanup

⚠️ **Uncomment and run this cell to delete all billable resources** created by this notebook:
- Model Monitor schedule
- SageMaker endpoint and endpoint configuration
- SageMaker model
- CloudWatch dashboard

The model artifact in S3 is **not** deleted (it is needed for redeployment).

In [62]:
# print("=" * 60)
# print("  CLEANUP — Deleting billable resources")
# print("=" * 60)

# # 1. Delete monitoring schedule
# try:
#     monitor.delete_monitoring_schedule()
#     print("✓ Monitoring schedule deleted.")
# except Exception as e:
#     print(f"⚠ Monitoring schedule: {e}")

# # 2. Delete endpoint (also stops the running instance)
# try:
#     predictor.delete_endpoint(delete_endpoint_config=True)
#     print("✓ Endpoint and endpoint config deleted.")
# except Exception as e:
#     print(f"⚠ Endpoint: {e}")

# # 3. Delete model
# try:
#     # The model name is auto-generated at deploy time; it is not ENDPOINT_NAME
#     sm.delete_model(ModelName=model.name)
#     print("✓ SageMaker model deleted.")
# except Exception as e:
#     print(f"⚠ Model: {e}")

# # 4. Delete CloudWatch dashboard
# try:
#     cw.delete_dashboards(DashboardNames=[DASHBOARD_NAME])
#     print("✓ CloudWatch dashboard deleted.")
# except Exception as e:
#     print(f"⚠ Dashboard: {e}")

# print("\n" + "=" * 60)
# print("  Cleanup complete.")
# print(f"  Model artifact preserved at: {best_model_s3}")
# print("=" * 60)