## 04 – Deployment & Monitoring

🟩 **GOOD:** This notebook demonstrates production-grade deployment and monitoring patterns for CodeCraft AI, following Clean Architecture and AWS-native best practices.

### Purpose
- Show how to deploy AI services using AWS-native tools (SageMaker, Lambda, ECS, etc.)
- Demonstrate monitoring, logging, and alerting for operational excellence
- Provide reproducible, testable deployment and monitoring workflows for stakeholders and technical reviewers

### Prerequisites
- AWS account with permissions for SageMaker, Lambda, CloudWatch, and related services
- All secrets and config must be injected via environment variables or AWS Secrets Manager/SSM (never hardcoded)
- This notebook is a client only: it calls AWS APIs, while AWS-native services run the deployment and monitoring logic

> 🟦 **NOTE:** For local dev, you can simulate deployment steps, but production deployments must use AWS-native CI/CD and monitoring tools.

### Environment-Aware Configuration

🟦 **NOTE:** All configuration and secrets are injected via environment variables or AWS Secrets Manager/SSM. No values are hardcoded.

In [12]:
import os
import boto3

# 🟦 NOTE: Example environment-aware config for deployment/monitoring
AWS_REGION = os.getenv("AWS_REGION")
DEPLOYMENT_STACK = os.getenv("CODECRAFT_DEPLOYMENT_STACK", "dev")
MODEL_NAME = os.getenv("CODECRAFT_MODEL_NAME", "codecraft-ai-model")
MONITORING_TOPIC_ARN = os.getenv("CODECRAFT_MONITORING_TOPIC_ARN", "")

session = boto3.Session(region_name=AWS_REGION)
sagemaker = session.client("sagemaker")
cloudwatch = session.client("cloudwatch")
sns = session.client("sns") if MONITORING_TOPIC_ARN else None
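
🟦 **NOTE:** The cell above reads each variable inline. A minimal sketch of one way to centralize and validate the same values follows. `DeploymentConfig` and `load_config` are hypothetical names, and the rule that non-dev stacks must set a region and an SNS topic is an assumption for illustration, not project policy.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical container for the settings read in the cell above."""
    region: Optional[str]
    stack: str
    model_name: str
    monitoring_topic_arn: str


def load_config(env: Mapping[str, str] = os.environ) -> DeploymentConfig:
    """Read the notebook's environment variables and validate them once."""
    config = DeploymentConfig(
        region=env.get("AWS_REGION"),
        stack=env.get("CODECRAFT_DEPLOYMENT_STACK", "dev"),
        model_name=env.get("CODECRAFT_MODEL_NAME", "codecraft-ai-model"),
        monitoring_topic_arn=env.get("CODECRAFT_MONITORING_TOPIC_ARN", ""),
    )
    # Assumption: outside dev, the region and alert topic must be explicit.
    if config.stack != "dev":
        required = {
            "AWS_REGION": config.region,
            "CODECRAFT_MONITORING_TOPIC_ARN": config.monitoring_topic_arn,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return config
```

Passing a plain dict instead of `os.environ` keeps the function testable without touching the real environment, for example `load_config({"CODECRAFT_DEPLOYMENT_STACK": "dev"})`.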

### Example: SageMaker Endpoint Status Check

🟩 **GOOD:** Use this cell to check the health/status of a SageMaker endpoint.

In [18]:
def check_sagemaker_endpoint_status(endpoint_name):
    """
    🟩 GOOD: Checks the status of a SageMaker endpoint and provides actionable diagnostics.
    🟨 CAUTION: Handles missing or misnamed endpoints gracefully for all environments.
    """
    try:
        response = sagemaker.describe_endpoint(EndpointName=endpoint_name)
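        status = response["EndpointStatus"]
        # Only "InService" means the endpoint can serve traffic; flag every other state.
        label = "🟩 GOOD" if status == "InService" else "🟨 CAUTION"
        print(f"{label}: SageMaker endpoint '{endpoint_name}' status: {status}")
        return status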
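    except sagemaker.exceptions.ClientError as e:
        # Inspect the structured error instead of matching on the full exception string.
        error = e.response.get("Error", {})
        if error.get("Code") == "ValidationException" and "Could not find endpoint" in error.get("Message", ""):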
            print(
                f"🟥 CRITICAL: SageMaker endpoint '{endpoint_name}' not found.\n"
                "🟦 NOTE: This is expected if the endpoint has not been deployed yet, or if you are in a dev environment.\n"
                "🟫 OPS: To proceed, you must:\n"
                "  - Deploy the model as a SageMaker endpoint (see AWS docs or IaC pipeline).\n"
                "  - Or, update MODEL_NAME to match an existing endpoint for this environment.\n"
                "  - Ensure AWS_REGION is set to the correct region.\n"
                "  - Ensure your IAM role has sagemaker:DescribeEndpoint permission.\n"
                "🟪 ARCH: Endpoint naming and region must be consistent across CI/CD, IaC, and runtime config.\n"
                "🟦 NOTE: If you are in dev and have not deployed an endpoint, this is not an error. For production, this is blocking.\n"
                "🟦 NEXT ACTION: If you want to test this workflow, you must first deploy a SageMaker endpoint. "
                "You can do this via the AWS Console, CLI, IaC, or (for dev only) using the boto3 example below."
            )
            # 🟦 NOTE: Example (dev only) - Deploy a minimal endpoint if you have a registered model
            print(
                "\n🟦 EXAMPLE: Deploy a SageMaker endpoint from a notebook cell (dev only):\n"
                "from boto3 import client\n"
                "sm = client('sagemaker', region_name=AWS_REGION)\n"
                "# Replace 'your-model-name' and 'your-endpoint-name' with real values\n"
                "sm.create_endpoint_config(\n"
                "    EndpointConfigName='your-endpoint-name-config',\n"
                "    ProductionVariants=[{\n"
                "        'VariantName': 'AllTraffic',\n"
                "        'ModelName': 'your-model-name',\n"
                "        'InstanceType': 'ml.m5.large',\n"
                "        'InitialInstanceCount': 1\n"
                "    }]\n"
                ")\n"
                "sm.create_endpoint(\n"
                "    EndpointName='your-endpoint-name',\n"
                "    EndpointConfigName='your-endpoint-name-config'\n"
                ")\n"
                "# 🟨 CAUTION: For production, use CI/CD and IaC, not notebooks."
            )
            return None
        # Any other error is unexpected: report it and re-raise for the caller.
        print(f"🟥 CRITICAL: Unexpected error while checking endpoint status: {e}")
        raise

# 🟦 NOTE: Example usage. This assumes the endpoint is named after MODEL_NAME; pass the real endpoint name if it differs.
if __name__ == "__main__":
    try:
        check_sagemaker_endpoint_status(MODEL_NAME)
    except Exception as exc:
        print(f"🟥 CRITICAL: Deployment/monitoring check failed: {exc}")

🟥 CRITICAL: SageMaker endpoint 'codecraft-ai-model' not found.
🟦 NOTE: This is expected if the endpoint has not been deployed yet, or if you are in a dev environment.
🟫 OPS: To proceed, you must:
  - Deploy the model as a SageMaker endpoint (see AWS docs or IaC pipeline).
  - Or, update MODEL_NAME to match an existing endpoint for this environment.
  - Ensure AWS_REGION is set to the correct region.
  - Ensure your IAM role has sagemaker:DescribeEndpoint permission.
🟪 ARCH: Endpoint naming and region must be consistent across CI/CD, IaC, and runtime config.
🟦 NOTE: If you are in dev and have not deployed an endpoint, this is not an error. For production, this is blocking.
🟦 NEXT ACTION: If you want to test this workflow, you must first deploy a SageMaker endpoint. You can do this via the AWS Console, CLI, IaC, or (for dev only) using the boto3 example below.

🟦 EXAMPLE: Deploy a SageMaker endpoint from a notebook cell (dev only):
from boto3 import client
sm = client('sagemaker', region
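
🟦 **NOTE:** The status check above returns the raw `EndpointStatus` string. As a rough sketch, the helper below maps the documented SageMaker status values to a coarse health category so callers can decide whether to wait, alert, or proceed. `classify_endpoint_status` and the category names are hypothetical and not part of the notebook or of boto3.

```python
# Status values documented for SageMaker DescribeEndpoint.
READY = {"InService"}
IN_PROGRESS = {"Creating", "Updating", "SystemUpdating", "RollingBack"}
UNAVAILABLE = {"Failed", "OutOfService", "Deleting", "UpdateRollbackFailed"}


def classify_endpoint_status(status):
    """Map an EndpointStatus string (or None) to a coarse health category."""
    if status is None:
        # check_sagemaker_endpoint_status returns None when the endpoint is missing.
        return "missing"
    if status in READY:
        return "ready"
    if status in IN_PROGRESS:
        return "in_progress"
    if status in UNAVAILABLE:
        return "unavailable"
    # Guard against status values added to the API after this was written.
    return "unknown"
```

For example, `classify_endpoint_status(check_sagemaker_endpoint_status(MODEL_NAME))` would yield `"missing"` for the run shown above.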

In [21]:
# 🟦 NOTE: Example - Deploy a SageMaker endpoint from a notebook cell (dev/test only)
import boto3
import os

def deploy_sagemaker_endpoint(model_name, endpoint_name, instance_type="ml.m5.large"):
    """
    🟩 GOOD: Deploys a SageMaker endpoint for dev/test.
    🟨 CAUTION: For production, use CI/CD and IaC (CloudFormation, CDK, Terraform).
    """
    sm = boto3.client("sagemaker", region_name=os.getenv("AWS_REGION"))
    try:
        # 🟦 NOTE: This assumes a SageMaker model already exists with the given model_name.
        sm.create_endpoint_config(
            EndpointConfigName=f"{endpoint_name}-config",
            ProductionVariants=[
                {
                    "VariantName": "AllTraffic",
                    "ModelName": model_name,
                    "InstanceType": instance_type,
                    "InitialInstanceCount": 1,
                }
            ],
        )
        sm.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=f"{endpoint_name}-config"
        )
        print(f"🟩 GOOD: SageMaker endpoint '{endpoint_name}' deployment initiated.")
    except sm.exceptions.ClientError as e:
        print(f"🟥 CRITICAL: Failed to deploy endpoint: {e}")
    except Exception as e:
        print(f"🟥 CRITICAL: Unexpected error during endpoint deployment: {e}")

# 🟦 NOTE: Example usage (dev/test only)
# deploy_sagemaker_endpoint("your-sagemaker-model-name", "your-endpoint-name")
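
🟦 **NOTE:** `create_endpoint` only starts the deployment; the endpoint typically spends several minutes in `Creating` before it reaches `InService`. Below is a minimal polling sketch for dev/test. `wait_for_endpoint` is a hypothetical helper, and the timeout and polling interval are illustrative defaults. The describe call is passed in as a parameter so the loop can be exercised without AWS access. boto3 also ships a built-in `endpoint_in_service` waiter that may be preferable in practice.

```python
import time


def wait_for_endpoint(describe, endpoint_name, timeout_seconds=1800,
                      poll_seconds=30, sleep=time.sleep, clock=time.monotonic):
    """Poll until the endpoint is InService, fails, or the timeout expires.

    `describe` is any callable with the shape of the boto3 SageMaker
    client's describe_endpoint method (keyword argument EndpointName).
    """
    deadline = clock() + timeout_seconds
    while True:
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status in ("Failed", "OutOfService", "UpdateRollbackFailed"):
            raise RuntimeError(f"Endpoint '{endpoint_name}' ended in status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Endpoint '{endpoint_name}' still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With a real client this could be called as `wait_for_endpoint(sm.describe_endpoint, "your-endpoint-name")` after the deployment call.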


### Example: CloudWatch Metrics Query

🟩 **GOOD:** Use this cell to query CloudWatch metrics for your deployed model or service.

In [22]:
def get_cloudwatch_invocations(endpoint_name, minutes=60, variant_name="AllTraffic"):
    """
    🟦 NOTE: Returns per-minute invocation sums for one variant of a SageMaker endpoint.
    🟨 CAUTION: SageMaker publishes invocation metrics per EndpointName and VariantName,
    so both dimensions are needed for CloudWatch to match any datapoints.
    """
    from datetime import datetime, timedelta, timezone
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(minutes=minutes)
    try:
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/SageMaker",
            MetricName="Invocations",
            Dimensions=[
                {"Name": "EndpointName", "Value": endpoint_name},
                {"Name": "VariantName", "Value": variant_name},
            ],
            StartTime=start_time,
            EndTime=end_time,
            Period=60,  # one datapoint per minute
            Statistics=["Sum"]
        )
        datapoints = response.get("Datapoints", [])
        print(f"🟩 GOOD: Found {len(datapoints)} invocation datapoints for endpoint '{endpoint_name}' in the last {minutes} minutes.")
        return datapoints
    except Exception as e:
        print(f"🟥 CRITICAL: Failed to get CloudWatch metrics: {e}")
        return []

# 🟦 NOTE: Example usage
if __name__ == "__main__":
    get_cloudwatch_invocations(MODEL_NAME)

🟩 GOOD: Found 0 invocation datapoints for endpoint 'codecraft-ai-model' in the last 60 minutes.
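
🟦 **NOTE:** `get_metric_statistics` returns datapoints in no guaranteed order, and minutes without traffic produce no datapoint at all. The sketch below shows one way to turn the returned list into a small summary. `summarize_invocations` is a hypothetical helper, and it assumes each datapoint carries the `Timestamp` and `Sum` keys requested in the cell above.

```python
from datetime import datetime, timezone


def summarize_invocations(datapoints):
    """Summarize CloudWatch 'Sum' datapoints for the Invocations metric."""
    # Sort first: CloudWatch does not return datapoints in time order.
    ordered = sorted(datapoints, key=lambda point: point["Timestamp"])
    total = sum(point.get("Sum", 0.0) for point in ordered)
    peak = max(ordered, key=lambda point: point.get("Sum", 0.0), default=None)
    return {
        "total_invocations": total,
        "periods_with_traffic": len(ordered),
        "first_timestamp": ordered[0]["Timestamp"] if ordered else None,
        "peak_timestamp": peak["Timestamp"] if peak else None,
        "peak_invocations": peak.get("Sum", 0.0) if peak else 0.0,
    }
```

An empty list, as in the run above, yields a total of zero and `None` timestamps rather than raising.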


### Example: Monitoring Alert (SNS Notification)

🟦 **NOTE:** Use this cell to send a notification if a critical event or threshold is detected. In production, automate this with CloudWatch Alarms and SNS.

In [23]:
def send_monitoring_alert(message, subject="CodeCraft AI Monitoring Alert"):
    if not sns or not MONITORING_TOPIC_ARN:
        print("🟦 Diagnostics: SNS not configured. Skipping alert.")
        return
    try:
        sns.publish(
            TopicArn=MONITORING_TOPIC_ARN,
            Message=message,
            Subject=subject
        )
        print(f"🟩 GOOD: Monitoring alert sent to {MONITORING_TOPIC_ARN}")
    except Exception as e:
        print(f"🟥 CRITICAL: Failed to send SNS alert: {e}")

# 🟦 NOTE: Example usage
if __name__ == "__main__" and MONITORING_TOPIC_ARN:
    send_monitoring_alert("Test alert from CodeCraft AI deployment monitoring.")
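
🟦 **NOTE:** For the production path mentioned above (CloudWatch Alarms publishing to SNS), the sketch below builds the keyword arguments for the CloudWatch `put_metric_alarm` call. `build_error_alarm` is a hypothetical helper. The alarm name, threshold, and evaluation window are illustrative assumptions, not recommended values, and the `AllTraffic` variant name is taken from the deployment example in this notebook.

```python
def build_error_alarm(endpoint_name, topic_arn, variant_name="AllTraffic",
                      threshold=1.0, evaluation_periods=5):
    """Build put_metric_alarm kwargs for 5XX errors on a SageMaker endpoint."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",  # illustrative naming scheme
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Minutes with no traffic emit no datapoints; treat them as healthy.
        "TreatMissingData": "notBreaching",
        # The SNS topic receives a notification when the alarm fires.
        "AlarmActions": [topic_arn],
    }
```

In a dev session this could be applied with `cloudwatch.put_metric_alarm(**build_error_alarm(MODEL_NAME, MONITORING_TOPIC_ARN))`. For production, the same definition belongs in IaC rather than in a notebook.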