## 1. Real-Time Inspection of a Deployed AI Model

This script acts as a "diagnostic scanner" for a live Artificial Intelligence model. It uses the **Boto3** library to talk to **Amazon SageMaker**, retrieving the technical specifications and current health of a specific model environment (the "Endpoint").

---

### 1. The Core Objective: Verifying the "Live" Environment

In machine learning, an **Endpoint** is the actual web address where a model lives so that applications can send it data and get back predictions. This code is designed to pull back the curtain and show exactly what is running "under the hood" of that specific address.

### 2. Retrieving the Vital Signs

The first part of the script asks SageMaker for a detailed status report.

* **The Identity (ARN):** It retrieves the "Amazon Resource Name," which is the unique, permanent ID for this specific endpoint within the entire AWS global network.
* **The Health Status:** It checks if the model is currently `InService` (active and working), `Creating` (starting up), or `Failed`.
* **Timestamping:** It records exactly when the model was first created and the last time any changes were made to it.
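
The ARN packs several facts (service, region, account, resource) into one colon-separated string. As a minimal sketch, assuming the standard `arn:partition:service:region:account-id:resource` layout, the helper below splits it into named parts. The names `Arn` and `parse_arn` are hypothetical and not part of Boto3.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # maxsplit=5 keeps any colons inside the resource part intact
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


endpoint_arn = parse_arn(
    "arn:aws:sagemaker:us-east-1:805801076223:endpoint/ueba-endpoint2026216-v2"
)
print(endpoint_arn.region)    # us-east-1
print(endpoint_arn.resource)  # endpoint/ueba-endpoint2026216-v2
```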

### 3. Following the "Paper Trail"

An endpoint is essentially a wrapper. To find out what is actually inside it, the script follows a two-step chain of command:

1. **Endpoint $\rightarrow$ Config:** It looks up the "Endpoint Configuration," which is the blueprint describing what kind of hardware (servers/instances) is being used.
2. **Config $\rightarrow$ Model:** It dives into that configuration to find the **Production Variant**. This reveals the specific version of the AI model file (the "Model Name") that is currently processing requests.
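
An endpoint configuration can list more than one production variant, so reading only index `[0]` may hide other models. The sketch below follows the same two-step chain but returns every variant. `list_variant_models` is a hypothetical helper, and `_StubSageMakerClient` is a stand-in so the example runs without AWS access; its variant name and instance type are made-up values.

```python
def list_variant_models(sm_client, endpoint_name):
    """Return (variant_name, model_name, instance_type) for every production variant."""
    # Step 1: Endpoint -> Config
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    # Step 2: Config -> Model(s)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    return [
        (v["VariantName"], v["ModelName"], v.get("InstanceType"))
        for v in config["ProductionVariants"]
    ]


class _StubSageMakerClient:
    """Stand-in for boto3.client('sagemaker') so the sketch runs offline."""

    def describe_endpoint(self, EndpointName):
        return {
            "EndpointConfigName": "ueba-endpoint2026216-v2-with-capture-20260223182617"
        }

    def describe_endpoint_config(self, EndpointConfigName):
        return {
            "ProductionVariants": [
                {
                    "VariantName": "AllTraffic",  # hypothetical variant name
                    "ModelName": "tensorflow-inference-2026-02-23-11-02-19-723",
                    "InstanceType": "ml.m5.large",  # hypothetical instance type
                }
            ]
        }


variants = list_variant_models(_StubSageMakerClient(), "ueba-endpoint2026216-v2")
for variant, model, instance in variants:
    print(f"{variant}: {model} on {instance}")
```

With a real client, pass `boto3.client('sagemaker')` in place of the stub.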


In [None]:
import boto3

sagemaker_client = boto3.client('sagemaker')
endpoint_name = "ueba-endpoint2026216-v2"

response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
print(f"Endpoint ARN: {response['EndpointArn']}")
print(f"Status: {response['EndpointStatus']}")
print(f"Endpoint Config: {response['EndpointConfigName']}")
print(f"Creation Time: {response['CreationTime']}")
print(f"Last Modified: {response['LastModifiedTime']}")

# Get the model name from the endpoint config
config_name = response['EndpointConfigName']
config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
model_name = config['ProductionVariants'][0]['ModelName']
print(f"Model Name: {model_name}")

Endpoint ARN: arn:aws:sagemaker:us-east-1:805801076223:endpoint/ueba-endpoint2026216-v2
Status: InService
Endpoint Config: ueba-endpoint2026216-v2-with-capture-20260223182617
Creation Time: 2026-02-23 11:02:21.097000+00:00
Last Modified: 2026-02-23 18:30:16.023000+00:00
Model Name: tensorflow-inference-2026-02-23-11-02-19-723


## 2. IAM Role Identification for SageMaker Operations

This code retrieves the ARN of the **Identity and Access Management (IAM)** role attached to the current SageMaker environment. SageMaker assumes this role to obtain the permissions it needs for interacting with other AWS services, such as Amazon S3, CloudWatch, or ECR.

---

### 1. Permission Discovery

The primary function of `get_execution_role()` is to automatically detect the security profile granted to the notebook instance or Studio environment.

* **Seamless Integration:** It avoids hard-coding account-specific identifiers (such as role ARNs) directly into the script.
* **Automatic Detection:** If the script is running inside a managed SageMaker environment, it identifies the exact role assigned during the environment's creation.

### 2. AWS Resource Access

Passing this role to SageMaker jobs gives them the authorization needed to perform critical tasks:

* **Data Access:** Reading training datasets from or writing model artifacts to **Amazon S3** buckets.
* **Logging:** Emitting training logs and performance metrics to **CloudWatch**.
* **Container Management:** Pulling required Docker images from **Amazon Elastic Container Registry (ECR)**.

### 3. Cross-Environment Functionality

* **Within SageMaker:** The function returns the ARN (Amazon Resource Name) string of the attached IAM role automatically.
* **Outside SageMaker (Local):** The function will typically fail or throw an error if run on a local machine, as there is no "attached" execution role. In such cases, the role must be specified manually as a string.

### 4. Security Best Practices

* **Least Privilege:** This role should only have the minimum permissions necessary for the specific machine learning task to ensure high security.
* **Traceability:** All actions performed using the retrieved role are logged in **AWS CloudTrail**, providing a clear audit trail for compliance.



In [None]:
# 2. IAM role identification for SageMaker operations
import sagemaker
print(sagemaker.get_execution_role())

arn:aws:iam::805801076223:role/LabRole


## 3. Verify SageMaker Version and Role ARN


In [None]:
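# 3. Install (or upgrade) the SageMaker SDK, then verify its version and role ARN
# Note: upgrading inside a running kernel can leave the already-imported
# `sagemaker` package in an inconsistent state; restarting the kernel after
# the install is a common remedy if the import below fails.
!pip install sagemaker -U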
import sagemaker
from sagemaker import get_execution_role

print(f"SageMaker Version: {sagemaker.__version__}")
role = get_execution_role()
print(f"Role ARN: {role}")

Collecting sagemaker
  Using cached sagemaker-3.4.1-py3-none-any.whl.metadata (20 kB)
Collecting sagemaker-train<2.0.0,>=1.4.1 (from sagemaker)
  Using cached sagemaker_train-1.4.1-py3-none-any.whl.metadata (7.8 kB)
Collecting sagemaker-serve<2.0.0,>=1.4.1 (from sagemaker)
  Using cached sagemaker_serve-1.4.1-py3-none-any.whl.metadata (1.6 kB)
Collecting sagemaker-mlops<2.0.0,>=1.4.1 (from sagemaker)
  Using cached sagemaker_mlops-1.4.1-py3-none-any.whl.metadata (5.7 kB)
Collecting deepdiff (from sagemaker-serve<2.0.0,>=1.4.1->sagemaker)
  Using cached deepdiff-8.6.1-py3-none-any.whl.metadata (8.6 kB)
Collecting mlflow (from sagemaker-serve<2.0.0,>=1.4.1->sagemaker)
  Using cached mlflow-3.10.0-py3-none-any.whl.metadata (31 kB)
Collecting sagemaker_schema_inference_artifacts (from sagemaker-serve<2.0.0,>=1.4.1->sagemaker)
  Using cached sagemaker_schema_inference_artifacts-0.0.5-py3-none-any.whl.metadata (2.3 kB)
Collecting pytest (from sagemaker-serve<2.0.0,>=1.4.1->sagemaker)
  Using

ImportError: cannot import name 'get_execution_role' from 'sagemaker' (unknown location)

## 4. Data Quality Baseline Generation

This script initiates a **SageMaker Processing Job** to analyze a dataset and establish a statistical baseline for a machine learning model. This baseline acts as a "gold standard" used to detect data drift or quality issues in live production traffic.

---

### 1. Job Identity and Security

The script defines a unique identity for the operation using a timestamped name. It assigns a specific **IAM Role** (`LabRole`), providing the necessary permissions to read from and write to cloud storage.
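
The timestamped naming scheme can be wrapped in a small helper that also checks the result before it is sent to the API. This is a sketch under an assumption: the pattern below (alphanumerics separated by hyphens, at most 63 characters) reflects the naming rule as understood here and should be checked against the current API reference. `make_job_name` is a hypothetical helper.

```python
import re
import time

# Assumed naming rule: starts with an alphanumeric, hyphens only between
# alphanumerics, no underscores, at most 63 characters in total.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, now=None):
    """Build '<prefix>-<unix timestamp>' and validate it against the assumed rule."""
    timestamp = int(time.time() if now is None else now)
    name = f"{prefix}-{timestamp}"
    if len(name) > 63 or not JOB_NAME_PATTERN.match(name):
        raise ValueError(f"Invalid job name: {name!r}")
    return name


print(make_job_name("data-quality-baseline", now=1771874271))
# data-quality-baseline-1771874271
```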

### 2. The Analytical Engine

The analysis is performed within a specialized Docker container (`sagemaker-model-monitor-analyzer`).

* **Input Configuration**: The container is instructed to locate the dataset within its local environment.
* **Data Parsing**: Parameters confirm the data is in **CSV format with a header row**.
* **Output Path**: An internal directory is designated to collect the generated statistics and suggested constraints.

### 3. Resource Allocation

The script provisions dedicated hardware to handle the computational load:

* **Compute Power**: An `ml.m5.xlarge` instance is utilized.
* **Storage Capacity**: 20GB of temporary disk space is attached to the instance for data processing.
* **Time Limit**: A safety timeout of one hour prevents the job from running indefinitely in case of an error.

### 4. Data Movement Pipeline

The configuration maps remote storage to the processing instance:

* **Input (S3 to Container)**: The validation dataset is pulled from a specific S3 path and placed into the container's `/opt/ml/processing/input/` folder.
* **Output (Container to S3)**: Once the analysis concludes, the resulting **statistics.json** and **constraints.json** files are uploaded to the designated S3 output location.


In [None]:
# 4. Data Quality baseline generation

import boto3
import time

sm_client = boto3.client('sagemaker')

role = "arn:aws:iam::805801076223:role/LabRole"
job_name = f"data-quality-baseline-{int(time.time())}"

# 1. DATA QUALITY BASELINE
response = sm_client.create_processing_job(
    ProcessingJobName=job_name,  # hyphen-separated name, no underscores
    RoleArn=role,
    StoppingCondition={'MaxRuntimeInSeconds': 3600},
    AppSpecification={
        'ImageUri': '156813124714.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer',
        'ContainerArguments': [
            '--baseline_dataset', '/opt/ml/processing/input/baseline_dataset',
            '--dataset_format', '{"csv": {"header": true}}',
            '--output_path', '/opt/ml/processing/output'
        ]
    },
    ProcessingResources={
        'ClusterConfig': {
            'InstanceCount': 1,
            'InstanceType': 'ml.m5.xlarge',
            'VolumeSizeInGB': 20
        }
    },
    ProcessingInputs=[
        {
            'InputName': 'baseline_dataset',
            'S3Input': {
                'S3Uri': 's3://assignment-4-3088428428401/ueba-base-daily-cnn-gru/20260216/validation/validation_input.csv',
                'LocalPath': '/opt/ml/processing/input/baseline_dataset',
                'S3DataType': 'S3Prefix',
                'S3InputMode': 'File'
            }
        }
    ],
    ProcessingOutputConfig={
        'Outputs': [
            {
                'OutputName': 'monitoring_output',
                'S3Output': {
                    'S3Uri': 's3://assignment-4-3088428428401/baseline-output/data_quality',
                    'LocalPath': '/opt/ml/processing/output',
                    'S3UploadMode': 'EndOfJob'
                }
            }
        ]
    }
)

print(f"🚀 Baseline Job Launched successfully: {job_name}")

🚀 Baseline Job Launched successfully: data-quality-baseline-1771874271


## 5. Monitoring Schedule Health Verification

This script acts as a **system probe** to check the status of automated monitoring for a live machine learning model. It verifies that the "guardrail" established to track data quality or model performance is active and functional.

---

### 1. The Core Objective: Verifying Surveillance

In machine learning operations, a **Monitoring Schedule** is an automated task that periodically inspects live traffic. This script determines whether that task is correctly `Scheduled` or has encountered a configuration error.

### 2. Status Analysis and Diagnostics

The script uses a "Check-and-Report" logic to handle different operational states:

* **Status Retrieval:** It asks the system for the current state of the schedule (e.g., `Scheduled`, `Pending`, or `Failed`).
* **Failure Troubleshooting:** If a `Failed` state is detected, the script automatically pulls the **Failure Reason**. This provides direct insight into issues like missing data, incorrect S3 paths, or permission errors.
* **The "Waiting" State:** A schedule may be active but not yet executing jobs. This often occurs if the model has not yet received enough real-world data to trigger an analysis.

### 3. Automated Discovery Logic

If the specific name provided is not found, the script shifts to a "discovery mode":

* **Endpoint-Wide Search:** It lists every monitoring schedule currently attached to the specific model location (the "Endpoint").
* **Inventory Report:** This provides a clear list of all available monitors, ensuring no active guardrail is overlooked due to a naming discrepancy.
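
A single `list_monitoring_schedules` call returns one page of results, so an endpoint with many schedules may need several calls. The sketch below follows `NextToken` until every page has been read. `iter_monitoring_schedules` is a hypothetical helper, and the stub client with its two pages stands in for the real service.

```python
def iter_monitoring_schedules(sm_client, endpoint_name):
    """Yield every monitoring schedule summary for an endpoint, across all pages."""
    kwargs = {"EndpointName": endpoint_name}
    while True:
        page = sm_client.list_monitoring_schedules(**kwargs)
        yield from page.get("MonitoringScheduleSummaries", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


class _StubSageMakerClient:
    """Stand-in client that serves two pages of results."""

    _pages = {
        None: {
            "MonitoringScheduleSummaries": [
                {
                    "MonitoringScheduleName": "ueba-performance-schedule",
                    "MonitoringScheduleStatus": "Scheduled",
                }
            ],
            "NextToken": "page-2",
        },
        "page-2": {
            "MonitoringScheduleSummaries": [
                {
                    "MonitoringScheduleName": "mq-final-20260224113031",
                    "MonitoringScheduleStatus": "Scheduled",
                }
            ]
        },
    }

    def list_monitoring_schedules(self, EndpointName, NextToken=None):
        return self._pages[NextToken]


schedule_names = [
    s["MonitoringScheduleName"]
    for s in iter_monitoring_schedules(_StubSageMakerClient(), "ueba-endpoint2026216-v2")
]
print(schedule_names)
```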

### 4. Why Constant Monitoring Matters

Machine learning models are subject to "decay" as world conditions change. Automated schedules ensure that:

* **Drift Detection:** Shifts in data patterns are caught before they impact business decisions.
* **Accuracy Tracking:** Performance is measured against a known "baseline" or "ground truth."
* **Reliability:** The system confirms that the automation itself has not crashed or been misconfigured.

### 5. Benefits for Operations Teams

* **Real-Time Visibility:** Provides an instant snapshot of the model's safety systems.
* **Reduced Downtime:** Quick identification of failure reasons allows for faster resolution of monitoring gaps.
* **Audit Readiness:** Maintains a clear record of which schedules are active and their current health status.


In [None]:
# 5. Monitoring schedule health verification
import boto3

sm_client = boto3.client('sagemaker')
endpoint_name = "ueba-endpoint2026216-v2"
schedule_name = "ueba-performance-schedule"  # replace with exact name if different

try:
    response = sm_client.describe_monitoring_schedule(
        MonitoringScheduleName=schedule_name
    )
    status = response['MonitoringScheduleStatus']
    print(f"Schedule Status: {status}")
    if status == 'Failed':
        print(f"Failure Reason: {response.get('FailureReason', 'None')}")
    else:
        print("Schedule is not failed - it may just be waiting for data capture.")
except sm_client.exceptions.ResourceNotFound:
    print(f"Schedule '{schedule_name}' not found.")
    # List all schedules for this endpoint
    resp = sm_client.list_monitoring_schedules(EndpointName=endpoint_name)
    schedules = resp.get('MonitoringScheduleSummaries', [])
    if schedules:
        print("Available schedules for this endpoint:")
        for s in schedules:
            print(f"  - {s['MonitoringScheduleName']} (Status: {s['MonitoringScheduleStatus']})")
    else:
        print("No monitoring schedules found for this endpoint.")

Schedule Status: Scheduled
Schedule is not failed - it may just be waiting for data capture.


In [None]:
import boto3
import json

runtime = boto3.client('runtime.sagemaker')
endpoint_name = "ueba-endpoint2026216-v2"

# Try 'instances' format
payload_instances = {"instances": [[-0.44759133, -0.13453007]]}
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload_instances)
)
print("✅ Success with 'instances'!")
print("Response:", response['Body'].read().decode())

✅ Success with 'instances'!
Response: [2.089785728362839e-13]


In [None]:
import boto3
import json

sm = boto3.client('sagemaker')
endpoint_name = 'ueba-endpoint2026216-v2'

try:
    # 1. Get the current Endpoint details
    desc_endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    current_config_name = desc_endpoint['EndpointConfigName']
    print(f"--- Current Live Endpoint Configuration ---")
    print(f"Name: {current_config_name}")
    print(f"Status: {desc_endpoint['EndpointStatus']}")

    # 2. Get the details of that specific Configuration
    desc_config = sm.describe_endpoint_config(EndpointConfigName=current_config_name)
    capture_config = desc_config.get('DataCaptureConfig', {})

    print(f"\n--- Data Capture Settings ---")
    print(f"Enabled: {capture_config.get('EnableCapture')}")
    print(f"Destination: {capture_config.get('DestinationS3Uri')}")

    # Show which payloads (Input and/or Output) are being captured
    capture_options = capture_config.get('CaptureOptions', [])
    print(f"Capture Options: {capture_options}")

except Exception as e:
    print(f"Error fetching config: {e}")

--- Current Live Endpoint Configuration ---
Name: ueba-model-performance-monitor-2
Status: InService

--- Data Capture Settings ---
Enabled: True
Destination: s3://assignment-4-3088428428401/ueba-base-daily-cnn-gru/data-capture
Capture Options: [{'CaptureMode': 'Input'}, {'CaptureMode': 'Output'}]


## 8. Model Quality Monitoring Automation

This script creates a **Model Quality Monitoring Schedule** in Amazon SageMaker. It establishes a recurring job to evaluate a live model by comparing real-time predictions with actual outcomes (**Ground Truth**).

---

### 1. The Core Objective: Systematic Performance Auditing

The implementation serves as a permanent "auditor" for the model. It is designed to detect **Model Decay**, where accuracy declines over time due to shifts in real-world data patterns. By comparing predictions to verified truths, the system calculates essential performance metrics, such as **Accuracy**, **Precision**, and **Recall**, on a fixed interval.
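
To make these metrics concrete, the sketch below computes them from a handful of merged records. It only illustrates the arithmetic and is not the monitoring container's own implementation. The keys `prediction` and `label` mirror the attribute names used in this notebook, and the sample records are invented.

```python
def binary_metrics(records, pred_key="prediction", label_key="label"):
    """Compute accuracy, precision, and recall for 0/1 predictions and labels."""
    tp = fp = tn = fn = 0
    for record in records:
        pred, label = int(record[pred_key]), int(record[label_key])
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented sample: 2 true positives, 1 false positive, 1 true negative, 1 false negative
merged = [
    {"prediction": 1, "label": 1},
    {"prediction": 1, "label": 0},
    {"prediction": 0, "label": 0},
    {"prediction": 0, "label": 1},
    {"prediction": 1, "label": 1},
]
metrics = binary_metrics(merged)
print(metrics)
```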

### 2. Temporal Logic: The Cron Expression

The `ScheduleConfig` utilizes a **Cron Expression** (`cron(0 * * * ? *)`) to define precisely when the analysis occurs.

* The monitor activates **every hour at the zero minute**.
* This frequency enables near real-time tracking of performance fluctuations throughout the production day.
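
The expression is easier to read once each position has a name. The sketch below assumes the six-field AWS cron layout (minutes, hours, day of month, month, day of week, year) and splits the expression accordingly. It also computes the next top-of-the-hour time for an hourly schedule. Both helper names are hypothetical, and the service's actual start time may differ from this nominal value.

```python
from datetime import datetime, timedelta

# Assumed field order for AWS-style cron expressions
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_schedule_expression(expression):
    """Split 'cron(a b c d e f)' into a dict of named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"Not a cron expression: {expression!r}")
    values = expression[len("cron("):-1].split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(values)}")
    return dict(zip(CRON_FIELDS, values))


def next_hourly_run(now):
    """Nominal next run for an 'every hour at minute 0' schedule."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


fields = parse_schedule_expression("cron(0 * * * ? *)")
print(fields["minutes"], fields["hours"])  # 0 *

upcoming = next_hourly_run(datetime(2026, 2, 24, 11, 30, 31))
print(upcoming)  # 2026-02-24 12:00:00
```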

### 3. Comparison of Baseline and Production Data

Detection of model failure requires a stable reference point and fresh inputs.

* **BaselineConfig**: Links to the `constraints.json` and `statistics.json` files produced during the initial training or validation phase.
* **MonitoringInputs**: Extracts raw data captured directly from the live **Endpoint**.
* **Ground Truth**: The `ground_truth_input` parameter specifies an S3 location containing verified "correct answers." The monitor merges these labels with model predictions to determine accuracy.

### 4. Technical Configuration and Requirements

Specific parameters define the "rules of engagement" for the monitoring engine:

* **Problem Type**: Classified as `BinaryClassification`, indicating the model performs two-class decision-making (e.g., Identifying Fraud vs. Non-Fraud).
* **Attribute Mapping**: The script identifies which data columns contain the **prediction**, the **probability**, and the **actual label**.
* **Data Format**: Uses `JSON Lines`, a standard for high-speed, scalable data capture within cloud environments.

### 5. Automated CloudWatch Integration

The `Environment` configurations enable the direct publishing of results to **Amazon CloudWatch**.

* This integration converts monitoring outputs into visual time-series graphs.
* It facilitates the creation of automated alerts, ensuring immediate notification if accuracy drops below a predefined safety threshold.



In [None]:
# 8. Model Quality Monitoring Automation
import boto3
from datetime import datetime

sm_client = boto3.client('sagemaker')

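schedule_name = f"mq-final-{datetime.now().strftime('%Y%m%d%H%M%S')}"

# NOTE: baseline_uri, endpoint_name, output_uri, image_uri, ground_truth_parent
# and role_arn are not defined in this cell; they are assumed to come from
# earlier cells of the notebook.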

response = sm_client.create_monitoring_schedule(
    MonitoringScheduleName=schedule_name,
    MonitoringScheduleConfig={
        'ScheduleConfig': {
            'ScheduleExpression': 'cron(0 * * * ? *)'  # every hour at minute 0
        },
        'MonitoringJobDefinition': {
            'BaselineConfig': {
                'ConstraintsResource': {'S3Uri': baseline_uri + 'constraints.json'},
                'StatisticsResource': {'S3Uri': baseline_uri + 'statistics.json'}
            },
            'MonitoringInputs': [
                {
                    'EndpointInput': {
                        'EndpointName': endpoint_name,
                        'LocalPath': '/opt/ml/processing/input/endpoint',
                        'S3InputMode': 'File',
                        'S3DataDistributionType': 'FullyReplicated'
                    }
                }
            ],
            'MonitoringOutputConfig': {
                'MonitoringOutputs': [
                    {
                        'S3Output': {
                            'S3Uri': output_uri,
                            'LocalPath': '/opt/ml/processing/output',
                            'S3UploadMode': 'EndOfJob'
                        }
                    }
                ]
            },
            'MonitoringResources': {
                'ClusterConfig': {
                    'InstanceCount': 1,
                    'InstanceType': 'ml.m5.xlarge',
                    'VolumeSizeInGB': 20
                }
            },
            'MonitoringAppSpecification': {
                'ImageUri': image_uri
            },
            'Environment': {
                'publish_cloudwatch_metrics': 'Enabled',
                'problem_type': 'BinaryClassification',
                'inference_attribute': 'prediction',          # column/key name in merged output
                'probability_attribute': 'probability',       # column/key name
                'ground_truth_attribute': 'label',            # column/key name
                'ground_truth_input': ground_truth_parent,    # s3://.../model-quality-groundtruth/
                'dataset_format': '{"json": {"lines": true}}' # critical for jsonl capture
            },
            'RoleArn': role_arn,
            'StoppingCondition': {'MaxRuntimeInSeconds': 3600}
        }
    }
)

print(f"Schedule successfully created: {schedule_name}")

Schedule successfully created: mq-final-20260224113031


## Comprehensive Audit and Debugging View for a SageMaker Model Monitor Schedule

This script provides a **Comprehensive Audit and Debugging View** for a specific SageMaker Model Monitor schedule. It moves beyond simple status checks by dissecting the underlying "blueprint" (the Monitoring Job Definition) to show exactly where the system is looking for data and where it is depositing its findings.

---

### 1. The Core Objective: Full System Transparency

Monitoring schedules in SageMaker are complex because they link together multiple cloud resources: security roles, hardware settings, and various S3 data paths. This code pulls all these disparate parts into a single, readable report.

### 2. High-Level Vital Signs

The first section of the report provides the "metadata" for the schedule:

* **Status:** Tells if the monitor is currently `Scheduled`, `Pending`, or `Failed`.
* **Execution History:** Identifies exactly when the schedule last "woke up" to perform an analysis. This is crucial for confirming that the automation is actually triggering as expected (e.g., every hour).

### 3. Dissecting the Monitoring Configuration

The script uses `pprint` to display the **MonitoringScheduleConfig**. This is the most critical technical part of the code because it reveals:

* **The "What":** The specific metric type being monitored (e.g., Data Quality vs. Model Quality).
* **The "How":** The hardware settings (like instance type) and the specific Docker container used to run the math.
* **The "When":** The cron expression that determines the frequency of the checks.

### 4. S3 Path Audit: The Data Roadmap

The "IMPORTANT S3 PATHS" section is the script's most practical feature. It maps out the three pillars of model monitoring:

* **Baseline Files:** Shows where the "gold standard" statistics and rules are stored. If these paths are wrong, the monitor will fail immediately.
* **Ground Truth:** For **Model Quality** monitors, this points to the location where labels (the "correct answers") are stored to be compared against the model's predictions.
* **Output Destination:** The most important path for users: this is where SageMaker saves the **Constraint Violations** and **Performance Reports** that you need to read to know whether the model is drifting.
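
Reading these paths means walking several levels of nested dictionaries and lists, any of which may be missing. As a minimal sketch, the hypothetical helper `dig` below walks a path of keys and list indexes and returns a default instead of raising. The sample `job_def` contains only the baseline constraints path shown in this notebook's output.

```python
def dig(data, *path, default="Not set"):
    """Follow a path of dict keys / list indexes; return default if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


job_def = {
    "BaselineConfig": {
        "ConstraintsResource": {
            "S3Uri": "s3://assignment-4-3088428428401/ueba-base-daily-cnn-gru/"
                     "monitor-results/model-quality/baseline/constraints.json"
        }
    },
    "MonitoringOutputConfig": {"MonitoringOutputs": []},  # empty list on purpose
}

print(dig(job_def, "BaselineConfig", "ConstraintsResource", "S3Uri"))
print(dig(job_def, "MonitoringOutputConfig", "MonitoringOutputs", 0, "S3Output", "S3Uri"))
```

Unlike a chained `.get(...)[0]`, this also tolerates an empty `MonitoringOutputs` list.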

### 5. Debugging and Reliability

By printing the **Full Raw Response**, the script provides a "deep dive" for advanced troubleshooting. If a monitor is behaving strangely, this raw data reveals hidden details like the exact IAM role being used or environment variables that might be interfering with the process.
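
When the raw response needs to be saved or compared between runs, JSON is often easier to work with than `pprint` output. The response contains `datetime` objects, which `json.dumps` cannot serialize on its own. The sketch below passes `default=str` to convert them. `to_json_report` is a hypothetical helper and the sample dictionary is a trimmed stand-in for a real response.

```python
import json
from datetime import datetime, timezone


def to_json_report(response):
    """Serialize an API response to indented JSON, stringifying non-JSON types."""
    return json.dumps(response, indent=2, sort_keys=True, default=str)


sample_response = {
    "MonitoringScheduleName": "mq-final-20260224113031",
    "MonitoringScheduleStatus": "Scheduled",
    "CreationTime": datetime(2026, 2, 24, 11, 30, 31, 438000, tzinfo=timezone.utc),
}

report = to_json_report(sample_response)
print(report)
```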


In [None]:
import boto3
import json
from pprint import pprint

sm_client = boto3.client('sagemaker')

schedule_name = "mq-final-20260224113031"

# 1. Get full details of the schedule
desc = sm_client.describe_monitoring_schedule(
    MonitoringScheduleName=schedule_name
)

print("=" * 70)
print(f"DETAILED INFO FOR SCHEDULE: {schedule_name}")
print("=" * 70)

# Print high-level status and times
print("Status:", desc.get('MonitoringScheduleStatus'))
print("Creation Time:", desc.get('CreationTime'))
print("Last Modified Time:", desc.get('LastModifiedTime'))
# The execution summary exposes the run time under 'ScheduledTime'
print("Last Execution Time:", desc.get('LastMonitoringExecutionSummary', {}).get('ScheduledTime', 'Never'))

# Print full MonitoringScheduleConfig (the most important part)
print("\nMonitoringScheduleConfig:")
pprint(desc['MonitoringScheduleConfig'])

# Extract and print all relevant S3 paths clearly
job_def = desc['MonitoringScheduleConfig']['MonitoringJobDefinition']

print("\n=== IMPORTANT S3 PATHS ===")
print("Baseline Constraints JSON:")
print("  →", job_def.get('BaselineConfig', {}).get('ConstraintsResource', {}).get('S3Uri', 'Not set'))

print("Baseline Statistics JSON:")
print("  →", job_def.get('BaselineConfig', {}).get('StatisticsResource', {}).get('S3Uri', 'Not set'))

print("Ground Truth Input Location (where monitor looks for labels):")
print("  →", job_def.get('Environment', {}).get('ground_truth_input', 'Not set'))

print("Monitoring Output Location (where reports + violations go):")
print("  →", job_def.get('MonitoringOutputConfig', {})
                 .get('MonitoringOutputs', [{}])[0]
                 .get('S3Output', {})
                 .get('S3Uri', 'Not set'))

# Optional: print the entire raw response (very verbose)
print("\nFull raw response (for debugging):")
pprint(desc)

DETAILED INFO FOR SCHEDULE: mq-final-20260224113031
Status: Scheduled
Creation Time: 2026-02-24 11:30:31.438000+00:00
Last Modified Time: 2026-02-24 11:30:37.154000+00:00
Last Execution Time: Never

MonitoringScheduleConfig:
{'MonitoringJobDefinition': {'BaselineConfig': {'ConstraintsResource': {'S3Uri': 's3://assignment-4-3088428428401/ueba-base-daily-cnn-gru/monitor-results/model-quality/baseline/constraints.json'},
                                                'StatisticsResource': {'S3Uri': 's3://assignment-4-3088428428401/ueba-base-daily-cnn-gru/monitor-results/model-quality/baseline/statistics.json'}},
                             'Environment': {'dataset_format': '{"json": '
                                                               '{"lines": '
                                                               'true}}',
                                             'ground_truth_attribute': 'label',
                                             'ground_truth_input': 's3://assig