---
**Author:** Arifa Kokab  
**For:** AAI-540 Machine Learning Operations  
**Institution:** University of San Diego

# Model Monitoring

## Note on Model Provenance

The facial expression analysis model was trained externally on Google Colab using a high-performance GPU, and the trained artifact was then imported into AWS SageMaker for inference and deployment. This notebook also shows how to access SageMaker Batch Transform job logs in CloudWatch for monitoring and troubleshooting.

## Accessing SageMaker Batch Transform Logs in CloudWatch

This section retrieves and displays recent SageMaker Batch Transform job logs from Amazon CloudWatch Logs.
These logs make it possible to monitor job progress, troubleshoot errors, and audit inference runs, which supports transparency and reliability in the model deployment workflow.

In [4]:
import boto3

In [5]:
logs_client = boto3.client('logs', region_name='us-east-1')

# The SageMaker log group for batch transform jobs
log_group = '/aws/sagemaker/TransformJobs'

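# List the five most recently active log streams.
# A single job can write more than one stream (see the output below:
# an instance stream plus a matching ".../data-log" stream).
response = logs_client.describe_log_streams(
    logGroupName=log_group,
    orderBy='LastEventTime',
    descending=True,
    limit=5
)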

print("Recent Batch Transform Job Log Streams:")
for i, stream in enumerate(response['logStreams']):
    print(f"{i+1}: {stream['logStreamName']}")

# Fetch and print up to the last 20 log events from the most recently active stream
if response['logStreams']:
    log_stream_name = response['logStreams'][0]['logStreamName']
    print(f"\nFetching events from log stream: {log_stream_name}\n")
    events = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=log_stream_name,
        limit=20,
        startFromHead=False
    )
    for event in events['events']:
        print(event['message'])
else:
    print("No log streams found for recent batch transform jobs.")

Recent Batch Transform Job Log Streams:
1: pytorch-inference-2025-06-26-08-18-43-515/i-022195994fcc4083d-1750926117/data-log
2: pytorch-inference-2025-06-26-08-18-43-515/i-022195994fcc4083d-1750926117
3: pytorch-inference-2025-06-24-02-40-35-997/i-0be436c185d0084de-1750733033/data-log
4: pytorch-inference-2025-06-24-02-40-35-997/i-0be436c185d0084de-1750733033
5: pytorch-inference-2025-06-24-02-15-27-050/i-0a21b480caeb3c4fe-1750731496/data-log

Fetching events from log stream: pytorch-inference-2025-06-26-08-18-43-515/i-022195994fcc4083d-1750926117/data-log

2025-06-26T08:24:28.404:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
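
The listing above shows that one transform job can own several log streams: an instance stream and a matching `/data-log` stream. The following is a minimal sketch of grouping stream names by job so that every stream for a run can be inspected together. The helper name `group_streams_by_job` is hypothetical, and the sketch assumes the `<job-name>/<instance-id>-<epoch>[/data-log]` naming pattern seen in this output holds for other jobs as well.

```python
from collections import defaultdict


def group_streams_by_job(stream_names):
    """Group CloudWatch log stream names by the transform job that wrote them.

    Assumes the '<job-name>/<instance-id>-<epoch>[/data-log]' layout seen in
    the listing above. A name without '/' is grouped under itself.
    """
    jobs = defaultdict(list)
    for name in stream_names:
        job_name = name.split("/", 1)[0]  # text before the first '/'
        jobs[job_name].append(name)
    return dict(jobs)


# Stream names copied from the listing above
streams = [
    "pytorch-inference-2025-06-26-08-18-43-515/i-022195994fcc4083d-1750926117/data-log",
    "pytorch-inference-2025-06-26-08-18-43-515/i-022195994fcc4083d-1750926117",
    "pytorch-inference-2025-06-24-02-40-35-997/i-0be436c185d0084de-1750733033/data-log",
    "pytorch-inference-2025-06-24-02-40-35-997/i-0be436c185d0084de-1750733033",
    "pytorch-inference-2025-06-24-02-15-27-050/i-0a21b480caeb3c4fe-1750731496/data-log",
]

for job, names in group_streams_by_job(streams).items():
    print(f"{job}: {len(names)} stream(s)")
```

In the notebook, the same helper could be applied to `[s['logStreamName'] for s in response['logStreams']]`. Grouping is done client-side here because `describe_log_streams` does not accept `logStreamNamePrefix` together with `orderBy='LastEventTime'`.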

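Fetching only the last 20 events from one stream can miss the lines that matter. The sketch below pages through a whole stream with `get_log_events` and keeps messages that contain a keyword, which is one way problems such as timeouts on specific inputs could be located. The function name `find_matching_events`, the keyword list, and the `max_pages` cap are illustrative assumptions, since the exact wording of error messages depends on the inference container.

```python
def find_matching_events(logs_client, log_group, log_stream_name,
                         keywords=("timeout", "timed out", "error"),
                         max_pages=10):
    """Page through one log stream and return messages containing any keyword.

    `logs_client` is expected to behave like boto3.client('logs').
    The keywords and the page cap are assumptions, not values from the notebook.
    """
    matches = []
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream_name,
        "startFromHead": True,  # read from the oldest event forward
    }
    previous_token = None
    for _ in range(max_pages):
        page = logs_client.get_log_events(**kwargs)
        for event in page["events"]:
            message = event["message"]
            # Case-insensitive substring match against each keyword
            if any(keyword in message.lower() for keyword in keywords):
                matches.append(message)
        token = page.get("nextForwardToken")
        # At the end of the stream the API returns the token that was passed in
        if token is None or token == previous_token:
            break
        previous_token = token
        kwargs["nextToken"] = token
    return matches


# Usage with the objects defined earlier in this notebook (requires AWS access):
# for message in find_matching_events(logs_client, log_group, log_stream_name):
#     print(message)
```

Passing the client in as an argument keeps the function easy to exercise with a stub object, and the page cap bounds the number of API calls on very long streams.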

## Conclusion

This notebook demonstrated how to perform large-scale batch inference using SageMaker Batch Transform and how to access and interpret the resulting job logs in Amazon CloudWatch Logs.  
Through log analysis, we identified timeout issues affecting specific input frames, highlighting the importance of robust monitoring for production ML workflows.  
These insights underscore the value of integrating model performance checks and real-time monitoring into the MLOps pipeline, enabling prompt identification and resolution of inference bottlenecks.  
By systematically evaluating model behavior at scale, we ensure reliability, transparency, and continuous improvement for deployed AI systems.