# Handler Customization Methods - Complete Guide

This notebook demonstrates **all three methods** for customizing `/ping` and `/invocations` handlers in vLLM on SageMaker.

## Three Methods Overview

### Method 1: Environment Variables (Highest Priority)
- **File**: `handlers_env_var.py`
- **How**: Point to handler functions via environment variables
- **Env Vars**: `CUSTOM_FASTAPI_PING_HANDLER`, `CUSTOM_FASTAPI_INVOCATION_HANDLER`
- **Use When**: You need explicit control and want to override all other methods

### Method 2: Decorators
- **File**: `handlers_decorator.py`
- **How**: Use `@custom_ping_handler` and `@custom_invocation_handler` decorators
- **Env Vars**: Only `CUSTOM_SCRIPT_FILENAME` needed
- **Use When**: You want clean, explicit handler registration (recommended)

### Method 3: Function Discovery (Lowest Priority)
- **File**: `handlers_discovery.py`
- **How**: Name functions `custom_sagemaker_ping_handler`, `custom_sagemaker_invocation_handler`
- **Env Vars**: Only `CUSTOM_SCRIPT_FILENAME` needed
- **Use When**: You want the simplest approach with convention over configuration

## Handler Resolution Priority
```
1. Environment Variables (Method 1) ← Highest priority
2. Decorator Registration (Method 2)
3. Function Discovery (Method 3)
4. Framework Defaults ← Lowest priority
```
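To make the priority order concrete, here is a simplified model of how such a lookup could work. It is a sketch, not the container's actual code: `resolve_handler` and the `decorator_registry` dict are hypothetical, and the environment-variable value is assumed to use the `<file>:<function>` form that this notebook passes for Method 1.

```python
import os
from types import ModuleType
from typing import Callable, Mapping, Optional

# Hypothetical lookup tables mirroring the names used in this notebook.
ENV_VARS = {
    "ping": "CUSTOM_FASTAPI_PING_HANDLER",
    "invocation": "CUSTOM_FASTAPI_INVOCATION_HANDLER",
}
DISCOVERY_NAMES = {
    "ping": "custom_sagemaker_ping_handler",
    "invocation": "custom_sagemaker_invocation_handler",
}


def resolve_handler(
    kind: str,
    module: ModuleType,
    decorator_registry: Mapping[str, Callable],
    default: Callable,
    env: Optional[Mapping[str, str]] = None,
) -> Callable:
    """Pick a handler using the documented priority order (simplified)."""
    env = os.environ if env is None else env
    # 1. Environment variable in "<file>:<function>" form (file part ignored here)
    spec = env.get(ENV_VARS[kind])
    if spec:
        _, func_name = spec.rsplit(":", 1)
        return getattr(module, func_name)
    # 2. Function registered through a decorator
    if kind in decorator_registry:
        return decorator_registry[kind]
    # 3. Function discovered by its conventional name
    func = getattr(module, DISCOVERY_NAMES[kind], None)
    if callable(func):
        return func
    # 4. Framework default
    return default
```

Each level is only consulted when every level above it came up empty, which is why setting the environment variables overrides a decorator or a conventionally named function in the same file.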

## Choose Your Method
Set the `METHOD` variable below to test different approaches:
- `"env-var"` - Environment Variables
- `"decorator"` - Decorators (recommended)
- `"discovery"` - Function Discovery

In [80]:
# ============================================================
# CONFIGURATION: Choose your handler customization method
# ============================================================

METHOD = "discovery"  # Options: "env-var", "decorator", "discovery"

print(f"Selected method: {METHOD}")
print("\nYou can change this and re-run the notebook to test different methods!")

Selected method: discovery

You can change this and re-run the notebook to test different methods!
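A mistyped `METHOD` only fails later, either as a `KeyError` in the handler-file lookup or as an invalid resource name (SageMaker names do not allow underscores). A small guard run right after setting `METHOD` fails fast instead. `validate_method` is a hypothetical helper, and mapping underscores to hyphens is a convenience of this sketch, not container behavior.

```python
# Accepted values must match the keys of the handler_files mapping used later.
VALID_METHODS = ("env-var", "decorator", "discovery")


def validate_method(method: str) -> str:
    """Normalize METHOD and raise ValueError if it is not a known option."""
    normalized = method.strip().lower().replace("_", "-")
    if normalized not in VALID_METHODS:
        raise ValueError(f"Unknown METHOD {method!r}; expected one of {VALID_METHODS}")
    return normalized
```

Usage: `METHOD = validate_method(METHOD)`.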


In [81]:
import boto3
import json
import time
from datetime import datetime
from pathlib import Path

In [82]:
session = boto3.Session()
region = session.region_name
sagemaker_client = boto3.client('sagemaker', region_name=region)
runtime_client = boto3.client('sagemaker-runtime', region_name=region)
s3_client = boto3.client('s3', region_name=region)
sts_client = boto3.client('sts', region_name=region)

In [83]:
timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
model_name = f'vllm-{METHOD}-{timestamp}'
endpoint_config_name = f'vllm-{METHOD}-config-{timestamp}'
endpoint_name = f'vllm-{METHOD}-endpoint-{timestamp}'
account_id = sts_client.get_caller_identity()['Account']

In [85]:
# ============================================================
# PARAMETERS - Update these for your environment
# ============================================================

# Container image
# This image must already exist in ECR in this account and region
container_image = f'{account_id}.dkr.ecr.{region}.amazonaws.com/vllm:0.11.2-sagemaker-v1.2'

# HuggingFace model
huggingface_model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
huggingface_token = 'hf_your_token_here'  # Replace with your token

# Instance configuration
instance_type = 'ml.g6.4xlarge'
execution_role = f'arn:aws:iam::{account_id}:role/SageMakerExecutionRole'

# S3 configuration
s3_bucket = 'sheteng-demo'  # Replace with your bucket
s3_key_prefix = f'vllm-handlers/{METHOD}/{timestamp}'

print("Configuration:")
print(f"  Method: {METHOD}")
print(f"  Model Name: {model_name}")
print(f"  Endpoint Name: {endpoint_name}")
print(f"  HuggingFace Model: {huggingface_model_id}")
print(f"  Instance Type: {instance_type}")
print(f"  S3 Bucket: {s3_bucket}")

Configuration:
  Method: discovery
  Model Name: vllm-discovery-20251127-015140
  Endpoint Name: vllm-discovery-endpoint-20251127-015140
  HuggingFace Model: meta-llama/Meta-Llama-3-8B-Instruct
  Instance Type: ml.g6.4xlarge
  S3 Bucket: sheteng-demo
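The Llama 3 model is gated on Hugging Face, so the container needs a valid token. Rather than pasting the token into the notebook, where it is easy to commit by accident, you can read it from your shell. This sketch assumes you export it as `HF_TOKEN` before starting Jupyter; `get_hf_token` is a hypothetical helper. The value still travels in the model's `Environment`, so anyone allowed to call `DescribeModel` on the model can read it.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read a Hugging Face token from the local environment (hypothetical helper)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} in your shell before starting the notebook")
    return token
```

Usage: `huggingface_token = get_hf_token()`.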


## Method-Specific Configuration

Based on your selected method, we'll configure the appropriate handler file and environment variables.

In [86]:
# ============================================================
# Configure handler file and environment based on method
# ============================================================

# Map method to handler file
handler_files = {
    "env-var": "handlers_env_var.py",
    "decorator": "handlers_decorator.py",
    "discovery": "handlers_discovery.py"
}

handler_filename = handler_files[METHOD]
handler_filepath = Path("../model_artifacts_examples") / handler_filename

# Base environment variables (common to all methods)
environment = {
    "SM_VLLM_MODEL": huggingface_model_id,
    "HUGGING_FACE_HUB_TOKEN": huggingface_token,
    "SM_VLLM_MAX_MODEL_LEN": "2048",
    "CUSTOM_SCRIPT_FILENAME": handler_filename,
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "DEBUG",
}

# Method-specific environment variables
if METHOD == "env-var":
    # Method 1: Explicitly point to handler functions
    environment["CUSTOM_FASTAPI_PING_HANDLER"] = f"{handler_filename}:health_check"
    environment["CUSTOM_FASTAPI_INVOCATION_HANDLER"] = f"{handler_filename}:inference"
    print("✓ Method 1: Environment Variables")
    print(f"  Handler file: {handler_filename}")
    print(f"  Ping handler: {environment['CUSTOM_FASTAPI_PING_HANDLER']}")
    print(f"  Invocation handler: {environment['CUSTOM_FASTAPI_INVOCATION_HANDLER']}")

elif METHOD == "decorator":
    # Method 2: Decorators handle registration automatically
    print("✓ Method 2: Decorators")
    print(f"  Handler file: {handler_filename}")
    print(f"  Handlers registered via @custom_ping_handler and @custom_invocation_handler")

elif METHOD == "discovery":
    # Method 3: Function names follow convention
    print("✓ Method 3: Function Discovery")
    print(f"  Handler file: {handler_filename}")
    print(f"  Handlers discovered by function names:")
    print(f"    - custom_sagemaker_ping_handler")
    print(f"    - custom_sagemaker_invocation_handler")

print(f"\n📄 Handler file location: {handler_filepath}")

✓ Method 3: Function Discovery
  Handler file: handlers_discovery.py
  Handlers discovered by function names:
    - custom_sagemaker_ping_handler
    - custom_sagemaker_invocation_handler

📄 Handler file location: ../model_artifacts_examples/handlers_discovery.py

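The same environment logic can be written as a pure function, which makes the difference between methods easy to check: only `env-var` adds keys, while `decorator` and `discovery` rely on `CUSTOM_SCRIPT_FILENAME` alone. `build_environment` is a hypothetical helper; `health_check` and `inference` are the function names this notebook uses for `handlers_env_var.py`. Values are converted to strings because the `Environment` map passed to `create_model` accepts only string values.

```python
def build_environment(
    method: str,
    handler_filename: str,
    model_id: str,
    hf_token: str,
    max_model_len: int = 2048,
) -> dict:
    """Return the container Environment map for a given customization method."""
    env = {
        "SM_VLLM_MODEL": model_id,
        "HUGGING_FACE_HUB_TOKEN": hf_token,
        "SM_VLLM_MAX_MODEL_LEN": str(max_model_len),  # values must be strings
        "CUSTOM_SCRIPT_FILENAME": handler_filename,
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "DEBUG",
    }
    if method == "env-var":
        # Only Method 1 names the handler functions explicitly
        env["CUSTOM_FASTAPI_PING_HANDLER"] = f"{handler_filename}:health_check"
        env["CUSTOM_FASTAPI_INVOCATION_HANDLER"] = f"{handler_filename}:inference"
    return env
```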

In [87]:
# ============================================================
# Upload handler file to S3
# ============================================================

print(f"\n☁️  Uploading {handler_filename} to S3...")

s3_key = f"{s3_key_prefix}/{handler_filename}"
s3_client.upload_file(str(handler_filepath), s3_bucket, s3_key)

model_data_s3_prefix = f"s3://{s3_bucket}/{s3_key_prefix}/"

print(f"✓ Uploaded to: s3://{s3_bucket}/{s3_key}")
print(f"  Model data S3 prefix: {model_data_s3_prefix}")


☁️  Uploading handlers_discovery.py to S3...
✓ Uploaded to: s3://sheteng-demo/vllm-handlers/discovery/20251127-015140/handlers_discovery.py
  Model data S3 prefix: s3://sheteng-demo/vllm-handlers/discovery/20251127-015140/
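`upload_file` raises on failure, but it will happily write to the wrong key if the prefix was assembled incorrectly. A `head_object` call on the exact key confirms that the handler file sits where the model's S3 prefix will look for it. This is a sketch; `upload_and_verify` is a hypothetical helper that takes the boto3 S3 client as a parameter.

```python
from pathlib import Path


def upload_and_verify(s3, local_path, bucket: str, key: str) -> int:
    """Upload a file, then confirm S3 holds an object of the same size at key."""
    local_size = Path(local_path).stat().st_size
    s3.upload_file(str(local_path), bucket, key)
    # head_object raises a ClientError (404) if nothing exists at this key
    remote_size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    if remote_size != local_size:
        raise RuntimeError(
            f"s3://{bucket}/{key} is {remote_size} bytes, expected {local_size}"
        )
    return remote_size
```

Usage: `upload_and_verify(s3_client, handler_filepath, s3_bucket, s3_key)`.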


In [88]:
# ============================================================
# Create SageMaker Model
# ============================================================

print(f"\n🔧 Creating SageMaker model: {model_name}")

create_model_response = sagemaker_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=execution_role,
    PrimaryContainer={
        "Image": container_image,
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": model_data_s3_prefix,
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
        "Environment": environment,
    },
)

print("✓ Model created")
print(f"  Model ARN: {create_model_response['ModelArn']}")
print(f"  Method: {METHOD}")


🔧 Creating SageMaker model: vllm-discovery-20251127-015140
✓ Model created
  Model ARN: arn:aws:sagemaker:us-west-2:875423407011:model/vllm-discovery-20251127-015140
  Method: discovery
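With `S3DataType: "S3Prefix"` and `CompressionType: "None"`, SageMaker treats the prefix as a folder of uncompressed files rather than a `model.tar.gz`. The sketch below assumes the URI should end in `/` in this mode (check the current CreateModel reference) and normalizes it. `build_model_data_source` is a hypothetical helper; its output has the same shape as the `ModelDataSource` dict passed to `create_model` above.

```python
def build_model_data_source(s3_prefix_uri: str) -> dict:
    """Build an uncompressed, prefix-based ModelDataSource for create_model."""
    if not s3_prefix_uri.startswith("s3://"):
        raise ValueError(f"Expected an s3:// URI, got {s3_prefix_uri!r}")
    if not s3_prefix_uri.endswith("/"):
        s3_prefix_uri += "/"  # treat the URI as a folder-style key prefix
    return {
        "S3DataSource": {
            "S3Uri": s3_prefix_uri,
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    }
```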


In [89]:
# ============================================================
# Create Endpoint Configuration
# ============================================================

print(f"\n⚙️  Creating endpoint configuration: {endpoint_config_name}")

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }
    ],
)

print("✓ Endpoint configuration created")
print(f"  Config ARN: {create_endpoint_config_response['EndpointConfigArn']}")


⚙️  Creating endpoint configuration: vllm-discovery-config-20251127-015140
✓ Endpoint configuration created
  Config ARN: arn:aws:sagemaker:us-west-2:875423407011:endpoint-config/vllm-discovery-config-20251127-015140


In [90]:
# ============================================================
# Create Endpoint
# ============================================================

print(f"\n🚀 Creating endpoint: {endpoint_name}")
print("⏱️  This will take approximately 5-10 minutes...")
print(f"\n💡 Monitor: https://console.aws.amazon.com/sagemaker/home?region={region}#/endpoints/{endpoint_name}\n")

create_endpoint_response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

print("✓ Endpoint creation initiated")
print(f"  Endpoint ARN: {create_endpoint_response['EndpointArn']}")


🚀 Creating endpoint: vllm-discovery-endpoint-20251127-015140
⏱️  This will take approximately 5-10 minutes...

💡 Monitor: https://console.aws.amazon.com/sagemaker/home?region=us-west-2#/endpoints/vllm-discovery-endpoint-20251127-015140

✓ Endpoint creation initiated
  Endpoint ARN: arn:aws:sagemaker:us-west-2:875423407011:endpoint/vllm-discovery-endpoint-20251127-015140


In [91]:
# ============================================================
# Wait for Endpoint
# ============================================================

print("\n⏳ Waiting for endpoint to be in service...")
print("(This may take 5-10 minutes)\n")

waiter = sagemaker_client.get_waiter("endpoint_in_service")
waiter.wait(
    EndpointName=endpoint_name,
    WaiterConfig={"Delay": 20, "MaxAttempts": 60}
)

print("✓ Endpoint is in service!")


⏳ Waiting for endpoint to be in service...
(This may take 5-10 minutes)

✓ Endpoint is in service!

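If the container fails to start (for example, because the handler file raises on import), the `endpoint_in_service` waiter ends in a generic `WaiterError`. Polling `DescribeEndpoint` yourself lets you print its `FailureReason` directly; container logs land in the CloudWatch log group `/aws/sagemaker/Endpoints/<endpoint_name>`. The sketch keeps the waiter's timing (20 s × 60 attempts) and takes `sleep` as a parameter so it can be tested without waiting. `wait_for_endpoint` is a hypothetical helper.

```python
import time

# Statuses after which the endpoint will not become InService on its own
FAILED_STATUSES = {"Failed", "UpdateRollbackFailed"}


def wait_for_endpoint(client, endpoint_name, delay=20, max_attempts=60, sleep=time.sleep):
    """Poll DescribeEndpoint until InService; raise with FailureReason on failure."""
    for _ in range(max_attempts):
        desc = client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return desc
        if status in FAILED_STATUSES:
            reason = desc.get("FailureReason", "no FailureReason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} is {status}: {reason}")
        sleep(delay)
    raise TimeoutError(f"{endpoint_name} not InService after {max_attempts} checks")
```

Usage: `wait_for_endpoint(sagemaker_client, endpoint_name)`.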

## Testing the Custom Handlers

Now let's test the custom handlers. The example handler files include a `method` field in each response, showing which customization method served the request.

In [93]:
# ============================================================
# Test 1: Basic Inference
# ============================================================

print(f"\n🤖 Test 1: Basic Inference (Method: {METHOD})")

request_body = {
    "prompt": "What is the capital of Amazon?",
    "max_tokens": 100,
    "temperature": 0.7,
}

print(f"Prompt: {request_body['prompt']}")

response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(request_body),
)

result = json.loads(response["Body"].read().decode("utf-8"))

print("\n✓ Response received:")
print(f"  Method: {result.get('method', 'N/A')}")
print(f"  Model: {result.get('model', 'N/A')}")
if "predictions" in result:
    print(f"  Prediction: {result['predictions'][0][:150]}...")
if "usage" in result:
    print(f"  Tokens: {result['usage']}")

print(f"\nFull response:")
print(json.dumps(result, indent=2))


🤖 Test 1: Basic Inference (Method: discovery)
Prompt: What is the capital of Amazon?

✓ Response received:
  Method: function_discovery
  Model: vllm
  Prediction:  The answer is that Amazon doesn't have a traditional capital city. The company is headquartered in Seattle, Washington, USA, and has multiple offices...
  Tokens: {'prompt_tokens': 8, 'completion_tokens': 100, 'total_tokens': 108}

Full response:
{
  "predictions": [
    " The answer is that Amazon doesn't have a traditional capital city. The company is headquartered in Seattle, Washington, USA, and has multiple offices and facilities around the world.\n\nHowever, Amazon has built several research and development centers, called \"Amazon Lab126,\" in various locations, including:\n1. Palo Alto, California, USA\n2. Cambridge, Massachusetts, USA\n3. Sunnyvale, California, USA\n4. Shenzhen, Guangdong, China\n5. Bengaluru, Karnataka,"
  ],
  "model": "vllm",
  "method": "function_discovery",
  "usage": {
    "prompt_token

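Tests 1 and 2 repeat the same invoke-and-decode steps, so a small helper can keep them in one place. `invoke_json` and `first_prediction` are hypothetical helpers, and the `predictions` key reflects the response shape of this notebook's example handlers rather than a vLLM or SageMaker standard.

```python
import json


def invoke_json(runtime, endpoint_name: str, payload: dict) -> dict:
    """Send payload as JSON to a SageMaker endpoint and decode the JSON reply."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read().decode("utf-8"))


def first_prediction(result: dict, max_chars: int = 150) -> str:
    """Return the first entry of result["predictions"], trimmed; "" if absent."""
    predictions = result.get("predictions") or []
    return predictions[0][:max_chars] if predictions else ""
```

Usage: `result = invoke_json(runtime_client, endpoint_name, request_body)`, then `first_prediction(result)`.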
In [94]:
# ============================================================
# Test 2: Error Handling
# ============================================================

print(f"\n🤖 Test 2: Error Handling (Method: {METHOD})")

request_body_invalid = {
    "max_tokens": 50,
    # Missing "prompt" field
}

print("Sending request without 'prompt' field...")

try:
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(request_body_invalid),
    )
    
    result = json.loads(response["Body"].read().decode("utf-8"))
    
    if "error" in result:
        print("\n✓ Error handled correctly:")
        print(f"  Error: {result['error']}")
    else:
        print("\n⚠️  Expected error but got:")
        print(json.dumps(result, indent=2))

except runtime_client.exceptions.ModelError as e:
    # A 4xx/5xx response from the handler surfaces on the client as a ModelError
    print("\n✓ Error caught by client:")
    print(f"  {str(e)[:200]}")


🤖 Test 2: Error Handling (Method: discovery)
Sending request without 'prompt' field...

✓ Error caught by client:
  An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error": "Missing required field: prompt"}". See https://us-west-2.con
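The 400 response in Test 2 comes from the custom handler, not from vLLM: the handler rejects the request before any generation happens. Because SageMaker relays the handler's non-2xx response to the caller as a `ModelError`, the client-side `except` branch is the one that fires. Below is a framework-free sketch of such a validation step; `validate_invocation` is hypothetical, and every check other than the missing-`prompt` message is an assumption.

```python
import json


def validate_invocation(body: bytes):
    """Return (status_code, payload) for a raw /invocations request body."""
    try:
        request = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, {"error": "Request body must be valid JSON"}
    if not isinstance(request, dict):
        return 400, {"error": "Request body must be a JSON object"}
    if "prompt" not in request:
        # Same message the example handler returned in Test 2
        return 400, {"error": "Missing required field: prompt"}
    return 200, request
```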


In [95]:
# ============================================================
# Cleanup - Delete All Resources
# ============================================================

print("\n" + "=" * 60)
print("CLEANUP: DELETING RESOURCES")
print("=" * 60)
print("\n⚠️  This will delete all resources and stop charges\n")

# Delete endpoint
print(f"Deleting endpoint: {endpoint_name}")
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
print("  ✓ Endpoint deletion initiated")

# Wait for endpoint deletion
print("  Waiting for endpoint to be deleted...")
waiter = sagemaker_client.get_waiter("endpoint_deleted")
waiter.wait(EndpointName=endpoint_name)
print("  ✓ Endpoint deleted")

# Delete endpoint configuration
print(f"\nDeleting endpoint configuration: {endpoint_config_name}")
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
print("  ✓ Endpoint configuration deleted")

# Delete model
print(f"\nDeleting model: {model_name}")
sagemaker_client.delete_model(ModelName=model_name)
print("  ✓ Model deleted")

# Summary
print("\n" + "=" * 60)
print("✅ CLEANUP COMPLETE")
print("=" * 60)
print("All resources deleted:")
print(f"  ✓ Endpoint: {endpoint_name}")
print(f"  ✓ Endpoint Config: {endpoint_config_name}")
print(f"  ✓ Model: {model_name}")
print("\n✓ No ongoing charges!")
print(f"\nNote: S3 artifacts remain at s3://{s3_bucket}/{s3_key_prefix}/")
print("      Delete manually if no longer needed")


CLEANUP: DELETING RESOURCES

⚠️  This will delete all resources and stop charges

Deleting endpoint: vllm-discovery-endpoint-20251127-015140
  ✓ Endpoint deletion initiated
  Waiting for endpoint to be deleted...
  ✓ Endpoint deleted

Deleting endpoint configuration: vllm-discovery-config-20251127-015140
  ✓ Endpoint configuration deleted

Deleting model: vllm-discovery-20251127-015140
  ✓ Model deleted

✅ CLEANUP COMPLETE
All resources deleted:
  ✓ Endpoint: vllm-discovery-endpoint-20251127-015140
  ✓ Endpoint Config: vllm-discovery-config-20251127-015140
  ✓ Model: vllm-discovery-20251127-015140

✓ No ongoing charges!

Note: S3 artifacts remain at s3://sheteng-demo/vllm-handlers/discovery/20251127-015140/
      Delete manually if no longer needed
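The cleanup cell leaves the uploaded handler file in S3. To remove it as well, delete every object under the run's prefix. This sketch lists keys with `list_objects_v2` (following `NextContinuationToken`) and removes them with `delete_objects` in batches of at most 1,000 keys, the per-request limit. `delete_s3_prefix` is a hypothetical helper; it refuses an empty prefix so a mistake cannot wipe the whole bucket, and it appends a trailing `/` so `run1` does not also match `run10`.

```python
def delete_s3_prefix(s3, bucket: str, prefix: str) -> int:
    """Delete all objects under prefix and return how many keys were removed."""
    if not prefix.strip("/"):
        raise ValueError("Refusing to delete with an empty prefix")
    prefix = prefix if prefix.endswith("/") else prefix + "/"
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:  # page through list_objects_v2 results
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    for start in range(0, len(keys), 1000):  # delete_objects accepts up to 1,000 keys
        batch = [{"Key": key} for key in keys[start:start + 1000]]
        s3.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
    return len(keys)
```

Usage: `delete_s3_prefix(s3_client, s3_bucket, s3_key_prefix)`.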
