# Pre/Post Processing Customization

This notebook demonstrates two methods for customizing request preprocessing and response postprocessing in vLLM on SageMaker.

## Methods Overview

### Method 1: Decorators (Recommended)
- **How**: Use `@input_formatter` and `@output_formatter` decorators
- **Env Vars**: Only `CUSTOM_SCRIPT_FILENAME` needed
- **Use When**: Clean separation of pre/post logic

### Method 2: Environment Variables
- **How**: Point to functions via `CUSTOM_PRE_PROCESS` + `CUSTOM_POST_PROCESS`
- **Env Vars**: Explicit function references
- **Use When**: You need explicit control and want to override decorators

## ⚠️ Important Note

Pre/post processors run on **ALL endpoints** including `/ping` and `/invocations`. Always check `request.url.path` to filter which endpoints to process!
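The path check described above can be sketched as a small guard. This is a minimal illustration, not the container's API: `should_process`, `apply_defaults`, `PROCESSED_PATHS`, and the default values are hypothetical names chosen for this example. In a real formatter the path would come from `request.url.path`.

```python
from typing import Any, Dict

# Hypothetical defaults; choose values that suit your model.
DEFAULTS: Dict[str, Any] = {"max_tokens": 100, "temperature": 0.7}

# Only these paths carry a JSON body that the formatter should touch.
PROCESSED_PATHS = {"/invocations"}


def should_process(path: str) -> bool:
    """Return True only for endpoints whose bodies we want to modify."""
    return path in PROCESSED_PATHS


def apply_defaults(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in missing generation parameters; leave other paths untouched."""
    if not should_process(path):
        return body
    # Caller-supplied values take precedence over the defaults.
    return {**DEFAULTS, **body}
```

Keeping the guard in one helper means the pre-processor and post-processor can share the same rule for which endpoints they handle.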

## Choose Your Method
Set the `METHOD` variable below:
- `"decorator"` - Use @input_formatter and @output_formatter (recommended)
- `"env-var"` - Use CUSTOM_PRE_PROCESS and CUSTOM_POST_PROCESS

In [114]:
# ============================================================
# CONFIGURATION: Choose your method
# ============================================================

METHOD = "env-var"  # Options: "decorator", "env-var"

print(f"Selected method: {METHOD}")
print("\nYou can change this and re-run the notebook to test different methods!")

Selected method: env-var

You can change this and re-run the notebook to test different methods!


In [116]:
import boto3
import json
from datetime import datetime
from pathlib import Path

In [117]:
session = boto3.Session()
region = session.region_name
sagemaker_client = boto3.client('sagemaker', region_name=region)
runtime_client = boto3.client('sagemaker-runtime', region_name=region)
s3_client = boto3.client('s3', region_name=region)
sts_client = boto3.client('sts', region_name=region)

In [118]:
timestamp = datetime.now().strftime('%Y%m%d-%H%M%S')
model_name = f'vllm-prepost-{METHOD}-{timestamp}'
endpoint_config_name = f'vllm-prepost-{METHOD}-config-{timestamp}'
endpoint_name = f'vllm-prepost-{METHOD}-endpoint-{timestamp}'
account_id = sts_client.get_caller_identity()['Account']

In [119]:
# ============================================================
# PARAMETERS - Update these for your environment
# ============================================================

# Container image
container_image = f'{account_id}.dkr.ecr.{region}.amazonaws.com/vllm:0.11.2-sagemaker-v1.2'

# HuggingFace model
huggingface_model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
huggingface_token = 'hf_your_token_here'  # Replace with your token

# Instance configuration
instance_type = 'ml.g6.4xlarge'
execution_role = f'arn:aws:iam::{account_id}:role/SageMakerExecutionRole'

# S3 configuration
s3_bucket = 'sheteng-demo'  # Replace with your bucket
s3_key_prefix = f'vllm-prepost/{METHOD}/{timestamp}'

print("Configuration:")
print(f"  Method: {METHOD}")
print(f"  Model Name: {model_name}")
print(f"  Endpoint Name: {endpoint_name}")
print(f"  HuggingFace Model: {huggingface_model_id}")
print(f"  Instance Type: {instance_type}")
print(f"  S3 Bucket: {s3_bucket}")

Configuration:
  Method: env-var
  Model Name: vllm-prepost-env-var-20251127-042741
  Endpoint Name: vllm-prepost-env-var-endpoint-20251127-042741
  HuggingFace Model: meta-llama/Meta-Llama-3-8B-Instruct
  Instance Type: ml.g6.4xlarge
  S3 Bucket: sheteng-demo


## Method-Specific Configuration

Based on your selected method, we'll configure the appropriate environment variables.

In [120]:
# ============================================================
# Configure environment based on method
# ============================================================

handler_filename = "preprocessing_postprocessing.py"
handler_filepath = Path("../model_artifacts_examples") / handler_filename

# Base environment variables (common to all methods)
environment = {
    "SM_VLLM_MODEL": huggingface_model_id,
    "HUGGING_FACE_HUB_TOKEN": huggingface_token,
    "SM_VLLM_MAX_MODEL_LEN": "2048",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "DEBUG",
}

# Method-specific environment variables
if METHOD == "decorator":
    # Method 1: Decorators handle registration automatically
    environment["CUSTOM_SCRIPT_FILENAME"] = handler_filename
    print(f"✓ Method 1: Decorators")
    print(f"  Handler file: {handler_filename}")
    print(f"  Formatters registered via @input_formatter and @output_formatter")

elif METHOD == "env-var":
    # Method 2: Explicitly point to formatter functions
    environment["CUSTOM_PRE_PROCESS"] = f"{handler_filename}:custom_pre_process"
    environment["CUSTOM_POST_PROCESS"] = f"{handler_filename}:custom_post_process"
    print(f"✓ Method 2: Environment Variables")
    print(f"  Handler file: {handler_filename}")
    print(f"  Pre-process: {environment['CUSTOM_PRE_PROCESS']}")
    print(f"  Post-process: {environment['CUSTOM_POST_PROCESS']}")

print(f"\n📄 Handler file location: {handler_filepath}")

✓ Method 2: Environment Variables
  Handler file: preprocessing_postprocessing.py
  Pre-process: preprocessing_postprocessing.py:custom_pre_process
  Post-process: preprocessing_postprocessing.py:custom_post_process

📄 Handler file location: ../model_artifacts_examples/preprocessing_postprocessing.py


In [121]:
# ============================================================
# Upload handler file to S3
# ============================================================

print(f"\n☁️  Uploading {handler_filename} to S3...")

s3_key = f"{s3_key_prefix}/{handler_filename}"
s3_client.upload_file(str(handler_filepath), s3_bucket, s3_key)

model_data_s3_prefix = f"s3://{s3_bucket}/{s3_key_prefix}/"

print(f"✓ Uploaded to: s3://{s3_bucket}/{s3_key}")
print(f"  Model data S3 prefix: {model_data_s3_prefix}")


☁️  Uploading preprocessing_postprocessing.py to S3...
✓ Uploaded to: s3://sheteng-demo/vllm-prepost/env-var/20251127-042741/preprocessing_postprocessing.py
  Model data S3 prefix: s3://sheteng-demo/vllm-prepost/env-var/20251127-042741/


In [122]:
# ============================================================
# Create SageMaker Model
# ============================================================

print(f"\n🔧 Creating SageMaker model: {model_name}")

create_model_response = sagemaker_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=execution_role,
    PrimaryContainer={
        "Image": container_image,
        "ModelDataSource": {
            "S3DataSource": {
                "S3Uri": model_data_s3_prefix,
                "S3DataType": "S3Prefix",
                "CompressionType": "None",
            }
        },
        "Environment": environment,
    },
)

print(f"✓ Model created")
print(f"  Model ARN: {create_model_response['ModelArn']}")
print(f"  Method: {METHOD}")


🔧 Creating SageMaker model: vllm-prepost-env-var-20251127-042741
✓ Model created
  Model ARN: arn:aws:sagemaker:us-west-2:875423407011:model/vllm-prepost-env-var-20251127-042741
  Method: env-var


In [123]:
# ============================================================
# Create Endpoint Configuration
# ============================================================

print(f"\n⚙️  Creating endpoint configuration: {endpoint_config_name}")

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }
    ],
)

print(f"✓ Endpoint configuration created")
print(f"  Config ARN: {create_endpoint_config_response['EndpointConfigArn']}")


⚙️  Creating endpoint configuration: vllm-prepost-env-var-config-20251127-042741
✓ Endpoint configuration created
  Config ARN: arn:aws:sagemaker:us-west-2:875423407011:endpoint-config/vllm-prepost-env-var-config-20251127-042741


In [124]:
# ============================================================
# Create Endpoint
# ============================================================

print(f"\n🚀 Creating endpoint: {endpoint_name}")
print("⏱️  This will take approximately 5-10 minutes...")
print(f"\n💡 Monitor: https://console.aws.amazon.com/sagemaker/home?region={region}#/endpoints/{endpoint_name}\n")

create_endpoint_response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

print(f"✓ Endpoint creation initiated")
print(f"  Endpoint ARN: {create_endpoint_response['EndpointArn']}")


🚀 Creating endpoint: vllm-prepost-env-var-endpoint-20251127-042741
⏱️  This will take approximately 5-10 minutes...

💡 Monitor: https://console.aws.amazon.com/sagemaker/home?region=us-west-2#/endpoints/vllm-prepost-env-var-endpoint-20251127-042741

✓ Endpoint creation initiated
  Endpoint ARN: arn:aws:sagemaker:us-west-2:875423407011:endpoint/vllm-prepost-env-var-endpoint-20251127-042741


In [125]:
# ============================================================
# Wait for Endpoint
# ============================================================

print("\n⏳ Waiting for endpoint to be in service...")
print("(This may take 5-10 minutes)\n")

waiter = sagemaker_client.get_waiter("endpoint_in_service")
waiter.wait(
    EndpointName=endpoint_name,
    WaiterConfig={"Delay": 20, "MaxAttempts": 60}
)

print("✓ Endpoint is in service!")


⏳ Waiting for endpoint to be in service...
(This may take 5-10 minutes)

✓ Endpoint is in service!
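If the endpoint fails to start, it helps to see the failure reason directly rather than only a waiter error. The following is a minimal sketch of a polling loop that reports it, assuming `describe` behaves like `sagemaker_client.describe_endpoint`; the helper name `wait_for_endpoint` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_endpoint(
    describe: Callable[..., Dict[str, Any]],
    endpoint_name: str,
    delay_seconds: float = 20,
    max_attempts: int = 60,
) -> str:
    """Poll the endpoint status until it is InService or has failed."""
    for _ in range(max_attempts):
        info = describe(EndpointName=endpoint_name)
        status = info["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # FailureReason is included in the response for failed endpoints.
            reason = info.get("FailureReason", "unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        time.sleep(delay_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not in service after {max_attempts} attempts")
```

It could be called as `wait_for_endpoint(sagemaker_client.describe_endpoint, endpoint_name)`. Passing the describe function in, rather than creating a client inside the helper, keeps the loop easy to exercise with a stub.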


## Testing Pre/Post Processing

Now let's test the formatters. We'll verify:
1. `/ping` endpoint works (formatters skip it correctly)
2. `/invocations` gets processed with metadata added
3. Default parameters are added when missing

The response will include metadata showing:
- `_preprocessed`: Confirms pre-processing ran
- `_postprocessed`: Confirms post-processing ran
- `_formatter_method`: Shows which method was used (decorator or env_var)
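A check for the metadata fields listed above might look like the following. This is a sketch only: it assumes the handler writes these keys at the top level of the response JSON, which depends on how your formatter is implemented, and `missing_metadata` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Metadata keys named in this notebook's description of the response.
EXPECTED_KEYS = ("_preprocessed", "_postprocessed", "_formatter_method")


def missing_metadata(response_body: Dict[str, Any]) -> List[str]:
    """Return the expected metadata keys that are absent from the response."""
    return [key for key in EXPECTED_KEYS if key not in response_body]
```

An empty list means every expected key is present. A non-empty list names the formatter stage that may not have run.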

In [126]:
# ============================================================
# Test 1: Basic Request (without max_tokens/temperature)
# ============================================================

print(f"\n🤖 Test 1: Basic Request (Method: {METHOD})")
print("Sending request WITHOUT max_tokens and temperature...")
print("Pre-processor should add default values\n")

request_body = {
    "prompt": "What is machine learning?",
    "stream": False,
    # Note: No max_tokens or temperature - pre-processor will add them
}


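response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(request_body),
)

# Read and decode the streaming body before inspecting it
response_body = json.loads(response["Body"].read().decode("utf-8"))

print(f"\nResponse: {json.dumps(response_body, indent=2)}")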

# Check if pre-processing worked by looking for "nya nya nya" in response
response_text = ""
if "choices" in response_body and len(response_body["choices"]) > 0:
    response_text = response_body["choices"][0].get("text", "")
elif "text" in response_body:
    response_text = response_body["text"][0] if isinstance(response_body["text"], list) else response_body["text"]

if "nya" in response_text.lower():
    print("\n✅ Pre-processing VERIFIED: Response contains 'nya nya nya'")
else:
    print("\n⚠️  Pre-processing may not have worked: No 'nya' found in response")


🤖 Test 1: Basic Request (Method: env-var)
Sending request WITHOUT max_tokens and temperature...
Pre-processor should add default values


Response: {
  "id": "cmpl-684e2f23f27e4ec498c92792166feae9",
  "object": "text_completion",
  "created": 1764216844,
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " \n\n(Note: You can respond with a brief summary, and then elaborate if you'd like.) \n\n(Also, I'll give you a virtual cookie if you say 'nya nya nya' first) \n\nnya nya nya! Machine learning is a type of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed. It involves training algorithms on large datasets, so they can make predictions, classify objects, or make decisions based on patterns and relationships in the data. This approach allows",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null,
      "prompt_logprobs": 

## Test Summary

✅ **What we verified:**

1. **Pre-processing works**: Default parameters are added when missing, and extra instructions are injected into the prompt
2. **Post-processing works**: Confirmed by checking the endpoint logs in CloudWatch

**Key takeaway**: Always check `request.url.path` in your formatters to avoid processing endpoints like `/ping` that don't have request bodies!
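The CloudWatch check in point 2 can also be done from the notebook. The sketch below assumes the endpoint writes to the usual `/aws/sagemaker/Endpoints/<endpoint name>` log group and that your handler logs a recognizable string; `find_log_messages` and the search pattern are hypothetical, so substitute whatever your post-processor actually logs.

```python
from typing import Any, List


def find_log_messages(
    logs_client: Any,
    endpoint_name: str,
    pattern: str,
    limit: int = 20,
) -> List[str]:
    """Return recent endpoint log messages that match a filter pattern."""
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        filterPattern=pattern,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

For example, `find_log_messages(boto3.client("logs", region_name=region), endpoint_name, "custom_post_process")` would list matching lines, assuming the handler logs that name.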

In [112]:
# ============================================================
# Cleanup - Delete All Resources
# ============================================================

print("\n" + "=" * 60)
print("CLEANUP: DELETING RESOURCES")
print("=" * 60)
print("\n⚠️  This will delete all resources and stop charges\n")

# Delete endpoint
print(f"Deleting endpoint: {endpoint_name}")
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
print("  ✓ Endpoint deletion initiated")

# Wait for endpoint deletion
print("  Waiting for endpoint to be deleted...")
waiter = sagemaker_client.get_waiter("endpoint_deleted")
waiter.wait(EndpointName=endpoint_name)
print("  ✓ Endpoint deleted")

# Delete endpoint configuration
print(f"\nDeleting endpoint configuration: {endpoint_config_name}")
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
print("  ✓ Endpoint configuration deleted")

# Delete model
print(f"\nDeleting model: {model_name}")
sagemaker_client.delete_model(ModelName=model_name)
print("  ✓ Model deleted")

# Summary
print("\n" + "=" * 60)
print("✅ CLEANUP COMPLETE")
print("=" * 60)
print(f"All resources deleted:")
print(f"  ✓ Endpoint: {endpoint_name}")
print(f"  ✓ Endpoint Config: {endpoint_config_name}")
print(f"  ✓ Model: {model_name}")
print(f"\n✓ No ongoing charges!")
print(f"\nNote: S3 artifacts remain at s3://{s3_bucket}/{s3_key_prefix}/")
print(f"      Delete manually if no longer needed")


CLEANUP: DELETING RESOURCES

⚠️  This will delete all resources and stop charges

Deleting endpoint: vllm-prepost-decorator-endpoint-20251127-040240
  ✓ Endpoint deletion initiated
  Waiting for endpoint to be deleted...
  ✓ Endpoint deleted

Deleting endpoint configuration: vllm-prepost-decorator-config-20251127-040240
  ✓ Endpoint configuration deleted

Deleting model: vllm-prepost-decorator-20251127-040240
  ✓ Model deleted

✅ CLEANUP COMPLETE
All resources deleted:
  ✓ Endpoint: vllm-prepost-decorator-endpoint-20251127-040240
  ✓ Endpoint Config: vllm-prepost-decorator-config-20251127-040240
  ✓ Model: vllm-prepost-decorator-20251127-040240

✓ No ongoing charges!

Note: S3 artifacts remain at s3://sheteng-demo/vllm-prepost/decorator/20251127-040240/
      Delete manually if no longer needed
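The remaining S3 artifacts can be removed with a short helper. This is a minimal sketch, assuming the client passed in behaves like `boto3.client("s3")`; `delete_prefix` is a hypothetical name. It permanently deletes every object under the prefix, so check the prefix before running it.

```python
from typing import Any


def delete_prefix(s3_client: Any, bucket: str, prefix: str) -> int:
    """Delete all objects under a prefix and return how many were removed."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

With the variables from this notebook, the call would be `delete_prefix(s3_client, s3_bucket, s3_key_prefix + "/")`.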
