# SageMaker V3 Custom InferenceSpec Example

This notebook demonstrates how to create and deploy custom models using InferenceSpec with SageMaker V3 ModelBuilder.

### Prerequisites
Note: Install `sagemaker`, `torch`, and `ipywidgets` in your environment before running this notebook. The `ipywidgets` package is needed to display endpoint deployment progress in Jupyter notebooks.


In [None]:
# Import required libraries
import json
import uuid
import tempfile
import os
import torch
import torch.nn as nn

from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.core.resources import EndpointConfig

## Step 1: Create a Simple PyTorch Model

First, define a small network (a single linear layer that maps 4 input features to 2 class probabilities) and save it as a TorchScript file for deployment.

In [None]:
class SimpleModel(nn.Module):
    """A simple neural network for classification."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
    
    def forward(self, x):
        return torch.softmax(self.linear(x), dim=1)

# Create and save the model
pytorch_model = SimpleModel()
model_path = tempfile.mkdtemp()

# Save model using TorchScript for deployment
sample_input = torch.tensor([[0.1, 0.2, 0.3, 0.4]], dtype=torch.float32)
traced_model = torch.jit.trace(pytorch_model, sample_input)
model_file = os.path.join(model_path, "model.pth")
torch.jit.save(traced_model, model_file)

print(f"Model saved to: {model_file}")
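
The `forward` method applies a softmax across the two output logits, so every output row is a pair of non-negative values that sum to 1. The sketch below reproduces that computation without PyTorch; the helper name `softmax` is illustrative and is not part of the notebook's model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.0])
print(probs)  # two probabilities, the first one larger
```

A larger logit always receives the larger probability, and equal logits split the probability evenly.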

## Step 2: Define Custom InferenceSpec

Create a custom InferenceSpec that defines how to load and run inference with our model.

In [None]:
class SimpleModelSpec(InferenceSpec):
    """Custom InferenceSpec for our simple PyTorch model."""
    
    def load(self, model_dir: str):
        """Load the TorchScript model from the model directory."""
        model_path = os.path.join(model_dir, "model.pth")
        
        # Fail loudly instead of silently serving an untrained model
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model file not found: {model_path}")
        
        model = torch.jit.load(model_path, map_location='cpu')
        model.eval()
        return model
    
    def invoke(self, input_object: object, model: object):
        """Run inference on the input data."""
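        # Handle list input (the expected format)
        if isinstance(input_object, list):
            input_tensor = torch.tensor(input_object, dtype=torch.float32)
        else:
            # Demo-only fallback: any non-list payload is replaced by a fixed
            # sample input, so the response will not reflect the request.
            input_tensor = torch.tensor([[0.1, 0.2, 0.3, 0.4]], dtype=torch.float32)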
        
        with torch.no_grad():
            predictions = model(input_tensor)
        
        return predictions.tolist()

print("Custom InferenceSpec defined successfully!")
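
Before deploying, it can help to exercise a spec locally by calling `load` once and then `invoke` for each request, which is roughly how the two methods are meant to be used. The sketch below shows that call order with a hypothetical stand-in class, `EchoSpec`, that has the same two methods but no SageMaker or PyTorch dependency. The "model" it loads is just a function that sums each row.

```python
class EchoSpec:
    """Hypothetical stand-in with the same load/invoke shape as an InferenceSpec."""

    def load(self, model_dir: str):
        # A real spec would read weights from model_dir; this one ignores it
        # and returns a function that sums each input row.
        return lambda rows: [[sum(row)] for row in rows]

    def invoke(self, input_object, model):
        if not isinstance(input_object, list):
            raise ValueError("expected a list of rows")
        return model(input_object)

spec = EchoSpec()
model = spec.load("unused-model-dir")                    # load once
out = spec.invoke([[1, 2, 3, 4], [0, 0, 0, 1]], model)   # invoke per request
print(out)
```

With the notebook's own objects, the equivalent local check would be `spec = SimpleModelSpec()`, `model = spec.load(model_path)`, then `spec.invoke([[0.1, 0.2, 0.3, 0.4]], model)`.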

## Step 3: Create Schema Builder

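Create a SchemaBuilder from a sample input and a sample output. ModelBuilder uses these samples to work out how requests and responses should be serialized and deserialized.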

In [None]:
# Create schema builder with sample input/output
sample_input = [[0.1, 0.2, 0.3, 0.4]]  # List format for JSON serialization
sample_output = [[0.9, 0.1]]  # Expected output format

schema_builder = SchemaBuilder(sample_input, sample_output)
print("Schema builder created successfully!")
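
The samples are plain nested lists so that they survive a JSON round trip unchanged, which matches how the test requests later in this notebook are sent with `json.dumps`. A minimal check of that property, using only the standard library:

```python
import json

sample_input = [[0.1, 0.2, 0.3, 0.4]]
sample_output = [[0.9, 0.1]]

# Encode to JSON text and decode back, as a request/response would be
encoded = json.dumps(sample_input)
decoded = json.loads(encoded)

round_trip_ok = (
    decoded == sample_input
    and json.loads(json.dumps(sample_output)) == sample_output
)
print(encoded, round_trip_ok)
```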

## Step 4: Configure ModelBuilder

Set up the ModelBuilder with our custom InferenceSpec.

In [None]:
# Configuration
MODEL_NAME_PREFIX = "custom-spec-model"
ENDPOINT_NAME_PREFIX = "custom-spec-endpoint"

# Generate unique identifiers
unique_id = str(uuid.uuid4())[:8]
model_name = f"{MODEL_NAME_PREFIX}-{unique_id}"
endpoint_name = f"{ENDPOINT_NAME_PREFIX}-{unique_id}"

# Create ModelBuilder with custom InferenceSpec
inference_spec = SimpleModelSpec()
model_builder = ModelBuilder(
    inference_spec=inference_spec,
    model_path=model_path,
    model_server=ModelServer.TORCHSERVE,
    schema_builder=schema_builder
)

print(f"ModelBuilder configured for model: {model_name}")
print(f"Target endpoint: {endpoint_name}")
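
The random suffix keeps names unique across runs, but a long prefix can still push a name past the service limit. The sketch below assumes the commonly documented SageMaker constraint of at most 63 characters, alphanumerics and hyphens only, starting and ending with an alphanumeric; check the current API reference before relying on it. The helper `make_name` is hypothetical and not part of the SDK.

```python
import re
import uuid

# Assumed naming rule: alphanumerics separated by hyphens, no leading/trailing hyphen
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")

def make_name(prefix: str, max_len: int = 63) -> str:
    """Append an 8-character random suffix and validate the result."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    if len(name) > max_len or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name

name = make_name("custom-spec-endpoint")
print(name)
```

Validating the name up front surfaces a bad prefix immediately rather than partway through a build or deployment.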

## Step 5: Build the Model

Build the model artifacts for deployment.

In [None]:
# Build the model
core_model = model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {core_model.model_name}")

## Step 6: Deploy the Model

Deploy the model to a SageMaker endpoint.

In [None]:
# Deploy the model
core_endpoint = model_builder.deploy(endpoint_name=endpoint_name)
print(f"Endpoint Successfully Created: {core_endpoint.endpoint_name}")

## Step 7: Test the Model

Send test requests to verify the model works correctly.

In [None]:
# Test 1: Single prediction
test_data_1 = [[0.1, 0.2, 0.3, 0.4]]

result_1 = core_endpoint.invoke(
    body=json.dumps(test_data_1),
    content_type="application/json"
)

prediction_1 = json.loads(result_1.body.read().decode('utf-8'))
print(f"Single Prediction: {prediction_1}")

In [None]:
# Test 2: Batch prediction
test_data_2 = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.2, 0.3, 0.4, 0.5]
]

result_2 = core_endpoint.invoke(
    body=json.dumps(test_data_2),
    content_type="application/json"
)

prediction_2 = json.loads(result_2.body.read().decode('utf-8'))
print(f"Batch Prediction: {prediction_2}")
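
Printing the predictions confirms the endpoint responds, but not that the response is well formed. Since `invoke` returns `predictions.tolist()` from a softmax over two classes, each request row should come back as two probabilities that sum to 1. The helper below is an illustrative sketch of such a check; `check_predictions` and its tolerance are assumptions, not part of the SDK.

```python
import math

def check_predictions(preds, n_rows, n_classes=2, tol=1e-4):
    """Return True if preds is n_rows rows of n_classes probabilities summing to 1."""
    if not isinstance(preds, list) or len(preds) != n_rows:
        return False
    for row in preds:
        if len(row) != n_classes or any(p < 0 or p > 1 for p in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True

print(check_predictions([[0.9, 0.1]], n_rows=1))
```

In the notebook this could be applied as `check_predictions(prediction_1, n_rows=1)` and `check_predictions(prediction_2, n_rows=3)`.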

## Step 8: Clean Up Resources

Clean up all created resources and temporary files.

In [None]:
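# Clean up AWS resources
# Assumes the endpoint config was created with the same name as the endpoint
core_endpoint_config = EndpointConfig.get(endpoint_config_name=core_endpoint.endpoint_name)

# Delete the endpoint first, then its config, then the model
core_endpoint.delete()
core_endpoint_config.delete()
core_model.delete()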

# Clean up temporary files
import shutil
shutil.rmtree(model_path)

print("All resources and temporary files successfully deleted!")
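
The cleanup cell stops at the first delete call that raises, which can leave the remaining resources running and billable. One possible pattern is to attempt every step and report the failures at the end. The sketch below is a generic helper, not an SDK feature, and uses simulated delete functions so that it runs without AWS access.

```python
def cleanup_all(steps):
    """Run each (label, callable) step; return labels of the steps that failed."""
    failed = []
    for label, action in steps:
        try:
            action()
        except Exception as exc:  # keep going so later resources are still removed
            print(f"Failed to delete {label}: {exc}")
            failed.append(label)
    return failed

def _simulated_failure():
    raise RuntimeError("simulated failure")

deleted = []
failed = cleanup_all([
    ("endpoint", lambda: deleted.append("endpoint")),
    ("endpoint config", _simulated_failure),
    ("model", lambda: deleted.append("model")),
])
print(deleted, failed)
```

With the notebook's objects, the steps would be the bound methods `core_endpoint.delete`, `core_endpoint_config.delete`, and `core_model.delete`.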

## Summary

This notebook demonstrated:
1. Creating a simple PyTorch model
2. Defining a custom InferenceSpec with load() and invoke() methods
3. Creating a SchemaBuilder from sample input and output
4. Configuring ModelBuilder with TorchServe
5. Building and deploying the model
6. Testing both single and batch predictions
7. Cleaning up the endpoint, endpoint config, model, and temporary files

A custom InferenceSpec lets you control model loading, preprocessing, inference, and postprocessing in your own code when deploying with ModelBuilder.