A Python SDK for creating custom evaluation metrics for AWS Nova on-demand model evaluation on SageMaker training jobs, with built-in Pydantic validation. For the official integration with SageMaker training jobs, see the official AWS SageMaker documentation.
To install from source:
git clone https://github.com/aws/nova-custom-eval-sdk.git
cd nova-custom-eval-sdk
pip install .
The SDK provides:
- Pydantic Validation: Automatic input/output validation using Pydantic models
- PreProcessor: For input data transformation with validation
- PostProcessor: For output data formatting with validation
- Decorators: Simplified processor creation (@preprocess, @postprocess)
- Lambda Handler Builder: Builds a Lambda entry point from your preprocessor and postprocessor functions
- Exception Handling: Custom error types with validation feedback
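To make the decorator, handler-builder, and exception-handling bullets above more concrete, here is a plain-Python sketch of the general pattern, written without the SDK. It is an assumption about the overall shape only: the names `SketchValidationError`, `validate_event`, and `build_handler` are hypothetical, the hand-written checks stand in for the SDK's Pydantic models (requiring a string `prompt` is an illustrative choice), and the dispatch on `process_type` follows the event format shown later in this README rather than the SDK's actual internals.

```python
from typing import Any, Callable, Dict

# A Lambda-style function: (event, context) -> response dict
Handler = Callable[[Dict[str, Any], Any], Dict[str, Any]]


class SketchValidationError(ValueError):
    """Hypothetical stand-in for the SDK's custom exception types."""


def validate_event(event: Dict[str, Any]) -> str:
    """Check the event envelope by hand (not Pydantic) and return its process type."""
    process_type = event.get("process_type")
    data = event.get("data")
    if process_type == "preprocess":
        # Preprocess events carry a single record object.
        if not isinstance(data, dict) or not isinstance(data.get("prompt"), str):
            raise SketchValidationError("preprocess 'data' must be an object with a string 'prompt'")
    elif process_type == "postprocess":
        # Postprocess events carry a list of record objects.
        if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
            raise SketchValidationError("postprocess 'data' must be a list of objects")
    else:
        raise SketchValidationError(f"unknown process_type: {process_type!r}")
    return process_type


def build_handler(preprocessor: Handler, postprocessor: Handler) -> Handler:
    """Return an entry point that validates the event, then dispatches on process_type."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        try:
            process_type = validate_event(event)
        except SketchValidationError as exc:
            # Report validation problems as a 400-style response instead of raising.
            return {"statusCode": 400, "body": str(exc)}
        if process_type == "preprocess":
            return preprocessor(event, context)
        return postprocessor(event, context)

    return handler
```

For example, `build_handler(lambda e, c: {"statusCode": 200, "body": e["data"]}, ...)` returns a handler that echoes a valid preprocess record and answers a malformed event with status 400. Keeping validation in one wrapper means the user-written functions only ever see well-formed input, which is the convenience the decorators above advertise.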
See example/run_example.py for a complete working example to run locally.
To use the SDK in AWS, create a Lambda function (follow this guide) and attach nova-custom-eval-sdk to it as a Lambda layer.
The GitHub release should include a pre-built nova-custom-eval-layer.zip file.
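If you prefer to build the layer archive yourself, keep in mind that Lambda puts `/opt/python` on the Python path, so packages must sit under a top-level `python/` folder inside the zip. The sketch below is an assumption, not the project's own build script: it zips an already-populated directory (for example one created with `pip install . --no-deps -t build/layer`, leaving Pydantic to the Powertools layer) under that prefix. The function name `build_layer_zip` and the paths are hypothetical.

```python
import os
import zipfile
from pathlib import Path
from typing import List


def build_layer_zip(site_packages_dir: str, zip_path: str) -> List[str]:
    """Zip every file under site_packages_dir into zip_path beneath a top-level 'python/' folder.

    Returns the archive member names so the layout can be inspected.
    """
    root = Path(site_packages_dir)
    names: List[str] = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in sorted(filenames):
                file_path = Path(dirpath) / filename
                # Lambda extracts layers to /opt, so python/<pkg> ends up at /opt/python/<pkg>.
                arcname = (Path("python") / file_path.relative_to(root)).as_posix()
                zf.write(file_path, arcname)
                names.append(arcname)
    return names


if __name__ == "__main__":
    build_layer_zip("build/layer", "nova-custom-eval-layer.zip")
```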
Use the following command to publish the zip as a custom Lambda layer:
aws lambda publish-layer-version \
--layer-name nova-custom-eval-layer \
--zip-file fileb://nova-custom-eval-layer.zip \
--compatible-runtimes python3.12 python3.11 python3.10 python3.9
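The same step can be scripted with boto3's `publish_layer_version`, which takes the layer name, the zip contents, and the compatible runtimes used in the CLI command above. This is a minimal sketch: the helper name `publish_layer` is hypothetical, and the client is passed in so the function can be exercised without AWS credentials. Very large archives have to go through S3 instead (`Content={"S3Bucket": ..., "S3Key": ...}`).

```python
from pathlib import Path
from typing import Any, Dict, Sequence

# Same runtimes as the CLI example above
DEFAULT_RUNTIMES = ("python3.12", "python3.11", "python3.10", "python3.9")


def publish_layer(lambda_client: Any, zip_path: str,
                  layer_name: str = "nova-custom-eval-layer",
                  runtimes: Sequence[str] = DEFAULT_RUNTIMES) -> str:
    """Publish zip_path as a new layer version and return its LayerVersionArn."""
    response: Dict[str, Any] = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"ZipFile": Path(zip_path).read_bytes()},  # direct upload of the zip bytes
        CompatibleRuntimes=list(runtimes),
    )
    return response["LayerVersionArn"]


if __name__ == "__main__":
    import boto3  # needs credentials allowed to call lambda:PublishLayerVersion

    print(publish_layer(boto3.client("lambda"), "nova-custom-eval-layer.zip"))
```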
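Add this layer to your Lambda function as a custom layer, together with the required AWS layer AWSLambdaPowertoolsPythonV3-python312-arm64, which supplies the SDK's Pydantic dependency. As its name indicates, that layer targets the Python 3.12 runtime on arm64, so configure the function with that runtime and architecture.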
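Attaching the layers can also be scripted with boto3's `update_function_configuration`. Its `Layers` parameter replaces the function's entire layer list, so this sketch reads the current configuration first and appends the new ARNs. The helper name `attach_layers` and the placeholder ARNs are hypothetical; look up the actual ARN of the Powertools layer for your region in the Powertools documentation rather than hard-coding one.

```python
from typing import Any, List, Sequence


def attach_layers(lambda_client: Any, function_name: str, layer_arns: Sequence[str]) -> List[str]:
    """Add layer_arns to a function while keeping its existing layers; return the merged list."""
    current = lambda_client.get_function_configuration(FunctionName=function_name)
    # "Layers" is absent from the response when the function has no layers.
    arns = [layer["Arn"] for layer in current.get("Layers", [])]
    for arn in layer_arns:
        if arn not in arns:  # skip layer versions that are already attached
            arns.append(arn)
    # Layers overwrites the whole list, so always send the merged result.
    lambda_client.update_function_configuration(FunctionName=function_name, Layers=arns)
    return arns


if __name__ == "__main__":
    import boto3

    attach_layers(
        boto3.client("lambda"),
        "my-eval-function",  # hypothetical function name
        ["<nova-custom-eval-layer version ARN>", "<Powertools V3 layer ARN for your region>"],
    )
```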
Then update your Lambda function code, for example:
from nova_custom_evaluation_sdk.processors.decorators import preprocess, postprocess
from nova_custom_evaluation_sdk.lambda_handler import build_lambda_handler
@preprocess
def preprocessor(event: dict, context) -> dict:
    data = event.get('data', {})
    return {
        "statusCode": 200,
        "body": {
            "system": data.get("system"),
            "prompt": data.get("prompt", ""),
            "gold": data.get("gold", "")
        }
    }

@postprocess
def postprocessor(event: dict, context) -> dict:
    # data is already validated and extracted from event
    data = event.get('data', {})
    inference_output = data.get('inference_output', '')
    gold = data.get('gold', '')

    metrics = []

    # 0.0 for a case-insensitive exact match, 1.0 otherwise
    inverted_accuracy = 0.0 if inference_output.lower() == gold.lower() else 1.0
    metrics.append({
        "metric": "inverted_accuracy_custom",
        "value": inverted_accuracy
    })

    # Add more metrics here

    return {
        "statusCode": 200,
        "body": metrics
    }
# Build Lambda handler
lambda_handler = build_lambda_handler(
    preprocessor=preprocessor,
    postprocessor=postprocessor
)

The SDK automatically validates:
Preprocess input:
{
  "process_type": "preprocess",
  "data": {
    "prompt": "what can you do?",
    "gold": "Hello! How can I help you today?",
    "system": "You are a helpful assistant"
  }
}

Postprocess input:
{
  "process_type": "postprocess",
  "data": [
    {
      "prompt": "what can you do",
      "inference_output": "Hello! How can I help you today?",
      "gold": "Hello! How can I help you today?"
    }
  ]
}

To run the tests and the example locally:
# Run all tests
python -m pytest -v
# Run example
python example/run_example.py

For development:
# Install in development mode
pip install -e .
# Run tests with coverage (requires the pytest-cov plugin)
python -m pytest tests/ --cov=nova_custom_evaluation_sdk

See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.