Valid JSONPath failing in QualityCheckStep #4130

@vmatekole

Describe the bug
Valid JSONPath expressions do not work as the inference_attribute in ModelQualityCheckConfig. I tried all three of the following, and each one fails:

$['SageMakerOutput'][0]['species']
$['SageMakerOutput']['species']
$.SageMakerOutput[0].species

for the following data:

{"SageMakerOutput":[{"confidence":0.4238993525505066,"prediction":0,"species":"Adelie"}],"body_mass_g":4100,"culmen_depth_mm":19.1,"culmen_length_mm":41.1,"flipper_length_mm":188,"island":"Biscoe","species":"Adelie"}
{"SageMakerOutput":[{"confidence":0.8337568640708923,"prediction":2,"species":"Gentoo"}],"body_mass_g":4650,"culmen_depth_mm":13.7,"culmen_length_mm":40.9,"flipper_length_mm":214,"island":"Biscoe","species":"Gentoo"}
{"SageMakerOutput":[{"confidence":0.4371323883533478,"prediction":0,"species":"Adelie"}],"body_mass_g":3800,"culmen_depth_mm":19.4,"culmen_length_mm":50.6,"flipper_length_mm":193,"island":"Dream","species":"Chinstrap"}
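For reference, here is a minimal sketch in plain Python of what the first and third expressions are meant to select from this data. It uses only the standard library and trimmed copies of two of the records above. The helper name select_prediction is hypothetical and used only for illustration. It does not reproduce how the quality check job evaluates the attribute.

```python
import json

# Two of the JSON Lines records from above, trimmed to the relevant fields.
lines = [
    '{"SageMakerOutput":[{"confidence":0.4238993525505066,"prediction":0,"species":"Adelie"}],"species":"Adelie"}',
    '{"SageMakerOutput":[{"confidence":0.4371323883533478,"prediction":0,"species":"Adelie"}],"species":"Chinstrap"}',
]


def select_prediction(record):
    # Plain-Python equivalent of $['SageMakerOutput'][0]['species']:
    # take the first element of the SageMakerOutput list, then its species.
    return record["SageMakerOutput"][0]["species"]


records = [json.loads(line) for line in lines]
predicted = [select_prediction(r) for r in records]
ground_truth = [r["species"] for r in records]  # top-level species field

print(predicted)
print(ground_truth)
```

Because SageMakerOutput is a list in this data, the variant without [0] would likely select nothing under strict JSONPath semantics, so the first and third forms are the relevant ones.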

To reproduce
The step below uses the first of the JSONPath expressions listed above.

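from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.quality_check_step import (
    ModelQualityCheckConfig,
    QualityCheckStep,
)

model_quality_location = f"{S3_LOCATION}/monitoring/model-quality"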

model_quality_baseline_step = QualityCheckStep(
    name="generate-model-quality-baseline",
    
    check_job_config = CheckJobConfig(
        instance_type="ml.t3.xlarge",
        instance_count=1,
        volume_size_in_gb=20,
        sagemaker_session=sagemaker_session,
        role=role,
    ),
    
    quality_check_config = ModelQualityCheckConfig(
        # We are going to use the output of the Transform Step to generate
        # the model quality baseline.
        baseline_dataset=generate_test_predictions_step.properties.TransformOutput.S3OutputPath,

        dataset_format=DatasetFormat.json(lines=True),

        # We need to specify the problem type and the fields where the prediction
        # and groundtruth are so the process knows how to interpret the results.
        problem_type="MulticlassClassification",
        inference_attribute="$['SageMakerOutput'][0]['species']",
        ground_truth_attribute="species",

        output_s3_uri=model_quality_location,
    ),
    
    skip_check=True,
    register_new_baseline=True,
    model_package_group_name=model_package_group_name,
    cache_config=cache_config
)

Expected behavior
The JSONPath given in inference_attribute should select the nested 'species' field inside SageMakerOutput and use it as the prediction.

Screenshots or logs

2023-09-19 17:26:32,959 ERROR Main: Column 'SageMakerOutput[0]['species']' does not exist. Did you mean one of the following? [SageMakerOutput, species, culmen_depth_mm, body_mass_g, culmen_length_mm, flipper_length_mm, island];
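The error suggests the attribute is looked up as a literal column name among the top-level fields rather than evaluated as a JSONPath. One possible workaround, sketched below, is to copy the nested prediction into a top-level field before the quality check runs and point inference_attribute at that field. This is an assumption, untested against the monitoring container, and it does not address the underlying behavior. The function name flatten_prediction and the field name predicted_species are hypothetical.

```python
import json


def flatten_prediction(line, source_key="SageMakerOutput", target_key="predicted_species"):
    """Copy the nested predicted species of one JSON Lines record to a top-level field."""
    record = json.loads(line)
    outputs = record.get(source_key) or []
    # Use the first prediction object, matching the [0] index in the JSONPath.
    record[target_key] = outputs[0]["species"] if outputs else None
    return json.dumps(record)


raw = (
    '{"SageMakerOutput":[{"confidence":0.8337568640708923,"prediction":2,'
    '"species":"Gentoo"}],"island":"Biscoe","species":"Gentoo"}'
)
flat = json.loads(flatten_prediction(raw))
print(flat["predicted_species"])
```

With records rewritten this way, the config would use inference_attribute="predicted_species" and keep ground_truth_attribute="species", since both would then be plain top-level column names.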

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.173.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.10
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N):
