SageMaker Image Classification - Validation accuracy inconsistent #3229

@rauldiaz

Describe the bug
A SageMaker image classification model is trained on the VMMRdb dataset. The training job reports a validation accuracy of 88.5%, but I can't reproduce that figure after deploying the model to an endpoint. The best I can get is 70% validation accuracy.

To reproduce

Prepare training and validation data

# Get the training image for the built-in algorithm
import sagemaker
from sagemaker.image_uris import retrieve

training_image = retrieve(region=region, framework='image-classification')
print(training_image)

# Create a train data channel with s3_data_type set to 'AugmentedManifestFile' and the manifest attribute names.
train_data = sagemaker.inputs.TrainingInput(s3_train_location,
                                        distribution='FullyReplicated',
                                        content_type='application/x-recordio',
                                        s3_data_type='AugmentedManifestFile',
                                        attribute_names=['source-ref', 'class'],
                                        input_mode='Pipe',
                                        record_wrapping='RecordIO') 

# Create a validation data channel with s3_data_type set to 'AugmentedManifestFile' and the manifest attribute names.
val_data = sagemaker.inputs.TrainingInput(s3_val_location,
                                             distribution='FullyReplicated',
                                             content_type='application/x-recordio',
                                             s3_data_type='AugmentedManifestFile',
                                             attribute_names=['source-ref', 'class'],
                                             input_mode='Pipe',
                                             record_wrapping='RecordIO') 

data_channels = {'train': train_data, 'validation': val_data}
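
The inference loop further down relies on `val_gt_lines`, `id2label`, and `label2id`, none of which are defined in this report. Below is a minimal sketch of how they could be built, assuming the validation manifest is a JSON Lines file whose records carry the `source-ref`, `class`, and `class-metadata` / `class-name` keys used elsewhere in this issue. The helper name `load_manifest`, the bucket, and the class names are hypothetical.

```python
import json


def load_manifest(lines):
    """Parse JSON Lines manifest text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


# Hypothetical sample records; a real manifest would be read from a file or S3.
sample_lines = [
    '{"source-ref": "s3://example-bucket/val/honda_civic_2005/img_001.jpg", '
    '"class": 0, "class-metadata": {"class-name": "honda_civic_2005"}}',
    "",
    '{"source-ref": "s3://example-bucket/val/toyota_corolla_2010/img_042.jpg", '
    '"class": 1, "class-metadata": {"class-name": "toyota_corolla_2010"}}',
]

val_gt_lines = load_manifest(sample_lines)

# Assumption: the integer under "class" is the index into the endpoint's output vector.
label2id = {r["class-metadata"]["class-name"]: r["class"] for r in val_gt_lines}
id2label = {idx: name for name, idx in label2id.items()}
```

Deriving both mappings from the same manifest used for training keeps the evaluation indices aligned with the labels the training job saw, which rules out one possible source of an accuracy gap.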

Prepare training job

model = sagemaker.estimator.Estimator(training_image,
                                      role=role, 
                                      instance_count=1, 
                                      instance_type='ml.p3.2xlarge',
                                      max_run=360000,
                                      input_mode='File',
                                      sagemaker_session=session,
                                      output_path=s3_output_location,
                                      base_job_name='q8-car-recognition')

Set hyperparameters and fit the model

model.set_hyperparameters(num_classes=num_classes,
                          num_training_samples=num_training_samples,
                          epochs=100,
                          mini_batch_size=16,
                          early_stopping=True,
                          early_stopping_min_epochs=10,
                          early_stopping_patience=10,
                          num_layers=50,
                          top_k=1,
                          use_pretrained_model=1,
                          optimizer='sgd',
                          learning_rate=0.1,
                          lr_scheduler_factor=0.1,
                          lr_scheduler_step="50,75,90,95", 
                          augmentation_type='crop_color_transform')

model.fit(inputs=data_channels, logs=True, wait=False)

Run inference after deploying the model endpoint

from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

model = sagemaker.model.Model(image_uri=inference_image,
                              model_data=model_location,
                              role=role,
                              name=model_name)
model.deploy(initial_instance_count=1, 
             instance_type='ml.m4.xlarge',
             endpoint_name=endpoint_name,
             serializer=IdentitySerializer("image/jpeg"),
             wait=False
            )
print(model.endpoint_name)
predictor = Predictor(endpoint_name, 
                      sagemaker_session=session, 
                      serializer=IdentitySerializer("image/jpeg")
                     )

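import json
from pathlib import Path

import albumentations as A
import cv2
import numpy as np

# Collect ground-truth and predicted class names for the accuracy computation
labels, preds = [], []
for line in val_gt_lines: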
    relative_path = Path(*Path(line['source-ref']).parts[-dir_back:])
    test_image = data_dir/relative_path

    # payload = bytearray(test_image.open('rb').read())
    img = cv2.imread(str(test_image))
    # Crop side: 224, or the shorter image side if the image is smaller than that
    short_size = min(img.shape[:2] + (224,))
    # 'A' here is the Albumentations library for fast pre-processing
    preprocessed = A.Compose([
                              # A.augmentations.geometric.resize.SmallestMaxSize(max_size=short_size),
                              A.augmentations.crops.transforms.CenterCrop(short_size, short_size)
                              ],
                            )(image=img)['image']
    
    payload = cv2.imencode('.jpg', preprocessed)[1].tobytes()
    result = json.loads(predictor.predict(payload))

    make_model = id2label[np.argmax(result)]
    confidence_pred = np.max(result)
    label = line['class-metadata']['class-name']
    confidence_label = result[label2id[label]]

    labels.append(label)
    preds.append(make_model)

acc = np.mean([l==p for l,p in zip(labels, preds)]) * 100

Expected behavior
After running all validation images through the endpoint, the variable acc should report ~88.5, which is the validation accuracy reported by the training job. Instead, it barely reaches 70.
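
To narrow down where the gap comes from, the overall figure can be broken down per class. The following is a diagnostic sketch only, not part of the original reproduction. It assumes `labels` and `preds` are equal-length lists of class names, as collected in the loop above, and the function names are hypothetical.

```python
from collections import Counter


def top1_accuracy(labels, preds):
    """Return top-1 accuracy in percent for paired label/prediction lists."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and of equal length")
    return 100.0 * sum(l == p for l, p in zip(labels, preds)) / len(labels)


def per_class_accuracy(labels, preds):
    """Return {class_name: accuracy in percent} over the ground-truth classes."""
    total = Counter(labels)
    correct = Counter(l for l, p in zip(labels, preds) if l == p)
    return {cls: 100.0 * correct[cls] / n for cls, n in total.items()}


# Toy data for illustration only
demo_labels = ["a", "a", "b", "b"]
demo_preds = ["a", "b", "b", "b"]
demo_acc = top1_accuracy(demo_labels, demo_preds)
demo_per_class = per_class_accuracy(demo_labels, demo_preds)
```

A drop spread evenly across classes would be consistent with a preprocessing mismatch, while a drop concentrated in a few classes would point more towards a label-mapping problem. Neither conclusion is established by this report.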

I've checked the related issue #698, but it is still unclear what protocol to follow to replicate these results. In my case, I am using augmentation_type but not resize, which means the job is randomly cropping the training images to 224x224 to fit the model. How is it running validation? I've tried taking the central 224x224 crop, without success. Resizing the image so that its shorter side is 224 px and then taking the central crop does not work either. Simply sending the entire image and letting the endpoint resize it to 224x224 (losing the aspect ratio) works even worse.
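
To make the first two variants above concrete, here is a small sketch of the geometry involved, written without any image library so the numbers are easy to check. It does not claim to reproduce what the training job does during validation, and the function names are hypothetical. The crop side is capped at the image size, mirroring the `min(..., 224)` logic in the loop above.

```python
def resize_shorter_side(height, width, target=224):
    """Return (new_height, new_width) with the shorter side scaled to `target`,
    preserving the aspect ratio."""
    scale = target / min(height, width)
    return max(target, round(height * scale)), max(target, round(width * scale))


def center_crop_box(height, width, crop=224):
    """Return (top, left, bottom, right) of a centered square crop whose side is
    `crop`, or the shorter image side if the image is smaller than `crop`."""
    side = min(crop, height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, top + side, left + side


# Variant 1: central crop taken directly from a 480x640 image
direct_box = center_crop_box(480, 640)

# Variant 2: resize so the shorter side is 224, then take the central crop
resized_h, resized_w = resize_shorter_side(480, 640)
resized_box = center_crop_box(resized_h, resized_w)
```

For a 480x640 input, the direct crop covers only the central 224x224 pixels of the original, whereas the resize-then-crop variant covers the full height and roughly three quarters of the width. The two payloads therefore show the car at very different scales.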

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.99.0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): SageMaker built-in Image Classification
  • Framework version: 685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:1
  • CPU or GPU: GPU for training, CPU for inference
  • Custom Docker image (Y/N): N
