SageMaker Image Classification - Validation accuracy inconsistent #3229

**Describe the bug**
A SageMaker image classification model is trained on the VMMRdb dataset. The training job reports a validation accuracy of 88.5%, but I can't reproduce that accuracy after deploying the model to an endpoint. The best I can get on the same validation set is 70%.

**To reproduce**

Prepare training and validation data
```
# Get the training image for the built-in algorithm
import sagemaker
from sagemaker.image_uris import retrieve

training_image = retrieve(region=region, framework='image-classification')
print(training_image)

# Create a train data channel with S3_data_type as 'AugmentedManifestFile' and attribute names.
train_data = sagemaker.inputs.TrainingInput(s3_train_location,
                                        distribution='FullyReplicated',
                                        content_type='application/x-recordio',
                                        s3_data_type='AugmentedManifestFile',
                                        attribute_names=['source-ref', 'class'],
                                        input_mode='Pipe',
                                        record_wrapping='RecordIO') 

# Create a validation data channel with S3_data_type as 'AugmentedManifestFile' and attribute names.
val_data = sagemaker.inputs.TrainingInput(s3_val_location,
                                             distribution='FullyReplicated',
                                             content_type='application/x-recordio',
                                             s3_data_type='AugmentedManifestFile',
                                             attribute_names=['source-ref', 'class'],
                                             input_mode='Pipe',
                                             record_wrapping='RecordIO') 

data_channels = {'train': train_data, 'validation': val_data}
```

Prepare training job

```
model = sagemaker.estimator.Estimator(training_image,
                                      role=role, 
                                      instance_count=1, 
                                      instance_type='ml.p3.2xlarge',
                                      max_run=360000,
                                      input_mode='File',
                                      sagemaker_session=session,
                                      output_path=s3_output_location,
                                      base_job_name='q8-car-recognition')
```

Set hyperparameters and fit the model

```
model.set_hyperparameters(num_classes=num_classes,
                          num_training_samples=num_training_samples,
                          epochs=100,
                          mini_batch_size=16,
                          early_stopping=True,
                          early_stopping_min_epochs=10,
                          early_stopping_patience=10,
                          num_layers=50,
                          top_k=1,
                          use_pretrained_model=1,
                          optimizer='sgd',
                          learning_rate=0.1,
                          lr_scheduler_factor=0.1,
                          lr_scheduler_step="50,75,90,95", 
                          augmentation_type='crop_color_transform')

model.fit(inputs=data_channels, logs=True, wait=False)
```

Inference execution after setting up the model endpoint config

```
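import json
from pathlib import Path

import albumentations as A
import cv2
import numpy as np
import sagemaker
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

model = sagemaker.model.Model(image_uri=inference_image,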
                              model_data=model_location,
                              role=role,
                              name=model_name)
model.deploy(initial_instance_count=1, 
             instance_type='ml.m4.xlarge',
             endpoint_name=endpoint_name,
             serializer=IdentitySerializer("image/jpeg"),
             wait=False
            )
print(model.endpoint_name)
predictor = Predictor(endpoint_name, 
                      sagemaker_session=session, 
                      serializer=IdentitySerializer("image/jpeg")
                     )

# Ground-truth labels and predictions, collected per validation image
labels, preds = [], []
for line in val_gt_lines:
    relative_path = Path(*Path(line['source-ref']).parts[-dir_back:])
    test_image = data_dir/relative_path

    # payload = bytearray(test_image.open('rb').read())
    img = cv2.imread(str(test_image))
    short_size=min(img.shape[:2]+(224,))
    # 'A' here is the Albumentations library for fast pre-processing
    preprocessed = A.Compose([
                              # A.augmentations.geometric.resize.SmallestMaxSize(max_size=short_size),
                              A.augmentations.crops.transforms.CenterCrop(short_size, short_size)
                              ],
                            )(image=img)['image']
    
    payload = cv2.imencode('.jpg', preprocessed)[1].tobytes()
    result = json.loads(predictor.predict(payload))

    make_model = id2label[np.argmax(result)]
    confidence_pred = np.max(result)
    label = line['class-metadata']['class-name']
    confidence_label = result[label2id[label]]

    labels.append(label)
    preds.append(make_model)

acc = np.mean([l==p for l,p in zip(labels, preds)]) * 100
```
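The loop above uses `val_gt_lines`, `label2id` and `id2label` without showing how they are built. A minimal sketch of one way to build them, assuming the validation manifest is a JSON Lines file in which each record carries `source-ref`, an integer `class`, and `class-metadata.class-name`. The function names, the sample S3 paths and the sample class names are hypothetical and only for illustration:

```python
import json


def load_manifest_lines(text):
    """Parse an augmented manifest (one JSON object per line) into dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_label_maps(lines):
    """Derive label<->id maps from the 'class' and 'class-metadata' attributes."""
    label2id = {}
    for rec in lines:
        # Assumption: 'class' is the integer id the training job saw for this label
        label2id[rec['class-metadata']['class-name']] = int(rec['class'])
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


# Hypothetical manifest content, for illustration only
sample = "\n".join([
    json.dumps({"source-ref": "s3://example-bucket/val/a.jpg", "class": 0,
                "class-metadata": {"class-name": "make_a_model_x"}}),
    json.dumps({"source-ref": "s3://example-bucket/val/b.jpg", "class": 1,
                "class-metadata": {"class-name": "make_b_model_y"}}),
    json.dumps({"source-ref": "s3://example-bucket/val/c.jpg", "class": 0,
                "class-metadata": {"class-name": "make_a_model_x"}}),
])

val_gt_lines = load_manifest_lines(sample)
label2id, id2label = build_label_maps(val_gt_lines)
```

If the ids used at inference differ from the `class` ids in the training manifest, `np.argmax(result)` would map to the wrong label, so deriving both maps from the same manifest may help rule that out as a cause.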

**Expected behavior**
After running all validation images through the endpoint, the variable `acc` should be ~88.5, the validation accuracy reported by the training job. However, it barely reaches 70.

I've checked the related issue https://github.com/aws/sagemaker-python-sdk/issues/698, but it is still unclear which preprocessing protocol reproduces these results. In my case, I am using `augmentation_type` but not `resize`, which means the job randomly crops the training images to 224x224 to fit the model. How does it run validation? I've tried taking the central 224x224 crop, without success. Resizing the image so that its shorter side is 224px and then taking the central crop does not work either. Simply sending the entire image and letting the endpoint resize it to 224x224 (losing the aspect ratio) works even worse.
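The three inference-time preprocessing variants described above can be put side by side so that each one is evaluated against the same endpoint. This is a minimal sketch and not a statement of what the built-in algorithm does during validation. The function names and the `VARIANTS` dictionary are hypothetical, and `resize_nearest` is a dependency-free stand-in for a real resize such as `cv2.resize`:

```python
import numpy as np


def center_crop(img, size):
    """Crop a square window from the image center, clamped to the image bounds."""
    h, w = img.shape[:2]
    size = min(size, h, w)
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize (stand-in for cv2.resize in this sketch)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]


def resize_short_side(img, target):
    """Resize so the shorter side equals `target`, keeping the aspect ratio."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h = max(target, round(h * scale))
    new_w = max(target, round(w * scale))
    return resize_nearest(img, new_h, new_w)


# The three variants tried in this issue, keyed by a descriptive name
VARIANTS = {
    'center_crop': lambda img: center_crop(img, 224),
    'short_side_then_crop': lambda img: center_crop(resize_short_side(img, 224), 224),
    'squash': lambda img: resize_nearest(img, 224, 224),
}

# Dummy 300x400 image to show that every variant yields a 224x224 input
img = np.zeros((300, 400, 3), dtype=np.uint8)
shapes = {name: fn(img).shape for name, fn in VARIANTS.items()}
```

Running the accuracy loop once per entry in `VARIANTS` would give one accuracy figure per preprocessing choice, which makes it easier to report which variant comes closest to the training job's number.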

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: 2.99.0
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**: SageMaker built in Image Classification
- **Framework version**: 685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:1
- **CPU or GPU**: GPU for training, CPU for inference
- **Custom Docker image (Y/N)**: N



