Description
Describe the bug
A SageMaker image classification model is trained on the VMMRdb dataset. While the training job reports 88.5% validation accuracy, I can't reproduce that accuracy after deploying the model to an endpoint. At most, I get 70% accuracy on the validation images.
To reproduce
Prepare training and validation data
import sagemaker
from sagemaker.image_uris import retrieve

# Get the training image for the built-in algorithm
training_image = retrieve(region=region, framework='image-classification')
print(training_image)

# Create a train data channel with s3_data_type 'AugmentedManifestFile' and attribute names.
train_data = sagemaker.inputs.TrainingInput(
    s3_train_location,
    distribution='FullyReplicated',
    content_type='application/x-recordio',
    s3_data_type='AugmentedManifestFile',
    attribute_names=['source-ref', 'class'],
    input_mode='Pipe',
    record_wrapping='RecordIO',
)

# Create a validation data channel with s3_data_type 'AugmentedManifestFile' and attribute names.
val_data = sagemaker.inputs.TrainingInput(
    s3_val_location,
    distribution='FullyReplicated',
    content_type='application/x-recordio',
    s3_data_type='AugmentedManifestFile',
    attribute_names=['source-ref', 'class'],
    input_mode='Pipe',
    record_wrapping='RecordIO',
)

data_channels = {'train': train_data, 'validation': val_data}
Prepare training job
model = sagemaker.estimator.Estimator(
    training_image,
    role=role,
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    max_run=360000,
    input_mode='File',
    sagemaker_session=session,
    output_path=s3_output_location,
    base_job_name='q8-car-recognition',
)
Set hyperparameters and fit the model
model.set_hyperparameters(
    num_classes=num_classes,
    num_training_samples=num_training_samples,
    epochs=100,
    mini_batch_size=16,
    early_stopping=True,
    early_stopping_min_epochs=10,
    early_stopping_patience=10,
    num_layers=50,
    top_k=1,
    use_pretrained_model=1,
    optimizer='sgd',
    learning_rate=0.1,
    lr_scheduler_factor=0.1,
    lr_scheduler_step="50,75,90,95",
    augmentation_type='crop_color_transform',
)
model.fit(inputs=data_channels, logs=True, wait=False)
Inference execution after setting up the model endpoint config
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

model = sagemaker.model.Model(
    image_uri=inference_image,
    model_data=model_location,
    role=role,
    name=model_name,
)
# wait=False returns immediately; the endpoint must be InService before predicting
model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name=endpoint_name,
    serializer=IdentitySerializer("image/jpeg"),
    wait=False,
)
print(model.endpoint_name)

predictor = Predictor(
    endpoint_name,
    sagemaker_session=session,
    serializer=IdentitySerializer("image/jpeg"),
)
import json
from pathlib import Path

import albumentations as A  # Albumentations library for fast pre-processing
import cv2
import numpy as np

labels, preds = [], []
for line in val_gt_lines:
    relative_path = Path(*Path(line['source-ref']).parts[-dir_back:])
    test_image = data_dir / relative_path
    # payload = bytearray(test_image.open('rb').read())
    img = cv2.imread(str(test_image))
    # Crop side: 224, or the shorter image side if that is smaller
    short_size = min(img.shape[:2] + (224,))
    preprocessed = A.Compose([
        # A.augmentations.geometric.resize.SmallestMaxSize(max_size=short_size),
        A.augmentations.crops.transforms.CenterCrop(short_size, short_size),
    ])(image=img)['image']
    payload = cv2.imencode('.jpg', preprocessed)[1].tobytes()

    # The endpoint returns one probability per class
    result = json.loads(predictor.predict(payload))
    make_model = id2label[np.argmax(result)]
    confidence_pred = np.max(result)
    label = line['class-metadata']['class-name']
    confidence_label = result[label2id[label]]
    labels.append(label)
    preds.append(make_model)

# Top-1 accuracy (in %) over all validation images
acc = np.mean([l == p for l, p in zip(labels, preds)]) * 100
Expected behavior
After running all validation images through the endpoint, the variable acc should report ~88.5, which is the reported validation accuracy in the training job. However, it barely reaches 70%.
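To narrow down where the gap between ~70% and 88.5% comes from, it may help to break the endpoint accuracy down per class rather than looking only at the overall number. The following is a minimal sketch, assuming `labels` and `preds` are parallel lists of class names like the ones built in the loop above. The helper names `top1_accuracy` and `per_class_accuracy` and the toy class names are hypothetical and not part of the SageMaker SDK.

```python
from collections import defaultdict


def top1_accuracy(labels, preds):
    """Return top-1 accuracy in percent for two parallel lists of class names."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(l == p for l, p in zip(labels, preds))
    return 100.0 * correct / len(labels)


def per_class_accuracy(labels, preds):
    """Return {class_name: accuracy in percent}, keyed by ground-truth class."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for l, p in zip(labels, preds):
        totals[l] += 1
        hits[l] += int(l == p)
    return {c: 100.0 * hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Toy data with hypothetical class names, only to show the call pattern
    toy_labels = ["a", "a", "b", "b"]
    toy_preds = ["a", "b", "b", "b"]
    print(top1_accuracy(toy_labels, toy_preds))
    print(per_class_accuracy(toy_labels, toy_preds))
```

If a few classes account for most of the drop, the mismatch is more likely in the label mapping (`id2label` / `label2id`) than in the image preprocessing. A uniform drop across classes would point toward preprocessing instead.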
I've checked the related issue #698, but it is still unclear what protocol should be followed to replicate these results. In my case, I am using augmentation_type but not resize, which means the job is randomly cropping the training images to 224x224 to fit the model. How is validation run? I've tried taking the central 224x224 crop, without success. Resizing the image so that its shorter side is 224px and then taking the central crop does not work either. Simply sending the entire image and letting the endpoint resize it to 224x224 (losing the aspect ratio) works even worse.
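To make the two crop variants described above unambiguous, their geometry can be sketched in plain Python. This is only an illustration of what was tried on the client side, not a statement of what the training job does during validation, which is exactly the open question. The function names are hypothetical, and the rounding choices are assumptions that may differ by a pixel from what OpenCV or Albumentations compute.

```python
def center_crop_box(width, height, side=224):
    """Box (left, top, right, bottom) of a centered square crop.

    The crop side is capped at the shorter image side, mirroring
    min(img.shape[:2] + (224,)) in the client code.
    """
    side = min(side, width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_then_center_crop(width, height, target=224):
    """Geometry for 'resize shorter side to target, then center crop'.

    Returns ((new_width, new_height), crop_box), where crop_box is expressed
    in the coordinates of the resized image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / min(width, height)
    # max() guards against floating-point rounding below the target side
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    return (new_w, new_h), center_crop_box(new_w, new_h, target)


if __name__ == "__main__":
    # A 640x480 image: the plain crop keeps only the central 224x224 pixels,
    # while the resize variant keeps the full height and most of the width.
    print(center_crop_box(640, 480))
    print(resize_then_center_crop(640, 480))
```

The plain center crop discards most of a large image, while the resize variant keeps nearly the whole scene at a lower resolution, so the two send quite different inputs to the endpoint.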
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.99.0
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): SageMaker built-in Image Classification
- Framework version: 685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:1
- CPU or GPU: GPU for training, CPU for inference
- Custom Docker image (Y/N): N