Describe the bug
Hello, I think I encountered a bug in sagemaker.local. I'm trying to test a batch transform with images as input, but I get the following error before execution even reaches the input_fn of my custom inference script:
```
│   345 │   │   for element in self.splitter.split(file):
│ ❱ 346 │   │   │   if _payload_size_within_limit(buffer + element, size):
│   347 │   │   │   │   buffer += element
│   348 │   │   │   else:
│   349 │   │   │   │   tmp = buffer
╰──────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "bytes") to str
```
I am not using a splitter (the splitter type is None), as one is not necessary for images.
I believe the problem is in line 343 of the MultiRecordStrategy class:
sagemaker-python-sdk/src/sagemaker/local/data.py
Lines 326 to 352 in ae3cc1c
```python
class MultiRecordStrategy(BatchStrategy):
    """Feed multiple records at a time for batch inference.

    Will group up as many records as possible within the payload specified.
    """

    def pad(self, file, size=6):
        """Group together as many records as possible to fit in the specified size.

        Args:
            file (str): file path to read the records from.
            size (int): maximum size in MB that each group of records will be
                fitted to. passing 0 means unlimited size.

        Returns:
            generator of records
        """
        buffer = ""
        for element in self.splitter.split(file):
            if _payload_size_within_limit(buffer + element, size):
                buffer += element
            else:
                tmp = buffer
                buffer = element
                yield tmp
        if _validate_payload_size(buffer, size):
            yield buffer
```
We can see that the buffer variable is assumed to be a string, which in turn assumes that the file variable never refers to a binary object, even though binary input should be possible.
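The failure can be reduced to a two-line standalone example: the buffer starts out as the str `""`, while records read from a binary file (e.g. a PNG) are bytes, and Python refuses to concatenate the two.

```python
# Minimal standalone illustration of the failure mode.
buffer = ""           # MultiRecordStrategy.pad initializes the buffer as str
element = b"\x89PNG"  # a binary record, as produced when the input is an image

try:
    buffer + element  # str + bytes
except TypeError as err:
    print(err)  # can only concatenate str (not "bytes") to str
```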
To reproduce
Run a local batch transform with a single image as input. I don't think the model matters: the failure occurs before any prediction, or any interaction between the data and the model, takes place.
Expected behavior
I would expect the buffer handling to be sensitive to whether the file contains text (such as JSON or CSV) or binary data (such as PNG).
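One possible sketch of such a fix, written here as a standalone generator rather than the actual method (the `elements` parameter stands in for `self.splitter.split(file)`, and `within_limit` is a simplified stand-in for the SDK's `_payload_size_within_limit` and `_validate_payload_size` helpers): defer choosing the buffer type until the first record is seen.

```python
def pad_records(elements, size=6):
    """Group records (str or bytes) so each group fits within `size` MB.

    Hypothetical sketch: `elements` stands in for self.splitter.split(file);
    size=0 means unlimited, mirroring the docstring of the original pad().
    """
    def within_limit(payload, size_mb):
        # Simplified stand-in for _payload_size_within_limit
        return size_mb == 0 or len(payload) < size_mb * 1024 * 1024

    buffer = None  # defer choosing str vs. bytes until the first record
    for element in elements:
        if buffer is None:
            # Match the buffer type to the record type (b"" for binary input)
            buffer = b"" if isinstance(element, bytes) else ""
        if within_limit(buffer + element, size):
            buffer += element
        else:
            tmp = buffer
            buffer = element
            yield tmp
    if buffer:  # simplified stand-in for _validate_payload_size
        yield buffer
```

With this shape, `pad_records([b"\x89PNG"])` yields the binary payload unchanged instead of raising, while text records behave as before.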
Screenshots or logs
See above.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.237.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch, with a custom inference script and model
- Framework version: 2.5.1
- Python version: 3.11
- CPU or GPU: Both
- Custom Docker image (Y/N): Y, extending the pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker image