Description
Hey there,
Using the latest version of the SDK and trying to use the 'local' instance type. After setting up all of the necessary Docker stuff (non-obvious to do; I'd recommend maybe including that in the sagemaker repo itself rather than as a setup script in the examples repo, but I digress), I found that the container knew which folders to look in for the data, but there was no data in them.
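For context, this is roughly the kind of setup that triggers the problem; a minimal sketch assuming the v1-era Estimator API, with the image name, role ARN, and bucket as made-up placeholders:

    from sagemaker.estimator import Estimator

    # Placeholders: substitute your own training image, IAM role, and S3 data location.
    estimator = Estimator(
        image_name='my-training-image',                          # hypothetical image
        role='arn:aws:iam::123456789012:role/MySageMakerRole',   # hypothetical role
        train_instance_count=1,
        train_instance_type='local',                             # run training in a local Docker container
    )

    # The 'train' channel points at a bucket that is NOT the session's default bucket,
    # which is exactly the case where local mode looks for the data in the wrong place.
    estimator.fit({'train': 's3://my-data-bucket/path/to/train/'})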
Long story short, I confirmed that the issue is here, where it uses bucket_name, which comes from this line and is just the default bucket. Instead it should use the bucket specified by the S3Input URI, e.g.:
    for channel in input_data_config:
        uri = channel['DataSource']['S3DataSource']['S3Uri']
        print("Downloading URI {0}".format(uri))
        parsed_uri = urlparse(uri)
        # Take the bucket from the channel's own S3 URI rather than the session's default bucket.
        channel_bucket_name = parsed_uri.netloc
        key = parsed_uri.path.lstrip('/')
        channel_name = channel['ChannelName']
        channel_dir = os.path.join(data_dir, channel_name)
        os.mkdir(channel_dir)
        if uri.lower().startswith("s3://"):
            self._download_folder(channel_bucket_name, key, channel_dir)
        else:
            volumes.append(_Volume(uri, channel=channel_name))
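To make the parsing step concrete, here is a small standalone sketch (the bucket and key below are made up) showing how urlparse splits an S3 URI into the channel's own bucket and key, which is what the fix relies on:

    from urllib.parse import urlparse

    uri = 's3://my-data-bucket/training/data.csv'   # hypothetical channel URI
    parsed = urlparse(uri)

    bucket = parsed.netloc           # 'my-data-bucket' -> the channel's bucket, not the default one
    key = parsed.path.lstrip('/')    # 'training/data.csv'
    print(bucket, key)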
I modified sagemaker/local/image.py to do the above and it works pretty well. Should I submit that as a PR, or would you rather just fix it yourselves (and add all the fun tests you might want to add)?
Thanks!