'local' instance types don't properly download channels #145

@zmjjmz

Hey there,

I'm using the latest version of the SDK and trying to use the 'local' instance type. After setting up all of the necessary Docker pieces (non-obvious to do; I'd recommend including that in the sagemaker repo itself rather than as a setup script in the examples repo, but I digress), I found that the container knew which folders to look in for the data, but those folders were empty.

Long story short, I confirmed that the issue is here, where the download uses bucket_name, which comes from this line and is just the default bucket. Instead, it should use the bucket specified by each channel's S3Input URI, e.g.

# Fragment from sagemaker/local/image.py; needs os and urlparse in scope.
for channel in input_data_config:
    uri = channel['DataSource']['S3DataSource']['S3Uri']
    print("Downloading URI {0}".format(uri))
    # Take the bucket and key from the channel's own URI, not the default bucket.
    parsed_uri = urlparse(uri)
    channel_bucket_name = parsed_uri.netloc
    key = parsed_uri.path.lstrip('/')

    channel_name = channel['ChannelName']
    channel_dir = os.path.join(data_dir, channel_name)
    os.mkdir(channel_dir)

    if uri.lower().startswith("s3://"):
        self._download_folder(channel_bucket_name, key, channel_dir)
    else:
        # Anything that is not an S3 URI is mounted as a volume instead.
        volumes.append(_Volume(uri, channel=channel_name))

I modified sagemaker/local/image.py to do the above and it works pretty well. Should I submit that as a PR or would you rather just fix it yourselves (and add all the fun tests you might want to add)?
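
For reference, here is a minimal standalone sketch of just the URI-splitting step, assuming Python 3's urllib.parse. The helper name split_s3_uri and the example bucket are made up for illustration and are not part of the SDK:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI into (bucket, key).

    Hypothetical helper for illustration only; not an SDK function.
    """
    parsed = urlparse(uri)
    # urlparse lowercases the scheme, so "S3://..." is accepted too.
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: {0}".format(uri))
    # netloc is the bucket; the path minus its leading slash is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with a made-up bucket name.
bucket, key = split_s3_uri("s3://my-bucket/data/train")
print(bucket, key)
```

The point is that the bucket comes from each channel's URI, so channels stored outside the default bucket would still resolve correctly.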

Thanks!
