Skip to content

Script mode project structure inflexibility. #556

@ryanpeach

Description

@ryanpeach

Please fill out the form below.

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): Tensorflow
  • Framework Version: 1.10
  • Python Version: 3.6
  • CPU or GPU: CPU
  • Python SDK Version: latest
  • Are you using a custom image: No

Describe the problem

I have found that a file structure like the following works best for the Tensorflow SDK.

- root
- - - my_project
- - - - - my_package
- - - - - - - code.py
- - - - - main.py
- - - - - setup.py
- - - run_training.py
- - - etc.

Works best for training, where your run_training.py code looks something like this:

from sagemaker.tensorflow import TensorFlow

train_instance_type = 'ml.p2.xlarge' 

from sagemaker import get_execution_role
role = get_execution_role()

tf_estimator = TensorFlow(entry_point='main.py', role=role,
                          train_instance_count=1, train_instance_type=train_instance_type,
                          framework_version='1.11', py_version='py3',
                          source_dir='my_project')

tf_estimator.fit('s3://location...', wait=True)

However, most of my projects out in the wild would have the following structure:

- my_project
- - - data or docs or config etc...
- - - sagemaker
- - - - - run_training.py
- - - my_package
- - - - - code.py
- - - main.py
- - - setup.py

Is there a way the sdk could allow for dependencies and source_dir with some arguments like "include" or "exclude"? I'd like to make the '../' directory the source directory and then include only python files (or certain folders).

One solution to this also could be to allow dependencies to be a map like so:

dependencies = {'my_package': '../my_package', 'main.py': '../main.py'}

Mapping relative to root container directories to local directories. Then container root becomes your source dir. This would be the most flexible of all IMO.

Lastly, as an aside, requirements.txt does not seem to work with the new tensorflow script mode. Is there a reason for this? Will it be fixed soon?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions