Update information about saving models in the MXNet README #616
Conversation
Force-pushed 769fd30 to 5128ce1
Codecov Report

@@ Coverage Diff @@
## master #616 +/- ##
=======================================
Coverage 92.73% 92.73%
=======================================
Files 71 71
Lines 5437 5437
=======================================
Hits 5042 5042
Misses 395 395

Continue to review the full report at Codecov.
doc/using_mxnet.rst (Outdated)
The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, including the following:

* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
Maybe we should explain that this is just '/opt/ml/model' and point to the BYO document?
I'm going to punt on this and let Eric decide what should point where with these kinds of things
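To make the bullet concrete, here is a minimal sketch of the script-side pattern being discussed. The ``SM_MODEL_DIR`` name comes from the doc excerpt above; the ``/opt/ml/model`` fallback is the default path the reviewer mentions, used here only for illustration:

```python
import os


def save_model(params_bytes):
    """Write model artifacts to the directory SageMaker uploads to S3 after training."""
    # SM_MODEL_DIR is set by SageMaker inside the training container;
    # '/opt/ml/model' is the conventional default mentioned in the review above.
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.params")
    with open(path, "wb") as f:
        f.write(params_bytes)
    return path
```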
doc/using_mxnet.rst (Outdated)
* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
  After training, artifacts in this directory are uploaded to S3 for model hosting.
* ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
* ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
I think this is not true; I just tested it. Things stored in '/opt/ml/output' don't get uploaded. The only reference I can find to this directory is that anything you put in '/opt/ml/output/failure' will show up in the DescribeTrainingJob call response.
but our integ tests checking for a success file in the output dir do pass?
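For reference, the script-side pattern the bullet describes looks like this; whether files written here actually get uploaded is exactly what is being debated above. The ``SM_OUTPUT_DATA_DIR`` name comes from the doc excerpt; the fallback path is illustrative only:

```python
import json
import os


def write_output_artifact(metrics):
    """Write auxiliary (non-model) artifacts, e.g. metrics, to the output directory."""
    # SM_OUTPUT_DATA_DIR is set by SageMaker in the training container;
    # the fallback here is an illustrative assumption, not a documented guarantee.
    out_dir = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "metrics.json")
    with open(path, "w") as f:
        json.dump(metrics, f)
    return path
```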
When your training script runs on SageMaker, it has access to some pre-installed third-party libraries, including ``mxnet``, ``numpy``, ``onnx``, and ``keras-mxnet``. For more information on the runtime environment, including specific package versions, see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.

If there are other packages you want to use with your script, you can include a `requirements.txt <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`__ file in the same directory as your training script to install other dependencies at runtime.
Does the requirements.txt thing still work?
Yes. It's disabled only for TF script mode.
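As an illustration of the mechanism being discussed, a ``requirements.txt`` placed in the same directory as ``train.py`` might look like the following; the package names and pins are hypothetical, not taken from the PR:

```text
# requirements.txt -- installed in the training container at runtime
# (hypothetical example dependencies)
scikit-learn==0.20.0
Pillow
```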
doc/using_mxnet.rst (Outdated)
    mxnet_estimator = MXNet('train.py',
                            train_instance_type='ml.p2.xlarge',
                            train_instance_count=1,
                            framework_version='1.2.1')
I would add the hyperparameters to the example training script's argparse section here.
done
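For context, the argparse pattern the reviewer is asking for might look like this sketch; the hyperparameter names and defaults are illustrative, and the ``SM_MODEL_DIR`` fallback is an assumption as above:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths in a training script."""
    # Hyperparameters passed to the estimator arrive as command-line arguments;
    # environment variables supply the SageMaker container paths.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument(
        "--model-dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)
```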
  SageMaker. For convenience, accepts other types besides str, but
  str() will be called on keys and values to convert them before
  training.
- ``py_version`` Python version you want to use for executing your
Add that the options are 'py2' and 'py3'?
yeah, added.
- ``train_volume_size`` Size in GB of the EBS volume to use for storing
  input data during training. Must be large enough to store training
  data if input_mode='File' is used (which is the default).
- ``train_max_run`` Timeout in seconds for training, after which Amazon
This is such a bad name for what it is -_-
doc/using_mxnet.rst (Outdated)
  s3 location to a directory in the Docker container. 'Pipe' - Amazon
  SageMaker streams data directly from s3 to the container via a Unix
  named pipe.
- ``output_path`` s3 location where you want the training result (model
We can have a local file system location for this in local mode. Right?
I think so. This is inaccurate across all of our docs, then; tweaked the wording here.
Force-pushed 803e53c to 83324b5
...and some other tweaks. The weirdness around how I formatted the ToC title is because GH doesn't seem to render the ToC directive in a nice way (i.e. the title is just normal paragraph text).
Update: things got a little messy as I was trying to merge in master. So the Sphinx file should be basically the same as the README. The reason it looks like the entire file was changed is because the Sphinx file had Windows-style line endings (CRLF).
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.