Update information about saving models in the MXNet README #616
Conversation
Force-pushed 769fd30 to 5128ce1
Codecov Report

@@ Coverage Diff @@
## master #616 +/- ##
=======================================
Coverage 92.73% 92.73%
=======================================
Files 71 71
Lines 5437 5437
=======================================
Hits 5042 5042
Misses 395 395

Continue to review the full report at Codecov.
doc/using_mxnet.rst (Outdated)
The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, including the following:

* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
Maybe we should explain that this is just '/opt/ml/model' and point to the BYO document?
I'm going to punt on this and let Eric decide what should point where with these kinds of things
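To make the bullet concrete, here is a minimal sketch of the script-side pattern being discussed. The ``SM_MODEL_DIR`` name comes from the doc excerpt above; the ``/opt/ml/model`` fallback is the default path the reviewer mentions, used here only for illustration:

```python
import os


def save_model(params_bytes):
    """Write model artifacts to the directory SageMaker uploads to S3 after training."""
    # SM_MODEL_DIR is set by SageMaker inside the training container;
    # '/opt/ml/model' is the conventional default mentioned in the review above.
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.params")
    with open(path, "wb") as f:
        f.write(params_bytes)
    return path
```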
doc/using_mxnet.rst (Outdated)
* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
  After training, artifacts in this directory are uploaded to S3 for model hosting.
* ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
* ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
I think this is not true; I just tested it. Things stored in '/opt/ml/output' don't get uploaded. The only reference I can find to this directory is that anything you put in '/opt/ml/output/failure' will show up in the DescribeTrainingJob call response.
but our integ tests checking for a success file in the output dir do pass?
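For reference, the script-side pattern the bullet describes looks like this; whether files written here actually get uploaded is exactly what is being debated above. The ``SM_OUTPUT_DATA_DIR`` name comes from the doc excerpt; the fallback path is illustrative only:

```python
import json
import os


def write_output_artifact(metrics):
    """Write auxiliary (non-model) artifacts, e.g. metrics, to the output directory."""
    # SM_OUTPUT_DATA_DIR is set by SageMaker in the training container;
    # the fallback here is an illustrative assumption, not a documented guarantee.
    out_dir = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "metrics.json")
    with open(path, "w") as f:
        json.dump(metrics, f)
    return path
```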
When your training script runs on SageMaker, it has access to some pre-installed third-party libraries, including ``mxnet``, ``numpy``, ``onnx``, and ``keras-mxnet``. For more information on the runtime environment, including specific package versions, see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.

If there are other packages you want to use with your script, you can include a `requirements.txt <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`__ file in the same directory as your training script to install other dependencies at runtime.
Does the requirements.txt thing still work?
Yes. It's disabled only for TF script mode.
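As an illustration of the mechanism being discussed, a ``requirements.txt`` placed in the same directory as ``train.py`` might look like the following; the package names and pins are hypothetical, not taken from the PR:

```text
# requirements.txt -- installed in the training container at runtime
# (hypothetical example dependencies)
scikit-learn==0.20.0
Pillow
```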
doc/using_mxnet.rst (Outdated)
    mxnet_estimator = MXNet('train.py',
                            train_instance_type='ml.p2.xlarge',
                            train_instance_count=1,
                            framework_version='1.2.1')
I would add the hyperparameters to the example training script's argparse section here.
done
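For context, the argparse pattern the reviewer is asking for might look like this sketch; the hyperparameter names and defaults are illustrative, and the ``SM_MODEL_DIR`` fallback is an assumption as above:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths in a training script."""
    # Hyperparameters passed to the estimator arrive as command-line arguments;
    # environment variables supply the SageMaker container paths.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument(
        "--model-dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)
```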
  SageMaker. For convenience, accepts other types besides str, but
  str() will be called on keys and values to convert them before
  training.
- ``py_version`` Python version you want to use for executing your
Add that the options are 'py2' and 'py3'?
yeah, added.
- ``train_volume_size`` Size in GB of the EBS volume to use for storing
  input data during training. Must be large enough to store training
  data if input_mode='File' is used (which is the default).
- ``train_max_run`` Timeout in seconds for training, after which Amazon
This is such a bad name for what it is -_-
doc/using_mxnet.rst (Outdated)
  s3 location to a directory in the Docker container. 'Pipe' - Amazon
  SageMaker streams data directly from s3 to the container via a Unix
  named pipe.
- ``output_path`` s3 location where you want the training result (model
We can have a local file system location for this in local mode. Right?
I think so. This is inaccurate across all of our docs, then; tweaked the wording here.
Force-pushed 803e53c to 83324b5
...and some other tweaks. The weirdness around how I formatted the ToC title is because GH doesn't seem to render the ToC directive in a nice way (i.e. the title is just normal paragraph text).
Update: things got a little messy as I was trying to merge in master. So the Sphinx file should be basically the same as the README. The reason it looks like the entire file was changed is because the Sphinx file had Windows-style line endings (CRLF).
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.