
Conversation

@laurenyu (Contributor) commented Jan 30, 2019

...and some other tweaks. The weirdness around how I formatted the ToC title is because GH doesn't seem to render the ToC directive in a nice way (i.e. the title is just normal paragraph text).

Update: things got a little messy as I was trying to merge in master. So the Sphinx file should be basically the same as the README. The reason it looks like the entire file was changed is because the Sphinx file had Windows-style line endings (CRLF).

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@laurenyu requested a review from @icywang86rui January 30, 2019 23:55
@codecov-io commented Jan 31, 2019

Codecov Report

Merging #616 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #616   +/-   ##
=======================================
  Coverage   92.73%   92.73%           
=======================================
  Files          71       71           
  Lines        5437     5437           
=======================================
  Hits         5042     5042           
  Misses        395      395

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d37581a...20b1658.


The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, including the following:

* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
Contributor:
Maybe we should explain that this is just '/opt/ml/model' and point to the BYO document?

laurenyu (Author):
I'm going to punt on this and let Eric decide what should point where with these kinds of things.

* ``SM_MODEL_DIR``: A string that represents the path where the training job should write the model artifacts to.
After training, artifacts in this directory are uploaded to S3 for model hosting.
* ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
* ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
Contributor:
I think this is not true. I just tested it: things stored in '/opt/ml/output' don't get uploaded. The only reference I can find to this directory is that anything you put in '/opt/ml/output/failure' will show up in the DescribeTrainingJob call response.
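For reference, a training script typically picks up the ``SM_*`` variables listed above through argparse defaults. A minimal sketch (the fallback values here are illustrative assumptions, so the same script can also run outside SageMaker):

```python
import argparse
import os


def parse_args(argv=None):
    # SageMaker sets SM_* environment variables inside the training
    # container; the environ.get fallbacks are assumed defaults so the
    # script still runs when those variables are absent.
    parser = argparse.ArgumentParser()
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--num-gpus', type=int,
                        default=int(os.environ.get('SM_NUM_GPUS', '0')))
    return parser.parse_args(argv)


args = parse_args([])
```

Writing model artifacts to ``args.model_dir`` is what gets them uploaded to S3 after training completes.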

When running your training script on SageMaker, it will have access to some pre-installed third-party libraries including ``mxnet``, ``numpy``, ``onnx``, and ``keras-mxnet``. For more information on the runtime environment, including specific package versions, see `SageMaker MXNet Containers <#sagemaker-mxnet-containers>`__.

If there are other packages you want to use with your script, you can include a `requirements.txt <https://pip.pypa.io/en/stable/user_guide/#requirements-files>`__ file in the same directory as your training script to install other dependencies at runtime.
Contributor:
Does the requirements.txt thing still work?

laurenyu (Author):
Yes. It's disabled only for TF script mode.
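For illustration, such a ``requirements.txt`` is an ordinary pip requirements file placed next to the entry point; the package names and version pins below are purely illustrative:

```
numpy>=1.14
onnx==1.3.0
```

Per the paragraph above, these dependencies are installed in the container at runtime, before the training script executes.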

mxnet_estimator = MXNet('train.py',
                        train_instance_type='ml.p2.xlarge',
                        train_instance_count=1,
                        framework_version='1.2.1')
Contributor:
I would add the hyperparameters to the argparse part of the example training script here.

laurenyu (Author):
done
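To illustrate the suggestion above: hyperparameters given to the estimator reach the entry point as command-line flags, which the script's argparse section then parses. A minimal sketch of that round trip (the flag names are illustrative, and the dict-to-argv helper here merely mimics the conversion SageMaker performs):

```python
import argparse


def hyperparameters_to_argv(hyperparameters):
    # Mimics how a hyperparameters dict such as {'epochs': 10} reaches
    # the training script: as command-line flags like "--epochs 10".
    # str() is applied to values, matching the estimator's documented
    # behavior of converting non-str hyperparameters.
    argv = []
    for name, value in hyperparameters.items():
        argv += ['--%s' % name, str(value)]
    return argv


# The argparse section of a training script declaring the same names:
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--learning-rate', type=float, default=0.01)
args = parser.parse_args(
    hyperparameters_to_argv({'epochs': 10, 'learning-rate': 0.1}))
```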

SageMaker. For convenience, accepts other types besides str, but
str() will be called on keys and values to convert them before
training.
- ``py_version`` Python version you want to use for executing your
Contributor:
Add that the options are 'py2' and 'py3'?

laurenyu (Author):
yeah, added.

- ``train_volume_size`` Size in GB of the EBS volume to use for storing
input data during training. Must be large enough to store training
data if input_mode='File' is used (which is the default).
- ``train_max_run`` Timeout in seconds for training, after which Amazon
Contributor:
This is such a bad name for what it is -_-

s3 location to a directory in the Docker container. 'Pipe' - Amazon
SageMaker streams data directly from s3 to the container via a Unix
named pipe.
- ``output_path`` s3 location where you want the training result (model
Contributor:
We can have a local file system location for this in local mode. Right?

laurenyu (Author):
I think so. This is inaccurate across all of our docs, then. Tweaked the wording here.

@laurenyu merged commit e1b459f into aws:master Feb 1, 2019
@laurenyu deleted the mxnet-readme branch February 1, 2019 22:57
