Running the example training on YARN fails #179

Closed
luoli523 opened this issue Sep 30, 2016 · 1 comment

Comments

@luoli523

luoli523 commented Sep 30, 2016

Hi all,
I am trying to launch the example train_mnist.py on a 50-node YARN cluster with the following launch command:

${mxnethome}/tools/launch.py -n 2 \
    --launcher yarn \
    python ${mxnethome}/example/image-classification/train_mnist.py \
        --network lenet --kv-store dist_async

It keeps failing, and the YARN container log shows the following:

File "./train_mnist.py", line 1, in <module>
    import find_mxnet
ImportError: No module named find_mxnet
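One way to confirm what the container actually received is to check from inside it whether the helper module is importable at all. Python resolves `import find_mxnet` by searching `sys.path`, whose first entry is the directory of the running script, so if only `train_mnist.py` was shipped to the container, the sibling `find_mxnet.py` will not be found. The script below is a hypothetical standalone diagnostic, not part of MXNet or tools/launch.py, and the module names it checks are assumptions based on the traceback above:

```python
import importlib.util
import os
import sys


def report_missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical check to run inside a YARN container in place of train_mnist.py.
    print("cwd:", os.getcwd())
    print("files in cwd:", sorted(os.listdir(".")))
    # sys.path[0] is the directory containing the script being run.
    print("sys.path[0]:", sys.path[0] if sys.path else None)
    # Assumed module names: the helper from the traceback and the mxnet package itself.
    print("missing:", report_missing_modules(["find_mxnet", "mxnet"]))
```

If `find_mxnet` shows up as missing while `train_mnist.py` is present, the helper files from the example directory are not being shipped with the job.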

Also, there seems to be no way to add a -file option through tools/launch.py. Is that right?

Has anyone run the distributed training example on YARN successfully?

@avspavan

I have the same issue. Can anyone help?

@tqchen tqchen closed this as completed Nov 29, 2017