Add [GP]GPU support #1
See also the "nvidia" options at https://github.com/apache/mesos/blob/master/docs/configuration.md
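As a starting point, the Mesos agent flags referenced above could look roughly like this (a sketch, not a verified config: the master address is a placeholder, and the exact flag set should be checked against the Mesos docs for your version). The script only assembles and prints the command so the pieces are visible:

```shell
#!/bin/sh
# Illustrative only: assembles a mesos-agent invocation with Nvidia GPU
# isolation enabled, based on the flags listed in the Mesos configuration
# docs. "master.example.com" and the device indices are assumptions.
MESOS_GPU_CMD="mesos-agent \
  --master=zk://master.example.com:2181/mesos \
  --isolation=cgroups/devices,gpu/nvidia \
  --nvidia_gpu_devices=0,1 \
  --resources=gpus:2"

# Print the command instead of executing it, so it can be inspected.
echo "$MESOS_GPU_CMD"
```

The key pieces are the `gpu/nvidia` isolator and advertising `gpus` as a resource so frameworks like tfmesos can see and request them.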
We need to think about how we want to handle automatic op device placement in TF. This could be overridden, but we need to find a good default for the data-parallel and model-parallel cases, because users will frequently have few GPU resources and many CPUs in the cluster.
yeah, auto device placement is a bit annoying in TF. TF offers a context for this, but I have no idea how the TF team is going to solve this problem.
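The default-placement question above could be sketched as a tiny policy function. This is purely illustrative and is not TF's real placer; the function name, the op descriptions, and the device strings are all assumptions for the sake of discussion. The idea is a "soft" default: prefer GPUs for ops that want them, round-robin across whatever GPUs the cluster offered, and fall back to CPU when none are available:

```python
def place_ops(ops, gpus):
    """Toy soft-placement policy (illustrative, not TF's implementation).

    ops:  list of (op_name, prefers_gpu) pairs.
    gpus: number of GPUs the scheduler was offered.
    Returns a dict mapping op_name -> device string.
    """
    placement = {}
    next_gpu = 0
    for name, prefers_gpu in ops:
        if prefers_gpu and gpus > 0:
            # Round-robin GPU-hungry ops across the available GPUs.
            placement[name] = "/gpu:%d" % (next_gpu % gpus)
            next_gpu += 1
        else:
            # Soft fallback: everything else (or everything, if the
            # cluster offered no GPUs) lands on the CPU.
            placement[name] = "/cpu:0"
    return placement
```

For example, `place_ops([("matmul", True), ("input_queue", False)], gpus=1)` puts the matmul on `/gpu:0` and the input queue on `/cpu:0`, while the same call with `gpus=0` puts everything on the CPU, which is the "good default" behavior discussed above when GPU resources are scarce.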
Can you post a comment about this on the original TF ticket? That way we can collect a reply from mrry of the TF team.
OK, this is definitely a huge pain considering my poor English :(
Don't worry, it seems good to me. It is just a technical discussion 😄
For GPU Docker images we should preferably run the nvidia-docker command instead of docker on the Mesos slaves. How can this be handled?
I am still thinking about how to implement GPU support, and we do not have a GPU cluster for testing right now. Maybe I can submit a PR based on my best guess, and you can do a PR-to-PR or submit a new working PR for this?
Yes, that could be useful. We have a slave node with GPU resources, so we can test and continue the discussion. /cc @lenlen @mtamburrano
@bhack @windreamer, guys, I am using a quick-win solution that bypasses the nvidia-docker command. What nvidia-docker actually does is create a Docker volume and then map it into the CUDA container, so I tell Mesos/Docker to map the wanted volume directly. BTW, I have 5 GPU servers for testing. I'd like to share something with you guys.
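The bypass described above can be sketched in plain docker flags. This is a rough reconstruction of what nvidia-docker (1.x) does under the hood, so the same effect can be requested directly from Mesos's Docker containerizer; the driver-volume name, driver version, device list, and image tag here are assumptions for illustration and will differ per host. The script only assembles and prints the command:

```shell
#!/bin/sh
# Illustrative sketch: reproduce nvidia-docker's behavior with plain
# docker flags by (1) passing the nvidia device nodes through and
# (2) mounting the driver volume that nvidia-docker would have created.
DRIVER_VOLUME="nvidia_driver_367.57"   # hypothetical driver version

DOCKER_GPU_CMD="docker run \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 \
  --volume=${DRIVER_VOLUME}:/usr/local/nvidia:ro \
  tensorflow/tensorflow:latest-gpu"

# Print the command instead of executing it (no docker needed to inspect).
echo "$DOCKER_GPU_CMD"
```

In the tfmesos case the same device and volume mappings would be attached to the task's Docker ContainerInfo instead of passed on a command line, which is exactly the "map the wanted volume directly" trick described above.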
@vitan Thank you for the feedback. Can you try the latest version, with nvidia-docker handling the assigned GPU resources across multiple tasks? I hope that @windreamer can contribute this upstream to TF soon to attract more users, but we need the help of people with multiple GPUs like you to test some use cases.
@bhack I need more input from you, since I have no more info from you guys. So "the latest version" means the latest TF, right? And I would appreciate it if anyone could give me a multiple-tasks sample.
@lenlen Do you have a protocol for an experiment to run on the 5 GPUs?
cf: tensorflow/tensorflow#1996 (comment)
@bhack: Also, tfmesos needs to allocate and isolate [GP]GPU resources.