Add [GP]GPU support #1

Closed
windreamer opened this issue Apr 21, 2016 · 15 comments

windreamer (Contributor) commented Apr 21, 2016

cf: tensorflow/tensorflow#1996 (comment)

@bhack :

I think that the docker image could be based on
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/README.md,
especially if we want to have GPU support:
http://www.nvidia.com/object/apache-mesos.html and https://mesosphere.com/blog/2015/11/10/mesos-nvidia-gpus/
See also NVIDIA/nvidia-docker#60.

Also, tfmesos needs to allocate and isolate [GP]GPU resources.
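
As a rough illustration of the allocation half of that, a GPU request in a Mesos task could look like the sketch below (dict/JSON form of the Mesos protobufs). It assumes the slaves advertise a "gpus" scalar resource, either via native GPU support or a custom `--resources="gpus:2"` entry; whether tfmesos should build its TaskInfo exactly this way is an open question, not current code.

```python
# Sketch only: a "gpus" scalar resource alongside the usual cpus/mem entries.
# Assumes the slave advertises "gpus" (native support or --resources="gpus:2").
task_resources = [
    {"name": "cpus", "type": "SCALAR", "scalar": {"value": 1.0}},
    {"name": "mem",  "type": "SCALAR", "scalar": {"value": 4096.0}},
    {"name": "gpus", "type": "SCALAR", "scalar": {"value": 1.0}},  # reserve one GPU for this task
]
```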

bhack commented Apr 21, 2016

See also the "nvidia" options at https://github.com/apache/mesos/blob/master/docs/configuration.md

bhack commented Apr 27, 2016

We need to think about how we want to handle TF's automatic op device placement. It can be overridden, but we need to find good defaults for the data-parallel and model-parallel cases, because users will frequently have only a few GPU resources in the cluster and many CPUs.
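
For reference, overriding the automatic placement by hand looks roughly like this minimal TF sketch; the job/task/device names are placeholders, not tfmesos defaults:

```python
import tensorflow as tf

# Keep the model parameters on a CPU-only ps task and put the compute-heavy
# ops on a GPU worker, instead of relying on TF's automatic placement.
with tf.device("/job:ps/task:0/cpu:0"):
    w = tf.Variable(tf.zeros([784, 10]), name="weights")
    b = tf.Variable(tf.zeros([10]), name="bias")

with tf.device("/job:worker/task:0/gpu:0"):
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, w) + b
```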

windreamer (Contributor, Author) commented Apr 27, 2016

Yeah, automatic device placement is a bit annoying in TF. TF offers a device function, tf.train.replica_device_setter, which places variables on the ps devices in a round-robin manner, but with GPU resources in the picture that is still not ideal.

No idea how the TF team is going to solve this problem.
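
For context, typical replica_device_setter usage looks something like the sketch below; the cluster spec values are made up:

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222"],
})

# replica_device_setter returns a device function: variables are assigned to
# the ps tasks in round-robin order, everything else stays on the worker.
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    w = tf.Variable(tf.zeros([784, 10]))  # placed on /job:ps/task:0
    b = tf.Variable(tf.zeros([10]))       # placed on /job:ps/task:1
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, w) + b          # placed on /job:worker/task:0
```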

bhack commented Apr 27, 2016

Can you post a comment about this on the original TF ticket, so we can get a reply from mrry of the TF team?

windreamer (Contributor, Author) commented Apr 27, 2016

OK,

this is definitely a huge pain considering my poor English :(
Anyway, the comment is on its way :)

bhack commented Apr 27, 2016

Don't worry, it seems good to me. It is just a technical discussion 😄

bhack commented Apr 27, 2016

For GPU docker images we would preferably need to run the nvidia-docker command instead of docker on the mesos slaves. How can this be handled?

windreamer (Contributor, Author) commented Apr 27, 2016

I am still thinking about how to implement GPU support, and we do not have a GPU cluster to test on right now. Maybe I can submit a PR based on my guesses, and you can do a PR-to-PR or submit a new working PR for this?

bhack commented Apr 27, 2016

Yes, that could be useful. We have a slave node with GPU resources, so we can test and continue the discussion. /cc @lenlen @mtamburrano

windreamer mentioned this issue Apr 27, 2016

vitan commented Jun 13, 2016

@bhack @windreamer, guys, I am using a quick-win solution that bypasses the nvidia-docker command. What nvidia-docker actually does is create a docker volume and then map it into the CUDA container, so I tell mesos/docker to map the wanted volume directly.

BTW, I have 5 GPU servers for testing. I'd like to share something with you guys.
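
A hedged sketch of what that direct mapping can look like in a Mesos TaskInfo's ContainerInfo (dict form of the protobufs): the volume name is whatever nvidia-docker-plugin created on the host (check `docker volume ls`), the image is just an example, and the device list depends on which GPUs the task was granted.

```python
# Sketch only: bypass nvidia-docker by mounting the driver volume and exposing
# the NVIDIA character devices directly through the Mesos docker containerizer.
container = {
    "type": "DOCKER",
    "docker": {
        "image": "tensorflow/tensorflow:latest-gpu",     # example image
        "parameters": [                                   # extra docker run flags
            {"key": "device", "value": "/dev/nvidiactl"},
            {"key": "device", "value": "/dev/nvidia-uvm"},
            {"key": "device", "value": "/dev/nvidia0"},   # one entry per granted GPU
        ],
    },
    "volumes": [
        {
            "host_path": "nvidia_driver_361.42",          # docker volume created by nvidia-docker-plugin
            "container_path": "/usr/local/nvidia",
            "mode": "RO",
        }
    ],
}
```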

bhack commented Jun 13, 2016

@vitan Thank you for the feedback. Can you try the latest version, with nvidia-docker handling the assigned GPU resources, with multiple tasks? I hope that @windreamer can contribute this upstream to TF soon to attract more users, but we need the help of people with multiple GPUs, like you, to test some use cases.

vitan commented Jun 14, 2016

@bhack I need more input from you, since I don't have any more info about your setup. By "the latest version" do you mean the latest TF? And I would appreciate it if anyone could give me a sample with multiple tasks.

bhack commented Jun 14, 2016

@lenlen Do you have a protocol for an experiment to run on 5 GPUs?

bhack commented Jun 14, 2016

@vitan By "latest version" I mean the PR at #3.
