
A better implementation of device placement? #67

Closed
mingyr opened this issue Oct 15, 2017 · 2 comments


mingyr commented Oct 15, 2017

It is known that Sonnet modules do not honour a simple device placement directive; this was discussed in issue #61, where @kosklain offered a good solution. However, there is a small problem in practice. Consider the demonstration code below:

import tensorflow as tf
import sonnet as snt

class RCNNOutput(snt.AbstractModule):
    def __init__(self, out_size, name="rcnn_output"):
        super(RCNNOutput, self).__init__(name=name)
        with self._enter_variable_scope():
            bf = snt.BatchFlatten(name="bf")
            fc0 = snt.Linear(output_size=256, name="fc0")
            fc1 = snt.Linear(output_size=out_size, name="fc1")
            self._seq = snt.Sequential([bf, fc0, tf.nn.relu, fc1], name="seq")

    def _build(self, inputs):
        return self._seq(inputs)

def test():
    import numpy as np
    from tensorflow.core.framework import node_def_pb2

    num_gpus = 2

    def get_device_setter(gpu_id):
        # Place variable ops on the CPU and all other ops on the given GPU.
        def device_setter(op):
            _variable_ops = ["Variable", "VariableV2", "VarHandleOp"]
            node_def = op if isinstance(op, node_def_pb2.NodeDef) else op.node_def
            return '/cpu:0' if node_def.op in _variable_ops else '/gpu:%d' % gpu_id
        return device_setter

    # Construct the module under a device setter (gpu_id 0 is a placeholder).
    with tf.device(get_device_setter(0)):
        rcnn_output = RCNNOutput(4)

    # Prepare the inputs on the CPU and split them across the GPUs.
    with tf.device('/cpu:0'):
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)

    # Connect the shared module once per GPU.
    total_outputs = []
    for i in range(num_gpus):
        with tf.device(get_device_setter(i)):
            output = rcnn_output(ts[i])
            total_outputs.append(output)

    total_outputs = tf.add_n(total_outputs)

    writer = tf.summary.FileWriter("rcnn_output_output", tf.get_default_graph())

    config = tf.ConfigProto(log_device_placement=True)

    with tf.Session(config=config) as sess:
        sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])

        v = sess.run(total_outputs)
        print(v)

    writer.close()

if __name__ == "__main__":
    test()

The first concern is the following snippet from the code above:

    with tf.device(get_device_setter(0)):
        rcnn_output = RCNNOutput(4)
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)

Here 0 serves purely as a placeholder, since the intention is to place the model parameters on the CPU so that they can be shared across multiple GPUs. Even if I gave get_device_setter a signature like get_device_setter(gpu_id=0), someone could still object: if the sole intention is to construct everything on the CPU, why is a GPU ID involved at all?
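
To make the concern concrete, a construction-time setter with no GPU ID at all might look like the sketch below (reusing node_def_pb2 from the example above; variables_on_cpu is just an illustrative name, not something Sonnet or TF provides, and returning an empty string leaves non-variable ops unconstrained):

    def variables_on_cpu(op):
        _variable_ops = ["Variable", "VariableV2", "VarHandleOp"]
        node_def = op if isinstance(op, node_def_pb2.NodeDef) else op.node_def
        # Pin variables to the CPU; leave every other op unconstrained.
        return '/cpu:0' if node_def.op in _variable_ops else ''

    with tf.device(variables_on_cpu):
        rcnn_output = RCNNOutput(4)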

The criticism tends to become a little stronger when the code snippet below is considered as well:

    with tf.device('/cpu:0'):
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)

The intention here is to prepare all the inputs on the CPU. So people will ask: why mix directives, and why can everything not be placed under one directive? Simply answering that Sonnet does not support the simple device placement directive, so it cannot be done in a uniform way, is probably not a good enough answer to ease every sceptic.

So I wonder whether it could be done a little more nicely than the approach above, namely with a single, more obvious directive giving neat placement control over both input preparation and model construction.
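
Roughly, what I would ideally like to write is something like the following (just the shape of the API I am wishing for, not code that behaves this way today):

    # One directive covering both parameter placement and input preparation,
    # with no placeholder GPU ID involved.
    with tf.device('/cpu:0'):
        rcnn_output = RCNNOutput(4)
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)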

Many thanks.

@malcolmreynolds
Collaborator

@mingyr Hi, thanks for your detailed comment.

I've looked into this before, and unfortunately there are no publicly exposed TF APIs which would give us more control over this. I would like to be able to capture the surrounding device context when constructing a module, as your middle example shows. However, unlike the various kinds of scopes, tf.device doesn't return anything you can hang onto and reenter, e.g.:

with tf.variable_scope('some_name') as scope:
  print(scope)  # Prints a VariableScope object we can query, reenter, etc

with tf.device('/cpu:0') as device_information:
  print(device_information)  # prints `None`

Without being able to capture the context, we cannot respect device placement directives around where modules are constructed. If you read the TF source you'll see that this works by accessing various private members of the Graph object (among others), and it would be a huge maintenance burden for us to start relying on those private APIs.
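
For contrast, the "capture and reenter" pattern that does exist for variable scopes looks roughly like this (a minimal sketch; the variable name and shape are arbitrary):

with tf.variable_scope('some_name') as scope:
  pass

with tf.variable_scope(scope):  # Reenter later using the captured object.
  w = tf.get_variable('w', shape=[2, 2])  # Created as 'some_name/w'

There is simply no analogous object we could stash away and reenter for tf.device.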

We've had some conversations with the TF team about exposing more advanced control over device placement externally (and other things), and those discussions are ongoing. I hope to have some improvement in this area in the public version soon.

It's very useful to hear from users about the pain points of the library, so once again, thank you.


mingyr commented Oct 20, 2017

@malcolmreynolds I sincerely appreciate the detailed explanation and fully understand the situation. The important thing is not that we invent something perfect, but that we invent something we can use to advance our daily work. I am glad that Sonnet is such a library; it has helped me a great deal. Thanks for the contribution you engineers at DeepMind have made, which benefits the deep learning community a lot.
