
A better implementation of device placement? #67

Closed
mingyr opened this issue Oct 15, 2017 · 2 comments


mingyr commented Oct 15, 2017

It is known that Sonnet modules do not honour a simple device placement directive; this was discussed in issue #61, where @kosklain offered a good solution. However, there is a small problem in practice. Consider the demonstration code below:

import tensorflow as tf
import sonnet as snt

class RCNNOutput(snt.AbstractModule):
    def __init__(self, out_size, name="rcnn_output"):
        super(RCNNOutput, self).__init__(name=name)
        with self._enter_variable_scope():
            bf = snt.BatchFlatten(name="bf")
            fc0 = snt.Linear(output_size=256, name="fc0")
            fc1 = snt.Linear(output_size=out_size, name="fc1")
            self._seq = snt.Sequential([bf, fc0, tf.nn.relu, fc1], name="seq")

    def _build(self, inputs):
        return self._seq(inputs)

def test():
    import numpy as np
    from tensorflow.core.framework import node_def_pb2

    num_gpus = 2

    def get_device_setter(gpu_id):
        # Place variable ops on the CPU and all other ops on the given GPU.
        def device_setter(op):
            _variable_ops = ["Variable", "VariableV2", "VarHandleOp"]
            node_def = op if isinstance(op, node_def_pb2.NodeDef) else op.node_def
            return '/cpu:0' if node_def.op in _variable_ops else '/gpu:%d' % gpu_id
        return device_setter

    # Construct the module under a device setter (gpu_id 0 is a placeholder).
    with tf.device(get_device_setter(0)):
        rcnn_output = RCNNOutput(4)

    # Prepare the inputs on the CPU and split them across the GPUs.
    with tf.device('/cpu:0'):
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)

    # Connect the shared module once per GPU.
    total_outputs = []
    for i in range(num_gpus):
        with tf.device(get_device_setter(i)):
            output = rcnn_output(ts[i])
            total_outputs.append(output)

    total_outputs = tf.add_n(total_outputs)

    writer = tf.summary.FileWriter("rcnn_output_output", tf.get_default_graph())

    config = tf.ConfigProto(log_device_placement=True)

    with tf.Session(config=config) as sess:
        sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])

        v = sess.run(total_outputs)
        print(v)

    writer.close()

if __name__ == "__main__":
    test()

The first concern is the following snippet from the code above:

    with tf.device(get_device_setter(0)):
        rcnn_output = RCNNOutput(4)
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)

Here 0 serves purely as a placeholder, since the intention is to place the model parameters on the CPU so that they can be shared across multiple GPUs. Even if I gave get_device_setter a signature like get_device_setter(gpu_id=0), someone could still object: if the sole intention is to construct everything on the CPU, why is a GPU ID involved at all?
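
To make the concern concrete, a construction-time setter with no GPU ID at all might look like the sketch below (reusing node_def_pb2 from the example above; variables_on_cpu is just an illustrative name, not something Sonnet or TF provides, and returning an empty string leaves non-variable ops unconstrained):

    def variables_on_cpu(op):
        _variable_ops = ["Variable", "VariableV2", "VarHandleOp"]
        node_def = op if isinstance(op, node_def_pb2.NodeDef) else op.node_def
        # Pin variables to the CPU; leave every other op unconstrained.
        return '/cpu:0' if node_def.op in _variable_ops else ''

    with tf.device(variables_on_cpu):
        rcnn_output = RCNNOutput(4)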

The criticism tends to become a little stronger when the code snippet below is considered as well:

    with tf.device('/cpu:0'):
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)

The intention here is to prepare all the inputs on the CPU. So people will ask: why mix directives, and why can everything not be placed under one directive? Simply answering that Sonnet does not support the simple device placement directive, so it cannot be done in a uniform way, is probably not a good enough answer to ease every sceptic.

So I wonder whether it could be done a little more nicely than the approach above, namely with a single, more obvious directive giving neat placement control over both input preparation and model construction.
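
Roughly, what I would ideally like to write is something like the following (just the shape of the API I am wishing for, not code that behaves this way today):

    # One directive covering both parameter placement and input preparation,
    # with no placeholder GPU ID involved.
    with tf.device('/cpu:0'):
        rcnn_output = RCNNOutput(4)
        t = tf.constant(np.ones([8, 32, 32, 3]), np.float32)
        ts = tf.split(t, num_gpus)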

Many thanks.

@malcolmreynolds
Collaborator

@mingyr Hi, thanks for your detailed comment.

I've looked into this before, and unfortunately there are no publicly exposed TF APIs which would give us more control over this. I would like to be able to capture the surrounding device context when constructing a module, as your middle example shows. However, unlike the various kinds of scopes, tf.device doesn't return anything you can hang onto and reenter, e.g.:

with tf.variable_scope('some_name') as scope:
  print(scope)  # Prints a VariableScope object we can query, reenter, etc

with tf.device('/cpu:0') as device_information:
  print(device_information)  # prints `None`

Without being able to capture the context, we cannot respect device placement directives around where modules are constructed. If you read the TF source you'll see that this works by accessing various private members of the Graph object (among others), and it would be a huge maintenance burden for us to start relying on those private APIs.
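
For contrast, the "capture and reenter" pattern that does exist for variable scopes looks roughly like this (a minimal sketch; the variable name and shape are arbitrary):

with tf.variable_scope('some_name') as scope:
  pass

with tf.variable_scope(scope):  # Reenter later using the captured object.
  w = tf.get_variable('w', shape=[2, 2])  # Created as 'some_name/w'

There is simply no analogous object we could stash away and reenter for tf.device.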

We've had some conversations with the TF team about exposing more advanced control over device placement externally (and other things), and those discussions are ongoing. I hope to have some improvement in this area in the public version soon.

It's very useful to hear from users about the pain points of the library, so once again, thank you.


mingyr commented Oct 20, 2017

@malcolmreynolds I sincerely appreciate the detailed explanation and fully understand the situation. The important thing is not that we invent something perfect, but that we invent something we can use to advance our daily work. I am glad that Sonnet is such a library; it has helped me a great deal. Thanks for the contribution you engineers at DeepMind have made, which benefits the deep learning community a lot.
