# using gpus

following along with the tensorflow [using gpus guide](https://www.tensorflow.org/programmers_guide/using_gpu)

In [1]:
import tensorflow as tf

import utils


## supported devices

supported devices are `CPU` and `GPU` (note: `TPU` is not supported in the same way; see neighboring notebook)

if both exist and an operation *can* be executed on a gpu, then gpu devices will be given preference.

## logging device placement

logging of device placement is a configuration option that you can set in the `config` of a tensorflow session:

In [2]:
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

In [3]:
# Creates a session with log_device_placement set to True.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))

[[22. 28.]
 [49. 64.]]


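as with a previous logging experience, the log messages are written to the process's `stdout`/`stderr` rather than to the cell output, and are therefore unavailable to us `jupyter notebook` plebes. I ran that code in the terminal and the output was (each placement shows up twice, once as a short line and once as a timestamped log line):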

```
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 12:45:48.273487: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 12:45:48.273501: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 12:45:48.273509: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
```
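
one way around the notebook limitation, sketched here under the assumption that the graph code lives in a standalone script: run it in a child process and capture both streams. the child command below is a stand-in that just writes one line to each stream, and `run_and_capture` is a hypothetical helper, not part of tensorflow.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (stdout, stderr) as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to str
        check=True,               # raise if the child exits with an error
    )
    return result.stdout, result.stderr


# Stand-in for "python my_graph_script.py": writes one line to each stream.
child_code = (
    "import sys\n"
    "print('to stdout')\n"
    "print('to stderr', file=sys.stderr)\n"
)
out, err = run_and_capture([sys.executable, "-c", child_code])
print(out, end="")
print(err, end="")
```

swapping the stand-in for the real script path should surface the placement lines inside the notebook, whichever stream they were written to.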

## manual device placement

sometimes we might want to control the device(s) on which computation happens. we can do this with a `tf.device` context manager:

```python
# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
```

when I run that from the cli, I get the following additional log messages:

```
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN V, pci bus id: 0000:05:00.0, compute capability: 7.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN V, pci bus id: 0000:09:00.0, compute capability: 7.0
2018-07-04 15:09:41.394792: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN V, pci bus id: 0000:05:00.0, compute capability: 7.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN V, pci bus id: 0000:09:00.0, compute capability: 7.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 15:09:41.395242: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-07-04 15:09:41.395257: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-07-04 15:09:41.395264: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0

```

so we see that the constants `a` and `b` are placed on `CPU:0`, as requested, while the `matmul` op, which is created outside the `tf.device` block, still lands on `GPU:0` by default.
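
to check placements like these without reading the log by eye, a small parser can turn the short-form lines into an op-to-device mapping. this is a sketch that assumes the `name: (OpType): /job:.../device:TYPE:N` line shape seen in the output above; `parse_placements` is a hypothetical helper, and the log format may differ between tensorflow versions.

```python
import re

# Matches the short-form lines, e.g.
# "MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0"
PLACEMENT_RE = re.compile(
    r"^(?P<name>[^:\s]+): \((?P<op>\w+)\): \S*/device:(?P<device>\w+:\d+)$"
)


def parse_placements(log_text):
    """Return {op name: device} for every short-form placement line."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT_RE.match(line.strip())
        if match:
            placements[match.group("name")] = match.group("device")
    return placements


# A few lines copied from the log above; the timestamped line is skipped.
log_text = """\
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 15:09:41.395242: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
"""

placements = parse_placements(log_text)
print(placements)
```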

## allowing gpu memory growth

by default, tensorflow maps nearly all of the memory on every visible gpu as soon as a session starts. you can change this with the `config.gpu_options.allow_growth` parameter as follows:

```python
config = tf.ConfigProto()

# True: start with a small allocation and grow it as needed
# False (default): claim as much memory as is available immediately
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    ...
```

alternatively, you could change the overall fraction of the GPU memory a process is allowed to consume with the `config.gpu_options.per_process_gpu_memory_fraction` parameter:

```python
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
```
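
for a sense of scale, the cap is just the fraction times the card's total memory. a rough sketch, assuming a 12 GiB card (the nominal size of the TITAN V that appears in the device mapping above; the amount tensorflow actually sees is usually a bit lower). `memory_cap_bytes` is a hypothetical helper, not a tensorflow function.

```python
GIB = 1024 ** 3


def memory_cap_bytes(total_bytes, fraction):
    """Approximate per-process memory cap for a given fraction."""
    return int(total_bytes * fraction)


total = 12 * GIB  # assumed card size, not read from the driver
cap = memory_cap_bytes(total, 0.4)
print("%.1f GiB" % (cap / GIB))
```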

## using a single gpu on a multi-gpu system

you can pin work to a single gpu by index, again with a `tf.device` context manager. example:

```python
# Creates a graph.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
```

if `gpu:N` doesn't exist, you'll get an error. it shows up when the op is run rather than when the tensor is defined: in graph mode, `tf.device` only records the requested device on each node, and the actual placement happens once the session runs the graph:

In [6]:
# Creates a graph.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    try:
        print(sess.run(c))
    except tf.errors.InvalidArgumentError:
        print('see, there it is!')

see, there it is!


interestingly, there is a second session configuration option, `allow_soft_placement`, that tries the requested device first and falls back to an existing, supported device if the request can't be satisfied:

In [7]:
# Creates a graph.
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Creates a session with soft placement and device placement logging enabled.
config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=True
)
with tf.Session(config=config) as sess:
    print(sess.run(c))

[[22. 28.]
 [49. 64.]]
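
the difference between the last two cells can be pictured with a toy model. this is not tensorflow's placer, just a sketch of the behavior observed above: a hard request for a missing device raises, while a soft request falls back. `resolve_device`, the `available` list, and the fallback policy (take the first device listed) are all assumptions made for illustration.

```python
class DeviceNotFound(Exception):
    """Raised when a hard device request cannot be satisfied."""


def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on in this toy model."""
    if requested in available:
        return requested
    if allow_soft_placement and available:
        # Assumed fallback policy: take the first device in the list.
        return available[0]
    raise DeviceNotFound("no such device: %s" % requested)


# Two GPUs plus the CPU, as on the machine in the logs above.
available = ["/device:GPU:0", "/device:GPU:1", "/device:CPU:0"]

try:
    resolve_device("/device:GPU:2", available)
except DeviceNotFound as err:
    print("hard placement failed:", err)

print(resolve_device("/device:GPU:2", available, allow_soft_placement=True))
```

where the op really ends up under soft placement is best confirmed from the `log_device_placement` output rather than assumed.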


## using multiple gpus

to use multiple gpus, the guide recommends a "multi-tower" paradigm, where each tower is assigned to a different gpu:

```python
c = []
for d in ['/device:GPU:0', '/device:GPU:1']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

with tf.device('/cpu:0'):
    s = tf.add_n(c)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(s))
```

again, running this in a terminal with two gpus exposed, I get:

```
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/device:GPU:1
2018-07-04 15:42:48.835288: I tensorflow/core/common_runtime/placer.cc:886] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/device:GPU:1
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 15:42:48.835305: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
AddN: (AddN): /job:localhost/replica:0/task:0/device:CPU:0
2018-07-04 15:42:48.835314: I tensorflow/core/common_runtime/placer.cc:886] AddN: (AddN)/job:localhost/replica:0/task:0/device:CPU:0
Const_3: (Const): /job:localhost/replica:0/task:0/device:GPU:1
2018-07-04 15:42:48.835322: I tensorflow/core/common_runtime/placer.cc:886] Const_3: (Const)/job:localhost/replica:0/task:0/device:GPU:1
Const_2: (Const): /job:localhost/replica:0/task:0/device:GPU:1
2018-07-04 15:42:48.835330: I tensorflow/core/common_runtime/placer.cc:886] Const_2: (Const)/job:localhost/replica:0/task:0/device:GPU:1
Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 15:42:48.835338: I tensorflow/core/common_runtime/placer.cc:886] Const_1: (Const)/job:localhost/replica:0/task:0/device:GPU:0
Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-07-04 15:42:48.835345: I tensorflow/core/common_runtime/placer.cc:886] Const: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[ 44.  56.]
 [ 98. 128.]]
```
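
the printed result is easy to sanity-check by hand: each tower computes the same 2x2 product, and `add_n` sums the towers elementwise. a plain-python sketch of that arithmetic (no tensorflow and no devices involved; `matmul` and `add_n` here are small stand-in helpers written for this check):

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def add_n(matrices):
    """Elementwise sum of equally shaped matrices."""
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*matrices)
    ]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# One product per "tower"; both towers see the same constants.
towers = [matmul(a, b) for _ in range(2)]
print(add_n(towers))  # [[44.0, 56.0], [98.0, 128.0]]
```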

# summary

not a lot of new information here

+ if you want to use a specific `cpu` or `gpu`, use a `tf.device` context manager around the ops you want placed there
+ you can log device placement with the session config option `log_device_placement`
+ you can be permissive about requests for devices that don't exist with `allow_soft_placement`