Many TensorFlow features, including its computational graphs, lend themselves naturally to parallel computation. A computational graph can be split across different processors, and different batches can be processed on different devices. This recipe demonstrates how to access different processors on the same machine.

## Getting ready...

In this recipe, we will explore the commands that give access to the various devices on a system. The recipe also demonstrates how to find out which devices TensorFlow is using.

## How to do it...

In [1]:
import tensorflow as tf
tf.debugging.set_log_device_placement(True)

1. To find out which devices TensorFlow is using for operations, we activate the device placement logs by calling tf.debugging.set_log_device_placement(True). If a TensorFlow operation is implemented for both CPU and GPU devices, it is executed on a GPU device by default when one is available:

In [2]:
tf.debugging.set_log_device_placement(True)

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


2. It is also possible to use a tensor's device attribute, which returns the name of the device on which the tensor is placed:

In [3]:
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
print(a.device)
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
print(b.device)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
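
The device names printed above follow a fixed pattern of `key:value` fields separated by slashes. As a rough illustration of how to read them, the following pure-Python sketch splits such a string into its parts. The helper name `parse_device_name` is hypothetical and not part of TensorFlow; TensorFlow's own `tf.DeviceSpec.from_string` offers a more complete parser.

```python
def parse_device_name(name):
    """Split a fully qualified TensorFlow device name into its fields.

    Hypothetical helper for illustration only; not a TensorFlow API.
    """
    fields = {}
    for part in name.strip('/').split('/'):
        # Each part looks like 'job:localhost' or 'device:GPU:0'.
        key, _, value = part.partition(':')
        fields[key] = value
    # The 'device' field holds both the type and the index, e.g. 'GPU:0'.
    device_type, _, index = fields.pop('device').partition(':')
    fields['device_type'] = device_type
    fields['device_index'] = int(index)
    return fields


info = parse_device_name('/job:localhost/replica:0/task:0/device:GPU:0')
print(info['device_type'], info['device_index'])
```

Reading the device type this way makes it easy to check in code whether a tensor ended up on a CPU or a GPU.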


3. By default, TensorFlow automatically decides how to distribute computation across computing devices (CPUs and GPUs). Sometimes we need to select the device to use by creating a device context with the tf.device function. Each operation executed in this context will use the selected device:

In [4]:
tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:CPU:0


4. If we move the matmul operation out of the context, it will be executed on a GPU device if one is available:

In [5]:
tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


5. When using GPUs, TensorFlow by default reserves a large portion of the GPU memory up front. While this is usually desired, we can take steps to be more careful with GPU memory allocation. TensorFlow never releases GPU memory once it has been allocated, but by setting the GPU memory growth option we can make it start with a small allocation and grow toward the maximum limit only when needed. Note that physical devices cannot be modified after being initialized:

In [2]:
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_memory_growth(gpu_devices[0], True)
    except RuntimeError as e:
        # Memory growth cannot be modified after the GPU has been initialized
        print(e)
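
On a machine with several GPUs, the same option can be applied to every device rather than only the first one; to my understanding TensorFlow expects the memory growth setting to be consistent across GPUs. The following is a minimal sketch under that assumption. It reads the setting back with `tf.config.experimental.get_memory_growth` and does nothing on a CPU-only machine.

```python
import tensorflow as tf

gpu_devices = tf.config.list_physical_devices('GPU')
for gpu in gpu_devices:
    try:
        # Enable on-demand allocation for every visible GPU.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Raised if the GPU has already been initialized.
        print(e)

# Read the setting back; the list is empty when no GPU is visible.
growth_flags = [tf.config.experimental.get_memory_growth(gpu)
                for gpu in gpu_devices]
print(growth_flags)
```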

6. If we want to limit the GPU memory used by TensorFlow, we can also create a virtual GPU device and set the maximum memory limit (in MB) to allocate on it. Note that virtual GPU devices cannot be modified after being initialized:

In [3]:
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(gpu_devices[0], 
                               [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices cannot be modified after being initialized
        print(e)

7. It is also possible to simulate multiple virtual GPU devices with a single physical GPU:

In [4]:
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
                                         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        print(e)
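
The same idea can be tried on a machine without any GPU by splitting the CPU into several logical devices, which is handy for testing multi-device code. This is a sketch that assumes a TensorFlow version exposing the non-experimental names `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration`. As with the GPU case, it must run before TensorFlow initializes its devices, so it should be executed in a fresh session.

```python
import tensorflow as tf

cpus = tf.config.list_physical_devices('CPU')
try:
    # Split the single physical CPU into two logical CPU devices.
    tf.config.set_logical_device_configuration(
        cpus[0],
        [tf.config.LogicalDeviceConfiguration(),
         tf.config.LogicalDeviceConfiguration()])
except RuntimeError as e:
    # Logical devices cannot be modified after being initialized.
    print(e)

logical_cpus = tf.config.list_logical_devices('CPU')
print(len(cpus), "physical CPU(s),", len(logical_cpus), "logical CPU(s)")
```

Operations can then be pinned to `/device:CPU:0` and `/device:CPU:1` with tf.device, in the same way as the GPU examples in this recipe.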

8. Sometimes we need to write robust code that can determine whether a GPU is available. This is helpful when we want to take advantage of a GPU when it is present and assign specific operations to it. Note that tf.test.is_built_with_cuda only reports whether the installed TensorFlow build includes CUDA support; it does not confirm that a GPU is actually present. To check for a usable GPU, we also test whether tf.config.list_physical_devices('GPU') returns any devices:

In [5]:
if tf.test.is_built_with_cuda() and tf.config.list_physical_devices('GPU'):
    # Run GPU specific code here
    pass
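
One way to act on such a check is to pick a device string once and reuse it in a tf.device context. The sketch below does this with a hypothetical helper, `pick_device`, which is not a TensorFlow function. It relies only on tf.config.list_physical_devices('GPU'), so it falls back to the CPU when no GPU is visible.

```python
import tensorflow as tf


def pick_device():
    """Return a GPU device string if a GPU is visible, otherwise the CPU.

    Hypothetical helper for illustration only.
    """
    if tf.config.list_physical_devices('GPU'):
        return '/device:GPU:0'
    return '/device:CPU:0'


device_name = pick_device()
with tf.device(device_name):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

# Show the requested device and where the result actually lives.
print(device_name, y.device)
```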

9. If we have to assign specific operations to certain devices, we can use the following code. This will perform simple calculations and assign operations to the main CPU and two auxiliary GPUs:

In [6]:
print("Num GPUs Available: ", len(tf.config.list_logical_devices('GPU')))

if len(tf.config.list_logical_devices('GPU')) >= 2:
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 3.0, 5.0], shape = [1, 3])
        b = tf.constant([2.0, 4.0, 6.0], shape = [3, 1])
        
        with tf.device('/gpu:0'):
            c = tf.matmul(a, b)
            c = tf.reshape(c, [-1])
            
        with tf.device('/gpu:1'):
            d = tf.matmul(b, a)
            flat_d = tf.reshape(d, [-1])
        
        combined = tf.multiply(c, flat_d)
    print(combined)

Num GPUs Available:  2
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
tf.Tensor([  88.  264.  440.  176.  528.  880.  264.  792. 1320.], shape=(9,), dtype=float32)


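We can see that the two input tensors are created on the main CPU, the first matrix multiplication and its reshape run on our first auxiliary GPU, the second matrix multiplication and its reshape run on our second auxiliary GPU, and the final multiplication is performed back on the CPU.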
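
The example above needs two GPUs. As a hedged variation, the following sketch chooses the devices from whatever tf.config.list_logical_devices('GPU') reports and falls back to the CPU when fewer GPUs are present, so the same computation runs on any machine. The variable names `first_device` and `second_device` are introduced here for illustration.

```python
import tensorflow as tf

logical_gpus = tf.config.list_logical_devices('GPU')
# Use the GPU names if any exist, otherwise fall back to the CPU.
devices = [gpu.name for gpu in logical_gpus] or ['/device:CPU:0']
first_device = devices[0]
# With a single device, both branches share it.
second_device = devices[1 % len(devices)]

with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 3.0, 5.0], shape=[1, 3])
    b = tf.constant([2.0, 4.0, 6.0], shape=[3, 1])

with tf.device(first_device):
    c = tf.reshape(tf.matmul(a, b), [-1])

with tf.device(second_device):
    flat_d = tf.reshape(tf.matmul(b, a), [-1])

with tf.device('/device:CPU:0'):
    combined = tf.multiply(c, flat_d)

print(combined.numpy())
```

The values in `combined` should match the output shown above, regardless of which devices were selected.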