

<img src="../../Utilities/gpu.jpg" width="250">

__Author:__ Christian Urcuqui 

__Date:__ 06 September 2018

__Last update:__ 06 September 2018

# Using GPUs


A system often has more than one computing device. In TensorFlow, the supported device types are CPU and GPU, and each device is identified by a string:

+ _"/cpu:0"_: The CPU of your machine.
+ _"/device:GPU:0"_: The GPU of your machine, if you have one.
+ _"/device:GPU:1"_: The second GPU of your machine, etc.

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices are given priority when the operation is assigned to a device.
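Device strings follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is plain Python with no TensorFlow dependency; `gpu_device_name` is a hypothetical helper introduced only for illustration, and it assumes the `/device:GPU:<index>` form shown in the list above.

```python
def gpu_device_name(index):
    """Return the device string for the GPU at the given index.

    Hypothetical helper (not part of TensorFlow); it only formats the
    "/device:GPU:<index>" pattern used in this notebook.
    """
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    return "/device:GPU:{}".format(index)


# Device strings for the first two GPUs of a machine.
device_names = [gpu_device_name(i) for i in range(2)]
print(device_names)
```

A list like this could be passed, one entry at a time, to _tf.device_ when the same operations should be built on several GPUs.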


In [1]:
import tensorflow as tf

In [6]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15887857036927593207
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4960695091
locality {
  bus_id: 1
  links {
  }
}
incarnation: 9534848789399615652
physical_device_desc: "device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1"
]


In [4]:
# Build the graph: a is a 2x3 matrix, b is a 3x2 matrix
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True,
# so that the device chosen for each operation is logged
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # Run the operation
    print(sess.run(c))

[[22. 28.]
 [49. 64.]]
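The printed matrix can be checked by hand. The sketch below is plain Python (no TensorFlow); it assumes that `tf.constant` fills the requested shape in row-major order, so `a` is a 2x3 matrix and `b` is a 3x2 matrix. The `matmul` function here is a small helper written for this check, not the TensorFlow operation.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x_ik * y_kj for x_ik, y_kj in zip(row, col))
             for col in zip(*y)]  # zip(*y) iterates over the columns of y
            for row in x]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```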


What happens if one of your GPUs has fewer than 8 CUDA multiprocessors? By default TensorFlow ignores it, as the next picture shows:

<img src="../../Utilities/devices.png" width="900">

So, if you want to use the GPU that was not listed, it is necessary to change an environment variable, as in the next example:

In [8]:
import os

# Lower the minimum multiprocessor count (8 by default) so the smaller GPU is not ignored
os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"] = "4"

print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13687777094807301944
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4960695091
locality {
  bus_id: 1
  links {
  }
}
incarnation: 10940467501179716792
physical_device_desc: "device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 1469146726
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11574546631922703894
physical_device_desc: "device: 1, name: GeForce GTX 750 Ti, pci bus id: 0000:08:00.0, compute capability: 5.0"
]


## Manual device placement 

If you want more control over which device runs an operation, use _tf.device_ to create a device context; all the operations created within that context get the same device assignment.

In [3]:
# Build the graph, pinning the two constants to the CPU
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# The matmul is created outside the device context, so no device is forced on it
c = tf.matmul(a, b)
# Create a session with log_device_placement set to True
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Run the operation
print(sess.run(c))

[[22. 28.]
 [49. 64.]]


As we saw, _a_ and _b_ are assigned to _cpu:0_. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (_gpu:0_ in this case) and automatically copies tensors between devices if required.

## Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation.

Sometimes it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as the process needs it. To manage these resources, TensorFlow provides two config options on the Session.

+ __allow_growth__: it allocates only as much GPU memory as the runtime allocations require, starting with very little memory and extending the GPU memory region as more is needed. We can set this option in the ConfigProto.


In [6]:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)


+ __per_process_gpu_memory_fraction:__ it determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, we can tell TensorFlow to allocate only 40% of the total memory of each GPU with:

In [None]:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
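To get a feel for what the fraction means, the arithmetic can be done by hand. The sketch below is plain Python with no TensorFlow; `memory_budget_bytes` is a hypothetical helper, and the 6 GiB total is an assumed figure chosen for illustration, not a value reported by TensorFlow in this notebook.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def memory_budget_bytes(total_bytes, fraction):
    """Return how many whole bytes `fraction` of `total_bytes` covers."""
    return int(total_bytes * fraction)


assumed_total = 6 * GIB  # assumption: a GPU with 6 GiB of memory
budget = memory_budget_bytes(assumed_total, 0.4)
print("{:.2f} GiB".format(budget / GIB))  # 2.40 GiB
```

Under that assumption, a fraction of 0.4 caps the process at roughly 2.4 GiB on each visible GPU.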

## Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. To run on a different GPU, you must specify the preference explicitly:

In [11]:
# It creates a graph.
with tf.device("/device:GPU:1"):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# The next line creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# We will run the MatMul operation
print(sess.run(c))


[[22. 28.]
 [49. 64.]]


If you want TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, set __allow_soft_placement__ to __True__ in the configuration option when creating the session.

In [7]:
# It creates a graph
with tf.device('/device:GPU:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))


[[22. 28.]
 [49. 64.]]
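The fallback rule behind soft placement can be pictured with a small model. The sketch below is plain Python and does not call TensorFlow; `resolve_device` and its `fallback` argument are hypothetical names, and the real placer also checks whether an operation has a kernel for the device, which this sketch ignores.

```python
def resolve_device(requested, available, allow_soft_placement,
                   fallback="/cpu:0"):
    """Pick the device an operation would run on in this simplified model."""
    if requested in available:
        return requested
    if allow_soft_placement:
        # Soft placement: fall back to another device instead of failing.
        return fallback
    # Without soft placement, a missing device is an error.
    raise ValueError("Device {} is not available".format(requested))


available = ["/cpu:0", "/device:GPU:0"]
print(resolve_device("/device:GPU:0", available, allow_soft_placement=False))
print(resolve_device("/device:GPU:2", available, allow_soft_placement=True))
```

The first call returns the requested GPU because it exists; the second falls back to the CPU because _/device:GPU:2_ is not in the assumed device list.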


## Using multiple GPUs

In [10]:
# Creates a graph.
c = []
for d in ['/device:GPU:0', '/device:GPU:1']:
    # Build the same matrix product on each GPU
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
# Sum the per-GPU results on the CPU
with tf.device('/cpu:0'):
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))


[[ 44.  56.]
 [ 98. 128.]]
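The result is twice the single-GPU product because each device computes the same matrix and `tf.add_n` sums its inputs element-wise. The sketch below is plain Python with no TensorFlow; `add_n` is a local stand-in for that element-wise sum, and the per-device matrix is taken from the output printed earlier in this notebook.

```python
def add_n(matrices):
    """Element-wise sum of equally shaped matrices (lists of rows)."""
    return [[sum(values) for values in zip(*rows)]
            for rows in zip(*matrices)]  # group matching rows together


# Each GPU produces the same 2x2 product.
per_device = [[22.0, 28.0],
              [49.0, 64.0]]
total = add_n([per_device, per_device])
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```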


## References

+ https://www.tensorflow.org/guide/using_gpu