
# TensorFlow and hardware acceleration

(C) 2019 Umberto Michelucci

# Confirm TensorFlow can see the GPU

To enable the GPU in Colab, select "GPU" in the Accelerator drop-down in Notebook Settings (reachable through the Edit menu or the command palette at cmd/ctrl-shift-P). The cell below then checks that TensorFlow can see it.

In [0]:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


# Observe TensorFlow speedup on GPU relative to CPU

This example multiplies two large random tensors, once on the GPU and once on the CPU, and compares the run times. First, check that a GPU is available.

In [0]:
print(tf.test.is_gpu_available())

True
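`tf.test.is_gpu_available()` is deprecated in TensorFlow 2.x in favor of `tf.config.list_physical_devices('GPU')`. The following is a minimal sketch of an equivalent check, assuming TensorFlow 2.1 or later. The helper name `list_gpu_names` is hypothetical, and it returns an empty list when TensorFlow or that function is not available.

```python
def list_gpu_names():
    """Return the names of the physical GPUs TensorFlow can see, or [] if none."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    # Assumption: tf.config.list_physical_devices exists (TensorFlow 2.1+).
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        return []
    return [device.name for device in lister("GPU")]

print(list_gpu_names())
```

An empty list means no GPU was found (or the API is missing), so the result can be used directly in an `if` test.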


The following function returns a list with the names of all available GPU devices. If you have enabled GPU acceleration for this notebook (in Google Colab, for example), you should see two GPU device entries: `/device:GPU:0` and its XLA counterpart `/device:XLA_GPU:0`.

In [0]:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type.endswith('GPU')]

In [0]:
get_available_gpus()

['/device:XLA_GPU:0', '/device:GPU:0']
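Because `endswith('GPU')` also matches the `XLA_GPU` device type, the list above has two entries. If only the plain `GPU` entries are wanted, an exact comparison on `device_type` is one option. The sketch below illustrates it on stand-in records rather than on real TensorFlow device protos; `Device` and `physical_gpu_names` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by list_local_devices();
# only the two fields used by the filter are modeled.
Device = namedtuple("Device", ["name", "device_type"])

def physical_gpu_names(devices):
    """Return names of devices whose type is exactly 'GPU' (XLA_GPU is skipped)."""
    return [d.name for d in devices if d.device_type == "GPU"]

# Names and types as reported by device_lib.list_local_devices() in this notebook.
devices = [
    Device("/device:CPU:0", "CPU"),
    Device("/device:XLA_CPU:0", "XLA_CPU"),
    Device("/device:XLA_GPU:0", "XLA_GPU"),
    Device("/device:GPU:0", "GPU"),
]

print(physical_gpu_names(devices))
```

With the four entries above, only `/device:GPU:0` passes the filter.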

In [0]:
local_device_protos = device_lib.list_local_devices()
print(local_device_protos)

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3004393512909135219
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16323739484962476874
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 16797530695469281809
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11281553818
locality {
  bus_id: 1
  links {
  }
}
incarnation: 904680829004167430
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
]


In [0]:
type(local_device_protos)

list

# Performance comparison

In [0]:
import tensorflow as tf
import timeit

# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

In [0]:
sess = tf.Session(config=config)
#sess.run(tf.global_variables_initializer())

In [0]:
%%time
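# Pin the ops to the first GPU. %%time covers both graph construction and execution.
with tf.device('/gpu:0'):
  tensor1 = tf.random_normal((10000, 10000))
  tensor2 = tf.random_normal((10000, 10000))
  prod = tf.linalg.matmul(tensor1, tensor2)
  # Reduce to a scalar so that only a single value is fetched by sess.run.
  prod_sum = tf.reduce_sum(prod)

  sess.run(prod_sum)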

CPU times: user 765 ms, sys: 541 ms, total: 1.31 s
Wall time: 1.56 s


In [0]:

%%time
# Same computation, this time pinned to the CPU.
with tf.device('/cpu:0'):
  tensor1 = tf.random_normal((10000, 10000))
  tensor2 = tf.random_normal((10000, 10000))
  prod = tf.linalg.matmul(tensor1, tensor2)
  prod_sum = tf.reduce_sum(prod)

  sess.run(prod_sum)

CPU times: user 1min 5s, sys: 822 ms, total: 1min 6s
Wall time: 33.8 s
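To put the two measurements side by side, the speedup can be computed directly from the reported wall times. The sketch below also estimates the amount of arithmetic in the multiplication, using the usual approximation of about `2 * n**3` floating-point operations for two `n x n` matrices. The variable names are illustrative, and the timings come from a single run in one Colab session, so the ratio is only indicative.

```python
# Wall-clock times reported by the two %%time cells above, in seconds.
gpu_wall_s = 1.56
cpu_wall_s = 33.8

# Speedup of the GPU run relative to the CPU run.
speedup = cpu_wall_s / gpu_wall_s

# Multiplying two n x n matrices takes roughly 2 * n**3 floating-point
# operations (about n multiplications and n additions per output element).
n = 10000
matmul_flops = 2 * n**3

print(f"GPU speedup: {speedup:.1f}x")
print(f"Matmul work: {matmul_flops:.1e} floating-point operations")
```

For these numbers the GPU run is about 21.7 times faster than the CPU run.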


# Smaller matrices

Now let's repeat the comparison with smaller `100x100` matrices.

In [0]:
%%time
with tf.device('/gpu:0'):
  tensor1 = tf.random_normal((100, 100))
  tensor2 = tf.random_normal((100, 100))
  prod = tf.linalg.matmul(tensor1, tensor2)
  prod_sum = tf.reduce_sum(prod)
  
  sess.run(prod_sum)

CPU times: user 21.3 ms, sys: 1.05 ms, total: 22.3 ms
Wall time: 24.7 ms


In [0]:
%%time
with tf.device('/cpu:0'):
  tensor1 = tf.random_normal((100, 100))
  tensor2 = tf.random_normal((100, 100))
  prod = tf.linalg.matmul(tensor1, tensor2)
  prod_sum = tf.reduce_sum(prod)
  
  sess.run(prod_sum)

CPU times: user 18.3 ms, sys: 2.01 ms, total: 20.3 ms
Wall time: 19.9 ms


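Now the difference is negligible: both runs finish in roughly 20 to 25 ms of wall time, and the CPU is even slightly faster. For matrices this small, fixed overheads probably outweigh the arithmetic itself, so the GPU brings no advantage.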
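A gap of a few milliseconds between single `%%time` runs may well be within measurement noise. The `timeit` module imported earlier can repeat a measurement and keep the best result. The sketch below assumes a hypothetical helper, `best_time`, and uses a pure-Python placeholder workload so that it runs anywhere; in the notebook, the callable would wrap only the `sess.run(prod_sum)` call.

```python
import timeit

def best_time(fn, repeat=5, number=10):
    """Return the best average time per call, in seconds, over `repeat` rounds."""
    # Each round calls fn `number` times and reports the total elapsed time.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number

def workload():
    # Placeholder for the real work, e.g. a call to sess.run(prod_sum).
    return sum(i * i for i in range(10000))

print(f"best of 5 rounds: {best_time(workload) * 1e3:.3f} ms per call")
```

Taking the minimum rather than the mean is a common choice here, since slower rounds usually reflect interference from other processes rather than the code being measured.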