# Verifying GPU Usage in TensorFlow

> Basic commands
- Check GPU memory usage: `watch -n 1 nvidia-smi`
- List the devices TensorFlow can see by opening a Python shell and entering:
```python
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```
This returned:
```shell
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10091552854752830998
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16478447750412646183
physical_device_desc: "device: XLA_CPU device"
]
```
So only two devices are visible: CPU and XLA_CPU.
- Follow the Jupyter log: `tail -F /data/logs/jupyter.log`
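
For scripted checks, the `nvidia-smi` reading can also be captured from Python. The sketch below is only an illustration and not part of the original setup: it assumes `nvidia-smi` is on the `PATH` and relies on its `--query-gpu` CSV output, and the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.used, memory.total` CSV rows (MiB, no header, no units)."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name would not break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        # Assumes numeric memory fields; some devices may report non-numeric values.
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (assumes nvidia-smi is on PATH)."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=name,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ])
    return parse_gpu_csv(out.decode("utf-8"))
```

Calling `query_gpus()` on the GPU host returns one dictionary per card. If the call itself fails, the driver is the first thing to check, before looking at TensorFlow.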

Question: why is there no GPU device?
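
One way to make this check explicit is to filter the device list by `device_type`. This is a minimal sketch that assumes the same `device_lib` call as above. The helpers `devices_of_type` and `visible_gpus` are hypothetical names, and the first one works on any objects that expose `name` and `device_type` attributes.

```python
def devices_of_type(devices, device_type="GPU"):
    """Return the names of devices whose device_type matches exactly."""
    # Exact match on purpose: "XLA_GPU" and "XLA_CPU" are separate device types.
    return [d.name for d in devices if d.device_type == device_type]


def visible_gpus():
    """List GPU device names via TensorFlow (imported lazily; requires TensorFlow)."""
    from tensorflow.python.client import device_lib

    return devices_of_type(device_lib.list_local_devices(), "GPU")
```

For the device list shown above, `visible_gpus()` would return an empty list, since only `CPU` and `XLA_CPU` entries are present.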

# Writing a test program

Write the test program below and watch the log:

```python
import tensorflow as tf

# Pin the ops to the CPU and log where each op is actually placed.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                            log_device_placement=True))
    # No variables exist, so this is not required; it only adds the `init` op seen in the log.
    tf.global_variables_initializer()
    print('---cpu---:')
    print(sess.run(c))
```

Output:

```
---cpu---:
[[22. 28.]
 [49. 64.]]
```

The log showed:
```shell
2018-12-07 19:29:47.567669: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-07 19:29:47.570860: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2018-12-07 19:29:47.572208: I tensorflow/core/common_runtime/placer.cc:927] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
2018-12-07 19:29:47.572227: I tensorflow/core/common_runtime/placer.cc:927] init: (NoOp)/job:localhost/replica:0/task:0/device:CPU:0
2018-12-07 19:29:47.572235: I tensorflow/core/common_runtime/placer.cc:927] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-12-07 19:29:47.572244: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
```

```python
import tensorflow as tf

# Request the GPU. With allow_soft_placement=True, TensorFlow falls back to
# another device when GPU:0 is not available instead of raising an error.
with tf.device('/device:GPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                            log_device_placement=True))
    tf.global_variables_initializer()
    print('---XLA_CPU---:', sess.run(c))
```

Output:

```
---XLA_CPU---: [[22. 28.]
 [49. 64.]]
```

The log showed:
```shell
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
init: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
[I 19:30:13.292 NotebookApp] Kernel restarted: f3ad19d1-912f-4fd6-922f-d7b3f40a410d
[I 19:30:13.701 NotebookApp] Adapting to protocol v5.1 for kernel f3ad19d1-912f-4fd6-922f-d7b3f40a410d
[I 19:30:13.701 NotebookApp] Restoring connection for f3ad19d1-912f-4fd6-922f-d7b3f40a410d:38b4318a71134437ac088756c921ff55
[I 19:30:13.701 NotebookApp] Replaying 3 buffered messages
2018-12-07 19:30:16.297483: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-07 19:30:16.300893: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2018-12-07 19:30:16.302147: I tensorflow/core/common_runtime/placer.cc:927] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:XLA_CPU:0
2018-12-07 19:30:16.302164: I tensorflow/core/common_runtime/placer.cc:927] init: (NoOp)/job:localhost/replica:0/task:0/device:XLA_CPU:0
2018-12-07 19:30:16.302171: I tensorflow/core/common_runtime/placer.cc:927] a: (Const)/job:localhost/replica:0/task:0/device:XLA_CPU:0
2018-12-07 19:30:16.302178: I tensorflow/core/common_runtime/placer.cc:927] b: (Const)/job:localhost/replica:0/task:0/device:XLA_CPU:0
2018-12-07 19:30:16.326939: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2018-12-07 19:30:16.327458: I tensorflow/compiler/xla/service/service.cc:149] XLA service 0x558db4b2e0b0 executing computations on platform Host. Devices:
2018-12-07 19:30:16.327476: I tensorflow/compiler/xla/service/service.cc:157]   StreamExecutor device (0): <undefined>, <undefined>
```
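
Reading placement lines by eye gets tedious for larger graphs. The sketch below tallies ops per device from log text. It assumes the line shape seen in the logs above (`op: (Type)` followed by a `/job:...` device path). The regex and the helper names `placements` and `ops_per_device` are illustrative, not a TensorFlow API.

```python
import re
from collections import Counter

# Matches e.g. "MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0",
# with or without ": " between the op type and the device path.
PLACEMENT = re.compile(r"(\S+): \((\w+)\)(?::\s*)?(/job:\S+)")


def placements(log_text):
    """Return a list of (op_name, op_type, device) tuples found in the log."""
    return [m.groups() for m in PLACEMENT.finditer(log_text)]


def ops_per_device(log_text):
    """Count placed ops per short device name, e.g. 'CPU:0' or 'XLA_CPU:0'."""
    return Counter(dev.rsplit("/device:", 1)[-1] for _, _, dev in placements(log_text))
```

Feeding the contents of `/data/logs/jupyter.log` to `ops_per_device` gives a quick count of how many ops landed on each device. `Device mapping` lines are ignored because they do not contain an op type in parentheses.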

# Suspecting the TensorFlow version, so downgrading it
Run:
```shell
pip list | grep tensorflow
```
Output:
```
tensorflow                         1.12.0                   
tensorflow-data-validation         0.9.0                    
tensorflow-gpu                     1.12.0                   
tensorflow-metadata                0.9.0                    
tensorflow-serving-api             1.12.0                   
tensorflow-transform               0.11.0 
```
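
Note that both `tensorflow` and `tensorflow-gpu` are listed. The two packages install into the same `tensorflow` import directory, so having both can leave the CPU-only build active. This is a possible explanation, not something the logs above confirm. A small sketch that flags the situation from `pip list` text follows; the helper names `parse_pip_list` and `has_cpu_and_gpu_builds` are hypothetical.

```python
def parse_pip_list(text):
    """Map package name -> version from `pip list` style 'name  version' rows."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the 'Package Version' header, and the '----' separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def has_cpu_and_gpu_builds(packages):
    """True when both the CPU-only and the GPU wheel are installed side by side."""
    return "tensorflow" in packages and "tensorflow-gpu" in packages
```

Applied to the listing above, `has_cpu_and_gpu_builds` returns `True`, which is a hint to check which build `import tensorflow` actually loads.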
Then downgrade both packages:
```shell
pip install tensorflow==1.8.0
pip install tensorflow-gpu==1.8.0
```


> If anything goes wrong, restore the original TF version
```shell
pip install tensorflow==1.12.0
pip install tensorflow-gpu==1.12.0
```

# 1.8.0 appeared to support the GPU better; restored 1.12.0 afterwards

Then, inexplicably, it worked.

```python
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
```
This time the list includes `/device:GPU:0` (GeForce GTX 1080):
```shell
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13795605669416223653
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 10736823567353150985
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11584388614102884367
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7895151412
locality {
  bus_id: 1
  links {
  }
}
incarnation: 2302359680209861215
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
```
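
The `physical_device_desc` string of the GPU entry carries the card name and compute capability. Here is a minimal sketch for pulling those fields out, assuming the `key: value, key: value` layout seen above. `parse_device_desc` is a hypothetical helper, not a TensorFlow function.

```python
def parse_device_desc(desc):
    """Split 'device: 0, name: X, pci bus id: Y, ...' into a dict of strings."""
    fields = {}
    for part in desc.split(", "):
        # Split on the first ': ' only, since values such as the PCI bus id contain colons.
        key, sep, value = part.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


desc = "device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1"
info = parse_device_desc(desc)
print(info["name"], info["compute capability"])  # GeForce GTX 1080 6.1
```

All values are kept as strings, so the compute capability would need converting before any numeric comparison.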