Dst tensor is not initialized #38

Open
burness opened this Issue Jun 5, 2016 · 19 comments

@burness
burness commented Jun 5, 2016

Hi @aymericdamien, I ran the script logistic_regression.py,
but I hit the error "Dst tensor is not initialized".
The detailed log is here:

Epoch: 0001 cost= 29.917553501
Epoch: 0002 cost= 21.929896693
Epoch: 0003 cost= 21.063875407
Epoch: 0004 cost= 20.457020144
Epoch: 0005 cost= 20.084428289
Epoch: 0006 cost= 19.814794980
Epoch: 0007 cost= 19.674670629
Epoch: 0008 cost= 19.510438999
Epoch: 0009 cost= 19.309689613
Epoch: 0010 cost= 19.223995275
Epoch: 0011 cost= 19.161345129
Epoch: 0012 cost= 18.985856709
Epoch: 0013 cost= 18.917688493
Epoch: 0014 cost= 18.832972273
Epoch: 0015 cost= 18.742634454
Epoch: 0016 cost= 18.695894625
Epoch: 0017 cost= 18.643278683
Epoch: 0018 cost= 18.609112186
Epoch: 0019 cost= 18.444614899
Epoch: 0020 cost= 18.532375607
Epoch: 0021 cost= 18.437554449
Epoch: 0022 cost= 18.310914770
Epoch: 0023 cost= 18.289282742
Epoch: 0024 cost= 18.214274961
Epoch: 0025 cost= 18.293197173
Optimization Finished!
Accuracy:
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
<ipython-input-17-f661f1e1e9de> in <module>()
     24     # Calculate accuracy
     25     accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
---> 26     print "Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in eval(self, feed_dict, session)
    500 
    501     """
--> 502     return _eval_using_default_session(self, feed_dict, self.graph, session)
    503 
    504 

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in _eval_using_default_session(tensors, feed_dict, graph, session)
   3332                        "the tensor's graph is different from the session's "
   3333                        "graph.")
-> 3334   return session.run(tensors, feed_dict)
   3335 
   3336 

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    338     try:
    339       result = self._run(None, fetches, feed_dict, options_ptr,
--> 340                          run_metadata_ptr)
    341       if run_metadata:
    342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
    562     try:
    563       results = self._do_run(handle, target_list, unique_fetches,
--> 564                              feed_dict_string, options, run_metadata)
    565     finally:
    566       # The movers are no longer used. Delete them.

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    635     if handle is None:
    636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637                            target_list, options, run_metadata)
    638     else:
    639       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/burness/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
    657       # pylint: disable=protected-access
    658       raise errors._make_specific_exception(node_def, op, error_message,
--> 659                                             e.code)
    660       # pylint: enable=protected-access
    661 

InternalError: Dst tensor is not initialized.
     [[Node: _recv_Placeholder_1_0/_27513 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_267__recv_Placeholder_1_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
     [[Node: Mean_6/_27517 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_277_Mean_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
@aymericdamien
Owner

Are you using a GPU? This error is usually raised when GPU memory is full.
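
If GPU memory is the bottleneck, one common way to lower the peak memory of the accuracy step is to feed the test set in chunks instead of all test images at once. The sketch below is framework-free; `batched_accuracy`, `accuracy_fn` and `toy_accuracy` are hypothetical names, and in the script `accuracy_fn` would wrap something like `accuracy.eval({x: xs, y: ys})`.

```python
def batched_accuracy(accuracy_fn, images, labels, batch_size=1000):
    """Average a per-batch accuracy over a dataset, weighted by batch size.

    accuracy_fn(batch_images, batch_labels) must return the mean accuracy
    of that batch as a float in [0, 1].
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if not len(images):
        raise ValueError("empty dataset")
    correct = 0.0
    for start in range(0, len(images), batch_size):
        xs = images[start:start + batch_size]
        ys = labels[start:start + batch_size]
        # Weight by len(xs) so a short final batch does not skew the mean.
        correct += accuracy_fn(xs, ys) * len(xs)
    return correct / len(images)


# Stand-in for the TensorFlow call: fraction of positions where x == y.
def toy_accuracy(xs, ys):
    return sum(1 for a, b in zip(xs, ys) if a == b) / len(xs)


images = [0, 1, 2, 3, 4, 5, 6]
labels = [0, 1, 2, 0, 0, 5, 6]
print(batched_accuracy(toy_accuracy, images, labels, batch_size=3))
```

Only one batch has to fit on the device at a time, at the cost of several smaller runs.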

@burness
burness commented Jun 6, 2016

@aymericdamien Thanks! I found the reason: I was running the code in an IPython notebook, but I forgot to close another notebook, and that script was using too much memory.

@subodhp
subodhp commented Sep 27, 2016

Yup, full GPU memory is the reason. IPython kernels stuck in background processes do that.

Thanks,
Subodh
thesubodh.com

@laventura

@burness @subodhp I'm getting the same error ("Ran out of memory").
[MacBook Pro 2013 with 16 GB RAM, GPU (2 GB RAM), TensorFlow 0.11, CUDA 8.0, cuDNN 5.x]

I tried shutting down the Jupyter Notebook and restarting it... but it crashed with the same error.
Is this solved?
How does one resolve GPU memory full errors?

Thanks!

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 256 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 31488 totalling 30.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 46609152 totalling 44.45MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 44.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 57622528
InUse: 46643200
MaxInUse: 46643200
NumAllocs: 11
MaxAllocSize: 46609152

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ********************************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 390.6KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Reshape_1/_2__cf__2 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,10] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

@laventura
laventura commented Nov 3, 2016 edited

I rebooted my MacBook and started afresh.
System: [MacBook Pro 2013 with 16 GB RAM, GPU with 2 GB RAM; TensorFlow 0.11, CUDA 8.0, cuDNN 5.x]
Here's the error I get (see the attached error-tf.txt at the bottom for full details).

  1. How is the free memory only 20.49 MiB (on a recently rebooted system) if there's 2.0 GiB available to the GPU?
  2. Is there a way to track GPU memory usage?
  3. Is there a way to disable GPU usage for an IPython notebook?
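
On the third question, a widely used approach is to hide all CUDA devices from the process by setting `CUDA_VISIBLE_DEVICES` to an empty string. A minimal sketch for the first cell of a notebook follows. It assumes TensorFlow has not yet been imported in that kernel, since the variable is read when CUDA initializes.

```python
import os

# An empty value hides every CUDA device from this process, so a
# GPU-enabled TensorFlow build should fall back to the CPU.
# This must run before the first `import tensorflow` in the kernel.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

The setting only affects the current process and the processes it starts, so other notebooks keep their own view of the GPU.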

Thanks!

Some relevant parts I see:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 20.49MiB
...

I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 21487616
InUse: 33792
MaxInUse: 65280
NumAllocs: 9
MaxAllocSize: 31488

W tensorflow/core/common_runtime/bfc_allocator.cc:270] *___________________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 29.91MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [10000,784] values: -0.5 -0.49607843 -0.5...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

error-tf.txt
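
The numbers in the log above are consistent with a single tensor simply not fitting. A float constant of shape [10000,784] needs 10000 x 784 x 4 bytes, which is more than the allocator's limit. The quick check below assumes 4 bytes per DT_FLOAT element; `tensor_mib` is a helper name made up for this sketch.

```python
MIB = 1024 * 1024


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor; DT_FLOAT takes 4 bytes per element."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / MIB


needed = tensor_mib([10000, 784])   # the Const node named in the error
limit = 21487616 / MIB              # "Limit:" from the allocator stats

print("needed: %.2f MiB" % needed)
print("limit:  %.2f MiB" % limit)
print("fits:", needed <= limit)
```

The two figures reproduce the 29.91MiB request and the 20.49MiB of free memory reported in the log, so the failure follows from the low free memory rather than from the model itself.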

@pumplerod

@laventura, did you ever find a solution to the GPU out-of-memory error? I have the same problem with the same setup, though my error came from trying to allocate 10.8 MiB.

@laventura

@pumplerod - I found a solution / kludge that somehow seems to work, although I can't explain why / how.

Before starting your Jupyter notebook / tensorflow program, set this:

export CUDA_VISIBLE_DEVICES=1

This seems to work in that the scripts work OK. Not sure if this is a requirement.
Give it a try and see.

@pumplerod

Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))

However, when I tried to run training, it crashed the Jupyter notebook.

@sapeyes
sapeyes commented Dec 16, 2016

@pumplerod

Oh, your snippet was very helpful for me. I got an out-of-memory error for an allocation of only 29 MiB.
I added your code with a fraction of 0.8, since 80% of the memory was free (1.6 GiB out of 2 GiB).
My code started working. After that, I deleted ALL the GPU options and it still works. Very curious.
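
The reasoning behind picking 0.8 can be written down as a small helper that derives a fraction from the free and total memory. This is only a sketch: `memory_fraction` and its `headroom` parameter are made up here, and the 0.9 margin is an arbitrary illustration value. The result would be passed as `per_process_gpu_memory_fraction` in the `tf.GPUOptions` call quoted above.

```python
def memory_fraction(free_mib, total_mib, headroom=0.9):
    """Fraction of total GPU memory to request, kept below what is free.

    headroom < 1 leaves a margin for other processes; 0.9 is an
    arbitrary choice for illustration.
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    fraction = headroom * free_mib / total_mib
    # Clamp to the valid range for a fraction.
    return max(0.0, min(1.0, fraction))


# 1.6 GiB free out of 2 GiB, as in the comment above.
print(round(memory_fraction(1.6 * 1024, 2 * 1024), 2))
print(round(memory_fraction(1.6 * 1024, 2 * 1024, headroom=1.0), 2))
```

With no headroom the helper returns the 0.8 used above. A smaller value leaves room for whatever else is holding GPU memory.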

@laventura

Update on this:

Earlier, the GPU was recognized by an older TensorFlow. Since I upgraded TF to 0.11rc2 and later to 0.12, TF does not recognize any GPU at all.

Also, deviceQuery does not report any GPU either. I'm going totally bonkers in this CUDA hell.

See details here:
tensorflow/tensorflow#2882

Also on NVIDIA Devtalk. If anyone has any bright insights, that would be very helpful to me!
https://devtalk.nvidia.com/default/topic/990015/cuda-setup-and-installation/help-cuda-7-5-or-8-devicequery-failing-not-working-on-macbookpro-2013-os-x-10-11-gt750m/

@Mazecreator

Just stumbled upon this thread. I think you have hidden your GPU from the CUDA drivers with this line:

export CUDA_VISIBLE_DEVICES=1

What this is telling CUDA is that it should only use "Device 1" in your system. So, unless you have 2 GPU devices, you have hidden the primary "Device 0". I am sure if you set this as follows TF will see your GPU again, but your other problems may return:

export CUDA_VISIBLE_DEVICES=0
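
The effect described here can be modelled in a few lines. The sketch below is a simplified model and does not call CUDA: `visible_devices` is a hypothetical helper, it handles integer indices only, and it follows the documented rule that enumeration stops at the first invalid index.

```python
def visible_devices(env_value, physical_count):
    """Model which physical GPU indices a process sees.

    env_value is the CUDA_VISIBLE_DEVICES string, or None if it is unset.
    Only integer indices are modelled (not GPU UUIDs).
    """
    if env_value is None:
        return list(range(physical_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        # Enumeration stops at the first invalid entry.
        if not token.isdigit() or int(token) >= physical_count:
            break
        visible.append(int(token))
    return visible


print(visible_devices("1", 1))    # single-GPU machine, variable set to 1
print(visible_devices("0", 1))    # single-GPU machine, variable set to 0
print(visible_devices(None, 1))   # variable not set
```

On a machine with one GPU, the value 1 names a device that does not exist, so nothing is visible and a program may end up running on the CPU.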
@laventura
laventura commented Jan 26, 2017 edited

@Mazecreator & Others,

Indeed; when I set CUDA_VISIBLE_DEVICES=0, the deviceQuery returns successfully. However, now TensorFlow complains again with "Dst Tensor Not initialized" !!

This is so frustrating!!

It appears that CUDA is leaking memory... I see that the free memory listed (when a Python script starts) keeps shrinking, though I don't know for sure if that's the problem.
The fixes suggested above (setting TF's GPUOptions) are all workarounds: they require manual code changes in existing scripts that were supposed to work as-is.

See here: deviceQuery

 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_HOME 
/usr/local/cuda
 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_VISIBLE_DEVICES

 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 750M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147024896 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            926 MHz (0.93 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS
 py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ 

 py35 ▶ ~ ▶ Developer ❯ CUDA ❯ cuda-smi ▶ master ▶ ❓ ▶ $ ▶ ./cuda-smi 
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 369.92 of 2047.6 MB (i.e. 18.1%) Free
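
To watch free memory over time, the `cuda-smi` line above can be parsed and logged before each run. The sketch below is based only on the single output line shown here, so the pattern may not match other versions of the tool. `parse_cuda_smi` is a made-up helper name.

```python
import re

# Pattern derived from the one sample line in this thread.
LINE_RE = re.compile(
    r"Device (?P<index>\d+) .*?: (?P<free>[\d.]+) of (?P<total>[\d.]+) MB"
)


def parse_cuda_smi(line):
    """Return device index and free/total/used MB, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    free = float(match.group("free"))
    total = float(match.group("total"))
    return {
        "index": int(match.group("index")),
        "free_mb": free,
        "total_mb": total,
        "used_mb": total - free,
    }


sample = ("Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): "
          "369.92 of 2047.6 MB (i.e. 18.1%) Free")
info = parse_cuda_smi(sample)
print(info)
```

Comparing `used_mb` across runs would show whether memory is really held after a script exits.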

Running a Python script with TensorFlow:

 py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ python imagenet_inference.py 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 305.92MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): 	Total Chunks: 1, Chunks in use: 0 97.01MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 144.00MiB was 128.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60500 of size 139520
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82800 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700bae800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700baec00 of size 3538944
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0ec00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0f200 of size 2654208
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197200 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197800 of size 1769472
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701347800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x701347c00 of size 101725184
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1536 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1769472 totalling 1.69MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2654208 totalling 2.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 8.91MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                   111063040
InUse:                     9337856
MaxInUse:                  9337856
NumAllocs:                      11
MaxAllocSize:              3538944

W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********___________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "imagenet_inference.py", line 19, in <module>
    sess.run(init)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Variable_10/initial_value', defined at:
  File "imagenet_inference.py", line 16, in <module>
    probs = AlexNet(x, feature_extract=False)
  File "/Users/aa/Developer/courses/self_driving_carnd/traffic-signs/CarND-Alexnet-Feature-Extraction/alexnet.py", line 139, in AlexNet
    fc6W = tf.Variable(net_data["fc6"][0])
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
    expected_shape=expected_shape)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 333, in _init_from_args
    initial_value, name="initial_value", dtype=dtype)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Dst tensor is not initialized.
	 [[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

 py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ 
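
The dump above is long, but two values decide the outcome: the allocator `Limit:` and the size of the failed request. Here the [9216,4096] float tensor needs exactly 144 MiB against a limit of about 106 MiB. The sketch below extracts both values from a log. It assumes the line formats seen in this thread, and `summarize_oom` is a hypothetical helper.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def summarize_oom(log_text):
    """Return (limit_bytes, requested_bytes) from an allocator dump, or None."""
    limit = re.search(r"^Limit:\s+(\d+)", log_text, re.MULTILINE)
    request = re.search(
        r"Ran out of memory trying to allocate ([\d.]+)(B|KiB|MiB|GiB)",
        log_text,
    )
    if not limit or not request:
        return None
    requested = float(request.group(1)) * UNITS[request.group(2)]
    return int(limit.group(1)), int(requested)


# Lines copied from the log above.
sample = """\
Limit:                   111063040
InUse:                     9337856
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB.  See logs for memory state.
"""

limit_bytes, requested_bytes = summarize_oom(sample)
print("limit:    ", limit_bytes)
print("requested:", requested_bytes)
print("shortfall: %.2f MiB" % ((requested_bytes - limit_bytes) / UNITS["MiB"]))
```

A request larger than the limit cannot succeed however the allocator arranges its chunks, which points at the low free memory at startup as the thing to investigate.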

@monajalal

I get the same error and I have 12GB of GPU memory:

mona@pascal:~/computer_vision/VPilot$ python train.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1938: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
  warnings.warn('\n'.join(msg))
Epoch 1/1000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 412.50MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4547d60
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 11.92GiB
Free memory: 534.50MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 512.0KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741400 of size 4096
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742600 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742e00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743600 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743e00 of size 222806528
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 256 totalling 5.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1024 totalling 1.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 2048 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4096 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 222806528 totalling 212.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 212.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                   222822400
InUse:                   222822400
MaxInUse:                222822400
NumAllocs:                      27
MaxAllocSize:            222806528
 
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***********************************************************************************************xxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 512.0KiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
  File "train.py", line 55, in <module>
    callbacks=[ckp_callback]
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1553, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1316, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1919, in __call__
    session = get_session()
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 121, in get_session
    _initialize_variables()
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 275, in _initialize_variables
    sess.run(tf.initialize_variables(uninitialized_variables))
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
 
Caused by op u'Const_37', defined at:
  File "train.py", line 55, in <module>
    callbacks=[ckp_callback]
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1450, in fit_generator
    self._make_train_function()
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 761, in _make_train_function
    self.total_loss)
  File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 234, in get_updates
    accumulators = [K.zeros(shape) for shape in shapes]
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 482, in zeros
    return variable(tf.constant_initializer(0., dtype=tf_dtype)(shape),
  File "/home/mona/tensorflow/_python_build/tensorflow/python/ops/init_ops.py", line 145, in _initializer
    return constant_op.constant(value, dtype=dtype, shape=shape)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/constant_op.py", line 167, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 2388, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 1300, in __init__
    self._traceback = _extract_stack()
 
InternalError (see above for traceback): Dst tensor is not initialized.
     [[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
@laventura
laventura commented Jan 26, 2017 edited

@monajalal --

It appears that the GPU is running out of memory for some reason.
WHY that is happening, I can't say; it is confounding, since the programs that ran earlier have already exited.

Possibly a memory leak? If so, it could be at the GPU driver level.
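
The numbers in the log above are consistent with an out-of-memory condition. A minimal sketch of the arithmetic, using only values copied from the allocator stats and the failing Const_37 node (DT_FLOAT is a 4-byte float); the variable names are just for illustration:

```python
# Values copied from the bfc_allocator stats and the failing node in the log.
LIMIT_BYTES = 222822400    # "Limit"
IN_USE_BYTES = 222822400   # "InUse"
CONST_SHAPE = (512, 256)   # shape of Const_37
FLOAT32_BYTES = 4          # DT_FLOAT is a 32-bit float

KIB = 1024
MIB = 1024 * 1024

# Size of the tensor the allocator failed to place on the GPU.
requested_bytes = CONST_SHAPE[0] * CONST_SHAPE[1] * FLOAT32_BYTES
free_bytes = LIMIT_BYTES - IN_USE_BYTES

print("allocator limit: %.2f MiB" % (LIMIT_BYTES / MIB))
print("requested:       %.1f KiB" % (requested_bytes / KIB))
print("free in pool:    %d bytes" % free_bytes)
```

The requested size matches the "Ran out of memory trying to allocate 512.0KiB" warning, and the pool has nothing left. The limit itself is only about 212.5 MiB, which may mean most of the GPU memory was already held by something else when the session started; that is an assumption, since the log does not show the total device memory.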

See here too: tensorflow/tensorflow#7025 (comment)

I've tried searching for how to release/clear GPU memory, but haven't found anything good / credible / useful.

Do let me know if you or anyone comes across a solution.

Until then, this TensorFlow + GPU combo is a total fail for me (on my MacBook). 😡

@normanheckscher
Contributor

The MacBook's Nvidia GPU isn't dedicated to compute: it is shared between TensorFlow and the screen.

I regularly have out-of-memory issues, using a mid-2012 rMBP with a GeForce 650.

Before running TensorFlow, I close all processes using the GPU (look at the resource monitor's video card column) to force OSX to use the integrated video card. Doing this releases some memory and I can execute TensorFlow scripts. Not all memory is cleared when I check it with cuda-smi. You can quickly see which graphics card is being used with the gfx.io app. I found it good to disable WebGL in Safari (although it's needed for TensorBoard). Restarting Safari and PyCharm before running TensorFlow scripts helps to clear GPU memory. Stopping non-essential apps in the background also helps.

https://github.com/phvu/cuda-smi

https://gfx.io

An OSX issue is a possibility?

A MacBook isn't the best "all in one" dev platform for TensorFlow, but it can be made to work... albeit frustratingly.

It would be good to force OSX to use the integrated video chip for the screen and dedicate the Nvidia GPU to TensorFlow. I'm totally unsure, however some early discussions about the hardware indicated that Apple has locked down certain parts of the GPU access... so if it can't be used exclusively now... it's likely to be difficult/impossible to do in the future.

@laventura

@normanheckscher - Thanks for the tips. Good to know about the MacBook GPU.

I downloaded gfx.io; it's helpful in understanding when the GPU is being used.
I've used cuda-smi; it's useful in showing the free GPU mem, but doesn't really show the processes using it. I was hoping an nvidia-smi kind of thing would exist for Macs.

When you said "I close all processes using the GPU (look at resource monitor video card column) to force OSX to use the integrated video card", which 'resource monitor video card' column do you refer to? In Activity Monitor? If so, I didn't find it. :-(

Yeah, I try closing most of the programs that use GPUs (mostly Chrome etc. that I use) before running TF scripts. Sometimes, the TF scripts run out of mem almost immediately after a fresh reboot, which is kind of confounding.

I'm just coming to a slow realization that the TensorFlow + GPU combo isn't very effective/efficient on MacBooks. 😕

I'm rather sadly investigating Theano (instead of TF) as the backend for Keras, which is my main high-level framework of choice. Sadly because I don't know enough Theano and don't have enough bandwidth to learn it effectively. :-/

@normanheckscher
Contributor

Sorry @laventura, I meant the Activity Monitor for OSX. In the CPU or Memory tab, where you can see the running processes, select "View>Columns>Graphics Card" and a new column, "Requires High Perf GPU", will appear. Sort by this column and you can see which processes are using the Nvidia card.

A MacBook Pro can be used for learning and development. I want to use TensorFlow and I find OSX is a very good environment to work in, so I deal with these little irritations while I get myself up to speed with TensorFlow. When my models need more memory I'll make the call as to building a headless Linux box or going with a service such as AWS. If I were starting from scratch I'd consider a dedicated GPU notebook that could run Linux; however, I'm not flush with cash and I don't see the need to purchase a new hardware environment when the one I have works.

Best of luck to you.

@bpanahij
bpanahij commented Jan 28, 2017 edited

This is not just a MacBook issue. I am seeing this on my laptop with a GTX1060 (6GB) running Ubuntu.

This seems to help:
fchollet/keras#3675

Use:

max_q_size=1,
pickle_safe=False

in fit_generator()

After adding these two options I am up and running again.
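
For context: in Keras 1.x, max_q_size sets the maximum size of the generator queue, and pickle_safe=False keeps the generator on threads rather than separate processes. The sketch below only illustrates what a bounded batch queue does, using the Python 3 standard library. batch_generator and run_with_bounded_queue are hypothetical names, and this is not Keras's actual implementation. Why the smaller queue relieves the GPU error is not established in this thread.

```python
import queue
import threading


def batch_generator(num_batches, batch_size):
    """Hypothetical stand-in for a Keras data generator."""
    for i in range(num_batches):
        yield [i] * batch_size


def run_with_bounded_queue(generator, max_q_size=1):
    """Consume `generator` through a queue holding at most `max_q_size` batches."""
    q = queue.Queue(maxsize=max_q_size)
    done = object()  # sentinel marking the end of the generator

    def producer():
        for batch in generator:
            q.put(batch)  # blocks while the queue is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()

    consumed = 0
    peak = 0  # largest queue length seen by the consumer
    while True:
        peak = max(peak, q.qsize())
        item = q.get()
        if item is done:
            break
        consumed += 1
    return consumed, peak


consumed, peak = run_with_bounded_queue(batch_generator(10, 4), max_q_size=1)
print("batches consumed:", consumed, "peak queued:", peak)
```

With max_q_size=1 the producer can never be more than one batch ahead of the consumer, so fewer prepared batches sit in memory at any moment.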

@UkiDLucas
UkiDLucas commented Feb 16, 2017 edited

I can run on the MacBook Pro's Nvidia GPU, but only for minimal applications:

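import tensorflow as tf
# Cap this process at 80% of the GPU memory; 0.333 is an alternative value.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
# log_device_placement=True logs which device each op is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))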

When I increase the number of Conv2D filters, e.g. from 32 to 64, I start to get DEAD KERNEL, so I lower the number of images I process per batch, e.g. from 256 to 24.

You have to keep trying until you get the right balance between the depth of your neural network, batch size and amount of GPU memory.
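
The trial-and-error over batch size can be automated. A minimal sketch, assuming a hypothetical try_batch callable that runs one training step and raises a hypothetical OutOfMemory exception when the batch does not fit (in a real script you would catch whatever error your framework raises instead); the batch size is halved until a step succeeds:

```python
class OutOfMemory(Exception):
    """Hypothetical stand-in for the framework's out-of-memory error."""


def find_batch_size(try_batch, start=256, minimum=1):
    """Halve the batch size until `try_batch` succeeds, then return it."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)
            return batch_size
        except OutOfMemory:
            batch_size //= 2  # too large: retry with half as many images
    raise OutOfMemory("even batch_size=%d does not fit" % minimum)


def fake_train_step(batch_size, limit=24):
    """Fake training step that 'fits' only up to `limit` images per batch."""
    if batch_size > limit:
        raise OutOfMemory(batch_size)


best = find_batch_size(fake_train_step, start=256)
print(best)  # 16 with the fake limit of 24
```

Halving is a design choice for speed; it finds a batch size that fits, not necessarily the largest one (here 16 rather than 24).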

In the end, it is much faster than CPU but too fragile. After much frustration, I am going back to CPU and a more powerful Linux GPU instance.
